#12 |
"Dana Jacobsen"
Feb 2011
Bangkok, TH
2²·227 Posts |
I guess a good thing to decide would be what the goal is.
Lowest energy use? Most efficient energy use? Most efficient cluster energy use? Easiest to set up? Highest raw performance under some budget threshold? etc. In my case it'd be nice to have something that excels at apps like Primo and other parallel searches as well, hence looking at many more cores. My daughter's mini-PC is an i5-4210U (2/4 core), probably lower power use but I bet it's not good performance. Maybe I'll annoy her and run something on it. I can also try my i7-6700HQ laptop. I suspect it's higher power use for not enough performance gain, as it's a gaming laptop not optimized for battery life. |
#13 |
"Carlos Pinho"
Oct 2011
Milton Keynes, UK
5×7×139 Posts |
I'm an energy consultant, so I need to stick with energy efficiency... but the UK is so cold for my body that we need the USA to increase GHG emissions so we can have more heat waves here. So I have contradictory feelings...
Last fiddled with by pinhodecarlos on 2017-10-03 at 20:05 |
#14 |
Jun 2003
Oxford, UK
2²·13·37 Posts |
I think that when this search reaches 2^64 we will return to parallel processes such as a*b#/c-d (a variable, b#/c fixed, d = prevprime) and hence large numbers of cores which run threads efficiently would be the best solution for gap hunters.
Energy efficiency needs to be decent, but perhaps from a benchmark test p.o.v. it would be good to look at throughput/energy use for a primorial series, with variable a. Is there a way to rewrite danaj's standard primorial code to run more efficiently as well, or is it optimal in its use of threads? |
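To make the a*b#/c form concrete, here is a minimal Python sketch of such a scan. It is not danaj's code: the helper names (`is_prime`, `primorial`, `gap_around`, `scan`) and the tiny values of b and c are assumptions chosen for illustration, and the trial-division primality test only suits small numbers. It assumes c divides b# and that the centre a*b#/c is composite and larger than 2.

```python
from math import isqrt

def is_prime(n):
    """Trial division; adequate only for the small illustrative numbers here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True

def primorial(b):
    """Product of all primes <= b (written b#)."""
    result = 1
    for p in range(2, b + 1):
        if is_prime(p):
            result *= p
    return result

def gap_around(center):
    """Return (prev_prime, next_prime) strictly below and above center."""
    if center < 3:
        raise ValueError("center must be at least 3")
    lo = center - 1
    while not is_prime(lo):
        lo -= 1
    hi = center + 1
    while not is_prime(hi):
        hi += 1
    return lo, hi

def scan(b, c, a_values):
    """Yield (a, gap) for centres a*b#/c; b#/c is fixed, a varies."""
    base = primorial(b) // c
    for a in a_values:
        lo, hi = gap_around(a * base)
        yield a, hi - lo

if __name__ == "__main__":
    # Hypothetical small example: b = 13, c = 5, a = 1..5.
    for a, gap in scan(13, 5, range(1, 6)):
        print(a, gap)
```

Each value of a is independent of the others, which is why this style of search spreads naturally over many cores.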
#15 | |
"Dana Jacobsen"
Feb 2011
Bangkok, TH
1614₈ Posts |
![]() Quote:
1. Does it use threading well? It's not that bad, but for n threads, each thread takes number k*n of the range. If one of them finishes the range first then it waits. This isn't ideal, but in practice it isn't as bad as I thought. IIRC, some people run their searches in a different order so there are variants out there.

2. Could we optimize the search strategy? I haven't seen anything actionable, but there are threads on covering sets and the like that seem to have something to do with this.

3. Can the main computational task be faster? I'm sure it could be. It's been discussed before, and has gotten faster over the past few years. I see some things that could still be investigated. surround_primes looks good to me, so some possibilities include mod-6 or mod-30 replacing mod-2 bit mask, 3-prime-at-a-time to elide ~60000 mpz remainders, lots of prime_iterator optimizations (ensure everything is fast for initial primes; amortized bulk number extraction; use new slightly faster sieve in MPU). Or perhaps we could get an entirely better method from someone like JKA or RG. |
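The threading scheme in point 1 can be pictured with a small Python sketch. It is one reading of the description, not the actual code: it assumes thread t handles items t, t+n, t+2n, ... of the range, and the function names and per-item costs are made up for illustration.

```python
def strided_assignment(num_items, num_threads):
    """For each thread t, list the item indices t, t+n, t+2n, ... it would process."""
    return [list(range(t, num_items, num_threads)) for t in range(num_threads)]

def idle_time(costs, num_threads):
    """Given per-item costs, return (makespan, idle time per thread).

    Idle time is how long each thread waits for the slowest one to finish.
    """
    loads = [sum(costs[i] for i in idxs)
             for idxs in strided_assignment(len(costs), num_threads)]
    makespan = max(loads)
    return makespan, [makespan - load for load in loads]

if __name__ == "__main__":
    # Hypothetical costs: ten items of equal cost on four threads.
    print(strided_assignment(10, 4))
    print(idle_time([1] * 10, 4))
```

With a static split like this, the waiting only matters when per-item costs are uneven or the item count is small relative to the thread count, which fits the remark that it is not as bad in practice as expected.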
#16 |
"Dana Jacobsen"
Feb 2011
Bangkok, TH
2²×227 Posts |
So ... turns out there really is AVX2 code in the PGS source (in sieve_small_primes) and it's set on by default. Now I'm interested in seeing what difference it makes on vs. off.
My laptop, i7-6700HQ 2.6GHz, 4 threads inside VirtualBox, is getting 25.6 n/s running one of the 9-9.25 ranges. I should test it under Windows. |
#17 |
Jun 2003
Oxford, UK
2²×13×37 Posts |
![]() |
#18 |
"Robert Gerbicz"
Oct 2005
Hungary
2×7×103 Posts |
We spend little time there, so the AVX2 code makes the program only a few percent faster. Of course you can disable it with #define USE_AVX2 0 to see the effect of this part.
#19 | |
"Antonio Key"
Sep 2011
UK
3²·59 Posts |
![]() Quote:
33.51e9 n/s i5-3570k 4.4GHz, 4 threads, 12.5GB (2x8GB DDR3 1600 CL9 rank 2)

36.24e9 n/s i5-3570k 4.4GHz, 4 threads, 12.5GB (2x8GB DDR3 2133 CL11 rank 2)

29.66e9 n/s i5-3570k 4.4GHz, 3 threads, 12.5GB (2x8GB DDR3 2133 CL11 rank 2)

20.43e9 n/s i5-3570k 4.4GHz, 2 threads, 12.5GB (2x8GB DDR3 2133 CL11 rank 2)

10.67e9 n/s i5-3570k 4.4GHz, 1 threads, 12.5GB (2x8GB DDR3 2133 CL11 rank 2)

Tests used a windows batch file consisting of:

gap11 -n1 9e18 -n2 925e16 -n 9e18 -res1 0 -res2 10 -res 0 -m1 1190 -m2 8151 -unknowngap 1382 -numcoprime 27 -sb 24 -bs 18 -t %1 -mem 12.5

The figures quoted above were those displayed at the end of the run.

The ratio of sieve to search times is approx. 20:7

Last fiddled with by Antonio on 2017-10-09 at 06:58 |
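For anyone comparing machines, a short Python sketch of the scaling arithmetic on these figures may help. The rates are copied from the post; the function name `scaling` and the definition of efficiency (speedup divided by thread count) are choices made here, not something the post specifies.

```python
# Rates in units of 1e9 n/s, copied from the DDR3 2133 rows above.
rates_2133 = {1: 10.67, 2: 20.43, 3: 29.66, 4: 36.24}
# 4-thread rate with DDR3 1600, also from the post.
rate_1600_4thr = 33.51

def scaling(rates):
    """Map thread count -> (speedup vs. 1 thread, speedup per thread)."""
    base = rates[1]
    return {t: (r / base, r / base / t) for t, r in sorted(rates.items())}

if __name__ == "__main__":
    for threads, (speedup, eff) in scaling(rates_2133).items():
        print(f"{threads} thr: speedup {speedup:.2f}x, efficiency {eff:.0%}")
    # Relative gain from faster memory at 4 threads.
    print(f"2133 vs 1600 at 4 thr: {rates_2133[4] / rate_1600_4thr - 1:.1%}")
```

The same two ratios can be computed for any other set of results posted in the thread, which makes runs on different hardware easier to compare.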
#20 |
"Dana Jacobsen"
Feb 2011
Bangkok, TH
1614₈ Posts |
Nice results, Antonio!
I'll try running something similar on my 4770K when it finishes this batch. I'll make sure I run gap11 as well, and we're already using the same sieve parameters. I'm on Linux while you're on Windows, so different compilers and scheduler. But it looks like the machines are pretty similar in some ways. |
#21 |
"Dana Jacobsen"
Feb 2011
Bangkok, TH
2²·227 Posts |
i7-6700K 4.2GHz, 4x8GB DDR4 2800 CL15 rank 2 1.2V, Fedora 23, gcc 5.3.1
1 thr 10.95e9 n/sec.; time=8404 sec

2 thr 21.07e9 n/sec.; time=4367 sec

3 thr 30.72e9 n/sec.; time=2995 sec

4 thr 38.68e9 n/sec.; time=2379 sec

6 thr 37.28e9 n/sec.; time=2468 sec

8 thr 43.73e9 n/sec.; time=2104 sec

gcc -m64 -fopenmp -O2 -frename-registers -fomit-frame-pointer -flto -mavx2 -march=native -o g11 gap11.c -lm

./g11 -n1 9e18 -n2 9.25e18 -n 9e18 -res1 7300 -res2 7303 -res 7300 -m1 1190 -m2 8151 -numcoprime 27 -sb 25 -bs 18 -mem 13 -t $i |
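As a sanity check on these numbers, the Python sketch below multiplies each reported rate by its wall time. If the program reports throughput consistently, the product should be nearly the same for every run, since each run covers the same range. The figures are copied from the post; the helper names are hypothetical.

```python
# threads -> (rate in 1e9 n/sec, wall time in sec), copied from the post above.
runs = {1: (10.95, 8404), 2: (21.07, 4367), 3: (30.72, 2995),
        4: (38.68, 2379), 6: (37.28, 2468), 8: (43.73, 2104)}

def work_done(runs):
    """Rate times time for each run, in units of 1e9 n."""
    return {t: rate * secs for t, (rate, secs) in runs.items()}

def relative_spread(values):
    """(max - min) / mean of the given values."""
    values = list(values)
    return (max(values) - min(values)) / (sum(values) / len(values))

def gain(runs, a, b):
    """Throughput of the b-thread run relative to the a-thread run."""
    return runs[b][0] / runs[a][0]

if __name__ == "__main__":
    print(work_done(runs))
    print(f"spread: {relative_spread(work_done(runs).values()):.3%}")
    print(f"6 vs 4 threads: {gain(runs, 4, 6):.3f}")
    print(f"8 vs 4 threads: {gain(runs, 4, 8):.3f}")
```

A small spread would suggest the 6-thread dip shows up in wall time as well as in the reported rate, so it is not just a reporting quirk.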
#22 | |
"Antonio Key"
Sep 2011
UK
3²·59 Posts |
![]() Quote:
Is that dip in performance at 6 threads consistent? Have you tried >4 threads with -sb 24 -bs 17? It may improve the performance, and it would be interesting to see if it made a difference. |