mersenneforum.org > Prime Search Projects > Prime Gap Searches
2017-10-12, 18:11   #23
R. Gerbicz ("Robert Gerbicz", Oct 2005, Hungary)
Quote:
Originally Posted by Antonio
Interesting!
Is that dip in performance at 6 threads consistent?
Have you tried >4 threads with -sb 24 -bs 17? It may improve the performance, and it would be interesting to see if it makes a difference.
Good suggestion, though with sb I'd go in the other direction, to sb=26, just because that test used a non-power-of-2 thread count. But in practice you could well be right; I haven't tried it, and I have the non-K version of that processor with a different RAM speed.

2017-10-12, 18:30   #24
danaj ("Dana Jacobsen", Feb 2011, Bangkok, TH)

My previous work runs on that machine have all used sb=24, because way back I tested and it was best. I should loop the test again with 24 and 26 once the new job finishes.

Actually, I should be able to compare a long run with t=8 in the same 9e18 - 9.25e18 portion. I ran 4000-4500 with sb=24 and got 44.04e9. I'm running 7300-7600 now with sb=25. Oh wait, sigh, the earlier one was with gap10 and the new one gap11 so too many changes to properly compare. I'll just run the same test with different sb values.
2017-10-16, 01:11   #25
danaj ("Dana Jacobsen", Feb 2011, Bangkok, TH)


sb=26 with 1,2,3,4,6,8 threads:

sb-26/t1/nohup.out 7.78e9 n/sec.; time=11821 sec.
sb-26/t2/nohup.out 16.31e9 n/sec.; time=5641 sec.
sb-26/t3/nohup.out 26.83e9 n/sec.; time=3430 sec.
sb-26/t4/nohup.out 29.28e9 n/sec.; time=3142 sec.
sb-26/t6/nohup.out 34.80e9 n/sec.; time=2644 sec.
sb-26/t8/nohup.out 33.82e9 n/sec.; time=2721 sec.

sb=25 with 1,2,3,4,6,8 threads:

sb-25/t1/nohup.out 10.95e9 n/sec.; time=8404 sec.
sb-25/t2/nohup.out 21.07e9 n/sec.; time=4367 sec.
sb-25/t3/nohup.out 30.72e9 n/sec.; time=2995 sec.
sb-25/t4/nohup.out 38.68e9 n/sec.; time=2379 sec.
sb-25/t6/nohup.out 37.28e9 n/sec.; time=2468 sec.
sb-25/t8/nohup.out 43.73e9 n/sec.; time=2104 sec.

sb=24 with 1,2,3,4,6,8 threads:

sb-24/t1/nohup.out 11.03e9 n/sec.; time=8345 sec.
sb-24/t2/nohup.out 21.55e9 n/sec.; time=4269 sec.
sb-24/t3/nohup.out 31.54e9 n/sec.; time=2917 sec.
sb-24/t4/nohup.out 40.36e9 n/sec.; time=2280 sec.
sb-24/t6/nohup.out 35.93e9 n/sec.; time=2561 sec.
sb-24/t8/nohup.out 42.58e9 n/sec.; time=2161 sec.

gap11a, sb=24 with 1,2,3,4,6,8 threads (results match gap11 above):

sb-24-11a/t1/nohup.out 11.76e9 n/sec.; time=7825 sec.
sb-24-11a/t2/nohup.out 22.80e9 n/sec.; time=4035 sec.
sb-24-11a/t3/nohup.out 33.48e9 n/sec.; time=2748 sec.
sb-24-11a/t4/nohup.out 42.78e9 n/sec.; time=2151 sec.
sb-24-11a/t6/nohup.out 37.46e9 n/sec.; time=2456 sec.
sb-24-11a/t8/nohup.out 44.64e9 n/sec.; time=2061 sec.



All show some dropoff in scaling from 1 to 4 threads, but sb=24 is much better. 8 threads is only a little faster than 4. I'm not sure whether the 4-thread results would improve if hyperthreading were disabled in the BIOS. I'd also wonder, if the machine were used for other things as well, whether it'd be best to leave it at 4 threads so the other processing wouldn't interfere much.

gap11a is gap11 with new asm. I'm using the full 64-bit mulredc unconditionally, as it doesn't seem worth conditionally choosing between the 63-bit and 64-bit versions given our range. Small improvements to mulmod and addmod as well.
2017-10-16, 16:15   #26
pinhodecarlos ("Carlos Pinho", Oct 2011, Milton Keynes, UK)

Dana,

Have you tried running two instances of the client with 4 threads each, to see whether the overall output is higher than one instance with 8 threads?

Carlos
2017-10-16, 16:46   #27
danaj ("Dana Jacobsen", Feb 2011, Bangkok, TH)

Quote:
Originally Posted by pinhodecarlos
Have you tried running two instances of the client with 4 threads each, to see whether the overall output is higher than one instance with 8 threads?
Not on this machine. On the AWS instance I compared 2x8 vs 16, but there are a lot of unknowns with an AWS instance like that (it's probably shared). After this next real run finishes I could try it.

Another interesting thing to try would be seeing whether Windows runs are any different from Linux. I was very disappointed with my laptop runs in a VM. The laptop doesn't have Cygwin, so I couldn't easily run the pre-built Windows executables, and my Windows gcc was giving me issues with one of the timing headers gap11.c uses, so I didn't do anything with it.

At the moment it's looking like hyperthreading is getting us very little. Something similar should be done for the large gap search.
2017-10-16, 16:59   #28
Antonio ("Antonio Key", Sep 2011, UK)

Dana,
Have you tried tuning bs for 8 threads? With hyperthreading on you effectively have two cores sharing the same L2 cache, so reducing bs may help.
2017-10-16, 18:00   #29
danaj ("Dana Jacobsen", Feb 2011, Bangkok, TH)

I did tuning on both sb and bs back in the 4e18 ranges, maybe early 5e18, but haven't since. Once the next range of real work finishes in ~4 days I'll try some more benchmarks on bs=...

For those looking at the trend above, I ran sb=23 with 8 threads and it came out slower than sb=24. Not unexpected, but I wanted to verify.
2017-10-16, 18:38   #30
pinhodecarlos ("Carlos Pinho", Oct 2011, Milton Keynes, UK)

Basically we need to buy more computers....
2017-10-16, 19:27   #31
rudy235 (Jun 2015, Vallejo, CA)

Quote:
Originally Posted by pinhodecarlos
Basically we need to buy more computers....
I'm buying one, hopefully next month or in December, and joining the good fight.
2017-10-21, 08:17   #32
danaj ("Dana Jacobsen", Feb 2011, Bangkok, TH)

Quote:
Originally Posted by Antonio
Have you tried tuning bs for 8 threads? With hyperthreading on you effectively have two cores sharing the same L2 cache, so reducing bs may help.
bs12/nohup.out:Search used 2032 sec. (Wall clock time), 16125.14 cpu sec.
bs13/nohup.out:Search used 2014 sec. (Wall clock time), 15937.78 cpu sec.
bs14/nohup.out:Search used 1993 sec. (Wall clock time), 15811.31 cpu sec.
bs15/nohup.out:Search used 1992 sec. (Wall clock time), 15762.47 cpu sec.
bs16/nohup.out:Search used 1982 sec. (Wall clock time), 15724.18 cpu sec.
bs17/nohup.out:Search used 1980 sec. (Wall clock time), 15713.83 cpu sec.
bs18/nohup.out:Search used 2061 sec. (Wall clock time), 16284.61 cpu sec.

All with sb 24 and 8 threads. Looks like bs 17 is the winner. None of them are horribly off, but 4% faster is definitely worth it.
2017-11-01, 03:03   #33
danaj ("Dana Jacobsen", Feb 2011, Bangkok, TH)

I ran sets of 10 residues in the 6300-6400 range with different thread counts on my laptop: the g12_haswell Windows executable, sb=24, bs=13, 10GB (the machine has 16GB; for these tests I checked there was plenty of free memory).

10.18 n/s 1 thread
18.17 n/s 2 threads
25.34 n/s 3 threads
30.38 n/s 4 threads
28.60 n/s 5 threads
30.86 n/s 6 threads
33.11 n/s 7 threads
35.03 n/s 8 threads

Lenovo Y700 i7-6700HQ 2.6GHz dual channel 2x8GB DDR4-2400

In other news with latest range and g12:
26.57 n/s AWS c4.2xlarge 8 thr about $50/month
51.25 n/s AWS c4.4xlarge 16 thr about $100/month