20170602, 00:57  #199 
"Curtis"
Feb 2005
Riverside, CA
17×271 Posts 

20170602, 06:35  #200  
Dec 2011
After milion nines:)
1389_{10} Posts 
Quote:
Many users here will say: put one candidate per core and run it; that it is optimal. I will say it is anything but optimal. There was also a post about this here or on PrimeGrid Quote:


20170602, 12:17  #201  
Quasi Admin Thing
May 2005
953 Posts 
Quote:
With LLR 3.8.20, thanks to Batalov, LLR became multithreaded, which in reality means that we have ventured into a whole new area of unknowns. What is known is that most computers gain from using more than 1 core per client, if the FFT length is large enough. What appears to make the big difference is that most of our machines still suffer bottlenecks while the CPU waits for RAM to catch up. This bottleneck is severely reduced by running more cores per client. What works best at your test level on your machine, you have to figure out by timing LLR. But most likely you are losing performance if you are running one core per client.

I can give you an example from my Sandy Bridge: it tested a base 16 number at around 3999 sec/test at n=2.4M (base 2); now, for the same k at n=2517108, it tests on 2 cores at around 1899 sec/test. The difference between the (pre-multithreading) most productive testing scheme and the current scheme looks like this:

3 clients running 1 core each: 3*86400 s = 259200 CPU-seconds / 3999 sec/test = 64.82 tests/day
2 clients running 2 cores each: 2*86400 s = 172800 client-seconds / 1899 sec/test = 91.00 tests/day

So as you can see, even though I'm currently testing an n-value 5% larger than the one completing on a single core, I'm doing 40.1% more work a day (in count of completed tests). If you count the amount of completed bits, the productivity gain is even higher. But whether or not you gain from multithreading can only be determined by testing locally on your own system.

Take care
KEP

Ps. The line in llr.ini you need to add is as follows: ThreadsPerTest=

20170603, 02:31  #202 
A Sunny Moo
Aug 2007
USA (GMT5)
1100001101001_{2} Posts 
Got it. Probably worth trying, then, since (per pepi37's 8x rule for determining the working set of a given FFT) a 560K FFT x 8 = 4480 kB working set per test, i.e. 8960 kB for two single-threaded clients. Clearly much larger than my 4 MB L3 cache; even 1 client would still be larger, but perhaps less memory bandwidth pressure would still be a good thing.
I'll try this as soon as I get the chance  thanks for all the info! 
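For anyone else wanting to apply pepi37's 8x rule of thumb discussed above (working set roughly 8 bytes, i.e. one double, per FFT element), a quick sketch; the 560K FFT length and 4 MB L3 figures come from this exchange:

```python
# pepi37's 8x rule of thumb: LLR's working set is roughly 8 bytes (one double)
# per FFT element, so a "560K" FFT needs about 560 * 8 kB.

def working_set_kb(fft_length_k):
    """Approximate working set in kB for an FFT of fft_length_k * 1024 elements."""
    return fft_length_k * 8  # 8 bytes/element; K elements -> kB directly

l3_cache_kb = 4 * 1024            # 4 MB L3 on the Sandy Bridge discussed above

one_client = working_set_kb(560)  # 4480 kB per test
two_clients = 2 * one_client      # 8960 kB for two single-threaded clients

print(one_client, "kB per client;", two_clients, "kB for two clients")
print("fits in L3:", one_client <= l3_cache_kb)  # False -> memory-bus bound
```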
20170603, 19:22  #203  
A Sunny Moo
Aug 2007
USA (GMT5)
3×2,083 Posts 
Quote:
I say "at least" because normal variation in test times due to background processes, etc. makes it difficult to get an exact figure. I took a conservative estimate by taking the longest t2 test time I observed, multiplying it by 2, and comparing that with the shortest t1 test time I had on record from the last few days. So I'm probably getting more than 8% improvement on the whole. It's tough to get a more accurate measurement on this computer because it's also used for "real work" and doesn't run PRPnet continuously (just when I'm not using it). Hopefully I'll be able to get some more accurate numbers from some of my other boxes that crunch (sort of) full time. 

20170605, 07:33  #204 
A Sunny Moo
Aug 2007
USA (GMT5)
3·2,083 Posts 
After another day's worth of running with t2, I have a better sample size of test timings to work with and it looks like I am getting a solid 16% reduction in average test times (averaged over the last 5 tests with t2 compared with 5 tests on one of two singlethreaded clients, normalized by multiplying the t2 average time by 2).
Since even one client's working set is still too big to fit in my 4 MB L3 cache (the x8 rule says that a 560K FFT = 4480 kB memory working set), it appears that this benefit comes purely from reducing pressure on the memory bus. There should be a lot more to be gained for tests small enough to fit entirely within the cache when appropriately multithreaded. And the benefit would be even greater for newer CPUs, which are more prone to outrun their memory buses (my Sandy Bridge is relatively old at this point).

Given this, I can totally see where those 40% and 70% productivity increases KEP cites are coming from! Thanks, guys, for pointing this out to me; I can see why everyone's talking about it as a big revolution! (And yes, this exchange should definitely be moved to the "Software/Instructions/Questions" thread....)

Last fiddled with by mdettweiler on 20170605 at 07:34
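The normalization described above (multiplying the two-threaded client's average test time by its thread count, so both configurations are compared in CPU-seconds per test) can be sketched as follows. The timing lists are hypothetical placeholders, not the actual figures from this thread:

```python
# Compare a 2-threaded client against a 1-threaded client on equal terms:
# multiply the t2 wall-clock times by the thread count to get CPU-seconds/test.

def avg(times):
    return sum(times) / len(times)

def improvement(t1_times, t2_times, threads=2):
    """Fractional reduction in CPU-seconds per test from multithreading."""
    t1_cpu = avg(t1_times)            # single-threaded: wall time == CPU time
    t2_cpu = avg(t2_times) * threads  # normalize to CPU-seconds per test
    return 1 - t2_cpu / t1_cpu

# Hypothetical timings (seconds/test), for illustration only:
t1_sample = [4000, 4050, 3980, 4020, 3950]
t2_sample = [1680, 1700, 1660, 1690, 1670]

print(f"{improvement(t1_sample, t2_sample) * 100:.1f}% reduction")  # 16.0%
```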
20170605, 13:18  #205 
Quasi Admin Thing
May 2005
953 Posts 

20170628, 23:51  #206 
Jul 2016
1 Posts 
R247 reservation
Dear Mr. Barnes!
Is it possible to reserve the following interval: Riesel b=247, k=1 to 469184 (all k; I want to start a new range), n=1 to 2^12=4096? I have only one PC and I want to check this interval with the program Mathematica. Should I send you the results in this forum? I want to transfer the results into an Excel file for better readability. Or do you prefer another program (not Mathematica)? If yes, how can I start it?

With regards! 
20170629, 09:58  #207  
Banned
"Luigi"
Aug 2002
Team Italia
2·5·479 Posts 
Quote:
Luigi 

20170629, 13:38  #208  
"Mark"
Apr 2003
Between here and the
6164_{10} Posts 
Quote:
Also, you need to reserve n to 10,000 at the minimum, but preferably to 25,000. 

20170630, 09:31  #209 
May 2007
Kansas; USA
19×541 Posts 
Wikimax,
We cannot accept reservations for bases to n<10000. You will need to read our software thread. As discussed by others, you will need to use the appropriate software to test the bases; Mathematica would be very inefficient for these searches.

Gary 
Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Useless SSE instructions  __HRB__  Programming  41  20120707 17:43 
Questions about software licenses...  WraithX  GMPECM  37  20111028 01:04 
Software/instructions/questions  gd_barnes  No Prime Left Behind  48  20090731 01:44 
Instructions to manual LLR?  OmbooHankvald  PSearch  3  20050805 20:28 
Instructions please?  jasong  Sierpinski/Riesel Base 5  10  20050314 04:03 