mersenneforum.org  

2017-06-02, 00:57   #199
VBCurtis
"Curtis"
Feb 2005
Riverside, CA
2⁴×263 Posts

Quote:
Originally Posted by pepi37
288 K =1 M, 576 K = 2M
From where do you get these numbers? I get that an iteration moves more data than the exact FFT size, but I'd like a more definitive source if you know of one.

2017-06-02, 06:35   #200
pepi37
Dec 2011
After milion nines:)
2⁸×5 Posts

Quote:
Originally Posted by VBCurtis
From where do you get these numbers? I get that an iteration moves more data than the exact FFT size, but I'd like a more definitive source if you know of one.
As I said earlier in this thread, "it is my observation."
Many users here will say: put one candidate per core and run it; that is optimal. I would say it is anything but optimal.
Also, there was a post here or on PrimeGrid:
Quote:
Multiply FFT size by 8 to get it in bytes e.g. 128k FFT size = 1024kB. That would be 256k for 2MB, 512k for 4MB, 768k for 6MB, 1024k for 8MB. I would caution that's only for the FFT, and I don't know if there is much else needed for other things, so beware if you're at a limit. As an observation it seems to hold ok.
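
As a rough illustration of the x8 rule quoted above, here is a minimal Python sketch. It assumes the factor 8 is the only cost (the quoted post itself warns that other buffers may add to it), and the function names `fft_working_set_kb` and `fits_in_cache` are hypothetical helpers, not part of LLR.

```python
BYTES_PER_ELEMENT = 8  # the "x8" factor from the quoted rule of thumb


def fft_working_set_kb(fft_len_k: int) -> int:
    """Approximate FFT data size in kB for an FFT length given in K elements."""
    return fft_len_k * BYTES_PER_ELEMENT


def fits_in_cache(fft_len_k: int, cache_kb: int, clients: int = 1) -> bool:
    """True if `clients` independent tests of this FFT length fit in the cache."""
    return clients * fft_working_set_kb(fft_len_k) <= cache_kb


# Reproduce the sizes listed in the quoted post.
for fft_len_k in (128, 256, 512, 768, 1024):
    print(f"{fft_len_k}K FFT -> {fft_working_set_kb(fft_len_k)} kB")
```

By this estimate a 512K FFT just fills a 4 MB (4096 kB) cache, so anything larger would spill to RAM even with a single client.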

2017-06-02, 12:17   #201
KEP
Quasi Admin Thing
May 2005
911₁₀ Posts

Quote:
Originally Posted by mdettweiler
Actually...should I be running *4* separate clients on this machine? 560K * 4 = 2240K, which still fits within L3 cache. The machine has 2 physical cores but 4 logical cores due to hyperthreading. I had always understood that LLR didn't gain much from hyperthreading, but since LLR performance is memory-bandwidth-intensive, perhaps there is some room to gain there...maybe I should try this.
Well, you should definitely not run more clients than you have cores.

With LLR 3.8.20, thanks to Batalov, LLR became multithreaded, which in reality means that we have ventured into a whole new area of unknowns. What is known is that most computers gain from using more than 1 core per client if the FFT length is large enough. What appears to make the big difference is that most of our clients still suffer from bottlenecks while the CPU waits for RAM to catch up. This bottleneck is severely reduced by running more cores per client.

You will have to figure out what works best at your test level on your machine by timing LLR. But most likely you are losing performance if you are running one core per client. I can give you an example from my Sandy Bridge: it tested a base 16 number at around 3999 sec/test at n=2.4M (base 2); now, for the same k at n=2517108, it tests on 2 cores at around 1899 seconds. The difference between the most productive pre-multithreading testing scheme and the current testing scheme looks like this:

3 clients running 1 core each: 3 × 86400 s = 259200 client-seconds / 3999 sec/test = 64.82 tests/day
2 clients running 2 cores each: 2 × 86400 s = 172800 client-seconds / 1899 sec/test = 91.00 tests/day

So as you can see, even though I'm currently testing an n-value about 5% larger than the one completed on a single core, I'm doing roughly 40% more work per day (counted in completed tests). If you count completed bits instead, the productivity gain is even higher.

But whether or not you gain from multithreading can only be determined by testing locally on your own system.

Take care

KEP

P.S. The line you need to add to llr.ini is as follows: ThreadsPerTest=

2017-06-03, 02:31   #202
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)
3·2,083 Posts

Got it. Probably worth trying, then, since (per pepi37's 8x rule for determining the working set of a given FFT) a 560K FFT x 8 = 4480 kB working set per core i.e. 8960 kB for two cores. Clearly much larger than my 4 MB L3 cache; even 1 core would still be larger, but perhaps less memory bandwidth pressure would still be a good thing.

I'll try this as soon as I get the chance - thanks for all the info!

2017-06-03, 19:22   #203
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)
3×2,083 Posts

Quote:
Originally Posted by mdettweiler
Got it. Probably worth trying, then, since (per pepi37's 8x rule for determining the working set of a given FFT) a 560K FFT x 8 = 4480 kB working set per core i.e. 8960 kB for two cores. Clearly much larger than my 4 MB L3 cache; even 1 core would still be larger, but perhaps less memory bandwidth pressure would still be a good thing.

I'll try this as soon as I get the chance - thanks for all the info!
Just to follow up on this: I tried running with -t2 overnight and I've gotten at least an 8% improvement in overall efficiency. Nice!

I say "at least" because normal variation in test times due to background processes, etc. makes it difficult to get an exact figure. I took a conservative estimate by taking the longest -t2 test time I observed, multiplying it by 2, and comparing that with the shortest -t1 test time I had on record from the last few days. So I'm probably getting more than 8% improvement on the whole.

It's tough to get a more accurate measurement on this computer because it's also used for "real work" and doesn't run PRPnet continuously (just when I'm not using it). Hopefully I'll be able to get some more accurate numbers from some of my other boxes that crunch (sort of) full time.
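
The conservative estimate described above can be sketched in Python as follows. The timings are hypothetical placeholders for illustration only (no actual test times were posted in this thread), and the function names are likewise made up for this sketch.

```python
def normalized_reduction(t1_seconds: float, t2_seconds: float, threads: int = 2) -> float:
    """Fractional reduction in per-test cost when one client using `threads`
    threads replaces `threads` single-threaded clients.

    The multithreaded time is multiplied by `threads` so that both setups
    are compared on the same number of cores.
    """
    return 1 - (threads * t2_seconds) / t1_seconds


def conservative_reduction(t1_times, t2_times, threads: int = 2) -> float:
    """Lower bound: longest multithreaded time vs. shortest single-threaded time."""
    return normalized_reduction(min(t1_times), max(t2_times), threads)


# Hypothetical timings in seconds, NOT taken from the thread.
t1_times = [1000.0, 1020.0, 1050.0]  # single-threaded (-t1) tests
t2_times = [430.0, 440.0, 450.0]     # two-threaded (-t2) tests

print(f"at least {conservative_reduction(t1_times, t2_times):.1%} reduction")
```

Using averages instead of the worst and best cases gives a less pessimistic figure, at the cost of being more sensitive to background load.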

2017-06-05, 07:33   #204
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)
3·2,083 Posts

After another day's worth of running with -t2, I have a better sample size of test timings to work with and it looks like I am getting a solid 16% reduction in average test times (averaged over the last 5 tests with -t2 compared with 5 tests on one of two single-threaded clients, normalized by multiplying the -t2 average time by 2).

Since even one client is still too big to fit in my 4 MB L3 cache (the x8 rule says that a 560K FFT = 4480 kB memory working set), it appears that this benefit is purely from reducing pressure on the memory bus. There should be a lot more to be gained for tests small enough to fit entirely within the cache when appropriately multithreaded. And, the benefit would be even greater for newer CPUs which are more prone to outrun their memory buses (my Sandy Bridge is relatively old at this point). Given this, I can totally see where those 40% and 70% productivity increases KEP cites are coming from!

Thanks guys for pointing this out to me - I can see why everyone's talking about it as a big revolution!


(And yes, this exchange should definitely be moved to the "Software/Instructions/Questions" thread....)

Last fiddled with by mdettweiler on 2017-06-05 at 07:34

2017-06-05, 13:18   #205
KEP
Quasi Admin Thing
May 2005
38F₁₆ Posts

Quote:
Originally Posted by mdettweiler
Thanks guys for pointing this out to me - I can see why everyone's talking about it as a big revolution!
You're welcome

2017-06-28, 23:51   #206
wikimax
Jul 2016
1 Posts
R247 reservation

Dear Mr. Barnes!

Is it possible to reserve the following interval:

Riesel
b=247
k=1 to 469184 (all k, I want to start a new range)
n=1 to 2^12=4096

I have only one PC and I want to check this interval with the program “Mathematica”. Should I send you the results in this forum? I want to transfer the results to an Excel file for easier reading. Or do you prefer another program (not Mathematica)? If yes, how can I start it?

With regards!

2017-06-29, 09:58   #207
ET_
Banned
"Luigi"
Aug 2002
Team Italia
3×5×317 Posts

Quote:
Originally Posted by wikimax
Dear Mr. Barnes!

Is it possible to reserve the following interval:

Riesel
b=247
k=1 to 469184 (all k, I want to start a new range)
n=1 to 2^12=4096

I have only one PC and I want to check this interval with the program “Mathematica”. Should I send you the results in this forum? I want to transfer the results to an Excel file for easier reading. Or do you prefer another program (not Mathematica)? If yes, how can I start it?

With regards!
Can't you use a siever and either the LLR or pfgw program for such a search?

Luigi

2017-06-29, 13:38   #208
rogue
"Mark"
Apr 2003
Between here and the
5,779 Posts

Quote:
Originally Posted by wikimax
Dear Mr. Barnes!

Is it possible to reserve the following interval:

Riesel
b=247
k=1 to 469184 (all k, I want to start a new range)
n=1 to 2^12=4096

I have only one PC and I want to check this interval with the program “Mathematica”. Should I send you the results in this forum? I want to transfer the results to an Excel file for easier reading. Or do you prefer another program (not Mathematica)? If yes, how can I start it?
Use srbsieve. Read this thread to learn how to use it. You will also need newpgen, srsieve, and pfgw alongside srbsieve. Trust me, it will save you many weeks, if not months, of time.

Also, you need to reserve n to 10,000 at a minimum, but preferably to 25,000.

2017-06-30, 09:31   #209
gd_barnes
May 2007
Kansas; USA
3²·7²·23 Posts

Wikimax,

We cannot accept reservations for bases to n<10000. You will need to read our software thread. As discussed by others, you will need to use the appropriate software to test the bases. Mathematica will be very inefficient for doing these searches.


Gary
All times are UTC. The time now is 07:19.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.