Sieve Benchmark Thread
Let's get the ball rolling on this one.
Processor: Pentium 4 3.4 GHz
tpsieve for the variable n-range: 5M p/sec
tpsieve for a single n: 71.5M p/sec
NewPGen for a single n: 86M p/sec
NewPGen for "Operation Megabit Twin": estimated to be 80 hours for 1T |
CPU: Intel i5-750 (all 4 cores loaded).
tpsieve on x86_64 Linux for n=480000-485000: 108M p/sec. |
[quote=Historian;217612]NewPGen for "Operation Megabit Twin": estimated to be 80 hours for 1T[/quote]
FYI, this converts to 3472222 p/sec, i.e. 3M p/sec since this very rough estimate is only good to one significant figure. (Just to put this in perspective with the other p/sec estimates.) |
[QUOTE=mdettweiler;217643]FYI, this converts to 3472222 p/sec, i.e. 3M p/sec since this very rough estimate is only good to one significant figure. (Just to put this in perspective with the other p/sec estimates.)[/QUOTE]
From what I've seen, the Megabit Twin project goes through a range of k, not a range of p. So that's ~3.5M k/sec, not 3.5M p/sec. |
[quote=Historian;217664]From what I've seen, the Megabit Twin project goes through a range of k, not a range of p. So that's ~3.5M k/sec, not 3.5M p/sec.[/quote]
Ah, right, I see now...most of the prime search efforts I've worked with deal with relatively small ranges of k, and thus I am used to always having an unqualified reference to the suffix "T" refer to p, not k. Since in this project both values are of magnitudes that can be reasonably referred to in T, I would suggest that in the future qualifiers be used: for example "k=1T" instead of just 1T, leaving the latter (or even better, p=1T) strictly for p references. |
You can also calculate a rate in p/sec. We are currently sieving to p=100e9 and therefore 80 hours translates to 347k p/sec. Not very fast, but NewPGen has to break a 1T k range into almost 250 pieces until it gets to p=1e9.
|
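[The conversions being discussed above are easy to sanity-check. A quick sketch, taking "1T" as a k-range of 10^12 and the sieve depth as p=100e9 per amphoria's figures:]

```python
# Back-of-envelope check of the rates quoted above: "80 hours for 1T"
# gives different rates depending on whether 1T refers to the k-range
# scanned or to the sieve depth p reached in that time.
seconds = 80 * 3600

k_range = 10**12              # "1T" as a range of k
k_rate = k_range / seconds    # k scanned per second

p_depth = 100e9               # sieve depth p = 100e9 (from the thread)
p_rate = p_depth / seconds    # p tested per second

print(round(k_rate))  # 3472222  (~3.5M k/sec)
print(round(p_rate))  # 347222   (~347k p/sec)
```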
[quote=amphoria;217694]You can also calculate a rate in p/sec. We are currently sieving to p=100e9 and therefore 80 hours translates to 347k p/sec. Not very fast, but NewPGen has to break a 1T k range into almost 250 pieces until it gets to p=1e9.[/quote]
It is possible to change the level of the recombine. Just set a limit on the sieve to shortly after you estimate 1/250 (I would recommend 1/300 or less in case of mistakes) are remaining. That could in theory mean it's possible to combine really early, like 1e6 or something like that. I will do a test now to see vaguely when. edit: ~p=4e4 would do the trick nicely |
[QUOTE=henryzz;217716]It is possible to change the level of the recombine. Just set a limit on the sieve to shortly after you estimate 1/250(I would recommend 1/300 or less in case of mistakes) are remaining. That could in theory mean its possible to combine really early like 1e6 or something like that.
I will do a test now to see vaguely when. edit: ~p=4e4 would do the trick nicely[/QUOTE] You saved me a job. :smile: I was about to do some tests. So, you are suggesting sieving to just 40,000 then again to 100G and it will fit into 485Mb? Just making sure I've got it right. |
[QUOTE=Flatlander;217720]You saved me a job. :smile:
I was about to do some tests. So, you are suggesting sieving to just 40,000 then again to 100G and it will fit into 485Mb? Just making sure I've got it right. [/QUOTE] The default option for NewPGen is to sieve to 1G, then to 100G. I don't know whether it's possible to change it to what you were suggesting. |
I meant run it once to 40,000 then manually load it again to 100G.
|
[quote=Flatlander;217720]You saved me a job. :smile:
I was about to do some tests. So, you are suggesting sieving to just 40,000 then again to 100G and it will fit into 485Mb? Just making sure I've got it right.[/quote] That should work. Once each bit is sieved up to the limit set (in Options|Sieve Until in Windows) they will be combined into one file which should in theory be small enough to fit into 485Mb. I haven't tested this, although I have done something like this before to combine early (not really early like this), so I know that bit works. It's the 485Mb bit that I am not so certain about. It depends on whether the memory usage is driven just by the number of candidates or whether it is also affected by the distance between candidates etc. |
Okay, thanks. I'll try it with the 0.75T range I have left to do and see how much memory it uses.
|
[QUOTE=henryzz;217716]It is possible to change the level of the recombine. Just set a limit on the sieve to shortly after you estimate 1/250(I would recommend 1/300 or less in case of mistakes) are remaining. That could in theory mean its possible to combine really early like 1e6 or something like that.
I will do a test now to see vaguely when. edit: ~p=4e4 would do the trick nicely[/QUOTE] How much time will this save? By my reckoning, you won't actually save any time at all. |
[QUOTE=axn;217731]How much time will this save? By my reckoning, you won't actually save any time at all.[/QUOTE]
Possibly. Sieving a 0.05T range to 4e4 just took 27 mins, so < 10 hours for 1T. (Compare with c.24hrs to 1G.) Sieving from 4e4 to 100G is progressing painfully slowly but of course the first bit is always much slower. I think I'll just stop now and run a whole 1T range, now that I know it will easily fit in 485Mb once sieved to 4e4. I'll post the timings here later. The NPG help file states: "NewPGen is happy with lots of k's to sieve - there is nothing to be gained by dividing a range of k's up and sieving each subrange in turn..." So I'm hoping there will be an increase in efficiency. |
[QUOTE=Flatlander;217741]Sieving from 4e4 to 100G is progressing painfully slowly but of course the first bit is always much slower.[/quote]
That's why I am asserting that there'll be no speed gain. In fact, due to the greater IO involved, it might actually be slower. [QUOTE=Flatlander;217741] The NPG help file states: "NewPGen is happy with lots of k's to sieve - there is nothing to be gained by dividing a range of k's up and sieving each subrange in turn..." So I'm hoping there will be an increase in efficiency.[/QUOTE] Only true when p >= k range (or maybe range/2). Otherwise there is no increase in efficiency, and might even be slower due to memory pressure (Fast Array vs Normal Array mode). |
Fair enough. :smile:
|
[QUOTE=Historian;217612]
Processor: Pentium 4 3.4 GHz tpsieve for the variable n-range: 5M p/sec [/QUOTE] tpsieve for the variable n-range: 14.1M p/sec using one core of a 2.4 GHz core2duo processor. |
[QUOTE=axn;217744]
Only true when p >= k range (or maybe range/2).[/QUOTE] This is in fact the problem with only sieving to p=4e4 in the first stage. It leaves about 32.5 M k's and therefore NewPGen uses normal array mode. I hadn't worked out why it was so slow until I stopped the sieve at p=1e6. When restarted, NewPGen switched to fast array mode and the removal rate jumped from 40 k/sec to 280 k/sec. If there is an advantage (yet to be proven), then the first stage needs to sieve to much closer to p=1e6. |
Re. Megabit Twin Sieve
I tried sieving to 10M then again to 100G. The total time was 10-20% slower than letting NPG do it automatically. It would seem that indeed the 'sweet spot' is above 1G. Some scribbled timings:
Start to 10M took 15hrs 3m (Leaving 6.1M ks.)
Started NPG again. (Used fast array, 384Mb ram.)
10M-20M took 1hr 24m
20M-100M took 4hrs 25m
100M-500M took 2hrs 41m
500M-100G took 8hrs 33m
Total time c. 32hrs compared with c. 26-28 hrs (iirc) letting NPG do it automatically.
C2Quad Q6700 at stock 2.66GHz. 2Gb ram, NPG memory at maximum 485Mb. |
[QUOTE=amphoria;218272]This is in fact the problem with only sieving to p=4e4 in the first stage. It leaves about 32.5 M k's and therefore NewPGen uses normal array mode. I hadn't worked out why it was so slow until I stopped the sieve at p=1e6. When restarted NewPGen switched to fast array mode and the removal rate jumped from 40 k/sec to 280 k/sec, If there is an advantage (yet to be proven), then the first stage needs to sieve to much closer to p=1e6.[/QUOTE]
Hmmm... How much time did the sieve take to get to 4e4? 1e6? 10e6? I am asking because I am "fairly confident" that I can write a custom sieve that can sieve a range of k's to 1e6 (or even 10e6) much faster than NewPGen can. NewPGen isn't really optimised for the initial sieving. |
[QUOTE=Flatlander;218274]Some scribbled timings:
Start to 10M took 15hrs 3m (Leaving 6.1M ks.) [/QUOTE] Hmmm. I should be able to do better than this. Gotta code it up, though, to know for sure. |
[QUOTE=axn;218276]Hmmm... How much time did the sieve take to get to 4e4? 1e6? 10e6?...[/QUOTE]
[QUOTE=Flatlander;217741]... Sieving a 0.05T range to 4e4 just took 27 mins, so < 10 hours for 1T...[/QUOTE] All I have is '7hrs 46m to 4e4' jotted on a piece of paper but that sounds a bit quick. All figures subject to slight variation as NPG had high priority while 4 x LLRNet with low priority were running in the background. (So I didn't waste any cycles when NPG finished overnight.) |
[QUOTE=axn;218276]Hmmm... How much time did the sieve take to get to 4e4? 1e6? 10e6?
I am asking because I am "fairly confident" that I can write a custom sieve that can sieve a range of k's to 1e6 (or even 10e6) much faster than NewPGen can. NewPGen isn't really optimised for the initial sieving.[/QUOTE] It took 13 hours to get to 4e4. It took about 56 hours to go from 4e4 to 1e6. It is currently still sieving at 27e6 with 4.8M k's remaining (15 hours from 1e6). By comparison letting NewPGen do the split at p=1e9 took 42.5 hours to 1e9 and another 10.5 hours to go from 1e9 to 100e9. |
[QUOTE=amphoria;218299]It took 13 hours to get to 4e4. It took about 56 hours to go from 4e4 to 1e6. It is currently still sieving at 27e6 with 4.8M k's remaining (15 hours from 1e6).[/QUOTE]
It took 23.5 hours to go from 1e6 to 1e9. |
I've got a preliminary sieve cooked up. Currently it sieves a 1T range to 50e6 in around 1.2 hrs (C2D 2 GHz. Linux 64-bit).
Undergoing testing. Once testing checks out, assuming there is interest, I will post the code here. |
[QUOTE=axn;218487]I've got a preliminary sieve cooked up. Currently it sieves a 1T range to 50e6 in around 1.2 hrs (C2D 2 GHz. Linux 64-bit).
Undergoing testing. Once testing checks out, assuming there is interest, I will post the code here.[/QUOTE] Wow, that's pretty fast. Were you using one core or both of them? |
@axn
Brilliant! I assume someone will be able to port it to Windows, 64 bit? |
[QUOTE=Oddball;218502]Wow, that's pretty fast. Were you using one core or both of them?[/QUOTE]
One core. [QUOTE=Flatlander;218515]@axn Brilliant! I assume someone will be able to port it to Windows, 64 bit?[/QUOTE] It's written in Pascal. It will compile under Free Pascal (they have 32-bit & 64-bit compilers for Windows and Linux). |
It looks to me like Pascal is not as fast as C++.
[url]http://shootout.alioth.debian.org/u32/pascal.php[/url] |
[QUOTE=henryzz;218555]It looks to me that pascal is not as fast as c++.
[url]http://shootout.alioth.debian.org/u32/pascal.php[/url][/QUOTE] I agree, in general. However, the siever spends ~99% of its time in about 15 lines of code which are written in assembly. Even there, I haven't bothered with too much optimization. The program is simple enough to be ported to any language without much trouble -- but unless you're an assembly expert, you'll find little gain. EDIT:- More gains can be made from a more complex algorithm. However, the current gains (vis-a-vis Newpgen) are good enough that I can't justify spending more time on performance. My current focus is on increasing the p limit, so that the time spent on sieving by Newpgen is minimized. |
[QUOTE=axn;218487]I've got a preliminary sieve cooked up. Currently it sieves a 1T range to 50e6 in around 1.2 hrs (C2D 2 GHz. Linux 64-bit).
Undergoing testing. Once testing checks out, assuming there is interest, I will post the code here.[/QUOTE] Are you sieving for both Twins and Sophie Germains? |
[QUOTE=amphoria;218563]Are you sieving for both Twins and Sophie Germains?[/QUOTE]
Yep. Exploiting the fact that k should be multiple of both 3 and 5. |
1 Attachment(s)
Ok. Here's the source code.
Currently sieves 1T up to p=100e6 in 50 minutes. Experiment with tweaking the two parameters in the file. If you can't download Free Pascal to compile it, I can try compiling for Win64/Linux64 and post the binaries. EDIT:- Could use more testing. Bug reports welcome. |
[quote=axn;218600]Ok. Here's the source code.
Currently sieves 1T upto p=100e6 in 50 minutes. Experiment with tweaking the two parameters in the file. If you can't download Free Pascal to compile it, I can try compiling for Win64/Linux64 and post the binaries. EDIT:- Could use more testing. Bug reports welcome.[/quote] I tried 20T-21T on a Linux (Ubuntu) 64-bit machine, CPU C2Q 6600 2.6GHz.
Time: 28 min, sieved to 100G, ~3.5M candidates left!! Should it be that many??
Lennart |
[QUOTE=Lennart;218606]I tried 20T-21T on a Linux (ubuntu) 64bit CPU c2q 6600 2.6Ghz
Time 28 min. sieved to 100G ~3.5M candidates left !! Should it be that many ?? Lennart[/QUOTE] There should be 1M candidates left after sieving to 100G. Maybe you sieved to 100M instead? But even then, sieving 1T to p=100M in half an hour is really fast. One quad core machine could finish the entire k=1-500T range in less than a week! |
[quote=Oddball;218608]There should be 1M candidates left after sieving to 100G. Maybe you sieved to 100M instead?
But even then, sieving 1T to p=100M in half an hour is really fast. One quad core machine could finish the entire k=1-500T range in less than a week![/quote] I used axn's code. Lennart |
[QUOTE=Lennart;218610]I used axn's code.
Lennart[/QUOTE] That only sieves to 100M. The file can be further sieved using NewPGen. |
[quote=axn;218611]That only sieves to 100M. The file can be further sieved using NewPGen.[/quote]
Yes I saw that later :smile: Thanks Lennart |
[QUOTE=Lennart;218612]Yes I saw that later :smile:
Thanks Lennart[/QUOTE] Can you do one more benchmark? Can you double both SieveSize and SmallPrimes and rerun it? That will double the sieve depth to appr 200M. It would be interesting to see the scaling. |
[quote=axn;218613]Can you do one more benchmark? Can you double both SieveSize and SmallPrimes and rerun it? That will double the sieve depth to appr 200M. It would be interesting to see the scaling.[/quote]
Ok I shall try. Lennart EDIT: I have started 20-21T with those changes. |
[quote=axn;218613]Can you do one more benchmark? Can you double both SieveSize and SmallPrimes and rerun it? That will double the sieve depth to appr 200M. It would be interesting to see the scaling.[/quote]
Ok done to 200e6 33min. 3M candidates left.[CODE]
smirre2@smirre2-desktop:~/Desktop/tps_pascal$ ./LuckyMinus.pas 1000000 20 21
smirre2@smirre2-desktop:~/Desktop/tps_pascal$ ./LuckyMinus.pas 1000000 20 21
Lucky minus n=1000000 k=20000000000025-20999999999985 p<=217645177
2084/2084 = 3037726
smirre2@smirre2-desktop:~/Desktop/tps_pascal$
[/CODE] Lennart |
[QUOTE=Lennart;218617]Ok done to 200e6 33min. 3M candidates left.[/QUOTE]
That's actually excellent scaling. The selection of these parameters to optimize the total sieving time is an interesting problem -- probably needing some experimentation. Especially when running single copy vs multiple copies. It is probably a good idea to increase SmallPrimes much further, depending on how much time NewPGen takes to get it up to 1e9. |
I am currently trying to compile a Win64 build. But running into some weird runtime error. Need to troubleshoot :yucky:
|
[QUOTE=Historian;217612]
Processor: Pentium 4 3.4 GHz ... NewPGen for "Operation Megabit Twin": estimated to be 80 hours for 1T [/QUOTE] I have a similar processor (Pentium 4, 3.2 GHz), and I can confirm that it does take about 80 hours to sieve 1T using NewPGen. With the new sieve, it's expected to take 4-5 hours: 1 hour and 10 minutes using the new sieve to p=100M, and 3-4 hours to use NewPGen to sieve from p=100M to p=100G. |
[QUOTE=Oddball;218623]With the new sieve, it's expected to take 4-5 hours: 1 hour and 10 minutes using the new sieve to p=100M[/QUOTE]
[quote]Can you double both SieveSize and SmallPrimes and rerun it?[/quote] The sweet spot for my PC seems to be increasing SieveSize and SmallPrimes by 50% (SieveSize = 1500000 and SmallPrimes = 9000000). It then takes 1 hour and 3 minutes to sieve a 1T range to p=160M. |
[QUOTE=Oddball;218623]3-4 hours to use NewPGen to sieve from p=100M to p=100G.[/QUOTE]
Sounds wrong. Do you mean 30-40 hrs? |
[QUOTE=axn;218677]Sounds wrong. Do you mean 30-40 hrs?[/QUOTE]
Actually, I meant 15-16 hours, which is 12 hours longer than my original estimate. The time was estimated to be from 7PM - 10:30AM, but I accidentally calculated it as 7PM - 10:30PM :blush: |
Clueless.
[QUOTE=axn;218622]I am currently trying to compile a Win64 build. But running into some weird runtime error. Need to troubleshoot :yucky:[/QUOTE]
I haven't used FreePascal before but fed your text file into 64 bit Lazarus and an .exe fell out! :cool: Anyway, the exe runs okay but I got the following compiler messages: [CODE]
lm(44,19) Hint: Converting the operands to "DWord" before doing the add could prevent overflow errors.
lm(82,23) Hint: Converting the operands to "Int64" before doing the multiply could prevent overflow errors.
lm(102,23) Hint: Converting the operands to "Int64" before doing the multiply could prevent overflow errors.
lm(124,23) Hint: Converting the operands to "Int64" before doing the multiply could prevent overflow errors.
lm(205,17) Hint: Converting the operands to "DWord" before doing the add could prevent overflow errors.
lm(212,48) Hint: Converting the operands to "DWord" before doing the subtract could prevent overflow errors.
lm(218,51) Hint: Converting the operands to "DWord" before doing the subtract could prevent overflow errors.
lm(223,18) Hint: Converting the operands to "DWord" before doing the add could prevent overflow errors.
lm(225,51) Hint: Converting the operands to "DWord" before doing the subtract could prevent overflow errors.
lm(312,35) Hint: Converting the operands to "DWord" before doing the add could prevent overflow errors.
Project "lm" successfully built. :)
[/CODE] Is the exe 'safe' to use? (Windows 7 64bit.) |
[QUOTE=Flatlander;218771]
Is the exe 'safe' to use? [/QUOTE] Should be. I have used "proper" data types, so should be ok. You can spot check a 100G k range against NewPGen, sieved to same depth. |
Not thinking big enough.
SieveSize and SmallPrimes can be bumped up quite a bit. I have tried SmallPrimes of 60e6 (p~1.2e9), and SieveSize of 6e6, and it runs in under 2hrs. That should save a lot more hours off NewPGen sieving.
|
axnSieve rocks!
Using SmallPrimes of 60e6 ("p<=1,190,494,759"), and SieveSize of 6e6 as above:
2hr 55m for 1T range on one core of DualCore T4400 laptop. 2.2Ghz, 1Mb L2 cache. Windows 7, 64 bit. Very nice. :smile::tu: Uses 1,268,972K of RAM in task manager. As you suggest, I'll compare a sample with NPG's output. |
axnSieve
SmallPrimes of 60e6, SieveSize of 60e5 produces identical results to NPG over a 0.05T sample. (The only difference was the header where NPG stopped at a P ten less than axnSieve.)
Testing underway for 90e6/90e5, P=1,824,261,409. Looks like 2T will take about 6hr 15m. (Uses 1,833,380KB.) I reached a compiler error at 93e6/93e5 but 92e6/92e5 compiled fine. (I won't go that high though.) With 60e6/60e5 a 1T sieve uses 192Mb in NPG so 2T will fit comfortably, but even with 90e6/90e5 I don't think 3T will be <485Mb. (Hmmm. Might be worth tweaking the program to do 2.5T.) |
[QUOTE=Flatlander;218881]With 60e6/60e5 a 1T sieve uses 192Mb in NPG so 2T will fit comfortably, but even with 90e6/90e5 I don't think 3T will be <485Mb. (Hmmm. Might be worth tweaking the program to do 2.5T.)[/QUOTE]
192 Mb is what newpgen automatically chooses. But it can work with much less while still using fast array mode (96 Mb, maybe even 48 Mb). I am going to go out on a limb and say that 384 MB fast array can handle ranges much larger than 20T (yes, 20, not 2). PS:- I remember there being a rule saying something like 6 bytes per k. That'd mean 384 MB can handle 64M (=67108864) candidates in fast array mode. |
[QUOTE=Flatlander;218881]Testing underway for 90e6/90e5, P=1,824,261,409. Looks like 2T will take about 6hr 15m. (Uses 1,833,380KB.)
I reached a compiler error at 93e6/93e5 but 92e6/92e5 compiled fine. (I won't go that high though.)[/QUOTE] If there is real savings to be had in allowing the program to sieve higher that 1G, I have a few ideas that can reduce the memory requirement for the bigger primes possibly allowing you to sieve as high as p=3G. However, I am not sure it is worth it. Basically, going from 60e6 (~1.2G) to 90e6 (~1.8G) takes you (6h15)/2 - 2h55 = 12 min (assuming both numbers are from the same machine). Implementing the memory saving measures will probably introduce a slow down of 10-15% (pure speculation). Let's say instead of 3hr7 for a 1T range, it takes 3h30. So the effective delta would be 35 min instead of 12 min. Can NewPGen cover the same range (ie 1.2-1.8G) in 30 min? I realise that, since 1.8G is already done, the correct analysis should be from 1.8G to _optimal sieve point_. Fine. Can you post some timing for NewPGen to take a 1T (or 2T or whatever) range from p=1.8G to p=3G in increments of 0.2G? That'll give me a clue as to what is a good cutoff point. PS:- There is another idea that'll give me a 5x speed improvement. But this involves sieving candidates out of order (technically, residue classes mod 7*11*13). So the candidates will have to be sorted after the sieving step. |
[QUOTE=axn;218882]PS:- I remember there being a rule saying something like 6 bytes per k. That'd mean 384 MB can handle 64M (=67108864) candidates in fast array mode.[/QUOTE]
Unfortunately that is not what I experienced. With 32.5M candidates it used normal array mode. |
[QUOTE=axn;218883]Can NewPGen cover the same range (ie 1.2-1.8G) in 30 min?
[/QUOTE] iirc I think it takes an hour or more. [QUOTE=axn;218883]... Can you post some timing for NewPGen to take a 1T (or 2T or whatever) range from p=1.8G to p=3G in increments of 0.2G? That'll give me a clue as to what is a good cutoff point. [/QUOTE] I'll try. [QUOTE=axn;218883]... PS:- There is another idea that'll give me a 5x speed improvement. But this involves sieving candidates out of order (technically, residue classes mod 7*11*13). So the candidates will have to be sorted after the sieving step.[/QUOTE] Sounds good. I wouldn't know how to sort them tho'. [QUOTE=axn;218882]192 Mb is what newpgen automatically chooses. But it can work with much less while still using fast array mode (96 Mb, maybe even 48 Mb). I am going to go out on a limb and say that 384 MB fast array can handle ranges much larger than 20T (yes, 20, not 2). PS:- I remember there being a rule saying something like 6 bytes per k. That'd mean 384 MB can handle 64M (=67108864) candidates in fast array mode.[/QUOTE] So are you saying that sieving maybe 20T and feeding it into NPG is about optimal with the software as it is now? :surprised |
[QUOTE=Flatlander;218888]So are you saying that sieving maybe 20T and feeding it into NPG is about optimal with the software as it is now? :surprised[/QUOTE]
Well maybe 10T (in light of amphoria's post). [QUOTE=amphoria;218886]Unfortunately that is not what I experienced. With 32.5M candidates it used normal array mode.[/QUOTE] |
[QUOTE=Flatlander;218881]...
Testing underway for 90e6/90e5, P=1,824,261,409. Looks like 2T will take about 6hr 15m. (Uses 1,833,380KB.) [/QUOTE] Took 6hrs 17mins with the computer in use for browsing etc. 0.01T sample identical to NPG output (except header again.) I'll try to get some timings in NPG tomorrow. |
Some NPG timings. I wasn't able to take all of them at regular intervals but you should be able to get a curve from them.
Small variation (10s) due to saving the file every 20m but other variations I can't explain. The other core was idle.
2T range. Start P=1,824,261,409
to 1.9G 13m 08s
to 2.0G 14m 04s
to 2.1G 13m 08s
to 2.[B]3[/B]G 28m 20s
to 2.4G 13m 15s
to 2.5G 12m 38s
to 3.242G 1hr 19m
to 3.4G 15m 2s
to 3.506G 9m 26s
to 3.6G 8m 1s
(Sorry for the messy data. We had visitors!)
I ran a 5T range on axnSieve (same PC) last night (90e6/90e5) and it took 14hrs 54m. A negligible slow down compared to 60e6/60e5. NPG is now running the file using fast array, 384Mb. |
I have finished re-running 17T-18T using axnSieve and NPG, and the results match the NPG-only run exactly.
I have found that with 32-bit Windows I get a runtime error 215 if I set SmallPrimes to 18e6 or greater. With SmallPrimes set to 12e6 it is fine. This is an arithmetic overflow error. Is it possible that this is a limitation of 32-bit? |
[QUOTE=amphoria;219017]I have found that with 32-bit Windows I get a runtime error 215 if I set SmallPrimes to 18e6 or greater. With SmallPrimes set to 12e6 it is fine. This is an arithmetic overflow error. Is it possible that this is a limitation of 32-bit?[/QUOTE]
Can you rebuild with -gl flag, no other optimizations turned on, and post the debug output? |
[QUOTE=axn;219019]Can you rebuild with -gl flag, no other optimizations turned on, and post the debug output?[/QUOTE]
In order to compile with -gl I switched from using the IDE to using the command line compiler and the problem has gone away. It looks like the IDE was using a compiler option that was causing the problem. Note I only used the default options in the IDE. Currently testing with SmallPrimes = 30e6. |
[QUOTE=Flatlander;219012]
2T range. Start P=1,824,261,409
to 1.9G 13m 08s
to 2.0G 14m 04s
to 2.1G 13m 08s
to 2.[B]3[/B]G 28m 20s
to 2.4G 13m 15s
to 2.5G 12m 38s
to 3.242G 1hr 19m
to 3.4G 15m 2s
to 3.506G 9m 26s
to 3.6G 8m 1s [/QUOTE] Looks like there is some more performance gain to be had. I'll do some modifications this weekend. |
Unfortunately I spoke too soon. With the command line compiler the outer loop in do_sieve_iter is not terminating. If I terminate it with Ctrl-C I lose some of the candidates at the end of the file.
Is it possible that the command line compiler uses an aggressive optimisation that messes up the compilation of the loop? Has anyone else managed to compile LuckyMinus under Windows? I've currently reverted to the IDE compiled version with SmallPrimes = 15e6. |
[QUOTE=amphoria;219117]...Has anyone else managed to compile LuckyMinus under Windows?
[/QUOTE] 64-bit Windows 7, yes. I haven't tried 32-bit XP yet. (Under Lazarus. See my 'Clueless' post above, number 48.) Is the code optimized for/dependent on 64-bit for higher SmallPrimes? |
[QUOTE=Flatlander;219124]Is the code optimized for/dependent on 64-bit for higher SmallPrimes?[/QUOTE]
No. :smile: I've noticed that sometimes FPC code generation for "for loops" goes a bit wonky. Maybe a for-less code would be better. |
[QUOTE=Flatlander;219124]64-bit Windows 7, yes. I haven't tried 32-bit XP yet. (Under Lazarus. See my 'Clueless' post above, number 48.)[/QUOTE]
This was 32-bit Vista. |
[QUOTE=axn;219129]I've noticed that sometimes FPC code generation for "for loops" goes a bit wonky. Maybe a for-less code would be better.[/QUOTE]
I tried turning off all optimizations with -O- and the for loop now works correctly. Given that the inner loop is written in asm this should not make any significant difference to the performance. |
[QUOTE=Flatlander;219012]
... I ran a 5T range on axnSieve (same PC) last night (90e6/90e5) and it took 14hrs 54m. A negligible slow down compared to 60e6/60e5. NPG is now running the file using fast array, 384Mb.[/QUOTE] A 7T range also fits in fast array, 384Mb. |
There is unsafe code in the program when built with the 32-bit compiler, which will mess up the sieve for primes p > 280m (2^32/15).
So ranges done with the 32-bit siever with p > 280m should be redone. The fix is to change line 200 from [CODE]t := (t*15) mod n[/CODE] to [CODE]t := (QWord(t)*15) mod n[/CODE] |
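[The failure mode described above is easy to reproduce in miniature. A sketch simulating the 32-bit wraparound; the values of t and n below are hypothetical, chosen only to sit past the threshold:]

```python
# Simulate the 32-bit bug: t*15 wraps modulo 2^32 before the mod is
# taken, so results go wrong once t exceeds 2^32/15 (~286M).
def mulmod15_buggy(t, n):
    return ((t * 15) & 0xFFFFFFFF) % n   # product truncated to 32 bits

def mulmod15_fixed(t, n):
    return (t * 15) % n                  # product widened first, as QWord

t, n = 300_000_000, 299_999_999          # hypothetical values past the threshold
print(mulmod15_buggy(t, n))   # 205032704 (wrong)
print(mulmod15_fixed(t, n))   # 15 (correct)
```

[Below 2^32/15 the product fits in 32 bits and the two versions agree, which is why smaller sieve depths were unaffected.]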
[QUOTE=axn;219200]There is an unsafe code in the program in 32-bit compiler, that'll mess up sieve from primes p > 280m (2^32/15).
So ranges done with 32-bit siever with p > 280m should be redone. Fix is to change line 200 from [CODE]t := (t*15) mod n[/CODE] to [CODE]t := (QWord(t)*15) mod n[/CODE][/QUOTE] This will affect all ranges done by me > 20T. I assume that you have now checked the code for other similar issues with 32-bit compilation. Edit: BTW please read this as a statement of fact rather than any implied criticism. I well understand the risks of using experimental code. Fortunately you caught it pretty early on. |
[QUOTE=amphoria;219212]I assume that you have now checked the code for other similar issues with 32-bit compilation.[/quote]
I think so, but I haven't done a thorough enough check to be 100% sure. If you want, you can wait for my confirmation until proceeding further. [QUOTE=amphoria;219212]Edit: BTW please read this as a statement of fact rather than any implied criticism. I well understand the risks of using experimental code. Fortunately you caught it pretty early on.[/QUOTE] Understood. In this case, however, any criticism would be well deserved. :redface: |
[QUOTE=axn;219240]I think so, but I haven't done a thorough enough check to be 100% sure. If you want, you can wait for my confirmation until proceeding further.[/QUOTE]
The rest of the code checks out. |
[QUOTE=axn;219254]The rest of the code checks out.[/QUOTE]
Thanks for confirming. |
For interest I compared the results of 23T-24T after applying the fix with the original results. There were about 39000 lines different.
|
[QUOTE=Flatlander;219151]A 7T range also fits in fast array, 384Mb.[/QUOTE]
As does 10T. :smile: (90e6/90e5) |
I'm having all sorts of problems trying to compile a Windows 32 bit axnSieve. Could someone please post an exe with as high as possible SmallPrimes and SieveSize that is already known to be good.
Thanks :smile: |
1 Attachment(s)
Try this one:
|
Thanks, that's running fine. :smile::tu:
|
[QUOTE=Oddball;220093]Try this one:[/QUOTE]
Actually, now that I have some more ram coming for my 32-bit system, could you please make me an .exe with:
SmallPrimes of 40e6 and SieveSize of 4e6. (lm4.exe)
and if you can:
SmallPrimes of 60e6 and SieveSize of 6e6. (lm6.exe)
SmallPrimes of 80e6 and SieveSize of 8e6. (lm8.exe)
Much appreciated. :smile:
@axn
I suppose it's not worth making a version where SmallPrimes and SieveSize are entered at the command line? Then I could twiddle to my heart's content without pestering anyone. (Once someone compiles it for me. :rolleyes:) |
1 Attachment(s)
I don't know whether any of these work, since I don't have enough RAM to test them. You might want to compare a small 0.01T range with NewPGen's output to make sure there are no differences between the two files. But here you go:
|
1 Attachment(s)
Another one:
|
1 Attachment(s)
I've come across a strange error when trying to compile lm8.exe, and Windows gives a "virtual memory is increasing" prompt. Another try fails, but the third attempt seems to be successful. As I said before, you should compare a small 0.01T range with NewPGen's output to make sure that there are no differences between the two files.
|
That's great! Thank you.
|
I hope you're using the patched code for the 32-bit build.
|
[QUOTE=axn;220222]i hope you're using the patched code for the 32 bit build[/QUOTE]
All of the .exe programs I attached were done with the patched code. The only things that weren't done with the patched code were some of my earlier ranges, but I didn't re-do them since they were under the safe p<280M threshold (I used p=217M). |
[quote=Historian;217612]
Processor: Pentium 4 3.4 GHz tpsieve for the variable n-range: 5M p/sec [/quote] One core of an AMD Phenom II @ 2.8 GHz: 16.6M p/sec |
[quote=MooMoo2;223629]One core of an AMD Phenom II @ 2.8 GHz: 16.6M p/sec[/quote]
That's because Turbo Core was on. With all 6 cores loaded and no background tasks, I'm getting 85M p/sec, or 14.2M p/sec/core. Doing things like browsing the web and using Microsoft Word drops the rate to about 82M p/sec, but it's still much faster than my old Pentium 4. |
[quote=Oddball;224323]That's because Turbo Core was on. With all 6 cores loaded and no background tasks, I'm getting 85M p/sec, or 14.2M p/sec/core.
Doing things like browsing the web and using Microsoft Word drops the rate to about 82M p/sec, but it's still much faster than my old Pentium 4.[/quote]
Pardon my ignorance, but what's Turbo Core? :huh: Is it some kind of temporary automatic-overclock thing? |
[quote=mdettweiler;224340]Pardon my ignorance but, what's Turbo Core? :huh: Is it some kind of temporary automatic-overclock thing?[/quote]
It's like Intel's Turbo Boost: [URL]http://www.intel.com/technology/turboboost/[/URL] If 3 or more cores are idle and the CPU is cool enough, it automatically increases the frequency from 2.8 GHz to 3.3 GHz and shuts down the other cores. But if most of the cores are active, the CPU remains at 2.8 GHz. |
[quote=Oddball;224342]It's like Intel's turbo boost:
[URL]http://www.intel.com/technology/turboboost/[/URL] If 3 or more cores are idle and the CPU is cool enough, it automatically increases frequency from 2.8 GHz to 3.3 GHz and shuts down the other cores. But if most of the cores are active, the CPU remains at 2.8 GHz.[/quote]
Ah, I see. I suppose that would complicate benchmarks a bit--it's a somewhat common practice to leave some or all of the other cores in a machine idle when running a one-core benchmark, but in this case it would seem that one would want to keep at least 3 busy to ensure an unskewed result.

BTW, have you considered overclocking your machine? If you have something better than the stock heatsink/fan cooling your CPU, it's a good way to get a tidy performance boost even with all cores running. I personally have my Core 2 Quad Q6600 overclocked to 2.8 GHz (from 2.4), and it runs stably (and noticeably faster) at that frequency without having to increase the CPU voltage at all. |
[quote=mdettweiler;224343]BTW, have you considered overclocking your machine?[/quote]
Not now. Room temperatures sometimes get up to the low 80's (27-28 degrees Celsius) during the hottest time of the day, so it'll be hard to keep the CPU cool enough if it's overclocked by a noticeable amount. I might consider overclocking in October or November, when temperatures drop to the mid-low 60's. It would be pretty awesome to get a round number of 100M p/sec. |
[quote=Oddball;224323]With all 6 cores loaded and no background tasks, I'm getting 85M p/sec, or 14.2M p/sec/core.
[/quote]
[quote]BTW, have you considered overclocking your machine? If you have something better than the stock heatsink/fan cooling your CPU, then it's a good way to get a tidy performance boost even with all cores running.[/quote]
Well, it looks like I didn't need an overclock to get that increase in speed. Switching from the 64-bit version to the SSE2 version boosts performance to an astonishing 134M p/sec :shock: At this rate, a 10T range would take less than a day to complete.

[quote]One core of an AMD Phenom II @ 2.8 GHz: 16.6M p/sec[/quote]
With one core running and all other cores idle, I get a rate of 26.1M p/sec using the SSE2 version. The 64-bit version produces the same benchmark as yours. |
[quote=Oddball;225417]Switching from the 64 bit version to the SSE2 version boosts performance to an astonishing 134M p/sec :shock:
[/quote]
Sieve speed seems to vary for no apparent reason. While sieving in the 500T range, I was getting 133M p/sec with all cores on with the SSE2 version. But now, in the 700T range, performance goes up to 138M p/sec, and then it drops again to 132M p/sec at 7000T. |
I am testing tpsieve-cuda on a gts250 card.
test file: 480k-485k
test range: 735T-740T
57M p/sec, ETA for 5T ~ 24 hr

Lennart |
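The quoted ETA follows directly from the range size and the sieve rate. A minimal Python sketch of the conversion, assuming the rate stays roughly constant across the range:

```python
# ETA for sieving a p-range at a fixed rate.
range_size = 5e12   # p-range: 740T - 735T = 5T
rate = 57e6         # p/sec on the GTS 250

hours = range_size / rate / 3600
print(f"ETA: {hours:.1f} hours")  # about 24.4 hours
```

which matches the ~24 hr estimate above.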
What types of speeds would we expect to see on a high-end GPU? According to this post: [URL]http://www.mersenneforum.org/showpost.php?p=227656&postcount=298[/URL]
high-end GPUs are about 60 times faster than low-end ones. |
[quote=MooMoo2;228556]What types of speeds would we expect to see on a high-end GPU? According to this post: [URL]http://www.mersenneforum.org/showpost.php?p=227656&postcount=298[/URL]
high end GPU's are about 60 times faster than low end ones.[/quote]
On a GTX 460 (slight factory overclock IIRC):

n=480K-485K (August file)
p=1010T-1015T
279M p/sec, ETA ~5 hours

This is with command line flags -m 38400, -Q 10e6, which seem to be optimal for this GPU on this sieve. |
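Comparing the two GPU benchmarks reported in this thread (the GTS 250 at 57M p/sec and the GTX 460 at 279M p/sec), and checking the quoted ~5 hour ETA, can be done with a few lines of Python:

```python
# Compare the two GPU sieve rates from this thread and verify the ETA.
gts250_rate = 57e6    # p/sec (GTS 250, earlier post)
gtx460_rate = 279e6   # p/sec (GTX 460, this post)
range_size = 5e12     # p-range: 1015T - 1010T = 5T

speedup = gtx460_rate / gts250_rate
hours = range_size / gtx460_rate / 3600
print(f"speedup: {speedup:.1f}x, ETA: {hours:.1f} hours")  # ~4.9x, ~5.0 hours
```

Note this particular pairing is about a 5x gap, not 60x; the 60x figure in the linked post presumably compares the very top and very bottom of the GPU range.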