mersenneforum.org  

Go Back   mersenneforum.org > Prime Search Projects > Sierpinski/Riesel Base 5

Reply
 
Thread Tools
Old 2007-06-20, 18:29   #331
Cruelty
 
Cruelty's Avatar
 
May 2005

23×7×29 Posts
Default

I've just noticed that sr2sieve v.1.5.6 (linux x86-64) chooses hashtable size of 512kB instead of previously selected 1024kB (1.5.5). When I tried forcing -H 1024 I get the following error:
Code:
Wed Jun 20 20:25:24 2007 ERROR: Hashtable size 1024 Kb is too large, recompile with SHORT_HASHTABLE=0
BTW: I have just finnished double checking my range and both results are the same

Last fiddled with by Cruelty on 2007-06-20 at 18:39
Cruelty is offline   Reply With Quote
Old 2007-06-21, 00:05   #332
geoff
 
geoff's Avatar
 
Mar 2003
New Zealand

13·89 Posts
Default

Quote:
Originally Posted by Cruelty View Post
I've just noticed that sr2sieve v.1.5.6 (linux x86-64) chooses hashtable size of 512kB instead of previously selected 1024kB (1.5.5). When I tried forcing -H 1024 I get the following error:
Code:
Wed Jun 20 20:25:24 2007 ERROR: Hashtable size 1024 Kb is too large, recompile with SHORT_HASHTABLE=0
512Kb will be the size of the Sieve of Eratosthenes bitmap, not the hashtable.

Previous versions used 1/2 L2 cache for the bitmap, but new machines coming out with massive L2 cache sizes are going to cause some problems.

In version 1.5.6 I limited the bitmap size to 512Kb, but if that is too small I could increase the limit as far as 4Mb. If you run with -L512 that will cause a 256Kb bitmap to be used, could you check whether there is a difference in speed from using 256Kb vs 512Kb?

One problem is that as larger bitmaps are used the p/sec times may become bumpy, you may need to take the average over a few reports to get a good comparison.
geoff is offline   Reply With Quote
Old 2007-06-21, 05:31   #333
Cruelty
 
Cruelty's Avatar
 
May 2005

23·7·29 Posts
Default

Attached are benchmarks for 1.5.5 and 1.5.6 - 1.5.5 is faster.
Attached Files
File Type: zip bench.zip (144.9 KB, 90 views)
Cruelty is offline   Reply With Quote
Old 2007-06-22, 02:07   #334
geoff
 
geoff's Avatar
 
Mar 2003
New Zealand

48516 Posts
Default

Quote:
Originally Posted by Cruelty View Post
Attached are benchmarks for 1.5.5 and 1.5.6 - 1.5.5 is faster.
It is necessary to compare the actual run times. The benchmark timer was changed in 1.5.6 to flush the pipeline before starting the benchmark, which should give more consistent times on some machines (like my P4/Celeron), but all times will now be a little longer. Also the Sieve of Eratosthenes is not part of the benchmarks, so the bitmap size will have no effect on them.

In version 1.5.7 I have increased the bitmap limit to 2Mb anyway, so it will only come into effect for machines with L2 cache size greater than 4Mb. I am a little worried that the large bitmap will have a negative effect on multi-core machines, but I will leave it up to the user to lower the limit with the -L switch if that is the case.

Also in version 1.5.7 for the x86-64 build there is a sr2sieve-fpu binary which can sieve up to 2^62 (the standard x86-64 binary is limited to 2^52). I expect the sr2sieve-fpu binary will be slower, but it would be interesting to know just how much slower.
geoff is offline   Reply With Quote
Old 2007-06-22, 21:04   #335
Cruelty
 
Cruelty's Avatar
 
May 2005

162410 Posts
Default

Indeed initial benchmark still favours 1.5.5 however "real" results are better using 1.5.7 (~3.5%).
Then the "fpu" version produces some really strange results + generates an error...
Code:
./sr2sieve-fpu -vv
sr2sieve 1.5.7 -- A sieve for multiple sequences k*b^n+/-1.
L1 data cache 32Kb (detected), L2 cache 2048Kb (detected).
Read 489119 terms for 10 sequences from ABCD format file `sr2data.txt'.
Split 10 base 2 sequences into 255 base 2^60 subsequences.
Loaded Legendre symbol lookup tables for 10 sequences from `sr2cache.bin'.
Using 16 Kb for the baby-steps giant-steps hashtable, maximum density 0.25.
Best time for baby step method gen/2: 919350.
Best time for baby step method gen/4: 914373.
Best time for baby step method gen/1: 18288.
Best time for giant step method gen/2: 802197.
Best time for giant step method gen/4: 779670.
Best time for giant step method gen/1: 787527.
Best time for ladder method gen/2: 1791.
Best time for ladder method gen/4: 1269.
Best time for ladder method gen/1: 3285.
Best time for ladder method add/1: 3924.
Using baby step method gen/1, giant step method gen/4, ladder method gen/4.
Resuming from checkpoint pmin=9204118684829 in `checkpoint.txt'.
Using 1024Kb for the Sieve of Eratosthenes bitmap.
Expecting to find factors for about 9768.73 terms in this range.
sr2sieve started: 1000000 <= n <= 1999997, 9204118684829 <= p <= 10000000000000
Segmentation fault (core dumped)

Last fiddled with by Cruelty on 2007-06-22 at 21:10
Cruelty is offline   Reply With Quote
Old 2007-06-23, 03:12   #336
geoff
 
geoff's Avatar
 
Mar 2003
New Zealand

115710 Posts
Default

Quote:
Originally Posted by Cruelty View Post
Then the "fpu" version produces some really strange results + generates an error...
Thanks for the report, it looks like I still have some work to do on that :-(.
geoff is offline   Reply With Quote
Old 2007-06-24, 01:19   #337
geoff
 
geoff's Avatar
 
Mar 2003
New Zealand

22058 Posts
Default sr2sieve 1.5.8 (x86-64 only)

I have fixed a corruption of the FPU stack in the sr2sieve-fpu build, but there may be other bugs remaining. Use sr2sieve-fpu for tesing only for now.

In the standard x86-64 build I have enabled gen/8 methods. There are not enough general registers to do 8 mulmods in parallel, but they may still be faster than gen/4 methods in some cases.
geoff is offline   Reply With Quote
Old 2007-06-24, 07:12   #338
Cruelty
 
Cruelty's Avatar
 
May 2005

31308 Posts
Default

Version 1.5.8 is marginally faster than 1.5.7:
Code:
./sr2sieve -vv
sr2sieve 1.5.7 -- A sieve for multiple sequences k*b^n+/-1.
L1 data cache 32Kb (detected), L2 cache 2048Kb (detected).
Read 489119 terms for 10 sequences from ABCD format file `sr2data.txt'.
Split 10 base 2 sequences into 255 base 2^60 subsequences.
Loaded Legendre symbol lookup tables for 10 sequences from `sr2cache.bin'.
Using 16 Kb for the baby-steps giant-steps hashtable, maximum density 0.25.
Best time for baby step method gen/2: 52875.
Best time for baby step method gen/4: 40599.
Best time for baby step method gen/1: 58644.
Best time for giant step method gen/2: 26460.
Best time for giant step method gen/4: 25011.
Best time for giant step method gen/1: 33462.
Best time for ladder method gen/2: 1467.
Best time for ladder method gen/4: 1134.
Best time for ladder method gen/1: 2538.
Best time for ladder method add/1: 2943.
Using baby step method gen/4, giant step method gen/4, ladder method gen/4.
Resuming from checkpoint pmin=9452855837203 in `checkpoint.txt'.
Using 1024Kb for the Sieve of Eratosthenes bitmap.
Expecting to find factors for about 9768.73 terms in this range.
sr2sieve started: 1000000 <= n <= 1999997, 9452855837203 <= p <= 10000000000000
p=9452986387147, 2188122 p/sec, 8132 factors, 87.84% done, 222 sec/factor
Code:
./sr2sieve -vv
sr2sieve 1.5.8 -- A sieve for multiple sequences k*b^n+/-1.
L1 data cache 32Kb (detected), L2 cache 2048Kb (detected).
Read 489119 terms for 10 sequences from ABCD format file `sr2data.txt'.
Split 10 base 2 sequences into 255 base 2^60 subsequences.
Loaded Legendre symbol lookup tables for 10 sequences from `sr2cache.bin'.
Using 16 Kb for the baby-steps giant-steps hashtable, maximum density 0.25.
Best time for baby step method gen/2: 52929.
Best time for baby step method gen/4: 40725.
Best time for baby step method gen/8: 40590.
Best time for baby step method gen/1: 58581.
Best time for giant step method gen/2: 26415.
Best time for giant step method gen/4: 25101.
Best time for giant step method gen/8: 25092.
Best time for giant step method gen/1: 33444.
Best time for ladder method gen/2: 1467.
Best time for ladder method gen/4: 1134.
Best time for ladder method gen/8: 1269.
Best time for ladder method gen/1: 2547.
Best time for ladder method add/1: 2943.
Using baby step method gen/8, giant step method gen/8, ladder method gen/4.
Resuming from checkpoint pmin=9452855837203 in `checkpoint.txt'.
Using 1024Kb for the Sieve of Eratosthenes bitmap.
Expecting to find factors for about 9768.73 terms in this range.
sr2sieve started: 1000000 <= n <= 1999997, 9452855837203 <= p <= 10000000000000
p=9452986780493, 2195598 p/sec, 8132 factors, 87.84% done, 221 sec/factor
"FPU" version is working and it's not so bad in terms of speed
Code:
cruelty@c2d-low:~/Desktop/LLR-low/tests$ ./sr2sieve-fpu -vv
sr2sieve 1.5.8 -- A sieve for multiple sequences k*b^n+/-1.
L1 data cache 32Kb (detected), L2 cache 2048Kb (detected).
Read 489119 terms for 10 sequences from ABCD format file `sr2data.txt'.
Split 10 base 2 sequences into 255 base 2^60 subsequences.
Loaded Legendre symbol lookup tables for 10 sequences from `sr2cache.bin'.
Using 16 Kb for the baby-steps giant-steps hashtable, maximum density 0.25.
Best time for baby step method gen/2: 54351.
Best time for baby step method gen/4: 41274.
Best time for baby step method gen/8: 38502.
Best time for baby step method gen/1: 82296.
Best time for giant step method gen/2: 30744.
Best time for giant step method gen/4: 28467.
Best time for giant step method gen/8: 28890.
Best time for giant step method gen/1: 41715.
Best time for ladder method gen/2: 1773.
Best time for ladder method gen/4: 1251.
Best time for ladder method gen/8: 1341.
Best time for ladder method gen/1: 3249.
Best time for ladder method add/1: 3924.
Using baby step method gen/8, giant step method gen/4, ladder method gen/4.
Resuming from checkpoint pmin=9453008566493 in `checkpoint.txt'.
Using 1024Kb for the Sieve of Eratosthenes bitmap.
Expecting to find factors for about 9768.73 terms in this range.
sr2sieve started: 1000000 <= n <= 1999997, 9453008566493 <= p <= 10000000000000
p=9453036880669, 1938265 p/sec, 8133 factors, 87.85% done, 250 sec/factor
Cruelty is offline   Reply With Quote
Old 2007-06-26, 03:46   #339
geoff
 
geoff's Avatar
 
Mar 2003
New Zealand

13·89 Posts
Default

Quote:
Originally Posted by Cruelty View Post
Version 1.5.8 is marginally faster than 1.5.7:
Code:
./sr2sieve -vv
sr2sieve 1.5.7 -- A sieve for multiple sequences k*b^n+/-1.
p=9452986387147, 2188122 p/sec, 8132 factors, 87.84% done, 222 sec/factor
Code:
./sr2sieve -vv
sr2sieve 1.5.8 -- A sieve for multiple sequences k*b^n+/-1.
p=9452986780493, 2195598 p/sec, 8132 factors, 87.84% done, 221 sec/factor
That small improvement probably just comes from loop unrolling, the gen/8 methods were implemented just by doing the gen/4 method twice per loop.

Quote:
"FPU" version is working and it's not so bad in terms of speed
Code:
cruelty@c2d-low:~/Desktop/LLR-low/tests$ ./sr2sieve-fpu -vv
sr2sieve 1.5.8 -- A sieve for multiple sequences k*b^n+/-1.
p=9453036880669, 1938265 p/sec, 8133 factors, 87.85% done, 250 sec/factor
I might try to translate rogue's ppc64 mulmods to the x86-64, to see how they compare.
geoff is offline   Reply With Quote
Old 2007-06-28, 02:07   #340
geoff
 
geoff's Avatar
 
Mar 2003
New Zealand

13·89 Posts
Default

In sr2sieve version 1.5.10 (x86-64 only) there are two new binaries: sr2sieve-int1 and sr2sieve-int2. These are attempts to use rogue's ppc64 mulmod (which doesn't use floating point) on the x86-64, it would be interesting to see how they compare to sr2sieve-fpu. (If they work at all, that is :-).

The main problem translating the ppc64 code to x86-64 is that there are not enough general registers to get maximum benefit from unrolling the loops, so I expect the -int binaries will be slower. But if they are not too much slower then it may be worthwhile trying to do some of the operations in SSE registers to get around that.
geoff is offline   Reply With Quote
Old 2007-06-28, 05:31   #341
Cruelty
 
Cruelty's Avatar
 
May 2005

23·7·29 Posts
Default

Attached are benchmarks for all variants of 1.5.10.
Attached Files
File Type: txt bench.1.5.10.txt (5.8 KB, 108 views)
Cruelty is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Very Prime Riesel and Sierpinski k robert44444uk Open Projects 587 2016-11-13 15:26
Sierpinski/ Riesel bases 6 to 18 robert44444uk Conjectures 'R Us 139 2007-12-17 05:17
Sierpinski/Riesel Base 10 rogue Conjectures 'R Us 11 2007-12-17 05:08
Sierpinski / Riesel - Base 23 michaf Conjectures 'R Us 2 2007-12-17 05:04
Sierpinski / Riesel - Base 22 michaf Conjectures 'R Us 49 2007-12-17 05:03

All times are UTC. The time now is 12:11.


Mon Aug 2 12:11:41 UTC 2021 up 10 days, 6:40, 0 users, load averages: 1.28, 1.49, 1.46

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.