![]() |
|
|
#331 |
|
May 2005
23×7×29 Posts |
I've just noticed that sr2sieve v.1.5.6 (linux x86-64) chooses hashtable size of 512kB instead of previously selected 1024kB (1.5.5). When I tried forcing -H 1024 I get the following error:
Code:
Wed Jun 20 20:25:24 2007 ERROR: Hashtable size 1024 Kb is too large, recompile with SHORT_HASHTABLE=0 Last fiddled with by Cruelty on 2007-06-20 at 18:39 |
|
|
|
|
|
#332 | |
|
Mar 2003
New Zealand
13·89 Posts |
Quote:
Previous versions used 1/2 L2 cache for the bitmap, but new machines coming out with massive L2 cache sizes are going to cause some problems. In version 1.5.6 I limited the bitmap size to 512Kb, but if that is too small I could increase the limit as far as 4Mb. If you run with -L512 that will cause a 256Kb bitmap to be used, could you check whether there is a difference in speed from using 256Kb vs 512Kb? One problem is that as larger bitmaps are used the p/sec times may become bumpy, you may need to take the average over a few reports to get a good comparison. |
|
|
|
|
|
|
#333 |
|
May 2005
23·7·29 Posts |
Attached are benchmarks for 1.5.5 and 1.5.6 - 1.5.5 is faster.
|
|
|
|
|
|
#334 |
|
Mar 2003
New Zealand
48516 Posts |
It is necessary to compare the actual run times. The benchmark timer was changed in 1.5.6 to flush the pipeline before starting the benchmark, which should give more consistent times on some machines (like my P4/Celeron), but all times will now be a little longer. Also the Sieve of Eratosthenes is not part of the benchmarks, so the bitmap size will have no effect on them.
In version 1.5.7 I have increased the bitmap limit to 2Mb anyway, so it will only come into effect for machines with L2 cache size greater than 4Mb. I am a little worried that the large bitmap will have a negative effect on multi-core machines, but I will leave it up to the user to lower the limit with the -L switch if that is the case. Also in version 1.5.7 for the x86-64 build there is a sr2sieve-fpu binary which can sieve up to 2^62 (the standard x86-64 binary is limited to 2^52). I expect the sr2sieve-fpu binary will be slower, but it would be interesting to know just how much slower. |
|
|
|
|
|
#335 |
|
May 2005
162410 Posts |
Indeed initial benchmark still favours 1.5.5 however "real" results are better using 1.5.7 (~3.5%).
Then the "fpu" version produces some really strange results + generates an error... Code:
./sr2sieve-fpu -vv sr2sieve 1.5.7 -- A sieve for multiple sequences k*b^n+/-1. L1 data cache 32Kb (detected), L2 cache 2048Kb (detected). Read 489119 terms for 10 sequences from ABCD format file `sr2data.txt'. Split 10 base 2 sequences into 255 base 2^60 subsequences. Loaded Legendre symbol lookup tables for 10 sequences from `sr2cache.bin'. Using 16 Kb for the baby-steps giant-steps hashtable, maximum density 0.25. Best time for baby step method gen/2: 919350. Best time for baby step method gen/4: 914373. Best time for baby step method gen/1: 18288. Best time for giant step method gen/2: 802197. Best time for giant step method gen/4: 779670. Best time for giant step method gen/1: 787527. Best time for ladder method gen/2: 1791. Best time for ladder method gen/4: 1269. Best time for ladder method gen/1: 3285. Best time for ladder method add/1: 3924. Using baby step method gen/1, giant step method gen/4, ladder method gen/4. Resuming from checkpoint pmin=9204118684829 in `checkpoint.txt'. Using 1024Kb for the Sieve of Eratosthenes bitmap. Expecting to find factors for about 9768.73 terms in this range. sr2sieve started: 1000000 <= n <= 1999997, 9204118684829 <= p <= 10000000000000 Segmentation fault (core dumped) Last fiddled with by Cruelty on 2007-06-22 at 21:10 |
|
|
|
|
|
#336 |
|
Mar 2003
New Zealand
115710 Posts |
|
|
|
|
|
|
#337 |
|
Mar 2003
New Zealand
22058 Posts |
I have fixed a corruption of the FPU stack in the sr2sieve-fpu build, but there may be other bugs remaining. Use sr2sieve-fpu for tesing only for now.
In the standard x86-64 build I have enabled gen/8 methods. There are not enough general registers to do 8 mulmods in parallel, but they may still be faster than gen/4 methods in some cases. |
|
|
|
|
|
#338 |
|
May 2005
31308 Posts |
Version 1.5.8 is marginally faster than 1.5.7:
Code:
./sr2sieve -vv sr2sieve 1.5.7 -- A sieve for multiple sequences k*b^n+/-1. L1 data cache 32Kb (detected), L2 cache 2048Kb (detected). Read 489119 terms for 10 sequences from ABCD format file `sr2data.txt'. Split 10 base 2 sequences into 255 base 2^60 subsequences. Loaded Legendre symbol lookup tables for 10 sequences from `sr2cache.bin'. Using 16 Kb for the baby-steps giant-steps hashtable, maximum density 0.25. Best time for baby step method gen/2: 52875. Best time for baby step method gen/4: 40599. Best time for baby step method gen/1: 58644. Best time for giant step method gen/2: 26460. Best time for giant step method gen/4: 25011. Best time for giant step method gen/1: 33462. Best time for ladder method gen/2: 1467. Best time for ladder method gen/4: 1134. Best time for ladder method gen/1: 2538. Best time for ladder method add/1: 2943. Using baby step method gen/4, giant step method gen/4, ladder method gen/4. Resuming from checkpoint pmin=9452855837203 in `checkpoint.txt'. Using 1024Kb for the Sieve of Eratosthenes bitmap. Expecting to find factors for about 9768.73 terms in this range. sr2sieve started: 1000000 <= n <= 1999997, 9452855837203 <= p <= 10000000000000 p=9452986387147, 2188122 p/sec, 8132 factors, 87.84% done, 222 sec/factor Code:
./sr2sieve -vv sr2sieve 1.5.8 -- A sieve for multiple sequences k*b^n+/-1. L1 data cache 32Kb (detected), L2 cache 2048Kb (detected). Read 489119 terms for 10 sequences from ABCD format file `sr2data.txt'. Split 10 base 2 sequences into 255 base 2^60 subsequences. Loaded Legendre symbol lookup tables for 10 sequences from `sr2cache.bin'. Using 16 Kb for the baby-steps giant-steps hashtable, maximum density 0.25. Best time for baby step method gen/2: 52929. Best time for baby step method gen/4: 40725. Best time for baby step method gen/8: 40590. Best time for baby step method gen/1: 58581. Best time for giant step method gen/2: 26415. Best time for giant step method gen/4: 25101. Best time for giant step method gen/8: 25092. Best time for giant step method gen/1: 33444. Best time for ladder method gen/2: 1467. Best time for ladder method gen/4: 1134. Best time for ladder method gen/8: 1269. Best time for ladder method gen/1: 2547. Best time for ladder method add/1: 2943. Using baby step method gen/8, giant step method gen/8, ladder method gen/4. Resuming from checkpoint pmin=9452855837203 in `checkpoint.txt'. Using 1024Kb for the Sieve of Eratosthenes bitmap. Expecting to find factors for about 9768.73 terms in this range. sr2sieve started: 1000000 <= n <= 1999997, 9452855837203 <= p <= 10000000000000 p=9452986780493, 2195598 p/sec, 8132 factors, 87.84% done, 221 sec/factor Code:
cruelty@c2d-low:~/Desktop/LLR-low/tests$ ./sr2sieve-fpu -vv sr2sieve 1.5.8 -- A sieve for multiple sequences k*b^n+/-1. L1 data cache 32Kb (detected), L2 cache 2048Kb (detected). Read 489119 terms for 10 sequences from ABCD format file `sr2data.txt'. Split 10 base 2 sequences into 255 base 2^60 subsequences. Loaded Legendre symbol lookup tables for 10 sequences from `sr2cache.bin'. Using 16 Kb for the baby-steps giant-steps hashtable, maximum density 0.25. Best time for baby step method gen/2: 54351. Best time for baby step method gen/4: 41274. Best time for baby step method gen/8: 38502. Best time for baby step method gen/1: 82296. Best time for giant step method gen/2: 30744. Best time for giant step method gen/4: 28467. Best time for giant step method gen/8: 28890. Best time for giant step method gen/1: 41715. Best time for ladder method gen/2: 1773. Best time for ladder method gen/4: 1251. Best time for ladder method gen/8: 1341. Best time for ladder method gen/1: 3249. Best time for ladder method add/1: 3924. Using baby step method gen/8, giant step method gen/4, ladder method gen/4. Resuming from checkpoint pmin=9453008566493 in `checkpoint.txt'. Using 1024Kb for the Sieve of Eratosthenes bitmap. Expecting to find factors for about 9768.73 terms in this range. sr2sieve started: 1000000 <= n <= 1999997, 9453008566493 <= p <= 10000000000000 p=9453036880669, 1938265 p/sec, 8133 factors, 87.85% done, 250 sec/factor |
|
|
|
|
|
#339 | ||
|
Mar 2003
New Zealand
13·89 Posts |
Quote:
Quote:
|
||
|
|
|
|
|
#340 |
|
Mar 2003
New Zealand
13·89 Posts |
In sr2sieve version 1.5.10 (x86-64 only) there are two new binaries: sr2sieve-int1 and sr2sieve-int2. These are attempts to use rogue's ppc64 mulmod (which doesn't use floating point) on the x86-64, it would be interesting to see how they compare to sr2sieve-fpu. (If they work at all, that is :-).
The main problem translating the ppc64 code to x86-64 is that there are not enough general registers to get maximum benefit from unrolling the loops, so I expect the -int binaries will be slower. But if they are not too much slower then it may be worthwhile trying to do some of the operations in SSE registers to get around that. |
|
|
|
|
|
#341 |
|
May 2005
23·7·29 Posts |
Attached are benchmarks for all variants of 1.5.10.
|
|
|
|
![]() |
| Thread Tools | |
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Very Prime Riesel and Sierpinski k | robert44444uk | Open Projects | 587 | 2016-11-13 15:26 |
| Sierpinski/ Riesel bases 6 to 18 | robert44444uk | Conjectures 'R Us | 139 | 2007-12-17 05:17 |
| Sierpinski/Riesel Base 10 | rogue | Conjectures 'R Us | 11 | 2007-12-17 05:08 |
| Sierpinski / Riesel - Base 23 | michaf | Conjectures 'R Us | 2 | 2007-12-17 05:04 |
| Sierpinski / Riesel - Base 22 | michaf | Conjectures 'R Us | 49 | 2007-12-17 05:03 |