#166
May 2005
11001011000₂ Posts
Here are my results on a C2D @ 2.4GHz running version 1.5.10 of the linux.x86-64 binary:
Code:
sr2sieve 1.5.10 -- A sieve for multiple sequences k*b^n+/-1.
L1 data cache 32Kb (detected), L2 cache 2048Kb (detected).
Read 233216 terms for 14 sequences from dat format file `riesel.dat'.
Split 14 base 2 sequences into 425 base 2^60 subsequences.
Using 16 Kb for the baby-steps giant-steps hashtable, maximum density 0.16.
Best time for baby step method gen/2: 36891.
Best time for baby step method gen/4: 28404.
Best time for baby step method gen/8: 29061.
Best time for baby step method gen/1: 42660.
Best time for giant step method gen/2: 18009.
Best time for giant step method gen/4: 16902.
Best time for giant step method gen/8: 16911.
Best time for giant step method gen/1: 22446.
Best time for ladder method gen/2: 1467.
Best time for ladder method gen/4: 1143.
Best time for ladder method gen/8: 1269.
Best time for ladder method gen/1: 2538.
Best time for ladder method add/1: 2943.
Using baby step method gen/4, giant step method gen/4, ladder method gen/4.
Using 1024Kb for the Sieve of Eratosthenes bitmap.
Expecting to find factors for about 4552.40 terms in this range.
sr2sieve started: 680004 <= n <= 1000000, 11000000000000 <= p <= 20000000000000
p=11000122028941, 2040651 p/sec, 0 factors, 0.00% done, ETA 20 Aug 06:13
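The "Using ..." line in the log above is consistent with picking, for each stage, the variant with the lowest benchmark time. The sketch below only re-derives that choice from the numbers printed in the log. It is not sr2sieve's own selection code, and the dictionary layout and the `fastest` helper are illustrative names of mine.

```python
# Benchmark times copied from the sr2sieve 1.5.10 startup log (lower is better).
timings = {
    "baby step": {"gen/2": 36891, "gen/4": 28404, "gen/8": 29061, "gen/1": 42660},
    "giant step": {"gen/2": 18009, "gen/4": 16902, "gen/8": 16911, "gen/1": 22446},
    "ladder": {"gen/2": 1467, "gen/4": 1143, "gen/8": 1269, "gen/1": 2538, "add/1": 2943},
}


def fastest(methods):
    """Return the variant name with the smallest benchmark time (hypothetical helper)."""
    return min(methods, key=methods.get)


# Pick the quickest variant for each stage and report it.
chosen = {stage: fastest(methods) for stage, methods in timings.items()}
for stage, method in chosen.items():
    print(f"{stage} method: {method}")
```

Note how close gen/4 and gen/8 are for the giant step method (16902 vs. 16911), so the choice there could plausibly flip from run to run.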
#167
Mar 2003
New Zealand
10010000101₂ Posts
I have had a look at the code produced for the ppc64. It seems that GCC is unable to move some constants outside of the critical loops due to aliasing issues, so it does some redundant reloading of registers. In version 1.5.12 I have made a few changes that might help, but the real solution is probably to code the whole loop in assembly, not just the loop body as we do now.
BlisteringSheep: If you can send me the assembled bsgs.s file for 1.5.12 as before, I will check whether the changes have had the expected effect.
Last fiddled with by geoff on 2007-07-02 at 02:39 Reason: added attachment
#168
Oct 2006
On a Suzuki Boulevard C90
2·3·41 Posts
Here are the gcc-4.1.1 and gcc-4.1.2 versions. Everything did compile cleanly. I haven't done correctness or performance testing yet, but will start right away.
A couple of questions:
Last fiddled with by BlisteringSheep on 2007-07-02 at 03:16 Reason: ^riesel.dat^sr5check.txt
#169
Mar 2003
New Zealand
485₁₆ Posts
Quote:
Quote:
Quote:
Quote:
#170
Oct 2006
On a Suzuki Boulevard C90
2·3·41 Posts
sr5check passed with both 100e6-150e6 and 5e9-5.1e9. I will set up a test to run the comprehensive list of methods. Is there any way to tell it not to print the factors to the screen? I'd like to capture the output from -vv, to confirm that I'm passing the command-line flags correctly, but don't really need to see all 10000 factors.
Timing results for riesel.dat and SoB.dat with 1.5.10 EXP 1 vs. 1.5.12. Another measurable speedup. Summary:
v1.5.10 riesel.dat: 301294 p/sec
v1.5.12 riesel.dat: 314009 p/sec
v1.5.10 SoB.dat: 504277 p/sec
v1.5.12 SoB.dat: 521006 p/sec
Full results with -vv output are attached. I also did some testing with gcc-3.4.6, and gcc-4.1.x is still measurably faster.
Last fiddled with by BlisteringSheep on 2007-07-02 at 05:35 Reason: forgot a smiley :)
#171
Oct 2006
On a Suzuki Boulevard C90
2·3·41 Posts
Quote:
#172
Mar 2003
New Zealand
13×89 Posts
Version 1.5.13 has a few small changes. It should be a little faster than 1.5.12, but if not then I will undo them. There are no changes to the assembler routines, so just a brief test should suffice.
I realise now that testing each of the 16 combinations of -Bgen/x and -Ggen/y options was not necessary; testing just the 4 combinations with -Bgen/x and -Ggen/x would have been enough to exercise all the assembler routines.
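To illustrate the counting argument, here is a small sketch that enumerates both test plans. It assumes x and y range over 1, 2, 4 and 8 (the gen/ variants shown in the benchmark log earlier in the thread) and that each option value exercises one routine regardless of what it is paired with. The option spellings are taken from the post as written, and `VARIANTS` and `covered` are hypothetical names.

```python
from itertools import product

# Assumption: x and y take the gen/ divisors seen in the benchmark log.
VARIANTS = ["1", "2", "4", "8"]

# Full cross product: every -Bgen/x paired with every -Ggen/y.
all_pairs = [(f"-Bgen/{x}", f"-Ggen/{y}") for x, y in product(VARIANTS, VARIANTS)]

# Reduced plan: pair each -Bgen/x only with the matching -Ggen/x.
matched_pairs = [(f"-Bgen/{x}", f"-Ggen/{x}") for x in VARIANTS]


def covered(pairs):
    """Return the set of individual option values used by a test plan."""
    return {flag for pair in pairs for flag in pair}


# Both plans touch the same eight option values, so under the assumption
# above the reduced plan exercises the same routines.
assert covered(matched_pairs) == covered(all_pairs)
print(len(all_pairs), "combinations vs", len(matched_pairs))
```

The reduced plan is only sufficient if the baby-step and giant-step routines do not interact, which is what the post implies.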
#173
Oct 2006
On a Suzuki Boulevard C90
246₁₀ Posts
Passed both of my sr5check tests fine. It is indeed faster (timings from 2.0 GHz 970FX):
1.5.12 SoB.dat: 250971 p/sec
1.5.13 SoB.dat: 258296 p/sec
1.5.12 riesel.dat: 417483 p/sec
1.5.13 riesel.dat: 435247 p/sec
sr5check timings:
1.5.12 100e6-150e6: 84.432 cpu sec
1.5.13 100e6-150e6: 80.542 cpu sec
1.5.12 5e9-5.1e9: 141.644 cpu sec
1.5.13 5e9-5.1e9: 135.267 cpu sec
Looks like I'll be deploying this version. :)
Last fiddled with by BlisteringSheep on 2007-07-06 at 21:32 Reason: added sr5check timings
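For a sense of scale, the figures in the post above can be turned into relative changes. The sketch below only does that arithmetic on the posted numbers. The helper names `pct_gain` and `pct_saved` are mine, and single runs like these say nothing about run-to-run noise.

```python
# Figures copied from the post above (2.0 GHz 970FX), as (1.5.12, 1.5.13).
rates = {  # sieve throughput in p/sec, higher is better
    "SoB.dat": (250971, 258296),
    "riesel.dat": (417483, 435247),
}
cpu_secs = {  # sr5check run time in cpu seconds, lower is better
    "100e6-150e6": (84.432, 80.542),
    "5e9-5.1e9": (141.644, 135.267),
}


def pct_gain(old, new):
    """Percentage increase from old to new (for throughput)."""
    return (new - old) / old * 100.0


def pct_saved(old, new):
    """Percentage reduction from old to new (for run time)."""
    return (old - new) / old * 100.0


rate_gains = {name: pct_gain(*pair) for name, pair in rates.items()}
time_savings = {name: pct_saved(*pair) for name, pair in cpu_secs.items()}

for name, gain in rate_gains.items():
    print(f"{name}: {gain:.2f}% more p/sec")
for name, saved in time_savings.items():
    print(f"sr5check {name}: {saved:.2f}% less cpu time")
```

All four comparisons come out in the range of roughly 3 to 5 percent in favour of 1.5.13.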