#375
Feb 2007
3³×5 Posts
Heating season is coming up, so I have a good reason to dust off the old BP6 dual Celeron.
Did some benchmarks running Fedora 7:
Code:
sr2sieve 1.6.5 89376 p/sec
sr2sieve 1.4.42 intel 83988 p/sec
sr2sieve 1.4.42 amd 83667 p/sec
sr2sieve 1.5.20 intel 88587 p/sec
sr2sieve 1.5.20 amd 89993 p/sec
sr2sieve 1.6.5 89376 p/sec
sr2sieve 1.6.6 89397 p/sec
JJsieveCMOV6.exe 76 kp/s (through wine)
#376
Mar 2003
New Zealand
485₁₆ Posts
In this version I optimised the hashtable code a bit.
#377
May 2005
11001011000₂ Posts
Attached you will find a comparison between the sr1 and sr2 sieves on Linux x86-64: roughly a 10% speed increase for sr2 and about 2% for sr1.
#378
Mar 2003
New Zealand
10010000101₂ Posts
This version has a new giant step method for x86-64. It is my first attempt to combine the mulmods and hashtable lookups in one pass. It is a little faster on Core 2, but untested on Athlon 64.
The idea behind doing the hashtable lookups at the same time as the mulmods is that it gives the Athlon 64 CPU something useful to do while waiting for the high-latency integer/floating-point conversions to finish. (These are not a bottleneck on the Core 2.) The problem is that the hashtable code contains a lot of branches, and the branches are only predictable when the hashtable density is low, so it is not clear whether it will pay off in practice.
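For readers unfamiliar with the structure being discussed, here is a minimal baby-step giant-step sketch in Python. It is not taken from sr2sieve; the function name `bsgs` and its parameters are made up for illustration, and it assumes p is prime and does not divide b. The baby steps fill a hashtable with powers of b, and each giant step does one modular multiplication followed immediately by one hashtable lookup, which is the "one pass" arrangement described above in its simplest form.

```python
from math import isqrt

def bsgs(b, target, p, n_max):
    """Return the smallest n in [0, n_max) with b**n % p == target % p, or None.

    Hypothetical helper for illustration only. Assumes p is prime and
    b is not a multiple of p, so that b has an inverse modulo p.
    """
    m = isqrt(n_max) + 1          # m*m > n_max, so i*m + j covers [0, n_max)

    # Baby steps: insert b^j for j = 0..m-1, one mulmod per insertion.
    table = {}
    x = 1
    for j in range(m):
        table.setdefault(x, j)    # keep the smallest j for each value
        x = x * b % p
    # Here x == b^m mod p.

    # Giant steps: one mulmod and one lookup per iteration of the same loop.
    giant = pow(x, -1, p)         # b^(-m) mod p (needs Python 3.8+)
    y = target % p
    for i in range(m):
        j = table.get(y)          # lookup
        if j is not None:
            n = i * m + j
            return n if n < n_max else None
        y = y * giant % p         # mulmod
    return None
```

In a sieve for k*b^n+c, the target would be -c/k mod p, so a hit at n means p divides k*b^n+c. The real code differs in many ways (fixed-size hashtable, many sequences per prime, assembly mulmods); this only shows where the lookups sit relative to the multiplications.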
#379
Feb 2007
3³·5 Posts
1.6.9:
p=688770037701469, 1070697 p/sec, 29 factors, 99.1% cpu, 16748 sec/factor
[root@Athlon64 ~]# 1008239 p/sec, 27 factors, 58.0% done, ETA 17 Oct 01:40
A good 15% improvement on my Opteron and A64, thanks! More than 1 Mp/s on each core for the first time.
#380
Mar 2003
New Zealand
13·89 Posts
Great! With a bit of luck we should get a similar improvement when the baby step mulmods are combined with the hashtable insertions. I plan to make these changes for the 32-bit versions as well, but the gains will probably be smaller unless I can figure out a way to employ SSE2 for the hashtable operations.
#381
A Sunny Moo
Aug 2007
USA (GMT-5)
6249₁₀ Posts
In light of the recent sieve file update, I have a question about sr5sieve. If you cached the Legendre symbol tables by using the -c option the first time you ran sr5sieve, do you have to delete the sr5cache.bin file and run with the -c switch again to regenerate it when the sieve file is updated? Or does the sieve file have no effect on sr5cache.bin, so that it doesn't need to be regenerated when a new sieve file comes out?
#382
Mar 2003
New Zealand
13·89 Posts
The cache file stores information about each k,c pair. This information doesn't change when terms are deleted from the sieve file. Regenerating the cache file will just remove the redundant entries for those k,c that have been removed from the sieve file.
A note for those sieving with SoB.dat and riesel.dat: if you run `sr2sieve -rs -c' once, it will generate a combined cache file that can be used for either SoB.dat or riesel.dat. (Stop sieving with Ctrl-C once it has been generated.)
#383
A Sunny Moo
Aug 2007
USA (GMT-5)
3×2,083 Posts
I'll keep that in mind next time a new dat file comes out (or if a new prime is found, in which case I'll remove it manually).
#384
Mar 2003
New Zealand
10010000101₂ Posts
This version results from a lot of cut-and-paste of the latest sr5sieve code, and so could contain bugs. Use the latest 1.1.x version instead if you have problems.
New since version 1.1.12:
* The -e switch reports speeds in elapsed time instead of CPU time.
* Single x86 executable with separate AMD and Intel code paths: the --amd or --intel switches can be used to override the automatic code path selection.
* The x86-64 executable can now sieve to p=2^62. The x87 FPU will be used for p > 2^51. The --no-sse2 switch forces use of the x87 FPU for p < 2^51.
* New hashtable code should benefit all x86/x86-64 machines. New giant steps method (new/4) and baby steps method (gen/6) for x86-64 should benefit Athlon 64.
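As background on why a floating-point unit shows up in an integer sieve at all, here is a small Python sketch of a floating-point assisted mulmod. It is an illustration under assumptions, not the sr1sieve code: the name `mulmod_fp` is made up, and the link to the 2^51 limit is a plausible reading rather than something stated above (a double has a 53-bit significand, while the x87 extended format has 64 bits). The quotient of a*b by p is estimated in floating point, and the remainder is then fixed up with exact integer arithmetic.

```python
def mulmod_fp(a, b, p):
    """Return a*b mod p for 0 <= a, b < p, using a double-precision
    estimate of the quotient followed by an exact correction.

    Hypothetical helper for illustration. The estimate is close to the
    true quotient when p is comfortably below 2^53; the correction
    loops make the result exact in any case.
    """
    q = int(float(a) * float(b) / p)   # approximate floor(a*b / p)
    r = a * b - q * p                  # exact; Python integers do not overflow
    while r < 0:                       # estimate was too high
        r += p
    while r >= p:                      # estimate was too low
        r -= p
    return r
```

In C or assembly the product and the subtraction would be done in 64-bit registers with wraparound, and the correction would be a single conditional add or subtract rather than a loop. The integer to floating-point conversions in the first line are the kind of step whose latency was discussed earlier in the thread.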
#385
Oct 2006
On a Suzuki Boulevard C90
2×3×41 Posts
Geoff, there's a compilation error in mulmod-ppc64.c, line 8: undeclared variable 'p'. I think it's just a copy-and-paste error, where the function parameter is named 'b' and should be 'p' instead. At least that's the change I made. :)
FYI, here's some startup info for v1.1.12 and v1.2.0 on a 970MP:
Code:
sr1sieve 1.1.12 -- A sieve for one sequence k*b^n+/-1.
L1 data cache 32Kb (default), L2 cache 1024Kb (detected).
Read 141012 terms for 5*2^n-1 from NewPGen file `5sheep_840.txt'.
Split 1 base 2 sequence into 61 base 2^180 subsequences.
Using 0 Kb for Legendre symbol tables.
BSGS range: 133*132 - 1033*17.
Using 16 Kb for the baby-steps giant-steps hashtable, maximum density 0.13.
Best time for baby step method gen/1: 164.
Best time for baby step method gen/2: 182.
Best time for baby step method gen/4: 155.
Best time for baby step method gen/8: 152.
Best time for giant step method gen/1: 159.
Best time for giant step method gen/2: 153.
Best time for giant step method gen/4: 152.
Best time for giant step method gen/8: 150.
Baby step method gen/8, giant step method gen/8.
Using 512Kb for the Sieve of Eratosthenes bitmap.
Code:
sr1sieve 1.2.0 -- A sieve for one sequence k*b^n+/-1.
Compiled on Oct 20 2007 with GCC version 4.1.1.
L1 data cache 32Kb (default), L2 cache 1024Kb (detected).
Read 141012 terms for 5*2^n-1 from NewPGen file `5sheep_840.txt'.
Split 1 base 2 sequence into 61 base 2^180 subsequences.
Using 0 Kb for Legendre symbol tables.
BSGS range: 133*132 - 1033*17.
Using 16 Kb for the baby-steps giant-steps hashtable, maximum density 0.13.
Best time for baby step method gen/2: 188.
Best time for baby step method gen/4: 162.
Best time for baby step method gen/8: 154.
Best time for giant step method gen/2: 146.
Best time for giant step method gen/4: 141.
Best time for giant step method gen/8: 139.
Baby step method gen/8, giant step method gen/8.
Using 512Kb for the Sieve of Eratosthenes bitmap.