mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Sierpinski/Riesel Base 5 (https://www.mersenneforum.org/forumdisplay.php?f=54)
-   -   A multiple k/c sieve for Sierpinski/Riesel problems (https://www.mersenneforum.org/showthread.php?t=5785)

KriZp 2007-10-05 00:14

1 Attachment(s)
Heating season is coming up, so I have a good reason to dust off the old BP6 dual Celeron.
Did some benchmarks running Fedora 7:
[code]
sr2sieve 1.6.5 89376 p/sec
sr2sieve 1.4.42 intel 83988 p/sec
sr2sieve 1.4.42 amd 83667 p/sec
sr2sieve 1.5.20 intel 88587 p/sec
sr2sieve 1.5.20 amd 89993 p/sec
sr2sieve 1.6.5 89376 p/sec
sr2sieve 1.6.6 89397 p/sec
JJsieveCMOV6.exe 76 kp/s (through wine)[/code]

verbose (-vv) output in attached file for anyone interested.

geoff 2007-10-08 01:31

sr2sieve/sr5sieve 1.6.7
 
In this version I optimised the hashtable code a bit.

Cruelty 2007-10-09 20:52

1 Attachment(s)
Attached you will find comparison between sr1 and sr2 sieves on Linux-x86-64. ~10% speed increase for sr2 and ~2% for sr1 :tu:

geoff 2007-10-13 22:22

sr2sieve/sr5sieve 1.6.9
 
This version has a new giant step method for the x86-64, it is my first attempt to combine the mulmods and hashtable lookups in one pass. It is a little faster for Core 2, but untested on Athlon 64.

The idea behind doing the hashtable lookups at the same time as the mulmods is that it will give the Athlon 64 CPU something useful to do while waiting for the high latency integer/floating point conversions to finish. (These are not a bottleneck on the Core 2). But the problem is that the hashtable code contains a lot of branches, and the branches are only predictable when the hashtable density is low, so it is not really clear whether it will pay off in practice.

KriZp 2007-10-14 02:09

1.6.9:
p=688770037701469, 1070697 p/sec, 29 factors, 99.1% cpu, 16748 sec/factor
[root@Athlon64 ~]# 1008239 p/sec, 27 factors, 58.0% done, ETA 17 Oct 01:40

A good 15% improvement on my opteron and A64, thanks! More than 1 Mp/s on each core for the first time.

geoff 2007-10-16 05:38

[QUOTE=KriZp;116305]1.6.9:
p=688770037701469, 1070697 p/sec, 29 factors, 99.1% cpu, 16748 sec/factor
[root@Athlon64 ~]# 1008239 p/sec, 27 factors, 58.0% done, ETA 17 Oct 01:40

A good 15% improvement on my opteron and A64, thanks! More than 1 Mp/s on each core for the first time.[/QUOTE]

Great! With a bit of luck we should get a similar improvement when the baby step mulmods ae combined with the hashtable insertions. I plan to make these changes for the 32-bit versions as well, but the gains will probably be less unless I can figure out a way to employ SSE2 for the hashtable operations.

mdettweiler 2007-10-16 15:36

In light of the recent sieve file update, I was wondering, when updating the sieve file for sr5sieve, if you've got the Legendre symbol tables cached by using the -c option the first time you ran sr5sieve, do you have to delete the sr5cache.bin file, and run it with the -c switch again, to generate the cached file again? Or, does the sieve file have no effect on the sr5cache.bin file, and thus not need to be re-generated when a new sieve file comes out?

geoff 2007-10-18 05:16

[QUOTE=Anonymous;116476]In light of the recent sieve file update, I was wondering, when updating the sieve file for sr5sieve, if you've got the Legendre symbol tables cached by using the -c option the first time you ran sr5sieve, do you have to delete the sr5cache.bin file, and run it with the -c switch again, to generate the cached file again? Or, does the sieve file have no effect on the sr5cache.bin file, and thus not need to be re-generated when a new sieve file comes out?[/QUOTE]

It is not necessary to regenerate the cache file, unless you want to save a little disk space.

The cache file stores information about each k,c pair. This information doesn't change when terms are deleted from the sieve file. Regenerating the cache file will just remove the redundant entries for those k,c that have been removed from the sieve file.

A note for those sieving with SoB.dat and riesel.dat: If you run `sr2sieve -rs -c' once it will generate a combined cache file that can be used for either SoB.dat or riesel.dat. (Stop sieving with ctrl-c once it has been generated).

mdettweiler 2007-10-18 14:18

[quote=geoff;116583]It is not necessary to regenerate the cache file, unless you want to save a little disk space.

The cache file stores information about each k,c pair. This information doesn't change when terms are deleted from the sieve file. Regenerating the cache file will just remove the redundant entries for those k,c that have been removed from the sieve file.

A note for those sieving with SoB.dat and riesel.dat: If you run `sr2sieve -rs -c' once it will generate a combined cache file that can be used for either SoB.dat or riesel.dat. (Stop sieving with ctrl-c once it has been generated).[/quote]
Okay, thanks! :smile: I'll keep that in mind next time a new dat file comes out (or if a new prime is found, in which case I'll remove it manually).

geoff 2007-10-19 22:02

sr1sieve 1.2.0
 
This version results from a lot of cut-and-paste of the latest sr5sieve code, and so could contain bugs. Use the latest 1.1.x version instead if you have problems. New since version 1.1.12:

* -e switch reports speeds in elapsed time instead of cpu time.

* Single x86 executable with seperate AMD and Intel code paths: --amd or --intel switches can be used to override the automatic code path selection.

* x86-64 executable can now sieve to p=2^62. The x87 FPU will be used for p > 2^51. The --no-sse2 switch forces use of the x87 FPU for p < 2^51.

* New hashtable code should benefit all x86/x86-64 machines. New giant steps method (new/4) and baby steps method (gen/6) for x86-64 should benefit Athlon 64.

BlisteringSheep 2007-10-20 20:49

Geoff, There's a compilation error in mulmod-ppc64.c, line 8, undeclared variable 'p'. I think it's just a copy-and-paste, where the function parameter is named 'b' and and should be 'p' instead. At least that's the change I made. :)

FYI, here's some startup info for v1.1.12 and v1.2.0 on a 970MP:

[CODE]sr1sieve 1.1.12 -- A sieve for one sequence k*b^n+/-1.
L1 data cache 32Kb (default), L2 cache 1024Kb (detected).
Read 141012 terms for 5*2^n-1 from NewPGen file `5sheep_840.txt'.
Split 1 base 2 sequence into 61 base 2^180 subsequences.
Using 0 Kb for Legendre symbol tables.
BSGS range: 133*132 - 1033*17.
Using 16 Kb for the baby-steps giant-steps hashtable, maximum density 0.13.
Best time for baby step method gen/1: 164.
Best time for baby step method gen/2: 182.
Best time for baby step method gen/4: 155.
Best time for baby step method gen/8: 152.
Best time for giant step method gen/1: 159.
Best time for giant step method gen/2: 153.
Best time for giant step method gen/4: 152.
Best time for giant step method gen/8: 150.
Baby step method gen/8, giant step method gen/8.
Using 512Kb for the Sieve of Eratosthenes bitmap.[/CODE]
[CODE]sr1sieve 1.2.0 -- A sieve for one sequence k*b^n+/-1.
Compiled on Oct 20 2007 with GCC version 4.1.1.
L1 data cache 32Kb (default), L2 cache 1024Kb (detected).
Read 141012 terms for 5*2^n-1 from NewPGen file `5sheep_840.txt'.
Split 1 base 2 sequence into 61 base 2^180 subsequences.
Using 0 Kb for Legendre symbol tables.
BSGS range: 133*132 - 1033*17.
Using 16 Kb for the baby-steps giant-steps hashtable, maximum density 0.13.
Best time for baby step method gen/2: 188.
Best time for baby step method gen/4: 162.
Best time for baby step method gen/8: 154.
Best time for giant step method gen/2: 146.
Best time for giant step method gen/4: 141.
Best time for giant step method gen/8: 139.
Baby step method gen/8, giant step method gen/8.
Using 512Kb for the Sieve of Eratosthenes bitmap.[/CODE]


All times are UTC. The time now is 22:20.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.