mersenneforum.org

A multiple k/c sieve for Sierpinski/Riesel problems (https://www.mersenneforum.org/showthread.php?t=5785)

geoff 2007-06-29 02:55

[QUOTE=Cruelty;109189]Attached are benchmarks for all variants of 1.5.10.[/QUOTE]
Thanks for those benchmarks. It is a pity the -int versions are so much slower, but at least I know not to spend any more time on that idea :-)

geoff 2007-07-02 02:50

sr5sieve 1.5.12
 
The x86-64 binary now has two separate code paths: for factors up to 2^52 it uses SSE2, and for factors between 2^52 and 2^62 it uses the (slower) x87 FPU. The choice of code path is automatic, but for testing purposes the `--no-sse2' switch will force it to use the FPU code path.
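
The 2^52 boundary is consistent with SSE2 working on IEEE double-precision values, whose significand holds 53 bits, while the x87 extended format has a 64-bit significand. The Python sketch below only illustrates that idea and is not sr5sieve's code: the names `choose_path` and `mulmod_float` are hypothetical, the exact handling of the boundary values is a guess, and the quotient-estimate mulmod is an assumption about how a floating-point modular multiply is commonly done.

```python
SSE2_LIMIT = 1 << 52  # from the post: SSE2 path is used for factors up to 2^52
X87_LIMIT = 1 << 62   # from the post: x87 path is used for factors up to 2^62


def choose_path(p, no_sse2=False):
    """Hypothetical dispatcher mirroring the rule described in the post.

    Whether the boundary values themselves belong to one path or the
    other is not stated in the post; the comparisons here are a guess.
    """
    if p >= X87_LIMIT:
        raise ValueError("factor is beyond the 2^62 limit")
    if p < SSE2_LIMIT and not no_sse2:
        return "sse2"
    return "x87"


def mulmod_float(a, b, p):
    """Return a*b mod p using a floating-point estimate of the quotient.

    Assumes 0 <= a, b < p < 2^52, so a, b and p are exactly representable
    as doubles. The rounded quotient can be slightly off, so the remainder
    is corrected afterwards. Python's exact integers stand in for the
    64-bit wrap-around arithmetic that compiled code would use.
    """
    q = int(float(a) * float(b) / float(p))  # estimated floor(a*b/p)
    r = a * b - q * p
    while r < 0:
        r += p
    while r >= p:
        r -= p
    return r
```

For example, `choose_path(10**15)` selects the SSE2 path, while `choose_path(10**16)` or `choose_path(10**15, no_sse2=True)` selects the x87 path.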

This is mainly for the benefit of the RieselSieve project, which at the current rate of sieving could be crossing the 2^52 boundary in a few months. Any testing before then will be much appreciated.

There have been a number of small improvements to the ppc-64 build since 1.5.10; these just need testing to make sure no new bugs have crept in.

There haven't been any significant changes to the x86 builds since 1.5.6.

This version contains most of the changes I wanted to make in 1.5.x, and I will mainly concentrate on fixing bugs from now on.

Cruelty 2007-07-02 17:52

1 Attachment(s)
1.5.12 is slower than 1.5.10 by ~13% :sad:

geoff 2007-07-05 03:40

[QUOTE=Cruelty;109476]1.5.12 is slower than 1.5.10 by ~13% :sad:[/QUOTE]
I can't figure out how this could happen, as there has been no change to the SSE2 code since 1.5.10. The only possibility I can think of is that somehow the wrong code path is being used. Could you try running with the --no-sse2 switch to test the other code path?

The one area in which 1.5.12 should be much slower is verifying the factors it finds, but this should only have a noticeable effect when hundreds of factors per second are being found.

Cruelty 2007-07-05 18:00

1 Attachment(s)
There is virtually no change when using the "--no-sse2" switch, so it has to be something different.

geoff 2007-07-05 22:55

sr5sieve 1.5.13
 
This version fixes the bug in 1.5.12 that caused the x86-64 binary to always use the non-SSE2 code path, even when it said it was using the SSE2 path :-).

Cruelty 2007-07-06 06:02

1 Attachment(s)
Here is a comparison of 1.5.10 and 1.5.13. What is "BSGS range..."?

geoff 2007-07-07 00:02

[QUOTE=Cruelty;109729]Here is a comparison of 1.5.10 and 1.5.13. What is "BSGS range..."?[/QUOTE]

BSGS range: 92*91 - 1042*8

This means that in the extreme cases it might do 92 baby steps and 91 giant steps, or 1042 baby steps and 8 giant steps.
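
For readers unfamiliar with the term, BSGS is the baby-step giant-step method for finding an exponent n with b^n ≡ t (mod p). The following Python sketch is a generic textbook version and not sr5sieve's implementation: the function name `bsgs` and its parameters are hypothetical, and it assumes b is invertible mod p and Python 3.8 or later (for `pow` with a negative exponent). It shows how a search over roughly the same range of n can be split as 92 baby steps times 91 giant steps, or as 1042 baby steps times 8 giant steps.

```python
def bsgs(b, t, p, baby, giant):
    """Find the smallest n in [0, baby*giant) with b^n == t (mod p).

    Returns None if no such n exists in that range.
    """
    # Baby steps: tabulate b^j mod p for j = 0 .. baby-1.
    table = {}
    x = 1
    for j in range(baby):
        table.setdefault(x, j)  # keep the smallest j for each value
        x = x * b % p
    # x is now b^baby; each giant step multiplies by its inverse.
    step = pow(x, -1, p)
    y = t % p
    # Giant steps: check whether t * b^(-i*baby) is in the table.
    for i in range(giant):
        if y in table:
            return i * baby + table[y]
        y = y * step % p
    return None


# Usage: the same target found with the two extreme splits from the post.
p = 1000003
target = pow(5, 8000, p)
for baby, giant in ((92, 91), (1042, 8)):
    n = bsgs(5, target, p, baby, giant)
    print(baby, giant, n, pow(5, n, p) == target)
```

More baby steps mean a larger table and fewer giant steps, and vice versa; the product of the two fixes how many exponents are covered.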

geoff 2007-07-09 22:56

sr5sieve 1.5.14, sr1sieve 1.1.6
 
In 1.5.14 I have improved the sse2/8 and sse2/16 methods a little: they now avoid doing more than 4 extra mulmods. This increases the number of branches in the code, but most of them are predictable, and since it works out a bit faster on my P4, I assume newer machines with better branch prediction or shorter pipelines will not suffer. Here are the times for my P4:
[code]
                        19k SoB.dat   68k riesel.dat   237k sr5data.txt
                        -----------   --------------   ----------------
sr2sieve-intel 1.5.6    425 kp/s      223 kp/s         98 kp/s
sr2sieve-intel 1.5.14   455 kp/s      229 kp/s         99 kp/s
[/code]

sr1sieve 1.1.6 is updated with most of the recent changes for ppc64 and x86-64, except that it is still limited to a sieve depth of 2^52 on x86-64. I don't plan to extend that unless someone has a real need for it.

Cruelty 2007-07-10 22:10

1 Attachment(s)
Attached is a comparison for the linux.x86-64 binaries.
Virtually no change for sr2sieve, and a 2.6% improvement for sr1sieve :tu:

geoff 2007-07-12 22:59

sr5sieve 1.5.15
 
This version extends the improvements in 1.5.14 to the non-SSE2 x86 mulmod code. I hope it'll be a little faster on newer machines. It is only fractionally faster on my P3, but better on my P4 (tested with SSE2 disabled):
[code]
                    19k SoB.dat   68k riesel.dat   237k sr5data.txt
                    -----------   --------------   ----------------
1.5.14 --no-sse2    260 kp/s      142 kp/s         69 kp/s
1.5.15 --no-sse2    286 kp/s      151 kp/s         71 kp/s
[/code]


All times are UTC.
