#342
Mar 2003
New Zealand
1157₁₀ Posts

#343
Mar 2003
New Zealand
13×89 Posts
The x86-64 binary now has two separate code paths: for factors up to 2^52 it uses SSE2, and for factors between 2^52 and 2^62 it uses the (slower) x87 FPU. The choice of code path is automatic, but for testing purposes the `--no-sse2' switch will force it to use the FPU code path.
This is mainly for the benefit of the RieselSieve project, which at the current rate of sieving could be crossing the 2^52 boundary in a few months. Any testing before then will be much appreciated. There have been a number of small improvements to the ppc-64 build since 1.5.10; these just need testing to make sure no new bugs have crept in. There haven't been any significant changes to the x86 builds since 1.5.6. This version contains most of the changes I wanted to make in 1.5.x, and I will mainly concentrate on fixing bugs from now on.
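
For anyone following along, the automatic selection described in this post can be sketched roughly as below. This is only an illustration in Python, not sr2sieve's actual code: the name `choose_code_path`, the `no_sse2` parameter, the returned labels, and the decision to treat exactly 2^52 as still belonging to the SSE2 path are all assumptions. Only the two limits and the effect of the switch come from the post.

```python
SSE2_LIMIT = 2**52  # from the post: SSE2 path handles factors up to 2^52
FPU_LIMIT = 2**62   # from the post: x87 FPU path handles factors up to 2^62


def choose_code_path(p_max, no_sse2=False):
    """Return a label for the path that would handle factors up to p_max.

    Hypothetical helper: whether the real program decides once per run or
    per factor, and how it treats the exact boundary, is not stated.
    """
    if p_max > FPU_LIMIT:
        raise ValueError("factor range exceeds 2^62")
    if no_sse2 or p_max > SSE2_LIMIT:
        return "x87"   # slower, but valid over the larger range
    return "sse2"


print(choose_code_path(2**50))                # small factors
print(choose_code_path(2**53))                # past the 2^52 boundary
print(choose_code_path(2**50, no_sse2=True))  # forced onto the FPU path
```

Forcing the slower path on a range both paths can handle is what makes the switch useful for testing: the two paths should report the same factors.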

#344
May 2005
2^3·7·29 Posts
1.5.12 is slower than 1.5.10 by ~13%.
Last fiddled with by Cruelty on 2007-07-02 at 17:53

#345
Mar 2003
New Zealand
13·89 Posts
I can't figure out how this could happen, as there has been no change to the SSE2 code since 1.5.10. The only possibility I can think of is that somehow the wrong code path is being used. Could you try running with the --no-sse2 switch to test the other code path?
The one area in which 1.5.12 should be much slower is verifying the factors it finds, but this should only have a noticeable effect when hundreds of factors per second are being found.
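
Verifying a reported factor amounts to one modular exponentiation per factor, which is why it only matters when factors are found at a high rate. A minimal sketch, assuming the sieved terms have the form k·b^n+c with c = ±1 (the usual Sierpinski/Riesel form); `verify_factor` is a hypothetical name, and Python's built-in three-argument `pow` stands in for whatever the program does internally.

```python
def verify_factor(p, k, b, n, c):
    """Return True if p divides k*b^n + c (hypothetical helper)."""
    # pow(b, n, p) computes b^n mod p without forming the huge power.
    return (k * pow(b, n, p) + c) % p == 0


print(verify_factor(13, 3, 2, 2, +1))  # 3*2^2 + 1 = 13, so 13 divides it
print(verify_factor(23, 3, 2, 3, -1))  # 3*2^3 - 1 = 23, so 23 divides it
print(verify_factor(7, 3, 2, 2, +1))   # 13 is not divisible by 7
```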

#346
May 2005
11001011000₂ Posts
There is virtually no change when using the "--no-sse2" switch, so it has to be something else.

#347
Mar 2003
New Zealand
13×89 Posts
This version fixes the bug in 1.5.12 that caused the x86-64 binary to always use the non-SSE2 code path, even when it said it was using the SSE2 path :-).

#348
May 2005
2^3×7×29 Posts
Here is a comparison of 1.5.10 and 1.5.13. What is "BSGS range..."?

#349
Mar 2003
New Zealand
13·89 Posts

#350
Mar 2003
New Zealand
13·89 Posts
In 1.5.14 I have improved the sse2/8 and sse2/16 methods a little; they now avoid doing more than 4 extra mulmods. This increases the number of branches in the code, but most of them are predictable, and since it works out a bit faster on my P4, I assume newer machines with better branch prediction or shorter pipelines will not suffer. Here are the times for my P4:
Code:
                          19k SoB.dat   68k riesel.dat   237k sr5data.txt
                          -----------   --------------   ----------------
sr2sieve-intel 1.5.6      425 kp/s      223 kp/s         98 kp/s
sr2sieve-intel 1.5.14     455 kp/s      229 kp/s         99 kp/s
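
A "mulmod" here is a modular multiplication, a·b mod p. One common way to do it with floating-point hardware is to estimate the quotient floor(a·b/p) in floating point and then fix up the remainder with integer arithmetic. The sketch below shows that general idea using Python floats, which are IEEE doubles with a 53-bit significand; that width is consistent with the 2^52 limit given earlier in the thread for the SSE2 path. It is not taken from sr2sieve, and `mulmod_double` is a hypothetical name.

```python
def mulmod_double(a, b, p):
    """Compute (a*b) mod p using a double-precision quotient estimate.

    Illustrative only. Requires 0 <= a, b < p < 2^52 so that a, b and p
    are all exactly representable as doubles.
    """
    if not (0 <= a < p and 0 <= b < p and p < 2**52):
        raise ValueError("operands out of range for the double-based sketch")
    # Estimated quotient; rounding can leave it slightly off.
    q = int(float(a) * float(b) / float(p))
    # Exact remainder candidate (Python integers do not overflow).
    r = a * b - q * p
    # Correct the small error left by the estimate.
    while r < 0:
        r += p
    while r >= p:
        r -= p
    return r


p = 2**52 - 1
a, b = p - 1, p - 2
print(mulmod_double(a, b, p) == (a * b) % p)  # agrees with exact arithmetic
```

In a fixed-width implementation the product a·b would not fit in 64 bits, so the subtraction is done modulo 2^64 and relies on the true remainder being small; the correction steps at the end are the kind of data-dependent branches the post talks about.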

#351
May 2005
2^3·7·29 Posts
Attached is a comparison for the linux.x86-64 binaries.
Virtually no change for sr2sieve, and a 2.6% improvement for sr1sieve.

#352
Mar 2003
New Zealand
10010000101₂ Posts
This version extends the improvements in 1.5.14 to the non-SSE2 x86 mulmod code. I hope it'll be a little faster on newer machines. It is only fractionally faster on my P3, but better on my P4 (tested with SSE2 disabled):
Code:
                      19k SoB.dat   68k riesel.dat   237k sr5data.txt
                      -----------   --------------   ----------------
1.5.14 --no-sse2      260 kp/s      142 kp/s         69 kp/s
1.5.15 --no-sse2      286 kp/s      151 kp/s         71 kp/s