Voila :smile:
[code]sr1sieve 1.1.1 -- A sieve for one sequence k*b^n+/-1.
L1 data cache 32Kb (detected), L2 cache 2048Kb (detected).
Read 87511 terms for 4*3^n-1 from NewPGen file `k=4_b=3.txt'.
Split 1 base 3 sequence into 32 base 3^90 subsequences.
Using 0 Kb for Legendre symbol tables.
Using 8 Kb for the baby-steps giant-steps hashtable, maximum density 0.20.
Best time for baby step method gen/2: 20196.
Best time for baby step method gen/4: 17109.
Best time for baby step method gen/1: 23940.
Best time for giant step method gen/2: 11529.
Best time for giant step method gen/4: 11682.
Best time for giant step method gen/1: 16101.
Using baby step method gen/4, giant step method gen/2.
Using 1024Kb for the Sieve of Eratosthenes bitmap.
Expecting to find factors for about 1810.45 terms.
sr1sieve started: 200013 <= n <= 1999957, 10613401255319 <= p <= 20000000000000
p=10613920959167, 8678654 p/sec, 0 factors, 0.01% done, ETA 03 Jun 20:31[/code] |
sr1sieve 1.1.2, sr5sieve 1.5.4 x86-64 bugfix
These versions fix a serious bug in the Linux/x86-64 build.
The giant steps method gen/4 did not work correctly in sr5sieve versions 1.5.1-1.5.3 or sr1sieve versions 1.1.0-1.1.1. To check whether your work was affected, run with the -v switch to get a message like `Using baby step method gen/4, giant step method gen/4, ladder method gen/4.' If the giant step method is reported as gen/4 then the results are invalid. (The bug does not affect the baby steps or ladder methods.) I have uploaded a test/benchmark program mulmodk8.zip for the x86-64 code [url=http://www.geocities.com/g_w_reynolds/sr5sieve/testing/]here[/url]. It would be helpful if someone could run this on Core2 and AMD64 machines and post the results here. It can be run as ./mulmodk8 and if no error message is printed then all is well. |
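For anyone who saved the -v output from a run, the check described above can be scripted. This is only a rough sketch in Python: the function names are made up for illustration, and it assumes the log contains a line in the same form as the message quoted above. It does not check which binary or version produced the log, so it only matters for the affected Linux x86-64 builds.

```python
import re

# Hypothetical helper names; the message format is taken from the post above.
GIANT_STEP_RE = re.compile(r"giant step method (gen/\d+)")

def giant_step_method(log_text):
    """Return the giant step method named in the -v output, or None if absent."""
    match = GIANT_STEP_RE.search(log_text)
    return match.group(1) if match else None

def is_affected(log_text):
    """True if the log reports gen/4 for the giant steps (the broken case)."""
    return giant_step_method(log_text) == "gen/4"
```

A log that reports gen/4 only for the baby steps or the ladder is not flagged, since the bug does not affect those methods.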
Does it affect Linux 64-bit sr2sieve-amd?
I'm going to run Eon until I get an answer. Since I expect to know the answer to my question within 24 hours, I'm going to keep my reservations. |
[QUOTE=jasong;106774]Does it affect Linux 64-bit sr2sieve-amd?
[/QUOTE] sr5sieve-amd and sr5sieve-intel are 32-bit, they are not affected even if you are running them on a 64-bit machine. Only the binaries from the following archives are affected:
sr1sieve-1.1.0-linux-x86_64.tar.gz
sr1sieve-1.1.1-linux-x86_64.tar.gz
sr2sieve-1.5.1-linux-x86_64.tar.gz
sr2sieve-1.5.2-linux-x86_64.tar.gz
sr2sieve-1.5.3-linux-x86_64.tar.gz
sr5sieve-1.5.1-linux-x86_64.tar.gz
sr5sieve-1.5.2-linux-x86_64.tar.gz
sr5sieve-1.5.3-linux-x86_64.tar.gz |
That is bad news :sad:
What is the nature of the error in previous x86-64 builds? Are some factors missed or some factors are reported although they are not factors? I would like to evaluate how much work I have to repeat... |
OK, the last backup I have is from May 12th :sad:
Here is a benchmark for C2D E4300 @ 2.4GHz [code]./mulmodk8
length = 1000, iterations = 100000, b = 2, p = 4503599627370449:
Code Vec    Rate      RDTSC
---- --- ------- ----------
cmov   1 225.212 1751122971
gen    1 240.370 1551858453
gena   1 238.080 1533973842
genb   1 252.509 1441971999
genc   1 249.984 1445902668
gen    2 384.592  954202257
gen    4 438.570  832425624
gena   4 438.570  839892636[/code] |
[QUOTE=Cruelty;106887]That is bad news :sad:
What is the nature of the error in previous x86-64 builds? Are some factors missed or some factors are reported although they are not factors? I would like to evaluate how much work I have to repeat...[/QUOTE]
I think the most likely result is that the program would miss most factors and then stop with an error message like `ERROR: p DOES NOT DIVIDE k*b^n+c' when it eventually found one. If the program found plenty of factors in a range without any error then it is very unlikely to have been affected by the bug (i.e. the gen/4 method was not being used for the giant steps).
edit: No incorrect factors would be reported without stopping with an error message. You don't have to worry that a prime has been wrongly removed from the sieve.
[QUOTE]./mulmodk8
length = 1000, iterations = 100000, b = 2, p = 4503599627370449:
Code Vec    Rate      RDTSC
---- --- ------- ----------
cmov   1 225.212 1751122971[/QUOTE]
Thanks for this benchmark. It also performs a check of the results so this is a good indication that the x86-64 code in the latest version is working properly. |
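The error message above comes from a divisibility check on each candidate factor, and the same check is easy to redo by hand. Here is a minimal sketch in Python, not the sieve's own code: the function name is hypothetical, and it simply tests whether p divides k*b^n+c using modular exponentiation so the huge number is never built.

```python
def divides(p, k, b, n, c):
    """Return True if p divides k*b^n + c.

    pow(b, n, p) computes b^n mod p directly, so this stays fast
    even when n is in the millions.
    """
    return (k * pow(b, n, p) + c) % p == 0

# Small example with the sequence 4*3^n-1 (k=4, b=3, c=-1):
# for n=2 the term is 35, which 5 divides and 11 does not.
print(divides(5, 4, 3, 2, -1))
print(divides(11, 4, 3, 2, -1))
```

Running this over the factors found in a range would confirm that each reported factor really divides its term. It cannot tell you whether any factors were missed.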
I didn't have any such errors... anyways I will rerun the entire range and afterwards compare results :flex:
|
sr1sieve 1.1.3, sr5sieve 1.5.5
These versions improve the mulmod code for x86-64, based on Cruelty's C2D benchmark.
There are a few minor changes for the x86 version, but nothing that will be noticed in most cases. |
[QUOTE=geoff;107147]These versions improve the mulmod code for x86-64, based on Cruelty's C2D benchmark.[/QUOTE]Improvement is marginal 8.71M vs. 8.77M :tu: that is using sr1sieve :-)
|
sr1sieve 1.1.4, sr5sieve 1.5.6
These versions have improvements to the 32-bit SSE2 mulmod and powmod code. The main change is the use of 8-byte SSE2 reads to match the 8-byte FPU writes in tight loops (which is probably of most benefit to P4). The code also now interleaves 4 integer multiplications where the previous code interleaved 2 (so the sse2/16 method does 4x4 multiplications instead of 8x2), which should benefit other SSE2 capable machines too, I hope.
Also I realized that the compiler options for the *-amd binaries have never been right: The SSE2 code path was using the Athlon64 instruction set but Athlon32 scheduling. I don't know how much difference it makes, but it is fixed to use Athlon64 scheduling now. (The *-intel binaries use i686 scheduling for the SSE2 code path because GCC doesn't know about Core2 yet and the P4 scheduling is slower, even on a P4). Here are benchmarks at p=100e12 for my P4 (2.9GHz, 8K L1, 512K L2): [code]
                        19k SoB.dat   68k riesel.dat   237k sr5data.txt
                        -----------   --------------   ----------------
sr2sieve-intel 1.4.42   377 kp/s      194 kp/s         85 kp/s
sr2sieve-intel 1.5.6    425 kp/s      223 kp/s         98 kp/s
[/code] |