|
|
#45 |
|
Sep 2002
Database er0rr
3,739 Posts |
Thomas informs me that the FFT boundaries have changed recently with the latest LLRs for Athlons:
Here are the latest Athlon FFT lengths for k=3:
fftlen nmax
-----------------------
114688 2233110
131072 2560126
163840 3180158
196608 3777190
229376 4411222
262144 5056254
-----------------------
So it makes sense to use an older version of LLR to do n=2233111-2244110. This would be about 600 tests, saving 2000 secs per test = 2 cpu weeks. We'll sort it out when the next set of numbers is released (n>2.2 million).
Thanks Thomas
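For anyone who wants to check which FFT length a given n falls into, here is a minimal Python sketch. It only encodes the k=3 Athlon table above; the function name `fft_length` is made up for illustration and is not part of LLR. It also redoes the "2 cpu weeks" arithmetic from the figures quoted in the post.

```python
from bisect import bisect_left

# (fftlen, nmax) pairs for k=3 on Athlons with the latest LLR,
# copied from the table in the post above.
ATHLON_K3 = [
    (114688, 2233110),
    (131072, 2560126),
    (163840, 3180158),
    (196608, 3777190),
    (229376, 4411222),
    (262144, 5056254),
]

def fft_length(n, table=ATHLON_K3):
    """Return the first FFT length in the table whose nmax is >= n.

    Returns None if n is beyond the last row. Values of n far below the
    first row also map to the first row, because smaller FFT lengths are
    not listed in the table.
    """
    nmaxes = [nmax for _, nmax in table]
    i = bisect_left(nmaxes, n)
    return table[i][0] if i < len(table) else None

# Saving estimate from the post: about 600 tests, about 2000 s saved each.
tests = 600
seconds_saved_per_test = 2000
cpu_weeks = tests * seconds_saved_per_test / (7 * 24 * 3600)

print(fft_length(2233110))   # last n still on the 112k FFT
print(fft_length(2233111))   # first n pushed up to the 128k FFT
print(round(cpu_weeks, 1))   # total saving in CPU weeks
```

With the newer LLR the jump to the 128k FFT happens at n=2233111, which is why the 11000 exponents up to 2244110 are the ones worth running with the older version.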
|
|
|
|
|
#46 |
|
Feb 2003
2²×3²×53 Posts |
Meanwhile I had some additional tests on the Athlon, using LLR 3.5, 3.6, and 3.6.2.
As Paul already mentioned in the above post, the FFT boundaries are slightly lower for the latest LLR versions (3.6 and 3.6.2), e.g. for the current 112k FFT we have nmax=2233110 for versions 3.6/3.6.2, but nmax=2244110 for version 3.5.
Below are some timings using the different LLR versions on my 2GHz Athlon:
Code:
Athlon 2GHz (2400+)
--------------------
times per iteration (ms) for FFT lengths
114688 (112k) and 131072 (128k):
LLR 112k 128k
----------------------
3.5 7.024 7.904
3.6 7.045 7.938
3.6.2 7.042 7.944
----------------------
(1) There is no significant difference between LLR 3.6 and 3.6.2.
(2) LLR 3.5 is slightly faster than 3.6/3.6.2 (about 0.3-0.5%).
The latter holds only for the Athlon. On the P4, versions 3.6/3.6.2 are about 1.5-2% faster than LLR 3.5 (at least on my machines...).
If someone else (perhaps, or at least you, Paul) would verify that LLR 3.5 is still a bit faster (or at least not slower) than versions 3.6/3.6.2 on your Athlons, we could recommend using LLR 3.5 exclusively for the Athlon, i.e. not only for n=2233111-2244110, but for any range.
For those of you who want to do the timings, I suggest using the following n's: 2233110 and 2233111, or 2244110 and 2244111, depending on the LLR version. Set the screen output to "every 1000 iterations" and watch it for about one minute. Note that the very first output on your screen shows a larger msecs/iteration due to the initial U0/V0 computations.
And now let's find that two-megabit prime!
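The percentages above, and the "2000 secs per test" figure from the earlier post, can be re-derived from the timing table with a few lines of Python. This is only a rough sketch: it assumes an LLR test of k·2^n-1 takes about n iterations, and the names `TIMINGS_MS` and `percent_slower` are invented for this example.

```python
# ms per iteration on the 2 GHz Athlon, copied from the table above:
# LLR version -> (112k FFT, 128k FFT)
TIMINGS_MS = {
    "3.5": (7.024, 7.904),
    "3.6": (7.045, 7.938),
    "3.6.2": (7.042, 7.944),
}

def percent_slower(version, baseline="3.5"):
    """Percent by which `version` is slower than `baseline`, per FFT length."""
    return tuple(
        round(100.0 * (t / b - 1.0), 2)
        for t, b in zip(TIMINGS_MS[version], TIMINGS_MS[baseline])
    )

for version in ("3.6", "3.6.2"):
    print(version, percent_slower(version))

# Saving per test when n=2244110 still fits the 112k FFT under LLR 3.5
# but needs the 128k FFT under LLR 3.6.2.
# ASSUMPTION: roughly n iterations per test.
n = 2244110
ms_112k_old = TIMINGS_MS["3.5"][0]
ms_128k_new = TIMINGS_MS["3.6.2"][1]
seconds_saved = n * (ms_128k_new - ms_112k_old) / 1000.0
print(round(seconds_saved), "seconds saved per test (approx.)")
```

Under that assumption the saving comes out close to the 2000 seconds quoted earlier, so the two posts are consistent with each other.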
|
|
|
|
|
#47 |
|
Sep 2002
Database er0rr
3,739 Posts |
I have timed "llr35" and "llr362" on an Athlon (Debian+X) and calculated that the old LLR is 0.4% quicker:
7.912 ms/iteration llr362
7.880 ms/iteration llr35
|
|
|
|
|
#48 |
|
Sep 2002
Database er0rr
3739₁₀ Posts |
Some more timings on Linux with no X:
Athlon 1050MHz
13.014 ms/iteration llr362
13.028 ms/iteration llr35
...new LLR slightly faster (0.1%)
Athlon XP1600+
9.432 ms/iteration llr362
9.401 ms/iteration llr35
...old LLR faster (0.3%)
Athlon XP2000+
8.278 ms/iteration llr362
8.250 ms/iteration llr35
...old LLR faster (0.3%)
So it seems that on Athlon XPs (not ordinary old Athlons) it is better to run LLR 3.5.
Last fiddled with by paulunderwood on 2005-10-09 at 00:51 |
|
|
|
|
#49 |
|
Feb 2003
774₁₆ Posts |
Just out of curiosity I did a test on an Opteron 246 (2 GHz) using the "CpuSupportsSSE2=0" switch to see whether it would run faster at the lower FFT length. Here are the figures I got:
Code:
               FFT length   time per iteration
---------------------------------------------
SSE2 enabled:    196608     8.483 ms
SSE2 disabled:   163840     9.036 ms
Last fiddled with by Thomas11 on 2007-02-14 at 10:18 |
|
|
|
|
#50 |
|
Sep 2002
Database er0rr
3,739 Posts |
Thanks to Thomas for this info:
Here are the FFT lengths and break points for k=3, n=5-20M:
Code:
For P4 and other SSE2 cpus:
fftlen nmax
-----------------------
327680 6161318
393216 7339382
458752 8544446
524288 9764510
655360 12130637
786432 14446765
917504 16822893
1048576 19219021
1310720 23891277
And for Athlons (cpus without SSE2):
fftlen nmax
-----------------------
262144 5056254
327680 6285318
393216 7460382
458752 8707446
524288 9964510
655360 12370637
786432 14686765
917504 17162893
1048576 19629021
1310720 24351277
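To see how much further each FFT length reaches on a non-SSE2 CPU, the two tables can be compared row by row. The sketch below is only an illustration built from the numbers above; `extra_headroom` and `fft_length` are hypothetical helper names, not anything provided by LLR.

```python
# (fftlen, nmax) pairs for k=3, copied from the two tables above.
SSE2_K3 = [
    (327680, 6161318), (393216, 7339382), (458752, 8544446),
    (524288, 9764510), (655360, 12130637), (786432, 14446765),
    (917504, 16822893), (1048576, 19219021), (1310720, 23891277),
]
ATHLON_K3 = [
    (262144, 5056254), (327680, 6285318), (393216, 7460382),
    (458752, 8707446), (524288, 9964510), (655360, 12370637),
    (786432, 14686765), (917504, 17162893), (1048576, 19629021),
    (1310720, 24351277),
]

def fft_length(n, table):
    """First FFT length in `table` whose nmax is >= n, or None if off the end."""
    return next((fftlen for fftlen, nmax in table if n <= nmax), None)

def extra_headroom(sse2=SSE2_K3, athlon=ATHLON_K3):
    """For FFT lengths present in both tables: athlon nmax minus SSE2 nmax."""
    athlon_nmax = dict(athlon)
    return {
        fftlen: athlon_nmax[fftlen] - nmax
        for fftlen, nmax in sse2
        if fftlen in athlon_nmax
    }

for fftlen, gap in extra_headroom().items():
    print(fftlen, gap)

# Example: an exponent just past the SSE2 break point of the 320k FFT.
n = 6200000
print(fft_length(n, SSE2_K3), fft_length(n, ATHLON_K3))
```

For exponents inside one of these gaps, an SSE2 machine has already moved up to the next FFT length while an Athlon is still on the smaller one.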
|
|
|
|
|
#51 |
|
Sep 2002
Database er0rr
7233₈ Posts |
By request, and by courtesy of Thomas, here is the program to create tables such as those above.
Quote:
Last fiddled with by paulunderwood on 2008-06-26 at 14:17 |
|
|
|
|
|
#52 |
|
Sep 2002
Database er0rr
111010011011₂ Posts |
Here is another program written by Thomas for optimization of project throughput, for wide ranges of "n" crossing FFT jumps.
|
|
|