mersenneforum.org  

Go Back   mersenneforum.org > Other Stuff > Archived Projects > 3*2^n-1 Search

 
 
Old 2005-10-07, 15:15   #45
paulunderwood
 
Sep 2002
Database er0rr

3864₁₀ Posts

Thomas informs me that the FFT boundaries have changed recently with the latest LLRs for Athlons:

Here are the latest Athlon FFT lengths for k=3:

fftlen nmax
-----------------------
114688 2233110
131072 2560126
163840 3180158
196608 3777190
229376 4411222
262144 5056254
-----------------------

So it makes sense to use an older version of LLR to do n=2233111-2244110. This would be about 600 tests, saving 2000 secs per test = 2 CPU weeks.

We'll sort it out when the next set of numbers is released (n > 2.2 million).
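As a rough illustration of the reasoning above, here is a minimal Python sketch that looks up the FFT length for a given n from the Athlon table and converts the quoted savings into CPU weeks. The function names `fft_length` and `cpu_weeks_saved` are made up for this example and are not part of LLR; the table values are copied from this post.

```python
from bisect import bisect_left

# Athlon FFT limits for k=3 with the latest LLR, copied from the table above.
ATHLON_LIMITS = [
    (114688, 2233110),
    (131072, 2560126),
    (163840, 3180158),
    (196608, 3777190),
    (229376, 4411222),
    (262144, 5056254),
]


def fft_length(n, limits=ATHLON_LIMITS):
    """Return the smallest FFT length whose nmax is >= n, or None if n is off the table."""
    nmaxes = [nmax for _, nmax in limits]
    i = bisect_left(nmaxes, n)
    return limits[i][0] if i < len(limits) else None


def cpu_weeks_saved(tests, seconds_per_test):
    """Convert a per-test saving in seconds into total CPU weeks."""
    return tests * seconds_per_test / (7 * 24 * 3600)


print(fft_length(2233110))                   # last n still on the 112k FFT
print(fft_length(2233111))                   # first n pushed to the 128k FFT
print(round(cpu_weeks_saved(600, 2000), 2))  # about 2 CPU weeks
```

With the newer LLR, every n in 2233111-2244110 falls into the 128k FFT, which is why an older version that still uses 112k there saves time.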

Thanks Thomas
Old 2005-10-08, 14:38   #46
Thomas11
 
Feb 2003

2²·3²·53 Posts

Meanwhile I ran some additional tests on the Athlon, using LLR 3.5, 3.6, and 3.6.2.
As Paul already mentioned in the above post, the FFT boundaries are slightly lower for the latest LLR versions (3.6 and 3.6.2), e.g. for the current 112k FFT we have nmax=2233110 for versions 3.6/3.6.2, but nmax=2244110 for version 3.5.

Below are some timings using the different LLR versions on my 2GHz Athlon:
Code:
Athlon 2GHz (2400+)
--------------------

Times per iteration (ms) for FFT lengths
114688 (112k) and 131072 (128k):

LLR       112k    128k
----------------------
3.5      7.024   7.904
3.6      7.045   7.938
3.6.2    7.042   7.944
----------------------
We note the following:
(1) There is no significant difference between LLR 3.6 and 3.6.2.
(2) LLR 3.5 is slightly faster than 3.6/3.6.2 (about 0.3-0.5%).

The latter holds only for the Athlon. On the P4, versions 3.6/3.6.2 are about 1.5-2% faster than LLR 3.5 (at least on my machines...).

If someone else (at least you, Paul) could verify that LLR 3.5 is still a bit faster (or at least not slower) than versions 3.6/3.6.2 on their Athlons, we could recommend using LLR 3.5 exclusively on the Athlon, i.e. not only for n=2233111-2244110, but for any range.

For those who want to do the timings, I suggest using the following n values: 2233110 and 2233111, or 2244110 and 2244111, depending on the LLR version. Set the screen output to "every 1000 iterations" and watch it for about one minute. Note that the very first output on your screen shows a larger msecs/iteration figure due to the initial U0/V0 computations.

And now let's find that two-megabit prime!
Old 2005-10-08, 23:52   #47
paulunderwood
 
Sep 2002
Database er0rr

2³×3×7×23 Posts

I have timed "llr35" and "llr362" on an Athlon (Debian+X) and calculated that the old LLR is 0.4% quicker:

7.912 ms/iteration llr362
7.880 ms/iteration llr35

Old 2005-10-09, 00:28   #48
paulunderwood
 
Sep 2002
Database er0rr

7430₈ Posts

Some more timings on Linux with no X:

Athlon 1050MHz
13.014 ms/iteration llr362
13.028 ms/iteration llr35
...new LLR slightly faster (0.1%)

Athlon XP1600+
9.432 ms/iteration llr362
9.401 ms/iteration llr35
...old LLR faster (0.3%)

Athlon XP2000+
8.278 ms/iteration llr362
8.250 ms/iteration llr35
...old LLR faster (0.3%)

So it seems that on Athlon XPs (but not on ordinary old Athlons) it is better to run LLR 3.5 (llr35).
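For anyone repeating the comparison on their own machine, the percentages above can be recomputed with a few lines of Python. This is only a sketch; the helper name `llr35_advantage_percent` is invented for the example, and the timings are the ones listed in this post.

```python
# (llr362 ms/iteration, llr35 ms/iteration) per machine, from the timings above.
TIMINGS = {
    "Athlon 1050MHz": (13.014, 13.028),
    "Athlon XP1600+": (9.432, 9.401),
    "Athlon XP2000+": (8.278, 8.250),
}


def llr35_advantage_percent(llr362_ms, llr35_ms):
    """Positive result: llr35 is faster; negative result: llr362 is faster."""
    return (llr362_ms - llr35_ms) / llr35_ms * 100.0


for cpu, (new_ms, old_ms) in TIMINGS.items():
    print(f"{cpu}: {llr35_advantage_percent(new_ms, old_ms):+.1f}%")
```

Differences this small are close to what background load can cause, so it is worth timing each version more than once before drawing conclusions.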

Last fiddled with by paulunderwood on 2005-10-09 at 00:51
Old 2007-02-14, 10:17   #49
Thomas11
 
Feb 2003

2²·3²·53 Posts

Just out of curiosity I did a test on an Opteron 246 (2 GHz) using the "CpuSupportsSSE2=0" switch to see whether it would run faster at the lower FFT length. Here are the figures I got:

Code:
               FFT length  time per iteration
---------------------------------------------
SSE2 enabled:    196608         8.483 ms
SSE2 disabled:   163840         9.036 ms
This shows that the SSE2 code is faster (about 6% less time per iteration), even at the higher FFT length. The same should also hold for the Athlon64s. Perhaps someone could run a test to verify this...

Last fiddled with by Thomas11 on 2007-02-14 at 10:18
Old 2008-06-16, 17:32   #50
paulunderwood
 
Sep 2002
Database er0rr

2³×3×7×23 Posts
IBDWT break points for 5M < n < 20M+

Thanks to Thomas for this info.

Here are the FFT lengths and break points for k=3, n=5-20M:

Code:
For P4 and other SSE2 cpus:

     fftlen       nmax
-----------------------
     327680    6161318
     393216    7339382
     458752    8544446
     524288    9764510
     655360   12130637
     786432   14446765
     917504   16822893
    1048576   19219021
    1310720   23891277

And for Athlons (cpus without SSE2):

     fftlen       nmax
-----------------------
     262144    5056254
     327680    6285318
     393216    7460382
     458752    8707446
     524288    9964510
     655360   12370637
     786432   14686765
     917504   17162893
    1048576   19629021
    1310720   24351277
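One way to read these tables is to divide nmax by the FFT length, which gives roughly how many bits of the number each FFT element has to carry at the break point (3*2^n-1 is only about 1.6 bits longer than n). The sketch below does that division for both tables; the "bits per element" interpretation and the function name are mine, not something stated in the thread.

```python
# Break points for k=3, copied from the two tables above.
SSE2_LIMITS = [
    (327680, 6161318), (393216, 7339382), (458752, 8544446),
    (524288, 9764510), (655360, 12130637), (786432, 14446765),
    (917504, 16822893), (1048576, 19219021), (1310720, 23891277),
]
ATHLON_LIMITS = [
    (262144, 5056254), (327680, 6285318), (393216, 7460382),
    (458752, 8707446), (524288, 9964510), (655360, 12370637),
    (786432, 14686765), (917504, 17162893), (1048576, 19629021),
    (1310720, 24351277),
]


def bits_per_element(fftlen, nmax):
    """Approximate bits carried per FFT element at the break point."""
    return nmax / fftlen


athlon = dict(ATHLON_LIMITS)
for fftlen, nmax in SSE2_LIMITS:
    print(f"{fftlen:8d}  SSE2 {bits_per_element(fftlen, nmax):.2f}"
          f"  Athlon {bits_per_element(fftlen, athlon[fftlen]):.2f}")
```

For every FFT length present in both tables the Athlon break point is higher than the SSE2 one, and in both tables the ratio tends to fall as the FFT length grows.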
Old 2008-06-26, 14:17   #51
paulunderwood
 
Sep 2002
Database er0rr

2³·3·7·23 Posts

By request, and by courtesy of Thomas, here is the program to create tables such as the ones above.

Quote:
The attached zip file contains the little tool "fft_len" which I used to get those numbers. You simply need to rename/copy the file matching your cpu architecture (either "maxlen_Athlon.txt" or "maxlen_P4.txt") to "maxlen.txt" and then run fft_len from the command line.
It asks for "k", "n_min" and "n_max" and then produces the list.
Attached Files
File Type: zip fft_len.zip (8.1 KB, 354 views)
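For readers without the attachment, the kind of list described in the quote can be approximated in Python. This is not the fft_len tool and does not read "maxlen.txt" (its format is not shown in this thread); it is a sketch that hard-codes the first few k=3 break points for SSE2 CPUs from the previous post and splits a range of n into the stretches covered by each FFT length. The function name `fft_segments` is hypothetical.

```python
# First few SSE2 break points for k=3, copied from the previous post.
SSE2_LIMITS = [
    (327680, 6161318),
    (393216, 7339382),
    (458752, 8544446),
    (524288, 9764510),
]


def fft_segments(n_min, n_max, limits):
    """Split [n_min, n_max] into (fftlen, first_n, last_n) stretches."""
    segments = []
    lo = n_min
    for fftlen, nmax in limits:
        if lo > n_max:
            break                 # whole range already covered
        if nmax < lo:
            continue              # this FFT length is too small for the range
        hi = min(nmax, n_max)
        segments.append((fftlen, lo, hi))
        lo = hi + 1
    if lo <= n_max:
        raise ValueError("range extends beyond the last break point in the table")
    return segments


for fftlen, lo, hi in fft_segments(6000000, 8000000, SSE2_LIMITS):
    print(f"{fftlen:8d}  {lo:9d} - {hi:9d}")
```

A range that crosses several break points, like the one in the usage example, is split at each nmax, which shows at a glance where the per-test time will jump.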

Last fiddled with by paulunderwood on 2008-06-26 at 14:17
Old 2008-06-26, 15:56   #52
paulunderwood
 
Sep 2002
Database er0rr

2³×3×7×23 Posts

Here is another program written by Thomas for optimization of project throughput, for wide ranges of "n" crossing FFT jumps.
Attached Files
File Type: zip llrtools_src.zip (10.9 KB, 360 views)
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.