2010-05-18, 02:43  #1
Dec 2008
Boycotting the Soapbox
1320_{8} Posts 
Is an 18M FFT sufficient to test a 100M-digit number?
I compute: 100,000,000 / log_{10}(2) / 18M ~= 17.64 bits/double
According to http://www.loria.fr/~gaudry/publis/issac07.pdf prime95 uses 17.76 bits for a 32M FFT, but it doesn't mention whether 80-bit temporaries were used (how much impact does this have?). If an 18M FFT is sufficient, then an implementation should be at least 10% faster than a 20M FFT, because length-3 FFTs are less complicated than length-5 FFTs.
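The bits-per-double estimate above can be reproduced with a few lines of Python. This is only a sketch: the helper name `bits_per_double` is made up for illustration, and it assumes "M" means 2^20 doubles (consistent with prime95 reporting an 18M FFT as FFTlen=18432K). Under that assumption the 18M figure comes out at roughly 17.6 bits per double, close to the value quoted in the post.

```python
import math

M = 1 << 20  # assumption: "18M" means 18 * 2**20 doubles


def bits_per_double(decimal_digits: int, fft_len: int) -> float:
    """Average number of bits of the exponent that each FFT element must hold."""
    # A number with this many decimal digits has about digits / log10(2) bits.
    exponent_bits = decimal_digits / math.log10(2)
    return exponent_bits / fft_len


for size in (18, 20):
    load = bits_per_double(100_000_000, size * M)
    print(f"{size}M FFT: {load:.2f} bits/double")
```

A lower bits-per-double load leaves more headroom against round-off error, which is why the smaller FFT is only usable for the smallest exponents in the range.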
2010-05-18, 05:48  #2
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
23401_{8} Posts 

2010-05-18, 06:06  #3
"Lucan"
Dec 2006
England
14512_{8} Posts 
For all we know this may be the highspot of Bolivia.
Sundance
PS Keep thinking Butch - that's what you're good at.
Last fiddled with by davieddy on 2010-05-18 at 06:40
2010-05-18, 07:13  #4
"Lucan"
Dec 2006
England
6474_{10} Posts 

2010-05-18, 13:58  #5
P90 years forever!
Aug 2002
Yeehaw, FL
2^{2}·43·47 Posts 
Quote:
Yes, an 18M FFT is sufficient for the smallest 100M numbers.
Quote:
V25.12:
Code:
Time FFTlen=20480K, Arch=3, Pass1=2560, Pass2=8192, clm=2: 522.924 ms., 523.432 ms.
Code:
Time FFTlen=18432K, Arch=3, Pass1=4096, Pass2=4608, clm=4: 396.830 ms., 397.333 ms.
Time FFTlen=18432K, Arch=3, Pass1=4096, Pass2=4608, clm=2: 400.698 ms., 401.327 ms.
Time FFTlen=20480K, Arch=3, Pass1=2048, Pass2=10240, clm=4: 431.619 ms., 441.531 ms.
Time FFTlen=20480K, Arch=3, Pass1=2048, Pass2=10240, clm=2: 438.762 ms., 439.651 ms.
Time FFTlen=20480K, Arch=3, Pass1=2560, Pass2=8192, clm=4: 420.561 ms., 421.185 ms.
Time FFTlen=20480K, Arch=3, Pass1=2560, Pass2=8192, clm=2: 427.008 ms., 427.514 ms.
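As a rough check on the "at least 10% faster" estimate from the first post, the six timings that follow the V25.12 line can be parsed and compared. The sketch below is illustrative only: the regular expression and the `best_times` helper are assumptions written for this log format, not part of prime95, and the result reflects only this one machine's numbers.

```python
import re

# The six timing lines that follow the V25.12 line, copied from the post.
LOG = """\
Time FFTlen=18432K, Arch=3, Pass1=4096, Pass2=4608, clm=4: 396.830 ms., 397.333 ms.
Time FFTlen=18432K, Arch=3, Pass1=4096, Pass2=4608, clm=2: 400.698 ms., 401.327 ms.
Time FFTlen=20480K, Arch=3, Pass1=2048, Pass2=10240, clm=4: 431.619 ms., 441.531 ms.
Time FFTlen=20480K, Arch=3, Pass1=2048, Pass2=10240, clm=2: 438.762 ms., 439.651 ms.
Time FFTlen=20480K, Arch=3, Pass1=2560, Pass2=8192, clm=4: 420.561 ms., 421.185 ms.
Time FFTlen=20480K, Arch=3, Pass1=2560, Pass2=8192, clm=2: 427.008 ms., 427.514 ms.
"""

LINE = re.compile(
    r"FFTlen=(\d+)K, Arch=\d+, Pass1=(\d+), Pass2=(\d+), clm=(\d+): "
    r"([\d.]+) ms\., ([\d.]+) ms\."
)


def best_times(log: str) -> dict:
    """Map each FFT length (in K) to the fastest timing reported for it."""
    best = {}
    for fft_k, pass1, pass2, _clm, t1, t2 in LINE.findall(log):
        # In every line quoted here, Pass1 * Pass2 equals the FFT length.
        assert int(pass1) * int(pass2) == int(fft_k) * 1024
        fastest = min(float(t1), float(t2))
        best[int(fft_k)] = min(fastest, best.get(int(fft_k), fastest))
    return best


best = best_times(LOG)
saving = 1 - best[18432] / best[20480]
print(f"18432K: {best[18432]:.3f} ms, 20480K: {best[20480]:.3f} ms, saving {saving:.1%}")
```

On these figures the best 18432K configuration takes about 5.6% less time per iteration than the best 20480K one, so the measured gain here is smaller than the 10% guessed at in the opening post.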

2010-05-18, 16:50  #6
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
3×2,003 Posts 
How's v26 coming along? It looks like there is quite a speed increase there, which would be nice to use. Does that speed increase continue down to the sizes used by things like pfgw and llr?

2010-05-18, 17:20  #7
P90 years forever!
Aug 2002
Yeehaw, FL
2^{2}·43·47 Posts 
It's proceeding nicely. All two-pass FFTs have been recoded. This means a lot of QA will be necessary. The speed increase is less pronounced at smaller FFT sizes. FFT sizes above 64K will see a benefit, including in pfgw and llr.

2010-05-21, 20:56  #8
"Oliver"
Mar 2005
Germany
1113_{10} Posts 
Hi,
nice speedup from already incredibly fast code. I'm just curious what your strategy for performance optimization is:
- just run one test with one thread and aim for best performance?
- run N tests with one thread each and aim for best performance?
- run one test with N threads and aim for best performance?
(N is the number of cores of your system(s))
I remember that you said something like v24/v25 are optimized for "low memory bandwidth" CPUs like the P4 (and Core 2), and that v26 will benefit from the higher memory bandwidth of e.g. the Core i7. So will we see that speedup in real-world usage (utilizing all cores) too, or does memory bandwidth bottleneck the improvement somehow?
Oliver
Last fiddled with by TheJudger on 2010-05-21 at 20:57
2010-05-21, 23:29  #9
P90 years forever!
Aug 2002
Yeehaw, FL
2^{2}·43·47 Posts 
Quote:
Quote:
Perhaps I can release a pre-pre-beta candidate and let you folks run the benchmark with the other N-1 cores running LL tests. It might lead to several different preferred FFT implementations (e.g. the 2560K FFT has 6 different implementations with different pass 2 sizes and therefore subtly different cache and bandwidth requirements).
Quote:
Quote:
The basic MASM building blocks have been optimized for the 32-bit and 64-bit Core 2 architecture. I really need to take a month or two and optimize them for the P4, K8, and K10 architectures.

2010-05-22, 19:42  #10
"Oliver"
Mar 2005
Germany
3×7×53 Posts 
Quote:
Quote:
I'm wondering what the distribution of CPU types is within PrimeNet. Is it worth spending much time on P4 optimization? Do you plan to fix/increase the limit of 32 cores in v26?
Oliver

2010-05-22, 21:24  #11
P90 years forever!
Aug 2002
Yeehaw, FL
2^{2}·43·47 Posts 
