![]() |
![]() |
#1 |
Dec 2008
Boycotting the Soapbox
24·32·5 Posts |
![]()
I compute: 100.000.000/log10(2)/18M~=17.64 bits/double
According to http://www.loria.fr/~gaudry/publis/issac07.pdf prime95 uses 17.76 bits for a 32M FFT, but it doesn't mention whether 80-bit temporaries (how much impact does this have?) were used or not. If an 18M FFT is sufficient, then an implementation should be at least 10% faster than a 20M FFT, because length-3 FFTs are less complicated than length-5 FFTs. |
![]() |
![]() |
![]() |
#2 |
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
232558 Posts |
![]() |
![]() |
![]() |
![]() |
#3 |
"Lucan"
Dec 2006
England
145128 Posts |
![]()
For all we know this may be the highspot of Bolivia.
Sundance PS Keep thinking Butch - that's what you're good at. Last fiddled with by davieddy on 2010-05-18 at 06:40 |
![]() |
![]() |
![]() |
#4 |
"Lucan"
Dec 2006
England
2·3·13·83 Posts |
![]() |
![]() |
![]() |
![]() |
#5 | ||
P90 years forever!
Aug 2002
Yeehaw, FL
11111000001002 Posts |
![]() Quote:
Yes, an 18M FFT is sufficient for the smallest 100M numbers. Quote:
V25.12: Code:
Time FFTlen=20480K, Arch=3, Pass1=2560, Pass2=8192, clm=2: 522.924 ms., 523.432 ms. Code:
Time FFTlen=18432K, Arch=3, Pass1=4096, Pass2=4608, clm=4: 396.830 ms., 397.333 ms. Time FFTlen=18432K, Arch=3, Pass1=4096, Pass2=4608, clm=2: 400.698 ms., 401.327 ms. Time FFTlen=20480K, Arch=3, Pass1=2048, Pass2=10240, clm=4: 431.619 ms., 441.531 ms. Time FFTlen=20480K, Arch=3, Pass1=2048, Pass2=10240, clm=2: 438.762 ms., 439.651 ms. Time FFTlen=20480K, Arch=3, Pass1=2560, Pass2=8192, clm=4: 420.561 ms., 421.185 ms. Time FFTlen=20480K, Arch=3, Pass1=2560, Pass2=8192, clm=2: 427.008 ms., 427.514 ms. |
||
![]() |
![]() |
![]() |
#6 |
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
3×1,997 Posts |
![]()
Hows v26 coming along? It looks like there is quite a speed increase there which would be nice to use. Does that speed increase continue down to sizes used by things like pfgw and llr?
|
![]() |
![]() |
![]() |
#7 |
P90 years forever!
Aug 2002
Yeehaw, FL
22·5·397 Posts |
![]()
It's proceeding nicely. All two-pass FFTs have been recoded. This means a lot of QA will be necessary. The speed increase is less pronounced at smaller FFT sizes. FFT sizes above 64K will see a benefit -- including pfgw and llr.
|
![]() |
![]() |
![]() |
#8 |
"Oliver"
Mar 2005
Germany
100010110002 Posts |
![]()
Hi,
nice speedup from an allready incredible fast code. ![]() I'm just curious which is your strategy for performance optimizations. - just run one test with one thread and aim for best performance? - run N tests with one thread each and aim for best performance? - run one test with N threads and aim for best performance? (N is the number of cores of your system(s)) I remember that you said something like v24/v25 are optimized for "low memory bandwidth" like P4 (and Core 2 CPUs) and v26 will benefit from higher memory bandwidth of e.g. Core i7. So will we see those speedup on real world usage (utilizing all cores), too or does memory bandwidth bottleneck this improvement somehow? Oliver Last fiddled with by TheJudger on 2010-05-21 at 20:57 |
![]() |
![]() |
![]() |
#9 | ||||
P90 years forever!
Aug 2002
Yeehaw, FL
22·5·397 Posts |
![]() Quote:
Quote:
Perhaps I can release a pre-pre-beta candidate and let you folks run the benchmark with the other N-1 cores running LL tests. It might lead to several different preferred FFT implementations (e.g. the 2560K FFT has 6 different implementations with different pass 2 sizes and therefore subtly different cache and bandwidth requirements). Quote:
Quote:
The basic MASM building blocks have been optimized for 32-bit and 64-bit Core 2 architecture. I really need to take a month or two and optimize them for the P4, K8, and K10 architectures. |
||||
![]() |
![]() |
![]() |
#10 | ||
"Oliver"
Mar 2005
Germany
100010110002 Posts |
![]() Quote:
Quote:
![]() I'm wondering how the distribution of CPU type is within the primenet. Is it worth to spent much time on P4 optimization? Do you plan to fix/increase the limit of 32 cores in v26? Oliver |
||
![]() |
![]() |
![]() |
#11 |
P90 years forever!
Aug 2002
Yeehaw, FL
794010 Posts |
![]() |
![]() |
![]() |
![]() |
Thread Tools | |
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
necessary but not sufficient condition for primes | Alberico Lepore | Alberico Lepore | 22 | 2018-01-03 16:17 |
Devaraj numbers- necessary and sufficient condition | devarajkandadai | Number Theory Discussion Group | 7 | 2017-09-23 02:58 |
CPU time for 100M digit prime test | biggerben | Software | 7 | 2014-10-24 05:47 |
How far along are you in your 100M digit LL test? | JuanTutors | Lounge | 6 | 2012-02-21 07:36 |
Who is LL-ing a mersenne number > 100M digits? | joblack | LMH > 100M | 1 | 2009-10-08 12:31 |