#34
"David"
Jul 2015
Ohio
11×47 Posts
cuFFT claims it will support up to 128M...
#35
Serpentine Vermin Jar
Jul 2014
CF1₁₆ Posts
Quote:
It was one of the reasons I wanted to re-check stuff people had done themselves. Not that I thought anyone in particular was cheating, but to keep that from even being a question. It's not the first time Never Odd or Even has self-checked a really large exponent... that M383838383 that I triple-checked was one of them, and people had wondered about that one as well. It checked out in the end, but still, the only way to put the issue to rest was to do an independent run...

That's kind of what I thought, so I specifically checked which app was reported as running the tests, and I was surprised to see it was a version of Prime95 that won't accept an exponent that large. I wonder if it was started under an earlier Prime95 version that didn't have a problem with it (but still at 32M FFT, which seems "iffy"), and then version 28.x still allowed it because it had already been started under the previous version? Somehow I don't think that would work... it *should* throw an error right away about an illegal line in the worktodo, but maybe George allowed it under that "upgrade" scenario?

Last fiddled with by Madpoo on 2016-01-25 at 02:58
#36
P90 years forever!
Aug 2002
Yeehaw, FL
5×11×137 Posts
Quote:
The error code is expected: 255+ reproducible roundoff errors. I do know that at least once the test was restarted and the interim residues still mismatched. NOoE then had the bright idea of trying again using the AVX FFT instead of the FMA3 FFT, and the interim residues matched again. To me, this indicates a roundoff error > 0.6 occurred.
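A rough illustration of what a "roundoff error" means here: when a big number is squared with a floating-point FFT, every convolution output should be an integer, and the reported error is how far the computed value lands from the nearest integer. The sketch below is a plain zero-padded convolution, an assumption for illustration only (Prime95, Mlucas and CUDALucas use an irrational-base discrete weighted transform instead), and the function names are hypothetical. Because the measured distance can never exceed 0.5, a true error above 0.5 cannot be seen directly: the value rounds to the wrong integer and shows up as a smaller apparent error, silently corrupting the residue.

```python
import cmath
import random


def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def square_with_roundoff(x, bits_per_word, n):
    """Square a non-negative integer x via a floating-point FFT convolution.

    Returns (x*x, max_roundoff), where max_roundoff is the largest distance
    of any convolution output from its nearest integer (always <= 0.5).
    """
    mask = (1 << bits_per_word) - 1
    words = []
    for _ in range(n // 2):  # use only n/2 words so the cyclic convolution cannot wrap
        words.append(x & mask)
        x >>= bits_per_word
    if x != 0:
        raise ValueError("x does not fit in n/2 words")
    a = [complex(w) for w in words] + [0j] * (n - len(words))
    fa = fft(a)
    conv = [v.real / n for v in fft([v * v for v in fa], invert=True)]
    max_err = max(abs(v - round(v)) for v in conv)
    # Python integers absorb the carries when the rounded outputs are recombined.
    result = sum(round(v) << (bits_per_word * i) for i, v in enumerate(conv))
    return result, max_err


random.seed(1)
x = random.getrandbits(16 * 128)
sq, err16 = square_with_roundoff(x, 16, 256)
y = random.getrandbits(20 * 128)
_, err20 = square_with_roundoff(y, 20, 256)
print(f"16 bits/word: max roundoff {err16:.6f}; 20 bits/word: max roundoff {err20:.6f}")
```

Packing more bits into each word makes the convolution outputs larger, so the same double-precision arithmetic leaves a larger fractional error.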
#37
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
2·47·101 Posts
CUDALucas seems to run best at FFT size 36864K for this exponent:
Code:
```
# ./CUDALucas-2.05.1-CUDA6.5-linux-x86_64 -f 32768 601248421
Warning: Couldn't find .ini file. Using defaults for non-specified options.
sleep value = -1 from CUDALucas.ini must have the form k*10^m for k = 1, 2, or 5.
Changing to 0.
The fft length 32K is too small for exponent 601248421, increasing to 34992K
Using threads: square 256, splice 128.
Starting M601248421 fft length = 34992K
Running careful round off test for 1000 iterations.
If average error > 0.25, or maximum error > 0.35, the test will restart with a longer FFT.
Iteration 100, average error = 0.10853, max error = 0.16309
Iteration 200, average error = 0.12747, max error = 0.17969
Iteration 300, average error = 0.13351, max error = 0.16406
Iteration 400, average error = 0.13659, max error = 0.17188
Iteration 500, average error = 0.13844, max error = 0.17578
Iteration 600, average error = 0.13987, max error = 0.17188
Iteration 700, average error = 0.14078, max error = 0.17188
Iteration 800, average error = 0.14137, max error = 0.17676
Iteration 900, average error = 0.14191, max error = 0.17188
Iteration 1000, average error = 0.14215 <= 0.25 (max error = 0.17969), continuing test.
^C
# ./CUDALucas-2.05.1-CUDA6.5-linux-x86_64 -f 36864k 601248421
Warning: Couldn't find .ini file. Using defaults for non-specified options.
sleep value = -1 from CUDALucas.ini must have the form k*10^m for k = 1, 2, or 5.
Changing to 0.
Using threads: square 256, splice 128.
Continuing M601248421 @ iteration 3102 with fft length 36864K, 0.00% done
```
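The log states the rule applied during the first 1000 iterations: restart with a longer FFT if the average error exceeds 0.25 or the maximum error exceeds 0.35. A minimal Python sketch of that check, fed with the samples from the 34992K run, is below. The function name, and the choice to test the final average (which looks like a running average, since it only ever rises) together with the worst sampled maximum, are assumptions read off the log, not taken from the CUDALucas source.

```python
# (iteration, average error, max error) samples copied from the 34992K run.
SAMPLES = [
    (100, 0.10853, 0.16309),
    (200, 0.12747, 0.17969),
    (300, 0.13351, 0.16406),
    (400, 0.13659, 0.17188),
    (500, 0.13844, 0.17578),
    (600, 0.13987, 0.17188),
    (700, 0.14078, 0.17188),
    (800, 0.14137, 0.17676),
    (900, 0.14191, 0.17188),
    (1000, 0.14215, 0.17969),
]


def careful_roundoff_check(samples, avg_limit=0.25, max_limit=0.35):
    """Hypothetical re-implementation of the logged rule.

    Returns (ok, final_avg, worst_max); ok is False when the FFT length
    should be increased and the test restarted.
    """
    final_avg = samples[-1][1]  # assumed to be a running average
    worst_max = max(m for _, _, m in samples)
    ok = final_avg <= avg_limit and worst_max <= max_limit
    return ok, final_avg, worst_max


ok, avg, worst = careful_roundoff_check(SAMPLES)
print("continue" if ok else "restart with a longer FFT", avg, worst)
```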
#38
Undefined
"The unspeakable one"
Jun 2006
My evil lair
2⁴·389 Posts
#39
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
2×47×101 Posts
Mlucas takes FFT sizes in K (the K is implied), and I don't run CUDALucas, so I had not noticed. Anyway, with '-f 32768k' the result is the same: "The fft length 32768K is too small for exponent 601248421, increasing to 34020K".
I tried another card, and there other FFT sizes were faster; the best size is card-dependent. The GRID K520 preferred 34992K (although -cufftbench rated 36864K faster); the 580 preferred 34020K but rejected it after the error-rate burn-in (-cufftbench rated 35840K or 35280K faster in separate runs, and 36864K slower than either).
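The '-f 32768' surprise comes down to units: CUDALucas reported "The fft length 32K is too small", which is consistent with a bare number being read as a raw element count and a trailing 'k' multiplying by 1024. The sketch below models only that observed behaviour; the helper names are hypothetical and this is not CUDALucas's actual argument parser.

```python
def parse_fft_length(arg: str) -> int:
    """Hypothetical: '36864k' -> 36864 * 1024 elements, '32768' -> 32768 elements."""
    s = arg.strip()
    if s and s[-1] in "kK":
        return int(s[:-1]) * 1024
    return int(s)


def format_fft_length(n: int) -> str:
    """Show a length in the K notation used by the logs when it divides evenly."""
    return f"{n // 1024}K" if n % 1024 == 0 else str(n)


for arg in ["32768", "32768k", "36864k"]:
    n = parse_fft_length(arg)
    print(f"-f {arg:>7} -> {n} elements ({format_fft_length(n)})")
```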
#40
∂2ω=0
Sep 2002
República de California
2D7E₁₆ Posts
Quote:
As to how high one can go @32M: using my own code [SSE2 build on my old, slow Core2Duo MacBook], I recently ran 595799947 to iteration 2M @32M and got a bunch of (likely benign) 0.40625 ROEs, as well as a pair of more-iffy 0.4375s. The same code gives these ROE warnings (emitted above 0.40625) for the first 10000 iterations of the LL test on M601248421:

M601248421 Roundoff warning on iteration 640, maxerr = 0.437500000000
M601248421 Roundoff warning on iteration 950, maxerr = 0.437500000000
M601248421 Roundoff warning on iteration 1165, maxerr = 0.421875000000
M601248421 Roundoff warning on iteration 1246, maxerr = 0.437500000000
M601248421 Roundoff warning on iteration 1437, maxerr = 0.437500000000
M601248421 Roundoff warning on iteration 1547, maxerr = 0.437500000000
M601248421 Roundoff warning on iteration 2323, maxerr = 0.437500000000
M601248421 Roundoff warning on iteration 3278, maxerr = 0.437500000000
M601248421 Roundoff warning on iteration 5195, maxerr = 0.437500000000
M601248421 Roundoff warning on iteration 6413, maxerr = 0.421875000000
M601248421 Roundoff warning on iteration 8381, maxerr = 0.437500000000
M601248421 Roundoff warning on iteration 8528, maxerr = 0.437500000000
10000 iterations of M601248421 with FFT length 33554432 = 32768 K
Res64: 891262C7FD6BBDA3. AvgMaxErr = 0.343563772. MaxErr = 0.437500000. Program: E16.0

So, given that George's code tends to be a smidge more accurate than mine, while I suppose it's possible to run the 601M-scale exponent @32M for long stretches without fatal ROEs, I seriously doubt it could be done all the way through without hitting a fatal-level error. George, how difficult would it be for you to build a maxp-relaxed version of Prime95 you could shoot to Aaron, or for him to do so himself?
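One way to see why 32M is marginal for this exponent is to look at how many bits of the p-bit residue each FFT element has to carry, p/N for an FFT of N elements: the more bits per word, the larger the convolution outputs and, roughly, the larger the roundoff error. A short Python sketch over the FFT lengths mentioned in this thread, taking K = 1024 elements as in the "33554432 = 32768 K" line; the helper name is hypothetical.

```python
P = 601248421

# FFT lengths, in K (1024 elements), mentioned in this thread.
FFT_K = [32768, 34020, 34992, 35280, 35840, 36864]


def bits_per_word(p: int, fft_len: int) -> float:
    """Average number of residue bits stored in each FFT element."""
    return p / fft_len


for k in FFT_K:
    print(f"{k}K: {bits_per_word(P, k * 1024):.3f} bits/word")
print(f"595799947 @32768K: {bits_per_word(595799947, 32768 * 1024):.3f} bits/word")
```

By this measure, moving M601248421 from 32768K to 34992K drops the load from about 17.9 to about 16.8 bits per word, while the 595799947 run @32M sits at about 17.8.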
#41
P90 years forever!
Aug 2002
Yeehaw, FL
5·11·137 Posts
Quote:
#42
∂2ω=0
Sep 2002
República de California
2·3²·647 Posts
Quote:
#43
Einyen
Dec 2003
Denmark
C56₁₆ Posts
I ran "-cufftbench" in CUDALucas up to 131072K, so it works.
On my card it suggests a 34992K FFT for exponents 585M to 618M, which it benchmarked at 26.4 ms/iter, so M601248421 would take ~188 days on my Titan Black, which is no fun without ECC RAM. These huge tests would be perfect for CUDALucas on Tesla cards with ECC RAM. In theory they could test the highest exponents in PrimeNet, near 1G, and even beyond. At 131072K it reports a max exponent of 2147483647 (but at 95 ms/iter that would take 6.5 years).
Last fiddled with by ATH on 2016-01-26 at 17:30
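The run-time estimates are simple arithmetic: a Lucas-Lehmer test of M(p) takes p - 2 squarings, so total time is about (p - 2) × time per iteration. A minimal Python sketch, assuming a constant per-iteration time and ignoring checkpointing and other overhead; the function name is hypothetical.

```python
SECONDS_PER_DAY = 86400


def ll_days(p: int, ms_per_iter: float) -> float:
    """Days for the p - 2 LL squarings of M(p) at a fixed time per iteration."""
    return (p - 2) * ms_per_iter / 1000 / SECONDS_PER_DAY


print(f"M601248421 @ 26.4 ms/iter: {ll_days(601248421, 26.4):.1f} days")
print(f"M2147483647 @ 95 ms/iter: {ll_days(2147483647, 95.0) / 365.25:.2f} years")
```

This gives about 184 days for M601248421, a little under the ~188 days quoted, and about 6.5 years for exponent 2147483647, matching the figure quoted.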
#44
Apr 2013
3²·13 Posts
Quote: