mersenneforum.org  

2016-01-25, 02:20   #34
airsquirrels
"David"
Jul 2015
Ohio
517₁₀ Posts

cuFFT claims it will support up to 128M...

2016-01-25, 02:55   #35
Madpoo
Serpentine Vermin Jar
Jul 2014
3313₁₀ Posts

Quote:
Originally Posted by Batalov View Post
I showed a white-hat demonstration years ago of how to submit two results (for the price of one calculation), always. It is nearly trivial. With different shifts and with different error codes (if one wants) ...or without errors. With or without actual work done even once. Using completely unmodified Prime95/mprime, so the result is de facto trusted. The entire hack is done with savefiles.
Yeah, I read that thread a while back I think (if it's the one I'm thinking of).

It was one of the reasons I wanted to re-check stuff people had done themselves. Not that I thought anyone in particular was cheating, but to keep that from even being a question.

It's not the first time Never Odd or Even has self-checked a really large exponent... that M383838383 that I triple-checked was one of them, and people had wondered about that one as well.

It checked out in the end, but still, the only way to put the issue to rest was to do an independent run...

Quote:
Originally Posted by airsquirrels View Post
cuFFT claims it will support up to 128M...
That's kind of what I thought, so I specifically checked what app was reported as running the tests and I was surprised to see it was a version of Prime95 that won't accept an exponent that large.

I wonder if it was started under an earlier Prime95 version that didn't have a problem with it (but still at 32M FFT which seems "iffy") and then version 28.x would still allow it since it was already started under the previous version? Somehow I don't think that would work... it *should* throw an error right away about an illegal line in the worktodo, but maybe George allowed it under that "upgrade" scenario?

Last fiddled with by Madpoo on 2016-01-25 at 02:58

2016-01-25, 04:14   #36
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
5·11·137 Posts

Quote:
Originally Posted by Madpoo View Post
I know there are some who are suspicious of NOoE's results and this one in particular is eyebrow raising for the reasons mentioned above.
I have very high confidence in this result. NOoE and I worked out the plan whereby prime95 could possibly test a number this large with a 32M FFT.

The error code is expected: 255+ reproducible round off errors.

I do know that at least once the test was restarted and the interim residues still mismatched. NOoE then had the bright idea of trying again using AVX instead of FMA3 FFT and the interim residues matched again. To me, this indicates a roundoff error > 0.6 occurred.

2016-01-25, 06:03   #37
Batalov
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
2×47×101 Posts

cudaLucas seems to run best at fft size 36864K for this expo:
Code:
# ./CUDALucas-2.05.1-CUDA6.5-linux-x86_64 -f 32768 601248421

Warning: Couldn't find .ini file. Using defaults for non-specified options.
sleep value = -1 from CUDALucas.ini must have the form k*10^m for k = 1, 2, or 5.
Changing to 0.
The fft length 32K is too small for exponent 601248421, increasing to 34992K
Using threads: square 256, splice 128.
Starting M601248421 fft length = 34992K
Running careful round off test for 1000 iterations.
If average error > 0.25, or maximum error > 0.35,
the test will restart with a longer FFT.
Iteration  100, average error = 0.10853, max error = 0.16309
Iteration  200, average error = 0.12747, max error = 0.17969
Iteration  300, average error = 0.13351, max error = 0.16406
Iteration  400, average error = 0.13659, max error = 0.17188
Iteration  500, average error = 0.13844, max error = 0.17578
Iteration  600, average error = 0.13987, max error = 0.17188
Iteration  700, average error = 0.14078, max error = 0.17188
Iteration  800, average error = 0.14137, max error = 0.17676
Iteration  900, average error = 0.14191, max error = 0.17188
Iteration 1000, average error = 0.14215 <= 0.25 (max error = 0.17969), continuing test.
^C

# ./CUDALucas-2.05.1-CUDA6.5-linux-x86_64 -f 36864k 601248421

Warning: Couldn't find .ini file. Using defaults for non-specified options.
sleep value = -1 from CUDALucas.ini must have the form k*10^m for k = 1, 2, or 5.
Changing to 0.
Using threads: square 256, splice 128.

Continuing M601248421 @ iteration 3102 with fft length 36864K,  0.00% done
Attached image: CudaLucas_iter_times.png (15.7 KB)

2016-01-25, 06:17   #38
retina
Undefined
"The unspeakable one"
Jun 2006
My evil lair
2⁴×389 Posts

Quote:
Originally Posted by Batalov View Post
cudaLucas seems to run best at fft size 36864K for ... 601248421
Your post suggests that 34992K would be fine also.

BTW: Why did you try with 32768 FFT? Did you intend 32768K FFT?

2016-01-25, 06:39   #39
Batalov
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
22426₈ Posts

Mlucas takes sizes in k (k is implied), and I don't run cudaLucas so I have not noticed. Anyway with '-f 32768k' the result is the same, "The fft length 32768K is too small for exponent 601248421, increasing to 34020K".

Tried another card; there, other FFT sizes come out faster. It is card-dependent.
GRID K520 preferred 34992K (with 36864K faster by -cufftbench),
580 preferred 34020K but after error-rate burn-in rejected it (with 35840K or 35280K faster by -cufftbench in separate runs, and 36864K slower than either).
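
The bare-number versus K-suffix mix-up above can be sketched as follows. This is a hypothetical parser written only to illustrate the behaviour the two logs suggest (a bare number read as a count of elements, a trailing k read as a multiple of 1024); `parse_fft_arg` is an invented name and this is not CUDALucas source code.

```python
def parse_fft_arg(arg: str) -> int:
    """Return an FFT length in elements.

    Hypothetical helper: a trailing 'k' or 'K' multiplies by 1024,
    a bare number is taken as a raw element count.
    """
    text = arg.strip().lower()
    if text.endswith("k"):
        return int(text[:-1]) * 1024
    return int(text)


for arg in ("32768", "32768k"):
    elements = parse_fft_arg(arg)
    # Report the length the way the logs do, in units of K.
    print(f"-f {arg:<7} -> {elements} elements = {elements // 1024}K")
```

Under this reading, `-f 32768` is only a 32K transform, which matches the "fft length 32K is too small" message in the first run, while `-f 32768k` gives the intended 32768K.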

2016-01-25, 09:31   #40
ewmayer
2ω=0
Sep 2002
República de California
11646₁₀ Posts

Quote:
Originally Posted by Madpoo View Post
Well, my mission of making sure no exponents are self-verified has hit a roadblock today.

M601248421

I can't test an exponent that size using v28.x with the max FFT of 32M
FYI, Mlucas currently supports up to 256M.

As to how high one can go @32M: Using my own code [SSE2 build on my old slow Core2Duo macbook] I recently ran 595799947 to iter 2M @32M ... got a bunch of (likely benign) 0.40625 ROEs, as well as a pair of more-iffy 0.4375s. Same code gives these ROE warnings (emitted above 0.40625) for the first 10000 iterations of the LL test on M601248421:

M601248421 Roundoff warning on iteration 640, maxerr = 0.437500000000
M601248421 Roundoff warning on iteration 950, maxerr = 0.437500000000
M601248421 Roundoff warning on iteration 1165, maxerr = 0.421875000000
M601248421 Roundoff warning on iteration 1246, maxerr = 0.437500000000
M601248421 Roundoff warning on iteration 1437, maxerr = 0.437500000000
M601248421 Roundoff warning on iteration 1547, maxerr = 0.437500000000
M601248421 Roundoff warning on iteration 2323, maxerr = 0.437500000000
M601248421 Roundoff warning on iteration 3278, maxerr = 0.437500000000
M601248421 Roundoff warning on iteration 5195, maxerr = 0.437500000000
M601248421 Roundoff warning on iteration 6413, maxerr = 0.421875000000
M601248421 Roundoff warning on iteration 8381, maxerr = 0.437500000000
M601248421 Roundoff warning on iteration 8528, maxerr = 0.437500000000
10000 iterations of M601248421 with FFT length 33554432 = 32768 K
Res64: 891262C7FD6BBDA3. AvgMaxErr = 0.343563772. MaxErr = 0.437500000. Program: E16.0

So, given that George's code tends to be a smidge more accurate than mine, while I suppose it's possible to run the 601M-scale exponent @32M for long stretches without fatal ROEs, I seriously doubt it could be done all the way through without hitting a fatal-level error.
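
A rough way to see why 32M is tight for this exponent is to look at the average number of bits each FFT word has to carry, which is simply the exponent divided by the FFT length. The sketch below is plain arithmetic, not code from any of the programs discussed; `bits_per_word` is a name chosen for illustration.

```python
EXPONENT = 601_248_421


def bits_per_word(exponent: int, fft_len_k: int) -> float:
    """Average bits per FFT element for an FFT length given in K (1K = 1024 elements)."""
    return exponent / (fft_len_k * 1024)


# FFT lengths mentioned in this thread for M601248421.
for fft_k in (32768, 34992, 36864):
    print(f"{fft_k:>6}K: {bits_per_word(EXPONENT, fft_k):.3f} bits/word")
```

At 32768K the words average a little under 18 bits each, against roughly 16.8 at 34992K and just under 16 at 36864K; more bits per word means larger convolution outputs and therefore larger roundoff errors.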

George, how difficult would it be for you to build a maxp-relaxed version of Prime95 you could shoot to Aaron, or him to do so himself?

2016-01-25, 15:15   #41
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
7535₁₀ Posts

Quote:
Originally Posted by ewmayer View Post
So, given that George's code tends to be a smidge more accurate than mine, while I suppose it's possible to run the 601M-scale exponent @32M for long stretches without fatal ROEs, I seriously doubt it could be done all the way through without hitting a fatal-level error.
Prime95 backtracks on errors above 0.40625 redoing the problematic iteration using 3 half-sized multiplies rather than a single squaring. Thus, prime95 really can recover from roundoff errors up to 0.59375. Even so, I told NOoE not to attempt an exponent above 600M without doing two side-by-side runs comparing residues along the way.
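
The side-by-side idea can be sketched with a toy Lucas-Lehmer run in Python. This is only an illustration using exact integer arithmetic on a small exponent, not how Prime95 stores or compares its data; the function names are invented, and using the low 64 bits of the residue as the checkpoint value is an assumption modelled on the Res64 values quoted in this thread.

```python
def ll_residues(p: int, every: int) -> dict:
    """Run the Lucas-Lehmer test on 2^p - 1 and record the low 64 bits
    of the residue every `every` iterations and at the final iteration."""
    m = (1 << p) - 1
    s = 4
    checkpoints = {}
    for i in range(1, p - 1):  # p - 2 squarings in total
        s = (s * s - 2) % m
        if i % every == 0 or i == p - 2:
            checkpoints[i] = s & 0xFFFFFFFFFFFFFFFF
    return checkpoints


def first_mismatch(run_a: dict, run_b: dict):
    """Return the first shared checkpoint where the two runs disagree, or None."""
    for i in sorted(run_a.keys() & run_b.keys()):
        if run_a[i] != run_b[i]:
            return i
    return None


run_a = ll_residues(127, every=25)
run_b = ll_residues(127, every=25)
print(first_mismatch(run_a, run_b))

# Simulate a run that went wrong at or before iteration 75.
bad = {i: (r ^ 1 if i >= 75 else r) for i, r in run_a.items()}
print(first_mismatch(run_a, bad))
```

Comparing along the way means a bad stretch only costs the work since the last matching checkpoint, rather than the whole test.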

2016-01-25, 22:56   #42
ewmayer
2ω=0
Sep 2002
República de California
2×3²×647 Posts

Quote:
Originally Posted by Prime95 View Post
Prime95 backtracks on errors above 0.40625 redoing the problematic iteration using 3 half-sized multiplies rather than a single squaring. Thus, prime95 really can recover from roundoff errors up to 0.59375. Even so, I told NOoE not to attempt an exponent above 600M without doing two side-by-side runs comparing residues along the way.
Thanks - btw, this morning I re-ran my above 10kiter test using 36M FFT. The Res64 matches, indicating none of the ROEs was in fact a fatal one aliased to (1.0-ROE) ... but in a full-length run I would expect multiple such occurrences, given the sheer frequency of 0.4375 errors seen in the shallow run.
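
To make the aliasing point concrete, here is a minimal Python sketch. It is not code from Prime95 or Mlucas, and `observed_roundoff` is a name made up for illustration. After the inverse FFT each word should be an integer, and the measured roundoff error is the distance to the nearest integer, so an error past 0.5 is reported as one minus its true size while the word is rounded to the wrong value.

```python
def observed_roundoff(true_value: int, fp_error: float):
    """Return (measured error, whether rounding recovered the true value)."""
    x = true_value + fp_error           # what the floating-point FFT produced
    nearest = round(x)                  # what the program rounds it to
    return abs(x - nearest), nearest == true_value


# A benign error and a fatal one that look identical to the error check.
print(observed_roundoff(1000, 0.4375))
print(observed_roundoff(1000, 0.5625))

# With a warning threshold of 0.40625, a true error beyond
# 1 - 0.40625 would be reported as less than 0.40625 and slip through.
print(1.0 - 0.40625)
```

Both calls report 0.4375, but only the first rounds back to 1000. That is why a matching Res64 from a longer FFT is needed to tell the two cases apart, and it is consistent with the 0.59375 recovery limit quoted above.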

2016-01-26, 17:29   #43
ATH
Einyen
Dec 2003
Denmark
2·1,579 Posts

I did a "-cufftbench" with CudaLucas up to 131072K, so it works.

On my card it suggests 34992K FFT for exponents 585M to 618M which it tested at 26.4ms/iter, so M601248421 would take ~ 188 days on my Titan Black, which is not fun without ECC RAM.

These huge tests would be perfect with CudaLucas on Tesla cards with ECC RAM. In theory they could test the highest exponents in primenet near 1G and even further. At 131072K it says max exponent: 2147483647 (but at 95ms/iter it would take 6.5 years)
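
The run-time figures can be checked with a back-of-the-envelope calculation: an LL test of exponent p needs p - 2 squarings, so the total time is roughly that count times the per-iteration time. The sketch below assumes a constant iteration time and ignores restarts and checkpointing; `ll_days` is a name chosen for illustration.

```python
def ll_days(exponent: int, ms_per_iter: float) -> float:
    """Estimated days for a Lucas-Lehmer test at a fixed time per iteration."""
    iterations = exponent - 2
    return iterations * ms_per_iter / 1000 / 86400


print(f"M601248421 at 26.4 ms/iter:  {ll_days(601_248_421, 26.4):.0f} days")
print(f"618M exponent at 26.4 ms/iter: {ll_days(618_000_000, 26.4):.0f} days")
print(f"M2147483647 at 95 ms/iter:   {ll_days(2_147_483_647, 95) / 365.25:.1f} years")
```

This gives about 184 days for M601248421 itself and about 189 days at the top of the 585M to 618M range, in line with the ~188 days quoted, and about 6.5 years for the 131072K case.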

Last fiddled with by ATH on 2016-01-26 at 17:30

2018-12-19, 10:30   #44
ramgeis
Apr 2013
1110101₂ Posts

Quote:
Originally Posted by Madpoo View Post
I'd like a larger FFT size option in the next 28.x build, just so I can do a DC on Ake's result of M595999993 (when he finally turned it in, it had some errors that marked it suspect). It's currently the largest result but I have no way to run a verification.
Turns out that despite the errors the result was fine.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.