mersenneforum.org  

Old 2020-03-29, 12:16   #2003
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2×23×83 Posts

Quote:
Originally Posted by preda
But Ken, what is the appropriate action to take on error?

Let's say that during P-1 stage 1, a residue==0 is detected. This is not the result of an inappropriate FFT size; it indicates a hardware error. But what to do in this situation, given there's no way to check P-1? Basically, what makes sense is for gpuowl to simply stop doing any P-1 (on that GPU). It can't reliably roll back to any trusted point. Assuming that a residue that is != 0 is a correct one, in the situation where the GPU sometimes produces res==0, is not a good way to go. I would just discard the whole test as corrupted.
Count the error. Roll back to last save file that did not detect an error. That's the same as CUDALucas does when it detects an error, or prime95 does, or did before addition of Jacobi or Gerbicz checks.
Or suspend effort on that worktodo line and go to the next entry; then the user can decide later whether to resume or abandon the item that had an issue.
There are several res64-based checks possible.
res64=0 at any iteration; check more of the res to see if it's zero too. If it is, it's an error, probably a failure to copy the full residue. (There's a tiny chance that res128>0 but res64=0 occurs and is correct.)
res=1 at any iteration; res=3 after the first iteration.
res64 repeating from one iteration to the next.
res64 cycling among a very small list of values. https://www.mersenneforum.org/showpo...1&postcount=10
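
For illustration only, the res64 checks listed above could be sketched roughly as follows. This is Python, not gpuowl code; the class name `Res64Monitor` and the `window` and `max_distinct` thresholds are assumptions made up for the sketch. The "res=3 after the first iteration" check is left out because it depends on the test type and starting value.

```python
from collections import deque


class Res64Monitor:
    """Remember recent res64 values and flag suspicious patterns (hypothetical helper)."""

    def __init__(self, window=16, max_distinct=3):
        self.window = window              # assumed: how many recent values to keep
        self.max_distinct = max_distinct  # assumed: what counts as a "very small list"
        self.recent = deque(maxlen=window)

    def check(self, residue):
        """Return a list of warnings for one full residue, given as a Python int."""
        res64 = residue & 0xFFFFFFFFFFFFFFFF  # low 64 bits
        warnings = []
        if residue == 0:
            warnings.append("residue is entirely zero")
        elif res64 == 0:
            warnings.append("res64 is zero but higher bits are not (rare, may be valid)")
        if residue == 1:
            warnings.append("residue is 1")
        if self.recent and self.recent[-1] == res64:
            warnings.append("res64 repeats the previous value")
        self.recent.append(res64)
        # Only judge cycling once a full window of values has been seen.
        if len(self.recent) == self.window and len(set(self.recent)) <= self.max_distinct:
            warnings.append("res64 cycling among a very small set of values")
        return warnings
```

A caller would feed it the residue at each logged iteration and, on any warning, count the error and roll back to the last save file that did not show one, or move on to the next worktodo line.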

Last fiddled with by kriesel on 2020-03-29 at 13:03
Old 2020-03-29, 19:53   #2004
ewmayer
2ω=0
 
Sep 2002
República de California

19×587 Posts

Quote:
Originally Posted by preda
I just committed a change that makes CARRY64 the default for P-1, and CARRY32 the default for PRP on AMD. The rationale is that, if CARRY32 is not appropriate, this fact will be visible for PRP, thus safe; on P-1 we use the safe default (i.e. CARRY64) until we have a better solution there.
Awesome - just pulled, built, and switched my runs to it.

George, never heard your thoughts on whether checking the relative signs of the signed-int x and the result of the 3*x might be a useful diagnostic here.
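
To make the idea concrete, here is a rough Python sketch of one reading of that sign check, simulating 32-bit wraparound. The helper names `wrap_int32` and `mul3_sign_check` are made up for the example and are not gpuowl identifiers; the actual carry code works on different data and in OpenCL.

```python
def wrap_int32(v):
    """Reduce an arbitrary Python int to a signed 32-bit two's-complement value."""
    v &= 0xFFFFFFFF
    return v - 0x100000000 if v >= 0x80000000 else v


def mul3_sign_check(x):
    """Return (3*x wrapped to int32, True if its sign disagrees with the sign of x)."""
    y = wrap_int32(3 * x)
    mismatch = x != 0 and ((x < 0) != (y < 0))
    return y, mismatch


for x in (100, 1_000_000_000, 1_500_000_000):
    y, mismatch = mul3_sign_check(x)
    print(f"x={x}: wrapped 3*x={y}, sign mismatch={mismatch}")
```

Under these assumptions the sign comparison catches the overflow for x = 1,000,000,000 but not for x = 1,500,000,000, where 3*x wraps all the way around to a positive value again. So it would be a partial diagnostic, reliable only while |3*x| stays below 2^32.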
Old 2020-03-29, 21:17   #2005
ewmayer
2ω=0
 
Sep 2002
República de California

19·587 Posts

Just noticed something curious - since I didn't know how long it might be until a fix for the carry issue, yesterday I edited my 2 worktodo files - 10 entries each - and moved all PRPs not preceded by a p-1 of the same exponent to the top. Except that I buggered one such edit, and a PRP got moved into the top slot, while its accompanying p-1 remained below. Caught the 'doh!' with the PRP ~20% done, halted, moved the p-1 to its proper place at top of the file, resumed. The exponent is 103939597. The PRP was using 5632K, the just-started p-1 is using 6144K. Is that expected?

Last fiddled with by ewmayer on 2020-03-29 at 21:17
Old 2020-03-29, 21:26   #2006
preda
 
"Mihai Preda"
Apr 2015

1,021 Posts

Quote:
Originally Posted by ewmayer
Just noticed something curious - since I didn't know how long it might be until a fix for the carry issue, yesterday I edited my 2 worktodo files - 10 entries each - and moved all PRPs not preceded by a p-1 of the same exponent to the top. Except that I buggered one such edit, and a PRP got moved into the top slot, while its accompanying p-1 remained below. Caught the 'doh!' with the PRP ~20% done, halted, moved the p-1 to its proper place at top of the file, resumed. The exponent is 103939597. The PRP was using 5632K, the just-started p-1 is using 6144K. Is that expected?
Yes, the FFT bounds are a bit more conservative for P-1. I think this area (FFT bounds) is under investigation currently. But yes, what you see is expected given the current code.
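
As a quick worked illustration of what the two FFT sizes mean for this exponent, the average number of bits packed into each FFT word can be computed as below. This assumes "K" means 1024 words; the function name `bits_per_word` is made up for the example, and the actual thresholds gpuowl uses to choose an FFT size are not shown here.

```python
def bits_per_word(exponent, fft_size_k):
    """Average bits stored per FFT word, assuming 'K' means 1024 words."""
    return exponent / (fft_size_k * 1024)


for label, p, fft_k in [
    ("PRP", 103939597, 5632),
    ("P-1", 103939597, 6144),
]:
    print(f"{label}: p={p} at {fft_k}K -> {bits_per_word(p, fft_k):.2f} bits/word")
```

That works out to about 18.02 bits/word at 5632K versus about 16.52 bits/word at 6144K, so a bound that is even slightly more conservative for P-1 is enough to push this exponent to the next FFT size.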
Old 2020-03-29, 21:56   #2007
ewmayer
2ω=0
 
Sep 2002
República de California

19·587 Posts

Quote:
Originally Posted by preda
Yes, the FFT bounds are a bit more conservative for P-1. I think this area (FFT bounds) is under investigation currently. But yes, what you see is expected given the current code.
Has the "more conservative threshold for p-1" been changed in the latest commit? Because in my 2 results files I see p as large as 103985003 using 5632 for the p-1 step, using older builds.
Old 2020-03-29, 23:03   #2008
ATH
Einyen
 
Dec 2003
Denmark

2×13×109 Posts

I compiled gpuowl on Colab Pro and want to test it on the Tesla P100.
Anyone have a list of all the different options that can be tweaked to find the fastest combination?
Old 2020-03-29, 23:04   #2009
preda
 
"Mihai Preda"
Apr 2015

1,021 Posts

Quote:
Originally Posted by ewmayer
Has the "more conservative threshold for p-1" been changed in the latest commit? Because in my 2 results files I see p as large as 103985003 using 5632 for the p-1 step, using older builds.
No, the different FFT bounds for PRP and P-1 have been there for about a month; I'm not aware of recent changes. So it may be something else.
Old 2020-03-29, 23:35   #2010
ewmayer
2ω=0
 
Sep 2002
República de California

19×587 Posts

Quote:
Originally Posted by preda
No, the different FFT bounds for PRP and P-1 have been there for about a month; I'm not aware of recent changes. So it may be something else.
FYI, the most recent case I see in my logs of an exponent larger than the recent ones using 5632K is 16 Feb, v6.11-142-gf54af2e.

More weirdness, this time hardware related - current pair of runs suffered drastic slowing-down ~30 mins ago, despite temps having been well below the usual 'caution' threshold, no odd fan noises or any other sign of amiss-ness. SMI showed both s-and-m-clocks well below their normal for my default sclk=4 setting. This happened once before, and a quick 'rocm-smi --gpureset -d 1' resolved it. To cover all the bases I first rebooted, verified the slowness persisted, then tried the reset - this time no joy.
Old 2020-03-30, 00:20   #2011
preda
 
"Mihai Preda"
Apr 2015

1,021 Posts

Quote:
Originally Posted by ewmayer
FYI, the most recent case I see in my logs of an exponent larger than the recent ones using 5632K is 16 Feb, v6.11-142-gf54af2e.

More weirdness, this time hardware related - current pair of runs suffered drastic slowing-down ~30 mins ago, despite temps having been well below the usual 'caution' threshold, no odd fan noises or any other sign of amiss-ness. SMI showed both s-and-m-clocks well below their normal for my default sclk=4 setting. This happened once before, and a quick 'rocm-smi --gpureset -d 1' resolved it. To cover all the bases I first rebooted, verified the slowness persisted, then tried the reset - this time no joy.
No idea, sorry. Did you check dmesg for errors? Did you try without setting sclk after reboot, to see the behavior? Can you see the power use, and how did that change?
Old 2020-03-30, 01:00   #2012
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·23·83 Posts

Quote:
Originally Posted by ATH
I compiled gpuowl on Colab Pro and want to test it on the Tesla P100.
Anyone have a list of all the different options that can be tweaked to find the fastest combination?
See https://mersenneforum.org/showpost.p...postcount=1968 and the source code.
Old 2020-03-30, 01:01   #2013
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

6833₁₀ Posts

Quote:
Originally Posted by ewmayer
Question: In the 4th of those 6 lines, both temp and cy can be of either sign, but temp, the rounded-but-not-yet-wordsize-normalized iFFT term, is nearly always going to be much larger in magnitude than the +cy carryin, i.e. we should be able to infer the expected sign of the next line's carryout computation from it, to see whether - in your case - the integer result is overflowing into the sign bit, yes?
We could detect the error condition with about 3 or 4 instructions. However, we try to create the fastest code possible and pick default settings that should safely avoid dangerous situations. Sometimes we don't quite succeed -- especially with day-to-day development.

The current code that selects CARRY64 for all P-1 work is overkill. I know how to fix that.

Last fiddled with by Prime95 on 2020-03-30 at 01:01
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.