mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GpuOwl (https://www.mersenneforum.org/forumdisplay.php?f=171)
-   -   gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

kriesel 2020-03-29 12:16

[QUOTE=preda;541235]But Ken, what is the appropriate action to take on error?

Let's say that during P-1 stage1, a residue==0 is detected. This is not the result of inappropriate FFT-size, it indicates a hardware error. But what to do in this situation, given there's no way to check P-1 -- basically what makes sense is for gpuowl to simply stop doing any P-1 (on that GPU). It can't reliably roll back to any trusted point. Assuming that a residue that is != 0 is a correct one, in the situation where the GPU produces res==0 sometimes, is not a good way to go. I would just discard the whole test as corrupted.[/QUOTE]Count the error. Roll back to the last save file that did not detect an error. That's what CUDALucas does when it detects an error, and what prime95 does, or did before the addition of Jacobi or Gerbicz checks.
Or suspend effort on that worktodo line and go to the next entry; then the user can decide later whether to resume or abandon the item that had an issue.
There are several res64-based checks possible.
res64=0 at any iteration; check more of the res to see if it's zero too. If it is, it's an error, probably a failure to copy the full residue. (There's a tiny chance that res128>0 but res64=0 occurs and is correct.)
res=1 at any iteration; res=3 after the first iteration.
res64 repeating from one iteration to the next.
res64 cycling among a very small list of values. [URL]https://www.mersenneforum.org/showpost.php?p=515641&postcount=10[/URL]
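
The res64 checks listed above could be sketched roughly as follows. This is only an illustration in Python, not gpuowl's code: the function name `check_res64`, the `history` window, and the `max_distinct` threshold are assumptions made for the example, and the residue is modeled as a plain non-negative integer.

```python
from collections import deque

MASK64 = (1 << 64) - 1

def check_res64(residue, iteration, history, max_distinct=3):
    """Return a list of warnings for one residue (hypothetical helper).

    residue   : full residue as a non-negative int
    iteration : number of completed iterations (>= 1)
    history   : deque of recent res64 values, updated in place;
                give it a maxlen well above max_distinct
    """
    problems = []
    res64 = residue & MASK64

    if res64 == 0 and residue == 0:
        # The wider residue is zero too: likely a failed copy.
        # (res64 == 0 with higher bits set is rare but may be correct.)
        problems.append("full residue is zero")
    if residue == 1:
        problems.append("residue is 1")
    if residue == 3 and iteration >= 1:
        problems.append("residue is 3 after the first iteration")
    if history and history[-1] == res64:
        problems.append("res64 repeated from the previous iteration")

    history.append(res64)
    # Only judge cycling once the window is full.
    if len(history) == history.maxlen and len(set(history)) <= max_distinct:
        problems.append("res64 cycling among a few values")
    return problems
```

A caller would keep one `history = deque(maxlen=16)` per test, call `check_res64` at each logged iteration, and treat any non-empty result as a reason to count an error and roll back or suspend the work item.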

ewmayer 2020-03-29 19:53

[QUOTE=preda;541232]I just committed a change that makes CARRY64 the default for P-1, and CARRY32 the default for PRP on AMD. The rationale being that, if CARRY32 is not appropriate, this fact will be visible for PRP, thus safe; on P-1 we use the safe default (i.e. CARRY64) until we have a better solution there.[/QUOTE]

Awesome - just pulled, built, and switched runs to it.

George, never heard your thoughts on whether checking the relative signs of the signed-int x and the result of the 3*x might be a useful diagnostic here.
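
As a rough illustration of the sign-comparison idea, the sketch below models a signed 32-bit x and the wrapped 32-bit result of 3*x in Python. The helper names (`to_int32`, `triple_int32`, `sign_mismatch`, `overflowed`) are made up for the example, and it assumes two's-complement wraparound; it does not reproduce the actual gpuowl carry code.

```python
def to_int32(v):
    """Wrap a Python int to a signed 32-bit value (two's complement)."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v

def triple_int32(x):
    """3*x as a 32-bit signed multiply would produce it."""
    return to_int32(3 * x)

def sign_mismatch(x):
    """True if x and the wrapped 3*x have opposite signs."""
    return (x < 0) != (triple_int32(x) < 0)

def overflowed(x):
    """Ground truth: did 3*x fail to fit in 32 signed bits?"""
    return triple_int32(x) != 3 * x

# Moderate overflow flips the sign and is caught by the sign check.
print(sign_mismatch(715827883), overflowed(715827883))
# A larger x wraps all the way back to the same sign and slips through.
print(sign_mismatch(1431655766), overflowed(1431655766))
```

Under these assumptions the sign check catches an overflow only while 3*x stays within one wrap of the 32-bit range, so it would be a partial diagnostic rather than a complete one.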

ewmayer 2020-03-29 21:17

Just noticed something curious - since I didn't know how long it might be until a fix for the carry issue, yesterday I edited my 2 worktodo files - 10 entries each - and moved all PRPs not preceded by a p-1 of the same exponent to the top. Except that I buggered one such edit, and a PRP got moved into the top slot, while its accompanying p-1 remained below. Caught the 'doh!' with the PRP ~20% done, halted, moved the p-1 to its proper place at top of the file, resumed. The exponent is 103939597. The PRP was using 5632K, the just-started p-1 is using 6144K. Is that expected?

preda 2020-03-29 21:26

[QUOTE=ewmayer;541268]Just noticed something curious - since I didn't know how long it might be until a fix for the carry issue, yesterday I edited my 2 worktodo files - 10 entries each - and moved all PRPs not preceded by a p-1 of the same exponent to the top. Except that I buggered one such edit, and a PRP got moved into the top slot, while its accompanying p-1 remained below. Caught the 'doh!' with the PRP ~20% done, halted, moved the p-1 to its proper place at top of the file, resumed. The exponent is 103939597. The PRP was using 5632K, the just-started p-1 is using 6144K. Is that expected?[/QUOTE]

Yes, the FFT bounds are a bit more conservative for P-1. I think this area (FFT bounds) is under investigation currently. But yes, what you see is expected given the current code.
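
To put numbers on the two FFT sizes mentioned for exponent 103939597, the average bits stored per FFT word can be computed as below. This is a back-of-the-envelope sketch that assumes "K" means 1024 words; the helper name `bits_per_word` is made up for the example, and gpuowl's actual size-selection thresholds are not given in this thread, so none are assumed here.

```python
def bits_per_word(exponent, fft_k):
    """Average bits per FFT word, assuming an FFT of fft_k * 1024 words."""
    return exponent / (fft_k * 1024)

for label, p, k in [
    ("PRP", 103939597, 5632),
    ("P-1", 103939597, 6144),
]:
    print(f"{label}: p={p} FFT={k}K -> {bits_per_word(p, k):.3f} bits/word")
```

The larger FFT carries fewer bits per word, which is the sense in which the P-1 bound is the more conservative choice.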

ewmayer 2020-03-29 21:56

[QUOTE=preda;541269]Yes, the FFT bounds are a bit more conservative for P-1. I think this area (FFT bounds) is under investigation currently. But yes, what you see is expected given the current code.[/QUOTE]

Has the "more conservative threshold for p-1" been changed in the latest commit? Because in my 2 results files I see p as large as 103985003 using 5632 for the p-1 step, using older builds.

ATH 2020-03-29 23:03

I compiled gpuowl on the Colab pro and want to test it on the Tesla P100.
Anyone have a list of all the different options that can be tweaked to find the fastest combination?

preda 2020-03-29 23:04

[QUOTE=ewmayer;541273]Has the "more conservative threshold for p-1" been changed in the latest commit? Because in my 2 results files I see p as large as 103985003 using 5632 for the p-1 step, using older builds.[/QUOTE]

No, the different FFT bounds between PRP and P-1 have been there for about 1 month, I'm not aware of recent changes. So it may be something else.

ewmayer 2020-03-29 23:35

[QUOTE=preda;541281]No, the different FFT bounds between PRP and P-1 have been there for about 1 month, I'm not aware of recent changes. So it may be something else.[/QUOTE]

FYI, the most recent case I see in my logs of an expo larger than the recent ones using 5632K is 16. Feb, v6.11-142-gf54af2e.

More weirdness, this time hardware related - current pair of runs suffered drastic slowing-down ~30 mins ago, despite temps having been well below the usual 'caution' threshold, no odd fan noises or any other sign of amiss-ness. SMI showed both s-and-m-clocks well below their normal for my default sclk=4 setting. This happened once before, and a quick 'rocm-smi --gpureset -d 1' resolved it. To cover all the bases I first rebooted, verified the slowness persisted, then tried the reset - this time no joy.

preda 2020-03-30 00:20

[QUOTE=ewmayer;541284]FYI, the most recent case I see in my logs of an expo larger than the recent ones using 5632K is 16. Feb, v6.11-142-gf54af2e.

More weirdness, this time hardware related - current pair of runs suffered drastic slowing-down ~30 mins ago, despite temps having been well below the usual 'caution' threshold, no odd fan noises or any other sign of amiss-ness. SMI showed both s-and-m-clocks well below their normal for my default sclk=4 setting. This happened once before, and a quick 'rocm-smi --gpureset -d 1' resolved it. To cover all the bases I first rebooted, verified the slowness persisted, then tried the reset - this time no joy.[/QUOTE]

No idea, sorry. Did you check dmesg for errors? Did you try without setting sclk after reboot, to see the behavior? Can you see the power use, and how did that change?

kriesel 2020-03-30 01:00

[QUOTE=ATH;541280]I compiled gpuowl on the Colab pro and want to test it on the Tesla P100.
Anyone have a list of all the different options that can be tweaked to find the fastest combination?[/QUOTE]See [url]https://mersenneforum.org/showpost.php?p=540152&postcount=1968[/url] and the source code.

Prime95 2020-03-30 01:01

[QUOTE=ewmayer;541111]
Question: In the 4th of those 6 lines, both temp and cy can be of either sign, but temp, the rounded-but-not-yet-wordsize-normalized iFFT term, is nearly always going to be much larger in magnitude than the +cy carryin, i.e. we should be able to infer the expected sign of the next line's carryout computation from it, to see whether - in your case - the integer result is overflowing into the sign bit, yes?[/QUOTE]

We could detect the error condition with about 3 or 4 instructions. However, we try to create the fastest code possible and pick default settings that should safely avoid dangerous situations. Sometimes we don't quite succeed -- especially with day-to-day development.

The current code that selects CARRY64 for all P-1 work is overkill. I know how to fix that.


All times are UTC.