mersenneforum.org

gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

PhilF 2020-01-12 14:47

[QUOTE=preda;534937]Could you double-check whether you actually lost the PRP savefiles? That's highly surprising, because gpuOwl never deletes the content of past exponents, except when using -cleanup (which you aren't using).

So, please track down the exponent you were halfway through on PRP (from gpuowl.log). Next, look in the folder for that exponent; you should have the savefiles safely there, not deleted and not lost.

What I think happened is this: you simply started a new (different) exponent from worktodo.txt. The order of the worktodo entries changed, and the exponent you were 50% through is still there. Maybe it even still has an entry in worktodo.txt.

An extended excerpt of gpuowl.log would help with understanding what happened.[/QUOTE]
I wish that were the case, but I believe there was only 1 line in worktodo.txt at the time.

It's possible that I messed up and kept the wrong save files, because I had a number of folders in that same range with very similar names. That might be the better explanation, since it does not appear to be the code. Rather than trying to track this down, I think I should try it again. If I can reproduce this, I'll let you know.

The worktodo.txt-bak is a mess, partly because I didn't know about the required trailing newline (the AIDs are not real). Gpuowl must have added the duplicate lines:

[code]PRP=764DD1319D71BA1AE73B4D7C415C22EF,1,2,101949599,-1,76,2PFactor=764DD1319D71BA1AE73B4D7C415C22EF,1,2,101949599,-1,2,0
PRP=764DD1319D71BA1AE73B4D7C415C22EF,1,2,101949599,-1,76,0
PFactor=764DD1319D71BA1AE73B4D7C415C22EF,1,2,101949599,-1,2,0
PRP=764DD1319D71BA1AE73B4D7C415C22EF,1,2,101949599,-1,76,0[/code]


Here are the last 20 lines of gpuowl.log:

[code]2020-01-11 18:58:15 i7-4790 101949599 OK 48200000 47.28%; 893 us/it; ETA 0d 13:20; 52a583a2a885b208 (check 0.53s)
2020-01-11 19:01:15 i7-4790 101949599 OK 48400000 47.47%; 893 us/it; ETA 0d 13:17; 88403b125b19d22a (check 0.53s)
2020-01-11 19:04:14 i7-4790 101949599 OK 48600000 47.67%; 893 us/it; ETA 0d 13:14; 8eb6c84a2f34b07b (check 0.53s)
2020-01-11 19:06:59 i7-4790 Stopping, please wait..
2020-01-11 19:06:59 i7-4790 101949599 OK 48784800 47.85%; 893 us/it; ETA 0d 13:11; e0868a0077e6cd96 (check 0.50s)
2020-01-11 19:06:59 i7-4790 Exiting because "stop requested"
2020-01-11 19:06:59 i7-4790 Bye
2020-01-11 19:33:20 Note: not found 'config.txt'
2020-01-11 19:33:20 config: -device 0 -user pfrakes -cpu i7-4790 -B1 1000000 -B2 32000000
2020-01-11 19:33:20 device 0, unique id ''
2020-01-11 19:33:20 i7-4790 'worktodo.txt': could not find the line 'PRP=764DD1319D71BA1AE73B4D7C415C22EF,1,2,101949599,-1,76,2' to delete
2020-01-11 19:33:20 i7-4790 101949599 FFT 5632K: Width 256x4, Height 64x4, Middle 11; 17.68 bits/word
2020-01-11 19:33:21 i7-4790 OpenCL args "-DEXP=101949599u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=11u -DWEIGHT_STEP=0x1.401bafea92a09p+0 -DIWEIGHT_STEP=0x1.99762c21e62cp-1 -DWEIGHT_BIGSTEP=0x1.306fe0a31b715p+0 -DIWEIGHT_BIGSTEP=0x1.ae89f995ad3adp-1 -DAMDGPU=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-01-11 19:33:22 i7-4790 OpenCL compilation in 1.70 s
2020-01-11 19:33:23 i7-4790 101949599 OK 11200 loaded: blockSize 400, 55afd3a9f362e204
2020-01-11 19:33:24 i7-4790 101949599 OK 12000 0.01%; 882 us/it; ETA 1d 00:58; 4dcb47cf0ec6fab2 (check 0.48s)
2020-01-11 19:33:36 i7-4790 Stopping, please wait..
2020-01-11 19:33:37 i7-4790 101949599 OK 26000 0.03%; 880 us/it; ETA 1d 00:55; 26a557d2e852e785 (check 0.48s)
2020-01-11 19:33:37 i7-4790 Exiting because "stop requested"
2020-01-11 19:33:37 i7-4790 Bye
[/code]
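preda suggests tracking down the half-finished exponent from gpuowl.log, and the excerpt above shows the symptom directly: iteration 48784800 was logged before the stop, but after the restart the savefile loaded was at iteration 11200. A hedged Python sketch that scans a log for that pattern; the regular expression is an assumption fitted only to the log lines quoted in this thread, not to any documented gpuowl log format, and `find_regressions` is a hypothetical helper name.

```python
import re
from typing import Dict, List, Tuple

# Assumed line shapes, taken from the logs in this thread:
#   "... i7-4790 101949599 OK 48784800 47.85%; 893 us/it; ..."   (progress)
#   "... i7-4790 101949599 OK 11200 loaded: blockSize 400, ..."  (resume)
PROGRESS_RE = re.compile(r"\s(\d+) OK (\d+)(?: loaded:|\s+[\d.]+%)")

def find_regressions(log_lines: List[str]) -> List[Tuple[int, int, int]]:
    """Return (exponent, best_iteration_seen, resumed_iteration) for every
    'loaded:' line that resumes below the highest iteration logged earlier."""
    best: Dict[int, int] = {}
    regressions: List[Tuple[int, int, int]] = []
    for line in log_lines:
        m = PROGRESS_RE.search(line)
        if not m:
            continue  # not a progress/resume line (config, FFT, "Bye", ...)
        exponent, iteration = int(m.group(1)), int(m.group(2))
        if "loaded:" in line and iteration < best.get(exponent, 0):
            regressions.append((exponent, best[exponent], iteration))
        best[exponent] = max(best.get(exponent, 0), iteration)
    return regressions
```

For example, `find_regressions(open("gpuowl.log").read().splitlines())` would list each exponent whose resumed iteration is below the best one logged earlier; an empty list means every restart picked up where the previous run stopped.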

preda 2020-01-12 20:11

Thank you for the info. In the meantime I added detection of the missing newline at the end of worktodo.txt, presumably the merged lines you saw should not happen anymore.
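preda's fix detects the missing newline inside gpuowl itself. For anyone editing worktodo.txt by hand or with a script, here is a hedged Python sketch of the same two ideas (hypothetical helper names, not gpuowl code): splitting a line where two entries were glued together, as in the first line of PhilF's worktodo.txt-bak, and appending a new entry only after making sure the file ends with a newline. The splitter only knows the PRP= and PFactor= prefixes that appear in this thread.

```python
import re
from pathlib import Path
from typing import List

# Zero-width split point in front of each entry prefix seen in this thread.
ENTRY_START = re.compile(r"(?=(?:PRP|PFactor)=)")

def split_merged_entries(line: str) -> List[str]:
    """Split a line such as 'PRP=...,76,2PFactor=...' back into separate
    entries; the empty piece before the first prefix is dropped."""
    return [part for part in ENTRY_START.split(line) if part]

def append_worktodo_line(path: Path, entry: str) -> None:
    """Append one entry, first adding a newline if the file does not end
    with one, so the new entry is not glued onto the previous last line."""
    data = path.read_bytes() if path.exists() else b""
    with path.open("ab") as f:
        if data and not data.endswith(b"\n"):
            f.write(b"\n")  # repair the missing trailing newline first
        f.write(entry.rstrip("\r\n").encode("ascii") + b"\n")
```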

[QUOTE=PhilF;534957]
The worktodo.txt-bak is a mess, partially because I didn't know about the needed newline (AIDs are not real).
[/QUOTE]

Prime95 2020-01-13 02:31

nVidia change coming (pending preda's approval of my last commit).

I've gone through all the nVidia timings posted over the last 2 months in an attempt to come up with reasonable default settings for nVidia GPUs. The new defaults will be:

WORKINGIN4 (was WORKINGIN5)
WORKINGOUT4 (was WORKINGOUT3)
T2_SHUFFLE (was T2_SHUFFLE_REVERSELINE)
CARRY64 (was CARRY32)
FANCY_MIDDLEMUL1 (was ORIGINAL_TWEAKED)
LESS_ACCURATE (was MORE_ACCURATE)

The UNROLL_ALL default was not changed.

Note that FANCY_MIDDLEMUL1 is only implemented for MIDDLE=10 and 11; for other MIDDLE values, the default is ORIGINAL_TWEAKED.
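To make the MIDDLE-dependent rule concrete, a hedged Python sketch (hypothetical helper, not gpuowl's code) that assembles the new nVidia defaults listed above as OpenCL `-DNAME=1` flags, the same style that appears in the logged OpenCL args in this thread. UNROLL_ALL is left out because its default did not change.

```python
from typing import List

# New nVidia defaults from the post; the old default is in each comment.
NVIDIA_DEFAULTS = [
    "WORKINGIN4",     # was WORKINGIN5
    "WORKINGOUT4",    # was WORKINGOUT3
    "T2_SHUFFLE",     # was T2_SHUFFLE_REVERSELINE
    "CARRY64",        # was CARRY32
    "LESS_ACCURATE",  # was MORE_ACCURATE
]

def nvidia_default_defines(middle: int) -> List[str]:
    """Return -D flags for the new nVidia defaults at a given MIDDLE value."""
    names = list(NVIDIA_DEFAULTS)
    # FANCY_MIDDLEMUL1 exists only for MIDDLE=10 and 11; otherwise fall back.
    names.append("FANCY_MIDDLEMUL1" if middle in (10, 11) else "ORIGINAL_TWEAKED")
    return [f"-D{name}=1" for name in names]
```

On the command line, kriesel's config line in this thread shows the `-use` option taking a comma-separated list of such names (for example `-use NO_ASM,MERGED_MIDDLE,WORKINGIN5`), which then appear as `-DNO_ASM=1 -DMERGED_MIDDLE=1 -DWORKINGIN5=1` in the OpenCL args; that is one way to pick a variant other than the default.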

kriesel 2020-01-13 14:01

gpuowl v6.11-104 hang observed on RX550
 
Is gpuowl waiting for the GPU, and the GPU waiting for something to do? Note that committed GPU RAM dropped to almost nothing, and the spinner stopped moving.
[CODE]2020-01-11 19:27:14 condorella/rx550 90709987 OK 62600000 69.01%; 15528 us/it; ETA 5d 01:15; 3a0d2997b51f9d09 (check 6.36s)
2020-01-11 20:19:05 condorella/rx550 90709987 OK 62800000 69.23%; 15527 us/it; ETA 5d 00:23; 8952aef2e247dec3 (check 6.35s)
2020-01-11 21:10:57 condorella/rx550 90709987 OK 63000000 69.45%; 15527 us/it; ETA 4d 23:31; 5da17c923a0ce57b (check 6.69s)
2020-01-11 22:02:49 condorella/rx550 90709987 OK 63200000 69.67%; 15527 us/it; ETA 4d 22:39; bb3eec63b136a9c6 (check 6.34s)
2020-01-13 07:52:11 config.txt: -device 1 -user kriesel -cpu condorella/rx550 -use NO_ASM,MERGED_MIDDLE,WORKINGIN5,WORKINGOUT2,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_REVERSELINE
2020-01-13 07:52:11 device 1, unique id ''
2020-01-13 07:52:11 condorella/rx550 90709987 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.30 bits/word
2020-01-13 07:52:12 condorella/rx550 OpenCL args "-DEXP=90709987u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0xc.fb65b19625858p-3 -DIWEIGHT_STEP=0x9.dc1b382f1df1p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DAMDGPU=1 -DNO_ASM=1 -DMERGED_MIDDLE=1 -DWORKINGIN5=1 -DWORKINGOUT2=1 -DT2_SHUFFLE_HEIGHT=1 -DT2_SHUFFLE_MIDDLE=1 -DT2_SHUFFLE_REVERSELINE=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-01-13 07:52:16 condorella/rx550 OpenCL compilation in 3.31 s
2020-01-13 07:52:23 condorella/rx550 90709987 OK 63200000 loaded: blockSize 400, bb3eec63b136a9c6
2020-01-13 07:52:41 condorella/rx550 90709987 OK 63200800 69.67%; 15355 us/it; ETA 4d 21:20; 8aac70bbc7dd7ca0 (check 6.31s)
[/CODE]Note that GPU-Z indicated only 3 and 4 MB in use, which is not consistent with normal GPU application activity. The 0 clocks shown are a known issue with the Win7, GPU-Z, and Windows Remote Desktop combination in use here. The console was easily terminated and the work restarted in a new console instance.
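The FFT, bits/word and ETA figures in these logs fit together with simple arithmetic. A hedged Python check, under two assumptions: that the FFT length in words is 2 × WIDTH × SMALL_HEIGHT × MIDDLE (inferred only because it reproduces 5120K and 5632K from the OpenCL args in both logs), and that a PRP test of exponent p runs about p iterations (consistent with the percentages shown, e.g. 63200800 / 90709987 = 69.67%). The ETA helper rounds to the nearest minute; gpuowl's own rounding may differ.

```python
from typing import Tuple

def fft_words(width: int, small_height: int, middle: int) -> int:
    # Assumed relation; e.g. 2 * 1024 * 256 * 10 = 5242880 = 5120K.
    return 2 * width * small_height * middle

def bits_per_word(exponent: int, words: int) -> float:
    # Average number of bits of the exponent-sized number per FFT word.
    return exponent / words

def eta(exponent: int, iteration: int, us_per_it: float) -> Tuple[int, int, int]:
    """(days, hours, minutes) left, assuming `exponent` total iterations."""
    minutes = round((exponent - iteration) * us_per_it / 60e6)
    days, rem = divmod(minutes, 24 * 60)
    hours, mins = divmod(rem, 60)
    return days, hours, mins
```

With the RX550 numbers, `bits_per_word(90709987, fft_words(1024, 256, 10))` is about 17.30 and `eta(90709987, 63200800, 15355)` gives (4, 21, 20), matching "ETA 4d 21:20" in the last log line.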

preda 2020-01-13 21:10

I don't know why this happens, but most likely it is something outside the app itself. On Linux, I would look in dmesg (syslog) to see whether the GPU driver logged anything there. George reported a similar freeze on Linux.

[QUOTE=kriesel;535030]Gpuowl waiting for gpu, and gpu waiting for something to do? Note it went to almost no gpu ram committed. Spinner motion stopped.[/QUOTE]

kriesel 2020-01-13 23:13

[QUOTE=preda;535053]I don't know why this happens, but most likely is something outside of the app itself. On Linux I would look into dmesg (syslog) to see if there is anything logged there by the GPU driver. George reported a similar freeze on Linux.[/QUOTE]
Event Viewer, Windows Logs, System,
Event 4101, Display
1/11/2020 10:26:10 PM Display driver amdkmdap stopped responding and has successfully recovered.

This may have been the notorious Windows TDR (Timeout Detection and Recovery) behavior: something took too long, and Windows decided the driver had stopped responding. Or the driver really did stop responding and needed restarting.

Apparently, if the GPU is reset by Windows, gpuowl waits indefinitely; it has no code for detecting or dealing with that situation. (It waited more than 31 hours, until I intervened.) A one-hour timeout on gpuowl's CPU side, followed by resubmitting the lost work to the GPU, might address this.

This is a known issue on the CUDA side too, not just on AMD:
Warning, Source Display, Event ID 4101
Display driver nvlddmkm stopped responding and has successfully recovered.
CUDALucas detects an error condition and exits; batch wrappers are used to restart it and continue.
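The CPU-side timeout suggested above, and the CUDALucas batch-wrapper approach, can both be approximated from outside the program. A hedged Python sketch, not a gpuowl feature: run the program as a child process, watch the modification time of its log file (assuming the log is flushed as each line is written), and kill the process if the log has not changed for `stall_seconds`, returning "stalled" so an outer loop can restart it; gpuowl then resumes from its savefile, as the "loaded:" lines in this thread show. The command and log path are placeholders. Pick the threshold from the normal log interval: in the RX550 log, progress lines are 200000 iterations apart at about 15527 us/it, i.e. about 3105 s (roughly 52 minutes), so a one-hour timeout leaves only about 8 minutes of margin on that card.

```python
import os
import subprocess
import time
from typing import List

def run_with_watchdog(cmd: List[str], log_path: str,
                      stall_seconds: float, poll_seconds: float = 1.0) -> str:
    """Run cmd; return "exited" if it finishes on its own, or "stalled" if
    log_path goes unmodified for stall_seconds (the process is then killed)."""
    proc = subprocess.Popen(cmd)
    last_change = time.monotonic()
    last_mtime = None
    try:
        while proc.poll() is None:
            try:
                mtime = os.path.getmtime(log_path)
            except FileNotFoundError:
                mtime = None  # log not created yet; counts as "no progress"
            if mtime != last_mtime:
                last_mtime, last_change = mtime, time.monotonic()
            elif time.monotonic() - last_change > stall_seconds:
                proc.kill()
                proc.wait()
                return "stalled"
            time.sleep(poll_seconds)
        return "exited"
    finally:
        if proc.poll() is None:  # e.g. interrupted by Ctrl+C
            proc.kill()
            proc.wait()
```

A restart loop would simply call it again while it returns "stalled", with `stall_seconds` set comfortably above the card's normal interval between log lines.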

ewmayer 2020-01-14 02:34

[Posted similar in the CuLu/nVidia how-to thread]

Looking at the GPU subforum through a n00b user's eyes, it strikes me what a mess it is. I want to be able to get to the best practices for my target GPU/OS from post #1 of a "GPU how-to" thread. This thread has a problem in that regard: whatever was initially posted in post #1, as a new user I expect to see either a list of, or a link to, a best-practices guide right there, kept updated on a running basis to reflect changes in best practices and/or new editions of the hardware and software covered by the thread.

Here, I see that in post #195 George added some new info and noted, "The 'gold standard' instructions in post #76 should be updated" ... well, they never were, and why would they be warehoused in post #76 to begin with? Why not at least edit post #1 in the thread to reflect that? E.g., "We encourage new users to peruse the whole thread, but for a quick best-practices guide, visit Post #76[link]".

kriesel 2020-01-14 03:14

new users
 
"Best practice", like beauty, is in the eye of the beholder.

"Information and answers" does seem a likely place for the new user to look. [URL]https://www.mersenneforum.org/forumdisplay.php?f=38[/URL]

There has long been a thread (the first sticky thread there), [URL]https://www.mersenneforum.org/showthread.php?t=1534[/URL], specifically for new users. I admit to not finding it when I was a new user, nor for a considerable time after.

Uncwilly added a pointer there (post 21) to the book-size collection of reference posts I've assembled. [URL]https://www.mersenneforum.org/showthread.php?t=24607[/URL]

The second sticky thread there was created as a pointer to the reference info.


(Ernst, #195 and #76 do not check out.)

M344587487 2020-01-14 09:11

#76 is my short, outdated checklist for getting an Ubuntu environment set up: [url]https://www.mersenneforum.org/showpost.php?p=511655&postcount=76[/url]


Someone should make an updated version. It might be me, but I'll have to recreate my setup to do that, as it's been dismantled for a while. Another quickstart option that might be fun would be to take a fresh install with ROCm etc. installed and turn it into a live CD.

kriesel 2020-01-14 11:32

[QUOTE=M344587487;535088]#76 is my short outdated checklist to getting an Ubuntu environment setup: [URL]https://www.mersenneforum.org/showpost.php?p=511655&postcount=76[/URL]


Someone should make an updated version, it might be me but I'll have to create my setup again to do that as it's been dismantled for a while. There is another quickstart option that might be fun which is to take a fresh install with ROCm etc installed and turn it into a live CD.[/QUOTE]Oh, thanks for clarifying that; it's an entirely different thread. I took Ernst's post to mean posts #76 and #195 in this thread.

ewmayer 2020-01-14 20:00

[QUOTE=kriesel;535092]Oh, thanks for clarifying that; entirely different thread. I took Ernst's post to mean #76 and 195 in this thread.[/QUOTE]

My bad - I was referring to the Radeon 7 thread, which is also GpuOwl-centric, for obvious reasons.


All times are UTC.
