mersenneforum.org  

Old 2020-01-12, 14:47   #1772
PhilF
 
Feb 2005
Colorado

2^5·17 Posts

Quote:
Originally Posted by preda View Post
Could you double-check whether you actually lost the PRP savefiles? That's highly surprising, because gpuOwl never deletes the content of past exponents, except when using -cleanup (which you aren't using).

So, please track down the exponent on which you were PRP half-way (from gpuowl.log). Next look in the folder for that exponent, you should have the savefiles safely there -- not deleted and not lost.

What I think happened is this: you simply started a new exponent (a different one) from worktodo.txt. The order of worktodo entries changed, and the exponent you were 50% through is still there. Maybe it even has an entry in the worktodo.txt.

An extended excerpt of gpuowl.log would help with understanding what happened.
I wish that were the case, but I believe there was only 1 line in worktodo.txt at the time.

There is a possibility that I messed up and kept the wrong save files because I had a number of folders in that same range with very similar names. That might be a better explanation since it does not appear to be the code. I think rather than trying to track this down, I should try it again. If I can reproduce this I'll let you know.

The worktodo.txt-bak is a mess, partially because I didn't know about the needed newline (AIDs are not real). Gpuowl must have added the duplicate lines:

Code:
PRP=764DD1319D71BA1AE73B4D7C415C22EF,1,2,101949599,-1,76,2PFactor=764DD1319D71BA1AE73B4D7C415C22EF,1,2,101949599,-1,2,0
PRP=764DD1319D71BA1AE73B4D7C415C22EF,1,2,101949599,-1,76,0
PFactor=764DD1319D71BA1AE73B4D7C415C22EF,1,2,101949599,-1,2,0
PRP=764DD1319D71BA1AE73B4D7C415C22EF,1,2,101949599,-1,76,0

Here are the last 20 lines of gpuowl.log:

Code:
2020-01-11 18:58:15 i7-4790 101949599 OK 48200000  47.28%;  893 us/it; ETA 0d 13:20; 52a583a2a885b208 (check 0.53s)
2020-01-11 19:01:15 i7-4790 101949599 OK 48400000  47.47%;  893 us/it; ETA 0d 13:17; 88403b125b19d22a (check 0.53s)
2020-01-11 19:04:14 i7-4790 101949599 OK 48600000  47.67%;  893 us/it; ETA 0d 13:14; 8eb6c84a2f34b07b (check 0.53s)
2020-01-11 19:06:59 i7-4790 Stopping, please wait..
2020-01-11 19:06:59 i7-4790 101949599 OK 48784800  47.85%;  893 us/it; ETA 0d 13:11; e0868a0077e6cd96 (check 0.50s)
2020-01-11 19:06:59 i7-4790 Exiting because "stop requested"
2020-01-11 19:06:59 i7-4790 Bye
2020-01-11 19:33:20 Note: not found 'config.txt'
2020-01-11 19:33:20 config: -device 0 -user pfrakes -cpu i7-4790 -B1 1000000 -B2 32000000 
2020-01-11 19:33:20 device 0, unique id ''
2020-01-11 19:33:20 i7-4790 'worktodo.txt': could not find the line 'PRP=764DD1319D71BA1AE73B4D7C415C22EF,1,2,101949599,-1,76,2' to delete
2020-01-11 19:33:20 i7-4790 101949599 FFT 5632K: Width 256x4, Height 64x4, Middle 11; 17.68 bits/word
2020-01-11 19:33:21 i7-4790 OpenCL args "-DEXP=101949599u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=11u -DWEIGHT_STEP=0x1.401bafea92a09p+0 -DIWEIGHT_STEP=0x1.99762c21e62cp-1 -DWEIGHT_BIGSTEP=0x1.306fe0a31b715p+0 -DIWEIGHT_BIGSTEP=0x1.ae89f995ad3adp-1 -DAMDGPU=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-01-11 19:33:22 i7-4790 OpenCL compilation in 1.70 s
2020-01-11 19:33:23 i7-4790 101949599 OK    11200 loaded: blockSize 400, 55afd3a9f362e204
2020-01-11 19:33:24 i7-4790 101949599 OK    12000   0.01%;  882 us/it; ETA 1d 00:58; 4dcb47cf0ec6fab2 (check 0.48s)
2020-01-11 19:33:36 i7-4790 Stopping, please wait..
2020-01-11 19:33:37 i7-4790 101949599 OK    26000   0.03%;  880 us/it; ETA 1d 00:55; 26a557d2e852e785 (check 0.48s)
2020-01-11 19:33:37 i7-4790 Exiting because "stop requested"
2020-01-11 19:33:37 i7-4790 Bye
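One way to confirm from gpuowl.log which iteration a restart resumed from is to scan the "OK" lines and flag any "loaded" line that falls below the highest iteration already reached. The following is a minimal sketch, assuming the line layout seen in the excerpt above; the regular expression and the function name `find_regressions` are illustrative and not part of gpuOwl.

```python
import re

# Matches progress lines such as
#   "2020-01-11 19:06:59 i7-4790 101949599 OK 48784800  47.85%; ..."
# and resume lines such as
#   "2020-01-11 19:33:23 i7-4790 101949599 OK    11200 loaded: blockSize 400, ..."
# The layout is inferred from the log excerpt only (assumption).
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) \S+ "
    r"(?P<exponent>\d+) OK\s+(?P<iteration>\d+)\s+(?P<rest>.*)$"
)


def find_regressions(lines):
    """Return (exponent, highest_iteration_seen, loaded_iteration) for every
    'loaded' line that resumes below the highest iteration seen so far."""
    highest = {}
    regressions = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # startup, config and shutdown lines are ignored
        exponent = int(m["exponent"])
        iteration = int(m["iteration"])
        if m["rest"].startswith("loaded") and iteration < highest.get(exponent, 0):
            regressions.append((exponent, highest[exponent], iteration))
        highest[exponent] = max(highest.get(exponent, 0), iteration)
    return regressions
```

Run over the excerpt above, this reports that exponent 101949599 had reached iteration 48784800 and was then loaded at iteration 11200.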

Last fiddled with by PhilF on 2020-01-12 at 14:55 Reason: Added worktodo.txt-bak
Old 2020-01-12, 20:11   #1773
preda
 
"Mihai Preda"
Apr 2015

2×3×13×17 Posts

Thank you for the info. In the meantime I added detection of the missing newline at the end of worktodo.txt, so the merged lines you saw should presumably not happen anymore.
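To illustrate how a missing final newline produces a merged line like the one quoted below, here is a minimal Python sketch of such a check. It is not gpuOwl's actual implementation (gpuOwl is not written in Python), and the function names `ends_with_newline` and `append_entry` are hypothetical.

```python
def ends_with_newline(data: bytes) -> bool:
    """True if the file content is empty or already ends with a line terminator."""
    return data == b"" or data.endswith(b"\n")


def append_entry(data: bytes, entry: str) -> bytes:
    """Append one worktodo entry, adding the missing newline first if needed.

    Without this check, the new entry is glued onto the last existing line.
    """
    if not ends_with_newline(data):
        data += b"\n"
    return data + entry.encode("ascii") + b"\n"


# File content that lacks the final newline, as in the report above.
before = b"PRP=764DD1319D71BA1AE73B4D7C415C22EF,1,2,101949599,-1,76,2"
after = append_entry(before, "PFactor=764DD1319D71BA1AE73B4D7C415C22EF,1,2,101949599,-1,2,0")
```

With the check in place, `after` holds two separate lines instead of one line ending in `76,2PFactor=...`.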

Quote:
Originally Posted by PhilF View Post
The worktodo.txt-bak is a mess, partially because I didn't know about the needed newline (AIDs are not real). Gpuowl must have added the duplicate lines:

Code:
PRP=764DD1319D71BA1AE73B4D7C415C22EF,1,2,101949599,-1,76,2PFactor=764DD1319D71BA1AE73B4D7C415C22EF,1,2,101949599,-1,2,0
PRP=764DD1319D71BA1AE73B4D7C415C22EF,1,2,101949599,-1,76,0
PFactor=764DD1319D71BA1AE73B4D7C415C22EF,1,2,101949599,-1,2,0
PRP=764DD1319D71BA1AE73B4D7C415C22EF,1,2,101949599,-1,76,0
Old 2020-01-13, 02:31   #1774
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

2×3×1,193 Posts

nVidia change coming (pending preda's approval of my last commit).

I've gone through all the nVidia timings posted in the last 2 months in an attempt to come up with reasonable default settings for nVidia GPUs. The new defaults will be:

WORKINGIN4 (was WORKINGIN5)
WORKINGOUT4 (was WORKINGOUT3)
T2_SHUFFLE (was T2_SHUFFLE_REVERSELINE)
CARRY64 (was CARRY32)
FANCY_MIDDLEMUL1 (was ORIGINAL_TWEAKED)
LESS_ACCURATE (was MORE_ACCURATE)

The UNROLL_ALL default was not changed.

Note that FANCY_MIDDLEMUL1 is only implemented for MIDDLE=10,11. Otherwise, the default is ORIGINAL_TWEAKED.
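The selection rule above can be sketched in a few lines of Python. This only models the defaults as listed in this post; it is not code from gpuOwl, and the function name `nvidia_defaults` is hypothetical.

```python
def nvidia_defaults(middle: int) -> list:
    """Return the new nVidia default options listed above for a given MIDDLE.

    FANCY_MIDDLEMUL1 is only implemented for MIDDLE=10,11, so any other
    MIDDLE value falls back to ORIGINAL_TWEAKED. UNROLL_ALL is left out
    because its default was not changed.
    """
    middle_mul = "FANCY_MIDDLEMUL1" if middle in (10, 11) else "ORIGINAL_TWEAKED"
    return [
        "WORKINGIN4",
        "WORKINGOUT4",
        "T2_SHUFFLE",
        "CARRY64",
        middle_mul,
        "LESS_ACCURATE",
    ]
```

For example, `nvidia_defaults(11)` includes FANCY_MIDDLEMUL1, while `nvidia_defaults(12)` includes ORIGINAL_TWEAKED instead.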
Old 2020-01-13, 14:01   #1775
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1001001011011_2 Posts
gpuowl v6.11-104 hang observed on RX550

Gpuowl waiting for gpu, and gpu waiting for something to do? Note it went to almost no gpu ram committed. Spinner motion stopped.
Code:
2020-01-11 19:27:14 condorella/rx550 90709987 OK 62600000  69.01%; 15528 us/it; ETA 5d 01:15; 3a0d2997b51f9d09 (check 6.36s)
2020-01-11 20:19:05 condorella/rx550 90709987 OK 62800000  69.23%; 15527 us/it; ETA 5d 00:23; 8952aef2e247dec3 (check 6.35s)
2020-01-11 21:10:57 condorella/rx550 90709987 OK 63000000  69.45%; 15527 us/it; ETA 4d 23:31; 5da17c923a0ce57b (check 6.69s)
2020-01-11 22:02:49 condorella/rx550 90709987 OK 63200000  69.67%; 15527 us/it; ETA 4d 22:39; bb3eec63b136a9c6 (check 6.34s)
2020-01-13 07:52:11 config.txt: -device 1 -user kriesel -cpu condorella/rx550 -use NO_ASM,MERGED_MIDDLE,WORKINGIN5,WORKINGOUT2,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_REVERSELINE
2020-01-13 07:52:11 device 1, unique id ''
2020-01-13 07:52:11 condorella/rx550 90709987 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.30 bits/word
2020-01-13 07:52:12 condorella/rx550 OpenCL args "-DEXP=90709987u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0xc.fb65b19625858p-3 -DIWEIGHT_STEP=0x9.dc1b382f1df1p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DAMDGPU=1 -DNO_ASM=1 -DMERGED_MIDDLE=1 -DWORKINGIN5=1 -DWORKINGOUT2=1 -DT2_SHUFFLE_HEIGHT=1 -DT2_SHUFFLE_MIDDLE=1 -DT2_SHUFFLE_REVERSELINE=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-01-13 07:52:16 condorella/rx550 OpenCL compilation in 3.31 s
2020-01-13 07:52:23 condorella/rx550 90709987 OK 63200000 loaded: blockSize 400, bb3eec63b136a9c6
2020-01-13 07:52:41 condorella/rx550 90709987 OK 63200800  69.67%; 15355 us/it; ETA 4d 21:20; 8aac70bbc7dd7ca0 (check 6.31s)
Note that GPU-Z indicates only 3 and 4 MB used, which is not consistent with usual GPU app activity. The 0 clocks indicated are a known issue with the Win7, GPU-Z, and Windows remote desktop combination in use here. The console was easily terminated and the work restarted in a new console instance.
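The size of the stall can be read off the log timestamps: the progress lines are 200000 iterations apart at about 15527 us/it, so consecutive lines should be roughly 52 minutes apart. A minimal sketch of that arithmetic and of a gap check follows; the function names and the `slack` factor are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def expected_interval(iterations: int, us_per_it: float) -> timedelta:
    """Time that `iterations` iterations should take at `us_per_it` microseconds each."""
    return timedelta(microseconds=iterations * us_per_it)


def find_gaps(timestamps, expected, slack=1.5):
    """Return (previous, current) datetime pairs spaced more than slack * expected apart."""
    parsed = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps]
    return [(a, b) for a, b in zip(parsed, parsed[1:]) if b - a > slack * expected]


# Timestamps of the last two progress lines and of the restart, from the log above.
stamps = ["2020-01-11 21:10:57", "2020-01-11 22:02:49", "2020-01-13 07:52:11"]
expected = expected_interval(200000, 15527)  # 3105.4 s, about 51 min 45 s
gaps = find_gaps(stamps, expected)
```

Here `gaps` contains a single pair, from 2020-01-11 22:02:49 to 2020-01-13 07:52:11, which is the period during which no progress was logged.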
Attached image: gpuowl611-104hung-on-rx550.png

Last fiddled with by kriesel on 2020-01-13 at 14:06
Old 2020-01-13, 21:10   #1776
preda
 
"Mihai Preda"
Apr 2015

2·3·13·17 Posts

I don't know why this happens, but most likely it is something outside of the app itself. On Linux I would look into dmesg (syslog) to see if there is anything logged there by the GPU driver. George reported a similar freeze on Linux.

Quote:
Originally Posted by kriesel View Post
Gpuowl waiting for gpu, and gpu waiting for something to do? Note it went to almost no gpu ram committed. Spinner motion stopped.
Code:
2020-01-11 19:27:14 condorella/rx550 90709987 OK 62600000  69.01%; 15528 us/it; ETA 5d 01:15; 3a0d2997b51f9d09 (check 6.36s)
2020-01-11 20:19:05 condorella/rx550 90709987 OK 62800000  69.23%; 15527 us/it; ETA 5d 00:23; 8952aef2e247dec3 (check 6.35s)
2020-01-11 21:10:57 condorella/rx550 90709987 OK 63000000  69.45%; 15527 us/it; ETA 4d 23:31; 5da17c923a0ce57b (check 6.69s)
2020-01-11 22:02:49 condorella/rx550 90709987 OK 63200000  69.67%; 15527 us/it; ETA 4d 22:39; bb3eec63b136a9c6 (check 6.34s)
2020-01-13 07:52:11 config.txt: -device 1 -user kriesel -cpu condorella/rx550 -use NO_ASM,MERGED_MIDDLE,WORKINGIN5,WORKINGOUT2,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_REVERSELINE
2020-01-13 07:52:11 device 1, unique id ''
2020-01-13 07:52:11 condorella/rx550 90709987 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.30 bits/word
2020-01-13 07:52:12 condorella/rx550 OpenCL args "-DEXP=90709987u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0xc.fb65b19625858p-3 -DIWEIGHT_STEP=0x9.dc1b382f1df1p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DAMDGPU=1 -DNO_ASM=1 -DMERGED_MIDDLE=1 -DWORKINGIN5=1 -DWORKINGOUT2=1 -DT2_SHUFFLE_HEIGHT=1 -DT2_SHUFFLE_MIDDLE=1 -DT2_SHUFFLE_REVERSELINE=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-01-13 07:52:16 condorella/rx550 OpenCL compilation in 3.31 s
2020-01-13 07:52:23 condorella/rx550 90709987 OK 63200000 loaded: blockSize 400, bb3eec63b136a9c6
2020-01-13 07:52:41 condorella/rx550 90709987 OK 63200800  69.67%; 15355 us/it; ETA 4d 21:20; 8aac70bbc7dd7ca0 (check 6.31s)
Note that GPU-Z indicates only 3 and 4 MB used, which is not consistent with usual GPU app activity. The 0 clocks indicated are a known issue with the Win7, GPU-Z, and Windows remote desktop combination in use here. The console was easily terminated and the work restarted in a new console instance.
Old 2020-01-13, 23:13   #1777
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

11133_8 Posts

Quote:
Originally Posted by preda View Post
I don't know why this happens, but most likely it is something outside of the app itself. On Linux I would look into dmesg (syslog) to see if there is anything logged there by the GPU driver. George reported a similar freeze on Linux.
Event Viewer, Windows Logs, System,
Event 4101, Display
1/11/2020 10:26:10 PM Display driver amdkmdap stopped responding and has successfully recovered.

This may have been the notorious Windows TDR behavior. Something took too long and Windows thought the driver stopped responding. Or the driver actually did stop responding, and needed restarting.

Apparently, if the GPU is reset by Windows, gpuowl will wait indefinitely; it has no code for detecting that situation or dealing with it. (It waited more than 31 hours, until I intervened.) A one-hour timeout on gpuowl's CPU side, followed by resubmitting the lost work to the GPU, might address this.
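Until something like that exists inside gpuowl, a similar effect could be approximated from outside with a wrapper script. The sketch below is an external watchdog, not the in-app timeout suggested above: it restarts the process when the log file has not been modified for `timeout_s` seconds. The function names, parameters, and the use of the log's modification time as the progress signal are all assumptions.

```python
import os
import subprocess
import time


def is_stalled(last_progress: float, now: float, timeout_s: float) -> bool:
    """True when no progress has been recorded for longer than timeout_s seconds."""
    return now - last_progress > timeout_s


def supervise(cmd, log_path, timeout_s=3600.0, poll_s=60.0, max_restarts=10):
    """Run cmd, restarting it after a stall or a non-zero exit.

    Progress is inferred from the modification time of log_path (assumption).
    Returns the number of restarts performed.
    """
    restarts = 0
    while True:
        proc = subprocess.Popen(cmd)
        started = time.time()
        stalled = False
        while proc.poll() is None:
            time.sleep(poll_s)
            try:
                last_progress = max(started, os.path.getmtime(log_path))
            except OSError:  # log file not created yet
                last_progress = started
            if is_stalled(last_progress, time.time(), timeout_s):
                proc.kill()  # hung process: terminate so it can be restarted
                proc.wait()
                stalled = True
        if not stalled and proc.returncode == 0:
            return restarts  # clean exit, nothing to resubmit
        if restarts >= max_restarts:
            return restarts
        restarts += 1
```

The timeout has to be longer than the normal spacing of log lines. On the RX550 above, progress lines are about 52 minutes apart, so a one-hour limit leaves little margin there.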

This is a known issue on the CUDA side too, not just AMD.
Warning, Source Display, Event ID 4101
Display driver nvlddmkm stopped responding and has successfully recovered.
CUDALucas detects an error condition and exits. Batch wrappers are used to continue on.

Last fiddled with by kriesel on 2020-01-13 at 23:26
Old 2020-01-14, 02:34   #1778
ewmayer
2ω=0
 
Sep 2002
República de California

11×19×47 Posts

[Posted similar in the CuLu/nVidia how-to thread]

In looking at the GPU subforum through a n00b user's eyes, it strikes me what a mess it is. I want to be able to get to the best practices for my target GPU/OS in post 1 of a "GPU how-to" thread. This thread has a problem in that regard: Whatever was initially posted in Post #1, as a new user I expect to see either a list of, or link to, a Best Practices guide right there, and to have same updated on a running basis to reflect changes in Best Practices and/or new editions of hardware and software of the particular family covered by the thread.

Here, I see that in post #195 George added some new info, and noted "The "gold standard" instructions in post #76 should be updated" ... well, they never were, and why would they be warehoused in post #76 to begin with? Why not at least edit Post #1 in the thread to reflect that? E.g., "We encourage new users to peruse the whole thread, but for a quick best-practices guide, visit Post #76[link]".
Old 2020-01-14, 03:14   #1779
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

37×127 Posts
new users

"Best practice," like beauty, is in the eye of the beholder.

"Information and answers" does seem a likely place for the new user to look. https://www.mersenneforum.org/forumdisplay.php?f=38

There has long been a thread (first sticky thread there) https://www.mersenneforum.org/showthread.php?t=1534 specifically for new users. I admit to not finding it when I was a new user, or for a considerable time after.

Uncwilly added a pointer there (post 21) to the book-size collection of reference posts I've assembled. https://www.mersenneforum.org/showthread.php?t=24607

The second sticky thread there was created as a pointer to the reference info.


(Ernst, #195 and #76 do not check out.)

Last fiddled with by kriesel on 2020-01-14 at 03:17
Old 2020-01-14, 09:11   #1780
M344587487
 
"Composite as Heck"
Oct 2017

2B8_16 Posts

#76 is my short outdated checklist to getting an Ubuntu environment setup: https://www.mersenneforum.org/showpo...5&postcount=76


Someone should make an updated version; it might be me, but I'll have to recreate my setup to do that, as it has been dismantled for a while. Another quickstart option that might be fun is to take a fresh install with ROCm etc. installed and turn it into a live CD.
Old 2020-01-14, 11:32   #1781
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1001001011011_2 Posts

Quote:
Originally Posted by M344587487 View Post
#76 is my short outdated checklist to getting an Ubuntu environment setup: https://www.mersenneforum.org/showpo...5&postcount=76


Someone should make an updated version; it might be me, but I'll have to recreate my setup to do that, as it has been dismantled for a while. Another quickstart option that might be fun is to take a fresh install with ROCm etc. installed and turn it into a live CD.
Oh, thanks for clarifying that; entirely different thread. I took Ernst's post to mean #76 and 195 in this thread.
Old 2020-01-14, 20:00   #1782
ewmayer
2ω=0
 
Sep 2002
República de California

11·19·47 Posts

Quote:
Originally Posted by kriesel View Post
Oh, thanks for clarifying that; entirely different thread. I took Ernst's post to mean #76 and 195 in this thread.
My bad - I was referring to the Radeon 7 thread, which is also GpuOwl-centric, for obvious reasons.
All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.