#78 |
|
∂²ω=0
Sep 2002
República de California
2²·2,939 Posts |
OK, first gpuowl issue. My Haswell system has always been notoriously unstable: I get the Linux equivalent of a BSOD about twice a week, with no overclocking either. I just did a quick before-going-to-bed check and found it had crashed again sometime in the last few hours. On reboot, starting my Mlucas job on the CPU was no problem, but trying to restart gpuowl (from within the run0 dir I created inside the main gpuowl dir) hits this, with the file listing shown at the end:
Code:
ewmayer@ewmayer-haswell:~/gpuowl/run0$ ../gpuowl
2020-02-04 22:37:23 gpuowl v6.11-142-gf54af2e
2020-02-04 22:37:23 Note: not found 'config.txt'
2020-02-04 22:37:23 device 0, unique id ''
2020-02-04 22:37:24 gfx906+sram-ecc-0 103984877 FFT 5632K: Width 256x4, Height 64x4, Middle 11; 18.03 bits/word
2020-02-04 22:37:25 gfx906+sram-ecc-0 OpenCL args "-DEXP=103984877u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=11u -DWEIGHT_STEP=0x1.f54acc23489eep+0 -DIWEIGHT_STEP=0x1.0577e0c0e09e4p-1 -DWEIGHT_BIGSTEP=0x1.ae89f995ad3adp+0 -DIWEIGHT_BIGSTEP=0x1.306fe0a31b715p-1 -DAMDGPU=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
1 warning generated.
2020-02-04 22:37:29 gfx906+sram-ecc-0 warning: argument unused during compilation: '-I .'
2020-02-04 22:37:29 gfx906+sram-ecc-0 OpenCL compilation in 3.90 s
2020-02-04 22:37:29 gfx906+sram-ecc-0 '/home/ewmayer/gpuowl/run0/103984877/103984877.owl' invalid
2020-02-04 22:37:30 gfx906+sram-ecc-0 103984877 OK 35000000 loaded: blockSize 400, 2c0ebcb44118e8be
2020-02-04 22:37:31 gfx906+sram-ecc-0 Can't open '/home/ewmayer/gpuowl/run0/103984877/103984877-new.owl' (mode 'wb')
2020-02-04 22:37:31 gfx906+sram-ecc-0 Exception NSt10filesystem7__cxx1116filesystem_errorE: filesystem error: can't open file: Success [/home/ewmayer/gpuowl/run0/103984877/103984877-new.owl]
2020-02-04 22:37:31 gfx906+sram-ecc-0 Bye
ewmayer@ewmayer-haswell:~/gpuowl/run0$ ll
total 80
drwxr-xr-x 3 ewmayer ewmayer 4096 Feb 4 14:41 ./
drwxr-xr-x 8 ewmayer ewmayer 4096 Feb 3 15:40 ../
drwxr-xr-x 2 root root 4096 Feb 4 22:28 103984877/
-rw-r--r-- 1 ewmayer ewmayer 45684 Feb 4 22:37 gpuowl.log
-rw-r--r-- 1 ewmayer ewmayer 301 Feb 4 14:44 results.txt
-rw-r--r-- 1 root root 181 Feb 4 14:41 worktodo.txt
-rw-r--r-- 1 root root 244 Feb 4 13:58 worktodo.txt-bak
ewmayer@ewmayer-haswell:~/gpuowl/run0$ ll 103984877/
total 128216
drwxr-xr-x 2 root root 4096 Feb 4 22:28 ./
drwxr-xr-x 3 ewmayer ewmayer 4096 Feb 4 14:41 ../
-rw-r--r-- 1 root root 12998165 Feb 4 22:26 103984877-old.owl
-rw-r--r-- 1 root root 12998155 Feb 4 14:17 103984877-old.p1.owl
-rw-r--r-- 1 root root 46137398 Feb 4 14:38 103984877-old.p2.owl
-rw-r--r-- 1 root root 0 Feb 4 22:28 103984877.owl
-rw-r--r-- 1 root root 12998155 Feb 4 14:18 103984877.p1.owl
-rw-r--r-- 1 root root 46137398 Feb 4 14:40 103984877.p2.owl

Code:
ewmayer@ewmayer-haswell:~/gpuowl/run0$ ../gpuowl
2020-02-04 22:52:21 gpuowl v6.11-142-gf54af2e
2020-02-04 22:52:21 Note: not found 'config.txt'
2020-02-04 22:52:21 device 0, unique id ''
2020-02-04 22:52:21 gfx906+sram-ecc-0 103984877 FFT 5632K: Width 256x4, Height 64x4, Middle 11; 18.03 bits/word
2020-02-04 22:52:22 gfx906+sram-ecc-0 OpenCL args "-DEXP=103984877u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=11u -DWEIGHT_STEP=0x1.f54acc23489eep+0 -DIWEIGHT_STEP=0x1.0577e0c0e09e4p-1 -DWEIGHT_BIGSTEP=0x1.ae89f995ad3adp+0 -DIWEIGHT_BIGSTEP=0x1.306fe0a31b715p-1 -DAMDGPU=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
1 warning generated.
2020-02-04 22:52:26 gfx906+sram-ecc-0 warning: argument unused during compilation: '-I .'
2020-02-04 22:52:26 gfx906+sram-ecc-0 OpenCL compilation in 3.80 s
2020-02-04 22:52:26 gfx906+sram-ecc-0 103984877 OK 35000000 loaded: blockSize 400, 2c0ebcb44118e8be
2020-02-04 22:52:27 gfx906+sram-ecc-0 Can't open '/home/ewmayer/gpuowl/run0/103984877/103984877-new.owl' (mode 'wb')
2020-02-04 22:52:27 gfx906+sram-ecc-0 Exception NSt10filesystem7__cxx1116filesystem_errorE: filesystem error: can't open file: Success [/home/ewmayer/gpuowl/run0/103984877/103984877-new.owl]
2020-02-04 22:52:27 gfx906+sram-ecc-0 Bye

Code:
ewmayer@ewmayer-haswell:~/gpuowl/run0$ ../gpuowl
2020-02-04 22:53:31 gpuowl v6.11-142-gf54af2e
2020-02-04 22:53:31 Note: not found 'config.txt'
2020-02-04 22:53:31 device 0, unique id ''
2020-02-04 22:53:32 gfx906+sram-ecc-0 103984877 FFT 5632K: Width 256x4, Height 64x4, Middle 11; 18.03 bits/word
2020-02-04 22:53:33 gfx906+sram-ecc-0 OpenCL args "-DEXP=103984877u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=11u -DWEIGHT_STEP=0x1.f54acc23489eep+0 -DIWEIGHT_STEP=0x1.0577e0c0e09e4p-1 -DWEIGHT_BIGSTEP=0x1.ae89f995ad3adp+0 -DIWEIGHT_BIGSTEP=0x1.306fe0a31b715p-1 -DAMDGPU=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
1 warning generated.
2020-02-04 22:53:36 gfx906+sram-ecc-0 warning: argument unused during compilation: '-I .'
2020-02-04 22:53:36 gfx906+sram-ecc-0 OpenCL compilation in 3.76 s
2020-02-04 22:53:37 gfx906+sram-ecc-0 103984877 OK 35000000 loaded: blockSize 400, 2c0ebcb44118e8be
2020-02-04 22:53:38 gfx906+sram-ecc-0 Can't open '/home/ewmayer/gpuowl/run0/103984877/103984877-new.owl' (mode 'wb')
2020-02-04 22:53:38 gfx906+sram-ecc-0 Exception NSt10filesystem7__cxx1116filesystem_errorE: filesystem error: can't open file: Success [/home/ewmayer/gpuowl/run0/103984877/103984877-new.owl]
2020-02-04 22:53:38 gfx906+sram-ecc-0 Bye

In the meantime I simply deleted the current entry from worktodo.txt and restarted gpuowl on the next one.
#79 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴×3×163 Posts |
Quote:
Welcome to the gpu side. Old hardware with hefty power supplies increasingly looks like a home just begging for fast gpus.
#80 |
|
"Mihai Preda"
Apr 2015
2²×3×11² Posts |
That error message is trying to say that gpuowl intends to *create* the file <n>-new.owl (to write a new checkpoint to it), and of course it's a fatal error if it can't do so. Why can't it create the file? Maybe the disk is full, maybe the permissions on the folder are wrong, maybe something else. Can you manually write to that path as the same user that runs gpuowl?
It seems the owner of the folder /home/ewmayer/gpuowl/run0/103984877/ is root.
Last fiddled with by preda on 2020-02-05 at 07:58
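One way to answer the "can you manually write to that path?" question is to try creating a scratch file in the checkpoint directory as the same user that runs gpuowl. The sketch below is only an illustration: `can_create_in` and the probe file name are made-up names, not anything gpuowl provides.

```shell
#!/bin/sh
# Hypothetical helper (not part of gpuowl): report whether the current user
# can create a file in the given directory, by actually creating and then
# removing a scratch file there.
can_create_in() {
    dir=$1
    probe="$dir/.write-probe.$$"
    # The subshell keeps a failed redirection from aborting the calling script.
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}

# Example with the directory from the log above:
#   can_create_in /home/ewmayer/gpuowl/run0/103984877 \
#       || ls -ld /home/ewmayer/gpuowl/run0/103984877
```

If the probe fails, the `ls -ld` output shows the owner, group and mode of the directory itself, which is what decides whether a new file can be created in it.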
#81 | |
|
"Mihai Preda"
Apr 2015
2²×3×11² Posts |
Quote:
./gpuowl options > /dev/null
Or, nohup will also redirect output to a file and keep the background process running after shell close:
nohup ./gpuowl options &
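As a rough sketch of the nohup approach with an explicit log file: when its standard output is a terminal, nohup appends output to nohup.out, so redirecting explicitly lets you pick the file. The function name `run_detached`, the variable `DETACHED_PID` and the log file name below are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: start a command under nohup in the background,
# appending stdout and stderr to a log file of our choosing.
# The PID of the background job is left in DETACHED_PID.
run_detached() {
    log_file=$1
    shift
    nohup "$@" >>"$log_file" 2>&1 &
    DETACHED_PID=$!
}

# Example (the log file name is an assumption):
#   run_detached gpuowl-console.log ./gpuowl
#   tail -f gpuowl-console.log
```

Keeping the PID around makes it easy to check later whether the job is still alive, for example with `kill -0 "$DETACHED_PID"`.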
#82 | ||
|
"Mihai Preda"
Apr 2015
2²·3·11² Posts |
I think the max sclk is 7, which is also the default. The card can't run at that sclk for any length of time, though: it overheats and thermally throttles *a lot* until it cools down, after which it speeds up again, and so on in an inefficient see-saw pattern.
While running PRP you could proceed to memory overclock tuning; usually 1150 is safe, and you can go up to 1180 or 1200. In general you want at least 24h without errors as validation. I usually run at sclk 3 or lower, and never above 4.
Quote:
Quote:
Last fiddled with by preda on 2020-02-05 at 08:15 |
#83 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1E90₁₆ Posts |
On Google Colab, whose instances are Linux VMs, my attempts to redirect output with append (>>) did not work for background tasks. The aim was to keep the foreground free so that the VM could be monitored with top.
Last fiddled with by kriesel on 2020-02-05 at 08:27 |
#84 | ||||
|
∂²ω=0
Sep 2002
República de California
2²·2,939 Posts |
Quote:
So I woke up this a.m. and the fan noise from the system was suspiciously quiet ... no crash, just the 'backup run' of the next assignment in the worktodo file quitting because p-1 stage 2 found a factor. And I'd neglected to add more assignments to pad the worktodo file. Grr.
Anyhow, as root, I restored the 1*7 files-dir to its post-system-crash state (a valid-looking <n>-old.owl file, an empty <n>.owl file, and no <n>-new.owl file), then chown'ed the ownership to me-as-regular-user, restored the worktodo entry, and restarted ... still the same error trying to create <n>-new.owl. But then I saw that I'd forgotten to change the group of the files in question from root to me (i.e. my 'chown ewmayer *' should've been 'chown ewmayer:ewmayer *'), so I used 'sudo chgrp ewmayer *' (equivalent to 'chown :ewmayer *') to do that, and now the restart is successful. Thanks for the help.
Quote:
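For anyone hitting the same thing, here is a small sketch for spotting leftover root-owned entries before restarting. The function name `list_foreign_owned` is made up for this example. Note that a glob such as `*` run inside the directory does not touch the directory entry itself, which is why the sketch starts from the directory path.

```shell
#!/bin/sh
# Hypothetical helper: list everything under a directory (including the
# directory itself) whose owner or group differs from the current user's.
# Empty output means ownership is consistent.
list_foreign_owned() {
    dir=$1
    find "$dir" \( ! -user "$(id -u)" -o ! -group "$(id -g)" \) -print
}

# If anything is listed, owner and group can be fixed in one command, e.g.
#   sudo chown -R ewmayer:ewmayer 103984877/
```

Using `user:group` with the recursive `-R` option changes both fields for the directory and everything below it in one pass, rather than needing a separate chgrp.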
Quote:
Quote:
Last fiddled with by ewmayer on 2020-02-05 at 20:41 |
#85 |
|
"Composite as Heck"
Oct 2017
2·5²·19 Posts |
IIRC the default temp target maintained by variable fan speed is 95 C, and the "oh dear" territory is 105 C. I suggest anything lower than 90 C; it depends on how much tolerance you have for noise and for wear and tear on the fans.
#86 | |
|
∂²ω=0
Sep 2002
República de California
2²×2,939 Posts |
Quote:
Oh, Matt - do you agree with Preda's comment that single-job running with appropriately tuned fan and memclock settings now gives total throughput similar to the 2-job running your script sets up for? And would it be worthwhile updating your setup-guide post to reflect some of the issues I hit with my setup under Ubuntu 19.10? Specifically:
o Recent versions of GpuOwl need libgmp-dev to be installed;
o I needed to manually remove a bunch of nVidia package crud to get the system to properly recognize the R7;
o ROCm 3.0 breaks OpenCL, so if that is the current version shipping with one's distro, it needs to be reverted to 2.10 (or perhaps fiddle the pkg-install notes to get the latter from the start);
o If single-job running can now be done at more or less the same total throughput as 2-job, that part of the setup guide can be simplified.
Last fiddled with by ewmayer on 2020-02-05 at 22:10
#87 | ||
|
"Composite as Heck"
Oct 2017
2×5²×19 Posts |
Quote:
Quote:
#88 | |
|
"Mihai Preda"
Apr 2015
2²×3×11² Posts |
Quote:
Anyway, maybe you could try something like: --setmlevel 1 1150 and see if that has an effect on performance (expected: increase in perf) and on power (expected: small increase in power).
My GPUs usually run at under 85C. I think the max safe temperature is 102-105C. In any case, in the region above 100C the GPU throttles, so I would try to keep it under 97C to avoid thermal throttling. (The values above are for the "junction" temperature, which is the highest of the three: edge, junction, mem.) The default fan curve keeps the GPU too hot, so I set a higher manual fan speed.
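The temperature bands above can be turned into a quick check for a monitoring loop. This is only a sketch of those rules of thumb: the function name and the band labels are invented here, the input is assumed to be the junction temperature in whole degrees C, and 102 (the low end of the quoted 102-105 range) is used as a conservative limit.

```shell
#!/bin/sh
# Hypothetical helper: classify a junction temperature (whole degrees C)
# using the rule-of-thumb thresholds from the post above.
classify_junction_temp() {
    t=$1
    if [ "$t" -lt 97 ]; then
        echo "ok"                      # under 97: no thermal throttling expected
    elif [ "$t" -le 100 ]; then
        echo "near-throttle"           # 97..100: close to the throttling region
    elif [ "$t" -lt 102 ]; then
        echo "throttling"              # above 100: the GPU throttles
    else
        echo "at-or-above-safe-limit"  # 102 and up: assumed conservative limit
    fi
}

# Example: classify_junction_temp 84
```

How the reading is obtained is left out on purpose, since that depends on the driver and tools installed.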