#1 |
Jun 2019
Ipswich, MA
2²·3 Posts |
I've been running GPU Owl for many months now on six RX 5700 XT cards. Suddenly today, none will run, and I get this:
2020-09-22 00:21:41 config: -device 0 -user Xebecer -cpu Birlinn_Oban_GPU0 -cleanup
2020-09-22 00:21:41 config: -d 0
2020-09-22 00:21:41 device 0, unique id ''
2020-09-22 00:21:41 Birlinn_Oban_GPU0 108996493 FFT: 6M 1K:12:256 (17.32 bpw)
2020-09-22 00:21:41 Birlinn_Oban_GPU0 Expected maximum carry32: 2E490000
2020-09-22 00:21:41 Birlinn_Oban_GPU0 Exception gpu_error: clGetPlatformIDs(16, platforms, (unsigned *) &nPlatforms) at clwrap.cpp:71 getDeviceIDs
2020-09-22 00:21:41 Birlinn_Oban_GPU0 Bye

Last fiddled with by Xebecer on 2020-09-22 at 04:38
#2 |
Sep 2002
Database er0rr
5×29×31 Posts |
Are you using Linux or Windows? Has the OS upgraded recently?
#3 |
"/X\(‘-‘)/X\"
Jan 2013
3106₁₀ Posts |
Have you tried rebooting? Or power cycling your machine?
#4 |
"Mihai Preda"
Apr 2015
5·17² Posts |
#5 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1CC6₁₆ Posts |
Are any OpenCL devices listed in the gpuowl -h output, after the options and before the FFT lengths?
Is any other OpenCL utility able to detect your GPUs (GPU-Z, rocm-smi, etc., depending on OS)?
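On Linux, one quick way to check the second question is to count the OpenCL platforms that `clinfo` reports. This is only a sketch: it assumes the `clinfo` utility is installed (the thread does not mention it), and `count_platforms` is a hypothetical helper name. A count of 0 would suggest the `clGetPlatformIDs` failure comes from the OpenCL runtime or driver install rather than from gpuowl itself.

```shell
# count_platforms: read `clinfo` output on stdin and print the platform count.
# Prints 0 when no "Number of platforms" line is found (e.g. clinfo failed).
count_platforms() {
    awk '/^Number of platforms/ { print $NF; found = 1; exit }
         END { if (!found) print 0 }'
}

# Typical use (assumes clinfo is installed):
#   clinfo | count_platforms
```

Reading from stdin keeps the helper independent of where the `clinfo` output comes from, so a saved log can be checked the same way.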
![]() |
![]() |
![]() |
#6 |
Sep 2002
Database er0rr
5·29·31 Posts |
I ran apt-get update (Debian) and the system froze while installing ROCm-3.8.0 with the message "building initial module for 4.19.0-9-amd64". Rebooted. I had to run dpkg --configure -a (??), which then updated the module for 4.19.0-10-amd64. Rebooted.
I then recompiled gpuOwl against ROCm-3.8.0 and got my 2 instances of gpuOwl running. Now preda's "pp.sh" script will not work, so I can't overclock the RAM.

Last fiddled with by paulunderwood on 2020-09-22 at 07:29
#7 | |
"Mihai Preda"
Apr 2015
5·17² Posts |
I opened an issue about powerplay: https://github.com/RadeonOpenCompute/ROCm/issues/1228

Last fiddled with by preda on 2020-09-22 at 09:22
#8 | |
Sep 2002
Database er0rr
4495₁₀ Posts |
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amdgpu.ppfeaturemask=0xffffffff"
Code:
linux /boot/vmlinuz-4.19.0-10-amd64 root=UUID=f93eeec4-4134-4e79-b5c7-019d1dbc1ab2 ro quiet amdgpu.ppfeaturemask=0xffffffff
Code:
rocm=/opt/rocm-3.8.0/bin/rocm-smi
pp() {
    echo $*
    cd /sys/class/drm/card$1/device
    echo "m 1 $2" > pp_od_clk_voltage
    echo "vc 1 1304 $3" > pp_od_clk_voltage
    echo "vc 2 1801 $4" > pp_od_clk_voltage
    echo c > pp_od_clk_voltage
    $rocm -d$1 --setsclk $5
}
pp 0 1175 820 1050 3

Here is the content of that file pp_od_clk_voltage:
Code:
OD_SCLK:
0: 808Mhz
1: 1801Mhz
OD_MCLK:
1: 1175Mhz
OD_VDDC_CURVE:
0: 808Mhz 715mV
1: 1304Mhz 826mV
2: 1801Mhz 1138mV
OD_RANGE:
SCLK: 808Mhz 2200Mhz
MCLK: 800Mhz 1200Mhz
VDDC_CURVE_SCLK[0]: 808Mhz 2200Mhz
VDDC_CURVE_VOLT[0]: 738mV 1218mV
VDDC_CURVE_SCLK[1]: 808Mhz 2200Mhz
VDDC_CURVE_VOLT[1]: 738mV 1218mV
VDDC_CURVE_SCLK[2]: 808Mhz 2200Mhz
VDDC_CURVE_VOLT[2]: 738mV 1218mV

Last fiddled with by paulunderwood on 2020-09-22 at 10:32
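Before writing to `pp_od_clk_voltage`, it can help to confirm that the requested memory clock sits inside the limits the driver reports under `OD_RANGE`. A minimal sketch, assuming the sysfs file puts each entry on its own line; `mclk_in_range` is a hypothetical helper and not part of pp.sh or rocm-smi.

```shell
# mclk_in_range FILE MHZ
# Succeeds (exit 0) when MHZ lies within the "MCLK:" limits
# listed in the OD_RANGE section of FILE.
mclk_in_range() {
    awk -v want="$2" '
        /^OD_RANGE:/ { in_range = 1; next }
        in_range && $1 == "MCLK:" {
            lo = $2 + 0; hi = $3 + 0   # awk reads "800Mhz" + 0 as 800
            ok = (want + 0 >= lo && want + 0 <= hi)
        }
        END { exit !ok }
    ' "$1"
}

# Example with the values from this post (card 0, 1175 MHz):
#   mclk_in_range /sys/class/drm/card0/device/pp_od_clk_voltage 1175 && echo "in range"
```

With the file contents quoted in this post, 1175 falls inside the reported 800 to 1200 MHz range, so a failing script is probably not caused by an out-of-range memory clock.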
#9 | |
"Mihai Preda"
Apr 2015
5·17² Posts |
rocm-smi --setmclk 2
rocm-smi --autorespond y --setmemoverdrive 10
If it hangs on memoverdrive, check "dmesg"; sometimes there is something informative at the end.
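The two commands can be wrapped so that a hang does not block the shell and the kernel log is shown automatically. This is a sketch only: `try_cmd` is a hypothetical helper, it relies on `timeout` from GNU coreutils, and reading `dmesg` may need root on some systems.

```shell
# try_cmd SECONDS CMD...
# Run CMD, giving up after SECONDS. On failure or timeout, print the last
# kernel log lines mentioning amdgpu and return the command's exit status
# (124 means `timeout` had to stop it).
try_cmd() {
    secs=$1
    shift
    timeout "$secs" "$@" && return 0
    code=$?
    dmesg 2>/dev/null | grep -i amdgpu | tail -n 20
    return "$code"
}

# Example (flags taken from this post):
#   try_cmd 30 rocm-smi --setmclk 2
#   try_cmd 30 rocm-smi --autorespond y --setmemoverdrive 10
```

If the process is stuck inside the driver itself, `timeout` may not be able to stop it, in which case checking `dmesg` from another terminal is still the fallback.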
#10 |
Jul 2009
Germany
11·61 Posts |
This error message is exactly the same as the one you get when you no longer have a GPU runtime available in Google Colab. So simply no GPU was found. Maybe reinstalling the GPU/ROCm drivers would help?

Last fiddled with by moebius on 2020-09-22 at 21:13
#11 |
Jun 2019
Ipswich, MA
C₁₆ Posts |
I forced the 'Insider Ring' update to Windows, and all is well. But, that was weird.