GPU Owl suddenly not running
I've been running GPU Owl for many months now on six RX 5700 XT cards. Suddenly today, none will run, and I get this:
2020-09-22 00:21:41 config: -device 0 -user Xebecer -cpu Birlinn_Oban_GPU0 -cleanup
2020-09-22 00:21:41 config: -d 0
2020-09-22 00:21:41 device 0, unique id ''
2020-09-22 00:21:41 Birlinn_Oban_GPU0 108996493 FFT: 6M 1K:12:256 (17.32 bpw)
2020-09-22 00:21:41 Birlinn_Oban_GPU0 Expected maximum carry32: 2E490000
2020-09-22 00:21:41 Birlinn_Oban_GPU0 Exception gpu_error: clGetPlatformIDs(16, platforms, (unsigned *) &nPlatforms) at clwrap.cpp:71 getDeviceIDs
2020-09-22 00:21:41 Birlinn_Oban_GPU0 Bye
|
Are you using Linux or Windows? Has the OS upgraded recently?
|
Have you tried rebooting? Or power cycling your machine?
|
[QUOTE=Xebecer;557524]
2020-09-22 00:21:41 Birlinn_Oban_GPU0 Exception gpu_error: clGetPlatformIDs(16, platforms, (unsigned *) &nPlatforms) at clwrap.cpp:71 getDeviceIDs [/QUOTE]
Does "clinfo" work? Did your system update recently, or did you upgrade something? Are you running ROCm? Did ROCm update?
|
Are any OpenCL devices listed in the gpuowl -h output, after the options and before the FFT lengths?
Can any other OpenCL utility detect your GPUs? (GPU-Z, rocm-smi, etc., depending on OS)
|
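For anyone who lands here with the same clGetPlatformIDs failure on Linux, a rough first check along the lines of the questions above might look like this. It is only a sketch: it assumes the OpenCL ICD loader reads /etc/OpenCL/vendors (the usual Linux default, but your install may differ), and the function names count_icds and check_opencl are made up for illustration.

```shell
#!/bin/sh
# Count the ICD registration files in a directory. The OpenCL loader
# uses these files to locate vendor platforms.
# ASSUMPTION: /etc/OpenCL/vendors is the directory your loader reads.
count_icds() {
    dir="${1:-/etc/OpenCL/vendors}"
    n=0
    for f in "$dir"/*.icd; do
        # An unmatched glob stays literal, so test that the file exists.
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
}

# Print the ICD count, then the first lines of clinfo output if present.
check_opencl() {
    echo "ICD files registered: $(count_icds "$1")"
    if command -v clinfo >/dev/null 2>&1; then
        clinfo 2>&1 | head -n 20
    else
        echo "clinfo is not installed"
    fi
}
```

Running check_opencl with no arguments prints the count for the default directory. A count of zero, or clinfo reporting no platforms, would point at the runtime install rather than at gpuowl itself.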
I ran apt-get update (Debian) and the system froze while installing ROCm-3.8.0 with the message "building initial module for 4.19.0-9-amd64". Rebooted. I had to do dpkg --configure -a (??) and it then updated the module for 4.19.0-10-amd64. Rebooted.
I then recompiled gpuOwl against ROCm-3.8.0. I got my 2 instances of gpuOwl running. Now preda's "pp.sh" script will not work and so I can't overclock the RAM.
|
[QUOTE=paulunderwood;557534]I ran apt-get update (Debian) and the system froze while installing ROCm-3.8.0 with the message "building initial module for 4.19.0-9-amd64". Rebooted. I had to do dpkg --configure -a (??) and it then updated the module for 4.19.0-10-amd64. Rebooted.
I then recompiled gpuOwl against ROCm-3.8.0. I got my 2 instances of gpuOwl running. Now preda's "pp.sh" script will not work and so I can't overclock the RAM.[/QUOTE]
What does the ROCm 3.8 performance look like? I understand that you can't compare directly because powerplay isn't working anymore.
I opened an issue about powerplay: [url]https://github.com/RadeonOpenCompute/ROCm/issues/1228[/url]
|
[QUOTE=preda;557537]What does the ROCm 3.8 performance look like? I understand that you can't compare directly because powerplay isn't working anymore.
I opened an issue about powerplay: [url]https://github.com/RadeonOpenCompute/ROCm/issues/1228[/url][/QUOTE]
Here are parts of my /etc/default/grub
[CODE]GRUB_CMDLINE_LINUX_DEFAULT="quiet amdgpu.ppfeaturemask=0xffffffff"[/CODE]
and /boot/grub/grub.cfg
[CODE]linux /boot/vmlinuz-4.19.0-10-amd64 root=UUID=f93eeec4-4134-4e79-b5c7-019d1dbc1ab2 ro quiet amdgpu.ppfeaturemask=0xffffffff[/CODE]
and pp.sh:
[CODE]
rocm=/opt/rocm-3.8.0/bin/rocm-smi
pp() {
    echo $*
    cd /sys/class/drm/card$1/device
    echo "m 1 $2" > pp_od_clk_voltage
    echo "vc 1 1304 $3" > pp_od_clk_voltage
    echo "vc 2 1801 $4" > pp_od_clk_voltage
    echo c > pp_od_clk_voltage
    $rocm -d$1 --setsclk $5
}
pp 0 1175 820 1050 3
[/CODE]
I tried running the commands one by one and it just hangs at [C]echo "m 1 1175" > pp_od_clk_voltage[/C]. Here is the content of that file, pp_od_clk_voltage:
[CODE]
OD_SCLK:
0: 808Mhz
1: 1801Mhz
OD_MCLK:
1: 1175Mhz
OD_VDDC_CURVE:
0: 808Mhz 715mV
1: 1304Mhz 826mV
2: 1801Mhz 1138mV
OD_RANGE:
SCLK: 808Mhz 2200Mhz
MCLK: 800Mhz 1200Mhz
VDDC_CURVE_SCLK[0]: 808Mhz 2200Mhz
VDDC_CURVE_VOLT[0]: 738mV 1218mV
VDDC_CURVE_SCLK[1]: 808Mhz 2200Mhz
VDDC_CURVE_VOLT[1]: 738mV 1218mV
VDDC_CURVE_SCLK[2]: 808Mhz 2200Mhz
VDDC_CURVE_VOLT[2]: 738mV 1218mV
[/CODE]
I just ran [C]echo c > pp_od_clk_voltage[/C] and got the overclock but not the voltage drop (I think).
|
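Since the pp_od_clk_voltage dump above includes an OD_RANGE section, one cheap sanity check before writing an "m 1 ..." line is to confirm that the requested memory clock sits inside the advertised MCLK range. The following is a minimal sketch, assuming the file keeps the layout shown in this thread; the helper names mclk_range and mclk_in_range are hypothetical, not part of pp.sh or rocm-smi. It would not explain the hang by itself, but it rules out an out-of-range request.

```shell
#!/bin/sh
# Read a pp_od_clk_voltage dump on stdin and print "min max" (in MHz)
# from the MCLK line of the OD_RANGE section.
# ASSUMPTION: the layout matches the dump posted in this thread.
mclk_range() {
    awk '
        /^OD_RANGE:/ { in_range = 1; next }
        in_range && $1 == "MCLK:" {
            sub(/Mhz$/, "", $2)
            sub(/Mhz$/, "", $3)
            print $2, $3
            exit
        }
    '
}

# Succeed only if $1 (MHz) lies within the MCLK range read from stdin.
mclk_in_range() {
    want=$1
    set -- $(mclk_range)
    [ $# -eq 2 ] && [ "$want" -ge "$1" ] && [ "$want" -le "$2" ]
}
```

Usage would be something like [C]mclk_in_range 1175 < /sys/class/drm/card0/device/pp_od_clk_voltage && echo ok[/C]. With the dump above, 1175 lies within 800 to 1200, so the value passed to pp.sh is at least one the driver advertises as allowed.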
[QUOTE=paulunderwood;557540]
I tried running the commands one by one and it just hangs at [C]echo "m 1 1175" > pp_od_clk_voltage[/C]. [/QUOTE]
Try using the rocm-smi script to set the RAM frequency; I'm curious whether that works. Something along the lines of:
rocm-smi --setmclk 2
rocm-smi --autorespond y --setmemoverdrive 10
If it hangs on memoverdrive, check "dmesg"; sometimes there is something informative at the end.
|
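To make the dmesg suggestion a bit more concrete, here is a small filter that keeps only the most recent driver-related lines. It is a sketch under assumptions: the search terms "amdgpu" and "powerplay" are a guess at what the relevant kernel messages contain, and amdgpu_tail is a made-up helper name.

```shell
#!/bin/sh
# Keep the last N (default 20) lines on stdin that mention the GPU driver.
# ASSUMPTION: relevant kernel messages contain "amdgpu" or "powerplay".
amdgpu_tail() {
    grep -i -E 'amdgpu|powerplay' | tail -n "${1:-20}"
}
```

For example, [C]dmesg | amdgpu_tail 30[/C] run right after the hang (from a second terminal, as root if dmesg is restricted) shows the last 30 matching lines.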
[QUOTE=Xebecer;557524]
2020-09-22 00:21:41 Birlinn_Oban_GPU0 Exception gpu_error: clGetPlatformIDs(16, platforms, (unsigned *) &nPlatforms) at clwrap.cpp:71 getDeviceIDs
2020-09-22 00:21:41 Birlinn_Oban_GPU0 Bye[/QUOTE]
This is exactly the error message you get when a GPU runtime is no longer available in Google Colab. So simply no GPU was found. Maybe it helps to reinstall the GPU/ROCm drivers?
|
I forced the 'Insider Ring' update to Windows, and all is well. But, that was weird.
|