#1 |
|
"Mihai Preda"
Apr 2015
2²·3·11² Posts |
One of my Radeon VII GPUs died today. It happened during a power cycle.
I think this is what happened: I was running the GPU on a USB riser. This "mining style" riser has a power input from the PSU and a USB data cable to the PCIe slot. With everything off, I disconnected the USB data cable (the one linking the riser to the PCIe slot), and I forgot to reconnect it before powering on. Upon reboot the GPU is not recognized anymore. I tried a different slot, a direct connection without the riser, etc.; it's dead. This is the error reported in Linux for this GPU: Code:
[ 24.015181] [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 20secs aborting
[ 24.015429] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing 9300 (len 1031, WS 12, PS 8) @ 0x93A0
[ 24.015609] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing 9274 (len 63, WS 0, PS 8) @ 0x9295
[ 24.015782] amdgpu 0000:19:00.0: amdgpu: gpu post error!
[ 24.015784] amdgpu 0000:19:00.0: amdgpu: Fatal error during GPU init
[ 24.015790] amdgpu 0000:19:00.0: amdgpu: amdgpu: finishing device.
[ 24.015920] amdgpu: probe of 0000:19:00.0 failed with error -22
According to the system log above, the system does detect the GPU at some level but can't initialize it: the GPU's init sequence does not complete. Maybe some hardware component on the GPU is broken?
I really don't understand *why* it broke, though. To recap: I did one boot-up with the power cables for this GPU connected and the powered riser attached to the GPU, but with the USB-PCIe data cable not connected. And the GPU died. Why? Does it make sense, from an electrical perspective, for it to be fried like that? Does anybody have more information about what might have happened, and whether the GPU can be fixed?
And a warning: when using risers, don't power on with the riser powered but the data cable disconnected; apparently some GPUs don't like that.
Last fiddled with by preda on 2022-09-24 at 18:22 |
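For anyone triaging a similar log, here is a minimal Python sketch that pulls failed driver probes out of kernel log text and translates the numeric error into its errno name. The function name `find_probe_failures` and the regular expression are illustrative assumptions, not part of any amdgpu tooling; the pattern is only known to match the "probe of ... failed with error" line format shown in the log above.

```python
import errno
import re

# Matches kernel lines such as:
#   "amdgpu: probe of 0000:19:00.0 failed with error -22"
# (pattern is an assumption based on the single log line in this thread)
PROBE_FAIL = re.compile(
    r"(?P<driver>\w+): probe of "
    r"(?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"failed with error -(?P<code>\d+)"
)

def find_probe_failures(log_text):
    """Return a list of (driver, pci_address, errno_name) for each failed probe."""
    failures = []
    for match in PROBE_FAIL.finditer(log_text):
        code = int(match.group("code"))
        # errno.errorcode maps numeric codes to symbolic names, e.g. 22 -> "EINVAL"
        name = errno.errorcode.get(code, "unknown")
        failures.append((match.group("driver"), match.group("addr"), name))
    return failures

sample = "[ 24.015920] amdgpu: probe of 0000:19:00.0 failed with error -22"
print(find_probe_failures(sample))
```

Error -22 is EINVAL, which only says that driver initialization was rejected; it does not by itself identify which hardware component failed.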
#2 |
|
Sep 2002
Database er0rr
5×937 Posts |
It might be a BIOS setting. See: https://hardwarecanucks.com/video-ca...-cables/#howto
#3 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7,823 Posts |
A BIOS setting does not explain a GPU going from working to unusable in a single power cycle. Nor does it explain a Radeon VII that is demonstrably dead when cold-swapped for a running GPU in a different system (GPU A works on that system and riser, GPU B does not, put A back and all is good again, with no mucking with the BIOS). I've had some of those. In Windows Device Manager speak, Error code 43. Those GPUs show up in eBay listings as "for parts or repair only" and commonly sell for about half the price of a fully functional Radeon VII, or less.
And I'm running all Intel compatible motherboards, so no Gen4 PCIe. Not much Gen3 either. Almost all are connected by 1x to 16x powered extenders similar to https://www.ebay.com/itm/393992902150
I interpret Preda's post to mean everything was connected except the blue cable between the USB-A style connectors, and BOOM! at power-up. I assume Preda took sufficient anti-static precautions, since the Radeon VII contains delicate 7nm process features. Looking at the BIOS reference Paul posted, I didn't see any anti-static wrist tether or other clear precautions.
Last fiddled with by kriesel on 2022-09-24 at 20:20 |
#4 | ||
|
"Mihai Preda"
Apr 2015
2²·3·11² Posts |
Quote:
Quote:
I'm running most of my PCIe links as Gen-1 anyway (set in the BIOS) when the GPU is not plugged directly into the slot. For GpuOwl the performance impact of Gen-1 is unnoticeable.
Last fiddled with by preda on 2022-09-24 at 20:27 |
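To confirm which generation a link actually negotiated on Linux, one option is to read the `current_link_speed` attribute that the kernel exposes under `/sys/bus/pci/devices/<address>/`. Below is a rough sketch; the helper names `pcie_generation` and `current_generation` are made up for illustration, and it assumes the attribute text starts with the per-lane rate in GT/s (for example "2.5 GT/s PCIe"), which may vary between kernel versions.

```python
from pathlib import Path

# Per-lane transfer rate in GT/s for each PCIe generation.
GEN_BY_RATE = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4, 32.0: 5}

def pcie_generation(speed_text):
    """Map a link-speed string such as '2.5 GT/s PCIe' to a generation number.

    Returns None when the text does not start with a recognized rate.
    """
    parts = speed_text.split()
    if not parts:
        return None
    try:
        rate = float(parts[0])
    except ValueError:
        return None
    return GEN_BY_RATE.get(rate)

def current_generation(pci_addr):
    """Read the negotiated link speed for a device, e.g. '0000:19:00.0' (Linux only)."""
    path = Path("/sys/bus/pci/devices") / pci_addr / "current_link_speed"
    return pcie_generation(path.read_text())
```

On a live system, `current_generation("0000:19:00.0")` would report the generation for the device at that address, provided the device is still enumerated on the bus.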