mersenneforum.org  

Old 2022-09-24, 18:15   #1
preda
"Mihai Preda"
Apr 2015

2²·3·11² Posts
Radeon VII GPU dead? (why?)

One of my Radeon VII GPUs died today. It happened during a power cycle.

I think this is what happened: I was running the GPU on a USB riser. This "mining style" riser takes power from the PSU and connects to the PCIe slot through a USB data cable. With everything off, I disconnected the USB data cable (the one linking the riser to the PCIe slot), and I forgot to reconnect it before powering on. After the reboot the GPU is no longer recognized. I tried a different slot, tried without the riser, etc.; it's basically dead.

This is the error reported in Linux for this GPU:
Code:
[   24.015181] [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 20secs aborting
[   24.015429] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing 9300 (len 1031, WS 12, PS 8) @ 0x93A0
[   24.015609] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing 9274 (len 63, WS 0, PS 8) @ 0x9295
[   24.015782] amdgpu 0000:19:00.0: amdgpu: gpu post error!
[   24.015784] amdgpu 0000:19:00.0: amdgpu: Fatal error during GPU init
[   24.015790] amdgpu 0000:19:00.0: amdgpu: amdgpu: finishing device.
[   24.015920] amdgpu: probe of 0000:19:00.0 failed with error -22
The GPU has the fans and lights working.

According to the system log above, the system does detect the GPU at some level (it still shows up on the PCIe bus as 0000:19:00.0) but can't initialize it: the atombios init tables get stuck and the GPU POST fails. Maybe some HW component on the GPU is broken?
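For anyone comparing logs from several cards, here is a minimal Python sketch that pulls the amdgpu failure lines out of `dmesg`-style text and groups them by PCI address. The function and variable names are made up for illustration, and the regular expressions assume only the line layout seen in the log above; other kernel versions may format these lines differently.

```python
import errno
import re

# Sample lines copied from the log quoted above.
SAMPLE_LOG = """\
[   24.015782] amdgpu 0000:19:00.0: amdgpu: gpu post error!
[   24.015784] amdgpu 0000:19:00.0: amdgpu: Fatal error during GPU init
[   24.015790] amdgpu 0000:19:00.0: amdgpu: amdgpu: finishing device.
[   24.015920] amdgpu: probe of 0000:19:00.0 failed with error -22
"""

# Per-device line: "[ ts] amdgpu <domain:bus:dev.fn>: <message>"
DEVICE_LINE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+amdgpu\s+"
    r"(?P<bdf>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]):\s+(?P<msg>.*)$"
)
# Final probe failure: "[ ts] amdgpu: probe of <bdf> failed with error <n>"
PROBE_LINE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+amdgpu: probe of (?P<bdf>\S+) failed with error (?P<err>-?\d+)$"
)


def summarize_amdgpu_failures(log_text):
    """Group amdgpu messages and probe errors by PCI address (hypothetical helper)."""
    summary = {}
    for raw in log_text.splitlines():
        line = raw.strip()
        probe = PROBE_LINE.match(line)
        device = DEVICE_LINE.match(line)
        if not (probe or device):
            continue  # not an amdgpu line we recognize
        match = probe or device
        entry = summary.setdefault(
            match["bdf"], {"messages": [], "probe_error": None, "errno_name": None}
        )
        if probe:
            code = int(probe["err"])
            entry["probe_error"] = code
            # Kernel probe errors are negative errno values.
            entry["errno_name"] = errno.errorcode.get(abs(code))
        else:
            entry["messages"].append(device["msg"])
    return summary
```

Calling `summarize_amdgpu_failures(SAMPLE_LOG)` returns one entry for `0000:19:00.0` holding the three device messages and the probe error. The `-22` is the negative of errno 22 (`EINVAL`), which only says that init failed, not which component is at fault.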

I really don't understand *why* it broke though. Recap: I did one boot-up with the power cables for this GPU connected and the powered riser attached to the GPU, but with the USB-PCIe data cable not connected. And the GPU died. Why? Does it make sense, from an electrical perspective, for it to be fried like that?

Does anybody have more information about what might have happened, and whether the GPU can be fixed, etc?

And a warning: when using risers, don't power on with the riser powered but the data cable disconnected; apparently some GPUs don't like that.

Last fiddled with by preda on 2022-09-24 at 18:22
Old 2022-09-24, 19:02   #2
paulunderwood
Sep 2002
Database er0rr

5×937 Posts

It might be a BIOS setting. See: https://hardwarecanucks.com/video-ca...-cables/#howto
Old 2022-09-24, 20:12   #3
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

7,823 Posts

A BIOS setting does not explain a GPU going from working to unusable in just one power cycle. Nor does it explain a Radeon VII that is demonstrably dead when cold-swapped for a running GPU in a different system (GPU A works on that system and riser, GPU B does not, put A back and all is good again, with no mucking with the BIOS). I've had some of those. In Windows Device Manager speak, that's error code 43. Those GPUs show up in eBay listings as "for parts or repair only" and commonly sell for about half the price of a fully functional Radeon VII, or less.
And I'm running all Intel-compatible motherboards, so no Gen4 PCIe. Not much Gen3 either.
Almost all are connected by 1x to 16x powered extenders similar to https://www.ebay.com/itm/393992902150
I interpret Preda's post to mean everything was connected except the blue cable between USB-A style connectors, and BOOM! at power up.
I assume Preda used sufficient anti-static precautions since Radeon VII contains delicate 7nm process features.
Looking at the BIOS reference Paul posted, I didn't see any anti-static wrist tether or other clear precautions.

Last fiddled with by kriesel on 2022-09-24 at 20:20
Old 2022-09-24, 20:26   #4
preda
"Mihai Preda"
Apr 2015

2²·3·11² Posts

Quote:
Originally Posted by kriesel
Almost all are connected by 1x to 16x powered extenders similar to https://www.ebay.com/itm/393992902150
Yes, the riser I was using looks *exactly* like the one you linked.

Quote:
I interpret Preda's post to mean everything was connected except the blue cable between USB-A style connectors, and BOOM! at power up.
Yes, that cable exactly. I'm surprised that leaving only that [data] cable disconnected would brick the GPU.

I run most of my PCIe links at Gen-1 anyway (set in the BIOS) whenever the GPU isn't plugged directly into the slot. For GpuOwl, the performance impact of Gen-1 is unnoticeable.
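To confirm what a riser-connected card actually negotiated, Linux exposes `current_link_speed` and `current_link_width` under `/sys/bus/pci/devices/<address>/`. Below is a rough Python sketch that reads them; the helper names are hypothetical, and the exact text of the speed string (for example "2.5 GT/s PCIe" versus "2.5 GT/s") varies with kernel version, so the parser only looks at the leading number.

```python
from pathlib import Path

# Per-lane transfer rate in GT/s for each PCIe generation.
GEN_BY_RATE = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4, 32.0: 5, 64.0: 6}


def pcie_gen_from_speed(text):
    """Map a sysfs link-speed string such as '2.5 GT/s PCIe' to a generation number.

    Returns None when the string is empty or does not start with a known rate.
    """
    parts = text.split()
    if not parts:
        return None
    try:
        rate = float(parts[0])
    except ValueError:
        return None  # e.g. "Unknown"
    return GEN_BY_RATE.get(rate)


def read_link(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Return (generation, lane_width) for a PCI address like '0000:19:00.0'.

    Raises OSError if the device or its link attributes are not present.
    """
    dev = Path(sysfs_root) / bdf
    speed = (dev / "current_link_speed").read_text()
    width = (dev / "current_link_width").read_text().strip()
    return pcie_gen_from_speed(speed), int(width)
```

On a x1 riser forced to Gen-1, `read_link("0000:19:00.0")` would be expected to give `(1, 1)`, assuming the card sits at that address. A Gen-1 x1 link carries roughly 250 MB/s per direction.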

Last fiddled with by preda on 2022-09-24 at 20:27
All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
