mersenneforum.org Radeon VII @ newegg for 500 dollars US on 11-27

2021-03-14, 06:42   #364
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·3·5·227 Posts

Quote:
 Originally Posted by ewmayer a strange yen
Is that what you get when the Japanese mint is having a bad day? Good for numismatists though.

Last fiddled with by kriesel on 2021-03-14 at 06:43

2021-03-15, 20:21   #365
clowns789

Jun 2003
The Computer

2⁴·5² Posts

If anyone is in the market for the pro version, there is one for $2,300 on eBay, ending in less than three days. It was sold already, but apparently canceled by the winning bidder and relisted. While it's a good chunk of change, it's not as marked up percentage-wise as the consumer version. I have been told by Ken Kriesel that the pro version is not expected to be faster for many of our tasks due to memory constraints, but it might be nice to have nonetheless.

https://www.ebay.com/itm/AMD-Radeon-...oAAOSwZt1gQP1c

2021-03-15, 20:39   #366
paulunderwood

Sep 2002
Database er0rr

1000010110110₂ Posts

Quote:
 Originally Posted by clowns789 If anyone is in the market for the pro version, there is one for $2,300 on eBay, ending in less than three days. It was sold already, but apparently canceled by the winning bidder and relisted. While it's a good chunk of change, it's not as marked up percentage-wise as the consumer version. I have been told by Ken Kriesel that the pro version is not expected to be faster for many of our tasks due to memory constraints, but it might be nice to have nonetheless. https://www.ebay.com/itm/AMD-Radeon-...oAAOSwZt1gQP1c
You should be able to afford it if you find the next Mersenne prime

2022-04-09, 20:16   #367
ewmayer
2ω=0

Sep 2002
República de California

11,743 Posts

2 of the 3 R7s in my 3-GPU open-frame desktop rig have been nonfunctional since last fall due to the issue detailed in the copied-below e-mail thread between myself and Mike/Xyzzy.

I am trying to find out how one might diagnose whether the actual microelectronics are still functional and maybe just need reflashing, or whether the whole shebang is shorted due to the psu-plug issue. Do any of our readers have experience with this sort of thing?

The PSU itself is fine aside from the one ruined plug (pic attached), but obviously contains a lot less sensitive electronics than the GPU.

Quote:
 29 Sep 2021 EWM: My 3-GPU rig has been crashing after 12-24hrs uptime ... today when I shut off and powered back up, the lights/fans on just 1 GPU came on. Suspect the PSU (a Corsair, identical to the model that flaked out after ~9 months - so much for "ultra reliable") has gone tilt. It's been drawing 750-800W-at-wall, thus putting out maybe 85% of that on the machine side, which seems reasonably within the 850W rated limit, given the brand's honest-rating reputation. More later.
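For anyone wanting to redo the wall-draw arithmetic in that message, here is a minimal sketch. The 85% efficiency is EWM's own rough estimate rather than a measured figure, and the function name `dc_load` is purely illustrative.

```python
def dc_load(wall_watts, efficiency, rated_watts):
    """Return (estimated DC-side watts, fraction of the PSU's rated output)."""
    dc_watts = wall_watts * efficiency
    return dc_watts, dc_watts / rated_watts


# Figures from the message: 750-800 W at the wall, ~85% efficiency (a guess),
# and an 850 W rated PSU.
for wall in (750, 800):
    dc, frac = dc_load(wall, efficiency=0.85, rated_watts=850)
    print(f"{wall} W at wall -> ~{dc:.0f} W DC, {frac:.0%} of rated output")
```

Under those assumptions the total load comes out at roughly 75-80% of the rating. Note that this only covers the PSU's total output; it says nothing about how much current any single cable or plug is carrying.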
Quote:
 01 Oct 2021, EWM: Update on this - The system is able to run stably with just the one "light still on" GPU, figured maybe that one just happened to have the lowest-Ohmage connection to the PSU. Next did detailed unplug-pull-out and trace-wires inspection of the 3-GPU rig yesterday - I had foolishly used a single PCIe plug and splitter to power 2 of the 3, because back in the wild-n-crazy days when I was trying to run a 4th GPU - all underclocked, but still - off the same RM850 PSU that proved to be the only cabling solution that seemed to work semi-stably. After selling GPU #4 last Fall I left remaining cables as-is. Figured since one of the same-PCIe GPU pair was plugged into full-width mobo PCI slot and both were underclocked, the single PCIe ribbon cable could handle the wattage. Wrong! But since the PSU-side was the most deeply-buried (i.e. hard to see) plug of the set, didn't realize until I dug down in there ... plug partially melted, vinyl sheathing crumbled in my hand on removal. So figured, ok, maybe it's not the PSU after all - even with the one PCIe socket rendered unusable by melted plastic crud in half the holes, still 3 or 4 unused ones left, plus the one driving the dedicated GPU which still operates fine. So plugged 2 fresh PCIe cables into remaining PSU-side sockets, ran one to each of GPU 2 and 3, tried power-up. Lights and fans on all 3 GPUs came on, OK, but fans on 2 and 3 were running full blast, which was weird. '/opt/rocm/bin/rocm-smi' showed just a single device numbered 0, same as with just GPU 1 plugged in. That GPU used to be at dev #1, but apparently if just 1 GPU is detected, rocm calls that dev 0. Powered back down, unplugged 2 and 3 - no point running fans full blast if GPU not seen by system - was able to run gpuowl on GPU 1 overnight again. Today powered down and switched the PCIe plug from 1 to 2, on powerup, rocm-smi gave "WARNING: No AMD GPUs specified". So the plug-meltage incident appears to have borked 2 and 3. 
Do you know if there is still a strong market (e.g. professional refurb-folks) for "lights come on, fans spin, but no other signs of life" R7s? Lesson learned: never skimp on the power cabling, and for high-draw components spread the load over as many power plugs as reasonably possible, for redundancy.
Quote:
02 Oct 2021, Xyzzy: have you tested each one alone? in the first slot?
Quote:
 Do you know if there is still a strong market (e.g. professional refurb-folks) for "lights come on, fans spin, but no other signs of life" R7s?
laurv sometimes buys broken cards to fix
[EWM: LaurV says his card-repair skills are mostly on the nVidia side, and more the power side of things, replacing blown MOSFETs and the components around them.]

[Long hiatus - too busy with other things, and can't afford the electricity bill with more than the remaining 3 R7s running, anyway.]

Quote:
 07 Apr 2022, EWM: I finally got around to trying one of the suspected-to-be-b0rked R7s in the 1st PCI slot on the mobo of my open-test-frame desktop rig, which I'd been using for the remaining good GPU - simply swapped the 2 GPUs on the mobo. Plugged PCIe power-cabling into the suspected-bad one, on powerup same symptoms as described below, lights come on, fan runs full-blast, no device found. Shutdown, swapped power-cabling back into working GPU (now in 2nd PCI slot), it works fine. So need a way to find out if the affected GPU itself is OK and just needs reflashing or whatever, or whether it's a total loss.
Quote:
 07 Apr 2022, Xyzzy: just so i understand it, you had only 1 gpu plugged into the mobo for this test?
Quote:
 7 Apr 2022 EWM: No - swapped PCI slots of working gpu with 1 of the 2 borked ones. I've only been using one 2x8-plug PCIe cable since borkage to minimize cabling mess, hooked that up to the borked gpu this time and booted up - no gpu detected, though power was clearly getting to it based on lights/fans. Shut down and moved power cable back to working gpu, now housed in its new PCI slot, booted back up, it runs fine as expected. So the problem is clearly not with the mobo, nor the PSU.
[EWM: LaurV notes: "Power getting to lights (probably 5V or 3.3V) and fans (probably 12V) doesn't mean that the power gets to the GPU itself (probably 1.8V, separate MOSFETs that could be burned, on a different power branch)."]

2022-04-10, 03:08   #368
paulunderwood

Sep 2002
Database er0rr

2·3·23·31 Posts

Try only one card in pci-e slot 1 with 2 feeds from the PSU. If it fails to show, try it in pci-e slot 2. If it still fails, try slot 3, etc. If and when it boots and you get a video out, you know the card is good. And you now know which slots are good. So try all slots with the known good card.

Using the same cables, next try only card 2 in a known good slot. Ditto card 3 by itself. Using the same cables again, verify each of the PSU's cable slots. Next verify the pci-e cables by cycling them in a good card and slot. I know, it's a lot of testing!

Using an 850W PSU with insufficient cabling is asking for trouble. At least get some more cables from eBay. I also recommend a beefier PSU so that your rig draws near 50%.

PS: I might have some spare unused pci-e cables from a Corsair AX860.

Last fiddled with by paulunderwood on 2022-04-10 at 03:57
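The one-variable-at-a-time procedure above can be written down as a checklist generator. This is only a sketch: the labels (GPU1, slot1, socket A, cable1) are hypothetical, and for simplicity it treats the power feed as a single socket/cable pair rather than the two feeds recommended in the post.

```python
from typing import Dict, Iterator, List


def isolation_plan(baseline: Dict[str, str],
                   candidates: Dict[str, List[str]]) -> Iterator[Dict[str, str]]:
    """Yield test setups that differ from the known-good baseline in exactly one component."""
    for kind, options in candidates.items():
        for option in options:
            if option != baseline[kind]:
                # Copy the baseline and swap only this one component.
                yield {**baseline, kind: option}


# Hypothetical labels; the baseline is whatever combination is already known to boot.
baseline = {"card": "GPU1", "slot": "slot1", "psu_socket": "A", "cable": "cable1"}
candidates = {
    "slot": ["slot1", "slot2", "slot3"],
    "card": ["GPU1", "GPU2", "GPU3"],
    "psu_socket": ["A", "B", "C"],
    "cable": ["cable1", "cable2"],
}

for step, setup in enumerate(isolation_plan(baseline, candidates), start=1):
    print(step, setup)
```

Because each setup changes a single component relative to a combination that is known to work, a failure at any step points at that one component.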
2022-04-10, 08:21   #369
preda

"Mihai Preda"
Apr 2015

19·73 Posts

I had one R7 GPU die on me following something PSU related, not sure exactly what happened there. But the symptoms were similar: the lights/fans were ON but the GPU was not detected. I looked in the kernel log ("sudo dmesg"), and I saw the kernel reporting some errors for the affected GPU (which was thus not initialized properly, and did not appear in the list of "initialized" GPUs later on).

I contacted the manufacturer (XFX) as the GPU was still in its warranty period, and they obliged. I shipped the GPU overseas to their factory somewhere in Asia, and about a month later I received back a working GPU from them.

Unfortunately it seemed that, at least in my case, the GPU was hardware-affected and a simple BIOS re-flash would not have fixed it.
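A small sketch of the kernel-log check described above, for Linux. It runs plain `dmesg` and keeps lines that mention the `amdgpu` driver together with a problem keyword. The keyword list is an assumption chosen for illustration, not an official set of messages, and the exact wording of kernel errors varies by driver version.

```python
import subprocess

KEYWORDS = ("error", "fail")  # assumed search terms, not an official list


def gpu_problem_lines(log_text, driver="amdgpu", keywords=KEYWORDS):
    """Return log lines that mention the driver and at least one keyword (case-insensitive)."""
    hits = []
    for line in log_text.splitlines():
        lower = line.lower()
        if driver in lower and any(k in lower for k in keywords):
            hits.append(line)
    return hits


def read_kernel_log():
    """Run `dmesg` and return its output; returns '' if dmesg cannot be started.

    Reading the kernel log may need root on some systems (the post used `sudo dmesg`).
    """
    try:
        result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    except OSError:
        return ""
    return result.stdout


if __name__ == "__main__":
    for line in gpu_problem_lines(read_kernel_log()):
        print(line)
```

An empty result does not prove the card is healthy: a GPU that never enumerates on the PCIe bus may leave no driver messages at all.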
2022-04-10, 09:34   #370
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2×3×5×227 Posts

Quote:
 Originally Posted by preda I shipped the GPU overseas to their factory somewhere in Asia, and about a month later I received back a working GPU from them.
When was that? No such luck here, working through XFX California. They verified what I already knew: the cards had failed. They did not repair, they did not replace; they only offered exchange for an RX 5700 XT, or payment of the original purchase price without sales tax or shipping reimbursement, and that was only as original purchaser with full documentation. What I was paid for 2 corpses was about enough to buy one used Radeon VII replacement at the time.

A motherboard's power handling component failing took out 3 of 5 GPUs installed on it at the time. I was out of luck on the warranty claim for the third, since I had not saved or printed its original purchase documentation, and my Best Buy login stopped working by the time the GPUs did.

An RX 5700 XT was not an acceptable exchange to me, because:
half the GPU ram (limits max exponent in gpuowl P-1)
less than half the GPU ram bandwidth
less than half the PRP performance (iterations/sec) in gpuowl
~18% the DP performance
the XFX 5700 XT sample I had bought is terribly unreliable
even an RX 6900 XT is not the equal of a Radeon VII, which they would not consider providing
Overall, it was a disappointing warranty claim experience with XFX.

I suppose from their perspective, paying full original price for something near end of warranty might be thought of as generous.
But at the time, working used Radeon VIIs were selling for up to 4 times original sale price.

On that particular motherboard, PCIe power is fed from connectors from both ends of the board.
A component near the CPU socket failed spectacularly, with arcing and a little flame.
Running mfakto on the IGP, along with prime95 on the CPU and gpuowl on all GPUs, is what sent it over the edge. CPU installed was i7-4790 (110 watt TDP). Issue was reproduced on a replacement motherboard. (Definitely a destructive test. I don't run mfakto on IGP on such GPU-heavy systems any more.)
GPUs affected were alternate; of ABCDE, A C E failed, B and D positions survived.
Whatever the damage was, they produced Code 43 errors in Windows afterward; not usable on other systems in Windows or Linux.

Last fiddled with by kriesel on 2022-04-10 at 09:45

2022-04-10, 17:22   #371
preda

"Mihai Preda"
Apr 2015

19×73 Posts

Quote:
 Originally Posted by kriesel When was that?
It was in December 2020.

I imagine it must have been painful to lose so many R7s, sorry for that. Indeed a pity they're not made anymore; in retrospect I should've bought a few more just for my personal use.

2022-04-10, 19:43   #372
kruoli

"Oliver"
Sep 2017
Porta Westfalica, DE

2²·3·5·19 Posts

Not directly related to Radeon VII's, but I had major problems with gpuowl when I was remoting into my machine with a graphical session. After avoiding it, everything went fine. But this was on Windows… I hope to give you some motivation to look at "weird" possibilities.

2022-04-10, 20:06   #373
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·3·5·227 Posts

Quote:
 Originally Posted by kruoli Not directly related to Radeon VII's, but I had major problems with gpuowl when I was remoting into my machine with a graphical session.
That's my SOP for gpuowl or other GIMPS apps on Windows. GPU-Z doesn't work correctly with AMD drivers, Win7, and remote desktop, but GIMPS apps do.

2022-05-07, 21:15   #374
Viliam Furik

"Viliam Furík"
Jul 2018
Martin, Slovakia

19·41 Posts

I'd like to offer a Radeon VII for sale, for $750, including the shipping costs to pretty much anywhere. It had its fans changed recently, so they should last for a long time hopefully. Please, PM me if you are interested.


All times are UTC.