mersenneforum.org  

Old 2020-08-04, 21:53   #309
ewmayer
2ω=0
 
Sep 2002
República de California

2·13·443 Posts

@sdbardwick: Thanks for the detailed explication. In my proposed split-and-recombine-power scheme, I will be taking power from no fewer than 8 SATA connectors - 4 along each of 2 cables, each of which plugs into a 6-pin Peripheral&SATA output of the PSU. Each pair of adjacent SATA plugs gets combined into a 6-pin PCIe-plug output. Each pair of those gets combined via Y-connector into an 8-pin PCIe-plug output. The resulting pair of 8-pin PCIe-plug outputs feeds power to the GPU. So assuming I'm downclocking my card to draw no more than (say) 200W, what strikes you as the most likely points of failure in the above scheme?

In any event, the experiment will consist of baby steps:

0. Hook above pair of spaghetti hookups into 2 6-pin Peripheral&SATA outputs of the PSU, but not into a GPU. Boot up, see if the 3 cards currently connected function normally for, say, 24 hours;

1. If step [0] succeeds, power down, plug the 2 8-pin PCIe-plug outputs of the spaghetti into card 4. Boot up, see if card 4 is recognized by the system, and if the other 3 cards can run stably with card 4 idling;

2. If step [1] succeeds, try running code on card 4 at the lowest sclk = 1 setting, and as long as system proves stable and total-wattage at wall <= 850-900W (I've run 3 cards for weeks at a time drawing the full rated 850W - it's clearly a good PSU), gradually up the sclk setting, one notch at a time. (The maximum I expect to be able to run is all 4 cards at sclk=3, but more likely some mix of cards at sclk=3 and at sclk=2.)
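
The one-notch-at-a-time ramp in step 2 can be sketched as a small decision rule. This is only an illustration under stated assumptions: the function name `next_sclk`, the 875 W ceiling (simply the midpoint of the 850-900W range above) and the default level bounds of 1 and 3 are placeholders. Reading the wall meter and applying the setting are left to whatever tools are actually in use.

```python
def next_sclk(current_sclk, wall_watts, stable,
              min_sclk=1, max_sclk=3, wall_limit=875.0):
    """Return the sclk level to try next on card 4 (hypothetical helper).

    Back off one notch if the last run was unstable or the wall draw
    exceeded the limit; otherwise step up one notch. The result is
    always clamped to the range [min_sclk, max_sclk].
    """
    if not stable or wall_watts > wall_limit:
        return max(current_sclk - 1, min_sclk)
    return min(current_sclk + 1, max_sclk)
```

For example, a stable run at sclk=1 drawing 700 W at the wall suggests trying sclk=2 next, while a run at sclk=2 drawing 900 W suggests dropping back to sclk=1.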
Old 2020-08-05, 07:28   #310
sdbardwick
 
Aug 2002
North San Diego County

2³×5×17 Posts

With the described setup, IMO the most likely points of failure are the 2 12V pins in the PSU. 2nd most likely is one of the adapters, just because adapters are almost always bad quality (bad pins/sockets, poor assembly, incorrect wire gauge, cheap plastic that burns rather than melts under high temperatures [although cheap plastic is less common now], bad wire crimps to connectors, etc.) and introduce new points of failure at every junction - plus you are putting a high-current-draw device on them, which will expose weaknesses that wouldn't bother lighter loads.



Would I try it? It's pushing the edge, but I would. I don't see a high likelihood of a system-killing fault. However, if it works, I'd solder up my own cables to eliminate the various adapters. All those connections can go wonky over time - heat cycling can loosen them.
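
To put rough numbers on the "2 12V pins in the PSU" concern, here is a back-of-envelope sketch. It assumes the 200W target from the question is drawn entirely from the 12 V rail through the two 8-pin plugs, that the load splits evenly across both PSU outputs and all 8 SATA plugs, and that "2 12V pins" means 2 per PSU output. The 1.5 A per-pin figure for SATA power connectors (three 12 V pins each) is the commonly quoted rating. The function name is made up for illustration.

```python
def amps_per_pin(watts, volts=12.0, paths=1, pins_per_path=1):
    """Current per pin when `watts` is shared evenly over paths * pins_per_path pins."""
    return watts / volts / (paths * pins_per_path)

gpu_watts = 200.0  # downclocked target from the question (assumed all on 12 V)

# Two PSU outputs, assumed 2 x 12 V pins each.
psu_pin_amps = amps_per_pin(gpu_watts, paths=2, pins_per_path=2)

# Eight SATA plugs, 3 x 12 V pins each, commonly rated about 1.5 A per pin.
sata_pin_amps = amps_per_pin(gpu_watts, paths=8, pins_per_path=3)

print(f"per PSU 12 V pin:  {psu_pin_amps:.2f} A")
print(f"per SATA 12 V pin: {sata_pin_amps:.2f} A (rating ~1.5 A)")
```

On these assumptions the SATA pins sit well under their rating at about 0.69 A each, while each PSU-side pin carries about 4.17 A, roughly six times as much, which fits the ranking given above.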
Old 2020-08-06, 22:45   #311
ewmayer
2ω=0
 
Sep 2002
República de California

2×13×443 Posts

While I await delivery of the ingredients for my Frankencable, been fiddling around with a few other things on the desktop testframe build, and at least figured out one thing about my failed attempts to run a 4th R7 in that system which had been bugging me.

The symptomology: The 3 currently-running GPUs consist of 2 hooked into the most widely-spaced pair of the 3 full-width (16x) pcie slots of my mobo, leaving ~2in (5cm) space between the 2 cards, enough to allow good airflow to/from both. In between those 2 cards we have a 1x pcie slot, another (unused) 16x slot, and a second 1x slot. Card 3, the one sitting horizontally in a custom bracket above the Celeron CPU of the system, hooks into one of those 1x slots via standard powered riser. First attempt to add 4th card used the 2nd 1x slot and a second identical-model riser. On boot, all 4 cards were recognized by the OS, but trying to run code on all 4 caused repeatable system crashes, apparently due to the 12V PSU power-rail-overload issue my Frankencable is intended to solve.

So - this was before PhilF raised the 12V-supply possibility - I figured it might be some quirk of the system or OS which wasn't liking the 4th card. At that point my under-desk Haswell ATX-case system was full up with 2 R7s (in terms of using all the available 16x slots and staying within its lesser-and-older PSU's limits), but I figured if it was some "you may connect only 3 cards" OS quirk, I could work around that by leaving card 4 sitting next to the testframe build in the same spot as before, but hooking its riser data-connection into the remaining 1x pcie slot of the Haswell mobo via a long USB 3.0 cable, while leaving the power connects (2 8-pin plugs into the GPU, one SATA plug into the riser) hooked into the desktop system's PSU. On boot of both systems, card 4 was not recognized at all: the lights were on, but clearly no data was being sent. I suspected maybe the 6' USB 3.0 cable I'd bought for the experiment was bad, but using it to connect my sole remaining Android-phone-running-Mlucas to a USB charger plug worked fine, so cable clearly OK. Puzzled, I lastly tried moving the riser data connection for card 4 back from the Haswell to the desktop system, hoping to at least get back to the "all 4 cards recognized, but you may run only 3 stably" state resulting from my initial attempt. Here the mystery - now card 4 again not seen at all.

There I left it until today - while doing some other fiddling, came the thought "OK, both of the USB cables you hooked card 4's riser up with are supposedly basic USB 3.0 ... but the one time card 4 was recognized was using the short blue USB 3.0 cable that shipped with the riser, whereas both attempts where card 4 was unrecognized used the longer 6' accessory cable. What if there really is something about the short cable that makes it work correctly with the riser, whereas the other cable won't"? Classic differential diagnosis, House MD style - and House was modeled after Sherlock Holmes, so the Holmesian "when you have eliminated the impossible..." aphorism comes into play: It makes no sense that a merely-longer USB 3.0 cable wouldn't work with the riser, but I know that using the short USB 3.0 cable that shipped with the riser *did* work. Swapped that one back in, voila! Card 4 again recognized.

So that problem is solved, but if someone could explain why-the-one-cable-works-with-the-riser-and-the-other-doesn't I would be most appreciative. The descriptive text at the above riser-card link says this: "60cm USB 3.0 Riser Cable: With the multi-layer shielded wire, the cable is well-designed for placement of your PCI-E devices and can be extended to 3 meters without any signal loss and interference."
Old 2020-08-06, 23:34   #312
sdbardwick
 
Aug 2002
North San Diego County

2A8₁₆ Posts

Remember, the USB cable is not using USB protocol on those links. They are just basically being used as physical wire extensions of the PCI-E slot. The quality requirements for the USB cable will greatly increase as length increases; I'd be surprised if you can find a 3 foot generic USB cable that works.

Edit: The cable they include with the adapter is a very nice, well constructed one, if the picture of the internals is to be believed.

Last fiddled with by sdbardwick on 2020-08-06 at 23:44
Old 2020-08-06, 23:45   #313
preda
 
"Mihai Preda"
Apr 2015

2·5·11² Posts

Quote:
Originally Posted by sdbardwick
Remember, the USB cable is not using USB protocol on those links. They are just basically being used as physical wire extensions of the PCI-E slot. The quality requirements for the USB cable will greatly increase as length increases; I'd be surprised if you can find a 3 foot generic USB cable that works.
The PCIe does signaling at very high frequency, and that pushes the envelope of what the cable can do (within the noise margins). That's why I recommend "Always set PCIe to GEN-1 speeds in BIOS, and work from there". Anyway the difference (for compute) between fast/slow PCIe speed is minor, but the reliability gain from using slower signaling is massive.
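
One way to confirm that a GEN-1 BIOS setting actually took effect is to read the negotiated link speed back from Linux sysfs, where Gen-1 shows up as 2.5 GT/s. The sketch below assumes a Linux system that exposes the `current_link_speed` and `max_link_speed` attributes under `/sys/bus/pci/devices`. The function name is made up for illustration, and the parsing assumes each attribute begins with a number such as `2.5 GT/s`.

```python
from pathlib import Path

def pcie_link_speeds(sysfs_root="/sys/bus/pci/devices"):
    """Map each PCI device address to (current, max) link speed in GT/s.

    Devices whose link-speed attributes are missing, unreadable, or do
    not start with a number are skipped.
    """
    speeds = {}
    for dev in sorted(Path(sysfs_root).glob("*")):
        try:
            cur = (dev / "current_link_speed").read_text()
            top = (dev / "max_link_speed").read_text()
            speeds[dev.name] = (float(cur.split()[0]), float(top.split()[0]))
        except (OSError, ValueError, IndexError):
            continue
    return speeds
```

Calling `pcie_link_speeds()` and looking up the riser-connected GPU's PCI address should then show a current speed of 2.5 if the link is running at Gen-1.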
Old 2020-08-06, 23:47   #314
ewmayer
2ω=0
 
Sep 2002
República de California

2×13×443 Posts

Quote:
Originally Posted by sdbardwick
Remember, the USB cable is not using USB protocol on those links. They are just basically being used as physical wire extensions of the PCI-E slot. The quality requirements for the USB cable will greatly increase as length increases; I'd be surprised if you can find a 3 foot generic USB cable that works.
Yes, I think the riser-card text I quoted drastically understated the quality of the USB cable needed to provide the needed signal quality over a longer distance. The 2' cable that came with the riser is noticeably beefier - perhaps 2x the cross-sectional area - than the 6' extension cable I also tried. That would also explain why the 6' long cable worked just fine for power transfer to my Android phone, but not to the riser - the former has it operating just as a standard USB cable, providing perhaps 10W.
Old 2020-08-07, 00:00   #315
PhilF
 
Feb 2005
Colorado

111111111₂ Posts

Quote:
Originally Posted by sdbardwick
Remember, the USB cable is not using USB protocol on those links. They are just basically being used as physical wire extensions of the PCI-E slot.
Oh my. I had no idea that was the case.

Yes, I agree, there is no way you can transmit that kind of speed over 6 foot copper wires. Personally, I'm surprised a 2 foot cable works for that purpose. It must be, as you say, because of some very careful construction. I submit it should not be called a USB cable; it should be referred to as a PCIe extension cable instead, precisely to help prevent this type of confusion.
Old 2020-08-07, 00:24   #316
ewmayer
2ω=0
 
Sep 2002
República de California

2·13·443 Posts

Quote:
Originally Posted by PhilF
Yes, I agree, there is no way you can transmit that kind of speed over 6 foot copper wires. Personally, I'm surprised a 2 foot cable works for that purpose. It must be, as you say, because of some very careful construction. I submit it should not be called a USB cable; it should be referred to as a PCIe extension cable instead, precisely to help prevent this type of confusion.
Exactly - in retrospect, given that the choice of connector between the 2 portions of the riser card is arbitrary and thus that the physical nature of said cable is entirely up to the riser-card manufacturer, it seems bizarre that they forced it to be a USB-format cable. Since signal speed is what matters, why not design a custom cable optimized for that?

Anyhow, this is why envelope-edge-pushing-build threads by various people (more recent example is the Intel-Xeon-Phi-addon-card one nextdoor, which I hope will turn into a good-for-GIMPS result) are useful: a "my pain is your gain" sort of thing. Now we know of some of the interesting issues getting signal to&from riser-connected GPUs ... over the weekend I hope to solve the "getting power to more GPUs than should normally be possible for my humble 850W PSU" issue, without blowing myself up in the process. :)

Last fiddled with by ewmayer on 2020-08-07 at 01:08
Old 2020-08-07, 05:53   #317
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·3·739 Posts

What I'd like to see is a protective layer at the riser and at the gpu-power connections so that a spectacular motherboard failure or other issue does not damage a precious irreplaceable GPU (or more than one).
Old 2020-08-07, 11:31   #318
sdbardwick
 
Aug 2002
North San Diego County

2A8₁₆ Posts

Quote:
Originally Posted by ewmayer
Exactly - in retrospect, given that the choice of connector between the 2 portions of the riser card is arbitrary and thus that the physical nature of said cable is entirely up to the riser-card manufacturer, it seems bizarre that they forced it to be a USB-format cable. Since signal speed is what matters, why not design a custom cable optimized for that?
The choice of a USB 3.0 based cable makes sense to me. Even if you were to design a custom cable, it would probably end up looking very similar to a USB 3.0 cable. USB 3.0 and a single PCI-E (v2) lane are almost identical in structure and bandwidth; one Tx/Rx pair of differential signals for full-duplex data transfer.
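
The "almost identical in bandwidth" point can be checked with a little arithmetic. The sketch below uses the published raw signaling rates and line encodings (8b/10b for PCIe 1.x, PCIe 2.0 and USB 3.0; 128b/130b for PCIe 3.0) and computes the usable one-direction payload rate. The table layout and function name are just for illustration, and protocol overhead above the line encoding is ignored.

```python
# (raw rate in GT/s, payload bits, bits on the wire) for each link type
LINKS = {
    "PCIe 1.x lane": (2.5, 8, 10),
    "PCIe 2.0 lane": (5.0, 8, 10),
    "PCIe 3.0 lane": (8.0, 128, 130),
    "USB 3.0 (SuperSpeed)": (5.0, 8, 10),
}

def payload_mb_per_s(gt_per_s, payload_bits, wire_bits):
    """Usable bandwidth in MB/s, one direction, after line-encoding overhead."""
    return gt_per_s * 1e9 * payload_bits / wire_bits / 8 / 1e6

for name, spec in LINKS.items():
    print(f"{name:22s} {payload_mb_per_s(*spec):7.1f} MB/s")
```

A PCIe 2.0 lane and USB 3.0 both come out at 500 MB/s, and dropping to Gen-1 halves the signaling rate to 2.5 GT/s (250 MB/s), which is the slower, more forgiving setting recommended earlier in the thread.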

Note that buried in the listing information for the adapter (in an image, IIRC EDIT: Nope, just click on one of the rightmost images to get the below quote), it says only to use the provided cable.
Quote:
Attention Please:
1. The extension cable is suitable for a motherboard with a PCI-E slot (1X 2X 4X 8X 16X). Make sure to power off the computer before plugging or unplugging the cable.

2. Always use the included USB extension cable only.

To prevent damaging your computer or other equipment, please identify the correct end of the cable before plugging it in.

3. When plugging in the extension cable, you should hold the socket of the equipment to avoid bending the socket pins.

Last fiddled with by sdbardwick on 2020-08-07 at 11:37 Reason: cable quote
Old 2020-08-07, 20:50   #319
ewmayer
2ω=0
 
Sep 2002
República de California

2·13·443 Posts

@sdbardwick: The snip you dug out would seem to contradict the blurb by the same vendor, about the same product, that "can be extended to 3 meters without any signal loss". Also LOLed at the "please identify the correct end of the cable before plugging it in" bit - uh, the 2 ends of the supplied cable are, AFAICT, identical. If asymmetry-in-directionality were in play, surely you'd want to make sure the cable could be hooked up only in the proper way, via nonidentical end plugs, yes?

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.