mersenneforum.org

drkirkby 2021-05-08 23:42

Help needed - installing an Nvidia Tesla K80 in a Dell 7920 tower workstation
 
I'm trying to get an Nvidia Tesla K80 GPU, which I bought used on eBay, working in a Dell 7920 [B]tower[/B] workstation. (There is also a [B]rackmount[/B] version of this workstation.)

The Nvidia Tesla K80 has no video connectors on it - it is designed purely for computing, not for driving monitors.

If I put the K80 in the workstation, the power supply comes on for 20 seconds or so, then goes off. Then it comes on for another 20 seconds or so and powers off again. This goes on in an infinite loop. The first thing most people will think of is insufficient power, but that should not be the case.

* The machine is designed to take up to 3 x 300 W GPUs.
* Power consumption with dual 26-core CPUs is about 430 W, but the PSU is rated at 1400 W. So the power supply could provide about 1 kW more than it's currently supplying.
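
As a rough sanity check of those numbers, here is a minimal sketch. The 1400 W rating and 430 W draw are the figures quoted above, the 300 W figure is the nominal K80 board power, and the function name is made up purely for illustration.

```python
def psu_headroom_watts(psu_rating_w, current_draw_w):
    """Return the spare PSU capacity in watts."""
    return psu_rating_w - current_draw_w


PSU_RATING_W = 1400      # PSU rating quoted in the post
CURRENT_DRAW_W = 430     # approximate draw with dual 26-core CPUs, per the post
K80_BOARD_POWER_W = 300  # nominal K80 board power (assumption for this sketch)

headroom = psu_headroom_watts(PSU_RATING_W, CURRENT_DRAW_W)
print(f"Headroom: {headroom} W")
print(f"Enough for one K80 on paper: {headroom >= K80_BOARD_POWER_W}")
```

This only compares totals. It says nothing about how much power each individual GPU cable or connector can deliver.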

If I look on the Dell website for this machine

[URL]https://www.dell.com/en-uk/work/shop/workstations/precision-7920-tower-build-your-own/spd/precision-7920-workstation/xctopt7920emea[/URL]

I can see it is available with several [B]graphics cards[/B], from
* Single Radeon Pro WX 2100, 2GB (standard)
* Dual NVLink NVIDIA Quadro GV100, 32GB, 4 DP (loads of money)

but I don't see anything like the K80, which is a pure GPU - not graphics card.

I asked on the Nvidia forum
[URL]https://forums.developer.nvidia.com/t/problems-installating-tesla-k80-on-dell-7920-workstation/165871[/URL]
and was provided with this link
[URL]https://forums.developer.nvidia.com/t/k40-setup-on-lenovo-p510/64917[/URL]
with someone having issues with a K40 GPU, but his issues were less serious, as at least the computer would power up. That link has this comment from an Nvidia moderator:

"If you buy a Tesla card believing you can install it in any system you want, you are asking for trouble. It is simply not possible, in the general case, and there is no design intent to make it possible....There is no suggestion anywhere that Tesla cards can be placed in any system you want with an expectation of proper behavior."

I've enabled the "above 4 GB" option in the BIOS that's mentioned there, but the workstation will not even power up properly - there's nothing ever on the screen. I guess I am just out of luck, and either

a) The card is no good, which is a possibility.
b) It will not work in the Dell 7920.

I wondered if anyone had any ideas on how I might get this to work.

Dave

Xyzzy 2021-05-09 00:53

Do you have the PCIe cables plugged into it?

frmky 2021-05-09 02:56

Are you cooling it? Tesla cards often don't have a fan installed and require a high-speed fan to keep them cool.

drkirkby 2021-05-09 11:31

[QUOTE=Xyzzy;578035]Do you have the PCIe cables plugged into it?[/QUOTE]
As far as I'm aware, there is only one cable that needs connecting. There's one cable for each of 3 GPUs - see page 72 of the manual for the 7920 tower workstation.

[URL]https://dl.dell.com/topicspdf/precision-7920-workstation_owners-manual_en-us.pdf[/URL]

The PSU does not have any cables coming out of it. It pulls out of the chassis just by undoing a clip on the rear of the machine, so it is possible to change the PSU in under a minute.

There is a rack-mount version of this machine. They are quite similar, but not identical.

* Rack unit has dual redundant PSUs
* Rack unit has better remote management
* Tower unit has more space for drives (10).

The manual for the rack unit is better, so whilst it's not for an identical model, it is sometimes helpful. See

[URL]https://dl.dell.com/topicspdf/precision-7920r-workstation_owners-manual_en-us.pdf[/URL]
around page 70.

As far as I can see, the 3 cables near the GPUs are designed to plug into the GPUs.

Dave

drkirkby 2021-05-09 11:41

[QUOTE=frmky;578045]Are you cooling it? Tesla cards often don't have a fan installed and require a high-speed fan to keep them cool.[/QUOTE]
I did wonder about the cooling, but the 7920 is designed to take up to 3 x 300 W, or 2 x 375 W GPUs. I assume (perhaps incorrectly) that one or more of the 10 fans in the machine would speed up to provide sufficient cooling air. However, none of the fans speed up in the 20 seconds or so the power supply remains on. I don't know if the card has an air sensor, or heats up so quickly that it shuts down the power supply within 20 seconds, before the fans get going.

If one runs the diagnostics on the machine from the BIOS, it gets fairly loud, but it never gets that loud in the 20 s or so before the power supply cuts out.

Maybe it would be worth my while trying additional cooling. I will have to check how the supported cards are cooled.

Dave

paulunderwood 2021-05-09 12:27

[QUOTE=drkirkby;578032]

[URL]https://www.dell.com/en-uk/work/shop/workstations/precision-7920-tower-build-your-own/spd/precision-7920-workstation/xctopt7920emea[/URL]

I can see it is available with several [B]graphics cards[/B], from
* Single Radeon Pro WX 2100, 2GB (standard)
* Dual NVLink NVIDIA Quadro GV100, 32GB, 4 DP (loads of money)

but I don't see anything like the K80, which is a pure GPU - not graphics card.
[/QUOTE]

The WX 2100 is about 35 W max and the GV100 is 250 W. However, the K80 is 300 W, and this is possibly too much for your Dell system and risks damaging it.

paulunderwood 2021-05-09 12:45

This was one solution: [url]https://www.dell.com/community/Rack-Servers/R730-and-NVIDIA-Tesla-K80-Not-Detected/td-p/7629428[/url]

Xyzzy 2021-05-09 12:51

There is no way that 300W can be drawn from the PCIe slot. You need at least an 8 pin PCIe connector. What connectors are on the card? The PDF you linked shows a single 6 pin cord.

Note the PCIe requirement here: [URL]https://www.techpowerup.com/gpu-specs/tesla-k80.c2616[/URL]

Pictures of your card and PSU would help a bunch!

:mike:

paulunderwood 2021-05-09 13:02

[url]https://www.youtube.com/watch?v=wB9QtsDeAMo[/url]

It looks like you have a power pin-out problem. This guy says you need a CPU plug, not PCI-e. Maybe you can get a converter (at your own risk). If you have a 3rd CPU cable...

Xyzzy 2021-05-09 13:15

A picture of the card's power connector would allow us to see how it is keyed.

tServo 2021-05-09 13:23

[QUOTE=frmky;578045]Are you cooling it? Tesla cards often don't have a fan installed and require a high-speed fan to keep them cool.[/QUOTE]

I believe you may have 2 problems: power and cooling. It's hard to say which is the culprit.

(1) Cooling: I agree with frmky that these cards in a 7920 will need some sort of additional cooling. The 7920 box does NOT have sufficient airflow to cool these red hot beasts.
They were designed to be put in a "pizza box" server that has about 20 high power fans that scream so loud it sounds like a jet taking off. Years ago, I had a friend who put one of these in a 7910 ( I think ) and he cobbled on a VERY high speed fan to do the job. The fan fit by the exhaust slots and pulled the air out. I can't remember how he attached it to the machine; he may have used duct tape ( The handyman's secret weapon ).
You can search eBay for "tesla k80 fan" to find some guys who use 3D printers to make fan adapters that attach to the input side of the card.

(2) Power: the cards need a very special cable that takes 2 standard PCIe 8 pin power cables and outputs to a special 8 pin connector. I would try using 2 different PCIe power cables ( not just 1 cable split into 2 ). Search eBay for "K80 power cable".

Good luck!

paulunderwood 2021-05-09 13:41

[QUOTE=tServo;578073]I believe you may have 2 problems: power and cooling. It's hard to say which is the culprit.

(1) Cooling: I agree with frmky that these cards in a 7920 will need some sort of additional cooling. The 7920 box does NOT have sufficient airflow to cool these red hot beasts.
They were designed to be put in a "pizza box" server that has about 20 high power fans that scream so loud it sounds like a jet taking off. Years ago, I had a friend who put one of these in a 7910 ( I think ) and he cobbled on a VERY high speed fan to do the job. The fan fit by the exhaust slots and pulled the air out. I can't remember how he attached it to the machine; he may have used duct tape ( The handyman's secret weapon ).
You can search eBay for "tesla k80 fan" to find some guys who use 3D printers to make fan adapters that attach to the input side of the card.

(2) Power: the cards need a very special cable that takes 2 standard PCIe 8 pin power cables and outputs to a special 8 pin connector. I would try using 2 different PCIe power cables ( not just 1 cable split into 2 ). Search eBay for "K80 power cable".

Good luck![/QUOTE]

:goodposting: :tu:

drkirkby 2021-05-09 17:37

[QUOTE=Xyzzy;578068]There is no way that 300W can be drawn from the PCIe slot. You need at least an 8 pin PCIe connector. What connectors are on the card? The PDF you linked shows a single 6 pin cord.

Note the PCIe requirement here: [URL]https://www.techpowerup.com/gpu-specs/tesla-k80.c2616[/URL]

Pictures of your card and PSU would help a bunch!

:mike:[/QUOTE]


Someone gave me a technical reference manual on this workstation. It's on another computer which I can't access now, but that should have the specification. But this video, which is a review of the workstation, says 7 minutes and 50 seconds in that it supports 3 x 300 W or 2 x 375 W GPUs


[url]https://www.youtube.com/watch?v=jP65i_Iqml8[/url]


I will add some pictures later.

tServo 2021-05-09 18:47

[QUOTE=drkirkby;578090]Someone gave me a technical reference manual on this workstation. It's on another computer which I can't access now, but that should have the specification. But this video, which is a review of the workstation, says 7 minutes and 50 seconds in that it supports 3 x 300 W or 2 x 375 W GPUs


[url]https://www.youtube.com/watch?v=jP65i_Iqml8[/url]


I will add some pictures later.[/QUOTE]

I believe Xyzzy's point was ( correct me if I'm wrong, Mike ) that the power cannot come solely via the PCIe slot; you must also attach PCIe power connectors to get that much power. I believe the PCIe spec says the slot can only supply 75 watts. The rest must come from the PCIe power connectors.

drkirkby 2021-05-09 19:36

I did have another power cable connected. I will take a photograph of the card and connector tomorrow; at the moment I don't want to shut the machine down, and unfortunately opening the case powers it off.

Dave

Xyzzy 2021-05-09 19:46

[QUOTE=drkirkby;578103]I did have another power cable connected. I will take a photograph of the card and connector tomorrow; at the moment I don't want to shut the machine down, and unfortunately opening the case powers it off.[/QUOTE]
There is a switch somewhere in your computer called a "chassis intrusion" switch. You can permanently jumper that switch so you can open the computer while it is running.

:mike:

paulunderwood 2021-05-09 20:09

[QUOTE=drkirkby;578090]Someone gave me a technical reference manual on this workstation. It's on another computer which I can't access now, but that should have the specification. But this video, which is a review of the workstation, says 7 minutes and 50 seconds in that it supports 3 x 300 W or 2 x 375 W GPUs


[url]https://www.youtube.com/watch?v=jP65i_Iqml8[/url]


I will add some pictures later.[/QUOTE]

The K80 has a non-standard power socket. This is why we strongly recommend a (special) dual PCI-e to K80 power cable, which you can get from eBay for a fiver. Search: "K80 power cable".

Xyzzy 2021-05-09 21:20

2 Attachment(s)
But he only has three 6-pin cables, right?

:mike:

Edit: He has at least two 8-pin cables.

kriesel 2021-05-09 21:37

1 Attachment(s)
Cooling: most Tesla cards are passively cooled, designed for modest-volume, higher-pressure air flowing lengthwise ([B]high[/B] impedance), as in rack mount servers that use low-volume, high-pressure fans to force air from rack front (cold aisle) to back (hot aisle).
That calls for a completely different fan design, more commonly called a blower (typically a squirrel-cage centrifugal fan), than the low-pressure, high-volume fans (for [B]low[/B] impedance) usual in consumer GPUs and home or office system cases. See the [URL="https://mersenneforum.org/showthread.php?t=25767&highlight=passive"]discussion about Xeon Phi including coprocessor cards[/URL], which are also MOSTLY designed for passive cooling. An exception with a built-in blower can be seen [URL="https://mersenneforum.org/showpost.php?p=552497&postcount=27"]here[/URL]. It takes a squirrel-cage blower-style fan to produce the air pressure needed to cool a GPU lengthwise, which is much higher than what is provided by the axial-flow, low-pressure fans that are common in desktop or tower workstation or consumer cases, or on consumer-grade GPUs that have [URL="https://www.bhphotovideo.com/c/product/1512774-REG/hp_6yt67at_nvidia_quadro_p2200_5gb.html/?msclkid=708e431f0afc155ccffd238745494105"]fans on the big face[/URL]. (It's the difference between a car radiator fan or window fan, and a vacuum cleaner, furnace blower, or leaf blower.)
Little of the above should be a surprise to the OP. Sent to him in a PM 2021-01-11:
"How do you plan to cool the K80? Does your workstation case have the [B]high pressure fans required[/B]?"

All graphics cards shown in the video link given in [URL="https://mersenneforum.org/showpost.php?p=578090&postcount=13"]post 13[/URL] have built-in fans, not a passive cooling design.

Note, there's a cottage industry [URL="https://www.ebay.com/itm/124329274594?hash=item1cf29a2ce2:g:xcQAAOSwg9Rdr6Bs"]3D printing fan adapters[/URL] for passively cooled cards. Some appear to include a low pressure fan not suitable for the task. [URL="https://www.ebay.com/itm/313422989175?hash=item48f9774777:g:j5wAAOSwJiJgIIlF"]This one[/URL] appears to use a proper blower but handicaps it with a sharp 180 degree U-turn, which will create a lot of added flow resistance.

I bought a used workstation some time ago that came with a Tesla C2075 and Quadro 4000 that coexisted. That Tesla was also an [B]actively[/B] cooled GPU with blower style fan built in. (It's gone to recycling long ago after first making the box unstable, then unbootable, and failing testing on another system also.)

Power connections: PCIe slots are specified to provide no more than 75 watts max.
A standard six pin auxiliary plug that connects separately to a PCIe module (such as a GPU or other card) has 75 watt max rating. A standard eight pin auxiliary plug that connects separately to a PCIe module has 150 watt max rating.

These are additive: for example, PCIe 75 + 6-aux 75 + 8-aux 150 = 300.

Some GPUs use:
PCIe power only. Low power cards such as Quadro 2000 or RX550.
PCIe + 6-aux; 150W max
PCIe + pair of 6; 225W max
PCIe + 6 and 8; 300W max
PCIe + pair of 8; 375W max. Some 300W nominal GPUs also use this approach.
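
The additive rule above can be sketched in a few lines. This is only an illustration using the ratings quoted in this post; the function and constant names are made up, and it covers standard connectors only, not any nonstandard Tesla plug.

```python
# Maximum ratings in watts, as quoted in the post above.
SLOT_W = 75       # PCIe slot
AUX_6PIN_W = 75   # standard 6-pin auxiliary plug
AUX_8PIN_W = 150  # standard 8-pin auxiliary plug


def max_board_power(six_pin=0, eight_pin=0):
    """Add the slot rating to the ratings of the attached aux plugs."""
    return SLOT_W + six_pin * AUX_6PIN_W + eight_pin * AUX_8PIN_W


# (number of 6-pin plugs, number of 8-pin plugs) for each combination listed above
COMBINATIONS = {
    "PCIe only": (0, 0),
    "PCIe + 6-aux": (1, 0),
    "PCIe + pair of 6": (2, 0),
    "PCIe + 6 and 8": (1, 1),
    "PCIe + pair of 8": (0, 2),
}

for label, (six, eight) in COMBINATIONS.items():
    print(f"{label}: {max_board_power(six, eight)} W max")
```

By this arithmetic, a 300 W card fed through standard connectors needs at least a 6-pin plus an 8-pin plug on top of the slot.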

The RTX 30 series introduced a 12-contact aux connector with higher density and power rating.
Apparently some Teslas use a nonstandard power connection.

It's quite common for the higher power GPUs (Radeon VII, GTX1080, RTX2080 etc) to have multiple aux power connectors, and refuse to operate when a single connection is made.
It's quite common for large-capacity PSU GPU Aux power cables to have multiple connectors per cable on the GPU end; (6+2)x2 is common. One cable, 4 connectors usable in various combinations; 6, 8, dual-6, 8+6, dual-8, on one GPU, or split for 2 adjacent modest-power gpus.

Power supply capacity is probably adequate, depending on other components present.
A Kill A Watt line power input meter or equivalent could be useful to check the typical draw, although it would not likely identify short sub-second peaks in power demand that exceed the PSU rating. Such meters are available for free loan from some public libraries. They're also good to own and don't cost much compared to any decent CPU or GPU.

A low end GPU with display output perhaps temporarily installed could be useful to observe what is going on during the BIOS startup and any beginnings of OS load, before an SSH server is up.

Recording boot displays with a digital camera can be helpful, allowing later review of a message that flashes on just before it all goes dark.

Drivers: If I recall correctly, from reading the NVIDIA site long ago, and one previous Tesla experience, Teslas can be run either with a driver that also works with Quadros with display capability, or with a special compute-oriented driver. [URL]https://docs.nvidia.com/gameworks/content/developertools/desktop/tesla_compute_cluster.htm[/URL]
Note, NVIDIA allows one NVIDIA GPU driver per system. The version must be compatible with all NVIDIA GPU models you hope to use simultaneously. I've had to move GPUs to other systems to arrange the simultaneous solution to that.
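
Once a system does boot with the driver loaded, one way to see which GPUs the single installed driver recognizes is to query `nvidia-smi`. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH; the helper names are hypothetical, and the parsing assumes the plain CSV output of the query shown.

```python
import subprocess

# nvidia-smi query for GPU name, driver version and PCI bus ID, one GPU per line.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,driver_version,pci.bus_id",
    "--format=csv,noheader",
]


def parse_gpu_lines(text):
    """Turn 'name, driver, bus_id' lines into a list of dicts."""
    gpus = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split from the right so a comma inside the GPU name would not break parsing.
        name, driver, bus_id = [field.strip() for field in line.rsplit(",", 2)]
        gpus.append({"name": name, "driver": driver, "bus_id": bus_id})
    return gpus


def list_gpus():
    """Return the GPUs the driver reports, or an empty list if the query fails."""
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_gpu_lines(result.stdout)


if __name__ == "__main__":
    for gpu in list_gpus():
        print(gpu)
```

An empty result means either the tool is missing or the driver could not talk to any GPU. A working K80 would normally show up as two entries, since the board carries two GPUs.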

drkirkby 2021-05-10 19:35

1 Attachment(s)
[QUOTE=kriesel;578111]Cooling: most Tesla cards are passively cooled, designed for modest-volume, higher-pressure air flowing lengthwise ([B]high[/B] impedance), as in rack mount servers that use low-volume, high-pressure fans to force air from rack front (cold aisle) to back (hot aisle).
That calls for a completely different fan design, more commonly called a blower (typically a squirrel-cage centrifugal fan), than the low-pressure, high-volume fans (for [B]low[/B] impedance) usual in consumer GPUs and home or office system cases.[/QUOTE]

Yes, maybe cooling is the problem. The fans themselves must be able to supply enough air, though I accept at a lower pressure. The Dell 7920 tower [B]workstation[/B] is very much like a consumer PC in appearance, although not in internal design - for example, it only takes ECC RAM. There is a 2U [B]rackmount[/B] version of the Dell 7920, which is like a typical server, although it's called a workstation.

I'm attaching a copy of the Dell 7920 Technical reference. I'm not aware of anywhere else this document can be found. It will have information about the connectors in it. (I had to use an online compression tool, as the original size was over 4 MB, which is the limit of the forum).

I have had problems on some forums (I think including this one) where the file size is very limited. I'm not sure if it was this forum, but I tried compressing a JPEG both with my iPhone and with Gimp on a PC, and the forum said the file format was not valid. I may be thinking of another forum. I will try to attach some pictures later, but for now here's the technical reference, which is better than the user's guide. For example, the user's guide only shows how to configure the machine with identical RAM modules, but this shows that RAM modules of different sizes can be mixed, and how to mix them.

chalsall 2021-05-10 19:50

[QUOTE=drkirkby;578165]The Dell 7920 tower [B]workstation[/B] is very much like a consumer PC in appearance, although not in internal design - for example, it only takes ECC RAM. There is a 2U [B]rackmount[/B] version of the Dell 7920, which is like a typical server, although it's called a workstation.[/QUOTE]

To share... While I ***love*** Dell rack-mount kit, I hate their workstations. That also goes for HP, Lenovo, et al.

As you mentioned, nothing is readily Commercial Off-The-Shelf (COTS). Not even the PSU!

A client of mine recently had their Lenovo-based accounting workstation (read: mission-critical) fail. It was most likely the PSU, but a replacement would have had to be ordered (and, being Bimshire, imported with a temporal latency of about a month) at a cost of only slightly less than a full new build from commodity parts.

I had them back up and running by building a brand new machine (migrating the HD) in a day. The Lenovo was taken to a local recycler who does e-waste properly.

jawesker 2021-07-23 17:44

Hi, did you get any workaround for this problem? I have almost the same problem. I own a Dell T7820 tower workstation and recently bought a Tesla K80 card. When I plug it in with proper ventilation and with the proper power supply, the workstation starts, but there is no video and it stays like that, just doing nothing. I checked and the indicator LEDs on the Tesla card are on. After a couple of minutes I turned the system off and I could feel that the Tesla card was a bit more than warm, so I think that the power supply is doing its job. Does anyone have some experience with this card?

Kind regards

drkirkby 2021-07-24 09:15

[QUOTE=jawesker;583835]Hi, did you get any workaround for this problem? I have almost the same problem. I own a Dell T7820 tower workstation and recently bought a Tesla K80 card. When I plug it in with proper ventilation and with the proper power supply, the workstation starts, but there is no video and it stays like that, just doing nothing. I checked and the indicator LEDs on the Tesla card are on. After a couple of minutes I turned the system off and I could feel that the Tesla card was a bit more than warm, so I think that the power supply is doing its job. Does anyone have some experience with this card?

Kind regards[/QUOTE]
No, I never pursued the matter any further. When I looked at the performance of the K80 for its power consumption, I concluded it was not really worth using. There are a lot of consumer cards that use less power, but have much better double-precision floating point performance. I think I will end up putting my card on eBay.

I guess it would be reasonable for trial factoring, but I'm not sure if it's really worth it for anything else.


All times are UTC.