mersenneforum.org  

2018-10-10, 01:27   #23
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7821₁₀ Posts

Quote:
Originally Posted by mackerel
It doesn't work that way. The bulk of the power is provided directly from the PSU to the GPUs via PCIe 6- or 8-pin power connectors. A small amount of power may be provided via the PCIe slot, but it looks like that can be covered either by the 4-pin Molex connectors on the mobo or by the riser boards, which often have their own separate power input.

Of the ATX 24-pin PSU connectors, one is likely the main one that actually powers the rest of the mobo, and the other two are there to send a switch signal to the other two PSUs. The CPU is powered by the EPS connector. In modern systems not a lot of power goes through the ATX power connectors.

Correct. The spec is 75 W max per GPU from the PCIe slot. Anything beyond that comes directly from PSU cables to additional connectors at or near the top inboard corner of the GPU. If those are not also connected and providing adequate power, the GPU does not operate or get detected by the OS. (I accidentally ran that experiment again recently. I was very happy to find that the new-to-me GPU seems to have survived slot-only partial powering fine, though I do not recommend it.)
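
To make the slot-plus-cable arithmetic concrete, here is a minimal Python sketch of a per-card power budget. The 75 W slot limit is the figure given above; the 75 W (6-pin) and 150 W (8-pin) connector ratings are the commonly quoted values and are an assumption added here. The 185 W card and the function names are hypothetical.

```python
# Nominal limits in watts. The 75 W slot figure comes from the post above;
# the 6-pin and 8-pin ratings are the commonly quoted values and are an
# assumption added here, not something stated in the thread.
SLOT_W = 75
AUX_W = {"6pin": 75, "8pin": 150}


def gpu_power_budget(aux_connectors):
    """Nominal watts available to one card: slot plus each connected aux cable."""
    return SLOT_W + sum(AUX_W[name] for name in aux_connectors)


def is_adequately_powered(board_power_w, aux_connectors):
    """True if the nominal budget covers the card's rated board power."""
    return gpu_power_budget(aux_connectors) >= board_power_w


# Hypothetical 185 W card that expects one 8-pin cable.
print(gpu_power_budget(["8pin"]))       # slot plus one 8-pin cable
print(is_adequately_powered(185, []))   # cable left off: slot power only
```

This only compares nominal ratings. It says nothing about whether a given card will start, be detected, or survive when underpowered.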

2018-10-10, 05:20   #24
SELROC
17·541 Posts

Quote:
Originally Posted by kriesel
Correct. The spec is 75 W max per GPU from the PCIe slot. Anything beyond that comes directly from PSU cables to additional connectors at or near the top inboard corner of the GPU. If those are not also connected and providing adequate power, the GPU does not operate or get detected by the OS. (I accidentally ran that experiment again recently. I was very happy to find that the new-to-me GPU seems to have survived slot-only partial powering fine, though I do not recommend it.)

Correct; still, the mainboard has three PSU connectors.

But again, the whole system is unstable and gives random errors:

amdgpu_job_timedout ring sdma0 ...

So I will build a new system. Stay tuned...

2018-10-10, 14:48   #25
SELROC
3×5 Posts

Quote:
Originally Posted by kriesel
Correct. The spec is 75 W max per GPU from the PCIe slot. Anything beyond that comes directly from PSU cables to additional connectors at or near the top inboard corner of the GPU. If those are not also connected and providing adequate power, the GPU does not operate or get detected by the OS. (I accidentally ran that experiment again recently. I was very happy to find that the new-to-me GPU seems to have survived slot-only partial powering fine, though I do not recommend it.)

The riser cables have a passive riser that goes into the PCIe slot and an active PCB part that goes under the GPU board. Most of the time the active part is made with poor-quality components.

2018-10-10, 15:22   #26
SELROC
21AC₁₆ Posts

Quote:
Originally Posted by kriesel
Correct. The spec is 75 W max per GPU from the PCIe slot. Anything beyond that comes directly from PSU cables to additional connectors at or near the top inboard corner of the GPU. If those are not also connected and providing adequate power, the GPU does not operate or get detected by the OS. (I accidentally ran that experiment again recently. I was very happy to find that the new-to-me GPU seems to have survived slot-only partial powering fine, though I do not recommend it.)

I'm not sure the GPU can operate correctly without the PSU cable. Maybe it gets detected, but when a job is started it fails.

2018-10-10, 15:48   #27
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1111010001101₂ Posts

Quote:
Originally Posted by SELROC
I'm not sure the GPU can operate correctly without the PSU cable. Maybe it gets detected, but when a job is started it fails.

On Windows at least, if a GPU needs power connections beyond the PCIe slot and no PSU cable is connected to a GPU requiring one, or only one or none are connected to a GPU requiring two, it is as if the GPU were not even physically installed in the system; no detection of its physical presence is what I have observed. I speculate that an insufficiently powered GPU does not fully start up and so does not respond to any PCIe detection.
I've also seen cases with powered PCIe x1-to-x16 extenders where, if the system PSU is marginal, the GPU fan won't spin and the GPU will not be detected. In other cases a GPU app will launch on the GPU on the extender, but the system will not be stable. I've also seen an initially reliable GPU-on-extender installation turn into one or both of the preceding cases after a few weeks with no configuration changes (RX550s with an RX480; 4th NVIDIA on a different system), and cases where a pair of Quadro 2000s per system, one on an extender, have been stable for more than 8 months and counting. So if a multi-GPU system becomes unstable, remove the last GPU added and see how it behaves after that.
I've also seen a case where a system with a third GPU added would start, run all CPU cores on prime95, and run the first two GPUs fully loaded with Mersenne GPU code, but launching a GPU app on the third, lowest-wattage GPU was the last straw, apparently more than the PSU could handle, and the system shut down abruptly.
Good luck to you, SELROC, in getting your new system finished and stable.
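
The "last straw" case above can be illustrated with a rough running-total check: add up the loads in the order they are started and report which one first exceeds what the PSU can deliver. This is a minimal sketch with made-up wattages, not measurements from any system in this thread; the `derate` factor is a hypothetical allowance for a PSU that cannot sustain its label rating.

```python
def first_overload(psu_watts, loads_in_start_order, derate=1.0):
    """Return the name of the first load that pushes the running total
    above psu_watts * derate, or None if everything fits."""
    limit = psu_watts * derate
    running = 0.0
    for name, watts in loads_in_start_order:
        running += watts
        if running > limit:
            return name
    return None


# Hypothetical figures: CPU under prime95, two larger GPUs, one small GPU.
loads = [("cpu", 95), ("gpu1", 150), ("gpu2", 150), ("gpu3", 50)]

print(first_overload(500, loads))   # everything fits on this PSU
print(first_overload(430, loads))   # the small third GPU tips it over
```

Nameplate wattages only give a first estimate. Real draw varies with the workload, and an aging or low-quality PSU may shut down below its rating, which is why removing the last GPU added is a useful test.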

Last fiddled with by kriesel on 2018-10-10 at 16:03

2018-10-10, 16:01   #28
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3²·11·79 Posts

Quote:
Originally Posted by SELROC
The riser cables have a passive riser that goes into the PCIe slot and an active PCB part that goes under the GPU board. Most of the time the active part is made with poor-quality components.

I was writing of the case where all the GPUs mount in the system without extender cables.
It sounds to me as though you are writing of active powered extenders.
For other readers: these typically have a tiny PCB that has a USB connector. At the other end of the USB cable is a PCIe x16 connector on its own circuit board. A separate power cable runs from the system PSU (perhaps a SATA power connector or other drive connector) to that circuit board to provide the slot power to the GPU. If the GPU needs more than slot power, more cables are required.

Active power is typically needed even for a GPU that fits within the 75 W PCIe slot spec, such as an RX550 or Quadro 2000, since that is beyond what USB could safely carry from the tiny PCB, which is often an x1 connection.
This arrangement usually requires extenders for the power cabling, which may add a few feet of loop length to the power connection and perhaps too much resistive loss.
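
To put a rough number on the resistive loss mentioned above, here is a minimal Python sketch using Ohm's law (V = I·R). The wire resistances are approximate values from standard AWG copper tables; the 12 V rail, the 1 m loop length and the assumption that all 75 W flows through a single wire pair are simplifications chosen for illustration, not details from the thread.

```python
# Approximate resistance of copper wire in ohms per metre (standard AWG
# tables, room temperature). These values are an assumption added here.
OHMS_PER_M = {16: 0.01317, 18: 0.02095, 20: 0.03331, 22: 0.05296}


def cable_loss(power_w, supply_v, loop_length_m, awg):
    """Return (voltage_drop_volts, power_lost_watts) for a load drawing
    power_w at supply_v through loop_length_m of wire (out and back)."""
    current = power_w / supply_v
    resistance = OHMS_PER_M[awg] * loop_length_m
    return current * resistance, current ** 2 * resistance


# Hypothetical case: 75 W at 12 V through 1 m of total loop length.
for gauge in (18, 22):
    drop, lost = cable_loss(75, 12.0, 1.0, gauge)
    print(f"AWG {gauge}: drop {drop:.3f} V, loss {lost:.2f} W")
```

The sketch ignores contact resistance at each connector, which on cheap adapters and extenders can matter as much as the wire itself.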

Last fiddled with by kriesel on 2018-10-10 at 16:02

2018-10-10, 16:14   #29
SELROC
7·163 Posts

Quote:
Originally Posted by kriesel
I was writing of the case where all the GPUs mount in the system without extender cables.
It sounds to me as though you are writing of active powered extenders.
For other readers: these typically have a tiny PCB that has a USB connector. At the other end of the USB cable is a PCIe x16 connector on its own circuit board. A separate power cable runs from the system PSU (perhaps a SATA power connector or other drive connector) to that circuit board to provide the slot power to the GPU. If the GPU needs more than slot power, more cables are required.

Active power is typically needed even for a GPU that fits within the 75 W PCIe slot spec, such as an RX550 or Quadro 2000, since that is beyond what USB could safely carry from the tiny PCB, which is often an x1 connection.
This arrangement usually requires extenders for the power cabling, which may add a few feet of loop length to the power connection and perhaps too much resistive loss.

In my case the error shows up at random, maybe after a few minutes or after a few hours:

amdgpu_job_timedout [amdgpu] ring sdma0 ...

and

amdgpu_job_timedout [amdgpu] ring sdma1 ...
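
For tracking how often each ring times out, a small log filter can help. The following is a minimal Python sketch that matches only the fragment shown in the posts above (`amdgpu_job_timedout` followed by `ring <name>`); the rest of each kernel log line is elided in the thread, so nothing beyond the ring name is parsed. The function name is hypothetical.

```python
import re
from collections import Counter

# Matches only the fragment quoted in the thread; anything after the ring
# name is elided there and is deliberately not parsed here.
TIMEOUT_RE = re.compile(r"amdgpu_job_timedout\b.*?\bring\s+(\w+)")


def count_ring_timeouts(lines):
    """Count timeout messages per ring name in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "amdgpu_job_timedout [amdgpu] ring sdma0 ...",
    "some unrelated kernel message",
    "amdgpu_job_timedout [amdgpu] ring sdma1 ...",
    "amdgpu_job_timedout ring sdma0 ...",
]
for ring, n in sorted(count_ring_timeouts(sample).items()):
    print(ring, n)
```

In practice the lines could come from saved kernel log output instead of the `sample` list. Comparing the counts against the times a GPU was added or moved to an extender may show whether the errors follow a particular card.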

2018-10-10, 16:35   #30
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3²×11×79 Posts

Quote:
Originally Posted by SELROC
In my case the error shows up at random, maybe after a few minutes or after a few hours:

amdgpu_job_timedout [amdgpu] ring sdma0 ...

and

amdgpu_job_timedout [amdgpu] ring sdma1 ...

What app is running, gpuowl? What exponent range? Does it still occur at much lower exponents?

In Windows there are adjustable registry parameters for TDR (timeout detection and recovery). Windows regards the display GPU as meant only for display, and some GPU app kernels can take longer than its default timeout to complete, which triggers a device reset by Windows.

Are there analogous timeout limit adjustments available on Linux?
Looks like you're not alone: https://github.com/M-Bab/linux-kerne...ries/issues/48
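
On the Linux question, one candidate is the amdgpu kernel module's `lockup_timeout` parameter. That suggestion is an addition here, not something established in this thread, and it is not confirmed that changing it affects the sdma timeouts reported above. Below is a minimal Python sketch that reads the current value, assuming the usual sysfs layout `/sys/module/<module>/parameters/<name>`; the helper name is hypothetical.

```python
from pathlib import Path

# Assumed location: parameters of a loaded module normally appear under
# /sys/module/<module>/parameters/. Not verified against SELROC's system.
PARAM_PATH = Path("/sys/module/amdgpu/parameters/lockup_timeout")


def read_module_param(path=PARAM_PATH):
    """Return the parameter value as a stripped string, or None if the
    file is missing or unreadable (for example, amdgpu is not loaded)."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None
```

Calling `read_module_param()` on a machine without the amdgpu driver loaded simply returns `None`. Changing the value is a separate step (a module option set at load time) and is outside this sketch.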

Last fiddled with by kriesel on 2018-10-10 at 16:39

2018-10-10, 17:08   #31
SELROC
2710₁₀ Posts

Quote:
Originally Posted by kriesel
What app is running, gpuowl? What exponent range? Does it still occur at much lower exponents?

In Windows there are adjustable registry parameters for TDR (timeout detection and recovery). Windows regards the display GPU as meant only for display, and some GPU app kernels can take longer than its default timeout to complete, which triggers a device reset by Windows.

Are there analogous timeout limit adjustments available on Linux?
Looks like you're not alone: https://github.com/M-Bab/linux-kerne...ries/issues/48

Yes, running gpuowl. The exponent is not important: the error happens randomly with both 84M and 300M. I'd also say that it is not a GPU-model issue, because the users in the linked issue are getting it on Vega and I am getting it on an RX580.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
