mersenneforum.org  

2019-12-06, 02:49   #243
dcheuk
Jan 2019
Florida
35 Posts

Quote:
Originally Posted by paulunderwood
What are your cards' temperature readings under load? I have my card set to "setsclk 4" on Linux. Also my case side is removed and the case is on its side to allow the heat to dissipate.

Device 00 is the one connected to the display; device 01 is the other one, sitting below it.

I'm scared of the heat, so I undervolted them and run them at 1352 MHz, which keeps both cards below 80C. It's a trade-off that I am personally willing to make for (some sense of) security. Lol
These temperatures were read with the case closed (see photo of my setup).
Attached thumbnails: device00.gif, device01.gif, rsz_1rsz_2img_20191204_202005.jpg

Last fiddled with by dcheuk on 2019-12-06 at 02:50
2019-12-06, 08:47   #244
M344587487
"Composite as Heck"
Oct 2017
2×5²×19 Posts

Quote:
Originally Posted by Prime95
In Wattman, set max speed to 1580, set voltage to 940, set memory clock to 1200.

If errors occur, reduce memory clock. If no errors for a day, try reducing voltage.

Don't necessarily set the voltage to 940. Every Radeon VII comes out of the factory with a different voltage according to binning, and from memory 940 seems a little low as a baseline. To tune the voltage, determine what yours is at stock, reduce it until you find instability, then back off to a safe limit.

As for heat, from memory the target is 85C and the limit is 105C. Messing with memory clocks tends to overshoot these targets. I prefer to set the fan speed manually so that the fans don't keep varying in speed, and to bring the temps down to the 70s.
2019-12-06, 09:39   #245
ATH
Einyen
Dec 2003
Denmark
2²×863 Posts

I do not run Radeon VIIs, so I'm not sure if this works for them, but I always use MSI Afterburner for GPUs because you can set a custom fan speed curve based on temperature, so you can be sure the fans run at 100% at the temperature you want. I set it to 100% fan speed at 80C on my RTX 2080.
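
A temperature-based fan curve of this kind can be sketched as piecewise-linear interpolation. This is only an illustration in Python, not how MSI Afterburner works internally; the function name `fan_percent` and every curve point except the 100%-at-80C endpoint are assumptions made up for the example.

```python
from bisect import bisect_right

# Hypothetical curve points (temperature in C, fan speed in %).
# Only the last point, 100% at 80C, comes from the post above.
CURVE = [(40, 30), (60, 50), (70, 75), (80, 100)]


def fan_percent(temp_c: float, curve=CURVE) -> float:
    """Linearly interpolate the fan speed for a given temperature."""
    temps = [t for t, _ in curve]
    if temp_c <= temps[0]:
        return float(curve[0][1])      # clamp below the first point
    if temp_c >= temps[-1]:
        return float(curve[-1][1])     # clamp at the last point (full speed)
    hi = bisect_right(temps, temp_c)   # index of the first point above temp_c
    (t0, f0), (t1, f1) = curve[hi - 1], curve[hi]
    return f0 + (f1 - f0) * (temp_c - t0) / (t1 - t0)


for t in (35, 65, 80, 90):
    print(t, fan_percent(t))
```

Clamping at the last point is what guarantees full fan speed at and above the chosen temperature.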
2019-12-06, 19:38   #246
preda
"Mihai Preda"
Apr 2015
2²·3·11² Posts

Quote:
Originally Posted by dcheuk
Device 00 is the one connected to the display; device 01 is the other one, sitting below it.

I'm scared of the heat, so I undervolted them and run them at 1352 MHz, which keeps both cards below 80C. It's a trade-off that I am personally willing to make for (some sense of) security. Lol
These temperatures were read with the case closed (see photo of my setup).

The temperature to watch is "GPU hot spot"; I would target 85-98C for it. GPU power of 140-160W (--setsclk 3) also looks good.

Start by not touching the voltages at all ("stock" voltages), and gradually increase the memory clock to see what is stable. Once you find the upper limit of stable memory (this may take days), start gradually reducing the voltage and see what remains stable.

It seems to me you may be running with too low a voltage right now, and that, not the memory clock, is producing the errors.

For memory, start by trying 1180. If that is good, you're done; there is no need to push it to 1200. If it is not good, try 1150, then 1100 (1100 is usually stable in any case).
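
The two-phase procedure described here (settle the memory clock first, then lower the voltage) can be written down as a small search. This is a rough Python sketch, not a tool that talks to the card: the stability callbacks are hypothetical stand-ins for "run GpuOwl for a day and watch for errors", and the 10 mV step and 800 mV floor are assumptions, not recommendations. Only the memory candidates 1180, 1150 and 1100 come from the post.

```python
from typing import Callable, Iterable, Optional


def pick_memory_clock(is_stable: Callable[[int], bool],
                      candidates: Iterable[int] = (1180, 1150, 1100)) -> Optional[int]:
    """Return the first (highest) memory clock that tests stable, else None."""
    for mhz in candidates:
        if is_stable(mhz):
            return mhz
    return None


def lower_voltage(is_stable_at: Callable[[int], bool], stock_mv: int,
                  step_mv: int = 10, floor_mv: int = 800) -> int:
    """Step down from the stock voltage while the card stays stable.

    Returns the lowest voltage that passed; the stock value if the
    first reduction already fails.
    """
    good = stock_mv
    mv = stock_mv - step_mv
    while mv >= floor_mv and is_stable_at(mv):
        good = mv
        mv -= step_mv
    return good


# Pretend card: memory is stable up to 1150 MHz, core needs at least 960 mV.
print(pick_memory_clock(lambda mhz: mhz <= 1150))
print(lower_voltage(lambda mv: mv >= 960, stock_mv=1000))
```

In practice each callback call stands for hours or days of error-free running, so the search is short on purpose.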

Last fiddled with by preda on 2019-12-06 at 19:41
2019-12-06, 21:19   #247
dcheuk
Jan 2019
Florida
35 Posts

Quote:
Originally Posted by preda
The temperature to watch is "GPU hot spot"; I would target 85-98C for it. GPU power of 140-160W (--setsclk 3) also looks good.

Start by not touching the voltages at all ("stock" voltages), and gradually increase the memory clock to see what is stable. Once you find the upper limit of stable memory (this may take days), start gradually reducing the voltage and see what remains stable.

It seems to me you may be running with too low a voltage right now, and that, not the memory clock, is producing the errors.

For memory, start by trying 1180. If that is good, you're done; there is no need to push it to 1200. If it is not good, try 1150, then 1100 (1100 is usually stable in any case).

Okay, thanks, going to try that. I was about to ask about this recurring error:

Code:
2019-12-06 04:17:10 89442629    24900000  27.84%; 1087 us/sq; ETA 0d 19:30; edcd5c9c419b43b2
2019-12-06 04:17:21 89442629    24910000  27.85%; 1087 us/sq; ETA 0d 19:30; 029e94b7024c11bd
2019-12-06 04:17:32 89442629    24920000  27.86%; 1088 us/sq; ETA 0d 19:29; 29aced6f04668b0e
2019-12-06 04:17:43 89442629    24930000  27.87%; 1087 us/sq; ETA 0d 19:29; da958ff83821196c
2019-12-06 04:17:54 89442629    24940000  27.88%; 1087 us/sq; ETA 0d 19:29; ef17aba68cb1c228
2019-12-06 04:18:04 89442629    24950000  27.89%; 1088 us/sq; ETA 0d 19:29; 225c1f2511feb9cd
2019-12-06 04:18:15 89442629    24960000  27.91%; 1088 us/sq; ETA 0d 19:29; b68918f69ef18350
2019-12-06 04:18:26 89442629    24970000  27.92%; 1088 us/sq; ETA 0d 19:29; 923f1abf9276cdda
2019-12-06 04:18:37 89442629    24980000  27.93%; 1088 us/sq; ETA 0d 19:28; 66156822ef6b03c7
2019-12-06 04:18:48 89442629    24990000  27.94%; 1088 us/sq; ETA 0d 19:28; 12cfd9469201de32
2019-12-06 04:19:00 89442629 EE 25000000  27.95%; 1088 us/sq; ETA 0d 19:28; a06893d89292ee88 (check 0.67s)
Code:
2019-12-06 14:40:19 89442629    58910000  65.86%; 1087 us/sq; ETA 0d 09:13; 13172cb240dbf79c
2019-12-06 14:40:30 89442629    58920000  65.87%; 1087 us/sq; ETA 0d 09:13; e77466514918fbe2
2019-12-06 14:40:41 89442629    58930000  65.89%; 1087 us/sq; ETA 0d 09:13; 2cdbf3681819d0f8
2019-12-06 14:40:52 89442629    58940000  65.90%; 1087 us/sq; ETA 0d 09:13; 7df9830557f5b042
2019-12-06 14:41:03 89442629    58950000  65.91%; 1087 us/sq; ETA 0d 09:13; f002368019202720
2019-12-06 14:41:14 89442629    58960000  65.92%; 1086 us/sq; ETA 0d 09:12; 0000000000000000
2019-12-06 14:41:25 89442629    58970000  65.93%; 1086 us/sq; ETA 0d 09:11; 0000000000000000
2019-12-06 14:41:35 89442629    58980000  65.94%; 1085 us/sq; ETA 0d 09:11; 0000000000000000
2019-12-06 14:41:46 89442629    58990000  65.95%; 1085 us/sq; ETA 0d 09:11; 0000000000000000
2019-12-06 14:41:58 89442629 EE 59000000  65.96%; 1085 us/sq; ETA 0d 09:11; 0000000000000000 (check 0.66s) 1 errors
Time to start over lol.
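
For anyone who wants to scan a long log for these events automatically, here is a small Python sketch. The regular expression is inferred only from the lines quoted above, so other GpuOwl versions may print a different layout; the helper names `parse_line` and `is_suspicious` are made up for this example. It flags lines marked EE as well as all-zero residues, which in the second excerpt show up a few lines before the check reports the error.

```python
import re

# Layout inferred from the quoted log lines; treat it as an assumption.
LINE_RE = re.compile(
    r"(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<exponent>\d+)\s+(?:(?P<status>OK|EE)\s+)?"
    r"(?P<iteration>\d+)\s+(?P<percent>[\d.]+)%; "
    r"(?P<us_per_sq>\d+) us/sq; ETA (?P<eta>[^;]+); "
    r"(?P<residue>[0-9a-f]{16})"
)


def parse_line(line: str):
    """Parse one progress line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    for key in ("exponent", "iteration", "us_per_sq"):
        rec[key] = int(rec[key])
    return rec


def is_suspicious(rec) -> bool:
    """True for a failed check (EE) or an all-zero residue."""
    return rec["status"] == "EE" or set(rec["residue"]) == {"0"}


SAMPLE = """\
2019-12-06 14:41:03 89442629    58950000  65.91%; 1087 us/sq; ETA 0d 09:13; f002368019202720
2019-12-06 14:41:14 89442629    58960000  65.92%; 1086 us/sq; ETA 0d 09:12; 0000000000000000
2019-12-06 14:41:58 89442629 EE 59000000  65.96%; 1085 us/sq; ETA 0d 09:11; 0000000000000000 (check 0.66s) 1 errors
"""

records = [parse_line(line) for line in SAMPLE.splitlines()]
flagged = [r["iteration"] for r in records if r and is_suspicious(r)]
print(flagged)
```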

Last fiddled with by dcheuk on 2019-12-06 at 21:25
2019-12-06, 21:34   #248
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7,823 Posts

Quote:
Originally Posted by dcheuk
Okay, thanks, going to try that. I was about to ask about this recurring error:
...

Time to start over lol.

Start the GPU tuning over, yes; the exponent, no. The GEC (Gerbicz error check) will roll back a bit and make sure it goes right.
2019-12-06, 21:45   #249
dcheuk
Jan 2019
Florida
35 Posts

Quote:
Originally Posted by kriesel
Start the GPU tuning over, yes; the exponent, no. The GEC will roll back a bit and make sure it goes right.

I just deleted it. I am such an idiot.

Well, time to run it again. 18 hours wasted because I cannot read. Lesson learned. I didn't realize the program goes back to the last save and tries again.

Code:
2019-12-06 14:41:46 89442629    58990000  65.95%; 1085 us/sq; ETA 0d 09:11; 0000000000000000
2019-12-06 14:41:58 89442629 EE 59000000  65.96%; 1085 us/sq; ETA 0d 09:11; 0000000000000000 (check 0.66s) 1 errors
...
2019-12-06 14:46:20 89442629    58990000  65.95%; 1087 us/sq; ETA 0d 09:12; 62d218e86858a452
2019-12-06 14:46:31 89442629 OK 59000000  65.96%; 1087 us/sq; ETA 0d 09:12; 4d3baf6da39985f4 (check 0.67s) 2 errors

Last fiddled with by dcheuk on 2019-12-06 at 21:46
2019-12-07, 14:45   #250
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7,823 Posts

Quote:
Originally Posted by dcheuk
Well, time to run it again. 18 hours wasted because I cannot read. Lesson learned. I didn't realize the program goes back to the last save and tries again.

That's one of the two justifications for doing the GEC:
1) Detecting an error with high reliability.
2) Knowing the previous save is good and can be resumed from with high confidence.
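
To make both points concrete, here is a toy Python sketch of a Gerbicz-style check wrapped around repeated squaring. It is not GpuOwl's implementation: the function name, the tiny block size, the check interval and the injected fault are all assumptions chosen so the example runs instantly. It relies on the identity that if d is the product of the residues at every block boundary, then the next d equals base times the previous d raised to 2^block, modulo N. When the identity fails, the run resumes from the last state that passed.

```python
def prp_with_gec(p, total, block=10, check_every=3, fault_at=()):
    """Run `total` squarings of 3 modulo 2**p - 1 with a Gerbicz-style check.

    Returns (final residue, number of rollbacks). `fault_at` lists iteration
    numbers at which a one-off fault is injected to simulate a GPU error.
    """
    if total % (block * check_every) != 0:
        raise ValueError("total must be a multiple of block * check_every")
    n = (1 << p) - 1
    base = 3
    i, x, d = 0, base, base          # x = base^(2^i), d = checksum product
    saved = (i, x, d)                # last state that passed the check
    pending = set(fault_at)
    rollbacks = 0
    while i < total:
        x = x * x % n
        i += 1
        if i in pending:             # simulate a transient hardware error
            pending.discard(i)
            x = (x + 1) % n
        if i % block == 0:
            d_prev, d = d, d * x % n
            if i % (block * check_every) == 0:
                # Identity: d_new == base * d_prev^(2^block)  (mod n)
                if base * pow(d_prev, 1 << block, n) % n == d:
                    saved = (i, x, d)        # point 2: known-good save
                else:
                    i, x, d = saved          # point 1: error detected, roll back
                    rollbacks += 1
    return x, rollbacks


P, TOTAL = 127, 120
expected = pow(3, 1 << TOTAL, (1 << P) - 1)
print(prp_with_gec(P, TOTAL) == (expected, 0))
print(prp_with_gec(P, TOTAL, fault_at=(47,)) == (expected, 1))
```

The faulty run still ends on the correct residue; it just repeats the iterations since the last verified state, which is why deleting the whole exponent was unnecessary.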
2019-12-08, 08:49   #251
preda
"Mihai Preda"
Apr 2015
2²×3×11² Posts
I'd like to share some of my experience with the Radeon VII and GpuOwl:

- an open-air mining case like https://www.amazon.com/AAAwave-Stack...5794365&sr=8-3 is very cheap and provides good cooling conditions for the Radeon VII (specifically, the GPU is suspended away from the MB and not confined)

- the classic mining 1x-to-16x PCIe adapters work fine with GpuOwl, with negligible slowdown. This allows putting as many GPUs on one MB as the PSU allows.

- if the number of PCIe slots on the MB is small, an adapter like this https://www.amazon.com/Ubit-Extender...794578&sr=8-15 or the similar https://www.amazon.com.au/Ubit-Split...5794693&sr=8-4 converts one PCIe slot into four, again without a significant performance hit

If doing any of the above, pay extra attention, measure twice (the voltage), etc., as the risk of burning the MB, the GPU, or the house is not negligible.

Last fiddled with by preda on 2019-12-08 at 09:39
2019-12-29, 20:44   #252
PhilF
"6800 descendent"
Feb 2005
Colorado
3²·83 Posts

Quote:
Originally Posted by Prime95
A bit bummed. Today I plugged my Windows box into Kill-A-Watt.

Best I can tell Kill-A-Watt shows gpuowl drawing an extra 240 watts over an idling Radeon VII. Wattman shows the card drawing only 190 watts.

The box has a Platinum power supply.

Anyone else observing such a discrepancy between Wattman readings and at-the-wall power draw?

My Windows box with the latest AMD software (Wattman appears to have been replaced with something else) shows my card pulling 170W. My wattmeter shows the actual draw is 210W above idle.

Edit: My current early-phase tuning has me at 1475 MHz, 982 mV, stock 1000 MHz RAM speed, with the fan cranked high enough to keep the temperature at 88 degrees (at least until spring), drawing 170W, and running at 956 us per iteration on a 4608K FFT exponent.
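
The two sets of readings can be compared with a little arithmetic. The Python sketch below only restates the numbers from the posts above (170W reported against 210W at the wall, 190W against 240W, 956 us per iteration); the function names are made up for the example, and the "overhead" is simply the ratio of at-the-wall draw above idle to the software-reported card power, with no claim about where the difference goes.

```python
def wall_overhead(reported_w: float, wall_delta_w: float) -> float:
    """Ratio of at-the-wall draw above idle to the software-reported card power."""
    return wall_delta_w / reported_w


def joules_per_squaring(power_w: float, us_per_sq: float) -> float:
    """Energy for one squaring iteration: watts times seconds."""
    return power_w * us_per_sq * 1e-6


print(round(wall_overhead(170, 210), 3))         # readings from this post
print(round(wall_overhead(190, 240), 3))         # readings from the quoted post
print(round(joules_per_squaring(210, 956), 3))   # at-the-wall energy per iteration
```

Both setups land at roughly a quarter more power at the wall than the software reports.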

Last fiddled with by PhilF on 2019-12-29 at 20:53
2019-12-29, 21:17   #253
paulunderwood
Sep 2002
Database er0rr
5×937 Posts
Quote:
Originally Posted by PhilF
My Windows box with the latest AMD software (Wattman appears to have been replaced with something else) shows my card pulling 170W. My wattmeter shows the actual draw is 210W above idle.

Edit: My current early-phase tuning has me at 1475 MHz, 982 mV, stock 1000 MHz RAM speed, with the fan cranked high enough to keep the temperature at 88 degrees (at least until spring), drawing 170W, and running at 956 us per iteration on a 4608K FFT exponent.

Have you considered running Linux and the ROCm drivers? You would get more crunching done. I am interested in why you would not switch to Linux. Is this a desktop box? Or is it a lack of will to switch?

All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
