mersenneforum.org  

2019-03-25, 12:40   #78
preda
"Mihai Preda"
Apr 2015
2²·3·11² Posts

Quote:
Originally Posted by M344587487 View Post
The results are in.

Best results:
Code:
target               workers mclk sclk 4608K_combined_throughput_ms_it 5M_combined_throughput_ms_it rocm-smi_power_after_undervolt_YMMV
efficient_throughput 2       1201 1547 0.845                           0.95                         176
quick_single_test    1       1201 1802 0.86                            0.95                         232
0.85 ms/it at the wavefront, that's amazing! More than twice as fast as a Vega 64. Sweet.
2019-03-25, 15:42   #79
M344587487
"Composite as Heck"
Oct 2017
2×5²×19 Posts

Quote:
Originally Posted by SELROC View Post
Are you sure that tuning the GPU manually is better than leaving it on automatic?
Definitely, here's the efficient tuning compared to completely stock results (--setfan aside as the temps were too high on auto):
Code:
type  workers effective_5M_ms/it rocm-smi_watts
tuned 2       0.95               180 (mclk=1200, sclk=1547, "vc 2 1801 1030")
stock 2       0.98               247
stock 1       1.04               247
The stock settings may have been hampered ever so slightly by the default power limit of 250W although probably not much as the power draw tends to be pretty steady with gpuowl.

  • It's worth setting the perf_level with --setsclk. If you don't, it'll automatically ramp up to --setsclk 8, which has the best throughput but poor efficiency compared to perf_level 4 or 5. By reducing throughput a little you increase efficiency a lot; it depends what you're going for.
  • It's worth overclocking the memory because there are some easy gains: not a big jump in power consumption for a worthwhile jump in throughput.
  • It's necessary to set the fans manually with --setfan if you've overclocked the memory, as on auto it seems to have trouble maintaining the 95 degree target and can push into the hundreds. If the memory is not overclocked, the auto fans are good at maintaining temps unless you're using default settings and the card happens to have a high stock voltage.
  • It's worth undervolting IMO for some power savings, but more importantly to reduce heat, meaning lower fan speeds for less noise and less wear and tear on the fans. I'm not suggesting an undervolt to the bleeding edge, just some safe tens of mV under the stock voltage, whatever is easy to grab. That said, they've tuned the voltage to each card, so it's the least beneficial (and at the same time most sketchy) thing to tune and you could skip it if you're uneasy about messing with the voltage. If you have a card with a really bad stock voltage and are using --setsclk 8, I think it's borderline necessary to undervolt to try and avoid hitting the power limit and having the card sound like a jet. For reference, a fan speed setting under 120 is comfortable to have next to you, above 150 is obnoxious, and above 180 approaches jet territory.
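
To put a number on "efficiency" in the tuned-vs-stock table above, one option is energy per iteration: effective ms/it multiplied by watts. The following is a minimal Python sketch, not part of the original benchmark; the function names are made up for illustration, and the wattage is the rocm-smi reading from the table, not power at the wall.

```python
# Energy per iteration = (seconds per iteration) * (watts) = joules per iteration.
# Figures are copied from the tuned-vs-stock table; power is the rocm-smi
# reading, so treat the absolute values as approximate.
RESULTS = {
    "tuned, 2 workers": (0.95, 180.0),
    "stock, 2 workers": (0.98, 247.0),
    "stock, 1 worker": (1.04, 247.0),
}


def joules_per_iteration(ms_per_it: float, watts: float) -> float:
    """Convert an effective ms/it figure and a power draw into joules per iteration."""
    return ms_per_it / 1000.0 * watts


def efficiency_table(results=RESULTS) -> dict:
    """Map each configuration name to its energy cost per iteration."""
    return {name: joules_per_iteration(ms, w) for name, (ms, w) in results.items()}


if __name__ == "__main__":
    table = efficiency_table()
    baseline = table["stock, 2 workers"]
    for name, joules in table.items():
        print(f"{name:18s} {joules:.3f} J/it ({joules / baseline:.0%} of stock, 2 workers)")
```

On these figures the tuned settings come out at roughly 0.17 J/it against roughly 0.24 J/it for stock with two workers, so the tuned card spends about 30% less energy per iteration while also being slightly faster.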


Quote:
Originally Posted by preda View Post
0.85 ms/it at the wavefront, that's amazing! More than twice as fast as a Vega 64. Sweet.
Nice :) Glad to have benchmarked it especially as the tuning capabilities on Linux are finally easily accessible.

I did try mfakto; the perftest results are here, although I don't really know what they mean or if the test was done right ( https://www.mersenneforum.org/showpo...postcount=1504 ). Two instances of mfakto didn't improve throughput, and running one instance of mfakto and one of gpuowl had mfakto running at ~96% and gpuowl at ~5% of where they would be had they been solo.

What else needs benching? Getting it on mersenne.ca by running a specific test is all that's left on my list (it's unclear if they even accept gpuowl/PRP results but I'll try anyway with stock and tuned results). I'm not going to go beyond 1200MHz on the memory as it's beyond the default limits, if something goes tits up I don't want that voiding the warranty.
2019-03-25, 15:58   #80
SELROC
4C7₁₆ Posts

I don't follow you here. With gpuowl the GPU RAM already runs at maximum speed on the automatic setting (2000MHz). The GPU core clock runs at 1319MHz. Overclocking the GPU cores to the maximum of 1360MHz will produce more heat and thermal throttling, so it is not worth overclocking.


BTW, I run my gpus open air, with ambient temperature at 20-25C, they go up to 80-82C already.
2019-03-25, 16:51   #81
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴×3×163 Posts

Quote:
Originally Posted by M344587487 View Post
What else needs benching? Getting it on mersenne.ca by running a specific test is all that's left on my list (it's unclear if they even accept gpuowl/PRP results but I'll try anyway with stock and tuned results). I'm not going to go beyond 1200MHz on the memory as it's beyond the default limits, if something goes tits up I don't want that voiding the warranty.
If you mean adding an entry in https://www.mersenne.ca/cudalucas.php for the Radeon VII, give the exponent in the upper right of https://www.mersenne.ca/cudalucas.php a try for 30,000 iterations and send in the log content. And do tf and send it in also if you haven't yet. https://www.mersenne.ca/mfaktc.php

(And for anyone who has RTX20xx, please submit also.)

For submitting results there, mersenne.ca only takes tf for 2^32 > p > 10^9 to my knowledge. https://www.mersenneforum.org/showpo...11&postcount=9
2019-03-25, 17:12   #82
M344587487
"Composite as Heck"
Oct 2017
1666₈ Posts

Quote:
Originally Posted by SELROC View Post
I don't follow you here. With gpuowl the GPU RAM already runs at maximum speed on the automatic setting (2000MHz). The GPU core clock runs at 1319MHz. Overclocking the GPU cores to the maximum of 1360MHz will produce more heat and thermal throttling, so it is not worth overclocking.


BTW, I run my gpus open air, with ambient temperature at 20-25C, they go up to 80-82C already.
I think you might be confusing this with a different card. The Radeon VII stock speeds are 1000MHz HBM2 memory and 1800MHz max core clock. The specs you're quoting sound like a Polaris card. Anything I've said only applies to Radeon VII which I may have confusingly called the R7.

Quote:
Originally Posted by kriesel View Post
If you mean adding an entry in https://www.mersenne.ca/cudalucas.php for the Radeon VII, give the exponent in the upper right of https://www.mersenne.ca/cudalucas.php a try for 30,000 iterations and send in the log content. And do tf and send it in also if you haven't yet. https://www.mersenne.ca/mfaktc.php

(And for anyone who has RTX20xx, please submit also.)

For submitting results there, mersenne.ca only takes tf for 2^32 > p > 10^9 to my knowledge. https://www.mersenneforum.org/showpo...11&postcount=9
Thanks.
2019-03-25, 17:16   #83
SELROC
2×3²×11×37 Posts

I'm not confusing it. I am telling you what an RX580 runs like. I will be able to tell for the Radeon VII when it arrives.
2019-03-25, 17:50   #84
M344587487
"Composite as Heck"
Oct 2017
950₁₀ Posts

Ok, then maybe rewording will clear things up if there is still any confusion. The settings I used are a core underclock from 1800MHz to 1547MHz, a memory overclock from 1000MHz to 1200MHz, and an optional core undervolt by a small amount from stock. Polaris seems to be quite a different beast to Vega. I look forward to seeing how you fare with the R7 and what the stock voltage of your card is.
2019-03-25, 18:51   #85
xx005fs
"Eric"
Jan 2018
USA
223 Posts

Quote:
Originally Posted by M344587487 View Post
Ok, then maybe rewording will clear things up if there is still any confusion. The settings I used are a core underclock from 1800MHz to 1547MHz, a memory overclock from 1000MHz to 1200MHz, and an optional core undervolt by a small amount from stock. Polaris seems to be quite a different beast to Vega. I look forward to seeing how you fare with the R7 and what the stock voltage of your card is.

Does raising the core clock help the throughput when the memory is overclocked to 1200MHz (say, running 1800/1200 instead of 1547/1200)?
2019-03-25, 19:27   #86
M344587487
"Composite as Heck"
Oct 2017
950₁₀ Posts

Quote:
Originally Posted by xx005fs View Post
Does raising the core clock help the throughput when the memory is overclocked to 1200MHz (say, running 1800/1200 instead of 1547/1200)?
With two workers, yes. I didn't bench it before as I figured such inefficient settings would be used to quickly DC a prime contender, which is what the quick_single_test result above is. When using 1800/1200 with two workers the iteration times for 5M are 1.76 ms/it each, an effective rate of 0.88 ms/it. It was at the power cap of 250W in rocm-smi, which means it's ~8% quicker than the 2-worker 1547/1200 but draws ~39% more power. The fan speed also had to be set to over 200, which is hilariously loud.
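
The ~8% and ~39% figures can be reproduced from the numbers in this thread. The sketch below is only a sanity check with illustrative helper names; it takes 0.95 ms/it at 180W for the two-worker 1547/1200 setup, as in the earlier table. Note that ~39% is the ratio of the two power readings; the energy spent per iteration rises by less, because the faster setting also finishes each iteration sooner.

```python
def effective_ms_per_it(per_worker_ms: float, workers: int) -> float:
    """Combined ms/it when `workers` instances each run at `per_worker_ms`."""
    return per_worker_ms / workers


def percent_more(a: float, b: float) -> float:
    """How much larger `a` is than `b`, in percent."""
    return (a / b - 1.0) * 100.0


# 1800/1200 with two workers, at the 250W power cap reported by rocm-smi.
fast_ms = effective_ms_per_it(1.76, 2)
fast_watts = 250.0

# 1547/1200 with two workers (effective rate and power from the earlier table).
efficient_ms = 0.95
efficient_watts = 180.0

# Throughput is the inverse of ms/it, so the speed-up is efficient_ms / fast_ms.
throughput_gain = percent_more(efficient_ms, fast_ms)
power_increase = percent_more(fast_watts, efficient_watts)
# Energy per iteration is proportional to ms/it * watts.
energy_increase = percent_more(fast_ms * fast_watts, efficient_ms * efficient_watts)

print(f"effective rate at 1800/1200: {fast_ms:.2f} ms/it")
print(f"throughput gain:             {throughput_gain:.1f}%")
print(f"power increase:              {power_increase:.1f}%")
print(f"energy per iteration:        {energy_increase:.1f}% higher")
```

This gives about 8% more throughput for about 39% more power, or about 29% more energy per iteration.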
2019-03-25, 19:32   #87
xx005fs
"Eric"
Jan 2018
USA
223 Posts

Quote:
Originally Posted by M344587487 View Post
With two workers, yes. I didn't bench it before as I figured such inefficient settings would be used to quickly DC a prime contender, which is what the quick_single_test result above is. When using 1800/1200 with two workers the iteration times for 5M are 1.76 ms/it each, an effective rate of 0.88 ms/it. It was at the power cap of 250W in rocm-smi, which means it's ~8% quicker than the 2-worker 1547/1200 but draws ~39% more power. The fan speed also had to be set to over 200, which is hilariously loud.

Yeah, I figured it was probably going to be limited by HBM. 0.88ms/it for 1 worker is mighty impressive and it's way faster than even 2 Vega 64s would do. How much power is the card drawing by itself after the undervolt?
2019-03-25, 20:03   #88
M344587487
"Composite as Heck"
Oct 2017
2·5²·19 Posts

Quote:
Originally Posted by xx005fs View Post
Yeah, I figured it was probably going to be limited by HBM. 0.88ms/it for 1 worker is mighty impressive and it's way faster than even 2 Vega 64s would do. How much power is the card drawing by itself after the undervolt?
0.88 ms/it is the effective throughput of combining two simultaneous workers doing 1.76 ms/it each at 5M with the card reporting ~250W in rocm-smi. The single worker result is 0.95 ms/it with the card reporting ~232W in rocm-smi. In both cases there was a mild undervolt to ~1030mV which was more for reducing temps than power consumption.
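
To see what these two settings mean for a whole test, the ms/it and wattage figures can be converted into hours and kWh per test. This is a rough Python sketch under a loud assumption: the exponent of 90,000,000 is hypothetical and chosen only for illustration (a PRP test needs roughly as many iterations as the exponent), and the function names are invented here. Power is the rocm-smi reading, so the kWh figures cover the card only.

```python
# Hypothetical exponent, picked only for illustration. A PRP test of 2^p - 1
# takes roughly p iterations, so this is also the assumed iteration count.
ASSUMED_EXPONENT = 90_000_000


def hours_per_test(per_worker_ms: float, iterations: int = ASSUMED_EXPONENT) -> float:
    """Wall-clock hours for one worker to finish one test."""
    return per_worker_ms * iterations / 1000.0 / 3600.0


def kwh_per_test(per_worker_ms: float, watts: float, workers: int,
                 iterations: int = ASSUMED_EXPONENT) -> float:
    """Card energy per test in kWh, splitting the card's power across workers."""
    return hours_per_test(per_worker_ms, iterations) * watts / 1000.0 / workers


# (per-worker ms/it, rocm-smi watts, workers), taken from the post above.
CONFIGS = {
    "1 worker, 1800/1200": (0.95, 232.0, 1),
    "2 workers, 1800/1200": (1.76, 250.0, 2),
}

for name, (ms, watts, workers) in CONFIGS.items():
    hours = hours_per_test(ms)
    energy = kwh_per_test(ms, watts, workers)
    per_day = workers * 24.0 / hours
    print(f"{name}: {hours:.2f} h per test, {per_day:.2f} tests/day, {energy:.2f} kWh per test")
```

Under that assumption a single worker returns one result in just under a day, while two workers take about 44 hours each but complete slightly more tests per day overall, at close to the same energy per test. That matches the description of the single-worker setting as the one for getting a single result back quickly.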
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
