#67
"Composite as Heck"
Oct 2017
1666₈ Posts
4608K mem OD 19% perf_level test:
Used --setfan 200 as I was paranoid that the fans wouldn't ramp properly with memory temps. perf_level 8 omitted as it was hitting the power cap. The power cap can be changed with "--setpoweroverdrive WATTS" but there's no point until I figure out undervolting IMO. Code:
perf_level  wall_power  rocm-smi_power  temp  sclk  mclk  ms_per_it_4608K  joules_per_it
7           319         244             97    1774  1192  0.86             0.20984
6           306         235             95    1750  1192  0.87             0.20445
5           280         211             85    1684  1192  0.88             0.18568
4           236         173             73    1547  1192  0.92             0.15916
3           198         140             61    1373  1192  0.98             0.1372
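A rough sketch of how a sweep like this can be scripted (a sketch, not my exact script - it assumes gpuowl is already running and logging its ms/it, rocm-smi at the usual ROCm path, and an arbitrary two-minute settle per level; wall power still needs an external meter): Code:
#!/bin/bash
# Sweep performance levels 7 down to 3 and sample the card's own telemetry.
SMI=/opt/rocm/bin/rocm-smi
$SMI --setfan 200                # pin the fans high, as in the test above
for level in 7 6 5 4 3; do
    $SMI --setsclk $level        # lock the card to this performance level
    sleep 120                    # assumed settle time before sampling
    echo "=== perf_level $level ==="
    $SMI                         # default output: temp, average power, sclk, mclk, fan
done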
A few other notes:
#68
"Eric"
Jan 2018
USA
223 Posts
Speaking of the memory frequency, I am pretty sure that if you manage to figure out how to edit the PowerPlay table, it can go above 1200MHz and be set to whatever you want, and you will also have more flexibility controlling the frequency and voltage than with the curve Wattman provides.

Last fiddled with by xx005fs on 2019-03-23 at 20:02 Reason: added link to the reddit article
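On the Linux side, the analogous knob is amdgpu's pp_od_clk_voltage interface used later in this thread; a minimal sketch, assuming card0, with the 1250 purely illustrative (the reddit article itself concerns Windows PowerPlay edits, and whether the driver accepts a value above 1200 depends on its own limits): Code:
# See what the driver will accept before writing anything;
# the OD_RANGE section lists the allowed mclk span.
cat /sys/class/drm/card0/device/pp_od_clk_voltage
echo "manual" > /sys/class/drm/card0/device/power_dpm_force_performance_level
# Request an mclk above the 1200MHz Wattman cap (illustrative value)
echo "m 1 1250" > /sys/class/drm/card0/device/pp_od_clk_voltage
# Commit to the card
echo "c" > /sys/class/drm/card0/device/pp_od_clk_voltage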
#69
If I May
"Chris Halsall"
Sep 2002
Barbados
11374₁₀ Posts
What are you trying to achieve?
#70
Sep 2006
The Netherlands
3·269 Posts |
M344587487 - you can ignore the benchmarking question if what I read in some documentation happens to be true - namely that you cannot manually read/write the LDS. The question is only relevant if you can read/write to/from it.

Much of AMD's documentation seems to be written by people who have yet to learn what a GPU even looks like, and seemingly they do not have the actual hardware on hand where the documentation gets written, to verify that what they write is true. Which is why information from forums is critical.

I lost months on some of the Nvidia architectures, wasted time just because I didn't realize how bad the Fermi architecture is at prefetching GDDR5 compared to Kepler and newer generations. Arguably it wasn't necessarily the prefetching - profiting more from running multiple warps at the same time could also explain the performance penalty of Fermi versus Kepler - yet the solution for sieving on Kepler and the 900 and 1000 series Nvidia cards is much simpler than for Fermi in that case, and I lost months figuring that out. So having the correct throughput latencies of the features you can potentially use for a speedup is very important. However, if they can't be used by the programmer then they're useless to benchmark.

Generally speaking, AMD and Nvidia suck bigtime at giving programmers correct documentation - like throughput latencies, and how the execution of threads/warps (or whatever you want to call them) actually works on the SIMDs. In fact, for the latest Nvidia architecture they don't even give away how many execution units it has for, say, integer multiplication. How can those manufacturers expect lots of good programs for these GPUs while giving away so little information?

Last fiddled with by diep on 2019-03-23 at 21:15
#71
Sep 2006
The Netherlands
3×269 Posts |
3.4 TFLOPS means it can execute 1.7T FP64 (FMA) instructions per second. Parallelizing the entire transform over, what is it, 80 compute units makes it really problematic to get a high percentage out of that 1.7T.

Parallelizing supercomputers and processes on CPUs, even with complex game tree search algorithms (where, just to get the task parallelized, searches nonstop start and stop and get started based upon the result of earlier searches), took 40 pages of A4 - despite that I usually got nearly 100% scaling and over 50% speedup on hundreds of CPUs. Compare with Deep Blue, which was more like under 1% there, though they claimed 3% in a later publication.

That >90% is total wishful thinking on GPUs with 80 compute units.

What you can do is run a couple of dozen of those 89M tests at the same time and log the iteration times after a couple of minutes, then calculate the average iteration time from all of them and draw conclusions based upon that. Start with running 2, then 4, etc.

Last fiddled with by diep on 2019-03-23 at 21:35
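A sketch of that kind of throughput test, assuming one prepared work directory per instance (gpuowl picks up its work from the directory it runs in; the directory and log names here are illustrative): Code:
#!/bin/bash
# Launch N gpuowl instances against the same card and let each log its own times.
N=2
for i in $(seq 1 "$N"); do
    (cd "worker$i" && ./gpuowl > gpuowl.out 2>&1 &)
done
sleep 300   # let things settle for a few minutes first
# Effective throughput is the per-worker time divided by N
# (e.g. two workers at 1.69 ms/it each comes out to 0.845 ms/it effective).
tail -n 2 worker*/gpuowl.out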
#72
"Eric"
Jan 2018
USA
223₁₀ Posts
Undervolting and overclocking, though not for GIMPS. It's common practice for miners to drop voltages below what AMD allows in Wattman (by changing the voltage of other P-states or such), or for overclockers who just want to max out the hardware by raising the max power limit and voltage (and obviously the clock speed too).

Last fiddled with by xx005fs on 2019-03-23 at 22:41
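For what it's worth, on Linux the per-P-state override that miners use looks roughly like this on the older cards (Polaris/Vega 10 syntax; the Radeon VII instead uses the voltage-curve syntax shown later in this thread, and the numbers here are purely illustrative): Code:
echo "manual" > /sys/class/drm/card0/device/power_dpm_force_performance_level
# Pre-Vega20 pp_od_clk_voltage syntax: "s <P-state> <MHz> <mV>"
echo "s 7 1600 950" > /sys/class/drm/card0/device/pp_od_clk_voltage
# Commit to the card
echo "c" > /sys/class/drm/card0/device/pp_od_clk_voltage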
#73
"Composite as Heck"
Oct 2017
2×5²×19 Posts
Got undervolting working with that link. Turns out every R7 GPU is tested and ships with a different stock voltage, probably indicating how well you've done in the silicon lottery. Because of this, undervolting works but probably won't yield as much of a benefit as it does on Vega 56/64 (which I believe had a flat, insanely high stock voltage of 1.2V). My stock voltage is 1081mV; other people have reported stock voltages of 1127mV, 1133mV and 1040mV.
The way the voltage is set is on a curve. Here's my stock curve: Code:
OD_VDDC_CURVE:
0: 808Mhz 690mV
1: 1304Mhz 797mV
2: 1801Mhz 1081mV

4608K mclk=1200 perf_level=5 undervolt test: Code:
undervolt_setting  perf_level  wall_power  rocm-smi_power  --setfan  ms_per_it_4608K  joules_per_it
vc 2 1801 1081     5           282         213             160       0.88             0.18744
vc 2 1801 1030     5           270         200             145       0.88             0.176
vc 2 1801 1020     5           270         200             135       0.88             0.176
vc 2 1801 1015     5           268         197             125       0.88             0.17336
vc 2 1801 1010     errors within minutes. The previous result had no errors within half an hour, but it
                   needs an overnight stability test, and then gets backed off to 1020 if it passes.
Code:
undervolt_setting  perf_level  wall_power  rocm-smi_power  --setfan  ms_per_it_4608K  joules_per_it
vc 2 1801 1081     4           243         175             110       0.92             0.161
vc 2 1801 1030     4           237         171             105       0.92             0.15732
vc 2 1801 1020     4           236         169             100       0.92             0.15548
vc 2 1801 1010     4           234         166             97        0.92             0.15272
vc 2 1801 1000     4           233         166             95        0.92             0.15272
vc 2 1801 990      errors within minutes. The previous result had no errors within half an hour, but it
                   needs an overnight stability test, and then gets backed off to 1010 if it passes.
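Each trial voltage above was applied through the same sysfs interface; a minimal sketch of one step (the 1010mV is just the example from the last row): Code:
# Trial voltage for curve point 2: "vc <point> <MHz> <mV>"
echo "vc 2 1801 1010" > /sys/class/drm/card0/device/pp_od_clk_voltage
# Dummy sclk write so the undervolt sticks
echo "s 1 1801" > /sys/class/drm/card0/device/pp_od_clk_voltage
# Commit, then read back OD_VDDC_CURVE to confirm
echo "c" > /sys/class/drm/card0/device/pp_od_clk_voltage
cat /sys/class/drm/card0/device/pp_od_clk_voltage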
Out of curiosity I just tested running two instances of gpuowl at the same time, and surprisingly it worked. More surprisingly, it increased throughput and was more efficient to boot. Power use increased compared to a single instance at the same settings, but the throughput more than made up for it. I only have time for one data point now; the rest will have to wait until tomorrow: Code:
undervolt_setting  workers  perf_level  wall_power  rocm-smi_power  --setfan  ms_per_it_for_each_worker  effective_ms_per_it_4608K  joules_per_it
vc 2 1801 1030     2        4           249         180             115       1.69                       0.845                      0.1521
#74
"Eric"
Jan 2018
USA
223 Posts
I realized that the Vega series cards are very temperature sensitive, especially the HBM memory. Previously, I used the stock air cooler on my Vega 56 and I managed 1600MHz at 1050mV (actual load voltage is more like 1000mV), and HBM was way down at 1060MHz (for gaming that was fine, but for gpuowl I had to drop to 1020MHz). On water cooling, however, I managed nearly 1650MHz at the same core voltage and 1150MHz on HBM (with gpuowl dropping to 1080-1100MHz for stability). I don't know how the R7 behaves when temperature decreases, but it's certainly interesting (so far I have been seeing good results online with custom watercooling).

As a side note, I think the person on reddit used 2000-grit polishing gel to polish the surface at the end, and I feel that step is unnecessary, as lapping is usually just to even out the surface. As long as it's not too rough (say 200-grit with very noticeable grooves) it should be fine, and the thermal grease should just fill it in.

Last fiddled with by xx005fs on 2019-03-24 at 01:06
#75
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴×3×163 Posts
#76
"Composite as Heck"
Oct 2017
2·5²·19 Posts
The results are in. Two instances is the limit for simultaneous execution; a third compiles the kernel but doesn't seem to execute. Undervolted by finding the minimum voltage that produced no errors in half an hour, then backed off the voltage by 10mV for safety. If that turns out to be unstable when tested properly, all it means is that a few watts get added to the figures below.
Best results: Code:
target                workers  mclk  sclk  4608K_combined_throughput_ms_it  5M_combined_throughput_ms_it  rocm-smi_power_after_undervolt_YMMV
efficient_throughput  2        1201  1547  0.845                            0.95                          176
quick_single_test     1        1201  1802  0.86                             0.95                          232

Here's an approximation of the script I'm using to init the card: Code:
#!/bin/bash
if [ "$EUID" -ne 0 ]; then
    echo "Radeon VII init script needs to be executed as root" && exit
fi

# Allow manual control
echo "manual" > /sys/class/drm/card0/device/power_dpm_force_performance_level

# Undervolt by setting max voltage
# Set this to 50mV less than the max stock voltage of your card (which varies
# from card to card), then optionally tune it down
echo "vc 2 1801 1010" > /sys/class/drm/card0/device/pp_od_clk_voltage

# Overclock mclk to 1200
echo "m 1 1200" > /sys/class/drm/card0/device/pp_od_clk_voltage

# Push a dummy sclk change for the undervolt to stick
echo "s 1 1801" > /sys/class/drm/card0/device/pp_od_clk_voltage

# Push everything to the card
echo "c" > /sys/class/drm/card0/device/pp_od_clk_voltage

# Put card into desired performance level
/opt/rocm/bin/rocm-smi --setsclk 4 --setfan 110

As a quick and dirty guide to getting the same setup as this:
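A sketch of those steps, reconstructed from this thread, so treat the names as assumptions (ROCm at /opt/rocm, the init script above saved as r7-init.sh, gpuowl already built in ~/gpuowl): Code:
# 1. With ROCm installed, verify the card is visible
/opt/rocm/bin/rocm-smi
# 2. Run the init script above as root to set voltage/clocks/fan
sudo ./r7-init.sh
# 3. Confirm the settings stuck
cat /sys/class/drm/card0/device/pp_od_clk_voltage
# 4. Start gpuowl from its work directory
cd ~/gpuowl && ./gpuowl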
#77
144₈ Posts
Are you sure that tuning the GPU manually is better than leaving it on automatic?