#1 |
"Composite as Heck"
Oct 2017
3×311 Posts |
I'm trying to tune my Vega 56 for gpuowl at 5M FFT. I'm a novice at overclocking, especially on Linux, so please let me know if I'm doing things wrong. I'm aiming for two profiles, one for efficiency and the other for throughput (without going crazy either way). All I've done so far is mess around with the default P-states using "rocm-smi --setsclk LEVEL". I couldn't get states 6 or 7 to stick (they are sometimes briefly entered, but the card stays at level 5 99% of the time). I set "rocm-smi --setfan 120" to be able to compare temps.
Software: Ubuntu 16.04, latest 4.13 kernel, ROCm 1.7.1, gcc 5.4.0, gpuowl v2.0-dbc5a01 Code:
P-state | core_clk (MHz) | mem_clk (MHz) | temp (C) | watts | ms/it | mJ/it |
5 | 1474 | 800 | 59 | 165 | 2.68 | 442.2 |
4 | 1312 | 800 | 49 | 132 | 2.87 | 378.84 |
3 | 1269 | 800 | 46 | 120 | 2.9 | 348 |
2 | 1138 | 800 | 43 | 109 | 3.16 | 344.44 |
1 | 991 | 800 | 42 | 97 | 3.5 | 339.5 |
0 | 852 | 800 | 40 | 87 | 3.93 | 341.91 |

If you want to chime in with your GPU's gpuowl 5M stats and how you got them, feel free; who doesn't love benchmarks? |
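For anyone wanting to reproduce the mJ/it column: it is just watts multiplied by ms/it (W × ms = mJ). Below is a minimal Python sketch using the numbers from the table above. The function names are made up for illustration and nothing here talks to the GPU.

```python
# (p_state, core_clk_mhz, watts, ms_per_iter) copied from the table above
ROWS = [
    (5, 1474, 165, 2.68),
    (4, 1312, 132, 2.87),
    (3, 1269, 120, 2.90),
    (2, 1138, 109, 3.16),
    (1, 991, 97, 3.50),
    (0, 852, 87, 3.93),
]


def energy_mj_per_iter(watts, ms_per_iter):
    """W x ms = mJ, so energy per iteration is simply the product."""
    return watts * ms_per_iter


def most_efficient(rows):
    """Return the row with the lowest energy per iteration."""
    return min(rows, key=lambda r: energy_mj_per_iter(r[2], r[3]))


for p_state, core, watts, ms in ROWS:
    # 1000 / ms converts milliseconds per iteration into iterations per second
    print(f"P{p_state}: {energy_mj_per_iter(watts, ms):.2f} mJ/it, {1000 / ms:.0f} it/s")

print("lowest energy per iteration: P-state", most_efficient(ROWS)[0])
```

With these figures the minimum lands on P-state 1, but states 0 to 2 are within a few mJ/it of each other, so measurement noise could easily reorder them.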
#2 |
"Composite as Heck"
Oct 2017
3×311 Posts |
ROCm supports Ubuntu 18.04 now so I migrated. The gpuowl version used above was consistently slower on this setup by 0.1ms/it at level 5 so I updated gpuowl and retested. It's not an apples to apples comparison as the new testing uses a 4608K kernel, but it's still interesting.
Software: Ubuntu 18.04, kernel 4.15.0-38-generic, gcc 8.2.0, gpuowl 4.7-5b01b65 Code:
P-state | core_clk (MHz) | mem_clk (MHz) | temp (C) | watts | ms/it | mJ/it |
5 | 1474 | 800 | 59 | 164 | 2.38 | 390.32 |
4 | 1312 | 800 | 50 | 130 | 2.50 | 325 |
3 | 1269 | 800 | 47 | 120 | 2.52 | 302.4 |
2 | 1138 | 800 | 40 | 91 | 2.66 | 242.06 |
1 | 991 | 700 | 37 | 75 | 3.04 | 228 |
0 | 852 | 167 | 29 | 40 | 8.84 | 353.6 |

I'm looking forward to the day ROCm exposes voltage control, memory clocks beyond 800MHz, and finer control of core clocks. It looks like it may be possible to do this manually now by pushing PPT tables in binary form instead of the currently non-working text form. Has anyone tried this? https://github.com/RadeonOpenCompute...ment-418597555 |
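To put a rough number on the change between the two runs, here is a small Python sketch comparing ms/it for P-states 2 to 5, the states where both tables report an 800MHz memory clock. As noted above this is not apples to apples (different gpuowl version, OS, and FFT size), so treat the percentages as indicative only. The helper name is made up for illustration.

```python
# ms/it per P-state: gpuowl v2.0 (Ubuntu 16.04) vs gpuowl 4.7 (Ubuntu 18.04).
# Only states with mem_clk = 800 MHz in both tables are included.
OLD_MS = {5: 2.68, 4: 2.87, 3: 2.90, 2: 3.16}
NEW_MS = {5: 2.38, 4: 2.50, 3: 2.52, 2: 2.66}


def speedup_percent(old_ms, new_ms):
    """Percentage reduction in time per iteration."""
    return (old_ms - new_ms) / old_ms * 100.0


for state in sorted(OLD_MS, reverse=True):
    pct = speedup_percent(OLD_MS[state], NEW_MS[state])
    print(f"P{state}: {pct:.1f}% less time per iteration")
```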
#3 | |
23·419 Posts |
Quote:
I have always used the amdgpu driver. I run my gpus at nominal clock, and I let the gpu do automatic voltage and fan control with factory settings. At nominal clock the gpu goes up to 77C, 144W, as reported by the "sensors" command. |
#4 |
"Mihai Preda"
Apr 2015
2²·19² Posts |
I use ROCm 1.9.1, Ubuntu 18.04 with Linux kernel 4.18.8.
With dual Vega 64 (air-cooled with the standard "blower" cooler). Here are my observations:
1. ROCm is in general faster than amdgpu-pro (better compiler, producing better ISA code).
2. My sweet spot is P-state 5 (rocm-smi --setsclk 5), which results in 1401 MHz, GPU fan at 2300 RPM (automatic), 150 W power, 75 degC temperature. If I set the frequency higher (P-state 6 or 7, or automatic, which is the default), the GPU quickly reaches 82-84 degC and starts thermal throttling. The throttling results in worse performance than P-state 5, so it's a lose-lose: higher temperature, higher power use, lower performance. I do not set the fan speed manually; I leave it on automatic, which is enough cooling for 150 W at 75 degC. |
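If you want to script the P-state selection mentioned above, a minimal Python sketch could look like the following. It only wraps the `rocm-smi --setsclk LEVEL` invocation quoted in this thread; the function names and the dry-run default are assumptions for illustration, and actually running the command may need root privileges.

```python
import shlex
import subprocess


def setsclk_command(level):
    """Build the rocm-smi invocation quoted in this thread for an sclk level."""
    # Levels 0-7 are the ones mentioned in the thread; reject anything else.
    if not isinstance(level, int) or not 0 <= level <= 7:
        raise ValueError(f"sclk level must be an integer 0-7, got {level!r}")
    return ["rocm-smi", "--setsclk", str(level)]


def apply(cmd, dry_run=True):
    """Print the command; only execute it when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


# Dry run: prints the command without touching the GPU.
apply(setsclk_command(5))
```

Keeping the dry run as the default means the script can be checked on a machine without ROCm installed before it is allowed to change clocks.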
#5 | |
202038 Posts |
Quote:
I would suggest leaving the fan in automatic mode until you have a stable temperature and know for sure what the maximum temperature is; then you can tune your fan speed if you want. |
#6 | ||
"Composite as Heck"
Oct 2017
3·311 Posts |
Just installed the latest modded kernel and appear to be able to alter P-state voltage and clock settings via PPT. I'm in the process of checking things and have a watt-meter so I'll be able to see if voltage control actually works soon.
All previous testing appears to be invalid, as it may not have been on the stock config. I've never been able to get states 6 or 7 to stick, or to break over the 75C temperature target, quite possibly due to a holdover from mining crypto on Windows. I was under the impression that using the blockchain drivers and setting powerplay tables didn't write anything to the card (at least nothing that survives a reboot), but maybe it remembers the last good PPT setting or something. Am I wrong to think that firmware is the only thing stored on the card and that it hasn't been updated? Quote:
Quote:
75C is the temperature target; I think it'll only exceed this in states 6 and 7, until throttling kicks in. Until now I haven't had states 6 and 7 to play with, so I don't know for sure. It looks like the temp target can be changed via PPT, which is nice. |
#7 | |
2⁴·223 Posts |
Quote:
Today I opened an issue on the ROCm GitHub and requested Debian support. I was not the first to open an issue for Debian support, so maybe something is moving in that direction. |
#8 |
"Eric"
Jan 2018
USA
334₈ Posts |
I have done quite a bit of tweaking on both Windows and Linux. Due to my lack of Linux knowledge I didn't use the ROCm driver there; I opted for the amdgpu-pro driver and used a utility called amdcovc to overclock and tweak my GPU. I found that to maximize performance you should push the HBM2 clock as high as possible, especially if you have Samsung HBM rather than Hynix. Flashing a Vega 56 with a 64 BIOS (ONLY with Samsung HBM2, and I don't know how to check that on Linux) increases the HBM voltage, so the memory overclocks better. On Vega 56 the max limit for Samsung is about 1020-1050MHz AFAIK, and for Hynix it's usually under 1000MHz. Undervolting also helps greatly: reducing the voltage to 0.95V will greatly reduce power and heat and improve memory stability, since the memory becomes less stable as the core heats up.

My personal finding is that even with the core overclocked to massive speeds like 1750+MHz, if the HBM is at stock it barely improves on pure stock performance, at about 2.2ms/it on a Vega 56 flashed to the 64 BIOS. With a memory overclock, however, I can easily push it to 2.06ms/it while drawing half the power, from 300W down to less than 150W. If electricity is not a concern, try to push it as high as the core will go while keeping the temperature below 70C with a reasonable fan speed, which will probably improve the speed to about 1.9ms/it on Samsung HBM.
#9 |
"Composite as Heck"
Oct 2017
3×311 Posts |
Thanks. Pretty sure it's Samsung HBM2 as I believe none of the early Vega 56 used Hynix but we'll see. Undervolt core and overclock memory is the idea. If the spreadsheet used to generate the powerplay tables is to be believed the memory voltage of P-State 4 can be altered because it uses the core voltage of P-State 4 instead (which may not be useful as we won't be able to undervolt core and OC mem at the same time). If I can avoid a bios flash I will but it's worth keeping in the back pocket.
I'm aiming for efficiency without sacrificing too much throughput. The best so far is 0.825V 1269MHz core, 900MHz memory for 2.47 ms/it at 100W for the low end. If it's stable I might end up using that (or a slightly bumped voltage for stability), depends how good a clock of ~1400 ends up being. |
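As a quick sanity check on that profile, here is a Python sketch comparing it with the stock P-state 3 row from post #2, which runs the same 1269MHz core clock. It assumes the two power figures were measured in a comparable way, which the thread does not confirm, and the helper name is made up for illustration.

```python
def mj_per_iter(watts, ms_per_iter):
    """Energy per iteration in millijoules (W x ms = mJ)."""
    return watts * ms_per_iter


# Stock P-state 3 from post #2: 1269 MHz core, 800 MHz memory, 120 W, 2.52 ms/it
stock = mj_per_iter(120, 2.52)
# Tuned profile from this post: 0.825 V, 1269 MHz core, 900 MHz memory, 100 W, 2.47 ms/it
tuned = mj_per_iter(100, 2.47)

saving = (stock - tuned) / stock * 100.0
print(f"stock {stock:.1f} mJ/it, tuned {tuned:.1f} mJ/it, "
      f"{saving:.1f}% less energy per iteration")
```

On these figures the undervolted profile uses roughly 18% less energy per iteration while also being slightly faster, which is why it looks attractive as the efficiency profile if it proves stable.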
#10 |
"Composite as Heck"
Oct 2017
3·311 Posts |
Prep:
Create and use powerplay table:
Tested with ROCm 1.9.1, Ubuntu 18.04, custom kernel 4.19, latest gpuowl as of 2018-10-28 on a Vega 56 with stock bios. YMMV, don't blow up your card ;) |
#11 | |
"Eric"
Jan 2018
USA
2²·5·11 Posts |
Quote: