There are 2 cards in this linux box. These are the settings for the GPU that is throttling.
[CODE]echo "vc 2 1801 1020" >/sys/class/drm/card1/device/pp_od_clk_voltage
echo "m 1 1160" >/sys/class/drm/card1/device/pp_od_clk_voltage
echo "s 1 1801" >/sys/class/drm/card1/device/pp_od_clk_voltage
echo "c" >/sys/class/drm/card1/device/pp_od_clk_voltage
/opt/rocm/bin/rocm-smi -d 1 --setsclk 4 --setfan 220
[/CODE]
These are the settings for the GPU that is not throttling:
[CODE]echo "vc 2 1801 1050" >/sys/class/drm/card0/device/pp_od_clk_voltage
echo "m 1 1170" >/sys/class/drm/card0/device/pp_od_clk_voltage
echo "s 1 1801" >/sys/class/drm/card0/device/pp_od_clk_voltage
echo "c" >/sys/class/drm/card0/device/pp_od_clk_voltage
/opt/rocm/bin/rocm-smi -d 0 --setsclk 4 --setfan 180
[/CODE]
Results:
[CODE]GPU  Temp   AvgPwr  SCLK     MCLK     Fan     Perf    PwrCap  VRAM%  GPU%
0    91.0c  182.0W  1547Mhz  1171Mhz  85.88%  manual  250.0W  N/A    87%
1    96.0c  195.0W  1547Mhz  N/A      100.0%  manual  250.0W  N/A    96%
GPU[0] : Voltage: 868 mV
GPU[1] : Voltage: 856 mV
[/CODE]
At midday, temps can rise to 95C and 102C respectively.

GPU0 uses more voltage, a faster memory clock, yet runs cooler and uses less power. Stranger still is that GPU0 is physically above GPU1 -- thus should have warmer intake air.

Current timings at 5M FFT running 2 gpuowl instances on each: GPU0 1690 us, GPU1 1765 us.

BTW, how does one get the junction temperature? My rocm-smi only outputs one temperature value.
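The two command sets above differ only in the card number and three values (voltage, memory clock, fan), so they can be wrapped in one small function. This is a sketch, not the poster's script: the name `apply_oc`, its argument order and the `DRY_RUN` switch are hypothetical. It assumes, as in the post, that the rocm-smi device index equals the DRM card number, that it runs as root, and that amdgpu overdrive is already enabled (via `amdgpu.ppfeaturemask`).

```bash
#!/usr/bin/env bash
# Hypothetical helper: applies the overdrive settings from the post to one card.
# Must run as root with amdgpu overdrive enabled (amdgpu.ppfeaturemask).
# Set DRY_RUN to any non-empty value to print the commands instead of running them.
# Assumes the rocm-smi device index equals the DRM card number, as in the post.

apply_oc() {
    local card=$1   # DRM card number, e.g. 0 or 1
    local mv=$2     # millivolts for voltage-curve point 2 at 1801 MHz
    local mclk=$3   # maximum memory clock in MHz
    local fan=$4    # rocm-smi fan value (0-255 scale, as used in the thread)
    local od="/sys/class/drm/card${card}/device/pp_od_clk_voltage"
    local cmd

    # vc: voltage-curve point; m 1: max memory clock; s 1: max shader clock; c: commit
    for cmd in "vc 2 1801 ${mv}" "m 1 ${mclk}" "s 1 1801" "c"; do
        if [ -n "${DRY_RUN:-}" ]; then
            echo "echo \"$cmd\" >$od"
        else
            echo "$cmd" >"$od" || return 1
        fi
    done

    # Select SCLK level 4 and set the fan, exactly as the post does.
    if [ -n "${DRY_RUN:-}" ]; then
        echo "/opt/rocm/bin/rocm-smi -d $card --setsclk 4 --setfan $fan"
    else
        /opt/rocm/bin/rocm-smi -d "$card" --setsclk 4 --setfan "$fan"
    fi
}
```

For example, `DRY_RUN=1 apply_oc 1 1020 1160 220` prints the five commands used for the throttling card without touching sysfs; dropping `DRY_RUN` applies them.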
On mine it is [C]/opt/rocm/bin/rocm-smi -t[/C]
[code]/opt/rocm/bin/rocm-smi -v
========================ROCm System Management Interface========================
================================================================================
GPU[0] : VBIOS version: 113-D3600200-106
================================================================================
==============================End of ROCm SMI Log ==============================
[/code]
I have my fan at 140 and the junction temperature is 87C. So much quieter than 255.
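Independently of rocm-smi, the amdgpu kernel driver publishes its temperature sensors through the hwmon interface: each `temp*_input` file holds millidegrees Celsius and the matching `temp*_label` names the sensor. On Vega 20 cards such as the Radeon VII, recent enough kernels label them `edge`, `junction` and `mem`; older kernels may expose only one. A minimal sketch that prints whatever labelled sensors a card has; the function name and the `SYSFS_ROOT` override (useful for testing against a fake tree) are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: list the labelled temperature sensors amdgpu exposes via hwmon.
# temp*_input holds millidegrees Celsius; temp*_label names the sensor
# (e.g. edge, junction, mem on Vega 20 with a recent enough kernel).
# SYSFS_ROOT is a hypothetical override; it defaults to the real /sys.

gpu_temps() {
    local card=$1                      # DRM card number, e.g. 0 or 1
    local root=${SYSFS_ROOT:-/sys}
    local label_file input_file label milli

    for label_file in "$root"/class/drm/card"$card"/device/hwmon/hwmon*/temp*_label; do
        [ -e "$label_file" ] || continue           # glob matched nothing
        input_file=${label_file%_label}_input      # temp2_label -> temp2_input
        [ -r "$input_file" ] || continue
        label=$(<"$label_file")
        milli=$(<"$input_file")
        # Integer arithmetic only: whole degrees plus one truncated decimal.
        printf '%s: %d.%d C\n' "$label" $((milli / 1000)) $((milli % 1000 / 100))
    done
}
```

Running `gpu_temps 1` should print one line per sensor; if a `junction:` line appears, that is the value the GPU throttles on, separate from the single reading rocm-smi shows.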
[QUOTE=Prime95;523511]
Stranger still is that GPU0 is physically above GPU1 -- thus should have warmer intake air. [/QUOTE]
The effects of forced-air cooling literally and figuratively blow away any intuited "hot air rises" effects. Does the lower GPU have less intake area because it is closer to the bottom of the case, or because of some other obstruction?
[QUOTE=Prime95;523511]There are 2 cards in this linux box. ...
GPU0 uses more voltage, a faster memory clock, yet runs cooler and uses less power. Stranger still is that GPU0 is physically above GPU1 -- thus should have warmer intake air. [/QUOTE]
Have you tried physically swapping the GPUs, to determine whether it's the GPUs' attributes or their location? Or used a handheld digital thermometer to check the air temperature near each GPU's intake?
Same BIOS and it only shows one temperature:
[CODE]========================ROCm System Management Interface========================
================================================================================
GPU[0] : VBIOS version: 113-D3600200-106
GPU[1] : VBIOS version: 113-D3600200-106
================================================================================
==============================End of ROCm SMI Log ==============================

/opt/rocm/bin/rocm-smi -t
========================ROCm System Management Interface========================
================================================================================
GPU[0] : Temperature (Sensor #1): 91.0 c
GPU[1] : Temperature (Sensor #1): 96.0 c
================================================================================
==============================End of ROCm SMI Log ==============================
[/CODE]
The lower, hotter GPU's airflow is half obstructed by the power supply. Then there is about a 1 inch gap to the upper, cooler GPU. I do have a 12 cm fan blowing air into the case from the bottom. Of course the power supply cables, which are zip-tied, obstruct some of that airflow. Airflow will never be good -- too much stuff in too little space.

As an aside, my other Linux box also has one hot GPU and one cooler GPU. It has basically the same tower layout, and in that box the one on top runs much hotter than the one on the bottom.

I may have to try the sandpaper trick. I've never done that before -- a little scary.
[QUOTE=preda;523506]What I consider a hardware design fail on the Radeon VII is the large red "Radeon" LED logo placed on top of the GPU, covering the air exhaust that would otherwise have been there (moving the hot air "up" from the GPU). As a result, most of the hot air is directed downwards under the GPU.[/QUOTE]
I partially agree with that. However, I don't think the LED logo obstructing airflow does the most damage; the bigger problem is that AMD made the vapor chamber convex while the die on the Radeon VII is flat. That forces them to use a graphite thermal "pad" on the die to ensure proper contact with the chip, and the pad is obviously significantly thicker than a layer of thermal paste between two relatively flat, smooth surfaces. It really bothers me that AMD didn't even bother sanding their vapor chambers flat, and having to do it yourself is a bit of a pain.
[QUOTE=Prime95;523526]Same BIOS and it only shows one temperature:
[/QUOTE]
I have an ASUS.
[QUOTE=xx005fs;523527]I partially agree with that. However, I don't think the LED logo obstructing airflow does the most damage; the bigger problem is that AMD made the vapor chamber convex while the die on the Radeon VII is flat. That forces them to use a graphite thermal "pad" on the die to ensure proper contact with the chip, and the pad is obviously significantly thicker than a layer of thermal paste between two relatively flat, smooth surfaces. It really bothers me that AMD didn't even bother sanding their vapor chambers flat, and having to do it yourself is a bit of a pain.[/QUOTE]
I've ordered sandpaper and Arctic Silver. Wish me luck.
[QUOTE=Prime95;523562]I've ordered sandpaper and Arctic Silver. Wish me luck.[/QUOTE]
Try gluing a swatch of the sandpaper to a flat surface, like a piece of finished wood. Use as little glue as possible and smooth it out before letting it dry (or put a flat-sided weight on top while it dries). That makes achieving a flat surface much easier.
I just finished my first lapping attempt. I may not have done a very good job. I was planning on taping the sandpaper face up to a granite or glass countertop, but upon disassembly that plan went out the window: the heatsink has standoffs that prevent lapping it that way. I ended up taping the sandpaper to a level, which was not as wide or as steady as I would have liked. I used 400 grit, then 600 grit. Removing the old thermal pad required a lot of Goo Gone and alcohol (for the GPU, not me). I used Arctic Silver MX-4 as the thermal paste.
Everything is back together and the results are the same: still thermal throttling. Timings go from 1715 us at startup to 1760 us due to throttling. The good news is that I didn't break anything.
One mystery I still don't have an answer for: why does one GPU report a higher voltage but a lower power draw?
[CODE]GPU[0] : Average Graphics Package Power: 179.0W
GPU[0] : Voltage: 868 mV
[/CODE]
vs.
[CODE]GPU[1] : Average Graphics Package Power: 196.0W
GPU[1] : Voltage: 856 mV
[/CODE]
Same clock speed; GPU0 has about 1% more memory overclock.
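A back-of-envelope check shows why this is puzzling. Under the usual first-order model of dynamic CMOS power, $P \approx \alpha C V^2 f$, and assuming (not measured) that both chips have the same effective switched capacitance $\alpha C$ and that the reported voltage is the core voltage driving that power, the expected ratio at the same 1547 MHz SCLK versus the reported ratio is:

```latex
\frac{P_0}{P_1} \approx \left(\frac{V_0}{V_1}\right)^{2}\frac{f_0}{f_1}
  = \left(\frac{868\ \mathrm{mV}}{856\ \mathrm{mV}}\right)^{2}\cdot\frac{1547\ \mathrm{MHz}}{1547\ \mathrm{MHz}}
  \approx 1.028,
\qquad
\left.\frac{P_0}{P_1}\right|_{\text{reported}} = \frac{179.0\ \mathrm{W}}{196.0\ \mathrm{W}} \approx 0.913
```

So the $V^2 f$ term alone predicts GPU0 drawing roughly 3% more than GPU1, while the readings show about 9% less. The model leaves out static (leakage) power, which grows with die temperature and varies from chip to chip, and it ignores the small memory-clock difference; those omitted terms are the obvious candidates for the gap.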