mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   Radeon VII @ newegg for 500 dollars US on 11-27 (https://www.mersenneforum.org/showthread.php?t=24979)

kriesel 2020-05-19 05:44

[QUOTE=Prime95;545822]sclk4 = 1547 MHz
sclk3 = 1373 MHz.

At 1200 MCLK, running 2 gpuowl instances I get effective iteration times of 570 and 606 microseconds for a 5M FFT. You can convert that into GHz-days/day.[/QUOTE]
I had p=92561321, 5M FFT, 976 us/it on v6.11-134: 1.0456 days for 335.278 GHd = 320.66 GHd/day.
606 us/it would be ~516.44 GHd/day;
570 us/it would be ~549 GHd/day.
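The scaling in those numbers is just total credit divided by wall-clock time at a given per-iteration speed. A quick sketch (assuming, per the figures above, ~p iterations at 335.278 GHz-days total credit):

```shell
#!/bin/sh
# GHd/day from per-iteration time: days = p * (us/it) / 1e6 / 86400,
# then throughput = total GHd credit / days. Numbers are from the post above.
p=92561321
ghd=335.278
for usit in 976 606 570; do
  awk -v p="$p" -v ghd="$ghd" -v u="$usit" \
      'BEGIN { days = p * u / 1e6 / 86400; printf "%d us/it -> %.2f GHd/day\n", u, ghd / days }'
done
```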

ewmayer 2020-05-19 19:49

Thanks - before just blindly invoking the recipe in script form, I ran it line-by-line, as root. On this system, I get none of the weird file-permission errors I got on the Haswell when trying this (even as root - Matt's and my wrestling with that covers the better part of a page of posts earlier in this thread).

This is with just gpu1 and gpu2 in the new build. (Pulled gpu3 yesterday because there is simply no good way to mount it in the test frame I have - proper securement would need a 5-6" longer frame and suitable added mounting hardware.) Here's the before:
[code]GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
0 72.0c 159.0W 1373Mhz 1001Mhz 70.98% manual 250.0W 4% 100%
1 75.0c 203.0W 1547Mhz 1001Mhz 58.82% manual 250.0W 4% 100% [/code]
After:
[code]GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
0 81.0c 257.0W 1750Mhz 1201Mhz 69.8% manual 250.0W 4% 100%
1 74.0c 205.0W 1547Mhz 1001Mhz 58.82% manual 250.0W 4% 100%
[/code]
So it hit only gpu1 (device 0 in the rocm-smi listing) - note the jump in gpu1's temp and SCLK setting. That's why you need the ending --setsclk: the MCLK tweak resets the SCLK (but not the fan setting, which we see is still @70% for device 0) to default. So I bumped sclk back down to 3, and now all is well again temp-wise; the per-iter timings for the 2 gpu1 jobs have dropped from 1435 us to 1350 us, effectively the same as gpu2 is getting at its higher sclk=4 setting. Very nice. So how do I also apply the MCLK tweak to gpu2?

[QUOTE=Prime95;545821][code]#Allow manual control
echo "manual" >/sys/class/drm/card0/device/power_dpm_force_performance_level
#Undervolt by setting max voltage: use ~50mV less than the max stock
#voltage of your card (which varies from card to card)
echo "vc 2 1801 1030" >/sys/class/drm/card0/device/pp_od_clk_voltage
#Overclock mclk up to 1200
echo "m 1 1200" >/sys/class/drm/card0/device/pp_od_clk_voltage
#Push a dummy sclk change for the undervolt to stick
echo "s 1 1801" >/sys/class/drm/card0/device/pp_od_clk_voltage
#Push everything to the card
echo "c" >/sys/class/drm/card0/device/pp_od_clk_voltage
#Put card into desired performance level
/opt/rocm/bin/rocm-smi -d 0 --setsclk 3 --setfan 170[/code]


This is a typical startup script for me. Undervolting will help with heat. I typically am stable somewhere in the 1020 to 1050 range. Mclk is usually stable in the 1160 to 1200 range. YMMV.[/QUOTE]
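For reference, the quoted recipe can be wrapped in a small per-card shell function so the same tweaks can be pushed to any card index. The poke/DRYRUN wrapper is my own addition (not from the post), and it assumes the cardN sysfs index lines up with the rocm-smi device index; verify that with rocm-smi before trusting it:

```shell
#!/bin/sh
# poke() echoes instead of writing when DRYRUN is set, so this sketch is safe to run.
poke() {  # poke VALUE FILE
  if [ -n "$DRYRUN" ]; then
    echo "echo \"$1\" > $2"
  else
    echo "$1" > "$2"
  fi
}

tune_card() {  # tune_card CARD_INDEX
  d=$1
  sysfs=/sys/class/drm/card$d/device
  poke "manual"   "$sysfs/power_dpm_force_performance_level"  # allow manual control
  poke "m 1 1200" "$sysfs/pp_od_clk_voltage"                  # overclock mclk to 1200
  poke "s 1 1801" "$sysfs/pp_od_clk_voltage"                  # dummy sclk write so it sticks
  poke "c"        "$sysfs/pp_od_clk_voltage"                  # push everything to the card
  [ -n "$DRYRUN" ] || /opt/rocm/bin/rocm-smi -d "$d" --setsclk 3 --setfan 170
}

DRYRUN=1      # unset this (and run as root) to apply for real
tune_card 0
tune_card 1
```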

ewmayer 2020-05-19 20:01

Update on the new build - as noted in my preceding post, pulled gpu3, which I now intend to switch to my Haswell system, as there is still room for 1 more Radeon 7 in the pcie 2.0x16 slot of that system. Will post pics of the 2-gpu new build once I tidy up the cabling.

Now the PSU in the Haswell is high-quality, but being an older specimen, it only has two 8-pin power plugs, both of which are currently driving the existing R7. So my options are [a] buy a new PSU and redo all the cabling, or [b] do something 'interesting'. W.r.t. the latter, I'd like to know if this idea is feasible or not: since I underclock my R7s, they draw only ~1/2 the wattage they consume at the default settings. Clearly, the two 8-pin power plugs (plus whatever lesser amount of wattage the cards may be drawing via the PCI bus) can supply more than enough wattage to drive the card at full-blast settings, so in theory they should be able to drive 2 cards at the aforementioned underclock settings. You know how SATA cables typically have multiple power plugs along their length? That's what I'm picturing, just now with the 8-pin power plugs - are splitter adapters available for those?
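As a rough sanity check on that reasoning, a sketch using the spec ratings of 150 W per 8-pin plug and 75 W per PCIe slot on the supply side, and the underclocked AvgPwr readings from the rocm-smi output posted above (159 W and 205 W) on the demand side:

```shell
#!/bin/sh
# Supply: two 8-pin plugs at the 150 W rating plus two PCIe slots at 75 W each.
# Demand: the two underclocked AvgPwr readings from the rocm-smi table above.
awk 'BEGIN {
  supply = 2*150 + 2*75
  demand = 159 + 205
  printf "supply %dW, demand %dW, margin %dW\n", supply, demand, supply - demand
}'
```

So the underclocked pair fits within the ratings on paper, though with little headroom if a card ramps back to stock draw.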

M344587487 2020-05-19 20:46

Splitter adapters exist on eBay, but I doubt they're sold by a high-quality, reputable source: 8-pin PCIe cables are rated for 150W, and I doubt a proper source could get the proper rubber stamps to sell something that likely breaks that rating. As long as the splitter is made well I don't think you'd have a problem, but don't split into more than two, even if you are sipping power. Overclockers often well exceed the rating for hours at a time with no consequences, but there are rare stories of cables melting; it's not worth the risk.



Whatever you do, don't buy a replacement cable that plugs directly into a modular PSU: even if it's a cable from a different SKU of the same manufacturer, the proprietary connector on the PSU side may be the same connector but with a different pinout that makes something go pop.

Prime95 2020-05-19 20:59

[QUOTE=ewmayer;545874]Very nice. So how do I also apply the MCLK tweak to gpu2?[/QUOTE]

Change all occurrences of card0 to card1

ewmayer 2020-05-19 21:03

[QUOTE=M344587487;545881]Splitter adapters exist on ebay but I doubt they're sold from a high quality reputable source, 8 pin PCIe cables are rated for 150W and I doubt a proper source could get the proper rubber stamps to sell something that likely breaks the rating. As long as the splitter is made well I don't think you'd have a problem, but don't split into more than two even if you are sipping power. Overclockers often well exceed the rating for hours at a time with no consequences but there are rare stories of cables melting, it's not worth the risk.

Whatever you do don't buy a replacement cable that plugs directly into a modular PSU, even if it's a cable from a different SKU of the same manufacturer the proprietary connector on the PSU side may be the same connector but with a different pinout that makes something go pop.[/QUOTE]

[url=https://www.amazon.com/pcie-8-pin-splitter/s?k=pcie+8+pin+splitter]Shoulda checked Amazon first[/url] - let me know what you think of those offerings. In the Haswell the PSU is non-modular, so no worries there. The new-build Corsair PSU is modular, but I was going to use one of the cables that came with it, plugged into the PSU as currently (though now with only one plugged in), then route the GPU-side ends into a pair of 8-pin splitters. So in the Haswell this should allow the 2nd GPU to be installed, and in the new build it looks like a tidy way to reduce the overall amount of cabling, replacing one 2-foot-long 8-pin-to-dual-8-pin ribbon cable with a shorter splitter.

Prime95 2020-05-19 21:13

[QUOTE=ewmayer;545883][url=https://www.amazon.com/pcie-8-pin-splitter/s?k=pcie+8+pin+splitter]Shoulda checked Amazon first[/url] - let me know what you think of those offerings. In the Haswell the PSU is non-modular, no worries there. The new-build Corsair PSU is modular, but was going to use one of the cables that came with it to plug into that as currently (but now only one plugged into the PSU), then route the GPU-side ends into a pair of 8-pin splitters. So in the Haswell this should allow the 2nd GPU to be installed, and in the new build it looks a tidy way to reduce the overall amount of cabling, replacing one 2-foot-long 8-pin-to-dual-8-pin ribbon cable with a shorter splitter.[/QUOTE]

Does the Haswell PSU have spare 6-pin PCIE plugs? It might make more sense to get a 6-pin to 8-pin converter rather than making the one 8-pin cable drive two GPUs with splitters.

M344587487 2020-05-19 21:14

Any of the splitters (female to double male) with decent ratings look fine. It's personal preference but I would try to avoid powering a thousand dollars worth of equipment with the cheapest connector I could find even if it is a placebo to spend double.

kriesel 2020-05-19 21:26

What is the sequence? If the power reduction kicks in too late, it may be moot. Early excess draw will prevent a system from starting, or it will limp along with one gpu AWOL.
Ideally, gpus are idle during and throughout boot; then power-limit controls go into effect for all gpus present; then after a delay, a gpu app launches; then after a delay, another gpu app launches. Probably easily done with a launch-all script.

prime95/mprime does staggered worker starts for compact memory allocations, as I recall, and that also gives a power-draw staircase rather than a "flip the switch from idle to full power" behavior.

I'd stay away from 6-to-dual-8 adapters. Short adapters add less voltage drop. The adapters can probably thermally ride out a short term large draw before your power controls take effect. I'd be more concerned about the power supply and a reliable boot making it through marginal load levels, unless the cards are idle at boot and only ramped up later. And margins will narrow over time.

ewmayer 2020-05-19 21:45

[QUOTE=Prime95;545882]Change all occurrences of card0 to card1[/QUOTE]
Per-iter timing for the 2 jobs on that GPU dropped from 1350 us to 1250 us - outstanding.

[QUOTE=Prime95;545886]Does the Haswell PSU have spare 6-pin PCIE plugs? It might make more sense to get a 6-pin to 8-pin converter rather than making the one 8-pin cable drive two GPUs with splitters.[/QUOTE]

No, it does not - that was the first thing that occurred to me, as the GPU I bought for that system, the one I bought new, came with a pair of 2x6-pin to 8-pin "reverse splitter" adapters, apparently for use with older PSUs.

[QUOTE=kriesel;545888]What is the sequence? If the power reduction kicks in too late it may be moot. Early excess draw will prevent a system from starting, or it will limp with one gpu AWOL.
Ideally, gpus are idle during and throughout boot, then
power limit controls go into effect for all gpus present, then after a delay, a gpu app launches, then after a delay another gpu app launches. Probably easily done with a launch-all script.[/QUOTE]

The sequence would be more or less along your lines:

1. Fiddle mclk and sclk settings to desired (mclk up, sclk down);
2. On GPU1, fire up 1 gpuOwl instance, wait a few seconds, then fire up 2nd instance;
3. Repeat step [2] for all remaining GPUs installed in the system.

[QUOTE]I'd stay away from 6-to-dual-8 adapters. Short adapters add less voltage drop. The adapters can probably thermally ride out a short term large draw before your power controls take effect. I'd be more concerned about the power supply and a reliable boot making it through marginal load levels, unless the cards are idle at boot and only ramped up later. And margins will narrow over time.[/QUOTE]

Right - not even a possibility for either of my rigs: the Haswell has just the two 6+2-pin plugs, each of which I will split into a pair of same, and the new-build PSU uses 8-pin-to-dual-8-pin connectors for the GPUs, though it has some 6-pin power-outs labeled "peripheral & SATA", the only one of which I use is for the SSD.

[b]Edit:[/b] And, [url=https://www.amazon.com/dp/B079SS63WV]splitter 4-pack[/url] ordered, 2 for the Haswell, 2 for the new build. This is fun (...right up to the point where you melt your rig and burn down the house, that is).

[b]Edit2:[/b] Haswell system successfully upgraded from ROCm 2.10 to 3.3 ... alas, the upgrade does not solve the unable-to-fiddle-MCLK problem on this system - note the following was as root:
[code]root@ewmayer-haswell:/home/ewmayer/gpuowl# echo "manual" >/sys/class/drm/card0/device/power_dpm_force_performance_level
bash: /sys/class/drm/card0/device/power_dpm_force_performance_level: Permission denied[/code]
...but the upgrade to ROCm 3.3 by itself drops my per-iter timings for 2 runs @5.5M FFT (at sclk=4) from 1415 us to 1345 us, the same as I was getting on GPU2 of the new build (the one I run at sclk=4) before successfully doing the MCLK fiddle there, which dropped times to 1245 us. So there is still a 7-8% performance gain I'm missing out on with the Haswell, due to the weird file-permission issue; that 7-8% represents more than the total throughput of the CPU on that system, to put things in perspective. The file ownership & permissions look the same on both systems:
Haswell:
[code]ewmayer@ewmayer-haswell:~$ ll /sys/class/drm/card0/device
lrwxrwxrwx 1 root root 0 May 19 15:22 /sys/class/drm/card0/device -> ../../../0000:00:02.0/[/code]
New Build:
[code]ewmayer@ewmayer-gimp:~$ ll /sys/class/drm/card0/device
lrwxrwxrwx 1 root root 0 May 18 15:58 /sys/class/drm/card0/device -> ../../../0000:03:00.0/[/code]
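The "7-8%" figure checks out against those timings (1345 us on the Haswell vs 1245 us with the MCLK tweak applied):

```shell
#!/bin/sh
# 1345 us -> 1245 us per iteration, expressed both ways:
awk 'BEGIN {
  printf "time saved: %.1f%%\n", 100 * (1345 - 1245) / 1345       # fraction of old time
  printf "throughput gain: %.1f%%\n", 100 * (1345 / 1245 - 1)     # extra iterations/sec
}'
```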

ewmayer 2020-05-20 02:47

Ordered a pair of these [url=https://www.amazon.com/Monoprice-Medium-Multimedia-Desktop-Stand/dp/B01MA61R25]monitor stands[/url] to house the new build atop my desk. I need 8" of height, so one stand is only half the needed height; I will either screw together 2 legs on each corner and put one glass piece away as a spare, or flip the bottom one upside down and epoxy the bottoms of the leg pairs together to create a glass-top-and-bottom enclosure, depending on details of the leg-height-adjustment mechanism. Once I put the existing monitor on top I'll be able to use my desk again, while having easy access to the machine, with free airflow to it but some protection from downward-drifting motes of dust.

