mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   Radeon VII @ newegg for 500 dollars US on 11-27 (https://www.mersenneforum.org/showthread.php?t=24979)

ewmayer 2020-05-18 23:18

[b]Edit:[/b] Hmm, I'm just not liking the footprint of the system with GPU #3 in, and properly securing that horizontally-sticking-out GPU would need some additional custom bracket hackery ... cool as the idea of 3 GPUs in a single ATX mobo is, I'm wondering if moving GPU #3 to the available PCIe 2.0x16 slot of my Haswell system (~3" below the PCIe 3.0x16 slot) makes more sense ... with 2 GPUs in the new system I can run GPU1 at sclk=3 and GPU2 (with nothing blocking the 3 fans) at sclk=4. Adding GPU3 means downclocking all 3 GPUs to sclk=3.

The GPU in the Haswell system is at sclk=4 ... adding a 2nd would mean running both at sclk=3 for temp/wattage reasons. In either setup I end up with 1 'fast' GPU and 3 'slow' ones, but putting 2 in each mobo makes for a pair of much tidier packages. Only problem is that the older PSU in my Haswell system only has 2 of the 8-pin PCIe power connectors needed for the Radeon VII ... I'll need to have a closer look tomorrow to see what power connectors are still free in that system, i.e. whether ganging them together might be possible.

Buying a new PSU for the Haswell is of course another option, but that means more expense (minor) and more time rewiring the guts of that system (no huge deal for modern modular PSUs, but still).

Prime95 2020-05-18 23:25

Have you tried bumping up the memory clock?

ewmayer 2020-05-18 23:34

[QUOTE=Prime95;545787]Have you tried bumping up the memory clock?[/QUOTE]

You mean using Matt's setup script? (Someone remind me where that is located).

That didn't work on my Haswell system due to weird file-access problems, but yes, I should try it on the new build.

IIRC the setup script needs one to find the max stock voltage of one's card(s) ... how to do that?

kriesel 2020-05-19 02:01

[QUOTE=paulunderwood;545782]Congrats on getting you 3-GPU rig running. With the one on your Haswell that is equivalent to ~40x 4 core CPUs but far more efficient in power usage and overall size. Now you should soon be propelled into the top 10 testers and have a better chance of striking gold and a place in Mersenne prime history. [/QUOTE]That's a lot of optimism.
The current #10 in primality testing is Runtime Error with 397903 GhzD in the past year.
The rating on a Radeon VII is 274 GhzD/day. To kick out that many in a year running 24/365 takes 397903/365/274 = 3.98 Radeon VIIs. And that assumes the top ten won't add hardware, and that those somewhat below #10 won't add sufficient hardware to raise the bar (and haven't already). How does a Haswell compare to a Radeon VII? How does downclocking the Radeons for power efficiency add to the required number?
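A quick spot-check of the arithmetic above, using only the figures quoted in the post (awk handles the floating point, since shell arithmetic is integer-only):

```shell
# 397903 GhzD in the past year at #10, at 274 GhzD/day per card, running 24/365:
awk 'BEGIN { printf "%.2f Radeon VIIs\n", 397903 / 365 / 274 }'
```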

Something I've noticed is that in some parts of the top producer standings, rank x throughput =~ constant. See for example [URL]https://www.mersenne.org/report_top_500_ll/[/URL]
#82 49916; double is 99832.
#42 99661 is nearest; very close to #41. Double again is 199664.
#20 198233 is nearest; very close to 20.5. Double again is 399328.
#10 397903 is nearest; very close to 10.25. Double again is 798656.
#7 701497 is nearest; not so close to 5.125. Double again is 1597312.
#3 1889803 is nearest; pretty close to 2.5625.

Different start:
Start at #128 29157; double is 58314, expect #64; nearest is #69 at 59066; off by 5 out of 64.
Double again, 116628, expect #32; nearest is #36 at 114990; off by 4 out of 32.
Double again, 233256, expect #16; nearest is #16 at 232765; bullseye.
Double again, 466512, expect #8; nearest is #10 at 397903; off by 2 out of 8.
Double again, 933024, expect #4; nearest is #6 at 914563; off by 2 out of 4; not so good here. It takes a lot more to move up at the very top.
Other direction: half of 29157 = 14579, expect ~#256; nearest is #276 at 14570.5; off by 20 out of 256.
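The rank x throughput =~ constant observation above can be spot-checked directly with the (rank, GhzD-in-past-year) pairs quoted from the report; if the relation held exactly, every product would be identical:

```shell
# (rank, GhzD in past year) pairs quoted from the top-500 LL standings above
for pair in "128 29157" "69 59066" "36 114990" "16 232765" "10 397903"; do
    set -- $pair
    echo "rank $1: product = $(($1 * $2))"
done
```

The products all cluster around 4 million (roughly 3.7M to 4.1M), which is why doubling one's throughput roughly halves one's rank, until the very top where the relation breaks down.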

axn 2020-05-19 02:47

[QUOTE=kriesel;545805] Rating on a Radeon VII is 274GhzD/day. [/QUOTE]

That figure is very out of date. With the current state of gpuowl, it sits around high 400s/low 500s (and my information could be out of date as well!). And that is not running full blast; power consumption is only in the 150-200W range. That means one card is worth ~10 modern 4-core CPUs, so a rig of 3 is worth 30x 4-core. Close enough to paul's claim of 40x.

kriesel 2020-05-19 03:16

[QUOTE=axn;545812]That figure is very out of date. With the current state of gpuowl, it sits around high 400s/low 500s (and my information could be out of date as well!)[/QUOTE]274.8 in the 85M column as of tonight. I probably should have used the 95M column (281 GhzD/day) to better match the current wavefront: [URL]https://www.mersenne.ca/cudalucas.php[/URL] It would be good to get current figures in there, although that raises the question of how to deal with the gpuowl/cudalucas divide on some NVIDIA gpus, those that are capable of running both. And how to deal with any further software performance improvements in the future.
What are people doing to get such phenomenal numbers as you gave?

Some log analysis here indicates 340-440.
Single task: M155991937, 1023.7 GhzD per [URL]https://www.mersenne.ca/exponent/155991937[/URL]; 1444us/it -> 2.607 days -> 392.66 GhzD/day.
It's running unrestricted as to clock, on an open-frame system, so cooling should be good.
Mostly-dual task: ~440 GhzD/day, combination of 91M and 140M tasks. 95M with downclocking: 340 GhzD/day.
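The single-task figures above can be reproduced directly: the exponent times the per-iteration time gives the total test time, and dividing the GhzD credit (1023.7, the mersenne.ca figure quoted in the post) by the run time in days gives the throughput:

```shell
awk 'BEGIN {
    p = 155991937         # exponent; one iteration per squaring
    t = 1444e-6           # seconds per iteration
    credit = 1023.7       # GhzD credit for this exponent, per mersenne.ca
    days = p * t / 86400
    printf "%.3f days -> %.2f GhzD/day\n", days, credit / days
}'
```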

ewmayer 2020-05-19 03:20

[QUOTE=kriesel;545805]That's a lot of optimism.
The current #10 in primality testing is Runtime Error with 397903. GhzD in the past year.
Rating on a Radeon VII is 274GhzD/day.[/QUOTE]

I'm getting ~500 GhzD/day on both my Haswell-system R7 @sclk=4 and the new-build ones @sclk=3 ... the latter are more efficient due to running ROCm 3.3, vs the out-of-date 2.10 for the Haswell. With 4 R7s running, we get ~15,000 GhzD/week.

axn 2020-05-19 03:26

[QUOTE=ewmayer;545816]I'm getting ~500 GhzD/day on both my Haswell-system R7 @sclk=4 and the new-build ones @sclk=3 ... the latter are more efficient due to running ROCm 3.3, vs the out-of-date 2.10 for the Haswell. With 4 R7s running, we get ~15,000 GhzD/week.[/QUOTE]

Ah! Now I see what Paul meant. I didn't realize that you had an existing R VII. So 40x it is. At ~2000 GhzD/day, that should take ~200 days to reach the (current) top 10. That's assuming you're not doing any CPU crunching (but let's be honest, any CPU crunching is probably just a rounding error :smile:)

kriesel 2020-05-19 04:21

[QUOTE=ewmayer;545816]I'm getting ~500 GhzD/day on both my Haswell-system R7 @sclk=4 and the new-build ones @sclk=3 ... the latter are more efficient due to running ROCm 3.3, vs the out-of-date 2.10 for the Haswell. With 4 R7s running, we get ~15,000 GhzD/week.[/QUOTE]
So your answer to how is Linux, sclk4 or 3 (whatever that means in MHz clock rates; I don't know if that implies overclocking), rocm, recent version, and probably direct mount in PCIe x16 slots, v3 slots? Any adjustments to voltages or memory clocks? Two instances per gpu, if I recall correctly.
The numbers I gave were for Win7, mostly on extenders, mostly stock clocks and voltages.

It would be a useful comparison if you would also benchmark one of those Radeons on your 3-card setup with an x1-x16 extender, and give a little more precision.

Prime95 2020-05-19 04:37

[QUOTE=ewmayer;545789]You mean using Matt's setup script? (Someone remind me where that is located).

That didn't work on my Haswell system due to weird file-access problems, but yes, I should try it on the new build.

IIRC the setup script needs one to find the max stock voltage of one's card(s) ... how to do that?[/QUOTE]


#Allow manual control
echo "manual" >/sys/class/drm/card0/device/power_dpm_force_performance_level
#Undervolt by setting max voltage
#Set the voltage below to 50mV less than the max stock voltage of your card (which varies from card to card)
echo "vc 2 1801 1030" >/sys/class/drm/card0/device/pp_od_clk_voltage
#Overclock mclk up to 1200
echo "m 1 1200" >/sys/class/drm/card0/device/pp_od_clk_voltage
#Push a dummy sclk change for the undervolt to stick
echo "s 1 1801" >/sys/class/drm/card0/device/pp_od_clk_voltage
#Push everything to the card
echo "c" >/sys/class/drm/card0/device/pp_od_clk_voltage
#Put card into desired performance level
/opt/rocm/bin/rocm-smi -d 0 --setsclk 3 --setfan 170


This is a typical startup script for me. Undervolting will help with heat. I typically am stable somewhere in the 1020 to 1050 range. Mclk is usually stable in the 1160 to 1200 range. YMMV.
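One way to find the max stock voltage the script's comment asks for (this is my assumption based on the standard amdgpu OverDrive sysfs interface, not something stated in this thread): read the table back before modifying anything.

```shell
# Read the card's current OverDrive tables (makes no changes).
# The OD_SCLK section lists each power state's clock and voltage; the
# last (highest) entry shows the stock maximum voltage. Subtract ~50mV
# from that value for the "vc" undervolt line in the script above.
cat /sys/class/drm/card0/device/pp_od_clk_voltage
```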

Prime95 2020-05-19 04:42

[QUOTE=kriesel;545818]So your answer to how is Linux, sclk4 or 3 (whatever that means in MHz clock rates; I don't know if that implies overclocking)[/QUOTE]

sclk4 = 1547 MHz
sclk3 = 1373 MHz.

At 1200 MCLK, running 2 gpuowl instances I get effective iteration times of 570 and 606 microseconds for a 5M FFT. You can convert that into GHz-days/day.
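As a hedged sketch of that conversion: the ~90M exponent below is my assumption for a typical 5M-FFT assignment, and the per-test GhzD credit would have to be looked up on mersenne.ca for the actual exponent; neither value is given in this post.

```shell
awk 'BEGIN {
    p = 90000000                  # assumed exponent for a 5M FFT (placeholder)
    t1 = 570e-6; t2 = 606e-6      # effective seconds/iteration, two instances
    d1 = p * t1 / 86400           # wall-clock days per test, instance 1
    d2 = p * t2 / 86400           # same for instance 2
    printf "%.3f and %.3f days per test\n", d1, d2
    # throughput per instance = (GhzD credit for the exponent) / days
}'
```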


All times are UTC.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.