mersenneforum.org

gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

Prime95 2020-01-05 22:13

[QUOTE=PhilF;534329]BTW, in regards to my memory timing, I had a chance to play with it today without success. Even overclocked to just 1050 produced errors.[/QUOTE]

You could just be very unlucky. My worst card's memory does 1150 MHz. However, I was sent an XFX card that wouldn't do 1000. I RMA'd it, though I don't know for a fact that the memory was the culprit.

PhilF 2020-01-05 22:15

[QUOTE=preda;534330]Did you undervolt? that could also be the reason for errors.[/QUOTE]

Yes, I did. Power draw and fan speed are a problem.

I'm still tuning, but this card seems pretty well optimized out of the box. It's a Gigabyte-branded card with an advertised top clock speed of 1800 MHz. It seems I can't move the voltage much away from the stock power curve, at any clock frequency, without getting errors.

Prime95 2020-01-05 22:25

@Phil: Linux, right?

Stay away from that power-hungry 1800 MHz! This is a sample script for one of my cards. Run it as root.

[CODE]#!/bin/bash
#Allow manual control
#echo "manual" >/sys/class/drm/card2/device/power_dpm_force_performance_level
#Undervolt by setting max voltage (last field of the "vc" line, in mV)
# Set it 50mV less than the max stock voltage of your card (which varies from card to card), then optionally tune it down
# Default for this card is 1085mV
echo "vc 2 1801 1020" >/sys/class/drm/card2/device/pp_od_clk_voltage
#Overclock mclk (up to 1200)
echo "m 1 1190" >/sys/class/drm/card2/device/pp_od_clk_voltage
#Push a dummy sclk change for the undervolt to stick
echo "s 1 1801" >/sys/class/drm/card2/device/pp_od_clk_voltage
#Push everything to the card
echo "c" >/sys/class/drm/card2/device/pp_od_clk_voltage
#Put card into desired performance level
/opt/rocm/bin/rocm-smi -d 2 --setsclk 4 --setfan 170
[/CODE]

You might want to start the voltage at 1080 mV and work down, and the memory at 1000 MHz and work up. You'll need to change "card2" to "card0" and "-d 2" to "-d 0".
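If you tune more than one card, the same steps can be wrapped in a function that takes the card index, voltage, and memory clock as arguments, so nothing has to be edited by hand. This is a minimal sketch, assuming the sysfs cardN index and the rocm-smi -d index are the same number (as in the card2/card0 change described above). The names tune_card, od_write, and DRY_RUN are hypothetical, and by default the script only prints the commands instead of writing to the card. The commented-out "manual" performance-level step from the original script is left out.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the tuning steps above.
# DRY_RUN=1 (default) only prints commands; DRY_RUN=0 applies them (run as root).
DRY_RUN=${DRY_RUN:-1}

# od_write FILE CMD: write one overdrive command, or print it in dry-run mode
od_write() {
  local od=$1 cmd=$2
  if [[ $DRY_RUN == 1 ]]; then
    printf 'echo "%s" > %s\n' "$cmd" "$od"
  else
    echo "$cmd" > "$od"
  fi
}

# tune_card CARD MAX_MV MCLK_MHZ [SCLK_LEVEL] [FAN]
# Assumes the sysfs card index and the rocm-smi device index match.
tune_card() {
  local card=$1 mv=$2 mclk=$3 level=${4:-4} fan=${5:-170}
  local od="/sys/class/drm/card${card}/device/pp_od_clk_voltage"
  od_write "$od" "vc 2 1801 $mv"   # cap voltage at the top sclk point
  od_write "$od" "m 1 $mclk"       # memory clock
  od_write "$od" "s 1 1801"        # dummy sclk change so the undervolt sticks
  od_write "$od" "c"               # commit everything to the card
  local smi=(/opt/rocm/bin/rocm-smi -d "$card" --setsclk "$level" --setfan "$fan")
  if [[ $DRY_RUN == 1 ]]; then
    echo "${smi[*]}"
  else
    "${smi[@]}"
  fi
}

# Card 0, starting at 1080 mV and stock 1000 MHz memory
tune_card 0 1080 1000
```

Stepping the voltage down or the memory up then becomes a matter of rerunning with different arguments, for example `tune_card 0 1070 1050`.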

PhilF 2020-01-05 22:39

[QUOTE=Prime95;534333]@Phil: Linux, right?[/QUOTE]

Yes, it is Linux.

My own scripts are quite similar. I started by catting the pp_od_clk_voltage file and using those values as my base. I have high, medium, and low scripts, which correspond to sclk levels 5, 4, and 3 respectively. That gives me GPU frequencies of 1684, 1547, and 1373 MHz. The best I can get out of the 1684 MHz setting still runs at 185 W (actually more when measured at the wall outlet), 92 degrees, and fans at 99%. That's not comfortable.

So today I started tuning my "medium" setting of 1547 MHz. Now the power draw is 145 watts at stock, with the fan speed set manually at 130. Much more manageable. But as soon as I deviate even slightly from the stock pp_od_clk_voltage settings, even by as little as 5 mV (!), I start throwing errors.
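The three separate high/medium/low scripts could also be folded into one script that takes the profile name. This is a minimal sketch, assuming the sclk levels 5, 4, and 3 listed above and rocm-smi device 0. The function name sclk_level_for is hypothetical, and the script prints the rocm-smi command rather than running it.

```shell
#!/usr/bin/env bash
# Map a profile name to the sclk level used in the post.
# The MHz figures are what this particular card reported; other cards may differ.
sclk_level_for() {
  case $1 in
    high)   echo 5 ;;  # ~1684 MHz
    medium) echo 4 ;;  # ~1547 MHz
    low)    echo 3 ;;  # ~1373 MHz
    *)      echo "unknown profile: $1" >&2; return 1 ;;
  esac
}

profile=${1:-medium}
level=$(sclk_level_for "$profile") || exit 1
# Printed only; remove the leading echo to apply it (as root)
echo /opt/rocm/bin/rocm-smi -d 0 --setsclk "$level"
```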

Prime95 2020-01-05 23:08

[QUOTE=PhilF;534334]So today I started tuning my "medium" setting of 1547 MHz. Now the power draw is 145 watts at stock, with the fan speed set manually at 130. Much more manageable. But as soon as I deviate even slightly from the stock pp_od_clk_voltage settings, even by as little as 5 mV (!), I start throwing errors.[/QUOTE]

That sucks. BTW, what was the stock voltage? Also, what does "/opt/rocm/bin/rocm-smi -a" report for the actual voltage at 1547 MHz?

Can you increase memory speed at 1547MHz?

PhilF 2020-01-05 23:20

[QUOTE=Prime95;534337]That sucks. BTW, what was the stock voltage? Also, what does "/opt/rocm/bin/rocm-smi -a" report for the actual voltage at 1547 MHz?

Can you increase memory speed at 1547MHz?[/QUOTE]

Stock voltages are:

808 MHz / 723 mV
1304 MHz / 801 mV
1801 MHz / 1107 mV

rocm-smi -a shows 887 mV at 1547 MHz.

I didn't even try increasing the memory speed at 1684 MHz. But at 1547 MHz, the first memory setting I tried was 1100, and it didn't take long to produce an error. So I took it down to 1050, and then it took even less time to produce an error. But with the stock memory speed of 1000 and stock voltage it has produced zero errors (so far), and it is easy to keep the temperature below 90 degrees.

Prime95 2020-01-05 23:30

[QUOTE=PhilF;534340]I didn't even try increasing the memory speed at 1684 MHz. But at 1547 MHz, the first memory setting I tried was 1100, and it didn't take long to produce an error. So I took it down to 1050, and then it took even less time to produce an error. But with the stock memory speed of 1000 and stock voltage it has produced zero errors (so far), and it is easy to keep the temperature below 90 degrees.[/QUOTE]

My condolences -- easily the worst Radeon VII card I've heard of. Worse yet, it works at stock settings so cannot ethically be RMA'd.

PhilF 2020-01-05 23:34

[QUOTE=Prime95;534342]My condolences -- easily the worst Radeon VII card I've heard of. Worse yet, it works at stock settings so cannot ethically be RMA'd.[/QUOTE]

It's ok. It still beats anything else I've ever had by miles! :smile:

kriesel 2020-01-06 04:20

[QUOTE=Prime95;534342]My condolences -- easily the worst Radeon VII card I've heard of. Worse yet, it works at stock settings so cannot ethically be RMA'd.[/QUOTE]

Well, that sounds like a cue for an update on stability testing of my XFX Radeon VII. It has gone up to a couple of days with no errors, then the weather warms up and another error or two appear. I've progressively dialed it back slightly on each occasion, to where it's now at 1270 MHz GPU and 900 MHz memory. It's running just under 10.9 ms/iteration on a ~655M PRP that is ~22% complete, with 13 errors accumulated so far and about 64 days left.
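The "about 64 days" figure can be checked from the other numbers in the post, assuming a PRP test of M_p takes roughly p iterations (so about 655M in total here) and a constant time per iteration. A small sketch; the function name eta_days is hypothetical:

```shell
#!/usr/bin/env bash
# eta_days EXPONENT FRACTION_DONE MS_PER_ITER
# Remaining days = remaining iterations * ms per iteration,
# assuming the whole PRP test takes ~EXPONENT iterations.
eta_days() {
  awk -v p="$1" -v frac="$2" -v ms="$3" \
    'BEGIN { printf "%.1f\n", p * (1 - frac) * ms / 1000 / 86400 }'
}

eta_days 655000000 0.22 10.9
```

With those inputs it prints 64.5, in line with the estimate in the post; since 655M, 22%, and 10.9 ms are all approximate, only the first couple of digits are meaningful.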

Prime95 2020-01-06 04:36

[QUOTE=kriesel;534362]I've progressively dialed it back slightly on each occasion, to where it's now at 1270 MHz GPU and 900 MHz memory,[/QUOTE]

RMA is your friend: it won't run correctly at stock settings. I assume you tried it in a different machine with similar results.

PhilF 2020-01-06 06:13

[QUOTE=Prime95;534328]Skip the TF, just P-1.[/QUOTE]

Good call. I decided to run P-1 on my next assignment, M103464293, with B1=50000 and B2=50000000, and out popped a factor!

I just guessed at those bounds. The test took 52 minutes. Is there a quick, easy way to determine sensible bounds for GPU-based P-1 tests when no previous P-1 testing has been done?
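For context on what B1 controls: any factor q of M_p has the form q = 2kp + 1, and P-1 stage 1 raises a base (3 here) to 2p times the product of all prime powers up to B1, then takes a gcd with M_p. That reveals q whenever k has no prime-power factor above B1; stage 2 (bound B2) additionally allows k to contain one larger prime between B1 and B2, and is not modeled here. The toy bash sketch below runs stage 1 on M29 = 536870911 = 233 × 1103 × 2089, which is small enough for bash's 64-bit integer arithmetic. It only illustrates the technique; it is not how gpuowl implements P-1, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Toy P-1 stage 1 for small Mersenne numbers (illustration only).
# Uses bash 64-bit arithmetic, so keep p <= 31 to avoid overflow in products.

# powmod BASE EXP MOD: prints BASE^EXP mod MOD by square-and-multiply
powmod() {
  local base=$(( $1 % $3 )) exp=$2 m=$3 result=1
  while (( exp > 0 )); do
    if (( exp & 1 )); then result=$(( result * base % m )); fi
    base=$(( base * base % m ))
    exp=$(( exp >> 1 ))
  done
  echo "$result"
}

# gcd A B: Euclid's algorithm
gcd() {
  local a=$1 b=$2 t
  while (( b != 0 )); do
    t=$(( a % b )); a=$b; b=$t
  done
  echo "$a"
}

# is_prime N: trial division, fine for small B1
is_prime() {
  local n=$1 d
  if (( n < 2 )); then return 1; fi
  for (( d = 2; d * d <= n; d++ )); do
    if (( n % d == 0 )); then return 1; fi
  done
  return 0
}

# pm1_stage1 P B1: prints gcd(3^(2*P*E) - 1, 2^P - 1), E = prime powers <= B1
pm1_stage1() {
  local p=$1 b1=$2
  local m=$(( (1 << p) - 1 ))
  local x q qk
  # Factors of M_p are 2kp+1, so 2p goes into the exponent up front
  x=$(powmod 3 $(( 2 * p )) "$m")
  # Multiply in the largest power of each prime q <= B1
  for (( q = 2; q <= b1; q++ )); do
    if ! is_prime "$q"; then continue; fi
    qk=$q
    while (( qk * q <= b1 )); do qk=$(( qk * q )); done
    x=$(powmod "$x" "$qk" "$m")
  done
  # Every factor whose k is built from prime powers <= B1 now divides x - 1
  gcd $(( (x - 1 + m) % m )) "$m"
}

# M29 = 536870911 = 233 * 1103 * 2089
pm1_stage1 29 5
```

With B1=5 the gcd is a proper factor of M29. The factor 1103 isn't picked up at this bound because 1102 = 2 × 19 × 29 and 19 > B1; reaching it would take B1 = 19, or a stage 2 with B2 of at least 19.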


All times are UTC.