mersenneforum.org gpuOwL: an OpenCL program for Mersenne primality testing

2020-01-05, 22:13   #1695
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

2²·13·157 Posts

Quote:
 Originally Posted by PhilF BTW, in regards to my memory timing, I had a chance to play with it today without success. Even overclocked to just 1050 produced errors.
You could just be very unlucky. My worst card does 1150 MHz. However, I was sent an XFX card that wouldn't do 1000. I RMA'ed it; I don't know for a fact that the memory was the culprit.

2020-01-05, 22:15   #1696
PhilF

"6800 descendent"
Feb 2005

2·3²·41 Posts

Quote:
 Originally Posted by preda Did you undervolt? that could also be the reason for errors.
Yes, I do. Power draw and fan speed are a problem.

I'm still tuning, but this card seems pretty optimized out of the box. It's a Gigabyte-branded card that advertises a top clock speed of 1800 MHz. It seems I can't vary the voltage much from the stock power curve, no matter the clock frequency, without getting errors.

2020-01-05, 22:25   #1697
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

2²×13×157 Posts

@Phil: Linux, right? Stay away from that power-hungry 1800 MHz!

This is a sample script for one of my cards. Run it as root.

Code:
#Allow manual control
#echo "manual" >/sys/class/drm/card2/device/power_dpm_force_performance_level
#Undervolt by setting max voltage
#               V Set this to 50mV less than the max stock voltage of your card (which varies from card to card), then optionally tune it down
#               V Default for this card is 1085mV
echo "vc 2 1801 1020" >/sys/class/drm/card2/device/pp_od_clk_voltage
#Overclock mclk to up to 1200
echo "m 1 1190" >/sys/class/drm/card2/device/pp_od_clk_voltage
#Push a dummy sclk change for the undervolt to stick
echo "s 1 1801" >/sys/class/drm/card2/device/pp_od_clk_voltage
#Push everything to the card
echo "c" >/sys/class/drm/card2/device/pp_od_clk_voltage
#Put card into desired performance level
/opt/rocm/bin/rocm-smi -d 2 --setsclk 4 --setfan 170

You might want to start voltage at 1080 and work down as well as memory at 1000 and work up. You'll need to change "card2" to "card0" and "-d 2" to "-d 0".

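The sample script in post #1697 hard-codes card2 and that card's values. Below is a minimal sketch of the same write sequence with the card index and the tunables pulled out into variables. The variable names (CARD, VDDC_MV, MCLK_MHZ, SCLK_LEVEL, FAN, DRYRUN) and the dry-run mode are illustrative additions, not part of the original post. The defaults follow the advice given there: card0, voltage starting at 1080 mV and working down, memory starting at 1000 MHz and working up. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch only: the sample undervolt/overclock sequence, parameterized.
# Variable names and DRYRUN are illustrative additions; defaults follow the
# post's advice (card0, start at 1080 mV and work down, memory 1000 MHz up).
CARD=${CARD:-0}             # cardN in /sys/class/drm and "-d N" for rocm-smi
VDDC_MV=${VDDC_MV:-1080}    # voltage for curve point 2 (1801 MHz)
MCLK_MHZ=${MCLK_MHZ:-1000}  # memory clock
SCLK_LEVEL=${SCLK_LEVEL:-4} # performance level passed to --setsclk
FAN=${FAN:-170}             # fan level passed to --setfan
DRYRUN=${DRYRUN:-1}         # 1 = only print the commands, 0 = apply (as root)

# Write one command string to the card's overdrive file, or just print it.
od_write() {
  local od="/sys/class/drm/card${CARD}/device/pp_od_clk_voltage"
  if [ "$DRYRUN" = 1 ]; then
    printf 'echo "%s" > %s\n' "$1" "$od"
  else
    echo "$1" > "$od"
  fi
}

apply_settings() {
  od_write "vc 2 1801 ${VDDC_MV}"  # undervolt: set voltage of curve point 2
  od_write "m 1 ${MCLK_MHZ}"       # set memory clock
  od_write "s 1 1801"              # dummy sclk change so the undervolt sticks
  od_write "c"                     # commit everything to the card
  if [ "$DRYRUN" = 1 ]; then
    echo "/opt/rocm/bin/rocm-smi -d ${CARD} --setsclk ${SCLK_LEVEL} --setfan ${FAN}"
  else
    /opt/rocm/bin/rocm-smi -d "$CARD" --setsclk "$SCLK_LEVEL" --setfan "$FAN"
  fi
}

apply_settings
```

If the printed commands look right, something like `sudo env DRYRUN=0 VDDC_MV=1075 ./od_tune.sh` applies them. The file name od_tune.sh is hypothetical.
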
2020-01-05, 22:39   #1698
PhilF

"6800 descendent"
Feb 2005

2·3²·41 Posts

Quote:
 Originally Posted by Prime95 @Phil: Linux, right? Stay away from that power-hungry 1800 MHz!
Yes, it is Linux.

My own scripts are quite similar. I started by catting the pp_od_clk_voltage file and using those values as my base. I have high, medium, and low scripts, which correspond to sclk levels 5, 4, and 3 respectively. That gives me GPU frequencies of 1684, 1547, and 1373 MHz. The best I can get out of the 1684 MHz setting has it running at 185 W (actually more when measured at the wall outlet), 92 °C, and fans at 99%. That's not comfortable.
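
For the "cat pp_od_clk_voltage and use those values as a base" step, here is a small sketch that pulls only the voltage curve out of that file. It assumes the Radeon VII (vega20) layout, where the curve is listed under an OD_VDDC_CURVE: header as lines such as "2: 1801Mhz 1107mV". Other GPUs or kernel versions may format the file differently. The function name vddc_curve is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: print the voltage curve from pp_od_clk_voltage as
# "point clock_MHz voltage_mV". Assumes the vega20 (Radeon VII) layout with an
# "OD_VDDC_CURVE:" section; the function name is hypothetical.
vddc_curve() {
  local file=${1:-/sys/class/drm/card0/device/pp_od_clk_voltage}
  awk '
    /^OD_VDDC_CURVE:/ { in_curve = 1; next }  # start of the curve section
    /^[A-Z_]+:/       { in_curve = 0 }        # any other header ends it
    in_curve && /^[0-9]+:/ {
      point = $1; sub(/:$/, "", point)         # "2:"      -> "2"
      clk   = $2; sub(/[Mm][Hh][Zz]$/, "", clk) # "1801Mhz" -> "1801"
      mv    = $3; sub(/mV$/, "", mv)           # "1107mV"  -> "1107"
      print point, clk, mv
    }
  ' "$file"
}
```

At stock settings on the card in this thread, this should list the three points PhilF quotes later (808/723, 1304/801, 1801/1107). The last of those is the point the "vc 2 ..." line in the sample script overrides.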

So today I started tuning my "medium" setting of 1547 MHz. At stock settings the power draw is now 145 W, with the fan speed set manually to 130. Much more manageable. But as soon as I deviate even slightly from the stock pp_od_clk_voltage settings, by as little as 5 mV (!), I start getting errors.
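
The high/medium/low scripts described here essentially select an sclk level. The sketch below is a hypothetical helper that maps those profile names to the levels quoted for this card: 5, 4 and 3, giving 1684, 1547 and 1373 MHz. It prints the corresponding rocm-smi command, using the same -d/--setsclk/--setfan options as the sample script. The function name and the optional fan argument are illustrative. Level numbering and clocks differ between cards, so check rocm-smi or the card's pp_dpm_sclk file first.

```shell
#!/usr/bin/env bash
# Sketch only: map a profile name to this card's sclk level and print the
# rocm-smi command. Function name and optional fan argument are hypothetical;
# levels 5/4/3 are the ones quoted for this particular card.
sclk_profile_cmd() {
  local profile=$1 card=${2:-0} fan=${3:-}
  local level
  case "$profile" in
    high)   level=5 ;;  # 1684 MHz on this card
    medium) level=4 ;;  # 1547 MHz
    low)    level=3 ;;  # 1373 MHz
    *)      echo "unknown profile: $profile" >&2; return 1 ;;
  esac
  local cmd="/opt/rocm/bin/rocm-smi -d $card --setsclk $level"
  if [ -n "$fan" ]; then
    cmd="$cmd --setfan $fan"  # optional manual fan level
  fi
  echo "$cmd"
}
```

For example, `sclk_profile_cmd medium 0 130` prints the command for the medium profile with the fan set manually to 130.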

2020-01-05, 23:08   #1699
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

2²×13×157 Posts

Quote:
 Originally Posted by PhilF So today I started tuning my "medium" setting of 1547 Mhz. Now the power draw is 145 watts stock, with the fan speed set manually at 130. Much more manageable. But as soon as I deviate even slightly from the stock pp_od_clk_voltage settings, even as little as 5 mV (!), I start throwing errors.
That sucks. BTW, what was the stock voltage? Also, what does "/opt/rocm/bin/rocm-smi -a" report for the actual voltage at 1547 MHz?

Can you increase the memory speed at 1547 MHz?

2020-01-05, 23:20   #1700
PhilF

"6800 descendent"
Feb 2005

1342₈ Posts

Quote:
 Originally Posted by Prime95 That sucks, btw what was stock voltage? Also what does "/opt/rocm/bin/rocm-smi -a" report for the actual voltage at 1547 MHz? Can you increase memory speed at 1547MHz?
Stock voltages are:

808 MHz / 723 mV
1304 MHz / 801 mV
1801 MHz / 1107 mV

rocm-smi -a is showing 887 mV at 1547 MHz.

I didn't even try increasing the memory speed at 1684 MHz. But at 1547 MHz, the first memory setting I tried was 1100, and it didn't take long to produce an error. So I took it down to 1050, and it took even less time to produce an error. But at the stock speed of 1000 and stock voltage it has produced zero errors (so far), and it is easy to keep the temperature below 90 degrees.

2020-01-05, 23:30   #1701
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

2²×13×157 Posts

Quote:
 Originally Posted by PhilF I didn't even try increasing memory speed at 1684Mhz. But at 1547Mhz, the first memory setting I tried was 1100, and it didn't take long to produce an error. So I took it down to 1050, then it took even less time to produce an error. But when using the stock speed of 1000 and stock voltage it has produced zero errors (so far), and it is easy to keep the temperature below 90 degrees.
My condolences -- easily the worst Radeon VII card I've heard of. Worse yet, it works at stock settings, so it cannot ethically be RMA'd.

2020-01-05, 23:34   #1702
PhilF

"6800 descendent"
Feb 2005

2×3²×41 Posts

Quote:
 Originally Posted by Prime95 My condolences -- easily the worst Radeon VII card I've heard of. Worse yet, it works at stock settings so cannot ethically be RMA'd.
It's ok. It still beats anything else I've ever had by miles!

2020-01-06, 04:20   #1703
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3⁴·7·13 Posts

Quote:
 Originally Posted by Prime95 My condolences -- easily the worst Radeon VII card I've heard of. Worse yet, it works at stock settings so cannot ethically be RMA'd.
Well, that sounds like a cue for an update on stability testing of my XFX Radeon VII. It has gone as long as a couple of days with no errors, then the weather warms up and another error or two appear. Each time I've dialed it back slightly, and it's now at 1270 MHz GPU and 900 MHz memory. At those settings it runs just under 10.9 msec/it on a ~655M PRP that is ~22% complete, with 13 errors accumulated so far and about 64 days left.
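
The 64-day figure follows from simple arithmetic. A PRP test of M_p takes roughly p iterations, so the time remaining is about p × (1 − fraction done) × time per iteration. The sketch below computes this. The function name is hypothetical, and the estimate ignores time lost to error recovery and rollbacks, which on this card is not zero.

```shell
#!/usr/bin/env bash
# Sketch only: days left for a PRP test, assuming ~p iterations for exponent p
# and ignoring error-recovery/rollback time. Function name is hypothetical.
prp_days_left() {
  local exponent=$1 done_frac=$2 ms_per_iter=$3
  awk -v p="$exponent" -v f="$done_frac" -v ms="$ms_per_iter" \
    'BEGIN { printf "%.1f\n", p * (1 - f) * ms / 1000 / 86400 }'
}

prp_days_left 655000000 0.22 10.9   # prints 64.5
```

With the numbers from the post, 655M × 0.78 × 10.9 ms ≈ 5.57 × 10⁶ s, or about 64.5 days. This matches the "about 64 days left".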

2020-01-06, 04:36   #1704
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

2²×13×157 Posts

Quote:
 Originally Posted by kriesel I've progressively dialed it back slightly at each occasion, to where it's now at 1270 gpu Mhz and 900 mem Mhz,
RMA is your friend: it won't run correctly at stock settings. I assume you tried it in a different machine with similar results.

2020-01-06, 06:13   #1705
PhilF

"6800 descendent"
Feb 2005

2×3²×41 Posts

Quote:
 Originally Posted by Prime95 Skip the TF, just P-1.
Good call. I decided to run P-1 on my next assignment, M103464293, with B1=50000 and B2=50000000, and out popped a factor!

I just guessed at those bounds. The test took 52 minutes. Is there an easy, fast way to determine sane bounds for GPU-based P-1 tests when no previous P-1 testing has been done?
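
For context on what the two bounds control, here is the textbook P-1 setup as usually applied to a Mersenne number M_p (base 3, with 2p folded into the exponent). This is the standard formulation, not necessarily gpuOwl's exact implementation, and it does not by itself say how to choose B1 and B2.

```latex
\begin{align*}
  q \mid M_p,\ q \text{ prime} \;&\implies\; q = 2kp + 1 \text{ for some integer } k \ge 1,\\
  E &= \prod_{\substack{r \text{ prime}\\ r \le B_1}} r^{\lfloor \log_r B_1 \rfloor},\\
  x &= 3^{2pE} \bmod M_p, \qquad g = \gcd(x - 1,\; M_p).
\end{align*}
```

Stage 1 works as follows. If every prime power dividing k is at most B1, then k divides E. So q − 1 = 2kp divides 2pE, and by Fermat's little theorem q divides g. Stage 2 additionally catches the case k = s·r, where s is of that form and r is a single prime with B1 < r ≤ B2. So with B1=50000 and B2=50000000, a factor is found whenever k has that shape.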


All times are UTC.