mersenneforum.org  

2020-01-05, 22:13   #1695
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
2×3×1,193 Posts

Quote:
Originally Posted by PhilF
BTW, in regard to my memory timing, I had a chance to play with it today without success. Even overclocking to just 1050 produced errors.
You could just be very unlucky. My worst card does 1150. However, I was sent an XFX card that wouldn't do 1000. I RMA'ed it, though I don't know for a fact that the memory was the culprit.

2020-01-05, 22:15   #1696
PhilF
Feb 2005
Colorado
3²×61 Posts

Quote:
Originally Posted by preda
Did you undervolt? That could also be the reason for errors.
Yes, I do. Power draw and fan speed are a problem.

I'm still tuning, but this card seems pretty optimized out of the box. It's a Gigabyte-branded card advertised with a top clock speed of 1800 MHz. It seems I can't vary the voltage much from the stock power curve, no matter the clock frequency, without getting errors.

2020-01-05, 22:25   #1697
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
2×3×1,193 Posts

@Phil: Linux, right?

Stay away from that power-hungry 1800 MHz! This is a sample script for one of my cards. Run it as root.

Code:
#Allow manual control
#echo "manual" >/sys/class/drm/card2/device/power_dpm_force_performance_level
#Undervolt by setting max voltage
#               V Set this to 50mV less than the max stock voltage of your card (which varies from card to card), then optionally tune it down
#               V Default for this card is 1085mV
echo "vc 2 1801 1020" >/sys/class/drm/card2/device/pp_od_clk_voltage
#Overclock mclk to up to 1200
echo "m 1 1190" >/sys/class/drm/card2/device/pp_od_clk_voltage
#Push a dummy sclk change for the undervolt to stick
echo "s 1 1801" >/sys/class/drm/card2/device/pp_od_clk_voltage
#Push everything to the card
echo "c" >/sys/class/drm/card2/device/pp_od_clk_voltage
#Put card into desired performance level
/opt/rocm/bin/rocm-smi -d 2 --setsclk 4 --setfan 170
You might want to start the voltage at 1080 and work down, and start memory at 1000 and work up. You'll need to change "card2" to "card0" and "-d 2" to "-d 0".
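
For anyone adapting the script above to a different card index, here is a minimal sketch that only prints the commands for review instead of writing to sysfs. The function name `od_commands` and its argument order are made up for illustration; the sysfs path, the "vc"/"m"/"s"/"c" strings, and the rocm-smi flags are copied from the script above.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) the tuning commands for one card.
# Usage: od_commands CARD_INDEX SCLK_MHZ MAX_MV MCLK_MHZ
od_commands() {
    card=$1
    sclk=$2
    mv=$3
    mclk=$4
    node="/sys/class/drm/card${card}/device/pp_od_clk_voltage"
    # Undervolt: top voltage-curve point at the given clock and voltage
    printf 'echo "vc 2 %s %s" >%s\n' "$sclk" "$mv" "$node"
    # Memory clock
    printf 'echo "m 1 %s" >%s\n' "$mclk" "$node"
    # Dummy sclk change so the undervolt sticks
    printf 'echo "s 1 %s" >%s\n' "$sclk" "$node"
    # Commit everything to the card
    printf 'echo "c" >%s\n' "$node"
    # Performance level and fan speed, same values as in the script above
    printf '/opt/rocm/bin/rocm-smi -d %s --setsclk 4 --setfan 170\n' "$card"
}

# Suggested starting point: 1080 mV and 1000 MHz memory on card0
od_commands 0 1801 1080 1000
```

Once the printed lines look right, they can be run from a root shell. Printing first makes it harder to write a value to the wrong card by accident.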

2020-01-05, 22:39   #1698
PhilF
Feb 2005
Colorado
549₁₀ Posts

Quote:
Originally Posted by Prime95
@Phil: Linux, right?

Stay away from that power-hungry 1800 MHz! This is a sample script for one of my cards. Run it as root.

Yes, it is Linux.

My own scripts are quite similar. I started by catting the pp_od_clk_voltage file and using those values as my base. I have high, medium, and low scripts, which correspond to sclk settings of 5, 4, and 3 respectively. That gives me GPU frequencies of 1684, 1547, and 1373 MHz. The best I can get out of the 1684 MHz setting has it running at 185W (actually more when measured at the wall outlet), 92 degrees, and fans at 99%. That's not comfortable.
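
As a rough illustration of using the catted file as a baseline, the sketch below pulls the voltage-curve points out of pp_od_clk_voltage-style text. The section layout shown in the sample input is an assumption about what the driver prints on this card, the OD_SCLK and OD_MCLK lines are illustrative, and `vddc_curve` is a hypothetical helper name. The curve values are the stock ones reported later in this thread.

```shell
#!/bin/sh
# Hypothetical helper: read pp_od_clk_voltage-style text on stdin and
# print one "MHz mV" pair per point of the OD_VDDC_CURVE section.
vddc_curve() {
    awk '
        # Every section header starts with OD_; remember whether we are in the curve
        /^OD_/ { in_curve = ($1 == "OD_VDDC_CURVE:"); next }
        in_curve && NF >= 3 {
            mhz = $2
            mv = $3
            sub(/[Mm][Hh]z$/, "", mhz)   # strip the unit suffix
            sub(/mV$/, "", mv)
            print mhz, mv
        }
    '
}

# Sample input (assumed layout)
vddc_curve <<'EOF'
OD_SCLK:
0: 808Mhz
1: 1801Mhz
OD_MCLK:
1: 1000Mhz
OD_VDDC_CURVE:
0: 808Mhz 723mV
1: 1304Mhz 801mV
2: 1801Mhz 1107mV
EOF
```

On a real system the input would presumably come from the sysfs file itself, for example `vddc_curve </sys/class/drm/card0/device/pp_od_clk_voltage`, and the printed pairs could then seed the "vc" lines of a tuning script.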

So today I started tuning my "medium" setting of 1547 MHz. Now the power draw is 145 watts stock, with the fan speed set manually at 130. Much more manageable. But as soon as I deviate even slightly from the stock pp_od_clk_voltage settings, even as little as 5 mV (!), I start throwing errors.

2020-01-05, 23:08   #1699
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
7158₁₀ Posts

Quote:
Originally Posted by PhilF
So today I started tuning my "medium" setting of 1547 MHz. Now the power draw is 145 watts stock, with the fan speed set manually at 130. Much more manageable. But as soon as I deviate even slightly from the stock pp_od_clk_voltage settings, even as little as 5 mV (!), I start throwing errors.
That sucks. BTW, what was the stock voltage? Also, what does "/opt/rocm/bin/rocm-smi -a" report for the actual voltage at 1547 MHz?

Can you increase memory speed at 1547MHz?

2020-01-05, 23:20   #1700
PhilF
Feb 2005
Colorado
3²×61 Posts

Quote:
Originally Posted by Prime95
That sucks. BTW, what was the stock voltage? Also, what does "/opt/rocm/bin/rocm-smi -a" report for the actual voltage at 1547 MHz?

Can you increase memory speed at 1547MHz?
Stock voltages are:

808MHz / 723mV
1304MHz / 801mV
1801MHz / 1107mV

rocm-smi -a is showing 887mV @ 1547MHz.

I didn't even try increasing memory speed at 1684MHz. But at 1547MHz, the first memory setting I tried was 1100, and it didn't take long to produce an error. So I took it down to 1050, then it took even less time to produce an error. But when using the stock speed of 1000 and stock voltage it has produced zero errors (so far), and it is easy to keep the temperature below 90 degrees.

2020-01-05, 23:30   #1701
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
2·3·1,193 Posts

Quote:
Originally Posted by PhilF
I didn't even try increasing memory speed at 1684MHz. But at 1547MHz, the first memory setting I tried was 1100, and it didn't take long to produce an error. So I took it down to 1050, then it took even less time to produce an error. But when using the stock speed of 1000 and stock voltage it has produced zero errors (so far), and it is easy to keep the temperature below 90 degrees.
My condolences -- easily the worst Radeon VII card I've heard of. Worse yet, it works at stock settings so cannot ethically be RMA'd.

2020-01-05, 23:34   #1702
PhilF
Feb 2005
Colorado
3²·61 Posts

Quote:
Originally Posted by Prime95
My condolences -- easily the worst Radeon VII card I've heard of. Worse yet, it works at stock settings so cannot ethically be RMA'd.
It's ok. It still beats anything else I've ever had by miles!

2020-01-06, 04:20   #1703
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2×17×139 Posts

Quote:
Originally Posted by Prime95
My condolences -- easily the worst Radeon VII card I've heard of. Worse yet, it works at stock settings so cannot ethically be RMA'd.
Well, that sounds like a cue for an update on stability testing of my XFX Radeon VII. I've had it go up to a couple of days with no error; then the weather warms up and another error or two appears. I've progressively dialed it back slightly on each occasion, to where it's now at 1270 GPU MHz and 900 mem MHz. It's now running just under 10.9 msec/it on a ~655M PRP, ~22% complete, with 13 errors accumulated so far and about 64 days left.
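
The 64-day figure can be checked with a little arithmetic, sketched here in shell. It assumes a PRP test needs roughly as many iterations as the exponent (about 655 million here) and that the per-iteration time stays at 10.9 ms for the rest of the run; `days_left` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: estimate remaining run time in days.
# Usage: days_left TOTAL_ITERATIONS FRACTION_DONE MS_PER_ITERATION
days_left() {
    awk -v n="$1" -v frac="$2" -v ms="$3" 'BEGIN {
        remaining = n * (1 - frac)                        # iterations still to do
        printf "%.1f\n", remaining * ms / 1000 / 86400    # ms -> seconds -> days
    }'
}

# ~655M exponent, ~22% complete, 10.9 ms per iteration
days_left 655000000 0.22 10.9   # prints 64.5
```

Any further down-clocking to get rid of errors raises the ms/it figure and stretches the estimate proportionally.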

2020-01-06, 04:36   #1704
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
2×3×1,193 Posts

Quote:
Originally Posted by kriesel
I've progressively dialed it back slightly on each occasion, to where it's now at 1270 GPU MHz and 900 mem MHz,
RMA is your friend; it won't run correctly at stock settings. I assume you tried it in a different machine with similar results.

2020-01-06, 06:13   #1705
PhilF
Feb 2005
Colorado
3²×61 Posts

Quote:
Originally Posted by Prime95
Skip the TF, just P-1.
Good call. Decided to run P-1 on my next assignment, M103464293, with B1=50000 B2=50000000, and out popped a factor!

I just guessed at those bounds. The test took 52 minutes. Is there an easy, fast way to determine sane bounds to use with GPU-based P-1 tests when no previous P-1 testing has been done?

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.