mersenneforum.org  

Old 2020-01-06, 07:17   #1706
preda
 
"Mihai Preda"
Apr 2015

2^4·83 Posts

Quote:
Originally Posted by PhilF View Post
Good call. Decided to run P-1 on my next assignment, M103464293, with B1=50000 B2=50000000, and out popped a factor!

I just guessed at those bounds. The test took 52 minutes. Is there an easy fast way to determine sane bounds to use with GPU-based P-1 tests when no previous P-1 testing has been done?
I tend to prefer a factor of 30x between B1 and B2 (i.e. B2 = 30*B1). Probably anything between 10x and 50x is acceptable. OTOH your ratio of 1000x was too large.

For B1 probably something between 500'000 and 1'000'000 is reasonable (for 100M exponents). The exact value doesn't matter too much.
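
As a rough illustration of the rule of thumb above, here is a minimal sketch. The function names and the hard 10x-50x check are my own assumptions for illustration, not anything built into GpuOwl:

```python
def suggest_p1_bounds(b1=500_000, ratio=30):
    """Return (B1, B2) with B2 = ratio * B1.

    Defaults follow the rule of thumb above for ~100M exponents;
    the 10x..50x window is treated here as a hard limit (an assumption).
    """
    if not 10 <= ratio <= 50:
        raise ValueError(f"B2/B1 ratio {ratio} is outside the suggested 10x-50x range")
    return b1, ratio * b1


def b2_ratio(b1, b2):
    """Ratio B2/B1 for an existing pair of bounds."""
    return b2 / b1


print(suggest_p1_bounds())             # default B1 with B2 = 30 * B1
print(b2_ratio(50_000, 50_000_000))    # the ratio of the bounds quoted above
```

The second call just computes the ratio of the bounds from the quoted post, which is the 1000x mentioned above.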
Old 2020-01-06, 07:18   #1707
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·17·139 Posts

Quote:
Originally Posted by PhilF View Post
Good call. Decided to run P-1 on my next assignment, M103464293, with B1=50000 B2=50000000, and out popped a factor!

I just guessed at those bounds. The test took 52 minutes. Is there an easy fast way to determine sane bounds to use with GPU-based P-1 tests when no previous P-1 testing has been done?
Look up the exponent on mersenne.ca and use the PrimeNet bounds. That will satisfy the server and retire the P-1 task.

Last fiddled with by kriesel on 2020-01-06 at 07:19
Old 2020-01-06, 07:25   #1708
preda
 
"Mihai Preda"
Apr 2015

10100110000_2 Posts

One datapoint on my GPUs (output of gpuowl/tools/monitor.py)
Code:
GPU UID            VDD   SCLK MCLK Mem-used Mem-busy PWR  FAN  Temp     PCIeErr
0 3044212172dc768c 800mV 1358 1181  0.33GB    37%    146W 1925 70/87/77       0
1 780c28c172da5ebb 825mV 1363 1171  0.33GB    38%    154W 1783 68/84/74       0
2 a810192172fd5d12 781mV 1363 1181  0.61GB    37%    139W 1797 69/84/76       0
I run my GPUs at --setsclk 3 (i.e. about 145W). If I need extra heat I can push to --setsclk 4 (170-180W); if it's too hot I can go down to --setsclk 2, but there the efficiency gain is smaller. In general I would not run a RadeonVII above --setsclk 4 (because of noise and lower efficiency).

The efficiency gain from undervolting is modest, so I wouldn't worry if the card does not undervolt. In fact I would suggest tuning the memory first without any undervolting, and only afterwards tuning the voltage.

I would definitely watch the temperature, for two reasons. First, the RadeonVII thermally throttles a lot, so if you set it to max frequency it will simply become super-hot and drop to a much lower frequency, for no benefit and with lower efficiency in the process. Second, all errors are more frequent when the card is hot.

Quote:
Originally Posted by PhilF View Post
Stock voltages are:

808 MHz / 723mV
1304 MHz / 801mV
1801 MHz / 1107mV

rocm-smi -a is showing 887mV @ 1547 MHz.

I didn't even try increasing memory speed at 1684 MHz. But at 1547 MHz, the first memory setting I tried was 1100, and it didn't take long to produce an error. So I took it down to 1050, and then it took even less time to produce an error. But when using the stock speed of 1000 and stock voltage it has produced zero errors (so far), and it is easy to keep the temperature below 90 degrees.

Last fiddled with by preda on 2020-01-06 at 07:30
Old 2020-01-06, 07:29   #1709
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·17·139 Posts
800M P-1 on Tesla P100, Colab

Fan Ming build of gpuowl, 800M P-1 on Tesla P100, 2.35 days running time for both stages, https://www.mersenne.org/report_expo...0000027&full=1

Quote:
Originally Posted by kriesel View Post
It took ~1.74 days of run time, over several colab sessions, with a Fan Ming-provided executable. https://www.mersenne.org/report_expo...0000031&full=1 Current projections from runtime scaling and the buffer count trend are that higher data points will take 2-4 days each, and that runs throughout the mersenne.org range will be possible. The run times can probably be improved upon; I'm not using any of the performance-enhancing T2_shuffle or merged-middle -use options during these runs.

Last fiddled with by kriesel on 2020-01-06 at 07:30
Old 2020-01-06, 07:35   #1710
preda
 
"Mihai Preda"
Apr 2015

2^4·83 Posts

Code:
GPU UID            VDD   SCLK MCLK Mem-used Mem-busy PWR  FAN  Temp     PCIeErr
0 3044212172dc768c 800mV 1358 1181  0.33GB    37%    146W 1925 70/87/77       0
1 780c28c172da5ebb 825mV 1363 1171  0.33GB    38%    154W 1783 68/84/74       0
2 a810192172fd5d12 781mV 1363 1181  0.61GB    37%    139W 1797 69/84/76       0
Who wants to guess which of the above is the XFX? :)
Old 2020-01-06, 07:36   #1711
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·17·139 Posts

Quote:
Originally Posted by Prime95 View Post
RMA is your friend, it won't run correctly at stock settings. I assume you tried it in a different machine with similar results.
Nope. It's a new-to-me system bought only for housing it, so far running Windows 10. Maybe I'll try a Linux dual-boot install on that system before relocating it, which would require displacing some other production GPUs. Right now I'm on deadline on some other things.
Old 2020-01-06, 07:43   #1712
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

4726_10 Posts

Quote:
Originally Posted by preda View Post
Who wants to guess which of the above is the XFX? :)
1; high voltage, high power, low mclk
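
For what it's worth, that reasoning can be checked mechanically against the posted table. This is only a sketch: the column positions are assumed from the header row shown in the quoted output, and the helper name `parse_rows` is made up for illustration.

```python
# The monitor.py output exactly as posted above.
TABLE = """\
GPU UID            VDD   SCLK MCLK Mem-used Mem-busy PWR  FAN  Temp     PCIeErr
0 3044212172dc768c 800mV 1358 1181  0.33GB    37%    146W 1925 70/87/77       0
1 780c28c172da5ebb 825mV 1363 1171  0.33GB    38%    154W 1783 68/84/74       0
2 a810192172fd5d12 781mV 1363 1181  0.61GB    37%    139W 1797 69/84/76       0
"""


def parse_rows(text):
    """Parse the whitespace-separated table into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines()[1:]:   # skip the header row
        f = line.split()
        rows.append({
            "gpu": int(f[0]),
            "vdd_mv": int(f[2][:-2]),   # strip the "mV" suffix
            "mclk": int(f[4]),
            "pwr_w": int(f[7][:-1]),    # strip the "W" suffix
        })
    return rows


rows = parse_rows(TABLE)
highest_vdd = max(rows, key=lambda r: r["vdd_mv"])["gpu"]
highest_pwr = max(rows, key=lambda r: r["pwr_w"])["gpu"]
lowest_mclk = min(rows, key=lambda r: r["mclk"])["gpu"]
print(highest_vdd, highest_pwr, lowest_mclk)   # all three point at GPU 1
```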
Old 2020-01-06, 15:46   #1713
PhilF
 
Feb 2005
Colorado

3^2·61 Posts

Quote:
Originally Posted by preda View Post
I run my GPUs at --setsclk 3 (i.e. about 145W). If I need extra heat I can push to setsclk 4 (170-180W), if it's too hot I can go down to --setsclk 2 but there the efficiency gain is smaller. In general I would not run a RadeonVII above --setsclk 4 (because of noise and lower efficiency).
Based on that, maybe my card isn't so pitiful after all. All my testing has been with a setsclk setting of 4 or 5, and my --setsclk 4 speed is pulling only 150W.

My --setsclk 5 setting is hungry (about 185W), hot, and noisy. But it does work at that speed. I completed a PRP double-check using that setting. But I haven't even played with a setsclk of 3, because I thought everyone was using 4 or 5.

The outrageous --setsclk 6 setting (1800 MHz) sets off the overload alarm on my UPS!
Old 2020-01-06, 16:07   #1714
PhilF
 
Feb 2005
Colorado

549_10 Posts

Quote:
Originally Posted by preda View Post
I tend to prefer a factor of 30x between B1 and B2 (i.e. B2 = 30*B1). Probably anything between 10x and 50x is acceptable. OTOH your ratio of 1000x was too large.

For B1 probably something between 500'000 and 1'000'000 is reasonable (for 100M exponents). The exact value doesn't matter too much.
Can I assume that the only repercussion of picking a B2 bound of 1000x B1 is that the test took a little longer? The reason I picked one so large is that I figured a larger B2 would allow for better utilization of the card's 16GB of memory.
Old 2020-01-06, 21:08   #1715
preda
 
"Mihai Preda"
Apr 2015

2^4·83 Posts

Quote:
Originally Posted by PhilF View Post
Can I assume that the only repercussion of picking a B2 bound of 1000x B1 is that the test took a little longer? The reason I picked one so large is that I figured a larger B2 would allow for better utilization of the card's 16GB of memory.
Yes; a B2 that is very large relative to B1 is safe, but is not a very efficient use of the compute.

P-1 finds a factor p of the Mersenne candidate when p-1 is a product of prime factors of which all but at most one are less than B1, and the remaining one (if any) is between B1 and B2.

In your case it would make sense to increase B1 to 500'000 or 1M if you want to keep B2 at 50M.
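
To make the B1/B2 roles concrete, here is a toy sketch of two-stage P-1 on a small Mersenne number. It is an assumption-laden illustration, not how GpuOwl implements it: the base 3, the naive one-prime-at-a-time stage 2, and the function names are my own choices. It does use the known fact that any factor of 2^p - 1 has the form 2kp + 1, so 2p is folded into the exponent for free.

```python
from math import gcd


def primes_up_to(n):
    """Simple sieve of Eratosthenes, returns all primes <= n (n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def p_minus_1(exponent, b1, b2):
    """Toy P-1 on N = 2**exponent - 1. Returns a nontrivial factor or None."""
    n = (1 << exponent) - 1
    primes = primes_up_to(b2)

    # Stage 1: E = 2 * exponent * (product of the largest prime powers <= B1).
    e = 2 * exponent
    for q in primes:
        if q > b1:
            break
        qk = q
        while qk * q <= b1:
            qk *= q
        e *= qk
    x = pow(3, e, n)
    g = gcd(x - 1, n)
    if 1 < g < n:
        return g

    # Stage 2 (naive): allow exactly one extra prime q with B1 < q <= B2.
    acc = 1
    for q in primes:
        if q > b1:
            acc = acc * (pow(x, q, n) - 1) % n
    g = gcd(acc, n)
    return g if 1 < g < n else None


# M29 = 233 * 1103 * 2089, and 233 - 1 = 2^3 * 29, so B1 = 5 is enough for stage 1.
print(p_minus_1(29, b1=5, b2=10))
```

For a stage 2 example, M37 has the factor 223 with 222 = 2·3·37, so `p_minus_1(37, b1=2, b2=3)` should need the extra prime 3 from stage 2 to find it. With a huge B2 and a small B1, stage 2 does a lot of work but stage 1 still misses every factor whose p-1 has two or more primes above B1.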
Old 2020-01-07, 10:08   #1716
preda
 
"Mihai Preda"
Apr 2015

2^4×83 Posts

Quote:
Originally Posted by Prime95 View Post
I tested B1=750000, B2=20*B1 on a 5M FFT expo and it took 26 minutes. Clearly a worthwhile investment if no P-1 has been done before (PRP lines in worktodo that do not end in ",0") .

Bonus. My test found a factor! So the P-1 code still works and another exponent bites the dust.
Can somebody please remind me what is the meaning of the last integer value ("0" below) in a PRP assignment such as:

PRP=700000F64405DAFE2EXXXXXXC85EEF72,1,2,91157779,-1,77,0

Do I understand correctly that when it's 0, it means "don't do any P-1"?
Then what does it mean when it's 1, 2, or what else can it be?
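
In the meantime, the only thing I can encode is the remark quoted above, i.e. that PRP lines not ending in ",0" have had no P-1 done. A minimal sketch, where the helper names are hypothetical and nothing is assumed about what the other fields or other values of the last field mean:

```python
def last_field(line):
    """Return the last comma-separated integer of a worktodo line."""
    body = line.split("=", 1)[1]
    return int(body.split(",")[-1])


def p1_still_needed(line):
    """Per the quoted remark only: a PRP line that does not end in ',0'."""
    return line.startswith("PRP=") and last_field(line) != 0


example = "PRP=700000F64405DAFE2EXXXXXXC85EEF72,1,2,91157779,-1,77,0"
print(last_field(example), p1_still_needed(example))
```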

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.