mersenneforum.org gpuOwL: an OpenCL program for Mersenne primality testing

2020-01-06, 07:17   #1706
preda

"Mihai Preda"
Apr 2015

2^4·83 Posts

Quote:
 Originally Posted by PhilF Good call. Decided to run P-1 on my next assignment, M103464293, with B1=50000 B2=50000000, and out popped a factor! I just guessed at those bounds. The test took 52 minutes. Is there an easy fast way to determine sane bounds to use with GPU-based P-1 tests when no previous P-1 testing has been done?
I tend to prefer a factor of 30x between B1 and B2 (i.e. B2 = 30*B1). Anything between 10x and 50x is probably acceptable. OTOH your ratio of 1000x was too large.

For B1 probably something between 500'000 and 1'000'000 is reasonable (for 100M exponents). The exact value doesn't matter too much.
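To put numbers on that rule of thumb, here is a minimal Python sketch. The helper names (`b2_for`, `ratio_ok`) are made up for illustration and are not part of gpuowl; the 30x default and the 10x to 50x window are just the figures from this post.

```python
def b2_for(b1, ratio=30):
    """Suggested B2 for a given B1, using the preferred B2 = 30 * B1 ratio."""
    return b1 * ratio


def ratio_ok(b1, b2, lo=10, hi=50):
    """True if B2/B1 falls inside the 'probably acceptable' 10x..50x window."""
    return lo <= b2 / b1 <= hi


if __name__ == "__main__":
    for b1 in (500_000, 1_000_000):
        print(f"B1={b1} -> B2={b2_for(b1)}")
    # The bounds from the quoted post: B1=50000, B2=50000000 (ratio 1000x).
    print(ratio_ok(50_000, 50_000_000))
```

With B1 = 500'000 this gives B2 = 15'000'000, and the quoted 50'000 / 50'000'000 pair falls outside the window.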

2020-01-06, 07:18   #1707
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·17·139 Posts

Quote:
 Originally Posted by PhilF Good call. Decided to run P-1 on my next assignment, M103464293, with B1=50000 B2=50000000, and out popped a factor! I just guessed at those bounds. The test took 52 minutes. Is there an easy fast way to determine sane bounds to use with GPU-based P-1 tests when no previous P-1 testing has been done?
Look up the exponent on mersenne.ca and use the PrimeNet bounds. That will satisfy the server and retire the P-1 task.

Last fiddled with by kriesel on 2020-01-06 at 07:19

2020-01-06, 07:25   #1708
preda

"Mihai Preda"
Apr 2015

10100110000_2 Posts

One datapoint on my GPUs (output of gpuowl/tools/monitor.py)
Code:
GPU UID            VDD   SCLK MCLK Mem-used Mem-busy PWR  FAN  Temp     PCIeErr
0 3044212172dc768c 800mV 1358 1181  0.33GB    37%    146W 1925 70/87/77       0
1 780c28c172da5ebb 825mV 1363 1171  0.33GB    38%    154W 1783 68/84/74       0
2 a810192172fd5d12 781mV 1363 1181  0.61GB    37%    139W 1797 69/84/76       0
I run my GPUs at --setsclk 3 (i.e. about 145W). If I need extra heat I can push to --setsclk 4 (170-180W); if it's too hot I can go down to --setsclk 2, but there the efficiency gain is smaller. In general I would not run a RadeonVII above --setsclk 4 (because of noise and lower efficiency).

The efficiency gain from undervolting is modest, so I wouldn't worry if the card does not undervolt. In fact I would suggest tuning the memory first, without any undervolting, and only afterwards tuning the voltage.

I would definitely watch the temperature, for two reasons. First, the RadeonVII thermally throttles a lot: if you set it to max frequency, it simply gets very hot and drops to a much lower frequency, with no benefit and lower efficiency in the process. Second, errors of all kinds are more frequent when the card is hot.

Quote:
 Originally Posted by PhilF Stock voltages are: 808Mhz / 723mV 1304Mhz / 801mV 1801Mhz / 1107mV rocm-smi -a is showing 887mV @ 1547Mhz. I didn't even try increasing memory speed at 1684Mhz. But at 1547Mhz, the first memory setting I tried was 1100, and it didn't take long to produce an error. So I took it down to 1050, then it took even less time to produce an error. But when using the stock speed of 1000 and stock voltage it has produced zero errors (so far), and it is easy to keep the temperature below 90 degrees.

Last fiddled with by preda on 2020-01-06 at 07:30

2020-01-06, 07:29   #1709
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·17·139 Posts
800M P-1 on Tesla P100, Colab

Fan Ming build of gpuowl, 800M P-1 on Tesla P100, 2.35 days running time for both stages, https://www.mersenne.org/report_expo...0000027&full=1

Quote:
 Originally Posted by kriesel It took ~1.74 days of run time, several colab sessions, with a Fan Ming-provided executable. https://www.mersenne.org/report_expo...0000031&full=1 Current projections from runtime scaling and buffer count trend is higher data points will take 2-4 days each, and throughout the mersenne.org range will be possible. The run times can probably be improved upon; I'm not using any of the performance enhancing T2_shuffle or merged-middle -use options during these runs.

Last fiddled with by kriesel on 2020-01-06 at 07:30

2020-01-06, 07:35   #1710
preda

"Mihai Preda"
Apr 2015

2^4·83 Posts

Code:
GPU UID            VDD   SCLK MCLK Mem-used Mem-busy PWR  FAN  Temp     PCIeErr
0 3044212172dc768c 800mV 1358 1181  0.33GB    37%    146W 1925 70/87/77       0
1 780c28c172da5ebb 825mV 1363 1171  0.33GB    38%    154W 1783 68/84/74       0
2 a810192172fd5d12 781mV 1363 1181  0.61GB    37%    139W 1797 69/84/76       0
Who wants to guess which of the above is the XFX? :)

2020-01-06, 07:36   #1711
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·17·139 Posts

Quote:
 Originally Posted by Prime95 RMA is your friend, it won't run correctly at stock settings. I assume you tried it in a different machine with similar results.
Nope. It's in a new-to-me system bought only for housing it, so far running only Windows 10. Maybe I'll try a Linux dual-boot install on that system before relocating it, which would require displacing some other production GPUs. Right now I'm on deadline on some other things.

2020-01-06, 07:43   #1712
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

4726_10 Posts

Quote:
 Originally Posted by preda Who wants to guess which of the above is the XFX? :)
1; high voltage, high power, low mclk
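The three criteria in that guess (highest voltage, highest power, lowest MCLK) can be checked mechanically against the posted table. The following is only a sketch: it parses the text exactly as pasted in the thread, the helper names are invented here, and it is not how monitor.py itself works. It says nothing about whether the guess is right, only which row matches the criteria.

```python
MONITOR_OUTPUT = """\
GPU UID            VDD   SCLK MCLK Mem-used Mem-busy PWR  FAN  Temp     PCIeErr
0 3044212172dc768c 800mV 1358 1181  0.33GB    37%    146W 1925 70/87/77       0
1 780c28c172da5ebb 825mV 1363 1171  0.33GB    38%    154W 1783 68/84/74       0
2 a810192172fd5d12 781mV 1363 1181  0.61GB    37%    139W 1797 69/84/76       0
"""


def parse_rows(text):
    """Split the whitespace-separated table into one dict per GPU row."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def number(field):
    """Strip a trailing unit such as 'mV' or 'W' and return an int."""
    return int(field.rstrip("mVW"))


rows = parse_rows(MONITOR_OUTPUT)
highest_vdd = max(rows, key=lambda r: number(r["VDD"]))["GPU"]
highest_pwr = max(rows, key=lambda r: number(r["PWR"]))["GPU"]
lowest_mclk = min(rows, key=lambda r: number(r["MCLK"]))["GPU"]

if __name__ == "__main__":
    print("highest VDD:", highest_vdd)
    print("highest PWR:", highest_pwr)
    print("lowest MCLK:", lowest_mclk)
```

All three selections land on the same row, GPU 1 (825mV, 154W, MCLK 1171).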

2020-01-06, 15:46   #1713
PhilF

Feb 2005

3^2·61 Posts

Quote:
 Originally Posted by preda I run my GPUs at --setsclk 3 (i.e. about 145W). If I need extra heat I can push to setsclk 4 (170-180W), if it's too hot I can go down to --setsclk 2 but there the efficiency gain is smaller. In general I would not run a RadeonVII above --setsclk 4 (because of noise and lower efficiency).
Based on that, maybe my card isn't so pitiful after all. All my testing has been with a setsclk setting of 4 or 5, and my --setsclk 4 speed is pulling only 150W.

My --setsclk 5 setting is hungry (about 185W), hot, and noisy. But it does work at that speed. I completed a PRP double-check using that setting. But I haven't even played with a setsclk of 3, because I thought everyone was using 4 or 5.

The outrageous --setsclk 6 setting (1800 MHz) sets off the overload alarm on my UPS!

2020-01-06, 16:07   #1714
PhilF

Feb 2005

549_10 Posts

Quote:
 Originally Posted by preda I tend to prefer a factor of 30x between B1 and B2 (i.e. B2 = 30*B1). Probably anything between 10x to 50x may be acceptable. OTOH your ratio of 1000x was too large. For B1 probably something between 500'000 and 1'000'000 is reasonable (for 100M exponents). The exact value doesn't matter too much.
Can I assume that by picking a B2 bound of 1000x B1, the only repercussion is that the test took a little longer? The reason I picked one so large is that I figured a larger B2 would allow for better utilization of the card's 16GB of memory.

2020-01-06, 21:08   #1715
preda

"Mihai Preda"
Apr 2015

2^4·83 Posts

Quote:
 Originally Posted by PhilF Can I assume that by picking a B2 bounds of 1000X that the only repercussion is the test took a little longer? The reason I picked one so large is that I figured a larger B2 would allow for better utilization of the card's 16GB of memory.
Yes; a B2 that is very large relative to B1 is safe, but is not a very efficient use of the compute.

P-1 works by finding a factor p of the Mersenne candidate such that p-1 is a product of prime factors that are all less than B1, except for at most one that lies between B1 and B2.

In your case it would make sense to increase B1 to 500'000 or 1M if you want to keep B2 at 50M.
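The description of P-1 above can be made concrete with a toy example. This is a minimal Python sketch on the small Mersenne number 2^41 - 1 = 13367 × 164511353, not gpuowl's implementation (which runs on the GPU and uses a far more elaborate stage 2). The function name `pminus1_gcds`, the base 3, and the choice to return both gcds are assumptions made for illustration; like common implementations, stage 1 here uses prime powers up to B1 rather than only primes. Since 13367 - 1 = 2·41·163, the factor 13367 shows up in stage 2 when B1 < 163 <= B2, and already in stage 1 when B1 >= 163.

```python
from math import gcd


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit (limit >= 1)."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, limit + 1, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def pminus1_gcds(exponent, b1, b2, base=3):
    """Toy P-1 on n = 2^exponent - 1; returns (stage-1 gcd, stage-2 gcd).

    A gcd g with 1 < g < n is a proper factor of n.
    """
    n = (1 << exponent) - 1
    primes = primes_up_to(b2)

    # Stage 1: x = base^E mod n, where E = 2*exponent * (prime powers <= B1).
    # Every factor p of a Mersenne number has 2*exponent dividing p-1.
    x = pow(base, 2 * exponent, n)
    for q in primes:
        if q > b1:
            break
        power = q
        while power * q <= b1:
            power *= q
        x = pow(x, power, n)
    g1 = gcd(x - 1, n)

    # Stage 2: allow one extra prime r with B1 < r <= B2 in p-1.
    acc = 1
    for r in primes:
        if b1 < r <= b2:
            acc = acc * (pow(x, r, n) - 1) % n
    g2 = gcd(acc, n)
    return g1, g2


if __name__ == "__main__":
    print(pminus1_gcds(41, 10, 200))   # 163 is covered only by stage 2
    print(pminus1_gcds(41, 200, 200))  # 163 is covered by stage 1
```

The cofactor 164511353 has 164511353 - 1 = 2^3·41·59·8501, so it would need bounds reaching 8501 to be covered in the same way.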

2020-01-07, 10:08   #1716
preda

"Mihai Preda"
Apr 2015

2^4×83 Posts

Quote:
 Originally Posted by Prime95 I tested B1=750000, B2=20*B1 on a 5M FFT expo and it took 26 minutes. Clearly a worthwhile investment if no P-1 has been done before (PRP lines in worktodo that do not end in ",0") . Bonus. My test found a factor! So the P-1 code still works and another exponent bites the dust.
Can somebody please remind me what is the meaning of the last integer value ("0" below) in a PRP assignment such as:

PRP=700000F64405DAFE2EXXXXXXC85EEF72,1,2,91157779,-1,77,0

Do I understand correctly that when it's 0, it means "don't do any P-1"?
Then what does it mean when it's 1, 2, or what else can it be?
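For reference while this gets answered, here is a small Python sketch that pulls the last field out of such a line. It encodes only what the quoted remark implies (lines not ending in ",0" are the ones where no P-1 has been done); what the other values mean is left open. The function names are made up for illustration, and treating the fourth field as the exponent is an assumption based on the example line.

```python
def split_worktodo_line(line):
    """Split 'KIND=f0,f1,...' into (kind, [fields])."""
    kind, _, rest = line.strip().partition("=")
    return kind, rest.split(",")


def last_field(line):
    """Return the trailing field of a PRP line as a string."""
    kind, fields = split_worktodo_line(line)
    if kind != "PRP":
        raise ValueError("not a PRP line: " + line)
    return fields[-1]


def pm1_still_wanted(line):
    """Per the quoted remark: a line that does not end in ',0' has had no P-1."""
    return last_field(line) != "0"


if __name__ == "__main__":
    example = "PRP=700000F64405DAFE2EXXXXXXC85EEF72,1,2,91157779,-1,77,0"
    kind, fields = split_worktodo_line(example)
    print(kind, fields[3], last_field(example), pm1_still_wanted(example))
```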

All times are UTC.