mersenneforum.org

gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

preda 2020-01-06 07:17

[QUOTE=PhilF;534380]Good call. Decided to run P-1 on my next assignment, M103464293, with B1=50000 B2=50000000, and out popped a factor!

I just guessed at those bounds. The test took 52 minutes. Is there an easy fast way to determine sane bounds to use with GPU-based P-1 tests when no previous P-1 testing has been done?[/QUOTE]

I tend to prefer a factor of 30 between B1 and B2 (i.e. B2 = 30*B1). Anything between 10x and 50x is probably acceptable. OTOH your ratio of 1000x was too large.

For B1 probably something between 500'000 and 1'000'000 is reasonable (for 100M exponents). The exact value doesn't matter too much.
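To make the rule of thumb above concrete, here is a small Python sketch that checks a (B1, B2) pair against it. The function names `b2_from_b1` and `check_bounds` are hypothetical, and the default windows (B2/B1 between 10x and 50x, B1 between 500k and 1M for ~100M exponents) are just the values from this post, not limits enforced by gpuowl or PrimeNet.

```python
def b2_from_b1(b1, ratio=30):
    """Preferred B2 under the rule of thumb B2 = 30 * B1."""
    return b1 * ratio


def check_bounds(b1, b2, ratio_range=(10, 50), b1_range=(500_000, 1_000_000)):
    """Return a list of warnings for P-1 bounds on a ~100M exponent.

    The default windows are the rule-of-thumb values from this thread,
    not limits enforced by gpuowl or PrimeNet.
    """
    warnings = []
    ratio = b2 / b1
    if not ratio_range[0] <= ratio <= ratio_range[1]:
        warnings.append(
            f"B2/B1 = {ratio:g} is outside {ratio_range[0]}x..{ratio_range[1]}x")
    if not b1_range[0] <= b1 <= b1_range[1]:
        warnings.append(f"B1 = {b1} is outside {b1_range[0]}..{b1_range[1]}")
    return warnings


if __name__ == "__main__":
    print(check_bounds(50_000, 50_000_000))            # PhilF's guessed bounds
    print(check_bounds(750_000, b2_from_b1(750_000)))  # 30x rule with B1 = 750k
```

Run against PhilF's bounds, `check_bounds` flags both the 1000x ratio and the small B1, while B1 = 750000 with B2 = 30*B1 passes cleanly.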

kriesel 2020-01-06 07:18

[QUOTE=PhilF;534380]Good call. Decided to run P-1 on my next assignment, M103464293, with B1=50000 B2=50000000, and out popped a factor!

I just guessed at those bounds. The test took 52 minutes. Is there an easy fast way to determine sane bounds to use with GPU-based P-1 tests when no previous P-1 testing has been done?[/QUOTE]

Look up the exponent on mersenne.ca and use the PrimeNet bounds. That will satisfy the server and retire the P-1 task.

preda 2020-01-06 07:25

One datapoint on my GPUs (output of gpuowl/tools/monitor.py)
[CODE]
GPU UID VDD SCLK MCLK Mem-used Mem-busy PWR FAN Temp PCIeErr
0 3044212172dc768c 800mV 1358 1181 0.33GB 37% 146W 1925 70/87/77 0
1 780c28c172da5ebb 825mV 1363 1171 0.33GB 38% 154W 1783 68/84/74 0
2 a810192172fd5d12 781mV 1363 1181 0.61GB 37% 139W 1797 69/84/76 0
[/CODE]

I run my GPUs at --setsclk 3 (i.e. about 145W). If I need extra heat I can push to --setsclk 4 (170-180W); if it's too hot I can go down to --setsclk 2, but there the efficiency gain is smaller. In general I would not run a Radeon VII above --setsclk 4 (because of the noise and lower efficiency).
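For reference, a hedged sketch of building the rocm-smi call that pins a card to a given SCLK level, with a guard reflecting the advice above not to go past level 4. It assumes rocm-smi's `-d` (device) and `--setsclk` options; other setup may be needed depending on the ROCm version (check `rocm-smi --help`), changing clocks usually requires root, and `setsclk_command`/`run_setsclk` are hypothetical helper names.

```python
import subprocess

# preda's advice above: don't run a Radeon VII above --setsclk 4.
MAX_RECOMMENDED_SCLK_LEVEL = 4


def setsclk_command(device, level, max_level=MAX_RECOMMENDED_SCLK_LEVEL):
    """Build (but do not run) a rocm-smi argv pinning one GPU to an SCLK level."""
    if not 0 <= level <= max_level:
        raise ValueError(f"SCLK level {level} is outside 0..{max_level}")
    return ["rocm-smi", "-d", str(device), "--setsclk", str(level)]


def run_setsclk(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False (usually needs root)."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for gpu in range(3):  # three cards, as in the monitor output above
        run_setsclk(setsclk_command(gpu, 3))  # dry run by default
```

Keeping the command construction separate from execution makes it easy to review (or log) exactly what will be sent to rocm-smi before touching the clocks.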

The efficiency gain from undervolting is modest, so I wouldn't worry if the card does not undervolt. In fact I would suggest tuning the memory first, without any undervolting, and only afterwards tuning the voltage.

I would definitely watch the temperature, for two reasons. First, the Radeon VII thermally throttles a lot: if you set it to max frequency, it will simply get very hot and drop to a much lower frequency, with no benefit and lower efficiency in the process. Second, errors of all kinds are more frequent when the card runs hot.
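Since watching the temperature matters this much, here is a rough Python sketch that parses the monitor.py table above and lists GPUs whose hottest sensor reading reaches a limit. Assumptions: the column layout is exactly the one shown, the three Temp values are different on-card sensors in °C (which sensor is which is not stated, so only the maximum is used), and the default 90 °C limit is borrowed from PhilF's "below 90 degrees" remark, not an AMD specification.

```python
SAMPLE = """\
GPU UID VDD SCLK MCLK Mem-used Mem-busy PWR FAN Temp PCIeErr
0 3044212172dc768c 800mV 1358 1181 0.33GB 37% 146W 1925 70/87/77 0
1 780c28c172da5ebb 825mV 1363 1171 0.33GB 38% 154W 1783 68/84/74 0
2 a810192172fd5d12 781mV 1363 1181 0.61GB 37% 139W 1797 69/84/76 0
"""


def _strip_unit(value, unit):
    """'146W' -> 146.0; raises if the expected unit suffix is missing."""
    if not value.endswith(unit):
        raise ValueError(f"{value!r} does not end with {unit!r}")
    return float(value[: -len(unit)])


def parse_monitor(text):
    """Parse the whitespace-separated table printed by gpuowl/tools/monitor.py."""
    header, *rows = [line.split() for line in text.strip().splitlines()]
    gpus = []
    for row in rows:
        rec = dict(zip(header, row))
        gpus.append({
            "gpu": int(rec["GPU"]),
            "vdd_mv": _strip_unit(rec["VDD"], "mV"),
            "sclk_mhz": int(rec["SCLK"]),
            "mclk_mhz": int(rec["MCLK"]),
            "power_w": _strip_unit(rec["PWR"], "W"),
            # Three sensor readings; their exact meaning is an assumption,
            # so only the hottest one is used below.
            "temps_c": [int(t) for t in rec["Temp"].split("/")],
        })
    return gpus


def too_hot(gpus, limit_c=90):
    """Indices of GPUs whose hottest sensor reading is at or above limit_c."""
    return [g["gpu"] for g in gpus if max(g["temps_c"]) >= limit_c]


gpus = parse_monitor(SAMPLE)
print(too_hot(gpus))               # → []
print(too_hot(gpus, limit_c=85))   # → [0]
```

In practice one would feed it fresh monitor.py output in a loop and react (for example by lowering the SCLK level) when the list is non-empty.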

[QUOTE=PhilF;534340]Stock voltages are:

808MHz / 723mV
1304MHz / 801mV
1801MHz / 1107mV

rocm-smi -a is showing 887mV @ 1547MHz.

I didn't even try increasing memory speed at 1684MHz. But at 1547MHz, the first memory setting I tried was 1100, and it didn't take long to produce an error. So I took it down to 1050, and then it took even less time to produce an error. But at the stock memory speed of 1000 and stock voltage it has produced zero errors (so far), and it is easy to keep the temperature below 90 degrees.[/QUOTE]

kriesel 2020-01-06 07:29

800M P-1 on Tesla P100, Colab
 
Fan Ming build of gpuowl, 800M P-1 on Tesla P100, 2.35 days running time for both stages, [URL]https://www.mersenne.org/report_exponent/?exp_lo=800000027&full=1[/URL]

[QUOTE=kriesel;533812]It took ~1.74 days of run time, over several Colab sessions, with a Fan Ming-provided executable. [URL]https://www.mersenne.org/report_exponent/?exp_lo=700000031&full=1[/URL] Current projections from runtime scaling and the buffer-count trend are that higher data points will take 2-4 days each, and that runs throughout the mersenne.org range will be possible. The run times can probably be improved; I'm not using any of the performance-enhancing T2_shuffle or merged-middle -use options during these runs.[/QUOTE]
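The 2-4 day projection can be sanity-checked from the two completed runs in this thread (700M in ~1.74 days, 800M in ~2.35 days) by fitting a power law t = C * p^alpha through them and extrapolating. This is only a rough sketch: two points are a thin basis, the runs may have used different bounds and Colab GPU sessions, and real run times jump with FFT size rather than scaling smoothly. The 1000M target is my illustrative choice for the top of the range.

```python
import math


def power_law_exponent(p1, t1, p2, t2):
    """Exponent alpha of t = C * p**alpha fitted through two (p, t) points."""
    return math.log(t2 / t1) / math.log(p2 / p1)


def extrapolate(p_ref, t_ref, alpha, p):
    """Runtime at exponent p, scaled from a reference (p_ref, t_ref) point."""
    return t_ref * (p / p_ref) ** alpha


# The two P-1 runs reported in this thread: (exponent, days for both stages).
P700, T700 = 700_000_031, 1.74
P800, T800 = 800_000_027, 2.35

alpha = power_law_exponent(P700, T700, P800, T800)
t_1000m = extrapolate(P800, T800, alpha, 1_000_000_000)
print(f"alpha ~ {alpha:.2f}, roughly {t_1000m:.1f} days at 1000M")
```

With these numbers alpha comes out near 2.25, and a 1000M run extrapolates to a bit under 4 days, in line with the 2-4 day estimate.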

preda 2020-01-06 07:35

[CODE]
GPU UID VDD SCLK MCLK Mem-used Mem-busy PWR FAN Temp PCIeErr
0 3044212172dc768c 800mV 1358 1181 0.33GB 37% 146W 1925 70/87/77 0
1 780c28c172da5ebb 825mV 1363 1171 0.33GB 38% 154W 1783 68/84/74 0
2 a810192172fd5d12 781mV 1363 1181 0.61GB 37% 139W 1797 69/84/76 0
[/CODE]
Who wants to guess which of the above is the XFX? :)

kriesel 2020-01-06 07:36

[QUOTE=Prime95;534366]RMA is your friend, it won't run correctly at stock settings. I assume you tried it in a different machine with similar results.[/QUOTE]

Nope, it has only been in the new-to-me system bought to house it, which so far runs Windows 10. Maybe I'll try a Linux dual-boot install on that system before relocating the card, which would require displacing some other production GPUs. Right now I'm on deadline with some other things.

kriesel 2020-01-06 07:43

[QUOTE=preda;534388]Who wants to guess which of the above is the XFX? :)[/QUOTE]
GPU 1: highest voltage, highest power, lowest MCLK.
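The reasoning behind that guess can be spelled out in a few lines: copy the VDD, MCLK and PWR columns from preda's table and see which card needs the most voltage and power while reaching the lowest memory clock. This is just kriesel's heuristic in code form, not a confirmed identification of the XFX card.

```python
# VDD (mV), MCLK (MHz) and power (W) copied from preda's monitor.py table.
GPUS = {
    0: {"vdd_mv": 800, "mclk_mhz": 1181, "power_w": 146},
    1: {"vdd_mv": 825, "mclk_mhz": 1171, "power_w": 154},
    2: {"vdd_mv": 781, "mclk_mhz": 1181, "power_w": 139},
}


def pick(column, best=max):
    """Index of the GPU with the max (or, with best=min, the min) value of a column."""
    return best(GPUS, key=lambda gpu: GPUS[gpu][column])


votes = [pick("vdd_mv"), pick("power_w"), pick("mclk_mhz", best=min)]
print(votes)  # → [1, 1, 1]
```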

PhilF 2020-01-06 15:46

[QUOTE=preda;534385]I run my GPUs at --setsclk 3 (i.e. about 145W). If I need extra heat I can push to setsclk 4 (170-180W), if it's too hot I can go down to --setsclk 2 but there the efficiency gain is smaller. In general I would not run a RadeonVII above --setsclk 4 (because of noise and lower efficiency).[/QUOTE]

Based on that, maybe my card isn't so pitiful after all. All my testing has been at --setsclk 4 or 5, and at --setsclk 4 my card draws only 150W.

My --setsclk 5 setting is hungry (about 185W), hot, and noisy, but it does work at that speed: I completed a PRP double-check using that setting. I haven't even tried --setsclk 3, because I thought everyone was using 4 or 5.

The outrageous --setsclk 6 setting (1800 MHz) sets off the overload alarm on my UPS!

PhilF 2020-01-06 16:07

[QUOTE=preda;534383]I tend to prefer a factor of 30x between B1 and B2 (i.e. B2 = 30*B1). Probably anything between 10x to 50x may be acceptable. OTOH you ratio of 1000x was too large.

For B1 probably something between 500'000 and 1'000'000 is reasonable (for 100M exponents). The exact value doesn't matter too much.[/QUOTE]

Can I assume that, by picking a B2 bound of 1000x B1, the only repercussion is that the test took a little longer? I picked such a large one because I figured a larger B2 would make better use of the card's 16GB of memory.

preda 2020-01-06 21:08

[QUOTE=PhilF;534411]Can I assume that by picking a B2 bounds of 1000X that the only repercussion is the test took a little longer? The reason I picked one so large is that I figured a larger B2 would allow for better utilization of the card's 16GB of memory.[/QUOTE]

Yes; a B2 that is very large relative to B1 is safe, but is not a very efficient use of the compute.

P-1 finds a factor p of the Mersenne candidate when p-1 is a product of primes that are all less than B1, except for at most one prime between B1 and B2.

In your case it would make sense to increase B1 to 500'000 or 1M if you want to keep B2 at 50M.
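The condition preda describes can be checked mechanically. A minimal sketch, assuming the usual two-stage P-1: stage 1 covers every prime power up to B1, and stage 2 adds one extra prime in (B1, B2]. Because every factor of 2^q - 1 has the form 2kq + 1, P-1 implementations for Mersenne numbers normally include 2q in the exponent for free, which the hypothetical `known` parameter models. The function tests a sufficient condition for the factor to show up in the GCD, and the toy M29 = 233 × 1103 × 2089 stands in for a real 100M exponent.

```python
def prime_factors(n):
    """Factor n by trial division (fine for the toy sizes used here)."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def p1_finds(p, b1, b2, known=1):
    """Sufficient condition for two-stage P-1 with bounds (B1, B2) to reveal prime p.

    `known` is a divisor of p - 1 included for free (2*q for factors of 2^q - 1).
    The rest of p - 1 must consist of prime powers <= B1, plus at most one
    prime in (B1, B2] appearing to the first power (the stage 2 prime).
    """
    if (p - 1) % known:
        raise ValueError("known must divide p - 1")
    stage2_primes = 0
    for q, e in prime_factors((p - 1) // known).items():
        if q ** e <= b1:
            continue  # covered by stage 1
        if e == 1 and b1 < q <= b2:
            stage2_primes += 1  # needs stage 2
        else:
            return False
    return stage2_primes <= 1


Q = 29  # toy exponent: 2^29 - 1 = 233 * 1103 * 2089
assert 233 * 1103 * 2089 == 2**Q - 1
for b1, b2 in [(5, 5), (5, 30)]:
    found = [p for p in (233, 1103, 2089) if p1_finds(p, b1, b2, known=2 * Q)]
    print(f"B1={b1}, B2={b2}: {found}")
```

With B1 = 5 and no stage 2 (B2 = B1) only 233 is found; raising B2 to 30 also catches 1103 (its k = 19 is a stage-2 prime), while 2089 needs B1 ≥ 9 because its k = 36 contains 3^2. That is the same trade-off as in the advice above: a huge B2 only helps factors whose k is already smooth up to B1.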

preda 2020-01-07 10:08

[QUOTE=Prime95;534317]I tested B1=750000, B2=20*B1 on a 5M FFT expo and it took 26 minutes. Clearly a worthwhile investment if no P-1 has been done before (PRP lines in worktodo that do not end in ",0").

Bonus. My test found a factor! So the P-1 code still works and another exponent bites the dust.[/QUOTE]

Can somebody please remind me what the last integer value ("0" below) means in a PRP assignment such as:

PRP=700000F64405DAFE2EXXXXXXC85EEF72,1,2,91157779,-1,77,0

Do I understand correctly that when it's 0, it means "don't do any P-1"?
Then what does it mean when it's 1 or 2, and what other values can it take?
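For anyone scripting around worktodo.txt, here is a hedged parser for the simple PRP line shape shown above. It assumes Prime95's worktodo layout as I understand it, PRP=[AID,]k,b,n,c[,how_far_factored,tests_saved], where the last field is the number of primality tests a factor would save. Treating 0 as "skip P-1" and any positive value as "P-1 is worthwhile" is my reading, consistent with Prime95's remark quoted above, not something confirmed in this thread. The longer forms (PRP base, residue type, known factors) are deliberately not handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrpAssignment:
    aid: Optional[str]               # assignment ID, if present
    k: int
    b: int
    n: int
    c: int                           # the number tested is k*b^n + c
    how_far_factored: Optional[int]  # trial-factoring depth in bits
    tests_saved: Optional[float]     # assumed: primality tests saved by a factor

    @property
    def wants_p1(self) -> Optional[bool]:
        """Assumed reading: 0 means skip P-1, > 0 means P-1 is worthwhile."""
        if self.tests_saved is None:
            return None  # field absent: unknown
        return self.tests_saved > 0


def parse_prp_line(line: str) -> PrpAssignment:
    """Parse PRP=[AID,]k,b,n,c[,how_far_factored,tests_saved] (short forms only)."""
    key, sep, rest = line.strip().partition("=")
    if key != "PRP" or not sep:
        raise ValueError(f"not a PRP line: {line!r}")
    fields = rest.split(",")
    aid = fields.pop(0) if len(fields) in (5, 7) else None
    if len(fields) not in (4, 6):
        raise ValueError(f"unsupported field count: {line!r}")
    k, b, n, c = (int(x) for x in fields[:4])
    tf = int(fields[4]) if len(fields) == 6 else None
    saved = float(fields[5]) if len(fields) == 6 else None
    return PrpAssignment(aid, k, b, n, c, tf, saved)


a = parse_prp_line("PRP=700000F64405DAFE2EXXXXXXC85EEF72,1,2,91157779,-1,77,0")
print(a.n, a.how_far_factored, a.tests_saved, a.wants_p1)
```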


All times are UTC.
