mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GpuOwl (https://www.mersenneforum.org/forumdisplay.php?f=171)
-   -   gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

kriesel 2019-12-27 23:34

[QUOTE=paulunderwood;533664]Quoting from [URL]https://github.com/preda/gpuowl[/URL]
...
Is that all outdated?[/QUOTE]8 to 9 months outdated, yes; announcement of Gpuowl v6.4 by Preda: [URL]https://www.mersenneforum.org/showpost.php?p=513288&postcount=1056[/URL]

kriesel 2019-12-31 02:59

700M P-1 on gpuowl / P100 / colab
 
It took ~1.74 days of run time across several Colab sessions, with a Fan Ming-provided executable. [URL]https://www.mersenne.org/report_exponent/?exp_lo=700000031&full=1[/URL] Current projections from the runtime scaling and buffer-count trends are that higher data points will take 2-4 days each, and that P-1 throughout the mersenne.org range will be possible. The run times can probably be improved; I'm not using the performance-enhancing T2_shuffle or merged-middle -use options during these runs.

Lorenzo 2019-12-31 12:10

Hello!
How do I switch gpuOwl to show the traditional "ms/it" instead of "us/sq"?

kriesel 2019-12-31 12:32

[QUOTE=Lorenzo;533824]Hello!
How do I switch gpuOwl to show the traditional "ms/it" instead of "us/sq"?[/QUOTE]
Edit source code and recompile.

Lorenzo 2019-12-31 12:44

[QUOTE=kriesel;533825]Edit source code and recompile.[/QUOTE]

Uhhh. Too much effort for me :(

paulunderwood 2019-12-31 12:50

[QUOTE=Lorenzo;533824]Hello!
How do I switch gpuOwl to show the traditional "ms/it" instead of "us/sq"?[/QUOTE]

I don't know about a switch but...


[CODE]grep -B3 -A3 "us/it" Gpu.cpp
static string makeLogStr(u32 E, string_view status, u32 k, u64 res, float secsPerIt, u32 nIters) {
  char buf[256];

  snprintf(buf, sizeof(buf), "%u %2s %8d %6.2f%%; %4.0f us/it; ETA %s; %s",
           E, status.data(), k, k / float(nIters) * 100,
           secsPerIt * 1'000'000, getETA(k, nIters, secsPerIt).c_str(),
           hex(res).c_str());
[/CODE]

Change:

%4.0f us/it ---> %4.3f ms/it
1'000'000 ---> 1'000

And recompile.

Or just divide by 1000 in your head. :crank:

Lorenzo 2020-01-03 09:03

[QUOTE=paulunderwood;533827]Or just divide by 1000 in your head. :crank:[/QUOTE]

Thank you! This is what I'm looking for :smile:

storm5510 2020-01-03 23:45

[QUOTE=kriesel;533825]Edit source code and recompile.[/QUOTE]

[QUOTE]"The more you overthink the plumbing, the easier it is to stop up the drain." ~Jimmy Doohan.[/QUOTE]The version I am using has a 9/29/2019 date stamp. I have only run P-1 with it, and there have been no problems. I would be reluctant to replace it with anything newer. I feel it should update the screen more often, but I can live with that. It produces correct results, as far as I know, and that is the important part.

preda 2020-01-04 10:51

CARRY32 and CARRY64
 
A new optimization has been contributed by George: it consists of using only 32 bits to store the carry-out from a word after the convolution. The theoretical analysis of whether this carry value fits in 32 bits is not very clear AFAIK, but the rough idea is that the higher the FFT size, the larger the expected value of the carry. The new CARRY32 has been tested quite a bit at the wavefront (5M FFT) and never produced an error; OTOH the situation may be different at higher FFT sizes.

The performance gain is significant, about 3-5%. Given the above, CARRY32 is now enabled by default. To get the old behavior, supply "-use CARRY64" to gpuowl.

PRP should detect a carry overflow (when using CARRY32) if one occurs: it reports the usual error, retries, and stops after the same error repeats 3 times.

OTOH P-1 has no such check; it's probably safer to keep using CARRY64 when doing P-1, especially with FFT sizes larger than 5M (5M being the FFT size that has been tested the most so far).

If anybody sees an error which seems to be caused by CARRY32 (at any FFT size), please report it.

kriesel 2020-01-04 15:01

gpuowl-v6.11-112-gf1b00d1 Windows build
 
2 Attachment(s)
This build should have the -use CARRY32 default that Preda described above. So far I've gone no further than running it with -h. The build again produced the usual shower of warnings.

Just when I think we're at diminishing returns or at the end of optimizations, George provides another pleasant surprise.

paulunderwood 2020-01-04 16:08

[QUOTE=preda;534207]A new optimization has been contributed by George ... If anybody sees an error which seems to be caused by CARRY32 (at any FFT size), please report it.[/QUOTE]

I don't understand it. I git-cloned gpuowl and compiled, and it runs slower than before: 1240 us/it vs. 750 us/it. What am I doing wrong?


All times are UTC. The time now is 23:13.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.