gpuOwL: an OpenCL program for Mersenne primality testing
2019-12-27, 23:34   #1673
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·29·127 Posts

Quote:
 Originally Posted by paulunderwood Quoting from https://github.com/preda/gpuowl ... Is that all outdated?
Yes, it is about 8 to 9 months out of date; see Preda's announcement of gpuowl v6.4: https://www.mersenneforum.org/showpo...postcount=1056

2019-12-31, 02:59   #1674
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2×29×127 Posts
700M P-1 on gpuowl / P100 / colab

It took ~1.74 days of run time, spread over several colab sessions, with a Fan Ming-provided executable. https://www.mersenne.org/report_expo...0000031&full=1

Current projections from run-time scaling and the buffer-count trend are that higher data points will take 2-4 days each, and that runs throughout the mersenne.org range will be possible. The run times can probably be improved upon; I'm not using any of the performance-enhancing T2_shuffle or merged-middle -use options during these runs.

Last fiddled with by kriesel on 2019-12-31 at 03:01

2019-12-31, 12:10   #1675
Lorenzo

Hello! How do I switch gpuOwl to show the traditional "ms/it" instead of us/sq?

2019-12-31, 12:32   #1676
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

16306₈ Posts

Quote:
 Originally Posted by Lorenzo Hello! How do I switch gpuOwl to show the traditional "ms/it" instead of us/sq?
Edit source code and recompile.

Last fiddled with by kriesel on 2019-12-31 at 12:33

2019-12-31, 12:44   #1677
Lorenzo

Aug 2010
Republic of Belarus

2·89 Posts

Quote:
 Originally Posted by kriesel Edit source code and recompile.
Uhhh. Too much effort for me :(

2019-12-31, 12:50   #1678
paulunderwood

Sep 2002
Database er0rr

1188₁₆ Posts

Quote:
 Originally Posted by Lorenzo Hello! How do I switch gpuOwl to show the traditional "ms/it" instead of us/sq?
I don't know about a switch but...

Code:
grep -B3 -A3 "us/it" Gpu.cpp
static string makeLogStr(u32 E, string_view status, u32 k, u64 res, float secsPerIt, u32 nIters) {
  char buf[256];

  snprintf(buf, sizeof(buf), "%u %2s %8d %6.2f%%; %4.0f us/it; ETA %s; %s",
           E, status.data(), k, k / float(nIters) * 100,
           secsPerIt * 1'000'000, getETA(k, nIters, secsPerIt).c_str(),
           hex(res).c_str());
Change:

%4.0f us/it ---> %4.3f ms/it
1'000'000 ---> 1'000

And recompile.
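
To show the effect of those two substitutions in isolation, here is a minimal stand-alone sketch. It is not gpuowl's makeLogStr; the helper name formatIterTime and its useMillis flag are made up for illustration, and only the two format strings and scale factors come from the change described above.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of gpuowl): formats a per-iteration time,
// given in seconds, either in microseconds or in milliseconds.
std::string formatIterTime(float secsPerIt, bool useMillis) {
  assert(secsPerIt >= 0);
  char buf[64];
  if (useMillis) {
    // Modified form: three decimals, seconds scaled by 1'000.
    std::snprintf(buf, sizeof(buf), "%4.3f ms/it", secsPerIt * 1'000);
  } else {
    // Original form: no decimals, seconds scaled by 1'000'000.
    std::snprintf(buf, sizeof(buf), "%4.0f us/it", secsPerIt * 1'000'000);
  }
  return buf;
}
```

With secsPerIt = 0.00075f this yields " 750 us/it" in the original form and "0.750 ms/it" after the change. The digit separators (1'000) need C++14 or later.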

Last fiddled with by paulunderwood on 2019-12-31 at 12:52

2020-01-03, 09:03   #1679
Lorenzo

Aug 2010
Republic of Belarus

262₈ Posts

Quote:
 Originally Posted by paulunderwood Or just divide by 1000 in your head.
Thank you! This is what I was looking for.

2020-01-03, 23:45   #1680
storm5510
Random Account

Aug 2009
Not U. + S.A.

7·19² Posts

Quote:
 Originally Posted by kriesel Edit source code and recompile.
Quote:
 "The more you overtake the plumbing, the easier it is to stop up the drain." ~Jimmy Doohan.
The version I am using has a 9/29/2019 date stamp. I have only run P-1s with it and there have been no problems. I would be reluctant to replace it with anything newer. I feel it needs to update the screen more often, but I live with it. It produces correct results, as far as I know. This is the important part.

2020-01-04, 10:51   #1681
preda

"Mihai Preda"
Apr 2015

2²·19² Posts
CARRY32 and CARRY64

A new optimization has been contributed by George, it consists in using only 32bits to store the carry-out from a word after the convolution. The theoretical analysis of whether this carry value does fit in 32bits or not is not very clear AFAIK, but the rough idea is that the higher the FFT size, the larger the expected value of the carry is.

The new CARRY32 has been tested quite a bit at the wavefront (5M FFT) and never produced an error, OTOH the situation may be different at higher FFT sizes. The performance gain is significant at about 3-5%.

Given the above, CARRY32 is now enabled by default. To get the old behavior one can supply "-use CARRY64" to gpuowl.

PRP should detect a carry overflow (when using CARRY32) if that occurs (and report the usual error, and retry, and get a repetitive error 3 times and stop). OTOH P-1 has no check; probably it's safer to keep using CARRY64 when doing P-1, especially when using FFT sizes larger than 5M (which is the FFT that was tested a lot for now).

If anybody sees an error which seems to be caused by CARRY32 (at any FFT size), please report it.

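
As a rough illustration of what "the carry-out fits in 32 bits" means, here is a small host-side sketch. It is not gpuowl's OpenCL kernel code: the names WordCarry, splitWord and fitsInt32 are hypothetical, and the balanced-digit convention used here is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical illustration only; gpuowl's real carry handling lives in its
// OpenCL kernels and is not reproduced here.
struct WordCarry {
  int64_t word;   // balanced digit in [-2^(bits-1), 2^(bits-1))
  int64_t carry;  // amount that propagates to the next word
};

// Split a rounded convolution output x into a `bits`-bit balanced word
// plus a carry, so that x == carry * 2^bits + word.
WordCarry splitWord(int64_t x, unsigned bits) {
  assert(bits >= 1 && bits <= 31);
  const int64_t base = int64_t(1) << bits;
  int64_t word = ((x % base) + base) % base;  // remainder in [0, base)
  if (word >= base / 2) word -= base;         // move into the balanced range
  return {word, (x - word) / base};           // x - word is a multiple of base
}

// True when the carry could be stored in 32 bits (the assumption behind a
// 32-bit carry); false means a 64-bit carry would be needed.
bool fitsInt32(int64_t carry) {
  return carry >= std::numeric_limits<int32_t>::min() &&
         carry <= std::numeric_limits<int32_t>::max();
}
```

For example, splitWord(13, 4) gives word -3 and carry 1, since 13 = 1·16 - 3. With 18-bit words, an output of magnitude 2^40 leaves a carry of 2^22, which fits in 32 bits, while 2^60 leaves 2^42, which does not.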
2020-01-04, 15:01   #1682
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2×29×127 Posts
gpuowl-v6.11-112-gf1b00d1 Windows build

This should have the -use CARRY32 default that Preda described above. So far I've only run -h on it. The build again produced the usual shower of warnings.

Just when I think we're at diminishing returns or at the end of optimizations, George provides another pleasant surprise.
Attached Files
 gpuowl-v6.11-112-gf1b00d1.7z (436.7 KB, 305 views) make-warnings.txt (5.5 KB, 234 views)

Last fiddled with by kriesel on 2020-01-04 at 15:02

2020-01-04, 16:08   #1683
paulunderwood

Sep 2002
Database er0rr

2³×3×11×17 Posts

Quote:
 Originally Posted by preda A new optimization has been contributed by George, it consists in using only 32bits to store the carry-out from a word after the convolution. The theoretical analysis of whether this carry value does fit in 32bits or not is not very clear AFAIK, but the rough idea is that the higher the FFT size, the larger the expected value of the carry is. The new CARRY32 has been tested quite a bit at the wavefront (5M FFT) and never produced an error, OTOH the situation may be different at higher FFT sizes. The performance gain is significant at about 3-5%. Given the above, CARRY32 is now enabled by default. To get the old behavior one can supply "-use CARRY64" to gpuowl. PRP should detect a carry overflow (when using CARRY32) if that occurs (and report the usual error, and retry, and get a repetitive error 3 times and stop). OTOH P-1 has no check; probably it's safer to keep using CARRY64 when doing P-1, especially when using FFT sizes larger than 5M (which is the FFT that was tested a lot for now). If anybody sees an error which seems to be caused by CARRY32 (at any FFT size), please report it.
I don't understand it. I git cloned gpuowl and compiled it, and it runs slower than before: 1240 us vs. 750 us. What am I doing wrong?

Last fiddled with by paulunderwood on 2020-01-04 at 16:11

