mersenneforum.org  

Old 2019-12-09, 12:50   #1530
nomead
"Sam Laur"
Dec 2018
Turku, Finland
317 Posts

RTX 2080, Linux. Just some general observations and incoherent rambling, I haven't done that much tuning. In fact I haven't touched gpuOwL at all before this past weekend, so it's all a bit new to me.

I started with whatever was committed to github up until 2019-12-04. First I had some issues with the compilation, but that was due to those #pragma statements commented out in gpuowl.cl (fixed now).

Then the program apparently needed the -use NO_ASM option to run (thanks SELROC for pointing that out on IRC).

After that I got the program running... but the timings seemed to be all over the place. 2816K was 3.743 ms/iter but the next one I tested, 5120K, was 3.884 ms/iter, and the difference should be bigger, so something had to be wrong. Well, after some fiddling around, I found out how to specify the FFT options (width/height/middle) and found that the default settings were usually not the fastest ones. Maybe I should have read through this thread better...

Anyway, how would one specify both FFT size and other options? -fft 5632K and -fft +2 seem to be mutually exclusive, only one works. And it would be really useful to have these options in some configuration file, so the program is ready to use even if the FFT size changes (and the new size is faster with different options).

Then came the commits from yesterday (2019-12-08). On the RTX2080, the calculation is limited by FP64 units, not memory bandwidth (memory bus usage is in the 20-30% range depending on FFT size), but there were still some noticeable improvements. For example, 5632K (-fft +2) was 4.396 ms/iter before, and 4.237 ms after the update, using MERGED_MIDDLE. So that's almost 4% better. The improvements vary quite a bit, from 0% to 5.5%, and the average across all FFT sizes and parameters I tested (2M to 20M) was 2.2%.

Another comparison: CUDALucas with the closest applicable FFT size (5760K) and the same hardware is 5.585 ms/iter, so gpuOwL is a bit over 30% faster. Of course the difference varies quite a lot there, too, but 20-30% seems to be the norm.
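
As a quick check of the percentages quoted above, here is a small Python sketch. It is not part of gpuOwL and the function names are made up for illustration. It turns two ms/iter timings into a relative time saving and a throughput gain. Keep in mind that the CUDALucas comparison is between different FFT sizes (5760K vs 5632K), so it is only approximate.

```python
def time_saved_pct(before_ms: float, after_ms: float) -> float:
    """Percent reduction in time per iteration."""
    return (before_ms - after_ms) / before_ms * 100.0


def throughput_gain_pct(slow_ms: float, fast_ms: float) -> float:
    """Percent more iterations per second for the faster timing."""
    return (slow_ms / fast_ms - 1.0) * 100.0


# 5632K on the RTX 2080, before and after the 2019-12-08 commits
print(f"{time_saved_pct(4.396, 4.237):.1f}% less time per iteration")
# CUDALucas at 5760K vs gpuOwL at 5632K (different FFT sizes)
print(f"{throughput_gain_pct(5.585, 4.237):.1f}% more iterations per second")
```

With the numbers from this post that works out to roughly 3.6% less time per iteration after the update, and roughly 32% more iterations per second than CUDALucas.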

One observation, though, about that MERGED_MIDDLE improvement. If the FFT size happens to be one without that "middle" part (2M, 4M, 8M, 16M...) and the dumb user (me) still instructs the program to -use MERGED_MIDDLE, then the calculation will fail. In hindsight this shouldn't be a surprise, but I plead ignorance and the effects of a Monday morning. The error is:
Code:
2019-12-09 08:24:18 38000009 EE        0 loaded: blockSize 400, 0000000000000000 (expected 0000000000000003)
2019-12-09 08:24:18 Exiting because "error on load"
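
A guard against this mistake could be sketched as below. This is illustrative Python, not gpuOwL code: the helper names are hypothetical, and the only rule it encodes is the observation above that power-of-two FFT sizes (2M, 4M, 8M, 16M...) have no "middle" part.

```python
def fft_size_in_k(text: str) -> int:
    """Parse an FFT size such as '5632K' or '4M' into units of K (1024)."""
    text = text.strip().upper()
    if text.endswith("M"):
        return int(text[:-1]) * 1024
    if text.endswith("K"):
        return int(text[:-1])
    raise ValueError(f"unrecognized FFT size: {text!r}")


def has_middle_step(fft_size: str) -> bool:
    """Assumption taken from the post: power-of-two sizes have no 'middle'."""
    k = fft_size_in_k(fft_size)
    is_power_of_two = k > 0 and (k & (k - 1)) == 0
    return not is_power_of_two


for size in ("4M", "5632K", "8M", "2816K"):
    verdict = "ok" if has_middle_step(size) else "skip MERGED_MIDDLE"
    print(size, verdict)
```

A wrapper script could run such a check before appending MERGED_MIDDLE to the -use option.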

Old 2019-12-09, 15:33   #1531
kracker
"Mr. Meeseeks"
Jan 2012
California, USA
2³·271 Posts

Some warnings when compiling... Probably unimportant.
Code:
In file included from common.cpp:4:
File.h: In static member function 'static File File::open(const std::filesystem::__cxx11::path&, const char*, bool)':
File.h:31:25: warning: format '%s' expects argument of type 'char*', but argument 2 has type 'const value_type*' {aka 'const wchar_t*'} [-Wformat=]
   31 |       log("Can't open '%s' (mode '%s')\n", name.c_str(), mode);
      |                        ~^                  ~~~~~~~~~~~~
      |                         |                            |
      |                         char*                        const value_type* {aka const wchar_t*}
      |                        %hs
[The identical File.h:31:25 warning is repeated for Worktodo.cpp:6, main.cpp:8, clwrap.cpp:4, Gpu.cpp:4 (via ProofSet.h:6), Task.cpp:7, checkpoint.cpp:3 (via checkpoint.h:5), and Args.cpp:4.]

Old 2019-12-09, 16:04   #1532
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts

Quote:
Originally Posted by kracker
Some warnings when compiling... Probably unimportant.

Maybe https://www.mersenneforum.org/showpo...6&postcount=40 will help.

Old 2019-12-09, 17:38   #1533
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts
Feature request: OpenCL version test

Preda, please add an OpenCL version test. As previously posted, https://www.mersenneforum.org/showpo...postcount=1354

Gpuowl will not run on a test GPU, a Quadro 2000 (compute capability 2.1, OpenCL 1.1/1.2), or on assorted other older GPUs; it produces a shower of OpenCL compile errors relating to atomics. I think it requires at least OpenCL 2 and therefore a CUDA compute capability above 2.x. An explicit test for the OpenCL version by gpuowl, and a clear message if the version is too low, might be a good thing. ("Gpuowl requires OpenCL 2 support for atomics, which this gpu does not appear to support. Exiting now." or some such helpful message.)
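
To make the request concrete, here is a rough sketch of such a test in Python (gpuowl itself is C++, so this only illustrates the logic). It relies on the OpenCL rule that the device version string starts with "OpenCL <major>.<minor> ". The function names and the sample strings are made up for the example, and the function returns a message instead of exiting so the caller can decide whether to warn or stop.

```python
import re
from typing import Optional, Tuple

REQUIRED = (2, 0)  # assumption from the post: atomics need OpenCL 2


def parse_opencl_version(device_version: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from a string like 'OpenCL 1.2 CUDA'."""
    match = re.match(r"OpenCL (\d+)\.(\d+)", device_version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def version_warning(device_version: str) -> Optional[str]:
    """Return a warning message, or None if the version looks sufficient."""
    version = parse_opencl_version(device_version)
    if version is None:
        return f"Could not determine the OpenCL version from {device_version!r}."
    if version < REQUIRED:
        return ("Gpuowl requires OpenCL 2 support for atomics, which this gpu "
                f"does not appear to support (reported {version[0]}.{version[1]}).")
    return None


# Hypothetical version strings, for illustration only
for reported in ("OpenCL 1.1 CUDA", "OpenCL 2.0 AMD-APP"):
    print(reported, "->", version_warning(reported) or "ok")
```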

Last fiddled with by kriesel on 2019-12-09 at 17:38

Old 2019-12-09, 18:41   #1534
kracker
"Mr. Meeseeks"
Jan 2012
California, USA
2168₁₀ Posts

Some numbers:

RX570
Code:
5033 NO_ASM
4384 NO_ASM,MERGED_MIDDLE
7285 NO_ASM,MERGED_MIDDLE,WORKINGIN
4365 NO_ASM,MERGED_MIDDLE,WORKINGIN1
4360 NO_ASM,MERGED_MIDDLE,WORKINGIN1A
4459 NO_ASM,MERGED_MIDDLE,WORKINGIN2
4381 NO_ASM,MERGED_MIDDLE,WORKINGIN3
4358 NO_ASM,MERGED_MIDDLE,WORKINGIN5
7433 NO_ASM,MERGED_MIDDLE,WORKINGOUT
5818 NO_ASM,MERGED_MIDDLE,WORKINGOUT0
4400 NO_ASM,MERGED_MIDDLE,WORKINGOUT1
4410 NO_ASM,MERGED_MIDDLE,WORKINGOUT1A
4762 NO_ASM,MERGED_MIDDLE,WORKINGOUT2
4385 NO_ASM,MERGED_MIDDLE,WORKINGOUT3
4610 NO_ASM,MERGED_MIDDLE,WORKINGOUT4
4517 NO_ASM,MERGED_MIDDLE,WORKINGOUT5
Tesla P100
Code:
1318 NO_ASM
951 NO_ASM,MERGED_MIDDLE
945 NO_ASM,MERGED_MIDDLE,WORKINGIN
944 NO_ASM,MERGED_MIDDLE,WORKINGIN1 
952 NO_ASM,MERGED_MIDDLE,WORKINGIN1A 
945 NO_ASM,MERGED_MIDDLE,WORKINGIN2 
952 NO_ASM,MERGED_MIDDLE,WORKINGIN3 
939 NO_ASM,MERGED_MIDDLE,WORKINGIN4
942 NO_ASM,MERGED_MIDDLE,WORKINGIN5 
948 NO_ASM,MERGED_MIDDLE,WORKINGOUT 
948 NO_ASM,MERGED_MIDDLE,WORKINGOUT0 
948 NO_ASM,MERGED_MIDDLE,WORKINGOUT1
956 NO_ASM,MERGED_MIDDLE,WORKINGOUT1A 
948 NO_ASM,MERGED_MIDDLE,WORKINGOUT2 
951 NO_ASM,MERGED_MIDDLE,WORKINGOUT3 
954 NO_ASM,MERGED_MIDDLE,WORKINGOUT4 
949 NO_ASM,MERGED_MIDDLE,WORKINGOUT5
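
For anyone who wants to compare such lists mechanically, a minimal Python sketch follows. It assumes the first column is a time per iteration (lower is better; the post does not state the units), uses only a few rows from each list above, and the function names are made up for the example.

```python
RX570 = """\
5033 NO_ASM
4384 NO_ASM,MERGED_MIDDLE
4358 NO_ASM,MERGED_MIDDLE,WORKINGIN5
"""

P100 = """\
1318 NO_ASM
951 NO_ASM,MERGED_MIDDLE
939 NO_ASM,MERGED_MIDDLE,WORKINGIN4
"""


def parse_timings(text):
    """Turn 'value options' lines into (value, options) tuples."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            value, options = line.split(maxsplit=1)
            rows.append((int(value), options.strip()))
    return rows


def best_gain(rows, baseline="NO_ASM"):
    """Return the fastest option set and its throughput gain in percent."""
    base = next(value for value, options in rows if options == baseline)
    best_value, best_options = min(rows)
    return best_options, (base / best_value - 1.0) * 100.0


for name, table in (("RX570", RX570), ("Tesla P100", P100)):
    options, gain = best_gain(parse_timings(table))
    print(f"{name}: {options} is {gain:.1f}% faster than NO_ASM alone")
```

On the rows shown this gives about 15.5% for the RX570 and about 40.4% for the Tesla P100.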

Last fiddled with by kracker on 2019-12-09 at 18:58

Old 2019-12-09, 19:50   #1535
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts

Quote:
Originally Posted by kracker
Some numbers:

Wow, 15% and 40%.
Thanks for running these.
Please try on any other colab gpu models when you get a chance.

Last fiddled with by kriesel on 2019-12-09 at 19:50

Old 2019-12-09, 19:53   #1536
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts

Quote:
Originally Posted by Prime95
Of course, that was from a huge sample size of 1 nVidia card.

The need for sleep and a more efficient way of getting the data intervened. You're welcome. I also had multiple gpus tied up in P-1 limit and runtime scaling runs in gpuowl versions predating P-1 save file capability at the time.

Old 2019-12-09, 19:58   #1537
preda
"Mihai Preda"
Apr 2015
3·457 Posts

Ken, I'm not confident that I can do the OpenCL version test reliably. For example, until recently, ROCm OpenCL was self-reporting as OpenCL 1.2 although it compiled OpenCL 2.0 code fine. I'm worried that with this check gpuowl would not even attempt to compile in such a situation.

That said, I added an OpenCL 2.0 version check, please try it out on the old cards.

Quote:
Originally Posted by kriesel
Preda, please add an OpenCL version test. As previously posted, https://www.mersenneforum.org/showpo...postcount=1354

Gpuowl will not run on a test GPU, a Quadro 2000 (compute capability 2.1, OpenCL 1.1/1.2), or on assorted other older GPUs; it produces a shower of OpenCL compile errors relating to atomics. I think it requires at least OpenCL 2 and therefore a CUDA compute capability above 2.x. An explicit test for the OpenCL version by gpuowl, and a clear message if the version is too low, might be a good thing. ("Gpuowl requires OpenCL 2 support for atomics, which this gpu does not appear to support. Exiting now." or some such helpful message.)

Last fiddled with by preda on 2019-12-09 at 20:14

Old 2019-12-09, 20:17   #1538
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts
gpuowl feature request

Gpuowl feature request: a P-1 res64 check for the special, very likely bad, interim residues 0x00 and 0x01. P-1 is currently computing on the high wire without a safety net.

Undetected errors could cost hours or days in single lengthy P-1 runs, and also missed factors.



Come to think of it, that res64 check could save some lost PRP time too when errors occur. Less incentive there though, since the excellent GEC safety net catches the errors eventually.
Code:
2019-12-08 06:01:08 91305491 OK 62250000  68.18%; 1184 us/sq; ETA 0d 09:34; 5cf68328b1473b4a (check 0.90s) 2 errors
2019-12-08 06:02:07 91305491    62300000  68.23%; 1184 us/sq; ETA 0d 09:32; 3efaa597c7d9c53d
2019-12-08 06:03:06 91305491    62350000  68.29%; 1184 us/sq; ETA 0d 09:31; 392c10b87c906301
2019-12-08 06:04:05 91305491    62400000  68.34%; 1178 us/sq; ETA 0d 09:28; 0000000000000000 <-- already have the res64 for output, test it, return to 62250000, save 100,000 additional bad iterations until the GEC check
2019-12-08 06:05:03 91305491    62450000  68.40%; 1155 us/sq; ETA 0d 09:15; 0000000000000000
2019-12-08 06:06:02 91305491 EE 62500000  68.45%; 1156 us/sq; ETA 0d 09:15; 0000000000000000 (check 0.90s) 2 errors
2019-12-08 06:07:02 91305491    62300000  68.23%; 1204 us/sq; ETA 0d 09:42; 3efaa597c7d9c53d
2019-12-08 06:08:01 91305491    62350000  68.29%; 1184 us/sq; ETA 0d 09:31; 392c10b87c906301
2019-12-08 06:09:01 91305491    62400000  68.34%; 1184 us/sq; ETA 0d 09:30; e0c75b60654dbfb4
2019-12-08 06:10:00 91305491    62450000  68.40%; 1183 us/sq; ETA 0d 09:29; c271cc2b8386285f
2019-12-08 06:11:00 91305491 OK 62500000  68.45%; 1183 us/sq; ETA 0d 09:28; 070950e467249083 (check 0.96s) 3 errors
Average savings would be 125,000 iterations (2.5 minutes at the wavefront on a Radeon VII, proportionally more on bigger exponents or slower GPUs), with a minimum of 50,000 (~1 minute) and a maximum of 200,000 (about 4 minutes per error in this case).
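
The arithmetic behind those figures can be sketched in a few lines of Python. The constants are read off the log above (a residue line every 50,000 iterations, a GEC check every 250,000, about 1184 us per iteration); the function name is made up, and the case where the bad residue first shows up at the check itself is left out because nothing is saved there.

```python
LOG_INTERVAL = 50_000     # iterations between residue lines in the log above
CHECK_INTERVAL = 250_000  # iterations between GEC checks in the log above
US_PER_ITER = 1184        # us/sq taken from the log above


def iterations_saved(lines_after_check: int) -> int:
    """Iterations not wasted if a bad res64 is caught at the given log line."""
    return CHECK_INTERVAL - lines_after_check * LOG_INTERVAL


# A bad residue can first appear 1 to 4 log lines after the last good check
cases = [iterations_saved(k) for k in range(1, CHECK_INTERVAL // LOG_INTERVAL)]
average = sum(cases) / len(cases)
minutes = average * US_PER_ITER / 1e6 / 60

print("min", min(cases), "max", max(cases), "average", average)
print(f"average time saved: {minutes:.1f} minutes")
```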
Code:
2019-12-03 21:19:42 89064097 OK 75500000  84.77%; 1214 us/sq; ETA 0d 04:34; fba20ffb703f9fb7 (check 0.91s)
2019-12-03 21:20:42 89064097    75550000  84.83%; 1189 us/sq; ETA 0d 04:28; 0000000000000000
2019-12-03 21:21:40 89064097    75600000  84.88%; 1167 us/sq; ETA 0d 04:22; 0000000000000000
2019-12-03 21:22:39 89064097    75650000  84.94%; 1171 us/sq; ETA 0d 04:22; 0000000000000000
2019-12-03 21:23:38 89064097    75700000  84.99%; 1171 us/sq; ETA 0d 04:21; 0000000000000000
2019-12-03 21:24:37 89064097 EE 75750000  85.05%; 1172 us/sq; ETA 0d 04:20; 0000000000000000 (check 0.94s)
2019-12-03 21:25:39 89064097    75550000  84.83%; 1239 us/sq; ETA 0d 04:39; 49985b238359ff96
2019-12-03 21:26:40 89064097    75600000  84.88%; 1217 us/sq; ETA 0d 04:33; 78c0f7429d9a238f
2019-12-03 21:27:41 89064097    75650000  84.94%; 1217 us/sq; ETA 0d 04:32; efab7475b57165bb
2019-12-03 21:28:42 89064097    75700000  84.99%; 1216 us/sq; ETA 0d 04:31; f43c80e5de778e68
2019-12-03 21:29:44 89064097 OK 75750000  85.05%; 1212 us/sq; ETA 0d 04:29; d83c92710ddb50e8 (check 0.91s) 1 errors
It appears the majority of PRP3 GEC errors on my Radeon VII are of the 0x00 variety. I've not seen 0x01 yet. The rest are seemingly normal residue values.
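
A minimal sketch of the requested check, written in Python against log lines like the ones above rather than against gpuowl's internals. The names are hypothetical, and the regular expression assumes the res64 is the only standalone 16-digit lowercase hex token on the line.

```python
import re

SUSPECT_RES64 = {0x0, 0x1}  # the two residues named in the request
RES64_PATTERN = re.compile(r"\b[0-9a-f]{16}\b")


def res64_of(log_line: str):
    """Return the res64 on a log line as an int, or None if there is none."""
    match = RES64_PATTERN.search(log_line)
    return int(match.group(0), 16) if match else None


def is_suspect(log_line: str) -> bool:
    """True if the line carries one of the very likely bad residues."""
    return res64_of(log_line) in SUSPECT_RES64


good = "2019-12-08 06:03:06 91305491    62350000  68.29%; 1184 us/sq; ETA 0d 09:31; 392c10b87c906301"
bad = "2019-12-08 06:04:05 91305491    62400000  68.34%; 1178 us/sq; ETA 0d 09:28; 0000000000000000"

for line in (good, bad):
    print("SUSPECT" if is_suspect(line) else "ok     ", line)
```

In the program itself the residue is already available as a number when the line is printed, so the comparison would not need any parsing.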

Last fiddled with by kriesel on 2019-12-09 at 20:29

Old 2019-12-09, 20:25   #1539
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
16567₈ Posts

Latest windows build (with a fix for power-of-two FFT size with MERGED_MIDDLE).

https://www.dropbox.com/s/bxty3e5qz5...l-win.exe?dl=0

Old 2019-12-09, 20:27   #1540
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,437 Posts

Quote:
Originally Posted by preda
Ken, I'm not confident that I can do the OpenCL version test reliably. For example, until recently, ROCm OpenCL was self-reporting as OpenCL 1.2 although it compiled OpenCL 2.0 code fine. I'm worried that with this check gpuowl would not even attempt to compile in such a situation.

That said, I added an OpenCL 2.0 version check, please try it out on the old cards.

Thanks, will do. Even a less than perfectly reliable warning as to why there may be trouble will help us ordinary users. I still don't know whether a certain CUDA GPU failed to do gpuowl P-1 because of its OpenCL level, a bad -maxAlloc value, or something else. I have the impression I should go back and retest many of them for limits with some very recent version. If I recall correctly, the memory handling got better sometime after v6.9. https://www.mersenneforum.org/showpo...postcount=1361
I recently found that V6.11-9 could do P-1 on a 2GB RX550 that a 3GB GTX1060 with v6.9-0-gc137007 could not.

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.