mersenneforum.org  

2017-04-23, 16:07   #23
VictordeHolland
"Victor de Hollander"
Aug 2011
the Netherlands
2×587 Posts

Quote:
Originally Posted by kladner
If I may say, as a spectator and a non-coder, it amazes me to watch this birth process. The cooperation and involvement by several parties is impressive. Seeing this play out is one of the big pay-offs for hanging out on this forum.
Preda deserves all credit for the coding. I'm just trying to compile it for Win64 and reporting the errors I get.

2017-04-23, 17:49   #24
kladner
"Kieren"
Jul 2011
In My Own Galaxy!
2⁶×157 Posts

Quote:
Originally Posted by VictordeHolland
Preda deserves all credit for the coding. I'm just trying to compile it for Win64 and reporting the errors I get.
The coding is the "prime" accomplishment <G>, but others taking an interest is like peer review. Both are needed, at least in most cases.

2017-04-23, 19:53   #25
VictordeHolland
"Victor de Hollander"
Aug 2011
the Netherlands
2·587 Posts

Success here too with compiling it with MINGW64 for Windows. In the limited testing so far on my AMD HD7950, gpuOwL is faster than clLucas (about 5.6 vs 6.1 ms/iter) and shows slightly lower errors (max 0.198 vs 0.219).

gpuOwL v0.1
Quote:
gpuOwL v0.1 GPU Lucas-Lehmer primality checker
Tahiti - OpenCL 1.2 AMD-APP (2079.5)
LL FFT 4096K (1024*2048*2) of 76000021 (18.12 bits/word) at iteration 0
OpenCL setup: 656 ms
00020000 / 76000021 [0.03%], ms/iter: 5.618, ETA: 4d 22:35; 000000009c13c393 error 0.185102 (max 0.185102)
00040000 / 76000021 [0.05%], ms/iter: 5.621, ETA: 4d 22:37; 00000000f08b90ef error 0.178388 (max 0.185102)
00060000 / 76000021 [0.08%], ms/iter: 5.627, ETA: 4d 22:42; 00000000d68504f2 error 0.186264 (max 0.186264)
00080000 / 76000021 [0.11%], ms/iter: 5.621, ETA: 4d 22:32; 00000000e93a873a error 0.191096 (max 0.191096)
00100000 / 76000021 [0.13%], ms/iter: 5.609, ETA: 4d 22:15; 0000000035a87d3f error 0.198382 (max 0.198382)
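
The header and progress lines above can be sanity-checked with a little arithmetic. This is a minimal Python sketch, assuming the ETA is simply the remaining iterations times the current ms/iter; the function names are illustrative and not part of gpuOwL.

```python
def bits_per_word(exponent, fft_words):
    """Average number of bits of the Mersenne number stored in each FFT word."""
    return exponent / fft_words


def eta_string(exponent, iteration, ms_per_iter):
    """Remaining time as 'Nd HH:MM', assuming the speed stays constant."""
    seconds = int((exponent - iteration) * ms_per_iter / 1000)
    days, rest = divmod(seconds, 86400)
    hours, rest = divmod(rest, 3600)
    return f"{days}d {hours:02d}:{rest // 60:02d}"


# Values taken from the gpuOwL log above.
EXPONENT = 76000021
FFT_WORDS = 1024 * 2048 * 2  # the "4096K" FFT

print(round(bits_per_word(EXPONENT, FFT_WORDS), 2))  # 18.12
print(eta_string(EXPONENT, 20000, 5.618))
```

The computed ETA lands close to the logged 4d 22:35; any small difference comes from the rounded ms/iter figure.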
clLucas v1.04
Quote:
C:\clLucas_x64_1.04>clLucas_x64

Platform 0 : Advanced Micro Devices, Inc.
Warning: Couldn't parse ini file option SixStepFFT; using default: off
Platform :Advanced Micro Devices, Inc.
Device 0 : Tahiti

Build Options are : -D KHR_DP_EXTENSION

CL_DEVICE_NAME Tahiti
CL_DEVICE_VENDOR Advanced Micro Devices, Inc.
CL_DEVICE_VERSION OpenCL 1.2 AMD-APP (2079.5)
CL_DRIVER_VERSION 2079.5 (VM)
CL_DEVICE_MAX_COMPUTE_UNITS 28
CL_DEVICE_MAX_CLOCK_FREQUENCY 900
CL_DEVICE_GLOBAL_MEM_SIZE 3221225472
CL_DEVICE_MAX_WORK_GROUP_SIZE 256
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE 1

<FFT tests>

Starting M76000021 fft length = 4096K
Running careful round off test for 1000 iterations. If average error >= 0.25, the test will restart with a larger FFT length.
Iteration 100, average error = 0.13540, max error = 0.18750
Iteration 200, average error = 0.16145, max error = 0.18750
Iteration 300, average error = 0.17013, max error = 0.18750
Iteration 400, average error = 0.17448, max error = 0.18750
Iteration 500, average error = 0.17724, max error = 0.20313
Iteration 600, average error = 0.18155, max error = 0.20313
Iteration 700, average error = 0.18463, max error = 0.20313
Iteration 800, average error = 0.18876, max error = 0.21875
Iteration 900, average error = 0.19209, max error = 0.21875
Iteration 1000, average error = 0.19474 < 0.25 (max error = 0.21875), continuing test.
Iteration 20000 M( 76000021 )C, 0x27d2e8539c13c393, n = 4096K, clLucas v1.04 err = 0.2188 (2:01 real, 6.0576 ms/iter, ETA 127:50:52)
Iteration 40000 M( 76000021 )C, 0xbdeced3ff08b90ef, n = 4096K, clLucas v1.04 err = 0.2188 (2:02 real, 6.0910 ms/iter, ETA 128:31:14)
Iteration 60000 M( 76000021 )C, 0x7c3f23c5d68504f2, n = 4096K, clLucas v1.04 err = 0.2188 (2:02 real, 6.0879 ms/iter, ETA 128:25:15)
Iteration 80000 M( 76000021 )C, 0x02b75b4ce93a873a, n = 4096K, clLucas v1.04 err = 0.2188 (2:02 real, 6.0903 ms/iter, ETA 128:26:11)
Iteration 100000 M( 76000021 )C, 0xa04bf1ff35a87d3f, n = 4096K, clLucas v1.04 err = 0.2188 (2:02 real, 6.0895 ms/iter, ETA 128:23:10)
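
The two logs can be cross-checked against each other. In this gpuOwL v0.1 output the upper 32 bits of each residue are printed as zero, so only the low 32 bits are comparable with clLucas. A small Python sketch of that comparison follows; the helper names are illustrative, and the residues are copied from the logs above.

```python
# (iteration, gpuOwL residue, clLucas residue), copied from the two logs above.
LOGGED = [
    (20000, 0x000000009C13C393, 0x27D2E8539C13C393),
    (40000, 0x00000000F08B90EF, 0xBDECED3FF08B90EF),
    (60000, 0x00000000D68504F2, 0x7C3F23C5D68504F2),
    (80000, 0x00000000E93A873A, 0x02B75B4CE93A873A),
    (100000, 0x0000000035A87D3F, 0xA04BF1FF35A87D3F),
]


def low32(residue):
    """Keep only the low 32 bits of a 64-bit residue."""
    return residue & 0xFFFFFFFF


def mismatches(rows):
    """Iterations at which the two programs disagree in the low 32 bits."""
    return [it for it, owl, cl in rows if low32(owl) != low32(cl)]


print(mismatches(LOGGED))  # [] means every logged checkpoint agrees
```

Matching low bits at every checkpoint is good evidence that both programs are computing the same sequence, although it says nothing about the bits that were not printed.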

2017-04-23, 19:58   #26
VictordeHolland
"Victor de Hollander"
Aug 2011
the Netherlands
2·587 Posts

'Guide' for compiling it on Windows with msys64+MINGW64
1. I'm assuming you have msys64 with MINGW64 installed. You'll also need a text editor (I prefer notepad++).
2. Download and install AMD APP SDK 3.0 from http://developer.amd.com/tools-and-s...ssing-app-sdk/
3. Download the gpuowl code from https://github.com/preda/gpuowl (easiest is to download the entire repository as a .zip).
4. Extract the .zip (preferably to somewhere you can navigate to easily with msys64, for instance msys64\home\gpuowl).
5. Open the Makefile (in the gpuowl folder) with the text editor (notepad++).
6. Edit the path after the "-L" argument so that it points to the OpenCL SDK install path.
The standard path is C:\Program Files (x86)\AMD APP SDK\3.0\lib\x86_64
I copied the folder contents to msys64\home\OpenCL\SDK3 for easier referencing, since I always get confused by paths with spaces and brackets.
In my case the makefile contains:
Code:
gpuowl: gpuowl.cpp clwrap.h tinycl.h
    g++ -O2 -std=c++14 gpuowl.cpp -ogpuowl -L\C\msys64\home\OpenCL\SDK3 -lOpenCL
Don't forget to save the changes ;).

6.1 [OPTIONAL] If you don't have an OpenCL 2.0 device, edit the clwrap.h file and on line 89 change "-cl-std=CL2.0" to "-cl-std=CL1.2".

7. Start the msys64/MINGW64 shell and navigate to the gpuowl folder. If you put the folder in the /home directory of msys64, you can get there by typing "cd .." to reach /home, and then "cd yourgpuowlfoldername".
8. Run 'make'.

It should look something like this so far:
Code:
MINGW64 ~
$ cd ..

MINGW64 /home
$ cd gpuOwlv0.1/

MINGW64 /home/gpuOwlv0.1
$ make
g++ -O2 -std=c++14 gpuowl.cpp -ogpuowl -L\C\msys64\home\OpenCL\SDK3 -lOpenCL

MINGW64 /home/gpuOwlv0.1
$
9. Assuming you didn't get any errors, you should now have a gpuowl.exe in the gpuowl directory.
9.1 [OPTIONAL] If you wish to move the gpuowl directory somewhere else, you probably need to copy these 3 .dlls into the directory:
libgcc_s_seh-1.dll
libstdc++-6.dll
libwinpthread-1.dll
10. Create a worktodo.txt and start testing.

Please note gpuOwl is not production ready.

2017-04-25, 04:41   #27
LaurV
Romulan Interpreter
Jun 2011
Thailand
2²·7·11·29 Posts

Can someone post or PM me a windoze exe? [edit: x64]. We still have trouble compiling it (the same troubles as above - but the troubles are most probably related to our ignorance with the tools).

We are going to give it a test too - we own an old "XFX HD7970 GHz" here.

P.S. we fully understand that it is not "production ready" yet, but if it is faster than clLucas and it is giving out the same DC residue, for sure we will report it to PrimeNet and get some fast credits!

Last fiddled with by LaurV on 2017-04-25 at 04:47

2017-04-25, 11:42   #28
VictordeHolland
"Victor de Hollander"
Aug 2011
the Netherlands
1174₁₀ Posts

Quote:
Originally Posted by LaurV
Can someone post or PM me a windoze exe? [edit: x64]. We still have trouble compiling it (the same troubles as above - but the troubles are most probably related to our ignorance with the tools).

We are going to give it a test too - we own an old "XFX HD7970 GHz" here.

P.S. we fully understand that it is not "production ready" yet, but if it is faster than clLucas and it is giving out the same DC residue, for sure we will report it to PrimeNet and get some fast credits!
I've got an HD7950, so I'm hoping my executable will work for you too.
I'll send it when I get home tonight.

2017-04-25, 15:32   #29
LaurV
Romulan Interpreter
Jun 2011
Thailand
2²×7×11×29 Posts

Quote:
Originally Posted by VictordeHolland
(I prefer notepad++)
5. <snip> texteditor (notepad++)
The CIA hacked stuff, huh?

(I know why I always used pn2... haha)

Last fiddled with by LaurV on 2017-04-25 at 15:36 Reason: link

2017-04-25, 16:48   #30
VictordeHolland
"Victor de Hollander"
Aug 2011
the Netherlands
2·587 Posts

Quote:
Originally Posted by LaurV
The CIA hacked stuff, huh?

(I know why I always used pn2... haha)
Windows itself probably has multiple (dozens?) unpatched zero-days known only to the three-letter agencies, so I wouldn't lose any sleep over it.

2017-04-26, 01:51   #31
preda
"Mihai Preda"
Apr 2015
1328₁₀ Posts

Thanks for the MinGW compilation, and the screenshots! The screenshots show there's an error in printing the residue (leading digits being 0) -- that's hopefully fixed now (not a big deal, the problem was just 'cosmetic').

A small update on where gpuOwL is, and what I'm planning on next.
I was really worried by the results of some of my own testing -- the LL was failing on known primes (24036583). Thus I decided to do some more serious testing to find the cause of the error.

But after all this investigation, my conclusion was that it's not a software bug, but the GPU very rarely producing an erroneous result. This is disconcerting, and I'd really like to have a way to detect such problems.

The LL involves two distinct computations. One is FFT-Square-IFFT, the second is "round-to-int + Carry-propagation". An error can occur in either of these, and these are the detection mechanisms that I know of:

- evaluating the "max rounding error" that occurs when rounding-to-int, after the IFFT and before the carry-propagation. This is cheap to compute on the GPU, thus is always on (and printed on every logstep). This rounding error brings two pieces of information: 1. whether the FFT size is big enough for the chosen exponent, and 2. whether something went completely wrong with the FFT/IFFT. I plan to add provisions in the code for detecting a sudden jump in the rounding error (which may indicate FFT error), and re-run the last batch in that situation to check for consistent results.

- evaluating the SUMINP / SUMOUT of the FFT (which prime95 does on the CPU). This is not implemented, because it seems (to me) expensive to do on the GPU. This check would provide very good detection of FFT/IFFT errors, but no protection against rounding and carry-propagation errors.

- using "offset". This changes the values fed to the FFT/IFFT, and again protects against FFT errors (but without detecting them "in real time").
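
The "offset" idea can be illustrated at toy scale. This is a minimal Python sketch, assuming "offset" refers to the shift technique in which the test carries s·2^k mod M_p instead of s, so the data entering each squaring depends on k while the final residue does not. Plain Python integers stand in for the FFT, and the function names are made up for illustration; this is not gpuOwL code.

```python
def lucas_lehmer(p):
    """Plain Lucas-Lehmer residue for M_p = 2^p - 1 (0 means M_p is prime)."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s


def lucas_lehmer_shifted(p, shift):
    """Same residue, but every intermediate value is multiplied by 2^shift."""
    m = (1 << p) - 1
    shift %= p
    t = (4 << shift) % m  # t = s * 2^shift
    for _ in range(p - 2):
        shift = (2 * shift) % p  # squaring doubles the shift (2^p == 1 mod M_p)
        t = (t * t - (2 << shift)) % m  # the "-2" has to be shifted as well
    return (t << (p - shift)) % m  # undo the remaining shift


for p in (13, 17, 19, 31):  # known Mersenne prime exponents
    assert lucas_lehmer_shifted(p, 5) == lucas_lehmer(p) == 0
for p in (11, 23, 29):  # M_p is composite for these
    assert lucas_lehmer_shifted(p, 7) == lucas_lehmer(p) != 0
```

Two runs with different shifts feed different bit patterns through the squaring, so an error that depends on the data is unlikely to reproduce identically in both, which is why matching residues from differently shifted runs are more convincing than a plain rerun.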

As is, there is no check that I know of that covers the carry-propagation. If an error takes place in that part, it would not be detected by either the max-rounding-error check or the SUMINP/SUMOUT check. I would be interested in finding out about a GPU-cheap way to check that the integer digits of the modulo-convolution done by LL are not completely haywire.
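
The two stages described above can be sketched at toy scale to show where the "max rounding error" comes from. The Python sketch below uses a naive O(n²) DFT and plain zero-padded squaring, not the weighted modular transform a real LL implementation uses, and all names are illustrative assumptions rather than gpuOwL code.

```python
import cmath


def dft(values, inverse=False):
    """Naive discrete Fourier transform (quadratic time, for illustration only)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for j, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * cmath.pi * j * k / n)
        out.append(acc / n if inverse else acc)
    return out


def square_via_fft(x, bits_per_word=8, words=8):
    """Square x (which must fit in words * bits_per_word bits).

    Returns (square, max_rounding_error).
    """
    base = 1 << bits_per_word
    digits = [(x >> (bits_per_word * i)) & (base - 1) for i in range(words)]
    digits += [0] * words  # zero-pad so the cyclic convolution does not wrap

    # Stage 1: FFT, pointwise square, inverse FFT.
    conv = dft([c * c for c in dft(digits)], inverse=True)

    # Stage 2: round to integers (tracking the error), then propagate carries.
    max_err = 0.0
    result = 0
    carry = 0
    for i, c in enumerate(conv):
        r = round(c.real)
        max_err = max(max_err, abs(c.real - r))
        total = r + carry
        result += (total % base) << (bits_per_word * i)
        carry = total // base
    result += carry << (bits_per_word * len(conv))
    return result, max_err


x = 0x27D2E8539C13C393
square, err = square_via_fft(x)
assert square == x * x
print(err)  # tiny here; it grows as more bits are packed into each word
```

Note that the rounding error is measured before the carry loop, so a fault inside the carry loop itself would leave `max_err` untouched, which is the gap described in the paragraph above.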

Development plan:
- implement "offset", and measure performance impact. If impact is small, leave it always-on.
- check for sudden jumps in rounding-error, and automatically re-try in that situation. May help detect too-overclocked GPUs (but not always).
- add a simple self-test, which would run on known primes and compare residues with a pre-saved residue list, to detect obvious errors. (To detect more subtle errors, a good but expensive way is to run to completion on known primes and check for a 0 residue, or to double-check validated results.)
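
The "sudden jump, then retry" item in the plan above could look roughly like the following. This is a hedged Python sketch of one possible policy, not the planned gpuOwL implementation: the thresholds (`factor`, `hard_limit`), the `run_batch` callback and the retry limit are all assumptions chosen for illustration.

```python
def is_sudden_jump(previous_errors, new_error, factor=1.5, hard_limit=0.4):
    """Flag an error that is dangerously large or far above anything seen so far."""
    if new_error >= hard_limit:
        return True
    if not previous_errors:
        return False
    return new_error > factor * max(previous_errors)


def run_checked(run_batch, num_batches, max_retries=2):
    """Run batches in order, re-running any batch whose rounding error jumps.

    run_batch(i) is a hypothetical callback returning (residue, max_error)
    for batch i, restarting from the state saved at the end of batch i - 1.
    """
    residues, errors = [], []
    for i in range(num_batches):
        residue, err = run_batch(i)
        retries = 0
        while is_sudden_jump(errors, err):
            if retries == max_retries:
                raise RuntimeError(f"batch {i}: rounding error stays high ({err})")
            retries += 1
            residue, err = run_batch(i)  # redo the batch from the saved state
        residues.append(residue)
        errors.append(err)
    return residues, errors


# The max errors logged by gpuOwL earlier in this thread contain no jump:
logged = [0.185102, 0.178388, 0.186264, 0.191096, 0.198382]
assert not any(is_sudden_jump(logged[:i], e) for i, e in enumerate(logged))
```

A jump that persists across retries more likely means the FFT size is too small for the exponent than a transient hardware fault, which is why the sketch gives up instead of retrying forever.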

Still missing:
- ability to select specific GPU in a multi-GPU system (right now, simply uses the first GPU)
- get some ISA dumps (produced with "-cl -save-temps") and analyze to investigate the low performance reported
- add ability to dump compiled binary (for OpenCLs that do not offer "-save-temps")

2017-04-26, 02:06   #32
science_man_88
"Forget I exist"
Jul 2009
Dumbassville
8,369 Posts

Quote:
Originally Posted by preda
I was really worried by the results of some of my own testing -- the LL was failing on known primes (24036583). <snip>
I'm not really that involved in GIMPS, but is it returning 0xFFFFF... (hex for all 1s in binary)? That was my first thought. If so, it may be returning the Mersenne number itself (or a multiple) instead of 0. I believe something similar happened in Prime95 at one point (unless my memory of what I read is foggy).

Last fiddled with by science_man_88 on 2017-04-26 at 02:09

2017-04-26, 03:07   #33
preda
"Mihai Preda"
Apr 2015
2⁴·83 Posts

No, it's not all-1s. I ran 24036583 twice, and the second time the result was correct (0). I tracked down the difference between the two runs by comparing the residues, and at some point around 13% the residues diverged. That means an error occurred at that point in the first run. Given that the software is supposed to be deterministic (produce identical bits every time), this could be explained by the hardware behaving funny.
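
Locating the point where two runs part ways can be automated if both runs logged residues at the same iterations. A minimal Python sketch follows; the function name is illustrative and the sample residues are made up for the example, not taken from the actual 24036583 runs.

```python
def first_divergence(run_a, run_b):
    """Compare two residue logs, each a list of (iteration, residue) pairs.

    Returns (last_matching_iteration, first_differing_iteration), or None
    if the logs agree everywhere. The error lies between the two iterations.
    """
    last_good = None
    for (it_a, res_a), (it_b, res_b) in zip(run_a, run_b):
        if it_a != it_b:
            raise ValueError("logs are not aligned on the same iterations")
        if res_a != res_b:
            return last_good, it_a
        last_good = it_a
    return None


# Hypothetical residues, for illustration only.
run1 = [(20000, 0xA1), (40000, 0xB2), (60000, 0xC3), (80000, 0xD4)]
run2 = [(20000, 0xA1), (40000, 0xB2), (60000, 0xEE), (80000, 0xFF)]
print(first_divergence(run1, run2))  # (40000, 60000)
```

Once the window is known, restarting both runs from a checkpoint at or before the last matching iteration is enough to see which of the two reproduces.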

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.