mersenneforum.org

gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

kriesel 2019-06-23 16:01

[QUOTE=Prime95;519608]I thought Mihai had agreed to make type-1 residues gpuowl's default. However, it is still producing type-4.[/QUOTE]
Gpuowl implemented LL through v0.6, then PRP with residue type 4 initially (v0.7 to at least v1.1). It switched to type 1 by v1.5 and kept it through at least v3.9, then went back to type 4 when PRP-1 was added in v4. When P-1 was separated out in v6.0, PRP-3 remained type 4 through at least v6.5.

I've started a reference table, available at [URL]https://www.mersenneforum.org/showpost.php?p=519603&postcount=15[/URL], that also includes a couple of other variables (such as when nonzero offset was available in gpuowl, or when the Jacobi check was available in the LL flavors). It's incomplete and a work in progress. I haven't tested, built, downloaded, or even identified the commits for all the 0.1 increment versions yet.

Some useful versions, in my opinion, are:
v0.5: LL with pseudorandom offset, no Jacobi check; most efficient near the upper limit of the 4M fft (~70-77M exponents); useful for helping to double-check (DC) past LL first tests

v0.6: LL with Jacobi check, for helping to DC past LL first tests done with nonzero offset; most efficient near the upper limit of the 4M fft (~70-77M exponents); I think zero offset only

v1.9: PRP DC; 4M is fast; limited to zero offset; type 1 residues. (Supports 2M, 4M, and 8M fft lengths, with the fastest times for each that I've seen in testing on an RX480, although the driver updates needed for v2.0 support caused a 5% slowdown that affected those timings.)

v3.8: PRP; 8M for ~150M exponents is fast; type 1 residues; zero offset limitation

v6.2-6.5: PRP with type 4 residues; many fft lengths; the speeds I've checked are competitive with the best of the previous versions; latest and greatest; limited to zero offset; separate P-1 (which runs for some, but I've had crashes with the P-1 in every attempt)

Iteration timing benchmarks for a variety of gpuowl versions and fft lengths, all run on the same system and RX480 gpu, are available at [URL]https://www.mersenneforum.org/showpost.php?p=488535&postcount=2[/URL]

Switching between versions and supporting multiple versions is easy. I have dozens on one system with 2 AMD gpus. I use a separate directory for each, shortcuts to get there, and simple batch files containing the executable name and the usual command line options (this is typically on Windows 7 or 10). For example, g65.bat for V6.5 is:
[CODE]gpuowl-win -device 0 -carry short -fft +0 -use ORIG_X2

:dev 0 rx480, 1 rx550
: -carry long -fft +0 -carry short -use FMA_X2 -use ORIG_X2[/CODE]
I find it handy to keep a reminder in comments there of which gpu model is which device number on each system, especially with 3 or more gpus per system, and to keep alternative options there in comments for fast, convenient copy/paste into the command on line one.

kriesel 2019-06-23 19:21

PRP offset branch
 
I don't know how to modify SELROC's make directions in post 1076 to build from a git branch such as prp-offset ([URL]https://github.com/preda/gpuowl/tree/prp-offset[/URL]). (Attempts were made; the results were not pretty.) So I downloaded and unzipped a zip file of the branch and tried building it for Windows, editing the makefile a bit to match how I had previously built V3.8, since their commit dates are only days apart:
[CODE]$ make openowl-notf
g++ -O2 -DREV=\"ae3be65\" -std=c++14 OpenGpu.cpp NoTF.cpp clwrap.cpp common.cpp gpuowl.cpp -o openowl-notf -lOpenCL -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L/c/Windows/System32 -static
strip openowl-notf.exe
[/CODE]
It compiles. It runs, apparently correctly. But there is no indication of a nonzero offset in the console output, gpuowl.log, help output, or results.
[CODE]{"exponent":1398269, "worktype":"PRP-3", "status":"P", "program":{"name":"gpuowl", "version":"3.8-ae3be65-OpenCL"}, "timestamp":"2019-06-23 18:58:54 UTC", "computer":"Ellesmere-36x1266-@28:0.0", "aid":"0", "residue-type":1, "fft-length":"512K", "res64":"0000000000000001", "errors":{"gerbicz":0}}
[/CODE]

henryzz 2019-06-23 20:59

I believe the following should work (untested). It can probably be simplified.

[CODE]git clone https://github.com/preda/gpuowl
cd gpuowl
git fetch --all
git checkout prp-offset[/CODE]

SELROC 2019-07-01 05:46

I don't know yet what Preda thinks about this. I have found an array index warning in gpuOwl. FFT 8K.



[url]https://github.com/preda/gpuowl/issues/56[/url]

Prime95 2019-07-03 20:45

The current version does not output any Gerbicz information in the JSON text. See below. Primenet thus fails to mark the PRP test as "highly reliable". BTW, the test below had several failed Gerbicz checks. The count of such failures would be useful in the JSON output.
[CODE]
{"exponent":"87944903", "worktype":"PRP-3", "status":"C", "program":{"name":"gpuowl", "version":"v6.5-82-g77b45a4"}, "timestamp":"2019-07-03 16:55:14 UTC", "user":"gw2", "computer":"radeon2.2", "aid":"8FAA3EF0B7F73F7029BC6154D749FF2D", "fft-length":5242880, "res64":"d730c1a17c8fcd1e", "residue-type":4}
[/CODE]
For comparison, a prime95 JSON output:

[CODE]{"status":"C", "exponent":85527073, "worktype":"PRP-3", "res64":"59AC64DACB6891E4", "residue-type":1, "res2048":"E77683E0E56D070B43DAD890B2957616AE4A6EA891AC9672365B8D3725A17ADC9E82404B0DDB73D9827F2DA3442BE9D111A230DAB332BF7F120A16127AF22768AC2B7A34EA260A772618F53D7D8645CEE444F63F30D95CB453289B3761C05CC67C736A31B99FB65980B48A36A7BAEAEEA354984B2FD8ABE6D664B7B0ADD2005652E8B207FF2E8673804AB8E1DC27A679C760AC9256070F4BAD18A250E52E4FD17A592534D80EEA858B8E69D000CB32A6455E111D3F11576DD30FECE328DD397EF63121DFA6447EA7BF5091636B289192E7FD858035033133ACA6C0A08DAB00DAAAE8A8162254CCCD0B7B69888D19CE66F1E48C6C9013865F59AC64DACB6891E4", "fft-length":4718592, "shift-count":9773447, "error-code":"00000000", "security-code":"728605AD", "program":{"name":"Prime95", "version":"29.7", "build":1, "port":8}, "timestamp":"2019-06-12 23:46:20", "errors":{"gerbicz":0}, "user":"gw_2", "computer":"h110itx1", "aid":"FAFE04EE26AE5DB345E585E8913E1C75"}[/CODE]
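
As a rough illustration of the difference between the two records above, here is a minimal Python sketch that checks whether a result line carries a Gerbicz error count. The function name gerbicz_error_count is hypothetical, and this is only a presence check on the JSON text, not PrimeNet's actual logic for marking a result as reliable. The sample lines are abridged copies of the gpuowl v3.8 and v6.5 results quoted in this thread.

```python
import json

# Abridged gpuowl result lines taken from this thread (some fields omitted).
V38_LINE = ('{"exponent":1398269, "worktype":"PRP-3", "status":"P", '
            '"program":{"name":"gpuowl", "version":"3.8-ae3be65-OpenCL"}, '
            '"residue-type":1, "fft-length":"512K", "res64":"0000000000000001", '
            '"errors":{"gerbicz":0}}')
V65_LINE = ('{"exponent":"87944903", "worktype":"PRP-3", "status":"C", '
            '"program":{"name":"gpuowl", "version":"v6.5-82-g77b45a4"}, '
            '"fft-length":5242880, "res64":"d730c1a17c8fcd1e", "residue-type":4}')


def gerbicz_error_count(result_line):
    """Return the errors.gerbicz value, or None if the record does not report it."""
    record = json.loads(result_line)
    errors = record.get("errors")
    if not isinstance(errors, dict):
        return None  # no "errors" object at all, as in the v6.5 record
    return errors.get("gerbicz")


if __name__ == "__main__":
    for label, line in (("v3.8", V38_LINE), ("v6.5", V65_LINE)):
        count = gerbicz_error_count(line)
        status = "missing" if count is None else f"gerbicz errors = {count}"
        print(label, status)
```

Returning None for a missing field, rather than 0, keeps "no Gerbicz information" distinct from "Gerbicz check ran with zero errors", which is the distinction raised in the post above.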


LaurV: edited to wrap code tags around the json files, they created a mess on screen due to long, unterminated lines ("beautify"-ing it in your editor, if you use pn or n++ or else, may help, before posting, so we can see it nicely indented :razz:)

Prime95 2019-07-03 20:47

[QUOTE=SELROC;520435]I don't know yet what Preda thinks about this. I have found an array index warning in gpuOwl. FFT 8K.[/QUOTE]

Since Preda has not answered yet, I assume the warning is harmless. I'm sure he'll fix it when he has the time.

kriesel 2019-07-03 22:42

[QUOTE=Prime95;520693]The current version does not output any Gerbicz information in the JSON text. See below. Primenet thus fails to mark the PRP test as "highly reliable". BTW, the test below had several failed Gerbicz checks. The count of such failures would be useful in the JSON output.

{"exponent":"87944903", "worktype":"PRP-3", "status":"C", "program":{"name":"gpuowl", "version":"v6.5-82-g77b45a4"}, "timestamp":"2019-07-03 16:55:14 UTC", "user":"gw2", "computer":"radeon2.2", "aid":"8FAA3EF0B7F73F7029BC6154D749FF2D", "fft-length":5242880, "res64":"d730c1a17c8fcd1e", "residue-type":4}[/QUOTE]
Confirmed here (and it is also the case for v5.0-9c13870 and V4.3). But earlier versions did output it, for example V1.9, V3.8, and V3.9. Here is a redacted V3.8 result with one EE occurrence on the Gerbicz check, so it resumed from an earlier saved residue and repeated the Gerbicz block, successfully on the second attempt:
[CODE]2019-03-03 06:58:57 condorella-rx550 {"exponent":83411351, "worktype":"PRP-3", "status":"C", "program":{"name":"gpuowl", "version":"3.8-91c52fa-OpenCL"}, "timestamp":"2019-03-03 12:58:57 UTC", "user":"kriesel", "computer":"condorella-rx550", "aid":"redacted", "residue-type":1, "fft-length":"4608K", "res64":"redacted", "errors":{"gerbicz":1}}[/CODE]

SELROC 2019-07-05 07:27

Note: I have a script that quickly recovers after a power loss.


[url]https://github.com/valeriob01/Mersenne-gpu-computing-node/commit/e90de7d656c60ddbb9eac294977ef4ec01174485[/url]

SELROC 2019-07-08 04:26

Current gpuowl typical performance numbers for 89M exponents:


RX580: 3849 us/sq - ETA 4d 2h
Vega64: 2010 us/sq - ETA 2d
RadeonVII: 910 us/sq - ETA 22h 35m
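
For a rough cross-check of ETAs like these: a PRP test of 2^p-1 needs about p squarings, so the full run time is roughly the exponent times the time per squaring. Below is a minimal Python sketch, assuming a constant us/sq figure and a test that starts from iteration zero unless told otherwise. The helper names eta_seconds and format_eta are made up for illustration and are not part of gpuowl.

```python
def eta_seconds(exponent, us_per_squaring, iterations_done=0):
    """Estimated remaining run time in seconds, assuming about one squaring per bit of the exponent."""
    remaining = exponent - iterations_done
    return remaining * us_per_squaring / 1e6


def format_eta(seconds):
    """Format a duration as days, hours, and minutes (seconds are truncated)."""
    minutes = int(seconds // 60)
    days, rem = divmod(minutes, 24 * 60)
    hours, mins = divmod(rem, 60)
    return f"{days}d {hours}h {mins}m"


if __name__ == "__main__":
    # us/sq figures from the post above; 89,000,000 stands in for "89M exponents".
    for gpu, us in (("RX580", 3849), ("Vega64", 2010), ("RadeonVII", 910)):
        print(gpu, format_eta(eta_seconds(89_000_000, us)))
```

The estimates land close to, but not exactly on, the posted ETAs, since the actual exponents and progress are not given and per-iteration time varies a little during a run.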

GP2 2019-07-08 16:50

mprime can do k*b^n+c

How feasible would it be, in principle, to adapt gpuOwL to be more flexible? Wagstaff in particular: (2^p+1)/3

paulunderwood 2019-07-08 20:10

[QUOTE=GP2;521019]mprime can do k*b^n+c

How feasible would it be, in principle, to adapt gpuOwL to be more flexible? Wagstaff in particular: (2^p+1)/3[/QUOTE]

:goodposting:

Working mod 2^p+1 is almost as easy as working mod 2^p-1. Then a final step divides the modulus by 3 to get the result mod (2^p+1)/3. E.g.:

[code]
p=127;Mod(3,2^p+1)^2^p
Mod(9, 170141183460469231731687303715884105729)
[/code]

Gerbicz error checking can be done too.
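
The PARI/GP one-liner above can be mirrored in Python with the built-in three-argument pow. This is only a sketch of the arithmetic, assuming base 3 as in the example. The function name wagstaff_prp_base3 is hypothetical, and nothing here describes how gpuOwL itself would implement it. If N = (2^p+1)/3 is prime and p >= 5, then 3^(2^p) is congruent to 9 mod 2^p+1, which is the value shown for p=127.

```python
def wagstaff_prp_base3(p):
    """Base-3 PRP check for N = (2^p + 1) / 3, computed mod 2^p + 1 (odd p >= 5)."""
    m = (1 << p) + 1         # 2^p + 1 = 3 * N
    r = pow(3, 1 << p, m)    # 3^(2^p) mod (2^p + 1): 3 squared p times
    n = m // 3               # final step: reduce the residue mod N = (2^p + 1) / 3
    return r % n == 9 % n


if __name__ == "__main__":
    # Same computation as the PARI/GP example: Mod(3, 2^p+1)^2^p for p = 127.
    p = 127
    print(pow(3, 1 << p, (1 << p) + 1))
    print(wagstaff_prp_base3(p))
```

A pass means N is a base-3 probable prime, not a proven prime.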


All times are UTC.
