mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GpuOwl (https://www.mersenneforum.org/forumdisplay.php?f=171)
-   -   gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

SELROC 2019-01-24 09:42

Does gpuowl accept cofactor work?


PRP cofactor work type 160
PRP cofactor DC type 161

preda 2019-01-24 09:55

[QUOTE=SELROC;506750]Does gpuowl accept cofactor work?


PRP cofactor work type 160
PRP cofactor DC type 161[/QUOTE]

No, not now. I guess it could be added, but I don't know exactly how a cofactor test works, so I don't know how much work that'd be.

SELROC 2019-01-24 10:00

[QUOTE=preda;506751]No, not now. I guess it could be added, but I don't know exactly how a cofactor test works, so I don't know how much work that'd be.[/QUOTE]


Right, I have just added a reminder to gpuowl for 100 million digit numbers. Work type 153.

GP2 2019-01-24 18:44

PRP cofactor work should very rarely be necessary anymore.

Instead, just do a single PRP test of the exponent itself, not taking factors into account even if there are known factors, and retaining a large number of bits in the residue (say, 2048).

Then do a Gerbicz cofactor compositeness test for the cofactor. It is much faster than a PRP cofactor test. See [URL="https://mersenneforum.org/showthread.php?t=23462"]the original post[/URL], and here is [URL="http://mprime.s3-website.us-west-1.amazonaws.com/code/gerbicz_prp_cofactor.py"]my Python implementation[/URL] (using the gmpy2 module).

Every time new factors are discovered, thereby creating a new cofactor, you just reuse the original 2048-bit PRP residue and re-run the Gerbicz cofactor compositeness test on the new cofactor. It will tell you either that the cofactor is definitely composite, or that it is a possible probable prime. Only in the latter case do you need to run a PRP cofactor test to confirm that it really is a probable prime. However, for non-tiny exponents the chance of a false positive is very small.

Note: the Gerbicz cofactor-compositeness test is completely different from Gerbicz error checking. It was just invented by the same guy.
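
For readers who want to see the shape of the idea, here is a minimal Python sketch. It is not the linked script, and it rests on assumptions that are mine: the saved residue is taken to be the low bits of 3^(2^p) mod (2^p - 1), the function name cofactor_check and its parameters are hypothetical, and plain Python integers stand in for gmpy2. The reasoning: write N = 2^p - 1 = c*d with d the product of known factors. If c is a base-3 probable prime, then R = 3^(N+1) mod N satisfies R = b + k*c with b = 3^(d+1) mod c and 0 <= k < d. Multiplying by d and using N = -1 (mod 2^bits) gives k = d*(b - R) (mod 2^bits), so a value of d or more proves c composite.

```python
def cofactor_check(p, known_factors, residue_low, bits):
    """Return False if the cofactor of 2**p - 1 is certainly composite,
    True if it may still be a base-3 probable prime.

    residue_low is assumed to hold the low `bits` bits of
    3**(2**p) mod (2**p - 1), saved from an ordinary PRP run.
    """
    n = (1 << p) - 1
    d = 1
    for f in known_factors:
        if n % f != 0:
            raise ValueError(f"{f} does not divide 2**{p} - 1")
        d *= f
    c = n // d  # the cofactor under test
    if bits > p or d >= (1 << bits):
        raise ValueError("need bits <= p and d < 2**bits")
    b = pow(3, d + 1, c)  # cheap: the exponent d + 1 is small
    k = (d * (b - residue_low)) % (1 << bits)
    return k < d


# Toy usage: M11 = 23 * 89, and the cofactor 89 is prime.
p = 11
full_residue = pow(3, 1 << p, (1 << p) - 1)
print(cofactor_check(p, [23], full_residue % (1 << p), p))  # prints True
```

Under these assumptions a False result is a proof of compositeness, while a True result is wrong with probability of roughly d / 2^bits, which is why keeping many residue bits matters.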

preda 2019-01-26 12:53

P-1 speed on Vega64
 
[QUOTE=preda;506748]P-1 in GpuOwl. Good old classic P-1.
[/QUOTE]

As a rough speed indication, for a 90.6M exponent (the "P-1 wavefront"), on my Vega64 it takes about 2h for B1=1M, B2=30M. The time is split about equally between the two stages. The credit for P-1 to those bounds is about 13.55 GHz-days.

kracker 2019-01-28 03:17

I'm assuming 5b26497 (v6.2) is usable/stable for P-1, or is it still in testing?
Also, do you happen to know the last version that did not require OpenCL 2.x?

kriesel 2019-01-28 05:57

[QUOTE=kracker;506993]Also, do you happen to know the last version that did not require OpenCL 2.x?[/QUOTE]
Why?

Some clues that it goes way back, at [URL="https://github.com/preda/gpuowl"]https://github.com/preda/gpuowl[/URL]:
"use opencl 2.0 atomics in carry fused" Jul 27 2018 dd0f2b2
"dont attempt initial CL2.0 compilation anymore" Jan 22 2018 1aee5cc (V1.9?)
"fix opencl 1.x FGT compilation (missing global)" Nov 8 2017 8c2e6d6 (V1.8 or 1.9 time frame)
"add stupid global to pointers everywhere to make it compilable in cl 1.2" Sep 18 2017 d7930ed
"bump version to 1.0; log and result format minor change; persistent c..." Aug 27 2017 676be1c

preda 2019-01-28 07:43

[QUOTE=kracker;506993]I'm assuming 5b26497 (v6.2) is usable/stable for P-1, or is it still in testing?
Also, do you happen to know the last version that did not require OpenCL 2.x?[/QUOTE]

Should be usable, yes, and I hope it's not buggy ("no known bugs" :), you're welcome to try it out. Should be pretty fast too. Let me know what exponent ranges you test, what FFT is selected, and of course if you find any factors :)

You could try it initially with a couple of known factors, ideally in the same exponent range, to verify they're detected properly.
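
To make "detecting a known factor" concrete, here is a toy stage-1 P-1 in plain Python. It is only a sketch of the classic algorithm and not how GpuOwl implements it; the names pm1_stage1 and primes_up_to are hypothetical, there is no stage 2, and it is only practical for tiny exponents. The example uses M67, whose factor 193707721 has 193707721 - 1 = 2^3 * 3^3 * 5 * 67 * 2677, so any B1 of at least 2677 should find it.

```python
from math import gcd


def primes_up_to(n):
    """Simple sieve of Eratosthenes returning all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def pm1_stage1(p, b1, base=3):
    """Stage 1 of P-1 on 2**p - 1: return gcd(base**E - 1, 2**p - 1),
    where E = 2*p times every prime power <= b1."""
    n = (1 << p) - 1
    # Every factor q of 2**p - 1 satisfies q = 1 (mod 2*p),
    # so 2*p is included in the exponent for free.
    x = pow(base, 2 * p, n)
    for q in primes_up_to(b1):
        qk = q
        while qk * q <= b1:  # largest power of q not exceeding b1
            qk *= q
        x = pow(x, qk, n)
    return gcd(x - 1, n)


g = pm1_stage1(67, 3000)
print(g, g % 193707721 == 0)
```

A result of 1 means no factor was found to that bound; a result equal to the full number means every prime factor was smooth to B1 at once.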

About OpenCL 2.x -- as Ken said, it goes a bit back. The problem with OpenCL 1.x is that the kernel "carryFused" does not work without OpenCL 2.0 atomics; at least, it does not work under ROCm, which is a major driver for AMD. So maybe you could use a modern driver, such as ROCm or amdgpu-pro, both of which support OpenCL 2.x.

SELROC 2019-01-28 12:17

GpuOwl 6.2 is 0.33 ms/sq faster than the previous version for PRP on the 5M FFT.

preda 2019-01-28 12:49

[QUOTE=SELROC;507011]GpuOwl 6.2 is 0.33 ms/sq faster than the previous version for PRP on the 5M FFT.[/QUOTE]

Side gains from P-1 :)

I'll try to explain what changed.

FFT transforms whose size is a power of two (such as 4M) are split, using a scheme similar to the "matrix FFT algorithm", into two sub-transforms of sizes WIDTH and HEIGHT, such that Size = Width * Height, where both Width and Height are powers of two.

FFT transforms whose size is not a power of two (such as 4.6M or 5M) are split into three sub-FFTs, of sizes that I call WIDTH, MIDDLE, and HEIGHT, with Size = Width * Middle * Height.

Until now, Middle was one of: 3, 5, 9.

In P-1 I found the need to reduce the Height size, and one way to achieve that was by increasing the Middle size. Thus I changed the possible Middle sizes to one of: 6, 9, 10 (by doubling the 3 and 5).

As a side effect, PRP 5M now uses Middle=10 instead of the previous 5, and it turns out that this results in better performance.
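
The arithmetic behind that change can be sketched in a few lines of Python. This is only an illustration of Size = Width * Middle * Height, not GpuOwl's actual selection logic; the function candidate_splits and the fixed Width of 1024 are assumptions made for the example.

```python
def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def candidate_splits(size, middles, width):
    """List (width, middle, height) triples with width * middle * height == size
    and height a power of two, for a fixed (assumed) width."""
    splits = []
    for middle in middles:
        height, rem = divmod(size, width * middle)
        if rem == 0 and is_power_of_two(height):
            splits.append((width, middle, height))
    return splits


M = 1 << 20
print(candidate_splits(5 * M, (3, 5, 9), 1024))   # [(1024, 5, 1024)]
print(candidate_splits(5 * M, (6, 9, 10), 1024))  # [(1024, 10, 512)]
```

With the same Width, moving from Middle=5 to Middle=10 halves the Height for the 5M transform, which is the reduction described above.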

kriesel 2019-01-28 14:01

[QUOTE=preda;507002]Should be usable, yes, and I hope it's not buggy ("no known bugs" :), you're welcome to try it out. Should be pretty fast too. Let me know what exponent ranges you test, what FFT is selected, and of course if you find any factors :)

You could try it initially with a couple of known factors, ideally in the same exponent range, to verify they're detected properly.

About OpenCL 2.x -- as Ken said, it goes a bit back. The problem with OpenCL 1.x is that the kernel "carryFused" does not work without OpenCL 2.0 atomics; at least, it does not work under ROCm, which is a major driver for AMD. So maybe you could use a modern driver, such as ROCm or amdgpu-pro, both of which support OpenCL 2.x.[/QUOTE]
There's an assortment of known factor P-1 verification candidates in post 811.
What's next? P-1 save files would be good to have for higher exponents. 100M-digit exponents often don't get P-1 before primality testing at present, which is unfortunate. P-1 run times scale similarly to primality testing (roughly p^2 or a bit more), so a 100M-digit exponent may take a full 24-hour day or more. That is a long time to run without save files, and higher exponents take longer still.
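
As a back-of-the-envelope check of that estimate, the sketch below scales the roughly 2 hours reported earlier in the thread for a 90.6M exponent on a Vega64 by (p ratio)^2. The assumptions are loud ones: pure quadratic scaling, the same hardware, and the same B1/B2 bounds, none of which would hold exactly in practice. The helper names are hypothetical.

```python
import math


def first_exponent_with_digits(digits):
    """Smallest p such that 2**p - 1 has at least `digits` decimal digits
    (2**p - 1 has floor(p * log10(2)) + 1 digits). p need not be prime."""
    return math.ceil((digits - 1) / math.log10(2))


def scaled_hours(ref_hours, ref_exponent, exponent, power=2.0):
    """Scale a reference run time by (exponent / ref_exponent) ** power."""
    return ref_hours * (exponent / ref_exponent) ** power


p100 = first_exponent_with_digits(100_000_000)
print(p100, round(scaled_hours(2.0, 90.6e6, p100), 1))
```

Under those assumptions the estimate lands a little above one day, in line with the "24 hours or more" figure; since the real scaling is somewhat worse than p^2, it is best read as a lower bound.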
