Does gpuowl accept cofactor work?
PRP cofactor: work type 160. PRP cofactor DC: work type 161. |
[QUOTE=SELROC;506750]Does gpuowl accept cofactor work ?
PRP cofactor work type 160 PRP cofactor DC type 161[/QUOTE] No, not now. I guess it could be added, but I don't know exactly how a cofactor test works, so I don't know how much work that'd be. |
[QUOTE=preda;506751]No, not now. I guess it could be added, but I don't know exactly how a cofactor test works, so I don't know how much work that'd be.[/QUOTE]
Right, I have just added a reminder to gpuowl for 100 million digit numbers. Work type 153. |
PRP cofactor work should very rarely be necessary anymore.
Instead, just do a single PRP test of the exponent itself, not taking factors into account even if there are known factors, and retain a large number of bits of the residue (say, 2048). Then do a Gerbicz cofactor-compositeness test for the cofactor. It is much faster than a PRP cofactor test. See [URL="https://mersenneforum.org/showthread.php?t=23462"]the original post[/URL], and here is [URL="http://mprime.s3-website.us-west-1.amazonaws.com/code/gerbicz_prp_cofactor.py"]my Python implementation[/URL] (using the gmpy2 module). Every time new factors are discovered, thereby creating a new cofactor, you just reuse the original 2048-bit PRP residue and re-run the Gerbicz cofactor-compositeness test on the new cofactor. It will tell you either that the cofactor is definitely composite, or that it is possibly a probable prime. Only in the latter case do you need to run a PRP cofactor test to confirm that it actually is a probable prime. However, for non-tiny exponents the chance of a false positive is very small. Note: the Gerbicz cofactor-compositeness test is completely different from Gerbicz error checking. It was just invented by the same guy. |
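To make the idea above concrete, here is a minimal sketch of how such a check can work. It is my own reconstruction from the description, not the linked gmpy2 script, and it assumes the saved residue is the low bits of R = 3^(N-1) mod N (other PRP residue types would shift the exponent). The function name and parameters are hypothetical. With N = c*d, where d is the product of the known factors, a base-3 probable-prime cofactor c forces R = 3^(d-1) mod c, so R = A + k*c for some 0 <= k < d. The low bits of R are enough to solve for k modulo 2^bits, and any k >= d proves c composite.

```python
def cofactor_possibly_prp(residue_low, bits, n, d, base=3):
    """Return False if n // d is certainly not a base-`base` Fermat PRP,
    True if it may be one (false positives have probability about d / 2**bits).

    Assumption: residue_low == pow(base, n - 1, n) % 2**bits.
    """
    c, rem = divmod(n, d)
    if rem != 0 or c < 3 or c % 2 == 0:
        raise ValueError("d must divide n and leave an odd cofactor > 1")
    mod = 1 << bits
    if d >= mod:
        raise ValueError("need 2**bits > d for the test to say anything")
    # If c is a PRP, then base^(n-1) = base^(d*(c-1) + d-1) = base^(d-1) (mod c).
    a = pow(base, d - 1, c)  # cheap: the exponent is only as large as d
    # R = a + k*c with 0 <= k < d; recover k from the low bits (c is odd).
    k = ((residue_low - a) * pow(c, -1, mod)) % mod
    return k < d


# Toy example: M11 = 2047 = 23 * 89, keeping only 8 residue bits.
n = 2**11 - 1
low = pow(3, n - 1, n) % (1 << 8)
print(cofactor_possibly_prp(low, 8, n, 23))  # cofactor 89 is prime -> True
```

The expensive full-length PRP run happens once; each new factor only costs one short modular exponentiation and a modular inverse, which is why the residue needs to keep many bits.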
P-1 speed on Vega64
[QUOTE=preda;506748]P-1 in GpuOwl. Good old classic P-1.
[/QUOTE] As a rough speed indication, for a 90.6M exponent (the "P-1 wavefront"), on my Vega64 it takes about 2h for B1=1M, B2=30M. The time is split about equally between the two stages. The credit for P-1 to those bounds is about 13.55 GHz-days. |
I'm assuming 5b26497 (v6.2) is usable/stable for P-1, or is it still in testing?
Also, do you happen to know the last version that did not require OpenCL 2.x? |
[QUOTE=kracker;506993]Also, do you happen to know the last version that did not require OpenCL 2.x?[/QUOTE]
Why? Some clues that it goes way back, from [URL="https://github.com/preda/gpuowl"]https://github.com/preda/gpuowl[/URL]:
"use opencl 2.0 atomics in carry fused" Jul 27 2018 dd0f2b2
"dont attempt initial CL2.0 compilation anymore" Jan 22 2018 1aee5cc (V1.9?)
"fix opencl 1.x FGT compilation (missing global)" Nov 8 2017 8c2e6d6 (V1.8 or 1.9 time frame)
"add stupid global to pointers everywhere to make it compilable in cl 1.2" Sep 18 2017 d7930ed
"bump version to 1.0; log and result format minor change; persistent c..." Aug 27 2017 676be1c |
[QUOTE=kracker;506993]I'm assuming 5b26497 (v6.2) is usable/stable for P-1, or is it still in testing?
Also, do you happen to know the last version that did not require OpenCL 2.x?[/QUOTE] Should be usable, yes, and I hope it's not buggy ("no known bugs" :)). You're welcome to try it out. Should be pretty fast too. Let me know what exponent ranges you test, what FFT is selected, and of course if you find any factors :) You could try it initially with a couple of known factors, ideally in the same exponent range, to verify they're detected properly. About OpenCL 2.x: as Ken said, it goes back a while. The problem with OpenCL 1.x is that the kernel "carryFused" does not work without OpenCL 2.0 atomics; at least it does not work under ROCm, which is a major driver for AMD. So maybe you could use a modern driver, such as ROCm or amdgpu-pro, which both support OpenCL 2.x. |
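For anyone who wants to see what "verify with a known factor" means in practice, here is a small sketch of textbook P-1 stage 1 on a Mersenne number. It is only an illustration in plain Python, not how GpuOwl implements it, and the function names are hypothetical. The toy case is M67, whose known factor 193707721 has 193707721 - 1 = 2^3 * 3^3 * 5 * 67 * 2677, so B1 = 3000 is enough for stage 1 alone.

```python
from math import gcd


def primes_up_to(limit):
    """Simple sieve of Eratosthenes returning all primes <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(limit**0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def pm1_stage1(p, b1, base=3):
    """Textbook P-1 stage 1 on N = 2^p - 1; returns gcd(base^E - 1, N)."""
    n = (1 << p) - 1
    # Factors of 2^p - 1 have the form 2*k*p + 1, so 2*p goes in for free.
    x = pow(base, 2 * p, n)
    for q in primes_up_to(b1):
        qk = q
        while qk * q <= b1:  # largest power of q not exceeding B1
            qk *= q
        x = pow(x, qk, n)
    return gcd(x - 1, n)


g = pm1_stage1(67, 3000)
print(g % 193707721 == 0)  # the known factor divides the gcd -> True
```

The check uses divisibility rather than equality because the gcd can also pick up other factors that happen to be smooth to the same bound.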
GpuOwl 6.2 just gained -0.33 ms/sq on 5M FFT over the previous version for PRP.
|
[QUOTE=SELROC;507011]GpuOwl 6.2 just gained -0.33 ms/sq on 5M FFT over the previous version for PRP.[/QUOTE]
Side gains from P-1 :) I'll try to explain what changed.
The FFT transforms that are power-of-two in size (such as 4M) are split (using a scheme similar to the "matrix FFT algorithm") into two subtransforms of sizes WIDTH and HEIGHT, such that Size = Width * Height, where both W and H are powers of two.
The FFT transforms that are not power-of-two in size (such as 4.6M or 5M) are split into 3 sub-FFTs, of sizes that I call WIDTH, MIDDLE, and HEIGHT, with Size = Width * Middle * Height. Until now, Middle was one of: 3, 5, 9.
In P-1 I found the need to reduce the Height size, and one way to achieve that was by increasing the Middle size. Thus I changed the possible Middle sizes to one of: 6, 9, 10 (by doubling the 3 and 5). As a side effect, PRP 5M now uses Middle=10 instead of the previous 5, and it turns out that this results in better performance. |
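A tiny sketch of the size bookkeeping described above, just to show the effect of doubling the Middle values. The function name is hypothetical, and how the remaining power of two is divided between Width and Height is not stated in the post, so the sketch only returns the product Width * Height.

```python
def split_middle(fft_size, middles):
    """Return (middle, width_times_height) for an FFT size.

    Power-of-two sizes need no middle step (reported as middle = 1).
    Otherwise pick the candidate middle that leaves a power of two.
    """
    def is_power_of_two(n):
        return n > 0 and n & (n - 1) == 0

    if is_power_of_two(fft_size):
        return 1, fft_size
    for m in middles:
        rest, rem = divmod(fft_size, m)
        if rem == 0 and is_power_of_two(rest):
            return m, rest
    raise ValueError("size is not (middle * power of two) for the given middles")


M = 2**20
print(split_middle(5 * M, (3, 5, 9)))   # old choice: (5, 1048576)
print(split_middle(5 * M, (6, 9, 10)))  # new choice: (10, 524288)
```

With Middle doubled from 5 to 10, the power-of-two part left for Width * Height halves, which is what makes room for a smaller Height.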
[QUOTE=preda;507002]Should be usable, yes, and I hope it's not buggy ("no known bugs" :), you're welcome to try it out. Should be pretty fast too. Let me know what exponent ranges you test, what FFT is selected, and of course if you find any factors :)
You could try it initially with a couple of known factors, ideally in the same exponent range, to verify they're detected properly. About OpenCL 2.x -- as Ken said, it goes a bit back. The problem with OpenCL 1.x is that the kernel "carryFused" does not work without OpenCL 2.0 atomics, at least it does not work under ROCm which is a major driver for AMD. So maybe you could use a modern driver, such as ROCm or amdgpu-pro, which both support OpenCL 2.x[/QUOTE] There's an assortment of known-factor P-1 verification candidates in post 811. What's next? P-1 save files would be good to have for higher exponents. 100M-digit exponents currently often don't get P-1 before primality testing, which is unfortunate. P-1 run times scale similarly to primality testing (p^2 or somewhat worse), so a run may take a full 24-hour day or more for a 100M-digit exponent. That is a long time to go without save files, and higher exponents take longer still. |
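As a back-of-the-envelope check on that scaling, here is a rough estimate. It assumes pure p^2 scaling, the same bounds and the same hardware as the roughly 2h figure reported earlier in the thread for a 90.6M exponent on a Vega64 (B1=1M, B2=30M). Real runs would likely take longer, since the scaling is a bit worse than quadratic and larger exponents usually get larger bounds. The helper names are hypothetical.

```python
import math


def smallest_exponent_with_digits(digits):
    """Smallest p such that 2^p - 1 has at least `digits` decimal digits.

    2^p - 1 has floor(p * log10(2)) + 1 digits. The result is a lower bound
    on the exponent and is not necessarily prime.
    """
    return math.ceil((digits - 1) / math.log10(2))


def scaled_hours(p, ref_p=90.6e6, ref_hours=2.0, power=2.0):
    """Scale a reference run time by (p / ref_p) ** power."""
    return ref_hours * (p / ref_p) ** power


p_100m = smallest_exponent_with_digits(100_000_000)
print(p_100m)                          # about 332.19 million
print(round(scaled_hours(p_100m), 1))  # roughly 27 hours
```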