#1849 |
∂²ω=0
Sep 2002
República de California
5×2,351 Posts |
Shouldn't a too-small FFT size manifest as excessive fractional parts (a.k.a. roundoff errors) detected during the round-and-carry step?
#1850 |
"mrh"
Oct 2018
Temecula, ca
2·3²·5 Posts |
Ah, that's good to know. I didn't think of that.
#1851 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1110011000110₂ Posts |
Are empty worktodo lines (just a newline) not allowed?
Code:
2020-02-13 19:19:40 colab2-TeslaT4 {"exponent":"10000831", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"v6.11-145-g6146b6d-dirty"}, "timestamp":"2020-02-13 19:19:40 UTC", "user":"kriesel", "computer":"colab2-TeslaT4", "fft-length":524288, "B1":30000, "B2":500000, "factors":["646560662529991467527"]}
2020-02-13 19:19:40 colab2-TeslaT4 worktodo.txt line ignored: ""
terminate called after throwing an instance of 'char const*'
Last fiddled with by kriesel on 2020-02-13 at 22:04
#1852 |
"Mihai Preda"
Apr 2015
2²·19² Posts |
There is no excessive-fractional-parts detection in the round-and-carry step. Such detection has overhead, and for PRP it isn't needed because the GEC provides better coverage. This does leave P-1 unprotected.
#1853 | ||
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2×29×127 Posts |
Quote:
Error check possibilities include (in random order):
Are there more P-1 error check possibilities?
Last fiddled with by kriesel on 2020-02-14 at 18:06
#1854 | |
∂²ω=0
Sep 2002
República de California
11755₁₀ Posts |
Quote:
I'm also just beginning to delve into the underlying code here, so you can answer this more quickly: what is the underlying hardware instruction set used by your code, and does it include the needed round() instruction? If so, what is the hardware latency and pipelineability of that, and how do you expect it to compare to the old coders' trick (developed before IEEE floating-point standardization and widespread use of dedicated round() instructions) of rnd(x) = (x + c) - c, where c = 0.75*2^[#significand bits in a floating datum], needing just an add and a sub?
If the % hit with-ROE-checking is significant even after choosing the best of the above options, how difficult would it be to deploy a special round-and-carry-with-ROE-checking routine, which would be invoked only during p-1 testing (and any other future modmul sequence for which a Gerbicz-style check is unavailable)?
In order to gauge how many PRP tests the addition of ROE checking here would save, we need some data re. missed p-1 factors. Perhaps there could be a dedicated near-term QA effort on a decently large representative set of expos, comparing the factors found by trying each expo twice using the same stage bounds:
[1] Using gpuOwl with default settings, i.e. default FFT length and no ROE checking;
[2] Using gpuOwl with next-larger-than-default FFT length, or - probably better - Prime95/mprime with default FFT param and same p-1 stage bounds as were used in [1].
Or perhaps there are already some stats in hand here, based on early-DCs of first-time PRP tests? Or do those PRP-DC runs skip the p-1 step?
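For readers unfamiliar with the add-and-subtract rounding trick mentioned above, here is a minimal Python sketch of it together with a fractional-part (ROE) measurement. It assumes IEEE-754 doubles with 53 significand bits, so c = 0.75*2^53. The helper `round_with_roe` and its 0.4 threshold are hypothetical, chosen only for illustration, and none of this is taken from the gpuOwl source.

```python
# Magic-constant rounding for IEEE-754 doubles, plus a roundoff-error (ROE) measure.
# Assumption: 53 significand bits (Python floats are doubles), so C = 0.75 * 2**53.
C = 0.75 * 2.0**53  # = 1.5 * 2**52; doubles of this size are spaced exactly 1 apart


def rnd(x: float) -> float:
    """Round x to the nearest integer (ties to even) with one add and one sub.

    Valid only for |x| < 2**51, so that x + C stays inside [2**52, 2**53).
    """
    return (x + C) - C


def round_with_roe(values, threshold=0.4):
    """Hypothetical helper: round each value and track the largest fractional part.

    Returns (rounded, max_roe, suspicious); suspicious is True when max_roe
    exceeds the illustrative threshold.
    """
    rounded = []
    max_roe = 0.0
    for v in values:
        r = rnd(v)
        max_roe = max(max_roe, abs(v - r))  # distance from the nearest integer
        rounded.append(int(r))
    return rounded, max_roe, max_roe > threshold
```

In an FFT-based multiply the convolution outputs should be very close to integers, so a `max_roe` approaching 0.5 would suggest the FFT length is too small for the exponent. The cost in this sketch is one extra subtract, abs, and max per element; the cost on a GPU is exactly what the question above asks about.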
#1855 | ||
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7366₁₀ Posts |
Quote:
Quote:
Last fiddled with by kriesel on 2020-02-15 at 00:20 |
#1856 |
"Mihai Preda"
Apr 2015
1444₁₀ Posts |
I'm thinking of a way to use GEC with P-1 first-stage, which would not be a waste if somebody is planning to continue with PRP on the same exponent (if a factor is not found).
The idea is to use right-to-left binary exponentiation, which can use the error check to a large degree. The residue thus computed can be saved and used to start the PRP from this point on. (Right now P-1 first-stage uses left-to-right binary exponentiation, which is more efficient but can't use the error check.)
https://en.wikipedia.org/wiki/Modular_exponentiation
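To make the two exponentiation orders concrete, here is a minimal Python sketch of both. The function names are hypothetical, this is not gpuOwl code, and the error check itself is not implemented. It only shows that in the right-to-left form the variable `square` walks through base^(2^k), a pure repeated-squaring chain of the same shape as a PRP test, which is presumably the part the check could cover.

```python
def pow_left_to_right(base: int, exp: int, mod: int) -> int:
    """Scan exponent bits from most to least significant.

    One running value: square it every step, multiply by base on 1-bits.
    """
    result = 1 % mod
    for bit in bin(exp)[2:]:
        result = (result * result) % mod
        if bit == "1":
            result = (result * base) % mod
    return result


def pow_right_to_left(base: int, exp: int, mod: int) -> int:
    """Scan exponent bits from least to most significant.

    Two running values: `square` is base**(2**k) after k steps (squarings
    only), and `result` collects the squares selected by the 1-bits.
    """
    result = 1 % mod
    square = base % mod
    while exp > 0:
        if exp & 1:
            result = (result * square) % mod
        square = (square * square) % mod
        exp >>= 1
    return result
```

Both functions return the same value as Python's built-in `pow(base, exp, mod)`. The right-to-left form carries two residues, and its multiply on a 1-bit is a general multiplication by `square`, whereas the left-to-right form multiplies by the fixed base; that is consistent with the efficiency remark in the post.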
#1857 |
P90 years forever!
Aug 2002
Yeehaw, FL
1111111011111₂ Posts |
@Ernst: The GCN timings doc is here -- https://github.com/CLRX/CLRX-mirror/wiki/GcnTimings
ROE error checking would be slower, but it would be a useful debugging option to sanity-check FFT length selections.
#1858 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2·29·127 Posts |
#1859 | |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1CC6₁₆ Posts |
Quote:
If the performance hit when applied to P-1 stage 1 is not too bad, it would be a tremendous advance, since P-1 is currently by its nature quite thin on error checks compared to primality testing.
Last fiddled with by kriesel on 2020-02-15 at 17:39