2020-02-13, 21:04  #1849
∂^{2}ω=0
Sep 2002
República de California
24662_{8} Posts 
Shouldn't the too-small FFT size manifest via excessive fractional parts (a.k.a. roundoff errors) detected during the round-and-carry step?

2020-02-13, 21:21  #1850
"mrh"
Oct 2018
Temecula, ca
53 Posts 
Ah, that's good to know. I didn't think of that.

2020-02-13, 22:04  #1851
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3198_{10} Posts 
Are empty worktodo lines (just a newline) not allowed?
Code:
2020-02-13 19:19:40 colab2TeslaT4 {"exponent":"10000831", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"v6.11145g6146b6ddirty"}, "timestamp":"2020-02-13 19:19:40 UTC", "user":"kriesel", "computer":"colab2TeslaT4", "fftlength":524288, "B1":30000, "B2":500000, "factors":["646560662529991467527"]}
2020-02-13 19:19:40 colab2TeslaT4 worktodo.txt line ignored: ""
terminate called after throwing an instance of 'char const*'

Last fiddled with by kriesel on 2020-02-13 at 22:04
2020-02-14, 07:19  #1852
"Mihai Preda"
Apr 2015
17·53 Posts 
There is no excessive-fractional-parts detection in the round-and-carry. It does have overhead, and for PRP it isn't needed as the GEC provides better cover. This does leave P-1 unprotected.
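For readers unfamiliar with the GEC (Gerbicz error check), here is a minimal Python sketch of the idea on a plain chain of modular squarings, which is the shape of a PRP test. It is not gpuOwl's code: the function name `checked_squarings`, the `corrupt_at` fault-injection parameter, and the small modulus are all hypothetical choices for illustration. The sketch keeps a running product d of every `block`-th residue and verifies the identity d_new = u0 * d_old^(2^block) (mod n).

```python
def checked_squarings(u0, n, steps, block, corrupt_at=None):
    """Square u0 modulo n `steps` times, with a Gerbicz-style check per block.

    corrupt_at (hypothetical, for demonstration only) injects an error into
    the residue after that many squarings, to show the check firing.
    """
    u0 %= n
    u = u0
    d = u0  # running product u_0 * u_block * u_2block * ...
    for i in range(1, steps + 1):
        u = u * u % n
        if i == corrupt_at:
            u = (u + 1) % n  # simulated hardware or roundoff error
        if i % block == 0:
            d_prev, d = d, d * u % n
            # Raising d_prev to 2^block shifts every factor one block forward,
            # so multiplying by u0 must reproduce the new product.
            if pow(d_prev, 1 << block, n) * u0 % n != d:
                raise ArithmeticError(f"check failed at squaring {i}")
    return u


n = 2**61 - 1  # small Mersenne prime, chosen only to keep the demo fast
print(checked_squarings(3, n, steps=40, block=8) == pow(3, 1 << 40, n))
try:
    checked_squarings(3, n, steps=40, block=8, corrupt_at=13)
except ArithmeticError as err:
    print("detected:", err)
```

The check itself costs `block` extra squarings, so production code verifies much less often than once per block; this sketch checks every block only for clarity. The identity relies on the chain being pure squarings, which is why a left-to-right P-1 exponentiation, with its interleaved multiplications, is not covered.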

2020-02-14, 17:21  #1853
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2·3·13·41 Posts 
Quote:
Error check possibilities include (in random order):
Are there more P-1 error check possibilities?

Last fiddled with by kriesel on 2020-02-14 at 18:06

2020-02-14, 21:44  #1854
∂^{2}ω=0
Sep 2002
República de California
29B2_{16} Posts 
Quote:
I'm also just beginning to delve into the underlying code here, so you can answer this more quickly: what is the underlying hardware instruction set used by your code, and does it include the needed round() instruction? If so, what is the hardware latency and pipelineability of that, and how do you expect it to compare to the old coders' trick (developed before IEEE floating-point standardization and widespread use of dedicated round() instructions) of rnd(x) = (x + c) - c, where c = 0.75*2^[#significand bits in a floating datum], needing just an add and a sub?

If the % hit with ROE checking is significant even after choosing the best of the above options, how difficult would it be to deploy a special round-and-carry-with-ROE-checking routine, which would be invoked only during p-1 testing (and any other future modmul sequence for which a Gerbicz-style check is unavailable)?

In order to gauge how many PRP tests the addition of ROE checking here would save, we need some data re. missed p-1 factors. Perhaps there could be a dedicated near-term QA effort on a decently large representative set of expos, comparing the factors found by trying each expo twice using the same stage bounds:
[1] Using gpuOwl with default settings, i.e. default FFT length and no ROE checking;
[2] Using gpuOwl with next-larger-than-default FFT length, or, probably better, Prime95/mprime with default FFT param and same p-1 stage bounds as were used in [1].

Or perhaps there are already some stats in hand here, based on early DCs of first-time PRP tests? Or do those PRP-DC runs skip the p-1 step?
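The add-and-subtract rounding trick above can be tried out in a few lines of Python. This is only a sketch under stated assumptions: it uses IEEE 754 double precision (53 significand bits, which is what Python floats are on common platforms) and the default round-to-nearest-even mode; the GPU code under discussion may use a different format. The names `magic_round` and `roundoff_error` are made up for illustration.

```python
SIGNIFICAND_BITS = 53                 # assumption: IEEE 754 binary64
C = 0.75 * 2.0 ** SIGNIFICAND_BITS    # equals 1.5 * 2**52


def magic_round(x):
    """Round x to the nearest integer using one add and one subtract.

    For |x| < 2**51 the sum x + C lands in [2**52, 2**53), where adjacent
    doubles are exactly 1 apart, so the addition itself does the rounding
    (ties go to the even integer). Subtracting C is then exact.
    """
    return (x + C) - C


def roundoff_error(x):
    """Distance from x to the nearest integer: the quantity an ROE check watches."""
    return abs(x - magic_round(x))


print(magic_round(2.3), magic_round(-7.6))   # 2.0 -8.0
print(magic_round(2.5), magic_round(3.5))    # 2.0 4.0 (ties to even)
print(roundoff_error(41.98) < 0.4)           # True
```

In a compiled language the expression has to be protected from reassociation (an optimizer running in fast-math mode may simplify `(x + c) - c` to `x`), and the result is only meaningful in round-to-nearest mode. An ROE check would compare `roundoff_error` against a threshold below 0.5, since a fractional part close to 0.5 means the rounded integer can no longer be trusted.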

2020-02-15, 00:19  #1855
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2×3×13×41 Posts 
Quote:
Quote:
Last fiddled with by kriesel on 2020-02-15 at 00:20

2020-02-15, 06:33  #1856
"Mihai Preda"
Apr 2015
17×53 Posts 
I'm thinking of a way to use GEC with P-1 first stage, which would not be a waste if somebody is planning to continue with PRP on the same exponent (if a factor is not found).
The idea is to use right-to-left binary exponentiation, which can use the error check to a large degree. The residue thus computed can be saved and used to start the PRP from this point on. (Right now P-1 first stage uses left-to-right binary exponentiation, which is more efficient but can't use the error check.) https://en.wikipedia.org/wiki/Modular_exponentiation
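To make the two orderings concrete, here is a small Python sketch of both. It is not gpuOwl's implementation; the function names, the toy exponent and the small modulus are assumptions for illustration. The left-to-right form interleaves squarings with multiplications by the small fixed base (cheap when the base is 3), while the right-to-left form keeps a separate chain of pure squarings base^(2^i) and folds selected terms into the result with full modular multiplications. A plausible reading of the post is that this pure squaring chain is the part an error check can cover, and that its final value is a residue a PRP run could continue from.

```python
def pow_left_to_right(base, exp, n):
    """Scan exponent bits from the most significant end (assumes n > 1)."""
    result = 1
    for bit in bin(exp)[2:]:
        result = result * result % n       # squaring of a mixed residue
        if bit == "1":
            result = result * base % n     # multiply by the small fixed base
    return result


def pow_right_to_left(base, exp, n):
    """Scan exponent bits from the least significant end (assumes n > 1).

    Returns (base**exp % n, final value of the pure squaring chain).
    """
    result = 1
    square = base % n                      # invariant: base^(2^i) mod n
    while exp:
        if exp & 1:
            result = result * square % n   # full modular multiplication
        exp >>= 1
        square = square * square % n       # pure squaring chain
    return result, square


n = 2**61 - 1              # small Mersenne prime, for a fast demo only
E = 2**3 * 3**2 * 5 * 7    # toy stand-in for a stage 1 exponent
r, s = pow_right_to_left(3, E, n)
print(r == pow_left_to_right(3, E, n) == pow(3, E, n))
print(s == pow(3, 1 << E.bit_length(), n))
```

The extra full multiplications in the right-to-left form (one per set bit of the exponent) are what make it slower than the left-to-right form, which matches the efficiency remark in the post.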
2020-02-15, 07:33  #1857
P90 years forever!
Aug 2002
Yeehaw, FL
2^{4}·3·139 Posts 
@Ernst: The GCN timings doc is here: https://github.com/CLRX/CLRXmirror/wiki/GcnTimings
ROE error checking would be slower, but it would be a useful debugging option to sanity-check FFT length selections.
2020-02-15, 16:05  #1858
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
6176_{8} Posts 

2020-02-15, 17:25  #1859
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3198_{10} Posts 
Quote:
If the performance hit when applied to P-1 stage 1 is not too bad, it would be a tremendous advance, since P-1 is currently by its nature quite thin on error checks compared to primality testing.

Last fiddled with by kriesel on 2020-02-15 at 17:39
