20190116, 04:12  #1 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
139C_{16} Posts 
Developer's corner
This thread is intended for data and other aids for Mersenne prime finder software testing or development. These may also be useful in assessing hardware reliability.
Lists of 7smooth FFT lengths that are multiples of 2^{10}; http://www.mersenneforum.org/showpos...75&postcount=7 Concepts in GIMPS trial factoring, https://www.mersenneforum.org/showpo...23&postcount=6 Git basics https://www.mersenneforum.org/showpo...postcount=1076 Preda's thread on roundoff error distribution https://mersenneforum.org/showthread...615#post542615 This is a reference content thread. Please place any discussion, suggestion, or comment in the reference discussion thread, not in this thread. Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 20210328 at 16:45 
20190116, 04:26  #2 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2^{2}×5×251 Posts 
Mersenne prime exponent LL test worktodo lines (gpuowl <v0.7, CUDALucas and prime95 formats)
No AID, since these have already been found. Do not attempt to submit results for these entries to mersenne.org. It will not accept further results for known Mersenne primes. These are for test purposes only
Gpuowl V0.6 or lower format Code:
#gpuowl LL format Mersenne prime LL worktodo records # for gpu reliability validation or code modification testing # gpuowl LL minimum exponent for 4M fft length is 50000000, max is ~77000000. # other exponents produce error on worktodo read Test=0,57885161,69,1 Test=0,74207281,70,1 Test=0,77232917,70,1 Code:
#Gpuowl v6 or 7 format Mersenne prime PRP worktodo records # for gpu reliability validation or code modification testing # gpuowl minimum exponent is 200k, # smaller exponents produce error "FFT size too large" on worktodo read #PRP=N/A,1,2,2,1,1 #PRP=N/A,1,2,3,1,1 #PRP=N/A,1,2,5,1,2,1 #PRP=N/A,1,2,7,1,3,1 #PRP=N/A,1,2,13,1,6,1 #PRP=N/A,1,2,17,1,8,1 #PRP=N/A,1,2,19,1,9,1 #PRP=N/A,1,2,31,1,15,1 #PRP=N/A,1,2,61,1,30,1 #PRP=N/A,1,2,89,1,40,1 #PRP=N/A,1,2,107,1,40,1 #PRP=N/A,1,2,127,1,40,1 #PRP=N/A,1,2,521,1,40,1 #PRP=N/A,1,2,607,1,40,1 #PRP=N/A,1,2,1279,1,40,1 #PRP=N/A,1,2,2203,1,40,1 #PRP=N/A,1,2,2281,1,40,1 #PRP=N/A,1,2,3217,1,40,1 #PRP=N/A,1,2,4253,1,40,1 #PRP=N/A,1,2,4423,1,40,1 #PRP=N/A,1,2,9689,1,40,1 #PRP=N/A,1,2,9941,1,40,1 #PRP=N/A,1,2,11213,1,40,1 #PRP=N/A,1,2,19937,1,40,1 #PRP=N/A,1,2,21701,1,40,1 #PRP=N/A,1,2,23209,1,40,1 #PRP=N/A,1,2,44497,1,40,1 #PRP=N/A,1,2,86243,1,40,1 #PRP=N/A,1,2,110503,1,40,1 #PRP=N/A,1,2,132049,1,40,1 PRP=N/A,1,2,216091,1,40,1 PRP=N/A,1,2,756839,1,40,1 PRP=N/A,1,2,859433,1,40,1 PRP=N/A,1,2,1257787,1,56,1 PRP=N/A,1,2,1398269,1,56,1 PRP=N/A,1,2,2976221,1,60,1 PRP=N/A,1,2,3021377,1,60,1 PRP=N/A,1,2,6972593,1,63,1 PRP=N/A,1,2,13466917,1,64,1 PRP=N/A,1,2,20996011,1,65,1 PRP=N/A,1,2,24036583,1,66,1 PRP=N/A,1,2,25964951,1,66,1 PRP=N/A,1,2,30402457,1,67,1 PRP=N/A,1,2,32582657,1,67,1 PRP=N/A,1,2,37156667,1,67,1 PRP=N/A,1,2,42643801,1,68,1 PRP=N/A,1,2,43112609,1,68,1 PRP=N/A,1,2,57885161,1,69,1 PRP=N/A,1,2,74207281,1,70,1 PRP=N/A,1,2,77232917,1,70,1 PRP=N/A,1,2,82589933,1,71,1 Code:
#CUDALucas format Mersenne prime LL worktodo records # for gpu reliability validation or code modification testing # CUDALucas minimum exponent is 7500, # smaller exponents produce error "invalid data" on worktodo read #Test=2,1,1 #Test=3,1,1 #Test=5,2,1 #Test=7,3,1 #Test=13,6,1 #Test=17,8,1 #Test=19,9,1 #Test=31,15,1 #Test=61,30,1 #Test=89,40,1 #Test=107,40,1 #Test=127,40,1 #Test=521,40,1 #Test=607,40,1 #Test=1279,40,1 #Test=2203,40,1 #Test=2281,40,1 #Test=3217,40,1 #Test=4253,40,1 #Test=4423,40,1 Test=9689,40,1 Test=9941,40,1 Test=11213,40,1 Test=19937,40,1 Test=21701,40,1 Test=23209,40,1 Test=44497,40,1 Test=86243,40,1 Test=110503,40,1 Test=132049,40,1 Test=216091,40,1 Test=756839,40,1 Test=859433,40,1 Test=1257787,56,1 Test=1398269,56,1 Test=2976221,60,1 Test=3021377,60,1 Test=6972593,63,1 Test=13466917,64,1 Test=20996011,65,1 Test=24036583,66,1 Test=25964951,66,1 Test=30402457,67,1 Test=32582657,67,1 Test=37156667,67,1 Test=42643801,68,1 Test=43112609,68,1 Test=57885161,69,1 Test=74207281,70,1 Test=77232917,70,1 Test=82589933,71,1 Code:
;prime95 format Mersenne prime LL worktodo records ; for cpu reliability validation or code modification testing Test=N/A,2,1,1 Test=N/A,3,1,1 Test=N/A,5,2,1 Test=N/A,7,3,1 Test=N/A,13,6,1 Test=N/A,17,8,1 Test=N/A,19,9,1 Test=N/A,31,15,1 Test=N/A,61,30,1 Test=N/A,89,40,1 Test=N/A,107,40,1 Test=N/A,127,40,1 Test=N/A,521,40,1 Test=N/A,607,40,1 Test=N/A,1279,40,1 Test=N/A,2203,40,1 Test=N/A,2281,40,1 Test=N/A,3217,40,1 Test=N/A,4253,40,1 Test=N/A,4423,40,1 Test=N/A,9689,40,1 Test=N/A,9941,40,1 Test=N/A,11213,40,1 Test=N/A,19937,40,1 Test=N/A,21701,40,1 Test=N/A,23209,40,1 Test=N/A,44497,40,1 Test=N/A,86243,40,1 Test=N/A,110503,40,1 Test=N/A,132049,40,1 Test=N/A,216091,40,1 Test=N/A,756839,40,1 Test=N/A,859433,40,1 Test=N/A,1257787,56,1 Test=N/A,1398269,56,1 Test=N/A,2976221,60,1 Test=N/A,3021377,60,1 Test=N/A,6972593,63,1 Test=N/A,13466917,64,1 Test=N/A,20996011,65,1 Test=N/A,24036583,66,1 Test=N/A,25964951,66,1 Test=N/A,30402457,67,1 Test=N/A,32582657,67,1 Test=N/A,37156667,67,1 Test=N/A,42643801,68,1 Test=N/A,43112609,68,1 Test=N/A,57885161,69,1 Test=N/A,74207281,70,1 Test=N/A,77232917,70,1 Test=N/A,82589933,71,1 Last fiddled with by kriesel on 20201009 at 00:17 Reason: added recent version gpuowl format entries 
20190116, 04:46  #3 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2^{2}·5·251 Posts 
Mersenne prime exponent PRP and PRP1 worktodo lines (gpuowl format)
Code:
#draft qa suite for checking false negatives in PRP, and limits imposed by fft lengths available # or for gpu reliability validation or code modification testing PRP=0,1,2,2,1,1,1 PRP=0,1,2,3,1,1,1 PRP=0,1,2,5,1,2,1 PRP=0,1,2,7,1,3,1 PRP=0,1,2,13,1,6,1 PRP=0,1,2,17,1,8,1 PRP=0,1,2,19,1,9,1 PRP=0,1,2,31,1,15,1 PRP=0,1,2,61,1,30,1 PRP=0,1,2,89,1,40,1 PRP=0,1,2,107,1,40,1 PRP=0,1,2,127,1,40,1 PRP=0,1,2,521,1,40,1 PRP=0,1,2,607,1,40,1 PRP=0,1,2,1279,1,40,1 PRP=0,1,2,2203,1,40,1 PRP=0,1,2,2281,1,40,1 PRP=0,1,2,3217,1,40,1 PRP=0,1,2,4253,1,40,1 PRP=0,1,2,4423,1,40,1 PRP=0,1,2,9689,1,40,1 PRP=0,1,2,9941,1,40,1 PRP=0,1,2,11213,1,40,1 PRP=0,1,2,19937,1,40,1 PRP=0,1,2,21701,1,40,1 PRP=0,1,2,23209,1,40,1 PRP=0,1,2,44497,1,40,1 PRP=0,1,2,86243,1,40,1 PRP=0,1,2,110503,1,0,1 PRP=0,1,2,132049,1,40,1 PRP=0,1,2,216091,1,40,1 PRP=0,1,2,756839,1,40,1 PRP=0,1,2,859433,1,40,1 PRP=0,1,2,1257787,1,56,1 PRP=0,1,2,1398269,1,56,1 PRP=0,1,2,2976221,1,60,1 PRP=0,1,2,3021377,1,60,1 PRP=0,1,2,6972593,1,63,1 PRP=0,1,2,13466917,1,64,1 PRP=0,1,2,20996011,1,65,1 PRP=0,1,2,24036583,1,66,1 PRP=0,1,2,25964951,1,66,1 PRP=0,1,2,30402457,1,67,1 PRP=0,1,2,32582657,1,67,1 PRP=0,1,2,37156667,1,67,1 PRP=0,1,2,42643801,1,68,1 PRP=0,1,2,43112609,1,68,1 PRP=0,1,2,57885161,1,69,1 PRP=0,1,2,74207281,1,70,1 PRP=0,1,2,77232917,1,70,1 PRP=0,1,2,82589933,1,71,1 Code:
#draft qa suite for checking false negatives in PRP1 #bounds listed are greater of primenet or gpu72 indicated at mersenne.ca #tf are gpu72 target indicated at mersenne.ca #following block that are commented out are less than 1.5 bits/word at #gpuowl 128k fft length minimum #B1=1,B2=25;PRP=0,1,2,107,1,44,2 #B1=1,B2=25;PRP=0,1,2,127,1,44,2 #B1=6,B2=150;PRP=0,1,2,521,1,44,2 #B1=7,B2=175;PRP=0,1,2,607,1,44,2 #B1=14,B2=350;PRP=0,1,2,1279,1,44,2 #B1=25,B2=625;PRP=0,1,2,2203,1,44,2 #B1=26,B2=650;PRP=0,1,2,2281,1,44,2 #B1=26,B2=650;PRP=0,1,2,3217,1,44,2 #B1=48,B2=1200;PRP=0,1,2,4253,1,44,2 #B1=49,B2=1225;PRP=0,1,2,4423,1,44,2 #B1=108,B2=2700;PRP=0,1,2,9689,1,44,2 #B1=111,B2=2775;PRP=0,1,2,9941,1,44,2 #B1=125,B2=3125;PRP=0,1,2,11213,1,44,2 #B1=223,B2=5575;PRP=0,1,2,19937,1,44,2 #B1=243,B2=6075;PRP=0,1,2,21701,1,44,2 #B1=260,B2=6500;PRP=0,1,2,23209,1,44,2 #B1=498,B2=12450;PRP=0,1,2,44497,1,44,2 #B1=964,B2=24100;PRP=0,1,2,86243,1,44,2 #B1=10000,B2=60000;PRP=0,1,2,110503,1,44,2 #B1=10000,B2=70000;PRP=0,1,2,132049,1,44,2 #following are usable at 128k fft length B1=10000,B2=130000;PRP=0,1,2,216091,1,44,2 B1=20000,B2=500000;PRP=0,1,2,756839,1,44,2 B1=20000,B2=580000;PRP=0,1,2,859433,1,44,2 #following seem suitable for gpuowl v3.x5.0 supported fft lengths B1=20000,B2=240000;PRP=0,1,2,1257787,1,60,2 B1=20000,B2=300000;PRP=0,1,2,1398269,1,60,2 B1=40000,B2=560000;PRP=0,1,2,2976221,1,64,2 B1=40000,B2=560000;PRP=0,1,2,3021377,1,64,2 B1=80000,B2=1440000;PRP=0,1,2,6972593,1,67,2 B1=150000,B2=3150000;PRP=0,1,2,13466917,1,69,2 B1=260000,B2=6500000;PRP=0,1,2,20996011,1,69,2 B1=280000,B2=7000000;PRP=0,1,2,24036583,1,70,2 B1=310000,B2=7750000;PRP=0,1,2,25964951,1,70,2 B1=350000,B2=8750000;PRP=0,1,2,30402457,1,71,2 B1=380000,B2=9500000;PRP=0,1,2,32582657,1,71,2 B1=430000,B2=11610000;PRP=0,1,2,37156667,1,71,2 B1=480000,B2=12960000;PRP=0,1,2,42643801,1,72,2 B1=490000,B2=13230000;PRP=0,1,2,43112609,1,72,2 B1=650000,B2=17550000;PRP=0,1,2,57885161,1,73,2 B1=810000,B2=13860000;PRP=0,1,2,74207281,1,74,2 B1=840000,B2=23520000;PRP=0,1,2,77232917,1,74,2 B1=870000,B2=23490000;PRP=0,1,2,82589933,1,75,2 Last fiddled with by kriesel on 20191219 at 16:19 
20190117, 20:09  #4 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1001110011100_{2} Posts 
Interim 64bit residues for LL sequences of known Mersenne prime exponents
For comparison to the output of any new or revised software in development, the attachment contains interim res64 values for selected iteration numbers of LucasLehmer (seed 4) sequences for a wide range of known Mersenne primes. These can also be used to test hardware reliability and OS/driver/application/dlls combinations for correct function.
See also: https://www.mersenneforum.org/showpo...&postcount=179 p=82589933 interim residues list every 5M https://www.mersenneforum.org/showpo...&postcount=192 ATH log files from cudalucas and mlucas verification runs for 82589933 Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 20191119 at 16:04 
20190117, 20:09  #5 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2^{2}×5×251 Posts 
Interim 64bit residues for PRP3 sequences of known Mersenne prime exponents
For comparison to the output of any new or revised software in development, the attachment contains interim res64 values for selected iteration numbers of PRP3 sequences for a wide range of known Mersenne primes. Can also be used to test hardware reliability and OS/driver/application/dlls combinations for correct function.
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 20191119 at 16:04 
20190120, 21:06  #6 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2^{2}·5·251 Posts 
Seed value ten LL series final residues
The Lucas Lehmer series in GIMPS usage is standardized at seed value (S_{0}) 4. Other values, such as ten, are also usable. There's a reference tabulation for various Mersenne primes of the final iterations' residues for seed 10 at http://www.hoegge.dk/mersenne/penult...sultsS0=10.txt These other seed values are to be avoided, for normal GIMPS use, in my opinion, because the math makes interim residues not comparable. A list of seed values usable in Lucas Lehmer sequences is available at https://oeis.org/A018844
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 20191119 at 16:05 
20190313, 18:34  #7 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2^{2}×5×251 Posts 
How fast can we multiply two integers, or square an integer?
Central to the work at the current wavefronts of GIMPS, or higher exponent, is fast math with large numbers requiring many words to contain the large number of bits involved.
This applies to primality testing by LL or probable prime test, P1 factoring, and to a lesser extent, trial factoring. How fast, or rather, to make it relatively independent of computing hardware speed, how efficiently can we multiply two integers, or square a large integer? Most of the following addresses the general case of two different operands multiplied. It also mostly omits special cases such as one operand is a small power of two, which can often be accomplished quickly by shifts. George Woltman gives an overview with computing times for early GIMPS hardware in his Colloquium available on Youtube. Long multiplication Beginning with the simplest method, long multiplication, which is most efficient for small multiword integers, such as occur in typical trial factoring, we multiply each digit, and add those partial products together. We select a digit size that fits the computing hardware well. If programming an online calculator, where computations are short and frequent conversion to and from base 10 would be inefficient, one might select a power of ten as the digit size. Typically for a GIMPS application we will use a power of two, for efficiency, since many computations are done between base conversions. On modern computing hardware we might select 2^{32} or 2^{64} as the digit size. (Because 2^{32} is usually faster on gpus, mfaktc uses it. Mfaktc goes up only to 95bit factors.) For fourdigit width, it might look something like the following. Let the operands be abcd and efgh, with each letter representing the value of the respective large digit. Here a is the most significant digit, and d the least. Code:
a b c d x e f g h  ah bh ch dh ag bg cg dg af bf cf df ae be ce de  They are shown in this pattern for correspondence with the operand digits, but the alignment for summing is not correct above. Digit by digit, with alignment, the partial products are Code:
bhhi bhlo dhhi dhlo ahhi ahlo chhi chlo bghi bglo dghi dglo aghi aglo cghi cglo bfhi bflo dfhi dflo afhi aflo cfhi cflo behi belo dehi delo aehi aelo cehi celo  Generally, the effort goes up as the square of the operand length, for a fixed digit size. Squaring a number is less effort than general multiplication in this method. Code:
a b c d x a b c d  ad bd cd dd ac bc cc DC ab bb CB DB aa BA CA DA  In general, (n1)n/2 multiplications can be avoided in long squaring relative to long multiplication. I think of this as folding the square in half diagonally, and needing to compute products for the diagonal terms and only one of the two offdiagonal triangles. As n becomes large, this savings approaches half the digit multiplications. The additions must still occur. It may be more efficient to shift the partial product left by one bit and add once with carry than to do two adds. As the number of digits increases, the time to do multiplication or squaring increases rapidly. Above a certain point, which is generally hundreds of bits, it is faster to use a different, more complex method. Karatsuba The least complex of these alternate methods was discovered by Karatsuba in 1960 and published in 1962. http://www.ccas.ru/personal/karatsuba/divcen.pdf (Before then it was conjectured the lower bound was O(n^{2}).) It uses an algebraic identity to accomplish a 2ndigit product of ndigit operands in 3n^{2}/4 partial products rather than n^{2}, at the cost of some additional additions and subtractions. For sufficiently large operands, it is worthwhile to use the Karatsuba method recursively. Recursion reduces the order of multiplication from O(n^{2}) to O(n^{log2(3)})= O(n^{1.585}) https://en.wikipedia.org/wiki/Karatsuba_algorithm A rather readable description of long multiplication and Karatsuba algorithm is http://people.mpiinf.mpg.de/~mehlho...apter2Aen.pdf At values corresponding to the current GIMPS first or second primality test wavefront, n~85M or ~47M, or above, the ratio of number of multiplications is considerable (thousands). See the attachment. ToomCook There is a generalization of the Karatsuba approach, to splitting the operands into k pieces, where k=2 for Karatsuba. It was discovered by Toom in 1963, and refined by Cook in 1966. So it is now called ToomCook. For k=3, it is O(n^{1.465}). Due to higher overhead, it is efficient beginning at somewhat higher operand lengths than Karatsuba. It's also possible to split the two operands differently, to k and l pieces, as when one is significantly larger. SchönhageStrassen (FFT) An even faster (lower order) method is the SchönhageStrassen algorithm, developed in 1971. "The runtime bit complexity is, in Big O notation, O ( n ⋅ log n ⋅ log log n ) for two ndigit numbers. The algorithm uses recursive Fast Fourier transforms in rings with 2n+1 elements, a specific type of number theoretic transform." https://en.wikipedia.org/wiki/Sch%C3...ssen_algorithm Note that this algorithm requires zeropadding the operands. For sufficiently large operands, this is faster than ToomCook or other preceding methods of multiplication or squaring. There is a very helpful elaboration on this method early in the thread https://www.mersenneforum.org/showthread.php?t=120 and an Excel example at http://www.gweep.net/%7Eshifty/portf...eet/index.html Some recommended reading later in the thread reference https://books.google.com/books?isbn=3662005514 or http://cnx.org/content/col10550/latest/ Irrational Base Discrete Weighted Transform (IBDWT) An improvement on SchönhageStrassen is Crandall & Fagin's irrational base discrete weighted transform, published 1994. https://www.ams.org/journals/mcom/19...11852441.pdf https://en.wikipedia.org/wiki/Irrati...hted_transform Efficient implementations of this are the basis for the leading current GIMPS primality testing and P1 factoring software packages. The computations are performed modulo the Mersenne number being tested for primality, which is very quick in base two and limits doubling in size of the products, at each of the millions of iterations required. It's common that they are implemented in floating point and sacrifice some provablecorrectness of results for increased performance. In practice, the observed computation effort per iteration in the fastest available GIMPS software is O(n^{~1.1}) Various error checking techniques are employed to obtain acceptable reliability. This is to my knowledge the fastest currently known algorithm for operands of size relevant to primality testing or P1 factoring progress at the GIMPS wavefront and far above. Overall error rates per primality check, which are the result of errors due to hardware, software, numerical, and user effects, are discussed at https://www.mersenneforum.org/showpo...3&postcount=19 Fürer and following work Schönhage & Strassen conjectured a lower bound for multiplication of O(n log n). Fürer published in 2007 an algorithm with O(n log n 2^{O(log* n)}) where log* is the iteration logarithm (number of times logarithm is applied to reach a number less than or equal to one).https://en.wikipedia.org/wiki/F%C3%BCrer%27s_algorithm Subsequently, others have lowered or better defined the order, with modifications to the Fürer algorithm or by finding other algorithms; 2008 De et al (DKSS algorithm) https://dl.acm.org/citation.cfm?id=2...dl=ACM&coll=DL; 2014 Harvey, van der Hoeven and Lecerf (https://arxiv.org/abs/1407.3360); 2015 Covanov and Thome (modified Fürer); 2016 Covanov and Thome O(n log n 2^{2log* n}) via a generalization of Fermat primes https://arxiv.org/abs/1502.02800; 2018 Harvey and van der Hoeven O(n log n 2^{2log* n}) and does not depend on unproven conjectures https://arxiv.org/abs/1802.07932; None of these developments from 2007 onward have yet bested the IBDWT Crandall approach, in the range of number sizes of interest to the GIMPS effort, or even considerably larger. Work continues, to improve the proportionality constant or order. Combining techniques; symmetry The various techniques can be combined in extendedprecision math software libraries, and the technique fastest for a given operand size used. So far the description here has been in regard to symmetric multiplication (operands with essentially equal length). Long multiplication, ToomCook, and perhaps others support asymmetry. Asymmetry is not known to me to be of advantage in GIMPS primality testing, but it may be in P1 factoring. The switching point between algorithms depends on the processor type and implementation. It can get rather complicated, as shown in GMP's threshold table https://gmplib.org/devel/thres/ and the color diagrams near the bottom of https://gmplib.org/devel/ that are processorspecific. It's worth noting those diagrams cover l<7600 or less, where l is "limbs", or words of size 2^{k}, typically 32 or 64 bits, not the number of bits. https://gmplib.org/manual/Nomenclatu...tml#indexLimb The diagonals of those diagrams would represent the symmetric operand cases. On the diagonals, we see transitions among 5 algorithms (schoolbook = conventional long multiplication, Toom22 = Karatsuba, Toom33, Toom44, fft mod2^{k}+1) with increasing l, at differing l depending on processor model. In all cases shown, schoolbook appears fastest for sizes involved in TF (which is ~<96 bits or <3 32bit limbs), and FFT fastest for sizes above 5000 limbs, so presumably throughout GIMPS double checking wavefront and upwards beyond mersenne.org maximum (10^{9} bits) or Mlucas maximum (2^{32} bits). Practical implementation Great care is taken in implementations such as GMP, or GWNUM (underlying prime95 and mprime) and other packages to efficiently use real machine resources such as cache levels, register counts, and memory bandwidth. Tuning the code is processor model or model family specific, because the different models' designs present differing constraints and tradeoffs. Further reading Even faster integer multiplication thread https://www.mersenneforum.org/showthread.php?t=19486 Ernst Mayer's article in ODROID magazine https://magazine.odroid.com/article/...ticalhistory/ Please PM me with any suggestions or corrections for this post, or use the reference discussion thread at https://www.mersenneforum.org/showthread.php?t=23383 Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 20210215 at 23:06 Reason: added colloquium link sentence 
20190314, 00:45  #8 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2^{2}×5×251 Posts 
PRP residue types
Woltman describes and discusses PRP residue types 1 through 5 in https://www.mersenneforum.org/showpo...&postcount=209
Quoting in part: "Here is my PRP residue proposal: There are (at least) 5 PRP residue types: 1; Fermat PRP. N is PRP if a^(N1) = 1 mod N 2: SPRP variant, N is PRP if a^((N1)/2) = +/1 mod N 3: Fermat PRP variant. N is PRP if a^(N+1) = a^2 mod N 4: SPRP variant. N is PRP if a^((N+1)/2) = +/a mod N 5: Fermat PRP cofactor variant, N/d is PRP if a^(N1) = a^d mod N/d I encourage programs to return type 1 residues as that has been the standard for prime95, PFGW, LLR for many years." Note there is also a type 0, associated with gpuowl PRP1 runs (combined PRP and P1), involving computing powers of a base other than 3, mod Mp. https://www.mersenneforum.org/showpo...postcount=1005 https://www.mersenneforum.org/showpo...postcount=1006 Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 20191119 at 16:06 
20190422, 20:01  #9 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1001110011100_{2} Posts 
The Gerbicz error check for PRP, and on LL and P1 with restrictions
The Gerbicz error check (GEC) is a very reliable check on the correctness of blocks of PRP test iterations. While it is very good, it is not an absolute guarantee of no error. It's possible that the error check fails, in the presence of certain software bugs, or that some code outside the span that the GEC protects has an error.
So far, the GEC has been implemented in GIMPS software only in prime95 and mprime PRP since v29.4, gpuowl PRP since V0.7, and Mlucas PRP since v19. It's only currently applicable to PRP, not to LL, TF, or to conventional P1 factoring. (Applicability to conventional lefttoright exponentiation as in usual P1 stage 1 or in LL requires a known factor. See https://www.mersenneforum.org/showpo...&postcount=252) Recently Preda has posted that changing P1 stage 1 to righttoleft exponentiation would allow use of the GEC there. https://mersenneforum.org/showpost.p...postcount=1856 There's no single well organized thread as a repository for GEC knowledge, as far as I know. Some threads & other sources contain more than others. Here's what I was able to find. I think many of these are well cross linked, so, finding one, may lead a diligent reader to many of them. For gpuowl initial implementation see post 186 of http://www.mersenneforum.org/showthread.php?t=22204 and a considerable amount following. The prime95 whatsnew.txt indicates addition at v29.4: "3) PRP tests now support a type of low overhead error checking that almost guarantees correct results even on flaky hardware. We call this Gerbicz errorchecking after it was proposed by Robert Gerbicz at mersenneforum.org. This errorcheck only works for base2 numbers." Parts of the 29.4 thread describe it. https://www.mersenneforum.org/showthread.php?t=22683 Overhead for the GEC is given as 0.2% (and this is somewhat dependent on the choice of Gerbicz check block size). This very small overhead is worthwhile because of its high effectiveness at detecting errors and allowing immediate retreat to the last known good condition. https://www.mersenneforum.org/showpo...46&postcount=6 includes a section on Gerbicz the man, with links. https://www.mersenneforum.org/showpo...1&postcount=88 is his original post about the error check, in the thread Mihai had launched, and there's considerable discussion about it following. https://mersenneforum.org/showthread.php?t=22510 is background regarding its earlier applicability to Proth/Pepin tests, and a bit re P1. Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 20200224 at 22:06 Reason: updated for Mlucas PRP/GEC at V19, possible further use of GEC 
20190507, 21:12  #10  
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2^{2}×5×251 Posts 
The Jacobi symbol check for LL
The Jacobi symbol check for the LucasLehmer sequence was discovered by forum user "error" and posted 20170807 at https://mersenneforum.org/showpost.p...3&postcount=30
Its announcement led to the request in the forum for additional error checks, and rather directly to the Gerbicz check for PRP. Some programs use the Jacobi test periodically which detects LL computation errors at 50% probability. To my knowlege, only prime95/mprime and some of the early LLbased versions of gpuOwL incorporated the Jacobi test so far. Quote:
It appears the Jacobi check is applicable also to some portions of the P1 factoring computations, although its application there is more complex, so it is less likely to be fast enough to be useful. As far as I know, this has never been implemented in production GIMPS P1 code, until gpuowl V7. Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 20201022 at 22:53 Reason: gpuowl V7 use of Jacobi symbol for P1 

20190509, 00:28  #11 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2^{2}×5×251 Posts 
CUDA Toolkit compatibility vs CUDA level
CUDA toolkit levels and support, requirements for OS, driver, etc grid attached.
Note that a given version of Visual Studio can be configured to generate code for differing OS than the OS the VS runs on at the development system. (For example, XP compatible executables can be produced from a Windows 7 development system.) Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1 Last fiddled with by kriesel on 20191119 at 16:07 
Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Windows 10 SP1 will have UBUNTU developer support !!!!  tServo  Software  19  20160423 21:30 
Cheesehead's Corner?  jasong  jasong  6  20131016 20:09 
Intel Xeon Phi  Knights Corner  BotXXX  Hardware  16  20120621 23:54 
Debian developer needed...  Xyzzy  Linux  5  20060601 14:56 