mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   kriesel (https://www.mersenneforum.org/forumdisplay.php?f=154)
-   -   Developer's corner (https://www.mersenneforum.org/showthread.php?t=24003)

kriesel 2019-01-16 04:12

Developer's corner
 
This thread is intended for data and other aids for Mersenne prime finder software testing or development. These may also be useful in assessing hardware reliability.
[LIST=1][*] This post[*]Mersenne prime exponent LL worktodo lines (gpuowl <v0.7, CUDALucas and prime95 formats) [URL]https://www.mersenneforum.org/showpost.php?p=506080&postcount=2[/URL][*]Mersenne prime exponent PRP and PRP-1 worktodo lines (gpuOwL format) [URL]https://www.mersenneforum.org/showpost.php?p=506082&postcount=3[/URL][*]Interim 64-bit residues for LL sequences of known Mersenne prime exponents [URL]https://www.mersenneforum.org/showpost.php?p=506282&postcount=4[/URL][*]Interim 64-bit residues for PRP3 sequences of known Mersenne prime exponents [URL]https://www.mersenneforum.org/showpost.php?p=506283&postcount=5[/URL][*]Seed value ten LL series final residues [URL]https://www.mersenneforum.org/showpost.php?p=506512&postcount=6[/URL][*]How fast can we multiply 2 integers, or square an integer? [URL]https://www.mersenneforum.org/showpost.php?p=510721&postcount=7[/URL][*]PRP residue types [URL]https://www.mersenneforum.org/showpost.php?p=510732&postcount=8[/URL][*]The Gerbicz error check for PRP, and on LL and P-1 with restrictions [URL]https://www.mersenneforum.org/showpost.php?p=514417&postcount=9[/URL][*]The Jacobi check for LL [URL]https://www.mersenneforum.org/showpost.php?p=516081&postcount=10[/URL][*]CUDA Toolkit compatibility vs CUDA level [URL]https://www.mersenneforum.org/showpost.php?p=516181&postcount=11[/URL][*]Interim residues for Mersenne number large exponents [URL]https://www.mersenneforum.org/showpost.php?p=546384&postcount=12[/URL][*]LL with shift [URL]https://www.mersenneforum.org/showpost.php?p=572818&postcount=13[/URL][*]Challenges of large exponents [URL]https://www.mersenneforum.org/showpost.php?p=574636&postcount=14[/URL][*]Proof file format [url]https://www.mersenneforum.org/showpost.php?p=576063&postcount=15[/url][*]etc.[/LIST]See also, in other threads,
Lists of 7-smooth FFT lengths that are multiples of 2[SUP]10[/SUP]; [URL]http://www.mersenneforum.org/showpost.php?p=488475&postcount=7[/URL]
Concepts in GIMPS trial factoring, [URL]https://www.mersenneforum.org/showpost.php?p=508523&postcount=6[/URL]
Git basics [URL]https://www.mersenneforum.org/showpost.php?p=513605&postcount=1076[/URL]
Preda's thread on roundoff error distribution [URL]https://mersenneforum.org/showthread.php?p=542615#post542615[/URL]


This is a reference content thread. Please place any discussion, suggestion, or comment in the reference discussion thread, not in this thread.


Top of reference tree: [URL="https://www.mersenneforum.org/showpost.php?p=521922&postcount=1"]https://www.mersenneforum.org/showpo...22&postcount=1[/URL]

kriesel 2019-01-16 04:26

Mersenne prime exponent LL test worktodo lines (gpuowl <v0.7, CUDALucas and prime95 formats)
 
No AID, since these have already been found. Do not attempt to submit results for these entries to mersenne.org. It will not accept further results for known Mersenne primes. These are for test purposes only

Gpuowl V0.6 or lower format[CODE]#gpuowl LL format Mersenne prime LL worktodo records
# for gpu reliability validation or code modification testing
# gpuowl LL minimum exponent for 4M fft length is 50000000, max is ~77000000.
# other exponents produce error on worktodo read
Test=0,57885161,69,1
Test=0,74207281,70,1
Test=0,77232917,70,1[/CODE]Gpuowl recent version format[CODE]#Gpuowl v6 or 7 format Mersenne prime PRP worktodo records
# for gpu reliability validation or code modification testing
# gpuowl minimum exponent is 200k,
# smaller exponents produce error "FFT size too large" on worktodo read
#PRP=N/A,1,2,2,1,1
#PRP=N/A,1,2,3,1,1
#PRP=N/A,1,2,5,-1,2,1
#PRP=N/A,1,2,7,-1,3,1
#PRP=N/A,1,2,13,-1,6,1
#PRP=N/A,1,2,17,-1,8,1
#PRP=N/A,1,2,19,-1,9,1
#PRP=N/A,1,2,31,-1,15,1
#PRP=N/A,1,2,61,-1,30,1
#PRP=N/A,1,2,89,-1,40,1
#PRP=N/A,1,2,107,-1,40,1
#PRP=N/A,1,2,127,-1,40,1
#PRP=N/A,1,2,521,-1,40,1
#PRP=N/A,1,2,607,-1,40,1
#PRP=N/A,1,2,1279,-1,40,1
#PRP=N/A,1,2,2203,-1,40,1
#PRP=N/A,1,2,2281,-1,40,1
#PRP=N/A,1,2,3217,-1,40,1
#PRP=N/A,1,2,4253,-1,40,1
#PRP=N/A,1,2,4423,-1,40,1
#PRP=N/A,1,2,9689,-1,40,1
#PRP=N/A,1,2,9941,-1,40,1
#PRP=N/A,1,2,11213,-1,40,1
#PRP=N/A,1,2,19937,-1,40,1
#PRP=N/A,1,2,21701,-1,40,1
#PRP=N/A,1,2,23209,-1,40,1
#PRP=N/A,1,2,44497,-1,40,1
#PRP=N/A,1,2,86243,-1,40,1
#PRP=N/A,1,2,110503,-1,40,1
#PRP=N/A,1,2,132049,-1,40,1
PRP=N/A,1,2,216091,-1,40,1
PRP=N/A,1,2,756839,-1,40,1
PRP=N/A,1,2,859433,-1,40,1
PRP=N/A,1,2,1257787,-1,56,1
PRP=N/A,1,2,1398269,-1,56,1
PRP=N/A,1,2,2976221,-1,60,1
PRP=N/A,1,2,3021377,-1,60,1
PRP=N/A,1,2,6972593,-1,63,1
PRP=N/A,1,2,13466917,-1,64,1
PRP=N/A,1,2,20996011,-1,65,1
PRP=N/A,1,2,24036583,-1,66,1
PRP=N/A,1,2,25964951,-1,66,1
PRP=N/A,1,2,30402457,-1,67,1
PRP=N/A,1,2,32582657,-1,67,1
PRP=N/A,1,2,37156667,-1,67,1
PRP=N/A,1,2,42643801,-1,68,1
PRP=N/A,1,2,43112609,-1,68,1
PRP=N/A,1,2,57885161,-1,69,1
PRP=N/A,1,2,74207281,-1,70,1
PRP=N/A,1,2,77232917,-1,70,1
PRP=N/A,1,2,82589933,-1,71,1



[/CODE]CUDALucas format:[CODE]#CUDALucas format Mersenne prime LL worktodo records
# for gpu reliability validation or code modification testing
# CUDALucas minimum exponent is 7500,
# smaller exponents produce error "invalid data" on worktodo read
#Test=2,1,1
#Test=3,1,1
#Test=5,2,1
#Test=7,3,1
#Test=13,6,1
#Test=17,8,1
#Test=19,9,1
#Test=31,15,1
#Test=61,30,1
#Test=89,40,1
#Test=107,40,1
#Test=127,40,1
#Test=521,40,1
#Test=607,40,1
#Test=1279,40,1
#Test=2203,40,1
#Test=2281,40,1
#Test=3217,40,1
#Test=4253,40,1
#Test=4423,40,1
Test=9689,40,1
Test=9941,40,1
Test=11213,40,1
Test=19937,40,1
Test=21701,40,1
Test=23209,40,1
Test=44497,40,1
Test=86243,40,1
Test=110503,40,1
Test=132049,40,1
Test=216091,40,1
Test=756839,40,1
Test=859433,40,1
Test=1257787,56,1
Test=1398269,56,1
Test=2976221,60,1
Test=3021377,60,1
Test=6972593,63,1
Test=13466917,64,1
Test=20996011,65,1
Test=24036583,66,1
Test=25964951,66,1
Test=30402457,67,1
Test=32582657,67,1
Test=37156667,67,1
Test=42643801,68,1
Test=43112609,68,1
Test=57885161,69,1
Test=74207281,70,1
Test=77232917,70,1
Test=82589933,71,1[/CODE]Prime95 format follows
[CODE];prime95 format Mersenne prime LL worktodo records
; for cpu reliability validation or code modification testing
Test=N/A,2,1,1
Test=N/A,3,1,1
Test=N/A,5,2,1
Test=N/A,7,3,1
Test=N/A,13,6,1
Test=N/A,17,8,1
Test=N/A,19,9,1
Test=N/A,31,15,1
Test=N/A,61,30,1
Test=N/A,89,40,1
Test=N/A,107,40,1
Test=N/A,127,40,1
Test=N/A,521,40,1
Test=N/A,607,40,1
Test=N/A,1279,40,1
Test=N/A,2203,40,1
Test=N/A,2281,40,1
Test=N/A,3217,40,1
Test=N/A,4253,40,1
Test=N/A,4423,40,1
Test=N/A,9689,40,1
Test=N/A,9941,40,1
Test=N/A,11213,40,1
Test=N/A,19937,40,1
Test=N/A,21701,40,1
Test=N/A,23209,40,1
Test=N/A,44497,40,1
Test=N/A,86243,40,1
Test=N/A,110503,40,1
Test=N/A,132049,40,1
Test=N/A,216091,40,1
Test=N/A,756839,40,1
Test=N/A,859433,40,1
Test=N/A,1257787,56,1
Test=N/A,1398269,56,1
Test=N/A,2976221,60,1
Test=N/A,3021377,60,1
Test=N/A,6972593,63,1
Test=N/A,13466917,64,1
Test=N/A,20996011,65,1
Test=N/A,24036583,66,1
Test=N/A,25964951,66,1
Test=N/A,30402457,67,1
Test=N/A,32582657,67,1
Test=N/A,37156667,67,1
Test=N/A,42643801,68,1
Test=N/A,43112609,68,1
Test=N/A,57885161,69,1
Test=N/A,74207281,70,1
Test=N/A,77232917,70,1
Test=N/A,82589933,71,1[/CODE]Top of reference tree: [URL]https://www.mersenneforum.org/showpost.php?p=521922&postcount=1[/URL]

kriesel 2019-01-16 04:46

Mersenne prime exponent PRP and PRP-1 worktodo lines (gpuowl format)
 
[CODE]#draft qa suite for checking false negatives in PRP, and limits imposed by fft lengths available
# or for gpu reliability validation or code modification testing
PRP=0,1,2,2,-1,1,1
PRP=0,1,2,3,-1,1,1
PRP=0,1,2,5,-1,2,1
PRP=0,1,2,7,-1,3,1
PRP=0,1,2,13,-1,6,1
PRP=0,1,2,17,-1,8,1
PRP=0,1,2,19,-1,9,1
PRP=0,1,2,31,-1,15,1
PRP=0,1,2,61,-1,30,1
PRP=0,1,2,89,-1,40,1
PRP=0,1,2,107,-1,40,1
PRP=0,1,2,127,-1,40,1
PRP=0,1,2,521,-1,40,1
PRP=0,1,2,607,-1,40,1
PRP=0,1,2,1279,-1,40,1
PRP=0,1,2,2203,-1,40,1
PRP=0,1,2,2281,-1,40,1
PRP=0,1,2,3217,-1,40,1
PRP=0,1,2,4253,-1,40,1
PRP=0,1,2,4423,-1,40,1
PRP=0,1,2,9689,-1,40,1
PRP=0,1,2,9941,-1,40,1
PRP=0,1,2,11213,-1,40,1
PRP=0,1,2,19937,-1,40,1
PRP=0,1,2,21701,-1,40,1
PRP=0,1,2,23209,-1,40,1
PRP=0,1,2,44497,-1,40,1
PRP=0,1,2,86243,-1,40,1
PRP=0,1,2,110503,-1,0,1
PRP=0,1,2,132049,-1,40,1
PRP=0,1,2,216091,-1,40,1
PRP=0,1,2,756839,-1,40,1
PRP=0,1,2,859433,-1,40,1
PRP=0,1,2,1257787,-1,56,1
PRP=0,1,2,1398269,-1,56,1
PRP=0,1,2,2976221,-1,60,1
PRP=0,1,2,3021377,-1,60,1
PRP=0,1,2,6972593,-1,63,1
PRP=0,1,2,13466917,-1,64,1
PRP=0,1,2,20996011,-1,65,1
PRP=0,1,2,24036583,-1,66,1
PRP=0,1,2,25964951,-1,66,1
PRP=0,1,2,30402457,-1,67,1
PRP=0,1,2,32582657,-1,67,1
PRP=0,1,2,37156667,-1,67,1
PRP=0,1,2,42643801,-1,68,1
PRP=0,1,2,43112609,-1,68,1
PRP=0,1,2,57885161,-1,69,1
PRP=0,1,2,74207281,-1,70,1
PRP=0,1,2,77232917,-1,70,1
PRP=0,1,2,82589933,-1,71,1[/CODE][CODE]#draft qa suite for checking false negatives in PRP-1
#bounds listed are greater of primenet or gpu72 indicated at mersenne.ca
#tf are gpu72 target indicated at mersenne.ca

#following block that are commented out are less than 1.5 bits/word at
#gpuowl 128k fft length minimum
#B1=1,B2=25;PRP=0,1,2,107,-1,44,2
#B1=1,B2=25;PRP=0,1,2,127,-1,44,2
#B1=6,B2=150;PRP=0,1,2,521,-1,44,2
#B1=7,B2=175;PRP=0,1,2,607,-1,44,2
#B1=14,B2=350;PRP=0,1,2,1279,-1,44,2
#B1=25,B2=625;PRP=0,1,2,2203,-1,44,2
#B1=26,B2=650;PRP=0,1,2,2281,-1,44,2
#B1=26,B2=650;PRP=0,1,2,3217,-1,44,2
#B1=48,B2=1200;PRP=0,1,2,4253,-1,44,2
#B1=49,B2=1225;PRP=0,1,2,4423,-1,44,2
#B1=108,B2=2700;PRP=0,1,2,9689,-1,44,2
#B1=111,B2=2775;PRP=0,1,2,9941,-1,44,2
#B1=125,B2=3125;PRP=0,1,2,11213,-1,44,2
#B1=223,B2=5575;PRP=0,1,2,19937,-1,44,2
#B1=243,B2=6075;PRP=0,1,2,21701,-1,44,2
#B1=260,B2=6500;PRP=0,1,2,23209,-1,44,2
#B1=498,B2=12450;PRP=0,1,2,44497,-1,44,2
#B1=964,B2=24100;PRP=0,1,2,86243,-1,44,2
#B1=10000,B2=60000;PRP=0,1,2,110503,-1,44,2
#B1=10000,B2=70000;PRP=0,1,2,132049,-1,44,2

#following are usable at 128k fft length
B1=10000,B2=130000;PRP=0,1,2,216091,-1,44,2
B1=20000,B2=500000;PRP=0,1,2,756839,-1,44,2
B1=20000,B2=580000;PRP=0,1,2,859433,-1,44,2

#following seem suitable for gpuowl v3.x-5.0 supported fft lengths
B1=20000,B2=240000;PRP=0,1,2,1257787,-1,60,2
B1=20000,B2=300000;PRP=0,1,2,1398269,-1,60,2
B1=40000,B2=560000;PRP=0,1,2,2976221,-1,64,2
B1=40000,B2=560000;PRP=0,1,2,3021377,-1,64,2
B1=80000,B2=1440000;PRP=0,1,2,6972593,-1,67,2
B1=150000,B2=3150000;PRP=0,1,2,13466917,-1,69,2
B1=260000,B2=6500000;PRP=0,1,2,20996011,-1,69,2
B1=280000,B2=7000000;PRP=0,1,2,24036583,-1,70,2
B1=310000,B2=7750000;PRP=0,1,2,25964951,-1,70,2
B1=350000,B2=8750000;PRP=0,1,2,30402457,-1,71,2
B1=380000,B2=9500000;PRP=0,1,2,32582657,-1,71,2
B1=430000,B2=11610000;PRP=0,1,2,37156667,-1,71,2
B1=480000,B2=12960000;PRP=0,1,2,42643801,-1,72,2
B1=490000,B2=13230000;PRP=0,1,2,43112609,-1,72,2
B1=650000,B2=17550000;PRP=0,1,2,57885161,-1,73,2
B1=810000,B2=13860000;PRP=0,1,2,74207281,-1,74,2
B1=840000,B2=23520000;PRP=0,1,2,77232917,-1,74,2
B1=870000,B2=23490000;PRP=0,1,2,82589933,-1,75,2[/CODE]

Top of reference tree: [URL]https://www.mersenneforum.org/showpost.php?p=521922&postcount=1[/URL]

kriesel 2019-01-17 20:09

Interim 64-bit residues for LL sequences of known Mersenne prime exponents
 
1 Attachment(s)
For comparison to the output of any new or revised software in development, the attachment contains interim res64 values for selected iteration numbers of Lucas-Lehmer (seed 4) sequences for a wide range of known Mersenne primes. These can also be used to test hardware reliability and OS/driver/application/dlls combinations for correct function.

See also:
[URL]https://www.mersenneforum.org/showpost.php?p=503597&postcount=179[/URL]
p=82589933 interim residues list every 5M

[URL]https://www.mersenneforum.org/showpost.php?p=503633&postcount=192[/URL]
ATH log files from cudalucas and mlucas verification runs for 82589933


Top of reference tree: [url]https://www.mersenneforum.org/showpost.php?p=521922&postcount=1[/url]

kriesel 2019-01-17 20:09

Interim 64-bit residues for PRP3 sequences of known Mersenne prime exponents
 
1 Attachment(s)
For comparison to the output of any new or revised software in development, the attachment contains interim res64 values for selected iteration numbers of PRP3 sequences for a wide range of known Mersenne primes. Can also be used to test hardware reliability and OS/driver/application/dlls combinations for correct function.


Top of reference tree: [url]https://www.mersenneforum.org/showpost.php?p=521922&postcount=1[/url]

kriesel 2019-01-20 21:06

Seed value ten LL series final residues
 
The Lucas Lehmer series in GIMPS usage is standardized at seed value (S[SUB]0[/SUB]) 4. Other values, such as ten, are also usable. There's a reference tabulation for various Mersenne primes of the final iterations' residues for seed 10 at [URL]http://www.hoegge.dk/mersenne/penultimateresultsS0=10.txt[/URL] These other seed values are to be avoided, for normal GIMPS use, in my opinion, because the math makes interim residues not comparable. A list of seed values usable in Lucas Lehmer sequences is available at [URL]https://oeis.org/A018844[/URL]


Top of reference tree: [url]https://www.mersenneforum.org/showpost.php?p=521922&postcount=1[/url]

kriesel 2019-03-13 18:34

How fast can we multiply two integers, or square an integer?
 
1 Attachment(s)
Central to the work at the current wavefronts of GIMPS, or higher exponent, is fast math with large numbers requiring many words to contain the large number of bits involved.
This applies to primality testing by LL or probable prime test, P-1 factoring, and to a lesser extent, trial factoring.
How fast, or rather, to make it relatively independent of computing hardware speed, how efficiently can we multiply two integers, or square a large integer?

Most of the following addresses the general case of two different operands multiplied. It also mostly omits special cases such as one operand is a small power of two, which can often be accomplished quickly by shifts.
George Woltman gives an overview with computing times for early GIMPS hardware in his Colloquium available [URL="https://www.youtube.com/watch?v=K962u9gpARY"]on Youtube[/URL].

[B]Long multiplication[/B]
Beginning with the simplest method, long multiplication, which is most efficient for small multiword integers, such as occur in typical trial factoring, we multiply each digit, and add those partial products together. We select a digit size that fits the computing hardware well.
If programming an online calculator, where computations are short and frequent conversion to and from base 10 would be inefficient, one might select a power of ten as the digit size. Typically for a GIMPS application we will use a power of two, for efficiency, since many computations are done between base conversions. On modern computing hardware we might select 2[SUP]32[/SUP] or 2[SUP]64[/SUP] as the digit size. (Because 2[SUP]32[/SUP] is usually faster on gpus, mfaktc uses it. Mfaktc goes up only to 95-bit factors.) For four-digit width, it might look something like the following.
Let the operands be abcd and efgh, with each letter representing the value of the respective large digit.
Here a is the most significant digit, and d the least.
[CODE] a b c d
x e f g h
----------------
ah bh ch dh
ag bg cg dg
af bf cf df
ae be ce de
--------------------[/CODE]Each of those partial products may be two digits wide. There are 16 of them (4[SUP]2[/SUP]).
They are shown in this pattern for correspondence with the operand digits, but the alignment for summing is not correct above.
Digit by digit, with alignment, the partial products are
[CODE]
bhhi bhlo dhhi dhlo
ahhi ahlo chhi chlo
bghi bglo dghi dglo
aghi aglo cghi cglo
bfhi bflo dfhi dflo
afhi aflo cfhi cflo
behi belo dehi delo
aehi aelo cehi celo
---------------------------------------
[/CODE]There are also 24 single-digit additions, with carries.
Generally, the effort goes up as the square of the operand length, for a fixed digit size.

Squaring a number is less effort than general multiplication in this method.
[CODE]
a b c d
x a b c d
-----------------
ad bd cd dd
ac bc cc DC
ab bb CB DB
aa BA CA DA
-----------------
[/CODE]Since ad=da, ab=ba, ac=ca, cb=bc, db=bd, cd=dc, it's not necessary to compute the 6 partial products shown in upper case above.
In general, (n-1)n/2 multiplications can be avoided in long squaring relative to long multiplication. I think of this as folding the square in half diagonally, and needing to compute products for the diagonal terms and only one of the two off-diagonal triangles.
As n becomes large, this savings approaches half the digit multiplications. The additions must still occur.
It may be more efficient to shift the partial product left by one bit and add once with carry than to do two adds.

As the number of digits increases, the time to do multiplication or squaring increases rapidly.
Above a certain point, which is generally hundreds of bits, it is faster to use a different, more complex method.

[B]Karatsuba[/B]
The least complex of these alternate methods was discovered by Karatsuba in 1960 and published in 1962. [URL]http://www.ccas.ru/personal/karatsuba/divcen.pdf[/URL] (Before then it was conjectured the lower bound was O(n[SUP]2[/SUP]).)
It uses an algebraic identity to accomplish a 2n-digit product of n-digit operands in 3n[SUP]2[/SUP]/4 partial products rather than n[SUP]2[/SUP], at the cost of some additional additions and subtractions.
For sufficiently large operands, it is worthwhile to use the Karatsuba method recursively.
Recursion reduces the order of multiplication from O(n[SUP]2[/SUP]) to O(n[SUP]log2(3)[/SUP])= O(n[SUP]1.585[/SUP]) [URL]https://en.wikipedia.org/wiki/Karatsuba_algorithm[/URL]
A rather readable description of long multiplication and Karatsuba algorithm is [URL]http://people.mpi-inf.mpg.de/~mehlhorn/ftp/chapter2A-en.pdf[/URL]
At values corresponding to the current GIMPS first or second primality test wavefront, n~85M or ~47M, or above, the ratio of number of multiplications is considerable (thousands). See the attachment.

[B]Toom-Cook[/B]
There is a generalization of the Karatsuba approach, to splitting the operands into k pieces, where k=2 for Karatsuba.
It was discovered by Toom in 1963, and refined by Cook in 1966. So it is now called Toom-Cook.
For k=3, it is O(n[SUP]1.465[/SUP]). Due to higher overhead, it is efficient beginning at somewhat higher operand lengths than Karatsuba. It's also possible to split the two operands differently, to k and l pieces, as when one is significantly larger.

[B]Schönhage-Strassen (FFT)[/B]
An even faster (lower order) method is the Schönhage-Strassen algorithm, developed in 1971.
"The run-time bit complexity is, in Big O notation, O ( n ⋅ log ⁡ n ⋅ log ⁡ log ⁡ n ) for two n-digit numbers.
The algorithm uses recursive Fast Fourier transforms in rings with 2n+1 elements, a specific type of number theoretic transform."
[URL]https://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen_algorithm[/URL]
Note that this algorithm requires zero-padding the operands. For sufficiently large operands, this is faster than Toom-Cook or other preceding methods of multiplication or squaring.
There is a very helpful elaboration on this method early in the thread [URL]https://www.mersenneforum.org/showthread.php?t=120[/URL]
and an Excel example at [URL]http://www.gweep.net/%7Eshifty/portfolio/fftmulspreadsheet/index.html[/URL]

Some recommended reading later in the thread reference [URL]https://books.google.com/books?isbn=3662005514[/URL] or [URL]http://cnx.org/content/col10550/latest/[/URL]

[B]Irrational Base Discrete Weighted Transform (IBDWT)[/B]
An improvement on Schönhage-Strassen is Crandall & Fagin's irrational base discrete weighted transform, published 1994.
[URL]https://www.ams.org/journals/mcom/1994-62-205/S0025-5718-1994-1185244-1/S0025-5718-1994-1185244-1.pdf[/URL]
[URL]https://en.wikipedia.org/wiki/Irrational_base_discrete_weighted_transform[/URL]
Efficient implementations of this are the basis for the leading current GIMPS primality testing and P-1 factoring software packages.
The computations are performed modulo the Mersenne number being tested for primality, which is very quick in base two and limits doubling in size of the products, at each of the millions of iterations required.
It's common that they are implemented in floating point and sacrifice some provable-correctness of results for increased performance. In practice, the observed computation effort per iteration in the fastest available GIMPS software is O(n[SUP]~1.1[/SUP])
Various error checking techniques are employed to obtain acceptable reliability. This is to my knowledge the fastest currently known algorithm for operands of size relevant to primality testing or P-1 factoring progress at the GIMPS wavefront and far above.
Overall error rates per primality check, which are the result of errors due to hardware, software, numerical, and user effects, are discussed at
[URL]https://www.mersenneforum.org/showpost.php?p=509283&postcount=19[/URL]

[B]Fürer and following work[/B]
Schönhage & Strassen conjectured a lower bound for multiplication of O(n log n).
Fürer published in 2007 an algorithm with O(n log n 2[SUP]O(log* n)[/SUP]) where log* is the iteration logarithm (number of times logarithm is applied to reach a number less than or equal to one).[URL]https://en.wikipedia.org/wiki/F%C3%BCrer%27s_algorithm[/URL]
Subsequently, others have lowered or better defined the order, with modifications to the Fürer algorithm or by finding other algorithms;
2008 De et al (DKSS algorithm) [URL]https://dl.acm.org/citation.cfm?id=2756643&dl=ACM&coll=DL[/URL];
2014 Harvey, van der Hoeven and Lecerf ([URL="https://arxiv.org/abs/1407.3360);"]https://arxiv.org/abs/1407.3360[/URL]);
2015 Covanov and Thome (modified Fürer);
2016 Covanov and Thome O(n log n 2[SUP]2log* n[/SUP]) via a generalization of Fermat primes [URL]https://arxiv.org/abs/1502.02800;[/URL]
2018 Harvey and van der Hoeven O(n log n 2[SUP]2log* n[/SUP]) and does not depend on unproven conjectures [URL="https://arxiv.org/abs/1802.07932;"]https://arxiv.org/abs/1802.07932[/URL];
2019 Harvey and van der Hoeven O(n log n) "We present an algorithm that computes the product of two n-bit integers in O(n log n) bit operations, thus confirming a conjecture of Schönhage and Strassen from 1971. Our complexity analysis takes place in the multitape Turing machine model, with integers encoded in the usual binary representation. Central to the new algorithm is a novel “Gaussian resampling” technique that enables us to reduce the integer multiplication problem to a collection of multidimensional discrete Fourier transforms over the complex numbers, whose dimensions are all powers of two. These transforms may then be evaluated rapidly by means of Nussbaumer’s fast polynomial transforms." [URL]https://hal.archives-ouvertes.fr/hal-02070778[/URL]; [URL]https://www.quantamagazine.org/mathematicians-discover-the-perfect-way-to-multiply-20190411/[/URL]

None of these developments from 2007 onward have yet bested the IBDWT Crandall approach, in the range of number sizes of interest to the GIMPS effort, or even considerably larger.

[B]Combining techniques; symmetry[/B]
The various techniques can be combined in extended-precision math software libraries, and the technique fastest for a given operand size used.
So far the description here has been in regard to symmetric multiplication (operands with essentially equal length). Long multiplication, Toom-Cook, and perhaps others support asymmetry. Asymmetry is not known to me to be of advantage in GIMPS primality testing, but it may be in P-1 factoring. The switching point between algorithms depends on the processor type and implementation. It can get rather complicated, as shown in GMP's threshold table [URL]https://gmplib.org/devel/thres/[/URL] and the color diagrams near the bottom of [URL]https://gmplib.org/devel/[/URL] that are processor-specific. It's worth noting those diagrams cover l<7600 or less, where l is "limbs", or words of size 2[SUP]k[/SUP], typically 32 or 64 bits, not the number of bits. [URL]https://gmplib.org/manual/Nomenclature-and-Types.html#index-Limb[/URL]
The diagonals of those diagrams would represent the symmetric operand cases. On the diagonals, we see transitions among 5 algorithms (schoolbook = conventional long multiplication, Toom22 = Karatsuba, Toom33, Toom44, fft mod2[SUP]k[/SUP]+1) with increasing l, at differing l depending on processor model. In all cases shown, schoolbook appears fastest for sizes involved in TF (which is ~<96 bits or <3 32-bit limbs), and FFT fastest for sizes above 5000 limbs, so presumably throughout GIMPS double checking wavefront and upwards beyond mersenne.org maximum (10[SUP]9[/SUP] bits) or Mlucas maximum (2[SUP]32[/SUP] bits).

[B]Practical implementation[/B]
Great care is taken in implementations such as GMP, or GWNUM (underlying prime95 and mprime) and other packages to efficiently use real machine resources such as cache levels, register counts, and memory bandwidth. Tuning the code is processor model or model family specific, because the different models' designs present differing constraints and tradeoffs.

[B]Further reading[/B]
Even faster integer multiplication thread [URL]https://www.mersenneforum.org/showthread.php?t=19486[/URL]
Ernst Mayer's article in ODROID magazine [URL]https://magazine.odroid.com/article/prime-number-discovery-use-odroid-c2-make-mathematical-history/[/URL]

Please PM me with any suggestions or corrections for this post, or use the reference discussion thread at [URL]https://www.mersenneforum.org/showthread.php?t=23383[/URL]


Top of reference tree: [URL]https://www.mersenneforum.org/showpost.php?p=521922&postcount=1[/URL]

kriesel 2019-03-14 00:45

PRP residue types
 
Woltman describes and discusses PRP residue types 1 through 5 in [URL]https://www.mersenneforum.org/showpost.php?p=468378&postcount=209[/URL]
Quoting in part: "Here is my PRP residue proposal:

There are (at least) 5 PRP residue types:
1; Fermat PRP. N is PRP if a^(N-1) = 1 mod N
2: SPRP variant, N is PRP if a^((N-1)/2) = +/-1 mod N
3: Fermat PRP variant. N is PRP if a^(N+1) = a^2 mod N
4: SPRP variant. N is PRP if a^((N+1)/2) = +/-a mod N
5: Fermat PRP cofactor variant, N/d is PRP if a^(N-1) = a^d mod N/d

I encourage programs to return type 1 residues as that has been the standard for prime95, PFGW, LLR for many years."

Note there is also a type 0, associated with gpuowl PRP-1 runs (combined PRP and P-1), involving computing powers of a base other than 3, mod Mp. [URL]https://www.mersenneforum.org/showpost.php?p=508299&postcount=1005[/URL]
[URL]https://www.mersenneforum.org/showpost.php?p=508318&postcount=1006[/URL]


Top of reference tree: [url]https://www.mersenneforum.org/showpost.php?p=521922&postcount=1[/url]

kriesel 2019-04-22 20:01

The Gerbicz error check for PRP, and on LL and P-1 with restrictions
 
The Gerbicz error check (GEC) is a very reliable check on the correctness of blocks of PRP test iterations. While it is very good, it is not an absolute guarantee of no error. It's possible that the error check fails, in the presence of certain software bugs, or that some code outside the span that the GEC protects has an error.

So far, the GEC has been implemented in GIMPS software only in prime95 and mprime PRP since v29.4, gpuowl PRP since V0.7, and Mlucas PRP since v19.
It's only currently applicable to PRP, not to LL, TF, or to conventional P-1 factoring. (Applicability to conventional left-to-right exponentiation as in usual P-1 stage 1 or in LL requires a known factor. See [URL]https://www.mersenneforum.org/showpost.php?p=470624&postcount=252[/URL]) Recently Preda has posted that changing P-1 stage 1 to right-to-left exponentiation would allow use of the GEC there. [URL]https://mersenneforum.org/showpost.php?p=537628&postcount=1856[/URL]

There's no single well organized thread as a repository for GEC knowledge, as far as I know.
Some threads & other sources contain more than others. Here's what I was able to find. I think many of these are well cross linked, so, finding one, may lead a diligent reader to many of them.

For gpuowl initial implementation see post 186 of [URL]http://www.mersenneforum.org/showthread.php?t=22204[/URL] and a considerable amount following.

The prime95 whatsnew.txt indicates addition at v29.4:
"3) PRP tests now support a type of low overhead error checking that almost guarantees correct results even on flaky hardware. We call this Gerbicz error-checking after it was proposed by Robert Gerbicz at mersenneforum.org. This error-check only works for base-2 numbers."
Parts of the 29.4 thread describe it. [URL]https://www.mersenneforum.org/showthread.php?t=22683[/URL]
Overhead for the GEC is given as 0.2% (and this is somewhat dependent on the choice of Gerbicz check block size). This very small overhead is worthwhile because of its high effectiveness at detecting errors and allowing immediate retreat to the last known good condition.

[URL]https://www.mersenneforum.org/showpost.php?p=504146&postcount=6[/URL] includes a section on Gerbicz the man, with links.

[URL]https://www.mersenneforum.org/showpost.php?p=465431&postcount=88[/URL] is his original post about the error check, in the thread Mihai had launched, and there's considerable discussion about it following.

[URL]https://mersenneforum.org/showthread.php?t=22510[/URL] is background regarding its earlier applicability to Proth/Pepin tests, and a bit re P-1.


Top of reference tree: [URL]https://www.mersenneforum.org/showpost.php?p=521922&postcount=1[/URL]

kriesel 2019-05-07 21:12

The Jacobi symbol check for LL
 
The Jacobi symbol check for the Lucas-Lehmer sequence was discovered by forum user "error" and posted 2017-08-07 at [URL]https://mersenneforum.org/showpost.php?p=465033&postcount=30[/URL]
Its announcement led to the request in the forum for additional error checks, and rather directly to the Gerbicz check for PRP.

Some programs use the Jacobi test periodically which detects LL computation errors at 50% probability. To my knowlege, only prime95/mprime and some of the early LL-based versions of gpuOwL incorporated the Jacobi test so far.
[QUOTE]All valid LL-residues for Mp will have (Res-2/Mp) = -1
If you make a random error at some iteration, you have 50% chance of getting all subsequent residues with (Res-2/Mp) = 1
This condition can be checked at any suitable interval, since it won't change back again, if no further errors occur.[/QUOTE]Computing the Jacobi symbol is a lengthy computation, so it is only done rarely (at intervals of 12 hours by default in prime95).

It appears the Jacobi check is applicable also to some portions of the P-1 factoring computations, although its application there is more complex, so it is less likely to be fast enough to be useful. As far as I know, this has never been implemented in production GIMPS P-1 code, until gpuowl V7.


Top of reference tree: [URL]https://www.mersenneforum.org/showpost.php?p=521922&postcount=1[/URL]

kriesel 2019-05-09 00:28

CUDA Toolkit compatibility vs CUDA level
 
1 Attachment(s)
CUDA toolkit levels and support, requirements for OS, driver, etc grid attached.
Note that a given version of Visual Studio can be configured to generate code for differing OS than the OS the VS runs on at the development system. (For example, XP compatible executables can be produced from a Windows 7 development system, in some compatible development environments.)


Top of reference tree: [URL]https://www.mersenneforum.org/showpost.php?p=521922&postcount=1[/URL]


All times are UTC. The time now is 11:31.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.