mersenneforum.org  

Go Back   mersenneforum.org > Extra Stuff > Blogorrhea > kriesel

Closed Thread
 
Thread Tools
Old 2019-01-16, 04:12   #1
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

139C16 Posts
Default Developer's corner

This thread is intended for data and other aids for Mersenne prime finder software testing or development. These may also be useful in assessing hardware reliability.
  1. This post
  2. Mersenne prime exponent LL worktodo lines (gpuowl <v0.7, CUDALucas and prime95 formats) https://www.mersenneforum.org/showpo...80&postcount=2
  3. Mersenne prime exponent PRP and PRP-1 worktodo lines (gpuOwL format) https://www.mersenneforum.org/showpo...82&postcount=3
  4. Interim 64-bit residues for LL sequences of known Mersenne prime exponents https://www.mersenneforum.org/showpo...82&postcount=4
  5. Interim 64-bit residues for PRP3 sequences of known Mersenne prime exponents https://www.mersenneforum.org/showpo...83&postcount=5
  6. Seed value ten LL series final residues https://www.mersenneforum.org/showpo...12&postcount=6
  7. How fast can we multiply 2 integers, or square an integer? https://www.mersenneforum.org/showpo...21&postcount=7
  8. PRP residue types https://www.mersenneforum.org/showpo...32&postcount=8
  9. The Gerbicz error check for PRP, and on LL and P-1 with restrictions https://www.mersenneforum.org/showpo...17&postcount=9
  10. The Jacobi check for LL https://www.mersenneforum.org/showpo...1&postcount=10
  11. CUDA Toolkit compatibility vs CUDA level https://www.mersenneforum.org/showpo...1&postcount=11
  12. Interim residues for Mersenne number large exponents https://www.mersenneforum.org/showpo...4&postcount=12
  13. LL with shift https://www.mersenneforum.org/showpo...8&postcount=13
  14. Challenges of large exponents https://www.mersenneforum.org/showpo...6&postcount=14
  15. etc.
See also, in other threads,
Lists of 7-smooth FFT lengths that are multiples of 210; http://www.mersenneforum.org/showpos...75&postcount=7
Concepts in GIMPS trial factoring, https://www.mersenneforum.org/showpo...23&postcount=6
Git basics https://www.mersenneforum.org/showpo...postcount=1076
Preda's thread on roundoff error distribution https://mersenneforum.org/showthread...615#post542615


This is a reference content thread. Please place any discussion, suggestion, or comment in the reference discussion thread, not in this thread.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-03-28 at 16:45
kriesel is offline  
Old 2019-01-16, 04:26   #2
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

22×5×251 Posts
Default Mersenne prime exponent LL test worktodo lines (gpuowl <v0.7, CUDALucas and prime95 formats)

No AID, since these have already been found. Do not attempt to submit results for these entries to mersenne.org. It will not accept further results for known Mersenne primes. These are for test purposes only

Gpuowl V0.6 or lower format
Code:
#gpuowl LL format Mersenne prime LL worktodo records 
# for gpu reliability validation or code modification testing
# gpuowl LL minimum exponent for 4M fft length is 50000000, max is ~77000000.
# other exponents produce error on worktodo read
Test=0,57885161,69,1
Test=0,74207281,70,1
Test=0,77232917,70,1
Gpuowl recent version format
Code:
#Gpuowl v6 or 7 format Mersenne prime PRP worktodo records 
# for gpu reliability validation or code modification testing
# gpuowl minimum exponent is 200k, 
# smaller exponents produce error "FFT size too large" on worktodo read
#PRP=N/A,1,2,2,1,1
#PRP=N/A,1,2,3,1,1
#PRP=N/A,1,2,5,-1,2,1
#PRP=N/A,1,2,7,-1,3,1
#PRP=N/A,1,2,13,-1,6,1
#PRP=N/A,1,2,17,-1,8,1
#PRP=N/A,1,2,19,-1,9,1
#PRP=N/A,1,2,31,-1,15,1
#PRP=N/A,1,2,61,-1,30,1
#PRP=N/A,1,2,89,-1,40,1
#PRP=N/A,1,2,107,-1,40,1
#PRP=N/A,1,2,127,-1,40,1
#PRP=N/A,1,2,521,-1,40,1
#PRP=N/A,1,2,607,-1,40,1
#PRP=N/A,1,2,1279,-1,40,1
#PRP=N/A,1,2,2203,-1,40,1
#PRP=N/A,1,2,2281,-1,40,1
#PRP=N/A,1,2,3217,-1,40,1
#PRP=N/A,1,2,4253,-1,40,1
#PRP=N/A,1,2,4423,-1,40,1
#PRP=N/A,1,2,9689,-1,40,1
#PRP=N/A,1,2,9941,-1,40,1
#PRP=N/A,1,2,11213,-1,40,1
#PRP=N/A,1,2,19937,-1,40,1
#PRP=N/A,1,2,21701,-1,40,1
#PRP=N/A,1,2,23209,-1,40,1
#PRP=N/A,1,2,44497,-1,40,1
#PRP=N/A,1,2,86243,-1,40,1
#PRP=N/A,1,2,110503,-1,40,1
#PRP=N/A,1,2,132049,-1,40,1
PRP=N/A,1,2,216091,-1,40,1
PRP=N/A,1,2,756839,-1,40,1
PRP=N/A,1,2,859433,-1,40,1
PRP=N/A,1,2,1257787,-1,56,1
PRP=N/A,1,2,1398269,-1,56,1
PRP=N/A,1,2,2976221,-1,60,1
PRP=N/A,1,2,3021377,-1,60,1
PRP=N/A,1,2,6972593,-1,63,1
PRP=N/A,1,2,13466917,-1,64,1
PRP=N/A,1,2,20996011,-1,65,1
PRP=N/A,1,2,24036583,-1,66,1
PRP=N/A,1,2,25964951,-1,66,1
PRP=N/A,1,2,30402457,-1,67,1
PRP=N/A,1,2,32582657,-1,67,1
PRP=N/A,1,2,37156667,-1,67,1
PRP=N/A,1,2,42643801,-1,68,1
PRP=N/A,1,2,43112609,-1,68,1
PRP=N/A,1,2,57885161,-1,69,1
PRP=N/A,1,2,74207281,-1,70,1
PRP=N/A,1,2,77232917,-1,70,1
PRP=N/A,1,2,82589933,-1,71,1
CUDALucas format:
Code:
#CUDALucas format Mersenne prime LL worktodo records 
# for gpu reliability validation or code modification testing
# CUDALucas minimum exponent is 7500, 
# smaller exponents produce error "invalid data" on worktodo read
#Test=2,1,1
#Test=3,1,1
#Test=5,2,1
#Test=7,3,1
#Test=13,6,1
#Test=17,8,1
#Test=19,9,1
#Test=31,15,1
#Test=61,30,1
#Test=89,40,1
#Test=107,40,1
#Test=127,40,1
#Test=521,40,1
#Test=607,40,1
#Test=1279,40,1
#Test=2203,40,1
#Test=2281,40,1
#Test=3217,40,1
#Test=4253,40,1
#Test=4423,40,1
Test=9689,40,1
Test=9941,40,1
Test=11213,40,1
Test=19937,40,1
Test=21701,40,1
Test=23209,40,1
Test=44497,40,1
Test=86243,40,1
Test=110503,40,1
Test=132049,40,1
Test=216091,40,1
Test=756839,40,1
Test=859433,40,1
Test=1257787,56,1
Test=1398269,56,1
Test=2976221,60,1
Test=3021377,60,1
Test=6972593,63,1
Test=13466917,64,1
Test=20996011,65,1
Test=24036583,66,1
Test=25964951,66,1
Test=30402457,67,1
Test=32582657,67,1
Test=37156667,67,1
Test=42643801,68,1
Test=43112609,68,1
Test=57885161,69,1
Test=74207281,70,1
Test=77232917,70,1
Test=82589933,71,1
Prime95 format follows
Code:
;prime95 format Mersenne prime LL worktodo records
; for cpu reliability validation or code modification testing
Test=N/A,2,1,1
Test=N/A,3,1,1
Test=N/A,5,2,1
Test=N/A,7,3,1
Test=N/A,13,6,1
Test=N/A,17,8,1
Test=N/A,19,9,1
Test=N/A,31,15,1
Test=N/A,61,30,1
Test=N/A,89,40,1
Test=N/A,107,40,1
Test=N/A,127,40,1
Test=N/A,521,40,1
Test=N/A,607,40,1
Test=N/A,1279,40,1
Test=N/A,2203,40,1
Test=N/A,2281,40,1
Test=N/A,3217,40,1
Test=N/A,4253,40,1
Test=N/A,4423,40,1
Test=N/A,9689,40,1
Test=N/A,9941,40,1
Test=N/A,11213,40,1
Test=N/A,19937,40,1
Test=N/A,21701,40,1
Test=N/A,23209,40,1
Test=N/A,44497,40,1
Test=N/A,86243,40,1
Test=N/A,110503,40,1
Test=N/A,132049,40,1
Test=N/A,216091,40,1
Test=N/A,756839,40,1
Test=N/A,859433,40,1
Test=N/A,1257787,56,1
Test=N/A,1398269,56,1
Test=N/A,2976221,60,1
Test=N/A,3021377,60,1
Test=N/A,6972593,63,1
Test=N/A,13466917,64,1
Test=N/A,20996011,65,1
Test=N/A,24036583,66,1
Test=N/A,25964951,66,1
Test=N/A,30402457,67,1
Test=N/A,32582657,67,1
Test=N/A,37156667,67,1
Test=N/A,42643801,68,1
Test=N/A,43112609,68,1
Test=N/A,57885161,69,1
Test=N/A,74207281,70,1
Test=N/A,77232917,70,1
Test=N/A,82589933,71,1
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2020-10-09 at 00:17 Reason: added recent version gpuowl format entries
kriesel is offline  
Old 2019-01-16, 04:46   #3
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

22·5·251 Posts
Default Mersenne prime exponent PRP and PRP-1 worktodo lines (gpuowl format)

Code:
#draft qa suite for checking false negatives in PRP, and limits imposed by fft lengths available
# or for gpu reliability validation or code modification testing
PRP=0,1,2,2,-1,1,1
PRP=0,1,2,3,-1,1,1
PRP=0,1,2,5,-1,2,1
PRP=0,1,2,7,-1,3,1
PRP=0,1,2,13,-1,6,1
PRP=0,1,2,17,-1,8,1
PRP=0,1,2,19,-1,9,1
PRP=0,1,2,31,-1,15,1
PRP=0,1,2,61,-1,30,1
PRP=0,1,2,89,-1,40,1
PRP=0,1,2,107,-1,40,1
PRP=0,1,2,127,-1,40,1
PRP=0,1,2,521,-1,40,1
PRP=0,1,2,607,-1,40,1
PRP=0,1,2,1279,-1,40,1
PRP=0,1,2,2203,-1,40,1
PRP=0,1,2,2281,-1,40,1
PRP=0,1,2,3217,-1,40,1
PRP=0,1,2,4253,-1,40,1
PRP=0,1,2,4423,-1,40,1
PRP=0,1,2,9689,-1,40,1
PRP=0,1,2,9941,-1,40,1
PRP=0,1,2,11213,-1,40,1
PRP=0,1,2,19937,-1,40,1
PRP=0,1,2,21701,-1,40,1
PRP=0,1,2,23209,-1,40,1
PRP=0,1,2,44497,-1,40,1
PRP=0,1,2,86243,-1,40,1
PRP=0,1,2,110503,-1,0,1
PRP=0,1,2,132049,-1,40,1
PRP=0,1,2,216091,-1,40,1
PRP=0,1,2,756839,-1,40,1
PRP=0,1,2,859433,-1,40,1
PRP=0,1,2,1257787,-1,56,1
PRP=0,1,2,1398269,-1,56,1
PRP=0,1,2,2976221,-1,60,1
PRP=0,1,2,3021377,-1,60,1
PRP=0,1,2,6972593,-1,63,1
PRP=0,1,2,13466917,-1,64,1
PRP=0,1,2,20996011,-1,65,1
PRP=0,1,2,24036583,-1,66,1
PRP=0,1,2,25964951,-1,66,1
PRP=0,1,2,30402457,-1,67,1
PRP=0,1,2,32582657,-1,67,1
PRP=0,1,2,37156667,-1,67,1
PRP=0,1,2,42643801,-1,68,1
PRP=0,1,2,43112609,-1,68,1
PRP=0,1,2,57885161,-1,69,1
PRP=0,1,2,74207281,-1,70,1
PRP=0,1,2,77232917,-1,70,1
PRP=0,1,2,82589933,-1,71,1
Code:
#draft qa suite for checking false negatives in PRP-1
 #bounds listed are greater of primenet or gpu72 indicated at mersenne.ca
#tf are gpu72 target indicated at mersenne.ca

#following block that are commented out are less than 1.5 bits/word at 
#gpuowl 128k fft length minimum 
#B1=1,B2=25;PRP=0,1,2,107,-1,44,2
#B1=1,B2=25;PRP=0,1,2,127,-1,44,2
#B1=6,B2=150;PRP=0,1,2,521,-1,44,2
#B1=7,B2=175;PRP=0,1,2,607,-1,44,2
#B1=14,B2=350;PRP=0,1,2,1279,-1,44,2
#B1=25,B2=625;PRP=0,1,2,2203,-1,44,2
#B1=26,B2=650;PRP=0,1,2,2281,-1,44,2
#B1=26,B2=650;PRP=0,1,2,3217,-1,44,2
#B1=48,B2=1200;PRP=0,1,2,4253,-1,44,2
#B1=49,B2=1225;PRP=0,1,2,4423,-1,44,2
#B1=108,B2=2700;PRP=0,1,2,9689,-1,44,2
#B1=111,B2=2775;PRP=0,1,2,9941,-1,44,2
#B1=125,B2=3125;PRP=0,1,2,11213,-1,44,2
#B1=223,B2=5575;PRP=0,1,2,19937,-1,44,2
#B1=243,B2=6075;PRP=0,1,2,21701,-1,44,2
#B1=260,B2=6500;PRP=0,1,2,23209,-1,44,2
#B1=498,B2=12450;PRP=0,1,2,44497,-1,44,2
#B1=964,B2=24100;PRP=0,1,2,86243,-1,44,2
#B1=10000,B2=60000;PRP=0,1,2,110503,-1,44,2
#B1=10000,B2=70000;PRP=0,1,2,132049,-1,44,2

#following are usable at 128k fft length 
B1=10000,B2=130000;PRP=0,1,2,216091,-1,44,2
B1=20000,B2=500000;PRP=0,1,2,756839,-1,44,2
B1=20000,B2=580000;PRP=0,1,2,859433,-1,44,2

#following seem suitable for gpuowl v3.x-5.0 supported fft lengths
B1=20000,B2=240000;PRP=0,1,2,1257787,-1,60,2
B1=20000,B2=300000;PRP=0,1,2,1398269,-1,60,2
B1=40000,B2=560000;PRP=0,1,2,2976221,-1,64,2
B1=40000,B2=560000;PRP=0,1,2,3021377,-1,64,2
B1=80000,B2=1440000;PRP=0,1,2,6972593,-1,67,2
B1=150000,B2=3150000;PRP=0,1,2,13466917,-1,69,2
B1=260000,B2=6500000;PRP=0,1,2,20996011,-1,69,2
B1=280000,B2=7000000;PRP=0,1,2,24036583,-1,70,2
B1=310000,B2=7750000;PRP=0,1,2,25964951,-1,70,2
B1=350000,B2=8750000;PRP=0,1,2,30402457,-1,71,2
B1=380000,B2=9500000;PRP=0,1,2,32582657,-1,71,2
B1=430000,B2=11610000;PRP=0,1,2,37156667,-1,71,2
B1=480000,B2=12960000;PRP=0,1,2,42643801,-1,72,2
B1=490000,B2=13230000;PRP=0,1,2,43112609,-1,72,2
B1=650000,B2=17550000;PRP=0,1,2,57885161,-1,73,2
B1=810000,B2=13860000;PRP=0,1,2,74207281,-1,74,2
B1=840000,B2=23520000;PRP=0,1,2,77232917,-1,74,2
B1=870000,B2=23490000;PRP=0,1,2,82589933,-1,75,2
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2019-12-19 at 16:19
kriesel is offline  
Old 2019-01-17, 20:09   #4
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

10011100111002 Posts
Default Interim 64-bit residues for LL sequences of known Mersenne prime exponents

For comparison to the output of any new or revised software in development, the attachment contains interim res64 values for selected iteration numbers of Lucas-Lehmer (seed 4) sequences for a wide range of known Mersenne primes. These can also be used to test hardware reliability and OS/driver/application/dlls combinations for correct function.

See also:
https://www.mersenneforum.org/showpo...&postcount=179
p=82589933 interim residues list every 5M

https://www.mersenneforum.org/showpo...&postcount=192
ATH log files from cudalucas and mlucas verification runs for 82589933


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf Reference Mp LL res64 milestones.pdf (16.9 KB, 132 views)

Last fiddled with by kriesel on 2019-11-19 at 16:04
kriesel is offline  
Old 2019-01-17, 20:09   #5
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

22×5×251 Posts
Default Interim 64-bit residues for PRP3 sequences of known Mersenne prime exponents

For comparison to the output of any new or revised software in development, the attachment contains interim res64 values for selected iteration numbers of PRP3 sequences for a wide range of known Mersenne primes. Can also be used to test hardware reliability and OS/driver/application/dlls combinations for correct function.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf Reference Mp PRP3 res64 milestones.pdf (15.3 KB, 136 views)

Last fiddled with by kriesel on 2019-11-19 at 16:04
kriesel is offline  
Old 2019-01-20, 21:06   #6
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

22·5·251 Posts
Default Seed value ten LL series final residues

The Lucas Lehmer series in GIMPS usage is standardized at seed value (S0) 4. Other values, such as ten, are also usable. There's a reference tabulation for various Mersenne primes of the final iterations' residues for seed 10 at http://www.hoegge.dk/mersenne/penult...sultsS0=10.txt These other seed values are to be avoided, for normal GIMPS use, in my opinion, because the math makes interim residues not comparable. A list of seed values usable in Lucas Lehmer sequences is available at https://oeis.org/A018844


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2019-11-19 at 16:05
kriesel is offline  
Old 2019-03-13, 18:34   #7
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

22×5×251 Posts
Default How fast can we multiply two integers, or square an integer?

Central to the work at the current wavefronts of GIMPS, or higher exponent, is fast math with large numbers requiring many words to contain the large number of bits involved.
This applies to primality testing by LL or probable prime test, P-1 factoring, and to a lesser extent, trial factoring.
How fast, or rather, to make it relatively independent of computing hardware speed, how efficiently can we multiply two integers, or square a large integer?

Most of the following addresses the general case of two different operands multiplied. It also mostly omits special cases such as one operand is a small power of two, which can often be accomplished quickly by shifts.
George Woltman gives an overview with computing times for early GIMPS hardware in his Colloquium available on Youtube.

Long multiplication
Beginning with the simplest method, long multiplication, which is most efficient for small multiword integers, such as occur in typical trial factoring, we multiply each digit, and add those partial products together. We select a digit size that fits the computing hardware well.
If programming an online calculator, where computations are short and frequent conversion to and from base 10 would be inefficient, one might select a power of ten as the digit size. Typically for a GIMPS application we will use a power of two, for efficiency, since many computations are done between base conversions. On modern computing hardware we might select 232 or 264 as the digit size. (Because 232 is usually faster on gpus, mfaktc uses it. Mfaktc goes up only to 95-bit factors.) For four-digit width, it might look something like the following.
Let the operands be abcd and efgh, with each letter representing the value of the respective large digit.
Here a is the most significant digit, and d the least.
Code:
         a b c d
 x       e f g h
----------------
     ah bh ch dh
    ag bg cg dg
  af bf cf df
ae be ce de
--------------------
Each of those partial products may be two digits wide. There are 16 of them (42).
They are shown in this pattern for correspondence with the operand digits, but the alignment for summing is not correct above.
Digit by digit, with alignment, the partial products are
Code:
                    bhhi bhlo dhhi dhlo
               ahhi ahlo chhi chlo
               bghi bglo dghi dglo
          aghi aglo cghi cglo 
          bfhi bflo dfhi dflo
     afhi aflo cfhi cflo
     behi belo dehi delo
aehi aelo cehi celo
---------------------------------------
There are also 24 single-digit additions, with carries.
Generally, the effort goes up as the square of the operand length, for a fixed digit size.

Squaring a number is less effort than general multiplication in this method.
Code:
          a b c d
x         a b c d
-----------------
      ad bd cd dd
    ac bc cc DC
  ab bb CB DB
aa BA CA DA
-----------------
Since ad=da, ab=ba, ac=ca, cb=bc, db=bd, cd=dc, it's not necessary to compute the 6 partial products shown in upper case above.
In general, (n-1)n/2 multiplications can be avoided in long squaring relative to long multiplication. I think of this as folding the square in half diagonally, and needing to compute products for the diagonal terms and only one of the two off-diagonal triangles.
As n becomes large, this savings approaches half the digit multiplications. The additions must still occur.
It may be more efficient to shift the partial product left by one bit and add once with carry than to do two adds.

As the number of digits increases, the time to do multiplication or squaring increases rapidly.
Above a certain point, which is generally hundreds of bits, it is faster to use a different, more complex method.

Karatsuba
The least complex of these alternate methods was discovered by Karatsuba in 1960 and published in 1962. http://www.ccas.ru/personal/karatsuba/divcen.pdf (Before then it was conjectured the lower bound was O(n2).)
It uses an algebraic identity to accomplish a 2n-digit product of n-digit operands in 3n2/4 partial products rather than n2, at the cost of some additional additions and subtractions.
For sufficiently large operands, it is worthwhile to use the Karatsuba method recursively.
Recursion reduces the order of multiplication from O(n2) to O(nlog2(3))= O(n1.585) https://en.wikipedia.org/wiki/Karatsuba_algorithm
A rather readable description of long multiplication and Karatsuba algorithm is http://people.mpi-inf.mpg.de/~mehlho...apter2A-en.pdf
At values corresponding to the current GIMPS first or second primality test wavefront, n~85M or ~47M, or above, the ratio of number of multiplications is considerable (thousands). See the attachment.

Toom-Cook
There is a generalization of the Karatsuba approach, to splitting the operands into k pieces, where k=2 for Karatsuba.
It was discovered by Toom in 1963, and refined by Cook in 1966. So it is now called Toom-Cook.
For k=3, it is O(n1.465). Due to higher overhead, it is efficient beginning at somewhat higher operand lengths than Karatsuba. It's also possible to split the two operands differently, to k and l pieces, as when one is significantly larger.

Schönhage-Strassen (FFT)
An even faster (lower order) method is the Schönhage-Strassen algorithm, developed in 1971.
"The run-time bit complexity is, in Big O notation, O ( n ⋅ log ⁡ n ⋅ log ⁡ log ⁡ n ) for two n-digit numbers.
The algorithm uses recursive Fast Fourier transforms in rings with 2n+1 elements, a specific type of number theoretic transform."
https://en.wikipedia.org/wiki/Sch%C3...ssen_algorithm
Note that this algorithm requires zero-padding the operands. For sufficiently large operands, this is faster than Toom-Cook or other preceding methods of multiplication or squaring.
There is a very helpful elaboration on this method early in the thread https://www.mersenneforum.org/showthread.php?t=120
and an Excel example at http://www.gweep.net/%7Eshifty/portf...eet/index.html

Some recommended reading later in the thread reference https://books.google.com/books?isbn=3662005514 or http://cnx.org/content/col10550/latest/

Irrational Base Discrete Weighted Transform (IBDWT)
An improvement on Schönhage-Strassen is Crandall & Fagin's irrational base discrete weighted transform, published 1994.
https://www.ams.org/journals/mcom/19...-1185244-1.pdf
https://en.wikipedia.org/wiki/Irrati...hted_transform
Efficient implementations of this are the basis for the leading current GIMPS primality testing and P-1 factoring software packages.
The computations are performed modulo the Mersenne number being tested for primality, which is very quick in base two and limits doubling in size of the products, at each of the millions of iterations required.
It's common that they are implemented in floating point and sacrifice some provable-correctness of results for increased performance. In practice, the observed computation effort per iteration in the fastest available GIMPS software is O(n~1.1)
Various error checking techniques are employed to obtain acceptable reliability. This is to my knowledge the fastest currently known algorithm for operands of size relevant to primality testing or P-1 factoring progress at the GIMPS wavefront and far above.
Overall error rates per primality check, which are the result of errors due to hardware, software, numerical, and user effects, are discussed at
https://www.mersenneforum.org/showpo...3&postcount=19

Fürer and following work
Schönhage & Strassen conjectured a lower bound for multiplication of O(n log n).
Fürer published in 2007 an algorithm with O(n log n 2O(log* n)) where log* is the iteration logarithm (number of times logarithm is applied to reach a number less than or equal to one).https://en.wikipedia.org/wiki/F%C3%BCrer%27s_algorithm
Subsequently, others have lowered or better defined the order, with modifications to the Fürer algorithm or by finding other algorithms;
2008 De et al (DKSS algorithm) https://dl.acm.org/citation.cfm?id=2...dl=ACM&coll=DL;
2014 Harvey, van der Hoeven and Lecerf (https://arxiv.org/abs/1407.3360);
2015 Covanov and Thome (modified Fürer);
2016 Covanov and Thome O(n log n 22log* n) via a generalization of Fermat primes https://arxiv.org/abs/1502.02800;
2018 Harvey and van der Hoeven O(n log n 22log* n) and does not depend on unproven conjectures https://arxiv.org/abs/1802.07932;

None of these developments from 2007 onward have yet bested the IBDWT Crandall approach, in the range of number sizes of interest to the GIMPS effort, or even considerably larger. Work continues, to improve the proportionality constant or order.

Combining techniques; symmetry
The various techniques can be combined in extended-precision math software libraries, and the technique fastest for a given operand size used.
So far the description here has been in regard to symmetric multiplication (operands with essentially equal length). Long multiplication, Toom-Cook, and perhaps others support asymmetry. Asymmetry is not known to me to be of advantage in GIMPS primality testing, but it may be in P-1 factoring. The switching point between algorithms depends on the processor type and implementation. It can get rather complicated, as shown in GMP's threshold table https://gmplib.org/devel/thres/ and the color diagrams near the bottom of https://gmplib.org/devel/ that are processor-specific. It's worth noting those diagrams cover l<7600 or less, where l is "limbs", or words of size 2k, typically 32 or 64 bits, not the number of bits. https://gmplib.org/manual/Nomenclatu...tml#index-Limb
The diagonals of those diagrams would represent the symmetric operand cases. On the diagonals, we see transitions among 5 algorithms (schoolbook = conventional long multiplication, Toom22 = Karatsuba, Toom33, Toom44, fft mod2k+1) with increasing l, at differing l depending on processor model. In all cases shown, schoolbook appears fastest for sizes involved in TF (which is ~<96 bits or <3 32-bit limbs), and FFT fastest for sizes above 5000 limbs, so presumably throughout GIMPS double checking wavefront and upwards beyond mersenne.org maximum (109 bits) or Mlucas maximum (232 bits).

Practical implementation
Great care is taken in implementations such as GMP, or GWNUM (underlying prime95 and mprime) and other packages to efficiently use real machine resources such as cache levels, register counts, and memory bandwidth. Tuning the code is processor model or model family specific, because the different models' designs present differing constraints and tradeoffs.

Further reading
Even faster integer multiplication thread https://www.mersenneforum.org/showthread.php?t=19486
Ernst Mayer's article in ODROID magazine https://magazine.odroid.com/article/...tical-history/

Please PM me with any suggestions or corrections for this post, or use the reference discussion thread at https://www.mersenneforum.org/showthread.php?t=23383


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf karatsuba long mult ratio.pdf (12.6 KB, 143 views)

Last fiddled with by kriesel on 2021-02-15 at 23:06 Reason: added colloquium link sentence
kriesel is offline  
Old 2019-03-14, 00:45   #8
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

22×5×251 Posts
Default PRP residue types

Woltman describes and discusses PRP residue types 1 through 5 in https://www.mersenneforum.org/showpo...&postcount=209
Quoting in part: "Here is my PRP residue proposal:

There are (at least) 5 PRP residue types:
1; Fermat PRP. N is PRP if a^(N-1) = 1 mod N
2: SPRP variant, N is PRP if a^((N-1)/2) = +/-1 mod N
3: Fermat PRP variant. N is PRP if a^(N+1) = a^2 mod N
4: SPRP variant. N is PRP if a^((N+1)/2) = +/-a mod N
5: Fermat PRP cofactor variant, N/d is PRP if a^(N-1) = a^d mod N/d

I encourage programs to return type 1 residues as that has been the standard for prime95, PFGW, LLR for many years."

Note there is also a type 0, associated with gpuowl PRP-1 runs (combined PRP and P-1), involving computing powers of a base other than 3, mod Mp. https://www.mersenneforum.org/showpo...postcount=1005
https://www.mersenneforum.org/showpo...postcount=1006


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2019-11-19 at 16:06
kriesel is offline  
Old 2019-04-22, 20:01   #9
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

10011100111002 Posts
Default The Gerbicz error check for PRP, and on LL and P-1 with restrictions

The Gerbicz error check (GEC) is a very reliable check on the correctness of blocks of PRP test iterations. While it is very good, it is not an absolute guarantee of no error. It's possible that the error check fails, in the presence of certain software bugs, or that some code outside the span that the GEC protects has an error.

So far, the GEC has been implemented in GIMPS software only in prime95 and mprime PRP since v29.4, gpuowl PRP since V0.7, and Mlucas PRP since v19.
It's only currently applicable to PRP, not to LL, TF, or to conventional P-1 factoring. (Applicability to conventional left-to-right exponentiation as in usual P-1 stage 1 or in LL requires a known factor. See https://www.mersenneforum.org/showpo...&postcount=252) Recently Preda has posted that changing P-1 stage 1 to right-to-left exponentiation would allow use of the GEC there. https://mersenneforum.org/showpost.p...postcount=1856

There's no single well organized thread as a repository for GEC knowledge, as far as I know.
Some threads & other sources contain more than others. Here's what I was able to find. I think many of these are well cross linked, so, finding one, may lead a diligent reader to many of them.

For gpuowl initial implementation see post 186 of http://www.mersenneforum.org/showthread.php?t=22204 and a considerable amount following.

The prime95 whatsnew.txt indicates addition at v29.4:
"3) PRP tests now support a type of low overhead error checking that almost guarantees correct results even on flaky hardware. We call this Gerbicz error-checking after it was proposed by Robert Gerbicz at mersenneforum.org. This error-check only works for base-2 numbers."
Parts of the 29.4 thread describe it. https://www.mersenneforum.org/showthread.php?t=22683
Overhead for the GEC is given as 0.2% (and this is somewhat dependent on the choice of Gerbicz check block size). This very small overhead is worthwhile because of its high effectiveness at detecting errors and allowing immediate retreat to the last known good condition.

https://www.mersenneforum.org/showpo...46&postcount=6 includes a section on Gerbicz the man, with links.

https://www.mersenneforum.org/showpo...1&postcount=88 is his original post about the error check, in the thread Mihai had launched, and there's considerable discussion about it following.

https://mersenneforum.org/showthread.php?t=22510 is background regarding its earlier applicability to Proth/Pepin tests, and a bit re P-1.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2020-02-24 at 22:06 Reason: updated for Mlucas PRP/GEC at V19, possible further use of GEC
kriesel is offline  
Old 2019-05-07, 21:12   #10
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

22×5×251 Posts
Default The Jacobi symbol check for LL

The Jacobi symbol check for the Lucas-Lehmer sequence was discovered by forum user "error" and posted 2017-08-07 at https://mersenneforum.org/showpost.p...3&postcount=30
Its announcement led to the request in the forum for additional error checks, and rather directly to the Gerbicz check for PRP.

Some programs use the Jacobi test periodically which detects LL computation errors at 50% probability. To my knowlege, only prime95/mprime and some of the early LL-based versions of gpuOwL incorporated the Jacobi test so far.
Quote:
All valid LL-residues for Mp will have (Res-2/Mp) = -1
If you make a random error at some iteration, you have 50% chance of getting all subsequent residues with (Res-2/Mp) = 1
This condition can be checked at any suitable interval, since it won't change back again, if no further errors occur.
Computing the Jacobi symbol is a lengthy computation, so it is only done rarely (at intervals of 12 hours by default in prime95).

It appears the Jacobi check is applicable also to some portions of the P-1 factoring computations, although its application there is more complex, so it is less likely to be fast enough to be useful. As far as I know, this has never been implemented in production GIMPS P-1 code, until gpuowl V7.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2020-10-22 at 22:53 Reason: gpuowl V7 use of Jacobi symbol for P-1
kriesel is offline  
Old 2019-05-09, 00:28   #11
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

22×5×251 Posts
Default CUDA Toolkit compatibility vs CUDA level

CUDA toolkit levels and support, requirements for OS, driver, etc grid attached.
Note that a given version of Visual Studio can be configured to generate code for differing OS than the OS the VS runs on at the development system. (For example, XP compatible executables can be produced from a Windows 7 development system.)


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: pdf compatibility.pdf (34.6 KB, 137 views)

Last fiddled with by kriesel on 2019-11-19 at 16:07
kriesel is offline  
Closed Thread

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Windows 10 SP1 will have UBUNTU developer support !!!! tServo Software 19 2016-04-23 21:30
Cheesehead's Corner? jasong jasong 6 2013-10-16 20:09
Intel Xeon Phi - Knights Corner BotXXX Hardware 16 2012-06-21 23:54
Debian developer needed... Xyzzy Linux 5 2006-06-01 14:56

All times are UTC. The time now is 09:13.

Fri Apr 16 09:13:52 UTC 2021 up 8 days, 3:54, 0 users, load averages: 1.25, 1.40, 1.48

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.