View Single Post
Old 2021-11-13, 00:26   #35
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

34·7·13 Posts
Default Requirements for comparability of interim residues

To compare parallel long runs in progress, it is useful/necessary to have runs produce and preserve comparable interim results. Differences may indicate errors early. That is especially important for longer runs, and code relatively lower in error detection and correction. Standalone P-1 factoring typically has less error detection. That is partly because less emphasis was placed on it due to shorter run time than primality testing for the same exponent. It's also partly because the number theory does not offer as many opportunities for error detection in standalone P-1 factoring. (The Jacobi symbol check is more costly in P-1 stage 1, because unlike for LL, we don't know the correct Jacobi symbol value; we must compute it, and compute the check on the interim results.) It's also partly because the impact of an error is lower in P-1 stage 2, affecting only a small fraction of the stage, not the whole stage, or whole primality test as can occur in LL. However, in P-1 factoring for F33 or OBD or higher Mersennes, P-1 run times become months or years on typical fast hardware, increasing the importance of error detection.

In the context of GIMPS Mersenne prime searching, for P-1 interim res64 matching between multiple runs on the same number, all the following must be present, which is more complex than required conditions for LL or PRP interim res64 matching:

In P-1 stage 1:
  1. same exponent and type, e.g. 3321928171 Mersenne for an OBD attempt;
  2. same seed (the initial value that gets powered, usually 3, at the beginning of stage 1);
  3. precisely the same stage 1 prime-powers product, which requires precisely the same formula for computing it (typically B1-powersmooth of all primes < B1, * 2 * exponent);
  4. same modpow algorithm, such as all left-to-right (prime95/mprime does this for initial stage 1, but does right-to-left for stage1-extension);
  5. same interim iteration (squarings) count, and therefore compatible logging intervals
For P-1 stage 2, which depends on stage 1 final residue:
  1. same exponent and type;
  2. matching stage 1 final full size residue results as start point files;
  3. same software (e.g. Mlucas, and perhaps version restrictions);
  4. exact same stage 2 buffer count, implying specifying buffer count on a system with more memory than the other system;
  5. res64 stage 2 output implemented (some software does not implement s2 res64 output, e.g. gpuowl)
  6. same iteration number (q in Mlucas parlance, or perhaps in other software s2 primes coverage), and therefore compatible logging intervals
  7. same prime-pairings algorithm (More or less implied by same software, unless there are differences between versions, as has occurred with prime95 IIRC, and may occur in the future with Mlucas)
  8. equivalence of subtle implementation details, including those that may affect total number of modmuls etc. required for the same gross parameter set (exponent, type, B1, B2)

For P+1, ECM: no idea. P+1 random seed, ECM random curve parameter, would seem to make comparing runs more difficult and less necessary.


For TF: interim residues are not output, so there's nothing to check.


By comparison, for LL, it's simpler:
  1. LL seed value 4,
  2. same exponent, & type, e.g. 3321928171, Mersenne
  3. same iteration number, with proper allowance for prime95/mprime's high-by-2 loop counter

And for PRP (& PRPDC), it's a little more complicated again. With GEC, there is extremely reliable checking in progress, typically with automatic rollback and retry from the last saved confirmed-good state. The ability to compare and check any PRP iteration number's interim residue is likely to be needed by developers during debugging. This requires:
  1. same PRP seed value, typically 3 (except for PRP-1 type 0),
  2. same exponent, & type, e.g. 3321928171, Mersenne
  3. same iteration number, with proper allowance for prime95/mprime's possibly different loop counter
  4. same PRP type 1-5 is not always strictly required for interim residue matching, although it is for final residues (except that type 1 and 5 are equivalent in the absence of factors)
  5. same approach on producing the increasing powers from 1 to ~exponent, e.g. left to right or right to left and nuances relating to optimization.
  6. If using GEC, which requires a straight squaring sequence, corresponding to type 3, adjustments to power will be made at the end to convert to other PRP types.
  7. same approach on reporting the interim residues. (When using GEC, are the residues recomputed for reporting to correspond to the nominal type, or are they left as in the type 3 sequence.)
  8. Some very limited testing indicates gpuowl v5 (type 4), gpuowl v6.11-380 (type 1), and Mlucas v20.1.1 (type 1) have compatible interim 64-bit residues for the same exponent and iteration number while prime95 v30.7 is an outlier.
Compatible logging intervals are simple to achieve in LL or PRP, with [1,2,5]*10n multipliers on iteration count typically for log intervals. It is not so simple in P-1, when the units (modmuls-in-algorithm-I or specific delta s2-primes progress or whatever) differ, and possibly in a way that the least common multiple might be substantial compared to the total run duration, or where the number of modmuls may differ slightly between implementations.


(Perhaps more to come. Including corrections.)


A special thanks here to Ernst Mayer for considerable explanation by PM regarding P-1.
Constructive comments especially by software authors are invited.


Top of this reference thread: https://www.mersenneforum.org/showpo...89&postcount=1
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2022-04-15 at 19:10 Reason: minor formatting
kriesel is online now