View Single Post
Old 2019-05-04, 13:51   #1
kriesel's Avatar
Mar 2017
US midwest

29·167 Posts
Default Statistical properties of categories of GIMPS results and interim results

I've been thinking lately, about what the expected probability distributions are for various categories of outputs, in GIMPS computing runs, how the distributions may be affected by various types of errors, and whether or when it might be feasible to detect the difference. All the following is in regard to p>>64, such as at the current GIMPS wavefronts of primality testing and double checking, and up to p<232.

In perl, the table for gpu applications of believed-bad res64 values is currently
'cllucas', '0x0000000000000002, 0xffffffff80000000',
'cudalucas', '0x0000000000000000, 0x0000000000000002, 0xfffffffffffffffd',
'cudapm1', '0x0000000000000000, 0x0000000000000001, 0xfff7fffbfffdfffe, 0xfff7fffbfffdffff, 0xfff7fffbfffffffe, 0xfff7fffbffffffff, '.
'0xfff7fffffffdfffe, 0xfff7fffffffdffff, 0xfff7fffffffffffe, 0xfff7ffffffffffff, 0xfffffffbfffdfffe, 0xfffffffbfffdffff, '.
'0xfffffffbfffffffe, 0xfffffffbffffffff, 0xfffffffffffdfffe, 0xfffffffffffdffff, 0xfffffffffffffffe, 0xffffffffffffffff',
'gpuowl', '0x0000000000000000',
'mfaktc', '',
'mfakto', ''
For cpu-based primality testers, interim res64 values 0x00, 0x02, and 0xfffffffffffffffd are regarded as bad LL values, and 0x00 is regarded as a bad PRP res64 value.

However, in we see that res64 values 0x000000000000 or 0xffffffffffffffff occur sometimes legitimately, as in the last iteration before the LL test identifies a Mersenne prime.

a) LL final residues:
0 if Mp is a prime and the computation went correctly.

Certain error conditions generate spurious 0x02 or 0x0 or 0xffffffff80000000 or 0xfffffffffffffffd sequences or final res64. Some of these are application specific.

Composite mersenne numbers' final LL full res are I think expected to be uniformly distributed in 0<res<Mp, so res64 are expected to be (almost) uniformly distributed 0<res64<ffffffffffffffff. (res64 would have a slightly lowered probability of ffffffffffffffff because res64(Mp)=ffffffffffffffff is excluded.) Which would mean final res64 values are likely to be, but not guaranteed to be, unique over p<232.

b) LL interim residues:
for a given seed value, there's initially a lockstep sequence, until the modulo Mp begins to make the values different for each p. For seed 4, with sufficiently high p (near 232), there is an iteration that would correctly produce res64 0x0000000000000002. Which coincides with an error condition check value. At some point after the mod Mp starts having effect, the interim res64 values start to look pseudorandom to us mere mortals. Are they, in the absence of various error conditions we already have checks for, and other than the "penultimate" values associated with Mersenne primes, equally likely, over a large number of exponents and iterations, 0<=res64<=ffffffffffffffff? (again, except for the pesky Mp res64 exclusion and deferred carry effect. I vaguely recall some carry operations are deferred in fft multiplies. Which should not affect final residues, but mean we must be careful about interim residues used for validity tests or screen output.)

c) PRP3
Situations similar to the preceding arise for PRP3, although it depends on base, residue type, etc. There's less incentive for any statistical evaluation there since the Gerbicz error check is so good. Interim PRP res64 can range 0<=res64<=0xffffffffffffffff (<=ffffffffffffffff with deferred carry) with uniform distribution (again, except for the pesky Mp res64 exclusion and deferred carry effect)

d) P-1 res64
Like the primality tests, P-1 factoring such as in CUDAPm1 produces periodically res64 outputs, both in stage 1 and in stage 2. Certain error condition values have been found to apply. Some error conditions have been observed to cause arbitrary res64 values to repeat at every output to console in stage 2. On one gpu model and P-1 run, I observed a sort of cyclic pattern of several res64 values which was mostly composed of hex f and consisted only of the 5 hex digits 7bdef (the only hex chars with at most one zero bit; 0111, 1011, 1101, 1110, 1111). In the absence of errors, is the res64 in P-1 stage 2 expected to be a uniform distribution over 0<=res64<=ffffffffffffffff? (again, except for the pesky Mp res64 exclusion and deferred carry effect) Similarly, for stage 1? I think they are, because they are from computations mod Mp.

e) Factor found
The distribution of factors found is nonuniform. k=1 is most common. The probability of prime f=2kp+1 being a prime factor of Mp is 1/f, which for given p is nearly proportional to 1/k. (There's also the f= 1 or 7 mod 8 requirement.)

f) No factor found
The probability distribution of remainders in trial factoring where no factor is found is uniform, but over a different range 1 to f-1 for each candidate factor f.

1) What errors do you see in the above?
2) What potentially useful statistical behavior is omitted?

Last fiddled with by kriesel on 2019-05-04 at 13:59
kriesel is offline   Reply With Quote