(most of this was originally posted as

https://www.mersenneforum.org/showpo...&postcount=268)

It turns out to be a nanooptimization at best.

Something I've thought about from time to time and it comes up as questions by others, is beginning at iteration small x>0 with a precomputed interim term. For LL, seed of 4, iteration 1 14, iteration 2 194, etc, and it almost doubles in number of bits per iteration, so iteration 3, 37634, 16 bits, still fits in a single word and is far shorter than the mod n for any reasonable sized exponent. So begin with 37634 and the next iteration is called 4. There's an equivalent possibility in type 4 residue PRP3. Saving a few iterations is usually dismissed as not worth the bother, at current exponents.

As a fraction of the work it's tiny, only ~3/100M saved, 30ppb. There are around 400,000 exponents to primality test within 10% of 100M, times two each; 800,000. It adds up to about 0.024 primality tests saved over that span. Step and repeat over additional exponent spans, and the tiny savings add up overall, theoretically, ignoring erasure of some by people quitting, and the savings being distributed over the many participants. Seems like a tiny but fast and straightforward tweak to implement. But it does not result in somewhere an additional primality test being completed, because the savings are divided up among too many systems and workers. (Maybe it results in additional factoring, but even that seems unlikely.) Results are reached a tiny bit sooner. Over a year's time, 30ppb is ~1 second. I just spent centuries worth of the savings, estimating and describing it.

(If one was doing 64 bit unsigned int math, it could go to iteration 5. In a variable length representation, the initial iterations are only a single word long so the net effect is much much less.) This is also why the first 5 iterations' res64 values are independent of exponent>64. In hex:

LL seed 4

1 E

2 C2

3 9302

4 546B 4C02

5 1BD6 96D9 F03D 3002

Without much trouble, it could be carried to 6 iterations, 128 bits, or higher.

But 6 iterations only saves 60ppb at 100M, and proportionately less at higher exponents, to 6ppb at the mersenne.org current limit 1000M; ~1.4ppb at the Mlucas limit.

One could press on to greater numbers of iterations, but it's probably very quickly more efficient to generate the rapidly growing interim residues in place than to read them in from disk.

One could also consider caching in memory some sufficiently early iteration's residue for reuse as a slight head-start on a sufficiently high next Mersenne number. Until the modulo starts having effect, the residue doubles in length every iteration, and is independent of exponent. The mod Mp taking effect limits the number of exponent-independent iterations to about 24 iterations maximum for current GIMPS double check (24/48M ~ 500. ppb) to about 30 iterations maximum in the current capacity of Mlucas (30/2

^{32} ~7. ppb).

Another tradeoff is the code needs to be changed, to attempt any such nanooptimization, and there's a nonzero chance of some bug being introduced, that could cost far more computing time than the seconds saved per generation.

It also diverts precious programmer time from the possibility of other progress, which might include 0.1% (1,000,000 ppb) or higher performance improvements for some processor type that may constitute 1% or more of the GIMPS aggregate throughput, or adding some useful error check to a software package. The available voluntary productive programming time of the talented programmers capable of producing efficient reliable GIMPS code is probably the project's scarcest and limiting resource.

Top of reference tree:

https://www.mersenneforum.org/showpo...22&postcount=1