View Single Post
Old 2019-08-08, 13:35   #4
kriesel's Avatar
Mar 2017
US midwest

649410 Posts
Default How much work is it to do x

Effort can be computed at for TF, P-1 factoring, or LL testing. Effort is expressed in Ghz-Days, the measure of what one core of a Core2 cpu running at 1Ghz could do in a day. Estimated performance of a given GPU is available at for TF and for other work types. Gpuowl performance with a recent 6.11-x or 7.x version is considerably better than indicated there.

To TF a Mersenne number with exponent 100M from starting bit level 73 to finishing bit level 76 is 133.9 GhzDays.
Double the exponent is about half the effort for equal bit levels. Each bit level is twice as much effort as the one preceding.
Note, in TF a GhzDay is not comparable to a GhzDay for other computation types, since GPUs are MUCH faster at TF. The ratio can be 11:1 ranging up to 40:1 or higher depending on GPU model and computation parameters.

To P-1 factor a Mersenne number with exponent ~100M to PrimeNet bounds B1=1040000,B2=28080000 is 13.90 GhzDays. This scales similarly to how PRP or LL testing do, ~p2.1.

GCD phase of P-1 or P+1 run time is O(p (log p)2 log log p), and strongly dependent on CPU core speed since known GIMPS implementations use single-threaded gmplib for GCD. For p~110M, Xeon Phi 7210, GCD time ~5.7 minutes. Run time scaling is in the range of p relevant to DC and upward to 1G, ~p1.14. In most applications GCD runs sequentially, stalling other CPU cores of a worker, or a GPU, for the duration of the GCD, while in some versions of Gpuowl it runs in parallel with the next P-1 stage or next assignment if a valid one exists in the worktodo file.

Server confirmation of a reported factor for TF or P-1 is a trivially fast computation.

To LL test a Mersenne number with exponent ~100M is 381.39 GhzDays. For ~110M it is ~482 GHzDays, or about a day on a Radeon VII gpu in a relatively recent version of gpuowl. (But do PRP with GEC and proof generation instead for greater reliability and efficiency.)
Effort scales as p log p log log p per iteration, or about p2.1 per test.

LL Double checking ("LLDC") and the occasional triple check, quadruple check, etc. are the same effort per attempt as a first test for a given exponent. Therefore, first testing using LL should cease as soon as possible. Using PRP with proof generation instead is more than twice as efficient, given LL's real world higher error rate and extremely high verification cost and extreme delays in verification time of occurrence. (Eight years is not unusual.)

To PRP test a Mersenne number is basically the same effort as an LL test. In gpuowl on a Radeon VII that could be a day for ~110M. On a Core 2 Duo it could be 11 weeks or more.

Gerbicz error check (GEC) as a fraction of a PRP, depends inversely on block size, typically ~0.2% of a PRP test at block size 1000. Overhead * blocksize ~ constant.

Jacobi symbol check, as a fraction of an LL test, depends on frequency, typically ~0.3% of an LL test.

PRP DC (without proof and verification as below) is the same effort as a first PRP test for the same exponent. Upgrade to proof generation capability as soon as possible.

PRP proof generation and verification
Total effort, assuming a single verification on a system separate from the PRP tester/proof-generator system and server, is, for a 100M exponent, approximately:
A) power= 8,  3.2 GB temporary disk space needed, proof file size 113MB, 413K squarings = 0.41% of a full DC, default
B) power= 9,  6.4 GB temporary disk space needed, proof file size 125MB, 239K squarings = 0.24% of a full DC
C) power=10, 12.8 GB temporary disk space needed, proof file size 138MB, 182K squarings = 0.18% of a full DC.
Proof generation as a fraction of a PRP, for a 100M exponent:
A) power= 8,  3.2 GB temporary disk space needed, proof file size 113MB, computation ~0.02% of a full DC, default
B) power= 9,  6.4 GB temporary disk space needed, proof file size 125MB, computation ~0.04% of a full DC; 
C) power=10, 12.8 GB temporary disk space needed, proof file size 138MB, computation ~0.08% of a full DC.
In practice it is somewhat longer, with gpuowl proof generation for power 8 taking about 0.07% of elapsed time, which includes SHA3 hashes, disk reads, misc. other small activities. Temporary space increases about proportionally to exponent, so power 10, 1G would be around 130GB per working instance!

Prime95 will reserve proof generation required disk space at the beginning and hold it for the duration, releasing the temporary disk space upon completion. "As exponents increase, squarings, disk space, and proof size increase roughly linearly."

For Gpuowl, maximum working system ram during proof generation for proof power 9 was observed in Task Manager as ~0.25 GB, which only takes about a minute at the end of a PRP computation for p~104M, occupying 1 cpu core. Ram in use increased as it began at level 1 and successively built higher levels of the proof, with ~0.25 GB seen as it performed the level 9 proof build step.

Server computation related to PRP proof is a small fraction of the total verification effort, at 1414 squarings ~14 ppm of a PRP test for p~100M, power 8; 1577 squarings ~16 ppm for power 9. It's unclear how that varies versus exponent.
Note, the server CPU is SSE2 hardware and its code is based on gwnum routines, so is limited to handling up to ~595.8M exponent automatically. Higher requires manual intervention by George.

PRP Proof Verification as a fraction of a PRP or PRPDC, for a hypothetical 100M exponent:
A) power= 8, proof file size 113MB, topk= ceiling(p/28)*28 = 100M, topk/28 = 390,625 squarings = 0.39% of a full DC
B) power= 9, proof file size 125MB, topk= ceiling(p/29)*29 = 100000256; topk/29 =  195313 squarings ~0.195% of a full DC
C) power=10, proof file size 138MB, topk= ceiling(p/210)*210 = 100000768; topk/210 = 97657 squarings = 0.098% of a full DC.
Power 7 would be 0.78%

Overall, LL vs. PRP compared:
LL + DC + occasional TC, QC, etc, ~2.04 tests at ~100M exponent, ~2.5 tests at 100Mdigits, to get a matched pair of res64s, which are presumed to constitute verification of those two runs. (There are some bugs which will cause erroneous residues that are far from random.)
PRP with GEC & proof generation & cert: ~1.01 test equivalent, to get a proven correct result.
PRP's error detection is far superior, and the overall project efficiency is more than double that of LL. (Increasingly so at larger exponents.)
That's why first time LL assignments are no longer issued by the PrimeNet server.
The reliability of LL has historically been a declining function with exponent increase. Longer run times create more chance of computing error that may escape detection.
That strengthens the case against LL which inherently has inferior error detection and recovery, as exponent and run time increase.

For first tests, run PRP with GEC & proof generation whenever possible. Only run LL with its lesser error detection and lesser efficiency, if PRP is not possible.
The preceding is for implementations of equal efficiency on equal or equivalent hardware. If comparing recent gpuowl to CUDALucas or ClLucas, add about another factor of 2 disadvantage for LL, and note neither of them include the Jacobi symbol check. Just don't LL!

To find the next Mersenne prime, compared to the current largest. R D Silverman lays it out at as approximately 8 times as much effort, based on conjectures about the expected distribution. GIMPS has had a very lucky run for the past several years where the Mersenne primes have been more closely spaced recently, than expected on the average.

If the remaining number of Mersenne primes with exponent p<109 fits conjectures, there are 6 left to find. A rough estimate of time to complete the search of p<109 is 150 years. If they are equally spaced in time that's 25 years apart. That's far longer than GIMPS previous experience, averaging ~17/25 ~ 0.68 per year.

(Particular thanks go to Preda and Prime95 who helped me understand the proof and verification resource usage)

Top of reference tree:
Attached Files
File Type: pdf prp preferable.pdf (21.8 KB, 203 views)

Last fiddled with by kriesel on 2021-10-02 at 16:41 Reason: add GCD time, P-1 scaling, update discovery rate, misc edits
kriesel is online now