kriesel · 2018-12-14, 19:46 · #10
PRP3 run time scaling in V5.0-9c13870 (no P-1)

Gpuowl PRP3 has been run on all known Mersenne prime exponents feasible at its currently available fft lengths, mostly in ascending order. This provides run time scaling data, a reliability check on the hardware, and a check for any occurrence of false negatives or error detections, all from the same run set. The tests were run on an RX480 under Windows 7 x64, alongside a running instance of prime95 and an instance of mfakto running on an RX550 in the same system.

For the exponents below 216091, the minimum available fft length, 128K, is too large, giving bits/word below 1.5 and, in most cases, immediate fatal errors. p=132049 runs briefly, at 1.01 bits/word, but repeatably detects Gerbicz check errors in the initial 800-iteration block and exits after 3 rounds of that.
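The bits/word figures above follow directly from dividing the exponent by the fft length in words; a minimal sketch (the helper name is my own, not from gpuowl):

```python
def bits_per_word(p: int, fft_len: int) -> float:
    # Each of the fft_len words of the residue carries roughly p / fft_len bits.
    return p / fft_len

FFT_128K = 128 * 1024  # 131072 words

print(round(bits_per_word(132049, FFT_128K), 2))  # ≈ 1.01, too low; Gerbicz errors
print(round(bits_per_word(216091, FFT_128K), 2))  # ≈ 1.65, above the ~1.5 floor
```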
For exponents 216091 to 1398269, the run time is highly linear, since they are all run at fft length 128K; the fit gives p^0.99.
For exponents above 1398269, since the fft length is chosen approximately proportional to the exponent, it seems reasonable to expect the scaling to approximate a power law with exponent slightly above 2, since fft multiplication time is, per Knuth and other sources, proportional to n ln n ln ln n. A full PRP3 test then takes n-1 iterations, for total time approximately proportional to n^2 ln n ln ln n for large n.
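Under that model, the effective power-law exponent over a decade of exponents is the log-log slope of T(n) = n^2 ln n ln ln n, which stays a bit above 2; a sketch (function names are my own, for illustration only):

```python
import math

def model_time(n: float) -> float:
    # T(n) ∝ n^2 * ln n * ln ln n: n-1 iterations, each an fft multiply
    # costing ~ n ln n ln ln n per the Knuth-style bound cited above.
    return n**2 * math.log(n) * math.log(math.log(n))

def local_power(n1: float, n2: float) -> float:
    # Effective exponent b in T ~ n^b over [n1, n2]: slope on log-log axes.
    return math.log(model_time(n2) / model_time(n1)) / math.log(n2 / n1)

print(round(local_power(1e6, 1e7), 2))  # ≈ 2.09
print(round(local_power(1e7, 1e8), 2))  # ≈ 2.08
```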

The attachment for CUDALucas run time scaling shows scaling of p^1.85 for 10^6 < p < 10^7, and p^2.095 for 10^7 < p < 10^8.
Run time scaling for prime95 for 86243 <= p <= 2976221 was p^2.094.

The scaling for gpuowl appears to be lower than expected and lower than seen for other applications. For 1398269 < p < 10^7, run time scales as p^1.518; for 10^7 < p < 10^8 it is p^1.72 to p^1.88, which implies fft multiplication time scaling less than linearly in n, similar to the lower exponent range in CUDALucas. Perhaps gpuowl does not reach asymptotic scaling until higher exponents. From 100M exponent to 100Mdigit, the gpuowl scaling was p^2.04, consistent with that. Low-n runs appear to be affected by setup overhead in CUDALucas and clLucas as well, reducing the power seen in scaling fits. For gpuowl, the OpenCL compilation on each launch contributes 2 to 3 seconds of overhead. Frequent console or log output may also be contributing.
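The quoted exponents come from fitting run time versus exponent on log-log axes. A minimal sketch of such a fit; the (p, seconds) samples below are made-up placeholders chosen to yield a slope near the observed p^1.52, not measurements from the attachment:

```python
import math

def power_law_exponent(samples):
    # Least-squares slope of ln(time) vs ln(p) gives the exponent b
    # in time ≈ c * p^b.
    xs = [math.log(p) for p, _ in samples]
    ys = [math.log(t) for _, t in samples]
    n = len(samples)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Hypothetical timings: each doubling of p multiplies run time by ~2.87.
samples = [(2_000_000, 60.0), (4_000_000, 172.0), (8_000_000, 492.0)]
print(round(power_law_exponent(samples), 2))  # ≈ 1.52
```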

Finally, and importantly, no false negatives and no detected errors were observed.

Attached: gpuowl Mp PRP run times.pdf (16.0 KB)

Last fiddled with by kriesel on 2019-11-17 at 14:52