Gpuowl PRP3 has been run on all known Mersenne prime exponents feasible on its currently available fft lengths, mostly in ascending order. This provides run time scaling data, a reliability check on the hardware, and a check for any occurrence of false negatives or error detections, all from the same run set. The test is being run on an RX480 under Windows 7 x64, alongside a running instance of prime95 and an instance of mfakto running on an RX550 in the same system.

For exponents below 216091, the minimum available fft length, 128K, is too large, giving under 1.5 bits/word and in most cases immediate fatal errors. p=132049 runs briefly, at 1.01 bits/word, but repeatably detects Gerbicz check errors in the initial 800-iteration block and exits after 3 rounds of that.
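
The bits/word figures above follow directly from exponent over fft length. A minimal sketch (function name is mine, not gpuowl's):

```python
# Sketch: bits/word for a given exponent and FFT length.
# Each FFT word carries exponent/fft_length bits of the residue.
def bits_per_word(exponent, fft_length):
    return exponent / fft_length

# 128K = 131072 words; p = 132049 barely exceeds 1 bit/word,
# while p = 216091 sits comfortably above 1.5 bits/word.
print(round(bits_per_word(132049, 128 * 1024), 2))  # 1.01
print(round(bits_per_word(216091, 128 * 1024), 2))  # 1.65
```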

For exponents 216091 to 1398269, run time is nearly linear in p, since all these exponents run at the same fft length, 128K; the fitted scaling is p^1.518... no, p^0.99.

For exponents above 1398269, since the fft length is chosen approximately proportional to the exponent, it seems reasonable to expect the scaling to approximate a power law somewhat above 2: fft multiplication time is, per Knuth and other sources, proportional to n ln n ln ln n, and a full PRP3 test takes n-1 iterations, giving approximately n^2 ln n ln ln n for large n.
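
That model can be evaluated numerically to see what effective power-law exponent it predicts over a decade of p. This is a sketch of the model only, not of gpuowl's actual cost:

```python
import math

# Asymptotic model from the text: per-squaring fft multiply cost
# ~ n ln n ln ln n (per Knuth), times ~n iterations for a full PRP test.
def model_time(n):
    return n**2 * math.log(n) * math.log(math.log(n))

# Effective power-law exponent over a decade of p: slope on log-log axes.
p1, p2 = 10**7, 10**8
slope = math.log(model_time(p2) / model_time(p1)) / math.log(p2 / p1)
print(round(slope, 3))  # slightly above 2 for this range
```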

In the attachment for CUDALucas run time scaling at https://www.mersenneforum.org/showpo...23&postcount=2 there is scaling of p^1.85 for 10^6 < p < 10^7, and p^2.095 for 10^7 < p < 10^8.

Run time scaling for prime95 for 86243 <= p <= 2976221 was p^2.094:
https://www.mersenneforum.org/showpo...78&postcount=2

The scaling for gpuowl appears to be lower than expected and lower than seen for other applications. For 1398269 < p < 10^7, run time scales as p^1.518; for 10^7 < p < 10^8 it is p^1.72 to p^1.88, which implies an fft multiplication time scaling less than linearly with n, similar to a lower exponent range in CUDALucas. Perhaps gpuowl does not reach asymptotic scaling until higher exponents. From 100M exponent to 100Mdigit, the gpuowl scaling was p^2.04, consistent with that. Low-n runs appear to be affected by setup overhead in CUDALucas and clLucas also, reducing the power seen in scaling fits. For gpuowl, the OpenCL compilation at each launch contributes 2 to 3 seconds of overhead. Frequent console or log output may also be contributing.

Finally, and importantly, no false negatives and no detected errors were observed.

Top of reference tree:

https://www.mersenneforum.org/showpo...22&postcount=1