[QUOTE=kriesel;499410]I tested v4.3 with a single short 11-minute M1257787 run to see whether my edit left it functional. Why would you object to that?

I worked through a backlog of builds, giving the newest at the time (V5.0-f604bb1) priority. Execution speed seems to me to have been declining since v3.8, and V5.0 apparently continues that trend even when no P-1 is done. [URL]https://www.mersenneforum.org/showpost.php?p=499306&postcount=830[/URL]

I'm still running V3.8 for production because of its speed advantage, and I think I'm not alone in that. I may run V5 for exponents needing P-1 once it has been shown sufficiently reliable, and if the run time of PRP-1 in V5 is shown to be less than the run time of P-1 in CUDAPm1 plus PRP in V3.8, run separately on comparable GPUs.

My recent benchmarking indicates V5 is often 4-5% slower (PRP only, not PRP-1) than V3.8. That difference is longer than an entire typical P-1, which is 2-2.5% of a CUDALucas LL test in the 100M to 500M range, unless PRP were ~twice as fast as LL.

Case in point: 87M P-1 on a GTX1060 takes ~5 hours and PRP in V3.8 takes 3d 22h, so the combined time is 4d 3h. Compare that to 4d 12h for gpuowl V5, which is 9 hours slower. Again: 171M P-1 on a GTX1060 takes ~20 hours and PRP in V3.8 takes 14d 21h, for a combined 15d 17h; the estimated V5.0 PRP time is 15d 16h. If PRP-1 is within ~1 hour of PRP, V5 wins.[/QUOTE]

[QUOTE=SELROC;499418]The fastest version was 3.5, with a performance regression after that and little performance recovery in 4.6.[/QUOTE]

To my great surprise, the gpuowl performance regression on amdgpu-pro is [B]now gone[/B] with amdgpu-pro version 18.50. I get the same performance on amdgpu-pro 18.50 and ROCm 2.2. I am using the latest gpuowl version, i.e. the GitHub master branch.
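For anyone who wants to redo the combined-time comparison from the quoted post, here is a minimal sketch. It only re-adds the durations quoted above; the helper names `combined` and `fmt` are made up for illustration and are not part of gpuowl or CUDAPm1.

```python
from datetime import timedelta

def combined(p1_hours, prp_days, prp_hours):
    """Total time for a separate P-1 run followed by a PRP run."""
    return timedelta(hours=p1_hours) + timedelta(days=prp_days, hours=prp_hours)

def fmt(td):
    """Format a timedelta as 'Nd Nh', dropping minutes and seconds."""
    total_hours = int(td.total_seconds() // 3600)
    return f"{total_hours // 24}d {total_hours % 24}h"

# 87M exponent: ~5 h P-1 plus 3d 22h PRP in V3.8, versus 4d 12h for V5.
split_87m = combined(5, 3, 22)
v5_87m = timedelta(days=4, hours=12)
print(fmt(split_87m))      # 4d 3h
print(v5_87m - split_87m)  # 9:00:00, i.e. V5 is 9 hours slower

# 171M exponent: ~20 h P-1 plus 14d 21h PRP in V3.8,
# versus an estimated 15d 16h for V5 PRP alone.
split_171m = combined(20, 14, 21)
v5_prp_171m = timedelta(days=15, hours=16)
print(fmt(split_171m))           # 15d 17h
print(split_171m - v5_prp_171m)  # 1:00:00, the margin left for the P-1 stage in V5
```

The last difference is the "~1 hour" in the quoted post: V5 only comes out ahead at 171M if folding P-1 into the PRP run costs less than that margin.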
Will the entry below definitely run as a double check (DC), and not as a first-time test of the wrong type? gpuowl did that once before, so I avoided DCs, but now I want to verify that this card is producing correct results:
PRP=N/A,1,2,79335979,-1,75,0,3,1 [URL]https://www.mersenne.org/report_exponent/?exp_lo=79335979[/URL] This is with gpuowl 6.2-3a95f98-mod, a recent version.
That looks like the correct worktodo.txt entry to me.
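To see which field carries the residue type, the entry can be split by position. This is only a sketch: the field names below follow my understanding of the Prime95-style PRP line layout (assignment ID, k, b, n, c, how far factored, tests saved, base, residue type) and are an assumption here; nothing in this thread confirms how gpuowl treats each field. `parse_prp_line` is a hypothetical helper, not a gpuowl function.

```python
# Assumed field order for a PRP worktodo line (Prime95-style layout).
FIELD_NAMES = [
    "aid", "k", "b", "n", "c",
    "how_far_factored", "tests_saved", "base", "residue_type",
]

def parse_prp_line(line):
    """Split 'PRP=...' into named fields; everything except the AID becomes an int."""
    kind, _, rest = line.strip().partition("=")
    values = rest.split(",")
    entry = {"kind": kind}
    for name, value in zip(FIELD_NAMES, values):
        entry[name] = value if name == "aid" else int(value)
    return entry

entry = parse_prp_line("PRP=N/A,1,2,79335979,-1,75,0,3,1")
print(entry["n"])             # 79335979
print(entry["base"])          # 3
print(entry["residue_type"])  # 1
```

Under that assumed layout, the last two fields (base 3, residue type 1) are the ones to compare against the first-time result listed on the exponent report page.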
[QUOTE=kriesel;509060]Mihai is using a nomenclature of height, width, and middle for the components of the individual FFT transforms in gpuowl, with up to 3 components. See [URL]https://www.mersenneforum.org/showpost.php?p=507015&postcount=956[/URL]
Mlucas does it differently, with up to 5, if I recall correctly, and possibly more to come. Here's an excerpt from an mlucas.cfg file, with 5 radices and what appears to be room to go up to 10:[CODE]radices = 36 16 16 32 32 0 0 0 0 0[/CODE] for an 18M FFT length. (36 x 16 x 16 x 32 x 32, x 2 for real plus imaginary I guess, = 18432K)[/QUOTE]

Correct re. Mlucas: those entries are complex-FFT radices. You may see 6 or even 7 for some FFT lengths of current-GIMPS-wavefront interest, depending on what works best on your hardware, as determined by the standard post-install self-tests, but 4-5 radices is more common.
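The arithmetic in the quoted post can be checked with a few lines. This is just a sketch of the multiplication described above (product of the nonzero complex radices, times 2 for the real-data length), not anything taken from the Mlucas source; `fft_length_k` is a made-up name.

```python
from math import prod

def fft_length_k(radices):
    """Real FFT length in K (1024 elements) from a list of complex-FFT radices.

    Zero entries are treated as unused slots, as in the quoted cfg line.
    """
    complex_len = prod(r for r in radices if r != 0)
    real_len = 2 * complex_len  # two reals per complex element
    return real_len // 1024

radices = [36, 16, 16, 32, 32, 0, 0, 0, 0, 0]
length_k = fft_length_k(radices)
print(length_k)          # 18432
print(length_k // 1024)  # 18, i.e. the "18M" FFT length
```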
preda, if you're saying that the DC will erroneously do a type 4, blink twice.
haha, good one!
benchmarking
Last attempt: GPU benchmark results depend on temperature. When benchmarking, please note the GPU temperature.
test advanced editor.
OK |