mersenneforum.org

gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

SELROC 2019-03-15 11:13

[QUOTE=kriesel;499410]I tested v4.3 with a single short 11-minute M1257787 run to see whether my edit left it functional. Why would you object to that?
Worked through a backlog of builds, giving the newest at the time (V5.0-f604bb1) priority.
Execution speed seems to me to have been declining since v3.8, and V5.0 apparently continues that trend even when no P-1 is done. [URL]https://www.mersenneforum.org/showpost.php?p=499306&postcount=830[/URL]

I'm still running V3.8 for production because of its speed advantage. I think I'm not alone in that.
I may run V5 for exponents needing P-1, after it's shown sufficiently reliable, if the run time of PRP-1 in V5 is shown to be less than the run time of P-1 in CUDAPm1 plus PRP in V3.8 separately on comparable gpus. My recent benchmarking indicates V5 is often 4-5% slower (PRP only, not PRP-1) than V3.8. That difference is longer than an entire typical P-1, which is 2-2.5% of a CUDALucas LL test, in the 100M to 500M range, unless PRP was ~ twice as fast as LL.

Case in point: for an 87M exponent on a GTX 1060, P-1 takes ~5 hours and PRP in V3.8 takes 3d 22h, so the combined time is 4d 3h. Compare that to 4d 12h for gpuowl V5, which is 9 hours slower.
Again: for a 171M exponent on a GTX 1060, P-1 takes ~20 hours and PRP in V3.8 takes 14d 21h, for a combined 15d 17h; the estimated V5.0 PRP time is 15d 16h. If PRP-1 is within ~1 hour of PRP, V5 wins.[/QUOTE]
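
For anyone who wants to redo the run-time comparison above with their own numbers, here is a minimal sketch using only the figures quoted in that post. The helper names `combined` and `fmt` are made up for illustration and are not part of gpuowl or CUDAPm1.

```python
from datetime import timedelta

def combined(p1_hours, prp_days, prp_hours):
    """Total time for a separate P-1 run followed by a separate PRP run."""
    return timedelta(hours=p1_hours) + timedelta(days=prp_days, hours=prp_hours)

def fmt(td):
    """Format a timedelta as 'Xd Yh' (whole hours only)."""
    total_hours = int(td.total_seconds() // 3600)
    return f"{total_hours // 24}d {total_hours % 24}h"

# 87M exponent: ~5 h P-1 plus 3d 22h PRP in V3.8, versus 4d 12h for V5
split_87m = combined(5, 3, 22)
v5_87m = timedelta(days=4, hours=12)
print(fmt(split_87m), "vs", fmt(v5_87m), "-> V5 slower by", fmt(v5_87m - split_87m))

# 171M exponent: ~20 h P-1 plus 14d 21h PRP in V3.8, versus an estimated 15d 16h V5 PRP
split_171m = combined(20, 14, 21)
v5_prp_171m = timedelta(days=15, hours=16)
print(fmt(split_171m), "vs", fmt(v5_prp_171m), "-> margin", fmt(split_171m - v5_prp_171m))
```

The second margin is the extra time PRP-1 may cost over plain PRP in V5 before the separate P-1 plus V3.8 route becomes the faster one.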

[QUOTE=SELROC;499418]The fastest version was 3.5, with a performance regression after that and little recovery in 4.6.[/QUOTE]




To my great surprise, the gpuowl performance regression on amdgpu-pro is [B]now gone[/B] with amdgpu-pro version 18.50.


I get the same performance on amdgpu-pro 18.50 and ROCm 2.2.


I am using the latest gpuowl version, i.e. the GitHub master branch.

M344587487 2019-03-26 18:31

Will the worktodo entry below definitely run as a double check (DC) and not as a first-time test of the wrong type? gpuowl did that once before, so I have avoided DCs, but now I want to verify that this card is producing correct results:

PRP=N/A,1,2,79335979,-1,75,0,3,1
[URL]https://www.mersenne.org/report_exponent/?exp_lo=79335979[/URL]
gpuowl 6.2-3a95f98-mod, a recent version.

Prime95 2019-03-26 19:28

That looks like the correct worktodo.txt entry to me.
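
For readers unfamiliar with the entry, here is a rough sketch that splits the line from the question into named fields. The field names are an assumption based on the Prime95-style layout AID,k,b,n,c,how_far_factored,tests_saved,base,residue_type for the number k*b^n+c. The function `parse_prp_entry` is hypothetical and is not how gpuowl itself reads worktodo.txt.

```python
def parse_prp_entry(line):
    """Split a 'PRP=...' worktodo line into named fields.

    Field names are assumptions (Prime95-style layout), not taken
    from the gpuowl source.
    """
    kind, _, rest = line.strip().partition("=")
    if kind != "PRP":
        raise ValueError(f"not a PRP entry: {line!r}")
    names = ["aid", "k", "b", "n", "c", "how_far_factored",
             "tests_saved", "base", "residue_type"]
    parts = rest.split(",")
    if len(parts) != len(names):
        raise ValueError(f"expected {len(names)} fields, got {len(parts)}")
    entry = dict(zip(names, parts))
    # Everything except the assignment ID is numeric; the factoring
    # depth is kept as a float because it may be fractional in general.
    for key in names[1:]:
        entry[key] = float(entry[key]) if key == "how_far_factored" else int(entry[key])
    return entry

entry = parse_prp_entry("PRP=N/A,1,2,79335979,-1,75,0,3,1")
print(entry["n"], entry["base"], entry["residue_type"])
```

Under that reading, the entry describes 1*2^79335979-1 with PRP base 3 and residue type 1, which is the part that has to match the first test for a DC to be comparable.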

ewmayer 2019-03-26 19:47

[QUOTE=kriesel;509060]Mihai is using a nomenclature of height, width, and middle for the components of the individual fft transforms in gpuowl, and up to 3 components. See [URL]https://www.mersenneforum.org/showpost.php?p=507015&postcount=956[/URL]

Mlucas does it differently, with up to 5, if I recall correctly, and possibly more to come. Here's an excerpt from an mlucas.cfg file, with 5 radices and apparently room to go up to 10:[CODE]radices = 36 16 16 32 32 0 0 0 0 0[/CODE] for an 18M fft length. (36 x 16 x 16 x 32 x 32, x 2 for real plus imaginary I guess, = 18432K)[/QUOTE]
Correct re. Mlucas - those entries are complex-FFT radices. You may see 6 or even 7 for some FFT lengths of current-GIMPS-wavefront interest, depending on what works best on your hardware, as determined by the standard post-install self-tests, but 4-5 radices is more common.
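
As a quick check of the arithmetic in the quoted post, here is a small sketch that multiplies the nonzero complex radices and doubles the result to get the real FFT length. The function name `fft_length_k` is made up for illustration and is not part of Mlucas.

```python
from math import prod

def fft_length_k(radices):
    """Real FFT length in K (1024 doubles) from a list of complex radices.

    Trailing zeros in the radices line are treated as unused slots.
    """
    used = [r for r in radices if r != 0]
    complex_length = prod(used)        # number of complex elements
    real_length = 2 * complex_length   # two reals per complex element
    return real_length // 1024

radices = [36, 16, 16, 32, 32, 0, 0, 0, 0, 0]
print(fft_length_k(radices))  # 18432, i.e. 18M
```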

M344587487 2019-03-27 01:31


SELROC 2019-03-27 06:39


preda 2019-03-27 06:46


M344587487 2019-03-27 08:43

preda, if you're saying that the DC will erroneously do a type 4, blink twice.
 

LaurV 2019-03-27 10:23

haha, good one!
 

SELROC 2019-03-27 12:44

benchmarking
 

SELROC 2019-03-27 16:49

Last attempt: GPU benchmark results depend on temperature. When benchmarking, please note the GPU temperature.
 


All times are UTC.
