mersenneforum.org

gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

SELROC 2018-08-30 11:02

Testing version 3.9 -- same 84M exponent started from checkpoint -- it is slower than version 3.6.
Timing went from 3.80 to 4.16 ms/it
FFT 4608K
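
To put the two timings in perspective, here is a minimal sketch that converts ms/it into an estimated total run time. It assumes a PRP test performs roughly one squaring per bit of the exponent, and it uses 84,000,000 as a stand-in because the post only says "84M"; the function name `estimated_days` is made up for this illustration.

```python
def estimated_days(exponent: int, ms_per_iteration: float) -> float:
    """Rough wall-clock days for a PRP test.

    Assumes about one iteration (squaring) per bit of the exponent,
    and ignores checkpointing and error-check overhead.
    """
    seconds = exponent * ms_per_iteration / 1000.0
    return seconds / 86_400.0


# Assumption: the post only says "84M", so a round number is used here.
EXPONENT = 84_000_000

days_v36 = estimated_days(EXPONENT, 3.80)  # timing reported for version 3.6
days_v39 = estimated_days(EXPONENT, 4.16)  # timing reported for version 3.9
slowdown_pct = (4.16 / 3.80 - 1.0) * 100.0

print(f"v3.6: {days_v36:.2f} days, v3.9: {days_v39:.2f} days, "
      f"slowdown: {slowdown_pct:.1f}%")
```

Under these assumptions the difference is roughly 3.7 days versus 4.0 days per test, a slowdown of about 9.5%.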

preda 2018-08-30 12:07

[QUOTE=SELROC;494927]Testing version 3.9 -- same 84M exponent started from checkpoint -- it is slower than version 3.6.
Timing went from 3.80 to 4.16 ms/it
FFT 4608K[/QUOTE]

Acknowledged. I'll fix soon and post here a note when it's done.

tServo 2018-09-09 03:34

Mr Preda,
Do you use OpenCL 2.0 in gpuOwl, or is it coded to version 1.2?
Thanks

SELROC 2018-09-09 10:41

[QUOTE=preda;494930]Acknowledged. I'll fix soon and post here a note when it's done.[/QUOTE]

Started new 86M exponents with both versions 3.6 and 3.9.

The default FFT of 4608K is too small: I encountered errors and retries at around 20% (it is 18.x bits/word).

I have now restarted the computation with -fft +1, which is 5120K, and all seems to proceed.
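
The "18.x bits/word" figure can be reproduced with a short calculation. This is a minimal sketch that assumes "K" in the FFT size means 1024 words and that bits/word is simply the exponent divided by the number of FFT words; the helper name `bits_per_word` is hypothetical and not taken from gpuOwl.

```python
def bits_per_word(exponent: int, fft_k: int) -> float:
    """Average number of exponent bits stored in each FFT word.

    Assumes an FFT size given as "<n>K" means n * 1024 words.
    """
    return exponent / (fft_k * 1024)


# Assumption: 86,000,000 stands in for "86M"; the exact exponents are not given.
for fft_k in (4608, 5120):
    print(f"FFT {fft_k}K: {bits_per_word(86_000_000, fft_k):.2f} bits/word")
```

With these assumptions, 4608K gives about 18.2 bits/word for an 86M exponent, and stepping up to 5120K lowers that to about 16.4 bits/word, which leaves more headroom against rounding errors.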

xx005fs 2018-09-09 19:10

[QUOTE=SELROC;495737]Started new 86M exponents with both versions 3.6 and 3.9.

The default FFT of 4608K is too small: I encountered errors and retries at around 20% (it is 18.x bits/word).

I have now restarted the computation with -fft +1, which is 5120K, and all seems to proceed.[/QUOTE]

I had no issue using 4608K FFT for 86.5M exponents on version 3.8 on my AMD GPU.

SELROC 2018-09-09 20:23

[QUOTE=xx005fs;495772]I had no issue using 4608K FFT for 86.5M exponents on version 3.8 on my AMD GPU.[/QUOTE]


There is a point where the FFT becomes too small; that point is just after 86.5M.

tServo 2018-09-09 22:19

I just ran some PRP tests via gpuOwl to completion and got results that look like this:

{"exponent":86524913, "worktype":"PRP-3", "status":"C", "program":{"name":"gpuowl", "version":"3.9-OpenCL"}, "timestamp":"2018-09-09 08:15:59 UTC", "computer":"Ellesmere-36x1266-@9:0.0", "aid":"C6C41CB2A726CF8BF01B808ED6BAF3A1", "residue-type":1, "fft-length":"5120K", "res64":"c373d39ff37389ec", "errors":{"gerbicz":0}}

What was my result, anyway? Does the status:C mean it is composite?

How do I reformat this to submit to PrimeNet?

Thanks

preda 2018-09-09 23:48

[QUOTE=tServo;495780]I just ran some PRP tests via gpuOwl to completion and got results that look like this:

{"exponent":86524913, "worktype":"PRP-3", "status":"C", "program":{"name":"gpuowl", "version":"3.9-OpenCL"}, "timestamp":"2018-09-09 08:15:59 UTC", "computer":"Ellesmere-36x1266-@9:0.0", "aid":"C6C41CB2A726CF8BF01B808ED6BAF3A1", "residue-type":1, "fft-length":"5120K", "res64":"c373d39ff37389ec", "errors":{"gerbicz":0}}

What was my result, anyway? Does the status:C mean it is composite?

How do I reformat this to submit to PrimeNet?

Thanks[/QUOTE]
Yes, "C" is composite and "P" is prime. The res64 would also have a particular form for a prime; I think for "type 1" the res64 for a prime would be 1.

Submit the JSON as is (e.g. cut-and-paste or file upload) to the manual results page; it parses that syntax.
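
For reading a result line locally before submitting it, a minimal sketch using Python's standard `json` module might look like the following. The field names come from the result line posted above; the `summarize` helper and the `STATUS_MEANING` table are made up for this illustration and are not part of gpuOwl or PrimeNet. Since this is a PRP test, "P" is labeled as a probable prime here.

```python
import json

# The result line exactly as posted in this thread.
RESULT_LINE = (
    '{"exponent":86524913, "worktype":"PRP-3", "status":"C", '
    '"program":{"name":"gpuowl", "version":"3.9-OpenCL"}, '
    '"timestamp":"2018-09-09 08:15:59 UTC", '
    '"computer":"Ellesmere-36x1266-@9:0.0", '
    '"aid":"C6C41CB2A726CF8BF01B808ED6BAF3A1", "residue-type":1, '
    '"fft-length":"5120K", "res64":"c373d39ff37389ec", '
    '"errors":{"gerbicz":0}}'
)

# Status codes as explained in the reply above.
STATUS_MEANING = {"C": "composite", "P": "probable prime"}


def summarize(line: str) -> str:
    """Turn one JSON result line into a short human-readable summary."""
    result = json.loads(line)
    status = STATUS_MEANING.get(result["status"], "unknown status")
    return (
        f'M{result["exponent"]} is {status} '
        f'(res64 {result["res64"]}, FFT {result["fft-length"]}, '
        f'Gerbicz errors: {result["errors"]["gerbicz"]})'
    )


print(summarize(RESULT_LINE))
```

This is only for inspection; the line submitted to the manual results page should remain the unmodified JSON.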

preda 2018-09-09 23:51

[QUOTE=tServo;495709]Mr Preda,
Do you use OpenCL 2.0 in gpuOwl, or is it coded to version 1.2?
Thanks[/QUOTE]

It's mostly OpenCL 1.x, but it does use some aspects of the "memory model" that were introduced in OpenCL 2.0. These relate to memory synchronization between groups and are used mostly in the kernel carryFused(), I think.

It was plain 1.x not so long ago, but that stopped working on ROCm (because of the memory model), so I switched to requiring 2.x at that point.

preda 2018-09-20 14:14

[QUOTE=preda;494930]Acknowledged. I'll fix soon and post here a note when it's done.[/QUOTE]

I moved to the new ROCm 1.9, and I can't reproduce this on my GPUs (Vega & FuryX). I don't know if the slowdown is still affecting 580. Let's keep an eye on it in the future.

SELROC 2018-09-20 14:37

[QUOTE=preda;496442]I moved to the new ROCm 1.9, and I can't reproduce this on my GPUs (Vega & FuryX). I don't know if the slowdown is still affecting 580. Let's keep an eye on it in the future.[/QUOTE]


Hello Mihai, I am testing version 4.1 from yesterday, and yes, it is still slower by around 0.30 ms/it. Performance is better in version 3.6, which should help you track down the changes that caused the slowdown.

