mersenneforum.org

gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

preda 2018-10-16 18:29

[QUOTE=SELROC;498136]Update: by putting the GPU in a Gen3 x16 slot, the timing is the same, but the check becomes faster by 1-2 sec.

I think the GPU-CPU communication is faster, and this affects GEC speed.[/QUOTE]

Yes. During a check, the data is round-tripped from the GPU and back, to make sure that the CPU also has correct data, i.e.:
1. The CPU reads from the GPU.
2. The CPU writes what it just read back to the GPU.
3. The check is done on the GPU.
If the check (on the GPU) succeeds, we are confident that the CPU has correct data.

If the check is done rarely, this CPU-GPU communication overhead is not a burden.
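
The three steps above can be sketched as a toy model. This is only an illustration under stated assumptions, not gpuOwL's code: `ToyGpu`, `round_trip_check`, `is_squaring_chain` and `flip_last_bit` are hypothetical names, the "GPU" is a plain Python list, and the squaring-chain predicate merely stands in for the real check.

```python
class ToyGpu:
    """Hypothetical stand-in for a GPU buffer (a real program would use OpenCL transfers)."""

    def __init__(self, words, read_fault=None):
        self._buf = list(words)
        self._read_fault = read_fault  # optional function simulating a bad transfer

    def read(self):
        data = list(self._buf)  # step 1: device -> host copy
        return self._read_fault(data) if self._read_fault else data

    def write(self, words):
        self._buf = list(words)  # step 2: host -> device copy

    def check(self, verify):
        return verify(self._buf)  # step 3: the check sees the device-side data


def round_trip_check(gpu, verify):
    """Read to the host, write the same data back, then check on the device."""
    host_copy = gpu.read()
    gpu.write(host_copy)
    ok = gpu.check(verify)
    # If ok is True, the checked data is exactly what the host holds.
    return ok, host_copy


def is_squaring_chain(words):
    """Toy predicate: every word is the square of the previous one."""
    return all(b == a * a for a, b in zip(words, words[1:]))


def flip_last_bit(words):
    """Simulated transfer error: flip the lowest bit of the last word."""
    bad = list(words)
    bad[-1] ^= 1
    return bad


if __name__ == "__main__":
    print(round_trip_check(ToyGpu([3, 9, 81]), is_squaring_chain))
    print(round_trip_check(ToyGpu([3, 9, 81], read_fault=flip_last_bit), is_squaring_chain))
```

Because the data that gets checked is the data the CPU wrote back, a corrupted read shows up as a failed check instead of going unnoticed on the host side.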

SELROC 2018-10-16 19:06

[QUOTE=preda;498145]Yes. During a check, the data is round-tripped from the GPU and back, to make sure that the CPU also has correct data, i.e.:
1. The CPU reads from the GPU.
2. The CPU writes what it just read back to the GPU.
3. The check is done on the GPU.
If the check (on the GPU) succeeds, we are confident that the CPU has correct data.

If the check is done rarely, this CPU-GPU communication overhead is not a burden.[/QUOTE]


Thank you for explaining.

kriesel 2018-10-18 14:37

Determining upper exponent limit for a transform type and fft length
 
See #8 in [url]https://www.mersenneforum.org/showthread.php?t=23391[/url], [URL]https://www.mersenneforum.org/showthread.php?p=498231#post498231[/URL]
It is the result of some gpu-months.

SELROC 2018-10-20 15:00

[QUOTE=preda;498145]Yes. During a check, the data is round-tripped from the GPU and back, to make sure that the CPU also has correct data, i.e.:
1. The CPU reads from the GPU.
2. The CPU writes what it just read back to the GPU.
3. The check is done on the GPU.
If the check (on the GPU) succeeds, we are confident that the CPU has correct data.

If the check is done rarely, this CPU-GPU communication overhead is not a burden.[/QUOTE]




There is a slight performance improvement in version 4.6 of about 5-6 ms per iteration.

kriesel 2018-10-20 16:29

For the first-time PRP test assignments that GIMPS is currently issuing, how does gpuOwL v4.6 compare in speed to v3.8 (which is what I'm running now)?

Anyone out there have a Windows 7 x64 compatible build of gpuOwL 4.x to share?

SELROC 2018-10-20 16:53

[QUOTE=kriesel;498364]For the first-time PRP test assignments that GIMPS is currently issuing, how does gpuOwL v4.6 compare in speed to v3.8 (which is what I'm running now)?
[/QUOTE]


I think v3.8 is slower than v3.5, and v4.6 should be faster than v3.8, though still slower than v3.5.



[QUOTE]
Anyone out there have a Windows 7 x64 compatible build of gpuOwL 4.x to share?[/QUOTE]


No, my data comes from a gpuOwL build on Debian 9.

kriesel 2018-10-20 17:51

[QUOTE=SELROC;498365]I think v3.8 is slower than v3.5, and v4.6 should be faster than v3.8, though still slower than v3.5.[/QUOTE]
Thanks. What percentage difference do you see, and on which exponents?

SELROC 2018-10-20 18:28

[QUOTE=kriesel;498369]Thanks. What percentage difference do you see, and on which exponents?[/QUOTE]


v4.6 is faster by 5-8 ms/it, for exponents in the 87M range.

kriesel 2018-10-20 20:14

[QUOTE=SELROC;498373]v4.6 is faster by 5-8 ms/it, for exponents in the 87M range.[/QUOTE]
I saw the 5-6 ms/it figure stated in post 730.

What percentage faster is v4.6, and compared with which other version: v3.8? Alternatively, what is the timing per iteration for one of them, and which version is that for? (On an RX580, or which GPU model?)

kriesel 2018-10-21 02:47

v4.6 gpuOwL build attempt on Windows 7 x64
 
[CODE]ken@condorella MINGW64 ~/gpuowl-compile/v4.6
$ g++ -std=c++17 -DREV=\"bb691cb\" -O2 -c Worktodo.cpp Result.cpp common.cpp gpuowl.cpp Gpu.cpp clwrap.cpp Task.cpp checkpoint.cpp timeutil.cpp Args.cpp GCD.cpp Primes.cpp Stats.cpp state.cpp -lOpenCL -lgmp -pthread
Gpu.cpp:19:28: error: static assertion failed: size long
static_assert(sizeof(long) == 8, "size long");
~~~~~~~~~~~~~^~~~
[/CODE]
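
The assertion fails because 64-bit Windows toolchains, MinGW-w64 included, use the LLP64 data model, in which `long` is 32 bits wide, whereas 64-bit Linux uses LP64, in which `long` is 64 bits. A small sketch that reports these sizes for whatever platform it runs on is below; `c_type_sizes` is a hypothetical helper name, and the sizes are those of the C ABI the Python interpreter itself was built for.

```python
import ctypes


def c_type_sizes():
    """Return the size in bytes of a few C integer types on this platform."""
    return {
        "long": ctypes.sizeof(ctypes.c_long),           # 4 under LLP64, 8 under LP64
        "long long": ctypes.sizeof(ctypes.c_longlong),  # 8 on both
        "int64_t": ctypes.sizeof(ctypes.c_int64),       # fixed width: always 8
    }


if __name__ == "__main__":
    for name, size in c_type_sizes().items():
        print(f"sizeof({name}) = {size}")
```

Fixed-width types such as `int64_t` are the usual portable choice when exactly 64 bits are needed; whether that is the appropriate change for the gpuOwL sources is not something this thread settles.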

kriesel 2018-10-21 02:48

v3.8 reports 0 GHz-days for a PRP test of a large exponent
 
[CODE]2018-10-20 20:41:05 condorella-rx480 PRP M(658000139), FFT 36864K, 17.43 bits/word, 0 GHz-day
2018-10-20 20:41:42 condorella-rx480 OK loaded: 0/658000139, blockSize 400, 0000000000000003
2018-10-20 20:41:56 condorella-rx480 OK initial check: 0000000000000003
2018-10-20 20:42:41 condorella-rx480 OK 800/658000139 [ 0.00%], 33.96 ms/it [33.90, 34.02] (0.0 GHz-day/day); ETA 258d 14:41; 2761274864abe7be (check 18.35 s) (saved)
2018-10-20 20:47:56 condorella-rx480 10000/658000139 [ 0.00%], 34.15 ms/it [33.93, 37.25] (0.0 GHz-day/day); ETA 260d 01:31; cf50aff61e4a6d54
[/CODE]
The exponents of 335M and below that were tried showed nonzero GHz-day totals, and so nonzero GHz-day/day rates were computed.
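
For reference, the ETA values in the log can be roughly reproduced from the per-iteration time, and the 18.35 s check time gives a feel for the earlier remark that rare checks cost little. This is a back-of-the-envelope sketch, not gpuOwL's own calculation: the function names are hypothetical, it assumes the iteration count equals the exponent (as the `800/658000139` progress field suggests), and the check intervals passed to `check_overhead_fraction` are made-up illustrative values. The log's ETA differs by some minutes, presumably because it uses an unrounded ms/it figure.

```python
def eta_seconds(exponent, done, ms_per_it):
    """Remaining time in seconds, assuming one iteration per bit of the exponent."""
    return (exponent - done) * ms_per_it / 1000.0


def format_eta(seconds):
    """Format seconds as 'Dd HH:MM', similar to the log's ETA field."""
    days, rem = divmod(int(seconds), 86400)
    hours, rem = divmod(rem, 3600)
    return f"{days}d {hours:02d}:{rem // 60:02d}"


def check_overhead_fraction(check_s, ms_per_it, iterations_between_checks):
    """Fraction of wall time spent in checks for a given (hypothetical) check interval."""
    work_s = iterations_between_checks * ms_per_it / 1000.0
    return check_s / (work_s + check_s)


if __name__ == "__main__":
    print(format_eta(eta_seconds(658000139, 800, 33.96)))
    print(format_eta(eta_seconds(658000139, 10000, 34.15)))
    # Hypothetical intervals: a check every 800 vs. every 100000 iterations.
    for interval in (800, 100_000):
        print(interval, round(check_overhead_fraction(18.35, 33.96, interval), 4))
```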


All times are UTC.