[QUOTE=SELROC;498136]Update: by putting the GPU in a Gen3 16x slot, the timing is the same, but the check becomes faster by 1-2 sec.
I think the gpu-cpu communication is faster, this affects GEC speed.[/QUOTE] Yes. During a check, the data is round-tripped from and to the GPU, to make sure that the CPU also has correct data, i.e.:
1. The CPU reads from the GPU.
2. The CPU writes what it just read back to the GPU.
3. The check is done on the GPU.
If the check (on the GPU) succeeds, we are confident that the CPU has correct data. If the check is done rarely, this CPU-GPU communication overhead is not a burden. |
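The three steps above can be sketched with a toy model. This is not gpuOwL's code: `ToyDevice`, `checked_save`, the modulus 2^127 - 1 and the block size are all made up for illustration, and the on-device check is assumed to be a Gerbicz-style check (the GEC mentioned above) on a plain repeated-squaring sequence.

```python
class ToyDevice:
    """Hypothetical stand-in for the GPU: it owns the residue and the check product."""

    def __init__(self, modulus, x0, block_size):
        self.n, self.x0, self.block = modulus, x0, block_size
        self.x = x0          # current residue x_i, where x_{i+1} = x_i^2 mod n
        self.d = x0          # running product of the residues at block boundaries
        self.prev_d = x0     # the product as it was one block earlier

    def run_block(self, corrupt=False):
        for _ in range(self.block):
            self.x = self.x * self.x % self.n
        if corrupt:
            self.x ^= 1      # simulate a single flipped bit
        self.prev_d = self.d
        self.d = self.d * self.x % self.n

    def read(self):
        return (self.x, self.d, self.prev_d)

    def write(self, state):
        self.x, self.d, self.prev_d = state

    def check(self):
        # Gerbicz-style identity: d_new == x0 * d_old^(2^block) mod n
        expected = self.x0 * pow(self.prev_d, 2 ** self.block, self.n) % self.n
        return self.d == expected


def checked_save(dev):
    state = dev.read()       # 1. the CPU reads from the GPU
    dev.write(state)         # 2. the CPU writes back what it just read
    if dev.check():          # 3. the check runs on the GPU, on the written-back data
        return state         # so the CPU copy is known to be good
    return None


dev = ToyDevice(2 ** 127 - 1, 3, 400)
dev.run_block()
good = checked_save(dev)
dev.run_block(corrupt=True)
bad = checked_save(dev)
print("first block ok:", good is not None, "| corrupted block ok:", bad is not None)
```

Because the check runs on data that has already passed through the CPU, a successful check vouches for the CPU copy as well, at the cost of two transfers per check.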
[QUOTE=preda;498145]Yes. During a check, the data is round-tripped from-and-to the GPU, to make sure that the CPU also has correct data. i.e:
1. the CPU reads from the GPU. 2. the CPU writes what it just read back to the GPU. 3. the check is done on the GPU. if the check (on the GPU) succeeds, we are confident that the CPU has correct data. If the check is done rarely, this CPU-GPU communication overhead is not a burden.[/QUOTE] Thank you for explaining. |
Determining upper exponent limit for a transform type and fft length
See #8 in [url]https://www.mersenneforum.org/showthread.php?t=23391[/url], [URL]https://www.mersenneforum.org/showthread.php?p=498231#post498231[/URL]
It is the result of some gpu-months. |
[QUOTE=preda;498145]Yes. During a check, the data is round-tripped from-and-to the GPU, to make sure that the CPU also has correct data. i.e:
1. the CPU reads from the GPU. 2. the CPU writes what it just read back to the GPU. 3. the check is done on the GPU. if the check (on the GPU) succeeds, we are confident that the CPU has correct data. If the check is done rarely, this CPU-GPU communication overhead is not a burden.[/QUOTE] There is a slight performance improvement in version 4.6, of about 5-6 ms per iteration. |
For the first-time PRP test assignments that GIMPS is currently issuing, how does gpuOwL v4.6 compare in speed to v3.8 (which is what I'm running now)?
Anyone out there have a Windows 7 x64 compatible build of gpuOwL 4.x to share? |
[QUOTE=kriesel;498364]For current issuing GIMPS first time PRP test assignments, how does gpuOwL v4.6 compare in speed to v3.8 (which is what I'm running now)?
[/QUOTE] I think v3.8 is slower than v3.5, so version 4.6 should be faster: still slower than v3.5, but better than v3.8.
[QUOTE]Anyone out there have a Windows 7 x64 compatible build of gpuOwL 4.x to share?[/QUOTE] No, my data comes from a gpuOwL build on Debian 9. |
[QUOTE=SELROC;498365]I think v3.8 is slower than v3.5, so version 4.6 should be faster. Still slower than v3.5 but better than v3.8.[/QUOTE]
Thanks. How much percentage difference do you see, on what exponents? |
[QUOTE=kriesel;498369]Thanks. How much percentage difference do you see, on what exponents?[/QUOTE]
v4.6 faster by 5-8 ms/it, exponent range 87M |
[QUOTE=SELROC;498373]v4.6 faster by 5-8 ms/it, exponent range 87M[/QUOTE]
I saw the 5-6 ms/it figure stated in post 730. What percentage faster is v4.6, and compared to which other version (v3.8)? Or alternatively, what is the timing per iteration for one of them, and which version is that for? (On an RX580, or what GPU model?) |
V4.6 gpuowl build attempt on win7 x64
[CODE]ken@condorella MINGW64 ~/gpuowl-compile/v4.6
$ g++ -std=c++17 -DREV=\"bb691cb\" -O2 -c Worktodo.cpp Result.cpp common.cpp gpuowl.cpp Gpu.cpp clwrap.cpp Task.cpp checkpoint.cpp timeutil.cpp Args.cpp GCD.cpp Primes.cpp Stats.cpp state.cpp -lOpenCL -lgmp -pthread
Gpu.cpp:19:28: error: static assertion failed: size long
 static_assert(sizeof(long) == 8, "size long");
               ~~~~~~~~~~~~~^~~~
[/CODE] |
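The failed assertion above is consistent with the Windows data model: 64-bit Windows toolchains, MinGW-w64 included, use LLP64, where `long` is 4 bytes, while 64-bit Linux uses LP64, where `long` is 8 bytes. A small sketch to see which case applies on a given machine, using Python's `ctypes` (it reports the sizes of the C compiler that built the Python interpreter, which is assumed here to match the toolchain of interest; the helper name `c_int_sizes` is made up):

```python
import ctypes


def c_int_sizes():
    """Return the byte sizes of a few C integer types on this platform."""
    return {
        "long": ctypes.sizeof(ctypes.c_long),
        "long long": ctypes.sizeof(ctypes.c_longlong),
        "int64_t": ctypes.sizeof(ctypes.c_int64),
    }


sizes = c_int_sizes()
for name, size in sizes.items():
    print(f"sizeof({name}) = {size}")

# A check like static_assert(sizeof(long) == 8) passes only when this is True.
print("long is 64-bit here:", sizes["long"] == 8)
```

If `long` comes out as 4 bytes, code that assumes a 64-bit `long` would have to switch to a fixed-width type such as `int64_t` before it builds there. Whether that alone is enough for gpuOwL v4.6 is not something this thread establishes.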
v3.8 claims a large exponent is 0 ghz-days to PRP
[CODE]2018-10-20 20:41:05 condorella-rx480 PRP M(658000139), FFT 36864K, 17.43 bits/word, 0 GHz-day
2018-10-20 20:41:42 condorella-rx480 OK loaded: 0/658000139, blockSize 400, 0000000000000003
2018-10-20 20:41:56 condorella-rx480 OK initial check: 0000000000000003
2018-10-20 20:42:41 condorella-rx480 OK 800/658000139 [ 0.00%], 33.96 ms/it [33.90, 34.02] (0.0 GHz-day/day); ETA 258d 14:41; 2761274864abe7be (check 18.35 s) (saved)
2018-10-20 20:47:56 condorella-rx480 10000/658000139 [ 0.00%], 34.15 ms/it [33.93, 37.25] (0.0 GHz-day/day); ETA 260d 01:31; cf50aff61e4a6d54
[/CODE]
Exponents 335M and below that were tried had nonzero GHz-day totals indicated, and so nonzero GHz-day/day rates computed. |
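Apart from the GHz-day figure, the other numbers in the log above can be cross-checked with simple arithmetic. The sketch assumes that "36864K" means 36864 x 1024 FFT words and that the ETA is the remaining iterations multiplied by the current ms/it. Both are inferences from the logged numbers, not taken from the gpuOwL source, and the function names are made up.

```python
def bits_per_word(exponent, fft_k):
    """Average number of exponent bits carried per FFT word."""
    return exponent / (fft_k * 1024)


def eta_days(exponent, done, ms_per_it):
    """Days remaining at a constant iteration time."""
    return (exponent - done) * ms_per_it / 1000 / 86400


bpw = bits_per_word(658000139, 36864)
eta = eta_days(658000139, 10000, 34.15)
print(f"{bpw:.2f} bits/word, ETA about {eta:.1f} days")
```

These reproduce the logged 17.43 bits/word and, to within the rounding of the displayed ms/it, the logged ETA of about 260 days. So the 0 GHz-day value looks like an issue in the credit computation or its display, not in the timing itself.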