mersenneforum.org  

2019-05-13, 20:33   #1167
chengsun
"Cheng Sun"
May 2019

22 Posts

Quote:
Originally Posted by kriesel
Always! I'm sure iteration times for specific exponents will be of interest to RTX20xx owners, or those considering buying one, for comparison to CUDALucas on the same model. And congratulations on getting it to run.

Here are some NVIDIA RTX 2070 benchmark numbers. gpuowl seems to perform quite admirably at most FFT sizes.

gpuowl PRP, M57885161, 3072K FFT, 2.98ms/iter:
Quote:
2019-05-13 20:57:00 gpuowl v6.5-25-gc48d46f
2019-05-13 20:57:00 Note: no config.txt file found
2019-05-13 20:57:00 config: -prp 57885161
2019-05-13 20:57:00 57885161 FFT 3072K: Width 256x4, Height 64x4, Middle 6; 18.40 bits/word
2019-05-13 20:57:00 using short carry kernels
2019-05-13 20:57:00

2019-05-13 20:57:00 OpenCL compilation in 3 ms, with "-DEXP=57885161u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=6u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-13 20:57:01 57885161.owl not found, starting from the beginning.
2019-05-13 20:57:13 57885161 OK 2000 0.00%; 2.93 ms/sq; ETA 1d 23:09; 904fc4ed927722e7 (check 3.03s)
2019-05-13 20:58:06 57885161 20000 0.03%; 2.95 ms/sq; ETA 1d 23:29; f2c610087d02c3ea
2019-05-13 20:59:06 57885161 40000 0.07%; 2.97 ms/sq; ETA 1d 23:45; adb226c2322baa14
2019-05-13 21:00:05 57885161 60000 0.10%; 2.98 ms/sq; ETA 1d 23:48; 175901ec29adfa87
2019-05-13 21:01:05 57885161 80000 0.14%; 2.98 ms/sq; ETA 1d 23:47; c2ee4a9ca385f917
2019-05-13 21:02:04 57885161 100000 0.17%; 2.98 ms/sq; ETA 1d 23:46; f1cbf8d474fd3237
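The config line and the OpenCL defines fit together with simple arithmetic. A minimal sketch, assuming that the FFT length in words is WIDTH × SMALL_HEIGHT × MIDDLE × 2 (two words per complex element) and that bits/word is just the exponent divided by that length. That relationship is my inference, but it matches every config line posted in this thread (3072K, 5120K, 24576K, 72K, 192K):

```python
def fft_words(width: int, small_height: int, middle: int) -> int:
    """FFT length in words, assuming two words per complex element (inferred from the logs)."""
    return width * small_height * middle * 2


def bits_per_word(exponent: int, words: int) -> float:
    """Average number of exponent bits carried by each FFT word."""
    return exponent / words


# Values from the gpuowl log above: -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=6u
words = fft_words(1024, 256, 6)
print(words, words // 1024)                               # 3145728 3072
print(f"{bits_per_word(57885161, words):.2f} bits/word")  # 18.40 bits/word
```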
CUDALucas, M57885161, 3136K FFT, 3.62ms/iter:

Quote:
CUDALucas v2.06beta 64-bit build, compiled May 13 2019 @ 20:34:37

binary compiled for CUDA 10.10
CUDA runtime version 10.10
CUDA driver version 10.10

------- DEVICE 0 -------
name GeForce RTX 2070
UUID GPU-<redacted>
ECC Support? Disabled
Compatibility 7.5
clockRate (MHz) 1620
memClockRate (MHz) 7001
totalGlobalMem 8338604032
totalConstMem 65536
l2CacheSize 4194304
sharedMemPerBlock 49152
regsPerBlock 65536
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsPerMP 1024
multiProcessorCount 36
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 2147483647,65535,65535
textureAlignment 512
deviceOverlap 1
pciDeviceID 0
pciBusID 1

You may experience a small delay on 1st startup to due to Just-in-Time Compilation

Using threads: square 256, splice 128.
Starting M57885161 fft length = 3136K
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| May 13 21:04:21 | M57885161 10000 0x76c27556683cd84d | 3136K 0.18750 3.5870 35.87s | 2:09:40:03 0.01% |
| May 13 21:04:57 | M57885161 20000 0xfd8e311d20ffe6ab | 3136K 0.17969 3.6011 36.01s | 2:09:46:13 0.03% |
| May 13 21:05:33 | M57885161 30000 0xce0d85ab0065a232 | 3136K 0.17188 3.6198 36.19s | 2:09:53:52 0.05% |
| May 13 21:06:09 | M57885161 40000 0x6746379dfc966410 | 3136K 0.17188 3.6199 36.19s | 2:09:57:27 0.06% |
| May 13 21:06:46 | M57885161 50000 0xa5797ceaebc59091 | 3136K 0.17969 3.6192 36.19s | 2:09:59:13 0.08% |
| May 13 21:07:22 | M57885161 60000 0x169388139f3463d6 | 3136K 0.18750 3.6202 36.20s | 2:10:00:20 0.10% |
| May 13 21:07:58 | M57885161 70000 0x82ed6e5a5048987a | 3136K 0.17188 3.6203 36.20s | 2:10:00:59 0.12% |
| May 13 21:08:34 | M57885161 80000 0x3bf6fd44b89b51e1 | 3136K 0.16406 3.6199 36.19s | 2:10:01:16 0.13% |
| May 13 21:09:10 | M57885161 90000 0xc316bcb121f8288a | 3136K 0.17188 3.6195 36.19s | 2:10:01:19 0.15% |
| May 13 21:09:47 | M57885161 100000 0xe54ba81dac4ff3d8 | 3136K 0.17188 3.6200 36.20s | 2:10:01:18 0.17% |

Now testing the same exponent but using a 4096K FFT size for both:

gpuowl PRP, M57885161, 4096K FFT, 3.88ms/iter:
Quote:
2019-05-13 21:16:59 57885161 120000 0.21%; 3.88 ms/sq; ETA 2d 14:15; 2172b8f3cc5b3272
2019-05-13 21:18:17 57885161 140000 0.24%; 3.88 ms/sq; ETA 2d 14:14; af31f96be3309024
2019-05-13 21:19:34 57885161 160000 0.28%; 3.88 ms/sq; ETA 2d 14:12; fd84ac518a5eb59d
CUDALucas, M57885161, 4096K FFT, 3.72ms/iter:
Quote:
| May 13 21:13:56 | M57885161 140000 0xf0ab82e1a9a1aa0e | 4096K 0.00061 3.7190 37.19s | 2:11:36:23 0.24% |
| May 13 21:14:33 | M57885161 150000 0x8e9733fee4029132 | 4096K 0.00052 3.7188 37.18s | 2:11:36:21 0.25% |
| May 13 21:15:10 | M57885161 160000 0x0b5dadf12ed96a4d | 4096K 0.00052 3.7178 37.17s | 2:11:35:55 0.27% |

Now testing a 91M exponent, default FFT size for both:

gpuowl PRP, M91260713, 5120K FFT, 5.04ms/iter:
Quote:
2019-05-13 21:20:25 gpuowl v6.5-25-gc48d46f
2019-05-13 21:20:25 Note: no config.txt file found
2019-05-13 21:20:25 config: -prp 91260713
2019-05-13 21:20:25 91260713 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.41 bits/word
2019-05-13 21:20:25 using short carry kernels
2019-05-13 21:20:26

2019-05-13 21:20:26 OpenCL compilation in 885 ms, with "-DEXP=91260713u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-13 21:20:27 91260713.owl not found, starting from the beginning.
2019-05-13 21:20:48 91260713 OK 2000 0.00%; 5.00 ms/sq; ETA 5d 06:47; 7f2e65a79606215a (check 5.15s)
2019-05-13 21:22:18 91260713 20000 0.02%; 5.03 ms/sq; ETA 5d 07:31; 9f439bcb988863f2
2019-05-13 21:23:59 91260713 40000 0.04%; 5.04 ms/sq; ETA 5d 07:40; fee8273824cbf2b2
2019-05-13 21:25:40 91260713 60000 0.07%; 5.04 ms/sq; ETA 5d 07:38; 8e003220fc40d3b1
CUDALucas, M91260713, 5120K FFT, 6.05ms/iter:
Quote:
Using threads: square 256, splice 128.
Starting M91260713 fft length = 5120K
| Date Time | Test Num Iter Residue | FFT Error ms/It Time | ETA Done |
| May 13 21:27:05 | M91260713 10000 0xa4a207ab75eb658d | 5120K 0.10156 6.0194 60.19s | 6:08:34:42 0.01% |
| May 13 21:28:05 | M91260713 20000 0xa64c665efe179474 | 5120K 0.10156 6.0523 60.52s | 6:08:58:43 0.02% |
| May 13 21:29:06 | M91260713 30000 0xd2e93c5b85c2f694 | 5120K 0.10938 6.0522 60.52s | 6:09:05:58 0.03% |
| May 13 21:30:06 | M91260713 40000 0x36199318621f54ee | 5120K 0.10156 6.0523 60.52s | 6:09:09:08 0.04% |
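The ETA columns in both programs roughly follow from remaining iterations times the per-iteration time. A rough sketch, assuming about p iterations in total for exponent p (one squaring each); each program averages its timings its own way, so this won't match the logged ETAs to the minute:

```python
def eta_seconds(exponent: int, done: int, ms_per_iter: float) -> float:
    """Rough time remaining, assuming about `exponent` iterations in total."""
    return (exponent - done) * ms_per_iter / 1000.0


def fmt_dhm(seconds: float) -> str:
    """Format seconds as 'Nd HH:MM', similar to gpuowl's ETA field."""
    minutes = int(seconds // 60)
    days, rem = divmod(minutes, 24 * 60)
    hours, mins = divmod(rem, 60)
    return f"{days}d {hours:02d}:{mins:02d}"


# gpuowl on the RTX 2070: M57885161 at iteration 100000, 2.98 ms/iter
print(fmt_dhm(eta_seconds(57885161, 100000, 2.98)))  # 1d 23:49 (the log shows 1d 23:46)
```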

2019-05-13, 22:24   #1168
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,437 Posts

Quote:
Originally Posted by chengsun
Here are some NVIDIA RTX 2070 benchmark numbers. gpuowl seems to perform quite admirably at most FFT sizes.

gpuowl PRP, M57885161, 3072K FFT, 2.98ms/iter:

CUDALucas, M57885161, 3136K FFT, 3.62ms/iter:



Now testing the same exponent but using a 4096K FFT size for both:

gpuowl PRP, M57885161, 4096K FFT, 3.88ms/iter:

CUDALucas, M57885161, 4096K FFT, 3.72ms/iter:



Now testing a 91M exponent, default FFT size for both:

gpuowl PRP, M91260713, 5120K FFT, 5.04ms/iter:

CUDALucas, M91260713, 5120K FFT, 6.05ms/iter:

Thank you! So gpuowl gets ~20% more throughput per unit time in two cases out of three.
Do you have any other NVIDIA models such as in the GTX 10xx family that could be tested both ways?
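The ~20% figure is the ratio of the per-iteration times above; throughput is the inverse of ms/iter, so gpuowl's advantage is t_CUDALucas / t_gpuowl - 1. A small sketch of that arithmetic, using the three RTX 2070 pairs posted:

```python
# (label, gpuowl ms/iter, CUDALucas ms/iter) from the RTX 2070 runs above
runs = [
    ("M57885161, default FFT (3072K vs 3136K)", 2.98, 3.62),
    ("M57885161, 4096K FFT for both", 3.88, 3.72),
    ("M91260713, 5120K FFT for both", 5.04, 6.05),
]


def gpuowl_advantage(gpuowl_ms: float, cudalucas_ms: float) -> float:
    """Relative throughput gain of gpuowl; iterations/s is proportional to 1/(ms/iter)."""
    return cudalucas_ms / gpuowl_ms - 1.0


for label, g, c in runs:
    print(f"{label}: {gpuowl_advantage(g, c):+.1%}")
```

This prints +21.5%, -4.1% and +20.0%, i.e. gpuowl ahead in two of the three cases.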

2019-05-13, 22:48   #1169
chengsun
"Cheng Sun"
May 2019

22 Posts

Quote:
Originally Posted by kriesel
Thank you! So gpuowl gets ~20% more throughput per unit time in two cases out of three.
Do you have any other NVIDIA models such as in the GTX 10xx family that could be tested both ways?

Unfortunately not. I'm sure there are plenty of other folks here who can help, though.

2019-05-14, 00:11   #1170
xx005fs
"Eric"
Jan 2018
USA

3248 Posts
Build Error

I tried to build the newest commit myself, but I am getting build errors using MSYS2 on Windows and have no clue what's going wrong. Here are the error messages I am getting:



Code:
echo \"`git describe --long --dirty --always`\" > version.inc
echo Version: `cat version.inc`
Version: "v6.5-25-gc48d46f-dirty"
g++ -Wall -O2 -std=c++17 -Wall Pm1Plan.cpp GmpUtil.cpp Worktodo.cpp common.cpp main.cpp Gpu.cpp clwrap.cpp Task.cpp checkpoint.cpp timeutil.cpp Args.cpp state.cpp Signal.cpp FFTConfig.cpp -o gpuowl -lOpenCL -lgmp -lstdc++fs -pthread -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L/c/Windows/System32 -L.
d000046.o:(.idata$5+0x0): multiple definition of `__imp___C_specific_handler'
d000043.o:(.idata$5+0x0): first defined here
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/lib/../lib/crt2.o: In function `pre_c_init':
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:146: undefined reference to `__p__fmode'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/lib/../lib/crt2.o: In function `__tmainCRTStartup':
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:290: undefined reference to `_set_invalid_parameter_handler'
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:299: undefined reference to `__p__acmdln'
C:\msys64\tmp\ccs0oL4i.o:common.cpp:(.text+0x53c): undefined reference to `__imp___acrt_iob_func'
C:\msys64\tmp\ccV4hz2N.o:Args.cpp:(.text+0x29): undefined reference to `__imp___acrt_iob_func'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/lib/../lib/libmingw32.a(lib64_libmingw32_a-merr.o): In function `_matherr':
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/merr.c:46: undefined reference to `__acrt_iob_func'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/lib/../lib/libmingw32.a(lib64_libmingw32_a-pseudo-reloc.o): In function `__report_error':
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/pseudo-reloc.c:149: undefined reference to `__acrt_iob_func'
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/pseudo-reloc.c:150: undefined reference to `__acrt_iob_func'
C:/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/lib/../lib/libmingwex.a(lib64_libmingwex_a-mingw_vfprintf.o): In function `__mingw_vfprintf':
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/stdio/mingw_vfprintf.c:53: undefined reference to `_lock_file'
E:/mingwbuild/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/stdio/mingw_vfprintf.c:55: undefined reference to `_unlock_file'
collect2.exe: error: ld returned 1 exit status
make: *** [Makefile:14: gpuowl] Error 1

2019-05-14, 05:12   #1171
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,437 Posts
Gpuowl v6.5-c48d46f on Win7 x64, AMD & NVIDIA

Executables on Windows are named filename.exe, but strip $@ seems to look for the name without the .exe. So maybe this in the makefile:
Code:
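# Windows target: static link; the output file is gpuowl-win.exe, so strip needs the .exe suffix
gpuowl-win: ${HEADERS} ${SRCS}
	${BUILD} -static
	strip $@.exe

# Hard-coded version string (the git describe line is disabled)
version.inc: FORCE
	#echo \"`git describe --long --dirty --always`\" > version.inc
	echo \"v6.5-c48d46f\" > version.inc
	echo Version: `cat version.inc`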
Readme.md says to put gpuowl.cl with the executable, but there is no gpuowl.cl in the v6.5 file set. Should that be changed to gpuowl-wrap.cl?
Readme.md mentions config.txt but gives no indication of format or contents, optional or required.

Code:
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\msys64\home\ken\gpuowl-compile\v6.5-c48d46f>gpuowl-win
2019-05-13 20:27:07 gpuowl v6.5-c48d46f
2019-05-13 20:27:07 Note: no config.txt file found
2019-05-13 20:27:07 Can't open 'worktodo.txt' (mode 'rb')
2019-05-13 20:27:07 Bye

C:\msys64\home\ken\gpuowl-compile\v6.5-c48d46f>gpuowl-win -h
2019-05-13 20:27:16 gpuowl v6.5-c48d46f

Command line options:

-dir <folder>      : specify work directory (containing worktodo.txt, results.txt, config.txt, gpuowl.log)
-user <name>       : specify the user name.
-cpu  <name>       : specify the hardware name.
-time              : display kernel profiling information.
-fft <size>        : specify FFT size, such as: 5000K, 4M, +2, -1.
-block <value>     : PRP GEC block size. Default 1000. Smaller block is slower but detects errors sooner.
-log <step>        : log every <step> iterations, default 20000. Multiple of 10000.
-carry long|short  : force carry type. Short carry may be faster, but requires high bits/word.
-B1                : P-1 B1 bound, default 500000
-B2                : P-1 B2 bound, default B1 * 30
-rB2               : ratio of B2 to B1. Default 30, used only if B2 is not explicitly set
-prp <exponent>    : run a single PRP test and exit, ignoring worktodo.txt
-pm1 <exponent>    : run a single P-1 test and exit, ignoring worktodo.txt
-results <file>    : name of results file, default 'results.txt'
-device <N>        : select a specific device:
 0 : Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
 1 : gfx804-8x1203-@3:0.0 Radeon 550 Series

FFT Configurations:
FFT    8K [  0.01M -    0.18M]  64-64
FFT   32K [  0.05M -    0.68M]  64-256 256-64
FFT   48K [  0.07M -    1.01M]  64-64-6
FFT   64K [  0.10M -    1.34M]  64-512 512-64
FFT   72K [  0.11M -    1.50M]  64-64-9
FFT   80K [  0.12M -    1.66M]  64-64-10
FFT  128K [  0.20M -    2.63M]  1K-64 64-1K 256-256
FFT  192K [  0.29M -    3.91M]  64-256-6 256-64-6
FFT  256K [  0.39M -    5.18M]  64-2K 256-512 512-256 2K-64
FFT  288K [  0.44M -    5.81M]  64-256-9 256-64-9
FFT  320K [  0.49M -    6.44M]  64-256-10 256-64-10
FFT  384K [  0.59M -    7.69M]  64-512-6 512-64-6
FFT  512K [  0.79M -   10.18M]  1K-256 256-1K 512-512 4K-64
FFT  576K [  0.88M -   11.42M]  64-512-9 512-64-9
FFT  640K [  0.98M -   12.66M]  64-512-10 512-64-10
FFT  768K [  1.18M -   15.12M]  1K-64-6 64-1K-6 256-256-6
FFT    1M [  1.57M -   20.02M]  1K-512 256-2K 512-1K 2K-256
FFT 1152K [  1.77M -   22.45M]  1K-64-9 64-1K-9 256-256-9
FFT 1280K [  1.97M -   24.88M]  1K-64-10 64-1K-10 256-256-10
FFT 1536K [  2.36M -   29.72M]  64-2K-6 256-512-6 512-256-6 2K-64-6
FFT    2M [  3.15M -   39.34M]  1K-1K 512-2K 2K-512 4K-256
FFT 2304K [  3.54M -   44.13M]  64-2K-9 256-512-9 512-256-9 2K-64-9
FFT 2560K [  3.93M -   48.90M]  64-2K-10 256-512-10 512-256-10 2K-64-10
FFT    3M [  4.72M -   58.41M]  1K-256-6 256-1K-6 512-512-6 4K-64-6
FFT    4M [  6.29M -   77.30M]  1K-2K 2K-1K 4K-512
FFT 4608K [  7.08M -   86.70M]  1K-256-9 256-1K-9 512-512-9 4K-64-9
FFT    5M [  7.86M -   96.07M]  1K-256-10 256-1K-10 512-512-10 4K-64-10
FFT    6M [  9.44M -  114.74M]  1K-512-6 256-2K-6 512-1K-6 2K-256-6
FFT    8M [ 12.58M -  151.83M]  2K-2K 4K-1K
FFT    9M [ 14.16M -  170.28M]  1K-512-9 256-2K-9 512-1K-9 2K-256-9
FFT   10M [ 15.73M -  188.68M]  1K-512-10 256-2K-10 512-1K-10 2K-256-10
FFT   12M [ 18.87M -  225.32M]  1K-1K-6 512-2K-6 2K-512-6 4K-256-6
FFT   16M [ 25.17M -  298.13M]  4K-2K
FFT   18M [ 28.31M -  334.34M]  1K-1K-9 512-2K-9 2K-512-9 4K-256-9
FFT   20M [ 31.46M -  370.44M]  1K-1K-10 512-2K-10 2K-512-10 4K-256-10
FFT   24M [ 37.75M -  442.34M]  1K-2K-6 2K-1K-6 4K-512-6
FFT   36M [ 56.62M -  656.22M]  1K-2K-9 2K-1K-9 4K-512-9
FFT   40M [ 62.91M -  727.03M]  1K-2K-10 2K-1K-10 4K-512-10
FFT   48M [ 75.50M -  868.07M]  2K-2K-6 4K-1K-6
FFT   72M [113.25M - 1287.53M]  2K-2K-9 4K-1K-9
FFT   80M [125.83M - 1426.38M]  2K-2K-10 4K-1K-10
FFT   96M [150.99M - 1702.92M]  4K-2K-6
FFT  144M [226.49M - 2525.23M]  4K-2K-9
FFT  160M [251.66M - 2797.39M]  4K-2K-10
2019-05-13 20:27:21 Exiting because "help"
2019-05-13 20:27:21 Bye
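Each FFT line in the -h output gives the exponent range that size covers. A minimal sketch, assuming the default pick is simply the smallest listed FFT whose upper limit covers the exponent. That rule is my assumption rather than gpuowl's actual code, but it reproduces the FFT sizes chosen for every exponent run in this thread (72K, 192K, 3072K, 5120K, 24576K):

```python
# (FFT size, max exponent in millions) copied from the FFT Configurations table above
FFT_TABLE = [
    ("8K", 0.18), ("32K", 0.68), ("48K", 1.01), ("64K", 1.34),
    ("72K", 1.50), ("80K", 1.66), ("128K", 2.63), ("192K", 3.91),
    ("256K", 5.18), ("288K", 5.81), ("320K", 6.44), ("384K", 7.69),
    ("512K", 10.18), ("576K", 11.42), ("640K", 12.66), ("768K", 15.12),
    ("1M", 20.02), ("1152K", 22.45), ("1280K", 24.88), ("1536K", 29.72),
    ("2M", 39.34), ("2304K", 44.13), ("2560K", 48.90), ("3M", 58.41),
    ("4M", 77.30), ("4608K", 86.70), ("5M", 96.07), ("6M", 114.74),
    ("8M", 151.83), ("9M", 170.28), ("10M", 188.68), ("12M", 225.32),
    ("16M", 298.13), ("18M", 334.34), ("20M", 370.44), ("24M", 442.34),
    ("36M", 656.22), ("40M", 727.03), ("48M", 868.07), ("72M", 1287.53),
    ("80M", 1426.38), ("96M", 1702.92), ("144M", 2525.23), ("160M", 2797.39),
]


def smallest_fft(exponent: int) -> str:
    """Return the first (smallest) FFT whose upper exponent limit is >= exponent."""
    for label, max_millions in FFT_TABLE:
        if exponent <= max_millions * 1_000_000:
            return label
    raise ValueError(f"exponent {exponent} is beyond the largest FFT in the table")


for p in (1398269, 3021377, 57885161, 91260713, 402143717):
    print(p, smallest_fft(p))
```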
AMD RX-480:
Code:
2019-05-13 21:24:03 3021377     2920000 96.62%; 0.45 ms/sq; ETA 0d 00:01; 5d037e84da227645
2019-05-13 21:24:12 3021377     2940000 97.29%; 0.45 ms/sq; ETA 0d 00:01; 6c21576a5db33c3a
2019-05-13 21:24:21 3021377     2960000 97.95%; 0.45 ms/sq; ETA 0d 00:00; c0ed3bea8248de6a
2019-05-13 21:24:30 3021377     2980000 98.61%; 0.45 ms/sq; ETA 0d 00:00; 7c6e0a5c571c077c
2019-05-13 21:24:40 3021377 OK  3000000 99.27%; 0.45 ms/sq; ETA 0d 00:00; f054d62d735ab1d3 (check 0.46s)
2019-05-13 21:24:49 3021377     3020000 99.93%; 0.45 ms/sq; ETA 0d 00:00; 592d0f8328d071bb
2019-05-13 21:24:49 PP  3021376 / 3021377, fffffffffffffffc
2019-05-13 21:24:50 3021377 OK  3022000 100.00%; 0.46 ms/sq; ETA 0d 00:00; 819e8d019eb1c11a (check 0.46s)
2019-05-13 21:24:50 {"exponent":"3021377", "worktype":"PRP-3", "status":"P", "program":{"name":"gpuowl", "version":"v6.5-c48d46f"}, "timestamp":"2019-05-14 02:24:50 UTC", "aid":"0", "fft-length":196608, "res64":"fffffffffffffffc", "residue-type":4}
2019-05-13 21:24:50 Bye
NVIDIA GTX-1080Ti:
Code:
2019-05-13 21:25:02 gpuowl v6.5-c48d46f
2019-05-13 21:25:02 Note: no config.txt file found
2019-05-13 21:25:02 1398269 FFT 72K: Width 8x8, Height 8x8, Middle 9; 18.97 bits/word
2019-05-13 21:25:02 using short carry kernels
2019-05-13 21:25:09

2019-05-13 21:25:09 OpenCL compilation in 5569 ms, with "-DEXP=1398269u -DWIDTH=64u -DSMALL_HEIGHT=64u -DMIDDLE=9u  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-13 21:25:09 1398269.owl not found, starting from the beginning.
2019-05-13 21:25:10 Exception 9gpu_error: OUT_OF_RESOURCES tailFused at clwrap.cpp:284 run
2019-05-13 21:25:10 Bye
2019-05-13 21:25:10 Bye

C:\Users\ken\Documents\gpuowl-gtx1080ti>gpuowl-win
2019-05-13 21:26:29 gpuowl v6.5-c48d46f
2019-05-13 21:26:29 Note: no config.txt file found
2019-05-13 21:26:29 3021377 FFT 192K: Width 8x8, Height 64x4, Middle 6; 15.37 bits/word
2019-05-13 21:26:29 using short carry kernels
2019-05-13 21:26:33

2019-05-13 21:26:33 OpenCL compilation in 3634 ms, with "-DEXP=3021377u -DWIDTH=64u -DSMALL_HEIGHT=256u -DMIDDLE=6u  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-13 21:26:33 3021377.owl not found, starting from the beginning.
2019-05-13 21:26:40 3021377 OK     2000  0.07%; 1.47 ms/sq; ETA 0d 01:14; 6b9478b5b056ae34 (check 1.47s)
2019-05-13 21:27:06 3021377       20000  0.66%; 1.46 ms/sq; ETA 0d 01:13; ac22902eccba7fcf
...(shut down mfaktc instances to give it the gpu's whole attention and improve gpuowl times considerably:)...
2019-05-13 21:39:41 3021377     2940000 97.29%; 0.23 ms/sq; ETA 0d 00:00; 6c21576a5db33c3a
2019-05-13 21:39:45 3021377     2960000 97.95%; 0.23 ms/sq; ETA 0d 00:00; c0ed3bea8248de6a
2019-05-13 21:39:50 3021377     2980000 98.61%; 0.23 ms/sq; ETA 0d 00:00; 7c6e0a5c571c077c
2019-05-13 21:39:55 3021377 OK  3000000 99.27%; 0.23 ms/sq; ETA 0d 00:00; f054d62d735ab1d3 (check 0.25s)
2019-05-13 21:39:59 3021377     3020000 99.93%; 0.23 ms/sq; ETA 0d 00:00; 592d0f8328d071bb
2019-05-13 21:40:00 PP  3021376 / 3021377, fffffffffffffffc
2019-05-13 21:40:00 3021377 OK  3022000 100.00%; 0.25 ms/sq; ETA 0d 00:00; 819e8d019eb1c11a (check 0.23s)
2019-05-13 21:40:00 {"exponent":"3021377", "worktype":"PRP-3", "status":"P", "program":{"name":"gpuowl", "version":"v6.5-c48d46f"}, "timestamp":"2019-05-14 02:40:00 UTC", "aid":"0", "fft-length":196608, "res64":"fffffffffffffffc", "residue-type":4}
2019-05-13 21:40:00 1398269 FFT 72K: Width 8x8, Height 8x8, Middle 9; 18.97 bits/word
2019-05-13 21:40:00 using short carry kernels
2019-05-13 21:40:01

2019-05-13 21:40:01 OpenCL compilation in 15 ms, with "-DEXP=1398269u -DWIDTH=64u -DSMALL_HEIGHT=64u -DMIDDLE=9u  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-13 21:40:01 1398269.owl not found, starting from the beginning.
2019-05-13 21:40:01 Exception 9gpu_error: OUT_OF_RESOURCES tailFused at clwrap.cpp:284 run
2019-05-13 21:40:01 Bye
Not sure why 1398269 reliably fails and 3021377 correctly runs to completion. Haven't tried P-1.

For comparison, CUDALucas v2.06 May 5 2017 on a GTX1080 (slower card):
Code:
Starting M3021377 fft length = 162K
|  Jan 15  13:22:39  |   M3021377    100000  0xd3b692657258a4b1  |   162K  0.04492   0.2332   23.32s  |        11:21   3.30%  |
|  Jan 15  13:23:04  |   M3021377    200000  0x317375cf0872b91d  |   162K  0.04492   0.2444   24.43s  |        11:13   6.61%  |
|  Jan 15  13:23:28  |   M3021377    300000  0x55615500f93ed130  |   162K  0.04688   0.2442   24.42s  |        10:54   9.92%  |
|   Date     Time    |   Test Num     Iter        Residue        |    FFT   Error     ms/It     Time  |       ETA      Done   |
GTX1080Ti, CUDALucas again: M3021377 only loads the GPU to 39% per GPU-Z, at 33% wattage:
Code:
Continuing M3021377 @ iteration 1290251 with fft length 160K, 42.70% done

|  May 13  22:57:58  |   M3021377   1300000  0x0403acd0e4e1fd74  |   160K  0.06250   0.3499    3.41s  |         9:54  43.02%  |
|  May 13  22:58:16  |   M3021377   1350000  0x532acbea155e60d0  |   160K  0.06250   0.3497   17.48s  |         9:37  44.68%  |
|  May 13  22:58:33  |   M3021377   1400000  0xf065342928108572  |   160K  0.07031   0.3497   17.48s  |         9:20  46.33%  |
|  May 13  22:58:51  |   M3021377   1450000  0xb69363302b8ff95c  |   160K  0.05977   0.3497   17.48s  |         9:03  47.99%  |
Almost the same timing, but 88% GPU load and 67% wattage, for CUDALucas on GTX1080Ti, M6972593:
Code:
Starting M6972593 fft length = 392K
|   Date     Time    |   Test Num     Iter        Residue        |    FFT   Error     ms/It     Time  |       ETA      Done   |
|  May 13  23:00:07  |   M6972593     50000  0x47f7a70a1ccc0a62  |   392K  0.02441   0.3543   17.71s  |        40:53   0.71%  |
|  May 13  23:00:25  |   M6972593    100000  0xd96976da492dd84b  |   392K  0.02539   0.3541   17.70s  |        40:34   1.43%  |
|  May 13  23:00:43  |   M6972593    150000  0x9166b52b8e6a12df  |   392K  0.02637   0.3562   17.81s  |        40:21   2.15%  |
|  May 13  23:01:01  |   M6972593    200000  0x87d2d0d2b81517a8  |   392K  0.02539   0.3541   17.70s  |        40:02   2.86%  |
|  May 13  23:01:18  |   M6972593    250000  0x35380a283f796d25  |   392K  0.02637   0.3545   17.72s  |        39:44   3.58%  |
|  May 13  23:01:36  |   M6972593    300000  0xffe349823712cb1e  |   392K  0.02539   0.3567   17.83s  |        39:28   4.30%  |
M402143717 head to head, gpuowl and CUDALucas, on GTX1080Ti:
Code:
2019-05-13 23:39:22 gpuowl v6.5-c48d46f
2019-05-13 23:39:22 Note: no config.txt file found
2019-05-13 23:39:22 402143717 FFT 24576K: Width 256x4, Height 256x8, Middle 6; 15.98 bits/word
2019-05-13 23:39:22 using short carry kernels
2019-05-13 23:39:27

2019-05-13 23:39:27 OpenCL compilation in 4680 ms, with "-DEXP=402143717u -DWIDTH=1024u -DSMALL_HEIGHT=2048u -DMIDDLE=6u  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-13 23:39:32 402143717.owl not found, starting from the beginning.
2019-05-13 23:40:44 402143717 OK     2000  0.00%; 16.46 ms/sq; ETA 76d 14:24; a332f060843aa370 (check 18.24s)
2019-05-13 23:45:52 402143717       20000  0.00%; 17.10 ms/sq; ETA 79d 14:33; 3e5470f28ca0c885
2019-05-13 23:51:37 402143717       40000  0.01%; 17.24 ms/sq; ETA 80d 05:22; 55b406aa58445e27
2019-05-13 23:52:11 Stopping, please wait..
2019-05-13 23:52:30 402143717 OK    42000  0.01%; 17.25 ms/sq; ETA 80d 07:04; 4a665e4bb58f8cd1 (check 18.77s)
CUDALucas 2.06:
Code:
|   Date     Time    |   Test Num     Iter        Residue        |    FFT   Error     ms/It     Time  |       ETA      Done   |
|  May 13  23:23:23  | M402143717   3200000  0x052db08213096b64  | 23040K  0.13281  16.3654  318.83s  |  76:06:05:18   0.79%  |
|  May 13  23:37:10  | M402143717   3250000  0x100461267507bad9  | 23040K  0.14063  16.5556  827.78s  |  76:05:55:45   0.80%  |
Attached Files
File Type: 7z gpuowl-win7-x64-v6.5-c48d46f.7z (393.5 KB, 108 views)

2019-05-14, 07:55   #1172
preda
"Mihai Preda"
Apr 2015

3·457 Posts

The gpuowl-wrap.cl source is built into the executable; there's no need to put a .cl file anywhere.

2019-05-14, 09:42   #1173
preda
"Mihai Preda"
Apr 2015

55B₁₆ Posts

Quote:
Originally Posted by kriesel
Readme.md mentions config.txt but gives no indication of format or contents, optional or required.
config.txt contains one or more lines with exactly the same format as normal command line arguments.
e.g.:
-user foo -cpu bar -log 50000 -device 1
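Since each config.txt line uses the same syntax as the command line, reading it amounts to splitting the file into tokens. A minimal sketch in Python, not gpuowl's own (C++) parser; the helper names are hypothetical, and putting the file's options before the real command-line arguments is an assumption, since the thread doesn't say which takes precedence:

```python
import shlex
from pathlib import Path


def read_config_args(path: Path) -> list[str]:
    """Split every line of config.txt into argv-style tokens (missing file -> no tokens)."""
    if not path.exists():
        return []
    tokens: list[str] = []
    for line in path.read_text().splitlines():
        tokens.extend(shlex.split(line))  # blank lines yield no tokens
    return tokens


def effective_args(config_path: Path, cli_args: list[str]) -> list[str]:
    """Hypothetical merge order: config.txt options first, then the real command line."""
    return read_config_args(config_path) + cli_args
```

With the example line above in config.txt, effective_args(Path("config.txt"), ["-prp", "57885161"]) would give ['-user', 'foo', '-cpu', 'bar', '-log', '50000', '-device', '1', '-prp', '57885161'].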

2019-05-14, 10:24   #1174
SELROC

7×337 Posts
Quote:
Originally Posted by preda
The gpuowl-wrap.cl source is built into the executable; there's no need to put a .cl file anywhere.

I note that you have added a -results option. 2 questions:


1) To fully support running from another directory, an additional -worktodo option (like mfakto has) would be useful.


2) Can multiple instances write to the same results.txt?

2019-05-14, 11:07   #1175
preda
"Mihai Preda"
Apr 2015

3×457 Posts

Quote:
Originally Posted by SELROC
I note that you have added a -results option. 2 questions:


1) To fully support running from another directory, an additional -worktodo option (like mfakto has) would be useful.

Why not put one worktodo.txt per directory? Do you need two differently-named worktodos in the same folder? My setup is one folder per run. The binary (executable) can be shared, one for all the runs; it takes the -dir argument, or uses the startup directory by default.
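The one-folder-per-run layout maps to one process per folder, each started with -dir. A hypothetical sketch that only builds the command lines; the folder names, binary path and folder-to-device pairing are made up for illustration, while -dir and -device are the options listed in the -h output earlier in the thread:

```python
from pathlib import Path


def build_commands(binary: str, runs: list[tuple[str, int]]) -> list[list[str]]:
    """Build one command line per run folder; each folder holds its own worktodo.txt."""
    return [
        [binary, "-dir", str(Path(folder)), "-device", str(device)]
        for folder, device in runs
    ]


# Hypothetical layout: one shared binary, one folder per run, one GPU per folder
for cmd in build_commands("./gpuowl", [("run-rx480", 0), ("run-rx550", 1)]):
    print(" ".join(cmd))
    # To actually start each run in the background: subprocess.Popen(cmd)
```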


Quote:
2) Can multiple instances write to the same results.txt?
Probably yes :) Never tried, but should work :)

Try mixing two instances into a single gpuowl.log the same way and see how that works.
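Whether concurrent writers interleave cleanly depends on how each result line is written. A minimal sketch of one common approach, not taken from gpuowl: open the file in append mode and emit each complete line with a single write() call. On POSIX, O_APPEND makes the seek-to-end and the write one atomic step for a local file; Windows and network filesystems may behave differently, so treat this as an assumption rather than a guarantee:

```python
import os


def append_result_line(path: str, line: str) -> None:
    """Append one complete result line using a single os.write on an O_APPEND descriptor."""
    data = (line.rstrip("\n") + "\n").encode()
    flags = os.O_WRONLY | os.O_APPEND | os.O_CREAT
    fd = os.open(path, flags, 0o644)
    try:
        # One write call per whole line, so another appender's output is unlikely
        # to land in the middle of it (relies on POSIX O_APPEND semantics).
        os.write(fd, data)
    finally:
        os.close(fd)
```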

2019-05-14, 11:13   #1176
SELROC

2,693 Posts
Quote:
Originally Posted by preda
Why not put one worktodo.txt per directory? Do you need two differently-named worktodos in the same folder? My setup is one folder per run. The binary (executable) can be shared, one for all the runs; it takes the -dir argument, or uses the startup directory by default.



Probably yes :) Never tried, but should work :)

Try mixing two instances into a single gpuowl.log the same way and see how that works.



That's what I do now: one worktodo per directory. It works fine; it was just a question.


Ok, thank you :-)

2019-05-14, 18:24   #1177
chengsun
"Cheng Sun"
May 2019

22 Posts

Quote:
Originally Posted by kriesel
Not sure why 1398269 reliably fails and 3021377 correctly runs to completion. Haven't tried P-1.

Interesting. I can reproduce this and will take a look.

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.