mersenneforum.org

Forum: GpuOwl (https://www.mersenneforum.org/forumdisplay.php?f=171)
Thread: gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

preda 2020-04-02 01:49

Should be fixed in the most recent commit, please re-try.
This was again the ROCm optimizer generating broken code for our own sin/cos in some particular cases, which we try carefully to avoid.

When you see unexplained failures like this one, it is often useful to try -use ORIG_SLOWTRIG, as that usually works (though it is slower).

[QUOTE=kriesel;541479]I don't know why, but -fft 0 through -fft +5 all hit EE in 800 iterations on this exponent 131500093. Gpuowl v6.11-134-g1e0ce1d chose the initial 7M fft length on its own. After finding it reproducible, I successively incremented -fft to seek a reliable run case. It wasn't until it reached 9M fft that it succeeded in the GEC. The resulting speed penalty is considerable, 7.5 msec/iter versus 5.3 on an RX480. From the program's help output,[CODE]FFT 7M [ 11.01M - 132.46M] 1K-512-7 256-2K-7 512-1K-7 2K-256-7
FFT 8M [ 12.58M - 150.85M] 2K-2K 4K-1K
FFT 9M [ 14.16M - 169.18M] 1K-512-9 256-2K-9 512-1K-9 2K-256-9[/CODE][CODE]C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win
2020-04-01 07:47:57 gpuowl v6.11-134-g1e0ce1d
2020-04-01 07:47:57 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -use NO_ASM
2020-04-01 07:47:57 device 0, unique id ''
2020-04-01 07:47:57 condorella/rx480 131500093 FFT 7168K: Width 256x4, Height 64x8, Middle 7; 17.92 bits/word
2020-04-01 07:47:59 condorella/rx480 OpenCL args "-DEXP=131500093u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=7u -DWEIGHT_STEP=0x8.7b964bd91a558p-3 -DIWEIGHT_STEP=0xf.16e489ea55fc8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DAMDGPU=1 -DNO_ASM=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-04-01 07:48:03 condorella/rx480 OpenCL compilation in 3.97 s
2020-04-01 07:48:06 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:48:13 condorella/rx480 131500093 EE 800 0.00%; 5272 us/it; ETA 8d 00:34; 6781adfa7991c92a (check 2.31s)
2020-04-01 07:48:15 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:48:22 condorella/rx480 131500093 EE 800 0.00%; 5309 us/it; ETA 8d 01:56; 6781adfa7991c92a (check 2.31s) 1 errors
2020-04-01 07:48:24 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:48:31 condorella/rx480 131500093 EE 800 0.00%; 5298 us/it; ETA 8d 01:32; 6781adfa7991c92a (check 2.33s) 2 errors
2020-04-01 07:48:31 condorella/rx480 3 sequential errors, will stop.
2020-04-01 07:48:31 condorella/rx480 Exiting because "too many errors"
2020-04-01 07:48:31 condorella/rx480 Bye
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>g611

C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win
2020-04-01 07:48:50 gpuowl v6.11-134-g1e0ce1d
2020-04-01 07:48:50 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -use NO_ASM -fft +1
2020-04-01 07:48:50 device 0, unique id ''
2020-04-01 07:48:50 condorella/rx480 131500093 FFT 7168K: Width 64x4, Height 256x8, Middle 7; 17.92 bits/word
2020-04-01 07:48:53 condorella/rx480 OpenCL args "-DEXP=131500093u -DWIDTH=256u -DSMALL_HEIGHT=2048u -DMIDDLE=7u -DWEIGHT_STEP=0x8.7b964bd91a558p-3 -DIWEIGHT_STEP=0xf.16e489ea55fc8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DAMDGPU=1 -DNO_ASM=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-04-01 07:48:57 condorella/rx480 OpenCL compilation in 4.67 s
2020-04-01 07:49:01 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:49:11 condorella/rx480 131500093 EE 800 0.00%; 7714 us/it; ETA 11d 17:46; 55f854bea6c1cecf (check 3.28s)
2020-04-01 07:49:14 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:49:24 condorella/rx480 131500093 EE 800 0.00%; 7697 us/it; ETA 11d 17:10; 55f854bea6c1cecf (check 3.29s) 1 errors
2020-04-01 07:49:27 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:49:37 condorella/rx480 131500093 EE 800 0.00%; 7687 us/it; ETA 11d 16:46; 55f854bea6c1cecf (check 3.27s) 2 errors
2020-04-01 07:49:37 condorella/rx480 3 sequential errors, will stop.
2020-04-01 07:49:37 condorella/rx480 Exiting because "too many errors"
2020-04-01 07:49:37 condorella/rx480 Bye
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>g611

C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win
2020-04-01 07:50:25 gpuowl v6.11-134-g1e0ce1d
2020-04-01 07:50:25 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -use NO_ASM -fft +2
2020-04-01 07:50:25 device 0, unique id ''
2020-04-01 07:50:25 condorella/rx480 131500093 FFT 7168K: Width 64x8, Height 256x4, Middle 7; 17.92 bits/word
2020-04-01 07:50:27 condorella/rx480 OpenCL args "-DEXP=131500093u -DWIDTH=512u -DSMALL_HEIGHT=1024u -DMIDDLE=7u -DWEIGHT_STEP=0x8.7b964bd91a558p-3 -DIWEIGHT_STEP=0xf.16e489ea55fc8p-4 -DWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-3 -DIWEIGHT_BIGSTEP=0xc.5672a115506d8p-4 -DAMDGPU=1 -DNO_ASM=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-04-01 07:50:31 condorella/rx480 OpenCL compilation in 3.72 s
2020-04-01 07:50:34 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:50:42 condorella/rx480 131500093 EE 800 0.00%; 6286 us/it; ETA 9d 13:37; 6f8253cbb2fe58e9 (check 2.71s)
2020-04-01 07:50:45 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:50:53 condorella/rx480 131500093 EE 800 0.00%; 6283 us/it; ETA 9d 13:29; 6f8253cbb2fe58e9 (check 2.71s) 1 errors
2020-04-01 07:50:56 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:51:03 condorella/rx480 131500093 EE 800 0.00%; 6299 us/it; ETA 9d 14:05; 6f8253cbb2fe58e9 (check 2.71s) 2 errors
2020-04-01 07:51:03 condorella/rx480 3 sequential errors, will stop.
2020-04-01 07:51:03 condorella/rx480 Exiting because "too many errors"
2020-04-01 07:51:03 condorella/rx480 Bye
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>g611

C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win
2020-04-01 07:51:29 gpuowl v6.11-134-g1e0ce1d
2020-04-01 07:51:29 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -use NO_ASM -fft +3
2020-04-01 07:51:29 device 0, unique id ''
2020-04-01 07:51:29 condorella/rx480 131500093 FFT 7168K: Width 256x8, Height 64x4, Middle 7; 17.92 bits/word
2020-04-01 07:51:29 condorella/rx480 using long carry kernels
2020-04-01 07:51:32 condorella/rx480 OpenCL args "-DEXP=131500093u -DWIDTH=2048u -DSMALL_HEIGHT=256u -DMIDDLE=7u -DWEIGHT_STEP=0x8.7b964bd91a558p-3 -DIWEIGHT_STEP=0xf.16e489ea55fc8p-4 -DWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-3 -DIWEIGHT_BIGSTEP=0xc.5672a115506d8p-4 -DAMDGPU=1 -DNO_ASM=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-04-01 07:51:36 condorella/rx480 OpenCL compilation in 3.97 s
2020-04-01 07:51:39 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:51:46 condorella/rx480 131500093 EE 800 0.00%; 5275 us/it; ETA 8d 00:42; cfbd904e74b67aae (check 2.31s)
2020-04-01 07:51:48 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:51:54 condorella/rx480 131500093 EE 800 0.00%; 5249 us/it; ETA 7d 23:44; cfbd904e74b67aae (check 2.29s)1 errors
2020-04-01 07:51:57 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:52:03 condorella/rx480 131500093 EE 800 0.00%; 5239 us/it; ETA 7d 23:23; cfbd904e74b67aae (check 2.29s)2 errors
2020-04-01 07:52:03 condorella/rx480 3 sequential errors, will stop.
2020-04-01 07:52:03 condorella/rx480 Exiting because "too many errors"
2020-04-01 07:52:03 condorella/rx480 Bye
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>g611

C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win
2020-04-01 07:52:07 gpuowl v6.11-134-g1e0ce1d
2020-04-01 07:52:07 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -use NO_ASM -fft +4
2020-04-01 07:52:07 device 0, unique id ''
2020-04-01 07:52:07 condorella/rx480 131500093 FFT 8192K: Width 256x8, Height 256x8; 15.68 bits/word
2020-04-01 07:52:07 condorella/rx480 using long carry kernels
2020-04-01 07:52:10 condorella/rx480 OpenCL args "-DEXP=131500093u -DWIDTH=2048u -DSMALL_HEIGHT=2048u -DMIDDLE=1u -DWEIGHT_STEP=0xa.039f00d8f95f8p-3 -DIWEIGHT_STEP=0xc.c82be96a7181p-4 -DWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-3 -DIWEIGHT_BIGSTEP=0xc.5672a115506d8p-4 -DAMDGPU=1 -DNO_ASM=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-04-01 07:52:15 condorella/rx480 OpenCL compilation in 5.16 s
2020-04-01 07:52:18 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:52:27 condorella/rx480 131500093 EE 800 0.00%; 6583 us/it; ETA 10d 00:28; 05252a7f59574e37 (check 2.85s)
2020-04-01 07:52:30 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:52:38 condorella/rx480 131500093 EE 800 0.00%; 6587 us/it; ETA 10d 00:36; 05252a7f59574e37 (check 2.85s) 1 errors
2020-04-01 07:52:41 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:52:49 condorella/rx480 131500093 EE 800 0.00%; 6594 us/it; ETA 10d 00:53; 05252a7f59574e37 (check 2.86s) 2 errors
2020-04-01 07:52:49 condorella/rx480 3 sequential errors, will stop.
2020-04-01 07:52:49 condorella/rx480 Exiting because "too many errors"
2020-04-01 07:52:49 condorella/rx480 Bye
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>g611

C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win
2020-04-01 07:53:21 gpuowl v6.11-134-g1e0ce1d
2020-04-01 07:53:21 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -use NO_ASM -fft +5
2020-04-01 07:53:21 device 0, unique id ''
2020-04-01 07:53:21 condorella/rx480 131500093 FFT 8192K: Width 512x8, Height 256x4; 15.68 bits/word
2020-04-01 07:53:21 condorella/rx480 using long carry kernels
2020-04-01 07:53:23 condorella/rx480 OpenCL args "-DEXP=131500093u -DWIDTH=4096u -DSMALL_HEIGHT=1024u -DMIDDLE=1u -DWEIGHT_STEP=0xa.039f00d8f95f8p-3 -DIWEIGHT_STEP=0xc.c82be96a7181p-4 -DWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-3 -DIWEIGHT_BIGSTEP=0xc.5672a115506d8p-4 -DAMDGPU=1 -DNO_ASM=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-04-01 07:53:26 condorella/rx480 OpenCL compilation in 3.53 s
2020-04-01 07:53:30 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:53:39 condorella/rx480 131500093 EE 800 0.00%; 7196 us/it; ETA 10d 22:51; 6df742314b82f841 (check 3.11s)
2020-04-01 07:53:42 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:53:51 condorella/rx480 131500093 EE 800 0.00%; 7219 us/it; ETA 10d 23:43; 6df742314b82f841 (check 3.11s) 1 errors
2020-04-01 07:53:54 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:54:03 condorella/rx480 131500093 EE 800 0.00%; 7190 us/it; ETA 10d 22:38; 6df742314b82f841 (check 3.10s) 2 errors
2020-04-01 07:54:03 condorella/rx480 3 sequential errors, will stop.
2020-04-01 07:54:03 condorella/rx480 Exiting because "too many errors"
2020-04-01 07:54:03 condorella/rx480 Bye
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>g611

C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win
2020-04-01 07:54:08 gpuowl v6.11-134-g1e0ce1d
2020-04-01 07:54:08 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -use NO_ASM -fft +6
2020-04-01 07:54:08 device 0, unique id ''
2020-04-01 07:54:08 condorella/rx480 131500093 FFT 9216K: Width 256x4, Height 64x8, Middle 9; 13.93 bits/word
2020-04-01 07:54:08 condorella/rx480 using long carry kernels
2020-04-01 07:54:12 condorella/rx480 OpenCL args "-DEXP=131500093u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=9u -DWEIGHT_STEP=0x8.5f7e7ead6051p-3 -DIWEIGHT_STEP=0xf.498539ec95fe8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DAMDGPU=1 -DNO_ASM=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-04-01 07:54:16 condorella/rx480 OpenCL compilation in 4.11 s
2020-04-01 07:54:20 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:54:29 condorella/rx480 131500093 OK 800 0.00%; 7461 us/it; ETA 11d 08:32; bbe24bd13cd73020 (check 3.26s)
2020-04-01 08:19:33 condorella/rx480 131500093 OK 200000 0.15%; 7541 us/it; ETA 11d 11:03; 190bb27ff665f83b (check 3.25s)
[/CODE][/QUOTE]
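For anyone comparing runs like the ones quoted above, the status lines have a regular shape. Below is a minimal Python sketch, assuming the field layout seen in these v6.11 logs (this is not an official gpuowl format spec, and all function names are made up for illustration). It pulls out the iteration, us/it, ETA and 64-bit residue, reproduces the "bits/word" figure as exponent divided by FFT length in words, and flags the case seen here where every retry hits EE with the identical residue. An exactly reproducible error points at a deterministic cause (here, the ROCm codegen problem) rather than at random hardware faults.

```python
import re

# Progress fields of a gpuowl status line such as
# "131500093 EE 800 0.00%; 5272 us/it; ETA 8d 00:34; 6781adfa7991c92a (check 2.31s)"
STATUS_RE = re.compile(
    r"(?P<exp>\d+) (?P<tag>OK|EE) +(?P<it>\d+) +[\d.]+%; +(?P<us>\d+) us/it; "
    r"ETA (?P<days>\d+)d (?P<hh>\d\d):(?P<mm>\d\d); (?P<res>[0-9a-f]{16})"
)


def parse_status(line):
    """Return the progress fields of a status line as a dict, or None if absent."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    for key in ("exp", "it", "us", "days", "hh", "mm"):
        fields[key] = int(fields[key])
    return fields


def estimated_eta_s(st):
    """Remaining iterations times the logged us/it (which is itself rounded)."""
    return (st["exp"] - st["it"]) * st["us"] / 1e6


def logged_eta_s(st):
    """The ETA printed in the log, converted to seconds."""
    return st["days"] * 86400 + st["hh"] * 3600 + st["mm"] * 60


def bits_per_word(exponent, fft_k):
    """Average exponent bits stored per word of an FFT of fft_k * 1024 words."""
    return exponent / (fft_k * 1024)


def same_residue_on_every_retry(lines):
    """True if all EE lines report one identical residue, i.e. the error reproduces exactly."""
    residues = {st["res"] for st in map(parse_status, lines) if st and st["tag"] == "EE"}
    return len(residues) == 1
```

Because the us/it in the log is rounded, the recomputed ETA only approximates the printed one; for the 7168K lines above it agrees to within about half a minute. The bits/word values come out as 17.92, 15.68 and 13.93 for 7168K, 8192K and 9216K, matching the log.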

preda 2020-04-02 01:57

[QUOTE=ATH;541561]RTX 2080 is so bad at double precision and the timings are very inconsistent.

But NEW_SLOWTRIG is better at 3520µs/ite vs 3680µs/ite for ORIG_SLOWTRIG.
T2_SHUFFLE is slightly better at 3520µs vs 3553µs for NO_T2_SHUFFLE
Otherwise CARRY64 and CARRY32 are about the same.
I'm not going to test all those 6 variables on this, since it is very slow and the inconsistencies in the timings are larger than the differences.

Btw UNROLL_NONE, UNROLL_WIDTH and UNROLL_HEIGHT do not work at all on either the Tesla P100 or the RTX 2080.[/QUOTE]

Thank you. This seems to suggest keeping the defaults unchanged for Nvidia (as they are better on at least some Nvidia GPUs). Nvidia users can tune by trying ORIG_SLOWTRIG and NO_T2_SHUFFLE if so inclined.

preda 2020-04-04 09:54

ROCm 3.3
 
ROCm 3.1 has a severe bug that is affecting our sin/cos routines, and we had to use a workaround for that bug.

Recently ROCm 3.3 was released, and it seems this bug is fixed, so the ROCm 3.1 workarounds are not needed anymore (they do have a slight perf impact).

Right now the ROCm 3.3 performance is very close to ROCm 3.1, a tiny bit slower (less than 0.3% slower). OTOH ROCm 3.3 might have a few advantages -- for one, it uses fewer VGPRs (and it doesn't have the terrible ROCm 3.1 bug).

In a recent commit I disabled the ROCm 3.1 bug workaround by default -- it must now be explicitly enabled with -use ROCM31. If this is not done on ROCm 3.1, there are errors in PRP. A slower alternative is using ORIG_SLOWTRIG, which does not trigger the bug.

So in brief:
- ROCm 3.3 is OK and can be used
- if using ROCm 3.1, *must* specify -use ROCM31 or -use ORIG_SLOWTRIG


(for users who are now on ROCm 2.10 or earlier, I recommend moving directly to 3.3, skipping 3.1)

There is also the possibility of having multiple ROCm versions installed at the same time (this is useful when one wants to experiment and compare versions); here is one way to do it:

- install multiple ROCm versions in separate folders, e.g.: /opt/rocm-3.1.0/ and /opt/rocm-3.3.0/
- verify that the ROCm folder containing libamdocl64.so is listed in the LIBPATH in Makefile or SConstruct.
- edit the Makefile to link with -lamdocl64 instead of -lOpenCL (or build with scons)
- when running gpuowl, specify LD_LIBRARY_PATH pointing to the folder with libamdocl64 for the desired ROCm version, e.g.

LD_LIBRARY_PATH=/opt/rocm-3.3.0/opencl/lib/x86_64 ./gpuowl
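If you switch ROCm versions often, the last step can be wrapped in a small launcher. This is a sketch only: it assumes the /opt/rocm-<version>/opencl/lib/x86_64 layout from the example above and a gpuowl binary already linked with -lamdocl64; the helper names are hypothetical.

```python
import os
import subprocess

# Install layout from the example above: /opt/rocm-<version>/opencl/lib/x86_64 holds libamdocl64.so
ROCM_OPENCL_LIB = "/opt/rocm-{version}/opencl/lib/x86_64"


def env_for_rocm(version, base_env=None):
    """Copy of the environment with LD_LIBRARY_PATH pointing at one ROCm version."""
    env = dict(os.environ if base_env is None else base_env)
    env["LD_LIBRARY_PATH"] = ROCM_OPENCL_LIB.format(version=version)
    return env


def run_gpuowl(version, args=(), exe="./gpuowl"):
    """Launch gpuowl (linked with -lamdocl64) against the chosen ROCm version."""
    return subprocess.run([exe, *args], env=env_for_rocm(version), check=False)
```

For example, `run_gpuowl("3.1.0", ["-use", "ROCM31"])` versus `run_gpuowl("3.3.0")` runs the same binary against the two installs. Like the shell line, this replaces LD_LIBRARY_PATH rather than prepending to it, so the selected version's libamdocl64 is the only candidate on that path.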

preda 2020-04-04 11:41

[QUOTE=preda;541738]ROCm 3.1 has a severe bug that is affecting our sin/cos routines, and we had to use a workaround for that bug.

Recently ROCm 3.3 was released, and it seems this bug is fixed, so the ROCm 3.1 workarounds are not needed anymore (they do have a slight perf impact).
[/QUOTE]

I spoke too soon: it seems the bug still affects ROCm 3.3 (but not for exactly the same exponents). As such, I re-enabled the workaround by default as it was before.

ewmayer 2020-04-04 19:45

I'm glad I'm a late adopter with these things - still on rocm 2.10 :). Mihai, bugs aside, roughly what % speedup can one expect from moving from 2.10 to 3.3?

preda 2020-04-04 20:46

[QUOTE=ewmayer;541792]I'm glad I'm a late adopter with these things - still on rocm 2.10 :). Mihai, bugs aside, roughly what % speedup can one expect from moving from 2.10 to 3.3?[/QUOTE]

I would guess something in the range 2% - 4%.

preda 2020-04-07 12:04

FYI, I'm working on re-enabling LL in gpuowl (intended for DC). (I do not plan to add offset or Jacobi check though)
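For readers less familiar with LL versus PRP: PRP repeatedly squares a starting value of 3 (the "0000000000000003" residue at iteration 0 in the logs above), while LL squares and then subtracts 2 at every iteration. That per-iteration "- 2" is also why the Gerbicz check used for PRP does not carry over to LL. Here is a plain big-integer sketch of the textbook recurrence, for illustration only; it bears no resemblance to gpuowl's FFT arithmetic and has no error checking.

```python
def lucas_lehmer(p):
    """Lucas-Lehmer test: for an odd prime p, M_p = 2**p - 1 is prime iff s_(p-2) == 0."""
    m = (1 << p) - 1
    s = 4                          # s_0
    for _ in range(p - 2):
        s = (s * s - 2) % m        # s_(i+1) = s_i^2 - 2 (mod M_p)
    return s == 0
```

For example, among the odd primes up to 31, it reports M_p prime for p = 3, 5, 7, 13, 17, 19 and 31.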

kracker 2020-04-07 13:12

[QUOTE=preda;542013]FYI, I'm working on re-enabling LL in gpuowl (intended for DC). (I do not plan to add offset or Jacobi check though)[/QUOTE]

Thank you!!!
:party:

preda 2020-04-08 11:39

[QUOTE=preda;542013]FYI, I'm working on re-enabling LL in gpuowl (intended for DC). (I do not plan to add offset or Jacobi check though)[/QUOTE]

I just committed a first iteration of LL. Here's a brief summary of changes:

1. How to run LL
a) add a line like DoubleCheck=1AAFFAAD0000000FFFF,51456287,74,1 to worktodo.txt
where the AID (the initial hex) can be N/A or missing. The values after the exponent ("74,1") are ignored now.
b) or pass "-ll <exponent>" on the command line (similar to -pm1 and -prp); if this is a test run and the result should not be published, it can be used in conjunction with "-results /dev/null" or a similar file.

2. Savefiles
Savefiles follow the usual pattern, with the extension ".ll.owl"; you can inspect one with "head -1 <savefile>", which prints the header. The savefile contains a checksum that covers all the stored values, so if the savefile is corrupted or edited manually without updating the checksum, this should be detected (and the file rejected).

3. The command line flags -block and -log are used by LL too:
-block indicates how many iterations to queue to the GPU; the default for LL is 1000. If the GPU becomes sluggish (slow to react), I would look into reducing -block to e.g. 100 (although this problem does not appear on ROCm). Also, large values of -block combined with slower iterations make the program react more slowly to a manual interrupt (Ctrl-C).
-log indicates how often to log, and to save (the two, log and save, are linked as they are for PRP).
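To illustrate the worktodo rules in 1a, here is a hypothetical parser. It is my own sketch of the rules as stated above, not gpuowl's parser: the AID may be a hex string, "N/A", or absent, and everything after the exponent is ignored. Telling "AID missing" apart from "AID present" by whether the first field is a short decimal number is an assumption made for this sketch.

```python
def parse_ll_line(line):
    """Return (aid or None, exponent) for a DoubleCheck= worktodo line, else None."""
    kind, sep, rest = line.strip().partition("=")
    if kind != "DoubleCheck" or not sep:
        return None
    fields = [f.strip() for f in rest.split(",")]
    first = fields[0]
    # Assumption: a plain decimal of at most 10 digits is the exponent (AID omitted).
    if first.isdigit() and len(first) <= 10:
        return None, int(first)
    aid = None if first.upper() == "N/A" else first
    return aid, int(fields[1])     # fields after the exponent (e.g. "74,1") are ignored
```

So "DoubleCheck=1AAFFAAD0000000FFFF,51456287,74,1", "DoubleCheck=N/A,51456287,74,1" and "DoubleCheck=51456287,74,1" all yield exponent 51456287.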


warning: when running a very small exponent, 1398269, one of my Radeon VIIs started to act flaky. This turned out to be fixed by increasing the voltage on that GPU, but it suggests that on the R7 such small FFTs may sometimes stress the GPU more than a typical PRP does. In my case the fix was to increase the voltage, *not* to decrease the memory frequency.


There is no Jacobi check for now. (I'll consider how hard it is to add).
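For reference, this is the usual LL Jacobi check in its plainest form. A correct LL residue s_i with i >= 1 satisfies (s_i - 2 | M_p) = -1, so a value whose symbol is not -1 signals an error. A corrupted residue fails the check only about half the time, which makes it weaker than the Gerbicz check used for PRP. The sketch below is a textbook Jacobi-symbol routine plus a per-iteration check, not gpuowl code. Checking every iteration is only for illustration; on multi-million-bit residues one would presumably check occasionally.

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0, via quadratic reciprocity."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be odd and positive")
    a %= n
    result = 1
    while a:
        while a % 2 == 0:          # factors of 2: (2/n) = -1 iff n = 3 or 5 (mod 8)
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                # reciprocity: sign flips iff both are 3 (mod 4)
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def lucas_lehmer_jacobi(p):
    """LL test that checks (s_i - 2 | M_p) == -1 after every iteration i >= 1."""
    m = (1 << p) - 1
    s = 4
    for i in range(1, p - 1):
        s = (s * s - 2) % m
        if jacobi(s - 2, m) != -1:
            raise ArithmeticError(f"Jacobi check failed at iteration {i}")
    return s == 0
```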

There is a change in how the FFT size is specified. Look at -h for the new format of FFT specifiers, and pass either the full FFT size (e.g. "5.5M") or one of the FFT specs (e.g. "1K:5:512") from the list displayed by -h.
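The link between a colon-separated spec and the full size can be read off the v6.11 logs quoted earlier in the thread. There, WIDTH=1024, MIDDLE=7 and SMALL_HEIGHT=512 are reported as "FFT 7168K", i.e. twice the product of the three factors; likewise 2048 x 1 x 2048 is reported as 8192K. The factor of 2 is my reading of those logs (presumably each complex element holds two doubles), not something stated in this post. The sketch below also does not assume any particular field order, since the product is the same either way.

```python
def parse_size(tok):
    """'1K' -> 1024, '512' -> 512, '5.5M' -> 5767168 (K = 2^10, M = 2^20)."""
    units = {"K": 1 << 10, "M": 1 << 20}
    suffix = tok[-1].upper()
    if suffix in units:
        return int(float(tok[:-1]) * units[suffix])
    return int(tok)


def fft_words(spec):
    """FFT length in words for a full size ('5.5M') or a colon spec ('1K:5:512')."""
    parts = spec.split(":")
    if len(parts) == 1:
        return parse_size(parts[0])
    product = 1
    for part in parts:
        product *= parse_size(part)
    return 2 * product             # factor 2 inferred from the logs (7168K = 2 * 1024 * 7 * 512)
```

Under this reading "1K:5:512" is a 5M FFT, while "5.5M" is 5767168 words.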

kracker 2020-04-08 13:21

1 Attachment(s)
Windows binaries(untested!)

kriesel 2020-04-08 14:38

[QUOTE=preda;542085]I just commited a first iteration of LL. Here's a brief summary of changes[/QUOTE]Excellent! Will have a look shortly.

You separately added offset and Jacobi check back at v0.6.
How much of that is reusable?

[url]https://www.mersenneforum.org/showpost.php?p=489083&postcount=7[/url]

preda 2020-04-08 14:59

[QUOTE=kriesel;542095]Excellent! Will have a look shortly.

You separately added offset and Jacobi check back at v0.6.
How much of that is reusable?

[url]https://www.mersenneforum.org/showpost.php?p=489083&postcount=7[/url][/QUOTE]

The Jacobi implem should be the same. For the "offset", it's too much trouble to fit it in without adversely affecting PRP which is still the main focus; also "offset" brings too little benefit to be worth bothering with IMO.

I consider a matching gpuowl DC with offset==0 for a mprime LL with offset != 0 a very strong verification. What additional benefit is there for the trouble of adding offset to gpuowl? I don't see the point -- maybe you could explain the motivation for adding offset in this context.

kriesel 2020-04-08 23:50

[QUOTE=preda;542102]The Jacobi implem should be the same. For the "offset", it's too much trouble to fit it in without adversely affecting PRP which is still the main focus; also "offset" brings too little benefit to be worth bothering with IMO.

I consider a matching gpuowl DC with offset==0 for a mprime LL with offset != 0 a very strong verification. What additional benefit is there for the trouble of adding offset to gpuowl? I don't see the point -- maybe you could explain the motivation for adding offset in this context.[/QUOTE]There are two issues: gpuowl zero offset twice on the same exponent, and gpuowl zero offset matching another program's zero offset.

As gpus become a greater fraction of the total primality testing throughput, the work of manually ensuring that gpuowl is not double-checking gpuowl results with the same (zero) offset becomes more onerous. As you know, a Radeon VII with your code and George's modifications is a very fast way of running primality tests, so these gpus and their NVIDIA or cloud-computing near equivalents will have an outsized effect on the increasing number of gpu-produced primality tests, greater than their unit count would indicate.

Gpu assignments are manual assignments. The PrimeNet server does not know what software will be used for a manual assignment. There is no way of manually communicating that. There is no way of specifying which software will be used, or specifying for double-check assignments that first-tests from any specific software or initial-run-offset are desired. I find that I generally forget to consider checking before putting the work on my gpus. How many others do also?

Maybe mprime/prime95, Mlucas, CUDALucas, etc. have or could have zero-offset avoidance in their runs? But that does not address the chance of a previous software version (Mlucas before V18, for example) having produced a zero-offset result, or of a gpuowl-gpuowl zero-offset coincidence on the same exponent. The chance of a pseudorandom offset generator producing a zero offset is quite low. I agree that different software running differing offsets gives good verification. In fact, it's superior to the same software with different offsets, since remaining software bugs are less likely to align across very different programs.

However, if this is working properly, including for any gpuowl results from its LL infancy, it may not be much of a problem while gpuowl work assignments flow through manual reservations and certain other conditions are met. [URL]https://www.mersenneforum.org/showpost.php?p=296358&postcount=36[/URL]

[QUOTE]I just changed the manual reservations for double-checks. The page should no longer hand out exponents previously tested by GLucas, Mlucas, or CUDALucas.

That is, only prime95 with its shift count capability is allowed to do the double (or triple) checking.

This feature needs some testing. [/QUOTE] Zero-offset tandem runs could still be performed manually by users, as Laurv or others have done for large exponents. Creation of a PrimeNet API connection for gpu applications would change the offset calculus. If gpus become a large majority of the primality testing throughput, it will become difficult to avoid gpuowl-first-test double checks as manual assignments. Maybe it also starts to affect strategic double and triple checking [URL]https://www.mersenneforum.org/showthread.php?goto=newpost&t=24148[/URL]

I don't know what the overall project ratio of manual (gpus) to PrimeNet-API (cpus) throughput is. But from my own cpus page data, it's about 7.6 to 1. I expect that ratio to increase over time. The gpu throughput is a mix of TF, P-1, and primality testing, with several gpus dedicated to double-checking.

A desire not to slow PRP with offset provisions motivated by LL, and a desire not to duplicate code and further increase complexity by separating offset behavior between PRP and LL, are both understandable, as is conserving your available time for other things, such as P-1 error detection and handling.

kriesel 2020-04-09 01:28

CUDALucas v2.05 was the beginning of nonzero shift there; late 2013 or so. There's no telling how long changeover from earlier versions took.
[URL]https://www.mersenneforum.org/showpost.php?p=359150&postcount=1962[/URL]
We're still double-checking LL tests from 2010 and 2011.
[URL]https://www.mersenne.org/report_exponent/?exp_lo=50281067&full=1[/URL]
[URL]https://www.mersenne.org/report_exponent/?exp_lo=50485051&full=1[/URL]
[URL]https://www.mersenne.org/report_exponent/?exp_lo=50584823&exp_hi=&full=1[/URL]


Mlucas V18 and its introduction of nonzero shift would have been sometime in 2018.

kriesel 2020-04-09 03:44

I didn't find anything indicating nonzero offset was ever implemented in clLucas.

preda 2020-04-09 08:52

[QUOTE=kriesel;542153]There are two issues; gpuowl zero offset twice on the same exponent, and gpuowl zero offset matching some other software's offset.
[/QUOTE]

It seems to me that the zero-offset-DC problem can be addressed through external means. For example, manual DC assignments could be handed out only for exponents that had initial-LL with non-zero offset; and the need to DC the cases with zero-offset-initial-LL can be adequately covered by mprime through non-manual assignments.

ATH 2020-04-09 11:11

If people do 2 zero-offset LL tests with gpuowl, we just have to triple check them with Prime95/mprime/CUDALucas. It should not occur that often.

Maybe you should limit the exponent to 90M for the LL test, since it is only for double checks. The few LL tests above 90M do not need to be double checked for a long time.

kriesel 2020-04-09 12:57

[QUOTE=ATH;542193]Maybe you should limit the exponent to 90M for LL test, since it is only for double checks. The few LL tests above 90M does not need to be double checked for a long time.[/QUOTE]As a subproject I am running spot double checks ahead of the first-test wavefront, and it would be useful to be able to use the very efficient gpuowl on a fast new reliable Radeon VII in that. In general, while the extent of changeover from LL to PRP first test is heartening, there is much further to go in that regard. There are LL first test results reported this morning up to 108M on [URL]https://www.mersenne.org/report_recent_cleared/[/URL]
The two highest exponent primality tests were 107985967 and 107981609 LL by brode-runner.
Ten of the highest 25 exponents' primality tests were LL. Note that some older gpu hardware is not capable of running gpuowl, so if it is used for primality testing, it will run CUDALucas, which forces LL rather than PRP. Old hardware output should be checked sooner, since it is less likely to be reliable and CUDALucas has no Jacobi check. A subjective summary of the recently cleared results I saw follows.

Anonymous and many other users submitted mixed LL & PRP. In some cases, including WR and kriesel, the LL were DC.

all LL:
AUM - Kuwait
brode-runner
curtisc
Ryan Propper (all DC)
TAMUC-ComputerScience

all PRP:
Ben Delo
dcheuk
George Woltman
Gordon Spence
marssystems
Mihai Preda (shocking! ;)
mrh.org
Oliver Kruse
oodaira
S00030
Simon Josefsson
Sebastien Broucke
trebor

Other things being equal or nearly so, I'm in favor of orthogonality, and against artificial limitations built into the software.
If GIMPS as a project wants to limit future LL activity to below some exponent value, the place to do that is at the PrimeNet server.
When the next Mersenne prime is found, we'll want to use gpuowl to confirm it. There are many >100Mdigit exponents LL tested and without a double-check.

(end)

Prime95 2020-04-09 17:21

[QUOTE=preda;542189]It seems to me that the zero-offset-DC problem can be addressed through external means. For example, manual DC assignments could be handed out only for exponents that had initial-LL with non-zero offset;.[/QUOTE]

The server currently does this.

ewmayer 2020-04-09 18:55

Mihai/George, could you explain why residue shift apparently incurs such a heavy performance penalty in gpuOwl?

For LL with shift, I can maybe understand it - during the carry step of each iteration, one needs to precompute the bit offset of the -2 for the current shift value and then inject it into the corresponding residue word - not a lot of cycles needed, but perhaps in a massively-parallel GPU context, slowing whichever one of those smaller work units gets the shifted -2 causes the others to stall - just speculating here.

But in a PRP context, there is no per-iteration -2 subtrahend, we simply apply some initial shift value to the starting residue, then repeated-square-mod happily away, with the only shift-related expense being the per-iteration update of the shift value, shift = 2*shift (mod p), where the * and mod can both be replaced with low-latency operations, shift (or add), compute shift2 = shift - p, followed by cmov to select the proper one of shift and shift2.
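Ernst's argument, sketched with plain Python integers (an illustration only, not code from gpuowl, Mlucas or Prime95). The stored residue is s * 2^k mod M_p. Squaring doubles k modulo p because 2^p = 1 (mod M_p), so the shift update is just an add and a conditional subtract. The LL "-2" then has to be subtracted as 2^(k+1) at the current shift. The test checks that every starting shift unshifts to the same final residue as the unshifted run.

```python
def ll_plain(p):
    """Unshifted LL: return s_(p-2) mod M_p."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s


def ll_shifted(p, shift):
    """LL on a residue stored as s * 2^shift (mod M_p), 0 <= shift < p; returns unshifted s_(p-2)."""
    m = (1 << p) - 1
    r = (4 << shift) % m                  # s_0 = 4, stored shifted
    for _ in range(p - 2):
        r = r * r % m                     # (s * 2^k)^2 = s^2 * 2^(2k)
        # new shift = 2k mod p, done as an add plus a conditional subtract (a cmov in hardware)
        doubled = shift + shift
        shift = doubled - p if doubled >= p else doubled
        # the "-2" lands at bit position `shift`: subtract 2^(shift+1), using 2^p == 1 (mod M_p)
        r = (r - (1 << ((shift + 1) % p))) % m
    # undo the final shift by multiplying with 2^(p - shift) == 2^(-shift) (mod M_p)
    return r * pow(2, (p - shift) % p, m) % m
```

For PRP the subtraction simply disappears and only the shift doubling remains, which is the point Ernst makes about PRP being the easier case.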

Prime95 2020-04-09 21:30

[QUOTE=ewmayer;542231]Mihai/George, could you explain why residue shift apparently incurs such a heavy performance penalty in gpuOwl?[/QUOTE]

I don't believe there would be a performance penalty.

preda 2020-04-09 21:50

[QUOTE=ewmayer;542231]Mihai/George, could you explain why residue shift apparently incurs such a heavy performance penalty in gpuOwl?

For LL with shift, I can maybe understand it - during the carry step of each iteration, one needs to precompute the bit offset of the -2 for the current shift value and then inject it into the corresponding residue word - not a lot of cycles needed, but perhaps in a massively-parallel GPU context, slowing whichever one of those smaller work units gets the shifted -2 causes the others to stall - just speculating here.

But in a PRP context, there is no per-iteration -2 subtrahend, we simply apply some initial shift value to the starting residue, then repeated-square-mod happily away, with the only shift-related expense being the per-iteration update of the shift value, shift = 2*shift (mod p), where the * and mod can both be replaced with low-latency operations, shift (or add), compute shift2 = shift - p, followed by cmov to select the proper one of shift and shift2.[/QUOTE]

I would also need to look into how the error check needs to be updated for non-zero offset. I don't think it would be a big cost, but probably not a zero-cost either. IMO not the highest priority thing to do ATM.

ewmayer 2020-04-09 21:53

[QUOTE=Prime95;542241]I don't believe there would be a performance penalty.[/QUOTE]

Mihai in #2049:
[quote]For the "offset", it's too much trouble to fit it in without adversely affecting PRP which is still the main focus; also "offset" brings too little benefit to be worth bothering with IMO.[/quote]
I read "adversely affecting PRP" as implying a performance hit.

I have a special interest here - once I finish my current round of v20-related updates to Mlucas I intend to get up to speed on the programming model used for gpuowl, with a long-term goal of enhancing it to support the negacyclic DWT (on top of the Mersenne-style IBDWT) and right-angle-transform data layout needed to support Fermat-mod arithmetic. For my Fermat number testing to date I've used pairs of side-by-side runs, both with 0 shift but at different FFT lengths. The problem is, as we approach F33, the window of possible sizes for the smaller, slightly-less-than-power-of-2 FFT length of said run pairings rapidly shrinks. For F31 a smaller-FFT length of 120M = 15*8M is gonna really be pushing the accuracy limits of a doubles-based FFT. For F33 we'd need at a minimum 496M = 31*16M, but that prime 31 means a 31-DFT, and even the best-of-breed such algorithm is horribly inefficient. So I originally had in mind some highly-composite length < 512M, specifically 504M = 63*8M, but even though 63 = 3^2*7 is decently smooth, the result will likely be slower than the accompanying 512M run. But in the meantime I've worked out all the needed details to do residue-shifted Fermat-mod arithmetic - it's quite a bit more involved than Mersenne-mod, for reasons I'll detail soon in an upcoming post to the "Pepin tests of Fermat numbers beyond F24" thread - but now that I've worked out the mathematical details and have working proof-of-principle code, it's clear that performance-wise it should be no worse than Mersenne-mod with shift. So F33 - starting with a deep p-1 stage 1 (where it's crucial to obtain a correct residue, since absent a resulting factor one wants to distribute said residue to many stage-2 subinterval-runners) can use paired runs at 512M FFT, each with a different shift.

preda 2020-04-09 22:37

[QUOTE=ewmayer;542245]Mihai in #2049:

I read "adversely affecting PRP" as implying a performance hit.

I have a special interest here - once I finish my current round of v20-related updates to Mlucas I intend to get up to speed on the programming model used for gpuowl, with a long-term goal of enhancing it to support the negacyclic DWT (on top of the Mersenne-style IBDWT) and right-angle-transform data layout needed to support Fermat-mod arithmetic. For my Fermat number testing to date I've used pairs of side-by-side runs, both with 0 shift but at different FFT lengths. The problem is, as we approach F33, the window of possible sizes for the smaller, slightly-less-than-power-of-2 FFT length of said run pairings rapidly shrinks. For F31 a smaller-FFT length of 120M = 15*8M is gonna really be pushing the accuracy limits of a doubles-based FFT. For F33 we'd need at a minimum 496M = 31*16M, but that prime 31 means a 31-DFT, and even the best-of-breed such algorithm is horribly inefficient. So I originally had in mind some highly-composite length < 512M, specifically 504M = 63*8M, but even though 63 = 3^2*7 is decently smooth, the result will likely be slower than the accompanying 512M run. But in the meantime I've worked out all the needed details to do residue-shifted Fermat-mod arithmetic - it's quite a bit more involved than Mersenne-mod, for reasons I'll detail soon in an upcoming post to the "Pepin tests of Fermat numbers beyond F24" thread - but now that I've worked out the mathematical details and have working proof-of-principle code, it's clear that performance-wise it should be no worse than Mersenne-mod with shift. So F33 - starting with a deep p-1 stage 1 (where it's crucial to obtain a correct residue, since absent a resulting factor one wants to distribute said residue to many stage-2 subinterval-runners) can use paired runs at 512M FFT, each with a different shift.[/QUOTE]

Nice work! Your contributions to gpuowl are more than welcome. Did you consider a GEC-style error check for Pepin instead of paired runs?

kracker 2020-04-10 16:28

Tried to submit an LL result, got "Did not understand 1 lines."

{"exponent":"54907981", "worktype":"LL", "status":"C", "program":{"name":"gpuowl", "version":"v6.11-252-gaf403e2"}, "timestamp":"2020-04-10 14:05:02 UTC", "user":"kracker", "computer":"core", "aid":"xxxxxxxxxx", "fft-length":3145728, "res64":"xxxxxxxxxxxxx", "offset":0}

kriesel 2020-04-10 16:59

gpuowl-v6.11-255-g81fa7c3 for Win 7 x64 or up
Latest commit build, build log, help output, etc.

ewmayer 2020-04-10 19:27

[QUOTE=preda;542252]Nice work! and your contributions to gpuowl are more than welcome. Did you consider a GEC-style error check for Pepin instead of paired runs?[/QUOTE]

The primality-test runs - whatever hardware they end up being done on - will of course use the GEC, but for this type of rare-but-historic computation, us all believing the GEC is foolproof will not suffice in terms of a research-quality announcement and attendant peer-reviewed paper - there must be at least 2 runs, which, if not done by independently developed programs, must at least provide reasonable assurance of independent-FFT-data. Having done F24, believe me, merely omitting a third pure-integer-code "drone" run (using interim residues provided by 2 cross-checked fast floating-FFT runs) is already a stretch, as far as more conservative parts of the computational number theory community are concerned.

One could object "but they accept GIMPS new-prime announcements, based on matching independent floating-FFT runs" -- true, and that establishes the minimum baseline for e.g. an F33 testing effort. Further, my ongoing Fermat number tests - currently finishing up run #2 of F30 @64M FFT, first run @60M finished late last year - all deposit interim every-10Miter checkpoint files, so knowing the format of same, anyone could do a parallel (in the sense of multiple runs, each covering a separate 10Miter subinterval) triple-check using whatever code they like. For F33 the resulting fileset, at ~1 GB per checkpoint and 858 such, will occupy slightly less than 1TB, so any such file sharing might have to be done using physical disk drives, depending on the state of storage technology at that timepoint.
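The checkpoint-set size quoted above follows from simple counting. A back-of-the-envelope C++ sketch, assuming one checkpoint every 10M squarings and roughly 2^m bits of residue per file (any file header is ignored; this is not the actual Mlucas checkpoint format):

```cpp
#include <cassert>
#include <cstdint>

// Pepin's test for F_m = 2^(2^m) + 1 computes 3^((F_m - 1)/2) mod F_m,
// i.e. 2^m - 1 successive squarings starting from 3.
uint64_t pepinSquarings(unsigned m) { return (uint64_t{1} << m) - 1; }

// Number of full interim checkpoints written every `interval` squarings.
uint64_t checkpointCount(unsigned m, uint64_t interval) {
  return pepinSquarings(m) / interval;
}

// Approximate bytes per checkpoint: one residue of ~2^m bits (header ignored).
uint64_t checkpointBytes(unsigned m) { return (uint64_t{1} << m) / 8; }

// Total bytes for the whole checkpoint set.
uint64_t checkpointSetBytes(unsigned m, uint64_t interval) {
  return checkpointCount(m, interval) * checkpointBytes(m);
}
```

With m = 33 this gives 2^33 - 1 = 8589934591 squarings and 858 full 10M-iteration checkpoints of 2^30 bytes (1 GiB) each, about 9.2e11 bytes in total, consistent with "slightly less than 1TB".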

ATH 2020-04-10 19:43

[QUOTE=kracker;542292]Tried to submit an LL result, got "Did not understand 1 lines."

{"exponent":"54907981", "worktype":"LL", "status":"C", "program":{"name":"gpuowl", "version":"v6.11-252-gaf403e2"}, "timestamp":"2020-04-10 14:05:02 UTC", "user":"kracker", "computer":"core", "aid":"xxxxxxxxxx", "fft-length":3145728, "res64":"xxxxxxxxxxxxx", "offset":0}[/QUOTE]

Yep, I just got the same message, so I guess George or Aaron needs to update the manual result script for this "new" gpuowl LL test. I guess it is different than back in gpuowl 0.6?

James Heinrich 2020-04-10 22:02

[QUOTE=kracker;542292]Tried to submit an LL result, got "Did not understand 1 lines."[/QUOTE][QUOTE=ATH;542304]Yep, I just got the same message, so I guess George or Aaron needs to update the manual result script for this "new" gpuowl LL test. I guess it is different than back in gpuowl 0.6?[/QUOTE]Looks like Mihai changed the result format and forgot to tell me about it. :smile: (I do the manual results form, most everything else is George or Aaron).

I'm just waiting to hear back from Mihai regarding the change in format, I'll post back when the manual form will accept these results.

preda 2020-04-11 00:36

[QUOTE=James Heinrich;542315]Looks like Mihai changed the result format and forgot to tell me about it. :smile: (I do the manual results form, most everything else is George or Aaron).

I'm just waiting to hear back from Mihai regarding the change in format, I'll post back when the manual form will accept these results.[/QUOTE]

Sorry for that. I need to remind myself what is the right format for LL JSON.

kriesel 2020-04-11 00:45

[QUOTE=ATH;542304]I guess it is different than back in gpuowl 0.6?[/QUOTE]Very different; V0.6 was before the switch to JSON.
[URL]https://www.mersenneforum.org/showpost.php?p=531029&postcount=28[/URL]

preda 2020-04-11 09:18

[QUOTE=kracker;542292]Tried to submit an LL result, got "Did not understand 1 lines."

{"exponent":"54907981", "worktype":"LL", "status":"C", "program":{"name":"gpuowl", "version":"v6.11-252-gaf403e2"}, "timestamp":"2020-04-10 14:05:02 UTC", "user":"kracker", "computer":"core", "aid":"xxxxxxxxxx", "fft-length":3145728, "res64":"xxxxxxxxxxxxx", "offset":0}[/QUOTE]

Please replace "offset" with "shift-count" and re-submit the result -- it should be accepted after this change.

This same change has been committed to gpuowl, so this should be fixed after a re-checkout.
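For anyone with a backlog of such lines, the edit described above is a one-token substitution. A minimal C++ sketch, assuming the key appears exactly as `"offset":` (as in the line kracker posted); it is plain text replacement rather than a JSON parser, and the function name is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Rename the JSON key "offset" to "shift-count" in one results.txt line.
// Plain text substitution, not a JSON parser: only the exact token
// "offset": is touched, so other fields and values are left alone.
std::string renameOffsetKey(std::string line) {
  const std::string from = "\"offset\":";
  const std::string to = "\"shift-count\":";
  for (std::size_t pos = line.find(from); pos != std::string::npos;
       pos = line.find(from, pos + to.size())) {
    line.replace(pos, from.size(), to);
  }
  return line;
}
```

Each line of results.txt can be read with std::getline, passed through the function, and written back out before pasting into the manual results form.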

kriesel 2020-04-11 14:03

[QUOTE=kriesel;542296]Latest commit build, build log, help output, etc.[/QUOTE]
v6.11-255 on Win7 x64, RX550 did not like the default FFT at all. The relative "-fft +1" syntax is apparently gone, and if used, gpuowl fails in an interesting way (it picked a 128K FFT). A quick read of the help output set it right: an explicit FFT specification, 1K:5:512, got it on its way.
[CODE]C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-255-g81fa7c3>title gpuowl-v6.11-255-g81fa7c3/rx550

C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-255-g81fa7c3>gpuowl-win
2020-04-10 12:09:43 gpuowl v6.11-255-g81fa7c3
2020-04-10 12:09:43 config: -device 1 -user kriesel -cpu condorella/rx550 -yield -maxAlloc 3600 -use NO_ASM
2020-04-10 12:09:43 device 1, unique id ''
2020-04-10 12:09:43 condorella/rx550 94741139 FFT: 5M 1K:10:256 (18.07 bpw)
2020-04-10 12:09:43 condorella/rx550 Expected maximum carry32: 461E0000
2020-04-10 12:09:46 condorella/rx550 OpenCL args "-DEXP=94741139u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0xf.3cd1fc0411148p-3 -DIWEIGHT_STEP=0x8.66790bf53aca8p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DPM1=0 -DAMDGPU=1 -DNO_ASM=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-04-10 12:09:53 condorella/rx550 OpenCL compilation in 6.96 s
2020-04-10 12:10:09 condorella/rx550 94741139 EE 0 loaded: blockSize 400, 0000000000000000 (expected 0000000000000003)
2020-04-10 12:10:09 condorella/rx550 Exiting because "error on load"
2020-04-10 12:10:09 condorella/rx550 Bye
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-255-g81fa7c3>g611

C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-255-g81fa7c3>title gpuowl-v6.11-255-g81fa7c3/rx550

C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-255-g81fa7c3>gpuowl-win
2020-04-10 12:10:51 gpuowl v6.11-255-g81fa7c3
2020-04-10 12:10:51 config: -device 1 -user kriesel -cpu condorella/rx550 -yield -maxAlloc 3600 -use NO_ASM -fft +1
2020-04-10 12:10:51 device 1, unique id ''
2020-04-10 12:10:51 condorella/rx550 94741139 FFT: 128K 256:1:256 (722.82 bpw)
2020-04-10 12:10:51 condorella/rx550 FFT size too small for exponent (722.82 bits/word).
2020-04-10 12:10:51 condorella/rx550 Exiting because "FFT size too small"
2020-04-10 12:10:51 condorella/rx550 Bye
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-255-g81fa7c3>g611

C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-255-g81fa7c3>title gpuowl-v6.11-255-g81fa7c3/rx550

C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-255-g81fa7c3>gpuowl-win
2020-04-10 12:12:45 gpuowl v6.11-255-g81fa7c3
2020-04-10 12:12:45 config: -device 1 -user kriesel -cpu condorella/rx550 -yield -maxAlloc 3600 -use NO_ASM -fft 1K:5:512
2020-04-10 12:12:45 device 1, unique id ''
2020-04-10 12:12:45 condorella/rx550 94741139 FFT: 5M 1K:5:512 (18.07 bpw)
2020-04-10 12:12:45 condorella/rx550 Expected maximum carry32: 461E0000
2020-04-10 12:12:47 condorella/rx550 OpenCL args "-DEXP=94741139u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=5u -DWEIGHT_STEP=0xf.3cd1fc0411148p-3 -DIWEIGHT_STEP=0x8.66790bf53aca8p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DPM1=0 -DAMDGPU=1 -DNO_ASM=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-04-10 12:12:55 condorella/rx550 OpenCL compilation in 8.18 s
2020-04-10 12:13:02 condorella/rx550 94741139 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-10 12:13:19 condorella/rx550 94741139 OK 800 0.00%; 14229 us/it; ETA 15d 14:28; 738c4e015132f834 (check 5.86s)
2020-04-10 13:00:54 condorella/rx550 94741139 OK 200000 0.21%; 14317 us/it; ETA 15d 15:59; e0463c77c58b0105 (check 5.87s)
2020-04-10 13:48:40 condorella/rx550 94741139 OK 400000 0.42%; 14319 us/it; ETA 15d 15:14; 5b1fe09cbecb5e40 (check 5.89s)
2020-04-10 14:36:27 condorella/rx550 94741139 OK 600000 0.63%; 14321 us/it; ETA 15d 14:29; 5f62cf32c024e1a2 (check 5.87s)
2020-04-10 15:24:15 condorella/rx550 94741139 OK 800000 0.84%; 14322 us/it; ETA 15d 13:44; 3dd122479d7dde25 (check 5.88s)
2020-04-10 16:12:02 condorella/rx550 94741139 OK 1000000 1.06%; 14319 us/it; ETA 15d 12:52; e44ae2f6c9046662 (check 5.87s)
2020-04-10 16:59:49 condorella/rx550 94741139 OK 1200000 1.27%; 14320 us/it; ETA 15d 12:06; b3a0108ad221f8fd (check 5.88s)
2020-04-10 17:47:36 condorella/rx550 94741139 OK 1400000 1.48%; 14319 us/it; ETA 15d 11:17; 6077a7f20c7ee45c (check 5.88s)
2020-04-10 17:49:53 condorella/rx550 Stopping, please wait..
2020-04-10 17:50:05 condorella/rx550 94741139 OK 1410000 1.49%; 14328 us/it; ETA 15d 11:28; e02e0d0dca18d9f5 (check 5.87s)
2020-04-10 17:50:05 condorella/rx550 Exiting because "stop requested"
2020-04-10 17:50:05 condorella/rx550 Bye[/CODE]
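The FFT lines in this log are easy to cross-check: the length printed next to each WIDTH:MIDDLE:SMALL_HEIGHT spec equals 2 x WIDTH x MIDDLE x SMALL_HEIGHT, and the bits/word figure is the exponent divided by that length. The C++ sketch below encodes that relationship, which is inferred from the logged values rather than taken from gpuowl's source; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Parse one field of a spec such as "1K" or "512" (a trailing K means *1024).
static uint64_t parseSize(const std::string& s) {
  if (!s.empty() && (s.back() == 'K' || s.back() == 'k'))
    return std::stoull(s.substr(0, s.size() - 1)) * 1024;
  return std::stoull(s);
}

// FFT length in doubles for a "WIDTH:MIDDLE:SMALL_HEIGHT" spec, assuming
// length = 2 * WIDTH * MIDDLE * SMALL_HEIGHT (this reproduces the 5M, 128K
// and 2.75M sizes printed in the logs in this thread).
uint64_t fftLength(const std::string& spec) {
  std::istringstream in(spec);
  std::string w, m, h;
  std::getline(in, w, ':');
  std::getline(in, m, ':');
  std::getline(in, h, ':');
  return 2 * parseSize(w) * parseSize(m) * parseSize(h);
}

// Average bits per word for exponent p at that FFT length.
double bitsPerWord(uint64_t p, const std::string& spec) {
  return static_cast<double>(p) / static_cast<double>(fftLength(spec));
}
```

For example, it reproduces the 722.82 bpw of the failed `-fft +1` attempt (94741139 / 128K) and the 18.07 bpw of the 1K:5:512 run.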

LaurV 2020-04-11 14:26

[QUOTE=kriesel;542296]Latest commit build, build log, help output, etc.[/QUOTE]
Could you (or kracker) please rebuild with the last change from preda, and repost?

(I am not yet able to build gpuowl; well, I haven't tried yet, but I will give it a few tests as long as it can do LL.)

ATH 2020-04-11 14:43

[QUOTE=preda;542348]Please replace "offset" with "shift-count" and re-submit the result -- it should be accepted after this change.

This same change has been comitted to gpuowl, so this should be fixed after a re-checkout.[/QUOTE]

Thanks, that worked. 2 successful double checks from gpuowl:
[M]83174053[/M]
[M]83180563[/M]

kriesel 2020-04-11 16:04

Win7 x64 build of gpuowl v6.11-257

Latest available commit as of ~12 minutes before this post. Usual shower of warnings in the build log; help output included; no testing performed. Enjoy, and please report any issues here.

kracker 2020-04-11 17:07

Just now, I made the very stupid mistake of not checking a few DC residues before submitting a batch... :sorry:
I can redo them - or whatever is best.
Nvidia P100 in colab.
gpuowl v6.11-252-gaf403e2
OUT_SIZEX=16,IN_SIZEX=8,IN_SPACING=8
[code]
51509873
51491101
51491059
51490883
51490843
51491267
51491119
51509257
51490799
51490723
51490343
51490339
51508747
58650941
51488837
51491983
51491773
51491731
[/code]

preda 2020-04-11 21:15

It seems the problem is associated with the setup
[QUOTE]
OUT_SIZEX=16,IN_SIZEX=8,IN_SPACING=8
[/QUOTE]

Did this setup work for another exponent?

One way to check whether the FFT is broken is to run a few PRP iterations before starting the LL, e.g.
./gpuowl -prp 51509873

[QUOTE=kracker;542369]Just now, I made the very stupid mistake of not checking a few DC residues before submitting a batch... :sorry:
I can redo them - or whatever is best.
Nvidia P100 in colab.
gpuowl v6.11-252-gaf403e2
OUT_SIZEX=16,IN_SIZEX=8,IN_SPACING=8
[code]
51509873
51491101
51491059
51490883
51490843
51491267
51491119
51509257
51490799
51490723
51490343
51490339
51508747
58650941
51488837
51491983
51491773
51491731
[/code][/QUOTE]

kracker 2020-04-12 02:17

with the previously set settings I'm getting an immediate EE... seems to work with no -use arguments.
[code]
/content/drive/My Drive/gpuowl-colab
2020-04-12 02:08:53 gpuowl v6.11-252-gaf403e2
2020-04-12 02:08:53 config: -user kracker -cpu pce
2020-04-12 02:08:53 config: -ll 51509873
2020-04-12 02:08:53 device 0, unique id ''
2020-04-12 02:08:53 pce 51509873 FFT: 2.75M 256:11:512 (17.86 bpw)
2020-04-12 02:08:53 pce Expected maximum carry32: 2B810000
2020-04-12 02:08:54 pce OpenCL args "-DEXP=51509873u -DWIDTH=256u -DSMALL_HEIGHT=512u -DMIDDLE=11u -DWEIGHT_STEP=0x1.19794ea80bcb4p+0 -DIWEIGHT_STEP=0x1.d1a9c3958d155p-1 -DWEIGHT_BIGSTEP=0x1.ae89f995ad3adp+0 -DIWEIGHT_BIGSTEP=0x1.306fe0a31b715p-1 -DPM1=0 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-04-12 02:08:57 pce

2020-04-12 02:08:57 pce OpenCL compilation in 2.80 s
2020-04-12 02:08:57 pce 51509873 LL 0 loaded: 0000000000000004
2020-04-12 02:09:48 pce 51509873 LL 100000 0.19%; 509 us/it; ETA 0d 07:16; d4bf953f17f5dd56
2020-04-12 02:10:15 pce Stopping, please wait..
2020-04-12 02:10:15 pce 51509873 LL 154000 0.30%; 510 us/it; ETA 0d 07:17; be98350bc1fe8687
2020-04-12 02:10:15 pce Exiting because "stop requested"
2020-04-12 02:10:15 pce Bye
[/code]

[code]

/content/drive/My Drive/gpuowl-colab
2020-04-12 02:12:19 gpuowl v6.11-252-gaf403e2
2020-04-12 02:12:19 config: -user kracker -cpu pce
2020-04-12 02:12:19 config: -use OUT_SIZEX=16,IN_SIZEX=8,IN_SPACING=8 -ll 51509873
2020-04-12 02:12:19 device 0, unique id ''
2020-04-12 02:12:19 pce 51509873 FFT: 2.75M 256:11:512 (17.86 bpw)
2020-04-12 02:12:19 pce Expected maximum carry32: 2B810000
2020-04-12 02:12:19 pce OpenCL args "-DEXP=51509873u -DWIDTH=256u -DSMALL_HEIGHT=512u -DMIDDLE=11u -DWEIGHT_STEP=0x1.19794ea80bcb4p+0 -DIWEIGHT_STEP=0x1.d1a9c3958d155p-1 -DWEIGHT_BIGSTEP=0x1.ae89f995ad3adp+0 -DIWEIGHT_BIGSTEP=0x1.306fe0a31b715p-1 -DPM1=0 -DIN_SIZEX=8 -DIN_SPACING=8 -DOUT_SIZEX=16 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-04-12 02:12:19 pce

2020-04-12 02:12:19 pce OpenCL compilation in 0.01 s
2020-04-12 02:12:19 pce 51509873 LL 0 loaded: 0000000000000004
2020-04-12 02:13:09 pce 51509873 LL 100000 0.19%; 496 us/it; ETA 0d 07:05; a2891146b3ded4b9
2020-04-12 02:13:16 pce Stopping, please wait..
2020-04-12 02:13:17 pce 51509873 LL 115000 0.22%; 502 us/it; ETA 0d 07:10; 42848d9cb649a731
2020-04-12 02:13:17 pce Exiting because "stop requested"
2020-04-12 02:13:17 pce Bye
[/code]

ATH 2020-04-13 22:44

I created a script to test the speed of a bunch of combinations of the OUT_WG,OUT_SIZEX,OUT_SPACING,IN_WG,IN_SIZEX,IN_SPACING variables for the LL test.

It seems for LL test there is no block to stop combinations that will not work. Instead it zeros the residue. For example these:

[CODE]./gpuowlLL -ll 95000011 -iters 30000 -log 10000 -use CARRY32,ORIG_SLOWTRIG,OUT_WG=256,OUT_SIZEX=4,OUT_SPACING=128,IN_WG=64,IN_SIZEX=128,IN_SPACING=4

./gpuowlLL -ll 95000011 -iters 30000 -log 10000 -use CARRY32,ORIG_SLOWTRIG,OUT_WG=256,OUT_SIZEX=4,OUT_SPACING=128,IN_WG=64,IN_SIZEX=128,IN_SPACING=128

./gpuowlLL -ll 95000011 -iters 30000 -log 10000 -use CARRY32,ORIG_SLOWTRIG,OUT_WG=64,OUT_SIZEX=128,OUT_SPACING=8,IN_WG=64,IN_SIZEX=128,IN_SPACING=64

./gpuowlLL -ll 95000011 -iters 30000 -log 10000 -use CARRY32,ORIG_SLOWTRIG,OUT_WG=64,OUT_SIZEX=128,OUT_SPACING=128,IN_WG=64,IN_SIZEX=128,IN_SPACING=64

Output:

2020-04-13 22:32:34 Tesla P100-PCIE-16GB-0 OpenCL compilation in 2.22 s
2020-04-13 22:32:34 Tesla P100-PCIE-16GB-0 95000011 LL 0 loaded: 0000000000000004
2020-04-13 22:32:41 Tesla P100-PCIE-16GB-0 95000011 LL 10000 0.01%; 641 us/it; ETA 0d 16:54; fffffffffffffffd
2020-04-13 22:32:43 Tesla P100-PCIE-16GB-0 Stopping, please wait..
2020-04-13 22:32:43 Tesla P100-PCIE-16GB-0 95000011 LL 14000 0.01%; 657 us/it; ETA 0d 17:20; fffffffffffffffd

[/CODE]
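A scan like this can be scripted by generating every combination of the six variables. The C++ sketch below is a hypothetical generator (not ATH's script): the candidate value lists are up to the caller (the test uses values from the commands above), and, following preda's advice in the next post, each combination is turned into a PRP command so the GEC can reject broken setups before any LL use. Whether `-iters` and `-log` behave with `-prp` as they do with `-ll` is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Build every -use string over the six transpose-tuning variables.
// The same candidate lists are used for the OUT_* and IN_* sides.
std::vector<std::string> tuningCombos(const std::vector<int>& wg,
                                      const std::vector<int>& sizex,
                                      const std::vector<int>& spacing) {
  std::vector<std::string> out;
  for (int ow : wg)
    for (int ox : sizex)
      for (int os : spacing)
        for (int iw : wg)
          for (int ix : sizex)
            for (int is : spacing)
              out.push_back("OUT_WG=" + std::to_string(ow) +
                            ",OUT_SIZEX=" + std::to_string(ox) +
                            ",OUT_SPACING=" + std::to_string(os) +
                            ",IN_WG=" + std::to_string(iw) +
                            ",IN_SIZEX=" + std::to_string(ix) +
                            ",IN_SPACING=" + std::to_string(is));
  return out;
}

// One short PRP validation run per combination (PRP has the GEC; LL has none).
std::string prpCommand(unsigned exponent, const std::string& use) {
  return "./gpuowl -prp " + std::to_string(exponent) +
         " -iters 30000 -log 10000 -use " + use;
}
```

Only combinations whose PRP runs finish without EE lines would then be candidates for LL timing.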

preda 2020-04-13 23:17

LL is "naked", no error check at all. Please try/tune combinations on PRP, which will help detect the invalid ones. Only after validation with PRP use any combination for LL.
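To illustrate why PRP can catch these setups while LL cannot: a PRP test only ever squares, so its residues are a_t = 3^(2^t) mod N, and the Gerbicz error check exploits the identity d_{j+1} = a_0 * d_j^(2^B), where d_j is the product of the residues saved every B squarings. The LL recurrence s -> s^2 - 2 has no such multiplicative structure. Below is a toy C++ sketch of the check, assuming the small prime modulus 2^31 - 1 so that ordinary 64-bit arithmetic suffices; for clarity it checks after every block, whereas real implementations check far less often because each check costs another B squarings.

```cpp
#include <cassert>
#include <cstdint>

// Toy modulus N = 2^31 - 1 (prime); all products stay below 2^62.
constexpr uint64_t N = (uint64_t{1} << 31) - 1;

uint64_t mulmod(uint64_t a, uint64_t b) { return a * b % N; }

// x^(2^k) mod N by k successive squarings.
uint64_t squareTimes(uint64_t x, uint64_t k) {
  for (uint64_t i = 0; i < k; ++i) x = mulmod(x, x);
  return x;
}

// Runs `blocks` blocks of B squarings starting from a0, optionally flipping
// one bit of the residue right after squaring number `badIter` (0 = never).
// Returns true if every block passes the check d_new == a0 * d_old^(2^B).
bool runWithGec(uint64_t a0, uint64_t B, uint64_t blocks, uint64_t badIter) {
  uint64_t a = a0;  // current residue a_t
  uint64_t d = a0;  // running product a_0 * a_B * a_2B * ...
  uint64_t t = 0;
  for (uint64_t j = 0; j < blocks; ++j) {
    for (uint64_t i = 0; i < B; ++i) {
      a = mulmod(a, a);
      if (++t == badIter) a ^= 1;  // simulated hardware glitch
    }
    const uint64_t dNew = mulmod(d, a);                      // append a_{(j+1)B}
    const uint64_t expected = mulmod(a0, squareTimes(d, B)); // independent recomputation
    if (dNew != expected) return false;
    d = dNew;
  }
  return true;
}
```

An error injected exactly at a block boundary is always caught, since the stored product then differs by a nonzero factor modulo the prime; errors in mid-block are caught with overwhelming probability, though the toy does not prove that.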

[QUOTE=ATH;542574]I created a script to test the speed of a bunch of combinations of the OUT_WG,OUT_SIZEX,OUT_SPACING,IN_WG,IN_SIZEX,IN_SPACING variables for the LL test.

It seems for LL test there is no block to stop combinations that will not work. Instead it zeros the residue. For example these:

[CODE]./gpuowlLL -ll 95000011 -iters 30000 -log 10000 -use CARRY32,ORIG_SLOWTRIG,OUT_WG=256,OUT_SIZEX=4,OUT_SPACING=128,IN_WG=64,IN_SIZEX=128,IN_SPACING=4

./gpuowlLL -ll 95000011 -iters 30000 -log 10000 -use CARRY32,ORIG_SLOWTRIG,OUT_WG=256,OUT_SIZEX=4,OUT_SPACING=128,IN_WG=64,IN_SIZEX=128,IN_SPACING=128

./gpuowlLL -ll 95000011 -iters 30000 -log 10000 -use CARRY32,ORIG_SLOWTRIG,OUT_WG=64,OUT_SIZEX=128,OUT_SPACING=8,IN_WG=64,IN_SIZEX=128,IN_SPACING=64

./gpuowlLL -ll 95000011 -iters 30000 -log 10000 -use CARRY32,ORIG_SLOWTRIG,OUT_WG=64,OUT_SIZEX=128,OUT_SPACING=128,IN_WG=64,IN_SIZEX=128,IN_SPACING=64

Output:

2020-04-13 22:32:34 Tesla P100-PCIE-16GB-0 OpenCL compilation in 2.22 s
2020-04-13 22:32:34 Tesla P100-PCIE-16GB-0 95000011 LL 0 loaded: 0000000000000004
2020-04-13 22:32:41 Tesla P100-PCIE-16GB-0 95000011 LL 10000 0.01%; 641 us/it; ETA 0d 16:54; fffffffffffffffd
2020-04-13 22:32:43 Tesla P100-PCIE-16GB-0 Stopping, please wait..
2020-04-13 22:32:43 Tesla P100-PCIE-16GB-0 95000011 LL 14000 0.01%; 657 us/it; ETA 0d 17:20; fffffffffffffffd

[/CODE][/QUOTE]

kriesel 2020-04-14 03:18

[QUOTE=preda;542577]LL is "naked", no error check at all. Please try/tune combinations on PRP, which will help detect the invalid ones. Only after validation with PRP use any combination for LL.[/QUOTE]Yikes, that means the LL side of gpuowl will be less reliable than CUDALucas v2.06, which has checks for known bad residues seen to occur,
0x0000000000000000, 0x0000000000000002, 0xffffffff80000000, 0xfffffffffffffffd, and excessive roundoff error. Gpuowl checks bits/word.

A memory copy fail could give 0; +-2 values come from the residue getting zeroed and then the -2 and the squaring; the 33-bits-set value 0xffffffff80000000 comes from using far too short an fft length as was seen in both cllucas 1.02 and CUDALucas v2.03.
[URL]https://mersenneforum.org/showpost.php?p=355661&postcount=232[/URL]
[URL]https://mersenneforum.org/showpost.php?p=386081&postcount=299[/URL]
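A check of the kind described above is cheap to bolt onto any LL loop. The C++ sketch below is illustrative, not CUDALucas's code: it flags the four res64 values listed above, and a tiny LL stepper (small p only) shows where two of them come from. Once a residue is zeroed, one step gives -2, i.e. 2^p - 3, whose low 64 bits are 0xfffffffffffffffd for any p > 64, and every later step gives 2.

```cpp
#include <cassert>
#include <cstdint>

// res64 values reported above as known-bad LL residues.
bool isSuspectLLRes64(uint64_t res64) {
  switch (res64) {
    case 0x0000000000000000ULL:  // e.g. a failed memory copy
    case 0x0000000000000002ULL:  // zeroed residue, two or more LL steps later
    case 0xfffffffffffffffdULL:  // zeroed residue, one LL step later (-2 mod 2^p - 1)
    case 0xffffffff80000000ULL:  // seen with far-too-short FFT lengths
      return true;
    default:
      return false;
  }
}

// Apply `steps` LL steps s -> s^2 - 2 mod 2^p - 1 to s.
// Small p only (p <= 31) so s * s fits in 64 bits.
uint64_t llSteps(unsigned p, uint64_t s, unsigned steps) {
  const uint64_t M = (uint64_t{1} << p) - 1;
  for (unsigned i = 0; i < steps; ++i) s = (s * s % M + M - 2) % M;
  return s;
}
```

With p = 31, llSteps(31, 0, 1) is 2^31 - 3 (the small-p analogue of 0x...fffd) and llSteps(31, 0, k) is 2 for every k >= 2, which is the stuck pattern these checks look for.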

LaurV 2020-04-14 07:41

We are struggling to run gpuOwl on windoze 7, nvidia card (2080Ti, but also 1080Ti). We are sure we are missing something. Can anyone point us to a tutorial? Right now, with cudaLucas, we are squeezing about 22 hours for a 55M LL test. We want to see what the owl can do, before replacing the cards with a couple of radeon vees (or.. it is wees? like in "waa wee cafè"?)

preda 2020-04-14 07:42

[QUOTE=kriesel;542591]Yikes, that means the LL side of gpuowl will be less reliable than CUDALucas v2.06, which has checks for known bad residues seen to occur,
0x0000000000000000, 0x0000000000000002, 0xffffffff80000000, 0xfffffffffffffffd, and excessive roundoff error. Gpuowl checks bits/word.

A memory copy fail could give 0; +-2 values come from the residue getting zeroed and then the -2 and the squaring; the 33-bits-set value 0xffffffff80000000 comes from using far too short an fft length as was seen in both cllucas 1.02 and CUDALucas v2.03.
[URL]https://mersenneforum.org/showpost.php?p=355661&postcount=232[/URL]
[URL]https://mersenneforum.org/showpost.php?p=386081&postcount=299[/URL][/QUOTE]

Things may improve in time; this is an intermediary point in the timeline, not the final perfect LL.

preda 2020-04-14 07:45

[QUOTE=LaurV;542605]We are struggling to run gpuOwl on windoze 7, nvidia card (2080Ti, but also 1080Ti). We are sure we are missing something. Can anyone point us to a tutorial? Right now, with cudaLucas, we are squeezing about 22 hours for a 55M LL test. We want to see what the owl can do, before replacing the cards with a couple of radeon vees (or.. it is wees? like in "waa wee cafè"?)[/QUOTE]

1. does clinfo run, detecting the GPUs?
2. does gpuowl -h run, printing the list of GPUs?
3. what fails?

kriesel 2020-04-14 14:05

v6.11-134, no errors for months on this RX550 gpu running 9xM PRP.

v6.11-257, 2 errors in 5 hours, same gpu[CODE]2020-04-14 02:08:31 gpuowl v6.11-257-g39fc002
2020-04-14 02:08:31 config: -device 1 -user kriesel -cpu condorella/rx550 -yield -maxAlloc 3600 -use NO_ASM
2020-04-14 02:08:31 device 1, unique id ''
2020-04-14 02:08:31 condorella/rx550 94741139 FFT: 5M 1K:10:256 (18.07 bpw)
2020-04-14 02:08:31 condorella/rx550 Expected maximum carry32: 461E0000
2020-04-14 02:08:32 condorella/rx550 OpenCL args "-DEXP=94741139u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0xf.3cd1fc0411148p-3 -DIWEIGHT_STEP=0x8.66790bf53aca8p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DPM1=0 -DAMDGPU=1 -DNO_ASM=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-04-14 02:08:38 condorella/rx550 OpenCL compilation in 5.62 s
2020-04-14 02:08:44 condorella/rx550 94741139 OK 6054400 loaded: blockSize 400, cc34a0f738ddbc39
2020-04-14 02:09:01 condorella/rx550 94741139 OK 6055200 6.39%; 13606 us/it; ETA 13d 23:12; 2b8af08eb69c5bb4 (check 5.61s)
2020-04-14 02:42:10 condorella/rx550 94741139 OK 6200000 6.54%; 13711 us/it; ETA 14d 01:13; 36657b8d4cf7b2b8 (check 5.63s)
2020-04-14 03:27:58 condorella/rx550 94741139 OK 6400000 6.76%; 13717 us/it; ETA 14d 00:36; 4941c60b1a288320 (check 5.63s)
2020-04-14 04:13:45 condorella/rx550 94741139 OK 6600000 6.97%; 13717 us/it; ETA 13d 23:50; 67aa94150e6fcccf (check 5.62s)
2020-04-14 04:59:31 condorella/rx550 94741139 OK 6800000 7.18%; 13712 us/it; ETA 13d 22:58; cab0b7a0fb0cc066 (check 5.65s)
2020-04-14 05:45:17 condorella/rx550 94741139 EE 7000000 7.39%; 13710 us/it; ETA 13d 22:09; 5e731e02beb738ea (check 5.61s)
2020-04-14 05:45:23 condorella/rx550 94741139 OK 6800000 loaded: blockSize 400, cab0b7a0fb0cc066
2020-04-14 05:54:37 condorella/rx550 94741139 OK 6840000 7.22%; 13711 us/it; ETA 13d 22:46; 4f7b98cea0650fb9 (check 5.63s) 1 errors
2020-04-14 06:22:07 condorella/rx550 94741139 OK 6960000 7.35%; 13714 us/it; ETA 13d 22:24; a47542d527e8a188 (check 5.63s) 1 errors
2020-04-14 06:49:37 condorella/rx550 94741139 EE 7080000 7.47%; 13711 us/it; ETA 13d 21:53; b71198a3d710f35b (check 5.62s) 1 errors
2020-04-14 06:49:43 condorella/rx550 94741139 OK 6960000 loaded: blockSize 400, a47542d527e8a188
2020-04-14 07:09:07 condorella/rx550 94741139 OK 7040000 7.43%; 13716 us/it; ETA 13d 22:08; b7ef942604ff7e9d (check 5.62s) 2 errors[/CODE]

paulunderwood 2020-04-14 16:03

Luckily it was a motherboard that failed for me not the R7. I had a mammoth battle installing rocm-3.3.0 onto a different Debian Buster machine, my desktop, which involved a kernel upgrade and it works! With the memory at stock: 0 1000 820 1050 4 and two instances running I am getting 1409us/it each :smile:

ewmayer 2020-04-14 19:42

[QUOTE=paulunderwood;542641]Luckily it was a motherboard that failed for me not the R7. I had a mammoth battle installing rocm-3.3.0 onto a different Debian Buster machine, my desktop, which involved a kernel upgrade and it works! With the memory at stock: 0 1000 820 1050 4 and two instances running I am getting 1409us/it each :smile:[/QUOTE]

Good for you - what sclk setting is the quoted timing at, and what's the total system wall wattage, if you have it running through a wattmeter?

paulunderwood 2020-04-14 19:58

[QUOTE=ewmayer;542680]Good for you - what sclk setting is the quoted timing at, and what's the total system wall wattage, if you have it running through a wattmeter?[/QUOTE]

No meter. At sclk 4 it is drawing 214 watts according to sensors.

The odd thing I noticed is that timings for a PRP test used to go down when the other instance was running P-1, but now (as a desktop GPU) it goes up when P-1 is running.

ewmayer 2020-04-14 20:17

[QUOTE=paulunderwood;542685]No meter. At sclk 4 it is drawing 214 watts according to sensors.

The odd thing I noticed is that timings for a PRP test used to go down when the other instance was running P-1, but now (as a desktop GPU) it goes up when P-1 is running.[/QUOTE]

I've noticed similar timing effects with one-PRP-one-P-1 ... I think it's due to some kind of internal GPU task-priority setting, which (AFAIK) the user has no control over.

kriesel 2020-04-15 03:09

gpuowl-win v6.11-259-g83434d8 build fail
 
some usual-looking warnings, then:[CODE]Gpu.cpp: In member function 'void Gpu::printRoundoff(u32)':
Gpu.cpp:844:35: error: 'M_PI' was not declared in this scope; did you mean 'M_PIl'?
844 | double beta = sdev * (sqrt(6) / M_PI);
| ^~~~
| M_PIl
make: *** [Makefile:30: Gpu.o] Error 1
[/CODE]
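Since M_PI is a POSIX/XSI extension rather than part of ISO C++, a strict `-std=c++NN` build can fail exactly like this; with MinGW it is probably hidden unless `_USE_MATH_DEFINES` is defined before the include. One portable fix is a guarded fallback, sketched below around the same expression as in Gpu.cpp (this is not necessarily the edit actually used for the build). The formula appears to be the moment estimate of a Gumbel scale parameter, since a Gumbel distribution's standard deviation is beta*pi/sqrt(6). With C++20, `std::numbers::pi` from `<numbers>` is another option.

```cpp
#include <cassert>
#include <cmath>

// M_PI is not guaranteed by ISO C++; provide it only if <cmath> did not.
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

// Same expression as the failing line in Gpu::printRoundoff():
// scale parameter derived from a standard deviation.
double betaFromSdev(double sdev) { return sdev * (std::sqrt(6.0) / M_PI); }
```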

kriesel 2020-04-16 03:08

gpuowl-win v6.11-259-g83434d8 build

After the trivial edit, see preceding post, built ok.

ATH 2020-04-16 05:44

[QUOTE=preda;542577]LL is "naked", no error check at all. Please try/tune combinations on PRP, which will help detect the invalid ones. Only after validation with PRP use any combination for LL.[/QUOTE]

Yes, be very careful tuning LL performance.

I tried using the fastest settings that did not zero the residue, but the final residue from the test did not match the initial LL test on the exponent.
So I was suspicious and did not turn in the result, but ran the LL test again using the settings from PRP tuning that I used for the other successful LL double checks, and this time it did match the first test and finished the double check. Which means that my first test was faulty due to too aggressive settings.

LaurV 2020-04-16 16:54

[QUOTE=preda;542607]1. does clinfo run, detecting the GPUs?
2. does gpuowl -h run, printing the list of GPUs?
3. what fails?[/QUOTE]
Managed to make it run. I only had to look in the mirror to see where the dumb was. For some odd reason I was using very old drivers (397), and when I decided to try the new owl freshly compiled by the people here, I had to upgrade the drivers to the new 445.7x (?) released last week. Either something went wrong during that upgrade, or the driver itself didn't work properly. I suspect the latter, because they released 445.87 just yesterday, and with it everything works. It looked like a quick fix on their side, so maybe something really was wrong with the drivers.

Now, the comments... Be ready... It is coming... [U]Now[/U]...

The owl is about 7% faster than cudaLucas running with the old drivers. And when trying cudaLucas with the newer drivers, I suddenly remembered why I had kept the old ones :picard:... all drivers from the 4xx generation are about 6% slower when running cudaLucas on the 1080Ti and 2080Ti. I had forgotten this, because lately I've only been running mfaktc/gpu72.

So, with the new drivers, in the race between the owl and cudaLucas, the owl comes out about 13% faster. This is good. Also, the cards run much cooler, about 9-10°C cooler than cudaLucas runs them. Also good.

[ATTACH]22045[/ATTACH]
(ignore the iteration at 300k, that is because we were doing something else more important in that time, for few seconds).

But then, the CPU activity looks strange in that graph: the temperature jumps up and down as if somebody is stealing ticks from P95. And voila, the owl takes a full core for itself. (In the screenshot, 5% and 45% are due to HT; this CPU has 10 physical cores, so it is really 10% and 90% occupancy.)

[ATTACH]22046[/ATTACH]

That's not good. As proof, P95 performance drops 10% compared with the case when cudaLucas runs. That is 20% if two copies (two cards) run. And if your CPU has only 4 cores, then P95 performance drops 50%, because two copies of gpuOwl can monopolize 2 of the 4 cores. That's bad.

[ATTACH]22047[/ATTACH]
(in this picture, P95 runs parallel with 1 gpuOwl thread then with 2 cudaLucas threads, then back, then some other tests)


So, if you have Nvidia cards and want to use gpuOwl AND run P95 at the same time, you have to work out which gives more total output: running P95 at lower performance alongside the faster owl on the GPU(s), or running P95 at full speed alongside the (slower) cudaLucas (as cudaLucas does not take [U]any[/U] CPU resources).

Temperature is also a factor, but there is no dilemma here: if you have a common water circuit, the CPU running cooler also makes the GPUs run cooler. To be sure, a separate water circuit will be needed (which I will probably build during the weekend).

And checkpoint file names suck... where is the "exponent.iteration.residue.ll" naming convention? :razz:
And where is the shift? :rant:

Prime95 2020-04-16 17:18

[QUOTE=LaurV;542874]Now, the comments... Be ready... It is coming... [U]Now[/U]..[/QUOTE]

Everybody's a critic :)

Try the -yield option to tell gpuowl to yield the CPU to other tasks. I don't recall all the details but I believe this is really an nVidia openCL bug.

LaurV 2020-04-16 17:36

[QUOTE=Prime95;542879]
Try the -yield option [/QUOTE]
No difference. (this was in fact written in the help, being dumb again, or too tired, 0:35AM here, going to bed after this...)

But wait! What? WHAT? WTF? :shock:

[ATTACH]22048[/ATTACH]
(cudaLucas running on the other card now takes a CPU core as well, regardless of whether gpuOwl is running or not.)

It seems a NV problem indeed. I didn't have this in the past. Maybe I am dumb, but I will have to dig up the old drivers to verify.

kriesel 2020-04-17 16:54

All on the same RX550 gpu and host system, that has reliably run without GEC errors for multiple 5M PRP first-tests on v6.11-134:
90710093 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.30 bits/word
92858651 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.71 bits/word
93461911 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.83 bits/word
93873049 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.90 bits/word
94418047 began at FFT 5120K: Width 256x4, Height 64x4, Middle 10; 18.01 bits/word, and ran OK; it was finished at 6M because of problems encountered at 7M on 131.5M, which required -fft +6

All runs -use NO_ASM
V6.11-134 on rx550 from start on 94741139; no GEC to 2.8M iterations; this was at 6M fft, 18370 us/iter due to problems seen with 7M fft (leftover -fft +6 config.txt content)
V6.11-257 continuation on same rx550, 5M fft 1K:5:512; 14310 us/iter to 6.0544M iterations
V6.11-257 continuation on same RX550, no fft specification; 1K:10:256 chosen by program; ~13712 us/iter, 9 GEC errors by 22.1M iterations.
V6.11-259 continuation with -use STATS on same RX550, 1K:10:256, ~16542 us/iter, no additional GEC through 25.64M iterations.
V6.11-259 continuation without -use STATS underway now, 13750 usec/iter

kriesel 2020-04-17 19:57

gpuowl-win v6.11-264-g5c977d4 build
 
Nice, it gracefully deals with the ASM issue. That may have been there a while; I hadn't tried it until now.

[CODE]2020-04-17 14:17:34 roa/radeonvii ASM compilation failed, retrying compilation using NO_ASM[/CODE]Please fix the github source for this, previously reported minor issue.[CODE]Gpu.cpp: In member function 'void Gpu::printRoundoff(u32)':
Gpu.cpp:851:35: error: 'M_PI' was not declared in this scope; did you mean 'M_PIl'?
851 | double beta = sdev * (sqrt(6) / M_PI);
| ^~~~
| M_PIl
make: *** [Makefile:30: Gpu.o] Error 1[/CODE]Please fix the readme.md which says P-1 iterations for LL; it's P-2. (See [URL]https://www.mersenne.org/various/math.php#lucas-lehmer[/URL])
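For reference, the textbook Lucas-Lehmer recurrence as a minimal C++ sketch, showing where the p-2 count comes from: s_0 = 4, then one step s -> s^2 - 2 (mod 2^p - 1) per iteration, and M_p is prime exactly when s_{p-2} is 0. It is only valid for small exponents, relies on the GCC/Clang `unsigned __int128` extension, and is not gpuowl's FFT-based code:

```cpp
#include <cassert>

// Textbook Lucas-Lehmer test for a small odd prime exponent p (3 <= p <= 63).
// unsigned __int128 (GCC/Clang extension) keeps s*s from overflowing.
bool lucasLehmer(unsigned p) {
  using u128 = unsigned __int128;
  const u128 m = (static_cast<u128>(1) << p) - 1;  // Mp = 2^p - 1
  u128 s = 4;                                      // s_0 = 4
  for (unsigned i = 0; i < p - 2; ++i) {           // exactly p - 2 iterations
    s = (s * s + m - 2) % m;                       // s_{i+1} = s_i^2 - 2 (mod Mp)
  }
  return s == 0;                                   // Mp is prime iff s_{p-2} == 0
}
```

Adding m before subtracting 2 keeps the unsigned arithmetic from wrapping when s is 0 or 1, without changing the residue.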

The -use doc from the top of the gpuowl.cl source code follows; note that there are additional defines derived from these optional inputs:[CODE]// gpuOwL, an OpenCL Mersenne primality test.
// Copyright Mihai Preda and George Woltman.

/* List of user-serviceable -use flags and their effects

DEBUG : enable asserts. Slow, but allows to verify that all asserts hold.

NO_ASM : request to not use any inline __asm()
NO_OMOD: do not use GCN output modifiers in __asm()

OUT_WG,OUT_SIZEX,OUT_SPACING <AMD default is 256,32,4> <nVidia default is 256,4,1 but needs testing>
IN_WG,IN_SIZEX,IN_SPACING <AMD default is 256,32,1> <nVidia default is 256,4,1 but needs testing>

UNROLL_WIDTH <nVidia default>
NO_UNROLL_WIDTH <AMD default>

OLD_FFT8 <default>
NEWEST_FFT8
NEW_FFT8

OLD_FFT5
NEW_FFT5 <default>
NEWEST_FFT5

NEW_FFT10 <default>
OLD_FFT10

CARRY32 <AMD default for PRP when appropriate>
CARRY64 <nVidia default>, <AMD default for PM1 when appropriate>

CARRYM32
CARRYM64 <default>

ORIG_SLOWTRIG
NEW_SLOWTRIG <default> // Our own sin/cos implementation
ROCM_SLOWTRIG // Use ROCm's private reduced-argument sin/cos

ROCM31 <AMD default> // Enable workaround for ROCm 3.1 bug affecting kcos()
NO_ROCM31 <nVidia default>

---- P-1 below ----

NO_P2_FUSED_TAIL // Do not use the big kernel tailFusedMulDelta

*/
[/CODE]Describing the available values and restrictions somehow would be a useful addition. Granted, we don't want to absorb too much of the coders' time documenting transient states. However, more readily available info would help others support testing and reduce time lost by users.
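Judging from the logs further on in this post, each -use entry ends up as a -D define in the OpenCL args (-use NO_ASM,STATS gave -DNO_ASM=1 -DSTATS=1), and the user entries appear in alphabetical order after the defines the program adds itself. A hypothetical C++ sketch of that mapping, not gpuowl's actual parser; the NAME=VALUE form for entries like OUT_WG is an assumption:

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: turn a -use list such as "NO_ASM,STATS" into OpenCL -D options.
// Bare names become NAME=1; an entry already written as NAME=VALUE is passed through.
std::string useToDefines(const std::string& useList) {
  std::vector<std::string> flags;
  std::stringstream ss(useList);
  std::string item;
  while (std::getline(ss, item, ',')) {
    if (!item.empty()) flags.push_back(item);
  }
  std::sort(flags.begin(), flags.end());  // the logged args appear alphabetical
  std::string out;
  for (const std::string& f : flags) {
    if (!out.empty()) out += ' ';
    out += "-D" + (f.find('=') == std::string::npos ? f + "=1" : f);
  }
  return out;
}
```

With the config from the failing run, useToDefines("NO_ASM,CARRY32,STATS,DEBUG") yields the same user-supplied defines that appear in its OpenCL args line.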

-use STATS and ROUNDOFF are not listed above. Have they been removed?


There do seem to be some oddities, like getting the message that both CARRY32 and CARRY64 were requested when only CARRY32 is specified, or an out-of-resources message and termination when -use STATS is attempted.
[CODE]2020-04-17 14:37:29 gpuowl v6.11-264-g5c977d4-dirty
2020-04-17 14:37:29 config: -device 1 -user kriesel -cpu roa/radeonvii-w2
2020-04-17 14:37:29 device 1, unique id ''
2020-04-17 14:37:29 roa/radeonvii-w2 852348659 FFT: 48M 4K:12:512 (16.93 bpw)
2020-04-17 14:37:29 roa/radeonvii-w2 Expected maximum carry32: 70BA0000
2020-04-17 14:37:41 roa/radeonvii-w2 OpenCL args "-DEXP=852348659u -DWIDTH=4096u -DSMALL_HEIGHT=512u -DMIDDLE=12u -DWEIGHT_STEP=0x8.5ee83d8b97248p-3 -DIWEIGHT_STEP=0xf.4a97a185410b8p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-04-17 14:37:41 roa/radeonvii-w2 ASM compilation failed, retrying compilation using NO_ASM
2020-04-17 14:37:57 roa/radeonvii-w2 OpenCL compilation in 16.49 s
2020-04-17 14:38:07 roa/radeonvii-w2 852348659 OK 165317600 loaded: blockSize 400, ad90fdb1696eabf0
2020-04-17 14:38:25 roa/radeonvii-w2 852348659 OK 165318400 19.40%; 12853 us/it; ETA 102d 04:56; 8dcd685e25e59f08 (check 7.34s) 4 errors
2020-04-17 14:38:53 roa/radeonvii-w2 852348659 OK 165320000 19.40%; 12831 us/it; ETA 102d 00:41; d1ff263c2a76e8c1 (check 7.27s) 4 errors
2020-04-17 14:40:31 roa/radeonvii-w2 Stopping, please wait..
2020-04-17 14:40:42 roa/radeonvii-w2 852348659 OK 165328000 19.40%; 12834 us/it; ETA 102d 01:12; f095cd3f6ba99a30 (check 6.65s) 4 errors
2020-04-17 14:40:42 roa/radeonvii-w2 Exiting because "stop requested"
2020-04-17 14:40:42 roa/radeonvii-w2 Bye

2020-04-17 14:40:47 gpuowl v6.11-264-g5c977d4-dirty
2020-04-17 14:40:47 config: -device 1 -user kriesel -cpu roa/radeonvii-w2 -use NO_ASM,CARRY32,STATS,DEBUG
2020-04-17 14:40:47 device 1, unique id ''
2020-04-17 14:40:47 roa/radeonvii-w2 852348659 FFT: 48M 4K:12:512 (16.93 bpw)
2020-04-17 14:40:47 roa/radeonvii-w2 Expected maximum carry32: 70BA0000
2020-04-17 14:40:59 roa/radeonvii-w2 OpenCL args "-DEXP=852348659u -DWIDTH=4096u -DSMALL_HEIGHT=512u -DMIDDLE=12u -DWEIGHT_STEP=0x8.5ee83d8b97248p-3 -DIWEIGHT_STEP=0xf.4a97a185410b8p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DCARRY32=1 -DDEBUG=1 -DNO_ASM=1 -DSTATS=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-04-17 14:41:00 roa/radeonvii-w2 ASM compilation failed, retrying compilation using NO_ASM
2020-04-17 14:41:00 roa/radeonvii-w2 OpenCL compilation error -11 (args -DEXP=852348659u -DWIDTH=4096u -DSMALL_HEIGHT=512u -DMIDDLE=12u -DWEIGHT_STEP=0x8.5ee83d8b97248p-3 -DIWEIGHT_STEP=0xf.4a97a185410b8p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DCARRY32=1 -DDEBUG=1 -DNO_ASM=1 -DSTATS=1 -cl-fast-relaxed-math -cl-std=CL2.0 -DNO_ASM=1)
2020-04-17 14:41:00 roa/radeonvii-w2 C:\Users\ken\AppData\Local\Temp\\OCL2832T1.cl:82:2: error: Conflict: both CARRY32 and CARRY64 requested
#error Conflict: both CARRY32 and CARRY64 requested
^
1 error generated.

error: Clang front-end compilation failed!
Frontend phase failed compilation.
Error: Compiling CL to IR

2020-04-17 14:41:00 roa/radeonvii-w2 Exception gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:247 build
2020-04-17 14:41:00 roa/radeonvii-w2 Bye

2020-04-17 14:41:33 gpuowl v6.11-264-g5c977d4-dirty
2020-04-17 14:41:33 config: -device 1 -user kriesel -cpu roa/radeonvii-w2 -use NO_ASM,STATS,DEBUG
2020-04-17 14:41:33 device 1, unique id ''
2020-04-17 14:41:33 roa/radeonvii-w2 852348659 FFT: 48M 4K:12:512 (16.93 bpw)
2020-04-17 14:41:33 roa/radeonvii-w2 Expected maximum carry32: 70BA0000
2020-04-17 14:41:45 roa/radeonvii-w2 OpenCL args "-DEXP=852348659u -DWIDTH=4096u -DSMALL_HEIGHT=512u -DMIDDLE=12u -DWEIGHT_STEP=0x8.5ee83d8b97248p-3 -DIWEIGHT_STEP=0xf.4a97a185410b8p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DDEBUG=1 -DNO_ASM=1 -DSTATS=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-04-17 14:42:09 roa/radeonvii-w2 OpenCL compilation in 24.22 s
2020-04-17 14:42:13 roa/radeonvii-w2 Exception gpu_error: OUT_OF_RESOURCES carryFused at clwrap.cpp:310 run
2020-04-17 14:42:13 roa/radeonvii-w2 Bye

2020-04-17 14:42:31 gpuowl v6.11-264-g5c977d4-dirty
2020-04-17 14:42:31 config: -device 1 -user kriesel -cpu roa/radeonvii-w2 -use NO_ASM,STATS,DEBUG,ROUNDOFF
2020-04-17 14:42:31 device 1, unique id ''
2020-04-17 14:42:31 roa/radeonvii-w2 852348659 FFT: 48M 4K:12:512 (16.93 bpw)
2020-04-17 14:42:31 roa/radeonvii-w2 Expected maximum carry32: 70BA0000
2020-04-17 14:42:43 roa/radeonvii-w2 OpenCL args "-DEXP=852348659u -DWIDTH=4096u -DSMALL_HEIGHT=512u -DMIDDLE=12u -DWEIGHT_STEP=0x8.5ee83d8b97248p-3 -DIWEIGHT_STEP=0xf.4a97a185410b8p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DDEBUG=1 -DNO_ASM=1 -DROUNDOFF=1 -DSTATS=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-04-17 14:43:06 roa/radeonvii-w2 OpenCL compilation in 23.37 s
2020-04-17 14:43:11 roa/radeonvii-w2 Exception gpu_error: OUT_OF_RESOURCES carryFused at clwrap.cpp:310 run
2020-04-17 14:43:11 roa/radeonvii-w2 Bye

2020-04-17 14:43:40 gpuowl v6.11-264-g5c977d4-dirty
2020-04-17 14:43:40 config: -device 1 -user kriesel -cpu roa/radeonvii-w2 -use NO_ASM,STATS
2020-04-17 14:43:40 device 1, unique id ''
2020-04-17 14:43:40 roa/radeonvii-w2 852348659 FFT: 48M 4K:12:512 (16.93 bpw)
2020-04-17 14:43:40 roa/radeonvii-w2 Expected maximum carry32: 70BA0000
2020-04-17 14:43:52 roa/radeonvii-w2 OpenCL args "-DEXP=852348659u -DWIDTH=4096u -DSMALL_HEIGHT=512u -DMIDDLE=12u -DWEIGHT_STEP=0x8.5ee83d8b97248p-3 -DIWEIGHT_STEP=0xf.4a97a185410b8p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DNO_ASM=1 -DSTATS=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-04-17 14:44:10 roa/radeonvii-w2 OpenCL compilation in 18.00 s
2020-04-17 14:44:14 roa/radeonvii-w2 Exception gpu_error: OUT_OF_RESOURCES carryFused at clwrap.cpp:310 run
2020-04-17 14:44:14 roa/radeonvii-w2 Bye

2020-04-17 14:44:54 gpuowl v6.11-264-g5c977d4-dirty
2020-04-17 14:44:54 config: -device 1 -user kriesel -cpu roa/radeonvii-w2 -use NO_ASM,DEBUG
2020-04-17 14:44:54 device 1, unique id ''
2020-04-17 14:44:54 roa/radeonvii-w2 852348659 FFT: 48M 4K:12:512 (16.93 bpw)
2020-04-17 14:44:54 roa/radeonvii-w2 Expected maximum carry32: 70BA0000
2020-04-17 14:45:06 roa/radeonvii-w2 OpenCL args "-DEXP=852348659u -DWIDTH=4096u -DSMALL_HEIGHT=512u -DMIDDLE=12u -DWEIGHT_STEP=0x8.5ee83d8b97248p-3 -DIWEIGHT_STEP=0xf.4a97a185410b8p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DDEBUG=1 -DNO_ASM=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-04-17 14:45:31 roa/radeonvii-w2 OpenCL compilation in 24.75 s
2020-04-17 14:45:43 roa/radeonvii-w2 852348659 OK 165328000 loaded: blockSize 400, f095cd3f6ba99a30
2020-04-17 14:46:07 roa/radeonvii-w2 852348659 OK 165328800 19.40%; 17545 us/it; ETA 139d 12:21; ebb1fef7de6d72cf (check 9.52s) 4 errors
[/CODE]
Debug overhead seems to be around 35%; roundoff 0.03%, based on very brief trials.
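The overhead and ETA figures can be reproduced from the logged us/it values. A minimal C++ sketch, assuming the ETA is simply (total iterations - current iteration) x us/it with the total equal to the exponent for PRP; gpuowl's own code may average or round differently, and the function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Relative slowdown of one configuration versus a baseline, in percent.
double overheadPercent(double usPerIterWith, double usPerIterBase) {
  return (usPerIterWith / usPerIterBase - 1.0) * 100.0;
}

// Remaining time as "<days>d HH:MM", truncating partial minutes.
std::string eta(std::uint64_t exponent, std::uint64_t iteration, double usPerIter) {
  const auto secs = static_cast<std::uint64_t>((exponent - iteration) * usPerIter / 1e6);
  char buf[32];
  std::snprintf(buf, sizeof buf, "%llud %02llu:%02llu",
                static_cast<unsigned long long>(secs / 86400),
                static_cast<unsigned long long>(secs % 86400 / 3600),
                static_cast<unsigned long long>(secs % 3600 / 60));
  return buf;
}
```

With the DEBUG run's 17545 us/it against the 12834 us/it of the plain run, overheadPercent gives about 36.7%, in line with the rough estimate; eta(852348659, 165320000, 12831) returns "102d 00:41", the same as the second progress line of the first run.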

preda 2020-04-17 22:30

[QUOTE=kriesel;542984]

-use STATS and ROUNDOFF are not listed above. Have they been removed?

There do seem to be some oddities, like getting the message both CARRY32 and CARRY64 have been specified when CARRY32 is specified, or out of resources message and terminate when -use STATS attempted.[/QUOTE]

(the small problems fixed in a recent commit: README p-1, M_PI)

- ROUNDOFF does not exist anymore, it's called STATS now.
- with that exponent, CARRY64 is required, so the software inserts it for you. At the same time, you manually requested CARRY32. This situation is reported as a conflict. This hand-holding has the advantage of protecting against users specifying an invalid CARRY32, which would be severe for a naked LL.
- do you have more details about the "out of resources" with STATS?
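A hypothetical C++ sketch of the hand-holding described above. In the logs the conflict is actually caught by an #error in the OpenCL source after both defines are passed; this sketch does the equivalent check on the host side instead, and the function name and the `needCarry64` input are illustrative assumptions:

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

// If the exponent needs CARRY64, add it automatically; a user-requested CARRY32
// is then reported as a conflict instead of being silently honored.
std::set<std::string> resolveCarry(std::set<std::string> userFlags, bool needCarry64) {
  if (needCarry64) {
    if (userFlags.count("CARRY32")) {
      throw std::runtime_error("Conflict: both CARRY32 and CARRY64 requested");
    }
    userFlags.insert("CARRY64");
  }
  return userFlags;
}
```

Failing loudly here is the point: a too-narrow carry would otherwise silently corrupt a test that has no error check to catch it.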

kriesel 2020-04-17 23:34

[QUOTE=preda;543008](the small problems fixed in a recent commit: README p-1, M_PI)

- ROUNDOFF does not exist anymore, it's called STATS now.
- with that exponent, CARRY64 is required so the software inserts it for you. At the same time, you manually request CARRY32. This situation is reported as a conflict. This hand-holding has the advantage to protect against users specifying invalid CARRY32 which would be severe for a naked LL[/QUOTE]Thanks for the above.
[QUOTE]- do you have more details about the "out of resources" with STATS?[/QUOTE]I don't have any more than what the console showed in the preceding post's long CODE section. gpuowl.log says the same; see following.[CODE]2020-04-17 14:43:40 config: -device 1 -user kriesel -cpu roa/radeonvii-w2 -use NO_ASM,STATS
2020-04-17 14:43:40 config: ;,DEBUG,ROUNDOFF
2020-04-17 14:43:40 device 1, unique id ''
2020-04-17 14:43:40 roa/radeonvii-w2 852348659 FFT: 48M 4K:12:512 (16.93 bpw)
2020-04-17 14:43:40 roa/radeonvii-w2 Expected maximum carry32: 70BA0000
2020-04-17 14:43:52 roa/radeonvii-w2 OpenCL args "-DEXP=852348659u -DWIDTH=4096u -DSMALL_HEIGHT=512u -DMIDDLE=12u -DWEIGHT_STEP=0x8.5ee83d8b97248p-3 -DIWEIGHT_STEP=0xf.4a97a185410b8p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DNO_ASM=1 -DSTATS=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-04-17 14:44:10 roa/radeonvii-w2 OpenCL compilation in 18.00 s
2020-04-17 14:44:14 roa/radeonvii-w2 Exception gpu_error: OUT_OF_RESOURCES carryFused at clwrap.cpp:310 run
2020-04-17 14:44:14 roa/radeonvii-w2 Bye
[/CODE]I'll try -use STATS on some other exponents / fft lengths.

kriesel 2020-04-18 17:40

-use STATS through the occurrence of a GEC error
 
Finally caught a GEC error with -use STATS in effect, after ~30 hours, on gpuowl-win v6.11-259, on an RX550 that was previously reliable. Maybe the fft length's exponent limit is just slightly too high and this exponent is pushing it. See also [URL]https://mersenneforum.org/showpost.php?p=542963&postcount=2094[/URL] and [URL]https://mersenneforum.org/showthread.php?t=25452[/URL]

The program's help output says the 5M fft is usable up to 95.71M.[CODE]
2020-04-18 00:29:06 condorella/rx550 94741139 OK 28880000 30.48%; 16548 us/it; ETA 12d 14:44; dee8912001b65af1 (check 6.82s) 9 errors
2020-04-18 00:40:08 condorella/rx550 Roundoff: N=40500, max 0.300651, avg 0.205114, sdev 0.012225 (0.059601, 0.061244), max-round 0.400715
2020-04-18 00:40:08 condorella/rx550 Carry: N=40499, max 3ddbbf85, avg 2b51cb2f; CarryM: N=1, max 81bd1be4, avg 81bd1be4
2020-04-18 00:40:15 condorella/rx550 94741139 OK 28920000 30.53%; 16549 us/it; ETA 12d 14:35; 75ec23b835c4110e (check 6.91s) 9 errors
2020-04-18 00:51:17 condorella/rx550 Roundoff: N=40500, max 0.287802, avg 0.205098, sdev 0.012112 (0.059053, 0.060665), max-round 0.398885
2020-04-18 00:51:17 condorella/rx550 Carry: N=40499, max 3ab5e0b9, avg 2b54d6af; CarryM: N=1, max 8a7e172e, avg 8a7e172e
2020-04-18 00:51:23 condorella/rx550 94741139 OK 28960000 30.57%; 16543 us/it; ETA 12d 14:17; bd54d7293f55a663 (check 6.80s) 9 errors
2020-04-18 01:02:25 condorella/rx550 Roundoff: N=40500, max 0.303200, avg 0.205065, sdev 0.012201 (0.059499, 0.061136), max-round 0.400285
2020-04-18 01:02:25 condorella/rx550 Carry: N=40499, max 3c80220d, avg 2b50c51e; CarryM: N=1, max 904d2cef, avg 904d2cef
2020-04-18 01:02:32 condorella/rx550 94741139 OK 29000000 30.61%; 16550 us/it; ETA 12d 14:14; 4abacbf0300b8d02 (check 6.78s) 9 errors
2020-04-18 01:13:34 condorella/rx550 Roundoff: N=40500, [B]max 0.507677[/B], avg 0.205098, sdev 0.012272 (0.059834, 0.061490), max-round 0.401449
2020-04-18 01:13:34 condorella/rx550 Carry: N=40499, max 3ca2ec25, avg 2b5134d2; CarryM: N=1, max 8ee0956e, avg 8ee0956e
2020-04-18 01:13:41 condorella/rx550 94741139 [COLOR=Red][B]EE[/B][/COLOR] 29040000 30.65%; 16547 us/it; ETA 12d 14:00; 94dffebb91e5bac9 (check 6.79s) 9 errors
2020-04-18 01:13:48 condorella/rx550 94741139 OK 29000000 loaded: blockSize 400, 4abacbf0300b8d02
2020-04-18 01:24:50 condorella/rx550 Roundoff: N=40928, max 0.308405, avg 0.205040, sdev 0.012282 (0.059900, 0.061560), max-round 0.401552
2020-04-18 01:24:50 condorella/rx550 Carry: N=40926, max 3ca2ec25, avg 2b4e7a6d; CarryM: N=2, max 879ffbad, avg 71a4f7ac
2020-04-18 01:24:57 condorella/rx550 94741139 OK 29040000 30.65%; 16552 us/it; ETA 12d 14:05; 94dffebb91e5bac9 (check 6.81s) 10 errors
2020-04-18 01:35:59 condorella/rx550 Roundoff: N=40500, max 0.289172, avg 0.205136, sdev 0.012255 (0.059742, 0.061392), max-round 0.401219
2020-04-18 01:35:59 condorella/rx550 Carry: N=40499, max 3cd22042, avg 2b55bcd1; CarryM: N=1, max 8e3bb9df, avg 8e3bb9df
2020-04-18 01:36:06 condorella/rx550 94741139 OK 29080000 30.69%; 16549 us/it; ETA 12d 13:50; 1af282f82b04e1e9 (check 6.79s) 10 errors
2020-04-18 01:47:08 condorella/rx550 Roundoff: N=40500, max 0.286431, avg 0.205259, sdev 0.012291 (0.059879, 0.061538), max-round 0.401912
2020-04-18 01:47:08 condorella/rx550 Carry: N=40499, max 3c82a79c, avg 2b52a12b; CarryM: N=1, max 7cc1bbcb, avg 7cc1bbcb
2020-04-18 01:47:14 condorella/rx550 94741139 OK 29120000 30.74%; 16545 us/it; ETA 12d 13:35; ca53d4b9ba5f4404 (check 6.80s) 10 errors
2020-04-18 01:58:16 condorella/rx550 Roundoff: N=40500, max 0.311799, avg 0.205057, sdev 0.012195 (0.059469, 0.061105), max-round 0.400171
2020-04-18 01:58:16 condorella/rx550 Carry: N=40499, max 390cef07, avg 2b50e612; CarryM: N=1, max 81a290db, avg 81a290db
2020-04-18 01:58:23 condorella/rx550 94741139 OK 29160000 30.78%; 16549 us/it; ETA 12d 13:29; c18cada5e14a957b (check 6.79s) 10 errors
2020-04-18 02:09:25 condorella/rx550 Roundoff: N=40500, max 0.306561, avg 0.205154, sdev 0.012206 (0.059498, 0.061135), max-round 0.400454
2020-04-18 02:09:25 condorella/rx550 Carry: N=40499, max 3b2efd5d, avg 2b527db0; CarryM: N=1, max 82169439, avg 82169439
2020-04-18 02:09:32 condorella/rx550 94741139 OK 29200000 30.82%; 16544 us/it; ETA 12d 13:12; e8410c7d73b8f089 (check 6.81s) 10 errors
2020-04-18 02:20:34 condorella/rx550 Roundoff: N=40500, max 0.299344, avg 0.205238, sdev 0.012232 (0.059601, 0.061244), max-round 0.400957
2020-04-18 02:20:34 condorella/rx550 Carry: N=40499, max 39891e5b, avg 2b510501; CarryM: N=1, max 7aa1e3fc, avg 7aa1e3fc[/CODE]

ewmayer 2020-04-18 19:37

Quibble - reported roundoff error should never be > 0.5, as the fractional part is computed as abs(x - rnd(x)).

preda 2020-04-18 21:57

I see in this case that the residue was correct when the error was reported (because the line with "OK" on the same iteration has the same residue). Is this a pattern -- do you see the same for the previous errors?

The most likely explanation is still GPU error (either memory-related or processor related). Do you have another similar GPU to try on, for comparison?

The roundoff being large is most likely a red herring here.

[QUOTE=kriesel;543080]Finally caught a GEC error with -use STATS in effect, after ~ 30 hours, on gpuowl-win v6.11-259, rx550 that was reliable previously. Maybe the fft length's exponent limit is just slightly too high and this exponent is pushing it. See also [URL]https://mersenneforum.org/showpost.php?p=542963&postcount=2094[/URL] and [URL]https://mersenneforum.org/showthread.php?t=25452[/URL]

The program's help output says the 5M fft is usable up to 95.71M.[CODE]
2020-04-18 01:13:34 condorella/rx550 Roundoff: N=40500, [B]max 0.507677[/B], avg 0.205098, sdev 0.012272 (0.059834, 0.061490), max-round 0.401449
2020-04-18 01:13:34 condorella/rx550 Carry: N=40499, max 3ca2ec25, avg 2b5134d2; CarryM: N=1, max 8ee0956e, avg 8ee0956e
2020-04-18 01:13:41 condorella/rx550 94741139 [COLOR=Red][B]EE[/B][/COLOR] 29040000 30.65%; 16547 us/it; ETA 12d 14:00; 94dffebb91e5bac9 (check 6.79s) 9 errors
2020-04-18 01:13:48 condorella/rx550 94741139 OK 29000000 loaded: blockSize 400, 4abacbf0300b8d02
2020-04-18 01:24:50 condorella/rx550 Roundoff: N=40928, max 0.308405, avg 0.205040, sdev 0.012282 (0.059900, 0.061560), max-round 0.401552
2020-04-18 01:24:50 condorella/rx550 Carry: N=40926, max 3ca2ec25, avg 2b4e7a6d; CarryM: N=2, max 879ffbad, avg 71a4f7ac
2020-04-18 01:24:57 condorella/rx550 94741139 OK 29040000 30.65%; 16552 us/it; ETA 12d 14:05; 94dffebb91e5bac9 (check 6.81s) 10 errors
[/CODE][/QUOTE]

preda 2020-04-18 22:06

[QUOTE=ewmayer;543087]Quibble - reported roundoff error should never be > 0.5, as the fractional part is computed as abs(x - rnd(x)).[/QUOTE]

More precisely, given reverse-weight "w" and FFT-output word "x", the error is computed as:

abs(FMA(x, w, -rint(x * w)));

which, arguably, can be larger than 0.5.
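To illustrate the point, a minimal C++ sketch of that formula using `std::fma` and `std::rint`. The inputs are contrived, not gpuowl data: the exact product lies just above 2.5, the rounded product x*w is exactly 2.5, rint() rounds that tie to even (2), and the fused operation then subtracts 2 from the exact product, giving a result just above 0.5. The function name is hypothetical:

```cpp
#include <cassert>
#include <cmath>

// abs(FMA(x, w, -rint(x * w))): rint() sees the *rounded* product x*w,
// while fma() subtracts that integer from the *exact* product.
double roundoffError(double x, double w) {
  return std::fabs(std::fma(x, w, -std::rint(x * w)));
}
```

With x = nextafter(2.5, 0) and w = nextafter(1, 2), the result is 0.5 + 2^-53. In general the excess over 0.5 is bounded by roughly half an ulp of x*w, so it grows with the magnitude of the convolution outputs.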

kriesel 2020-04-18 22:30

[QUOTE=preda;543102]I see in this case that the residue was correct when the error was reported (because the line with "OK" on the same iteration has the same residue). Is this a pattern -- do you see the same for the previous errors?

The most likely explanation is still GPU error (either memory-related or processor related). Do you have another similar GPU to try on, for comparison?

The roundoff being large is most likely a red herring here.[/QUOTE]
I now have -use STATS on 3 gpus, for the time being, and will report anything that seems interesting. The performance drag is considerable: the 30 hours it took to catch one EE with STATS could have been 25 hours without it. I may update an rx480 instance and add it to the test; it's on the same dual-hex-core system, so it might run on the same CPU core (more likely not, unless I start setting core affinities), and it more likely shares the same complement of system RAM.
[CODE]Preda asks, Were the earlier occurrences' res64s correct despite the EEs?
yes 6, unknown 2, no 0; the ayes have it.

1,2, can't tell from logs
v6.11-257
2020-04-14 04:59:31 condorella/rx550 94741139 OK 6800000 7.18%; 13712 us/it; ETA 13d 22:58; cab0b7a0fb0cc066 (check 5.65s)
2020-04-14 05:45:17 condorella/rx550 94741139 EE 7000000 7.39%; 13710 us/it; ETA 13d 22:09; 5e731e02beb738ea (check 5.61s)
2020-04-14 05:45:23 condorella/rx550 94741139 OK 6800000 loaded: blockSize 400, cab0b7a0fb0cc066
2020-04-14 05:54:37 condorella/rx550 94741139 OK 6840000 7.22%; 13711 us/it; ETA 13d 22:46; 4f7b98cea0650fb9 (check 5.63s) 1 errors
2020-04-14 06:22:07 condorella/rx550 94741139 OK 6960000 7.35%; 13714 us/it; ETA 13d 22:24; a47542d527e8a188 (check 5.63s) 1 errors
2020-04-14 06:49:37 condorella/rx550 94741139 EE 7080000 7.47%; 13711 us/it; ETA 13d 21:53; b71198a3d710f35b (check 5.62s) 1 errors
2020-04-14 06:49:43 condorella/rx550 94741139 OK 6960000 loaded: blockSize 400, a47542d527e8a188
2020-04-14 07:09:07 condorella/rx550 94741139 OK 7040000 7.43%; 13716 us/it; ETA 13d 22:08; b7ef942604ff7e9d (check 5.62s) 2 errors
2020-04-14 07:27:29 condorella/rx550 94741139 OK 7120000 7.52%; 13710 us/it; ETA 13d 21:42; 5c248da8f1d53306 (check 5.64s) 2 errors
2020-04-14 07:45:51 condorella/rx550 94741139 OK 7200000 7.60%; 13713 us/it; ETA 13d 21:27; 679505ffa3183075 (check 5.63s) 2 errors

3,4 yes
2020-04-14 10:31:16 condorella/rx550 94741139 OK 7920000 8.36%; 13706 us/it; ETA 13d 18:32; 65631a9de20e1074 (check 5.80s) 2 errors
2020-04-14 10:49:38 condorella/rx550 94741139 EE 8000000 8.44%; 13714 us/it; ETA 13d 18:27; d5fca8bd937ae862 (check 5.62s) 2 errors
2020-04-14 10:49:44 condorella/rx550 94741139 OK 7920000 loaded: blockSize 400, 65631a9de20e1074
2020-04-14 10:58:58 condorella/rx550 94741139 EE 7960000 8.40%; 13708 us/it; ETA 13d 18:27; 5512a7950572a594 (check 5.61s) 3 errors
2020-04-14 10:59:04 condorella/rx550 94741139 OK 7920000 loaded: blockSize 400, 65631a9de20e1074
2020-04-14 11:08:18 condorella/rx550 94741139 OK 7960000 8.40%; 13716 us/it; ETA 13d 18:38; 5512a7950572a594 (check 5.62s) 4 errors
2020-04-14 11:17:32 condorella/rx550 94741139 OK 8000000 8.44%; 13712 us/it; ETA 13d 18:24; d5fca8bd937ae862 (check 5.73s) 4 errors
2020-04-14 11:26:45 condorella/rx550 94741139 OK 8040000 8.49%; 13706 us/it; ETA 13d 18:06; 8443b148d25a6ac2 (check 5.63s) 4 errors

5 yes
2020-04-14 15:17:31 condorella/rx550 94741139 OK 9040000 9.54%; 13708 us/it; ETA 13d 14:20; 1d49e02d84ebc14b (check 5.80s) 4 errors
2020-04-14 15:26:44 condorella/rx550 94741139 EE 9080000 9.58%; 13709 us/it; ETA 13d 14:13; 071b118bfd9c270e (check 5.61s) 4 errors
2020-04-14 15:26:50 condorella/rx550 94741139 OK 9040000 loaded: blockSize 400, 1d49e02d84ebc14b
2020-04-14 15:36:04 condorella/rx550 94741139 OK 9080000 9.58%; 13714 us/it; ETA 13d 14:19; 071b118bfd9c270e (check 5.62s) 5 errors
2020-04-14 15:45:18 condorella/rx550 94741139 OK 9120000 9.63%; 13708 us/it; ETA 13d 14:02; 6a7e1626bbbcb964 (check 5.63s) 5 errors

6 yes
2020-04-14 20:49:58 condorella/rx550 94741139 OK 10440000 11.02%; 13712 us/it; ETA 13d 09:06; d2720a83125e79ee (check 5.62s) 5 errors
2020-04-14 20:59:11 condorella/rx550 94741139 EE 10480000 11.06%; 13717 us/it; ETA 13d 09:03; 947499a4cc1fd4fe (check 5.61s) 5 errors
2020-04-14 20:59:18 condorella/rx550 94741139 OK 10440000 loaded: blockSize 400, d2720a83125e79ee
2020-04-14 21:08:32 condorella/rx550 94741139 OK 10480000 11.06%; 13722 us/it; ETA 13d 09:10; 947499a4cc1fd4fe (check 5.63s) 6 errors

7,8 yes
2020-04-15 00:31:38 condorella/rx550 94741139 OK 11360000 11.99%; 13710 us/it; ETA 13d 05:33; 98dbac1057665909 (check 5.63s) 6 errors
2020-04-15 00:40:52 condorella/rx550 94741139 EE 11400000 12.03%; 13717 us/it; ETA 13d 05:33; a0464605ba0b9bf6 (check 5.61s) 6 errors
2020-04-15 00:40:58 condorella/rx550 94741139 OK 11360000 loaded: blockSize 400, 98dbac1057665909
2020-04-15 00:50:12 condorella/rx550 94741139 OK 11400000 12.03%; 13715 us/it; ETA 13d 05:30; a0464605ba0b9bf6 (check 5.63s) 7 errors
2020-04-15 00:59:25 condorella/rx550 94741139 EE 11440000 12.07%; 13705 us/it; ETA 13d 05:07; 59fd18a546ca4936 (check 5.61s) 7 errors
2020-04-15 00:59:31 condorella/rx550 94741139 OK 11400000 loaded: blockSize 400, a0464605ba0b9bf6
2020-04-15 01:08:45 condorella/rx550 94741139 OK 11440000 12.07%; 13712 us/it; ETA 13d 05:17; 59fd18a546ca4936 (check 5.72s) 8 errors
[/CODE]
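Checking this by eye is tedious. A hypothetical C++ helper (not part of gpuowl) that automates the same tally over plain gpuowl log lines, assuming the line layout shown here: a status token (EE or OK), the iteration, and the res64 immediately before "(check". Reload lines ("OK ... loaded:") are skipped, and lines with forum BBCode added would need it stripped first:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

struct EeTally {
  int same = 0;       // EE whose later OK at the same iteration had the same res64
  int different = 0;  // EE whose later OK at the same iteration had a different res64
  int unknown = 0;    // EE with no later OK line at that iteration
};

EeTally tallyEe(const std::vector<std::string>& lines) {
  std::map<std::uint64_t, std::string> pending;  // EE iteration -> res64 at the EE
  EeTally tally;
  for (const std::string& line : lines) {
    std::istringstream in(line);
    std::vector<std::string> tok;
    for (std::string s; in >> s;) tok.push_back(s);

    // Find the status token; the iteration count follows it.
    std::size_t i = 0;
    while (i < tok.size() && tok[i] != "EE" && tok[i] != "OK") ++i;
    if (i + 2 >= tok.size() || tok[i + 2] == "loaded:") continue;  // not progress, or a reload

    // The res64 is the token just before "(check".
    std::string res64;
    for (std::size_t j = i + 1; j + 1 < tok.size(); ++j) {
      if (tok[j + 1] == "(check") res64 = tok[j];
    }
    if (res64.empty()) continue;

    const std::uint64_t iter = std::stoull(tok[i + 1]);
    if (tok[i] == "EE") {
      pending[iter] = res64;
      continue;
    }
    auto it = pending.find(iter);
    if (it == pending.end()) continue;
    if (it->second == res64) ++tally.same; else ++tally.different;
    pending.erase(it);
  }
  tally.unknown = static_cast<int>(pending.size());
  return tally;
}
```

A nonzero `different` count would point to genuine computation errors; EEs that only ever land in `same` fit preda's reading that the check, not the main computation, is being disturbed.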

preda 2020-04-19 00:02

[QUOTE=kriesel;543107]I now have -use STATS on 3 gpus, for the time being, and will report anything that seems interesting. The performance drag is considerable. That 30 hours to catch one EE with stats could have been run in 25 hours without stats. I may update an rx480 instance and add it to the test; that's on the same dual-hex-core system so might be same cpu core but more likely not unless I start setting core affinities. More likely the same complement of system ram though.
[CODE]Preda asks, Were the earlier occurrences' res64s correct despite the EEs?
yes 6, unknown 2, no 0; the ayes have it.
[/CODE]
[/QUOTE]

Interesting. Are you running with any -use options, in particular CARRYM32 ?

I'm trying to understand what would produce the "spurious errors" you see. The main computation is correct; the errors affect only the check (or something related to the check).

I don't see any particular benefit in running this with STATS. There is no danger of a genuine overflow for those exponents.

kriesel 2020-04-19 00:54

[QUOTE=preda;543113]Interesting. Are you running with any -use options, in particular CARRYM32 ?[/QUOTE]NO_ASM and sometimes STATS. That's it.

I gave up for now on trying to keep up with the rapidly shifting availability of optimization choices. They also made operation fragile, since some of them would become illegal and terminate the program when the fft length changed from one worktodo exponent to the next. What was optimal for one fft length was not allowed on another, and the program would terminate, costing hours or days.

[QUOTE=kriesel;542963]All on the same RX550 gpu and host system, that has reliably run without GEC errors for multiple 5M PRP first-tests on v6.11-134:
90710093 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.30 bits/word
92858651 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.71 bits/word
93461911 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.83 bits/word
93873049 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.90 bits/word
94418047 began FFT 5120K: Width 256x4, Height 64x4, Middle 10; 18.01 bits/word, ran ok. finished at 6M because of problems encountered at 7M on 131.5M, requiring -fft +6

All runs [B]-use NO_ASM[/B]
V6.11-134 on rx550 from start on 94741139; no GEC to 2.8M iterations; this was at 6M fft, 18370 us/iter due to problems seen with 7M fft (leftover -fft +6 config.txt content)
V6.11-257 continuation on same rx550, 5M fft 1K:5:512; 14310 us/iter to 6.0544M iterations
V6.11-257 continuation on same RX550, no fft specification; 1K:10:256 chosen by program; ~13712 us/iter, 9 GEC errors by 22.1M iterations.
V6.11-259 continuation with -use STATS on same RX550, 1K:10:256, ~16542 us/iter, no additional GEC through 25.64M iterations.
V6.11-259 continuation without -use STATS underway now, 13750 usec/iter[/QUOTE]

V6.11-257 folder's entire config.txt:[CODE]-device 1 -user kriesel -cpu condorella/rx550 -yield -maxAlloc 3600 -use NO_ASM
[/CODE]V6.11-259 folder's entire config.txt:[CODE]-device 1 -user kriesel -cpu condorella/rx550 -yield -maxAlloc 3600 -use NO_ASM,STATS
[/CODE]A look through the Windows system log showed nothing relevant around the times of the EE occurrences.

kriesel 2020-04-19 17:48

[QUOTE=preda;543113]I don't see any particular benefit in running this with STATS. There is no danger of a genuine overflow for those exponents.[/QUOTE]I've continued STATS a while. If nothing else, it provides confirmation of the margins built in against roundoff error.

The RX480 instance has been updated to v6.11-264 and has not shown any issues on the same system. I own multiple RX550s, and this may be one of the older ones.

EEs #11 and 12 also matched their respective ok res64s:[CODE]2020-04-19 08:41:38 condorella/rx550 94741139 OK 35760000 37.74%; 16554 us/it; ETA 11d 07:13; f181b958d4edbb40 (check 6.79s) 10 errors
2020-04-19 08:52:40 condorella/rx550 Roundoff: N=40500, max 0.507109, avg 0.205147, sdev 0.012284 (0.059880, 0.061539), max-round 0.401694
2020-04-19 08:52:40 condorella/rx550 Carry: N=40499, max 39fed29e, avg 2b55373f; CarryM: N=1, max 8d57bc50, avg 8d57bc50
2020-04-19 08:52:47 condorella/rx550 94741139 EE 35800000 37.79%; 16553 us/it; ETA 11d 07:01; 09e2502ff060aa59 (check 6.80s) 10 errors
2020-04-19 08:52:54 condorella/rx550 94741139 OK 35760000 loaded: blockSize 400, f181b958d4edbb40
2020-04-19 09:03:56 condorella/rx550 Roundoff: N=40928, max 0.292142, avg 0.205081, sdev 0.012278 (0.059870, 0.061527), max-round 0.401532
2020-04-19 09:03:56 condorella/rx550 Carry: N=40926, max 39fed29e, avg 2b52fd08; CarryM: N=2, max 7f8fee13, avg 6ae2001d
2020-04-19 09:04:03 condorella/rx550 94741139 OK 35800000 37.79%; 16549 us/it; ETA 11d 06:57; 09e2502ff060aa59 (check 6.79s) 11 errors
2020-04-19 09:15:05 condorella/rx550 Roundoff: N=40500, max 0.308341, avg 0.205111, sdev 0.012159 (0.059278, 0.060903), max-round 0.399648
2020-04-19 09:15:05 condorella/rx550 Carry: N=40499, max 3ff202a2, avg 2b532f47; CarryM: N=1, max 79fdf537, avg 79fdf537
2020-04-19 09:15:11 condorella/rx550 94741139 OK 35840000 37.83%; 16551 us/it; ETA 11d 06:47; 72081dbe2707fc8b (check 6.80s) 11 errors
...
2020-04-19 12:02:25 condorella/rx550 94741139 OK 36440000 38.46%; 16554 us/it; ETA 11d 04:05; 371ec60215fd8888 (check 6.83s) 11 errors
2020-04-19 12:13:28 condorella/rx550 Roundoff: N=40500, max 0.317265, avg 0.205178, sdev 0.012361 (0.060245, 0.061923), max-round 0.402951
2020-04-19 12:13:28 condorella/rx550 Carry: N=40499, max 39e4bbc8, avg 2b541ee3; CarryM: N=1, max 88c9f1c4, avg 88c9f1c4
2020-04-19 12:13:34 condorella/rx550 94741139 OK 36480000 38.50%; 16552 us/it; ETA 11d 03:53; cf6fab81c6c3e53d (check 6.79s) 11 errors
2020-04-19 12:24:37 condorella/rx550 Roundoff: N=40500, max 0.507318, avg 0.205169, sdev 0.012261 (0.059762, 0.061414), max-round 0.401350
2020-04-19 12:24:37 condorella/rx550 Carry: N=40499, max 4031f5fd, avg 2b513507; CarryM: N=1, max 7ca22e72, avg 7ca22e72
2020-04-19 12:24:43 condorella/rx550 94741139 EE 36520000 38.55%; 16557 us/it; ETA 11d 03:46; a73fe45b46915805 (check 6.78s) 11 errors
2020-04-19 12:24:51 condorella/rx550 94741139 OK 36480000 loaded: blockSize 400, cf6fab81c6c3e53d
2020-04-19 12:35:56 condorella/rx550 Roundoff: N=40928, max 0.289129, avg 0.205117, sdev 0.012271 (0.059823, 0.061479), max-round 0.401450
2020-04-19 12:35:56 condorella/rx550 Carry: N=40926, max 4031f5fd, avg 2b4ebaf7; CarryM: N=2, max 9fd2a241, avg 7a4f0e2d
2020-04-19 12:36:03 condorella/rx550 94741139 OK 36520000 38.55%; 16643 us/it; ETA 11d 05:10; a73fe45b46915805 (check 6.79s) 12 errors[/CODE]I've just migrated this run in progress again, to v6.11-268 on the same gpu, same config.txt.

kriesel 2020-04-19 18:15

Gpuowl-win v6.11-268-g0d07d21 build
 
Under test now.

ewmayer 2020-04-19 21:35

[QUOTE=preda;543106]More precisely, given reverse-weight "w" and FFT-output word "x", the error is computed as:

abs(FMA(x, w, -rint(x * w)));

which, arguably, can be larger that 0.5.[/QUOTE]

If we agree that x is a given convolution output, which is expected to be an integer using exact arithmetic, and the fractional part of x as computed using inexact arithmetic is the absolute value of the difference between x and nearest_int(x), that is by definition in [0,0.5]. If your actual way of computing said fractional error itself introduces additional error, so that your frac(x) != abs(x - nearest_int(x)), that is a separate issue. But I appreciate the clarification.

Since gpuOwl is not using stats about such fractional errors to decide if the FFT length needs to be upped, accurately computing frac(x) is less important than for programs which do make use of same.

Prime95 2020-04-20 03:20

[QUOTE=ewmayer;543200]Since gpuOwl is not using stats about such fractional errors to decide if the FFT length needs to be upped, accurately computing frac(x) is less important than for programs which do make use of same.[/QUOTE]

Well, we are using the stats to decide when the FFT length needs to be upped. However, the current code is just as likely to compute the fractional part too low as too high, and we're using the average max roundoff error, so it all works out in the end.

Over the last few weeks we've managed to increase the maximum exponent that can be tested with a 5M FFT by over a million.

I had to do this because I'm oh so close to being assigned exponents that would have pushed me into the 5.5M FFT. I know, very selfish :)

kriesel 2020-04-20 12:43

[QUOTE=kriesel;543176]I've just migrated this run in progress again, to v6.11-268 on the same gpu, same config.txt.[/QUOTE]This was still V6.11-259, same rx550 and system:

For EEs #13, 14 and 15, the res64s also repeated on retry, and roundoff max > 0.50 was observed.[CODE]2020-04-19 18:35:46 condorella/rx550 94741139 OK 37800000 39.90%; 16558 us/it; ETA 10d 21:54; c6ff48ecd994dbaf (check 6.81s) 12 errors
2020-04-19 18:46:49 condorella/rx550 Roundoff: N=40500, [B]max 0.507793[/B], avg 0.205110, sdev 0.012265 (0.059799, 0.061453), max-round 0.401355
2020-04-19 18:46:49 condorella/rx550 Carry: N=40499, max 3938aaef, avg 2b4f1494; CarryM: N=1, max 8da124ce, avg 8da124ce
2020-04-19 18:46:55 condorella/rx550 94741139 [B]EE[/B] 37840000 39.94%; 16556 us/it; ETA 10d 21:41; 55bb210f90f33b08 (check 6.77s) 12 errors
2020-04-19 18:47:03 condorella/rx550 94741139 OK 37800000 loaded: blockSize 400, c6ff48ecd994dbaf
2020-04-19 18:58:05 condorella/rx550 Roundoff: N=40928, max 0.285702, avg 0.205056, sdev 0.012282 (0.059894, 0.061553), max-round 0.401561
2020-04-19 18:58:05 condorella/rx550 Carry: N=40926, max 3938aaef, avg 2b4d3aac; CarryM: N=2, max 8184df04, avg 6c12c45b
2020-04-19 18:58:12 condorella/rx550 94741139 OK 37840000 39.94%; 16561 us/it; ETA 10d 21:45; 55bb210f90f33b08 (check 6.80s) 13 errors
2020-04-19 19:09:14 condorella/rx550 Roundoff: N=40500, max 0.280020, avg 0.205015, sdev 0.012088 (0.058962, 0.060569), max-round 0.398423
2020-04-19 19:09:14 condorella/rx550 Carry: N=40499, max 3a6f97e3, avg 2b4b4bdd; CarryM: N=1, max 80be9c3a, avg 80be9c3a
...
2020-04-19 21:23:08 condorella/rx550 94741139 OK 38360000 40.49%; 16551 us/it; ETA 10d 19:12; 7a4b392aea8ba6b3 (check 6.79s) 13 errors
2020-04-19 21:34:10 condorella/rx550 Roundoff: N=40500, [B]max 0.505375[/B], avg 0.205216, sdev 0.012323 (0.060051, 0.061719), max-round 0.402390
2020-04-19 21:34:10 condorella/rx550 Carry: N=40499, max 39289dae, avg 2b50054a; CarryM: N=1, max 89e4790c, avg 89e4790c
2020-04-19 21:34:17 condorella/rx550 94741139 [B]EE[/B] 38400000 40.53%; 16552 us/it; ETA 10d 19:03; 72bd6fa0e937b804 (check 6.77s) 13 errors
2020-04-19 21:34:24 condorella/rx550 94741139 OK 38360000 loaded: blockSize 400, 7a4b392aea8ba6b3
2020-04-19 21:45:26 condorella/rx550 Roundoff: N=40928, max 0.308075, avg 0.205175, sdev 0.012357 (0.060226, 0.061904), max-round 0.402885
2020-04-19 21:45:26 condorella/rx550 Carry: N=40926, max 39289dae, avg 2b4e41e9; CarryM: N=2, max 87679ef6, avg 75444180
2020-04-19 21:45:33 condorella/rx550 94741139 OK 38400000 40.53%; 16549 us/it; ETA 10d 19:00; 72bd6fa0e937b804 (check 6.84s) 14 errors
2020-04-19 21:56:35 condorella/rx550 Roundoff: N=40500, max 0.297735, avg 0.205119, sdev 0.012226 (0.059606, 0.061249), max-round 0.400741
2020-04-19 21:56:35 condorella/rx550 Carry: N=40499, max 3b844536, avg 2b5043a2; CarryM: N=1, max 8422e711, avg 8422e711
...
2020-04-20 01:50:43 condorella/rx550 94741139 OK 39280000 41.46%; 16541 us/it; ETA 10d 14:49; 656629c4657b02f0 (check 6.84s) 14 errors
2020-04-20 02:01:44 condorella/rx550 Roundoff: N=40500, [B]max 0.503906[/B], avg 0.205072, sdev 0.012220 (0.059587, 0.061229), max-round 0.400584
2020-04-20 02:01:44 condorella/rx550 Carry: N=40499, max 3d252f79, avg 2b52cd29; CarryM: N=1, max 7af459d9, avg 7af459d9
2020-04-20 02:01:51 condorella/rx550 94741139 [B]EE[/B] 39320000 41.50%; 16542 us/it; ETA 10d 14:40; 4b750a9575434d29 (check 6.77s) 14 errors
2020-04-20 02:01:58 condorella/rx550 94741139 OK 39280000 loaded: blockSize 400, 656629c4657b02f0
2020-04-20 02:13:00 condorella/rx550 Roundoff: N=40928, max 0.302707, avg 0.205027, sdev 0.012249 (0.059741, 0.061392), max-round 0.401003
2020-04-20 02:13:00 condorella/rx550 Carry: N=40926, max 3d252f79, avg 2b513065; CarryM: N=2, max 82276967, avg 704e0507
2020-04-20 02:13:07 condorella/rx550 94741139 OK 39320000 41.50%; 16543 us/it; ETA 10d 14:40; 4b750a9575434d29 (check 6.80s) 15 errors
2020-04-20 02:24:09 condorella/rx550 Roundoff: N=40500, max 0.297502, avg 0.205151, sdev 0.012245 (0.059690, 0.061338), max-round 0.401078
2020-04-20 02:24:09 condorella/rx550 Carry: N=40499, max 3de87af2, avg 2b53fed7; CarryM: N=1, max 7d3fca8a, avg 7d3fca8a[/CODE]I will swap out the RX550 for a different unit after a trial of v6.11-268 if it also produces such EE occurrences.
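The observation that the res64 printed on an EE line reappears on the OK line after the reload can be checked mechanically across a whole log. A minimal Python sketch, assuming only the progress-line layout shown above (exponent, OK/EE, iteration, percentage, then a 16-hex-digit res64 before "(check"); the regex and function name are my own, not part of gpuowl:

```python
import re

# Progress lines look like (forum [B] tags stripped first):
# "... 94741139 EE 37840000 39.94%; 16556 us/it; ETA 10d 21:41; 55bb210f90f33b08 (check 6.77s) 12 errors"
PROGRESS_RE = re.compile(
    r"(\d+) (OK|EE)\s+(\d+) \d+\.\d+%;.*; ([0-9a-f]{16}) \(check")
BBCODE_BOLD_RE = re.compile(r"\[/?B\]")

def ee_retries(log_text):
    """Yield (iteration, ee_res64, retry_res64, same) for every EE that was
    later re-run to an OK at the same iteration."""
    pending = {}  # iteration -> res64 printed on the most recent EE line
    for raw in log_text.splitlines():
        line = BBCODE_BOLD_RE.sub("", raw)
        m = PROGRESS_RE.search(line)
        if m is None:
            continue  # Roundoff/Carry lines and "loaded:" lines don't match
        status, iteration, res64 = m.group(2), int(m.group(3)), m.group(4)
        if status == "EE":
            pending[iteration] = res64
        elif iteration in pending:
            ee_res64 = pending.pop(iteration)
            yield iteration, ee_res64, res64, ee_res64 == res64
```

In every EE excerpted in this section, the retry reproduced the res64 from the EE line, including the case where two EEs in a row preceded the OK.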

LaurV 2020-04-20 15:50

1 Attachment(s)
The biggest issue I see now with gpuOwl+LL is the fact that when something like this happens, you need to restart both of them from scratch. Even if you are able to detect when the "thing" happens, the cards are always a bit "out of phase" (one is always slightly faster), so there is no way to know which one is good and which one is bad, and no way to resume from THAT specific iteration. We tried old switches that used to work, like -saveStep or variants; they are no longer in the help, but we hoped... and hoped...
[ATTACH]22071[/ATTACH]

We want checkpoint files, called "exponent.iteration.residue.whatever"; otherwise we are doomed to waste two R7s for a few hours on average every time a naughty bit humps here and there...

And non-zero shift...

And vanilla ice cream...
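The naming request above can be sketched as an external helper that copies each checkpoint to a file named exponent.iteration.res64, so any saved iteration can be picked later. This is a minimal Python sketch under stated assumptions: you already have a checkpoint file to copy, and the ".owl" extension and the function names are hypothetical, not gpuowl's own save-file format.

```python
import shutil
import tempfile
from pathlib import Path

def archive_checkpoint(src, folder, exponent, iteration, res64):
    """Copy checkpoint `src` to folder/<exponent>.<iteration>.<res64>.owl.

    Hypothetical naming scheme modelled on the request above; it is not the
    save-file format gpuowl itself writes.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    dst = folder / f"{exponent}.{iteration}.{res64}.owl"
    shutil.copy2(src, dst)  # copy, so the live save file is left alone
    return dst

def checkpoints_at(folder, exponent, iteration):
    """All archived checkpoints of `exponent` at `iteration`, sorted."""
    return sorted(Path(folder).glob(f"{exponent}.{iteration}.*.owl"))

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        live = Path(tmp) / "current.owl"  # stand-in for a real save file
        live.write_bytes(b"checkpoint bytes")
        saved = archive_checkpoint(live, Path(tmp) / "saves",
                                   94741139, 36480000, "cf6fab81c6c3e53d")
        print(saved.name)  # 94741139.36480000.cf6fab81c6c3e53d.owl
```

Keeping the res64 in the file name means two out-of-phase runs can be compared by directory listing alone, without opening either checkpoint.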

kriesel 2020-04-20 16:21

[QUOTE=LaurV;543250]The biggest issue I see now with gpuOwl+LL is the fact that when something like this happens, you need to restart both of them from scratch[/QUOTE]You don't do backups?
You could try a tiebreaker third gpu. And/or one running CUDALucas on a suitable gpu.
From the CUDALucas.ini file, [CODE]# SaveAllCheckpoints is the same as the -s option. When active, CUDALucas will
# save each checkpoint separately in the folder specified in the "SaveFolder"
# option above. This is a binary option; set to 1 to activate, 0 to de-activate.

SaveAllCheckpoints=1


# This option is the name of the folder where the separate checkpoint files are
# saved. This option is only checked if SaveAllCheckpoints is activated.

SaveFolder=savefiles[/CODE] [URL]https://www.mersenneforum.org/showpost.php?p=489059&postcount=2[/URL]

LaurV 2020-04-20 17:10

As usual, unrelated. Lots of clutter.

[QUOTE=LaurV;543250]gpuOwl+LL[/QUOTE]

ewmayer 2020-04-20 21:20

[QUOTE=Prime95;543226]Over the last few weeks we've managed to increase the maximum exponent that can be tested with a 5M FFT by over a million.

I had to do this because I'm oh so close to being assigned exponents that would have pushed me into the 5.5M FFT. I know, very selfish :)[/QUOTE]

Nice! So what are the default maxp limits for 5 and 5.5M in the latest commit? And how conservative are those, in your estimation?

[p.s.: It's only selfish if you have said improvements in place in your local dev-branch, and refuse to share. :]

kriesel 2020-04-20 22:04

LaurV:
You have choices. We all do. As shown before, there are 3 workarounds:
a) Backups. The user picks how often. Restart runs from the point in time of the last backup with matched res64s.
b) A tie-breaker 3rd run. If two or all 3 match, great; if none match, some erred.
c) CUDALucas for one of the runs. It keeps save files every n steps, but requires NVIDIA. It has long been the standard for LL on gpu. It can be rerun from the last save file before the res64 mismatch.
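Options a) and c) both come down to finding the last iteration at which two runs of the same exponent reported the same res64. A minimal Python sketch, assuming each run's res64s have already been collected into a dict keyed by iteration (for gpuowl, from the OK lines of its log); the function name is hypothetical:

```python
from typing import Dict, Optional

def last_agreement(run_a: Dict[int, str], run_b: Dict[int, str]) -> Optional[int]:
    """Highest iteration up to which both runs agree on every common res64.

    Iterations reported by only one run are ignored, since two cards are
    rarely in phase. Returns None if the runs disagree at their first
    common iteration or have none in common.
    """
    last = None
    for it in sorted(run_a.keys() & run_b.keys()):
        if run_a[it].lower() != run_b[it].lower():
            break  # first disagreement: later iterations are suspect too
        last = it
    return last
```

Both runs would then restart from their saved state at that iteration, which only works if a save file for that iteration still exists.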
The block of text showing how CUDALucas does it was mostly meant for Preda, whose time as a great coder is precious. (That applies to a few more people in GIMPS too.) I don't know if Preda has run CUDALucas. I know you have. Others who read this thread may not have. Letting the gpuowl user set the save step would be good.

And:
d) code the change you want, and give it to Preda, as George, SELROC, chengsun, kracker etc. have done for gpuowl, and others have done for other GIMPS software.
e) do single tests and wait for others to double-check them with other software and a different shift, like most users do.
f) wait until the feature set you want appears

ALL on topic, as was [URL]https://www.mersenneforum.org/showpost.php?p=543260&postcount=2111[/URL]

Prime95 2020-04-20 22:28

[QUOTE=ewmayer;543305]Nice! So what are the default maxp limits for 5 and 5.5M in the latest commit? And how conservative are those, in your estimation?[/QUOTE]

From gpuowl -h

[CODE]FFT 5M [ 7.86M - 97.42M] 1K:10:256 1K:5:512 256:10:1K 512:10:512 512:5:1K
FFT 5.50M [ 8.65M - 106.63M] 1K:11:256 256:11:1K 512:11:512
FFT 6M [ 9.44M - 115.86M] 1K:12:256 1K:6:512 1K:3:1K 256:12:1K 512:12:512 512:6:1K 4K:3:256
[/CODE]

I'd say the limits are aggressive.
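The ranges printed by gpuowl -h can be used directly to see which FFT an exponent falls in. A minimal Python sketch that parses lines in the format shown above (size, then [min - max] in millions) and picks the smallest FFT whose upper bound covers the exponent. It also computes bits per word as exponent / FFT length, which reproduces the 18.07 bpw that gpuowl reports for 94741139 on the 5M FFT later in this thread. The function names are mine, and gpuowl's own selection logic may differ.

```python
import re

HELP_LINES = """\
FFT 5M [ 7.86M - 97.42M] 1K:10:256 1K:5:512 256:10:1K 512:10:512 512:5:1K
FFT 5.50M [ 8.65M - 106.63M] 1K:11:256 256:11:1K 512:11:512
FFT 6M [ 9.44M - 115.86M] 1K:12:256 1K:6:512 1K:3:1K 256:12:1K 512:12:512 512:6:1K 4K:3:256"""

# Accepts both "5.50M" and the older "5632K" spelling of the FFT size.
FFT_RE = re.compile(r"FFT ([\d.]+)([MK]) \[\s*([\d.]+)M - ([\d.]+)M\]")

def parse_fft_table(text):
    """Return (fft_length_in_words, max_exponent) pairs, smallest FFT first."""
    table = []
    for m in FFT_RE.finditer(text):
        size, unit, _lo, hi = m.groups()
        scale = 1 << 20 if unit == "M" else 1 << 10
        table.append((round(float(size) * scale), round(float(hi) * 1e6)))
    return sorted(table)

def pick_fft(exponent, table):
    """Smallest FFT length whose listed maximum exponent is >= exponent."""
    for length, max_exp in table:
        if exponent <= max_exp:
            return length
    raise ValueError("exponent beyond the largest listed FFT")

def bits_per_word(exponent, fft_length):
    return exponent / fft_length

table = parse_fft_table(HELP_LINES)
print(pick_fft(94741139, table), round(bits_per_word(94741139, 5 << 20), 2))
```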

ewmayer 2020-04-20 22:54

[QUOTE=Prime95;543316]From gpuowl -h

[CODE]FFT 5M [ 7.86M - 97.42M] 1K:10:256 1K:5:512 256:10:1K 512:10:512 512:5:1K
FFT 5.50M [ 8.65M - 106.63M] 1K:11:256 256:11:1K 512:11:512
FFT 6M [ 9.44M - 115.86M] 1K:12:256 1K:6:512 1K:3:1K 256:12:1K 512:12:512 512:6:1K 4K:3:256
[/CODE]

I'd say the limits are aggressive.[/QUOTE]

Indeed - from the version I'm currently on, v6.11-238-g62a3025-dirty:
[code]FFT 5M [ 7.86M - 95.71M] 1K-256-10 256-1K-10 512-512-10
FFT 5632K [ 8.65M - 105.06M] 1K-256-11 256-1K-11 512-512-11
FFT 6M [ 9.44M - 114.40M] 1K-256-12 1K-512-6 256-1K-12 256-2K-6 512-512-12 512-1K-6 2K-256-6[/code]
Just updated to current... wait. there's an issue related to a small change I made in my local primenet.py, which is to re-add a couple lines (i.e. to match the way the Mlucas primenet.py does things) so that '-t 0' means 'run py-script just once and quit'. Renamed my custom version, now we're good.
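The '-t 0' behaviour described here (run the script once and quit; otherwise repeat every t seconds) comes down to a loop like the following minimal Python sketch. The function and parameter names are hypothetical and not taken from either primenet.py; `sleep` is injectable only so the loop can be exercised without waiting.

```python
import time
from typing import Callable, Optional

def run_periodically(task: Callable[[], None], timeout: float,
                     max_runs: Optional[int] = None,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Run `task` once if timeout == 0, else every `timeout` seconds.

    max_runs bounds the loop (None means run forever); returns the run count.
    """
    runs = 0
    while True:
        task()
        runs += 1
        if timeout == 0 or (max_runs is not None and runs >= max_runs):
            return runs
        sleep(timeout)
```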

kriesel 2020-04-20 23:42

[QUOTE=kriesel;543240]I will swap out the RX550 for a different unit after a trial of v6.11-268 if it also produces such EE occurrences.[/QUOTE]Time to swap it out. V6.11-268 had EE #16.
[CODE]2020-04-20 13:25:12 condorella/rx550 94741139 OK 41850000 44.17%; 14679 us/it; ETA 8d 23:40; 24439ce356cbcd12 (check 6.02s) 15 errors
2020-04-20 13:37:26 condorella/rx550 Roundoff: N=50525, mean 0.202943, SD 0.012035, CV 0.059305, [B]max 0.507728[/B], pErr 0.000001
2020-04-20 13:37:26 condorella/rx550 Carry: N=50524, max 3ba0c0a4, avg 2b56dd02; CarryM: N=1, max 7ac075bf, avg 7ac075bf
2020-04-20 13:37:32 condorella/rx550 94741139 [B]EE[/B] 41900000 44.23%; 14680 us/it; ETA 8d 23:29; [B]6dead1fc3993bd7b[/B] (check 6.01s) 15 errors
2020-04-20 13:37:39 condorella/rx550 94741139 OK 41850000 loaded: blockSize 400, 24439ce356cbcd12
2020-04-20 13:49:52 condorella/rx550 Roundoff: N=50953, mean 0.202905, SD 0.012028, CV 0.059281, max 0.299187, pErr 0.000001
2020-04-20 13:49:52 condorella/rx550 Carry: N=50951, max 3ba0c0a4, avg 2b54ff6f; CarryM: N=2, max 825305ba, avg 6b44b674
2020-04-20 13:49:58 condorella/rx550 94741139 [B]OK[/B] 41900000 44.23%; 14670 us/it; ETA 8d 23:20; [B]6dead1fc3993bd7b[/B] (check 6.03s) 16 errors
2020-04-20 14:02:12 condorella/rx550 Roundoff: N=50525, mean 0.203002, SD 0.012050, CV 0.059358, max 0.305012, pErr 0.000001
2020-04-20 14:02:12 condorella/rx550 Carry: N=50524, max 3b45831d, avg 2b4ff588; CarryM: N=1, max 814cae6d, avg 814cae6d[/CODE]
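The v6.11-268 Roundoff line reports N, mean, SD, CV and max. The logged numbers are consistent with CV = SD / mean (0.012035 / 0.202943 ≈ 0.0593). A minimal Python sketch of those summary statistics, assuming they are taken over a list of roundoff errors and that SD is the population standard deviation (both assumptions; gpuowl may differ). pErr is not reproduced, since its definition isn't given in this thread.

```python
import math

def roundoff_summary(samples):
    """Mean, SD, coefficient of variation and max of a list of roundoff
    errors, in the spirit of gpuowl's Roundoff log line.

    Assumption: population standard deviation (divide by N).
    """
    n = len(samples)
    if n == 0:
        raise ValueError("no samples")
    mean = sum(samples) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in samples) / n)
    return {"N": n, "mean": mean, "SD": sd, "CV": sd / mean, "max": max(samples)}
```

Applied to the EE interval above, the max sits (0.507728 - 0.202943) / 0.012035 ≈ 25 standard deviations above the mean while the mean and SD barely move, so it stands out as a single outlier rather than a shift in the whole distribution.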

preda 2020-04-21 00:26

[QUOTE=ewmayer;543318]Indeed - from the version I'm currently on, v6.11-238-g62a3025-dirty:
[code]FFT 5M [ 7.86M - 95.71M] 1K-256-10 256-1K-10 512-512-10
FFT 5632K [ 8.65M - 105.06M] 1K-256-11 256-1K-11 512-512-11
FFT 6M [ 9.44M - 114.40M] 1K-256-12 1K-512-6 256-1K-12 256-2K-6 512-512-12 512-1K-6 2K-256-6[/code]
Just updated to current... wait. there's an issue related to a small change I made in my local primenet.py, which is to re-add a couple lines (i.e. to match the way the Mlucas primenet.py does things) so that '-t 0' means 'run py-script just once and quit'. Renamed my custom version, now we're good.[/QUOTE]

Feel free to submit a pull request with the "-t 0" change.

The current upper bound for 5M (97.4M) looks fine to me.

kriesel 2020-04-21 01:28

[QUOTE=kriesel;543322]Time to swap it out. V6.11-268 had EE #16.[/QUOTE]The issue appears at the moment to be a bad memory fan in this HP Z600, resulting in above-spec operating temperatures for half the system RAM. If that fan had died or were spinning too slowly, the air in the memory fan duct would be left pretty stagnant and warm. I don't know why that would create issues in one gpu's gpuowl run but not in the prime95 runs saturating the cpus. There were no GEC errors on that system's prime95 GUI display, or in its log files, going back months. Nor has it affected that system's RX480 gpuowl runs.

Symptoms:
"514 Memory fan not detected" message from BIOS on startup, which re-seating did not cure.
HWMonitor showed the system RAM bank of 3 DIMMs nearer the closer Xeon at 90C+, the other bank in the 70s.
Other Z600s in the same large room running similar workloads on cpus and NVIDIA gpus had memory temps in the 50s.
Experimenting with the prime95 instance, turning off half the workers to reduce power at the nearer Xeon, lowered the hotter ram into the 70s.

Replacement fan on the way.

kriesel 2020-04-21 07:20

[QUOTE=kriesel;543333]The issue appears at the moment to be a bad memory fan ...
Experimenting with the prime95 instance, turning off half the workers to reduce power at the nearer Xeon, lowered the hotter ram into the 70s.

Replacement fan on the way.[/QUOTE]Even after dropping cpu heat, and swapping the gpu for another, it's still getting EEs.[CODE]2020-04-20 19:22:40 gpuowl v6.11-268-g0d07d21
2020-04-20 19:22:40 config: -device 1 -user kriesel -cpu condorella/rx550 -yield -maxAlloc 3600 -use NO_ASM
2020-04-20 19:22:40 device 1, unique id ''
2020-04-20 19:22:40 condorella/rx550 94741139 FFT: 5M 1K:10:256 (18.07 bpw)
2020-04-20 19:22:40 condorella/rx550 Expected maximum carry32: 461E0000
2020-04-20 19:22:41 condorella/rx550 OpenCL args "-DEXP=94741139u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0xf.3cd1fc0411148p-3 -DIWEIGHT_STEP=0x8.66790bf53aca8p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DPM1=0 -DAMDGPU=1 -DNO_ASM=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-04-20 19:22:47 condorella/rx550 OpenCL compilation in 5.58 s
2020-04-20 19:22:53 condorella/rx550 94741139 OK 43154000 loaded: blockSize 400, 850d5d673cf6ad49
2020-04-20 19:23:10 condorella/rx550 94741139 OK 43154800 45.55%; 13701 us/it; ETA 8d 04:20; e0021e93eddece6a (check 5.65s) 16 errors
2020-04-20 19:33:38 condorella/rx550 94741139 OK 43200000 45.60%; 13772 us/it; ETA 8d 05:10; 18847855ef4addd5 (check 5.70s) 16 errors
2020-04-20 19:45:13 condorella/rx550 94741139 OK 43250000 45.65%; 13775 us/it; ETA 8d 05:01; c8b93071fb167821 (check 5.64s) 16 errors
2020-04-20 19:56:47 condorella/rx550 94741139 OK 43300000 45.70%; 13770 us/it; ETA 8d 04:45; e36f93de9f65e252 (check 5.64s) 16 errors
2020-04-20 20:08:21 condorella/rx550 94741139 OK 43350000 45.76%; 13771 us/it; ETA 8d 04:35; db7548eeff7fd82d (check 5.64s) 16 errors
2020-04-20 20:19:55 condorella/rx550 94741139 OK 43400000 45.81%; 13766 us/it; ETA 8d 04:19; d5890f6f7bc3bb62 (check 5.64s) 16 errors
2020-04-20 20:31:29 condorella/rx550 94741139 OK 43450000 45.86%; 13763 us/it; ETA 8d 04:05; a47eafb785a71fa4 (check 5.64s) 16 errors
2020-04-20 20:43:02 condorella/rx550 94741139 EE 43500000 45.91%; 13759 us/it; ETA 8d 03:50; 9c0cad0c6879242b (check 5.73s) 16 errors
2020-04-20 20:43:09 condorella/rx550 94741139 OK 43450000 loaded: blockSize 400, a47eafb785a71fa4
2020-04-20 20:54:42 condorella/rx550 94741139 OK 43500000 45.91%; 13759 us/it; ETA 8d 03:50; 9c0cad0c6879242b (check 5.64s) 17 errors
2020-04-20 21:06:16 condorella/rx550 94741139 OK 43550000 45.97%; 13765 us/it; ETA 8d 03:44; 80f24faafac9b03a (check 5.64s) 17 errors
2020-04-20 21:17:50 condorella/rx550 94741139 OK 43600000 46.02%; 13762 us/it; ETA 8d 03:30; 45d1a03b9cb91819 (check 5.64s) 17 errors
2020-04-20 21:29:24 condorella/rx550 94741139 OK 43650000 46.07%; 13766 us/it; ETA 8d 03:22; fac79b7ec0105d01 (check 5.64s) 17 errors
2020-04-20 21:40:57 condorella/rx550 94741139 OK 43700000 46.13%; 13759 us/it; ETA 8d 03:04; a66ca92be5e6dbb6 (check 5.64s) 17 errors
2020-04-20 21:52:31 condorella/rx550 94741139 OK 43750000 46.18%; 13764 us/it; ETA 8d 02:58; 3740bb97fee487d0 (check 5.64s) 17 errors
2020-04-20 22:04:05 condorella/rx550 94741139 OK 43800000 46.23%; 13764 us/it; ETA 8d 02:46; db25fa854c5db484 (check 5.65s) 17 errors
2020-04-20 22:15:39 condorella/rx550 94741139 OK 43850000 46.28%; 13764 us/it; ETA 8d 02:35; e69e2dbf65d78b2a (check 5.64s) 17 errors
2020-04-20 22:27:13 condorella/rx550 94741139 EE 43900000 46.34%; 13762 us/it; ETA 8d 02:21; 1f68378b7c6fc404 (check 5.63s) 17 errors
2020-04-20 22:27:19 condorella/rx550 94741139 OK 43850000 loaded: blockSize 400, e69e2dbf65d78b2a
2020-04-20 22:38:53 condorella/rx550 94741139 OK 43900000 46.34%; 13761 us/it; ETA 8d 02:20; 1f68378b7c6fc404 (check 5.68s) 18 errors
2020-04-20 22:50:26 condorella/rx550 94741139 OK 43950000 46.39%; 13759 us/it; ETA 8d 02:08; 31bdbf61721379f5 (check 5.68s) 18 errors
2020-04-20 23:02:00 condorella/rx550 94741139 OK 44000000 46.44%; 13762 us/it; ETA 8d 01:58; ab5f29aa5e0616d4 (check 5.64s) 18 errors
2020-04-20 23:13:34 condorella/rx550 94741139 OK 44050000 46.50%; 13764 us/it; ETA 8d 01:49; d15a6b5993812fc4 (check 5.64s) 18 errors
2020-04-20 23:25:08 condorella/rx550 94741139 OK 44100000 46.55%; 13761 us/it; ETA 8d 01:35; 72acbd04b3d43f04 (check 5.64s) 18 errors
2020-04-20 23:36:41 condorella/rx550 94741139 OK 44150000 46.60%; 13761 us/it; ETA 8d 01:23; 2894cbff475de263 (check 5.64s) 18 errors
2020-04-20 23:48:15 condorella/rx550 94741139 OK 44200000 46.65%; 13764 us/it; ETA 8d 01:15; d3091a2a24f15d8b (check 5.64s) 18 errors
2020-04-20 23:59:49 condorella/rx550 94741139 OK 44250000 46.71%; 13761 us/it; ETA 8d 01:00; d35597a77e451f9b (check 5.64s) 18 errors
2020-04-21 00:11:23 condorella/rx550 94741139 OK 44300000 46.76%; 13762 us/it; ETA 8d 00:50; 092708b97dc11cf0 (check 5.64s) 18 errors
2020-04-21 00:22:56 condorella/rx550 94741139 OK 44350000 46.81%; 13757 us/it; ETA 8d 00:34; a55be7644c8914ff (check 5.64s) 18 errors
2020-04-21 00:34:30 condorella/rx550 94741139 OK 44400000 46.86%; 13761 us/it; ETA 8d 00:26; 6c9cb184d9ae9fb9 (check 5.67s) 18 errors
2020-04-21 00:46:03 condorella/rx550 94741139 OK 44450000 46.92%; 13757 us/it; ETA 8d 00:11; 440bf81e51efd1b8 (check 5.64s) 18 errors
2020-04-21 00:57:37 condorella/rx550 94741139 OK 44500000 46.97%; 13760 us/it; ETA 8d 00:02; 4e2721d94c80f9a9 (check 5.67s) 18 errors
2020-04-21 01:09:11 condorella/rx550 94741139 OK 44550000 47.02%; 13758 us/it; ETA 7d 23:49; acc59d938a878840 (check 5.67s) 18 errors
2020-04-21 01:20:44 condorella/rx550 94741139 OK 44600000 47.08%; 13760 us/it; ETA 7d 23:39; e8ae6b2e1342173a (check 5.64s) 18 errors
2020-04-21 01:32:18 condorella/rx550 94741139 OK 44650000 47.13%; 13758 us/it; ETA 7d 23:26; 7738e5de79a41988 (check 5.64s) 18 errors
2020-04-21 01:43:51 condorella/rx550 94741139 OK 44700000 47.18%; 13754 us/it; ETA 7d 23:11; 0325e62041e2ef93 (check 5.66s) 18 errors
2020-04-21 01:55:25 condorella/rx550 94741139 OK 44750000 47.23%; 13757 us/it; ETA 7d 23:03; ac90cc4d821b536d (check 5.67s) 18 errors
2020-04-21 02:06:58 condorella/rx550 94741139 OK 44800000 47.29%; 13758 us/it; ETA 7d 22:52; 96fdda068a85c0ec (check 5.64s) 18 errors[/CODE]Next level is in effect now, stop and close prime95.

ewmayer 2020-04-21 20:53

[QUOTE=preda;543327]Feel free to submit a pull request with the "-t 0" change.[/QUOTE]

That's what I did, but without intending to commit my local change - got this error:
[code]git pull https://github.com/preda/gpuowl && make
remote: Enumerating objects: 119, done.
remote: Counting objects: 100% (119/119), done.
remote: Compressing objects: 100% (46/46), done.
remote: Total 136 (delta 96), reused 89 (delta 73), pack-reused 17
Receiving objects: 100% (136/136), 83.73 KiB | 2.20 MiB/s, done.
Resolving deltas: 100% (96/96), completed with 22 local objects.
From https://github.com/preda/gpuowl
* branch HEAD -> FETCH_HEAD
Updating 62a3025..f1fd1f7
error: Your local changes to the following files would be overwritten by merge:
tools/primenet.py
Please commit your changes or stash them before you merge.
Aborting[/code]
So this seems a good baby-step introduction to the rev-control setup ... what is the procedure for checking out a file, then testing and submitting a modified version? And what is the code review process you and George have in place?

Oh, another Q re. the latest primenet.py - just tried to use it with same flags I'd always used, -w 150 --tasks 10, to queue up new PRPs, but with the latest got

[i]primenet.py: error: argument -w: invalid choice: '150' (choose from 'PRP', 'PM1', 'LL_DC', 'PRP_DC', 'PRP_WORLD_RECORD', 'PRP_100M')[/i]

That "numeric value no longer works" appears to be due to a change in the choice=list(..) command - did you deliberately mean to disable numeric-server-worktype code support?

preda 2020-04-22 09:32

[QUOTE=ewmayer;543392]That's what I did, but without intending to commit my local change - got this error:
[code]git pull https://github.com/preda/gpuowl && make
remote: Enumerating objects: 119, done.
remote: Counting objects: 100% (119/119), done.
remote: Compressing objects: 100% (46/46), done.
remote: Total 136 (delta 96), reused 89 (delta 73), pack-reused 17
Receiving objects: 100% (136/136), 83.73 KiB | 2.20 MiB/s, done.
Resolving deltas: 100% (96/96), completed with 22 local objects.
From https://github.com/preda/gpuowl
* branch HEAD -> FETCH_HEAD
Updating 62a3025..f1fd1f7
error: Your local changes to the following files would be overwritten by merge:
tools/primenet.py
Please commit your changes or stash them before you merge.
Aborting[/code]
So this seems a good baby-step introduction to the rev-control setup ... what is the procedure for checking out a file, then testing and submitting a modified version?
[/QUOTE]
I wouldn't dare to write a git/github how-to here -- it's too large a subject, and there already are good tutorials out there. But the basic step sequence is:

1. create a github account
2. fork the project to your account (using github interface)
3. "git clone": check out locally *your* clone of the project (because you have write rights on your clone)
4. make local changes
5. "git commit": commit local changes
6. "git push": publish your local commits to your fork
7. using the github interface, create a pull request from your fork to the main project
8. I see the pull request, and I can merge it

[QUOTE]
And what is the code review process you and George have in place?
[/QUOTE]
It's extremely light right now:
- I commit without any reviews. Sometimes George detects errors I make, and notifies me (so, that's a form of post-commit review :).
- George sends me pull requests. I usually verify them before merging (by compiling and running an exponent for a bit). (the goal of my testing is mainly to detect performance differences between our respective setups)

[QUOTE]
Oh, another Q re. the latest primenet.py - just tried to use it with same flags I'd always used, -w 150 --tasks 10, to queue up new PRPs, but with the latest got

[i]primenet.py: error: argument -w: invalid choice: '150' (choose from 'PRP', 'PM1', 'LL_DC', 'PRP_DC', 'PRP_WORLD_RECORD', 'PRP_100M')[/i]

That "numeric value no longer works" appears to be due to a change in the choice=list(..) command - did you deliberately mean to disable numeric-server-worktype code support?[/QUOTE]

No, disabling the numeric values was unintentional (the goal of the change was to make the help less confusing by not displaying the numeric values there). But, why do you prefer using the numeric value (150) vs. the symbolic name "PRP"? Anyway, I'm fine with adding the numeric ids back if they're useful.
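One way to keep the help text listing only the symbolic names while still accepting -w 150 is an argparse type function that accepts either form. A minimal Python sketch: the symbolic names come from the error message above, and only PRP=150 is confirmed in this thread; the other numeric codes are my assumption of the PrimeNet work-type ids and should be checked against primenet.py before use.

```python
import argparse

# Only PRP=150 is confirmed in this thread; the other ids are assumed
# PrimeNet work-type codes and should be verified against primenet.py.
WORKTYPES = {
    "PRP": 150,
    "PM1": 4,
    "LL_DC": 101,
    "PRP_DC": 151,
    "PRP_WORLD_RECORD": 152,
    "PRP_100M": 153,
}

def worktype(value: str) -> int:
    """argparse type: accept a symbolic name or its numeric id."""
    if value in WORKTYPES:
        return WORKTYPES[value]
    if value.isdigit() and int(value) in WORKTYPES.values():
        return int(value)
    raise argparse.ArgumentTypeError(
        f"invalid work type {value!r}; choose from {', '.join(WORKTYPES)} "
        "or their numeric ids")

parser = argparse.ArgumentParser()
# A string default is passed through `type`, so the default becomes 150.
parser.add_argument("-w", dest="work", type=worktype, default="PRP",
                    help="work type: " + ", ".join(WORKTYPES))
```

With this, both "-w PRP" and "-w 150" yield 150, and the help output still shows only the names.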

garo 2020-04-22 13:13

Ernst, do "git stash" before doing the pull, and then "git stash pop". If you haven't pulled any conflicting changes, the pop will replay your own changes on top of the latest from GitHub.

kriesel 2020-04-23 21:53

gpuowl v6.11-270-gf1fd1f7 Win 7 x64 build
 
2 Attachment(s)
Untested, except help output so far.

kriesel 2020-04-26 18:29

[QUOTE=kriesel;543345]Even after dropping cpu heat, and swapping the gpu for another, it's still getting EEs.[/QUOTE]Received and installed the replacement fan assembly, $15 used from ebay; these fan assemblies have an unusual 2x2 fan connector that mates when the whole ducted fan assembly is snapped into place, so it seemed money well spent. I was skeptical about whether the old fan was an issue, because it did spin when powered on the bench. But the new assembly did a fine job of bringing RAM temps down from 100C max to 65-72C across the 6 DIMMs. That's still a bit warmer than the other Z600s I have, which might be because they're at floor level and this one is 4 feet above. Early results from lowering it to the floor 30 minutes ago show minimal difference, at 64-71C DIMM temps.
But in nearly a day of running since the fan swap, it's producing more errors than ever.
Maybe the Micron ram was permanently damaged? [URL]https://www.micron.com/products/dram/ddr3-sdram[/URL] shows operating limits as low as 95C.
Or maybe there's an issue with the particular PCIe slot.
[CODE]2020-04-25 13:13:33 gpuowl v6.11-268-g0d07d21
2020-04-25 13:13:33 config: -device 1 -user kriesel -cpu condorella/rx550 -yield -maxAlloc 3600 -use NO_ASM
2020-04-25 13:13:33 device 1, unique id ''
2020-04-25 13:13:33 condorella/rx550 94741139 FFT: 5M 1K:10:256 (18.07 bpw)
2020-04-25 13:13:33 condorella/rx550 Expected maximum carry32: 461E0000
2020-04-25 13:13:35 condorella/rx550 OpenCL args "-DEXP=94741139u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0xf.3cd1fc0411148p-3 -DIWEIGHT_STEP=0x8.66790bf53aca8p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DPM1=0 -DAMDGPU=1 -DNO_ASM=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-04-25 13:13:40 condorella/rx550 OpenCL compilation in 5.54 s
2020-04-25 13:13:47 condorella/rx550 94741139 OK 72010000 loaded: blockSize 400, 69fc8cbdf6ee352e
2020-04-25 13:14:03 condorella/rx550 94741139 OK 72010800 76.01%; 13722 us/it; ETA 3d 14:38; 93b608104f71f185 (check 5.65s) 27 errors
...
2020-04-25 18:01:26 condorella/rx550 94741139 OK 73250000 77.32%; 13787 us/it; ETA 3d 10:18; 67324677e938628d (check 5.65s) 27 errors
2020-04-25 18:13:01 condorella/rx550 94741139 EE 73300000 77.37%; 13785 us/it; ETA 3d 10:06; 7da27d1bd2ca79bd (check 5.64s) 27 errors
2020-04-25 18:13:08 condorella/rx550 94741139 OK 73250000 loaded: blockSize 400, 67324677e938628d
2020-04-25 18:24:42 condorella/rx550 94741139 OK 73300000 77.37%; 13784 us/it; ETA 3d 10:06; 7da27d1bd2ca79bd (check 5.65s) 28 errors
2020-04-25 18:36:17 condorella/rx550 94741139 OK 73350000 77.42%; 13783 us/it; ETA 3d 09:54; 1cc91ad65d4d6fb0 (check 5.66s) 28 errors
...
2020-04-26 02:08:13 condorella/rx550 94741139 OK 75300000 79.48%; 13787 us/it; ETA 3d 02:27; 814796c75126ea7f (check 5.66s) 28 errors
2020-04-26 02:19:48 condorella/rx550 94741139 EE 75350000 79.53%; 13783 us/it; ETA 3d 02:14; 5f754504bd9d7e7e (check 5.67s) 28 errors
2020-04-26 02:19:54 condorella/rx550 94741139 OK 75300000 loaded: blockSize 400, 814796c75126ea7f
2020-04-26 02:31:29 condorella/rx550 94741139 OK 75350000 79.53%; 13786 us/it; ETA 3d 02:15; 5f754504bd9d7e7e (check 5.65s) 29 errors
2020-04-26 02:43:04 condorella/rx550 94741139 OK 75400000 79.59%; 13783 us/it; ETA 3d 02:03; 2eb2c8172e41590a (check 5.66s) 29 errors
...
2020-04-26 05:48:23 condorella/rx550 94741139 OK 76200000 80.43%; 13782 us/it; ETA 2d 22:59; 1398624a7e37f481 (check 5.65s) 29 errors
2020-04-26 05:59:58 condorella/rx550 94741139 EE 76250000 80.48%; 13780 us/it; ETA 2d 22:47; acfe1cce4b98f205 (check 5.64s) 29 errors
2020-04-26 06:00:04 condorella/rx550 94741139 OK 76200000 loaded: blockSize 400, 1398624a7e37f481
2020-04-26 06:11:39 condorella/rx550 94741139 OK 76250000 80.48%; 13780 us/it; ETA 2d 22:47; acfe1cce4b98f205 (check 5.65s) 30 errors
2020-04-26 06:23:14 condorella/rx550 94741139 OK 76300000 80.54%; 13779 us/it; ETA 2d 22:35; 886dbb4e437b2eb6 (check 5.67s) 30 errors
...
2020-04-26 07:32:46 condorella/rx550 94741139 OK 76600000 80.85%; 13772 us/it; ETA 2d 21:24; 14aea5c6cb66203e (check 5.65s) 30 errors
2020-04-26 07:44:23 condorella/rx550 94741139 EE 76650000 80.90%; 13820 us/it; ETA 2d 21:27; 3d54908aab697d76 (check 5.66s) 30 errors
2020-04-26 07:44:29 condorella/rx550 94741139 OK 76600000 loaded: blockSize 400, 14aea5c6cb66203e
2020-04-26 07:56:04 condorella/rx550 94741139 EE 76650000 80.90%; 13784 us/it; ETA 2d 21:16; 3d54908aab697d76 (check 5.64s) 31 errors
2020-04-26 07:56:11 condorella/rx550 94741139 OK 76600000 loaded: blockSize 400, 14aea5c6cb66203e
2020-04-26 08:07:46 condorella/rx550 94741139 OK 76650000 80.90%; 13787 us/it; ETA 2d 21:17; 3d54908aab697d76 (check 5.83s) 32 errors
...
2020-04-26 12:22:30 condorella/rx550 94741139 OK 77750000 82.07%; 13774 us/it; ETA 2d 17:01; 16108dac33118d12 (check 5.92s) 32 errors
2020-04-26 12:34:04 condorella/rx550 94741139 OK 77800000 82.12%; 13778 us/it; ETA 2d 16:50; 47d1f28515271fba (check 5.93s) 32 errors
[/CODE]I'll probably try memtest86+ or gpu-slot-swap or both next. Other suggestions?

preda 2020-04-27 06:12

Do you have another GPU of the same model that does not exhibit such errors? otherwise I'd suspect something amiss software-side (i.e. gpuowl, and the related OpenCL compilation).

Anyway on ROCm / Radeon VII I don't see this pattern.

[QUOTE=kriesel;543880]
[CODE]
2020-04-26 07:32:46 condorella/rx550 94741139 OK 76600000 80.85%; 13772 us/it; ETA 2d 21:24; 14aea5c6cb66203e (check 5.65s) 30 errors
2020-04-26 07:44:23 condorella/rx550 94741139 EE 76650000 80.90%; 13820 us/it; ETA 2d 21:27; 3d54908aab697d76 (check 5.66s) 30 errors
2020-04-26 07:44:29 condorella/rx550 94741139 OK 76600000 loaded: blockSize 400, 14aea5c6cb66203e
2020-04-26 07:56:04 condorella/rx550 94741139 EE 76650000 80.90%; 13784 us/it; ETA 2d 21:16; 3d54908aab697d76 (check 5.64s) 31 errors
2020-04-26 07:56:11 condorella/rx550 94741139 OK 76600000 loaded: blockSize 400, 14aea5c6cb66203e
2020-04-26 08:07:46 condorella/rx550 94741139 OK 76650000 80.90%; 13787 us/it; ETA 2d 21:17; 3d54908aab697d76 (check 5.83s) 32 errors
[/CODE][/QUOTE]

kriesel 2020-04-27 10:19

[QUOTE=preda;543924]Do you have another GPU of the same model that does not exhibit such errors? otherwise I'd suspect something amiss software-side (i.e. gpuowl, and the related OpenCL compilation).

Anyway on ROCm / Radeon VII I don't see this pattern.[/QUOTE]I have three RX550s. The two that are 4GB both have exhibited the EE occurrence when used during this exponent run. The other is a 2GB and has not been tried there. It could be, since it is idle for the moment while I wait for a replacement power supply for another system. Two days remain on the exponent at RX550 rate.
The last 16 hours, since lowering the system to the floor, have gone well on the second 4GB RX550, with no EE during that time in v6.11-268. The RX480 in the same system where the problem occurs is behaving well on a similar exponent PRP, with no EE yet and less than a day remaining at RX480 rate in v6.11-264.
The host system does not have adequate power connectors for trying a Radeon VII in the pcie slot where the frequent EE have been observed.

ewmayer 2020-04-27 18:45

Preparing to configure new build which will eventually host several Radeon VIIs. In reviewing/updating my personal setup menu, need to make sure I have the ROCm stuff updated for the current version - by default that will be 3.3, yes? And are there any extra command-line flags needed for running gpuOwl under 3.3, by way of working around issues with that ROCm version?

preda 2020-04-27 21:54

[QUOTE=ewmayer;543978]Preparing to configure new build which will eventually host several Radeon VIIs. In reviewing/updating my personal setup menu, need to make sure I have the ROCm stuff updated for the current version - by default that will be 3.3, yes? And are there any extra command-line flags needed for running gpuOwl under 3.3, by way of working around issues with that ROCm version?[/QUOTE]

Yes, I think ROCm 3.3 is the most recent version at the moment, and it is what you get by default. The ROCm bug workaround is enabled by default; no special action is needed.

kruoli 2020-04-28 10:00

Which is the latest stable version that supports LL? I'm currently using a build from kriesel (gpuowl-v6.11-268-g0d07d21), but it gives me
[CODE]Assertion failed: 0 <= w && w < (1 << nBits), file state.cpp, line 22[/CODE]
constantly on both of my R9 290s, and I doubt that both of them went bad so close together in time, especially because they are from different batches.

Many of the results did not match the first LL; some did.
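The assertion quoted above checks that a residue word fits in its allotted bit width. Below is a minimal sketch, assuming the standard Crandall-Fagin variable-base layout in which word k of an N-word residue for exponent E holds ceil(E(k+1)/N) - ceil(Ek/N) bits. The helper names (`word_bits`, `split`, `join`) are hypothetical, and this is not gpuowl's state.cpp. In the sketch, a word outside [0, 2^nBits), for example one left by an unpropagated carry or by corrupted data, is rejected while the words are packed back into an integer. What actually triggers the assertion in these runs is not established in the thread.

```python
def ceil_div(a: int, b: int) -> int:
    """Ceiling of a / b for positive b, using integer arithmetic only."""
    return -(-a // b)


def word_bits(E: int, N: int) -> list[int]:
    """Bit width of each of the N words (assumed Crandall-Fagin layout)."""
    return [ceil_div(E * (k + 1), N) - ceil_div(E * k, N) for k in range(N)]


def split(x: int, E: int, N: int) -> list[int]:
    """Cut an integer 0 <= x < 2^E into N words, least significant first."""
    words = []
    for n_bits in word_bits(E, N):
        words.append(x & ((1 << n_bits) - 1))
        x >>= n_bits
    return words


def join(words: list[int], E: int, N: int) -> int:
    """Pack words back into an integer, rejecting any word that does not fit."""
    x, shift = 0, 0
    for k, (w, n_bits) in enumerate(zip(words, word_bits(E, N))):
        # Same condition as the quoted assertion: 0 <= w && w < (1 << nBits)
        if not (0 <= w < (1 << n_bits)):
            raise ValueError(f"word {k} = {w} does not fit in {n_bits} bits")
        x |= w << shift
        shift += n_bits
    return x
```

For E = 31 and N = 4 the widths come out as 8, 8, 8 and 7 bits. The widths always sum to E, so the average is E/N bits per word.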

preda 2020-04-28 10:35

[QUOTE=kruoli;544049]Which is the latest stable version that supports LL? I'm currently using a build from kriesel (gpuowl-v6.11-268-g0d07d21), but it gives me
[CODE]Assertion failed: 0 <= w && w < (1 << nBits), file state.cpp, line 22[/CODE]
constantly on both of my R9 290s, and I doubt that both of them went bad so close together in time, especially because they are from different batches.

Many of the results did not match the first LL; some did.[/QUOTE]

LL is experimental in GpuOwl at the moment. The failing assert may indicate a bug. Could you please give repro steps: which exponent, when it happens, how often it happens (every time?), etc.? Basically, whatever you think would allow the developers to reproduce the problem you see, so that we can debug it. At a minimum, a log excerpt would be helpful.

If you see any LL mismatch, you should bring it up, because it's more likely to be an error on gpuowl's side than a genuine mismatch.
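For context on what "mismatch" means here: an LL double-check compares the low 64 bits of the final residue with those from the first test. Below is a minimal sketch of the Lucas-Lehmer recurrence (s_0 = 4, s_{i+1} = s_i^2 - 2 mod M_p) using plain Python integers instead of FFT multiplication. The function `lucas_lehmer_res64` and its `fault_at` parameter are hypothetical. The injected error illustrates the point made in this thread: LL has no built-in check, so one bad iteration simply yields a different final residue, which only shows up as a mismatch against another run.

```python
from typing import Optional


def lucas_lehmer_res64(p: int, fault_at: Optional[int] = None) -> str:
    """Lucas-Lehmer test of M_p = 2^p - 1 for an odd prime p.

    Returns the low 64 bits of the final residue as 16 hex digits;
    all zeros means M_p is prime.
    """
    N = (1 << p) - 1
    s = 4
    for i in range(1, p - 1):          # p - 2 iterations
        s = (s * s - 2) % N
        if i == fault_at:
            s = (s + 1) % N            # simulated hardware error; nothing in LL notices it
    return format(s & ((1 << 64) - 1), "016x")
```

Real clients square multi-million-digit numbers with FFTs, but the value reported and compared between runs is still a 64-bit residue like this one.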

Before doing LL on an exponent range, you should validate by running a few iterations of PRP on an exponent from that range; if that works fine, then LL stands a chance.

kruoli 2020-04-28 11:07

Okay, thank you for the information! Somehow I thought there had been working LL in the past, but I guess I confused it with CUDALucas etc.

A few LL tests ran fine without any errors and matched (e.g. [URL="https://www.mersenne.org/report_exponent/?exp_lo=57234283&full=1"]M57234283[/URL]), but others went wrong (e.g. [URL="https://www.mersenne.org/report_exponent/?exp_lo=57234167&full=1"]M57234167[/URL], [URL="https://www.mersenne.org/report_exponent/?exp_lo=57234179&full=1"]M57234179[/URL], [URL="https://www.mersenne.org/report_exponent/?exp_lo=57233941&full=1"]M57233941[/URL], [URL="https://www.mersenne.org/report_exponent/?exp_lo=55297621&full=1"]M55297621[/URL]).

I uploaded the full logs and residue folders (I guess that's what they are), compressed, for both cards I ran them on: [URL="http://mc.oliver-kruse.de/GIMPS/gpuOwl"]here[/URL].

ATH 2020-04-28 12:30

Did you tune gpuowl parameters for LL tests? I found that you should tune only with PRP tests and then use the parameters that work for PRP for LL tests as well. Since there is no error checking on LL tests, you do not know whether you have tuned so far that it no longer works correctly.
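To illustrate why PRP results are self-checking while LL results are not, here is a minimal sketch of a Gerbicz-style check on a base-3 PRP test of M_p, using plain Python integers. It assumes the residue after i squarings is x_i = 3^(2^i) mod M_p and keeps a running product d of every L-th residue. Because the previous product raised to 2^L, times x_0 = 3, must equal the new product, a corrupted residue breaks the identity. The function name and the `fault_at` parameter are hypothetical. The sketch checks after every block for clarity; a real implementation checks much less often so the extra squarings stay cheap.

```python
from typing import Optional, Tuple


def prp_with_gerbicz(p: int, L: int, fault_at: Optional[int] = None) -> Tuple[bool, Optional[int], int]:
    """Base-3 PRP of M_p = 2^p - 1 with a Gerbicz-style check after every block of L squarings.

    Returns (check_passed, iteration_where_check_failed, final_residue).
    A probable prime gives final residue 9, since 3^(2^p) = 9 * 3^(M_p - 1).
    """
    N = (1 << p) - 1
    x = 3      # x_0 = 3; after i squarings x = 3^(2^i) mod N
    d = x      # running product x_0 * x_L * x_2L * ... mod N
    i = 0
    while i + L <= p:
        prev_d = d
        for _ in range(L):
            x = x * x % N
            i += 1
            if i == fault_at:
                x = (x + 1) % N  # simulated hardware error
        d = d * x % N
        # prev_d^(2^L) = x_L * x_2L * ... * x_i, so multiplying by x_0 = 3 must give d.
        if d != 3 * pow(prev_d, 1 << L, N) % N:
            return False, i, x
    # Fewer than L iterations may remain; this sketch leaves them unchecked.
    while i < p:
        x = x * x % N
        i += 1
    return True, None, x
```

With p = 127 and L = 8 the sketch passes and returns residue 9, since M127 is prime. Injecting an error at iteration 20 is caught at the end of that block, at iteration 24. A real program would then reload the last verified state and redo the work, much like the "EE ... loaded" pairs in the log quoted at the top of this section.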

kruoli 2020-04-28 13:02

[QUOTE=ATH;544065]Did you tune gpuowl parameters for LL tests?[/QUOTE]

No, I have not tuned at all, because I did not see such an option in the "-h" help output. Maybe a bit foolish...


All times are UTC.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.