Should be fixed in the most recent commit, please re-try.
This was again the ROCm optimizer that is generating broken code for our own sin/cos in some particular cases, that we try carefully to avoid. When seeing unexplained failures like here, it is often useful to try with -use ORIG_SLOWTRIG as that usually works (slower though). [QUOTE=kriesel;541479]I don't know why, but -fft 0 through -fft +5 all hit EE in 800 iterations on this exponent 131500093. Gpuowl v6.11-134-g1e0ce1d chose the initial 7M fft length on its own. After finding it reproducible, I successively incremented -fft to seek a reliable run case. It wasn't until it reached 9M fft that it succeeded in the GEC. The resulting speed penalty is considerable, 7.5 msec/iter versus 5.3 on an RX480. From the program's help output,[CODE]FFT 7M [ 11.01M - 132.46M] 1K-512-7 256-2K-7 512-1K-7 2K-256-7 FFT 8M [ 12.58M - 150.85M] 2K-2K 4K-1K FFT 9M [ 14.16M - 169.18M] 1K-512-9 256-2K-9 512-1K-9 2K-256-9[/CODE][CODE]C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win 2020-04-01 07:47:57 gpuowl v6.11-134-g1e0ce1d 2020-04-01 07:47:57 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -use NO_ASM 2020-04-01 07:47:57 device 0, unique id '' 2020-04-01 07:47:57 condorella/rx480 131500093 FFT 7168K: Width 256x4, Height 64x8, Middle 7; 17.92 bits/word 2020-04-01 07:47:59 condorella/rx480 OpenCL args "-DEXP=131500093u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=7u -DWEIGHT_STEP=0x8.7b964bd91a558p-3 -DIWEIGHT_STEP=0xf.16e489ea55fc8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DAMDGPU=1 -DNO_ASM=1 -I. 
-cl-fast-relaxed-math -cl-std=CL2.0" 2020-04-01 07:48:03 condorella/rx480 OpenCL compilation in 3.97 s 2020-04-01 07:48:06 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003 2020-04-01 07:48:13 condorella/rx480 131500093 EE 800 0.00%; 5272 us/it; ETA 8d 00:34; 6781adfa7991c92a (check 2.31s) 2020-04-01 07:48:15 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003 2020-04-01 07:48:22 condorella/rx480 131500093 EE 800 0.00%; 5309 us/it; ETA 8d 01:56; 6781adfa7991c92a (check 2.31s) 1 errors 2020-04-01 07:48:24 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003 2020-04-01 07:48:31 condorella/rx480 131500093 EE 800 0.00%; 5298 us/it; ETA 8d 01:32; 6781adfa7991c92a (check 2.33s) 2 errors 2020-04-01 07:48:31 condorella/rx480 3 sequential errors, will stop. 2020-04-01 07:48:31 condorella/rx480 Exiting because "too many errors" 2020-04-01 07:48:31 condorella/rx480 Bye C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>g611 C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win 2020-04-01 07:48:50 gpuowl v6.11-134-g1e0ce1d 2020-04-01 07:48:50 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -use NO_ASM -fft +1 2020-04-01 07:48:50 device 0, unique id '' 2020-04-01 07:48:50 condorella/rx480 131500093 FFT 7168K: Width 64x4, Height 256x8, Middle 7; 17.92 bits/word 2020-04-01 07:48:53 condorella/rx480 OpenCL args "-DEXP=131500093u -DWIDTH=256u -DSMALL_HEIGHT=2048u -DMIDDLE=7u -DWEIGHT_STE P=0x8.7b964bd91a558p-3 -DIWEIGHT_STEP=0xf.16e489ea55fc8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f05 18db8a8p-4 -DAMDGPU=1 -DNO_ASM=1 -I. 
-cl-fast-relaxed-math -cl-std=CL2.0" 2020-04-01 07:48:57 condorella/rx480 OpenCL compilation in 4.67 s 2020-04-01 07:49:01 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003 2020-04-01 07:49:11 condorella/rx480 131500093 EE 800 0.00%; 7714 us/it; ETA 11d 17:46; 55f854bea6c1cecf (check 3.28s) 2020-04-01 07:49:14 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003 2020-04-01 07:49:24 condorella/rx480 131500093 EE 800 0.00%; 7697 us/it; ETA 11d 17:10; 55f854bea6c1cecf (check 3.29s) 1 errors 2020-04-01 07:49:27 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003 2020-04-01 07:49:37 condorella/rx480 131500093 EE 800 0.00%; 7687 us/it; ETA 11d 16:46; 55f854bea6c1cecf (check 3.27s) 2 errors 2020-04-01 07:49:37 condorella/rx480 3 sequential errors, will stop. 2020-04-01 07:49:37 condorella/rx480 Exiting because "too many errors" 2020-04-01 07:49:37 condorella/rx480 Bye C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>g611 C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win 2020-04-01 07:50:25 gpuowl v6.11-134-g1e0ce1d 2020-04-01 07:50:25 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -use NO_ASM -fft +2 2020-04-01 07:50:25 device 0, unique id '' 2020-04-01 07:50:25 condorella/rx480 131500093 FFT 7168K: Width 64x8, Height 256x4, Middle 7; 17.92 bits/word 2020-04-01 07:50:27 condorella/rx480 OpenCL args "-DEXP=131500093u -DWIDTH=512u -DSMALL_HEIGHT=1024u -DMIDDLE=7u -DWEIGHT_STEP=0x8.7b964bd91a558p-3 -DIWEIGHT_STEP=0xf.16e489ea55fc8p-4 -DWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-3 -DIWEIGHT_BIGSTEP=0xc.5672a115506d8p-4 -DAMDGPU=1 -DNO_ASM=1 -I. 
-cl-fast-relaxed-math -cl-std=CL2.0" 2020-04-01 07:50:31 condorella/rx480 OpenCL compilation in 3.72 s 2020-04-01 07:50:34 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003 2020-04-01 07:50:42 condorella/rx480 131500093 EE 800 0.00%; 6286 us/it; ETA 9d 13:37; 6f8253cbb2fe58e9 (check 2.71s) 2020-04-01 07:50:45 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003 2020-04-01 07:50:53 condorella/rx480 131500093 EE 800 0.00%; 6283 us/it; ETA 9d 13:29; 6f8253cbb2fe58e9 (check 2.71s) 1 errors 2020-04-01 07:50:56 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003 2020-04-01 07:51:03 condorella/rx480 131500093 EE 800 0.00%; 6299 us/it; ETA 9d 14:05; 6f8253cbb2fe58e9 (check 2.71s) 2 errors 2020-04-01 07:51:03 condorella/rx480 3 sequential errors, will stop. 2020-04-01 07:51:03 condorella/rx480 Exiting because "too many errors" 2020-04-01 07:51:03 condorella/rx480 Bye C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>g611 C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win 2020-04-01 07:51:29 gpuowl v6.11-134-g1e0ce1d 2020-04-01 07:51:29 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -use NO_ASM -fft +3 2020-04-01 07:51:29 device 0, unique id '' 2020-04-01 07:51:29 condorella/rx480 131500093 FFT 7168K: Width 256x8, Height 64x4, Middle 7; 17.92 bits/word 2020-04-01 07:51:29 condorella/rx480 using long carry kernels 2020-04-01 07:51:32 condorella/rx480 OpenCL args "-DEXP=131500093u -DWIDTH=2048u -DSMALL_HEIGHT=256u -DMIDDLE=7u -DWEIGHT_STE P=0x8.7b964bd91a558p-3 -DIWEIGHT_STEP=0xf.16e489ea55fc8p-4 -DWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-3 -DIWEIGHT_BIGSTEP=0xc.5672a1 15506d8p-4 -DAMDGPU=1 -DNO_ASM=1 -I. 
-cl-fast-relaxed-math -cl-std=CL2.0" 2020-04-01 07:51:36 condorella/rx480 OpenCL compilation in 3.97 s 2020-04-01 07:51:39 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003 2020-04-01 07:51:46 condorella/rx480 131500093 EE 800 0.00%; 5275 us/it; ETA 8d 00:42; cfbd904e74b67aae (check 2.31s) 2020-04-01 07:51:48 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003 2020-04-01 07:51:54 condorella/rx480 131500093 EE 800 0.00%; 5249 us/it; ETA 7d 23:44; cfbd904e74b67aae (check 2.29s)1 errors 2020-04-01 07:51:57 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003 2020-04-01 07:52:03 condorella/rx480 131500093 EE 800 0.00%; 5239 us/it; ETA 7d 23:23; cfbd904e74b67aae (check 2.29s)2 errors 2020-04-01 07:52:03 condorella/rx480 3 sequential errors, will stop. 2020-04-01 07:52:03 condorella/rx480 Exiting because "too many errors" 2020-04-01 07:52:03 condorella/rx480 Bye C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>g611 C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win 2020-04-01 07:52:07 gpuowl v6.11-134-g1e0ce1d 2020-04-01 07:52:07 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -use NO_ASM -fft +4 2020-04-01 07:52:07 device 0, unique id '' 2020-04-01 07:52:07 condorella/rx480 131500093 FFT 8192K: Width 256x8, Height 256x8; 15.68 bits/word 2020-04-01 07:52:07 condorella/rx480 using long carry kernels 2020-04-01 07:52:10 condorella/rx480 OpenCL args "-DEXP=131500093u -DWIDTH=2048u -DSMALL_HEIGHT=2048u -DMIDDLE=1u -DWEIGHT_ST EP=0xa.039f00d8f95f8p-3 -DIWEIGHT_STEP=0xc.c82be96a7181p-4 -DWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-3 -DIWEIGHT_BIGSTEP=0xc.5672a1 15506d8p-4 -DAMDGPU=1 -DNO_ASM=1 -I. 
-cl-fast-relaxed-math -cl-std=CL2.0" 2020-04-01 07:52:15 condorella/rx480 OpenCL compilation in 5.16 s 2020-04-01 07:52:18 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003 2020-04-01 07:52:27 condorella/rx480 131500093 EE 800 0.00%; 6583 us/it; ETA 10d 00:28; 05252a7f59574e37 (check 2.85s) 2020-04-01 07:52:30 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003 2020-04-01 07:52:38 condorella/rx480 131500093 EE 800 0.00%; 6587 us/it; ETA 10d 00:36; 05252a7f59574e37 (check 2.85s) 1 errors 2020-04-01 07:52:41 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003 2020-04-01 07:52:49 condorella/rx480 131500093 EE 800 0.00%; 6594 us/it; ETA 10d 00:53; 05252a7f59574e37 (check 2.86s) 2 errors 2020-04-01 07:52:49 condorella/rx480 3 sequential errors, will stop. 2020-04-01 07:52:49 condorella/rx480 Exiting because "too many errors" 2020-04-01 07:52:49 condorella/rx480 Bye C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>g611 C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win 2020-04-01 07:53:21 gpuowl v6.11-134-g1e0ce1d 2020-04-01 07:53:21 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -use NO_ASM -fft +5 2020-04-01 07:53:21 device 0, unique id '' 2020-04-01 07:53:21 condorella/rx480 131500093 FFT 8192K: Width 512x8, Height 256x4; 15.68 bits/word 2020-04-01 07:53:21 condorella/rx480 using long carry kernels 2020-04-01 07:53:23 condorella/rx480 OpenCL args "-DEXP=131500093u -DWIDTH=4096u -DSMALL_HEIGHT=1024u -DMIDDLE=1u -DWEIGHT_ST EP=0xa.039f00d8f95f8p-3 -DIWEIGHT_STEP=0xc.c82be96a7181p-4 -DWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-3 -DIWEIGHT_BIGSTEP=0xc.5672a1 15506d8p-4 -DAMDGPU=1 -DNO_ASM=1 -I. 
-cl-fast-relaxed-math -cl-std=CL2.0" 2020-04-01 07:53:26 condorella/rx480 OpenCL compilation in 3.53 s 2020-04-01 07:53:30 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003 2020-04-01 07:53:39 condorella/rx480 131500093 EE 800 0.00%; 7196 us/it; ETA 10d 22:51; 6df742314b82f841 (check 3.11s) 2020-04-01 07:53:42 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003 2020-04-01 07:53:51 condorella/rx480 131500093 EE 800 0.00%; 7219 us/it; ETA 10d 23:43; 6df742314b82f841 (check 3.11s) 1 errors 2020-04-01 07:53:54 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003 2020-04-01 07:54:03 condorella/rx480 131500093 EE 800 0.00%; 7190 us/it; ETA 10d 22:38; 6df742314b82f841 (check 3.10s) 2 errors 2020-04-01 07:54:03 condorella/rx480 3 sequential errors, will stop. 2020-04-01 07:54:03 condorella/rx480 Exiting because "too many errors" 2020-04-01 07:54:03 condorella/rx480 Bye C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>g611 C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win 2020-04-01 07:54:08 gpuowl v6.11-134-g1e0ce1d 2020-04-01 07:54:08 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -use NO_ASM -fft +6 2020-04-01 07:54:08 device 0, unique id '' 2020-04-01 07:54:08 condorella/rx480 131500093 FFT 9216K: Width 256x4, Height 64x8, Middle 9; 13.93 bits/word 2020-04-01 07:54:08 condorella/rx480 using long carry kernels 2020-04-01 07:54:12 condorella/rx480 OpenCL args "-DEXP=131500093u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=9u -DWEIGHT_STEP=0x8.5f7e7ead6051p-3 -DIWEIGHT_STEP=0xf.498539ec95fe8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DAMDGPU=1 -DNO_ASM=1 -I. 
-cl-fast-relaxed-math -cl-std=CL2.0" 2020-04-01 07:54:16 condorella/rx480 OpenCL compilation in 4.11 s 2020-04-01 07:54:20 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003 2020-04-01 07:54:29 condorella/rx480 131500093 OK 800 0.00%; 7461 us/it; ETA 11d 08:32; bbe24bd13cd73020 (check 3.26s) 2020-04-01 08:19:33 condorella/rx480 131500093 OK 200000 0.15%; 7541 us/it; ETA 11d 11:03; 190bb27ff665f83b (check 3.25s) [/CODE][/QUOTE] |
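The "bits/word" figures in the log quoted above appear to be simply the exponent divided by the FFT length in words, taking 1K = 1024 words. Here is a minimal Python sketch that reproduces them under that assumption; the function name is made up for illustration and is not part of gpuowl:

```python
def bits_per_word(exponent: int, fft_k: int) -> float:
    """Average number of bits stored per FFT word.

    fft_k is the FFT length in units of 1024 words (the "7168K" in the log).
    Assumption: the log's "bits/word" is exactly this ratio.
    """
    return exponent / (fft_k * 1024)


if __name__ == "__main__":
    exponent = 131500093
    for fft_k in (7168, 8192, 9216):  # the 7M, 8M and 9M FFTs tried in the log
        print(f"FFT {fft_k}K: {bits_per_word(exponent, fft_k):.2f} bits/word")
```

This gives 17.92, 15.68 and 13.93 bits/word, matching the three FFT sizes in the log; only the 9M run (13.93 bits/word) passed the check there.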
[QUOTE=ATH;541561]RTX 2080 is so bad at double precision and the timings are very inconsistent.
But NEW_SLOWTRIG is better at 3520 µs/it vs 3680 µs/it for ORIG_SLOWTRIG. T2_SHUFFLE is slightly better at 3520 µs vs 3553 µs for NO_T2_SHUFFLE. Otherwise CARRY64 and CARRY32 are about the same. I'm not going to test all 6 of those variables on this, since it is very slow and the inconsistencies in the timings are larger than the differences. Btw UNROLL_NONE, UNROLL_WIDTH and UNROLL_HEIGHT do not work at all on either the Tesla P100 or the RTX 2080.[/QUOTE]
Thank you, this seems to suggest: keep the defaults unchanged for Nvidia (as they are better on at least some Nvidia GPUs). The Nvidia user can tune by trying ORIG_SLOWTRIG and NO_T2_SHUFFLE if so inclined. |
ROCm 3.3
ROCm 3.1 has a severe bug that is affecting our sin/cos routines, and we had to use a workaround for that bug.
Recently ROCm 3.3 was released, and it seems this bug is fixed, so the ROCm 3.1 workarounds are not needed anymore (they do have a slight perf impact).

Right now the ROCm 3.3 performance is very close to ROCm 3.1, a tiny bit slower (less than 0.3% slower). OTOH ROCm 3.3 might have a few advantages -- for one it uses fewer VGPRs (and it doesn't have the terrible ROCm 3.1 bug).

In a recent commit I disabled the ROCm 3.1 bug-workaround by default -- it must now be explicitly enabled with -use ROCM31. If this is not done on ROCm 3.1, there are errors in PRP. A slower alternative is using ORIG_SLOWTRIG, which does not trigger the bug.

So in brief:
- ROCm 3.3 is OK and can be used
- if using ROCm 3.1, you *must* specify -use ROCM31 or -use ORIG_SLOWTRIG
(for users who are now on ROCm 2.10 or earlier, I recommend moving directly to 3.3, skipping 3.1)

There is also the possibility of having multiple ROCm versions installed at the same time (this is useful when one wants to experiment and compare versions); here is one way to do it:
- install multiple ROCm versions in separate folders, e.g.: /opt/rocm-3.1.0/ and /opt/rocm-3.3.0/
- verify that the ROCm folder containing libamdocl64.so is listed in the LIBPATH in Makefile or SConstruct
- edit the Makefile to link with -lamdocl64 instead of -lOpenCL (or build with scons)
- when running gpuowl, set LD_LIBRARY_PATH to the folder with libamdocl64 for the desired ROCm version, e.g. LD_LIBRARY_PATH=/opt/rocm-3.3.0/opencl/lib/x86_64 ./gpuowl |
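For anyone who wants to script the side-by-side comparison described above, here is a small Python sketch that builds the environment for a given ROCm version. It assumes the folder layout from the post (/opt/rocm-<version>/opencl/lib/x86_64); the helper name rocm_env is hypothetical, and the gpuowl binary is assumed to be linked against libamdocl64 as described.

```python
import os


def rocm_env(version, base_env=None, root="/opt"):
    """Return a copy of the environment with LD_LIBRARY_PATH pointing at one ROCm version.

    Assumption: ROCm lives in <root>/rocm-<version>/ with the OpenCL library
    under opencl/lib/x86_64, as in the example paths in the post.
    """
    env = dict(os.environ if base_env is None else base_env)
    lib_dir = f"{root}/rocm-{version}/opencl/lib/x86_64"
    old = env.get("LD_LIBRARY_PATH")
    # Put the chosen ROCm folder first so it wins over any other entry.
    env["LD_LIBRARY_PATH"] = lib_dir if not old else f"{lib_dir}:{old}"
    return env


if __name__ == "__main__":
    for version in ("3.1.0", "3.3.0"):
        print(version, rocm_env(version, base_env={})["LD_LIBRARY_PATH"])
    # To actually launch gpuowl with the chosen version one could use:
    #   subprocess.run(["./gpuowl"], env=rocm_env("3.3.0"))
```

The point of returning a fresh dictionary is that the two versions can be run back to back from the same script without changing the shell's own environment.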
[QUOTE=preda;541738]ROCm 3.1 has a severe bug that is affecting our sin/cos routines, and we had to use a workaround for that bug.
Recently ROCm 3.3 was released, and it seems this bug is fixed, so the ROCm 3.1 workarounds are not needed anymore (they do have a slight perf impact). [/QUOTE] I spoke too soon: it seems the bug still affects ROCm 3.3 (but not for exactly the same exponents). As such, I re-enabled the workaround by default as it was before. |
I'm glad I'm a late adopter with these things - still on rocm 2.10 :). Mihai, bugs aside, roughly what % speedup can one expect from moving from 2.10 to 3.3?
|
[QUOTE=ewmayer;541792]I'm glad I'm a late adopter with these things - still on rocm 2.10 :). Mihai, bugs aside, roughly what % speedup can one expect from moving from 2.10 to 3.3?[/QUOTE]
I would guess something in the range 2% - 4%. |
FYI, I'm working on re-enabling LL in gpuowl (intended for DC). (I do not plan to add offset or Jacobi check though)
|
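For readers who have not seen it, the LL (Lucas-Lehmer) test mentioned above is the recurrence s_0 = 4, s_{k+1} = s_k^2 - 2 (mod 2^p - 1); for an odd prime p, 2^p - 1 is prime exactly when s_{p-2} = 0. Below is a plain-integer Python sketch, only practical for small exponents. It is not how gpuowl computes it (gpuowl does the squarings with FFTs on the GPU), and the function name is just for illustration.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if 2**p - 1 is prime, for an odd prime exponent p."""
    m = (1 << p) - 1          # the Mersenne number 2**p - 1
    s = 4                     # s_0
    for _ in range(p - 2):    # p - 2 iterations of s -> s*s - 2 (mod m)
        s = (s * s - 2) % m
    return s == 0


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13, 17, 19, 23, 31):
        print(p, lucas_lehmer(p))
```

A double-check (DC) repeats this computation independently and compares the final residue with the first test's.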
[QUOTE=preda;542013]FYI, I'm working on re-enabling LL in gpuowl (intended for DC). (I do not plan to add offset or Jacobi check though)[/QUOTE]
Thank you!!! :party: |
[QUOTE=preda;542013]FYI, I'm working on re-enabling LL in gpuowl (intended for DC). (I do not plan to add offset or Jacobi check though)[/QUOTE]
I just committed a first iteration of LL. Here's a brief summary of changes:

1. How to run LL
a) add a line like
DoubleCheck=1AAFFAAD0000000FFFF,51456287,74,1
to worktodo.txt, where the AID (the initial hex) can be N/A or missing. The values after the exponent ("74,1") are ignored for now.
b) or, pass "-ll <exponent>" on the command line (similar to -pm1 and -prp); if this is a test run and the result should not be published, this can be used in conjunction with "-results /dev/null" or a similar file.

2. Savefiles
The savefiles follow the usual pattern, with the extension ".ll.owl"; you can look into them with "head -1 <savefile>", which prints the header. The savefile contains a checksum that covers all the values stored, so if the savefile is corrupted, or edited manually without updating the checksum, this should be detected (and the file rejected).

3. The command line flags -block and -log are used by LL too:
-block indicates how many iterations to queue to the GPU; the default for LL is 1000. If the GPU becomes sluggish (slow to react) I would look into reducing -block to e.g. 100 (although this problem does not appear on ROCm). Also, large values for -block together with slower iterations would produce a slower reaction to manual interrupt (Ctrl-C).
-log indicates how often to log and to save (the two, log and save, are linked as they are for PRP).

Warning: when running a very small exponent, 1398269, one of my Radeon VIIs started to act flaky. This turned out to be fixed by increasing the voltage on that GPU; but it seems to indicate that, for R7, such small FFTs may sometimes expose GPU instability more than the typical PRP. In my case the fix was to increase the voltage, *not* to decrease the memory frequency.

There is no Jacobi check for now. (I'll consider how hard it is to add.)

There is a change in how the FFT size is specified. Look at -h for the new format of FFT specifiers, and pass either the full FFT size (e.g. "5.5M") or one of the FFT specs (e.g. "1K:5:512") from the list displayed by -h. |
1 Attachment(s)
Windows binaries (untested!)
|
[QUOTE=preda;542085]I just commited a first iteration of LL. Here's a brief summary of changes[/QUOTE]Excellent! Will have a look shortly.
You separately added offset and Jacobi check back at v0.6. How much of that is reusable? [url]https://www.mersenneforum.org/showpost.php?p=489083&postcount=7[/url] |
[QUOTE=kriesel;542095]Excellent! Will have a look shortly.
You separately added offset and Jacobi check back at v0.6. How much of that is reusable? [url]https://www.mersenneforum.org/showpost.php?p=489083&postcount=7[/url][/QUOTE] The Jacobi implem should be the same. For the "offset", it's too much trouble to fit it in without adversely affecting PRP which is still the main focus; also "offset" brings too little benefit to be worth bothering with IMO. I consider a matching gpuowl DC with offset==0 for a mprime LL with offset != 0 a very strong verification. What additional benefit is there for the trouble of adding offset to gpuowl? I don't see the point -- maybe you could explain the motivation for adding offset in this context. |
[QUOTE=preda;542102]The Jacobi implem should be the same. For the "offset", it's too much trouble to fit it in without adversely affecting PRP which is still the main focus; also "offset" brings too little benefit to be worth bothering with IMO.
I consider a matching gpuowl DC with offset==0 for a mprime LL with offset != 0 a very strong verification. What additional benefit is there for the trouble of adding offset to gpuowl? I don't see the point -- maybe you could explain the motivation for adding offset in this context.[/QUOTE]There are two issues: gpuowl zero offset twice on the same exponent, and gpuowl zero offset matching some other software's (zero) offset.

As gpus become a greater fraction of the total primality testing throughput, the work of manually ensuring that gpuowl is not double-checking gpuowl results with the same (zero) offset becomes more onerous. As you know, a Radeon VII with your code and George's modifications is a very fast way of running primality tests, so these gpus and their NVIDIA or cloud-computing near equivalents will have an outsized effect on the increasing number of gpu-produced primality tests, greater than their unit count would indicate.

Gpu assignments are manual assignments. The PrimeNet server does not know what software will be used for a manual assignment. There is no way of manually communicating that. There is no way of specifying which software will be used, or of specifying for double-check assignments that first-tests from any specific software or initial-run-offset are desired. I find that I generally forget to consider checking before putting the work on my gpus. How many others do also?

Maybe mprime/prime95, Mlucas, CUDALucas, etc. have or could have zero-offset avoidance in their runs? But that does not address the chance of a previous software version (Mlucas before V18 for example) having produced a zero offset result, or a gpuowl-gpuowl zero-offset coincidence on the same exponent. The chance of such software producing a zero offset from a pseudorandom offset generator is quite low.

I agree that different software running differing offsets are good verifications. In fact, that's superior to the same software with different offsets, since remaining software bugs are less likely to align among very different software. However, if the restriction quoted below is working properly, including for any gpuowl results from its LL infancy, it may not be much of a problem while gpuowl work assignments flow through manual reservations and certain other conditions are met.
[URL]https://www.mersenneforum.org/showpost.php?p=296358&postcount=36[/URL]
[QUOTE]I just changed the manual reservations for double-checks. The page should no longer hand out exponents previously tested by GLucas, Mlucas, or CUDALucas. That is, only prime95 with its shift count capability is allowed to do the double (or triple) checking. This feature needs some testing. [/QUOTE]
Zero-offset tandem runs could still be performed manually by users, as Laurv or others have done for large exponents.

Creation of a PrimeNet API connection for gpu applications would change the offset calculus. If gpus become a large majority of the primality testing throughput, it will become difficult to avoid gpuowl-first-test double checks as manual assignments. Maybe it also starts to affect strategic double and triple checking [URL]https://www.mersenneforum.org/showthread.php?goto=newpost&t=24148[/URL]

I don't know what the overall project's manual (gpus) / primenet-API (cpus) throughput ratio is. But on my own cpus page data, it's about 7.6 to 1. I expect that ratio to increase over time. The gpu throughput is a mix of TF, P-1, and primality testing, with several gpus dedicated to double-checking.

A desire to not slow PRP by offset provisions motivated by LL, and a desire not to duplicate code and increase complexity further by separating offset behavior between PRP and LL, are both understandable. As is conserving your available time for other things, such as P-1 error detection and handling. |
CUDALucas v2.05 was the beginning of nonzero shift there; late 2013 or so. There's no telling how long changeover from earlier versions took.
[URL]https://www.mersenneforum.org/showpost.php?p=359150&postcount=1962[/URL] We're still double-checking LL tests from 2010 and 2011. [URL]https://www.mersenne.org/report_exponent/?exp_lo=50281067&full=1[/URL] [URL]https://www.mersenne.org/report_exponent/?exp_lo=50485051&full=1[/URL] [URL]https://www.mersenne.org/report_exponent/?exp_lo=50584823&exp_hi=&full=1[/URL] Mlucas V18 and its introduction of nonzero shift would have been sometime in 2018. |
I didn't find anything indicating nonzero offset was ever implemented in clLucas.
|
[QUOTE=kriesel;542153]There are two issues; gpuowl zero offset twice on the same exponent, and gpuowl zero offset matching some other software's offset.
[/QUOTE] It seems to me that the zero-offset-DC problem can be addressed through external means. For example, manual DC assignments could be handed out only for exponents that had initial-LL with non-zero offset; and the need to DC the cases with zero-offset-initial-LL can be adequately covered by mprime through non-manual assignments. |
If people do 2 zero-offset LL tests with gpuowl, we just have to triple check them with Prime95/mprime/CUDALucas. It should not occur that often.
Maybe you should limit the exponent to 90M for LL test, since it is only for double checks. The few LL tests above 90M do not need to be double checked for a long time. |
[QUOTE=ATH;542193]Maybe you should limit the exponent to 90M for LL test, since it is only for double checks. The few LL tests above 90M do not need to be double checked for a long time.[/QUOTE]As a subproject I am running spot double checks ahead of the first-test wavefront, and it would be useful to be able to use the very efficient gpuowl on a fast new reliable Radeon VII in that.

In general, while the extent of changeover from LL to PRP first test is heartening, there is much further to go in that regard. There are LL first test results reported this morning up to 108M on [URL]https://www.mersenne.org/report_recent_cleared/[/URL]
The two highest exponent primality tests were 107985967 and 107981609, both LL by brode-runner. Ten of the highest 25 exponents' primality tests were LL.

Note that some older gpu hardware is not capable of running gpuowl, so if used for primality testing, it will run CUDALucas, forcing LL not PRP. Old hardware output should be checked sooner since it is less likely to be reliable and CUDALucas has no Jacobi check.

A subjective summary of the recent cleared results I saw follows. Anonymous and many other users submitted mixed LL & PRP. In some cases, including WR and kriesel, the LL were DC.

all LL:
AUM - Kuwait
brode-runner
curtisc
Ryan Propper (all DC)
TAMUC-ComputerScience

all PRP:
Ben Delo
dcheuk
George Woltman
Gordon Spence
marssystems
Mihai Preda (shocking! ;)
mrh.org
Oliver Kruse
oodaira
S00030
Simon Josefsson
Sebastien Broucke
trebor

Other things being equal or nearly so, I'm in favor of orthogonality, and against artificial limitations built into the software. If GIMPS as a project wants to limit future LL activity to below some exponent value, the place to do that is at the PrimeNet server.

When the next Mersenne prime is found, we'll want to use gpuowl to confirm it.
There are many >100Mdigit exponents LL tested and without a double-check.
(end) |
[QUOTE=preda;542189]It seems to me that the zero-offset-DC problem can be addressed through external means. For example, manual DC assignments could be handed out only for exponents that had initial-LL with non-zero offset;.[/QUOTE]
The server currently does this. |
Mihai/George, could you explain why residue shift apparently incurs such a heavy performance penalty in gpuOwl?
For LL with shift, I can maybe understand it - during the carry step of each iteration, one needs to precompute the bit offset of the -2 for the current shift value and then inject it into the corresponding residue word - not a lot of cycles needed, but perhaps in a massively-parallel GPU context, slowing whichever one of those smaller work units gets the shifted -2 causes the others to stall - just speculating here. But in a PRP context, there is no per-iteration -2 subtrahend, we simply apply some initial shift value to the starting residue, then repeated-square-mod happily away, with the only shift-related expense being the per-iteration update of the shift value, shift = 2*shift (mod p), where the * and mod can both be replaced with low-latency operations, shift (or add), compute shift2 = shift - p, followed by cmov to select the proper one of shift and shift2. |
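The shift bookkeeping described above can be sketched in a few lines of plain-integer Python. This is only an illustration of the arithmetic (the residue is stored multiplied by 2^shift mod 2^p - 1, and the shift doubles mod p at every squaring); it says nothing about how this would be laid out in gpuowl's kernels, and all function names are made up.

```python
def next_shift(shift: int, p: int) -> int:
    """shift = 2*shift (mod p) using only an add, a subtract and a select."""
    doubled = shift + shift
    reduced = doubled - p
    return reduced if reduced >= 0 else doubled  # plays the role of the cmov


def unshift(x: int, shift: int, p: int) -> int:
    """Undo the shift: multiply by 2**(p - shift), using 2**p == 1 (mod 2**p - 1)."""
    m = (1 << p) - 1
    return (x << (p - shift)) % m


def prp_squarings_shifted(p: int, shift0: int, iters: int, base: int = 3) -> int:
    """Repeated squaring of a shifted start value; no per-iteration correction needed."""
    m = (1 << p) - 1
    shift = shift0 % p
    x = (base << shift) % m
    for _ in range(iters):
        x = x * x % m
        shift = next_shift(shift, p)
    return unshift(x, shift, p)


def ll_shifted(p: int, shift0: int) -> int:
    """LL with shift: the -2 has to be injected at the current bit offset."""
    m = (1 << p) - 1
    shift = shift0 % p
    x = (4 << shift) % m
    for _ in range(p - 2):
        shift = next_shift(shift, p)
        x = (x * x - (2 << shift)) % m  # subtract 2 * 2**shift instead of plain 2
    return unshift(x, shift, p)


if __name__ == "__main__":
    p = 11
    print([ll_shifted(p, s) for s in range(p)])               # same value for every shift
    print([prp_squarings_shifted(p, s, 9) for s in range(p)])  # likewise
```

Every starting shift yields the same unshifted residue, and the only LL-specific extra is the shifted -2 in the inner loop.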
[QUOTE=ewmayer;542231]Mihai/George, could you explain why residue shift apparently incurs such a heavy performance penalty in gpuOwl?[/QUOTE]
I don't believe there would be a performance penalty. |
[QUOTE=ewmayer;542231]Mihai/George, could you explain why residue shift apparently incurs such a heavy performance penalty in gpuOwl?
For LL with shift, I can maybe understand it - during the carry step of each iteration, one needs to precompute the bit offset of the -2 for the current shift value and then inject it into the corresponding residue word - not a lot of cycles needed, but perhaps in a massively-parallel GPU context, slowing whichever one of those smaller work units gets the shifted -2 causes the others to stall - just speculating here. But in a PRP context, there is no per-iteration -2 subtrahend, we simply apply some initial shift value to the starting residue, then repeated-square-mod happily away, with the only shift-related expense being the per-iteration update of the shift value, shift = 2*shift (mod p), where the * and mod can both be replaced with low-latency operations, shift (or add), compute shift2 = shift - p, followed by cmov to select the proper one of shift and shift2.[/QUOTE] I would also need to look into how the error check needs to be updated for non-zero offset. I don't think it would be a big cost, but probably not a zero-cost either. IMO not the highest priority thing to do ATM. |
[QUOTE=Prime95;542241]I don't believe there would be a performance penalty.[/QUOTE]
Mihai in #2049: [quote]For the "offset", it's too much trouble to fit it in without adversely affecting PRP which is still the main focus; also "offset" brings too little benefit to be worth bothering with IMO.[/quote] I read "adversely affecting PRP" as implying a performance hit. I have a special interest here - once I finish my current round of v20-related updates to Mlucas I intend to get up to speed on the programming model used for gpuowl, with a long-term goal of enhancing it to support the negacyclic DWT (on top of the Mersenne-style IBDWT) and right-angle-transform data layout needed to support Fermat-mod arithmetic. For my Fermat number testing to date I've used pairs of side-by-side runs, both with 0 shift but at different FFT lengths. The problem is, as we approach F33, the window of possible sizes for the smaller, slightly-less-than-power-of-2 FFT length of said run pairings rapidly shrinks. For F31 a smaller-FFT length of 120M = 15*8M is gonna really be pushing the accuracy limits of a doubles-based FFT. For F33 we'd need at a minimum 496M = 31*16M, but that prime 31 means a 31-DFT, and even the best-of-breed such algorithm is horribly inefficient. So I originally had in mind some highly-composite length < 512M, specifically 504M = 63*8M, but even though 63 = 3^2.7 is decently smooth, the result will likely be slower than the accompanying 512M run. But in the meantime I've worked out all the needed details to do residue-shifted Fermat-mod arithmetic - it's quite a bit more involved than Mersenne-mod, for reasons I'll detail soon in an upcoming post to the "Pepin tests of Fermat numbers beyond F24" thread - but now that I've worked out the mathematical details and have working proof-of-principle code, it's clear that performance-wise it should be no worse than Mersenne-mod with shift. 
So F33 - starting with a deep p-1 stage 1 (where it's crucial to obtain a correct residue, since absent a resulting factor one wants to distribute said residue to many stage-2 subinterval-runners) can use paired runs at 512M FFT, each with a different shift. |
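To put numbers on the FFT lengths discussed above, here is a small Python sketch that prints the odd factors of each length (which determine the odd-radix DFTs needed) and the average bits per word for F31 and F33. It assumes "M" means 2^20 words and that F_n occupies about 2^n bits; the helper names are hypothetical.

```python
def odd_part_factors(n: int) -> list:
    """Prime factors (with multiplicity) of n after stripping all factors of 2."""
    while n % 2 == 0:
        n //= 2
    factors = []
    d = 3
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 2
    if n > 1:
        factors.append(n)
    return factors


def fermat_bits_per_word(n: int, fft_m: int) -> float:
    """Average bits per word for F_n (about 2**n bits) at an FFT length of fft_m * 2**20 words."""
    return 2 ** n / (fft_m * 2 ** 20)


if __name__ == "__main__":
    for n, lengths in ((31, (120, 128)), (33, (496, 504, 512))):
        for fft_m in lengths:
            print(f"F{n} @ {fft_m}M: odd factors {odd_part_factors(fft_m)}, "
                  f"{fermat_bits_per_word(n, fft_m):.2f} bits/word")
```

The odd parts come out as 3*5 for 120M, the prime 31 for 496M, and 3*3*7 for 504M, while 512M is a pure power of two at exactly 16 bits/word.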
[QUOTE=ewmayer;542245]Mihai in #2049:
I read "adversely affecting PRP" as implying a performance hit. I have a special interest here - once I finish my current round of v20-related updates to Mlucas I intend to get up to speed on the programming model used for gpuowl, with a long-term goal of enhancing it to support the negacyclic DWT (on top of the Mersenne-style IBDWT) and right-angle-transform data layout needed to support Fermat-mod arithmetic. For my Fermat number testing to date I've used pairs of side-by-side runs, both with 0 shift but at different FFT lengths. The problem is, as we approach F33, the window of possible sizes for the smaller, slightly-less-than-power-of-2 FFT length of said run pairings rapidly shrinks. For F31 a smaller-FFT length of 120M = 15*8M is gonna really be pushing the accuracy limits of a doubles-based FFT. For F33 we'd need at a minimum 496M = 31*16M, but that prime 31 means a 31-DFT, and even the best-of-breed such algorithm is horribly inefficient. So I originally had in mind some highly-composite length < 512M, specifically 504M = 63*8M, but even though 63 = 3^2.7 is decently smooth, the result will likely be slower than the accompanying 512M run. But in the meantime I've worked out all the needed details to do residue-shifted Fermat-mod arithmetic - it's quite a bit more involved than Mersenne-mod, for reasons I'll detail soon in an upcoming post to the "Pepin tests of Fermat numbers beyond F24" thread - but now that I've worked out the mathematical details and have working proof-of-principle code, it's clear that performance-wise it should be no worse than Mersenne-mod with shift. So F33 - starting with a deep p-1 stage 1 (where it's crucial to obtain a correct residue, since absent a resulting factor one wants to distribute said residue to many stage-2 subinterval-runners) can use paired runs at 512M FFT, each with a different shift.[/QUOTE] Nice work! and your contributions to gpuowl are more than welcome. 
Did you consider a GEC-style error check for Pepin instead of paired runs? |
Tried to submit an LL result, got "Did not understand 1 lines."
{"exponent":"54907981", "worktype":"LL", "status":"C", "program":{"name":"gpuowl", "version":"v6.11-252-gaf403e2"}, "timestamp":"2020-04-10 14:05:02 UTC", "user":"kracker", "computer":"core", "aid":"xxxxxxxxxx", "fft-length":3145728, "res64":"xxxxxxxxxxxxx", "offset":0} |
gpuowl-v6.11-255-g81fa7c3 for Win 7 x64 or up
2 Attachment(s)
Latest commit build, build log, help output, etc.
|
[QUOTE=preda;542252]Nice work! and your contributions to gpuowl are more than welcome. Did you consider a GEC-style error check for Pepin instead of paired runs?[/QUOTE]
The primality-test runs - whatever hardware they end up being done on - will of course use the GEC, but for this type of rare-but-historic computation, us all believing the GEC is foolproof will not suffice in terms of a research-quality announcement and attendant peer-reviewed paper - there must be at least 2 runs, which, if not done by independently developed programs, must at least provide reasonable assurance of independent-FFT-data. Having done F24, believe me, merely omitting a third pure-integer-code "drone" run (using interim residues provided by 2 cross-checked fast floating-FFT runs) is already a stretch, as far as more conservative parts of the computational number theory community are concerned. One could object "but they accept GIMPS new-prime announcements, based on matching independent floating-FFT runs" -- true, and that establishes the minimum baseline for e.g. an F33 testing effort. Further, my ongoing Fermat number tests - currently finishing up run #2 of F30 @64M FFT, first run @60M finished late last year - all deposit interim every-10Miter checkpoint files, so knowing the format of same, anyone could do a parallel (in the sense of multiple runs, each covering a separate 10Miter subinterval) triple-check using whatever code they like. For F33 the resulting fileset, at ~1 GB per checkpoint and 858 such, will occupy slightly less than 1TB, so any such file sharing might have to be done using physical disk drives, depending on the state of storage technology at that timepoint. |
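The checkpoint figures quoted above (858 files of roughly 1 GB each, a bit under 1 TB in total) can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: it assumes each checkpoint holds exactly one full residue mod F33 with no metadata, and that "10Miter" means 10^7 iterations. Neither detail is spelled out in the post.

```python
# Back-of-the-envelope check of the F33 checkpoint figures.
# Assumptions (not from the post): one residue per checkpoint, no metadata,
# and a checkpoint interval of exactly 10**7 iterations.

n = 33
iterations = 2**n - 1          # Pepin test: 3^((F_n - 1)/2) takes 2^n - 1 squarings
checkpoint_interval = 10**7

num_checkpoints = iterations // checkpoint_interval
residue_bytes = 2**n // 8      # a residue mod F33 needs about 2^33 bits
total_bytes = num_checkpoints * residue_bytes

print("checkpoints:", num_checkpoints)
print("GiB per checkpoint:", residue_bytes / 2**30)
print("total TB:", round(total_bytes / 1e12, 3))
```

Under these assumptions the total comes out a little above 0.9 TB, which agrees with the "slightly less than 1TB" estimate.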
[QUOTE=kracker;542292]Tried to submit an LL result, got "Did not understand 1 lines."
{"exponent":"54907981", "worktype":"LL", "status":"C", "program":{"name":"gpuowl", "version":"v6.11-252-gaf403e2"}, "timestamp":"2020-04-10 14:05:02 UTC", "user":"kracker", "computer":"core", "aid":"xxxxxxxxxx", "fft-length":3145728, "res64":"xxxxxxxxxxxxx", "offset":0}[/QUOTE] Yep, I just got the same message, so I guess George or Aaron needs to update the manual result script for this "new" gpuowl LL test. I guess it is different than back in gpuowl 0.6? |
[QUOTE=kracker;542292]Tried to submit an LL result, got "Did not understand 1 lines."[/QUOTE][QUOTE=ATH;542304]Yep, I just got the same message, so I guess George or Aaron needs to update the manual result script for this "new" gpuowl LL test. I guess it is different than back in gpuowl 0.6?[/QUOTE]Looks like Mihai changed the result format and forgot to tell me about it. :smile: (I do the manual results form, most everything else is George or Aaron).
I'm just waiting to hear back from Mihai regarding the change in format; I'll post back when the manual form accepts these results. |
[QUOTE=James Heinrich;542315]Looks like Mihai changed the result format and forgot to tell me about it. :smile: (I do the manual results form, most everything else is George or Aaron).
I'm just waiting to hear back from Mihai regarding the change in format, I'll post back when the manual form will accept these results.[/QUOTE] Sorry for that. I need to remind myself what is the right format for LL JSON. |
[QUOTE=ATH;542304]I guess it is different than back in gpuowl 0.6?[/QUOTE]Very different; V0.6 was before the switch to JSON.
[URL]https://www.mersenneforum.org/showpost.php?p=531029&postcount=28[/URL] |
[QUOTE=kracker;542292]Tried to submit an LL result, got "Did not understand 1 lines."
{"exponent":"54907981", "worktype":"LL", "status":"C", "program":{"name":"gpuowl", "version":"v6.11-252-gaf403e2"}, "timestamp":"2020-04-10 14:05:02 UTC", "user":"kracker", "computer":"core", "aid":"xxxxxxxxxx", "fft-length":3145728, "res64":"xxxxxxxxxxxxx", "offset":0}[/QUOTE] Please replace "offset" with "shift-count" and re-submit the result -- it should be accepted after this change. This same change has been committed to gpuowl, so this should be fixed after a re-checkout. |
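For results already written with the old key, the rename described above can be scripted instead of edited by hand. The sketch below is an illustration only: the helper name `fix_ll_result` is made up here, the sample line is shortened, and whether the manual results form cares about whitespace in the JSON is not something the thread states.

```python
import json

def fix_ll_result(line: str) -> str:
    """Rename the "offset" key to "shift-count", keeping the key order."""
    record = json.loads(line)
    fixed = {("shift-count" if key == "offset" else key): value
             for key, value in record.items()}
    # Separators chosen to mimic the spacing of the original result line.
    return json.dumps(fixed, separators=(", ", ":"))

# Shortened sample; a real result line carries more fields.
sample = '{"exponent":"54907981", "worktype":"LL", "status":"C", "fft-length":3145728, "offset":0}'
print(fix_ll_result(sample))
```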
[QUOTE=kriesel;542296]Latest commit build, build log, help output, etc.[/QUOTE]
v6.11-255 on Win7 x64, RX550 did not like the default fft at all. +1 etc syntax is apparently gone and if used, gpuowl fails in an interesting way. A quick read of the help output set it right and on its way with the second fft specification for the fft length. [CODE]C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-255-g81fa7c3>title gpuowl-v6.11-255-g81fa7c3/rx550 C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-255-g81fa7c3>gpuowl-win 2020-04-10 12:09:43 gpuowl v6.11-255-g81fa7c3 2020-04-10 12:09:43 config: -device 1 -user kriesel -cpu condorella/rx550 -yield -maxAlloc 3600 -use NO_ASM 2020-04-10 12:09:43 device 1, unique id '' 2020-04-10 12:09:43 condorella/rx550 94741139 FFT: 5M 1K:10:256 (18.07 bpw) 2020-04-10 12:09:43 condorella/rx550 Expected maximum carry32: 461E0000 2020-04-10 12:09:46 condorella/rx550 OpenCL args "-DEXP=94741139u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0xf.3cd1fc041 1148p-3 -DIWEIGHT_STEP=0x8.66790bf53aca8p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DPM1=0 -DAMDGPU=1 -DNO_ASM=1 -cl-fast-relaxed-math -cl-std=CL2.0 " 2020-04-10 12:09:53 condorella/rx550 OpenCL compilation in 6.96 s 2020-04-10 12:10:09 condorella/rx550 94741139 EE 0 loaded: blockSize 400, 0000000000000000 (expected 0000000000000003) 2020-04-10 12:10:09 condorella/rx550 Exiting because "error on load" 2020-04-10 12:10:09 condorella/rx550 Bye C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-255-g81fa7c3>g611 C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-255-g81fa7c3>title gpuowl-v6.11-255-g81fa7c3/rx550 C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-255-g81fa7c3>gpuowl-win 2020-04-10 12:10:51 gpuowl v6.11-255-g81fa7c3 2020-04-10 12:10:51 config: -device 1 -user kriesel -cpu condorella/rx550 -yield -maxAlloc 3600 -use NO_ASM -fft +1 2020-04-10 12:10:51 device 1, unique id '' 2020-04-10 12:10:51 condorella/rx550 94741139 FFT: 128K 256:1:256 (722.82 bpw) 2020-04-10 12:10:51 condorella/rx550 FFT size too small for exponent 
(722.82 bits/word). 2020-04-10 12:10:51 condorella/rx550 Exiting because "FFT size too small" 2020-04-10 12:10:51 condorella/rx550 Bye C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-255-g81fa7c3>g611 C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-255-g81fa7c3>title gpuowl-v6.11-255-g81fa7c3/rx550 C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-255-g81fa7c3>gpuowl-win 2020-04-10 12:12:45 gpuowl v6.11-255-g81fa7c3 2020-04-10 12:12:45 config: -device 1 -user kriesel -cpu condorella/rx550 -yield -maxAlloc 3600 -use NO_ASM -fft 1K:5:512 2020-04-10 12:12:45 device 1, unique id '' 2020-04-10 12:12:45 condorella/rx550 94741139 FFT: 5M 1K:5:512 (18.07 bpw) 2020-04-10 12:12:45 condorella/rx550 Expected maximum carry32: 461E0000 2020-04-10 12:12:47 condorella/rx550 OpenCL args "-DEXP=94741139u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=5u -DWEIGHT_STEP=0xf.3cd1fc0411 148p-3 -DIWEIGHT_STEP=0x8.66790bf53aca8p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DPM1=0 -DAMDGPU=1 - DNO_ASM=1 -cl-fast-relaxed-math -cl-std=CL2.0 " 2020-04-10 12:12:55 condorella/rx550 OpenCL compilation in 8.18 s 2020-04-10 12:13:02 condorella/rx550 94741139 OK 0 loaded: blockSize 400, 0000000000000003 2020-04-10 12:13:19 condorella/rx550 94741139 OK 800 0.00%; 14229 us/it; ETA 15d 14:28; 738c4e015132f834 (check 5.86s) 2020-04-10 13:00:54 condorella/rx550 94741139 OK 200000 0.21%; 14317 us/it; ETA 15d 15:59; e0463c77c58b0105 (check 5.87s) 2020-04-10 13:48:40 condorella/rx550 94741139 OK 400000 0.42%; 14319 us/it; ETA 15d 15:14; 5b1fe09cbecb5e40 (check 5.89s) 2020-04-10 14:36:27 condorella/rx550 94741139 OK 600000 0.63%; 14321 us/it; ETA 15d 14:29; 5f62cf32c024e1a2 (check 5.87s) 2020-04-10 15:24:15 condorella/rx550 94741139 OK 800000 0.84%; 14322 us/it; ETA 15d 13:44; 3dd122479d7dde25 (check 5.88s) 2020-04-10 16:12:02 condorella/rx550 94741139 OK 1000000 1.06%; 14319 us/it; ETA 15d 12:52; e44ae2f6c9046662 (check 5.87s) 2020-04-10 16:59:49 condorella/rx550 94741139 OK 
1200000 1.27%; 14320 us/it; ETA 15d 12:06; b3a0108ad221f8fd (check 5.88s) 2020-04-10 17:47:36 condorella/rx550 94741139 OK 1400000 1.48%; 14319 us/it; ETA 15d 11:17; 6077a7f20c7ee45c (check 5.88s) 2020-04-10 17:49:53 condorella/rx550 Stopping, please wait.. 2020-04-10 17:50:05 condorella/rx550 94741139 OK 1410000 1.49%; 14328 us/it; ETA 15d 11:28; e02e0d0dca18d9f5 (check 5.87s) 2020-04-10 17:50:05 condorella/rx550 Exiting because "stop requested" 2020-04-10 17:50:05 condorella/rx550 Bye[/CODE] |
[QUOTE=kriesel;542296]Latest commit build, build log, help output, etc.[/QUOTE]
Could you (or kracker) please rebuild with the last change from preda, and repost? (I am not yet able to build gpuowl, I mean, I didn't try yet, but I will give it few tests as long as it can LL). |
[QUOTE=preda;542348]Please replace "offset" with "shift-count" and re-submit the result -- it should be accepted after this change.
This same change has been committed to gpuowl, so this should be fixed after a re-checkout.[/QUOTE] Thanks, that worked. 2 successful double checks from gpuowl: [M]83174053[/M] [M]83180563[/M] |
Win7 x64 build of gpuowl v6.11-257
2 Attachment(s)
Latest available commit as of ~12 minutes before this post. Usual shower of warnings in the build log; help output included; no testing performed. Enjoy, and please report any issues here.
|
1 Attachment(s)
Just now, I made the very stupid mistake of not checking a few DC residues before submitting a batch... :sorry:
I can redo them - or whatever is best. Nvidia P100 in colab. gpuowl v6.11-252-gaf403e2 OUT_SIZEX=16,IN_SIZEX=8,IN_SPACING=8 [code] 51509873 51491101 51491059 51490883 51490843 51491267 51491119 51509257 51490799 51490723 51490343 51490339 51508747 58650941 51488837 51491983 51491773 51491731 [/code] |
It seems the problem is associated with the setup
[QUOTE] OUT_SIZEX=16,IN_SIZEX=8,IN_SPACING=8 [/QUOTE] Did this setup work for another exponent? One way to check whether the FFT is broken is to run a few PRP iterations before starting the LL, e.g. ./gpuowl -prp 51509873 [QUOTE=kracker;542369]Just now, I made the very stupid mistake of not checking a few DC residues before submitting a batch... :sorry: I can redo them - or whatever is best. Nvidia P100 in colab. gpuowl v6.11-252-gaf403e2 OUT_SIZEX=16,IN_SIZEX=8,IN_SPACING=8 [code] 51509873 51491101 51491059 51490883 51490843 51491267 51491119 51509257 51490799 51490723 51490343 51490339 51508747 58650941 51488837 51491983 51491773 51491731 [/code][/QUOTE] |
with the previously set settings I'm getting an immediate EE... seems to work with no -use arguments.
[code]
/content/drive/My Drive/gpuowl-colab
2020-04-12 02:08:53 gpuowl v6.11-252-gaf403e2
2020-04-12 02:08:53 config: -user kracker -cpu pce
2020-04-12 02:08:53 config: -ll 51509873
2020-04-12 02:08:53 device 0, unique id ''
2020-04-12 02:08:53 pce 51509873 FFT: 2.75M 256:11:512 (17.86 bpw)
2020-04-12 02:08:53 pce Expected maximum carry32: 2B810000
2020-04-12 02:08:54 pce OpenCL args "-DEXP=51509873u -DWIDTH=256u -DSMALL_HEIGHT=512u -DMIDDLE=11u -DWEIGHT_STEP=0x1.19794ea80bcb4p+0 -DIWEIGHT_STEP=0x1.d1a9c3958d155p-1 -DWEIGHT_BIGSTEP=0x1.ae89f995ad3adp+0 -DIWEIGHT_BIGSTEP=0x1.306fe0a31b715p-1 -DPM1=0 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-04-12 02:08:57 pce
2020-04-12 02:08:57 pce OpenCL compilation in 2.80 s
2020-04-12 02:08:57 pce 51509873 LL 0 loaded: 0000000000000004
2020-04-12 02:09:48 pce 51509873 LL 100000 0.19%; 509 us/it; ETA 0d 07:16; d4bf953f17f5dd56
2020-04-12 02:10:15 pce Stopping, please wait..
2020-04-12 02:10:15 pce 51509873 LL 154000 0.30%; 510 us/it; ETA 0d 07:17; be98350bc1fe8687
2020-04-12 02:10:15 pce Exiting because "stop requested"
2020-04-12 02:10:15 pce Bye
[/code]
[code]
/content/drive/My Drive/gpuowl-colab
2020-04-12 02:12:19 gpuowl v6.11-252-gaf403e2
2020-04-12 02:12:19 config: -user kracker -cpu pce
2020-04-12 02:12:19 config: -use OUT_SIZEX=16,IN_SIZEX=8,IN_SPACING=8 -ll 51509873
2020-04-12 02:12:19 device 0, unique id ''
2020-04-12 02:12:19 pce 51509873 FFT: 2.75M 256:11:512 (17.86 bpw)
2020-04-12 02:12:19 pce Expected maximum carry32: 2B810000
2020-04-12 02:12:19 pce OpenCL args "-DEXP=51509873u -DWIDTH=256u -DSMALL_HEIGHT=512u -DMIDDLE=11u -DWEIGHT_STEP=0x1.19794ea80bcb4p+0 -DIWEIGHT_STEP=0x1.d1a9c3958d155p-1 -DWEIGHT_BIGSTEP=0x1.ae89f995ad3adp+0 -DIWEIGHT_BIGSTEP=0x1.306fe0a31b715p-1 -DPM1=0 -DIN_SIZEX=8 -DIN_SPACING=8 -DOUT_SIZEX=16 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-04-12 02:12:19 pce
2020-04-12 02:12:19 pce OpenCL compilation in 0.01 s
2020-04-12 02:12:19 pce 51509873 LL 0 loaded: 0000000000000004
2020-04-12 02:13:09 pce 51509873 LL 100000 0.19%; 496 us/it; ETA 0d 07:05; a2891146b3ded4b9
2020-04-12 02:13:16 pce Stopping, please wait..
2020-04-12 02:13:17 pce 51509873 LL 115000 0.22%; 502 us/it; ETA 0d 07:10; 42848d9cb649a731
2020-04-12 02:13:17 pce Exiting because "stop requested"
2020-04-12 02:13:17 pce Bye
[/code] |
I created a script to test the speed of a bunch of combinations of the OUT_WG,OUT_SIZEX,OUT_SPACING,IN_WG,IN_SIZEX,IN_SPACING variables for the LL test.
It seems for LL test there is no block to stop combinations that will not work. Instead it zeros the residue. For example these: [CODE]./gpuowlLL -ll 95000011 -iters 30000 -log 10000 -use CARRY32,ORIG_SLOWTRIG,OUT_WG=256,OUT_SIZEX=4,OUT_SPACING=128,IN_WG=64,IN_SIZEX=128,IN_SPACING=4
./gpuowlLL -ll 95000011 -iters 30000 -log 10000 -use CARRY32,ORIG_SLOWTRIG,OUT_WG=256,OUT_SIZEX=4,OUT_SPACING=128,IN_WG=64,IN_SIZEX=128,IN_SPACING=128
./gpuowlLL -ll 95000011 -iters 30000 -log 10000 -use CARRY32,ORIG_SLOWTRIG,OUT_WG=64,OUT_SIZEX=128,OUT_SPACING=8,IN_WG=64,IN_SIZEX=128,IN_SPACING=64
./gpuowlLL -ll 95000011 -iters 30000 -log 10000 -use CARRY32,ORIG_SLOWTRIG,OUT_WG=64,OUT_SIZEX=128,OUT_SPACING=128,IN_WG=64,IN_SIZEX=128,IN_SPACING=64

Output:
2020-04-13 22:32:34 Tesla P100-PCIE-16GB-0 OpenCL compilation in 2.22 s
2020-04-13 22:32:34 Tesla P100-PCIE-16GB-0 95000011 LL 0 loaded: 0000000000000004
2020-04-13 22:32:41 Tesla P100-PCIE-16GB-0 95000011 LL 10000 0.01%; 641 us/it; ETA 0d 16:54; fffffffffffffffd
2020-04-13 22:32:43 Tesla P100-PCIE-16GB-0 Stopping, please wait..
2020-04-13 22:32:43 Tesla P100-PCIE-16GB-0 95000011 LL 14000 0.01%; 657 us/it; ETA 0d 17:20; fffffffffffffffd
[/CODE] |
LL is "naked", no error check at all. Please try/tune combinations on PRP, which will help detect the invalid ones. Only after validation with PRP use any combination for LL.
[QUOTE=ATH;542574]I created a script to test the speed of a bunch of combinations of the OUT_WG,OUT_SIZEX,OUT_SPACING,IN_WG,IN_SIZEX,IN_SPACING variables for the LL test. It seems for LL test there is no block to stop combinations that will not work. Instead it zeros the residue. For example these: [CODE]./gpuowlLL -ll 95000011 -iters 30000 -log 10000 -use CARRY32,ORIG_SLOWTRIG,OUT_WG=256,OUT_SIZEX=4,OUT_SPACING=128,IN_WG=64,IN_SIZEX=128,IN_SPACING=4 ./gpuowlLL -ll 95000011 -iters 30000 -log 10000 -use CARRY32,ORIG_SLOWTRIG,OUT_WG=256,OUT_SIZEX=4,OUT_SPACING=128,IN_WG=64,IN_SIZEX=128,IN_SPACING=128 ./gpuowlLL -ll 95000011 -iters 30000 -log 10000 -use CARRY32,ORIG_SLOWTRIG,OUT_WG=64,OUT_SIZEX=128,OUT_SPACING=8,IN_WG=64,IN_SIZEX=128,IN_SPACING=64 ./gpuowlLL -ll 95000011 -iters 30000 -log 10000 -use CARRY32,ORIG_SLOWTRIG,OUT_WG=64,OUT_SIZEX=128,OUT_SPACING=128,IN_WG=64,IN_SIZEX=128,IN_SPACING=64 Output: 2020-04-13 22:32:34 Tesla P100-PCIE-16GB-0 OpenCL compilation in 2.22 s 2020-04-13 22:32:34 Tesla P100-PCIE-16GB-0 95000011 LL 0 loaded: 0000000000000004 2020-04-13 22:32:41 Tesla P100-PCIE-16GB-0 95000011 LL 10000 0.01%; 641 us/it; ETA 0d 16:54; fffffffffffffffd 2020-04-13 22:32:43 Tesla P100-PCIE-16GB-0 Stopping, please wait.. 2020-04-13 22:32:43 Tesla P100-PCIE-16GB-0 95000011 LL 14000 0.01%; 657 us/it; ETA 0d 17:20; fffffffffffffffd [/CODE][/QUOTE] |
[QUOTE=preda;542577]LL is "naked", no error check at all. Please try/tune combinations on PRP, which will help detect the invalid ones. Only after validation with PRP use any combination for LL.[/QUOTE]Yikes, that means the LL side of gpuowl will be less reliable than CUDALucas v2.06, which has checks for known bad residues seen to occur,
0x0000000000000000, 0x0000000000000002, 0xffffffff80000000, 0xfffffffffffffffd, and excessive roundoff error. Gpuowl checks bits/word. A memory copy fail could give 0; +-2 values come from the residue getting zeroed and then the -2 and the squaring; the 33-bits-set value 0xffffffff80000000 comes from using far too short an fft length as was seen in both cllucas 1.02 and CUDALucas v2.03. [URL]https://mersenneforum.org/showpost.php?p=355661&postcount=232[/URL] [URL]https://mersenneforum.org/showpost.php?p=386081&postcount=299[/URL] |
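The explanation above for the ±2 residues is easy to reproduce with plain integer arithmetic. The sketch below is not code from CUDALucas or gpuowl; the function names and the small test exponent p = 127 are chosen here purely for illustration. It zeroes an LL residue, applies the s → s² − 2 step twice, and checks the low 64 bits against the four known-bad values listed in the post.

```python
# Known-bad res64 values as listed in the post above.
KNOWN_BAD_RES64 = {
    0x0000000000000000,
    0x0000000000000002,
    0xFFFFFFFF80000000,
    0xFFFFFFFFFFFFFFFD,
}

def is_suspect(res64_hex: str) -> bool:
    """True if a 64-bit residue (hex string) matches a known-bad pattern."""
    return int(res64_hex, 16) in KNOWN_BAD_RES64

def ll_step(s: int, p: int) -> int:
    """One Lucas-Lehmer iteration modulo M_p = 2^p - 1."""
    return (s * s - 2) % ((1 << p) - 1)

def res64(s: int) -> str:
    """Low 64 bits of the residue, formatted like the log output."""
    return format(s & (2**64 - 1), "016x")

p = 127                    # illustrative exponent; any p > 64 behaves the same way
s = 0                      # simulate a residue that got zeroed
s = ll_step(s, p)          # 0*0 - 2 = -2 mod M_p
after_one = res64(s)
s = ll_step(s, p)          # (-2)^2 - 2 = 2, and 2 is a fixed point of the step
after_two = res64(s)

print(after_one, is_suspect(after_one))
print(after_two, is_suspect(after_two))
```

Once the residue reaches 2 it stays there, since 2² − 2 = 2, so a run that hits this state never recovers on its own.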
We are struggling to run gpuOwn on windoze 7, nvidia card (2080Ti, but also 1080Ti). We are sure we are missing something. Can anyone point us to a tutorial? Right now, with cudaLucas, we are squeezing about 22 hours for a 55M LL test. We want to see what the owl can do, before replacing the cards with a couple of radeon vees (or.. it is wees? like in "waa wee cafè"?)
|
[QUOTE=kriesel;542591]Yikes, that means the LL side of gpuowl will be less reliable than CUDALucas v2.06, which has checks for known bad residues seen to occur,
0x0000000000000000, 0x0000000000000002, 0xffffffff80000000, 0xfffffffffffffffd, and excessive roundoff error. Gpuowl checks bits/word. A memory copy fail could give 0; +-2 values come from the residue getting zeroed and then the -2 and the squaring; the 33-bits-set value 0xffffffff80000000 comes from using far too short an fft length as was seen in both cllucas 1.02 and CUDALucas v2.03. [URL]https://mersenneforum.org/showpost.php?p=355661&postcount=232[/URL] [URL]https://mersenneforum.org/showpost.php?p=386081&postcount=299[/URL][/QUOTE] Things may improve in time; this is an intermediary point in the timeline, not the final perfect LL. |
[QUOTE=LaurV;542605]We are struggling to run gpuOwn on windoze 7, nvidia card (2080Ti, but also 1080Ti). We are sure we are missing something. Can anyone point us to a tutorial? Right now, with cudaLucas, we are squeezing about 22 hours for a 55M LL test. We want to see what the owl can do, before replacing the cards with a couple of radeon vees (or.. it is wees? like in "waa wee cafè"?)[/QUOTE]
1. does clinfo run, detecting the GPUs? 2. does gpuowl -h run, printing the list of GPUs? 3. what fails? |
v6.11-134, no errors for months on this RX550 gpu running 9xM PRP.
v6.11-257, 2 errors in 5 hours, same gpu[CODE]2020-04-14 02:08:31 gpuowl v6.11-257-g39fc002
2020-04-14 02:08:31 config: -device 1 -user kriesel -cpu condorella/rx550 -yield -maxAlloc 3600 -use NO_ASM
2020-04-14 02:08:31 device 1, unique id ''
2020-04-14 02:08:31 condorella/rx550 94741139 FFT: 5M 1K:10:256 (18.07 bpw)
2020-04-14 02:08:31 condorella/rx550 Expected maximum carry32: 461E0000
2020-04-14 02:08:32 condorella/rx550 OpenCL args "-DEXP=94741139u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0xf.3cd1fc0411148p-3 -DIWEIGHT_STEP=0x8.66790bf53aca8p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DPM1=0 -DAMDGPU=1 -DNO_ASM=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-04-14 02:08:38 condorella/rx550 OpenCL compilation in 5.62 s
2020-04-14 02:08:44 condorella/rx550 94741139 OK 6054400 loaded: blockSize 400, cc34a0f738ddbc39
2020-04-14 02:09:01 condorella/rx550 94741139 OK 6055200 6.39%; 13606 us/it; ETA 13d 23:12; 2b8af08eb69c5bb4 (check 5.61s)
2020-04-14 02:42:10 condorella/rx550 94741139 OK 6200000 6.54%; 13711 us/it; ETA 14d 01:13; 36657b8d4cf7b2b8 (check 5.63s)
2020-04-14 03:27:58 condorella/rx550 94741139 OK 6400000 6.76%; 13717 us/it; ETA 14d 00:36; 4941c60b1a288320 (check 5.63s)
2020-04-14 04:13:45 condorella/rx550 94741139 OK 6600000 6.97%; 13717 us/it; ETA 13d 23:50; 67aa94150e6fcccf (check 5.62s)
2020-04-14 04:59:31 condorella/rx550 94741139 OK 6800000 7.18%; 13712 us/it; ETA 13d 22:58; cab0b7a0fb0cc066 (check 5.65s)
2020-04-14 05:45:17 condorella/rx550 94741139 EE 7000000 7.39%; 13710 us/it; ETA 13d 22:09; 5e731e02beb738ea (check 5.61s)
2020-04-14 05:45:23 condorella/rx550 94741139 OK 6800000 loaded: blockSize 400, cab0b7a0fb0cc066
2020-04-14 05:54:37 condorella/rx550 94741139 OK 6840000 7.22%; 13711 us/it; ETA 13d 22:46; 4f7b98cea0650fb9 (check 5.63s) 1 errors
2020-04-14 06:22:07 condorella/rx550 94741139 OK 6960000 7.35%; 13714 us/it; ETA 13d 22:24; a47542d527e8a188 (check 5.63s) 1 errors
2020-04-14 06:49:37 condorella/rx550 94741139 EE 7080000 7.47%; 13711 us/it; ETA 13d 21:53; b71198a3d710f35b (check 5.62s) 1 errors
2020-04-14 06:49:43 condorella/rx550 94741139 OK 6960000 loaded: blockSize 400, a47542d527e8a188
2020-04-14 07:09:07 condorella/rx550 94741139 OK 7040000 7.43%; 13716 us/it; ETA 13d 22:08; b7ef942604ff7e9d (check 5.62s) 2 errors[/CODE] |
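When comparing versions on the same card, it can help to pull the GEC failures out of a log mechanically rather than by eye. The following is a small sketch: the regular expression is inferred from the progress lines shown in this thread (v6.11-257 PRP output) and is an assumption, not a documented log format, so it may need adjusting for other gpuowl versions.

```python
import re

# Pattern inferred from the PRP progress lines quoted in this thread.
# "loaded:" lines deliberately do not match because they lack the "%;" field.
LINE_RE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<host>\S+)\s+(?P<exponent>\d+)\s+"
    r"(?P<status>OK|EE)\s+(?P<iteration>\d+)\s+(?P<percent>[\d.]+)%; "
    r"(?P<us_per_it>\d+) us/it; ETA (?P<eta>[^;]+); (?P<res64>[0-9a-f]{16})"
)

def parse_progress(log_text):
    """Return one dict per progress line, with numeric fields converted to int."""
    rows = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            row = match.groupdict()
            row["iteration"] = int(row["iteration"])
            row["us_per_it"] = int(row["us_per_it"])
            rows.append(row)
    return rows

sample = """\
2020-04-14 04:59:31 condorella/rx550 94741139 OK 6800000 7.18%; 13712 us/it; ETA 13d 22:58; cab0b7a0fb0cc066 (check 5.65s)
2020-04-14 05:45:17 condorella/rx550 94741139 EE 7000000 7.39%; 13710 us/it; ETA 13d 22:09; 5e731e02beb738ea (check 5.61s)
2020-04-14 05:45:23 condorella/rx550 94741139 OK 6800000 loaded: blockSize 400, cab0b7a0fb0cc066
"""

rows = parse_progress(sample)
failed_at = [row["iteration"] for row in rows if row["status"] == "EE"]
print("progress lines:", len(rows))
print("GEC failures at iterations:", failed_at)
```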
Luckily it was a motherboard that failed for me not the R7. I had a mammoth battle installing rocm-3.3.0 onto a different Debian Buster machine, my desktop, which involved a kernel upgrade and it works! With the memory at stock: 0 1000 820 1050 4 and two instances running I am getting 1409us/it each :smile:
|
[QUOTE=paulunderwood;542641]Luckily it was a motherboard that failed for me not the R7. I had a mammoth battle installing rocm-3.3.0 onto a different Debian Buster machine, my desktop, which involved a kernel upgrade and it works! With the memory at stock: 0 1000 820 1050 4 and two instances running I am getting 1409us/it each :smile:[/QUOTE]
Good for you - what sclk setting is the quoted timing at, and what's the total system wall wattage, if you have it running through a wattmeter? |
[QUOTE=ewmayer;542680]Good for you - what sclk setting is the quoted timing at, and what's the total system wall wattage, if you have it running through a wattmeter?[/QUOTE]
No meter. At sclk 4 it is drawing 214 watts according to sensors. The odd thing I noticed is that timings for a PRP test used to go down when the other instance was running P-1, but now (as a desktop GPU) it goes up when P-1 is running. |
[QUOTE=paulunderwood;542685]No meter. At sclk 4 it is drawing 214 watts according to sensors.
The odd thing I noticed is that timings for a PRP test used to go down when the other instance was running P-1, but now (as a desktop GPU) it goes up when P-1 is running.[/QUOTE] I've noticed similar timing effects with one-PRP-one-P-1 ... I think it's due to some kind of internal GPU task-priority setting, which (AFAIK) the user has no control over. |
gpuowl-win v6.11-259-g83434d8 build fail
some usual-looking warnings, then:[CODE]Gpu.cpp: In member function 'void Gpu::printRoundoff(u32)':
Gpu.cpp:844:35: error: 'M_PI' was not declared in this scope; did you mean 'M_PIl'?
  844 |   double beta = sdev * (sqrt(6) / M_PI);
      |                                   ^~~~
      |                                   M_PIl
make: *** [Makefile:30: Gpu.o] Error 1
[/CODE] |
gpuowl-win v6.11-259-g83434d8 build
2 Attachment(s)
After the trivial edit, see preceding post, built ok.
|
[QUOTE=preda;542577]LL is "naked", no error check at all. Please try/tune combinations on PRP, which will help detect the invalid ones. Only after validation with PRP use any combination for LL.[/QUOTE]
Yes, be very careful tuning LL performance. I tried using the fastest settings that did not zero the residue, but the final residue from the test did not match the initial LL test on the exponent. So I was suspicious and did not turn in the result, but ran the LL test again using the settings from PRP tuning that I used for the other successful LL double checks, and this time it did match the first test and finished the double check. Which means that my first test was faulty due to too aggressive settings. |
3 Attachment(s)
[QUOTE=preda;542607]1. does clinfo run, detecting the GPUs?
2. does gpuowl -h run, printing the list of GPUs? 3. what fails?[/QUOTE] Managed to make it run. I only had to look into the mirror to see where the dumb was. For some odd reason, I was using very old drivers (397), and when I decided to use the new owl freshly compiled by the people here, I had to upgrade the drivers to the new 445.7x (?) available last week. Either something went wrong at that time, or the driver didn't work properly. I suspect the latter, because they just released 445.87 yesterday, and with this it works. It seemed a fast fix on their side, so maybe indeed something was wrong with the drivers.
Now, the comments... Be ready... It is coming... [U]Now[/U]...
The owl is about 7% faster than cudaLucas running with old drivers. And trying cudaLucas with newer drivers, I suddenly remembered why I kept the old :picard:... all drivers from the generation 4xx are about 6% slower when running cudaLucas for 1080Ti and 2080Ti. I already forgot this, because lately I only played mfaktc/gpu72. So, with the new drivers, in the race between the owl and cudaLucas, the owl comes out about 13% faster. This is good. Also, the cards run much cooler, about 9-10°C cooler than cudaLucas runs them. Also good.
[ATTACH]22045[/ATTACH] (ignore the iteration at 300k, that is because we were doing something else more important at that time, for a few seconds).
But then, the CPU activity seems strange in that graphic, the temperature jumps up and down like somebody is stealing ticks from P95. And voila, the owl gets a full core for itself. (in the photo, 5% and 45% is due to HT, this CPU has 10 phys cores, so it is 10% and 90% occupancy).
[ATTACH]22046[/ATTACH]
That's not good. As a proof, P95 performance goes down 10% compared with the case when cudaLucas runs. That is 20%, if two copies (two cards) run. And if your CPU has only 4 cores, then the performance of P95 goes down 50%, because gpuOwl (2 copies) can monopolize 2 cores out of 4. That's bad.
[ATTACH]22047[/ATTACH] (in this picture, P95 runs parallel with 1 gpuOwl thread then with 2 cudaLucas threads, then back, then some other tests)
So, if you have Nvidia cards and want to use gpuOwl AND use P95 at the same time, you have to see which gives you more output: running P95 at lower performance and using the faster owl with the GPU(s), or running P95 full speed and using the (slower) cudaLucas (as cudaLucas does not take [U]any[/U] CPU resources).
Temperature is also a thing, but there is no dilemma here: if you have a common water circuit, then the CPU running cooler is also reflected in the GPUs running cooler. To make sure, a separate water circuit will be needed (which I will do, probably during the weekend).
And checkpoint file names suck... where is the "exponent.iteration.residue.ll" naming convention? :razz: And where is the shift? :rant: |
[QUOTE=LaurV;542874]Now, the comments... Be ready... It is coming... [U]Now[/U]..[/QUOTE]
Everybody's a critic :) Try the -yield option to tell gpuowl to yield the CPU to other tasks. I don't recall all the details but I believe this is really an nVidia openCL bug. |
1 Attachment(s)
[QUOTE=Prime95;542879]
Try the -yield option [/QUOTE] No difference. (this was in fact written in the help, being dumb again, or too tired, 0:35AM here, going to bed after this...) But wait! What? WHAT? WTF? :shock: [ATTACH]22048[/ATTACH] (cL running on the other card, regardless of the fact that gpuOwl is running or not, now it takes a CPU core). It seems a NV problem indeed. I didn't have this in the past. Maybe I am dumb, but I will have to dig up the old drivers to verify. |
All on the same RX550 gpu and host system, that has reliably run without GEC errors for multiple 5M PRP first-tests on v6.11-134:
90710093 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.30 bits/word
92858651 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.71 bits/word
93461911 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.83 bits/word
93873049 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.90 bits/word
94418047 began FFT 5120K: Width 256x4, Height 64x4, Middle 10; 18.01 bits/word, ran ok. finished at 6M because of problems encountered at 7M on 131.5M, requiring -fft +6
All runs -use NO_ASM
V6.11-134 on rx550 from start on 94741139; no GEC to 2.8M iterations; this was at 6M fft, 18370 us/iter due to problems seen with 7M fft (leftover -fft +6 config.txt content)
V6.11-257 continuation on same rx550, 5M fft 1K:5:512; 14310 us/iter to 6.0544M iterations
V6.11-257 continuation on same RX550, no fft specification; 1K:10:256 chosen by program; ~13712 us/iter, 9 GEC errors by 22.1M iterations.
V6.11-259 continuation with -use STATS on same RX550, 1K:10:256, ~16542 us/iter, no additional GEC through 25.64M iterations.
V6.11-259 continuation without -use STATS underway now, 13750 usec/iter |
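The bits/word figures in the list above are consistent with simply dividing the exponent by the FFT length in words. The sketch below reproduces them; the function name is made up for illustration, and treating 1K as 1024 words is an assumption that happens to match the logged values.

```python
def bits_per_word(exponent: int, fft_k: int) -> float:
    """Average bits carried per FFT word; fft_k is the FFT length in K (assumed 1K = 1024 words)."""
    return exponent / (fft_k * 1024)

# Exponents from the list above, all run at a 5120K (5M) FFT.
for exponent in (90710093, 92858651, 93461911, 93873049, 94418047):
    print(exponent, f"{bits_per_word(exponent, 5120):.2f}")

# The failed "-fft +1" attempt earlier in the thread landed on a 128K FFT:
print(94741139, f"{bits_per_word(94741139, 128):.2f}")
```

The last line gives the same very large bits/word value that gpuowl reported before exiting with "FFT size too small".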
gpuowl-win v6.11-264-g5c977d4 build
2 Attachment(s)
Nice, gracefully deals with the ASM issue. That may have been there a while, I haven't tried it recently until now.
[CODE]2020-04-17 14:17:34 roa/radeonvii ASM compilation failed, retrying compilation using NO_ASM[/CODE]Please fix the github source for this previously reported minor issue.[CODE]Gpu.cpp: In member function 'void Gpu::printRoundoff(u32)':
Gpu.cpp:851:35: error: 'M_PI' was not declared in this scope; did you mean 'M_PIl'?
  851 |   double beta = sdev * (sqrt(6) / M_PI);
      |                                   ^~~~
      |                                   M_PIl
make: *** [Makefile:30: Gpu.o] Error 1[/CODE]Please fix the readme.md, which says P-1 iterations for LL; it's P-2. (See [URL]https://www.mersenne.org/various/math.php#lucas-lehmer[/URL])
-use doc from top of gpuowl.cl source code; note that there are additional ones that are derived from these optional inputs:[CODE]// gpuOwl, an OpenCL Mersenne primality test.
// Copyright Mihai Preda and George Woltman.
/* List of user-serviceable -use flags and their effects
DEBUG : enable asserts. Slow, but allows to verify that all asserts hold.
NO_ASM : request to not use any inline __asm()
NO_OMOD: do not use GCN output modifiers in __asm()
OUT_WG,OUT_SIZEX,OUT_SPACING <AMD default is 256,32,4> <nVidia default is 256,4,1 but needs testing>
IN_WG,IN_SIZEX,IN_SPACING <AMD default is 256,32,1> <nVidia default is 256,4,1 but needs testing>
UNROLL_WIDTH <nVidia default>
NO_UNROLL_WIDTH <AMD default>
OLD_FFT8 <default>
NEWEST_FFT8
NEW_FFT8
OLD_FFT5
NEW_FFT5 <default>
NEWEST_FFT5
NEW_FFT10 <default>
OLD_FFT10
CARRY32 <AMD default for PRP when appropriate>
CARRY64 <nVidia default>, <AMD default for PM1 when appropriate>
CARRYM32
CARRYM64 <default>
ORIG_SLOWTRIG
NEW_SLOWTRIG <default> // Our own sin/cos implementation
ROCM_SLOWTRIG // Use ROCm's private reduced-argument sin/cos
ROCM31 <AMD default> // Enable workaround for ROCm 3.1 bug affecting kcos()
NO_ROCM31 <nVidia default>
---- P-1 below ----
NO_P2_FUSED_TAIL // Do not use the big kernel tailFusedMulDelta
*/
[/CODE]Somehow describing the available values and restrictions would be a useful addition.
Granted, we don't want to absorb too much time of the coders in documentation of transient states. However, more readily available info would aid testing support by others and reduce lost time by users. -use STATS and ROUNDOFF are not listed above. Have they been removed? There do seem to be some oddities, like getting the message both CARRY32 and CARRY64 have been specified when CARRY32 is specified, or out of resources message and terminate when -use STATS attempted. [CODE]2020-04-17 14:37:29 gpuowl v6.11-264-g5c977d4-dirty 2020-04-17 14:37:29 config: -device 1 -user kriesel -cpu roa/radeonvii-w2 2020-04-17 14:37:29 device 1, unique id '' 2020-04-17 14:37:29 roa/radeonvii-w2 852348659 FFT: 48M 4K:12:512 (16.93 bpw) 2020-04-17 14:37:29 roa/radeonvii-w2 Expected maximum carry32: 70BA0000 2020-04-17 14:37:41 roa/radeonvii-w2 OpenCL args "-DEXP=852348659u -DWIDTH=4096u -DSMALL_HEIGHT=512u -DMIDDLE=12u -DWEIGHT_STEP=0x8.5ee83d8b97248p-3 -DIWEIGHT_STEP=0xf.4a97a185410b8p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -cl-fast-relaxed-math -cl-std=CL2.0 " 2020-04-17 14:37:41 roa/radeonvii-w2 ASM compilation failed, retrying compilation using NO_ASM 2020-04-17 14:37:57 roa/radeonvii-w2 OpenCL compilation in 16.49 s 2020-04-17 14:38:07 roa/radeonvii-w2 852348659 OK 165317600 loaded: blockSize 400, ad90fdb1696eabf0 2020-04-17 14:38:25 roa/radeonvii-w2 852348659 OK 165318400 19.40%; 12853 us/it; ETA 102d 04:56; 8dcd685e25e59f08 (check 7.34s) 4 errors 2020-04-17 14:38:53 roa/radeonvii-w2 852348659 OK 165320000 19.40%; 12831 us/it; ETA 102d 00:41; d1ff263c2a76e8c1 (check 7.27s) 4 errors 2020-04-17 14:40:31 roa/radeonvii-w2 Stopping, please wait.. 
2020-04-17 14:40:42 roa/radeonvii-w2 852348659 OK 165328000 19.40%; 12834 us/it; ETA 102d 01:12; f095cd3f6ba99a30 (check 6.65s) 4 errors 2020-04-17 14:40:42 roa/radeonvii-w2 Exiting because "stop requested" 2020-04-17 14:40:42 roa/radeonvii-w2 Bye 2020-04-17 14:40:47 gpuowl v6.11-264-g5c977d4-dirty 2020-04-17 14:40:47 config: -device 1 -user kriesel -cpu roa/radeonvii-w2 -use NO_ASM,CARRY32,STATS,DEBUG 2020-04-17 14:40:47 device 1, unique id '' 2020-04-17 14:40:47 roa/radeonvii-w2 852348659 FFT: 48M 4K:12:512 (16.93 bpw) 2020-04-17 14:40:47 roa/radeonvii-w2 Expected maximum carry32: 70BA0000 2020-04-17 14:40:59 roa/radeonvii-w2 OpenCL args "-DEXP=852348659u -DWIDTH=4096u -DSMALL_HEIGHT=512u -DMIDDLE=12u -DWEIGHT_STEP=0x8.5ee83d8b97248p-3 -DIWEIGHT_STEP=0xf.4a97a185410b8p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DCARRY32=1 -DDEBUG=1 -DNO_ASM=1 -DSTATS=1 -cl-fast-relaxed-math -cl-std=CL2.0 " 2020-04-17 14:41:00 roa/radeonvii-w2 ASM compilation failed, retrying compilation using NO_ASM 2020-04-17 14:41:00 roa/radeonvii-w2 OpenCL compilation error -11 (args -DEXP=852348659u -DWIDTH=4096u -DSMALL_HEIGHT=512u -DMIDDLE=12u -DWEIGHT_STEP=0x8.5ee83d8b97248p-3 -DIWEIGHT_STEP=0xf.4a97a185410b8p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DCARRY32=1 -DDEBUG=1 -DNO_ASM=1 -DSTATS=1 -cl-fast-relaxed-math -cl-std=CL2.0 -DNO_ASM=1) 2020-04-17 14:41:00 roa/radeonvii-w2 C:\Users\ken\AppData\Local\Temp\\OCL2832T1.cl:82:2: error: Conflict: both CARRY32 and CARRY64 requested #error Conflict: both CARRY32 and CARRY64 requested ^ 1 error generated. error: Clang front-end compilation failed! Frontend phase failed compilation. 
Error: Compiling CL to IR 2020-04-17 14:41:00 roa/radeonvii-w2 Exception gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:247 build 2020-04-17 14:41:00 roa/radeonvii-w2 Bye 2020-04-17 14:41:33 gpuowl v6.11-264-g5c977d4-dirty 2020-04-17 14:41:33 config: -device 1 -user kriesel -cpu roa/radeonvii-w2 -use NO_ASM,STATS,DEBUG 2020-04-17 14:41:33 device 1, unique id '' 2020-04-17 14:41:33 roa/radeonvii-w2 852348659 FFT: 48M 4K:12:512 (16.93 bpw) 2020-04-17 14:41:33 roa/radeonvii-w2 Expected maximum carry32: 70BA0000 2020-04-17 14:41:45 roa/radeonvii-w2 OpenCL args "-DEXP=852348659u -DWIDTH=4096u -DSMALL_HEIGHT=512u -DMIDDLE=12u -DWEIGHT_STEP=0x8.5ee83d8b97248p-3 -DIWEIGHT_STEP=0xf.4a97a185410b8p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DDEBUG=1 -DNO_ASM=1 -DSTATS=1 -cl-fast-relaxed-math -cl-std=CL2.0 " 2020-04-17 14:42:09 roa/radeonvii-w2 OpenCL compilation in 24.22 s 2020-04-17 14:42:13 roa/radeonvii-w2 Exception gpu_error: OUT_OF_RESOURCES carryFused at clwrap.cpp:310 run 2020-04-17 14:42:13 roa/radeonvii-w2 Bye 2020-04-17 14:42:31 gpuowl v6.11-264-g5c977d4-dirty 2020-04-17 14:42:31 config: -device 1 -user kriesel -cpu roa/radeonvii-w2 -use NO_ASM,STATS,DEBUG,ROUNDOFF 2020-04-17 14:42:31 device 1, unique id '' 2020-04-17 14:42:31 roa/radeonvii-w2 852348659 FFT: 48M 4K:12:512 (16.93 bpw) 2020-04-17 14:42:31 roa/radeonvii-w2 Expected maximum carry32: 70BA0000 2020-04-17 14:42:43 roa/radeonvii-w2 OpenCL args "-DEXP=852348659u -DWIDTH=4096u -DSMALL_HEIGHT=512u -DMIDDLE=12u -DWEIGHT_STEP=0x8.5ee83d8b97248p-3 -DIWEIGHT_STEP=0xf.4a97a185410b8p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DDEBUG=1 -DNO_ASM=1 -DROUNDOFF=1 -DSTATS=1 -cl-fast-relaxed-math -cl-std=CL2.0 " 2020-04-17 14:43:06 roa/radeonvii-w2 OpenCL compilation in 23.37 s 2020-04-17 14:43:11 roa/radeonvii-w2 Exception gpu_error: OUT_OF_RESOURCES carryFused at 
clwrap.cpp:310 run 2020-04-17 14:43:11 roa/radeonvii-w2 Bye 2020-04-17 14:43:40 gpuowl v6.11-264-g5c977d4-dirty 2020-04-17 14:43:40 config: -device 1 -user kriesel -cpu roa/radeonvii-w2 -use NO_ASM,STATS 2020-04-17 14:43:40 device 1, unique id '' 2020-04-17 14:43:40 roa/radeonvii-w2 852348659 FFT: 48M 4K:12:512 (16.93 bpw) 2020-04-17 14:43:40 roa/radeonvii-w2 Expected maximum carry32: 70BA0000 2020-04-17 14:43:52 roa/radeonvii-w2 OpenCL args "-DEXP=852348659u -DWIDTH=4096u -DSMALL_HEIGHT=512u -DMIDDLE=12u -DWEIGHT_STEP=0x8.5ee83d8b97248p-3 -DIWEIGHT_STEP=0xf.4a97a185410b8p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DNO_ASM=1 -DSTATS=1 -cl-fast-relaxed-math -cl-std=CL2.0 " 2020-04-17 14:44:10 roa/radeonvii-w2 OpenCL compilation in 18.00 s 2020-04-17 14:44:14 roa/radeonvii-w2 Exception gpu_error: OUT_OF_RESOURCES carryFused at clwrap.cpp:310 run 2020-04-17 14:44:14 roa/radeonvii-w2 Bye 2020-04-17 14:44:54 gpuowl v6.11-264-g5c977d4-dirty 2020-04-17 14:44:54 config: -device 1 -user kriesel -cpu roa/radeonvii-w2 -use NO_ASM,DEBUG 2020-04-17 14:44:54 device 1, unique id '' 2020-04-17 14:44:54 roa/radeonvii-w2 852348659 FFT: 48M 4K:12:512 (16.93 bpw) 2020-04-17 14:44:54 roa/radeonvii-w2 Expected maximum carry32: 70BA0000 2020-04-17 14:45:06 roa/radeonvii-w2 OpenCL args "-DEXP=852348659u -DWIDTH=4096u -DSMALL_HEIGHT=512u -DMIDDLE=12u -DWEIGHT_STEP=0x8.5ee83d8b97248p-3 -DIWEIGHT_STEP=0xf.4a97a185410b8p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DDEBUG=1 -DNO_ASM=1 -cl-fast-relaxed-math -cl-std=CL2.0 " 2020-04-17 14:45:31 roa/radeonvii-w2 OpenCL compilation in 24.75 s 2020-04-17 14:45:43 roa/radeonvii-w2 852348659 OK 165328000 loaded: blockSize 400, f095cd3f6ba99a30 2020-04-17 14:46:07 roa/radeonvii-w2 852348659 OK 165328800 19.40%; 17545 us/it; ETA 139d 12:21; ebb1fef7de6d72cf (check 9.52s) 4 errors /[/CODE] Debug overhead seems to be 
around 35%; roundoff 0.03%, based on very brief trials. |
[QUOTE=kriesel;542984]
-use STATS and ROUNDOFF are not listed above. Have they been removed? There do seem to be some oddities, like getting the message both CARRY32 and CARRY64 have been specified when CARRY32 is specified, or out of resources message and terminate when -use STATS attempted.[/QUOTE] (The small problems are fixed in a recent commit: README P-1, M_PI.)
- ROUNDOFF does not exist anymore; it's called STATS now.
- With that exponent, CARRY64 is required, so the software inserts it for you. At the same time, you manually requested CARRY32. This situation is reported as a conflict. This hand-holding has the advantage of protecting against users specifying an invalid CARRY32, which would be severe for a naked LL.
- Do you have more details about the "out of resources" with STATS? |
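For anyone puzzled by the CARRY32/CARRY64 message, here is a rough Python model of the behaviour described above. It is only a sketch: the function name build_defines and the way the flags are merged are assumptions made for illustration, not gpuowl's actual code. In the real program the conflict is raised by an #error in the OpenCL source at compile time, as the log shows.

```python
def build_defines(required, requested):
    """Merge flags the program inserts itself with the user's -use flags.

    Hypothetical model: 'required' stands for flags the program decides it
    needs (e.g. CARRY64 for a large exponent), 'requested' for the -use list.
    """
    # Groups of flags that cannot be combined (assumption: only this pair is modelled).
    exclusive = [{"CARRY32", "CARRY64"}]
    # Keep first-seen order and drop duplicates.
    flags = list(dict.fromkeys(list(required) + list(requested)))
    for group in exclusive:
        clash = sorted(group.intersection(flags))
        if len(clash) > 1:
            raise ValueError("Conflict: both " + " and ".join(clash) + " requested")
    return ["-D" + flag + "=1" for flag in flags]


# CARRY64 inserted by the program, user asks only for NO_ASM and STATS: accepted.
print(build_defines(["CARRY64"], ["NO_ASM", "STATS"]))
```

Passing CARRY32 in the requested list while CARRY64 is in the required list raises the conflict, which mirrors what the -use NO_ASM,CARRY32,STATS,DEBUG run reported.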
[QUOTE=preda;543008](the small problems fixed in a recent commit: README p-1, M_PI)
- ROUNDOFF does not exist anymore, it's called STATS now. - with that exponent, CARRY64 is required so the software inserts it for you. At the same time, you manually request CARRY32. This situation is reported as a conflict. This hand-holding has the advantage to protect against users specifying invalid CARRY32 which would be severe for a naked LL[/QUOTE]Thanks for the above. [QUOTE]- do you have more details about the "out of resources" with STATS?[/QUOTE]I don't have any more than what the console showed in the preceding post's long CODE section. gpuowl.log says the same; see following.[CODE]2020-04-17 14:43:40 config: -device 1 -user kriesel -cpu roa/radeonvii-w2 -use NO_ASM,STATS 2020-04-17 14:43:40 config: ;,DEBUG,ROUNDOFF 2020-04-17 14:43:40 device 1, unique id '' 2020-04-17 14:43:40 roa/radeonvii-w2 852348659 FFT: 48M 4K:12:512 (16.93 bpw) 2020-04-17 14:43:40 roa/radeonvii-w2 Expected maximum carry32: 70BA0000 2020-04-17 14:43:52 roa/radeonvii-w2 OpenCL args "-DEXP=852348659u -DWIDTH=4096u -DSMALL_HEIGHT=512u -DMIDDLE=12u -DWEIGHT_STEP=0x8.5ee83d8b97248p-3 -DIWEIGHT_STEP=0xf.4a97a185410b8p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DPM1=0 -DAMDGPU=1 -DCARRY64=1 -DNO_ASM=1 -DSTATS=1 -cl-fast-relaxed-math -cl-std=CL2.0 " 2020-04-17 14:44:10 roa/radeonvii-w2 OpenCL compilation in 18.00 s 2020-04-17 14:44:14 roa/radeonvii-w2 Exception gpu_error: OUT_OF_RESOURCES carryFused at clwrap.cpp:310 run 2020-04-17 14:44:14 roa/radeonvii-w2 Bye [/CODE]I'll try -use STATS on some other exponents / fft lengths. |
-use STATS through the occurrence of a GEC error
Finally caught a GEC error with -use STATS in effect, after ~ 30 hours, on gpuowl-win v6.11-259, rx550 that was reliable previously. Maybe the fft length's exponent limit is just slightly too high and this exponent is pushing it. See also [URL]https://mersenneforum.org/showpost.php?p=542963&postcount=2094[/URL] and [URL]https://mersenneforum.org/showthread.php?t=25452[/URL]
The program's help output says the 5M fft is usable up to 95.71M.[CODE] 2020-04-18 00:29:06 condorella/rx550 94741139 OK 28880000 30.48%; 16548 us/it; ETA 12d 14:44; dee8912001b65af1 (check 6.82s) 9 errors 2020-04-18 00:40:08 condorella/rx550 Roundoff: N=40500, max 0.300651, avg 0.205114, sdev 0.012225 (0.059601, 0.061244), max-round 0.400715 2020-04-18 00:40:08 condorella/rx550 Carry: N=40499, max 3ddbbf85, avg 2b51cb2f; CarryM: N=1, max 81bd1be4, avg 81bd1be4 2020-04-18 00:40:15 condorella/rx550 94741139 OK 28920000 30.53%; 16549 us/it; ETA 12d 14:35; 75ec23b835c4110e (check 6.91s) 9 errors 2020-04-18 00:51:17 condorella/rx550 Roundoff: N=40500, max 0.287802, avg 0.205098, sdev 0.012112 (0.059053, 0.060665), max-round 0.398885 2020-04-18 00:51:17 condorella/rx550 Carry: N=40499, max 3ab5e0b9, avg 2b54d6af; CarryM: N=1, max 8a7e172e, avg 8a7e172e 2020-04-18 00:51:23 condorella/rx550 94741139 OK 28960000 30.57%; 16543 us/it; ETA 12d 14:17; bd54d7293f55a663 (check 6.80s) 9 errors 2020-04-18 01:02:25 condorella/rx550 Roundoff: N=40500, max 0.303200, avg 0.205065, sdev 0.012201 (0.059499, 0.061136), max-round 0.400285 2020-04-18 01:02:25 condorella/rx550 Carry: N=40499, max 3c80220d, avg 2b50c51e; CarryM: N=1, max 904d2cef, avg 904d2cef 2020-04-18 01:02:32 condorella/rx550 94741139 OK 29000000 30.61%; 16550 us/it; ETA 12d 14:14; 4abacbf0300b8d02 (check 6.78s) 9 errors 2020-04-18 01:13:34 condorella/rx550 Roundoff: N=40500, [B]max 0.507677[/B], avg 0.205098, sdev 0.012272 (0.059834, 0.061490), max-round 0.401449 2020-04-18 01:13:34 condorella/rx550 Carry: N=40499, max 3ca2ec25, avg 2b5134d2; CarryM: N=1, max 8ee0956e, avg 8ee0956e 2020-04-18 01:13:41 condorella/rx550 94741139 [COLOR=Red][B]EE[/B][/COLOR] 29040000 30.65%; 16547 us/it; ETA 12d 14:00; 94dffebb91e5bac9 (check 6.79s) 9 errors 2020-04-18 01:13:48 condorella/rx550 94741139 OK 29000000 loaded: blockSize 400, 4abacbf0300b8d02 2020-04-18 01:24:50 condorella/rx550 Roundoff: N=40928, max 0.308405, avg 0.205040, 
sdev 0.012282 (0.059900, 0.061560), max-round 0.401552 2020-04-18 01:24:50 condorella/rx550 Carry: N=40926, max 3ca2ec25, avg 2b4e7a6d; CarryM: N=2, max 879ffbad, avg 71a4f7ac 2020-04-18 01:24:57 condorella/rx550 94741139 OK 29040000 30.65%; 16552 us/it; ETA 12d 14:05; 94dffebb91e5bac9 (check 6.81s) 10 errors 2020-04-18 01:35:59 condorella/rx550 Roundoff: N=40500, max 0.289172, avg 0.205136, sdev 0.012255 (0.059742, 0.061392), max-round 0.401219 2020-04-18 01:35:59 condorella/rx550 Carry: N=40499, max 3cd22042, avg 2b55bcd1; CarryM: N=1, max 8e3bb9df, avg 8e3bb9df 2020-04-18 01:36:06 condorella/rx550 94741139 OK 29080000 30.69%; 16549 us/it; ETA 12d 13:50; 1af282f82b04e1e9 (check 6.79s) 10 errors 2020-04-18 01:47:08 condorella/rx550 Roundoff: N=40500, max 0.286431, avg 0.205259, sdev 0.012291 (0.059879, 0.061538), max-round 0.401912 2020-04-18 01:47:08 condorella/rx550 Carry: N=40499, max 3c82a79c, avg 2b52a12b; CarryM: N=1, max 7cc1bbcb, avg 7cc1bbcb 2020-04-18 01:47:14 condorella/rx550 94741139 OK 29120000 30.74%; 16545 us/it; ETA 12d 13:35; ca53d4b9ba5f4404 (check 6.80s) 10 errors 2020-04-18 01:58:16 condorella/rx550 Roundoff: N=40500, max 0.311799, avg 0.205057, sdev 0.012195 (0.059469, 0.061105), max-round 0.400171 2020-04-18 01:58:16 condorella/rx550 Carry: N=40499, max 390cef07, avg 2b50e612; CarryM: N=1, max 81a290db, avg 81a290db 2020-04-18 01:58:23 condorella/rx550 94741139 OK 29160000 30.78%; 16549 us/it; ETA 12d 13:29; c18cada5e14a957b (check 6.79s) 10 errors 2020-04-18 02:09:25 condorella/rx550 Roundoff: N=40500, max 0.306561, avg 0.205154, sdev 0.012206 (0.059498, 0.061135), max-round 0.400454 2020-04-18 02:09:25 condorella/rx550 Carry: N=40499, max 3b2efd5d, avg 2b527db0; CarryM: N=1, max 82169439, avg 82169439 2020-04-18 02:09:32 condorella/rx550 94741139 OK 29200000 30.82%; 16544 us/it; ETA 12d 13:12; e8410c7d73b8f089 (check 6.81s) 10 errors 2020-04-18 02:20:34 condorella/rx550 Roundoff: N=40500, max 0.299344, avg 0.205238, sdev 0.012232 (0.059601, 
0.061244), max-round 0.400957 2020-04-18 02:20:34 condorella/rx550 Carry: N=40499, max 39891e5b, avg 2b510501; CarryM: N=1, max 7aa1e3fc, avg 7aa1e3fc[/CODE] |
Quibble - reported roundoff error should never be > 0.5, as the fractional part is computed as abs(x - rnd(x)).
|
I see in this case that the residue was correct when the error was reported (because the line with "OK" on the same iteration has the same residue). Is this a pattern -- do you see the same for the previous errors?
The most likely explanation is still GPU error (either memory-related or processor related). Do you have another similar GPU to try on, for comparison? The roundoff being large is most likely a red herring here. [QUOTE=kriesel;543080]Finally caught a GEC error with -use STATS in effect, after ~ 30 hours, on gpuowl-win v6.11-259, rx550 that was reliable previously. Maybe the fft length's exponent limit is just slightly too high and this exponent is pushing it. See also [URL]https://mersenneforum.org/showpost.php?p=542963&postcount=2094[/URL] and [URL]https://mersenneforum.org/showthread.php?t=25452[/URL] The program's help output says the 5M fft is usable up to 95.71M.[CODE] 2020-04-18 01:13:34 condorella/rx550 Roundoff: N=40500, [B]max 0.507677[/B], avg 0.205098, sdev 0.012272 (0.059834, 0.061490), max-round 0.401449 2020-04-18 01:13:34 condorella/rx550 Carry: N=40499, max 3ca2ec25, avg 2b5134d2; CarryM: N=1, max 8ee0956e, avg 8ee0956e 2020-04-18 01:13:41 condorella/rx550 94741139 [COLOR=Red][B]EE[/B][/COLOR] 29040000 30.65%; 16547 us/it; ETA 12d 14:00; 94dffebb91e5bac9 (check 6.79s) 9 errors 2020-04-18 01:13:48 condorella/rx550 94741139 OK 29000000 loaded: blockSize 400, 4abacbf0300b8d02 2020-04-18 01:24:50 condorella/rx550 Roundoff: N=40928, max 0.308405, avg 0.205040, sdev 0.012282 (0.059900, 0.061560), max-round 0.401552 2020-04-18 01:24:50 condorella/rx550 Carry: N=40926, max 3ca2ec25, avg 2b4e7a6d; CarryM: N=2, max 879ffbad, avg 71a4f7ac 2020-04-18 01:24:57 condorella/rx550 94741139 OK 29040000 30.65%; 16552 us/it; ETA 12d 14:05; 94dffebb91e5bac9 (check 6.81s) 10 errors [/CODE][/QUOTE] |
[QUOTE=ewmayer;543087]Quibble - reported roundoff error should never be > 0.5, as the fractional part is computed as abs(x - rnd(x)).[/QUOTE]
More precisely, given reverse-weight "w" and FFT-output word "x", the error is computed as: abs(FMA(x, w, -rint(x * w))); which, arguably, can be larger than 0.5. |
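A small Python sketch of why the FMA form can land just above 0.5. This is not the gpuowl kernel: the values of x and w are picked by hand, exact rational arithmetic stands in for the unrounded product inside the FMA, and Python's round() (ties to even) stands in for rint. It only shows the rounding argument; it makes no claim to explain the 0.507 values in the logs.

```python
from fractions import Fraction


def naive_error(x: float, w: float) -> float:
    """abs(p - rnd(p)) on the already-rounded product p; never above 0.5."""
    p = x * w
    return abs(p - round(p))


def fma_style_error(x: float, w: float) -> Fraction:
    """abs(x*w - rint(fl(x*w))) with the product x*w kept exact, as an FMA does."""
    nearest = round(x * w)  # rounding is decided from the ROUNDED product
    return abs(Fraction(x) * Fraction(w) - nearest)


# Hand-picked example: the exact product sits slightly above 1048576.5, but the
# floating-point product rounds to 1048576.5, which then rounds (ties-to-even)
# down to 1048576.
x, w = 10485765.0, 0.1

print(naive_error(x, w))             # exactly 0.5
print(float(fma_style_error(x, w)))  # slightly above 0.5
```

The excess over 0.5 is bounded by half an ulp of the product, so it grows with the magnitude of x * w.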
[QUOTE=preda;543102]I see in this case that the residue was correct when the error was reported (because the line with "OK" on the same iteration has the same residue). Is this a pattern -- do you see the same for the previous errors?
The most likely explanation is still GPU error (either memory-related or processor related). Do you have another similar GPU to try on, for comparison? The roundoff being large is most likely a red herring here.[/QUOTE] I now have -use STATS on 3 gpus, for the time being, and will report anything that seems interesting. The performance drag is considerable. That 30 hours to catch one EE with stats could have been run in 25 hours without stats. I may update an rx480 instance and add it to the test; that's on the same dual-hex-core system so might be same cpu core but more likely not unless I start setting core affinities. More likely the same complement of system ram though. [CODE]Preda asks, Were the earlier occurrences' res64s correct despite the EEs? yes 6, unknown 2, no 0; the ayes have it. 1,2, can't tell from logs v6.11-257 2020-04-14 04:59:31 condorella/rx550 94741139 OK 6800000 7.18%; 13712 us/it; ETA 13d 22:58; cab0b7a0fb0cc066 (check 5.65s) 2020-04-14 05:45:17 condorella/rx550 94741139 EE 7000000 7.39%; 13710 us/it; ETA 13d 22:09; 5e731e02beb738ea (check 5.61s) 2020-04-14 05:45:23 condorella/rx550 94741139 OK 6800000 loaded: blockSize 400, cab0b7a0fb0cc066 2020-04-14 05:54:37 condorella/rx550 94741139 OK 6840000 7.22%; 13711 us/it; ETA 13d 22:46; 4f7b98cea0650fb9 (check 5.63s) 1 errors 2020-04-14 06:22:07 condorella/rx550 94741139 OK 6960000 7.35%; 13714 us/it; ETA 13d 22:24; a47542d527e8a188 (check 5.63s) 1 errors 2020-04-14 06:49:37 condorella/rx550 94741139 EE 7080000 7.47%; 13711 us/it; ETA 13d 21:53; b71198a3d710f35b (check 5.62s) 1 errors 2020-04-14 06:49:43 condorella/rx550 94741139 OK 6960000 loaded: blockSize 400, a47542d527e8a188 2020-04-14 07:09:07 condorella/rx550 94741139 OK 7040000 7.43%; 13716 us/it; ETA 13d 22:08; b7ef942604ff7e9d (check 5.62s) 2 errors 2020-04-14 07:27:29 condorella/rx550 94741139 OK 7120000 7.52%; 13710 us/it; ETA 13d 21:42; 5c248da8f1d53306 (check 5.64s) 2 errors 2020-04-14 07:45:51 condorella/rx550 94741139 OK 7200000 
7.60%; 13713 us/it; ETA 13d 21:27; 679505ffa3183075 (check 5.63s) 2 errors 3,4 yes 2020-04-14 10:31:16 condorella/rx550 94741139 OK 7920000 8.36%; 13706 us/it; ETA 13d 18:32; 65631a9de20e1074 (check 5.80s) 2 errors 2020-04-14 10:49:38 condorella/rx550 94741139 EE 8000000 8.44%; 13714 us/it; ETA 13d 18:27; d5fca8bd937ae862 (check 5.62s) 2 errors 2020-04-14 10:49:44 condorella/rx550 94741139 OK 7920000 loaded: blockSize 400, 65631a9de20e1074 2020-04-14 10:58:58 condorella/rx550 94741139 EE 7960000 8.40%; 13708 us/it; ETA 13d 18:27; 5512a7950572a594 (check 5.61s) 3 errors 2020-04-14 10:59:04 condorella/rx550 94741139 OK 7920000 loaded: blockSize 400, 65631a9de20e1074 2020-04-14 11:08:18 condorella/rx550 94741139 OK 7960000 8.40%; 13716 us/it; ETA 13d 18:38; 5512a7950572a594 (check 5.62s) 4 errors 2020-04-14 11:17:32 condorella/rx550 94741139 OK 8000000 8.44%; 13712 us/it; ETA 13d 18:24; d5fca8bd937ae862 (check 5.73s) 4 errors 2020-04-14 11:26:45 condorella/rx550 94741139 OK 8040000 8.49%; 13706 us/it; ETA 13d 18:06; 8443b148d25a6ac2 (check 5.63s) 4 errors 5 yes 2020-04-14 15:17:31 condorella/rx550 94741139 OK 9040000 9.54%; 13708 us/it; ETA 13d 14:20; 1d49e02d84ebc14b (check 5.80s) 4 errors 2020-04-14 15:26:44 condorella/rx550 94741139 EE 9080000 9.58%; 13709 us/it; ETA 13d 14:13; 071b118bfd9c270e (check 5.61s) 4 errors 2020-04-14 15:26:50 condorella/rx550 94741139 OK 9040000 loaded: blockSize 400, 1d49e02d84ebc14b 2020-04-14 15:36:04 condorella/rx550 94741139 OK 9080000 9.58%; 13714 us/it; ETA 13d 14:19; 071b118bfd9c270e (check 5.62s) 5 errors 2020-04-14 15:45:18 condorella/rx550 94741139 OK 9120000 9.63%; 13708 us/it; ETA 13d 14:02; 6a7e1626bbbcb964 (check 5.63s) 5 errors 6 yes 2020-04-14 20:49:58 condorella/rx550 94741139 OK 10440000 11.02%; 13712 us/it; ETA 13d 09:06; d2720a83125e79ee (check 5.62s) 5 errors 2020-04-14 20:59:11 condorella/rx550 94741139 EE 10480000 11.06%; 13717 us/it; ETA 13d 09:03; 947499a4cc1fd4fe (check 5.61s) 5 errors 2020-04-14 20:59:18 
condorella/rx550 94741139 OK 10440000 loaded: blockSize 400, d2720a83125e79ee 2020-04-14 21:08:32 condorella/rx550 94741139 OK 10480000 11.06%; 13722 us/it; ETA 13d 09:10; 947499a4cc1fd4fe (check 5.63s) 6 errors 7,8 yes 2020-04-15 00:31:38 condorella/rx550 94741139 OK 11360000 11.99%; 13710 us/it; ETA 13d 05:33; 98dbac1057665909 (check 5.63s) 6 errors 2020-04-15 00:40:52 condorella/rx550 94741139 EE 11400000 12.03%; 13717 us/it; ETA 13d 05:33; a0464605ba0b9bf6 (check 5.61s) 6 errors 2020-04-15 00:40:58 condorella/rx550 94741139 OK 11360000 loaded: blockSize 400, 98dbac1057665909 2020-04-15 00:50:12 condorella/rx550 94741139 OK 11400000 12.03%; 13715 us/it; ETA 13d 05:30; a0464605ba0b9bf6 (check 5.63s) 7 errors 2020-04-15 00:59:25 condorella/rx550 94741139 EE 11440000 12.07%; 13705 us/it; ETA 13d 05:07; 59fd18a546ca4936 (check 5.61s) 7 errors 2020-04-15 00:59:31 condorella/rx550 94741139 OK 11400000 loaded: blockSize 400, a0464605ba0b9bf6 2020-04-15 01:08:45 condorella/rx550 94741139 OK 11440000 12.07%; 13712 us/it; ETA 13d 05:17; 59fd18a546ca4936 (check 5.72s) 8 errors [/CODE] |
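The tally above was done by eye. A minimal Python sketch of the same check, assuming the log line layout shown in these posts (the regular expression and the function name classify_errors are my own, not part of gpuowl): for each EE line, look for a later OK line at the same iteration and compare the res64 values.

```python
import re

# Matches progress lines such as
# "... 94741139 EE 9080000 9.58%; 13709 us/it; ETA 13d 14:13; 071b118bfd9c270e (check 5.61s)"
# "loaded:" lines carry no percentage after the iteration and are skipped.
LINE = re.compile(r"\b(\d+) (OK|EE)\s+(\d+)\s+\d+\.\d+%;.*?;\s*([0-9a-f]{16}) \(check")


def classify_errors(log_lines):
    """Return {iteration: 'match' | 'mismatch' | 'unknown'} for every EE line."""
    events = []
    for line in log_lines:
        m = LINE.search(line)
        if m:
            events.append((m.group(2), int(m.group(3)), m.group(4)))
    verdicts = {}
    for i, (status, iteration, res64) in enumerate(events):
        if status != "EE":
            continue
        verdict = "unknown"  # no later OK at this iteration in the log
        for later_status, later_iter, later_res in events[i + 1:]:
            if later_status == "OK" and later_iter == iteration:
                verdict = "match" if later_res == res64 else "mismatch"
                break
        verdicts[iteration] = verdict
    return verdicts


SAMPLE = [
    "2020-04-14 05:45:17 condorella/rx550 94741139 EE 7000000 7.39%; 13710 us/it; ETA 13d 22:09; 5e731e02beb738ea (check 5.61s)",
    "2020-04-14 05:45:23 condorella/rx550 94741139 OK 6800000 loaded: blockSize 400, cab0b7a0fb0cc066",
    "2020-04-14 15:26:44 condorella/rx550 94741139 EE 9080000 9.58%; 13709 us/it; ETA 13d 14:13; 071b118bfd9c270e (check 5.61s) 4 errors",
    "2020-04-14 15:36:04 condorella/rx550 94741139 OK 9080000 9.58%; 13714 us/it; ETA 13d 14:19; 071b118bfd9c270e (check 5.62s) 5 errors",
]

print(classify_errors(SAMPLE))
```

On these four lines, the EE at 7000000 comes out as "unknown" (no OK line at that iteration was logged) and the EE at 9080000 as "match", in line with the hand count.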
[QUOTE=kriesel;543107]I now have -use STATS on 3 gpus, for the time being, and will report anything that seems interesting. The performance drag is considerable. That 30 hours to catch one EE with stats could have been run in 25 hours without stats. I may update an rx480 instance and add it to the test; that's on the same dual-hex-core system so might be same cpu core but more likely not unless I start setting core affinities. More likely the same complement of system ram though.
[CODE]Preda asks, Were the earlier occurrences' res64s correct despite the EEs? yes 6, unknown 2, no 0; the ayes have it. [/CODE] [/QUOTE] Interesting. Are you running with any -use options, in particular CARRYM32 ? I'm trying to understand what would produce the "spurious errors" you see. The main computation is correct, the errors are affecting the check only (or something related to the check). I don't see any particular benefit in running this with STATS. There is no danger of a genuine overflow for those exponents. |
[QUOTE=preda;543113]Interesting. Are you running with any -use options, in particular CARRYM32 ?[/QUOTE]NO_ASM and sometimes STATS. That's it.
I gave up for now on trying to keep up on the rapidly shifting availability of optimization choices. They also made operation fragile, since they sometimes would be illegal and terminate the program when fft length changed from one worktodo exponent to the next. What was optimal for one fft length was not allowed on another and the program would terminate, costing hours or days. [QUOTE=kriesel;542963]All on the same RX550 gpu and host system, that has reliably run without GEC errors for multiple 5M PRP first-tests on v6.11-134: 90710093 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.30 bits/word 92858651 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.71 bits/word 93461911 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.83 bits/word 93873049 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 17.90 bits/word 94418047 began FFT 5120K: Width 256x4, Height 64x4, Middle 10; 18.01 bits/word, ran ok. finished at 6M because of problems encountered at 7M on 131.5M, requiring -fft +6 All runs [B]-use NO_ASM[/B] V6.11-134 on rx550 from start on 94741139; no GEC to 2.8M iterations; this was at 6M fft, 18370 us/iter due to problems seen with 7M fft (leftover -fft +6 config.txt content) V6.11-257 continuation on same rx550, 5M fft 1K:5:512; 14310 us/iter to 6.0544M iterations V6.11-257 continuation on same RX550, no fft specification; 1K:10:256 chosen by program; ~13712 us/iter, 9 GEC errors by 22.1M iterations. V6.11-259 continuation with -use STATS on same RX550, 1K:10:256, ~16542 us/iter, no additional GEC through 25.64M iterations. 
V6.11-259 continuation without -use STATS underway now, 13750 usec/iter[/QUOTE] V6.11-257 folder's entire config.txt:[CODE]-device 1 -user kriesel -cpu condorella/rx550 -yield -maxAlloc 3600 -use NO_ASM [/CODE]V6.11-259 folder's entire config.txt:[CODE]-device 1 -user kriesel -cpu condorella/rx550 -yield -maxAlloc 3600 -use NO_ASM,STATS [/CODE]A look through the Windows system log showed nothing relevant around the times of the EE occurrences. |
[QUOTE=preda;543113]I don't see any particular benefit in running this with STATS. There is no danger of a genuine overflow for those exponents.[/QUOTE]I've continued STATS a while. If nothing else, it provides confirmation of the margins built in against roundoff error.
The RX480 instance has been updated to v6.11-264 and has not shown any issues on the same system. I own multiple RX550s, and this may be one of the older ones. EEs #11 and 12 also matched their respective ok res64s:[CODE]2020-04-19 08:41:38 condorella/rx550 94741139 OK 35760000 37.74%; 16554 us/it; ETA 11d 07:13; f181b958d4edbb40 (check 6.79s) 10 errors 2020-04-19 08:52:40 condorella/rx550 Roundoff: N=40500, max 0.507109, avg 0.205147, sdev 0.012284 (0.059880, 0.061539), max-round 0.401694 2020-04-19 08:52:40 condorella/rx550 Carry: N=40499, max 39fed29e, avg 2b55373f; CarryM: N=1, max 8d57bc50, avg 8d57bc50 2020-04-19 08:52:47 condorella/rx550 94741139 EE 35800000 37.79%; 16553 us/it; ETA 11d 07:01; 09e2502ff060aa59 (check 6.80s) 10 errors 2020-04-19 08:52:54 condorella/rx550 94741139 OK 35760000 loaded: blockSize 400, f181b958d4edbb40 2020-04-19 09:03:56 condorella/rx550 Roundoff: N=40928, max 0.292142, avg 0.205081, sdev 0.012278 (0.059870, 0.061527), max-round 0.401532 2020-04-19 09:03:56 condorella/rx550 Carry: N=40926, max 39fed29e, avg 2b52fd08; CarryM: N=2, max 7f8fee13, avg 6ae2001d 2020-04-19 09:04:03 condorella/rx550 94741139 OK 35800000 37.79%; 16549 us/it; ETA 11d 06:57; 09e2502ff060aa59 (check 6.79s) 11 errors 2020-04-19 09:15:05 condorella/rx550 Roundoff: N=40500, max 0.308341, avg 0.205111, sdev 0.012159 (0.059278, 0.060903), max-round 0.399648 2020-04-19 09:15:05 condorella/rx550 Carry: N=40499, max 3ff202a2, avg 2b532f47; CarryM: N=1, max 79fdf537, avg 79fdf537 2020-04-19 09:15:11 condorella/rx550 94741139 OK 35840000 37.83%; 16551 us/it; ETA 11d 06:47; 72081dbe2707fc8b (check 6.80s) 11 errors ... 
2020-04-19 12:02:25 condorella/rx550 94741139 OK 36440000 38.46%; 16554 us/it; ETA 11d 04:05; 371ec60215fd8888 (check 6.83s) 11 errors 2020-04-19 12:13:28 condorella/rx550 Roundoff: N=40500, max 0.317265, avg 0.205178, sdev 0.012361 (0.060245, 0.061923), max-round 0.402951 2020-04-19 12:13:28 condorella/rx550 Carry: N=40499, max 39e4bbc8, avg 2b541ee3; CarryM: N=1, max 88c9f1c4, avg 88c9f1c4 2020-04-19 12:13:34 condorella/rx550 94741139 OK 36480000 38.50%; 16552 us/it; ETA 11d 03:53; cf6fab81c6c3e53d (check 6.79s) 11 errors 2020-04-19 12:24:37 condorella/rx550 Roundoff: N=40500, max 0.507318, avg 0.205169, sdev 0.012261 (0.059762, 0.061414), max-round 0.401350 2020-04-19 12:24:37 condorella/rx550 Carry: N=40499, max 4031f5fd, avg 2b513507; CarryM: N=1, max 7ca22e72, avg 7ca22e72 2020-04-19 12:24:43 condorella/rx550 94741139 EE 36520000 38.55%; 16557 us/it; ETA 11d 03:46; a73fe45b46915805 (check 6.78s) 11 errors 2020-04-19 12:24:51 condorella/rx550 94741139 OK 36480000 loaded: blockSize 400, cf6fab81c6c3e53d 2020-04-19 12:35:56 condorella/rx550 Roundoff: N=40928, max 0.289129, avg 0.205117, sdev 0.012271 (0.059823, 0.061479), max-round 0.401450 2020-04-19 12:35:56 condorella/rx550 Carry: N=40926, max 4031f5fd, avg 2b4ebaf7; CarryM: N=2, max 9fd2a241, avg 7a4f0e2d 2020-04-19 12:36:03 condorella/rx550 94741139 OK 36520000 38.55%; 16643 us/it; ETA 11d 05:10; a73fe45b46915805 (check 6.79s) 12 errors[/CODE]I've just migrated this run in progress again, to v6.11-268 on the same gpu, same config.txt. |
Gpuowl-win v6.11-268-g0d07d21 build
2 Attachment(s)
Under test now.
|
[QUOTE=preda;543106]More precisely, given reverse-weight "w" and FFT-output word "x", the error is computed as:
abs(FMA(x, w, -rint(x * w))); which, arguably, can be larger than 0.5.[/QUOTE] If we agree that x is a given convolution output, which is expected to be an integer using exact arithmetic, and the fractional part of x as computed using inexact arithmetic is the absolute value of the difference between x and nearest_int(x), that is by definition in [0,0.5]. If your actual way of computing said fractional error itself introduces additional error so that your frac(x) != abs(x - nearest_int(x)), that is a separate issue. But I appreciate the clarification. Since gpuOwl is not using stats about such fractional errors to decide if the FFT length needs to be upped, accurately computing frac(x) is less important than for programs which do make use of same. |
[QUOTE=ewmayer;543200]Since gpuOwl is not using stats about such fractional errors to decide if the FFT length needs to be upped, accurately computing frac(x) is less important than for programs which do make use of same.[/QUOTE]
Well, we are using the stats to decide when the FFT length needs to be upped. However, the current code is just as likely to compute the fractional part low as high, and we're using the average max roundoff error, so it all works out in the end. Over the last few weeks we've managed to increase the maximum exponent that can be tested with a 5M FFT by over a million. I had to do this because I'm oh so close to being assigned exponents that would have pushed me into the 5.5M FFT. I know, very selfish :) |
[QUOTE=kriesel;543176]I've just migrated this run in progress again, to v6.11-268 on the same gpu, same config.txt.[/QUOTE]This was still V6.11-259, same rx550 and system:
For EEs # 13, 14, 15, res64s also repeated, and max > 0.50 were observed.[CODE]2020-04-19 18:35:46 condorella/rx550 94741139 OK 37800000 39.90%; 16558 us/it; ETA 10d 21:54; c6ff48ecd994dbaf (check 6.81s) 12 errors 2020-04-19 18:46:49 condorella/rx550 Roundoff: N=40500, [B]max 0.507793[/B], avg 0.205110, sdev 0.012265 (0.059799, 0.061453), max-round 0.401355 2020-04-19 18:46:49 condorella/rx550 Carry: N=40499, max 3938aaef, avg 2b4f1494; CarryM: N=1, max 8da124ce, avg 8da124ce 2020-04-19 18:46:55 condorella/rx550 94741139 [B]EE[/B] 37840000 39.94%; 16556 us/it; ETA 10d 21:41; 55bb210f90f33b08 (check 6.77s) 12 errors 2020-04-19 18:47:03 condorella/rx550 94741139 OK 37800000 loaded: blockSize 400, c6ff48ecd994dbaf 2020-04-19 18:58:05 condorella/rx550 Roundoff: N=40928, max 0.285702, avg 0.205056, sdev 0.012282 (0.059894, 0.061553), max-round 0.401561 2020-04-19 18:58:05 condorella/rx550 Carry: N=40926, max 3938aaef, avg 2b4d3aac; CarryM: N=2, max 8184df04, avg 6c12c45b 2020-04-19 18:58:12 condorella/rx550 94741139 OK 37840000 39.94%; 16561 us/it; ETA 10d 21:45; 55bb210f90f33b08 (check 6.80s) 13 errors 2020-04-19 19:09:14 condorella/rx550 Roundoff: N=40500, max 0.280020, avg 0.205015, sdev 0.012088 (0.058962, 0.060569), max-round 0.398423 2020-04-19 19:09:14 condorella/rx550 Carry: N=40499, max 3a6f97e3, avg 2b4b4bdd; CarryM: N=1, max 80be9c3a, avg 80be9c3a ... 
2020-04-19 21:23:08 condorella/rx550 94741139 OK 38360000 40.49%; 16551 us/it; ETA 10d 19:12; 7a4b392aea8ba6b3 (check 6.79s) 13 errors 2020-04-19 21:34:10 condorella/rx550 Roundoff: N=40500, [B]max 0.505375[/B], avg 0.205216, sdev 0.012323 (0.060051, 0.061719), max-round 0.402390 2020-04-19 21:34:10 condorella/rx550 Carry: N=40499, max 39289dae, avg 2b50054a; CarryM: N=1, max 89e4790c, avg 89e4790c 2020-04-19 21:34:17 condorella/rx550 94741139 [B]EE[/B] 38400000 40.53%; 16552 us/it; ETA 10d 19:03; 72bd6fa0e937b804 (check 6.77s) 13 errors 2020-04-19 21:34:24 condorella/rx550 94741139 OK 38360000 loaded: blockSize 400, 7a4b392aea8ba6b3 2020-04-19 21:45:26 condorella/rx550 Roundoff: N=40928, max 0.308075, avg 0.205175, sdev 0.012357 (0.060226, 0.061904), max-round 0.402885 2020-04-19 21:45:26 condorella/rx550 Carry: N=40926, max 39289dae, avg 2b4e41e9; CarryM: N=2, max 87679ef6, avg 75444180 2020-04-19 21:45:33 condorella/rx550 94741139 OK 38400000 40.53%; 16549 us/it; ETA 10d 19:00; 72bd6fa0e937b804 (check 6.84s) 14 errors 2020-04-19 21:56:35 condorella/rx550 Roundoff: N=40500, max 0.297735, avg 0.205119, sdev 0.012226 (0.059606, 0.061249), max-round 0.400741 2020-04-19 21:56:35 condorella/rx550 Carry: N=40499, max 3b844536, avg 2b5043a2; CarryM: N=1, max 8422e711, avg 8422e711 ... 
2020-04-20 01:50:43 condorella/rx550 94741139 OK 39280000 41.46%; 16541 us/it; ETA 10d 14:49; 656629c4657b02f0 (check 6.84s) 14 errors 2020-04-20 02:01:44 condorella/rx550 Roundoff: N=40500, [B]max 0.503906[/B], avg 0.205072, sdev 0.012220 (0.059587, 0.061229), max-round 0.400584 2020-04-20 02:01:44 condorella/rx550 Carry: N=40499, max 3d252f79, avg 2b52cd29; CarryM: N=1, max 7af459d9, avg 7af459d9 2020-04-20 02:01:51 condorella/rx550 94741139 [B]EE [/B]39320000 41.50%; 16542 us/it; ETA 10d 14:40; 4b750a9575434d29 (check 6.77s) 14 errors 2020-04-20 02:01:58 condorella/rx550 94741139 OK 39280000 loaded: blockSize 400, 656629c4657b02f0 2020-04-20 02:13:00 condorella/rx550 Roundoff: N=40928, max 0.302707, avg 0.205027, sdev 0.012249 (0.059741, 0.061392), max-round 0.401003 2020-04-20 02:13:00 condorella/rx550 Carry: N=40926, max 3d252f79, avg 2b513065; CarryM: N=2, max 82276967, avg 704e0507 2020-04-20 02:13:07 condorella/rx550 94741139 OK 39320000 41.50%; 16543 us/it; ETA 10d 14:40; 4b750a9575434d29 (check 6.80s) 15 errors 2020-04-20 02:24:09 condorella/rx550 Roundoff: N=40500, max 0.297502, avg 0.205151, sdev 0.012245 (0.059690, 0.061338), max-round 0.401078 2020-04-20 02:24:09 condorella/rx550 Carry: N=40499, max 3de87af2, avg 2b53fed7; CarryM: N=1, max 7d3fca8a, avg 7d3fca8a[/CODE]I will swap out the RX550 for a different unit after a trial of v6.11-268 if it also produces such EE occurrences. |
1 Attachment(s)
The biggest issue I see now with gpuOwl+LL is the fact that when something like this happens, you need to restart both of them from scratch. Even if you were able to detect when the "thing" happens, the cards are always a bit "out of phase" (one is always a bit faster), so there is no way to know which one is good and which one is bad, and there is no way to resume from THAT specific iteration. We tried old switches that used to work, like -saveStep or variants; they are not in the help anymore, but we hoped... and hoped...
[ATTACH]22071[/ATTACH] We want checkpoint files, called "exponent.iteration.residue.whatever", otherwise we are doomed to waste two R7s for a few hours on average every time a naughty bit humps here and there... And non-zero shift... And vanilla ice cream... |
[QUOTE=LaurV;543250]The biggest issue I see now with gpuOwl+LL is the fact that when something like this happens, you need to restart both of them from scratch[/QUOTE]You don't do backups?
You could try a tiebreaker third gpu. And/or one running CUDALucas on a suitable gpu. From the CUDALucas.ini file,
[CODE]# SaveAllCheckpoints is the same as the -s option. When active, CUDALucas will
# save each checkpoint separately in the folder specified in the "SaveFolder"
# option above. This is a binary option; set to 1 to activate, 0 to de-activate.
SaveAllCheckpoints=1

# This option is the name of the folder where the separate checkpoint files are
# saved. This option is only checked if SaveAllCheckpoints is activated.
SaveFolder=savefiles[/CODE]
[URL]https://www.mersenneforum.org/showpost.php?p=489059&postcount=2[/URL] |
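For anyone wanting the same behaviour around another program, here is a rough sketch of keeping every checkpoint under a name of the "exponent.iteration.residue.whatever" kind. Everything in it is an assumption for illustration: the function names, the `.ckpt` suffix, and the idea of copying a "current" save file are made up here and are not how gpuowl or CUDALucas actually store their state.

```python
import shutil
from pathlib import Path


def checkpoint_name(exponent, iteration, res64):
    """Build a hypothetical 'exponent.iteration.residue.ckpt' file name."""
    # Lower-case the residue so names compare consistently.
    return f"{exponent}.{iteration}.{res64.lower()}.ckpt"


def archive_checkpoint(current_file, save_folder, exponent, iteration, res64):
    """Copy the current save file into save_folder under a unique name.

    Returns the path of the archived copy. Nothing is ever overwritten
    unless the same exponent, iteration and residue are archived twice.
    """
    save_folder = Path(save_folder)
    save_folder.mkdir(parents=True, exist_ok=True)
    target = save_folder / checkpoint_name(exponent, iteration, res64)
    shutil.copy2(current_file, target)
    return target
```

Called after each verified save, this leaves one file per save point, so a run can be restarted from the last one before a res64 mismatch.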
As usual, unrelated. Lots of clutter.
[QUOTE=LaurV;543250]gpuOwl+LL[/QUOTE] |
[QUOTE=Prime95;543226]Over the last few weeks we've managed to increase the maximum exponent that can be tested with a 5M FFT by over a million.
I had to do this because I'm oh so close to being assigned exponents that would have pushed me into the 5.5M FFT. I know, very selfish :)[/QUOTE] Nice! So what are the default maxp limits for 5 and 5.5M in the latest commit? And how conservative are those, in your estimation? [p.s.: It's only selfish if you have said improvements in place in your local dev-branch, and refuse to share. :] |
LaurV:
You have choices. We all do. As shown before, there are 3 workarounds:
a) backups. User picks how often. Restart runs from the point in time of the last backup with matching res64s.
b) tie-breaker 3rd run. If two or 3 match, great; if none match, some erred.
c) CUDALucas as a run. It has save files each n steps, but requires NVIDIA. It has long been the standard for LL on gpu. It can be rerun from the last save file before the res64 mismatch.
The block of text on how CUDALucas does it was mostly meant for Preda, whose time as a great coder is precious. (That fits for a few more people in GIMPS too.) I don't know if Preda has run CUDALucas. I know you have. Others who read this thread may not have. The choice for the gpuowl user to set save step would be good.
And:
d) code the change you want, and give it to Preda, as George, SELROC, chengsun, kracker etc have done for gpuowl, and others have done for other GIMPS software.
e) do single tests and wait for others to double check them, like most users do, with other software and shift.
f) wait until the feature set you want appears.
ALL on topic, as was [URL]https://www.mersenneforum.org/showpost.php?p=543260&postcount=2111[/URL] |
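The tie-breaker idea in (b) amounts to a majority vote over the reported res64 values. A minimal sketch, assuming each run is identified by some label and reports its residue as a hex string; the function name and the labels are invented for the example:

```python
from collections import Counter


def pick_consensus(res64_by_run):
    """Return (residue, agreeing_runs) if at least two runs agree, else None."""
    # Compare case-insensitively, since programs may print hex differently.
    normalized = {run: res.lower() for run, res in res64_by_run.items()}
    if not normalized:
        return None
    residue, votes = Counter(normalized.values()).most_common(1)[0]
    if votes < 2:
        # No two runs match: at least some of them erred, nothing to trust.
        return None
    agreeing = sorted(run for run, res in normalized.items() if res == residue)
    return residue, agreeing
```

With three runs this returns the residue shared by two or three of them and names the runs that agree; the odd one out is the suspect.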
[QUOTE=ewmayer;543305]Nice! So what are the default maxp limits for 5 and 5.5M in the latest commit? And how conservative are those, in your estimation?[/QUOTE]
From gpuowl -h
[CODE]FFT 5M [ 7.86M - 97.42M] 1K:10:256 1K:5:512 256:10:1K 512:10:512 512:5:1K
FFT 5.50M [ 8.65M - 106.63M] 1K:11:256 256:11:1K 512:11:512
FFT 6M [ 9.44M - 115.86M] 1K:12:256 1K:6:512 1K:3:1K 256:12:1K 512:12:512 512:6:1K 4K:3:256
[/CODE] I'd say the limits are aggressive. |
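One way to put a number on "aggressive" is bits per word: the exponent divided by the FFT length in words. The sketch below assumes 1M means 2^20 words, which agrees with the 17.92 bits/word gpuowl printed earlier in the thread for 131500093 at FFT 7168K. The function name is made up, and the table maxima are rounded to 0.01M, so results for them are approximate.

```python
M = 1 << 20  # assumption: "1M" in the FFT table means 2**20 words


def bits_per_word(exponent, fft_words):
    """Average number of bits of the exponent stored per FFT word."""
    return exponent / fft_words


# Cross-check against the value gpuowl logged for 131500093 at 7168K (= 7M).
print(round(bits_per_word(131500093, 7 * M), 2))  # 17.92

# Upper limit of the 5M FFT from the table above (97.42M, rounded).
print(round(bits_per_word(97.42e6, 5 * M), 2))  # 18.58
```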
[QUOTE=Prime95;543316]From gpuowl -h
[CODE]FFT 5M [ 7.86M - 97.42M] 1K:10:256 1K:5:512 256:10:1K 512:10:512 512:5:1K
FFT 5.50M [ 8.65M - 106.63M] 1K:11:256 256:11:1K 512:11:512
FFT 6M [ 9.44M - 115.86M] 1K:12:256 1K:6:512 1K:3:1K 256:12:1K 512:12:512 512:6:1K 4K:3:256
[/CODE] I'd say the limits are aggressive.[/QUOTE] Indeed - from the version I'm currently on, v6.11-238-g62a3025-dirty:
[code]FFT 5M [ 7.86M - 95.71M] 1K-256-10 256-1K-10 512-512-10
FFT 5632K [ 8.65M - 105.06M] 1K-256-11 256-1K-11 512-512-11
FFT 6M [ 9.44M - 114.40M] 1K-256-12 1K-512-6 256-1K-12 256-2K-6 512-512-12 512-1K-6 2K-256-6[/code]
Just updated to current... wait. there's an issue related to a small change I made in my local primenet.py, which is to re-add a couple lines (i.e. to match the way the Mlucas primenet.py does things) so that '-t 0' means 'run py-script just once and quit'. Renamed my custom version, now we're good. |
[QUOTE=kriesel;543240]I will swap out the RX550 for a different unit after a trial of v6.11-268 if it also produces such EE occurrences.[/QUOTE]Time to swap it out. V6.11-268 had EE #16.
[CODE]2020-04-20 13:25:12 condorella/rx550 94741139 OK 41850000 44.17%; 14679 us/it; ETA 8d 23:40; 24439ce356cbcd12 (check 6.02s) 15 errors
2020-04-20 13:37:26 condorella/rx550 Roundoff: N=50525, mean 0.202943, SD 0.012035, CV 0.059305, [B]max 0.507728[/B], pErr 0.000001
2020-04-20 13:37:26 condorella/rx550 Carry: N=50524, max 3ba0c0a4, avg 2b56dd02; CarryM: N=1, max 7ac075bf, avg 7ac075bf
2020-04-20 13:37:32 condorella/rx550 94741139 [B]EE[/B] 41900000 44.23%; 14680 us/it; ETA 8d 23:29; [B]6dead1fc3993bd7b[/B] (check 6.01s) 15 errors
2020-04-20 13:37:39 condorella/rx550 94741139 OK 41850000 loaded: blockSize 400, 24439ce356cbcd12
2020-04-20 13:49:52 condorella/rx550 Roundoff: N=50953, mean 0.202905, SD 0.012028, CV 0.059281, max 0.299187, pErr 0.000001
2020-04-20 13:49:52 condorella/rx550 Carry: N=50951, max 3ba0c0a4, avg 2b54ff6f; CarryM: N=2, max 825305ba, avg 6b44b674
2020-04-20 13:49:58 condorella/rx550 94741139 [B]OK[/B] 41900000 44.23%; 14670 us/it; ETA 8d 23:20; [B]6dead1fc3993bd7b[/B] (check 6.03s) 16 errors
2020-04-20 14:02:12 condorella/rx550 Roundoff: N=50525, mean 0.203002, SD 0.012050, CV 0.059358, max 0.305012, pErr 0.000001
2020-04-20 14:02:12 condorella/rx550 Carry: N=50524, max 3b45831d, avg 2b4ff588; CarryM: N=1, max 814cae6d, avg 814cae6d[/CODE] |
[QUOTE=ewmayer;543318]Indeed - from the version I'm currently on, v6.11-238-g62a3025-dirty:
[code]FFT 5M [ 7.86M - 95.71M] 1K-256-10 256-1K-10 512-512-10
FFT 5632K [ 8.65M - 105.06M] 1K-256-11 256-1K-11 512-512-11
FFT 6M [ 9.44M - 114.40M] 1K-256-12 1K-512-6 256-1K-12 256-2K-6 512-512-12 512-1K-6 2K-256-6[/code]
Just updated to current... wait. there's an issue related to a small change I made in my local primenet.py, which is to re-add a couple lines (i.e. to match the way the Mlucas primenet.py does things) so that '-t 0' means 'run py-script just once and quit'. Renamed my custom version, now we're good.[/QUOTE] Feel free to submit a pull request with the "-t 0" change. The current upper bound for 5M (97.4M) looks fine to me. |
[QUOTE=kriesel;543322]Time to swap it out. V6.11-268 had EE #16.[/QUOTE]The issue appears at the moment to be a bad memory fan in this HP Z600, resulting in temperatures above operating spec for half the system ram. If that fan died or is spinning too slowly, it would leave the air in the memory fan duct pretty stagnant and warm. I don't know why that would create issues in one gpu's gpuowl run but not the prime95 runs saturating the cpus. There were no GEC errors on that system's prime95 GUI display, or in its log files, going back months. Nor has it affected that system's RX480 gpuowl runs.
Symptoms: "514 Memory fan not detected" message from BIOS on startup, which re-seating did not cure. HWMonitor showed the system ram bank of 3 DIMMs nearer the closer Xeon at 90C+, and the other bank in the 70s. Other Z600s in the same large room running similar workloads on cpus and NVIDIA gpus had memory temps in the 50s. Experimenting with the prime95 instance, turning off half the workers to reduce power at the nearer Xeon, lowered the hotter ram into the 70s. Replacement fan on the way. |
[QUOTE=kriesel;543333]The issue appears at the moment to be a bad memory fan ...
Experimenting with the prime95 instance, turning off half the workers to reduce power at the nearer Xeon, lowered the hotter ram into the 70s. Replacement fan on the way.[/QUOTE]Even after dropping cpu heat, and swapping the gpu for another, it's still getting EEs.[CODE]2020-04-20 19:22:40 gpuowl v6.11-268-g0d07d21 2020-04-20 19:22:40 config: -device 1 -user kriesel -cpu condorella/rx550 -yield -maxAlloc 3600 -use NO_ASM 2020-04-20 19:22:40 device 1, unique id '' 2020-04-20 19:22:40 condorella/rx550 94741139 FFT: 5M 1K:10:256 (18.07 bpw) 2020-04-20 19:22:40 condorella/rx550 Expected maximum carry32: 461E0000 2020-04-20 19:22:41 condorella/rx550 OpenCL args "-DEXP=94741139u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0xf.3cd1fc0411148p-3 -DIWEIGHT_ST EP=0x8.66790bf53aca8p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DPM1=0 -DAMDGPU=1 -DNO_ASM=1 -cl-fast-relaxed-math -cl-st d=CL2.0 " 2020-04-20 19:22:47 condorella/rx550 OpenCL compilation in 5.58 s 2020-04-20 19:22:53 condorella/rx550 94741139 OK 43154000 loaded: blockSize 400, 850d5d673cf6ad49 2020-04-20 19:23:10 condorella/rx550 94741139 OK 43154800 45.55%; 13701 us/it; ETA 8d 04:20; e0021e93eddece6a (check 5.65s) 16 errors 2020-04-20 19:33:38 condorella/rx550 94741139 OK 43200000 45.60%; 13772 us/it; ETA 8d 05:10; 18847855ef4addd5 (check 5.70s) 16 errors 2020-04-20 19:45:13 condorella/rx550 94741139 OK 43250000 45.65%; 13775 us/it; ETA 8d 05:01; c8b93071fb167821 (check 5.64s) 16 errors 2020-04-20 19:56:47 condorella/rx550 94741139 OK 43300000 45.70%; 13770 us/it; ETA 8d 04:45; e36f93de9f65e252 (check 5.64s) 16 errors 2020-04-20 20:08:21 condorella/rx550 94741139 OK 43350000 45.76%; 13771 us/it; ETA 8d 04:35; db7548eeff7fd82d (check 5.64s) 16 errors 2020-04-20 20:19:55 condorella/rx550 94741139 OK 43400000 45.81%; 13766 us/it; ETA 8d 04:19; d5890f6f7bc3bb62 (check 5.64s) 16 errors 2020-04-20 20:31:29 condorella/rx550 94741139 OK 43450000 45.86%; 13763 us/it; ETA 
8d 04:05; a47eafb785a71fa4 (check 5.64s) 16 errors 2020-04-20 20:43:02 condorella/rx550 94741139 EE 43500000 45.91%; 13759 us/it; ETA 8d 03:50; 9c0cad0c6879242b (check 5.73s) 16 errors 2020-04-20 20:43:09 condorella/rx550 94741139 OK 43450000 loaded: blockSize 400, a47eafb785a71fa4 2020-04-20 20:54:42 condorella/rx550 94741139 OK 43500000 45.91%; 13759 us/it; ETA 8d 03:50; 9c0cad0c6879242b (check 5.64s) 17 errors 2020-04-20 21:06:16 condorella/rx550 94741139 OK 43550000 45.97%; 13765 us/it; ETA 8d 03:44; 80f24faafac9b03a (check 5.64s) 17 errors 2020-04-20 21:17:50 condorella/rx550 94741139 OK 43600000 46.02%; 13762 us/it; ETA 8d 03:30; 45d1a03b9cb91819 (check 5.64s) 17 errors 2020-04-20 21:29:24 condorella/rx550 94741139 OK 43650000 46.07%; 13766 us/it; ETA 8d 03:22; fac79b7ec0105d01 (check 5.64s) 17 errors 2020-04-20 21:40:57 condorella/rx550 94741139 OK 43700000 46.13%; 13759 us/it; ETA 8d 03:04; a66ca92be5e6dbb6 (check 5.64s) 17 errors 2020-04-20 21:52:31 condorella/rx550 94741139 OK 43750000 46.18%; 13764 us/it; ETA 8d 02:58; 3740bb97fee487d0 (check 5.64s) 17 errors 2020-04-20 22:04:05 condorella/rx550 94741139 OK 43800000 46.23%; 13764 us/it; ETA 8d 02:46; db25fa854c5db484 (check 5.65s) 17 errors 2020-04-20 22:15:39 condorella/rx550 94741139 OK 43850000 46.28%; 13764 us/it; ETA 8d 02:35; e69e2dbf65d78b2a (check 5.64s) 17 errors 2020-04-20 22:27:13 condorella/rx550 94741139 EE 43900000 46.34%; 13762 us/it; ETA 8d 02:21; 1f68378b7c6fc404 (check 5.63s) 17 errors 2020-04-20 22:27:19 condorella/rx550 94741139 OK 43850000 loaded: blockSize 400, e69e2dbf65d78b2a 2020-04-20 22:38:53 condorella/rx550 94741139 OK 43900000 46.34%; 13761 us/it; ETA 8d 02:20; 1f68378b7c6fc404 (check 5.68s) 18 errors 2020-04-20 22:50:26 condorella/rx550 94741139 OK 43950000 46.39%; 13759 us/it; ETA 8d 02:08; 31bdbf61721379f5 (check 5.68s) 18 errors 2020-04-20 23:02:00 condorella/rx550 94741139 OK 44000000 46.44%; 13762 us/it; ETA 8d 01:58; ab5f29aa5e0616d4 (check 5.64s) 18 errors 2020-04-20 
23:13:34 condorella/rx550 94741139 OK 44050000 46.50%; 13764 us/it; ETA 8d 01:49; d15a6b5993812fc4 (check 5.64s) 18 errors 2020-04-20 23:25:08 condorella/rx550 94741139 OK 44100000 46.55%; 13761 us/it; ETA 8d 01:35; 72acbd04b3d43f04 (check 5.64s) 18 errors 2020-04-20 23:36:41 condorella/rx550 94741139 OK 44150000 46.60%; 13761 us/it; ETA 8d 01:23; 2894cbff475de263 (check 5.64s) 18 errors 2020-04-20 23:48:15 condorella/rx550 94741139 OK 44200000 46.65%; 13764 us/it; ETA 8d 01:15; d3091a2a24f15d8b (check 5.64s) 18 errors 2020-04-20 23:59:49 condorella/rx550 94741139 OK 44250000 46.71%; 13761 us/it; ETA 8d 01:00; d35597a77e451f9b (check 5.64s) 18 errors 2020-04-21 00:11:23 condorella/rx550 94741139 OK 44300000 46.76%; 13762 us/it; ETA 8d 00:50; 092708b97dc11cf0 (check 5.64s) 18 errors 2020-04-21 00:22:56 condorella/rx550 94741139 OK 44350000 46.81%; 13757 us/it; ETA 8d 00:34; a55be7644c8914ff (check 5.64s) 18 errors 2020-04-21 00:34:30 condorella/rx550 94741139 OK 44400000 46.86%; 13761 us/it; ETA 8d 00:26; 6c9cb184d9ae9fb9 (check 5.67s) 18 errors 2020-04-21 00:46:03 condorella/rx550 94741139 OK 44450000 46.92%; 13757 us/it; ETA 8d 00:11; 440bf81e51efd1b8 (check 5.64s) 18 errors 2020-04-21 00:57:37 condorella/rx550 94741139 OK 44500000 46.97%; 13760 us/it; ETA 8d 00:02; 4e2721d94c80f9a9 (check 5.67s) 18 errors 2020-04-21 01:09:11 condorella/rx550 94741139 OK 44550000 47.02%; 13758 us/it; ETA 7d 23:49; acc59d938a878840 (check 5.67s) 18 errors 2020-04-21 01:20:44 condorella/rx550 94741139 OK 44600000 47.08%; 13760 us/it; ETA 7d 23:39; e8ae6b2e1342173a (check 5.64s) 18 errors 2020-04-21 01:32:18 condorella/rx550 94741139 OK 44650000 47.13%; 13758 us/it; ETA 7d 23:26; 7738e5de79a41988 (check 5.64s) 18 errors 2020-04-21 01:43:51 condorella/rx550 94741139 OK 44700000 47.18%; 13754 us/it; ETA 7d 23:11; 0325e62041e2ef93 (check 5.66s) 18 errors 2020-04-21 01:55:25 condorella/rx550 94741139 OK 44750000 47.23%; 13757 us/it; ETA 7d 23:03; ac90cc4d821b536d (check 5.67s) 18 errors 
2020-04-21 02:06:58 condorella/rx550 94741139 OK 44800000 47.29%; 13758 us/it; ETA 7d 22:52; 96fdda068a85c0ec (check 5.64s) 18 errors[/CODE]Next level is in effect now, stop and close prime95. |
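To tally GEC failures in a log like the one above without reading it by eye, a short script can pick out the status lines. This is only a sketch, assuming the layout seen in these excerpts (date, time, the -cpu name, exponent, OK or EE, iteration); the regex and function name are my own and not part of gpuowl, and lines carrying forum [B] markup would need that stripped first.

```python
import re

# date time cpu-name exponent status iteration, as in the excerpts above
STATUS = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \S+ "
    r"(?P<exp>\d+) (?P<status>OK|EE)\s+(?P<iter>\d+) "
)


def summarize(lines):
    """Return (ee_iterations, reload_iterations) found in the log lines."""
    ee_iterations = []
    reload_iterations = []
    for line in lines:
        match = STATUS.match(line)
        if match is None:
            continue  # banner, config, Roundoff/Carry statistics, etc.
        iteration = int(match["iter"])
        if match["status"] == "EE":
            ee_iterations.append(iteration)
        elif " loaded:" in line:
            # An "OK ... loaded:" line marks a restart from a saved state.
            reload_iterations.append(iteration)
    return ee_iterations, reload_iterations
```

Fed the log above, each EE iteration should be followed by a reload of the previous good save point, which is the rollback pattern visible in the excerpt.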
[QUOTE=preda;543327]Feel free to submit a pull request with the "-t 0" change.[/QUOTE]
That's what I did, but without intending to commit my local change - got this error:
[code]git pull https://github.com/preda/gpuowl && make
remote: Enumerating objects: 119, done.
remote: Counting objects: 100% (119/119), done.
remote: Compressing objects: 100% (46/46), done.
remote: Total 136 (delta 96), reused 89 (delta 73), pack-reused 17
Receiving objects: 100% (136/136), 83.73 KiB | 2.20 MiB/s, done.
Resolving deltas: 100% (96/96), completed with 22 local objects.
From https://github.com/preda/gpuowl
 * branch HEAD -> FETCH_HEAD
Updating 62a3025..f1fd1f7
error: Your local changes to the following files would be overwritten by merge:
	tools/primenet.py
Please commit your changes or stash them before you merge.
Aborting[/code]
So this seems a good baby-step introduction to the rev-control setup ... what is the procedure for checking out a file, then testing and submitting a modified version? And what is the code review process you and George have in place?
Oh, another Q re. the latest primenet.py - just tried to use it with same flags I'd always used, -w 150 --tasks 10, to queue up new PRPs, but with the latest got [i]primenet.py: error: argument -w: invalid choice: '150' (choose from 'PRP', 'PM1', 'LL_DC', 'PRP_DC', 'PRP_WORLD_RECORD', 'PRP_100M')[/i] That "numeric value no longer works" appears to be due to a change in the choice=list(..) command - did you deliberately mean to disable numeric-server-worktype code support? |
[QUOTE=ewmayer;543392]That's what I did, but without intending to commit my local change - got this error:
[code]git pull https://github.com/preda/gpuowl && make
remote: Enumerating objects: 119, done.
remote: Counting objects: 100% (119/119), done.
remote: Compressing objects: 100% (46/46), done.
remote: Total 136 (delta 96), reused 89 (delta 73), pack-reused 17
Receiving objects: 100% (136/136), 83.73 KiB | 2.20 MiB/s, done.
Resolving deltas: 100% (96/96), completed with 22 local objects.
From https://github.com/preda/gpuowl
 * branch HEAD -> FETCH_HEAD
Updating 62a3025..f1fd1f7
error: Your local changes to the following files would be overwritten by merge:
	tools/primenet.py
Please commit your changes or stash them before you merge.
Aborting[/code] So this seems a good baby-step introduction to the rev-control setup ... what is the procedure for checking out a file, then testing and submitting a modified version? [/QUOTE]
I wouldn't dare to write a git/github how-to here -- it's too large a subject, and there already are good tutorials out there. But the basic step sequence is:
1. create a github account
2. fork the project to your account (using github interface)
3. "git clone": check out locally *your* clone of the project (because you have write rights on your clone)
4. make local changes
5. "git commit": commit local changes
6. "git push": publish your local commits to your fork
7. using the github interface, create a pull request from your fork to the main project
8. I see the pull request, and I can merge it
[QUOTE] And what is the code review process you and George have in place? [/QUOTE]
It's extremely light right now:
- I commit without any reviews. Sometimes George detects errors I make, and notifies me (so, that's a form of post-commit review :).
- George sends me pull requests. I usually verify them before merging (by compiling and running an exponent for a bit). (the goal of my testing is mainly to detect performance differences between our respective setups)
[QUOTE] Oh, another Q re.
the latest primenet.py - just tried to use it with same flags I'd always used, -w 150 --tasks 10, to queue up new PRPs, but with the latest got [i]primenet.py: error: argument -w: invalid choice: '150' (choose from 'PRP', 'PM1', 'LL_DC', 'PRP_DC', 'PRP_WORLD_RECORD', 'PRP_100M')[/i] That "numeric value no longer works" appears to be due to a change in the choice=list(..) command - did you deliberately mean to disable numeric-server-worktype code support?[/QUOTE] No, disabling the numeric values was unintentional (the goal of the change was to make the help less confusing by not displaying the numeric values there). But, why do you prefer using the numeric value (150) vs. the symbolic name "PRP"? Anyway, I'm fine with adding the numeric ids back if they're useful. |
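For what it's worth, both goals (symbolic names only in the help, numeric ids still accepted) can coexist if the option uses a converter function instead of a choices list. The following is a standalone sketch and not the actual primenet.py code: the parser, function names and help strings are made up, and the only numeric id included is 150 for PRP since that is the one mentioned above; any other ids would have to be filled in from the PrimeNet documentation.

```python
import argparse

# Symbolic names as listed in the error message quoted above.
WORKTYPES = ["PRP", "PM1", "LL_DC", "PRP_DC", "PRP_WORLD_RECORD", "PRP_100M"]

# Assumption: only 150 -> PRP is taken from this thread; extend as needed.
NUMERIC_ALIASES = {"150": "PRP"}


def worktype(value):
    """Accept a symbolic work type or a known numeric id; return the name."""
    name = NUMERIC_ALIASES.get(value, value)
    if name not in WORKTYPES:
        raise argparse.ArgumentTypeError(
            f"invalid work type {value!r} (choose from {', '.join(WORKTYPES)})"
        )
    return name


def build_parser():
    """Hypothetical parser with just the two flags used in the example."""
    parser = argparse.ArgumentParser(prog="primenet-sketch")
    # The help lists only the symbolic names, keeping it uncluttered.
    parser.add_argument("-w", dest="work", type=worktype, default="PRP",
                        metavar="WORKTYPE",
                        help="work type, one of: " + ", ".join(WORKTYPES))
    parser.add_argument("--tasks", type=int, default=1,
                        help="number of tasks to queue")
    return parser
```

With this, `-w 150 --tasks 10` and `-w PRP --tasks 10` parse to the same result, while an unknown value still produces a usage error.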
Ernst, do git stash before doing the pull, and then do git stash pop. If you haven't pulled any conflicting changes, the pop will replay your own changes on top of the latest from GitHub.
|
gpuowl v6.11-270-gf1fd1f7 Win 7 x64 build
2 Attachment(s)
Untested, except help output so far.
|
[QUOTE=kriesel;543345]Even after dropping cpu heat, and swapping the gpu for another, it's still getting EEs.[/QUOTE]Received and installed the replacement fan assembly, $15 used from ebay; these fan assemblies have an unusual 2x2 fan connector that mates when the whole ducted fan assembly is snapped into place, so it seemed money well spent. I was skeptical about whether the old fan was an issue because it did spin if powered on the bench. But the new assembly did a fine job of bringing ram temps from 100C max down to 65-72C among the 6 DIMMs. That's still a bit warmer than the other Z600s I have, but might be because they're at floor level and this is 4 feet above. Early results of lowering it to the floor 30 minutes ago show minimal difference, at 64-71C DIMM temps.
But in nearly a day of running since the fan swap, it's producing more errors than ever. Maybe the Micron ram was permanently damaged? [URL]https://www.micron.com/products/dram/ddr3-sdram[/URL] shows operating limits as low as 95C. Or maybe there's an issue with the particular PCIe slot.
[CODE]2020-04-25 13:13:33 gpuowl v6.11-268-g0d07d21
2020-04-25 13:13:33 config: -device 1 -user kriesel -cpu condorella/rx550 -yield -maxAlloc 3600 -use NO_ASM
2020-04-25 13:13:33 device 1, unique id ''
2020-04-25 13:13:33 condorella/rx550 94741139 FFT: 5M 1K:10:256 (18.07 bpw)
2020-04-25 13:13:33 condorella/rx550 Expected maximum carry32: 461E0000
2020-04-25 13:13:35 condorella/rx550 OpenCL args "-DEXP=94741139u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DWEIGHT_STEP=0xf.3cd1fc0411148p-3 -DIWEIGHT_STEP=0x8.66790bf53aca8p-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DPM1=0 -DAMDGPU=1 -DNO_ASM=1 -cl-fast-relaxed-math -cl-std=CL2.0 "
2020-04-25 13:13:40 condorella/rx550 OpenCL compilation in 5.54 s
2020-04-25 13:13:47 condorella/rx550 94741139 OK 72010000 loaded: blockSize 400, 69fc8cbdf6ee352e
2020-04-25 13:14:03 condorella/rx550 94741139 OK 72010800 76.01%; 13722 us/it; ETA 3d 14:38; 93b608104f71f185 (check 5.65s) 27 errors
...
2020-04-25 18:01:26 condorella/rx550 94741139 OK 73250000 77.32%; 13787 us/it; ETA 3d 10:18; 67324677e938628d (check 5.65s) 27 errors
2020-04-25 18:13:01 condorella/rx550 94741139 EE 73300000 77.37%; 13785 us/it; ETA 3d 10:06; 7da27d1bd2ca79bd (check 5.64s) 27 errors
2020-04-25 18:13:08 condorella/rx550 94741139 OK 73250000 loaded: blockSize 400, 67324677e938628d
2020-04-25 18:24:42 condorella/rx550 94741139 OK 73300000 77.37%; 13784 us/it; ETA 3d 10:06; 7da27d1bd2ca79bd (check 5.65s) 28 errors
2020-04-25 18:36:17 condorella/rx550 94741139 OK 73350000 77.42%; 13783 us/it; ETA 3d 09:54; 1cc91ad65d4d6fb0 (check 5.66s) 28 errors
...
2020-04-26 02:08:13 condorella/rx550 94741139 OK 75300000 79.48%; 13787 us/it; ETA 3d 02:27; 814796c75126ea7f (check 5.66s) 28 errors
2020-04-26 02:19:48 condorella/rx550 94741139 EE 75350000 79.53%; 13783 us/it; ETA 3d 02:14; 5f754504bd9d7e7e (check 5.67s) 28 errors
2020-04-26 02:19:54 condorella/rx550 94741139 OK 75300000 loaded: blockSize 400, 814796c75126ea7f
2020-04-26 02:31:29 condorella/rx550 94741139 OK 75350000 79.53%; 13786 us/it; ETA 3d 02:15; 5f754504bd9d7e7e (check 5.65s) 29 errors
2020-04-26 02:43:04 condorella/rx550 94741139 OK 75400000 79.59%; 13783 us/it; ETA 3d 02:03; 2eb2c8172e41590a (check 5.66s) 29 errors
...
2020-04-26 05:48:23 condorella/rx550 94741139 OK 76200000 80.43%; 13782 us/it; ETA 2d 22:59; 1398624a7e37f481 (check 5.65s) 29 errors
2020-04-26 05:59:58 condorella/rx550 94741139 EE 76250000 80.48%; 13780 us/it; ETA 2d 22:47; acfe1cce4b98f205 (check 5.64s) 29 errors
2020-04-26 06:00:04 condorella/rx550 94741139 OK 76200000 loaded: blockSize 400, 1398624a7e37f481
2020-04-26 06:11:39 condorella/rx550 94741139 OK 76250000 80.48%; 13780 us/it; ETA 2d 22:47; acfe1cce4b98f205 (check 5.65s) 30 errors
2020-04-26 06:23:14 condorella/rx550 94741139 OK 76300000 80.54%; 13779 us/it; ETA 2d 22:35; 886dbb4e437b2eb6 (check 5.67s) 30 errors
...
2020-04-26 07:32:46 condorella/rx550 94741139 OK 76600000 80.85%; 13772 us/it; ETA 2d 21:24; 14aea5c6cb66203e (check 5.65s) 30 errors
2020-04-26 07:44:23 condorella/rx550 94741139 EE 76650000 80.90%; 13820 us/it; ETA 2d 21:27; 3d54908aab697d76 (check 5.66s) 30 errors
2020-04-26 07:44:29 condorella/rx550 94741139 OK 76600000 loaded: blockSize 400, 14aea5c6cb66203e
2020-04-26 07:56:04 condorella/rx550 94741139 EE 76650000 80.90%; 13784 us/it; ETA 2d 21:16; 3d54908aab697d76 (check 5.64s) 31 errors
2020-04-26 07:56:11 condorella/rx550 94741139 OK 76600000 loaded: blockSize 400, 14aea5c6cb66203e
2020-04-26 08:07:46 condorella/rx550 94741139 OK 76650000 80.90%; 13787 us/it; ETA 2d 21:17; 3d54908aab697d76 (check 5.83s) 32 errors
...
2020-04-26 12:22:30 condorella/rx550 94741139 OK 77750000 82.07%; 13774 us/it; ETA 2d 17:01; 16108dac33118d12 (check 5.92s) 32 errors
2020-04-26 12:34:04 condorella/rx550 94741139 OK 77800000 82.12%; 13778 us/it; ETA 2d 16:50; 47d1f28515271fba (check 5.93s) 32 errors
[/CODE]I'll probably try memtest86+ or gpu-slot-swap or both next. Other suggestions? |
Do you have another GPU of the same model that does not exhibit such errors? Otherwise I'd suspect something amiss software-side (i.e. gpuowl, and the related OpenCL compilation).
Anyway on ROCm / Radeon VII I don't see this pattern. [QUOTE=kriesel;543880]
[CODE]2020-04-26 07:32:46 condorella/rx550 94741139 OK 76600000 80.85%; 13772 us/it; ETA 2d 21:24; 14aea5c6cb66203e (check 5.65s) 30 errors
2020-04-26 07:44:23 condorella/rx550 94741139 EE 76650000 80.90%; 13820 us/it; ETA 2d 21:27; 3d54908aab697d76 (check 5.66s) 30 errors
2020-04-26 07:44:29 condorella/rx550 94741139 OK 76600000 loaded: blockSize 400, 14aea5c6cb66203e
2020-04-26 07:56:04 condorella/rx550 94741139 EE 76650000 80.90%; 13784 us/it; ETA 2d 21:16; 3d54908aab697d76 (check 5.64s) 31 errors
2020-04-26 07:56:11 condorella/rx550 94741139 OK 76600000 loaded: blockSize 400, 14aea5c6cb66203e
2020-04-26 08:07:46 condorella/rx550 94741139 OK 76650000 80.90%; 13787 us/it; ETA 2d 21:17; 3d54908aab697d76 (check 5.83s) 32 errors
[/CODE][/QUOTE] |
[QUOTE=preda;543924]Do you have another GPU of the same model that does not exhibit such errors? Otherwise I'd suspect something amiss software-side (i.e. gpuowl, and the related OpenCL compilation).
Anyway on ROCm / Radeon VII I don't see this pattern.[/QUOTE]I have three RX550s. The two that are 4GB have both exhibited the EE occurrence when used during this exponent run. The other is a 2GB and has not been tried there. It could be, since it is idle for the moment while I wait for a replacement power supply for another system. Two days remain on the exponent at RX550 rate. The last 16 hours, after lowering the system to the floor, have gone well on the second 4GB RX550, with no EE during that time in v6.11-268. The RX480 in the same system where the problem occurs is behaving well on a similar exponent PRP, with no EE yet and less than a day remaining at RX480 rate in v6.11-264. The host system does not have adequate power connectors for trying a Radeon VII in the pcie slot where the frequent EE have been observed. |
Preparing to configure new build which will eventually host several Radeon VIIs. In reviewing/updating my personal setup menu, need to make sure I have the ROCm stuff updated for the current version - by default that will be 3.3, yes? And are there any extra command-line flags needed for running gpuOwl under 3.3, by way of working around issues with that ROCm version?
|
[QUOTE=ewmayer;543978]Preparing to configure new build which will eventually host several Radeon VIIs. In reviewing/updating my personal setup menu, need to make sure I have the ROCm stuff updated for the current version - by default that will be 3.3, yes? And are there any extra command-line flags needed for running gpuOwl under 3.3, by way of working around issues with that ROCm version?[/QUOTE]
Yes, I think at the moment ROCm 3.3 is the most recent version, and what you get by default. The ROCm-bug-workaround is enabled by default, no special action needed. |
Which is the latest stable version that supports LL? I'm currently using a build from kriesel (gpuowl-v6.11-268-g0d07d21), but it gives me
[CODE]Assertion failed: 0 <= w && w < (1 << nBits), file state.cpp, line 22[/CODE] constantly, on both of my R9 290, and I doubt that both of them went bad so close in time, especially because they are from different production batches. A lot of the results did not match the first LL, some did. |
[QUOTE=kruoli;544049]Which is the latest stable version that supports LL? I'm currently using a build from kriesel (gpuowl-v6.11-268-g0d07d21), but it gives me
[CODE]Assertion failed: 0 <= w && w < (1 << nBits), file state.cpp, line 22[/CODE] constantly, on both of my R9 290, and I doubt that both of them went bad so close in time, especially because they are from different production batches. A lot of the results did not match the first LL, some did.[/QUOTE] LL is experimental in GpuOwl ATM. The assert failing may indicate a bug. Could you please indicate repro steps: what exponent, when it happens, how often it happens (every time?) etc. Basically what you think would allow the developers to reproduce the problem you see -- this would allow us to debug it. At the minimum a log excerpt would also be helpful. If you see any LL mismatching, you should bring it up because it's more likely an error on gpuowl's side than a genuine mismatch. Before doing LL on an exponent range, you should validate by doing a few iterations of PRP on the exponent -- if that works fine then LL stands a chance. |
Okay, thank you for the information! Somehow I thought there had been working LL in the past, but I guess I confused it with CUDALucas etc.
A few LL ran fine without any errors and matched (e.g. [URL="https://www.mersenne.org/report_exponent/?exp_lo=57234283&full=1"]M57234283[/URL]), but others went erroneous (e.g. [URL="https://www.mersenne.org/report_exponent/?exp_lo=57234167&full=1"]M57234167[/URL], [URL="https://www.mersenne.org/report_exponent/?exp_lo=57234179&full=1"]M57234179[/URL], [URL="https://www.mersenne.org/report_exponent/?exp_lo=57233941&full=1"]M57233941[/URL], [URL="https://www.mersenne.org/report_exponent/?exp_lo=55297621&full=1"]M55233941[/URL]). I uploaded the full logs and residue folders (I guess, that's what they are) compressed for both cards I ran it on [URL="http://mc.oliver-kruse.de/GIMPS/gpuOwl"]here[/URL]. |
Did you tune gpuowl parameters for LL tests? I found out you should only tune for PRP tests and use the parameters that work for PRP for LL tests as well. There is no error checking on LL tests, so you do not know whether you have tuned so far that it no longer works correctly.
|
[QUOTE=ATH;544065]Did you tune gpuowl parameters for LL tests?[/QUOTE]
No, I have not tuned at all, because I did not see such an option in the "-h" menu. Maybe a bit foolish... |