[QUOTE=preda;541444]Could you run with -use ROUNDOFF paired in turn with ORIG_SLOWTRIG/NEW_SLOWTRIG and look at the average roundoff error to evaluate their respective accuracy. If ORIG_SLOWTRIG is similarly accurate to NEW_SLOWTRIG we may consider making it the default on Nvidia.
Could other Nvidia users speak up if those proposed Nvidia defaults have adverse performance effects for them (due to different hardware)?[/QUOTE] I tested the first 100K iterations of 95,000,011:
ORIG_SLOWTRIG: Roundoff: N=10374, max 0.312500, avg 0.212775
NEW_SLOWTRIG: Roundoff: N=10374, max 0.312500, avg 0.214292
I can try to test on my own 2080 if I can compile gpuowl on Windows or find a newer compiled version.
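As a rough illustration (plain arithmetic on the two averages posted above, nothing gpuowl-specific), a short Python snippet for the relative difference in average roundoff:

```python
# Average roundoff reported above for the first 100K iterations of
# exponent 95,000,011 (N=10374 samples, max 0.3125 for both variants).
ORIG_AVG = 0.212775
NEW_AVG = 0.214292


def relative_diff(a: float, b: float) -> float:
    """Return (b - a) / a: how much larger b is than a, as a fraction."""
    return (b - a) / a


diff = relative_diff(ORIG_AVG, NEW_AVG)
print(f"NEW_SLOWTRIG avg roundoff is {diff:.2%} higher than ORIG_SLOWTRIG")
```

Both variants hit the same maximum (0.3125), and the averages differ by well under 1%.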
RTX 2080 is so bad at double precision, and the timings are very inconsistent.
But NEW_SLOWTRIG is better, at 3520µs/ite vs 3680µs/ite for ORIG_SLOWTRIG. T2_SHUFFLE is slightly better, at 3520µs vs 3553µs for NO_T2_SHUFFLE. Otherwise, CARRY64 and CARRY32 are about the same. I'm not going to test all those 6 variables on this card, since it is very slow and the inconsistencies in the timings are larger than the differences. Btw, UNROLL_NONE, UNROLL_WIDTH and UNROLL_HEIGHT do not work at all on either the Tesla P100 or the RTX 2080.
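To see how large these differences are relative to each other, a small Python sketch (simple arithmetic on the RTX 2080 timings above; the `speedup` helper is just an illustrative name) computing the fraction of time saved per iteration:

```python
# Per-iteration timings (microseconds) reported above for an RTX 2080.
timings_us = {
    "NEW_SLOWTRIG": 3520,
    "ORIG_SLOWTRIG": 3680,
    "T2_SHUFFLE": 3520,
    "NO_T2_SHUFFLE": 3553,
}


def speedup(fast_us: float, slow_us: float) -> float:
    """Fraction of per-iteration time saved by the faster option."""
    return (slow_us - fast_us) / slow_us


trig = speedup(timings_us["NEW_SLOWTRIG"], timings_us["ORIG_SLOWTRIG"])
shuffle = speedup(timings_us["T2_SHUFFLE"], timings_us["NO_T2_SHUFFLE"])
print(f"NEW_SLOWTRIG saves {trig:.1%}, T2_SHUFFLE saves {shuffle:.1%} per iteration")
```

Given that the timing noise on this card is described as larger than the differences, the roughly 1% T2_SHUFFLE gain in particular should be read with caution.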
Should be fixed in the most recent commit, please re-try.
This was again the ROCm optimizer generating broken code for our own sin/cos in some particular cases, which we try carefully to avoid. When seeing unexplained failures like this one, it is often useful to try -use ORIG_SLOWTRIG, as that usually works (though it is slower). [QUOTE=kriesel;541479]I don't know why, but -fft 0 through -fft +5 all hit EE at iteration 800 on this exponent, 131500093. Gpuowl v6.11-134-g1e0ce1d chose the initial 7M FFT length on its own. After finding the failure reproducible, I successively incremented -fft to seek a reliable run case. It wasn't until it reached the 9M FFT that it passed the GEC. The resulting speed penalty is considerable: 7.5 msec/iter versus 5.3 on an RX480. From the program's help output,[CODE]FFT 7M [ 11.01M - 132.46M] 1K-512-7 256-2K-7 512-1K-7 2K-256-7
FFT 8M [ 12.58M - 150.85M] 2K-2K 4K-1K
FFT 9M [ 14.16M - 169.18M] 1K-512-9 256-2K-9 512-1K-9 2K-256-9[/CODE][CODE]C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win
2020-04-01 07:47:57 gpuowl v6.11-134-g1e0ce1d
2020-04-01 07:47:57 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -use NO_ASM
2020-04-01 07:47:57 device 0, unique id ''
2020-04-01 07:47:57 condorella/rx480 131500093 FFT 7168K: Width 256x4, Height 64x8, Middle 7; 17.92 bits/word
2020-04-01 07:47:59 condorella/rx480 OpenCL args "-DEXP=131500093u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=7u -DWEIGHT_STEP=0x8.7b964bd91a558p-3 -DIWEIGHT_STEP=0xf.16e489ea55fc8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DAMDGPU=1 -DNO_ASM=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-04-01 07:48:03 condorella/rx480 OpenCL compilation in 3.97 s
2020-04-01 07:48:06 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:48:13 condorella/rx480 131500093 EE 800 0.00%; 5272 us/it; ETA 8d 00:34; 6781adfa7991c92a (check 2.31s)
2020-04-01 07:48:15 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:48:22 condorella/rx480 131500093 EE 800 0.00%; 5309 us/it; ETA 8d 01:56; 6781adfa7991c92a (check 2.31s) 1 errors
2020-04-01 07:48:24 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:48:31 condorella/rx480 131500093 EE 800 0.00%; 5298 us/it; ETA 8d 01:32; 6781adfa7991c92a (check 2.33s) 2 errors
2020-04-01 07:48:31 condorella/rx480 3 sequential errors, will stop.
2020-04-01 07:48:31 condorella/rx480 Exiting because "too many errors"
2020-04-01 07:48:31 condorella/rx480 Bye
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>g611
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win
2020-04-01 07:48:50 gpuowl v6.11-134-g1e0ce1d
2020-04-01 07:48:50 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -use NO_ASM -fft +1
2020-04-01 07:48:50 device 0, unique id ''
2020-04-01 07:48:50 condorella/rx480 131500093 FFT 7168K: Width 64x4, Height 256x8, Middle 7; 17.92 bits/word
2020-04-01 07:48:53 condorella/rx480 OpenCL args "-DEXP=131500093u -DWIDTH=256u -DSMALL_HEIGHT=2048u -DMIDDLE=7u -DWEIGHT_STEP=0x8.7b964bd91a558p-3 -DIWEIGHT_STEP=0xf.16e489ea55fc8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DAMDGPU=1 -DNO_ASM=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-04-01 07:48:57 condorella/rx480 OpenCL compilation in 4.67 s
2020-04-01 07:49:01 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:49:11 condorella/rx480 131500093 EE 800 0.00%; 7714 us/it; ETA 11d 17:46; 55f854bea6c1cecf (check 3.28s)
2020-04-01 07:49:14 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:49:24 condorella/rx480 131500093 EE 800 0.00%; 7697 us/it; ETA 11d 17:10; 55f854bea6c1cecf (check 3.29s) 1 errors
2020-04-01 07:49:27 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:49:37 condorella/rx480 131500093 EE 800 0.00%; 7687 us/it; ETA 11d 16:46; 55f854bea6c1cecf (check 3.27s) 2 errors
2020-04-01 07:49:37 condorella/rx480 3 sequential errors, will stop.
2020-04-01 07:49:37 condorella/rx480 Exiting because "too many errors"
2020-04-01 07:49:37 condorella/rx480 Bye
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>g611
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win
2020-04-01 07:50:25 gpuowl v6.11-134-g1e0ce1d
2020-04-01 07:50:25 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -use NO_ASM -fft +2
2020-04-01 07:50:25 device 0, unique id ''
2020-04-01 07:50:25 condorella/rx480 131500093 FFT 7168K: Width 64x8, Height 256x4, Middle 7; 17.92 bits/word
2020-04-01 07:50:27 condorella/rx480 OpenCL args "-DEXP=131500093u -DWIDTH=512u -DSMALL_HEIGHT=1024u -DMIDDLE=7u -DWEIGHT_STEP=0x8.7b964bd91a558p-3 -DIWEIGHT_STEP=0xf.16e489ea55fc8p-4 -DWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-3 -DIWEIGHT_BIGSTEP=0xc.5672a115506d8p-4 -DAMDGPU=1 -DNO_ASM=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-04-01 07:50:31 condorella/rx480 OpenCL compilation in 3.72 s
2020-04-01 07:50:34 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:50:42 condorella/rx480 131500093 EE 800 0.00%; 6286 us/it; ETA 9d 13:37; 6f8253cbb2fe58e9 (check 2.71s)
2020-04-01 07:50:45 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:50:53 condorella/rx480 131500093 EE 800 0.00%; 6283 us/it; ETA 9d 13:29; 6f8253cbb2fe58e9 (check 2.71s) 1 errors
2020-04-01 07:50:56 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:51:03 condorella/rx480 131500093 EE 800 0.00%; 6299 us/it; ETA 9d 14:05; 6f8253cbb2fe58e9 (check 2.71s) 2 errors
2020-04-01 07:51:03 condorella/rx480 3 sequential errors, will stop.
2020-04-01 07:51:03 condorella/rx480 Exiting because "too many errors"
2020-04-01 07:51:03 condorella/rx480 Bye
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>g611
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win
2020-04-01 07:51:29 gpuowl v6.11-134-g1e0ce1d
2020-04-01 07:51:29 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -use NO_ASM -fft +3
2020-04-01 07:51:29 device 0, unique id ''
2020-04-01 07:51:29 condorella/rx480 131500093 FFT 7168K: Width 256x8, Height 64x4, Middle 7; 17.92 bits/word
2020-04-01 07:51:29 condorella/rx480 using long carry kernels
2020-04-01 07:51:32 condorella/rx480 OpenCL args "-DEXP=131500093u -DWIDTH=2048u -DSMALL_HEIGHT=256u -DMIDDLE=7u -DWEIGHT_STEP=0x8.7b964bd91a558p-3 -DIWEIGHT_STEP=0xf.16e489ea55fc8p-4 -DWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-3 -DIWEIGHT_BIGSTEP=0xc.5672a115506d8p-4 -DAMDGPU=1 -DNO_ASM=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-04-01 07:51:36 condorella/rx480 OpenCL compilation in 3.97 s
2020-04-01 07:51:39 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:51:46 condorella/rx480 131500093 EE 800 0.00%; 5275 us/it; ETA 8d 00:42; cfbd904e74b67aae (check 2.31s)
2020-04-01 07:51:48 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:51:54 condorella/rx480 131500093 EE 800 0.00%; 5249 us/it; ETA 7d 23:44; cfbd904e74b67aae (check 2.29s) 1 errors
2020-04-01 07:51:57 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:52:03 condorella/rx480 131500093 EE 800 0.00%; 5239 us/it; ETA 7d 23:23; cfbd904e74b67aae (check 2.29s) 2 errors
2020-04-01 07:52:03 condorella/rx480 3 sequential errors, will stop.
2020-04-01 07:52:03 condorella/rx480 Exiting because "too many errors"
2020-04-01 07:52:03 condorella/rx480 Bye
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>g611
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win
2020-04-01 07:52:07 gpuowl v6.11-134-g1e0ce1d
2020-04-01 07:52:07 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -use NO_ASM -fft +4
2020-04-01 07:52:07 device 0, unique id ''
2020-04-01 07:52:07 condorella/rx480 131500093 FFT 8192K: Width 256x8, Height 256x8; 15.68 bits/word
2020-04-01 07:52:07 condorella/rx480 using long carry kernels
2020-04-01 07:52:10 condorella/rx480 OpenCL args "-DEXP=131500093u -DWIDTH=2048u -DSMALL_HEIGHT=2048u -DMIDDLE=1u -DWEIGHT_STEP=0xa.039f00d8f95f8p-3 -DIWEIGHT_STEP=0xc.c82be96a7181p-4 -DWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-3 -DIWEIGHT_BIGSTEP=0xc.5672a115506d8p-4 -DAMDGPU=1 -DNO_ASM=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-04-01 07:52:15 condorella/rx480 OpenCL compilation in 5.16 s
2020-04-01 07:52:18 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:52:27 condorella/rx480 131500093 EE 800 0.00%; 6583 us/it; ETA 10d 00:28; 05252a7f59574e37 (check 2.85s)
2020-04-01 07:52:30 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:52:38 condorella/rx480 131500093 EE 800 0.00%; 6587 us/it; ETA 10d 00:36; 05252a7f59574e37 (check 2.85s) 1 errors
2020-04-01 07:52:41 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:52:49 condorella/rx480 131500093 EE 800 0.00%; 6594 us/it; ETA 10d 00:53; 05252a7f59574e37 (check 2.86s) 2 errors
2020-04-01 07:52:49 condorella/rx480 3 sequential errors, will stop.
2020-04-01 07:52:49 condorella/rx480 Exiting because "too many errors"
2020-04-01 07:52:49 condorella/rx480 Bye
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>g611
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win
2020-04-01 07:53:21 gpuowl v6.11-134-g1e0ce1d
2020-04-01 07:53:21 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -use NO_ASM -fft +5
2020-04-01 07:53:21 device 0, unique id ''
2020-04-01 07:53:21 condorella/rx480 131500093 FFT 8192K: Width 512x8, Height 256x4; 15.68 bits/word
2020-04-01 07:53:21 condorella/rx480 using long carry kernels
2020-04-01 07:53:23 condorella/rx480 OpenCL args "-DEXP=131500093u -DWIDTH=4096u -DSMALL_HEIGHT=1024u -DMIDDLE=1u -DWEIGHT_STEP=0xa.039f00d8f95f8p-3 -DIWEIGHT_STEP=0xc.c82be96a7181p-4 -DWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-3 -DIWEIGHT_BIGSTEP=0xc.5672a115506d8p-4 -DAMDGPU=1 -DNO_ASM=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-04-01 07:53:26 condorella/rx480 OpenCL compilation in 3.53 s
2020-04-01 07:53:30 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:53:39 condorella/rx480 131500093 EE 800 0.00%; 7196 us/it; ETA 10d 22:51; 6df742314b82f841 (check 3.11s)
2020-04-01 07:53:42 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:53:51 condorella/rx480 131500093 EE 800 0.00%; 7219 us/it; ETA 10d 23:43; 6df742314b82f841 (check 3.11s) 1 errors
2020-04-01 07:53:54 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:54:03 condorella/rx480 131500093 EE 800 0.00%; 7190 us/it; ETA 10d 22:38; 6df742314b82f841 (check 3.10s) 2 errors
2020-04-01 07:54:03 condorella/rx480 3 sequential errors, will stop.
2020-04-01 07:54:03 condorella/rx480 Exiting because "too many errors"
2020-04-01 07:54:03 condorella/rx480 Bye
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>g611
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win
2020-04-01 07:54:08 gpuowl v6.11-134-g1e0ce1d
2020-04-01 07:54:08 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -use NO_ASM -fft +6
2020-04-01 07:54:08 device 0, unique id ''
2020-04-01 07:54:08 condorella/rx480 131500093 FFT 9216K: Width 256x4, Height 64x8, Middle 9; 13.93 bits/word
2020-04-01 07:54:08 condorella/rx480 using long carry kernels
2020-04-01 07:54:12 condorella/rx480 OpenCL args "-DEXP=131500093u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=9u -DWEIGHT_STEP=0x8.5f7e7ead6051p-3 -DIWEIGHT_STEP=0xf.498539ec95fe8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DAMDGPU=1 -DNO_ASM=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-04-01 07:54:16 condorella/rx480 OpenCL compilation in 4.11 s
2020-04-01 07:54:20 condorella/rx480 131500093 OK 0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:54:29 condorella/rx480 131500093 OK 800 0.00%; 7461 us/it; ETA 11d 08:32; bbe24bd13cd73020 (check 3.26s)
2020-04-01 08:19:33 condorella/rx480 131500093 OK 200000 0.15%; 7541 us/it; ETA 11d 11:03; 190bb27ff665f83b (check 3.25s)
[/CODE][/QUOTE]
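The FFT sizes in the help output can be cross-checked against the log. A small Python sketch, assuming (inferred from the Width/Height/Middle lines in the log, not from gpuowl documentation) that an old-style spec "W-H-M" means 2 × W × H × M words, reproduces the 7168K/8192K/9216K lengths and the bits/word figures gpuowl printed. `parse_size` and `fft_words` are hypothetical helpers, not gpuowl code.

```python
def parse_size(token: str) -> int:
    """Parse a dimension such as '1K', '2K' or '512' (K = 1024)."""
    if token.endswith("K"):
        return int(token[:-1]) * 1024
    return int(token)


def fft_words(spec: str) -> int:
    """FFT length in words for an old-style 'WIDTH-HEIGHT[-MIDDLE]' spec.

    Assumption: length = 2 * width * height * middle, with middle = 1 when
    the spec has only two parts (e.g. '2K-2K').
    """
    parts = spec.split("-")
    width, height = parse_size(parts[0]), parse_size(parts[1])
    middle = int(parts[2]) if len(parts) > 2 else 1
    return 2 * width * height * middle


EXPONENT = 131500093
LIMIT_7M = 132.46e6  # upper end of the 7M range in the -h output

for spec in ("1K-512-7", "2K-2K", "4K-1K", "1K-512-9"):
    words = fft_words(spec)
    print(f"{spec:>9}: {words // 1024}K words, {EXPONENT / words:.2f} bits/word")

print(f"{EXPONENT} is {EXPONENT / LIMIT_7M:.1%} of the 7M limit")
```

This shows 131500093 sits near the top of the 7M range, but that alone does not explain the failures: the 8M runs at only 15.68 bits/word failed as well, and the cause is attributed above to a ROCm optimizer bug.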
[QUOTE=ATH;541561]RTX 2080 is so bad at double precision, and the timings are very inconsistent.
But NEW_SLOWTRIG is better, at 3520µs/ite vs 3680µs/ite for ORIG_SLOWTRIG. T2_SHUFFLE is slightly better, at 3520µs vs 3553µs for NO_T2_SHUFFLE. Otherwise, CARRY64 and CARRY32 are about the same. I'm not going to test all those 6 variables on this card, since it is very slow and the inconsistencies in the timings are larger than the differences. Btw, UNROLL_NONE, UNROLL_WIDTH and UNROLL_HEIGHT do not work at all on either the Tesla P100 or the RTX 2080.[/QUOTE] Thank you. This seems to suggest keeping the defaults unchanged for Nvidia (as they are better on at least some Nvidia GPUs). Nvidia users can still tune by trying ORIG_SLOWTRIG and NO_T2_SHUFFLE if so inclined.
ROCm 3.3
ROCm 3.1 has a severe bug that is affecting our sin/cos routines, and we had to use a workaround for that bug.
Recently ROCm 3.3 was released, and it seems this bug is fixed, so the ROCm 3.1 workarounds are not needed anymore (they do have a slight perf impact). Right now ROCm 3.3 performance is very close to ROCm 3.1, a tiny bit slower (less than 0.3% slower). On the other hand, ROCm 3.3 may have a few advantages: for one, it uses fewer VGPRs (and it doesn't have the terrible ROCm 3.1 bug). In a recent commit I disabled the ROCm 3.1 bug workaround by default; it must now be explicitly enabled with -use ROCM31, otherwise PRP on ROCm 3.1 produces errors. A slower alternative is ORIG_SLOWTRIG, which does not trigger the bug.

So in brief:
- ROCm 3.3 is OK and can be used
- if using ROCm 3.1, you *must* specify -use ROCM31 or -use ORIG_SLOWTRIG
(for users who are now on ROCm 2.10 or earlier, I recommend moving directly to 3.3, skipping 3.1)

It is also possible to have multiple ROCm versions installed at the same time (useful when one wants to experiment and compare versions); here is one way to do it:
- install multiple ROCm versions in separate folders, e.g. /opt/rocm-3.1.0/ and /opt/rocm-3.3.0/
- verify that the ROCm folder containing libamdocl64.so is listed in the LIBPATH in the Makefile or SConstruct
- edit the Makefile to link with -lamdocl64 instead of -lOpenCL (or build with scons)
- when running gpuowl, set LD_LIBRARY_PATH to the folder with libamdocl64 for the desired ROCm version, e.g. LD_LIBRARY_PATH=/opt/rocm-3.3.0/opencl/lib/x86_64 ./gpuowl
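If you switch between ROCm installs often, the last step can be wrapped in a small launcher. A minimal Python sketch, assuming the folder layout above: the 3.3.0 path is the one given in the post, while the 3.1.0 path simply assumes the same layout and may differ on your system. `gpuowl_env` and `run_gpuowl` are hypothetical helper names.

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence

# Per-version OpenCL library folders. The 3.3.0 path is the one given above;
# the 3.1.0 path assumes the same layout and may differ on your system.
ROCM_OPENCL_LIBS = {
    "3.1": "/opt/rocm-3.1.0/opencl/lib/x86_64",
    "3.3": "/opt/rocm-3.3.0/opencl/lib/x86_64",
}


def gpuowl_env(rocm_version: str, base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Copy the environment and point LD_LIBRARY_PATH at one ROCm's libamdocl64."""
    env = dict(os.environ if base_env is None else base_env)
    # Overwrite rather than prepend, so a stale ROCm path cannot shadow this one.
    env["LD_LIBRARY_PATH"] = ROCM_OPENCL_LIBS[rocm_version]
    return env


def run_gpuowl(rocm_version: str, args: Sequence[str] = (), exe: str = "./gpuowl") -> int:
    """Run gpuowl against the chosen ROCm version and return its exit code."""
    return subprocess.call([exe, *args], env=gpuowl_env(rocm_version))
```

Usage: `run_gpuowl("3.3")` is equivalent to `LD_LIBRARY_PATH=/opt/rocm-3.3.0/opencl/lib/x86_64 ./gpuowl`, and per the advice above a ROCm 3.1 run would be `run_gpuowl("3.1", ["-use", "ROCM31"])`.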
[QUOTE=preda;541738]ROCm 3.1 has a severe bug that is affecting our sin/cos routines, and we had to use a workaround for that bug.
Recently ROCm 3.3 was released, and it seems this bug is fixed, so the ROCm 3.1 workarounds are not needed anymore (they do have a slight perf impact). [/QUOTE] I spoke too soon: it seems the bug still affects ROCm 3.3 (though not for exactly the same exponents), so I have re-enabled the workaround by default, as it was before.
I'm glad I'm a late adopter with these things - still on rocm 2.10 :). Mihai, bugs aside, roughly what % speedup can one expect from moving from 2.10 to 3.3?
[QUOTE=ewmayer;541792]I'm glad I'm a late adopter with these things - still on rocm 2.10 :). Mihai, bugs aside, roughly what % speedup can one expect from moving from 2.10 to 3.3?[/QUOTE]
I would guess something in the range 2% - 4%.
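To translate a 2-4% speedup into wall-clock terms, a rough Python sketch. Assumptions: the full test takes about one squaring per bit of the exponent, so ETA ≈ exponent × time per iteration, and "X% faster" means the per-iteration time drops by X%. The sample numbers (131500093 at 5272 us/it) are borrowed from the RX480 log earlier in this thread purely as an example.

```python
def eta_seconds(exponent: int, us_per_iter: float) -> float:
    """Approximate full-test time: about one squaring per bit of the exponent."""
    return exponent * us_per_iter * 1e-6


def fmt_eta(seconds: float) -> str:
    """Format seconds like gpuowl's ETA field, e.g. '8d 00:34'."""
    minutes = int(seconds // 60)
    days, rem = divmod(minutes, 24 * 60)
    hours, mins = divmod(rem, 60)
    return f"{days}d {hours:02d}:{mins:02d}"


EXPONENT = 131500093
BASE_US = 5272  # us/it at 7M FFT, from the RX480 log earlier in this thread

for gain in (0.00, 0.02, 0.04):
    eta = fmt_eta(eta_seconds(EXPONENT, BASE_US * (1 - gain)))
    print(f"{gain:.0%} faster: {eta}")
```

At the base rate this reproduces the "ETA 8d 00:34" gpuowl printed; a 2-4% gain would take roughly 4 to 8 hours off that 8-day test.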
FYI, I'm working on re-enabling LL in gpuowl (intended for DC). (I do not plan to add offset or Jacobi check though)
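For readers less familiar with LL, a minimal Python sketch of the textbook Lucas-Lehmer test using Python big integers. It only illustrates the recurrence: gpuowl performs the squarings with FFT multiplication on the GPU, and this sketch has neither an offset nor a Jacobi check.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if the Mersenne number 2**p - 1 is prime (p must be prime).

    Textbook Lucas-Lehmer: s0 = 4, s(i+1) = s(i)**2 - 2 mod M, and M is prime
    iff s(p-2) == 0. Only practical for small p.
    """
    if p == 2:
        return True  # M2 = 3 is prime; the recurrence assumes odd p
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


print([p for p in (3, 5, 7, 11, 13, 17, 19, 23, 31) if lucas_lehmer(p)])
```

M11 = 2047 and M23 are composite, so 11 and 23 drop out of the list.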
[QUOTE=preda;542013]FYI, I'm working on re-enabling LL in gpuowl (intended for DC). (I do not plan to add offset or Jacobi check though)[/QUOTE]
Thank you!!! :party:
[QUOTE=preda;542013]FYI, I'm working on re-enabling LL in gpuowl (intended for DC). (I do not plan to add offset or Jacobi check though)[/QUOTE]
I just committed a first iteration of LL. Here's a brief summary of the changes:

1. How to run LL
a) add a line like DoubleCheck=1AAFFAAD0000000FFFF,51456287,74,1 to worktodo.txt, where the AID (the initial hex) can be N/A or missing. The values after the exponent ("74,1") are currently ignored.
b) or pass "-ll <exponent>" on the command line (similar to -pm1 and -prp). If this is a test run and the result should not be published, it can be combined with "-results /dev/null" or a similar file.

2. Savefiles
The savefiles follow the usual pattern, with the extension ".ll.owl". You can look into one with "head -1 <savefile>", which prints the header. The savefile contains a checksum that covers all the stored values, so if the savefile is corrupted, or edited manually without updating the checksum, this should be detected (and the file rejected).

3. The command line flags -block and -log are used by LL too:
-block indicates how many iterations to queue to the GPU; the default for LL is 1000. If the GPU becomes sluggish (slow to react), I would look into reducing -block to e.g. 100 (although this problem does not appear on ROCm). Also, large values of -block combined with slower iterations make the program slower to react to a manual interrupt (Ctrl-C).
-log indicates how often to log and to save (the two are linked, as they are for PRP).

Warning: when running a very small exponent, 1398269, one of my Radeon VIIs started to act flaky. This turned out to be fixed by increasing the voltage on that GPU, but it suggests that on the R7 such small FFTs may sometimes expose marginal GPU stability more than typical PRP work does. In my case the fix was to increase the voltage, *not* to decrease the memory frequency.

There is no Jacobi check for now (I'll consider how hard it would be to add).

There is a change in how the FFT size is specified. Look at -h for the new format of FFT specifiers, and pass either the full FFT size (e.g. "5.5M") or one of the FFT specs (e.g. "1K:5:512") from the list displayed by -h.
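A sketch of how a DoubleCheck line in the format above might be parsed, in Python. `parse_doublecheck` and `DoubleCheck` are hypothetical names, not gpuowl's code. The rule that a short all-decimal first field means "AID missing" is an assumption, chosen because AIDs are hex while exponents are decimal; as described above, fields after the exponent are ignored.

```python
from typing import NamedTuple, Optional


class DoubleCheck(NamedTuple):
    aid: Optional[str]  # assignment ID, None if "N/A" or omitted
    exponent: int


def parse_doublecheck(line: str) -> DoubleCheck:
    """Parse 'DoubleCheck=AID,exponent,...', 'DoubleCheck=N/A,exponent,...'
    or 'DoubleCheck=exponent,...' (hypothetical helper)."""
    key, sep, rest = line.strip().partition("=")
    if key != "DoubleCheck" or not sep:
        raise ValueError(f"not a DoubleCheck line: {line!r}")
    fields = [f.strip() for f in rest.split(",")]
    first = fields[0]
    # Assumption: a short all-decimal first field is the exponent (AID missing).
    if first.isdigit() and len(first) <= 10:
        return DoubleCheck(None, int(first))
    aid = None if first.upper() == "N/A" else first
    if len(fields) < 2 or not fields[1].isdigit():
        raise ValueError(f"missing exponent: {line!r}")
    # Anything after the exponent (e.g. "74,1") is ignored.
    return DoubleCheck(aid, int(fields[1]))
```

For example, `parse_doublecheck("DoubleCheck=N/A,51456287,74,1")` yields an entry with no AID and exponent 51456287.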
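To illustrate the idea behind the savefile checksum, a toy Python sketch: a text header line (so `head -1` shows it) carrying a CRC32 over the stored values, verified on load. The layout, the use of CRC32, and the function names are all assumptions for illustration; they do not reproduce gpuowl's actual .ll.owl format.

```python
import zlib


def _checksum(exponent: int, iteration: int, residue: bytes) -> int:
    """CRC32 over every stored value: exponent, iteration and residue bytes."""
    return zlib.crc32(f"{exponent} {iteration}".encode() + residue)


def make_savefile(exponent: int, iteration: int, residue: bytes) -> bytes:
    """Build a toy savefile: one text header line followed by raw residue data."""
    crc = _checksum(exponent, iteration, residue)
    header = f"LL {exponent} {iteration} {crc:08x}\n".encode()
    return header + residue


def load_savefile(blob: bytes):
    """Return (exponent, iteration, residue), or raise ValueError if corrupt."""
    header, _, residue = blob.partition(b"\n")  # residue may itself contain b"\n"
    tag, exp_s, it_s, crc_s = header.decode().split()
    exponent, iteration = int(exp_s), int(it_s)
    if tag != "LL" or int(crc_s, 16) != _checksum(exponent, iteration, residue):
        raise ValueError("savefile rejected: checksum mismatch")
    return exponent, iteration, residue
```

Editing either the header values or the residue without recomputing the CRC makes `load_savefile` reject the file, which is the behavior described above.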