mersenneforum.org  

Old 2020-04-01, 22:15   #2036
ATH
Einyen
Dec 2003
Denmark

Quote:
Originally Posted by preda
Could you run with -use ROUNDOFF paired in turn with ORIG_SLOWTRIG/NEW_SLOWTRIG and look at the average roundoff error to evaluate their respective accuracy? If ORIG_SLOWTRIG is similarly accurate to NEW_SLOWTRIG we may consider making it the default on Nvidia.

Could other Nvidia users speak up if those proposed Nvidia defaults have adverse performance effects for them (due to different hardware)?
I tested the first 100K iterations of 95,000,011:

ORIG_SLOWTRIG: Roundoff: N=10374, max 0.312500, avg 0.212775
NEW_SLOWTRIG: Roundoff: N=10374, max 0.312500, avg 0.214292
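
To put a number on how close the two averages are, here is a minimal Python sketch that parses the two summary lines exactly as written above and reports the relative change in average roundoff. The regular expression and the helper names (`parse_roundoff`, `relative_increase`) are made up for this illustration; the line format is simply the one used in this post, not necessarily what gpuowl itself prints.

```python
import re

# Roundoff summary lines as reported above (first 100K iterations of 95,000,011).
LINES = [
    "ORIG_SLOWTRIG: Roundoff: N=10374, max 0.312500, avg 0.212775",
    "NEW_SLOWTRIG: Roundoff: N=10374, max 0.312500, avg 0.214292",
]

# Hypothetical pattern matching the lines above; not taken from gpuowl.
PATTERN = re.compile(r"(\w+): Roundoff: N=(\d+), max ([\d.]+), avg ([\d.]+)")


def parse_roundoff(line):
    """Return (variant, n, max_err, avg_err) parsed from one summary line."""
    m = PATTERN.match(line)
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    return m.group(1), int(m.group(2)), float(m.group(3)), float(m.group(4))


def relative_increase(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


stats = {name: (n, mx, avg) for name, n, mx, avg in map(parse_roundoff, LINES)}
orig_avg = stats["ORIG_SLOWTRIG"][2]
new_avg = stats["NEW_SLOWTRIG"][2]
print(f"avg roundoff change ORIG -> NEW: {relative_increase(orig_avg, new_avg):+.2f}%")
```

With these two lines the average differs by well under 1%, and the maximum is identical.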


I can try to test on my own 2080 if I can compile gpuowl on Windows or find a recent compiled version.
Old 2020-04-02, 00:28   #2037
ATH

The RTX 2080 is very slow at double precision, and the timings are very inconsistent.

But NEW_SLOWTRIG is faster at 3520 µs/it vs. 3680 µs/it for ORIG_SLOWTRIG.
T2_SHUFFLE is slightly faster at 3520 µs/it vs. 3553 µs/it for NO_T2_SHUFFLE.
Otherwise CARRY64 and CARRY32 are about the same.
I'm not going to test all 6 of those variables on this card, since it is very slow and the inconsistencies in the timings are larger than the differences.

Btw, UNROLL_NONE, UNROLL_WIDTH and UNROLL_HEIGHT do not work at all on either the Tesla P100 or the RTX 2080.
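
For scale, a small Python sketch that turns the timings quoted above into percentage differences. The dictionary and the `slowdown_percent` helper are illustrative names only; the numbers are the ones reported in this post.

```python
# Per-iteration timings in microseconds, as reported above for the RTX 2080.
TIMINGS_US = {
    "NEW_SLOWTRIG": 3520,
    "ORIG_SLOWTRIG": 3680,
    "T2_SHUFFLE": 3520,
    "NO_T2_SHUFFLE": 3553,
}


def slowdown_percent(baseline_us, other_us):
    """How much slower other_us is than baseline_us, in percent."""
    return (other_us - baseline_us) / baseline_us * 100.0


trig = slowdown_percent(TIMINGS_US["NEW_SLOWTRIG"], TIMINGS_US["ORIG_SLOWTRIG"])
shuffle = slowdown_percent(TIMINGS_US["T2_SHUFFLE"], TIMINGS_US["NO_T2_SHUFFLE"])
print(f"ORIG_SLOWTRIG vs NEW_SLOWTRIG: {trig:.1f}% slower")
print(f"NO_T2_SHUFFLE vs T2_SHUFFLE:   {shuffle:.1f}% slower")
```

That works out to roughly 4.5% and 0.9%, which should be read against the run-to-run inconsistency mentioned above.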

Last fiddled with by ATH on 2020-04-02 at 00:28
Old 2020-04-02, 01:49   #2038
preda
 
"Mihai Preda"
Apr 2015

This should be fixed in the most recent commit; please retry.
The cause was again the ROCm optimizer, which generates broken code for our own sin/cos in some particular cases that we carefully try to avoid.

When you see unexplained failures like this one, it is often useful to try -use ORIG_SLOWTRIG, as that usually works (though it is slower).

Quote:
Originally Posted by kriesel
I don't know why, but -fft 0 through -fft +5 all hit EE in 800 iterations on this exponent 131500093. Gpuowl v6.11-134-g1e0ce1d chose the initial 7M fft length on its own. After finding it reproducible, I successively incremented -fft to seek a reliable run case. It wasn't until it reached 9M fft that it succeeded in the GEC. The resulting speed penalty is considerable, 7.5 msec/iter versus 5.3 on an RX480. From the program's help output,
Code:
FFT    7M [ 11.01M -  132.46M]  1K-512-7 256-2K-7 512-1K-7 2K-256-7
FFT    8M [ 12.58M -  150.85M]  2K-2K 4K-1K
FFT    9M [ 14.16M -  169.18M]  1K-512-9 256-2K-9 512-1K-9 2K-256-9
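
The three help rows above can be read together with the log that follows: -fft 0 ran 1K-512-7, +1 through +3 the other 7M variants, +4 and +5 the two 8M variants, and +6 the first 9M variant. Below is a Python sketch of that reading. It is inferred only from the table and log in this post, not from the gpuowl source; `parse_table` and `variant_for_offset` are hypothetical helpers, and treating "M" in the exponent ranges as 10^6 is an assumption.

```python
import re

# The three rows of help output quoted above (the real table has more rows).
HELP_TABLE = """\
FFT    7M [ 11.01M -  132.46M]  1K-512-7 256-2K-7 512-1K-7 2K-256-7
FFT    8M [ 12.58M -  150.85M]  2K-2K 4K-1K
FFT    9M [ 14.16M -  169.18M]  1K-512-9 256-2K-9 512-1K-9 2K-256-9
"""

ROW = re.compile(r"FFT\s+(\d+)M\s+\[\s*([\d.]+)M\s+-\s*([\d.]+)M\]\s+(.*)")


def parse_table(text):
    """Return a list of (fft_size, min_exp, max_exp, variants) per table row."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            # Assumption: "M" in the exponent range means 10**6.
            lo, hi = float(m.group(2)) * 1e6, float(m.group(3)) * 1e6
            rows.append((m.group(1) + "M", lo, hi, m.group(4).split()))
    return rows


def variant_for_offset(rows, exponent, offset):
    """Pick the variant 'offset' steps after the first one whose range covers exponent."""
    flat, start = [], None
    for size, lo, hi, variants in rows:
        if start is None and lo <= exponent <= hi:
            start = len(flat)  # index of the default variant
        flat.extend((size, v) for v in variants)
    if start is None:
        raise ValueError("exponent outside every listed range")
    return flat[start + offset]


rows = parse_table(HELP_TABLE)
for offset in (0, 4, 6):
    size, variant = variant_for_offset(rows, 131500093, offset)
    print(f"-fft +{offset}: {size} {variant}")
```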
Code:
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win
2020-04-01 07:47:57 gpuowl v6.11-134-g1e0ce1d
2020-04-01 07:47:57 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -use NO_ASM
2020-04-01 07:47:57 device 0, unique id ''
2020-04-01 07:47:57 condorella/rx480 131500093 FFT 7168K: Width 256x4, Height 64x8, Middle 7; 17.92 bits/word
2020-04-01 07:47:59 condorella/rx480 OpenCL args "-DEXP=131500093u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=7u -DWEIGHT_STEP=0x8.7b964bd91a558p-3 -DIWEIGHT_STEP=0xf.16e489ea55fc8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DAMDGPU=1 -DNO_ASM=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-04-01 07:48:03 condorella/rx480 OpenCL compilation in 3.97 s
2020-04-01 07:48:06 condorella/rx480 131500093 OK        0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:48:13 condorella/rx480 131500093 EE      800   0.00%; 5272 us/it; ETA 8d 00:34; 6781adfa7991c92a (check 2.31s)
2020-04-01 07:48:15 condorella/rx480 131500093 OK        0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:48:22 condorella/rx480 131500093 EE      800   0.00%; 5309 us/it; ETA 8d 01:56; 6781adfa7991c92a (check 2.31s) 1 errors
2020-04-01 07:48:24 condorella/rx480 131500093 OK        0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:48:31 condorella/rx480 131500093 EE      800   0.00%; 5298 us/it; ETA 8d 01:32; 6781adfa7991c92a (check 2.33s) 2 errors
2020-04-01 07:48:31 condorella/rx480 3 sequential errors, will stop.
2020-04-01 07:48:31 condorella/rx480 Exiting because "too many errors"
2020-04-01 07:48:31 condorella/rx480 Bye
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>g611

C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win
2020-04-01 07:48:50 gpuowl v6.11-134-g1e0ce1d
2020-04-01 07:48:50 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -use NO_ASM -fft +1
2020-04-01 07:48:50 device 0, unique id ''
2020-04-01 07:48:50 condorella/rx480 131500093 FFT 7168K: Width 64x4, Height 256x8, Middle 7; 17.92 bits/word
2020-04-01 07:48:53 condorella/rx480 OpenCL args "-DEXP=131500093u -DWIDTH=256u -DSMALL_HEIGHT=2048u -DMIDDLE=7u -DWEIGHT_STEP=0x8.7b964bd91a558p-3 -DIWEIGHT_STEP=0xf.16e489ea55fc8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DAMDGPU=1 -DNO_ASM=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-04-01 07:48:57 condorella/rx480 OpenCL compilation in 4.67 s
2020-04-01 07:49:01 condorella/rx480 131500093 OK        0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:49:11 condorella/rx480 131500093 EE      800   0.00%; 7714 us/it; ETA 11d 17:46; 55f854bea6c1cecf (check 3.28s)
2020-04-01 07:49:14 condorella/rx480 131500093 OK        0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:49:24 condorella/rx480 131500093 EE      800   0.00%; 7697 us/it; ETA 11d 17:10; 55f854bea6c1cecf (check 3.29s) 1 errors
2020-04-01 07:49:27 condorella/rx480 131500093 OK        0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:49:37 condorella/rx480 131500093 EE      800   0.00%; 7687 us/it; ETA 11d 16:46; 55f854bea6c1cecf (check 3.27s) 2 errors
2020-04-01 07:49:37 condorella/rx480 3 sequential errors, will stop.
2020-04-01 07:49:37 condorella/rx480 Exiting because "too many errors"
2020-04-01 07:49:37 condorella/rx480 Bye
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>g611

C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win
2020-04-01 07:50:25 gpuowl v6.11-134-g1e0ce1d
2020-04-01 07:50:25 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -use NO_ASM -fft +2
2020-04-01 07:50:25 device 0, unique id ''
2020-04-01 07:50:25 condorella/rx480 131500093 FFT 7168K: Width 64x8, Height 256x4, Middle 7; 17.92 bits/word
2020-04-01 07:50:27 condorella/rx480 OpenCL args "-DEXP=131500093u -DWIDTH=512u -DSMALL_HEIGHT=1024u -DMIDDLE=7u -DWEIGHT_STEP=0x8.7b964bd91a558p-3 -DIWEIGHT_STEP=0xf.16e489ea55fc8p-4 -DWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-3 -DIWEIGHT_BIGSTEP=0xc.5672a115506d8p-4 -DAMDGPU=1 -DNO_ASM=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-04-01 07:50:31 condorella/rx480 OpenCL compilation in 3.72 s
2020-04-01 07:50:34 condorella/rx480 131500093 OK        0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:50:42 condorella/rx480 131500093 EE      800   0.00%; 6286 us/it; ETA 9d 13:37; 6f8253cbb2fe58e9 (check 2.71s)
2020-04-01 07:50:45 condorella/rx480 131500093 OK        0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:50:53 condorella/rx480 131500093 EE      800   0.00%; 6283 us/it; ETA 9d 13:29; 6f8253cbb2fe58e9 (check 2.71s) 1 errors
2020-04-01 07:50:56 condorella/rx480 131500093 OK        0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:51:03 condorella/rx480 131500093 EE      800   0.00%; 6299 us/it; ETA 9d 14:05; 6f8253cbb2fe58e9 (check 2.71s) 2 errors
2020-04-01 07:51:03 condorella/rx480 3 sequential errors, will stop.
2020-04-01 07:51:03 condorella/rx480 Exiting because "too many errors"
2020-04-01 07:51:03 condorella/rx480 Bye
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>g611

C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win
2020-04-01 07:51:29 gpuowl v6.11-134-g1e0ce1d
2020-04-01 07:51:29 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -use NO_ASM -fft +3
2020-04-01 07:51:29 device 0, unique id ''
2020-04-01 07:51:29 condorella/rx480 131500093 FFT 7168K: Width 256x8, Height 64x4, Middle 7; 17.92 bits/word
2020-04-01 07:51:29 condorella/rx480 using long carry kernels
2020-04-01 07:51:32 condorella/rx480 OpenCL args "-DEXP=131500093u -DWIDTH=2048u -DSMALL_HEIGHT=256u -DMIDDLE=7u -DWEIGHT_STEP=0x8.7b964bd91a558p-3 -DIWEIGHT_STEP=0xf.16e489ea55fc8p-4 -DWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-3 -DIWEIGHT_BIGSTEP=0xc.5672a115506d8p-4 -DAMDGPU=1 -DNO_ASM=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-04-01 07:51:36 condorella/rx480 OpenCL compilation in 3.97 s
2020-04-01 07:51:39 condorella/rx480 131500093 OK        0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:51:46 condorella/rx480 131500093 EE      800   0.00%; 5275 us/it; ETA 8d 00:42; cfbd904e74b67aae (check 2.31s)
2020-04-01 07:51:48 condorella/rx480 131500093 OK        0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:51:54 condorella/rx480 131500093 EE      800   0.00%; 5249 us/it; ETA 7d 23:44; cfbd904e74b67aae (check 2.29s)1 errors
2020-04-01 07:51:57 condorella/rx480 131500093 OK        0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:52:03 condorella/rx480 131500093 EE      800   0.00%; 5239 us/it; ETA 7d 23:23; cfbd904e74b67aae (check 2.29s)2 errors
2020-04-01 07:52:03 condorella/rx480 3 sequential errors, will stop.
2020-04-01 07:52:03 condorella/rx480 Exiting because "too many errors"
2020-04-01 07:52:03 condorella/rx480 Bye
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>g611

C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win
2020-04-01 07:52:07 gpuowl v6.11-134-g1e0ce1d
2020-04-01 07:52:07 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -use NO_ASM -fft +4
2020-04-01 07:52:07 device 0, unique id ''
2020-04-01 07:52:07 condorella/rx480 131500093 FFT 8192K: Width 256x8, Height 256x8; 15.68 bits/word
2020-04-01 07:52:07 condorella/rx480 using long carry kernels
2020-04-01 07:52:10 condorella/rx480 OpenCL args "-DEXP=131500093u -DWIDTH=2048u -DSMALL_HEIGHT=2048u -DMIDDLE=1u -DWEIGHT_STEP=0xa.039f00d8f95f8p-3 -DIWEIGHT_STEP=0xc.c82be96a7181p-4 -DWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-3 -DIWEIGHT_BIGSTEP=0xc.5672a115506d8p-4 -DAMDGPU=1 -DNO_ASM=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-04-01 07:52:15 condorella/rx480 OpenCL compilation in 5.16 s
2020-04-01 07:52:18 condorella/rx480 131500093 OK        0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:52:27 condorella/rx480 131500093 EE      800   0.00%; 6583 us/it; ETA 10d 00:28; 05252a7f59574e37 (check 2.85s)
2020-04-01 07:52:30 condorella/rx480 131500093 OK        0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:52:38 condorella/rx480 131500093 EE      800   0.00%; 6587 us/it; ETA 10d 00:36; 05252a7f59574e37 (check 2.85s) 1 errors
2020-04-01 07:52:41 condorella/rx480 131500093 OK        0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:52:49 condorella/rx480 131500093 EE      800   0.00%; 6594 us/it; ETA 10d 00:53; 05252a7f59574e37 (check 2.86s) 2 errors
2020-04-01 07:52:49 condorella/rx480 3 sequential errors, will stop.
2020-04-01 07:52:49 condorella/rx480 Exiting because "too many errors"
2020-04-01 07:52:49 condorella/rx480 Bye
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>g611

C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win
2020-04-01 07:53:21 gpuowl v6.11-134-g1e0ce1d
2020-04-01 07:53:21 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -use NO_ASM -fft +5
2020-04-01 07:53:21 device 0, unique id ''
2020-04-01 07:53:21 condorella/rx480 131500093 FFT 8192K: Width 512x8, Height 256x4; 15.68 bits/word
2020-04-01 07:53:21 condorella/rx480 using long carry kernels
2020-04-01 07:53:23 condorella/rx480 OpenCL args "-DEXP=131500093u -DWIDTH=4096u -DSMALL_HEIGHT=1024u -DMIDDLE=1u -DWEIGHT_STEP=0xa.039f00d8f95f8p-3 -DIWEIGHT_STEP=0xc.c82be96a7181p-4 -DWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-3 -DIWEIGHT_BIGSTEP=0xc.5672a115506d8p-4 -DAMDGPU=1 -DNO_ASM=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-04-01 07:53:26 condorella/rx480 OpenCL compilation in 3.53 s
2020-04-01 07:53:30 condorella/rx480 131500093 OK        0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:53:39 condorella/rx480 131500093 EE      800   0.00%; 7196 us/it; ETA 10d 22:51; 6df742314b82f841 (check 3.11s)
2020-04-01 07:53:42 condorella/rx480 131500093 OK        0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:53:51 condorella/rx480 131500093 EE      800   0.00%; 7219 us/it; ETA 10d 23:43; 6df742314b82f841 (check 3.11s) 1 errors
2020-04-01 07:53:54 condorella/rx480 131500093 OK        0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:54:03 condorella/rx480 131500093 EE      800   0.00%; 7190 us/it; ETA 10d 22:38; 6df742314b82f841 (check 3.10s) 2 errors
2020-04-01 07:54:03 condorella/rx480 3 sequential errors, will stop.
2020-04-01 07:54:03 condorella/rx480 Exiting because "too many errors"
2020-04-01 07:54:03 condorella/rx480 Bye
C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>g611

C:\msys64\home\ken\gpuowl-compile\gpuowl-v6.11-134-g1e0ce1d\rx480>gpuowl-win
2020-04-01 07:54:08 gpuowl v6.11-134-g1e0ce1d
2020-04-01 07:54:08 config: -device 0 -user kriesel -cpu condorella/rx480 -yield -maxAlloc 7500 -use NO_ASM -fft +6
2020-04-01 07:54:08 device 0, unique id ''
2020-04-01 07:54:08 condorella/rx480 131500093 FFT 9216K: Width 256x4, Height 64x8, Middle 9; 13.93 bits/word
2020-04-01 07:54:08 condorella/rx480 using long carry kernels
2020-04-01 07:54:12 condorella/rx480 OpenCL args "-DEXP=131500093u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=9u -DWEIGHT_STEP=0x8.5f7e7ead6051p-3 -DIWEIGHT_STEP=0xf.498539ec95fe8p-4 -DWEIGHT_BIGSTEP=0xd.744fccad69d68p-3 -DIWEIGHT_BIGSTEP=0x9.837f0518db8a8p-4 -DAMDGPU=1 -DNO_ASM=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-04-01 07:54:16 condorella/rx480 OpenCL compilation in 4.11 s
2020-04-01 07:54:20 condorella/rx480 131500093 OK        0 loaded: blockSize 400, 0000000000000003
2020-04-01 07:54:29 condorella/rx480 131500093 OK      800   0.00%; 7461 us/it; ETA 11d 08:32; bbe24bd13cd73020 (check 3.26s)
2020-04-01 08:19:33 condorella/rx480 131500093 OK   200000   0.15%; 7541 us/it; ETA 11d 11:03; 190bb27ff665f83b (check 3.25s)
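
The "bits/word" figures in the log above are consistent with the exponent divided by the FFT length in words (taking 1K = 1024). A short Python sketch checking this against the logged values, and putting a percentage on the quoted 7.5 vs. 5.3 msec/iter penalty; `RUNS` and `bits_per_word` are illustrative names, and the formula is an observation from this log rather than something taken from the gpuowl source.

```python
EXPONENT = 131500093

# (FFT length in K words, bits/word printed in the log, check result at iteration 800)
RUNS = [
    (7168, 17.92, "EE"),
    (8192, 15.68, "EE"),
    (9216, 13.93, "OK"),
]


def bits_per_word(exponent, fft_k):
    """Average number of exponent bits carried per FFT word (1K = 1024 words)."""
    return exponent / (fft_k * 1024)


for fft_k, logged, status in RUNS:
    computed = bits_per_word(EXPONENT, fft_k)
    print(f"FFT {fft_k}K: computed {computed:.2f}, logged {logged:.2f}, check {status}")

# Speed penalty quoted in the post: 7.5 msec/iter at 9M vs 5.3 msec/iter at 7M.
penalty = (7.5 - 5.3) / 5.3 * 100.0
print(f"9M vs 7M: about {penalty:.1f}% slower per iteration")
```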
Old 2020-04-02, 01:57   #2039
preda
 

Quote:
Originally Posted by ATH
The RTX 2080 is very slow at double precision, and the timings are very inconsistent.

But NEW_SLOWTRIG is faster at 3520 µs/it vs. 3680 µs/it for ORIG_SLOWTRIG.
T2_SHUFFLE is slightly faster at 3520 µs/it vs. 3553 µs/it for NO_T2_SHUFFLE.
Otherwise CARRY64 and CARRY32 are about the same.
I'm not going to test all 6 of those variables on this card, since it is very slow and the inconsistencies in the timings are larger than the differences.

Btw, UNROLL_NONE, UNROLL_WIDTH and UNROLL_HEIGHT do not work at all on either the Tesla P100 or the RTX 2080.
Thank you. This seems to suggest keeping the defaults unchanged for Nvidia (as they are better on at least some Nvidia GPUs). Nvidia users can tune by trying ORIG_SLOWTRIG and NO_T2_SHUFFLE if so inclined.
Old 2020-04-04, 09:54   #2040
preda
 
ROCm 3.3

ROCm 3.1 has a severe bug that is affecting our sin/cos routines, and we had to use a workaround for that bug.

Recently ROCm 3.3 was released, and it seems this bug is fixed, so the ROCm 3.1 workarounds are not needed anymore (they do have a slight perf impact).

Right now the ROCm 3.3 performance is very close to ROCm 3.1, a tiny bit slower (less than 0.3% slower). OTOH ROCm 3.3 might have a few advantages: for one, it uses fewer VGPRs (and it doesn't have the terrible ROCm 3.1 bug).

In a recent commit I disabled the ROCm 3.1 bug workaround by default; it must now be explicitly enabled with -use ROCM31. If this is not done on ROCm 3.1, there are errors in PRP. A slower alternative is -use ORIG_SLOWTRIG, which does not trigger the bug.

So in brief:
- ROCm 3.3 is OK and can be used
- if using ROCm 3.1, *must* specify -use ROCM31 or -use ORIG_SLOWTRIG


(for users who are now on ROCm 2.10 or earlier, I recommend moving directly to 3.3, skipping 3.1)

There is also the possibility of having multiple ROCm versions installed at the same time (this is useful when one wants to experiment and compare versions); here is one way to do it:

- install multiple ROCm versions in separate folders, e.g.: /opt/rocm-3.1.0/ and /opt/rocm-3.3.0/
- verify that the ROCm folder containing libamdocl64.so is listed in the LIBPATH in Makefile or SConstruct.
- edit the Makefile to link with -lamdocl64 instead of -lOpenCL (or build with scons)
- when running gpuowl, specify LD_LIBRARY_PATH pointing to the folder with libamdocl64 for the desired ROCm version, e.g.

LD_LIBRARY_PATH=/opt/rocm-3.3.0/opencl/lib/x86_64 ./gpuowl
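
For switching between the installed versions from a script, the same idea can be sketched in Python. The `opencl/lib/x86_64` subfolder is the one from the command above; assuming the same layout under /opt/rocm-3.1.0/ is a guess, and the function names are made up for this example. Unlike the one-liner above, this sketch prepends to any existing LD_LIBRARY_PATH instead of replacing it.

```python
import os
import subprocess


def opencl_lib_dir(rocm_root):
    """Folder expected to contain libamdocl64.so (layout assumed from the command above)."""
    return rocm_root.rstrip("/") + "/opencl/lib/x86_64"


def env_for_rocm(rocm_root, base_env=None):
    """Copy of the environment with the chosen ROCm OpenCL folder first in LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    lib_dir = opencl_lib_dir(rocm_root)
    existing = env.get("LD_LIBRARY_PATH")
    env["LD_LIBRARY_PATH"] = lib_dir if not existing else lib_dir + os.pathsep + existing
    return env


def run_gpuowl(rocm_root, extra_args=(), gpuowl="./gpuowl"):
    """Start gpuowl against the given ROCm install; returns the CompletedProcess."""
    return subprocess.run([gpuowl, *extra_args], env=env_for_rocm(rocm_root), check=False)


# Example (not executed here):
#   run_gpuowl("/opt/rocm-3.3.0")
#   run_gpuowl("/opt/rocm-3.1.0", ["-use", "ROCM31"])
```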
Old 2020-04-04, 11:41   #2041
preda
 

Quote:
Originally Posted by preda
ROCm 3.1 has a severe bug that is affecting our sin/cos routines, and we had to use a workaround for that bug.

Recently ROCm 3.3 was released, and it seems this bug is fixed, so the ROCm 3.1 workarounds are not needed anymore (they do have a slight perf impact).
I spoke too soon: it seems the bug still affects ROCm 3.3 (but not for exactly the same exponents). As such, I re-enabled the workaround by default as it was before.
Old 2020-04-04, 19:45   #2042
ewmayer
2ω=0
Sep 2002
República de California

I'm glad I'm a late adopter with these things - still on rocm 2.10 :). Mihai, bugs aside, roughly what % speedup can one expect from moving from 2.10 to 3.3?
Old 2020-04-04, 20:46   #2043
preda
 
preda's Avatar
 
"Mihai Preda"
Apr 2015

2×11×43 Posts
Default

Quote:
Originally Posted by ewmayer
I'm glad I'm a late adopter with these things - still on rocm 2.10 :). Mihai, bugs aside, roughly what % speedup can one expect from moving from 2.10 to 3.3?
I would guess something in the range 2% - 4%.

All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.