#309 |
P90 years forever!
Aug 2002
Yeehaw, FL
1CEF₁₆ Posts |
FFT length is way too small
#310 |
Nov 2010
Germany
3×199 Posts |
#311 |
Jun 2014
2³×3×5 Posts |
I think George was talking about the 2^17 FFT length earlier.
#312 |
Aug 2010
Republic of Belarus
2×89 Posts |
Hello! Where can I see benchmarks for a 100M exponent (ETA, ms)? I'm very interested in the results for the 295X2. That card is very cheap now and I want to buy it!
And a second question: does the OpenCL LL test use only the GPU, or GPU + CPU?
#313 | |
Nov 2010
Germany
3·199 Posts |
Quote:

The 295X2 can run two such tests in parallel. I'd expect the speed of each to be slightly below my results: the HD 7950 has 717 DP GFlops and a 240 GB/s memory rate; OC'd to 1100/1400 MHz, that becomes 985 GFlops / 268 GB/s. The R9 295X2 has 2× 717 GFlops and 2× 320 GB/s memory rate. I'm not sure which counts for more: the lower DP power or the better memory bandwidth.

In an attempt to answer this last question, I separately reduced the clocks of the GPU cores and of the memory by 10%. 10% lower GFlops results in 5.9% longer iteration times, whereas 10% lower bandwidth causes 5.2% longer iteration times. If both clocks are lowered by 10%, the iteration times increase by 10.1%.
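For anyone who wants to play with these numbers, here is a rough sketch that turns the two clock-scaling measurements into a linear sensitivity model. It is an illustration only: `relative_iteration_time` is a made-up helper, and it assumes the ±10% sensitivities extrapolate linearly to much larger differences and that one 295X2 GPU behaves like the HD 7950 apart from GFlops and bandwidth. Nothing in the post confirms either assumption.

```python
# Linear sensitivity model built from the clock-scaling figures in the post.
# ASSUMPTION: the sensitivities measured at -10% stay constant over a wider
# range, and architecture differences between the cards are ignored.

# Measured: 10% lower GFlops -> 5.9% longer, 10% lower bandwidth -> 5.2% longer.
FLOPS_SENSITIVITY = 0.059 / 0.10
BANDWIDTH_SENSITIVITY = 0.052 / 0.10


def relative_iteration_time(flops_ratio, bandwidth_ratio):
    """Estimate iteration time relative to the reference card (1.0 = equal).

    flops_ratio and bandwidth_ratio are the target card's values divided by
    the reference card's values.
    """
    slowdown_from_flops = FLOPS_SENSITIVITY * (1.0 - flops_ratio)
    slowdown_from_bandwidth = BANDWIDTH_SENSITIVITY * (1.0 - bandwidth_ratio)
    return 1.0 + slowdown_from_flops + slowdown_from_bandwidth


# Sanity check against the combined measurement (model is purely additive).
both_clocks_down = relative_iteration_time(0.9, 0.9)

# One 295X2 GPU (717 GFlops, 320 GB/s) versus the OC'd HD 7950 (985, 268).
one_295x2_gpu = relative_iteration_time(717 / 985, 320 / 268)

print(f"both clocks -10%: {both_clocks_down:.3f}x")
print(f"one 295X2 GPU vs OC'd HD 7950: {one_295x2_gpu:.3f}x")
```

The additive model predicts about 11.1% for both clocks lowered, a bit above the measured 10.1%, so it slightly overestimates combined effects. For one 295X2 GPU it lands at roughly 6% longer iteration times than the overclocked HD 7950, which is in line with "slightly below my results" but should be read as a guess, not a benchmark.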
#314 | |
Aug 2010
Republic of Belarus
2·89 Posts |
Quote:
#315 |
"Mr. Meeseeks"
Jan 2012
California, USA
3²×241 Posts |
Nice performance improvements with the latest clFFT library... playing around with it now
clFFT 2.0 (current binary)
Code:
Platform :Advanced Micro Devices, Inc.
Device 0 : Tonga
M( 1257787 )C, 0x3f45bf9bea7213ea, n = 65536, clLucas v1.01 err = 0.1211 (0:03 real, 0.3194 ms/iter, ETA 6:36)
M( 1398269 )C, 0xa4a6d2f0e34629db, n = 73728, clLucas v1.01 err = 0.1016 (0:04 real, 0.3781 ms/iter, ETA 8:41)
M( 2976221 )C, 0x2a7111b7f70fea2f, n = 163840, clLucas v1.01 err = 0.05078 (0:06 real, 0.6307 ms/iter, ETA 31:06)
M( 3021377 )C, 0x6387a70a85d46baf, n = 163840, clLucas v1.01 err = 0.0625 (0:06 real, 0.6283 ms/iter, ETA 31:31)
M( 6972593 )C, 0x88f1d2640adb89e1, n = 393216, clLucas v1.01 err = 0.04688 (0:13 real, 1.2852 ms/iter, ETA 2:29:05)
M( 13466917 )C, 0x9fdc1f4092b15d69, n = 786432, clLucas v1.01 err = 0.03223 (0:29 real, 2.9072 ms/iter, ETA 10:51:42)
M( 20996011 )C, 0x5fc58920a821da11, n = 1179648, clLucas v1.01 err = 0.09375 (0:50 real, 5.0678 ms/iter, ETA 29:32:02)
M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, clLucas v1.01 err = 0.1875 (1:04 real, 6.4269 ms/iter, ETA 42:52:54)
M( 25964951 )C, 0x62eb3ff0a5f6237c, n = 1572864, clLucas v1.01 err = 0.02051 (1:26 real, 8.5475 ms/iter, ETA 61:36:48)
M( 30402457 )C, 0x0b8600ef47e69d27, n = 1638400, clLucas v1.01 err = 0.3125 (1:38 real, 9.8404 ms/iter, ETA 83:04:09)
M( 32582657 )C, 0x02751b7fcec76bb1, n = 1769472, clLucas v1.01 err = 0.2969 (2:29 real, 14.9789 ms/iter, ETA 135:31:01)
M( 37156667 )C, 0x67ad7646a1fad514, n = 2097152, clLucas v1.01 err = 0.1201 (0:56 real, 5.5684 ms/iter, ETA 57:26:50)
M( 42643801 )C, 0x8f90d78d5007bba7, n = 2359296, clLucas v1.01 err = 0.2031 (2:34 real, 15.4499 ms/iter, ETA 182:57:10)
M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, clLucas v1.01 err = 0.2656 (2:34 real, 15.4018 ms/iter, ETA 184:23:39)

Code:
Platform :Advanced Micro Devices, Inc.
Device 0 : Tonga
M( 1257787 )C, 0x3f45bf9bea7213ea, n = 65536, clLucas v1.01 err = 0.1094 (0:03 real, 0.3001 ms/iter, ETA 6:12)
M( 1398269 )C, 0xa4a6d2f0e34629db, n = 73728, clLucas v1.01 err = 0.09375 (0:05 real, 0.5239 ms/iter, ETA 12:03)
M( 2976221 )C, 0x2a7111b7f70fea2f, n = 163840, clLucas v1.01 err = 0.04883 (0:09 real, 0.8545 ms/iter, ETA 42:09)
M( 3021377 )C, 0x6387a70a85d46baf, n = 163840, clLucas v1.01 err = 0.06641 (0:08 real, 0.8560 ms/iter, ETA 42:56)
M( 6972593 )C, 0x88f1d2640adb89e1, n = 393216, clLucas v1.01 err = 0.05139 (0:14 real, 1.4218 ms/iter, ETA 2:44:55)
M( 13466917 )C, 0x9fdc1f4092b15d69, n = 786432, clLucas v1.01 err = 0.03125 (0:24 real, 2.3861 ms/iter, ETA 8:54:52)
M( 20996011 )C, 0x5fc58920a821da11, n = 1179648, clLucas v1.01 err = 0.09375 (0:36 real, 3.5629 ms/iter, ETA 20:45:50)
M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, clLucas v1.01 err = 0.2031 (0:41 real, 4.1131 ms/iter, ETA 27:26:37)
M( 25964951 )C, 0x62eb3ff0a5f6237c, n = 1572864, clLucas v1.01 err = 0.02002 (0:42 real, 4.1595 ms/iter, ETA 29:59:00)
M( 30402457 )C, 0x0b8600ef47e69d27, n = 1638400, clLucas v1.01 err = 0.2881 (0:50 real, 5.0494 ms/iter, ETA 42:37:29)
M( 32582657 )C, 0x02751b7fcec76bb1, n = 1769472, clLucas v1.01 err = 0.3125 (0:49 real, 4.8774 ms/iter, ETA 44:07:37)
M( 37156667 )C, 0x67ad7646a1fad514, n = 2097152, clLucas v1.01 err = 0.1074 (0:47 real, 4.7492 ms/iter, ETA 48:59:46)
M( 42643801 )C, 0x8f90d78d5007bba7, n = 2359296, clLucas v1.01 err = 0.209 (1:10 real, 6.9796 ms/iter, ETA 82:39:00)
M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, clLucas v1.01 err = 0.2656 (1:10 real, 6.9811 ms/iter, ETA 83:34:46)

Last fiddled with by kracker on 2015-07-25 at 02:06
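Comparing the two listings by eye is tedious, so here is a small sketch that pulls the ms/iter figure out of each result line and computes the per-exponent speedup. The regular expression is inferred from the clLucas v1.01 lines quoted above and is an assumption, not a documented output format; `parse_ms_per_iter` and `speedups` are hypothetical helper names. Only two lines from each run are included as sample data.

```python
import re

# Pattern inferred from the result lines quoted in this thread (assumption).
LINE_RE = re.compile(
    r"M\(\s*(?P<exponent>\d+)\s*\)C,\s*0x[0-9a-f]+,\s*n\s*=\s*(?P<fft>\d+),"
    r".*?(?P<ms>\d+\.\d+)\s*ms/iter"
)


def parse_ms_per_iter(log_text):
    """Return {exponent: (fft_length, ms_per_iter)} for every result line."""
    results = {}
    for match in LINE_RE.finditer(log_text):
        exponent = int(match.group("exponent"))
        results[exponent] = (int(match.group("fft")), float(match.group("ms")))
    return results


def speedups(old_log, new_log):
    """Return {exponent: old_ms / new_ms} for exponents present in both logs."""
    old = parse_ms_per_iter(old_log)
    new = parse_ms_per_iter(new_log)
    return {p: old[p][1] / new[p][1] for p in old if p in new}


# Two lines copied from each listing above.
OLD_LOG = """
M( 13466917 )C, 0x9fdc1f4092b15d69, n = 786432, clLucas v1.01 err = 0.03223 (0:29 real, 2.9072 ms/iter, ETA 10:51:42)
M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, clLucas v1.01 err = 0.2656 (2:34 real, 15.4018 ms/iter, ETA 184:23:39)
"""
NEW_LOG = """
M( 13466917 )C, 0x9fdc1f4092b15d69, n = 786432, clLucas v1.01 err = 0.03125 (0:24 real, 2.3861 ms/iter, ETA 8:54:52)
M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, clLucas v1.01 err = 0.2656 (1:10 real, 6.9811 ms/iter, ETA 83:34:46)
"""

for exponent, factor in sorted(speedups(OLD_LOG, NEW_LOG).items()):
    print(f"M({exponent}): {factor:.2f}x faster")
```

On these sample lines the gain is about 1.2x at n = 786432 and about 2.2x at n = 2359296. Feeding in the full listings also shows that a few of the small FFT lengths got slower in the second run, so the improvement is not uniform.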
#316 | |
Jul 2003
So Cal
2·3·347 Posts |
Quote:

clFFT 2.2:
Platform :Advanced Micro Devices, Inc.
Device 0 : Hawaii
Build Options are : -D KHR_DP_EXTENSION
start M35064059 fft length = 2097152
Iteration 10000 0x005a9a8bbdfa894b, n = 2097152 err = 0.02771 (0:42 real, 4.1596 ms/iter, ETA 40:29:55)
Iteration 20000 0x085623e4553c8c01, n = 2097152 err = 0.02771 (0:41 real, 4.1491 ms/iter, ETA 40:23:05)

clFFT 2.6:
Platform :Advanced Micro Devices, Inc.
Device 0 : Hawaii
Build Options are : -D KHR_DP_EXTENSION
start M35064059 fft length = 2097152
Iteration 10000 0x005a9a8bbdfa894b, n = 2097152 err = 0.02734 (0:30 real, 2.9535 ms/iter, ETA 28:45:21)
Iteration 20000 0x085623e4553c8c01, n = 2097152 err = 0.02734 (0:29 real, 2.8963 ms/iter, ETA 28:11:27)
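The ETA column can be roughly reproduced from the other fields. The sketch below assumes the ETA is simply the remaining iterations times the current ms/iter, using the fact that a Lucas-Lehmer test of M(p) takes p − 2 iterations. How clLucas actually computes its ETA is not stated in this thread, and `eta_seconds` and `format_hms` are hypothetical helpers.

```python
def eta_seconds(exponent, iterations_done, ms_per_iter):
    """Estimate remaining seconds for a Lucas-Lehmer test of M(exponent).

    ASSUMPTION: remaining time = remaining iterations * current ms/iter.
    A Lucas-Lehmer test of M(p) needs p - 2 squaring iterations in total.
    """
    remaining_iterations = (exponent - 2) - iterations_done
    return remaining_iterations * ms_per_iter / 1000.0


def format_hms(seconds):
    """Format a duration as H:MM:SS, matching the style of the log lines."""
    total = int(round(seconds))
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Figures taken from the "Iteration 10000" lines above.
for label, ms in (("clFFT 2.2", 4.1596), ("clFFT 2.6", 2.9535)):
    print(label, format_hms(eta_seconds(35064059, 10000, ms)))
```

Both estimates come out within a minute of the logged ETAs (40:29:55 and 28:45:21); the small gap is plausibly due to the rounding of the printed ms/iter. The ratio 4.1596 / 2.9535 puts clFFT 2.6 at about 1.4x the speed of 2.2 on this Hawaii card.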
#317 |
Jul 2003
So Cal
2×3×347 Posts |
With the clFFT speed improvements, perhaps it's time to make clLucas more user friendly by bringing over code from CUDALucas. Reading from worktodo.txt is essential. Supporting offsets would be nice. Benchmarking FFTs ahead of time and auto-choosing the fastest would be cool.
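To make the last suggestion concrete, here is a minimal sketch of auto-choosing an FFT length from a benchmark table. It is not code from clLucas or CUDALucas; `pick_fastest_fft` is a hypothetical helper, and the minimum usable length is taken as an input, since deciding it (from round-off error) is a separate problem. The timings are copied from the second Tonga listing earlier in the thread.

```python
def pick_fastest_fft(benchmarks, min_length):
    """Return the fastest benchmarked FFT length that is >= min_length.

    benchmarks maps FFT length -> measured ms/iter. A longer FFT than strictly
    needed is still numerically safe, so every length >= min_length is a
    candidate. Ties on time go to the shorter length.
    """
    candidates = {n: ms for n, ms in benchmarks.items() if n >= min_length}
    if not candidates:
        raise ValueError(f"no benchmarked FFT length >= {min_length}")
    return min(candidates, key=lambda n: (candidates[n], n))


# ms/iter per FFT length, from the second Tonga listing in this thread.
BENCHMARKS = {
    1572864: 4.1595,
    1638400: 5.0494,
    1769472: 4.8774,
    2097152: 4.7492,
    2359296: 6.9796,
}

# M30402457 was run at n = 1638400 in that listing; treat that as the minimum.
best = pick_fastest_fft(BENCHMARKS, 1638400)
print(best, BENCHMARKS[best])
```

With these numbers the helper skips n = 1638400 (5.0494 ms/iter) in favour of n = 2097152 (4.7492 ms/iter), which is the kind of non-monotonic timing that makes benchmarking ahead of time worthwhile.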
#318 | |
"Mr. Meeseeks"
Jan 2012
California, USA
3²×241 Posts |
Quote:

Also, it seems that the faster cards get a bigger boost. I wonder how the Fury Xs perform with their HBM memory...
#319 | |
Jul 2003
So Cal
2·3·347 Posts |
Quote:

Edit: Let me know when you have a Windows version ready and I'll try it on my R9 280 at home. The Tahiti GPU runs DP at 1/4 SP, so it might be faster than both the R9 290X and the Fury X.

Last fiddled with by frmky on 2015-07-26 at 01:45