20141026, 20:44  #309 
P90 years forever!
Aug 2002
Yeehaw, FL
1CEF_{16} Posts 
FFT length is way too small

20141026, 20:46  #310 
Nov 2010
Germany
3×199 Posts 

20141026, 21:08  #311 
Jun 2014
2^{3}×3×5 Posts 
I think George was talking about the 2^17 FFT length earlier.

20150108, 10:04  #312 
Aug 2010
Republic of Belarus
2×89 Posts 
Hello! Where can I see benchmarks for a 100M exponent (ETA, ms)? I'm very interested in the results for the 295X2. The card is very cheap now and I want to buy one!
And a second question: does the OpenCL LL test use only the GPU, or GPU+CPU?
20150110, 19:40  #313  
Nov 2010
Germany
3·199 Posts 
Quote:
The 295x2 can run two such tests in parallel. I'd expect the speed of each to be slightly below my results: the HD7950 has 717 DP GFlops and a 240 GB/s memory rate, OC'd to 1100/1400 MHz ==> 985 GFlops / 268 GB/s. The R295x2 has 2x 717 GFlops and a 2x 320 GB/s memory rate. I'm not sure which counts more: the lower DP power or the better memory bandwidth. In an attempt to answer this last question, I separately reduced the clocks of the GPU cores and the memory by 10%. 10% lower GFlops results in 5.9% longer iteration times, whereas 10% lower bandwidth causes 5.2% longer iteration times. If both clocks are lowered by 10%, the iteration times increase by 10.1%. So it seems both GFlops and memory rate are important, but GFlops a tiny bit more so.
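One way to put the three measurements above on a common scale is to turn each into an exponent e in the model "iteration time is proportional to clock^(-e)". This is only a rough sketch: the power-law model and the function name `elasticity` are assumptions made for illustration, not something the post or clLucas defines.

```python
import math

def elasticity(clock_drop, time_increase):
    """Exponent e in the assumed model time ~ clock**(-e), from one measurement."""
    return math.log(1.0 + time_increase) / math.log(1.0 / (1.0 - clock_drop))

# Measurements quoted in the post, each for a 10% clock reduction.
e_core = elasticity(0.10, 0.059)   # GPU core clock only
e_mem = elasticity(0.10, 0.052)    # memory clock only
e_both = elasticity(0.10, 0.101)   # both clocks lowered together

print(f"core: {e_core:.2f}, memory: {e_mem:.2f}, both: {e_both:.2f}")
```

An exponent of 1 would mean the iteration time is inversely proportional to that clock. Under this model the core exponent comes out slightly larger than the memory exponent, which matches the conclusion that GFlops matter a tiny bit more.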

20150111, 09:51  #314  
Aug 2010
Republic of Belarus
2·89 Posts 
Quote:


20150725, 02:05  #315 
"Mr. Meeseeks"
Jan 2012
California, USA
3^{2}×241 Posts 
Nice performance improvements with the latest clFFT library... playing around with it now
clFFT 2.0 (current binary)
Code:
Platform :Advanced Micro Devices, Inc.
Device 0 : Tonga
M( 1257787 )C, 0x3f45bf9bea7213ea, n = 65536, clLucas v1.01 err = 0.1211 (0:03 real, 0.3194 ms/iter, ETA 6:36)
M( 1398269 )C, 0xa4a6d2f0e34629db, n = 73728, clLucas v1.01 err = 0.1016 (0:04 real, 0.3781 ms/iter, ETA 8:41)
M( 2976221 )C, 0x2a7111b7f70fea2f, n = 163840, clLucas v1.01 err = 0.05078 (0:06 real, 0.6307 ms/iter, ETA 31:06)
M( 3021377 )C, 0x6387a70a85d46baf, n = 163840, clLucas v1.01 err = 0.0625 (0:06 real, 0.6283 ms/iter, ETA 31:31)
M( 6972593 )C, 0x88f1d2640adb89e1, n = 393216, clLucas v1.01 err = 0.04688 (0:13 real, 1.2852 ms/iter, ETA 2:29:05)
M( 13466917 )C, 0x9fdc1f4092b15d69, n = 786432, clLucas v1.01 err = 0.03223 (0:29 real, 2.9072 ms/iter, ETA 10:51:42)
M( 20996011 )C, 0x5fc58920a821da11, n = 1179648, clLucas v1.01 err = 0.09375 (0:50 real, 5.0678 ms/iter, ETA 29:32:02)
M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, clLucas v1.01 err = 0.1875 (1:04 real, 6.4269 ms/iter, ETA 42:52:54)
M( 25964951 )C, 0x62eb3ff0a5f6237c, n = 1572864, clLucas v1.01 err = 0.02051 (1:26 real, 8.5475 ms/iter, ETA 61:36:48)
M( 30402457 )C, 0x0b8600ef47e69d27, n = 1638400, clLucas v1.01 err = 0.3125 (1:38 real, 9.8404 ms/iter, ETA 83:04:09)
M( 32582657 )C, 0x02751b7fcec76bb1, n = 1769472, clLucas v1.01 err = 0.2969 (2:29 real, 14.9789 ms/iter, ETA 135:31:01)
M( 37156667 )C, 0x67ad7646a1fad514, n = 2097152, clLucas v1.01 err = 0.1201 (0:56 real, 5.5684 ms/iter, ETA 57:26:50)
M( 42643801 )C, 0x8f90d78d5007bba7, n = 2359296, clLucas v1.01 err = 0.2031 (2:34 real, 15.4499 ms/iter, ETA 182:57:10)
M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, clLucas v1.01 err = 0.2656 (2:34 real, 15.4018 ms/iter, ETA 184:23:39)
Code:
Platform :Advanced Micro Devices, Inc.
Device 0 : Tonga
M( 1257787 )C, 0x3f45bf9bea7213ea, n = 65536, clLucas v1.01 err = 0.1094 (0:03 real, 0.3001 ms/iter, ETA 6:12)
M( 1398269 )C, 0xa4a6d2f0e34629db, n = 73728, clLucas v1.01 err = 0.09375 (0:05 real, 0.5239 ms/iter, ETA 12:03)
M( 2976221 )C, 0x2a7111b7f70fea2f, n = 163840, clLucas v1.01 err = 0.04883 (0:09 real, 0.8545 ms/iter, ETA 42:09)
M( 3021377 )C, 0x6387a70a85d46baf, n = 163840, clLucas v1.01 err = 0.06641 (0:08 real, 0.8560 ms/iter, ETA 42:56)
M( 6972593 )C, 0x88f1d2640adb89e1, n = 393216, clLucas v1.01 err = 0.05139 (0:14 real, 1.4218 ms/iter, ETA 2:44:55)
M( 13466917 )C, 0x9fdc1f4092b15d69, n = 786432, clLucas v1.01 err = 0.03125 (0:24 real, 2.3861 ms/iter, ETA 8:54:52)
M( 20996011 )C, 0x5fc58920a821da11, n = 1179648, clLucas v1.01 err = 0.09375 (0:36 real, 3.5629 ms/iter, ETA 20:45:50)
M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, clLucas v1.01 err = 0.2031 (0:41 real, 4.1131 ms/iter, ETA 27:26:37)
M( 25964951 )C, 0x62eb3ff0a5f6237c, n = 1572864, clLucas v1.01 err = 0.02002 (0:42 real, 4.1595 ms/iter, ETA 29:59:00)
M( 30402457 )C, 0x0b8600ef47e69d27, n = 1638400, clLucas v1.01 err = 0.2881 (0:50 real, 5.0494 ms/iter, ETA 42:37:29)
M( 32582657 )C, 0x02751b7fcec76bb1, n = 1769472, clLucas v1.01 err = 0.3125 (0:49 real, 4.8774 ms/iter, ETA 44:07:37)
M( 37156667 )C, 0x67ad7646a1fad514, n = 2097152, clLucas v1.01 err = 0.1074 (0:47 real, 4.7492 ms/iter, ETA 48:59:46)
M( 42643801 )C, 0x8f90d78d5007bba7, n = 2359296, clLucas v1.01 err = 0.209 (1:10 real, 6.9796 ms/iter, ETA 82:39:00)
M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, clLucas v1.01 err = 0.2656 (1:10 real, 6.9811 ms/iter, ETA 83:34:46)
Last fiddled with by kracker on 20150725 at 02:06
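To compare two benchmark runs like the ones above without reading the numbers by eye, a small script can pull the ms/iter figure out of each result line and print the ratio per exponent. The sketch below is illustrative only: the regular expression is fitted to the line format shown in this post, the names `parse`, `speedups`, `RUN_A`, and `RUN_B` are made up here, and only three lines from each run are copied in. The second run carries no version label in the post, so it is just called run B.

```python
import re

# One result line: exponent, residue, FFT length n, then the ms/iter figure.
LINE_RE = re.compile(
    r"M\(\s*(\d+)\s*\)C,\s*0x[0-9a-f]+,\s*n\s*=\s*(\d+),.*?([\d.]+)\s*ms/iter"
)

def parse(log):
    """Map exponent -> ms/iter for every result line found in the log text."""
    return {int(m.group(1)): float(m.group(3)) for m in LINE_RE.finditer(log)}

def speedups(old_log, new_log):
    """Ratio old/new of ms/iter for exponents present in both logs."""
    old, new = parse(old_log), parse(new_log)
    return {p: old[p] / new[p] for p in old if p in new}

# Three lines from the clFFT 2.0 run in the post.
RUN_A = """\
M( 1398269 )C, 0xa4a6d2f0e34629db, n = 73728, clLucas v1.01 err = 0.1016 (0:04 real, 0.3781 ms/iter, ETA 8:41)
M( 25964951 )C, 0x62eb3ff0a5f6237c, n = 1572864, clLucas v1.01 err = 0.02051 (1:26 real, 8.5475 ms/iter, ETA 61:36:48)
M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, clLucas v1.01 err = 0.2656 (2:34 real, 15.4018 ms/iter, ETA 184:23:39)
"""

# The same exponents from the second (unlabeled) run in the post.
RUN_B = """\
M( 1398269 )C, 0xa4a6d2f0e34629db, n = 73728, clLucas v1.01 err = 0.09375 (0:05 real, 0.5239 ms/iter, ETA 12:03)
M( 25964951 )C, 0x62eb3ff0a5f6237c, n = 1572864, clLucas v1.01 err = 0.02002 (0:42 real, 4.1595 ms/iter, ETA 29:59:00)
M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, clLucas v1.01 err = 0.2656 (1:10 real, 6.9811 ms/iter, ETA 83:34:46)
"""

ratios = speedups(RUN_A, RUN_B)
for p, r in ratios.items():
    print(f"M{p}: {r:.2f}x")
```

A ratio above 1 means run B was faster for that exponent and a ratio below 1 means it was slower. In this sample the two larger FFT lengths gain roughly a factor of two, while the n = 73728 case gets slower.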
20150725, 07:09  #316  
Jul 2003
So Cal
2·3·347 Posts 
Quote:
clFFT 2.2:
Platform :Advanced Micro Devices, Inc.
Device 0 : Hawaii
Build Options are : D KHR_DP_EXTENSION
start M35064059 fft length = 2097152
Iteration 10000 0x005a9a8bbdfa894b, n = 2097152 err = 0.02771 (0:42 real, 4.1596 ms/iter, ETA 40:29:55)
Iteration 20000 0x085623e4553c8c01, n = 2097152 err = 0.02771 (0:41 real, 4.1491 ms/iter, ETA 40:23:05)

clFFT 2.6:
Platform :Advanced Micro Devices, Inc.
Device 0 : Hawaii
Build Options are : D KHR_DP_EXTENSION
start M35064059 fft length = 2097152
Iteration 10000 0x005a9a8bbdfa894b, n = 2097152 err = 0.02734 (0:30 real, 2.9535 ms/iter, ETA 28:45:21)
Iteration 20000 0x085623e4553c8c01, n = 2097152 err = 0.02734 (0:29 real, 2.8963 ms/iter, ETA 28:11:27)
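The ETA column in these logs can be sanity-checked from the other fields. A Lucas-Lehmer test of M(p) takes p - 2 squarings, so the remaining time is roughly (p - 2 - iterations done) times ms/iter. The sketch below assumes clLucas computes its ETA in about this way, which the thread does not confirm; `eta_seconds` and `hms` are names chosen for illustration.

```python
def eta_seconds(exponent, iteration, ms_per_iter):
    """Rough time left, assuming p - 2 total iterations at a constant rate."""
    remaining = (exponent - 2) - iteration
    return remaining * ms_per_iter / 1000.0

def hms(seconds):
    """Format seconds as H:MM:SS, like the ETA field in the logs."""
    s = int(round(seconds))
    return f"{s // 3600}:{s % 3600 // 60:02d}:{s % 60:02d}"

# Values from the two "Iteration 10000" lines for M35064059.
eta_22 = eta_seconds(35064059, 10000, 4.1596)   # clFFT 2.2
eta_26 = eta_seconds(35064059, 10000, 2.9535)   # clFFT 2.6

print("clFFT 2.2:", hms(eta_22))
print("clFFT 2.6:", hms(eta_26))
print(f"speedup: {4.1596 / 2.9535:.2f}x")
```

Both estimates land within a minute of the ETAs printed in the logs. The small gap is expected because the logged ms/iter value is rounded.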

20150725, 19:28  #317 
Jul 2003
So Cal
2×3×347 Posts 
With the clFFT speed improvements, perhaps it's time to make clLucas more user friendly by bringing over code from CUDALucas. Reading from worktodo.txt is essential. Supporting offsets would be nice. Benchmarking FFTs ahead of time and autochoosing the fastest would be cool.
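The "benchmark ahead of time and pick the fastest" idea could look roughly like the sketch below. It is not taken from CUDALucas or clLucas: `choose_fft` and its arguments are hypothetical, and the minimum usable FFT length for an exponent is passed in rather than computed. The timings are the clFFT 2.0 ms/iter figures for the three largest FFT lengths posted earlier in this thread.

```python
def choose_fft(timings, min_length):
    """Return the benchmarked FFT length >= min_length with the lowest ms/iter."""
    candidates = {n: t for n, t in timings.items() if n >= min_length}
    if not candidates:
        raise ValueError("no benchmarked FFT length is large enough")
    return min(candidates, key=candidates.get)

# ms/iter per FFT length, from the clFFT 2.0 Tonga run earlier in the thread.
CLFFT_20_TIMINGS = {
    1769472: 14.9789,
    2097152: 5.5684,
    2359296: 15.4499,
}

# The M32582657 run used n = 1769472; treat that as the minimum usable length.
best = choose_fft(CLFFT_20_TIMINGS, min_length=1769472)
print(best)
```

With those timings the larger power-of-two length 2097152 wins over 1769472, which is the kind of case where an automatic choice would help.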

20150725, 23:41  #318  
"Mr. Meeseeks"
Jan 2012
California, USA
3^{2}×241 Posts 
Quote:
Also.. it seems that the faster cards get a bigger boost.. I wonder how the Fury X's perform with their HBM memory... 

20150726, 01:39  #319  
Jul 2003
So Cal
2·3·347 Posts 
Quote:
Edit: Let me know when you have a Windows version ready and I'll try it on my R9 280 at home. The Tahiti GPU runs DP at 1/4 SP, so it might be faster than both the R9 290X and the Fury X. Last fiddled with by frmky on 20150726 at 01:45 
