2010-05-10, 09:12  #166
Jul 2009
Tokyo
262_{16} Posts 
Very Sorry, wavelet3000
Please change MaclucasFFTW.cu. Quote:


2010-05-10, 09:40  #167
Jul 2009
Tokyo
1001100010_{2} Posts 

2010-05-14, 23:32  #168
Jul 2009
Tokyo
1001100010_{2} Posts 
Hi,
Version "Q" runs at 0.0106 sec/iter for the 2048K FFT, 0.0214 sec/iter for the 4096K FFT, 0.0432 sec/iter for the 8192K FFT, and 0.0895 sec/iter for the 16384K FFT on a GTX 260.
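As a quick check on how these timings scale, here is a minimal Python sketch that computes the slowdown for each doubling of the FFT size. The dictionary simply restates the four figures quoted above; the function name `doubling_ratios` is made up for illustration.

```python
# Timings quoted in the post above: FFT size in K -> seconds per iteration (GTX 260).
timings = {2048: 0.0106, 4096: 0.0214, 8192: 0.0432, 16384: 0.0895}

def doubling_ratios(t):
    """Map each FFT size to time(size) / time(size / 2), where both are present."""
    sizes = sorted(t)
    return {b: t[b] / t[a] for a, b in zip(sizes, sizes[1:]) if b == 2 * a}

for size, ratio in doubling_ratios(timings).items():
    print(f"{size}K vs {size // 2}K: x{ratio:.2f}")
```

Each doubling costs a little more than a factor of two, which is roughly what one would expect from an FFT-dominated iteration.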
2010-05-15, 20:08  #169
May 2010
7 Posts 
With version Q, the 64-bit build works fine: no problems with 44497, 110203, or other numbers.
Thanks very much. 
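For anyone who wants an independent cross-check on small exponents, here is a minimal sketch of the Lucas-Lehmer test in Python. It uses Python's built-in big integers rather than the floating-point FFT squaring that MacLucasFFTW uses, so it only illustrates the recurrence and is far too slow for large exponents. The function name `lucas_lehmer` is an illustrative choice.

```python
def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime. Assumes p is an odd prime."""
    m = (1 << p) - 1          # the Mersenne number M(p)
    s = 4                     # starting value of the sequence
    for _ in range(p - 2):    # the test needs exactly p - 2 squarings
        s = (s * s - 2) % m
    return s == 0

# Small sanity check against well-known Mersenne prime exponents.
print([p for p in (3, 5, 7, 11, 13, 17, 19, 23) if lucas_lehmer(p)])
```

Since 44497 is a known Mersenne prime exponent, `lucas_lehmer(44497)` should also return True, although it takes noticeably longer than the small cases.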
2010-05-15, 23:15  #170
Jul 2003
So Cal
2^{3}×3^{2}×29 Posts 
As expected,
M( 42643801 )P, n = 4194304, MacLucasFFTW v8.1 Ballester
in just over 5 days. Although unintended, this also tested the restart code. I had the program running in a terminal (and not under screen) on a Windows machine. Windows Update decided to reboot the computer, closing the terminal and stopping the program. The restart worked just as it should.
2010-05-16, 01:12  #171
Jul 2009
Tokyo
1142_{8} Posts 

2010-05-16, 12:56  #172
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
5,869 Posts 

2010-05-16, 15:14  #173
"Oliver"
Mar 2005
Germany
11×101 Posts 
At least close to.
Glucas 2.9.2-20080916 + dual-socket Xeon X5680 (3.33 GHz hexa-core):
2048k FFT: 4.7 ms per iteration
With a Tesla-branded Fermi (all DP units enabled) you should beat this easily. It looks like Glucas doesn't scale as well as your code with increasing FFT sizes (at least on this system), so perhaps you're already faster for bigger FFTs. On the other hand, Glucas supports many more FFT sizes. Good job, msft!
2010-05-18, 10:38  #174
Jul 2003
So Cal
2^{3}·3^{2}·29 Posts 
CUDA 3.1 beta is now out. Among the highlights is this little gem:
* Significant improvements in double-precision FFT performance on Fermi-architecture GPUs for 2^n transform sizes
Sure enough, the GTX 480 now runs at 4.66 ms/iter for the 2048K FFT and 9.37 ms/iter for the 4096K FFT.
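To put the ms/iter figures in perspective, here is a rough Python sketch that converts a per-iteration time into an estimated wall-clock time for a full test. It relies only on the fact that a Lucas-Lehmer test of M(p) takes p - 2 iterations, and it ignores startup, checkpointing and any other overhead. The function name `ll_days` is hypothetical, and pairing the 4096K timing with exponent 42643801 (the n = 4194304 run reported earlier in the thread) is purely for illustration.

```python
def ll_days(p, ms_per_iter):
    """Estimated days for a Lucas-Lehmer test of M(p), which takes p - 2 iterations."""
    seconds = (p - 2) * ms_per_iter / 1000.0
    return seconds / 86400.0

# 4096K FFT on the GTX 480 at 9.37 ms/iter, applied to exponent 42643801.
print(f"{ll_days(42643801, 9.37):.2f} days")
```

Under those assumptions the estimate comes out at about 4.6 days.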
2010-06-02, 04:15  #176
Jul 2003
So Cal
2^{3}×3^{2}×29 Posts 
I've been busy testing the 2M FFT on Fermi:
Code:
M( 30000037 )C, 0x307be1a2dc2bca38, n = 2097152, MacLucasFFTW v8.1 Ballester
M( 31000003 )C, 0x9bed7651387bd02a, n = 2097152, MacLucasFFTW v8.1 Ballester
M( 32000057 )C, 0x60bbddb7958f85e3, n = 2097152, MacLucasFFTW v8.1 Ballester
M( 33000001 )C, 0xe54b0c721739183f, n = 2097152, MacLucasFFTW v8.1 Ballester
M( 34000081 )C, 0x64415a7a626f0e34, n = 2097152, MacLucasFFTW v8.1 Ballester
M( 35000443 )C, 0xbf2fb6ccbc3f8780, n = 2097152, MacLucasFFTW v8.1 Ballester
M( 36000143 )C, 0xb0be92372eeab565, n = 2097152, MacLucasFFTW v8.1 Ballester
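If you want to compare result lines like these against another program's residues, a small parser helps. The Python sketch below is only inferred from the lines shown in this thread: it assumes that `P` marks a prime result with no residue, that `C` marks a composite result followed by a 64-bit hexadecimal residue, and that `n` is the FFT length. The names `RESULT_RE` and `parse_results` are made up for the example.

```python
import re

# Pattern inferred from the result lines in this thread (an assumption, not a spec).
RESULT_RE = re.compile(
    r"M\(\s*(\d+)\s*\)([PC])"          # exponent, then P or C flag
    r"(?:,\s*0x([0-9a-fA-F]{16}))?"    # optional 64-bit residue (absent on P lines)
    r",\s*n\s*=\s*(\d+)"               # FFT length
)

def parse_results(text):
    """Return one dict per result found in text, in order of appearance."""
    records = []
    for m in RESULT_RE.finditer(text):
        exponent, flag, residue, n = m.groups()
        records.append({
            "exponent": int(exponent),
            "prime": flag == "P",
            "residue": residue,        # None when no residue is printed
            "fft_length": int(n),
        })
    return records

sample = (
    "M( 30000037 )C, 0x307be1a2dc2bca38, n = 2097152, MacLucasFFTW v8.1 Ballester\n"
    "M( 42643801 )P, n = 4194304, MacLucasFFTW v8.1 Ballester\n"
)
for rec in parse_results(sample):
    print(rec)
```

Because it uses `finditer`, the parser also copes with several results run together on a single line.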