2009-11-06, 06:27  #34 
Jul 2003
So Cal
5044_{8} Posts 
Version k runs at .0141 sec/iter for the 2048K FFT and .0264 sec/iter for the 4096K FFT on the C1060.

2009-11-06, 10:05  #35 
Mar 2003
Melbourne
5×103 Posts 
Cool. Can't wait for the 3xx series: Dec 2009 release date, apparently (well, according to Wikipedia).
- Craig 
2009-11-06, 10:20  #36 
Feb 2005
The Netherlands
2×109 Posts 
After some fiddling, I managed to compile and run this under Windows. I had to replace two memalign() calls with malloc(), because memalign() is apparently obsolete. Here are two results:
Code:
D:\Code\MaclucasFFTW.cuda.k>a 11213
too small Exponent
Code:
D:\Code\MaclucasFFTW.cuda.k>a 216091
1 131072
10001 131072
20001 131072
30001 131072
40001 131072
50001 131072
60001 131072
70001 131072
80001 131072
90001 131072
100001 131072
110001 131072
120001 131072
130001 131072
140001 131072
150001 131072
160001 131072
170001 131072
180001 131072
190001 131072
200001 131072
210001 131072
M( 216091 )C, 0xfffffffffffffffd, n = 131072, MacLucasFFTW v8.1 Ballester
Other exponents that are big enough give the same result: 0xfffffffffffffffd after a very short while. My video card is a 9600M GS, and it is capable of running Folding@Home. 
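Since memalign() is obsolete and unavailable on Windows, a portable wrapper is one way to do the replacement without losing the alignment guarantee that malloc() may not provide. A minimal sketch, using _aligned_malloc on Windows and posix_memalign elsewhere; the names xmalloc_aligned/xfree_aligned are illustrative, not from the MacLucasFFTW source:

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <stdint.h>

/* Portable stand-in for the obsolete memalign(): _aligned_malloc on
 * Windows, posix_memalign elsewhere. Memory from _aligned_malloc must
 * be released with _aligned_free, hence the matching free wrapper. */
#ifdef _WIN32
#include <malloc.h>
static void *xmalloc_aligned(size_t alignment, size_t size) {
    return _aligned_malloc(size, alignment);
}
static void xfree_aligned(void *p) { _aligned_free(p); }
#else
static void *xmalloc_aligned(size_t alignment, size_t size) {
    void *p = NULL;
    /* alignment must be a power of two and a multiple of sizeof(void *) */
    if (posix_memalign(&p, alignment, size) != 0) return NULL;
    return p;
}
static void xfree_aligned(void *p) { free(p); }
#endif
```

FFTW in particular benefits from 16-byte (SIMD) alignment, so keeping an aligned allocator rather than plain malloc() avoids a silent performance loss.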
2009-11-06, 13:35  #37 
Tribal Bullet
Oct 2004
3^{2}×5×79 Posts 

2009-11-07, 04:38  #38  
Jul 2009
Tokyo
2·5·61 Posts 
Hi, frmky
Quote:
Hi, nucleon Quote:
Hi, BigBrother. The exponent needs to be larger than 131072: the aint() function requires the exponent to be larger than the FFT size. Quote:
Hi, jasonp. Nice support, thank you. 
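That requirement matches the "too small Exponent" message BigBrother hit with 11213: the exponent q must exceed the FFT length n. A minimal sketch of such an input guard; the function name is hypothetical, only the error message follows the posted output:

```c
#include <stdio.h>

/* Sketch of the input check described above: the exponent q must be
 * larger than the FFT length n, otherwise the run is rejected with
 * the "too small Exponent" message seen in post #36. */
static int exponent_ok(unsigned long q, unsigned long n) {
    if (q <= n) {
        fprintf(stderr, "too small Exponent\n");
        return 0;  /* reject: fewer bits than FFT elements */
    }
    return 1;      /* accept: at least one bit per FFT element */
}
```

So 216091 with n = 131072 passes, while 11213 does not.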

2009-11-07, 05:22  #39 
Jul 2009
Tokyo
2·5·61 Posts 
Hi,
Version o runs at .0134 sec/iter for the 2048K FFT and .0320 sec/iter for the 4096K FFT on the GTX260. 
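One way to put these sec/iter figures in perspective is to scale them up to a full test: an LL test of 2^p - 1 takes p - 2 squarings, so the total runtime is roughly (p - 2) × sec/iter. A small sketch; the exponent used below is an arbitrary example, not one from this thread:

```c
/* Rough Lucas-Lehmer runtime estimate: p - 2 squaring iterations at a
 * fixed per-iteration cost, converted from seconds to days. */
static double ll_days(unsigned long p, double sec_per_iter) {
    return (p - 2) * sec_per_iter / 86400.0;
}
```

For example, a hypothetical exponent of 33,000,000 at the 0.0134 sec/iter reported above works out to about 442,200 seconds, i.e. roughly 5.1 days per test on the GTX260.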
2009-11-07, 07:30  #40 
Jul 2003
So Cal
2596_{10} Posts 
Excellent! I have another calculation running now, so I won't be able to bench it on the C1060 for a few days.
Two questions... First, can this be adapted to use non-power-of-2 FFTs, and if so, would there be speed gains at FFT sizes comparable to those used by Prime95? Second, can this be multithreaded with the calculation split over multiple GPUs, or, since the devices can't talk directly to each other, will the required memory transfers to/from the host kill the speed? I ask this last question because I'm actually using an S1070 with four C1060's. 
2009-11-07, 10:24  #41  
Jul 2009
Tokyo
2×5×61 Posts 
Hi, frmky
Quote:
Quote:


2009-11-07, 17:50  #42 
Jul 2003
So Cal
101000100100_{2} Posts 
The S1070 is really just four discrete C1060's housed in a separate unit. It is no different from installing four GTX260's in your computer. Each card must be addressed individually from a different program thread, and the cards cannot communicate directly with each other.
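The one-thread-per-card pattern described above can be sketched with the CUDA runtime API: each host thread calls cudaSetDevice() once before doing any CUDA work, so all subsequent allocations and kernel launches in that thread target its own card. This is a sketch, not the actual MacLucasFFTW code; run_ll_test() is a placeholder for the per-card work:

```cuda
#include <cuda_runtime.h>
#include <pthread.h>
#include <stdio.h>

#define MAX_GPUS 8

/* Each host thread binds itself to one device; every CUDA call made
 * from this thread afterwards targets that device. */
static void *worker(void *arg) {
    int dev = *(int *)arg;
    cudaSetDevice(dev);
    /* run_ll_test(dev);  -- placeholder for the per-card LL work */
    printf("thread bound to device %d\n", dev);
    return NULL;
}

int main(void) {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);      /* e.g. 4 for an S1070 */
    if (ndev > MAX_GPUS) ndev = MAX_GPUS;

    pthread_t tid[MAX_GPUS];
    int ids[MAX_GPUS];
    for (int d = 0; d < ndev; d++) {
        ids[d] = d;
        pthread_create(&tid[d], NULL, worker, &ids[d]);
    }
    for (int d = 0; d < ndev; d++)
        pthread_join(tid[d], NULL);
    return 0;
}
```

Note this runs four independent tests, one per card; splitting a single FFT across cards would force the host round-trips frmky is worried about.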

2009-11-08, 09:01  #43 
Jul 2009
Tokyo
1142_{8} Posts 

2009-11-08, 09:06  #44 
Jul 2009
Tokyo
2·5·61 Posts 

Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Don't DC/LL them with CudaLucas  LaurV  Data  131  2017-05-02 18:41 
CUDALucas / cuFFT Performance on CUDA 7 / 7.5 / 8  Brain  GPU Computing  13  2016-02-19 15:53 
CUDALucas: which binary to use?  Karl M Johnson  GPU Computing  15  2015-10-13 04:44 
settings for cudaLucas  fairsky  GPU Computing  11  2013-11-03 02:08 
Trying to run CUDALucas on Windows 8 CP  Rodrigo  GPU Computing  12  2012-03-07 23:20 