mersenneforum.org CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW)

2009-11-06, 06:27   #34
frmky

Jul 2003
So Cal

5044₈ Posts

Version k runs at .0141 sec/iter for the 2048K FFT and .0264 sec/iter for the 4096K FFT on the C1060.
2009-11-06, 10:05   #35
nucleon

Mar 2003
Melbourne

5×103 Posts

Cool. Can't wait for the 3xx series - Dec 2009 release date apparently (well according to wikipedia).
-- Craig
2009-11-06, 10:20   #36
BigBrother

Feb 2005
The Netherlands

2×109 Posts

After some fiddling, I managed to compile and run this under Windows. I had to replace two memalign() calls with malloc(), because memalign() is apparently obsolete. Here are two results:

Code:
D:\Code\MaclucasFFTW.cuda.k>a 11213
too small Exponent

Code:
D:\Code\MaclucasFFTW.cuda.k>a 216091
1 131072
10001 131072
20001 131072
30001 131072
40001 131072
50001 131072
60001 131072
70001 131072
80001 131072
90001 131072
100001 131072
110001 131072
120001 131072
130001 131072
140001 131072
150001 131072
160001 131072
170001 131072
180001 131072
190001 131072
200001 131072
210001 131072
M( 216091 )C, 0xfffffffffffffffd, n = 131072, MacLucasFFTW v8.1 Ballester

This last one ran for only about 30 seconds. Other exponents that are big enough give the same result: 0xfffffffffffffffd after a very short while. My video card is a 9600 M GS, and is capable of running Folding@Home.
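For reference, M(216091) is a known Mersenne prime, so a correct Lucas-Lehmer run should finish with a zero residue rather than "C" and 0xfffffffffffffffd. The sketch below is plain Python big-integer arithmetic, not the program's FFT code. It checks the smaller known prime exponent 11213 and then shows one failure mode that would be consistent with the reported residue: if every squaring came back as zero (an assumption for illustration, not something confirmed in the thread), the value would stick at -2 mod M(q), whose low 64 bits are 0xfffffffffffffffd.

```python
def lucas_lehmer_residue(q):
    """Return the final Lucas-Lehmer term modulo M(q) = 2**q - 1 (q an odd prime)."""
    m = (1 << q) - 1
    s = 4
    for _ in range(q - 2):
        s = (s * s - 2) % m
    return s


def low64(x):
    """Low 64 bits of x, the part a program would print as a 64-bit residue."""
    return x & 0xFFFFFFFFFFFFFFFF


# M(11213) is a known Mersenne prime, so the residue must be zero.
print(lucas_lehmer_residue(11213) == 0)

# Hypothetical failure mode (assumption): the squaring step always yields 0,
# so each iteration computes (0 - 2) mod M(q) = 2**q - 3.
q = 216091
m = (1 << q) - 1
stuck = (0 * 0 - 2) % m
print(hex(low64(stuck)))
```

The second print gives 0xfffffffffffffffd for any exponent of at least 64 bits, which fits the observation that every large enough exponent produced the same value.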
2009-11-06, 13:35   #37
jasonp
Tribal Bullet

Oct 2004

3²×5×79 Posts

Quote:
 Originally Posted by BigBrother My video card is a 9600 M GS, and is capable of running Folding@Home.
Are you sure this card supports double precision? If it doesn't, it will use single precision FP internally, and generate completely wrong answers.
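To illustrate why single precision is not enough here: an FFT-based squaring only works if every output coefficient can be rounded back to the exact integer it represents. The sketch below uses Python's struct module to round a double to IEEE 754 single precision. The coefficient value is an illustrative assumption, not a number taken from the program.

```python
import struct


def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Illustrative coefficient (assumption): it needs 25 significant bits,
# one more than the 24-bit significand of single precision.
coeff = 2**24 + 1

as_double = float(coeff)
as_single = to_float32(float(coeff))

print(int(as_double) == coeff)  # double precision keeps the exact integer
print(int(as_single) == coeff)  # single precision has already lost the last bit
```

Once a coefficient is off by even one unit, the error propagates through every later squaring, so the final residue is meaningless.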

2009-11-07, 04:38   #38
msft

Jul 2009
Tokyo

2·5·61 Posts

Hi, frmky
Quote:
 Originally Posted by frmky .0141 sec/iter for the 2048K FFT and .0264 sec/iter for the 4096K FFT on the C1060.
The 4096K FFT performance is reasonable; my GTX260's 4096K FFT performance is not.

Hi, nucleon
Quote:
 Originally Posted by nucleon Cool. Can't wait for the 3xx series - Dec 2009 release date apparently (well according to wikipedia).
Me too.

Hi, BigBrother
Quote:
 Originally Posted by BigBrother
Code:
D:\Code\MaclucasFFTW.cuda.k>a 11213
too small Exponent
The exponent must be larger than 131072: the aint() function needs the exponent to be larger than the FFT size.
Quote:
 Originally Posted by BigBrother My video card is a 9600 M GS, and is capable of running Folding@Home.
Sorry, only the 2xx series supports double precision (DP).

Hi, jasonp
Quote:
 Originally Posted by jasonp Are you sure this card supports double precision? If it doesn't, it will use single precision FP internally, and generate completely wrong answers.
Nice support, thank you.
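One way to read the "exponent larger than FFT size" limit, assuming the variable-base (Crandall-Fagin style) word layout that Lucas-Lehmer FFT codes commonly use: a q-bit number is spread over n words, and word j gets ceil(q(j+1)/n) - ceil(qj/n) bits. Whether aint() in this program fails for exactly this reason is not stated in the thread. The function name below is hypothetical.

```python
def word_bit_sizes(q, n):
    """Bits assigned to each of n words when a q-bit number is split in variable-base form."""
    def ceil_div(a, b):
        return -(-a // b)

    return [ceil_div(q * (j + 1), n) - ceil_div(q * j, n) for j in range(n)]


# Exponent larger than the FFT length: every word carries 1 or 2 bits.
sizes = word_bit_sizes(216091, 131072)
print(min(sizes), max(sizes), sum(sizes))

# Exponent smaller than the FFT length: most words would carry 0 bits.
sizes_small = word_bit_sizes(11213, 131072)
print(min(sizes_small), max(sizes_small), sum(sizes_small))
```

With q = 11213 and n = 131072 most words are empty, so the transform length is far larger than the number needs, and rejecting the exponent is a reasonable guard.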

2009-11-07, 05:22   #39
msft

Jul 2009
Tokyo

2·5·61 Posts

Hi,

Version o runs at .0134 sec/iter for the 2048K FFT and .0320 sec/iter for the 4096K FFT on the GTX260.
Attached Files
 MacLucasFFTW.cuda.o.tar.gz (30.6 KB, 506 views)

2009-11-07, 07:30   #40
frmky

Jul 2003
So Cal

2596₁₀ Posts

Excellent! I have another calculation running now, so I won't be able to bench it on the C1060 for a few days.

Two questions... First, can this be adapted to use non-power-of-2 FFT's, and if so would there be speed gains using comparable FFT sizes to those used by Prime95?

Secondly, can this be multithreaded with the calculation split over multiple GPU's, or as the devices can't talk directly to each other will the required memory transfers from/to the host kill the speed? I ask this last question since I'm actually using an S1070 with four C1060's.

2009-11-07, 10:24   #41
msft

Jul 2009
Tokyo

2×5×61 Posts

Hi, frmky
Quote:
 Originally Posted by frmky First, can this be adapted to use non-power-of-2 FFT's, and if so would there be speed gains using comparable FFT sizes to those used by Prime95?
I will consider it.
Quote:
 Originally Posted by frmky Secondly, can this be multithreaded with the calculation split over multiple GPU's, or as the devices can't talk directly to each other will the required memory transfers from/to the host kill the speed? I ask this last question since I'm actually using an S1070 with four C1060's.
My question is "How is 1D FFT supported on the S1070?".

2009-11-07, 17:50   #42
frmky

Jul 2003
So Cal

101000100100₂ Posts

Quote:
 Originally Posted by msft My question is "How is 1D FFT supported on S1070 ?".
The S1070 is really just four discrete C1060's, just housed in a separate unit. It is no different than installing four GTX260's in your computer. Each card must be addressed individually from a different program thread, and the cards cannot directly communicate with each other.

2009-11-08, 09:01   #43
msft

Jul 2009
Tokyo

1142₈ Posts

Hi,
Quote:
 Originally Posted by frmky can this be adapted to use non-power-of-2 FFT's,
I made a non-power-of-2 FFT version with cufftExecD2Z(), but cufftExecD2Z() is two times slower than cufftExecZ2Z().
Can someone tell me how to use the complex FFT method?

Thank you,
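One widely used "complex FFT method" for real input, not necessarily the one msft had in mind, packs the even samples into the real parts and the odd samples into the imaginary parts, runs a single complex transform of half the length, and then separates the two halves. The sketch below uses a naive DFT in place of a library transform such as cufftExecZ2Z(), and a length of 12 to show that it does not depend on a power of 2. The function names are hypothetical.

```python
import cmath


def dft(z):
    """Naive O(n^2) complex DFT, standing in for a library complex FFT."""
    n = len(z)
    return [
        sum(z[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def real_dft_via_half_length(x):
    """DFT bins 0..N/2 of a real sequence x of even length N,
    computed with one complex transform of length N/2."""
    n = len(x)
    m = n // 2
    # Pack even samples into the real part, odd samples into the imaginary part.
    z = [complex(x[2 * k], x[2 * k + 1]) for k in range(m)]
    zf = dft(z)
    out = []
    for k in range(m + 1):
        a = zf[k % m]
        b = zf[(m - k) % m].conjugate()
        even = (a + b) / 2   # transform of the even-indexed samples
        odd = (a - b) / 2j   # transform of the odd-indexed samples
        out.append(even + cmath.exp(-2j * cmath.pi * k / n) * odd)
    return out


x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0, 5.0, 8.0]
ref = dft(x)
got = real_dft_via_half_length(x)
print(max(abs(g - r) for g, r in zip(got, ref)) < 1e-9)
```

The remaining bins follow from conjugate symmetry of a real input's transform, so only bins 0 to N/2 need to be stored.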

2009-11-08, 09:06   #44
msft

Jul 2009
Tokyo

2·5·61 Posts

Hi,
Quote:
 Originally Posted by frmky the cards cannot directly communicate with each other.
Nobody reports 1D FFT performance on the S1070; this is the reason.


All times are UTC.
