#34
Jul 2003
So Cal
5044₈ Posts
Version k runs at .0141 sec/iter for the 2048K FFT and .0264 sec/iter for the 4096K FFT on the C1060.
#35
Mar 2003
Melbourne
5×103 Posts
Cool. Can't wait for the 3xx series; Dec 2009 release date, apparently (well, according to Wikipedia).
-- Craig
#36
Feb 2005
The Netherlands
2×109 Posts
After some fiddling, I managed to compile and run this under Windows. I had to replace two memalign() calls with malloc(), because memalign() is apparently obsolete. Here are two results:
Code:
D:\Code\MaclucasFFTW.cuda.k>a 11213
too small Exponent

Code:
D:\Code\MaclucasFFTW.cuda.k>a 216091
     1 131072
 10001 131072
 20001 131072
 30001 131072
 40001 131072
 50001 131072
 60001 131072
 70001 131072
 80001 131072
 90001 131072
100001 131072
110001 131072
120001 131072
130001 131072
140001 131072
150001 131072
160001 131072
170001 131072
180001 131072
190001 131072
200001 131072
210001 131072
M( 216091 )C, 0xfffffffffffffffd, n = 131072, MacLucasFFTW v8.1 Ballester

Other exponents that are big enough give the same result: 0xfffffffffffffffd after a very short while. My video card is a 9600M GS, and it is capable of running Folding@Home.
#37
Tribal Bullet
Oct 2004
3²×5×79 Posts
#38
Jul 2009
Tokyo
2·5·61 Posts
Hi, frmky
Quote:
Hi, nucleon
Quote:
Hi, BigBrother
Quote:
The exponent needs to be more than 131072; the aint() function requires the exponent to be larger than the FFT size.
Hi, jasonp
Quote:
Nice support, thank you.
#39
Jul 2009
Tokyo
2·5·61 Posts
Hi,
Version o runs at .0134 sec/iter for the 2048K FFT and .0320 sec/iter for the 4096K FFT on the GTX260.
#40
Jul 2003
So Cal
2596₁₀ Posts
Excellent! I have another calculation running now, so I won't be able to bench it on the C1060 for a few days.
Two questions... First, can this be adapted to use non-power-of-2 FFTs, and if so, would there be speed gains using FFT sizes comparable to those used by Prime95? Second, can this be multithreaded with the calculation split over multiple GPUs, or, since the devices can't talk directly to each other, will the required memory transfers to and from the host kill the speed? I ask this last question because I'm actually using an S1070 with four C1060s.
#41
Jul 2009
Tokyo
2×5×61 Posts
Hi, frmky
Quote:
Quote:
#42
Jul 2003
So Cal
101000100100₂ Posts
The S1070 is really just four discrete C1060s housed in a separate unit. It is no different from installing four GTX260s in your computer. Each card must be addressed individually from a different program thread, and the cards cannot communicate directly with each other.
#43
Jul 2009
Tokyo
1142₈ Posts
#44
Jul 2009
Tokyo
2·5·61 Posts
Thread | Thread Starter | Forum | Replies | Last Post |
Don't DC/LL them with CudaLucas | LaurV | Data | 131 | 2017-05-02 18:41 |
CUDALucas / cuFFT Performance on CUDA 7 / 7.5 / 8 | Brain | GPU Computing | 13 | 2016-02-19 15:53 |
CUDALucas: which binary to use? | Karl M Johnson | GPU Computing | 15 | 2015-10-13 04:44 |
settings for cudaLucas | fairsky | GPU Computing | 11 | 2013-11-03 02:08 |
Trying to run CUDALucas on Windows 8 CP | Rodrigo | GPU Computing | 12 | 2012-03-07 23:20 |