2009-10-30, 22:59  #23
Jul 2009
Tokyo
2×5×61 Posts 
I understand, lycorn.
Thank you, 
2009-10-30, 23:43  #24
(loop (#_fork))
Feb 2006
Cambridge, England
2×7×461 Posts 
Hi msft.
This is great work: many thanks! I had to copy the .h files from a separate install of MacLucasFFTW into the directory and modify some of the paths in the makefile to get it to work, but it works now. It's a bit slower than I expected, 3m40s on a GTX275 to test 216091, but that's probably because 131072 is a very large FFT size to use in double precision for so small a number. Unfortunately my computer crashed the second time I tried testing 216091; I think the graphics card is a bit flaky.
Last fiddled with by fivemack on 2009-10-31 at 00:21
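To put the FFT-size remark in numbers: in an FFT-based Lucas-Lehmer test the p bits of the residue are spread over the n FFT elements, so each element carries roughly p/n bits on average. The sketch below only performs that division; the function name is illustrative and not part of MacLucasFFTW.

```python
def bits_per_element(exponent: int, fft_length: int) -> float:
    """Average number of residue bits carried by each FFT element."""
    if fft_length <= 0:
        raise ValueError("fft_length must be positive")
    return exponent / fft_length


if __name__ == "__main__":
    # Exponent and FFT length quoted in the post above.
    print(f"M216091 with n = 131072: {bits_per_element(216091, 131072):.2f} bits per element")
```

Fewer than two bits per double-precision element suggests that most of the transform length is going unused for an exponent this small, which fits the explanation given in the post.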
2009-10-31, 01:55  #25
Jul 2003
So Cal
7^{2}·53 Posts 
This is getting interesting! I decided to try the exponent 24036583. On a Tesla C1060 GPU, CUDA MacLucasFFTW runs at 0.0153 sec/iter using a 2048K FFT. Using one thread on a 2GHz Opteron K10 CPU, Prime95 runs the same exponent at 0.055 sec/iter using a 1280K FFT. So, comparing a top-of-the-line GPU with a CPU that is notoriously slow for Prime95, the GPU version runs about 3.6x faster. Also interesting is that after adding
cudaSetDeviceFlags(cudaDeviceBlockingSync);
cudaSetDevice(0);
near the top of the main() function, MacLucasFFTW uses only about 5% of a CPU core. Assuming the computer doesn't get reset in the next 5 days, or that restarting works, I'll let this run to completion.
Last fiddled with by frmky on 2009-10-31 at 01:57
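The figures in this post can be checked with a little arithmetic. A Lucas-Lehmer test of M(p) takes p - 2 squaring iterations, so the per-iteration times give both the GPU/CPU ratio and a rough total run time. The following is a minimal sketch; the function names are illustrative, and the estimate ignores start-up, checkpointing, and any restarts.

```python
SECONDS_PER_DAY = 86_400


def ll_iterations(exponent: int) -> int:
    """Number of squarings in a Lucas-Lehmer test of M(exponent)."""
    return exponent - 2


def estimated_days(exponent: int, sec_per_iter: float) -> float:
    """Rough wall-clock days for a full test at a constant iteration time."""
    return ll_iterations(exponent) * sec_per_iter / SECONDS_PER_DAY


def speedup(slow_sec_per_iter: float, fast_sec_per_iter: float) -> float:
    """How many times faster the second timing is than the first."""
    return slow_sec_per_iter / fast_sec_per_iter


if __name__ == "__main__":
    p = 24036583
    print(f"GPU/CPU ratio: {speedup(0.055, 0.0153):.2f}x")
    print(f"GPU estimate:  {estimated_days(p, 0.0153):.1f} days")
    print(f"CPU estimate:  {estimated_days(p, 0.055):.1f} days")
```

At 0.0153 sec/iter the estimate comes out a little over four days, which is consistent with the "next 5 days" mentioned in the post.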
2009-10-31, 09:27  #26
Jul 2009
Tokyo
610_{10} Posts 
Thank you for testing this program, fivemack.
Exactly. What is the GTX275 made of?

2009-10-31, 09:52  #27
Jul 2009
Tokyo
262_{16} Posts 
Hi, frmky.
Thank you for all the work,

2009-11-04, 08:30  #28
Jul 2003
So Cal
A25_{16} Posts 

2009-11-04, 11:03  #29
Jul 2009
Tokyo
2·5·61 Posts 

2009-11-04, 12:33  #30
"GIMFS"
Sep 2002
Oeiras, Portugal
1570_{10} Posts 
Congrats, msft. The code seems to be running fine. 0.0153 sec/iter for a 1280K FFT is better than I can get on a Core 2 Duo T8300 with BOTH cores crunching the same exponent (best result is ~0.017 sec/iter).

2009-11-04, 14:45  #31
Jul 2009
Tokyo
2·5·61 Posts 
Thank you, lycorn
New version on GTX260.
$ tar zxvf MacLucasFFTW.cuda.k.tar.gz
$ make
$ time ./MacLucasFFTW 216091
M( 216091 )P, n = 131072, MacLucasFFTW v8.1 Ballester
real 6m34.691s
user 0m10.025s
sys 0m0.188s
$ time ./MacLucasFFTW 2976221
M( 2976221 )P, n = 262144, MacLucasFFTW v8.1 Ballester
real 129m52.509s
user 19m27.337s
sys 0m1.232s
$ time ./MacLucasFFTW 33333333 10001 2097152
real 2m44.702s
user 0m22.469s
sys 0m1.136s
2048k fft sec/iter = 0.0165
$ time ./MacLucasFFTW 63333333 10001 4194304
real 7m0.095s
user 1m43.026s
sys 0m1.160s
4096k fft sec/iter = 0.042
For M131101 to M1548619, I compared the checksums after 1000 iterations with Glucas, and they are correct.
Thank you,
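The sec/iter figures above follow from the `time` output if the second command-line argument (10001) is read as the number of iterations run. That reading is an assumption based on the numbers matching, not on the program's documentation. A minimal sketch with illustrative helper names:

```python
import re


def parse_duration(text: str) -> float:
    """Convert a `time`-style duration such as '2m44.702s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def seconds_per_iteration(real_time: str, iterations: int) -> float:
    """Wall-clock seconds per iteration, including program start-up time."""
    return parse_duration(real_time) / iterations


if __name__ == "__main__":
    # "real" times and iteration counts taken from the runs above.
    print(f"2048k fft: {seconds_per_iteration('2m44.702s', 10001):.4f} sec/iter")
    print(f"4096k fft: {seconds_per_iteration('7m0.095s', 10001):.4f} sec/iter")
```

Because the wall-clock time includes start-up, values computed this way slightly overstate the true per-iteration cost, and more so for short runs.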
2009-11-05, 09:46  #32
Mar 2003
Melbourne
5·103 Posts 
Is there any advantage in doing multiple FFTs at the same time on the GPU? I.e., can we get 2x prime checks at the same time in, say, <50% of the time of doing one check?
 Craig 
2009-11-05, 15:06  #33
Jul 2009
Tokyo
2·5·61 Posts 
