#23 |
Jul 2009
Tokyo
2×5×61 Posts |
I understand, lycorn.
Thank you,
#24 |
(loop (#_fork))
Feb 2006
Cambridge, England
2×7×461 Posts |
Hi msft.
This is great work: many thanks! I had to copy the .h files from a separate install of MacLucasFFTW into the directory and modify some of the paths in the makefile to get it to work, but it works now. It's a bit slower than I expected (3m40s on a GTX275 to test 216091), but that's probably because 131072 is a very large FFT size to use in double precision for so small a number. Unfortunately my computer crashed the second time I tried testing 216091; I think the graphics card is a bit flaky.
Last fiddled with by fivemack on 2009-10-31 at 00:21
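To put the FFT-size remark in numbers, here is a rough Python sketch using only the figures from the post above. The helper names are made up for illustration, and it assumes the whole 3m40s was spent on the p - 2 squarings of a Lucas-Lehmer test (setup and I/O ignored), so treat the per-iteration figure as an upper bound.

```python
# Rough arithmetic on the figures quoted in the post above.
# Assumption: a Lucas-Lehmer test of M(p) performs p - 2 squarings,
# and the whole wall-clock time is attributed to those squarings.

def bits_per_element(exponent: int, fft_length: int) -> float:
    """Average number of bits of the residue stored in each FFT element."""
    return exponent / fft_length

def seconds_per_iteration(wall_seconds: float, exponent: int) -> float:
    """Wall-clock time divided by the p - 2 iterations of the test."""
    return wall_seconds / (exponent - 2)

p = 216091            # exponent tested
n = 131072            # FFT length reported for this exponent
wall = 3 * 60 + 40    # 3m40s on the GTX275

print(f"{bits_per_element(p, n):.2f} bits per FFT element")
print(f"{seconds_per_iteration(wall, p) * 1e3:.2f} ms per iteration")
```

Fewer than two bits per element is far less than a double-precision FFT can usually carry, which is consistent with the suggestion that a shorter FFT would do for an exponent this small.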
#25 |
Jul 2003
So Cal
7²·53 Posts |
This is getting interesting! I decided to try the exponent 24036583. On a Tesla C1060 GPU, CUDA MacLucasFFTW runs at 0.0153 sec/iter using a 2048K FFT. Using one thread on a 2GHz Opteron K10 CPU, Prime95 runs at 0.055 sec/iter on the same exponent using a 1280K FFT. So, comparing a top-of-the-line GPU with a CPU that is notoriously slow for Prime95, the GPU version runs about 3.6x faster. Also interesting is that after adding
cudaSetDeviceFlags(cudaDeviceBlockingSync);
cudaSetDevice(0);
near the top of the main() function, MacLucasFFTW uses only about 5% of a CPU core. Assuming the computer doesn't get reset in the next 5 days, or that restarting works, I'll let this run to completion.
Last fiddled with by frmky on 2009-10-31 at 01:57
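The ratio and the "next 5 days" estimate can be checked with a few lines of Python. This is only a sketch built from the two timings quoted in the post above. The function names are illustrative, and it assumes a full Lucas-Lehmer test takes p - 2 iterations at a constant rate.

```python
# Back-of-the-envelope check of the timings quoted in the post above.
# Assumption: a full Lucas-Lehmer test of M(p) needs p - 2 iterations
# and the per-iteration time stays constant over the whole run.

GPU_SEC_PER_ITER = 0.0153   # Tesla C1060, 2048K FFT
CPU_SEC_PER_ITER = 0.055    # one thread of a 2GHz Opteron K10, 1280K FFT
EXPONENT = 24036583

def speedup(cpu_sec_per_iter: float, gpu_sec_per_iter: float) -> float:
    """How many times faster the GPU is per iteration."""
    return cpu_sec_per_iter / gpu_sec_per_iter

def days_to_complete(exponent: int, sec_per_iter: float) -> float:
    """Estimated wall-clock days for a complete test."""
    return (exponent - 2) * sec_per_iter / 86400.0

print(f"GPU speedup: {speedup(CPU_SEC_PER_ITER, GPU_SEC_PER_ITER):.2f}x")
print(f"GPU run: {days_to_complete(EXPONENT, GPU_SEC_PER_ITER):.1f} days")
print(f"CPU run: {days_to_complete(EXPONENT, CPU_SEC_PER_ITER):.1f} days")
```

The GPU estimate comes out a little over four days, which fits the five-day window mentioned in the post.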
#26 |
Jul 2009
Tokyo
610₁₀ Posts |
Thank you for testing this program, fivemack.
Quote:
Quote:
Exactly. What is the GTX275 made of?
#27 |
Jul 2009
Tokyo
262₁₆ Posts |
Hi, frmky.
Quote:
Quote:
Thank you for all the work,
#28 |
Jul 2003
So Cal
A25₁₆ Posts |
#29 |
Jul 2009
Tokyo
2·5·61 Posts |
#30 |
"GIMFS"
Sep 2002
Oeiras, Portugal
1570₁₀ Posts |
Congrats, msft. The code seems to be running fine. 0.0153 sec/iter for a 1280K FFT is better than I can get on a Core 2 Duo T8300 with BOTH cores crunching the same exponent (best result is ~0.017 sec/iter).
#31 |
Jul 2009
Tokyo
2·5·61 Posts |
Thank you, lycorn
New version on GTX260.
$ tar -zxvf MacLucasFFTW.cuda.k.tar.gz
$ make
$ time ./MacLucasFFTW 216091
M( 216091 )P, n = 131072, MacLucasFFTW v8.1 Ballester
real 6m34.691s
user 0m10.025s
sys 0m0.188s
$ time ./MacLucasFFTW 2976221
M( 2976221 )P, n = 262144, MacLucasFFTW v8.1 Ballester
real 129m52.509s
user 19m27.337s
sys 0m1.232s
$ time ./MacLucasFFTW 33333333 10001 2097152
real 2m44.702s
user 0m22.469s
sys 0m1.136s
2048k fft sec/iter = 0.0165
$ time ./MacLucasFFTW 63333333 10001 4194304
real 7m0.095s
user 1m43.026s
sys 0m1.160s
4096k fft sec/iter = 0.042
For M131101 to M1548619, the checksums after 1000 iterations match Glucas, so the results are correct.
Thank you,
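The sec/iter figures above can be reproduced from the `time` output. The Python sketch below assumes that the second command-line argument (10001) is the number of iterations run, which is a guess based on the reported rates, and that the full tests of 216091 and 2976221 ran p - 2 iterations each. `parse_real` is a hypothetical helper for the `XmY.ZZZs` format printed by `time`.

```python
import re

def parse_real(text: str) -> float:
    """Convert a `time` value such as '2m44.702s' into seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", text)
    if match is None:
        raise ValueError(f"unexpected time format: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))

# (label, wall-clock time, iterations). The iteration counts are assumptions:
# 10001 for the two benchmark runs, p - 2 for the two full tests.
runs = [
    ("33333333, 2048K FFT", "2m44.702s", 10001),
    ("63333333, 4096K FFT", "7m0.095s", 10001),
    ("216091, n = 131072", "6m34.691s", 216091 - 2),
    ("2976221, n = 262144", "129m52.509s", 2976221 - 2),
]

for label, real, iterations in runs:
    rate = parse_real(real) / iterations
    print(f"{label}: {rate:.4f} sec/iter")
```

Under those assumptions the first two rows round to the 0.0165 and 0.042 sec/iter quoted in the post.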
#32 |
Mar 2003
Melbourne
5·103 Posts |
Is there any advantage in doing multiple FFTs at the same time on the GPU? I.e., can we get 2x prime checks at the same time in, say, <50% of the time of doing one check?
-- Craig
#33 |
Jul 2009
Tokyo
2·5·61 Posts |