mersenneforum.org CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW)

2011-02-10, 01:32   #408
msft

Jul 2009
Tokyo

262₁₆ Posts

Hi, aaronhaviland
You are completely right.

2011-02-11, 02:59   #409
Uncwilly
6809 > 6502

"""""""""""""""""""
Aug 2003
101×103 Posts

32·312 Posts

msft,
Can you PM me an article about this software for the wiki? Or post one yourself.

2011-02-11, 03:03   #410
msft

Jul 2009
Tokyo

1001100010₂ Posts

Hi, Uncwilly
Quote:
 Originally Posted by Uncwilly Can you PM me an article about this software for the wiki? Or post one yourself.
Permission granted.

2011-02-11, 17:26   #411
Andrew Thall

Dec 2010

23 Posts

Quote:
 Originally Posted by aaronhaviland
 There seem to be a couple of upper limits to this right now. I tried running higher numbers and got a couple of different errors:
 #CUDALucas 151150000
 err = 0.353794, increasing n from 8388608
 CUDALucas.cu(534) : cufftSafeCall() CUFFT error.
 I'm guessing it's because of: "The cuFFT manual states that 1-D ffts are supported for < 8 million elements."
 The other is at exponents around 318750000, where I hit the memory limit on my 768 MB card. At 336000000, it wants over 1 GB.
 Combined, these prevent it from being useful for the 100 million digit numbers. (I can't be the only one eyeing this as making that task feasible.)
Update your CUDA library and CUFFT. The most recent version no longer has the 8M element limit. It's also much more numerically accurate, particularly the non-power-of-two transforms.

2011-02-12, 01:57   #412
aaronhaviland

Jan 2011
Dudley, MA, USA

73 Posts

Quote:
 Originally Posted by Andrew Thall Update your CUDA library and CUFFT. The most recent version no longer has the 8M element limit. It's also much more numerically accurate, particularly the non-power-of-two transforms.
These were with CUDA/CUFFT 3.2.16. Is there a newer version?

(Also, I'm running nv drivers 270.18, which say they support CUDA 4.0. Any word of a newer toolkit/sdk?)

2011-02-12, 16:12   #413
Brain

Dec 2009
Peine, Germany

513₈ Posts
CUDALucas thoughts

Quote:
 Originally Posted by kjaget
 Run times on my factory overclocked GTX 275, along with some rough run times for current work assignments. I know these aren't the most efficient use of the code but it's a good basis for comparison to a CPU.
 8.96 msec/iter @ 2M FFT (~ 2.5 days for a 25M LL double check)
 18.8 msec/iter @ 4M FFT (~ 11 days for a 47M LL first time run)
 Not sure how that compares to Linux versions, but it's definitely fast enough to be useful.
- I timed 6.8 msec/iter @ 2M FFT (DC @ 26M) on my GTX 560 Ti with Win7 (GPU load @ 93%). Seems reasonable to me. Thanks for the build, kjaget. Stay tuned.
- I'd also like to see some Linux comparisons.
- I had trouble figuring out the checkpoint command, but Uncwilly will document this.
- Which CUDA version (CUFFT) is CUDALucas built with? Will the most current 3.2 bring a speedup?
- It seems that mfaktc gets a bigger "bang" out of the GPU, but PrimeNet has enough TF power. What kind of work do you prefer?
- Last question: Where is the turnover point to the 4M FFT?
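
The per-iteration timings above convert to run times by simple arithmetic, since a Lucas-Lehmer test of exponent p needs p - 2 squarings. The following is a minimal sketch of that conversion; the function name `ll_runtime_days` is made up for illustration, and the timings are the ones quoted in this thread. It gives roughly 2.6 days for the 25M double check and roughly 10.2 days for the 47M first-time test, in line with kjaget's rough figures.

```python
SECONDS_PER_DAY = 86_400


def ll_runtime_days(exponent: int, msec_per_iter: float) -> float:
    """Estimate wall-clock days for a Lucas-Lehmer test of 2**exponent - 1.

    Hypothetical helper: assumes a constant time per iteration and
    ignores checkpointing and startup overhead.
    """
    iterations = exponent - 2  # an LL test performs p - 2 squarings
    return iterations * msec_per_iter / 1000.0 / SECONDS_PER_DAY


if __name__ == "__main__":
    # (label, exponent, msec/iter) taken from the posts above
    cases = [
        ("GTX 275, 2M FFT, 25M DC", 25_000_000, 8.96),
        ("GTX 275, 4M FFT, 47M LL", 47_000_000, 18.8),
        ("GTX 560 Ti, 2M FFT, 26M DC", 26_000_000, 6.8),
    ]
    for label, p, ms in cases:
        print(f"{label}: about {ll_runtime_days(p, ms):.1f} days")
```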

2011-02-13, 10:41   #414
msft

Jul 2009
Tokyo

2×5×61 Posts

Hi, Brain
Quote:
 Originally Posted by Brain - Last question: Where is the turnover point to the 4M FFT?
39800000.
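
To put that turnover point in context, here is a small sketch that picks between the two FFT lengths discussed in this thread and reports how many bits of the exponent each FFT element carries on average. The names `fft_length_for` and `bits_per_element` are hypothetical, whether the boundary exponent itself falls on the 2M or the 4M side is an assumption, and this is not how CUDALucas itself chooses its FFT length (it also reacts to the round-off error at run time, as the "increasing n" message quoted earlier shows). At the turnover exponent the 2M FFT carries just under 19 bits per element.

```python
TURNOVER_EXPONENT = 39_800_000  # value given by msft above
FFT_2M = 2 * 1024 * 1024        # 2,097,152 elements
FFT_4M = 4 * 1024 * 1024        # 4,194,304 elements


def fft_length_for(exponent: int) -> int:
    """Choose between the 2M and 4M FFT (illustrative only).

    Assumption: exponents below the turnover use 2M, everything
    else uses 4M. Other FFT lengths are not modelled here.
    """
    return FFT_2M if exponent < TURNOVER_EXPONENT else FFT_4M


def bits_per_element(exponent: int, fft_length: int) -> float:
    """Average number of bits of the Mersenne number held per FFT element."""
    return exponent / fft_length


if __name__ == "__main__":
    for p in (26_000_000, TURNOVER_EXPONENT - 1, 47_000_000):
        n = fft_length_for(p)
        print(p, n, round(bits_per_element(p, n), 2))
```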

2011-02-15, 10:25   #415
msft

Jul 2009
Tokyo

2·5·61 Posts

Added support for selecting the CUDA device number.

cudalucas.1.1$ ./CUDALucas -D1 216091
device_number >= device_count ... exiting
Attached Files
 CUDALucas.1.1.tar.bz2 (27.5 KB, 92 views)

2011-02-15, 19:34   #416
kjaget

Jun 2005

3×43 Posts

Quote:
 Originally Posted by Brain - Which CUDA version (CUFFT) is CUDALucas built with? Will the most current 3.2 bring a speedup?
From memory, it's 3.1. I've seen mixed reviews of 3.2 for other projects, but have no idea what it will do for this one.

Quote:
 - It seems that mfaktc gets a bigger "bang" out of the GPU but PrimeNet has enough TF power. What kind of work do you prefer?
I prefer using CUDALucas, since mfaktc gives good speed but also requires CPU core(s) when running. That hurts overall system throughput. I can either work on 5 LL tests, or 3 LL tests plus mfaktc, on my 4-core system. The former seems more useful, especially with the TF wavefront moving faster than LL testing.

Not to take anything away from mfaktc, though. And I honestly haven't looked at the GHz-days/day comparison between the two scenarios, so it's more a rationalization than properly thought out at this point.

Kevin

2011-02-17, 12:53   #417
Svenie25

Aug 2008
Good old Germany

215₈ Posts

I am running the Windows version. My first test, a DC around 27M, is nearly complete. But which command do I use for checkpoints?

2011-02-17, 17:41   #418
Brain

Dec 2009
Peine, Germany

331 Posts

Quote:
 Originally Posted by Svenie25 I am running the Windows version. My first test, a DC around 27M, is nearly complete. But which command do I use for checkpoints?
I had the same problem:
1. When you start an exponent for the first time:
Code:
CUDALucas.exe -c10000 <prime_expo>
2. Next time use:
Code:
CUDALucas.exe -c10000 c<prime_expo>
The leading "c" reads as "read checkpoint file"; you will find a file with that same name.
There's also a t<prime_expo> file. I had to use it because the c<prime_expo> file was corrupt after a forced shutdown via a timed script.
-c10000 means a checkpoint every 10000 iterations, which is about 70 seconds on my system.
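
For readers who want to see the idea behind checkpointing, here is a minimal sketch of a Lucas-Lehmer test that saves its state every N iterations and resumes from a saved file. It is not CUDALucas's code or file format: the JSON layout, the function name `lucas_lehmer`, and its parameters are all assumptions made for illustration. The checkpoint is written to a temporary file and then renamed, one common way to avoid the kind of corrupt file described above when the program is killed mid-write.

```python
import json
import os


def lucas_lehmer(p, checkpoint_every=10_000, checkpoint_path=None):
    """Return True if 2**p - 1 is prime, for an odd prime p.

    Hypothetical sketch: if checkpoint_path is given, the residue is
    saved every `checkpoint_every` iterations and a matching saved
    state is resumed on the next call.
    """
    m = (1 << p) - 1
    s, start = 4, 0

    # Resume from an existing checkpoint for the same exponent.
    if checkpoint_path and os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
        if state["p"] == p:
            s, start = int(state["s"], 16), state["iteration"]

    for i in range(start, p - 2):
        s = (s * s - 2) % m
        done = i + 1
        if checkpoint_path and done % checkpoint_every == 0:
            # Write to a temp file first, then rename, so an interrupted
            # write never leaves a half-written checkpoint behind.
            tmp = checkpoint_path + ".tmp"
            with open(tmp, "w") as f:
                json.dump({"p": p, "iteration": done, "s": hex(s)}, f)
            os.replace(tmp, checkpoint_path)

    return s == 0


if __name__ == "__main__":
    print(lucas_lehmer(13))  # 2**13 - 1 = 8191
    print(lucas_lehmer(11))  # 2**11 - 1 = 2047 = 23 * 89
```

With 10000 iterations taking about 70 seconds, as reported above, a forced shutdown would cost at most roughly that much recomputation under this scheme.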
