Hi, aaronhaviland
You are completely right. |
msft,
Can you pm me an article about this software for the wiki? Or post one yourself. |
Hi, Uncwilly
[QUOTE=Uncwilly;252146]Can you pm me an article about this software for the wiki? Or post one yourself.[/QUOTE] Permission granted. :blush: |
[QUOTE=aaronhaviland;252002]There seem to be a couple of upper limits to this right now. I tried running higher numbers and got a couple of different errors:
[CODE]#CUDALucas 151150000
err = 0.353794, increasing n from 8388608
CUDALucas.cu(534) : cufftSafeCall() CUFFT error.[/CODE]
I'm guessing it's because of this: "The cuFFT manual states that 1-D FFTs are supported for < 8 million elements." The other is at exponents around 318750000, where I hit the memory limit on my 768MB card. At 336000000, it wants over 1 GB. Combined, these prevent it from being useful for the 100-million-digit numbers. (I can't be the only one eyeing this as making that task feasible.)[/QUOTE] Update your CUDA library and CUFFT. The most recent version no longer has the 8M element limit. It's also much more numerically accurate, particularly the non-power-of-two transforms. |
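For reference, the 8M ceiling is exactly 2^23 elements. Assuming CUDALucas follows the usual irrational-base DWT layout, in which each of the n FFT elements carries about p/n bits of the residue modulo 2^p - 1, the failed run above was packing roughly 18 bits into each element. More bits per element means more floating-point round-off, which is presumably why the program tried to grow n past 8388608 once the error reached 0.35.

```latex
n_{\max} = 8\,388\,608 = 2^{23},
\qquad
\frac{p}{n_{\max}} = \frac{151\,150\,000}{2^{23}} \approx 18.02 \text{ bits per element}
```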
[QUOTE=Andrew Thall;252179]Update your CUDA library and CUFFT. The most recent version no longer has the 8M element limit. It's also much more numerically accurate, particularly the non-power-of-two transforms.[/QUOTE]
These were with CUDA/CUFFT 3.2.16. Is there a newer version? (Also, I'm running nv drivers 270.18, which say they support CUDA 4.0. Any word on a newer toolkit/SDK?) |
CUDALucas thoughts
[QUOTE=kjaget;250895]
Run times on my factory-overclocked GTX 275, along with some rough run times for current work assignments. I know these aren't the most efficient use of the code, but they're a good basis for comparison to a CPU.
8.96 msec/iter @ 2M FFT (~2.5 days for a 25M LL double check)
18.8 msec/iter @ 4M FFT (~11 days for a 47M LL first-time run)
Not sure how that compares to Linux versions, but it's definitely fast enough to be useful.[/QUOTE]
- I timed 6.8 msec/iter @ 2M FFT (DC @ 26M) on my GTX 560 Ti with Win7 (GPU load @ 93%). Seems reasonable to me. Thanks for the build, kjaget. Stay tuned.
- I'd also like to see some Linux comparisons.
- I had trouble figuring out the checkpoint command, but Uncwilly will document this.
- Which CUDA version (CUFFT) is CUDALucas built with? Will the current 3.2 release bring a speedup?
- It seems that mfaktc gets a bigger "bang" out of the GPU, but PrimeNet has enough TF power. What kind of work do you prefer?
- Last question: Where is the turnover point to the 4M FFT? |
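The day counts above follow from the per-iteration times: a Lucas-Lehmer test of 2^p - 1 runs p - 2 squaring iterations, so the total is roughly (p - 2) x msec/iter. A minimal sketch of that arithmetic, assuming every iteration takes the same measured time and using round exponents (25M, 47M, 26M) in place of real assignments; `ll_eta_days` is a hypothetical helper name, not part of CUDALucas.

```python
def ll_eta_days(exponent: int, ms_per_iter: float) -> float:
    """Rough wall-clock days for a Lucas-Lehmer test of 2**exponent - 1.

    The LL test performs exponent - 2 squarings; this assumes every
    iteration takes the same measured time.
    """
    iterations = exponent - 2
    seconds = iterations * ms_per_iter / 1000.0
    return seconds / 86_400.0


# Timings quoted in this thread, with round exponents standing in for real ones.
timings = [
    ("GTX 275, 2M FFT, 25M double check", 25_000_000, 8.96),
    ("GTX 275, 4M FFT, 47M first-time LL", 47_000_000, 18.8),
    ("GTX 560 Ti, 2M FFT, 26M double check", 26_000_000, 6.8),
]
for label, p, ms in timings:
    print(f"{label}: {ll_eta_days(p, ms):.1f} days")
# GTX 275, 2M FFT, 25M double check: 2.6 days
# GTX 275, 4M FFT, 47M first-time LL: 10.2 days
# GTX 560 Ti, 2M FFT, 26M double check: 2.0 days
```

The 4M figure comes out near 10.2 days, a little under the ~11 days quoted for a 47M first-time test.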
Hi, Brain
[QUOTE=Brain;252262]- Last question: Where is the turnover point to the 4M FFT?[/QUOTE] 39800000. |
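The reply above gives one data point: 39800000 is where the 2M FFT (2,097,152 elements) gives way to the 4M FFT (4,194,304 elements). Just below that point each 2M element carries about 18.98 bits. A minimal sketch of the rule, assuming the boundary acts as a simple threshold (whether exactly 39800000 falls on the 4M side is a guess); `fft_length` and `bits_per_element` are hypothetical helper names, and the sketch says nothing about exponents far from this boundary.

```python
FFT_2M = 2 * 2**20               # 2,097,152 elements
FFT_4M = 4 * 2**20               # 4,194,304 elements
TURNOVER_2M_TO_4M = 39_800_000   # value quoted above; inclusive boundary is a guess


def fft_length(exponent: int) -> int:
    """Return the FFT length (2M or 4M) for an exponent near the 2M/4M boundary.

    Hypothetical helper: smaller exponents may use a shorter FFT and much
    larger ones a longer FFT, which this sketch does not model.
    """
    return FFT_4M if exponent >= TURNOVER_2M_TO_4M else FFT_2M


def bits_per_element(exponent: int) -> float:
    """Average number of residue bits carried by each FFT element."""
    return exponent / fft_length(exponent)


for p in (26_000_000, 39_799_999, 39_800_000, 47_000_000):
    print(p, fft_length(p), round(bits_per_element(p), 2))
```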
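Added support for choosing the CUDA device number with -D. Asking for a device number that isn't present exits with an error:
[CODE]cudalucas.1.1$ ./CUDALucas -D1 216091
device_number >= device_count ... exiting[/CODE] |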
[QUOTE=Brain;252262]- Which CUDA version (CUFFT) is CUDALucas built with? Will the current 3.2 release bring a speedup?[/QUOTE]
From memory, it's 3.1. I've seen mixed reviews of 3.2 for other projects, but have no idea what it will do for this one.
[QUOTE]- It seems that mfaktc gets a bigger "bang" out of the GPU, but PrimeNet has enough TF power. What kind of work do you prefer?[/QUOTE]
I prefer using CUDALucas because mfaktc gives good speed but also needs CPU core(s) while running, which hurts overall system throughput. On my 4-core system I can run either 5 LL tests, or 3 LL tests plus mfaktc. The former seems more useful, especially with the TF wavefront moving faster than LL testing. Not to take anything away from mfaktc, though. And I honestly haven't looked at the GHz-days/day comparison between the two scenarios, so at this point it's more a rationalization than a properly thought-out choice.
Kevin |
I have been running the Windows version. My first test, a DC around 27M, is nearly complete. But which command do I use for checkpoints?
|
[QUOTE=Svenie25;252766]I have been running the Windows version. My first test, a DC around 27M, is nearly complete. But which command do I use for checkpoints?[/QUOTE]
I had the same problem:
1. When you start an exponent for the first time: [CODE]CUDALucas.exe -c10000 <prime_expo>[/CODE]
2. Next time, use: [CODE]CUDALucas.exe -c10000 c<prime_expo>[/CODE]
The c prefix means "read the checkpoint file"; you will find a file with that name. There's also a t<prime_expo> file. I had to use it because the c<prime_expo> file was corrupted by a forced shutdown from a timed script.
-c10000 means a checkpoint every 10000 iterations, about 70 seconds on my machine. |
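The start/resume steps above can be wrapped in a small launcher script. A sketch in Python, assuming the checkpoint files are named c<exponent> and t<exponent> in the working directory as described, and that passing either name in place of the exponent resumes from it; `cudalucas_args` is a hypothetical helper. It cannot tell whether a c file is corrupt, so `prefer_backup` has to be set by hand in that case.

```python
from __future__ import annotations

from pathlib import Path


def cudalucas_args(exponent: int, workdir: Path, interval: int = 10000,
                   prefer_backup: bool = False) -> list[str]:
    """Build a CUDALucas.exe command line for `exponent` (hypothetical helper).

    Resumes from c<exponent> (or t<exponent>) if such a file exists in
    `workdir`; otherwise starts a fresh test with the bare exponent.
    """
    base = ["CUDALucas.exe", f"-c{interval}"]
    # c<exponent> is the normal checkpoint; t<exponent> is the other file
    # mentioned above, used when the c file was damaged.
    prefixes = ("t", "c") if prefer_backup else ("c", "t")
    for prefix in prefixes:
        name = f"{prefix}{exponent}"
        if (workdir / name).exists():
            return base + [name]
    return base + [str(exponent)]
```

For example, `" ".join(cudalucas_args(216091, Path(".")))` yields `CUDALucas.exe -c10000 216091` in an empty directory and `CUDALucas.exe -c10000 c216091` once a checkpoint has been written.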