mersenneforum.org > GPU Computing > CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW)
https://www.mersenneforum.org/showthread.php?t=12576

ET_ 2010-07-31 12:06

[QUOTE=msft;223431]Hi,

llrpsrc.zip on ubuntu

k*2^n+1:correct result


k*2^n-1:abort[/QUOTE]

I guess that could be used to sieve out Fermat factors easily... :smile:

Luigi
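Since the +1 side is the one that works, here is a small Python sketch of the two checks involved, for readers who want to see the arithmetic. It assumes the standard Proth test decides k*2^n+1 (Proth's theorem: for odd k < 2^n, N is prime iff a^((N-1)/2) ≡ -1 (mod N) for an a with Jacobi symbol (a|N) = -1). It then adds the usual check of whether such a prime divides a Fermat number F_m = 2^(2^m)+1. The names `jacobi`, `proth_test` and `fermat_divisor_index` are hypothetical helpers written for illustration. This is not LLR's code, and it uses plain Python integers rather than FFT arithmetic.

```python
from math import isqrt


def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a | n) for odd n > 0."""
    assert n > 0 and n % 2 == 1
    a %= n
    result = 1
    while a:
        while a % 2 == 0:          # pull out factors of 2
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                # quadratic reciprocity
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def proth_test(k: int, n: int) -> bool:
    """Proth's theorem for N = k*2^n + 1 (k odd, k < 2^n)."""
    if k % 2 == 0 or k >= 2 ** n:
        raise ValueError("Proth's theorem needs odd k < 2^n")
    N = k * 2 ** n + 1
    if isqrt(N) ** 2 == N:
        return False               # a square has no witness with (a|N) = -1
    for a in range(2, N):
        j = jacobi(a, N)
        if j == 0:
            return False           # a < N shares a factor with N
        if j == -1:
            # For this a: N prime <=> a^((N-1)/2) == -1 (mod N)
            return pow(a, (N - 1) // 2, N) == N - 1
    return False                   # not reached for non-square N


def fermat_divisor_index(k: int, n: int):
    """Return m if the prime p = k*2^n + 1 divides F_m = 2^(2^m) + 1, else None.

    Assumes p is prime (e.g. it already passed proth_test). Then the order of
    2 mod p is 2^(m+1), which must divide p - 1 = k*2^n with k odd, so only
    m < n needs checking.
    """
    p = k * 2 ** n + 1
    x = 2 % p                      # x = 2^(2^m) mod p, starting at m = 0
    for m in range(n):
        if x == p - 1:             # 2^(2^m) == -1 (mod p)
            return m
        x = x * x % p
    return None
```

For example, `fermat_divisor_index(5, 7)` reports that 641 = 5*2^7+1 divides F_5, and `fermat_divisor_index(1071, 8)` reports that 274177 = 1071*2^8+1 divides F_6. Fermat numbers are pairwise coprime, so at most one m can be returned.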

mdettweiler 2010-07-31 14:26

[quote=msft;223431]Hi,

llrpsrc.zip on ubuntu

k*2^n+1:correct result
[quote]1*2^216091-1 is prime! Time : 113.888 sec.
55185*2^8092+1 is prime! Time : 334.000 ms.
93*2^135908+1 is prime! Time : 69.017 sec. [/quote]
k*2^n-1:abort[/quote]
Hmm... 1*2^216091-1 is a -1 number, yet it seemed to work. I wonder why that didn't cause a problem?

Do you think you can fix the problem so it can work on k*2^n-1 as well? While it would surely be useful on k*2^n+1, k*2^n-1 [i]is[/i] the one I'm primarily interested in here. :smile:
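For the k*2^n-1 side, the textbook algorithm is the Lucas–Lehmer–Riesel test. Below is a minimal Python sketch. It assumes Rödseth's rule for the starting value: pick any P with Jacobi symbols (P-2|N) = 1 and (P+2|N) = -1, then set u0 = V_k(P,1). LLR's own seed selection may differ, and `llr_test`, `lucas_v` and `jacobi` are hypothetical helper names. Setting k = 1 gives Mersenne numbers such as the 1*2^216091-1 quoted above. For odd n, the familiar Lucas–Lehmer seed 4 (P = 4) is one valid choice.

```python
from math import isqrt


def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a | n) for odd n > 0."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                # quadratic reciprocity
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def lucas_v(k: int, P: int, N: int) -> int:
    """V_k(P, 1) mod N using the binary Lucas ladder."""
    v0, v1 = 2 % N, P % N          # (V_m, V_{m+1}) with m = 0
    for bit in bin(k)[2:]:
        if bit == "1":             # m -> 2m + 1
            v0, v1 = (v0 * v1 - P) % N, (v1 * v1 - 2) % N
        else:                      # m -> 2m
            v0, v1 = (v0 * v0 - 2) % N, (v0 * v1 - P) % N
    return v0


def llr_test(k: int, n: int) -> bool:
    """Lucas-Lehmer-Riesel test for N = k*2^n - 1 (k odd, k < 2^n, n >= 2)."""
    if k % 2 == 0 or k >= 2 ** n or n < 2:
        raise ValueError("need odd k < 2^n and n >= 2")
    N = k * 2 ** n - 1
    if isqrt(N) ** 2 == N:
        return False               # squares admit no suitable P
    # Rödseth's rule: find P with (P-2 | N) = 1 and (P+2 | N) = -1.
    for P in range(3, N + 3):
        j_plus = jacobi(P + 2, N)
        if j_plus == 0 and P + 2 < N:
            return False           # N shares a factor with the smaller P + 2
        if j_plus == -1 and jacobi(P - 2, N) == 1:
            break
    else:
        return False               # a prime N always has such a P
    u = lucas_v(k, P, N)           # u0 = V_k(P, 1)
    for _ in range(n - 2):         # u_i = u_{i-1}^2 - 2
        u = (u * u - 2) % N
    return u == 0                  # N prime <=> u_{n-2} == 0 (mod N)
```

For example, `llr_test(3, 6)` accepts 191 = 3*2^6-1 and `llr_test(3, 5)` rejects 95. The expensive part is the `n - 2` squarings mod N, which is the step that LLR and MacLucasFFTW perform with FFT multiplication.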

mdettweiler 2010-07-31 14:48

BTW: I've sent an email to my friend who previously said he'd be OK with buying a GPU to help with this development effort once it got started. I'm hoping he can grab a GTX 460 within a week or so; then I can help test the new application on his computer via SSH. :smile:

mdettweiler 2010-07-31 15:06

[quote=Oddball;223390]Not much. But I wouldn't want to imagine what a Primegrid equipped with GPUs would be like. What would the minimum entry level of the top 5000 be? 1 million digits? Fighting off a pit bull is quite hard, but you'll probably make it out alive. Now if you had to fight off a whole pack of wolves instead...[/quote]
Now that we've confirmed the CUDA LLR application works for k*2^n+1 (but not yet k*2^n-1), it got me thinking about this... and I remembered one more thing. From what I've read on the PrimeGrid forum and heard from one of their admins (Lennart), who participates at NPLB as well, their database can barely handle the load of running small top-5000 candidates (currently ~530K) through BOINC as it is. With GPUs cutting search times considerably, workunits would complete even faster, and their database surely could not handle all that. Since they're already taxing the limits of MySQL, they most definitely would NOT want to add GPUs to the Proth Prime Search.

Once this application matures, what I would suggest to them (and I expect they'll be fine with it) is to focus GPU LLR on the subprojects that test tons of huge candidates but only rarely find primes: the conjecture searches (Riesel/Sierpinski base 2 conjectures, PSP) and the Cullen/Woodall prime searches. None of those would produce primes often enough (even with hordes of GPUs) to make any serious difference to the top-5000 entry level, so they're an ideal place to let GPUs give a much-appreciated boost without any ill effects.

NPLB would do something similar. Besides focusing GPUs on non-top-5000 and double-check work, and on entry-level top-5000 work (which we can get away with without ill effect as long as our total effort there stays smaller than PrimeGrid's), we could put GPUs to great use on our sister project, Conjectures 'R Us, which has a number of power-of-2 conjectures that would benefit from a GPU app that can test k*2^n±1. Again, primes from those are few and far between, so they would not make any serious dent in the top 5000.

em99010pepe 2010-07-31 16:30

People need to keep pace with technology; if they don't, they fall behind.
The CUDA LLR application will be the start of a new era for prime searching.
It's naive to say the CUDA LLR application should be applied only to certain types of prime search; let's see when your projects start falling behind in all the rankings on the Top 5000 page... lol. We live for competition; if not, we are losers.

If the CUDA LLR application turns out to be more energy-efficient than running LLR on CPUs, I will certainly buy three GPUs for each machine I have...

frmky 2010-07-31 17:57

Brakes on everyone!
[QUOTE=msft;223440]It is no change,Original source.:cry:
[/QUOTE]
[B]This isn't CUDA.[/B] msft is saying that the [I]original[/I] FFTW code doesn't work for k*2^n-1. Jean Penné mentioned that he's having trouble getting FFTW to work. Perhaps this is it.

mdettweiler 2010-07-31 18:03

[quote=frmky;223480]Brakes on everyone!

[B]This isn't CUDA.[/B] msft is saying that the [I]original[/I] FFTW code doesn't work for k*2^n-1. Jean Penné mentioned that he's having trouble getting FFTW to work. Perhaps this is it.[/quote]
Oh, I see. Hmm...this is one of the reasons why I was thinking it might be easier to just modify the existing MacLucasFFTW application. I don't know how much coding it would entail, but the LLR code may be useful at least to give an idea of how this has been done before.

BTW: I just received word back from my friend and it's official: he should have a GTX 460 ordered and installed within a week and a half. He's on a business trip until next Wednesday, so he'll order it when he gets back and it should be a week from there.

msft 2010-08-01 00:32

Hi, ET_
[QUOTE=ET_;223445]Fermat factors[/QUOTE]
Sounds nice.

msft 2010-08-01 01:03

Hi,
I think anyone can convert their code to CUDA. :smile:

1. Don't use CPU<->GPU memory transfers in the main loop.
2. Only use complex-to-complex FFTs with power-of-2 sizes.
3. Use the warp.
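The second point is easy to illustrate on the CPU. The Python sketch below squares a big integer with a radix-2 complex-to-complex FFT of power-of-2 length, which is the core operation of every Lucas–Lehmer-style iteration. It is only an illustration under stated assumptions: the 16-bit limb size and the helper names `fft` and `square_via_fft` are hypothetical. It also uses a plain zero-padded convolution rather than the weighted transform that production LL codes typically use to get the reduction mod 2^p-1 for free. Points 1 and 3 (keeping data on the GPU and organizing work around warps) have no CPU analogue here.

```python
import cmath


def fft(values: list, invert: bool = False) -> list:
    """Iterative radix-2 complex-to-complex FFT; len(values) must be a power of 2."""
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of 2"
    a = [complex(v) for v in values]
    j = 0
    for i in range(1, n):          # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    sign = 1 if invert else -1
    length = 2
    while length <= n:             # butterflies, doubling the span each pass
        half = length // 2
        twiddles = [cmath.exp(sign * 2j * cmath.pi * k / length) for k in range(half)]
        for start in range(0, n, length):
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * twiddles[k]
                a[start + k] = u + v
                a[start + k + half] = u - v
        length <<= 1
    if invert:
        a = [z / n for z in a]
    return a


def square_via_fft(x: int, limb_bits: int = 16) -> int:
    """Square a non-negative integer x with one forward and one inverse FFT."""
    mask = (1 << limb_bits) - 1
    limbs = []
    while x:                       # split x into base-2^limb_bits digits
        limbs.append(x & mask)
        x >>= limb_bits
    if not limbs:
        return 0
    size = 1
    while size < 2 * len(limbs):   # zero-pad so the cyclic convolution doesn't wrap
        size <<= 1
    spectrum = fft(limbs + [0] * (size - len(limbs)))
    conv = fft([z * z for z in spectrum], invert=True)  # pointwise square, transform back
    # Round each convolution term to an integer; big-int addition propagates carries.
    return sum(round(c.real) << (i * limb_bits) for i, c in enumerate(conv))
```

On a GPU, the spectrum, the pointwise squaring and the inverse transform would all stay in device memory from one iteration to the next, which is what the first point is about.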

Ken_g6 2010-08-23 15:56

Any progress on this?

If LLR is too complex, may I suggest a simple PRP test? Once upon a time, when Proth.exe was the fastest primality [i]prover[/i] around, we used a PRP program based on GWNums to quickly remove almost all composites.

I imagine it would be pretty easy to write a simple Fermat's little theorem test: (2^(p-1) mod p == 1) ? pseudoprime : composite. I'd do it myself if I had a clue what to do with FFTs. :sirrobin:
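The test above is short enough to write out directly. The sketch below relies on Python's built-in three-argument `pow` for the modular exponentiation. That uses ordinary multiplication rather than FFTs, so it is only practical for modest sizes; `fermat_prp` is a hypothetical name. It defaults to base 3 because base 2 is useless for k = 1 on the -1 side. Every 2^p-1 with p prime satisfies 2^(N-1) ≡ 1 (mod N), prime or not; 2047 = 23*89 is the smallest composite example.

```python
from math import gcd


def fermat_prp(N: int, base: int = 3) -> bool:
    """Fermat probable-prime test: True if base^(N-1) == 1 (mod N).

    A True result means "probable prime", not a proof.
    """
    if N <= base:
        raise ValueError("expects N > base")
    if gcd(base, N) != 1:
        return False               # N shares a proper factor with base
    return pow(base, N - 1, N) == 1
```

A `True` result only means probable prime. For instance, 341 = 11*31 passes base 2, so a PRP hit still needs a proof such as a Proth or LLR test.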

TheJudger 2010-08-23 16:54

Hi msft,

do you know how memory-bandwidth-limited your CUDA-enabled MacLucasFFTW is (in the GPU code)?

Tesla C2050:
2M FFT ~4.8 / ~4.3 ms/iter (ECC enabled/disabled)
4M FFT ~8.6 ms/iter (ECC disabled)

- OpenSUSE 11.2 x86_64
- CUDA toolkit 3.1
- nvidia driver 256.40
- MacLucasFFTW.S.tar.gz
compiled for 32-bit!

Oliver
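One rough way to frame the bandwidth question is to ask how much memory traffic each iteration could be moving. The Python helper below is a back-of-envelope calculator, not a measurement. It assumes, hypothetically, that a "2M FFT" means 2^21 double-precision values of 8 bytes each. It also assumes that an iteration streams that array through memory some number of times, `passes`, which is unknown here.

```python
def achieved_bandwidth_gbs(fft_len: int, bytes_per_elem: int,
                           passes: float, ms_per_iter: float) -> float:
    """Memory bandwidth (GB/s, 1 GB = 1e9 bytes) implied by streaming an
    fft_len-element array `passes` times in ms_per_iter milliseconds."""
    bytes_per_iter = fft_len * bytes_per_elem * passes
    return bytes_per_iter / (ms_per_iter * 1e-3) / 1e9


# Hypothetical reading of the timings above: 2^21 or 2^22 doubles, one pass each.
per_pass_2m = achieved_bandwidth_gbs(2**21, 8, 1, 4.3)   # ~3.9 GB/s
per_pass_4m = achieved_bandwidth_gbs(2**22, 8, 1, 8.6)   # same per-pass rate
passes_to_peak = 144 / per_pass_2m                       # ~37 passes at 144 GB/s
```

With the ECC-disabled timing, one pass over 2^21 doubles in 4.3 ms is about 3.9 GB/s. Reaching the C2050's published peak of roughly 144 GB/s would therefore take around 37 full passes per iteration. The 4M timing gives the same per-pass figure, so time scales linearly with FFT length. Whether the code is bandwidth-bound then depends on how many passes its FFT and carry kernels really make.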


All times are UTC.