mersenneforum.org

GPU Computing: CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW)

frmky 2010-08-24 03:48

[QUOTE=TheJudger;226691]
Tesla C2050:
2M FFT ~4.8 / ~4.3 ms/iter (ECC enabled/disabled)
4M FFT ~8.6 ms/iter (ECC disabled)
[/QUOTE]
Really? I was expecting much better than that. From the GTX 260 all the way up to the GTX 480, the speed has scaled linearly with the frequency and number of DP units with no sign of being bandwidth limited. On the GTX 480, I'm getting nearly the same speed as you have posted using a 64-bit binary.

Edit: To rule out a weird compiler issue, can you try the binary at [URL="http://physics.fullerton.edu/gchilders/verS.tar.gz"]http://physics.fullerton.edu/gchilders/verS.tar.gz[/URL]? I've included the CUDA library files, so you can run with, for example,
LD_LIBRARY_PATH=. ./MacLucasFFTW 24036583
to test the 2M FFT.
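A note on what these ms/iter figures mean in practice: a Lucas-Lehmer test of M(p) = 2^p - 1 runs p - 2 squaring iterations, so the per-iteration time converts directly into total run time. The following is a minimal Python sketch for illustration only, not MacLucasFFTW's code, and the function names are hypothetical. It runs the plain big-integer recurrence s -> s^2 - 2 mod M(p) for small exponents. Programs like MacLucasFFTW do each of those squarings with an FFT instead. The sketch also turns the C2050 timings quoted above into an estimated run time for the 24036583 benchmark.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if M(p) = 2**p - 1 is prime, for an odd prime p.

    Plain big-integer version of the recurrence; real CPU/GPU programs
    perform each squaring with an FFT-based multiplication instead.
    """
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):  # p - 2 iterations, one modular squaring each
        s = (s * s - 2) % m
    return s == 0


def ll_test_hours(p: int, ms_per_iter: float) -> float:
    """Estimated wall-clock hours for the p - 2 iterations of one LL test."""
    return (p - 2) * ms_per_iter / 1000.0 / 3600.0


if __name__ == "__main__":
    # Sanity check: M3, M5, M7, M13 are prime; M11 = 2047 = 23 * 89 is not.
    print([p for p in (3, 5, 7, 11, 13) if lucas_lehmer(p)])  # [3, 5, 7, 13]
    # 2M FFT timings quoted above for the Tesla C2050 (ECC enabled / disabled).
    for ms in (4.8, 4.3):
        print(f"{ms} ms/iter -> {ll_test_hours(24036583, ms):.1f} h")
```

At the quoted rates, a full test of the 24036583 benchmark exponent takes roughly 32.0 hours with ECC enabled and roughly 28.7 hours with ECC disabled.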

mdettweiler 2010-08-24 04:29

[quote=Ken_g6;226683]Any progress on this?

If LLR is too complex, may I suggest a simple PRP test? Once upon a time, when Proth.exe was the fastest primality [I]prover[/I] around, we used to use a PRP program using GWNums to quickly remove almost all composites.

I imagine it would be pretty easy to write a simple Fermat's Little Theorem test: (2^(p-1) mod p == 1)?pseudoprime:composite. I'd do it myself if I had a clue what to do with FFTs.:sirrobin:[/quote]
The Fermat PRP test is, in fact, what LLR does for non-base-2 numbers; that code descends directly from the old PRP program. The latest version of LLR (3.8) adds some code that turns a Fermat PRP test into a full N-1/N+1 primality test when a PRP result is returned, but it's not that much different. Other gwnum-based programs like PFGW and Prime95 still use the basic PRP test.
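The Fermat test Ken_g6 describes amounts to a single modular exponentiation. The following is a minimal Python sketch, and the function name is hypothetical. It uses Python's built-in three-argument `pow`, whereas gwnum-based programs such as LLR, PFGW and Prime95 perform the same exponentiation on huge numbers using FFT multiplication. The base is left as a parameter because this thread does not say which base each program uses.

```python
def fermat_prp(n: int, base: int = 2) -> bool:
    """Fermat probable-prime test: base**(n-1) == 1 (mod n).

    False means n is definitely composite; True only means n is a
    probable prime, since some composites pass for a given base.
    """
    if n < 3 or n % 2 == 0:
        return n == 2
    return pow(base, n - 1, n) == 1


if __name__ == "__main__":
    print(fermat_prp(2**13 - 1))  # True: 8191 is prime
    print(fermat_prp(15))         # False: composite detected
    # 341 = 11 * 31 is a base-2 pseudoprime, but base 3 exposes it.
    print(fermat_prp(341), fermat_prp(341, base=3))  # True False
```

The 341 case shows why a PRP result is only "probable": a composite can pass for one base and fail for another. This is why LLR 3.8 follows up a PRP result with a real primality proof.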

Merely having the PRP test for CUDA would be immensely useful. Its residues would be compatible with those produced by LLR's standard tests for non-base-2 numbers. They wouldn't match LLR's residues for base 2, but one could just as easily run LLR with the ForcePRP=1 option, PFGW, or Prime95 to produce compatible results at the same speed.

TheJudger 2010-08-24 09:14

Hi!

[QUOTE=frmky;226773]Really? I was expecting much better than that. From the GTX 260 all the way up to the GTX 480, the speed has scaled linearly with the frequency and number of DP units with no sign of being bandwidth limited. On the GTX 480, I'm getting nearly the same speed as you have posted using a 64-bit binary.
[/QUOTE]
Yep, I expected more, too. :sad:
It is a little bit faster than a GTX 480. Clock for clock, it performs better than a GTX 480 (1150 vs. 1404 MHz and 448 vs. 480 enabled shader cores), so it is taking advantage of the "additional" DP units (or the dual DMA engine).
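The following is a back-of-envelope check of this clock-for-clock comparison. The clocks and core counts come from the post. The double-precision fractions are an assumption added here: they are the commonly cited Fermi figures, with DP running at half the single-precision rate on the Tesla and at one-eighth on the GeForce GTX 480. The peak formula also assumes one fused multiply-add (2 flops) per core per clock. The helper names are hypothetical.

```python
def peak_gflops(cores: int, shader_mhz: float, dp_fraction: float = 1.0) -> float:
    """Theoretical peak GFLOPS: cores x clock x 2 flops (FMA) x DP fraction."""
    return cores * shader_mhz * 2 * dp_fraction / 1000.0

# Enabled cores and shader clocks as quoted in the post.
C2050 = dict(cores=448, shader_mhz=1150)
GTX480 = dict(cores=480, shader_mhz=1404)

# Raw clock x cores ratio: what linear scaling would predict with equal DP rates.
raw_ratio = (C2050["cores"] * C2050["shader_mhz"]) / (GTX480["cores"] * GTX480["shader_mhz"])

# ASSUMPTION: DP at 1/2 the SP rate on the Tesla, 1/8 on the GeForce GTX 480.
dp_c2050 = peak_gflops(**C2050, dp_fraction=0.5)
dp_gtx480 = peak_gflops(**GTX480, dp_fraction=0.125)

print(f"clock x cores, C2050/GTX480: {raw_ratio:.3f}")
print(f"peak DP GFLOPS: C2050 {dp_c2050:.1f}, GTX 480 {dp_gtx480:.1f}, "
      f"ratio {dp_c2050 / dp_gtx480:.2f}")
```

On clocks and cores alone, the C2050 has only about 76% of the GTX 480's resources. Matching or slightly beating the GTX 480 therefore means better per-clock performance. However, the result falls far short of the roughly 3x peak-DP advantage under the assumed rates. One plausible reading is that something other than DP arithmetic, such as memory bandwidth, limits the FFT.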

[QUOTE=frmky;226773]Edit: To rule out a weird compiler issue, can you try the binary at [URL="http://physics.fullerton.edu/gchilders/verS.tar.gz"]http://physics.fullerton.edu/gchilders/verS.tar.gz[/URL]? I've included the CUDA library files, so you can run with, for example,
LD_LIBRARY_PATH=. ./MacLucasFFTW 24036583
to test the 2M FFT.[/QUOTE]

Same speed as my binary.
Can you downclock the memory on your GTX 480 to check how much it depends on memory bandwidth?

Oliver

msft 2010-08-24 11:00

1 Attachment(s)
Hi, TheJudger

Can you run the "CUDA Visual Profiler"?

TheJudger 2010-08-24 13:24

1 Attachment(s)
Hi msft,

as you wish!

Oliver

msft 2010-08-24 22:45

Hi, TheJudger

My diagnosis: normal, no abnormality. :smile:

MooMoo2 2010-08-24 23:21

[quote=Oddball;223372]Not for those who've spent thousands of dollars on their own CPU farm. I feel sorry for the person who bought a lot of quad cores for prime hunting, only to have those primes wiped off the top 5000 list by a few GPUs :sad:
[/quote]

[quote] I wouldn't want to imagine what a Primegrid equipped with GPUs would be like. What would the minimum entry level of the top 5000 be? 1 million digits?[/quote]

[quote]CUDA LLR application will be the start of a new era for prime search.[/quote]
Enough with the exaggerations on both sides (pro-GPU and anti-GPU). It's insanely hard to run LLR or even a PRP test on GPUs, and even if it were possible, the additional computing power would be so small that it would hardly be worth the effort. There were no GPUs crunching for GIMPS in mid-2009, yet GIMPS's output then (in teraflops) was almost the same as it is today.

You need to take into account the fact that the LLR code was highly optimized for x86 architectures, not GPUs. Moreover, GPUs aren't as efficient since only power-of-2 FFT lengths are supported, and the GPU applications at Primegrid have a much higher error rate than CPU applications.

A mid-range GPU might perform about the same as a high-end quad core, and this is an optimistic scenario.

frmky 2010-08-25 04:13

[QUOTE=TheJudger;226796]
Can you downclock the memory on your GTX 480 to check how much it depends on memory bandwidth?
[/QUOTE]
I don't think I can. This is a Linux compute node with no X installed. nvidia-settings complains about the lack of libX. nvidia-smi doesn't seem to be able to adjust the memory clock. Do you know of a Linux command line utility that will adjust it?

TheJudger 2010-08-25 08:57

Hi frmky,

[QUOTE=frmky;226969]I don't think I can. This is a Linux compute node with no X installed. nvidia-settings complains about the lack of libX. nvidia-smi doesn't seem to be able to adjust the memory clock. Do you know of a Linux command line utility that will adjust it?[/QUOTE]

no X is my problem, too. :smile:
I can try on my private computer next weekend (GTX 470).

Oliver

fivemack 2010-08-25 09:08

[QUOTE=MooMoo2;226932]Moreover, GPUs aren't as efficient since only power-of-2 FFT lengths are supported[/QUOTE]

That is hardly a fact of nature; it's just that the GPU programmers are still at the point of using standard libraries whilst the CPU programmers have been writing their own FFTs for more than fifteen years.
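Here is a concrete illustration of the power-of-2 point. With power-of-2 lengths only, a transform that needs even slightly more than 2M words must jump to 4M (compare the 2M and 4M timings earlier in the thread). A mixed-radix FFT, by contrast, can use an intermediate length. The following is a hypothetical helper, not code from any program discussed here. It assumes an FFT implementation that handles any length whose only prime factors are 2, 3, 5 and 7; current cuFFT documentation describes sizes of this form as the efficient ones.

```python
def next_pow2(n: int) -> int:
    """Smallest power of two >= n."""
    p = 1
    while p < n:
        p *= 2
    return p


def is_7_smooth(n: int) -> bool:
    """True if n >= 1 has no prime factors other than 2, 3, 5 and 7."""
    if n < 1:
        return False
    for f in (2, 3, 5, 7):
        while n % f == 0:
            n //= f
    return n == 1


def next_smooth(n: int) -> int:
    """Smallest 7-smooth length >= n (simple linear search; fine for a sketch)."""
    m = n
    while not is_7_smooth(m):
        m += 1
    return m


if __name__ == "__main__":
    need = 2 * 1024 * 1024 + 1  # just over a 2M-word transform
    p2, smooth = next_pow2(need), next_smooth(need)
    print(p2, smooth, f"{p2 / smooth:.2f}x larger with power-of-2 only")
```

For a length just over 2M, the power-of-2 choice is 4194304, while a 7-smooth length exists no later than 2099520 (= 2^6 * 3^8 * 5). That makes the power-of-2 transform about twice as long. Since FFT cost grows roughly as N log N, the extra work per iteration is roughly double as well.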

nucleon 2010-08-27 23:42

Has there been any work done on the competition?

Has any work been done on ATI's cards? Are there any figures for comparison?

-- Craig


All times are UTC.