mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

TheJudger 2010-08-30 10:39

Hi sanzo,

your GeForce 210 is based on the GT218 chip. GT218 has compute capability 1.2, which means [B]no[/B] double precision, so this chip can't run msft's code.
And I haven't seen a Windows binary of this code, either.

This GPU doesn't have much computing power compared to high-end GPUs.
The gap between a current low-end GPU and a current high-end GPU is much larger than it is for CPUs.

[B]Simplified comparison:[/B]
CPU lowend: 2 cores 2GHz
CPU highend: 6 cores 3.3GHz
(6 * 3.3GHz) / (2 * 2GHz) = ~5 times

GPU lowend: (GeForce 205) 8 CUDA cores @ 1402MHz
GPU highend: (GeForce GTX 480) 480 CUDA cores @ 1401MHz
difference: ~60 times
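The two ratios above can be checked with a quick one-liner (numbers taken straight from the comparison; this is just arithmetic on the quoted core counts and clocks, not a real benchmark):

```shell
# Rough throughput ratios from the simplified comparison above:
# CPU: 6 cores @ 3.3 GHz vs. 2 cores @ 2.0 GHz
# GPU: 480 CUDA cores @ 1401 MHz vs. 8 CUDA cores @ 1402 MHz
awk 'BEGIN {
    printf "CPU high/low: %.2fx\n", (6 * 3.3) / (2 * 2.0)
    printf "GPU high/low: %.2fx\n", (480 * 1401) / (8 * 1402)
}'
```

This prints roughly 4.95x for the CPU pair and just under 60x for the GPU pair, matching the "~5 times" and "~60 times" figures above.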

Oliver

sanzo 2010-08-30 12:00

Thanks [URL="http://www.mersenneforum.org/member.php?u=1696"]TheJudger [/URL]
In that case I'll wait for software that supports my ATI 5870 :)

thanks all!!

TheJudger 2010-08-30 23:33

[QUOTE=TheJudger;226993]Hi frmky,

[QUOTE=frmky]I don't think I can. This is a Linux compute node with no X installed. nvidia-settings complains about the lack of libX. nvidia-smi doesn't seem to be able to adjust the memory clock. Do you know of a Linux command line utility that will adjust it?
[/QUOTE]

no X is my problem, too. :smile:
I can try on my private computer next weekend (GTX 470).

Oliver[/QUOTE]

I forgot to mention that I was unable to change the clock rates of my GF100 with the 256.35 and 256.40 drivers... Google says others can't either. :sad:

Oliver

bdodson 2010-09-03 16:47

[QUOTE=frmky;226773]Really? I was expecting much better than that. From the GTX 260 all the way up to the GTX 480, the speed has scaled linearly with the frequency and number of DP units with no sign of being bandwidth limited. On the GTX 480, I'm getting nearly the same speed as you have posted using a 64-bit binary.

Edit: To rule out a weird compiler issue, can you try the binary at [URL="http://physics.fullerton.edu/gchilders/verS.tar.gz"]http://physics.fullerton.edu/gchilders/verS.tar.gz[/URL]? I've included the CUDA library files, so you can run with, for example,
LD_LIBRARY_PATH=. ./MacLucasFFTW 24036583
to test the 2M FFT.[/QUOTE]

With Greg's binary, our NVIDIA C2050 (Fermi) reports 4.371 ms/iteration
for the 2M FFT with ECC off, starting with 24036583, which would appear
to confirm TheJudger's timing. A test with 4M crashed, but the machine has
only just been set up, so we may need to install some more libraries.

Two boards. And the view here is that the GPUs won't be useful
for Mersenne primes? The BOINC projects with GPU applications run
circles around CPU apps (such as sieving under NFS@Home). I'm just
over 28M credits of CPU computing on sieving from almost a year, while
people running GPU apps are getting 1M/day. Not that I'm seeing
anything useful from Collatz et al.; I'm interested in NFS polynomial
selection with msieve. -Bruce

TheJudger 2010-09-03 17:55

Hi Bruce,

A single GTX 480 is still faster than a current quad-core desktop, but the speedup is smaller than in other projects.
Perhaps that's because George's CPU implementation is very well tuned, while msft's GPU code is "just using a generic FFT implementation" ([B]no[/B] offense, msft!).

Or perhaps an LL test just isn't well suited to GPUs.

Oliver

bdodson 2010-09-03 18:03

[QUOTE=frmky;194992]Version k runs at .0141 sec/iter for the 2048K FFT and .0264 sec/iter for the 4096K FFT on the C1060.[/QUOTE]

Uhm, I looked up the sec <---> ms conversion and confirmed that 4.371 ms/iter
translates to 0.004371 sec/iter. Not sure about the other comparisons,
but the Fermi Tesla seems to be doing well relative to the first-generation
Tesla, yes? (that's C1060 vs. C2050). About 3 times quicker? -bd ("Mr Obvious")
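The "about 3 times" estimate checks out; with the two timings quoted in this thread (0.0141 sec/iter on the C1060 for the 2048K FFT, 4.371 ms/iter on the C2050 for the 2M FFT), the ratio is:

```shell
# C1060 (version k, 2048K FFT): 0.0141 sec/iter
# C2050 (Greg's binary, 2M FFT, ECC off): 4.371 ms/iter = 0.004371 sec/iter
awk 'BEGIN { printf "C2050 speedup over C1060: %.1fx\n", 0.0141 / (4.371 / 1000) }'
```

This comes out to about 3.2x.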

ET_ 2010-09-03 18:35

[QUOTE=frmky;226773]Really? I was expecting much better than that. From the GTX 260 all the way up to the GTX 480, the speed has scaled linearly with the frequency and number of DP units with no sign of being bandwidth limited. On the GTX 480, I'm getting nearly the same speed as you have posted using a 64-bit binary.

Edit: To rule out a weird compiler issue, can you try the binary at [URL="http://physics.fullerton.edu/gchilders/verS.tar.gz"]http://physics.fullerton.edu/gchilders/verS.tar.gz[/URL]? I've included the CUDA library files, so you can run with, for example,
LD_LIBRARY_PATH=. ./MacLucasFFTW 24036583
to test the 2M FFT.[/QUOTE]

10.4 ms/iteration on my GTX275 :smile:

Luigi

ET_ 2010-09-03 19:16

But only 11.283 ms/iteration for 35000293! :smile:

Luigi

jasonp 2010-09-03 20:03

[QUOTE=bdodson;228298]
The boinc projects with GPU applications run circles around CPU aps (such as sieving under NFS@Home). I'm just over 28M credits of CPU computing on sieving from almost a year; while people running GPU apps are getting 1M/day. Not that I'm seeing anything useful from Collatz, et. al.; I'm interested in NFS polyn selection with msieve. -Bruce[/QUOTE]
Note that, with Paul Zimmermann's help, I've figured out a lot more about how Kleinjung's improved algorithm works, and if I ever get the time to overhaul the CPU code, polynomial selection with msieve can be made much more efficient. The problem is that the changes I'm considering will not work well on a GPU (i.e., they use big hashtables).

frmky 2010-09-03 20:52

[QUOTE=bdodson;228298]And the view here is that the GPUs won't be useful
for Mersenne primes? [/QUOTE]
I was hoping for a bit faster, but realistically it is "only" the fastest single LL test that I'm aware of right now. Not to mention that the CPU usage is tiny, so it can run in parallel with calculations on the CPU. Granted, it's not 50x or 100x a CPU, but it's still fast! :smile:

I don't have time to put into it right now, but with George's blessing it wouldn't be difficult to incorporate this into a BOINC project. PrimeGrid would probably be the best home if they're interested. It would probably give GIMPS quite a boost.

mdettweiler 2010-09-15 18:20

Well, my friend finally received and installed his GTX 460. I've installed the CUDA SDK tools and successfully compiled MacLucasFFTW. The problem, though, is when I try to run it:
[code]
gary@Buttford:~/Desktop/gpu-stuff/MacLucasFFTW$ ./MacLucasFFTW
./MacLucasFFTW: error while loading shared libraries: libcudart.so.3: cannot open shared object file: No such file or directory[/code]
(yes, ignore the computer name...it's a long story :razz:)

I tried pulling the libcudart.so.3.1.9 file (which libcudart.so.3 is a symlink to) out of the /usr/local/cuda/lib directory, putting it in the MacLucasFFTW directory, and renaming it libcudart.so.3. Yet I still get the error. Anybody have an idea what's going on here?
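For what it's worth, copying the library usually isn't necessary: the dynamic loader doesn't search the current directory unless it's on LD_LIBRARY_PATH (which is why frmky's earlier example prefixed the command with `LD_LIBRARY_PATH=.`). A sketch of the usual fix, assuming the toolkit lives in /usr/local/cuda as in the post (on 64-bit installs the directory may be /usr/local/cuda/lib64):

```shell
# Inspect which shared libraries the binary can't resolve:
#   ldd ./MacLucasFFTW | grep "not found"
# Point the dynamic loader at the CUDA library directory for this shell:
export LD_LIBRARY_PATH=/usr/local/cuda/lib:$LD_LIBRARY_PATH
# ...then run the binary as usual, e.g.:
#   ./MacLucasFFTW 24036583
```

A more permanent alternative is adding the directory to a file under /etc/ld.so.conf.d/ and running ldconfig as root.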



Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.