mersenneforum.org

CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

Karl M Johnson 2011-07-14 11:08

I can do that if someone provides me with CUDALucas 1.2 Windows binaries: 64-bit ones compiled for the CUDA 3.2 toolkit and 32-bit ones compiled for the CUDA 4.0 toolkit.

apsen 2011-07-14 13:10

[QUOTE=Karl M Johnson;266367]I can do that if someone provides me with CUDALucas 1.2 Windows binaries: 64-bit ones compiled for the CUDA 3.2 toolkit and 32-bit ones compiled for the CUDA 4.0 toolkit.[/QUOTE]

I could compile all of them tonight unless someone beats me to it.

Karl M Johnson 2011-07-14 13:24

Good, good!
I look forward to this!
If you don't mind, please use the [B]-arch=sm_20[/B] flag instead of [B]-arch=sm_13[/B].

I never lose hope, even though most of the apps did not benefit much from being compiled specifically for sm_20 Fermi GPUs.

apsen 2011-07-15 02:13

1 Attachment(s)
[QUOTE=Karl M Johnson;266381]Good, good!
I look forward to this!
If you don't mind, please use the [B]-arch=sm_20[/B] flag instead of [B]-arch=sm_13[/B].[/QUOTE]

Here they are.


:redface: Sorry for the archive inside an archive, but zip cannot compress multiple files together and this board does not allow rars, so it's a rar inside a zip. This way it's only one small file.

axn 2011-07-15 04:03

[QUOTE=apsen;266446]but zip cannot compress multiple files together[/QUOTE]

OT: Since when? :glare:

Karl M Johnson 2011-07-15 05:20

'tis ok, though I personally prefer 7z/.tar.lzma.

Before you read the results, here's how I did all these benchmarks:
Launched CUDALucas against a 50M exponent.
Waited until it printed the first 10K iterations, then started measuring time with a stopwatch :smile:
Stopped when it printed 60K iterations.
Then a little bit of math: elapsed time divided by the 50K iterations in between.
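
The "little bit of math" can be sketched as below. This is only an illustration of the arithmetic, not part of CUDALucas: the function name is made up, and the 403.64 s stopwatch reading is a hypothetical value picked because it works out to the first figure in the results.

```python
def ms_per_iteration(elapsed_seconds, start_iter=10_000, stop_iter=60_000):
    """Average milliseconds per iteration over the timed window."""
    iterations = stop_iter - start_iter
    if iterations <= 0:
        raise ValueError("stop_iter must be greater than start_iter")
    return elapsed_seconds * 1000.0 / iterations


# Hypothetical stopwatch reading, used only to show the arithmetic.
print(f"{ms_per_iteration(403.64):.4f} ms/iter")
```

With a hand-held stopwatch, a reaction error of a few tenths of a second over a roughly 400 s window shifts the result by well under 0.1%.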

Additional notes:
Windows 7 SP1 x64
Q6600 CPU @ 3.0 GHz
GTX 480 @ 800 MHz
ForceWare 275.33


Results:
[CODE]CUDALucas.cuda3.2.sm_20.WIN32 = 8.0728 ms/iter
CUDALucas.cuda3.2.sm_20.WIN64 = 8.0726 ms/iter
CUDALucas.cuda4.0.sm_20.WIN32 = 8.3190 ms/iter
CUDALucas.cuda4.0.sm_20.WIN64 = 8.7344 ms/iter


CUDALucas.cuda3.2.sm_13.WIN32 = 8.0570 ms/iter
CUDALucas.cuda3.2.sm_13.WIN64 = 8.0560 ms/iter
CUDALucas.cuda4.0.sm_13.WIN32 = 8.3466 ms/iter
CUDALucas.cuda4.0.sm_13.WIN64 = 8.7036 ms/iter[/CODE]
Conclusions:
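The -arch=sm_xx flag has little effect on performance (under 0.4% in every pair); sm_13 is marginally faster than sm_20 on Fermi in three of the four pairs.
The CUDA 3.2 toolkit gives the best performance, at least 3% ahead of CUDA 4.0 in every configuration.
With the CUDA 4.0 toolkit, the 64-bit binaries are about 4-5% slower than their 32-bit variants (at least on Windows); with CUDA 3.2 the two are practically identical.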
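
For anyone who wants to check these conclusions against the table, here is a small sketch that computes the relative gaps. The timings are copied from the results above; the dictionary layout and the helper name are just for this example.

```python
# ms/iter figures copied from the results table above.
results = {
    ("3.2", "sm_20", "WIN32"): 8.0728,
    ("3.2", "sm_20", "WIN64"): 8.0726,
    ("4.0", "sm_20", "WIN32"): 8.3190,
    ("4.0", "sm_20", "WIN64"): 8.7344,
    ("3.2", "sm_13", "WIN32"): 8.0570,
    ("3.2", "sm_13", "WIN64"): 8.0560,
    ("4.0", "sm_13", "WIN32"): 8.3466,
    ("4.0", "sm_13", "WIN64"): 8.7036,
}


def pct_slower(a, b):
    """How much slower timing a is than timing b, in percent."""
    return (a - b) / b * 100.0


toolkits, archs, widths = ("3.2", "4.0"), ("sm_20", "sm_13"), ("WIN32", "WIN64")

# sm_20 relative to sm_13, toolkit and bitness held fixed.
arch_gaps = {
    (tk, w): pct_slower(results[(tk, "sm_20", w)], results[(tk, "sm_13", w)])
    for tk in toolkits for w in widths
}
# CUDA 4.0 relative to CUDA 3.2, arch and bitness held fixed.
toolkit_gaps = {
    (a, w): pct_slower(results[("4.0", a, w)], results[("3.2", a, w)])
    for a in archs for w in widths
}
# WIN64 relative to WIN32, toolkit and arch held fixed.
bitness_gaps = {
    (tk, a): pct_slower(results[(tk, a, "WIN64")], results[(tk, a, "WIN32")])
    for tk in toolkits for a in archs
}

for title, gaps in (("sm_20 vs sm_13", arch_gaps),
                    ("CUDA 4.0 vs 3.2", toolkit_gaps),
                    ("WIN64 vs WIN32", bitness_gaps)):
    print(title)
    for key, gap in gaps.items():
        print(f"  {key}: {gap:+.2f}%")
```

A positive number means the first variant named in the title is slower. Each figure comes from a single stopwatch run, so the sub-1% gaps should be read with that in mind.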

apsen 2011-07-15 11:18

[QUOTE=axn;266450]OT: Since when? :glare:[/QUOTE]

...cannot compress multiple files together [COLOR="DarkRed"]as if it was one file (it's called solid archive in rar)[/COLOR]. So it misses an opportunity to capitalize on a lot of redundancy across the multiple files.

Just try compressing the contents of the above with zip directly: there's about a threefold difference in size.

If zip can do it I'd like to know which one it is.
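
To show the effect without any particular archiver, here is a minimal sketch using only Python's standard library. The four "files" are made-up stand-ins (a large shared part plus a small unique part each), and LZMA is used for both cases so that the only difference is whether the files are compressed one by one, as zip does with its entries, or as a single stream, as a solid archive does.

```python
import lzma
import random

rng = random.Random(0)  # fixed seed so the run is repeatable


def random_bytes(n):
    """n pseudo-random bytes; random data does not compress on its own."""
    return bytes(rng.getrandbits(8) for _ in range(n))


# Hypothetical stand-ins for four builds of one program: a large part
# shared by all of them plus a small part unique to each.
shared = random_bytes(20_000)
files = [shared + random_bytes(200) for _ in range(4)]

# zip-style: every file is compressed on its own, so the shared part
# is paid for four times.
separate_size = sum(len(lzma.compress(f)) for f in files)

# Solid-style: the files are concatenated and compressed as one stream,
# so later copies of the shared part become back-references.
solid_size = len(lzma.compress(b"".join(files)))

print("separate:", separate_size, "bytes")
print("solid:   ", solid_size, "bytes")
```

The same idea is behind .tar.gz and behind the rar-inside-zip trick: bundle first, then compress the bundle as one stream.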

apsen 2011-07-15 11:52

[QUOTE=Karl M Johnson;266455]
-arch=sm_xx flag has little effect on performance...[/QUOTE]


BTW CUDALucas compiles fine with sm_10 or sm_11.
Is there a problem with that, or was sm_13 just a random pick?

If sm_10/11 are fine, would you, Karl, care to test them too?

Ralf Recker 2011-07-15 12:36

Thanks for compiling and testing the various binaries.

Karl M Johnson 2011-07-15 12:49

sm_13 is required for double precision support.

My guess is that it may compile, but it will not work.

apsen 2011-07-15 17:51

[QUOTE=Karl M Johnson;266492]sm_13 is required for double precision support.

My guess is that it may compile, but it will not work.[/QUOTE]

sm_10 produces exactly the same output as sm_13. :unsure:
(and it seems to be faster, but I have other stuff running, so that might not really mean anything).

Does anyone know if double precision is actually necessary and if so how to make it show?
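
As a generic illustration of what would be at stake (not CUDALucas code, and not a test of what any of these binaries actually do), the sketch below shows where single precision stops representing integers exactly. The assumption is that an FFT-based squaring has to round its results back to exact integers, so the significand width limits how large those values may get. The helper name is made up for this example.

```python
import struct


def to_float32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


n = 2**24 + 1  # first positive integer that binary32 cannot hold exactly

print("double keeps it:", float(n) == n)
print("single keeps it:", to_float32(float(n)) == n)
print("single stores:  ", int(to_float32(float(n))))
```

One way to make a difference show, if there is one, would be to compare final residues on an exponent with a known result rather than the first few lines of output.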


All times are UTC.