CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

flashjh 2011-12-30 23:11

[QUOTE=flashjh;283988]
4.11 build is slower than 1.2b for me. I had to install CUDA 4.0 to get the latest .dll files.

Does anyone know why the newer one is slower, or whether there is something I can do to make it faster? Does anyone know how to get the ETA back in 4.11?[/QUOTE]

[QUOTE=Brain;284052]
As CUDALucas auto-resumes from checkpoint files, we should recommend [COLOR=red][B]not using "-c"[/B][/COLOR] any more, shouldn't we? I will have to update the GPU guide in the new year...

By the way, the iteration times are so low because I didn't complete full 10000-iteration runs. Kind of a bug.

Last but not least, utilisation for state-of-the-art exponents (40M range) is 97% as before. Low utilisation is understandable for small FFT sizes...[/QUOTE]

I have not tried 1.3, but 1.4.1 is about 2x slower for me than 1.2b. I dropped -c and -t and only use -D01 for GPU 2. I'm still learning... Is it better to use 1.4.1 to get the non-power-of-2 FFT sizes, or to use 1.2b to optimize speed? Thanks.

Brain 2011-12-31 09:27

[QUOTE=Brain;284052]As CUDALucas auto-resumes from checkpoint files, we should recommend [COLOR=Red][B]not using "-c"[/B][/COLOR] any more, shouldn't we? I will have to update the GPU guide in the new year...[/QUOTE]
Stupid me: If we don't use -c there are no checkpoints written. So we have to use it. I will have time to test more in the new year but currently I recommend using 1.2(b)!

[URL="http://home.htp-tel.de/shornbostel/"]DLL files download[/URL]

Brain 2011-12-31 12:44

From "does hardly use any CPU resources" to "takes almost a full CPU core"
 
Hi,
although nobody has yet confirmed that CL >= 1.4 uses more CPU resources, I asked msft about it. He answered that he is investigating and will try to fix it.
I :love: CUDALucas (and all GPU based primality tests)
Thanks to msft!

msft 2011-12-31 14:19

1 Attachment(s)
Hi,
[QUOTE=Brain;284216]although nobody has yet confirmed that CL >= 1.4 uses more CPU resources I wrote msft this question.[/QUOTE]
Fixed CPU issue.
Thank you,

msft 2011-12-31 14:35

1 Attachment(s)
exec file.

Brain 2011-12-31 14:36

1 Attachment(s)
[QUOTE=msft;284224]Hi ,

Fixed CPU issue.
Thank you,[/QUOTE]
Thanks a lot. I noticed you didn't change the version string. For Win64 compilation I will change it to:
[CODE]const char program_revision[] = "$Revision: 1.4.2 $";[/CODE]

My last test with 1.41 didn't show the high CPU utilisation any more, see attached. I am a bit confused. Maybe I made a mistake with the "-c" param so that CL wrote a checkpoint every iteration. Just guessing.

Will now compile again.

Brain 2011-12-31 14:48

1.4.2 looks good
 
1 Attachment(s)
Maybe even a bit faster. We'll see.

Brain 2011-12-31 14:50

Shader Model 1.3
 
1 Attachment(s)
CUDALucas 1.4.2 for Win64 / CUDA 4.0 / Compute Capability 1.3

Brain 2011-12-31 14:53

Shader Model 2.1
 
1 Attachment(s)
CUDALucas 1.4.2 for Win64 / CUDA 4.0 / Compute Capability 2.1

msft 2011-12-31 14:54

CUFFT benchmark with cuda4.0:
[CODE]
CUFFT_D2Z size=512 k time=1.269744 msec
CUFFT_D2Z size=1024 k time=2.609184 msec
CUFFT_D2Z size=1536 k time=4.359898 msec
CUFFT_D2Z size=2048 k time=5.615232 msec
CUFFT_D2Z size=2560 k time=7.277350 msec
CUFFT_D2Z size=3072 k time=8.969321 msec
CUFFT_D2Z size=3584 k time=10.251376 msec
CUFFT_D2Z size=4096 k time=11.749495 msec
CUFFT_D2Z size=4608 k time=13.065844 msec
CUFFT_D2Z size=5120 k time=14.971667 msec
[COLOR="Red"]CUFFT_D2Z size=5632 k time=148.589874 msec[/COLOR]
CUFFT_D2Z size=6144 k time=19.145430 msec
[COLOR="Red"]CUFFT_D2Z size=6656 k time=217.340973 msec[/COLOR]
CUFFT_D2Z size=7168 k time=21.095901 msec
CUFFT_D2Z size=7680 k time=24.699974 msec
CUFFT_D2Z size=8192 k time=24.172211 msec
[/CODE]
Some FFT lengths are very slow (5632k and 6656k above).
Ver 1.42 does not avoid these lengths.
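
One way to work around the slow lengths is to treat the benchmark output as a lookup table and, for a given minimum FFT length, pick the fastest benchmarked length at or above it. The following is only a sketch of that idea in Python, not how CUDALucas itself selects FFT lengths; the names `parse_bench` and `fastest_at_least` are made up for this illustration, and the timings are copied from the benchmark above.

```python
import re

# Timings copied from the CUFFT benchmark above (CUDA 4.0).
BENCH = """\
CUFFT_D2Z size=512 k time=1.269744 msec
CUFFT_D2Z size=1024 k time=2.609184 msec
CUFFT_D2Z size=1536 k time=4.359898 msec
CUFFT_D2Z size=2048 k time=5.615232 msec
CUFFT_D2Z size=2560 k time=7.277350 msec
CUFFT_D2Z size=3072 k time=8.969321 msec
CUFFT_D2Z size=3584 k time=10.251376 msec
CUFFT_D2Z size=4096 k time=11.749495 msec
CUFFT_D2Z size=4608 k time=13.065844 msec
CUFFT_D2Z size=5120 k time=14.971667 msec
CUFFT_D2Z size=5632 k time=148.589874 msec
CUFFT_D2Z size=6144 k time=19.145430 msec
CUFFT_D2Z size=6656 k time=217.340973 msec
CUFFT_D2Z size=7168 k time=21.095901 msec
CUFFT_D2Z size=7680 k time=24.699974 msec
CUFFT_D2Z size=8192 k time=24.172211 msec
"""

# Matches the "size=<n> k time=<t> msec" part of each benchmark line.
LINE = re.compile(r"size=(\d+) k time=([\d.]+) msec")


def parse_bench(text):
    """Return {fft_size_in_k: time_in_msec} parsed from benchmark output."""
    return {int(m.group(1)): float(m.group(2)) for m in LINE.finditer(text)}


def fastest_at_least(times, min_size_k):
    """Return the benchmarked size >= min_size_k with the lowest time."""
    candidates = {s: t for s, t in times.items() if s >= min_size_k}
    if not candidates:
        raise ValueError(f"no benchmarked size >= {min_size_k}k")
    return min(candidates, key=candidates.get)


if __name__ == "__main__":
    times = parse_bench(BENCH)
    for needed in (5632, 6656, 7680):
        print(f"need >= {needed}k -> use {fastest_at_least(times, needed)}k")
```

With these numbers a request for 5632k would be bumped to 6144k and a request for 6656k to 7168k, skipping both slow lengths. Note that 8192k also measures slightly faster than 7680k here, so the same lookup would prefer it.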

f11ksx 2011-12-31 15:27

v1.41 vs v1.42
 
v1.41 is running pretty well, just like v1.42 :smile:

9.3 ms/iter for 54M exponent on GTX-580 card with v1.41
8.6 ms/iter with v1.42
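
To put these per-iteration times in perspective: a Lucas-Lehmer test of exponent p needs p - 2 squaring iterations, so the total run time can be estimated from ms/iter. The sketch below assumes a round stand-in exponent of 54,000,000 (the actual exponent is not given above), and `ll_eta_days` is a name made up for this illustration.

```python
def ll_eta_days(exponent, ms_per_iter):
    """Estimated Lucas-Lehmer run time in days at a constant ms/iter."""
    iterations = exponent - 2            # an LL test of M_p runs p - 2 iterations
    seconds = iterations * ms_per_iter / 1000.0
    return seconds / 86400.0             # 86400 seconds per day


if __name__ == "__main__":
    P = 54_000_000  # assumption: round stand-in for "54M exponent"
    for version, ms in (("1.41", 9.3), ("1.42", 8.6)):
        print(f"v{version}: {ll_eta_days(P, ms):.2f} days")
    print(f"time saved per iteration: {(9.3 - 8.6) / 9.3:.1%}")
```

Under that assumption the full test takes roughly 5.8 days at 9.3 ms/iter and roughly 5.4 days at 8.6 ms/iter.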

Domo

