[QUOTE=flashjh;283988]
4.11 build is slower than 1.2b for me. I had to install CUDA 4.0 to get the latest .dll files. Anyone know why the newer one is slower or something I can do to make it faster? Anyone know how to get the ETA back in 4.11?[/QUOTE]
[QUOTE=Brain;284052]As CUDALucas does auto-resume from checkpoint files we should recommend [COLOR=red][B]not using "-c"[/B][/COLOR] any more, do we? I will have to update the GPU guide in the new year... By the way, the iteration times are so low as I didn't do complete 10000 runs. Kind of a bug. Last but not least, utilisation for state-of-the-art expos (40M range) is 97% as before. Low utilisation is understandable for small FFT sizes...[/QUOTE]
I have not tried 1.3, but 1.4.1 is about 2x slower for me than 1.2b. I dropped -c and -t and only use -D01 for GPU 2. I'm still learning... Is it better to use 1.4.1 to get the non-power-of-2 FFT sizes, or to use 1.2b to optimize speed? Thanks. |
[QUOTE=Brain;284052]As CUDALucas does auto-resume from checkpoint files we should recommend [COLOR=Red][B]not using "-c"[/B][/COLOR] any more, do we? I will have to update the GPU guide in the new year...[/QUOTE]
Stupid me: If we don't use -c there are no checkpoints written. So we have to use it. I will have time to test more in the new year but currently I recommend using 1.2(b)! [URL="http://home.htp-tel.de/shornbostel/"]DLL files download[/URL] |
From "does hardly use any CPU resources" to "takes almost a full CPU core"
Hi,
Although nobody has yet confirmed that CL >= 1.4 uses more CPU resources, I asked msft about it. He answered that he is investigating and will try to fix it. I :love: CUDALucas (and all GPU-based primality tests). Thanks to msft! |
1 Attachment(s)
Hi ,
[QUOTE=Brain;284216]although nobody has yet confirmed that CL >= 1.4 uses more CPU resources I wrote msft this question.[/QUOTE] Fixed CPU issue. Thank you, |
1 Attachment(s)
exec file.
|
1 Attachment(s)
[QUOTE=msft;284224]Hi ,
Fixed CPU issue. Thank you,[/QUOTE]
Thanks a lot. I noticed you didn't change the version string. For the Win64 compilation I will change it to: [CODE]const char program_revision[] = "$Revision: 1.4.2 $";[/CODE]
My last test with 1.41 didn't show the high CPU utilisation any more, see attached. I am a bit confused. Maybe I made a mistake with the "-c" param so that CL wrote a checkpoint every iteration. Just guessing. Will now compile again. |
1.4.2 looks good
1 Attachment(s)
Maybe even a bit faster. We'll see.
|
Shader Model 1.3
1 Attachment(s)
CUDALucas 1.4.2 for Win64 / CUDA 4.0 / Compute Capability 1.3
|
Shader Model 2.1
1 Attachment(s)
CUDALucas 1.4.2 for Win64 / CUDA 4.0 / Compute Capability 2.1
|
CUFFT benchmark with cuda4.0:
[CODE]
CUFFT_D2Z size=512 k time=1.269744 msec
CUFFT_D2Z size=1024 k time=2.609184 msec
CUFFT_D2Z size=1536 k time=4.359898 msec
CUFFT_D2Z size=2048 k time=5.615232 msec
CUFFT_D2Z size=2560 k time=7.277350 msec
CUFFT_D2Z size=3072 k time=8.969321 msec
CUFFT_D2Z size=3584 k time=10.251376 msec
CUFFT_D2Z size=4096 k time=11.749495 msec
CUFFT_D2Z size=4608 k time=13.065844 msec
CUFFT_D2Z size=5120 k time=14.971667 msec
[COLOR="Red"]CUFFT_D2Z size=5632 k time=148.589874 msec[/COLOR]
CUFFT_D2Z size=6144 k time=19.145430 msec
[COLOR="Red"]CUFFT_D2Z size=6656 k time=217.340973 msec[/COLOR]
CUFFT_D2Z size=7168 k time=21.095901 msec
CUFFT_D2Z size=7680 k time=24.699974 msec
CUFFT_D2Z size=8192 k time=24.172211 msec
[/CODE]
Some FFT lengths are very slow (5632k and 6656k take roughly 10x longer than their neighbours). Ver 1.42 does not avoid these lengths. |
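Editor's sketch: one way to sidestep the slow lengths in the benchmark above is to pick, among all measured lengths that are at least as large as the one required, the one with the lowest time. The C snippet below only illustrates that idea using the numbers from the post. The names `fft_timing`, `cufft_d2z_timings` and `fastest_fft_at_least` are made up for this example; this is not how CUDALucas chooses its FFT length.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the CUFFT_D2Z benchmark above: length in K, time in msec. */
struct fft_timing {
    int size_k;
    double msec;
};

/* Timings copied from the post (CUDA 4.0). Hypothetical table name. */
const struct fft_timing cufft_d2z_timings[] = {
    { 512,   1.269744}, {1024,   2.609184}, {1536,   4.359898}, {2048,   5.615232},
    {2560,   7.277350}, {3072,   8.969321}, {3584,  10.251376}, {4096,  11.749495},
    {4608,  13.065844}, {5120,  14.971667}, {5632, 148.589874}, {6144,  19.145430},
    {6656, 217.340973}, {7168,  21.095901}, {7680,  24.699974}, {8192,  24.172211},
};

/* Return the index of the fastest measured length that is >= min_size_k,
 * or -1 if no measured length is large enough. */
int fastest_fft_at_least(const struct fft_timing *t, size_t n, int min_size_k)
{
    int best = -1;
    assert(t != NULL);
    for (size_t i = 0; i < n; i++) {
        if (t[i].size_k < min_size_k)
            continue;               /* too small for the exponent */
        if (best < 0 || t[i].msec < t[best].msec)
            best = (int)i;          /* faster than the current candidate */
    }
    return best;
}
```

With this table, a request for at least 5632k lands on 6144k and a request for at least 6656k lands on 7168k, so both slow lengths are skipped. Whether a longer FFT is acceptable for a given exponent is a separate question that this sketch does not address.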
v1.41 vs v1.42
V1.41 is running pretty well, just like v1.42 :smile:
For a 54M exponent on a GTX-580 card: 9.3 ms/iter with v1.41, 8.6 ms/iter with v1.42.
Domo |
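Editor's sketch: since the ETA display came up earlier in the thread, here is how the ms/iter figures above translate into run time. A Lucas-Lehmer test of exponent p needs p - 2 iterations, so the remaining time is simply the remaining iterations times the time per iteration. The function names are invented for this example, and 54,000,000 is assumed as a stand-in for the "54M exponent" in the post.

```c
#include <assert.h>

/* Estimated remaining wall-clock time in seconds for a Lucas-Lehmer test.
 * exponent:    the Mersenne exponent p (the test needs p - 2 iterations)
 * done_iters:  iterations already completed
 * ms_per_iter: measured time per iteration in milliseconds */
double ll_eta_seconds(unsigned long exponent, unsigned long done_iters,
                      double ms_per_iter)
{
    assert(exponent >= 2 && done_iters <= exponent - 2);
    unsigned long remaining = exponent - 2 - done_iters;
    return (double)remaining * ms_per_iter / 1000.0;
}

/* Relative improvement of a new timing over an old one, in percent. */
double speedup_percent(double old_ms, double new_ms)
{
    assert(old_ms > 0.0);
    return (old_ms - new_ms) / old_ms * 100.0;
}
```

Under that assumption, `ll_eta_seconds(54000000UL, 0, 9.3)` is about 502,200 s (roughly 5.8 days) and `ll_eta_seconds(54000000UL, 0, 8.6)` is about 464,400 s (roughly 5.4 days), which `speedup_percent(9.3, 8.6)` reports as about 7.5% faster.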