[QUOTE=Brain;266669]Is this CUDA 3.2 or 4.0 and CUDALucas 1.2? ... I downgraded to 1.0b again because of performance on CUDA 3.2 with previous Win64 build.[/QUOTE]
Win64/CUDA 3.2/CUDALucas 1.2 compiled for sm_21. What is the difference between CUDALucas 1.0b and 1.2? And how much performance difference is there between them?
[QUOTE=apsen;266687]
What is the difference between CUDALucas 1.0b and 1.2? And how much performance difference is there between them?[/QUOTE] 1.2 has improved checkpoint-file handling: it no longer needs different command-line calls to resume from a checkpoint file. Additionally, your 1.2 has timing outputs. Iteration time @4M on a GTX 560 Ti at 822 MHz is 13.6 ms (1.0b) versus 13.4 ms (1.2, your build). I switched to your build. Thanks. P.S.: One has to download CUDA 3.2 separately, as cufft is too large to attach to this forum. Needed: cufft64_32_16.dll and cudart64_32_16.dll.
[QUOTE=Brain;266705]Iteration time @4M on GTX 560 Ti 822 MHz is 13.6ms (1.0b) versus 13.4ms (1.2, your build).
I switched to your build. Thanks.[/QUOTE] Was it faster than the other 8 binaries provided? If so, -arch=sm_xx should be used for sm_21 GPUs, because it actually does something beneficial there, unlike for sm_20 GPUs.
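To be concrete about what I mean by that flag, a build line along these lines (the source and output file names here are assumptions, adjust for your tree):

```shell
# Hypothetical nvcc invocation -- file names are made up for illustration.
# -arch=sm_21 generates code tuned for compute capability 2.1 (e.g. GTX 560 Ti);
# use -arch=sm_13 instead for a binary that also runs on older double-capable GPUs.
nvcc -O2 -arch=sm_21 -o CUDALucas CUDALucas.cu -lcufft
```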
[QUOTE=Brain;266705] Additionally, your 1.2 has timing outputs.[/QUOTE]
:redface: Oops... I forgot about that; I meant to release untouched code. Actually, the original version has a function that tries to print that info to stderr. I just added the call, but it did not work for some reason, so I replaced the fprintf with a regular printf to make it work. The other difference is that this version will not complain if you try to run it on a pre-1.3 board. It will not work anyway, but you'll get a generic error message instead. It also will not let you choose the graphics device. I think I had better compile the original code...
Offer both versions
I love the timing output! Imho: "Keep it". And I suppose the final result line in mersarch.txt is not affected. Even if it were, I would prefer to remove it manually before I submit the result.
@Karl M Johnson: I only tried the initial CUDA 4.0 build from the apsen release, which was slightly over 14 ms (worse). By the way, I doubt there is any noticeable speedup: the 13.6 ms figure was measured by waiting 136 s on Windows' clock, i.e. inaccurately. Via the timing outputs, 13.4 ms is measured 60% of the time and 13.5 ms the other 40%. So there is no big difference.
[QUOTE=Brain;266714]I love the timing output! Imho: "Keep it".[/QUOTE]
Maybe I'll find some time to actually look at the code. I do not feel good about it anyway, as some command-line options do not work...
BTW, does anyone experience crashes with the program? It seems to finish the work fine but crashes afterward, so the results are apparently not affected, but it is still not nice to crash.
1 Attachment(s)
[QUOTE=apsen;266882]BTW, does anyone experience crashes with the program? It seems to finish the work fine but crashes afterward, so the results are apparently not affected, but it is still not nice to crash.[/QUOTE]
I figured it out (kind of). When restarting from a checkpoint and finishing the test, the program then tries to read more input from the same file even though it has already been closed. It loops endlessly until it crashes (I haven't pinned down the exact point and reason for the crash, but it happens in the "input" function once it enters the endless loop). While trying to figure this out I cut out a lot of unused code, removed the K&R-style prototypes, etc., and also added some timing output. I haven't touched anything related to the calculations, but it would be prudent to be cautious: it needs a lot of testing before production use. The most likely bugs would be in command-line parsing; that code underwent the most nontrivial change. I'm attaching the modified source code with a Win64 executable compiled for sm_13.
1 Attachment(s)
Approved !
This is not related to this version, but it is related to CUDALucas: the GPU usage isn't close to 99%. So there could be a speedup, yes?
[QUOTE=Karl M Johnson;267366]Approved ![/QUOTE]
BTW, try the -t switch. :smile: [QUOTE=Karl M Johnson;267366]This is not related to this version, but it is related to CUDALucas: the GPU usage isn't close to 99%. So there could be a speedup, yes?[/QUOTE] What do you use to see GPU usage? At a quick glance it seems that most of the time it sits in the cufft library, but I may be mistaken. It may be that unless the FFT is rewritten there's not much to gain. Also, according to the GIMPS site, the floating-point FFT was chosen due to Pentium processor specifics; on a GPU a discrete (integer) FFT may be faster. [url]http://www.mersenne.org/various/intfft.txt[/url] assumes some knowledge that I currently do not have, so to me it's not obvious how to implement it. I might find time to look into it in the future, though.
Ok.
I use MSI Afterburner (a modified RivaTuner) to measure GPU load. Also, tools like GPU-Z and EVGA Precision (another Riva clone) can be used. The ETA and the performance in ms/iter are VERY useful, thanks!