mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

zs6nw 2012-04-09 19:13

The -polite 0 option seems to be the actual culprit.

[QUOTE=Dubslow;295890]Hmm, I've turned off -k but I'm still seeing 15-20% CPU usage.
[code]LD_LIBRARY_PATH=~/CUDALucas/lib ~/CUDALucas/CUDALucas -c 10000 -f 1474560 -polite 0 worktodo.txt[/code][/QUOTE]

Dubslow 2012-04-09 19:18

[QUOTE=zs6nw;295931]The -polite 0 option seems to be the actual culprit.[/QUOTE]

Indeed, I can confirm this. msft, do you know why it does that?


(By the way, if you downloaded the script between 10 minutes before this post and about 1 hour after the original post, it contained a typo that prevented it from working properly. That typo is now fixed.)

msft 2012-04-10 00:40

[QUOTE=Dubslow;295933]Indeed, I can confirm this. msft, do you know why it does that?[/QUOTE]
Side effect.

Dubslow 2012-04-10 01:36

[QUOTE=msft;295965]Side effect.[/QUOTE]

I suppose the better question is, is it possible to get aggressive (p=0) performance without adding the extra cpu time, or is that just the way it is?

rcv 2012-04-10 01:46

[QUOTE=Dubslow;295971]I suppose the better question is, is it possible to get aggressive (p=0) performance without adding the extra cpu time, or is that just the way it is?[/QUOTE]
I haven't looked carefully at the CUDALucas code. However, this sounds very similar to the standard NVIDIA problem of using spin-loops.

@msft: I sent a patch to Cyril for the GPU version of gmp-ecm that cures this problem for his application. xilman acknowledged in another thread ( [URL]http://www.mersenneforum.org/showpost.php?p=295541&postcount=63[/URL] ) that it worked for him. It's only a dozen+ lines. Since you are using the CUFFT library, there may be some complications, but if you are interested and if you think it may be the NVIDIA spin-loops, PM me, and I'll send you the technique.

msft 2012-04-10 05:49

[QUOTE=Dubslow;295971]I suppose the better question is, is it possible to get aggressive (p=0) performance without adding the extra cpu time, or is that just the way it is?[/QUOTE]
"-polite 64" Good balance on my linux box.

Dubslow 2012-04-10 05:54

[QUOTE=msft;295993]"-polite 64" Good balance on my linux box.[/QUOTE]

Well I'll be. What exactly does the 64 mean? I thought it was just a binary switch.

zs6nw 2012-04-10 21:39

1 Attachment(s)
Graph of GTX460 cufftbench timing data.

[QUOTE=Prime95;294530]Attached is my cufftbench for a GTX460. I flagged with a "Y" the FFT sizes that make sense.[/QUOTE]

LaurV 2012-04-11 04:53

[QUOTE=Dubslow;295994]Well I'll be. What exactly does the 64 mean? I thought it was just a binary switch.[/QUOTE]
"-polite x" will be polite... every x iterations. It says in the help. :P
There was an example with 100 earlier in this thread. So, the higher the number, the more aggressive it becomes. Use 0 for "infinite aggressive" (never do the wait loop). Use 1 for the most polite (do the wait loop every "1" iterations).
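To make the "every x iterations" rule concrete, here is a minimal Python sketch of the behaviour as described above. It is not taken from the CUDALucas source: the function name `run_iterations`, its parameters, and the use of `time.sleep(0)` as the "wait" are all assumptions for illustration only.

```python
import time


def run_iterations(total, polite, wait=lambda: time.sleep(0)):
    """Count how often a hypothetical main loop would pause.

    polite == 0 : never wait ("infinite aggressive")
    polite == 1 : wait after every iteration (most polite)
    polite == x : wait after every x-th iteration
    """
    waits = 0
    for i in range(1, total + 1):
        # ... one iteration of real work would be queued here ...
        if polite > 0 and i % polite == 0:
            wait()  # stand-in for the program's wait loop (assumption)
            waits += 1
    return waits


if __name__ == "__main__":
    for p in (0, 1, 64, 100):
        print(f"-polite {p}: {run_iterations(1000, p)} waits in 1000 iterations")
```

In this model a larger value means fewer pauses per iteration, which is why higher numbers behave more aggressively.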

Dubslow 2012-04-11 20:24

[QUOTE=Dubslow;295786]CuLu Spider...

If an exponent result is correctly parsed, it's passed into the "decide" function, which decides what's appropriate; it has the following logic:
<snip>

Unfortunately, due to a paucity of exponents, I have not tested every case; I do know that the "Verified LL" portion works as advertised (thanks msft for posting that 2.00 result you had :smile:), and that it works in the basic case of a match with no prior CuLu result, however the other scenarios remain untested (but *should* work).[/QUOTE]
I can now confirm that it will successfully notice a previous CUDALucas test, and will therefore not submit such an exponent. The underlined parts of the logic have now been tested at least once:

[code]
[U]if( 2*"Verified LL" is in the expo status page ) {
    submit anyway, just in case[/U]
} [U]else if( there is a string of 14 lowercase hex digits [or all decimals] )[/U] {
    if( all decimals ) { print warning; ask user to check exponent manually }
    [U]else { we know there's a CuLu test; exponent will not be submitted
        if( your residue matches another ) { print "match"; do not submit }[/U]
        else { print "no current matches, use Prime95!"; do not submit }
    }
} else {[U] no previous CuLu, and not DCed
    if( there is a matching residue ) { submit! }
    else { print warning: no match; do not submit }[/U]
}[/code]
Edit: Version 0.02 is now available; it fixes a logging bug, with no change to major functionality. I have now also tested that a mismatch with no previous CuLu result is properly detected, and the chart above has been updated accordingly.
([url]http://dubslow.tk/gimps/CuLuSpider.txt[/url] View in browser)
([url]http://dubslow.tk/gimps/CuLuSpider.py[/url] Download)
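For readers who prefer something runnable, the decision logic charted above can be sketched in Python roughly as follows. This is an illustration only, not the code in CuLuSpider.py: the function signature, the returned strings, and the assumption that the status page shows each residue as a standalone 14-hex-digit token (lowercase for CUDALucas) are all hypothetical.

```python
import re

# Assumption: each residue appears as a standalone token of 14 hex digits.
RESIDUE = re.compile(r"(?<![0-9A-Za-z])[0-9A-Fa-f]{14}(?![0-9A-Za-z])")


def decide(status_page: str, my_residue: str) -> str:
    """Return a (hypothetical) action string for one exponent."""
    mine = my_residue[:14].lower()
    residues = RESIDUE.findall(status_page)
    matched = any(r.lower() == mine for r in residues)

    # Two "Verified LL" entries: submit anyway, just in case.
    if status_page.count("Verified LL") >= 2:
        return "submit"

    # Residues with no uppercase letters: lowercase hex or all decimals.
    candidates = [r for r in residues if r == r.lower()]
    if candidates:
        # All-decimal residues carry no case information, so ask the user.
        if any(r.isdigit() for r in candidates):
            return "warn: check manually"
        # Otherwise a CuLu test already exists; never submit.
        if matched:
            return "skip: match"
        return "skip: no match, use Prime95"

    # No previous CuLu result and not double-checked.
    return "submit" if matched else "warn: no match"


if __name__ == "__main__":
    page = "Unverified LL 0123456789ABCD"
    print(decide(page, "0123456789abcdef"))
```

The branch order follows the chart: the double-check test first, then the lowercase or all-decimal test, then the plain match test.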

Batalov 2012-04-11 20:45

All this double accounting is dubious. Who is served by having some unobservable wrong or right residue? (Perhaps some misplaced pride? Well, in that case, it would be better served by tuning the card to work right, not just "look ma! no hands! 5GHz!!")

I've submitted a non-matching residue long ago and never thought twice about it but bookmarked the result to revisit later. [URL="http://www.mersenne.org/report_exponent/?exp_lo=27402559&exp_hi=10000&B1=Get+status"]Et voila[/URL] - CUDA was right. (I've looked back at the version, it was CUDALucas v1.48.)

