[QUOTE=TObject;298012]What is the “CPU Wait”? The bigger the % the worse the CPU is keeping up? Or is it the other way around?
Thanks[/QUOTE] The latter. A high CPU wait means that it is waiting for the GPU. That is, the CPU is running ahead of the GPU.
Wunderbar ("wonderful")
[QUOTE=Dubslow;298010]It's the count of the number of primes to sieve n=2kp+1 with. The more primes you use in sieving, the more composite n you eliminate, but of course the law of diminishing returns applies. The important point is that this sieving of candidates is done on the CPU, while the actual testing of candidates happens on the GPU, so the SP count is effectively how much work the CPU has to do before a candidate is sent to the GPU. If the CPU can't keep up with the GPU, lower SievePrimes so it's doing less work; if the GPU can't keep up, increase SP so the CPU does more work.[/QUOTE]
So, how do I change SP?
[QUOTE=c10ck3r;298021]So, how do I change SP?[/QUOTE]Ideally, set [i]SievePrimesAdjust=1[/i] in mfaktc.ini and let it reach optimum.
[QUOTE=James Heinrich;298024]Ideally, set [i]SievePrimesAdjust=1[/i] in mfaktc.ini and let it reach optimum.[/QUOTE]
Often users find that it isn't very good; you can set [i]SievePrimes=5000[/i] (or whatever number) in mfaktc.ini, and SievePrimesAdjust to taste. (If adjust is on, mfaktc will start with whatever value you gave but will change it on the fly.)
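For reference, the relevant lines in mfaktc.ini look roughly like this (the values shown are just examples taken from the discussion above; check the comments in your own mfaktc.ini for the allowed ranges in your version):

```ini
; number of primes used for sieving factor candidates on the CPU
SievePrimes=5000

; 1 = let mfaktc tune SievePrimes automatically at runtime, 0 = keep it fixed
SievePrimesAdjust=1
```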
mfaktc 0.18 compiled with CUDA 4.[B][COLOR="Red"]2[/COLOR][/B] and compute capability [B][COLOR="Red"]3.0[/COLOR][/B] support. Sources are unchanged so just a new executable. :smile:
[url]http://www.mersenneforum.org/mfaktc/mfaktc-0.18.win.cuda42.zip[/url] This version is for GTX 680 owners (which can't run the CUDA 4.0 or 4.1 executables). All other users can upgrade, but there is no need to do so. As always recommended: run the full selftest (mfaktc...exe -st2) before you start productive jobs. About the GTX 680: I still haven't had my hands on one myself; the tests were done by a forum user here. Once I have access to a Kepler card (and some time) I guess I can tweak the code a little bit, but don't expect that a GTX 680 will ever perform as well as a GTX 580. :sad: Oliver
[URL="http://www.abload.de/image.php?img=neuebitmap2s2f8r.jpg"][img]http://www.abload.de/img/neuebitmap2s2f8r.jpg[/img][/URL]
Just playing around with some 65xxxxxx exponents, 70 - 71 :smile:
So your GTX 680 is ~20% overclocked and is worth ~400M/s for some reasonable assignments. So a stock GTX 680 is at ~330M/s, just 10% faster than my stock GTX 470.
For mfaktc: 470 < [B]680[/B] < 480 < 570 < 580 Less than we all hoped for, but not really bad. Now I'm interested in the power consumption while running mfaktc. Perhaps a 680 does a good job at mfaktc performance per watt? Oliver
70% TDP means perhaps 140W, which is quite a bit better than I expected.
[QUOTE=TheJudger;298272]For mfaktc: 470 < [B]680[/B] < 480 < 570 < 580[/QUOTE]According to [url=http://mersenne-aries.sili.net/mfaktc.php?sort=ghdpd&noA=1]my chart[/url] based on one benchmark from a while ago, I have the 680 and 470 very close together, with the 680 slightly behind (206 vs 218 GHz-days/day). Should I increase the expected performance of the compute-3.0 cards?
[i]edit:[/i] I've just added more 600 series GPUs to my list. What an ugly mess of compute 2.1 / 3.0 chips making up the lineup. And three variants of the GT 640! Performance per watt is all over the place, even performance itself: the GT 630 is rated 672 GFLOPS vs 415 GFLOPS for the 40nm version of the GT 640. But thanks to the discrepancy between 2.1 and 3.0 performance, the GT 640 still outperforms it at mfaktc.
Let's wait for some more (non-OCed) results, and until I get my hands on a Kepler.
Oliver
[QUOTE=Prime95;294453]I'd say they are about 20 times slower than they should be!! 32-bit muls are much faster than shift lefts! Repeated adds are much faster than small shift lefts. Algorithms may have to change to avoid shift rights.[/QUOTE]
Well, the barrett79 kernel (the fastest and most used kernel) doesn't contain many shifts at all.

[QUOTE=TheJudger;298282]Let's wait for some more (non-OCed) results and I get my hands on a Kepler.[/QUOTE]

For a limited amount of time I can put my hands on a GTX 680 (factory overclocked, driver reports 1124MHz). [B]Raw[/B] GPU speed for TF M66362159 from 2^69 to 2^70:
mfaktc 0.19-pre1: 380.74M/s (my stock GTX 470 does ~335M/s)
mfaktc 0.19-pre2: 380.92M/s
-pre2 is the first attempt to optimize for Kepler... in the barrett79 kernel I've replaced all shiftlefts with multiplies... not really worth the extra code! :sad: Another attempt was to also replace all shiftrights with multiplies (high 32-bit word)... not a good idea, the result was ~370M/s. :sad:

Actual code for a shiftleft of a multiword integer [I]nn[/I] by 23 bits:
[CODE]// shiftleft nn 23 bits
[...]
#if __CUDA_ARCH__ >= 300
nn.d4 = __umad32(nn.d4, 8388608, __umul32hi(nn.d3, 8388608));
nn.d3 = __umad32(nn.d3, 8388608, __umul32hi(nn.d2, 8388608));
nn.d2 = __umad32(nn.d2, 8388608, __umul32hi(nn.d1, 8388608));
nn.d1 = __umul32(nn.d1, 8388608);
#else
nn.d4 = (nn.d4 << 23) + (nn.d3 >> 9);
nn.d3 = (nn.d3 << 23) + (nn.d2 >> 9);
nn.d2 = (nn.d2 << 23) + (nn.d1 >> 9);
nn.d1 = nn.d1 << 23;
#endif[/CODE]
The old code has 3[SUP]*1[/SUP] instructions per word: shiftleft + shiftright + add.
The new code has only 2[SUP]*1[/SUP] instructions per word: multiply (high word) + multiply-add.
[SUP]*1[/SUP] We don't really know how many instructions those are in hardware; PTX is only an intermediate representation.

Oliver