#1772
"Kieren"
Jul 2011
In My Own Galaxy!
10011110101110₂ Posts
The latter. A high CPU wait means that it is waiting for the GPU. That is, the CPU is running ahead of the GPU.
Last fiddled with by kladner on 2012-04-30 at 21:29

#1773
Aug 2010
Kansas
547 Posts
Quote:

#1774
"James Heinrich"
May 2004
ex-Northern Ontario
D5D₁₆ Posts

#1775
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
1C35₁₆ Posts
Often users find that it isn't very good; you can set SievePrimes=5000 (or whatever number) in mfaktc.ini, and set SievePrimesAdjust to taste. (If adjustment is on, mfaktc starts with whatever value you gave but changes it on the fly.)

#1776
"Oliver"
Mar 2005
Germany
11·101 Posts
mfaktc 0.18 compiled with CUDA 4.2 and compute capability 3.0 support. The sources are unchanged, so this is just a new executable.
http://www.mersenneforum.org/mfaktc/...win.cuda42.zip
This version is for GTX 680 owners (who can't run the CUDA 4.0 or 4.1 executables). All other users can upgrade, but there is no need to do so.
As always recommended: run the full selftest (mfaktc...exe -st2) before you start productive jobs.
About GTX 680: I still haven't had my hands on a GTX 680; the tests were done by a forum user here. Once I have access to a Kepler card (and some time), I guess I can tweak the code a little bit, but don't expect that a GTX 680 will ever perform as well as a GTX 580.
Oliver

#1777
Apr 2012
Berlin Germany
3×17 Posts
http://www.abload.de/img/neuebitmap2s2f8r.jpg
Just playing around with some 65xxxxxx exponents, 70 to 71 bits.
Last fiddled with by Redarm on 2012-05-02 at 23:49

#1778
"Oliver"
Mar 2005
Germany
1111₁₀ Posts
So your GTX 680 is ~20% overclocked and is worth ~400M/s for some reasonable assignments. So a stock GTX 680 is at ~330M/s, just 10% faster than my stock GTX 470.
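Spelled out, assuming mfaktc throughput scales roughly linearly with GPU core clock (an assumption for illustration, not a measurement):

$$\text{stock GTX 680} \approx \frac{400\ \text{M/s}}{1.20} \approx 333\ \text{M/s}$$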
For mfaktc: 470 < 680 < 480 < 570 < 580. Less than we all hoped for, but not really bad.
Now I'm interested in the power consumption while running mfaktc. Perhaps a 680 does a good job at mfaktc performance per watt?
Oliver

#1779
Apr 2012
Berlin Germany
3×17 Posts
70% TDP means perhaps 140W, which is much better than I expected.

#1780
"James Heinrich"
May 2004
ex-Northern Ontario
11·311 Posts
According to my chart based on one benchmark from a while ago, I have the 680 and 470 very close together, with the 680 slightly behind (206 vs 218 GHz-days/day). Should I increase the expected performance of the compute-3.0 cards?
edit: I've just added more 600-series GPUs to my list. What an ugly mess of compute 2.1 / 3.0 chips making up the lineup. And three variants of the GT 640! Performance per watt is all over the place, and so is performance itself: the GT 630 is rated 672 GFLOPS vs 415 GFLOPS for the 40nm version of the GT 640. But thanks to the discrepancy between compute 2.1 and 3.0 performance, the GT 640 still outperforms it at mfaktc.
Last fiddled with by James Heinrich on 2012-05-03 at 13:01

#1781
"Oliver"
Mar 2005
Germany
1111₁₀ Posts
Let's wait for some more (non-OCed) results and for me to get my hands on a Kepler.
Oliver

#1782
"Oliver"
Mar 2005
Germany
10001010111₂ Posts
Quote:
Quote:
Raw GPU speed for TF of M66362159 from 69 to 70 bits:
mfaktc 0.19-pre1: 380.74M/s (my stock GTX 470 does ~335M/s)
mfaktc 0.19-pre2: 380.92M/s
-pre2 is the first attempt to optimize for Kepler... in the barrett79 kernel I've replaced all shiftlefts by multiplies... not really worth the extra code!
Another attempt was to replace all shiftrights by multiplies (hi 32bit word), too... not a good idea, the result was ~370M/s.
Actual code for a shiftleft of a multiword integer nn by 23 bits:
Code:
```cuda
// shiftleft nn 23 bits
// [...]
#if __CUDA_ARCH__ >= 300
  // Kepler (compute capability 3.0+): 8388608 = 2^23, so the low word of
  // x * 2^23 is x << 23 and the high word is x >> 9
  nn.d4 = __umad32(nn.d4, 8388608, __umul32hi(nn.d3, 8388608));
  nn.d3 = __umad32(nn.d3, 8388608, __umul32hi(nn.d2, 8388608));
  nn.d2 = __umad32(nn.d2, 8388608, __umul32hi(nn.d1, 8388608));
  nn.d1 = __umul32(nn.d1, 8388608);
#else
  // older GPUs: shift each word left and bring in the top 9 bits of the word below
  nn.d4 = (nn.d4 << 23) + (nn.d3 >> 9);
  nn.d3 = (nn.d3 << 23) + (nn.d2 >> 9);
  nn.d2 = (nn.d2 << 23) + (nn.d1 >> 9);
  nn.d1 = nn.d1 << 23;
#endif
```
The new code has only 2 instructions (*1) per word: multiply (high word) + multiply-add.
(*1) We don't really know how many hardware instructions those are; PTX code is only an intermediate code.
Oliver
Similar Threads
| Thread | Thread Starter | Forum | Replies | Last Post |
| mfakto: an OpenCL program for Mersenne prefactoring | Bdot | GPU Computing | 1676 | 2021-06-30 21:23 |
| The P-1 factoring CUDA program | firejuggler | GPU Computing | 753 | 2020-12-12 18:07 |
| gr-mfaktc: a CUDA program for generalized repunits prefactoring | MrRepunit | GPU Computing | 32 | 2020-11-11 19:56 |
| mfaktc 0.21 - CUDA runtime wrong | keisentraut | Software | 2 | 2020-08-18 07:03 |
| World's second-dumbest CUDA program | fivemack | Programming | 112 | 2015-02-12 22:51 |