mersenneforum.org  

Old 2012-04-30, 21:28   #1772
kladner
 
"Kieren"
Jul 2011
In My Own Galaxy!

2·3·1,693 Posts

Quote:
Originally Posted by TObject
What is the “CPU Wait”? The bigger the % the worse the CPU is keeping up? Or is it the other way around?

Thanks
The latter. A high CPU wait means that the CPU is waiting for the GPU; that is, the CPU is running ahead of the GPU.

Last fiddled with by kladner on 2012-04-30 at 21:29
Old 2012-04-30, 22:48   #1773
c10ck3r
 
Aug 2010
Kansas

547₁₀ Posts
Wunderbar

Quote:
Originally Posted by Dubslow
It's the number of primes used to sieve the candidates n=2kp+1. The more primes you sieve with, the more non-prime n you eliminate, though of course the law of diminishing returns applies. The important fact is that sieving candidates is done on the CPU, while actually trying candidates happens on the GPU, so the SP count is effectively how much work the CPU has to do before a candidate is sent to the GPU. If the CPU can't keep up with the GPU, lower SievePrimes so the CPU does less work; if the GPU can't keep up, increase SP so the CPU does more work.
So, how do I change SP?
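
To make the trade-off in the quote concrete, here is a minimal C sketch of the idea: candidates n=2kp+1 are checked against a short list of small primes, and only the survivors would go on to the expensive test. This is not mfaktc's actual sieve, which is far more elaborate; the function names, the tiny prime list and the choice p=11 are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if n = 2*k*p + 1 survives the sieve, i.e.
 * none of the first `count` primes in `sieve_primes` is a proper divisor. */
static int survives_sieve(uint64_t p, uint64_t k,
                          const uint32_t *sieve_primes, size_t count)
{
    assert(p > 0 && k > 0);
    uint64_t n = 2 * k * p + 1; /* assumes p and k are small enough not to overflow */
    for (size_t i = 0; i < count; i++) {
        /* A candidate equal to a sieve prime is itself prime, so keep it. */
        if (n != sieve_primes[i] && n % sieve_primes[i] == 0)
            return 0; /* small factor found: n is composite, drop it */
    }
    return 1;
}

/* Hypothetical helper: how many k in 1..k_max survive, i.e. how many
 * candidates would still have to be handed to the GPU. */
static size_t count_survivors(uint64_t p, uint64_t k_max,
                              const uint32_t *sieve_primes, size_t count)
{
    size_t survivors = 0;
    for (uint64_t k = 1; k <= k_max; k++)
        survivors += (size_t)survives_sieve(p, k, sieve_primes, count);
    return survivors;
}
```

For p=11, k=1 and k=4 give n=23 and n=89, the two prime factors of 2^11-1=2047, and both survive; k=2 gives n=45, which any sieve containing 3 removes. Passing a longer prime list to count_survivors never yields more survivors than a shorter prefix of the same list, which is exactly the "more CPU work, fewer GPU candidates" trade-off.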
Old 2012-04-30, 22:52   #1774
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

11×311 Posts

Quote:
Originally Posted by c10ck3r
So, how do I change SP?
Ideally, set SievePrimesAdjust=1 in mfaktc.ini and let it reach optimum.
Old 2012-04-30, 22:55   #1775
Dubslow
Basketry That Evening!
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3·29·83 Posts

Quote:
Originally Posted by James Heinrich
Ideally, set SievePrimesAdjust=1 in mfaktc.ini and let it reach optimum.
Users often find that the automatic adjustment isn't very good; you can instead set SievePrimes=5000 (or whatever number you like) in mfaktc.ini, and set SievePrimesAdjust to taste. (If adjustment is on, it will start with whatever value you gave but will change it on the fly.)
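
As a sketch, the two settings named in this exchange would sit in mfaktc.ini like this. The value 5000 is only the example number from the post, not a recommendation, and all other keys in the file stay as they are:

```
SievePrimes=5000
SievePrimesAdjust=1
```

With SievePrimesAdjust=1 the SievePrimes value is just the starting point. Setting it to 0 should keep the value fixed, but that is an assumption here; the comments in your own mfaktc.ini are the place to confirm it.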
Old 2012-05-02, 22:25   #1776
TheJudger
 
"Oliver"
Mar 2005
Germany

11×101 Posts

mfaktc 0.18 compiled with CUDA 4.2 and compute capability 3.0 support. Sources are unchanged so just a new executable.
http://www.mersenneforum.org/mfaktc/...win.cuda42.zip

This version is for GTX 680 owners (which can't run the CUDA 4.0 or 4.1 executables). All other users can upgrade but there is no need to do so.

As always recommended: run the full selftest (mfaktc...exe -st2) before you start productive jobs.

About the GTX 680: I still haven't had my hands on one; the tests were done by a forum user here. Once I have access to a Kepler card (and some time) I guess I can tweak the code a little bit, but don't expect that a GTX 680 will ever perform as well as a GTX 580.

Oliver
Old 2012-05-02, 23:38   #1777
Redarm
 
Apr 2012
Berlin Germany

3·17 Posts
Thumbs up

http://www.abload.de/img/neuebitmap2s2f8r.jpg

Just playing around with some 65xxxxxx exponents, bit levels 70 to 71.

Last fiddled with by Redarm on 2012-05-02 at 23:49
Old 2012-05-03, 10:45   #1778
TheJudger
 
"Oliver"
Mar 2005
Germany

11·101 Posts

So your GTX 680 is ~20% overclocked and delivers ~400M/s on some reasonable assignments. That puts a stock GTX 680 at ~330M/s (400/1.2), just 10% faster than my stock GTX 470.

For mfaktc: 470 < 680 < 480 < 570 < 580
Less than we all hoped for but not really bad. Now I'm interested in the power consumption while running mfaktc. Perhaps a 680 does a good job at mfaktc-performance per watt?

Oliver
Old 2012-05-03, 10:53   #1779
Redarm
 
Apr 2012
Berlin Germany

51₁₀ Posts

70% of TDP means perhaps 140 W, which is quite a bit better than I expected.
Old 2012-05-03, 12:45   #1780
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

11×311 Posts

Quote:
Originally Posted by TheJudger
For mfaktc: 470 < 680 < 480 < 570 < 580
According to my chart based on one benchmark from a while ago, I have the 680 and 470 very close together, with the 680 slightly behind (206 vs 218 GHz-days/day). Should I increase the expected performance of the compute-3.0 cards?

edit: I've just added more 600 series GPUs to my list. What an ugly mess of compute 2.1 / 3.0 chips making up the lineup. And three variants of the GT 640! Performance-per-watt is all over the place, and so is performance itself: the GT 630 is rated at 672 GFLOPS vs 415 GFLOPS for the 40nm version of the GT 640. But thanks to the discrepancy between 2.1 and 3.0 performance, the GT 640 still outperforms it at mfaktc.

Last fiddled with by James Heinrich on 2012-05-03 at 13:01
Old 2012-05-03, 12:59   #1781
TheJudger
 
"Oliver"
Mar 2005
Germany

11·101 Posts

Let's wait for some more (non-OCed) results and until I get my hands on a Kepler.

Oliver
Old 2012-05-26, 12:20   #1782
TheJudger
 
"Oliver"
Mar 2005
Germany

1111₁₀ Posts

Quote:
Originally Posted by Prime95
I'd say they are about 20 times slower than they should be!! 32-bit muls are much faster than shift lefts! Repeated adds are much faster than small shift lefts. Algorithms may have to change to avoid shift rights.
Well, the barrett79 kernel (the fastest and most used kernel) doesn't contain many shifts at all.

Quote:
Originally Posted by TheJudger
Let's wait for some more (non-OCed) results and until I get my hands on a Kepler.
For a limited amount of time I have my hands on a GTX 680 (factory overclocked; the driver reports 1124MHz).

Raw GPU speed for TF M66362159 69 70
mfaktc 0.19-pre1: 380.74M/s (my stock GTX 470 does ~335M/s)
mfaktc 0.19-pre2: 380.92M/s

-pre2 is the first attempt to optimize for Kepler... in the barrett79 kernel I've replaced all shiftlefts by multiplies... not really worth the extra code!
Another attempt was to replace all shiftrights by multiplies (hi 32bit word), too... not a good idea, the result was ~370M/s.

Actual code for shifting the multiword integer nn left by 23 bits:
Code:
// shiftleft nn 23 bits
[...]
#if __CUDA_ARCH__ >= 300
  nn.d4 = __umad32(nn.d4, 8388608, __umul32hi(nn.d3, 8388608));
  nn.d3 = __umad32(nn.d3, 8388608, __umul32hi(nn.d2, 8388608));
  nn.d2 = __umad32(nn.d2, 8388608, __umul32hi(nn.d1, 8388608));
  nn.d1 = __umul32(nn.d1, 8388608);
#else
  nn.d4 = (nn.d4 << 23) + (nn.d3 >> 9);
  nn.d3 = (nn.d3 << 23) + (nn.d2 >> 9);
  nn.d2 = (nn.d2 << 23) + (nn.d1 >> 9);
  nn.d1 =  nn.d1 << 23;
#endif
The old code has 3 (*1) instructions per word: shiftleft + shiftright + add
The new code has only 2 (*1) instructions per word: multiply (high word) + multiply-add

(*1) We don't really know how many hardware instructions these turn into; PTX code is only an intermediate code.
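
The equivalence behind the two branches can be checked on the host. The following is a plain C sketch, not mfaktc code: umul32, umul32hi and umad32 are hypothetical stand-ins written from what the names in the post suggest (low 32 bits of a product, high 32 bits of a product, and multiply-add), and the four-word struct is an assumed layout. Since 8388608 = 2^23, the low word of x*2^23 equals x << 23 and the high word equals x >> 9.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: four 32-bit words, d1 least significant. */
typedef struct { uint32_t d1, d2, d3, d4; } words4;

/* Hypothetical host stand-ins for the device helpers named in the post. */
static uint32_t umul32(uint32_t a, uint32_t b)   { return (uint32_t)((uint64_t)a * b); }
static uint32_t umul32hi(uint32_t a, uint32_t b) { return (uint32_t)(((uint64_t)a * b) >> 32); }
static uint32_t umad32(uint32_t a, uint32_t b, uint32_t c) { return umul32(a, b) + c; }

/* Shift-based variant, mirroring the #else branch. */
static words4 shl23_shift(words4 nn)
{
    nn.d4 = (nn.d4 << 23) + (nn.d3 >> 9);
    nn.d3 = (nn.d3 << 23) + (nn.d2 >> 9);
    nn.d2 = (nn.d2 << 23) + (nn.d1 >> 9);
    nn.d1 =  nn.d1 << 23;
    return nn;
}

/* Multiply-based variant, mirroring the __CUDA_ARCH__ >= 300 branch. */
static words4 shl23_mul(words4 nn)
{
    const uint32_t two23 = 8388608u; /* 1u << 23 */
    nn.d4 = umad32(nn.d4, two23, umul32hi(nn.d3, two23));
    nn.d3 = umad32(nn.d3, two23, umul32hi(nn.d2, two23));
    nn.d2 = umad32(nn.d2, two23, umul32hi(nn.d1, two23));
    nn.d1 = umul32(nn.d1, two23);
    return nn;
}

static int words4_equal(words4 a, words4 b)
{
    return a.d1 == b.d1 && a.d2 == b.d2 && a.d3 == b.d3 && a.d4 == b.d4;
}

/* Aborts if the two variants disagree for the given input. */
static void check_shl23(words4 nn)
{
    assert(words4_equal(shl23_shift(nn), shl23_mul(nn)));
}
```

Calling check_shl23 on a handful of values (including ones with the top bits of each word set) is a cheap sanity check that both branches compute the same result; it says nothing about which one is faster on a given GPU.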

Oliver




Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.