mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing
2012-03-27, 08:05   #1695
BigBrother

Feb 2005
The Netherlands

11011010₂ Posts

I can saturate (100% GPU) my GTX680 when running two instances of mfaktc.

http://gpuz.techpowerup.com/12/03/27/a9a.png

Note that the GPU core clock is constantly boosted to ~1100 MHz and the power consumption hovers around 72% of TDP, which could mean that the performance/Watt for this chip is higher than in James' calculations. This power consumption sensor seems to be a new feature on this chip; I've never seen it displayed in GPU-Z before on any other card.
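A rough sketch of the power arithmetic behind that observation. The 195 W figure is an assumed stock board power limit for the GTX 680, not something taken from the GPU-Z screenshot:

```python
# Rough power-draw estimate for the GTX 680 under mfaktc, based on the
# ~72% TDP reading from the GPU-Z sensor.
# Assumption (not from the post): GTX 680 board power limit of 195 W.

tdp_watts = 195.0      # assumed stock TDP
tdp_fraction = 0.72    # GPU-Z power sensor reading

draw_watts = tdp_watts * tdp_fraction
print(f"Estimated draw: {draw_watts:.1f} W")  # ~140 W

# Given a throughput figure (e.g. GHz-days/day from mfaktc),
# performance per Watt is simply throughput / draw_watts.
```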
2012-03-27, 11:13   #1696
BigBrother

Feb 2005
The Netherlands

2×109 Posts

It turns out that I plugged my brand new shiny bling-bling GTX680 into a PCI-E 2.0 x8 slot instead of a PCI-E 2.0 x16 slot... I'll change it tonight, and also try to fix a crazy problem that causes my motherboard to refuse more than one memory module, forcing it to use single channel DDR3. I don't expect radically improved CUDA performance, but we'll see.
2012-03-27, 11:44   #1697
msft

Jul 2009
Tokyo

610₁₀ Posts

Quote:
Originally Posted by Prime95 View Post
This is somewhat surprising to me. However, I thought mfaktc would use the more numerous CUDA cores to do the 32-bit muls and adds that predominate in TF. Where did I go wrong?
http://forums.nvidia.com/index.php?s...&#entry1387312
Quote:
* Relative to the throughput of single precision multiply-add, the throughput of integer shifts, integer comparison, and integer multiplication is lower than before.
Is that the answer?
2012-03-27, 12:33   #1698
nucleon

Mar 2003
Melbourne

5·103 Posts

Quote:
Originally Posted by TheJudger View Post
I guess I need to buy a GTX 6[78]0... ;)
I'd be curious if you can weave some more TheJudger magic to get more out of the GTX680. :)

Now with some performance figures out, I'm pretty disappointed. I was hoping to buy some GTX680s to replace some hardware here to reduce my power bill.

It doesn't even surpass what I have on performance per watt metrics.

-- Craig
2012-03-27, 14:13   #1699
James Heinrich

"James Heinrich"
May 2004
ex-Northern Ontario

11·311 Posts

Quote:
Originally Posted by axn View Post
You're nearly there. Rather than using the cum.prob., just use the probability for the given bit depth. You should see a rough doubling of the % with every bit.
Thanks. I didn't have my brain screwed on quite straight yesterday, but I think I've fixed it so it makes sense now.
http://mersenne-aries.sili.net/cudalucas.php?model=13
2012-03-27, 14:37   #1700
BigBrother

Feb 2005
The Netherlands

2·109 Posts

Quote:
Originally Posted by msft View Post
Some exact numbers: (Operations per Clock Cycle per Multiprocessor)
Code:
                  CC 1.x           CC 2.0         CC 2.1         CC 3.0
                  
32-bit floating
point add,          8                32             48             192
multiply,
multiply-add

64-bit floating
point add,          1                16              4              8
multiply,
multiply-add

32-bit
integer add         10               32             48             168

32-bit integer
multiply,
multiply-add,      Multiple          16             16              32
sum of absolute  instructions
difference
From table 5-1 in the CUDA C Programming Guide Version 4.2

Not much love for 32-bit integer multiply & multiply-add, compared to 32-bit floating point operations.
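To put numbers on that "not much love": a quick sketch of the float-to-integer multiply throughput ratio per compute capability, using the per-multiprocessor figures from the table quoted above. The ratio widens from 2:1 on Fermi (CC 2.0) to 6:1 on Kepler (CC 3.0), which is exactly the imbalance that hurts integer-heavy trial factoring:

```python
# Per-multiprocessor throughput (ops/clock) from table 5-1 of the
# CUDA C Programming Guide 4.2, as quoted above.
fp32_mad  = {"CC 2.0": 32, "CC 2.1": 48, "CC 3.0": 192}
int32_mul = {"CC 2.0": 16, "CC 2.1": 16, "CC 3.0": 32}

# Ratio of 32-bit float multiply-add to 32-bit integer multiply throughput:
for cc in int32_mul:
    print(cc, fp32_mad[cc] / int32_mul[cc])
# CC 2.0 -> 2.0, CC 2.1 -> 3.0, CC 3.0 -> 6.0
```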
2012-03-27, 15:01   #1701
axn

Jun 2003

2·3·7·11² Posts

Quote:
Originally Posted by James Heinrich View Post
Thanks. I didn't have my brain screwed on quite straight yesterday, but I think I've fixed it so it makes sense now.
http://mersenne-aries.sili.net/cudalucas.php?model=13
Much better. Now, if we could just drill down an individual row to 1M granularity...
2012-03-27, 15:12   #1702
kladner

"Kieren"
Jul 2011
In My Own Galaxy!

2×3×1,693 Posts

Quote:
Originally Posted by James Heinrich View Post
I didn't have my brain screwed on quite straight yesterday
That page has really come a long way in a short time. Another great tool!
Thanks for doing it.

BTW: I wasn't thinking too well, either, when I ran the CuLu benchmarks. Sorry for the incomplete data, James.
2012-03-27, 16:09   #1703
James Heinrich

"James Heinrich"
May 2004
ex-Northern Ontario

11×311 Posts

Quote:
Originally Posted by axn View Post
Much better. Now, if we could just drill down an individual row to 1M granularity...
You can if you click the zoom in/out links I just added.
2012-03-27, 16:14   #1704
msft

Jul 2009
Tokyo

1142₈ Posts

Quote:
Originally Posted by BigBrother View Post
Some exact numbers: (Operations per Clock Cycle per Multiprocessor)
Code:
                  CC 1.x           CC 2.0         CC 2.1         CC 3.0
32-bit integer
multiply,
multiply-add,      Multiple          16             16              32
sum of absolute  instructions
difference
The GTX 580 has 16 multiprocessors; the GTX 680 has 8. Each GTX 680 multiprocessor has 192 cores, but only 32 of them can execute a 32-bit integer multiply per clock, so lots of threads are left waiting to execute.
Code:
                  CC 1.x           CC 2.0         CC 2.1         CC 3.0
32-bit integer
shift, compare       8               16             16              8
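Combining the per-SM integer multiply rates with those SM counts gives a rough aggregate comparison. The clock speeds below are assumed stock values (GTX 580 shader clock ~1544 MHz, GTX 680 core clock ~1006 MHz), not figures from this thread:

```python
# Aggregate 32-bit integer multiply throughput, per the table's
# per-SM rates and the SM counts above. Clocks are assumed stock
# values: the GTX 580 runs its CUDA cores at a doubled "shader"
# clock, while Kepler drops the hot clock.

gtx580 = 16 * 16 * 1544e6   # 16 SMs  * 16 mul/clock -> ~395e9 mul/s
gtx680 =  8 * 32 * 1006e6   #  8 SMXs * 32 mul/clock -> ~258e9 mul/s

print(f"GTX 580 / GTX 680 = {gtx580 / gtx680:.2f}x")  # ~1.53x
```

Despite having far fewer cores on paper, the older GTX 580 comes out roughly 1.5x ahead on raw integer multiply throughput, which matches the disappointing mfaktc results reported earlier in the thread.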
2012-03-27, 17:52   #1705
BigBrother

Feb 2005
The Netherlands

2·109 Posts

Quote:
Originally Posted by BigBrother View Post
It turns out that I plugged my brand new shiny bling-bling GTX680 into a PCI-E 2.0 x8 slot instead of a PCI-E 2.0 x16 slot... I'll change it tonight, and also try to fix a crazy problem that causes my motherboard to refuse more than one memory module, forcing it to use single channel DDR3. I don't expect radically improved CUDA performance, but we'll see.
Well, the card is now inserted into a PCI-E 2.0 x16 slot, and my brain-surgery skills allowed me to fix a bent pin on the CPU socket, so my memory is back to dual channel again.

One instance of mfaktc now takes ~70% GPU instead of the 74% I reported yesterday, and NVIDIA's Visual Profiler shows transfer rates of 6 GB/s instead of 3 GB/s. But since the amount of data to transfer is relatively small, there's no earth-shattering improvement. I could run the same benchmark I did yesterday again if James would like me to.
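That doubling matches the theoretical PCIe 2.0 numbers. A quick sketch of the bandwidth arithmetic (the per-lane figure follows from the 5 GT/s signaling rate and 8b/10b encoding; real transfers fall somewhat short of these peaks due to packet overhead):

```python
# Theoretical PCIe 2.0 bandwidth per direction:
# 5 GT/s per lane with 8b/10b encoding -> 500 MB/s of payload per lane.
per_lane_mb_s = 5e9 * 8 / 10 / 8 / 1e6   # = 500 MB/s

for lanes in (8, 16):
    print(f"x{lanes}: {per_lane_mb_s * lanes / 1000:.0f} GB/s")
# x8: 4 GB/s, x16: 8 GB/s -- consistent with the observed jump
# from ~3 GB/s to ~6 GB/s when moving the card to the x16 slot.
```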