mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

BigBrother 2012-03-27 08:05

I can saturate (100% GPU) my GTX680 when running two instances of mfaktc.

[IMG]http://gpuz.techpowerup.com/12/03/27/a9a.png[/IMG]

Note that the GPU core clock is constantly boosted to around 1100 MHz and the power consumption hovers around 72% of TDP, which could mean that the performance/watt for this chip is higher than in James' calculations. The power-consumption sensor seems to be a new feature on this chip; I've never seen it displayed in GPU-Z before on any other card.

BigBrother 2012-03-27 11:13

It turns out that I plugged my brand new shiny bling-bling GTX680 into a PCI-E 2.0 x8 slot instead of a PCI-E 2.0 x16 slot... :blush: I'll change it tonight, and also try to fix a crazy problem that causes my motherboard to refuse more than one memory module, forcing it to use single channel DDR3. I don't expect radically improved CUDA performance, but we'll see.

msft 2012-03-27 11:44

[QUOTE=Prime95;294330]This is somewhat surprising to me. However, I thought mfaktc would use the more numerous CUDA cores to do the 32-bit muls and adds that predominate in TF. Where did I go wrong?[/QUOTE]
[URL="http://forums.nvidia.com/index.php?showtopic=225312&st=20&p=1387312&#entry1387312"]http://forums.nvidia.com/index.php?showtopic=225312&st=20&p=1387312&#entry1387312[/URL]
[QUOTE]
* Relative to the throughput of single precision multiply-add, the throughput of integer shifts, integer comparison, and integer multiplication is lower than before.
[/QUOTE]
Is that the answer?

nucleon 2012-03-27 12:33

[QUOTE=TheJudger;293953]I guess I need to buy a GTX 6[78]0... ;)[/QUOTE]

I'd be curious if you can weave some more TheJudger magic to get more out of the GTX680. :)

Now with some performance figures out, I'm pretty disappointed. I was hoping to buy some GTX680s to replace some hardware here to reduce my power bill.

It doesn't even surpass my existing hardware on performance-per-watt.

-- Craig

James Heinrich 2012-03-27 14:13

[QUOTE=axn;294351]You're nearly there. Rather than using the cum.prob., just use the probability for the given bit depth. You should see a rough doubling of the % with every bit.[/QUOTE]Thanks. I didn't have my brain screwed on quite straight yesterday, but I think I've fixed it so it makes sense now.
[url]http://mersenne-aries.sili.net/cudalucas.php?model=13[/url]

BigBrother 2012-03-27 14:37

[QUOTE=msft;294375][URL="http://forums.nvidia.com/index.php?showtopic=225312&st=20&p=1387312&#entry1387312"]http://forums.nvidia.com/index.php?showtopic=225312&st=20&p=1387312&#entry1387312[/URL]

It is answer?[/QUOTE]

Some exact numbers: (Operations per Clock Cycle per Multiprocessor)
[CODE]
                                 CC 1.x        CC 2.0   CC 2.1   CC 3.0

32-bit floating-point add,
multiply, multiply-add              8            32       48      192

64-bit floating-point add,
multiply, multiply-add              1            16        4        8

32-bit integer add                 10            32       48      168

32-bit integer multiply,
multiply-add, sum of            Multiple         16       16       32
absolute difference           instructions
[/CODE]
From table 5-1 in the CUDA C Programming Guide Version 4.2

Not much love for 32-bit integer multiply & multiply-add, compared to 32-bit floating point operations.
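
Taking the per-SM numbers straight from the table, the gap between integer multiply and floating-point throughput can be sketched in a few lines of Python (the dictionary values are copied from the quoted table; nothing else is assumed):

```python
# Per-SM throughput (operations per clock cycle), from table 5-1 of the
# CUDA C Programming Guide Version 4.2 as quoted above.
fp32_madd = {"CC 2.0": 32, "CC 2.1": 48, "CC 3.0": 192}  # fp32 multiply-add
int32_mul = {"CC 2.0": 16, "CC 2.1": 16, "CC 3.0": 32}   # int32 multiply

for cc in fp32_madd:
    ratio = int32_mul[cc] / fp32_madd[cc]
    print(f"{cc}: int32 multiply runs at {ratio:.2f}x the fp32 rate")
```

On CC 2.0 the ratio is 1:2; on CC 3.0 (GTX 680) it drops to 1:6, which is exactly the "not much love" visible in the table.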

axn 2012-03-27 15:01

[QUOTE=James Heinrich;294383]Thanks. I didn't have my brain screwed on quite straight yesterday, but I think I've fixed it so it makes sense now.
[url]http://mersenne-aries.sili.net/cudalucas.php?model=13[/url][/QUOTE]

Much better. Now, if we could just drill down an individual row to 1M granularity... :whistle:

kladner 2012-03-27 15:12

[QUOTE=James Heinrich;294383]I didn't have my brain screwed on quite straight yesterday[URL="http://mersenne-aries.sili.net/cudalucas.php?model=13"][/URL][/QUOTE]

That page has really come a long way in a short time. Another great tool!
Thanks for doing it.

BTW: I wasn't thinking too well, either, when I ran the CuLu benchmarks. Sorry for the incomplete data, James.

James Heinrich 2012-03-27 16:09

[QUOTE=axn;294388]Much better. Now, if we could just drill down an individual row to 1M granularity... :whistle:[/QUOTE]You can if you click the zoom in/out links I just added. :smile:

msft 2012-03-27 16:14

[QUOTE=BigBrother;294384]Some exact numbers: (Operations per Clock Cycle per Multiprocessor)
[CODE]
                                 CC 1.x        CC 2.0   CC 2.1   CC 3.0

32-bit integer multiply,
multiply-add, sum of            Multiple         16       16       32
absolute difference           instructions
[/CODE]
[/QUOTE]
The GTX 580 has 16 multiprocessors; the GTX 680 has 8.
On the GTX 680 each multiprocessor has 192 cores, but only 32 of them can execute a 32-bit integer multiply per clock.
Lots of threads wait to execute.
[CODE]
                         CC 1.x   CC 2.0   CC 2.1   CC 3.0

32-bit integer
shift, compare              8       16       16        8
[/CODE]
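
The SM counts above are enough to estimate whole-chip integer-multiply throughput. A rough sketch (the per-SM rates come from the quoted table; the shader clocks of ~1544 MHz for the GTX 580 and ~1006 MHz for the GTX 680 are reference-card values I'm assuming, not figures from this thread):

```python
# Aggregate 32-bit integer-multiply throughput: SMs x per-SM rate x clock.
# SM counts and per-SM rates are from the post above; the clock values
# are approximate reference-card shader clocks (an assumption).
gtx580 = {"sms": 16, "int_mul_per_clk": 16, "clock_mhz": 1544}  # CC 2.0
gtx680 = {"sms": 8,  "int_mul_per_clk": 32, "clock_mhz": 1006}  # CC 3.0

def int_mul_rate(card):
    # Whole-chip int32 multiplies per second, in billions (G ops/s)
    return card["sms"] * card["int_mul_per_clk"] * card["clock_mhz"] * 1e6 / 1e9

print(f"GTX 580: {int_mul_rate(gtx580):.0f} G int-mul/s")
print(f"GTX 680: {int_mul_rate(gtx680):.0f} G int-mul/s")
```

Despite having far more cores overall, the GTX 680 comes out behind the GTX 580 on raw 32-bit integer multiplies, which fits the disappointing TF results reported earlier in the thread.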

BigBrother 2012-03-27 17:52

[QUOTE=BigBrother;294372]It turns out that I plugged my brand new shiny bling-bling GTX680 into a PCI-E 2.0 x8 slot instead of a PCI-E 2.0 x16 slot... :blush: I'll change it tonight, and also try to fix a crazy problem that causes my motherboard to refuse more than one memory module, forcing it to use single channel DDR3. I don't expect radically improved CUDA performance, but we'll see.[/QUOTE]

Well, the card is now inserted into a PCI-E 2.0 x16 slot, and my brain-surgery skills allowed me to fix a bent pin on the CPU socket, so my memory is back to dual channel again. :cool:

One instance of mfaktc now takes ~70% GPU instead of the 74% I reported yesterday, and NVIDIA's Visual Profiler shows transfer rates of 6 GB/s instead of 3 GB/s. But since the amount of data to transfer is relatively small, there's no earth-shattering improvement. I could rerun yesterday's benchmark if James would like me to.
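
The doubling from 3 GB/s to 6 GB/s matches the x8 → x16 change. For reference, the theoretical per-direction limits follow from the PCIe 2.0 spec, 5 GT/s per lane with 8b/10b encoding (a sketch using spec values, not measurements from this thread):

```python
# Theoretical PCIe 2.0 bandwidth per direction: 5 GT/s per lane with
# 8b/10b line coding, so 80% of the raw bit rate is payload.
def pcie2_bw_gb_s(lanes):
    gt_per_s = 5.0    # giga-transfers per second per lane
    payload = 8 / 10  # 8b/10b encoding: 8 payload bits per 10 line bits
    return lanes * gt_per_s * payload / 8  # divide by 8 bits/byte -> GB/s

print(f"x8 : {pcie2_bw_gb_s(8):.1f} GB/s")   # 4.0 GB/s
print(f"x16: {pcie2_bw_gb_s(16):.1f} GB/s")  # 8.0 GB/s
```

The observed 3 and 6 GB/s sit a bit below these ceilings, as expected once protocol overhead beyond line coding is accounted for.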

