I can saturate (100% GPU) my GTX680 when running two instances of mfaktc.
[IMG]http://gpuz.techpowerup.com/12/03/27/a9a.png[/IMG] Note that the GPU core clock is constantly boosted to ±1100 MHz and the power consumption hovers around 72% TDP, which could mean that the performance/Watt for this chip is higher than in James' calculations. This power consumption sensor seems to be a new feature on this chip; I've never seen it displayed in GPU-Z before on any other card. |
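That 72% TDP sensor reading can be turned into an absolute wattage with a quick sketch. The 195 W reference board TDP for the GTX 680 is my assumption (it isn't stated in the thread), so treat the result as a rough estimate:

```python
# Rough power-draw estimate from the GPU-Z sensor reading.
# Assumption: GTX 680 reference board TDP is 195 W (not stated in the thread).
TDP_WATTS = 195.0
sensor_fraction = 0.72  # GPU-Z reported ~72% of TDP under two mfaktc instances

draw_watts = TDP_WATTS * sensor_fraction
print(f"Estimated draw: {draw_watts:.0f} W")  # prints "Estimated draw: 140 W"
```

Around 140 W for a saturated card is what makes the performance/Watt question interesting in the first place.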
It turns out that I plugged my brand new shiny bling-bling GTX680 into a PCI-E 2.0 x8 slot instead of a PCI-E 2.0 x16 slot... :blush: I'll change it tonight, and also try to fix a crazy problem that causes my motherboard to refuse more than one memory module, forcing it to use single channel DDR3. I don't expect radically improved CUDA performance, but we'll see.
|
[QUOTE=Prime95;294330]This is somewhat surprising to me. However, I thought mfaktc would use the more numerous CUDA cores to do the 32-bit muls and adds that predominate in TF. Where did I go wrong?[/QUOTE]
[URL="http://forums.nvidia.com/index.php?showtopic=225312&st=20&p=1387312&#entry1387312"]http://forums.nvidia.com/index.php?showtopic=225312&st=20&p=1387312&#entry1387312[/URL] [QUOTE] * Relative to the throughput of single precision multiply-add, the throughput of integer shifts, integer comparison, and integer multiplication is lower than before. [/QUOTE] Is this the answer? |
[QUOTE=TheJudger;293953]I guess I need to buy a GTX 6[78]0... ;)[/QUOTE]
I'd be curious if you can weave some more TheJudger magic to get more out of the GTX680. :) Now with some performance figures out, I'm pretty disappointed. I was hoping to buy some GTX680s to replace some hardware here to reduce my power bill. It doesn't even surpass what I have on performance per watt metrics. -- Craig |
[QUOTE=axn;294351]You're nearly there. Rather than using the cum.prob., just use the probability for the given bit depth. You should see a rough doubling of the % with every bit.[/QUOTE]Thanks. I didn't have my brain screwed on quite straight yesterday, but I think I've fixed it so it makes sense now.
[url]http://mersenne-aries.sili.net/cudalucas.php?model=13[/url] |
[QUOTE=msft;294375][URL="http://forums.nvidia.com/index.php?showtopic=225312&st=20&p=1387312&#entry1387312"]http://forums.nvidia.com/index.php?showtopic=225312&st=20&p=1387312&#entry1387312[/URL]
Is this the answer?[/QUOTE] Some exact numbers (operations per clock cycle per multiprocessor), from Table 5-1 in the CUDA C Programming Guide Version 4.2: [CODE]
                                  CC 1.x      CC 2.0   CC 2.1   CC 3.0
32-bit floating point add,           8          32       48      192
  multiply, multiply-add
64-bit floating point add,           1          16        4        8
  multiply, multiply-add
32-bit integer add                  10          32       48      168
32-bit integer multiply,          Multiple      16       16       32
  multiply-add, sum of          instructions
  absolute difference
[/CODE] Not much love for 32-bit integer multiply & multiply-add, compared to 32-bit floating point operations. |
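To see what those per-SM numbers mean at the card level, here's a back-of-the-envelope comparison of peak 32-bit integer multiply throughput. The per-clock figures are from Table 5-1; the SM counts and the clock values (Fermi ALUs running at the GTX 580's ~1544 MHz shader clock, Kepler's at the GTX 680's ~1006 MHz base clock) are my assumptions for the reference cards, so this is a sketch, not a benchmark:

```python
# Back-of-the-envelope peak 32-bit integer multiply throughput, whole GPU.
# ops/clock/SM from Table 5-1 (CC 2.0 = 16, CC 3.0 = 32);
# SM counts and ALU clocks are assumptions for reference cards.

def int_mul_throughput(sm_count, ops_per_clock_per_sm, alu_clock_hz):
    """Peak 32-bit integer multiplies per second for the whole GPU."""
    return sm_count * ops_per_clock_per_sm * alu_clock_hz

gtx580 = int_mul_throughput(16, 16, 1544e6)  # Fermi: ALUs at shader clock
gtx680 = int_mul_throughput(8, 32, 1006e6)   # Kepler: ALUs at core clock

print(f"GTX 580: {gtx580 / 1e9:.0f} G int-muls/s")
print(f"GTX 680: {gtx680 / 1e9:.0f} G int-muls/s")
```

On those assumptions the older GTX 580 actually comes out ahead on raw integer multiply throughput, which would go a long way toward explaining the disappointing mfaktc numbers.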
[QUOTE=James Heinrich;294383]Thanks. I didn't have my brain screwed on quite straight yesterday, but I think I've fixed it so it makes sense now.
[url]http://mersenne-aries.sili.net/cudalucas.php?model=13[/url][/QUOTE] Much better. Now, if we could just drill down an individual row to 1M granularity... :whistle: |
[QUOTE=James Heinrich;294383]I didn't have my brain screwed on quite straight yesterday[URL="http://mersenne-aries.sili.net/cudalucas.php?model=13"][/URL][/QUOTE]
That page has really come a long way in a short time. Another great tool! Thanks for doing it. BTW: I wasn't thinking too well, either, when I ran the CuLu benchmarks. Sorry for the incomplete data, James. |
[QUOTE=axn;294388]Much better. Now, if we could just drill down an individual row to 1M granularity... :whistle:[/QUOTE]You can if you click the zoom in/out links I just added. :smile:
|
[QUOTE=BigBrother;294384]Some exact numbers: (Operations per Clock Cycle per Multiprocessor)
[CODE]
                                  CC 1.x      CC 2.0   CC 2.1   CC 3.0
32-bit integer multiply,          Multiple      16       16       32
  multiply-add, sum of          instructions
  absolute difference
[/CODE] [/QUOTE] The GTX 580 has 16 multiprocessors; the GTX 680 has 8. Each GTX 680 multiprocessor has 192 cores, but only 32 of them can execute a 32-bit integer multiply, so lots of threads wait to execute. [CODE]
                                  CC 1.x      CC 2.0   CC 2.1   CC 3.0
32-bit integer shift, compare        8          16       16        8
[/CODE] |
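msft's point can be put in numbers: the ratio of cores per SM to integer-multiply slots per clock tells you how many cores sit idle when a kernel is dominated by integer muls, and the per-SM shift rate matters too since TF leans on shifts. The per-clock figures are from the programming guide tables; the "cores per slot" framing is my own illustration, not an official metric:

```python
# How starved is integer work on each architecture?
# Per-SM figures are from the CUDA C Programming Guide throughput tables;
# the "cores per int-mul slot" framing is an illustration only.

fermi  = {"cores_per_sm": 32,  "int_mul_per_clock": 16, "shift_per_clock": 16}
kepler = {"cores_per_sm": 192, "int_mul_per_clock": 32, "shift_per_clock": 8}

for name, sm in (("GTX 580 (CC 2.0)", fermi), ("GTX 680 (CC 3.0)", kepler)):
    ratio = sm["cores_per_sm"] / sm["int_mul_per_clock"]
    print(f"{name}: {ratio:.0f} cores per int-mul slot, "
          f"{sm['shift_per_clock']} shifts/clock per SM")
```

Fermi has 2 cores per int-mul slot versus Kepler's 6, and Kepler's per-SM shift rate is also halved, so an integer-heavy kernel like mfaktc's leaves most of those 192 cores waiting.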
[QUOTE=BigBrother;294372]It turns out that I plugged my brand new shiny bling-bling GTX680 into a PCI-E 2.0 x8 slot instead of a PCI-E 2.0 x16 slot... :blush: I'll change it tonight, and also try to fix a crazy problem that causes my motherboard to refuse more than one memory module, forcing it to use single channel DDR3. I don't expect radically improved CUDA performance, but we'll see.[/QUOTE]
Well, the card is now inserted into a PCI-E 2.0 x16 slot, and my brain surgery skills allowed me to fix a bent pin on the CPU socket, so my memory is back at dual channel again. :cool: One instance of mfaktc is now taking ±70% GPU instead of the 74% I reported yesterday, and nVidia's Visual Profiler shows transfer rates of 6 GB/s instead of 3 GB/s, but since the amount of data to transfer is relatively small, there's no earth-shattering improvement. I could run the same benchmark I did yesterday again if James would like me to do that. |