mersenneforum.org

GPU Computing forum: Nvidia's next-generation graphics cards (https://www.mersenneforum.org/showthread.php?t=14162)

ixfd64 2010-11-06 01:22

Nvidia's next-generation graphics cards
 
A lot of people were disappointed with Nvidia's GTX 480 video card, partly because its FLOPS number was much lower than the HD 5870's. Of course, the FLOPS number alone does not determine the performance of a graphics card. However, the GeForce 400 series runs double precision at only 12.5% (1/8) of its single-precision rate, while the Fermi architecture was designed to allow 50% (1/2). For the purpose of finding prime numbers, this is a serious handicap.
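To put the 12.5% vs 50% ratios into numbers, here is a back-of-envelope sketch. It assumes the GTX 480 figures TheJudger gives later in this thread (480 enabled cores at 1401 MHz) and the usual convention that a multiply-add counts as two FLOPs. These are theoretical peaks, not measured throughput.

```python
# Back-of-envelope peak throughput for a Fermi part (theoretical, not measured).
# Assumption: peak SP GFLOPS = cores * clock (MHz) * 2 / 1000, counting one
# multiply-add as 2 FLOPs; DP peak is the SP peak times the DP:SP ratio.

def peak_gflops(cores: int, clock_mhz: float, dp_ratio: float = 1.0) -> float:
    """Return peak GFLOPS; dp_ratio=1.0 gives the single-precision peak."""
    return cores * clock_mhz * 2 * dp_ratio / 1000.0

GTX480_CORES = 480        # enabled CUDA cores
GTX480_CLOCK_MHZ = 1401   # shader clock in MHz

sp = peak_gflops(GTX480_CORES, GTX480_CLOCK_MHZ)
dp_geforce = peak_gflops(GTX480_CORES, GTX480_CLOCK_MHZ, 1 / 8)  # 12.5% (GeForce)
dp_full = peak_gflops(GTX480_CORES, GTX480_CLOCK_MHZ, 1 / 2)     # 50% (Fermi design)

print(f"SP peak:             {sp:.2f} GFLOPS")
print(f"DP at 1/8 (GeForce): {dp_geforce:.2f} GFLOPS")
print(f"DP at 1/2 (design):  {dp_full:.2f} GFLOPS")
```

On those assumptions the GeForce card peaks at about 168 DP GFLOPS, versus about 672 if it ran double precision at the full Fermi ratio.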

Anyway, Nvidia recently released a rough roadmap that mentions two future architectures: Kepler and Maxwell. More information here: [url]http://www.tomshardware.com/news/fermi-kepler-maxwell-gigaflop-watt,11339.html[/url]

Also, the GeForce 500 series was announced about two weeks ago: [url]http://www.tomshardware.com/news/gpu-gtx-580-sli-gts-gtx,11529.html[/url]

Some people say that the GeForce 500 series is just a "tweaked" version of GeForce 400, while others are saying that it will be a brand new design that will offer as many as 768 CUDA cores.

What does everyone make of this?

msft 2010-11-06 05:50

Hi, ixfd64

Frmky reports 4.43 ms/iter for a 2M FFT on a GTX 480.
TheJudger reports 4.3 ms/iter for a 2M FFT on a Tesla C2050.

I think LLR performance depends on memory bandwidth.

fivemack 2010-11-06 11:00

I expect the graphics cards sold to the gaming market to continue to have double precision crippled: the cost of getting decent CUDA code written is still high enough that people only do it when they have a serious need, and organisations with a serious need are reasonably budget-insensitive. AMD offers double precision only in its highest-end graphics cards (the 58xx series), and AMD's attempts to sell something like the Tesla boards have had no perceptible success.

So they'll continue to get faster, but you'll always have to buy a $300 card to get the double precision, even if the $120 card outperforms the previous generation's $300 card on most other metrics.

TheJudger 2010-11-06 12:46

[QUOTE=msft;235773]Hi, ixfd64

Frmky reports 4.43 ms/iter for a 2M FFT on a GTX 480.
TheJudger reports 4.3 ms/iter for a 2M FFT on a Tesla C2050.

I think LLR performance depends on memory bandwidth.[/QUOTE]

Yep, memory bandwidth is a problem in many cases, not just LLR. :sad:

A [B]simplified[/B] calculation:
A GTX 480 has 480 cores (enabled), all running at 1401MHz. It is capable of 168GFLOPS in double precision (DP): 480 (cores) * 1401MHz * 2 (FLOPs per multiply-add) / 8 (DP rate) = 168.12GFLOPS.
Those FLOPS are multiply-adds, each counted as 2 FLOPs, so this is ~84.06G multiply-adds/sec. Each independent multiply-add needs 3 inputs and produces one output, and each DP float is 64 bits (8 bytes). So for each multiply-add we need to read 3x 8 bytes and write 1x 8 bytes => 32 bytes of bandwidth for a single multiply-add.
84.06G multiply-adds/sec * 32 bytes = ~2.7TB/sec of bandwidth needed. A GTX 480 has 177.4GB/sec of device memory bandwidth...
A paper by Vasily Volkov mentions that a GTX 480 has ~1.3TB/sec of bandwidth in on-chip shared memory. The shared memory is ~1MB in total (split, per multiprocessor, into smaller pieces of L1 cache and shared memory). Register bandwidth seems to be at least 8TB/sec, but all registers together are only ~2MB.
None of the on-chip registers, shared memory or L2 cache is big enough to hold the whole dataset for an LLR test, so we really do need the bandwidth of the device memory.
Of course this is a very simplified estimate. In reality you can hide some of the traffic to the slow device memory... but device memory bandwidth is still a limitation.
So now imagine why a Tesla 20x0 doesn't perform much better than a GTX 480: the C2050 has 515GFLOPS DP and only 144GB/sec of device memory bandwidth...
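The estimate above can be written out as a small script. It is a sketch of the same simplified model (every DP multiply-add reads 3 doubles and writes 1 straight from device memory, with no on-chip reuse), using the peak DP and bandwidth figures quoted in the post. Because a multiply-add counts as two FLOPs (the `* 2` in the GTX 480 formula), the multiply-add rate is half the GFLOPS figure.

```python
# Naive bandwidth model: every double-precision multiply-add streams 3 operands
# in and 1 result out of device memory, with no reuse from registers,
# shared memory or cache (a deliberately pessimistic assumption).
BYTES_PER_DOUBLE = 8
BYTES_PER_FMA = 4 * BYTES_PER_DOUBLE  # 3 reads + 1 write = 32 bytes


def required_bandwidth_gbs(dp_gflops: float) -> float:
    """Device-memory bandwidth (GB/s) needed to feed peak DP multiply-adds."""
    fma_per_sec_g = dp_gflops / 2  # one multiply-add counts as 2 FLOPs
    return fma_per_sec_g * BYTES_PER_FMA


cards = {
    # name: (peak DP GFLOPS, device memory bandwidth in GB/s), figures from the post
    "GTX 480": (168.12, 177.4),
    "Tesla C2050": (515.0, 144.0),
}

for name, (dp_gflops, mem_gbs) in cards.items():
    need = required_bandwidth_gbs(dp_gflops)
    print(f"{name}: needs ~{need / 1000:.1f} TB/s, has {mem_gbs} GB/s "
          f"(short by ~{need / mem_gbs:.0f}x)")
```

Under this model the GTX 480 would need ~2.7 TB/s (about 15x what it has) and the C2050 ~8.2 TB/s (about 57x), which fits the point that the C2050's extra DP throughput mostly goes unused on bandwidth-bound work.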


Oliver

ET_ 2010-11-06 20:27

I will repost here an interesting link coming from [URL="http://www.mersenneforum.org/showthread.php?t=14150"]this[/URL] thread...

[URL="http://www.gpgpgpu.com/gecco2009/6.pdf"]http://www.gpgpgpu.com/gecco2009/6.pdf[/URL]

Luigi

msft 2010-11-07 10:47

Hi, TheJudger
[QUOTE=TheJudger;235789]Yep, memory bandwidth is a problem in many cases, not just LLR. :sad:[/QUOTE]
Linpack (TOP500) is a gift for GPUs.
Matrix multiplication is not limited by memory bandwidth.
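The matrix multiplication point can be made concrete with arithmetic intensity (FLOPs per byte of device-memory traffic). The sketch below assumes the ideal case where each of A, B and C crosses device memory only once; real blocked GEMM kernels only get reuse within tiles that fit on-chip, so treat this as an upper bound. The GTX 480 balance point uses the 168.12 GFLOPS and 177.4 GB/s figures from earlier in the thread.

```python
# Arithmetic intensity (FLOPs per byte of device-memory traffic) of a dense
# n x n double-precision multiply C = A @ B, assuming ideal on-chip reuse so
# that A, B and C each cross device memory exactly once.
def matmul_intensity(n: int) -> float:
    flops = 2 * n ** 3              # n^3 multiply-adds, 2 FLOPs each
    bytes_moved = 3 * n * n * 8     # read A and B, write C, 8 bytes per double
    return flops / bytes_moved      # simplifies to n / 12


# Machine balance of a GTX 480: peak DP FLOPs per byte of device bandwidth.
GTX480_BALANCE = 168.12 / 177.4     # ~0.95 FLOP/byte

# A streaming multiply-add (3 loads + 1 store, no reuse) for comparison.
STREAMING_FMA_INTENSITY = 2 / 32    # 0.0625 FLOP/byte

for n in (8, 64, 1024, 8192):
    ai = matmul_intensity(n)
    bound = "compute-bound" if ai > GTX480_BALANCE else "bandwidth-bound"
    print(f"n={n:5d}: {ai:8.2f} FLOP/byte -> {bound}")
```

Once the intensity exceeds the card's balance point (~0.95 FLOP/byte here), arithmetic rather than memory bandwidth is the limit, which is why Linpack-style matrix multiplication suits GPUs while a streaming multiply-add at 0.0625 FLOP/byte does not.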

Ken_g6 2010-11-09 16:27

[QUOTE=ET_;235821]I will repost here an interesting link coming from [URL="http://www.mersenneforum.org/showthread.php?t=14150"]this[/URL] thread...

[URL="http://www.gpgpgpu.com/gecco2009/6.pdf"]http://www.gpgpgpu.com/gecco2009/6.pdf[/URL]

Luigi[/QUOTE]

FYI, that looks like it's doing tests of primes < 2^32! There's no FFT involved there, and the FFT is what makes things much harder for LLR.

ET_ 2010-11-09 16:59

[QUOTE=Ken_g6;236267]FYI, that looks like it's doing tests of primes < 2^32! There's no FFT involved there, and the FFT is what makes things much harder for LLR.[/QUOTE]

You are right, but I'm not wrong! :smile:

I was considering uses of GPUs for tasks other than LLR, in a thread dedicated to nVidia GPUs...

Luigi

ewmayer 2010-11-09 18:02

I am preparing to port some Mersenne TF code to a Tesla GPU over the coming months ... basic question about the FP arithmetic coding and optimization: should one target HLL code to the GPU's FMADD-based ISA, e.g. by refactoring one's C code to provide "hints" to the compiler, or are there more direct ways of doing so, e.g. by way of compiler intrinsics or direct GPU-specific inline ASM?
(I have an SSE2 inline-ASM version of the key factoring modpow loop, but I expect the GPU analog of same is going to look rather different.)
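For reference, the core test behind Mersenne TF is that a candidate q divides M(p) = 2^p - 1 exactly when 2^p mod q == 1, computed with a left-to-right square-and-multiply loop. The Python sketch below is a plain CPU reference, not GPU code and not Ernst's implementation; the function names are hypothetical. It could serve as a correctness check for a port, where the modular squarings would instead be done with fixed-width multiword arithmetic.

```python
# Reference (CPU, arbitrary-precision) version of the trial-factoring test
# for Mersenne numbers: q divides M(p) = 2^p - 1 exactly when 2^p mod q == 1.
# Hypothetical helper names; a GPU kernel would replace the Python big-int
# modular multiply with fixed-width multiword arithmetic.

def pow2_mod(p: int, q: int) -> int:
    """Return 2**p mod q via left-to-right square-and-multiply."""
    result = 1
    for bit in bin(p)[2:]:              # most significant bit first
        result = result * result % q    # square for every exponent bit
        if bit == "1":
            result = result * 2 % q     # multiplying by the base 2 is a doubling
    return result


def divides_mersenne(p: int, q: int) -> bool:
    return pow2_mod(p, q) == 1


# Factors of M(p) for prime p have the form q = 2kp + 1; try small k for p = 11.
p = 11
factors = [2 * k * p + 1 for k in range(1, 10) if divides_mersenne(p, 2 * k * p + 1)]
print(factors)  # M(11) = 2047 = 23 * 89
```

Python's built-in three-argument `pow(2, p, q)` computes the same value and is handy for cross-checking the hand-written loop.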

Sorry if this has been answered elsewhere - I'm just at the point where I'm starting to gather the various GPU-specific documentation and figuring out which is likely to be the most relevant to my work.

Thanks,
-Ernst

delta_t 2010-11-09 19:39

GTX580 review
 
Saw this up on AnandTech:
[url]http://www.anandtech.com/show/4008/nvidias-geforce-gtx-580[/url]

Karl M Johnson 2010-11-09 22:17

I was very worried that the GTX 580 would turn out to be sm_21 (compute capability 2.1) sh!t, but, luckily, it ain't!
Woohoo!


All times are UTC.