#1
Bemusing Prompter
"Danny"
Dec 2002
California
2³×13×23 Posts
A lot of people were disappointed with Nvidia's GTX 480, one reason being its much lower FLOPS rating compared to the HD 5870. Of course, the FLOPS number alone does not determine a graphics card's performance. More to the point, the GeForce 400 series runs double precision at only 12.5% of the single-precision rate, even though the Fermi architecture was designed to allow 50%. For finding prime numbers, which leans heavily on double precision, that cap isn't very helpful.
Anyway, Nvidia recently released a rough roadmap mentioning two future architectures, Kepler and Maxwell. More information here: http://www.tomshardware.com/news/fer...att,11339.html Also, the GeForce 500 series was announced about two weeks ago: http://www.tomshardware.com/news/gpu...gtx,11529.html Some people say the GeForce 500 series is just a "tweaked" GeForce 400, while others say it will be a brand-new design offering as many as 768 CUDA cores. What does everyone make of this?
#2
Jul 2009
Tokyo
2×5×61 Posts
Hi, ixfd64.
Frmky reported a GTX 480 running a 2M FFT at 4.43 ms/iter, and TheJudger reported a Tesla C2050 at 4.3 ms/iter. I think LLR performance depends on memory bandwidth.
#3
(loop (#_fork))
Feb 2006
Cambridge, England
7²·131 Posts
I expect the graphics cards sold to the gaming market to continue to have double precision crippled: the cost of getting decent CUDA code written is still high enough that people only do it when they have a serious need, and organisations with a serious need are reasonably budget-insensitive. AMD has no double precision in anything but its highest-end graphics cards (the 58xx series), and AMD's attempts to sell something like the Tesla boards have had no perceptible success.
So they'll continue to get faster, but you'll always have to buy a $300 card to get double precision, even if the $120 card outperforms last generation's $300 card in most other metrics.
#4
"Oliver"
Mar 2005
Germany
11·101 Posts
Quote:
A simplified calculation: a GTX 480 has 480 cores enabled, all running at 1401 MHz. It is capable of 168 GFLOPS in double precision (DP): 480 cores × 1401 MHz × 2 / 8 = 168.12 GFLOPS. Those FLOPS are multiply-adds, each counting as 2 FLOPS, so that is about 84 billion multiply-adds per second.

Each independent multiply-add needs 3 inputs and produces one output, and each DP float is 64 bits = 8 bytes. So each multiply-add reads 3×8 bytes and writes 1×8 bytes => 32 bytes of bandwidth per multiply-add. 84.06 billion multiply-adds/sec × 32 bytes ≈ 2.7 TB/sec of bandwidth needed. A GTX 480 has 177.4 GB/sec to device memory...

A paper from Vasily Volkov mentions that a GTX 480 has ~1.3 TB/sec of bandwidth to on-chip shared memory. The shared memory is ~1 MB in total (split into smaller pieces of L1 cache and shared memory per multiprocessor). Register bandwidth seems to be at least 8 TB/sec, but all registers together are only ~2 MB. None of the on-chip registers, shared memory, or L2 cache is big enough to hold the whole dataset for an LLR test, so we really do need the bandwidth of the device memory.

Of course this is a very simplified picture; in reality you can hide some of the traffic to the slow device memory. But device memory bandwidth is a limitation. So now imagine why a Tesla 20x0 doesn't perform much better than a GTX 480: the C2050 has 515 GFLOPS DP but only 144 GB/sec of device memory bandwidth...

Oliver
#5
Banned
"Luigi"
Aug 2002
Team Italia
3²×5×107 Posts
I'll repost here an interesting link that came from this thread...
http://www.gpgpgpu.com/gecco2009/6.pdf

Luigi
#6
Jul 2009
Tokyo
2×5×61 Posts
#7
Jan 2005
Caught in a sieve
5·79 Posts
#8
Banned
"Luigi"
Aug 2002
Team Italia
3²×5×107 Posts
Quote:
I was considering uses of GPUs for other tasks, apart from LLR, in a thread dedicated to nVidia GPUs...

Luigi
#9
∂²ω=0
Sep 2002
República de California
2D77₁₆ Posts
I am preparing to port some Mersenne TF code to a Tesla GPU over the coming months, so a basic question about the FP arithmetic coding and optimization: should one target HLL code to the GPU's FMADD-based ISA, e.g. by refactoring one's C code to provide "hints" to the compiler, or are there more direct ways of doing so, e.g. via compiler intrinsics or GPU-specific inline ASM?
(I have an SSE2 inline-ASM version of the key factoring modpow loop, but I expect the GPU analog of same will look rather different.) Sorry if this has been answered elsewhere; I'm just at the point of gathering the various GPU-specific documentation and figuring out which is likely to be most relevant to my work. Thanks, -Ernst
#10
Nov 2002
Anchorage, AK
3×7×17 Posts
Saw this up on AnandTech:
http://www.anandtech.com/show/4008/n...eforce-gtx-580
#11
Mar 2010
411₁₀ Posts
I was very worried that the GTX 580 would turn out to be sm_21 sh!t, but luckily it ain't!
Woohoo!
Similar Threads

| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| Prime95 and graphics cards | keithschmidt | Information & Answers | 45 | 2016-09-10 10:08 |
| New Linux rootkit leverages graphics cards for stealth | swl551 | Lounge | 0 | 2015-05-08 14:06 |
| Report: Nvidia Making Dual-GK110 Graphics Card | kracker | GPU Computing | 8 | 2013-08-29 11:32 |
| how do graphics cards work so fast? | ixfd64 | Hardware | 1 | 2004-06-02 03:01 |
| Chance to use modern Graphics Cards as.. | Marco | Hardware | 28 | 2003-11-02 23:21 |