#1 |
Bemusing Prompter
"Danny"
Dec 2002
California
5·499 Posts |
I know that it's generally very hard to efficiently convert single to double precision, except in the case of nVidia's high-end Fermi GPUs. But what about the other way around? For example, would 100 GFLOPS of double precision easily convert to 200 GFLOPS of single precision?
Sorry if this is a dumb question. |
#2 |
"Mark"
Apr 2003
Between here and the
23×13×67 Posts |
Simply put, no. An FPU cannot do twice as much SP work as DP work in the same amount of time just because the variables it is working with are half the size. One of the reasons is that the FP registers are 64 bits wide and can only hold one value each. You can't put two 32-bit values in a 64-bit FP register. This does not apply to vector programming.
Last fiddled with by rogue on 2011-02-03 at 00:29 |
#3 |
Bemusing Prompter
"Danny"
Dec 2002
California
5×499 Posts |
Yeah, I was thinking of vector processing. I do know that Intel's "Sandy Bridge" chips are supposed to be up to twice as fast as those of the previous generation due to the use of 256-bit registers. Strangely, the FLOPS numbers of the newly released chips do not reflect this, but then again, FLOPS are not the only means of measuring a processor's performance.
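To make the vector-register point concrete, here is a minimal sketch that only counts how many SP and DP values fit in one register of a given width. The function name `lanes` and the constants are made up for illustration, and the lane count is only an upper bound on per-instruction throughput; real FLOPS also depend on clock speed, execution ports, and memory bandwidth.

```python
SP_BITS = 32  # width of an IEEE 754 single-precision value
DP_BITS = 64  # width of an IEEE 754 double-precision value


def lanes(register_bits: int, element_bits: int) -> int:
    """Return how many elements of element_bits fit in one vector register."""
    if register_bits % element_bits != 0:
        raise ValueError("register width must be a multiple of the element width")
    return register_bits // element_bits


if __name__ == "__main__":
    # 128-bit (SSE-style) and 256-bit (AVX-style) register widths.
    for width in (128, 256):
        sp = lanes(width, SP_BITS)
        dp = lanes(width, DP_BITS)
        print(f"{width}-bit register: {sp} SP lanes, {dp} DP lanes, ratio {sp // dp}")
```

Under these assumptions the SP:DP lane ratio is 2 at any register width, which is where a "twice as many SP as DP operations per instruction" figure would come from.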
|
#4 |
Bemusing Prompter
"Danny"
Dec 2002
California
5·499 Posts |
Sorry for bumping such an old thread, but some of the slides from IDF 2012 show that Intel's newer chips are able to do twice as many SP FLOPS as DP FLOPS per clock cycle. Interesting.
Last fiddled with by ixfd64 on 2012-09-11 at 23:56 Reason: missing "to" |
#5 | |
∂2ω=0
Sep 2002
República de California
5×2,351 Posts |
![]() Quote:
|
|
#6 |
Romulan Interpreter
"name field"
Jun 2011
Thailand
2829₁₆ Posts |
That's for sure. SP would only win at (about) 9 to 12 SPFlops per DPFlop. Think about a very simple example: to multiply two DPFloat numbers A*f+B and C*f+D, where f is the size of an SPFloat, you need 4 SPFloats to store them and 4 SPFlops to multiply them (or 3 with Karatsuba, with some overhead of additions and subtractions). If you can multiply the two DPFloats in a single flop, then you are already 4 times faster. Add to this the ability to store larger numbers (when you do carry propagation) and/or the higher precision, and you see that SPFlops that are 2 times (or even 4 times) faster are not enough to beat DP.
As another example, think of very fast video cards, which can reach almost 2 TeraFlops of SP but only 300-400 GigaFlops of DP (5-6 times less). If "times 4" or "times 5" were enough, why don't the manufacturers use (micro)programming to do a DPFlop with 4 SPFlops? |
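The operation count in the post above can be sketched with exact integers standing in for the two halves of each number. This is only an illustration of the A*f+B splitting, not of how any FPU or GPU actually emulates double precision; the radix `F = 2**24` and the function names are assumptions chosen for the example.

```python
F = 2**24  # hypothetical radix: each "half" holds about one SP mantissa's worth of bits


def split(x: int) -> tuple[int, int]:
    """Split x into (high, low) so that x == high * F + low."""
    return divmod(x, F)


def mul_schoolbook(x: int, y: int) -> tuple[int, int]:
    """Multiply via four half-size products; returns (product, multiplications used)."""
    a, b = split(x)
    c, d = split(y)
    product = a * c * F * F + (a * d + b * c) * F + b * d
    return product, 4


def mul_karatsuba(x: int, y: int) -> tuple[int, int]:
    """Multiply via three half-size products plus extra additions and subtractions."""
    a, b = split(x)
    c, d = split(y)
    ac = a * c
    bd = b * d
    cross = (a + b) * (c + d) - ac - bd  # equals a*d + b*c
    product = ac * F * F + cross * F + bd
    return product, 3


if __name__ == "__main__":
    x, y = 123456789012, 987654321098
    print(mul_schoolbook(x, y)[0] == x * y, mul_karatsuba(x, y)[0] == x * y)
```

Both routines reproduce `x * y` exactly; the only difference is 4 versus 3 half-size multiplications, with Karatsuba paying for the saved multiplication in additions and subtractions. Rounding, exponents, and carry handling, which real floating-point emulation also needs, are deliberately left out.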