[QUOTE=Uncwilly;223143]What is the FLOP/W of each?[/QUOTE]
GTX 295: 149.04 GFLOP/s
PIII 800MHz: 3.2 GFLOP/s
Using a slightly high 500 W as the power figure for the graphics card based system and the extremely low figure of 40 W for the PIII system:
GTX 295: 298.08 MFLOP/J
PIII 800 MHz: 80 MFLOP/J |
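For anyone who wants to redo the arithmetic above, here is a minimal Python sketch. It only restates the conversion used in the post: 1 W is 1 J/s, so GFLOP/s divided by watts gives GFLOP/J, and multiplying by 1000 gives MFLOP/J. The function name `mflop_per_joule` and the `systems` table are made up for illustration; the throughput and power figures are the ones quoted in the post, not measurements.

```python
def mflop_per_joule(gflops: float, watts: float) -> float:
    """Convert a throughput in GFLOP/s and a power draw in W to MFLOP/J."""
    # 1 W = 1 J/s, so (FLOP/s) / W = FLOP/J; the factor 1000 converts G to M.
    return gflops * 1000.0 / watts


# (GFLOP/s, W) pairs as quoted in the post above; the wattages are rough guesses.
systems = {
    "GTX 295": (149.04, 500.0),
    "PIII 800 MHz": (3.2, 40.0),
}

for name, (gflops, watts) in systems.items():
    print(f"{name}: {mflop_per_joule(gflops, watts):.2f} MFLOP/J")
```

Swapping in a different throughput or wattage for either system only requires changing the tuple in `systems`.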
[QUOTE=lavalamp;223204]PIII 800MHz: 3.2 GFLOP/s[/QUOTE]
1.6 GFLOP/s (maybe even 0.8). Not until Core 2, could x86 do 4 FLOP / cycle. |
[QUOTE=axn;223207]1.6 GFLOP/s (maybe even 0.8). Not until Core 2, could x86 do 4 FLOP / cycle.[/QUOTE][quote=all knowing wikipedia]The first Pentium III variant was the Katmai (Intel product code 80525).[/quote][quote=all knowing wikipedia]Since Katmai was built in the same 0.25 µm process as Pentium II "Deschutes", it had to implement SSE using as little silicon as possible. To achieve this goal, Intel implemented the 128-bit architecture by double-cycling the existing 64-bit data paths and by merging the SIMD-FP multiplier unit with the x87 scalar FPU multiplier into a single unit. To utilize the existing 64-bit data paths, Katmai issues each SIMD-FP instruction as two μops. To compensate partially for implementing only half of SSE’s architectural width, Katmai implements the SIMD-FP adder as a separate unit on the second dispatch port. This organization allows one half of a SIMD multiply and one half of an independent SIMD add to be issued together [b]bringing the peak throughput back to four floating point operations per cycle[/b] — at least for code with an even distribution of multiplies and adds.[/quote]Emphasis mine.
The 800 MHz Pentium III is either a Coppermine or a Coppermine T based chip, both of which were released after Katmai. If it makes a difference, I found information on a different website about the throughput of the PIII and confirmed it with wikipedia, but I can't remember what the other site was so you get wikipedia quotes. |
[QUOTE=lavalamp;223224]Emphasis mine.
The 800 MHz Pentium III is either a Coppermine or a Coppermine T based chip, both of which were released after Katmai. If it makes a difference, I found information on a different website about the throughput of the PIII and confirmed it with wikipedia, but I can't remember what the other site was so you get wikipedia quotes.[/QUOTE] Interesting. Not sure that these are applicable for double precision though (which is the relevant metric for the context). |
[QUOTE=axn;223226]Interesting. Not sure that these are applicable for double precision though (which is the relevant metric for the context).[/QUOTE]
I can confirm that the PIII has a throughput of 1 double-precision floating point operation per clock cycle. I don't know if the quoted "GTX 295: 149.04 GFLOP/s" was for single or double precision flops. |
[QUOTE=Prime95;223238]I don't know if the quoted "GTX 295: 149.04 GFLOP/s" was for single or double precision flops.[/QUOTE]
I had assumed that this was DP, since [URL="http://en.wikipedia.org/wiki/Comparison_of_Nvidia_graphics_processing_units#GeForce_200_Series"]all knowing wiki[/URL] states that the raw throughput is 1788.480 GFLOPS. |
I am now a bit unsure of the figure for the GTX 295.
I calculated the single precision throughput for the card as:
op/clock * shaders * freq / 1000 = 2 * 480 * 1242 / 1000 = 1192.32 GFLOP/s
However, it appears that the card can do three floating point operations per cycle, not two. So:
3 * 480 * 1242 / 1000 = 1788.48 GFLOP/s
To get the double precision performance from the single precision performance, divide by 8. So it seems likely that the figure I gave was only 2/3 of the double precision throughput, in which case the true value would be 223.56 DP GFLOP/s.
So, revised figures:
GTX 295: 223.56 DP GFLOP/s
PIII 800MHz: 0.8 DP GFLOP/s
Using the same 500 W and 40 W values for the power draw of each system:
GTX 295: 447.12 DP MFLOP/J
PIII 800 MHz: 20 DP MFLOP/J |
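The peak-throughput formula in the post above can be checked with a short Python sketch. It assumes, as the post does, that `freq` is the shader clock in MHz (hence the division by 1000 to get GFLOP/s) and that double precision is single precision divided by 8. The names `peak_gflops` and `DP_DIVISOR` are invented for this example, and the divide-by-8 rule is the poster's working assumption at this point in the thread, not a confirmed spec.

```python
def peak_gflops(ops_per_clock: float, shaders: int, clock_mhz: float) -> float:
    """Theoretical peak in GFLOP/s: ops/clock * shaders * MHz / 1000."""
    # ops/clock * shaders * MHz is in MFLOP/s; dividing by 1000 gives GFLOP/s.
    return ops_per_clock * shaders * clock_mhz / 1000.0


SHADERS = 480            # GTX 295 shader count used in the post
SHADER_CLOCK_MHZ = 1242  # shader clock used in the post
DP_DIVISOR = 8           # assumption from the post: DP = SP / 8

for ops in (2, 3):
    sp = peak_gflops(ops, SHADERS, SHADER_CLOCK_MHZ)
    dp = sp / DP_DIVISOR
    print(f"{ops} ops/clock: {sp:.2f} SP GFLOP/s, {dp:.2f} DP GFLOP/s")
```

Run with 2 and 3 ops/clock, this gives the two candidate DP figures discussed in the thread side by side.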
Your previous figure of 149 GFLOP matches what is described here: [url]http://perspectives.mvdirona.com/2009/03/15/HeterogeneousComputingUsingGPGPUsNVidiaGT200.aspx[/url]
|
llrpsrc.zip includes a small treasure: a complex FFT version of IBDWT.
|
[QUOTE=ldesnogu;223290]Your previous figure of 149 GFLOP matches what is described here: [url]http://perspectives.mvdirona.com/2009/03/15/HeterogeneousComputingUsingGPGPUsNVidiaGT200.aspx[/url][/QUOTE]
*dies from nervous exhaustion*
Seems it's 3 ops/clock for SP and 2 ops/clock for DP then, and SP performance gets multiplied by 8 thanks to the "Streaming Processors".
GTX 295: 1788.48 SP GFLOP/s
GTX 295: 149.04 DP GFLOP/s
As Agent Smith once said, "Is it over?" |
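To tie the thread together, here is a rough Python sketch of where the final numbers land. It reads the post above as "one DP-capable unit per 8 shaders, each doing 2 ops/clock", which is an interpretation rather than something the thread spells out, and it takes the PIII at 1 DP operation per clock as stated earlier in the thread. The 500 W and 40 W power figures are the same rough guesses used earlier, and all names in the code are hypothetical.

```python
def peak_gflops(ops_per_clock: float, units: int, clock_mhz: float) -> float:
    """Theoretical peak in GFLOP/s: ops/clock * units * MHz / 1000."""
    return ops_per_clock * units * clock_mhz / 1000.0


def mflop_per_joule(gflops: float, watts: float) -> float:
    """GFLOP/s divided by W (= J/s), scaled from G to M."""
    return gflops * 1000.0 / watts


# Assumption: one DP-capable unit per 8 shaders on the GTX 295 (480 shaders).
gtx295_dp_units = 480 // 8
gtx295_dp = peak_gflops(2, gtx295_dp_units, 1242)  # 2 DP ops/clock per unit
piii_dp = peak_gflops(1, 1, 800)                   # 1 DP op/clock at 800 MHz

# Power figures are the thread's rough whole-system estimates.
for name, gflops, watts in (
    ("GTX 295", gtx295_dp, 500.0),
    ("PIII 800 MHz", piii_dp, 40.0),
):
    print(f"{name}: {gflops:.2f} DP GFLOP/s, "
          f"{mflop_per_joule(gflops, watts):.2f} DP MFLOP/J")
```

Under these assumptions the GTX 295 comes out at the original 149.04 DP GFLOP/s figure, and the PIII at 0.8 DP GFLOP/s.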
What are the GFLOP/s and the MFLOP/J for a Core i7? Just wondering.
|