mersenneforum.org  

Old 2010-07-28, 22:36   #232
lavalamp
 
Oct 2007
Manchester, UK

23×59 Posts

Quote:
Originally Posted by Uncwilly
What is the FLOP/W of each?
GTX 295: 149.04 GFLOP/s
PIII 800MHz: 3.2 GFLOP/s

Taking a slightly high 500 W as the power draw of the graphics-card-based system and an extremely low 40 W for the PIII system:

GTX 295: 298.08 MFLOP/J
PIII 800 MHz: 80 MFLOP/J
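
As a quick check of the arithmetic above, here is a minimal Python sketch that converts throughput and power draw into MFLOP/J. The function name `mflop_per_joule` and the `systems` table are made up for illustration; the 500 W and 40 W values are the rough guesses from the post, not measurements.

```python
def mflop_per_joule(gflops: float, watts: float) -> float:
    """Convert throughput (GFLOP/s) and power draw (W) into MFLOP/J."""
    # (GFLOP/s) / W = GFLOP/J; multiply by 1000 to express it in MFLOP/J.
    return gflops / watts * 1000.0


# Figures quoted in the post: (throughput in GFLOP/s, assumed system power in W).
systems = {
    "GTX 295": (149.04, 500.0),
    "PIII 800 MHz": (3.2, 40.0),
}

for name, (gflops, watts) in systems.items():
    print(f"{name}: {mflop_per_joule(gflops, watts):.2f} MFLOP/J")
```

Since 1 W = 1 J/s, FLOP/s per watt and FLOP per joule are the same quantity, which is why the "FLOP/W" question is answered in MFLOP/J.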

Last fiddled with by lavalamp on 2010-07-28 at 22:36
Old 2010-07-28, 23:03   #233
axn
 
Jun 2003

3²·5·113 Posts

Quote:
Originally Posted by lavalamp
PIII 800MHz: 3.2 GFLOP/s
1.6 GFLOP/s (maybe even 0.8). Not until Core 2 could x86 do 4 FLOP/cycle.
Old 2010-07-29, 04:17   #234
lavalamp
 
Oct 2007
Manchester, UK

23×59 Posts

Quote:
Originally Posted by axn
1.6 GFLOP/s (maybe even 0.8). Not until Core 2 could x86 do 4 FLOP/cycle.
Quote:
Originally Posted by all knowing wikipedia
The first Pentium III variant was the Katmai (Intel product code 80525).
Quote:
Originally Posted by all knowing wikipedia
Since Katmai was built in the same 0.25 µm process as Pentium II "Deschutes", it had to implement SSE using as little silicon as possible. To achieve this goal, Intel implemented the 128-bit architecture by double-cycling the existing 64-bit data paths and by merging the SIMD-FP multiplier unit with the x87 scalar FPU multiplier into a single unit. To utilize the existing 64-bit data paths, Katmai issues each SIMD-FP instruction as two μops. To compensate partially for implementing only half of SSE’s architectural width, Katmai implements the SIMD-FP adder as a separate unit on the second dispatch port. This organization allows one half of a SIMD multiply and one half of an independent SIMD add to be issued together bringing the peak throughput back to four floating point operations per cycle — at least for code with an even distribution of multiplies and adds.
Emphasis mine.

The 800 MHz Pentium III is either a Coppermine or a Coppermine T based chip, both of which were released after Katmai.

If it makes a difference, I found information on a different website about the throughput of the PIII and confirmed it with wikipedia, but I can't remember what the other site was so you get wikipedia quotes.
Old 2010-07-29, 04:38   #235
axn
 
Jun 2003

13DD₁₆ Posts

Quote:
Originally Posted by lavalamp
Emphasis mine.

The 800 MHz Pentium III is either a Coppermine or a Coppermine T based chip, both of which were released after Katmai.

If it makes a difference, I found information on a different website about the throughput of the PIII and confirmed it with wikipedia, but I can't remember what the other site was so you get wikipedia quotes.
Interesting. Not sure that these are applicable for double precision though (which is the relevant metric for the context).
Old 2010-07-29, 13:18   #236
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

1D77₁₆ Posts

Quote:
Originally Posted by axn
Interesting. Not sure that these are applicable for double precision though (which is the relevant metric for the context).
I can confirm that the PIII has a throughput of 1 double-precision floating point operation per clock cycle.

I don't know if the quoted "GTX 295: 149.04 GFLOP/s" was for single or double precision flops.
Old 2010-07-29, 14:22   #237
axn
 
Jun 2003

1001111011101₂ Posts

Quote:
Originally Posted by Prime95
I don't know if the quoted "GTX 295: 149.04 GFLOP/s" was for single or double precision flops.
I had assumed that this was DP, since the all-knowing wiki states that the raw throughput is 1788.480 GFLOPS.
Old 2010-07-29, 19:42   #238
lavalamp
 
Oct 2007
Manchester, UK

23×59 Posts

I am now a bit unsure of the figure for the GTX 295.

I calculated the single precision throughput for the card as:

ops/clock * shaders * freq (MHz) / 1000 = 2 * 480 * 1242 / 1000 = 1192.32 GFLOP/s

However, it appears that the card can do three floating point operations per cycle, not two. So 3 * 480 * 1242 / 1000 = 1788.48 GFLOP/s.

To get the double precision performance from the single precision performance, divide by 8. So it seems likely that the figure I gave (1192.32 / 8 = 149.04) was only 2/3 of the double precision throughput, in which case the true value would be 1788.48 / 8 = 223.56 DP GFLOP/s.

So, revised figures:

GTX 295: 223.56 DP GFLOP/s
PIII 800MHz: 0.8 DP GFLOP/s

Using the same 500 W and 40 W values for the power draw of each system:

GTX 295: 447.12 DP MFLOP/J
PIII 800 MHz: 20 DP MFLOP/J
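
The working in this post can be reproduced with a short Python sketch. It only restates the post's own assumptions at this point in the thread (480 shaders at 1242 MHz, a divide-by-8 rule for double precision, 1 DP operation per clock for the PIII, 500 W and 40 W system power). The names `peak_gflops` and `SP_TO_DP_DIVISOR` are hypothetical, chosen for illustration.

```python
def peak_gflops(ops_per_clock: int, units: int, clock_mhz: float) -> float:
    """Theoretical peak in GFLOP/s: ops/clock * execution units * clock (MHz) / 1000."""
    return ops_per_clock * units * clock_mhz / 1000.0


SP_TO_DP_DIVISOR = 8  # the post's rule of thumb, not a verified hardware fact

# Single precision peak under the two candidate ops/clock values.
sp_two_ops = peak_gflops(2, 480, 1242)
sp_three_ops = peak_gflops(3, 480, 1242)

# Double precision estimates obtained by dividing each SP figure by 8.
dp_from_two_ops = sp_two_ops / SP_TO_DP_DIVISOR
dp_from_three_ops = sp_three_ops / SP_TO_DP_DIVISOR

# PIII: 1 DP operation per clock on a single core at 800 MHz.
piii_dp = peak_gflops(1, 1, 800)

# Energy efficiency in MFLOP/J with the assumed 500 W and 40 W system power.
gtx_mflop_per_joule = dp_from_three_ops / 500.0 * 1000.0
piii_mflop_per_joule = piii_dp / 40.0 * 1000.0

print(f"SP peak, 2 ops/clock: {sp_two_ops:.2f} GFLOP/s")
print(f"SP peak, 3 ops/clock: {sp_three_ops:.2f} GFLOP/s")
print(f"DP estimates: {dp_from_two_ops:.2f} and {dp_from_three_ops:.2f} GFLOP/s")
print(f"GTX 295: {gtx_mflop_per_joule:.2f} DP MFLOP/J")
print(f"PIII 800 MHz: {piii_mflop_per_joule:.2f} DP MFLOP/J")
```

The two DP estimates differ by exactly the factor 2/3 mentioned in the post, because they come from the same formula with 2 versus 3 operations per clock.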
Old 2010-07-29, 22:15   #239
ldesnogu
 
Jan 2008
France

226₁₆ Posts

Your previous figure of 149 GFLOP/s matches what is described here: http://perspectives.mvdirona.com/200...idiaGT200.aspx
Old 2010-07-30, 01:32   #240
msft
 
Jul 2009
Tokyo

1001100010₂ Posts

llrpsrc.zip includes a small treasure: a complex-FFT version of IBDWT.
Old 2010-07-30, 02:08   #241
lavalamp
 
Oct 2007
Manchester, UK

54D₁₆ Posts

Quote:
Originally Posted by ldesnogu
Your previous figure of 149 GFLOP/s matches what is described here: http://perspectives.mvdirona.com/200...idiaGT200.aspx
*dies from nervous exhaustion*

Seems it's 3 ops/clock for SP and 2 ops/clock for DP then, and SP performance gets multiplied by 8 thanks to the "Streaming Processors".

GTX 295: 1788.48 SP GFLOP/s
GTX 295: 149.04 DP GFLOP/s
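
One way to read the two figures above, sketched in Python. The assumption, which is my reading of the post rather than something it states in these words, is that the 480 streaming processors come in groups of 8, and that each group shares a single double precision unit doing 2 operations per clock. The constant names are hypothetical.

```python
CLOCK_MHZ = 1242          # shader clock used throughout the thread
STREAMING_PROCESSORS = 480
SP_PER_GROUP = 8          # assumption: 8 streaming processors share one DP unit

# Number of DP units under that assumption.
dp_units = STREAMING_PROCESSORS // SP_PER_GROUP

# Single precision: 3 ops/clock on every streaming processor.
sp_gflops = 3 * STREAMING_PROCESSORS * CLOCK_MHZ / 1000.0

# Double precision: 2 ops/clock on every DP unit.
dp_gflops = 2 * dp_units * CLOCK_MHZ / 1000.0

print(f"DP units: {dp_units}")
print(f"GTX 295: {sp_gflops:.2f} SP GFLOP/s")
print(f"GTX 295: {dp_gflops:.2f} DP GFLOP/s")
print(f"SP/DP ratio: {sp_gflops / dp_gflops:.1f}")
```

Under this reading the SP/DP ratio is (3 * 8) / 2 = 12 rather than the 8 used earlier in the thread, which is why 149.04 and not 223.56 is the double precision figure.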

As agent Smith once said, "Is it over?"
Old 2010-07-30, 02:41   #242
Oddball
 
May 2010

499 Posts

What are the GFLOP/s and the MFLOP/J for a Core i7? Just wondering.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.