2005-07-15, 20:47   #5
Dresdenboy
Apr 2003
Berlin, Germany

192 Posts

Originally Posted by TheJudger
Intel's Netburst has very high raw power (compared to AMD's K8 core and Intel's Pentium-M core)... _BUT_ it is _VERY_ hard to use this power because of the very deep pipeline of the Netburst cores...
Typical "desktop applications" do not utilize the CPUs very efficiently; the penalties for "bad code" on K8 and Pentium-M are much smaller than on P4s.
The reason for Netburst's "raw power" is its clock frequency, which is in turn why the pipeline is so long. It is more of a "throughput architecture", and as a result it doesn't like branches and tight dependency chains very much. By "raw power" compared to the other two cores, I assume you mean Prime95 LLR testing performance, because the picture you get from these cores running other applications is different. Outside of that, against both the P-M and the K8 (and even more so the K8 in 64-bit mode), Netburst's raw power doesn't show up.
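To make the dependency-chain point a bit more concrete, here is a toy cycle-count model. It is only a sketch: the function names and the latency value are made up for illustration, and it assumes an idealized pipeline that can start one operation per cycle, which no real core matches exactly.

```python
def cycles_dependent(n_ops: int, latency: int) -> int:
    """Each op needs the previous result, so every op waits the full latency."""
    return n_ops * latency


def cycles_independent(n_ops: int, latency: int) -> int:
    """Independent ops: one starts per cycle, and the last one
    finishes `latency` cycles after it starts."""
    if n_ops == 0:
        return 0
    return latency + (n_ops - 1)


if __name__ == "__main__":
    # Hypothetical latency of 5 cycles, chosen only for illustration.
    for n in (10, 100, 1000):
        print(n, cycles_dependent(n, 5), cycles_independent(n, 5))
```

In this model the dependent chain gets slower in direct proportion to the latency, while the independent stream barely notices it. That is the sense in which a long pipeline favors throughput over tight chains.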

Originally Posted by TheJudger
I'm very sure that Netburst CPUs do 128-bit SSE operations... it would be
hard to imagine that a P4 does more than one FLOP per clock in the Linpack
benchmarks (solves a dense linear system using double-precision floats (64-bit)) without 128-bit ;)
Don't forget that Linpack uses MUL and ADD, which can be executed in parallel on all 3 architectures mentioned here. So your proof does not count. Especially because the math libraries used for Linpack benchmarking (like ATLAS, MKL, ACML) use cache blocking so effectively and have such well-optimized kernels that they reach 90% or more efficiency, even for large arrays (Prime95 uses cache blocking as well).
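As a back-of-the-envelope sketch of why "more than one FLOP per clock" does not prove anything about 128-bit execution: the function below and the issue rates passed to it are hypothetical illustration values, not measured figures for any of the CPUs discussed here.

```python
def peak_flops_per_clock(mul_issue: float, add_issue: float, lanes: int = 1) -> float:
    """Peak FLOPs per clock when MUL and ADD can issue in parallel.

    mul_issue / add_issue: ops started per clock on each unit (hypothetical).
    lanes: double-precision values handled per op (1 = scalar, 2 = full 128-bit).
    """
    return (mul_issue + add_issue) * lanes


if __name__ == "__main__":
    # Scalar MUL and ADD, one of each per clock.
    print(peak_flops_per_clock(1, 1, lanes=1))
    # Packed 2-wide ops, but each unit only starts one every other clock.
    print(peak_flops_per_clock(0.5, 0.5, lanes=2))
```

Both calls give the same peak of 2 FLOPs per clock, so a Linpack figure above one FLOP per clock cannot tell these cases apart.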

For some details on how a Netburst CPU executes 128-bit SSE2 instructions, you can read here: