Why does the Athlon XP benchmark so much more slowly?
I don't think an Athlon XP 2.2G is slower than a P4 1.8G, so why do the benchmark results show that?
Is it possible that the program is not well optimized for AMD CPUs? |
P4 CPUs have SSE2 instructions, which are much more efficient for this program. Athlon 64s also have these, but not older Athlons.
|
:shock:
Is iSSE2 really that magic? I have heard that a P4 without optimization is far slower than an XP, and only comparable to an XP with optimization. I still think more should be done to optimize the program for the XP. Thanks. This benchmark makes me reluctant to keep running Prime95 on my XP machine, because of its low efficiency. |
The Athlon XP supports MMX(+), iSSE and 3DNow(+). Is it possible for Prime95 to use these instruction sets?
|
Prime is one of the very few cases where the SSE2 optimizations make a CPU a *lot* more efficient, similar to the case with AltiVec and RC5 (G4 and later AltiVec CPUs blow EVERYTHING else away on RC5 per MHz, and in the upper speed ranges in absolute keyrate terms as well).
It's also VERY memory-intensive, which favors the P4 and its very high front-side bus speed. Some preliminary observations have shown that the Opteron gains quite a bit from its SSE2 support, but still lags the P4 somewhat because its clock speed is much lower than that of current P4s. The Athlon64 should also close the gap on the P4 quite a bit, for the same reason.
Distributed computing projects tend to use a very small subset of a CPU's instructions, so they are VERY sensitive to how efficient those particular instructions are. They are NOT a good indication of the "general purpose" power or capabilities of any CPU.
Again, as an example, the Athlon, K5 and Pentium-III benefit in RC5 work because they implemented a "barrel shifter" in hardware - RC5 relies a LOT on one particular instruction that uses that barrel shifter. The P4 suffers a LOT from having its barrel shifter implemented in microcode, which makes that particular instruction a LOT slower. The AltiVec CPUs benefit from being able to work on something like 16 different RC5 keys AT THE SAME TIME, so even though each key is processed by the AltiVec unit a lot more slowly, on average it gets through a lot MORE keys in a given period of time. |
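To make the barrel-shifter point concrete, here is a minimal sketch of the operation RC5 leans on: a 32-bit rotate whose shift amount depends on the data. The names `rotl32` and `rc5_half_round` are made up for illustration, and this is not code from any RC5 client; it only shows the shape of one half-round, A = ((A xor B) <<< B) + S.

```python
MASK32 = 0xFFFFFFFF  # keep values inside a 32-bit word


def rotl32(x, r):
    """Rotate the 32-bit value x left by r bits (only the low 5 bits of r count)."""
    x &= MASK32
    r &= 31
    return ((x << r) | (x >> (32 - r))) & MASK32


def rc5_half_round(a, b, s):
    """One RC5-style half-round: the rotate amount is taken from the data word b.

    `s` stands in for a subkey word; a real client gets it from the key schedule.
    """
    return (rotl32(a ^ b, b) + s) & MASK32
```

Because the rotate count changes with every word, a CPU that can rotate by any amount in one step gets through this much faster than one that handles variable rotates slowly.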
Thanks for the detailed explanation, :w00t:
But I am still confused about whether Prime95 takes full advantage of the Athlon XP... |
BTW - MMX is an integer-only instruction set. This does NOT help on Prime LL work, which is floating-point based. It DID help the Pentium-MMX somewhat on RC5 work, for similar reasons to why AltiVec CPUs are efficient in that project - but it doesn't handle nearly as MANY keys, so it was only a 50% or so speedup as opposed to a 300%+ speedup for AltiVec (and the K5 was STILL faster at 117.5 MHz than a Pentium-MMX was at 166 MHz).
SSE and 3DNow are single-precision floating point instruction sets (IIRC) - Prime LL work needs a lot higher precision than these sets are capable of. |
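A quick way to see the single-versus-double precision gap is to round the same value to both widths. This is only a sketch of the significand limits (24 bits for a single, 53 bits for a double) and says nothing about how Prime95 itself is written; `to_single` is a helper invented for this example.

```python
import struct


def to_single(x):
    """Round a Python float (an IEEE 754 double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


n = float(2**24 + 1)       # 16777217 fits exactly in a double
as_double = n
as_single = to_single(n)   # a single has only 24 significand bits, so the +1 is lost

print(int(as_double))      # 16777217
print(int(as_single))      # 16777216
```

Once a single-precision value passes 2^24, it can no longer represent every integer, while a double stays exact up to 2^53.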
I run Prime on all of my Athlons here - they may not be as fast as the "leading edge" of P4s, but they CAN argue with any P4 in their price range (they lose, but it's fairly close)....
P4s vs. Athlons is a long-running debate for Prime cost-effectiveness - if you stay away from the "bleeding edge", like my trio of Thoroughbred XP1800s (all running 1800 MHz or faster) - the closest cost-effective P4 is the Northwood 1.7 GHz or 1.8 GHz, 'cause those can be overclocked about as much. It depends mostly on how expen$ive your electricity is - mine here is cheap, so it's a tossup over a 3 year period; in other places, where electricity is a dime per kWh or more, the P4s win easily. |
My XP 1800+ can run @2.3G with AVC112C86 stably on nForce2. Athlon XPs are really great general-purpose CPUs.
|
An Athlon is most efficient for one part of Prime95.
For Trial Factoring to a bit depth of 2^64, Athlons are the most efficient (x86 architecture). At an equal clock speed Athlons will beat the other CPUs (they will calculate it faster), though a P4 with a sufficiently higher clock speed can still finish quicker than an Athlon. At 2^65 and above P4s are top: the SSE2 instructions are more efficient with 64 bit doubles, and combined with the clock speed advantage that makes the P4 unbeatable. I am not sure what speed a P4 would need to beat an Athlon @2.3 GHz up to 2^64; it may require one faster than the highest non-overclocked speed available (3.2 GHz at this point). |
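For anyone wondering what "trial factoring to a bit depth" means in practice, here is a rough sketch using Python's big integers. It relies on two known facts about factors q of a Mersenne number 2^p - 1 (p an odd prime): q has the form 2kp + 1, and q is 1 or 7 mod 8. The function name `trial_factor` and its parameters are made up for this example, and Prime95's real routine is hand-tuned assembly, not anything like this.

```python
def trial_factor(p, max_bits):
    """Look for a factor of 2**p - 1 below 2**max_bits.

    Only candidates q = 2*k*p + 1 with q % 8 in (1, 7) are tested.
    Returns the first factor found, or None if there is none below the limit.
    """
    limit = 1 << max_bits
    k = 1
    while True:
        q = 2 * k * p + 1
        if q >= limit:
            return None
        # q divides 2**p - 1 exactly when 2**p leaves remainder 1 modulo q
        if q % 8 in (1, 7) and pow(2, p, q) == 1:
            return q
        k += 1


print(trial_factor(11, 8))   # 23, since 2**11 - 1 = 2047 = 23 * 89
print(trial_factor(13, 10))  # None, since 2**13 - 1 = 8191 is prime
```

Each extra bit of depth doubles the range of candidates, and once q no longer fits in 64 bits the modular arithmetic has to be done differently, which is where the different CPUs start to trade places.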
Thanks for giving me confidence to run prime95 on my machine.
|