2009-02-21, 03:56   #9
Tribal Bullet
jasonp
Oct 2004

3·5²·47 Posts

Originally Posted by ewmayer
And whaddya know - the runtime instantly more than doubled. Note I said "runtime", not "performance"
IIRC George found something similar when optimizing Prime95 for the first Pentium 4 CPUs. However, he noticed that L2 bandwidth increased markedly when the stores went to a contiguous block in some other memory region, rather than back to the original addresses.

Sometimes I think MOVNTPD is designed only for memory copies; there's an AMD example whitepaper for that application where the MMX version of the instruction (MOVNTQ) increases performance drastically because it allows write combining.
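
For anyone who wants to try the copy case, here is a rough sketch of what I mean, written with SSE2 intrinsics rather than raw assembly. This is not taken from the AMD whitepaper or from Prime95; the function name `stream_copy_pd` is made up for illustration, and it assumes both buffers are 16-byte aligned and the element count is even. Whether it actually helps will depend on the CPU and on how large the buffers are relative to cache.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_load_pd, _mm_stream_pd, _mm_sfence */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from any library): copy n doubles from src to a
 * separate destination buffer using non-temporal stores.
 * Assumptions: dst and src are 16-byte aligned, n is even, and the two
 * buffers do not overlap. */
static void stream_copy_pd(double *dst, const double *src, size_t n)
{
    assert(((uintptr_t)dst & 15) == 0);
    assert(((uintptr_t)src & 15) == 0);
    assert((n & 1) == 0);

    for (size_t i = 0; i < n; i += 2) {
        __m128d v = _mm_load_pd(src + i);  /* ordinary aligned load */
        _mm_stream_pd(dst + i, v);         /* non-temporal store (MOVNTPD) */
    }

    /* Non-temporal stores are weakly ordered; fence before later code
     * relies on the destination contents being globally visible. */
    _mm_sfence();
}
```

The stores walk the destination sequentially, so each cache line's worth of data can be gathered in a write-combining buffer and written out in one burst, which is the case where I would expect the instruction to pay off.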