2009-02-20, 21:23   #4
rogue
Apr 2003
Between here and the

581610 Posts

Originally Posted by ewmayer
pmulld xmm0,xmm1,xmm2
pmuludh xmm0,xmm1,xmm3

with the low halves output in xmm2 and the high halves in xmm3. This seems inefficient because it uses 2 instructions and the 2nd mul discards the lower halves, but one could add microcode support so that the hardware recognizes such paired lower-and-upper-half muls and fuses them into a single hardware operation, which splits the double-wide outputs into the 2 destination registers. All sorts of ways to do this.
They must be learning from IBM, as PowerPC does the same thing. On 64-bit PowerPC you need two multiplies to get the full 128-bit product of a 64x64 multiply in 64-bit registers: mulld for the low half and mulhdu (or mulhd for signed operands) for the high half. IBM had single instructions that produced the double-wide result before the PowerPC line, so why did they drop that on PowerPC? I suspect there is something about "pure RISC" that says an instruction must not write two output registers, only one of which is named in the instruction. Then again, they set bits in control registers all the time based upon the results of different instructions...
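For anyone who wants to see the low-half/high-half split spelled out, here is a minimal sketch in portable C. The function names mul_lo64 and mul_hi64 are my own, purely for illustration; they only mimic what a multiply-low and a multiply-high-unsigned instruction each return, and say nothing about how any particular chip implements them.

```c
#include <stdint.h>

/* Low 64 bits of a*b: what a "multiply low" instruction returns.
   Hypothetical helper name, for illustration only. */
static uint64_t mul_lo64(uint64_t a, uint64_t b) {
    return a * b;  /* unsigned arithmetic wraps modulo 2^64 */
}

/* High 64 bits of the unsigned 128-bit product a*b, built from four
   32x32->64 partial products. Hypothetical helper name. */
static uint64_t mul_hi64(uint64_t a, uint64_t b) {
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;  /* contributes to bits   0..63  */
    uint64_t p1 = a_lo * b_hi;  /* contributes to bits  32..95  */
    uint64_t p2 = a_hi * b_lo;  /* contributes to bits  32..95  */
    uint64_t p3 = a_hi * b_hi;  /* contributes to bits  64..127 */

    /* Sum everything that lands in bits 32..63; the part of this sum
       above bit 31 is the carry into the high half. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    return p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}
```

Calling both on the same operands gives the two halves of the 128-bit product, which is the pairing that a fused lower-and-upper-half multiply would hand back in one go. Note that mul_hi64 redoes all the work whose low bits mul_lo64 already produced, which is exactly the duplication the quoted post is complaining about.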