2009-02-20, 23:22   #8
__HRB__
Dec 2008
Boycotting the Soapbox

2^4·3^2·5 Posts

Originally Posted by ewmayer
Another example - the utterly idiotic lack of any support whatsoever for complex MUL in SSE and SSE2.
You mean support for doing: (a+ib)(c+id)=ac-bd + i[(a+b)(c+d)-ac-bd] in 3 cycles?
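For reference, the identity above written out as plain C. This is only a sketch of the three-multiply trick, trading one multiply for extra adds; the `cplx` type and the name `cmul3` are made up for illustration and say nothing about how a hardware instruction would do it.

```c
/* Hypothetical helper type for illustration; not from the thread. */
typedef struct { double re, im; } cplx;

/* Three-multiply complex product:
 *   (a+ib)(c+id) = (ac - bd) + i[(a+b)(c+d) - ac - bd]
 * Uses 3 multiplies and 5 adds/subtracts instead of 4 multiplies and 2 adds. */
static cplx cmul3(cplx x, cplx y)
{
    double ac = x.re * y.re;                    /* multiply 1 */
    double bd = x.im * y.im;                    /* multiply 2 */
    double s  = (x.re + x.im) * (y.re + y.im);  /* multiply 3: (a+b)(c+d) */
    cplx r = { ac - bd, s - ac - bd };          /* imaginary part equals ad + bc */
    return r;
}
```

For example, (1+2i)(3+4i) gives ac = 3, bd = 8, (a+b)(c+d) = 21, so the result is -5 + 10i. Note that in floating point the extra subtractions can lose accuracy compared with the four-multiply form.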

I'd rather have a status register, a 4-bit condition-code, free shifts & rotates in ALL instructions.

Originally Posted by ewmayer
You gave us a RISC-style register set, why not a set of RISC-style instructions to go along with it, which don't force us to do cycle-wasting register-operand copying at every turn?
This probably has to do with the small number of registers and the way the decoder works. During decode the processor starts renaming registers, so the cost of the copy is essentially only reduced decoding bandwidth. The register file has something like 96 entries, which I think acts more like a dual-ported level-0 cache.

I want 64 ARMs on one chip running at 2 GHz for my birthday. Oh, and 64 blocks of 64k dual-ported RAM on the same chip. Thank you.

Originally Posted by jasonp
The real problem is that multiple precision arithmetic was not on the agenda when this stuff was designed, so we'll just have to make do with what we have.
My objection is that it didn't have to be on the agenda. Just sticking to SIMD philosophy would have been sufficient. pmuludq mixes 32- and 64-bit operands, so this can't be right if the general idea is to have 2/4/8/16 independent streams.
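To make the mixing concrete, here is a scalar C model of what pmuludq does on 128-bit registers, as I read the SSE2 definition: it uses only 32-bit elements 0 and 2 of each source and writes two 64-bit products, so half of the input lanes are ignored. The name `pmuludq_model` is made up for illustration; this is a sketch of the semantics, not the real intrinsic (which is `_mm_mul_epu32`).

```c
#include <stdint.h>

/* Scalar model of pmuludq with 128-bit operands (illustrative only).
 * a and b are each viewed as four 32-bit elements; out is two 64-bit elements.
 * Elements 1 and 3 of both inputs do not contribute to the result. */
static void pmuludq_model(const uint32_t a[4], const uint32_t b[4],
                          uint64_t out[2])
{
    out[0] = (uint64_t)a[0] * b[0];  /* low 32 bits of 64-bit lane 0 */
    out[1] = (uint64_t)a[2] * b[2];  /* low 32 bits of 64-bit lane 1 */
}
```

So the inputs look like four 32-bit streams and the output looks like two 64-bit streams, which is the inconsistency with the "N independent streams" picture.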

Last fiddled with by __HRB__ on 2009-02-20 at 23:28