MCM is by far the most interesting part. For servers it's easy to predict that it's a game changer (for consumer parts with cache it probably is too; let's see in a few years). Full-rate FP64 is less interesting, mainly because I'm too cynical to believe they're not just halving the FP32 units per chip and then using a minimum of two chips to bring FP32 rates back to previous-gen levels. FP16 is already disjoint from FP32; can't they do the same for FP64 in a world where compute appears to be specialising into largely high and low precision? Alternatively, could they have figured out how to truncate FP64 to FP32 accurately, so that internally they can do FP32 on the FP64 units and do away with the FP32 units completely? I'm ignorant of GPU hardware internals. Would anyone more knowledgeable care to hazard a guess at how they might optimise a design where FP32 is a second-class citizen?
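
On the "do FP32 using the FP64 units" question, the numerical side at least can be checked in software. Here is a rough sketch (my own, not anything a vendor has described) that does each basic operation on FP32 inputs in FP64, rounds the result to FP32, and compares it with the exact result rounded once to FP32. The helper names (`to_f32`, `exact_round_to_f32`, `count_mismatches`) are made up for illustration, and it only covers add, subtract, multiply and divide on values well inside the normal FP32 range. It says nothing about die area, power or throughput, which is the part I can't guess at.

```python
import random
import struct
from fractions import Fraction


def to_f32(x: float) -> float:
    """Round an FP64 value to the nearest FP32 value (returned as a Python float)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def exact_round_to_f32(q: Fraction) -> float:
    """Round an exact rational to FP32 in one step (ties to even, normal range only)."""
    if q == 0:
        return 0.0
    sign = -1 if q < 0 else 1
    q = abs(q)
    # Find e with 2**e <= q < 2**(e + 1).
    e = q.numerator.bit_length() - q.denominator.bit_length()
    if Fraction(2) ** e > q:
        e -= 1
    ulp = Fraction(2) ** (e - 23)  # spacing of FP32 values in [2**e, 2**(e + 1))
    return float(sign * round(q / ulp) * ulp)


def random_f32(rng: random.Random) -> float:
    """A random nonzero FP32 value with magnitude between 2**-40 and 2**40."""
    return to_f32(rng.choice((-1.0, 1.0)) * 2.0 ** rng.uniform(-40.0, 40.0))


def count_mismatches(trials: int, seed: int = 0) -> int:
    """Count cases where 'compute in FP64, then round' differs from exact FP32 rounding."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(trials):
        a, b = random_f32(rng), random_f32(rng)
        fa, fb = Fraction(a), Fraction(b)  # exact values of the FP32 inputs
        cases = (
            (a + b, fa + fb),
            (a - b, fa - fb),
            (a * b, fa * fb),
            (a / b, fa / fb),
        )
        for via_fp64, exact in cases:
            if to_f32(via_fp64) != exact_round_to_f32(exact):
                mismatches += 1
    return mismatches


if __name__ == "__main__":
    print(count_mismatches(10_000))
```

As far as I know, FP64's 53-bit significand is wide enough relative to FP32's 24 bits that this double rounding is harmless for these operations, so the count should come out as zero. If that holds, the open question is purely about hardware cost, not accuracy.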
