@above:
Intel could easily reduce their transistor budget for SIMD support and provide the much-improved integer-math functionality Linus Torvalds yearns for if they weren't so crazy-biased towards FP support and thought more about multiple kinds of instructions sharing the same transistors insofar as possible. Let's consider the notoriously transistor-hungry case of multiply: instead of first offering only AVX-512 FP multiply and low-width vector integer mul, then later adding another half-measure, using those FP-mul units to generate the high 52 bits of a 64x64-bit integer product, plunk down a bunch of full 64x64 -> 128-bit integer multipliers, supporting a vector analog (at long last) of the longstanding integer MUL instructions. Then design things so those units can be used for both integer and FP operands. Need the bottom 64 bits of a 64x64-bit integer mul? Just discard the high product halves, and maybe shave a few cycles. Signed vs. unsigned high half of a 64x64-bit product? Easily handled via a tiny bit of extra logic. Vector-DP product, either high-53-bits or full-width FMA style? No problem: just use the usual FP-operand preprocessing logic, feed the resulting integer mantissas to the multipurpose vector-MUL unit, then run the usual postprocessing pipeline stages to properly deal with the resulting 106-bit product.
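To make the "tiny bit of extra logic" for signed vs. unsigned concrete, here is a scalar C sketch (not Intel's hardware, just the arithmetic identity such a unit could use): the signed high half of a 64x64-bit product falls out of the unsigned one with two conditional subtractions. The helper names `mulhu64`/`mulhs64` are made up for illustration; the 128-bit type is a GCC/Clang extension standing in for a full-width multiplier.

```c
#include <stdint.h>

/* Unsigned high half of a 64x64 -> 128-bit product, using the compiler's
   128-bit integer type as a stand-in for a full-width hardware multiplier. */
static uint64_t mulhu64(uint64_t a, uint64_t b) {
    return (uint64_t)(((unsigned __int128)a * (unsigned __int128)b) >> 64);
}

/* Signed high half, derived from the unsigned one.  Writing a signed value
   as ua - 2^64*[a<0], the cross terms reduce (mod 2^64 in the high word) to
   subtracting b when a < 0 and a when b < 0 -- the "tiny bit of extra logic". */
static int64_t mulhs64(int64_t a, int64_t b) {
    uint64_t hi = mulhu64((uint64_t)a, (uint64_t)b);
    if (a < 0) hi -= (uint64_t)b;
    if (b < 0) hi -= (uint64_t)a;
    return (int64_t)hi;
}
```

The low half needs no correction at all: the bottom 64 bits of signed and unsigned products are identical, which is exactly why "just discard the high product halves" works for both operand types.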
The HPC part comes in, in the above context, this way: very few programs are gonna need *both* high-perf integer and FP mul -- the ones that do are *truly* outliers, unlike Torvalds' inane labeling of all HPC as some kind of fringe community. Using the same big-block transistor budget to support multiple data types is a big-picture win, even if it leads to longer pipelines: the 32 AVX-512 vector registers are more than enough to allow coders to do a good job of latency hiding, even with fairly long instruction pipelines.
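The latency-hiding point can be sketched in scalar C: with enough architectural registers, a coder (or compiler) runs several independent accumulator chains so a long-latency multiply pipeline stays full instead of stalling on a serial dependency. The four chains below are a stand-in for what a vectorized loop would do across the 32 AVX-512 registers; `dot_mod64` is a hypothetical name, and the sums deliberately wrap mod 2^64 as C unsigned arithmetic does.

```c
#include <stddef.h>
#include <stdint.h>

/* Dot product mod 2^64 with four independent accumulator chains.
   Each chain's multiply-add depends only on its own previous value,
   so consecutive multiplies can overlap in a pipelined multiplier. */
static uint64_t dot_mod64(const uint64_t *a, const uint64_t *b, size_t n) {
    uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i + 0] * b[i + 0];   /* four chains: no cross-iteration   */
        s1 += a[i + 1] * b[i + 1];   /* dependency between the multiplies */
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; i++)               /* scalar cleanup for the tail */
        s0 += a[i] * b[i];
    return s0 + s1 + s2 + s3;
}
```

With a single accumulator, each iteration would wait out the full multiply-add latency; splitting into chains trades a few extra registers for throughput, which is exactly the trade the 32-register AVX-512 file makes affordable even with longer pipelines.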
Last fiddled with by ewmayer on 2020-07-14 at 21:08
