|
|
#45 | |
|
Tribal Bullet
Oct 2004
DED₁₆ Posts |
Quote:
|
|
|
|
|
|
|
#46 |
|
Feb 2016
UK
2⁶·7 Posts |
Why would everything else get slower? To me, the biggest reason for not adding every possible feature is die area, and thus manufacturing cost.
|
|
|
|
|
|
#47 |
|
Undefined
"The unspeakable one"
Jun 2006
My evil lair
6,793 Posts |
|
|
|
|
|
|
#48 |
|
Feb 2016
UK
2⁶·7 Posts |
If the trade-off is that clocks might have to be reduced to control heat while that instruction is in use, I think that's an OK trade-off, similar to what we have with AVX-512 now. Other things will probably be unaffected.
I'm not sure latency is really significant in this context. I still think die area, yields, and manufacturing costs are bigger factors than either of the above. |
|
|
|
|
|
#49 |
|
Undefined
"The unspeakable one"
Jun 2006
My evil lair
1A89₁₆ Posts |
If you want to add more transistors then the die gets larger, so signals take longer to travel from one end of the chip to the other. That means either adding an extra delay cycle and extra buffers, or lowering the maximum clock speed, or both. This would affect all operations of the chip regardless of which instructions are being executed.
This is why there are various levels of caching. L1 is the closest and the smallest, so it is the fastest. If you push the L1 further away because you have more computation transistors in there, then you have to slow things down. The same goes for access to the register file: it takes longer to send and receive data since it is further from the action. |
|
|
|
|
|
#50 |
|
Feb 2016
UK
700₈ Posts |
Just how much more die area are you thinking this could add? My question is specifically about how significant the effect is. I'd still rate it as a far smaller factor than simple area translating into cost.
|
|
|
|
|
|
#51 | |
|
Undefined
"The unspeakable one"
Jun 2006
My evil lair
6,793 Posts |
Quote:
Like I mentioned above, it isn't impossible to do. Just show them the positive ROI and it can happen. |
|
|
|
|
|
|
#52 | |
|
Sep 2016
380₁₀ Posts |
Quote:
|
|
|
|
|
|
|
#53 |
|
∂²ω=0
Sep 2002
República de California
2²×2,939 Posts |
@Mysticial -- Also, if you're a hw vendor and looking to add silicon, I expect you'd have a lot more customers interested in adding more 64-bit FMA units than adding ones with support for 128-bit floats. You could likely double the number of FMA units for less area/speed hit than 128-bit-ifying the current number of units.
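To put rough numbers on that comparison, here is a back-of-the-envelope sketch. It assumes multiplier area grows with the square of the significand width (a simple array-multiplier model, not vendor data) and ignores adders, routing, and register-file changes. The function name `relative_multiplier_area` is made up for illustration; the significand widths are the IEEE 754 binary64 and binary128 values.

```python
# Assumption: multiplier area scales with the square of the significand
# width. This is a crude array-multiplier model, not measured silicon data.
BINARY64_SIGNIFICAND_BITS = 53    # IEEE 754 binary64, hidden bit included
BINARY128_SIGNIFICAND_BITS = 113  # IEEE 754 binary128, hidden bit included


def relative_multiplier_area(bits, baseline_bits):
    """Area of a bits x bits multiplier relative to baseline_bits x baseline_bits."""
    return (bits / baseline_bits) ** 2


# Option A: widen each existing unit to handle 128-bit floats.
widen_to_128 = relative_multiplier_area(
    BINARY128_SIGNIFICAND_BITS, BINARY64_SIGNIFICAND_BITS
)

# Option B: keep 64-bit units and simply build twice as many.
double_the_units = 2.0

print(f"128-bit multiplier vs 64-bit: {widen_to_128:.2f}x area")
print(f"doubling the 64-bit units:    {double_the_units:.2f}x area")
```

Under this model, widening costs roughly 4.5x the multiplier area against 2x for doubling the unit count, which is at least consistent with the point above.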
|
|
|
|
|
|
#54 | |
|
Sep 2016
17C₁₆ Posts |
Quote:
As a lesser version of the quad-precision requests, other bignum people have been requesting that the SIMD unit be widened to do 64-bit multiplies. Not surprisingly, these fell on deaf ears, because widening the multiplier from 52x52 to 64x64 is a ~50% increase in area. But the one that did make sense was the request to expose the 52-bit multiplier. That needed no real new silicon, and it resulted in AVX512-IFMA. |
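For readers who haven't used it, here is a scalar sketch of what one 64-bit lane of the two AVX512-IFMA instructions (VPMADD52LUQ and VPMADD52HUQ) computes. The Python function names `madd52lo` and `madd52hi` are just illustrative; real code would use the corresponding intrinsics across a whole vector. As a sanity check on the ~50% figure, if area goes roughly as the square of operand width, (64/52)² ≈ 1.51.

```python
MASK52 = (1 << 52) - 1  # low 52 bits of a lane
MASK64 = (1 << 64) - 1  # a full 64-bit lane


def madd52lo(acc, b, c):
    """One lane of VPMADD52LUQ: acc + low 52 bits of the 52x52-bit product."""
    product = (b & MASK52) * (c & MASK52)  # up to 104 bits
    return (acc + (product & MASK52)) & MASK64


def madd52hi(acc, b, c):
    """One lane of VPMADD52HUQ: acc + high 52 bits of the 52x52-bit product."""
    product = (b & MASK52) * (c & MASK52)
    return (acc + (product >> 52)) & MASK64


# Two 52-bit limbs, as a bignum library might store them.
x = (1 << 52) - 1
y = (1 << 52) - 3

lo = madd52lo(0, x, y)
hi = madd52hi(0, x, y)

# The two halves together reconstruct the full 104-bit product.
assert (hi << 52) | lo == x * y
```

The 12 spare bits at the top of each 64-bit accumulator are what make this attractive for bignum work: several partial products can be summed before a carry pass is needed.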
|
|
|
|
|
|
#55 |
|
Feb 2016
UK
2⁶×7 Posts |
Thanks Mysticial, 10% sounds like a fair bit.
I was wondering whether we already have an example in the addition of AVX-512 in Skylake-X/SP, especially the two-unit models. The comparison is complicated, though, since those chips also differ in cache and IMC. Still, with my single sample of Skylake-X, I can't say the clocks or general performance are impacted compared to Skylake, even if Skylake-X is more comparable to Kaby Lake in process. I am curious what Intel did with Ice Lake, and it would be interesting to put that against Zen 2, although I really don't want another laptop just to do that testing. |
|
|
|