2019-09-09, 11:49   #17
mackerel
Feb 2016

1B8₁₆ Posts

Originally Posted by Mysticial:
I've tried this on Linux. Core-to-core latency is very noisy and is highly sensitive to where the synchronization variable is located in the L3. While there are measurable differences in latency between different pairs of cores, they aren't precise enough to deduce where on the die the core is. Likewise temperatures are even more noisy. By stressing cores individually, you can kind of tell which other cores the heat will spill to. But as in the case of the latencies, temperature data isn't precise enough.
I wonder how the testing described above was carried out. There seems to be some observable difference, which I was hoping would be sufficient for this exercise. Even if it is noisy, could that be worked around in some way? For example, when doing P95 benching, I run it multiple times and take the best result, not the average. My thinking is that anything unwanted getting in the way can only make things slower. There shouldn't be any mechanism that makes it faster (which would be more interesting if it exists!). So with multiple runs, you should converge towards the best case.
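To make the best-of-N idea concrete, here is a rough Python sketch. It is not a real core-to-core latency test: `simulated_latency` is a made-up stand-in that adds random, never-negative interference to a fixed "true" cost, and `best_of` is just a hypothetical helper name. It only illustrates that, if noise can only slow things down, the minimum over many runs creeps towards the true value while the average stays inflated.

```python
import random
import statistics


def best_of(measure, runs=20):
    """Call measure() `runs` times; return (best, mean) of the samples."""
    samples = [measure() for _ in range(runs)]
    return min(samples), statistics.mean(samples)


def simulated_latency(rng, true_cost=100.0):
    # ASSUMPTION: interference is one-sided, i.e. it only ever adds time.
    # The exponential noise with a mean of 15 is an arbitrary choice for
    # illustration, not a model of real hardware.
    return true_cost + rng.expovariate(1 / 15.0)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    for runs in (5, 50, 500):
        best, mean = best_of(lambda: simulated_latency(rng), runs)
        print(f"runs={runs:4d}  best={best:8.3f}  mean={mean:8.3f}")
```

If the noise were not one-sided, or if the quantity being measured itself changed from run to run, taking the minimum would no longer converge on anything meaningful.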

Edit: On a closer look at the third chart in the link, the differences don't appear constant. So while there is variation, it is probably more random than I was thinking.

Last fiddled with by mackerel on 2019-09-09 at 11:55