2019-09-09, 16:31   #18
Mysticial
Sep 2016

344₁₀ Posts

Originally Posted by mackerel: on further look, at 3rd chart in link, the differences don't appear constant. So while there is variation, it is probably more random than I was thinking.
From what I've seen (and maybe it's just a symptom of the test), the variation is huge. Individual latencies have a standard deviation comparable to the mean. And the location of the sync variable in the L3 can affect latencies by as much as 2x.

The L3 is NUMA: it is split into portions, and cachelines are hashed across all of them. That means that if you're ping-ponging data between two cores, the coherency traffic has to travel to the portion of the L3 where the data sits, IOW a 3rd core. So for every pair of cores, if you test random addresses and histogram the latencies, you will get up to N spikes, where N is the # of cores. If the test is correct and you get fewer than N spikes, it's usually because some of the spikes overlap. Conversely, if the die has cache enabled on disabled cores, you may get more than N spikes.
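
To make the spike argument concrete, here is a toy model in Python. It is only a sketch under made-up assumptions: 8 cores on a bidirectional ring, one L3 portion ("slice") per core, a fixed cost per hop, and a stand-in hash (the real address-to-slice hash is not something I know). `slice_of`, `ring_hops`, `HOP_NS` and `BASE_NS` are hypothetical names and numbers, not measurements.

```python
from collections import Counter

N_CORES = 8    # assumed core count, one L3 slice per core
HOP_NS = 5     # assumed cost per ring hop (invented number)
BASE_NS = 40   # assumed fixed overhead (invented number)

def slice_of(addr: int) -> int:
    """Stand-in hash from cacheline address to L3 slice (not the real one)."""
    line = addr >> 6  # 64-byte cachelines
    return (line ^ (line >> 3) ^ (line >> 7)) % N_CORES

def ring_hops(a: int, b: int) -> int:
    """Shortest hop count between two stops on a bidirectional ring."""
    d = abs(a - b) % N_CORES
    return min(d, N_CORES - d)

def pingpong_latency(core_a: int, core_b: int, addr: int) -> int:
    """Modelled latency: traffic goes core A -> home slice -> core B."""
    s = slice_of(addr)
    return BASE_NS + HOP_NS * (ring_hops(core_a, s) + ring_hops(s, core_b))

def latency_histogram(core_a: int, core_b: int, n_lines: int = 4096) -> Counter:
    """Histogram of modelled latencies over n_lines consecutive cachelines."""
    return Counter(
        pingpong_latency(core_a, core_b, line * 64) for line in range(n_lines)
    )

hist = latency_histogram(0, 3)
for latency in sorted(hist):
    print(f"{latency} ns: {hist[latency]} lines")
```

In this model the pair (0, 3) shows only 2 spikes rather than 8, because several slices give the same total hop count. That is the "overlapping spikes" case. A less regular topology or cost model would pull them apart, up to one spike per slice.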

So what you're testing isn't just 2 cores, but 2 cores + the L3 of an unknown 3rd core. If you manage to get the sync variable in the L3 of one of the cores you're testing, there may be some asymmetry depending on which core it's on.

That said, if you completely isolate all this data, you'll have much more consistent results that may be enough to infer physical locations. But I haven't tried. And it's probably non-trivial to say the least.
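
One possible reading of "isolating" the data is to bucket the samples by address before looking at the spread, instead of pooling everything for a core pair. A minimal Python sketch of that follows. The samples are invented purely for illustration, and `spread` and `per_address_spread` are hypothetical helper names.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical (address, latency_ns) samples for one core pair.
# The values are invented for illustration, not measured.
samples = [
    (0x1000, 50), (0x1000, 52), (0x1000, 48),
    (0x2040, 100), (0x2040, 98), (0x2040, 102),
]

def spread(values):
    """Standard deviation relative to the mean."""
    return pstdev(values) / mean(values)

def per_address_spread(samples):
    """Relative spread computed separately for each address."""
    groups = defaultdict(list)
    for addr, latency in samples:
        groups[addr].append(latency)
    return {addr: spread(lats) for addr, lats in groups.items()}

pooled = spread([latency for _, latency in samples])
isolated = per_address_spread(samples)

print(f"pooled spread: {pooled:.3f}")
for addr, value in isolated.items():
    print(f"{addr:#x}: {value:.3f}")
```

With numbers like these, the pooled spread is dominated by the gap between addresses, while each address on its own is tight. Whether real hardware behaves this cleanly is exactly the part I haven't tried.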

Last fiddled with by Mysticial on 2019-09-09 at 16:34