Old 2019-09-17, 05:30   #5
lavalamp
 
 
Oct 2007
Manchester, UK

2·5·137 Posts

Quote:
Originally Posted by M344587487
the large conjoined cache likely helps avoid a RAM bottleneck
That is not my understanding of how the L3 cache is configured.

As I understand it:
4 cores share an L3 cache and form a "core complex" (CCX).
2 CCXs sit on one die, connected by Infinity Fabric; that die is called a CCD.
The 3900 and 3950X each have 2 CCD chiplets, connected individually to the IO die, and any access to another chiplet's L3 cache or to RAM must go via the IO die.

Additionally, the post you linked to is based on a 3600, which only has a single CCD chiplet.

Therefore I would still expect 2 workers (1 per chiplet) with either 8 or 16 threads to perform optimally, but as I said, benchmarks would be interesting to see.
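
To make the arithmetic behind "2 workers with either 8 or 16 threads" concrete, here is a rough sketch of the layout described above. The `Topology` class and its field names are made up for illustration, and the numbers assume a 3950X (2 CCDs, 2 CCXs per CCD, 4 cores per CCX, 2 hardware threads per core). Reading "8 or 16" as "one thread per core" versus "both SMT threads per core" is my assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    """Hypothetical model of the chiplet layout described in the post."""
    ccds: int           # chiplets, each connected to the IO die
    ccx_per_ccd: int    # core complexes per chiplet
    cores_per_ccx: int  # cores sharing one L3 cache
    smt: int = 2        # hardware threads per core (assumed)

    def cores_per_ccd(self) -> int:
        return self.ccx_per_ccd * self.cores_per_ccx

    def worker_layouts(self):
        """One worker per CCD; return (workers, threads_per_worker) options."""
        cores = self.cores_per_ccd()
        # Option 1: one thread per physical core.
        # Option 2: every SMT thread on the chiplet.
        return [(self.ccds, cores), (self.ccds, cores * self.smt)]


# Assumed 3950X layout: 2 CCDs x 2 CCXs x 4 cores, 2 threads per core.
r9_3950x = Topology(ccds=2, ccx_per_ccd=2, cores_per_ccx=4)
print(r9_3950x.worker_layouts())  # [(2, 8), (2, 16)]
```

This only counts cores; it says nothing about which layout is actually faster, which is what the benchmarks would have to show.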