Quote:
Originally Posted by M344587487
the large conjoined cache likely helps avoid a RAM bottleneck
That is not my understanding of how the L3 cache is configured.
As I understand it:
4 cores share an L3 cache and form a "core complex" (CCX).
2 CCXs sit on each die and are connected by Infinity Fabric; that die is called a CCD.
The 3900 and 3950X each have 2 CCD chiplets, each connected individually to the IO die, so any access to another chiplet's L3 cache or to RAM must go through the IO die.
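
To make the layout above concrete, here is a rough Python sketch for a 16-core part like the 3950X. It assumes cores are numbered consecutively (0-3 in the first CCX, 4-7 in the second, and so on), which may not match how your OS actually enumerates them. The names `locate` and `shares_l3` are made up for illustration.

```python
# Topology as described in the post: 4 cores per CCX, 2 CCXs per CCD,
# 2 CCDs on a 3950X. Consecutive core numbering is an ASSUMPTION.
CORES_PER_CCX = 4
CCX_PER_CCD = 2
CCD_COUNT = 2

def locate(core):
    """Return (ccd, ccx_within_ccd) for a physical core index."""
    if not 0 <= core < CORES_PER_CCX * CCX_PER_CCD * CCD_COUNT:
        raise ValueError("core index out of range")
    ccx_global = core // CORES_PER_CCX
    return ccx_global // CCX_PER_CCD, ccx_global % CCX_PER_CCD

def shares_l3(core_a, core_b):
    """Two cores share an L3 slice only if they are in the same CCX."""
    return core_a // CORES_PER_CCX == core_b // CORES_PER_CCX

def same_chiplet(core_a, core_b):
    """True if both cores sit on the same CCD."""
    return locate(core_a)[0] == locate(core_b)[0]
```

For example, under that numbering cores 3 and 4 are on the same chiplet but do not share an L3 slice, while cores 7 and 8 are on different chiplets.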
Additionally, the post you linked to is based on a 3600, which only has a single CCD chiplet.
Therefore I would still expect 2 workers (1 per chiplet) with either 8 or 16 threads to perform optimally, but as I said, benchmarks would be interesting to see.
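
As a sketch of what "1 worker per chiplet" would look like on a 16-core part, the snippet below just splits the cores into one group per chiplet. It assumes consecutive core numbering and that "8 or 16 threads" means one thread per core versus two with SMT; `workers_per_chiplet` is a hypothetical helper, not anything from the software itself.

```python
def workers_per_chiplet(total_cores=16, chiplets=2, smt=False):
    """Split cores into one worker per chiplet (illustrative only)."""
    per_chiplet = total_cores // chiplets
    workers = []
    for c in range(chiplets):
        cores = list(range(c * per_chiplet, (c + 1) * per_chiplet))
        # One thread per core, or two per core if SMT is used (assumption).
        threads = len(cores) * (2 if smt else 1)
        workers.append({"chiplet": c, "cores": cores, "threads": threads})
    return workers

for w in workers_per_chiplet(smt=False):
    print(w["chiplet"], w["cores"], w["threads"])
```

Whether this grouping is actually the fastest is exactly what the benchmarks would need to show.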