Thread: Intel Xeon PHI?
2020-12-01, 22:29   #143
ewmayer

Originally Posted by kriesel
Have you added DDR4 DIMMs to your system? These systems shipped from the seller with only the 16 GB MCDRAM in the processor package; 6 empty DDR4 slots. I'd expect changing Northbridge settings to have effect on any DDR4 present, and no effect on the much faster integral MCDRAM speed. If you mentioned adding DDR4, I missed it.
From the mobo manual, I've bolded the relevant snip:

"Memory Frequency
Use this feature to set the maximum memory frequency for onboard memory modules. The options are Auto, 1600, 1867, 2133, and 2400."
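
For a rough sense of scale, those option values read like DDR4 transfer rates in MT/s. Here is a minimal sketch of the peak theoretical bandwidth they would imply, assuming a standard 64-bit (8-byte) DDR4 channel and one channel per each of the six DDR4 slots; the function name and its defaults are made up for illustration and are not from the manual:

```python
# Peak theoretical DDR4 bandwidth for each BIOS "Memory Frequency" option.
# Assumptions (not from the manual): values are MT/s, each channel is
# 64 bits (8 bytes) wide, and there are six independent channels.

BIOS_OPTIONS = [1600, 1867, 2133, 2400]


def peak_bandwidth_gbs(mt_per_s, channels=6, bus_bytes=8):
    """Return peak theoretical bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


if __name__ == "__main__":
    for rate in BIOS_OPTIONS:
        per_channel = peak_bandwidth_gbs(rate, channels=1)
        total = peak_bandwidth_gbs(rate)
        print(f"DDR4-{rate}: {per_channel:.1f} GB/s per channel, "
              f"{total:.1f} GB/s over 6 channels")
```

At the 2400 setting this works out to 19.2 GB/s per channel and 115.2 GB/s over six channels. These are theoretical ceilings, not measurements, and they only apply to DDR4 DIMMs that are actually installed.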

Does 'onboard' not refer to the 16GB, um, onboard memory? OTOH the above clock settings are def. DDR-range, not MCDRAM range. From the Anandtech article you linked:
As the diagram stands, the MCDRAM and the regular DDR4 (up to six channels of 386GB of DDR4-2400) are wholly separate, indicating a bi-memory model. This stands at the heart at which developers will have to contend with, should they wish to extract performance from the part.

The KNL memory can work in three modes, which are determined by the BIOS at POST time and thus require a reboot to switch between them.

The first mode is a cache mode, where nothing is needed to be changed in the code. The OS will organize the data to use the MCDRAM first similar to an L3 cache, then the DDR4 as another level of memory. Intel was coy onto the nature of the cache (victim cache, writeback, cache coherency), but as it is used by default it might offer some performance benefit up to 16GB data sizes. The downside here is when the MCDRAM experiences a cache miss – because of the memory controllers the cache miss has to travel back into the die and then go search out into DDR for the relevant memory. This means that an MCDRAM cache miss is more expensive than a simple read out to DDR.

The second mode is ‘Flat Mode’, allowing the MCDRAM to have a physical addressable space which allows the programmer to migrate data structures in and out of the MCDRAM. This can be useful to keep large structures in DDR4 and smaller structures in MCDRAM. We were told that this mode can also be simulated by developers who do not have hardware in hand yet in a dual CPU Xeon system if each CPU is classified as a NUMA node, and Node 0 is pure CPU and Node 1 is for memory only. The downside of the flat mode means that the developer has to maintain and keep track of what data goes where, increasing software design and maintenance costs.

The final mode is a hybrid mode, giving a mix of the two.

In flat mode, there are separate ways to access the high performance memory – either as a pure NUMA node (only applicable if the whole program can fit in MCDRAM), using direct OS system calls (not recommended) or through the Memkind libraries which implements a series of library calls. There is also an interposer library over Memkind available called AutoHBW which simplifies some of the commands at the expense of fine control. Under Memkind/AutoHBW, data structures aimed at MCDRAM have their own commands in order to be generated in MCDRAM.
The "Memkind libraries" ref. sounds like it refers to Intel’s VTune utilities; no idea if GCC supports any of that stuff, but I doubt it.
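
For what it's worth, memkind is, as far as I know, a standalone open-source C library rather than part of VTune. In any case, the "pure NUMA node" route from the quoted text needs no library or compiler support at all: in flat mode the stock Linux `numactl` tool can steer an unmodified binary's allocations to the MCDRAM node. Below is a small sketch that just builds such a command line. The node number 1 and the program name `./my_program` are placeholders I made up; `--membind` and `--preferred` are real `numactl` options.

```python
import shlex


def numactl_argv(program_argv, mcdram_node=1, strict=True):
    """Build an argv list that runs a program under numactl.

    strict=True  -> --membind:   allocate ONLY from the given node, so
                    allocations fail once that node is full.
    strict=False -> --preferred: try the given node first and fall back
                    to other nodes (e.g. DDR4) when it is full.
    mcdram_node=1 is an assumed node number, not a guaranteed one.
    """
    flag = "--membind" if strict else "--preferred"
    return ["numactl", f"{flag}={mcdram_node}"] + list(program_argv)


if __name__ == "__main__":
    # "./my_program" and its arguments are hypothetical.
    for strict in (True, False):
        argv = numactl_argv(["./my_program", "--threads", "64"], strict=strict)
        print(shlex.join(argv))
```

The strict variant matches the "only applicable if the whole program can fit in MCDRAM" caveat in the quote. The preferred variant is the softer choice when the working set might exceed 16 GB and DDR4 is installed.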

I searched for 'mcdram' and 'flat' in the mobo manual, nothing for either.
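
Since the manual is silent, one way to check from a running Linux system is to look at the NUMA topology: in flat mode the MCDRAM should, as I understand it, show up as a NUMA node that has memory but no CPUs, whereas in cache mode it is not exposed as a node at all. A minimal sketch that scans the standard sysfs node directory for such memory-only nodes follows. The function names are mine, and I have not verified what a board with no DDR4 installed reports.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a Linux cpulist string such as '0-3,8' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty cpulist means the node has no CPUs
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def memory_only_nodes(sysfs_root="/sys/devices/system/node"):
    """Return {node_id: MemTotal in kB} for NUMA nodes with memory but no CPUs."""
    result = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return result
    for node_dir in sorted(root.glob("node[0-9]*")):
        node_id = int(node_dir.name[4:])
        cpus = parse_cpulist((node_dir / "cpulist").read_text())
        mem_kb = 0
        for line in (node_dir / "meminfo").read_text().splitlines():
            # Lines look like: "Node 1 MemTotal:       16777216 kB"
            fields = line.split()
            if len(fields) >= 4 and fields[2] == "MemTotal:":
                mem_kb = int(fields[3])
        if not cpus and mem_kb > 0:
            result[node_id] = mem_kb
    return result


if __name__ == "__main__":
    found = memory_only_nodes()
    if found:
        for node_id, mem_kb in found.items():
            print(f"node {node_id}: {mem_kb / 1024**2:.1f} GiB, no CPUs")
    else:
        print("no memory-only NUMA nodes visible")
```

A CPU-less node of roughly 16 GB would be a strong hint that the board is booting in flat (or hybrid) mode. Finding none is consistent with cache mode but does not prove it.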

Last fiddled with by ewmayer on 2020-12-01 at 22:30