I run LLR on exponents from 3M to 7M on Haswell-era quad-core dual-channel machines, and I do run into memory saturation on the 4th test. That is, 3 single-threaded copies of LLR get through nearly as much total work as 4 do. That's with an office-machine-grade Dell, with stock DDR4 at stock speed (2133?). If you run 3200 memory, you're getting about 50% more bandwidth than that, and it's possible that running each test 2- or 3-threaded will not saturate the memory (that is, enough data will stay in cache).
Without relying on cache to save you, I'd think 3600 memory and that 6-core would match up well for tests around 3-5 Mbits each. As your tests grow in the future, you can run each one with more threads.
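For what it's worth, here is the back-of-envelope arithmetic behind those bandwidth figures as a small Python sketch. It assumes a standard 64-bit (8-byte) DDR4 channel and a dual-channel setup, and it only gives theoretical peak numbers; sustained bandwidth under LLR will be lower. The function name `peak_bandwidth_gbs` is just my own label, not anything from LLR.

```python
def peak_bandwidth_gbs(mt_per_s, channels=2, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s.

    Assumes each channel moves bus_bytes per transfer (8 bytes for a
    64-bit DDR4 channel) at mt_per_s million transfers per second.
    """
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


# Compare the speeds mentioned above against stock DDR4-2133.
base = peak_bandwidth_gbs(2133)
for speed in (2133, 3200, 3600):
    bw = peak_bandwidth_gbs(speed)
    print(f"DDR4-{speed}: {bw:.1f} GB/s peak, {bw / base:.2f}x of DDR4-2133")
```

That works out to roughly 34, 51 and 58 GB/s peak, so 3200 is about 1.5x and 3600 about 1.7x of 2133. It says nothing about how much bandwidth a single LLR test actually needs, which is the part I'm hand-waving.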
Hopefully, someone with more recent/similar hardware can amplify my hand-waving about bandwidth...
