Old 2020-07-25, 08:53   #3
mackerel
Feb 2016

110100001₂ Posts

Multiple factors feed into the overall performance.

For scenarios where the cores are not limited by cache or RAM, normalised for core count and clock speed, Zen 2 is about 5.7% faster than Skylake (which includes Coffee Lake, Comet Lake, and more). Skylake-X/Cascade Lake-X has AVX-512. I've only recently tried to compare them and don't fully trust the numbers to be precise, but Skylake-X/Cascade Lake-X comes out about 75-80% faster than Skylake. This is actually lower than I thought, as in the past I had seen closer to 100%. I don't know what's changed, or whether I tested with a different methodology previously. The above tests were performed in the last few days using the SGS project at Primegrid. These are very small units so they won't be RAM or cache limited, but I do wonder if they're so small that the faster CPUs can't reach their best efficiency. I'll retest with some bigger tasks another time, as I'm running a challenge at Primegrid right now.
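To make "normalised for cores and clock" concrete, here is a minimal sketch of how such a comparison could be computed. The function names and all throughput figures are made up for illustration, chosen only so the result lands on the 5.7% quoted above; they are not my measurements.

```python
def per_core_per_ghz(tasks_per_day, cores, clock_ghz):
    """Throughput normalised by core count and clock speed."""
    return tasks_per_day / (cores * clock_ghz)


def relative_gain_percent(candidate, baseline):
    """How much faster the candidate is than the baseline, in percent."""
    return (candidate / baseline - 1.0) * 100.0


# Hypothetical inputs (assumptions, not real benchmark data):
# an 8-core Zen 2 part at 4.0 GHz and a 6-core Skylake part at 4.5 GHz.
zen2 = per_core_per_ghz(tasks_per_day=3382.4, cores=8, clock_ghz=4.0)
skylake = per_core_per_ghz(tasks_per_day=2700.0, cores=6, clock_ghz=4.5)

print(f"Zen 2 vs Skylake: {relative_gain_percent(zen2, skylake):.1f}% per core per GHz")
```

Normalising this way only makes sense when the work is not RAM or cache limited, which is why small units were used for the comparison.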

The 3700X has 2x16MB of L3 cache, with a further 0.5MB/core of L2 that could be added on top since the caches are exclusive. The split CCX structure means best performance is achieved if tasks fit within each CCX's cache and run separately (up to around 2048k FFT). For larger tasks this is not possible, and the internal bus speed (FCLK) and RAM speed will affect performance. Best performance is usually achieved if these are kept in sync and as high as possible (typically DDR4-3600 RAM).

The 10920X is more complicated. It has 19.25MB of L3 cache, but from observation it seems reasonable to also count the non-inclusive L2 cache of 1MB/core, for an effective total of 31.25MB. Combined with its quad-channel RAM support, it should do well for most tasks. The single benchmark linked seems unremarkable, but we don't know how the system was configured. A drawback of AVX-512 is that while it can provide massive throughput, it also draws massive power while doing so. Clocks tend to run lower while it is in use, which offsets some of the gains. My 7920X runs at around 2.9 GHz for this type of work.
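The arithmetic behind the effective cache figure, and the way lower AVX-512 clocks eat into the gain, can be sketched as below. Treating the 75% figure as a per-clock gain and the 4.0 GHz clock for the non-AVX-512 chip are both assumptions for illustration; only the cache sizes and the 2.9 GHz figure come from the post.

```python
def effective_cache_mb(l3_mb, l2_per_core_mb, cores):
    """L3 plus all per-core L2, valid only if L2 is not duplicated in L3."""
    return l3_mb + l2_per_core_mb * cores


def net_speedup(per_clock_speedup, avx512_clock_ghz, baseline_clock_ghz):
    """Per-clock gain scaled by the ratio of sustained clocks."""
    return per_clock_speedup * avx512_clock_ghz / baseline_clock_ghz


# 10920X: 12 cores, 19.25MB L3, 1MB L2 per core.
print(effective_cache_mb(19.25, 1.0, 12))   # 31.25

# 3700X, per CCX: 4 cores, 16MB L3, 0.5MB L2 per core.
print(effective_cache_mb(16.0, 0.5, 4))     # 18.0

# Assumed: 1.75x per clock, 2.9 GHz under AVX-512 load,
# against a hypothetical 4.0 GHz non-AVX-512 chip.
print(round(net_speedup(1.75, 2.9, 4.0), 3))
```

With those assumed clocks the per-core advantage drops from 75% to roughly 27%, which is the sort of offset I mean.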

The 9900K is relatively simple, with 16MB of L3 cache. Single tasks beyond 2048k FFT will therefore start hitting RAM, and fast RAM will provide a good benefit there.
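A rough way to see where the 2048k limit comes from is to estimate the FFT data size. This sketch assumes 8 bytes (one double) per FFT element and ignores any extra working data, so treat it as a rule of thumb rather than how any particular program lays out memory.

```python
BYTES_PER_ELEMENT = 8  # assumption: double-precision FFT data


def fft_footprint_mb(fft_size_k):
    """Approximate data size in MB for an FFT of fft_size_k * 1024 elements."""
    return fft_size_k * 1024 * BYTES_PER_ELEMENT / (1024 * 1024)


def fits_in_cache(fft_size_k, cache_mb):
    """True if the estimated FFT data fits within the given cache size."""
    return fft_footprint_mb(fft_size_k) <= cache_mb


# Cache sizes from the post: 9900K 16MB L3, 10920X 31.25MB effective.
for fft_k in (1024, 2048, 2560, 4096):
    print(
        f"{fft_k}k FFT ~ {fft_footprint_mb(fft_k):.0f}MB:",
        "9900K" if fits_in_cache(fft_k, 16.0) else "-",
        "10920X" if fits_in_cache(fft_k, 31.25) else "-",
    )
```

By this estimate a 2048k FFT is about 16MB, which is right at the edge for 16MB of L3, so anything larger spills to RAM.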

If I were asked which is the most cost-effective system, that is difficult to answer, as it depends on what else you build around the CPU.

We also have rumours that next-gen Ryzen (Zen 3) will have a bigger CCX of 8 cores. This removes a barrier and unifies the cache within a die, and should allow it to attain even more performance in more use cases. I'm excited to get one and try it out as soon as they're released.