mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Hardware (https://www.mersenneforum.org/forumdisplay.php?f=9)
-   -   More cores or higher clock speed? (https://www.mersenneforum.org/showthread.php?t=24565)

carpetpool 2019-07-05 19:03

More cores or higher clock speed?
 
So I have been researching processors I could buy to improve the speed of programs such as LLR (or even GIMPS) that support [B]multithreading[/B].

The first question would be, should I go for more threads or more cores if the goal is to improve the speed and timing of LLR computation (or something similar)? For instance, should I buy a (10 core / 20 thread) @ 2.4 GHz OR a (6 core / 12 thread) @ 4.0 GHz if I were to buy one?

The second question relates to the price of certain processor brands. Here is a list of [URL="https://www.portatech.com/products/category.cshtml?id=1240"]processors (and MB combos)[/URL] I am considering buying at some point. Some of the affordable Xeon processors (such as the x series) have low frequency but more cores. Out of curiosity, does anyone have a machine with either an AMD Ryzen Threadripper or an i9 (9th gen) series CPU?

Thanks for help and advice.

preda 2019-07-05 21:01

The number of memory channels is very important; on consumer CPUs it is either 2 or 4. Choosing a system (CPU/motherboard) with 4 RAM channels makes a large difference to performance.
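To put rough numbers on the channel count, here is a minimal sketch of theoretical peak memory bandwidth. It assumes DDR4-2400 modules and the standard 64-bit (8-byte) channel width; the module speed is an illustrative assumption, not something stated in this thread, and sustained bandwidth in practice is lower than the peak.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s: channels x transfers/s x bytes per transfer."""
    return channels * mega_transfers_per_s * 1e6 * bus_bytes / 1e9

# DDR4-2400 is an assumed example speed, not a figure from the thread.
dual = peak_bandwidth_gbs(2, 2400)
quad = peak_bandwidth_gbs(4, 2400)

print(f"dual-channel DDR4-2400: {dual:.1f} GB/s")
print(f"quad-channel DDR4-2400: {quad:.1f} GB/s")
```

With the same modules, going from 2 to 4 channels doubles the theoretical peak, which is why the channel count matters for FFT sizes that do not fit in cache.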

Another important thing to consider is the size of the last-level-cache on the CPU; a larger LLC is better.

About the 4GHz clock, is it sustained (when all cores are active), or is it a boost clock (achieved only when a few cores are active)?


henryzz 2019-07-05 21:42

For LLR it is worth considering the FFT lengths you will be testing. Smaller FFT lengths may fit in the L3 cache of the CPU, making memory speed irrelevant. This is one advantage of running a test on multiple threads, as the L3 cache is shared. However, smaller tests don't run multi-threaded so well, so it is a balancing act.
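A rough way to check the cache argument, as a sketch only: it assumes 8 bytes per FFT element (double precision) and ignores the extra tables a real FFT implementation keeps, so the true working set is somewhat larger. The 16 MiB L3 size and the 1M FFT length are hypothetical example values.

```python
def fft_bytes(fft_length: int, bytes_per_element: int = 8) -> int:
    """Approximate size of the FFT data, assuming one double per element."""
    return fft_length * bytes_per_element

def fits_in_cache(fft_length: int, cache_mib: int, workers: int = 1) -> bool:
    """True if `workers` independent tests of this FFT length fit in a shared cache."""
    needed = workers * fft_bytes(fft_length)
    return needed <= cache_mib * 1024 * 1024

L3_MIB = 16          # hypothetical shared L3 size
FFT_LEN = 1024 * 1024  # hypothetical 1M FFT, about 8 MiB of data

print("one test on all cores fits:", fits_in_cache(FFT_LEN, L3_MIB, workers=1))
print("eight tests, one per core, fit:", fits_in_cache(FFT_LEN, L3_MIB, workers=8))
```

Under these assumptions a single multi-threaded test stays in L3, while eight independent tests spill to RAM and become sensitive to memory bandwidth.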

VBCurtis 2019-07-06 00:07

[QUOTE=carpetpool;520815]The first question would be, should I go for more threads or more cores if the goal is to improve the speed and timing of LLR computation (or something similar)? For instance, should I buy a (10 core / 20 thread) @ 2.4 GHz OR a (6 core / 12 thread) @ 4.0 GHz if I were to buy one?
Thanks for help and advice.[/QUOTE]

For LLR and other assembly-optimized tasks, the generation of the chip matters quite a lot. Haswell was quite an upgrade per-cycle over Sandy Bridge/Ivy Bridge, for instance; so, if you had a chance to buy, say, an HP Z600 with dual Xeon Sandy Bridge CPUs, you would be disappointed with its LLR performance.

Conveniently, this per-cycle speed boost in Xeons coincided with DDR4; so, if you go Xeon, I strongly suggest you stick to a chip modern enough to require DDR4. Quad-channel memory makes a *big* difference for most FFT sizes, but if you're doing sub-top5000 size work (say, Sophie Germains or CRUS work with small exponents) then Henry's note applies: the FFT fits in cache and memory channels won't matter as much.

If you're looking at modern-generation CPUs, then for a given total GHz (cores × clock) fewer cores will serve you better; in your example I'd pick 6 @ 4 GHz over 10 @ 2.4 GHz. All items that don't thread well will be faster, while it's hard to think of a workload that prefers more, slower cores for the same total GHz.

If you have workloads that don't saturate the memory bus, then the most total GHz often gets the most work done. I'd rather have a 10-core 2.4 GHz than a 6-core 3.3 GHz, for instance, though the choice is close for my workloads.
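The "total GHz" comparison is just cores multiplied by clock. A minimal sketch using the three configurations mentioned in this post (the dictionary and function names are made up for the example):

```python
def total_ghz(cores: int, clock_ghz: float) -> float:
    """Crude capacity figure: number of cores times clock speed."""
    return cores * clock_ghz

configs = {
    "10 cores @ 2.4 GHz": (10, 2.4),
    "6 cores @ 4.0 GHz": (6, 4.0),
    "6 cores @ 3.3 GHz": (6, 3.3),
}

for name, (cores, clock) in configs.items():
    print(f"{name}: {total_ghz(cores, clock):.1f} total GHz")
```

The first two come out equal, which is the tie where fewer, faster cores are preferred; the third has less total GHz than the 10-core part, which is the case where the extra cores win as long as memory is not the bottleneck.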

carpetpool 2019-07-06 02:08

[QUOTE=preda;520823]Very important is the number of memory channels; which is, on consumer CPUs, either 2 or 4. Choosing a system (CPU/motherboard) with 4 RAM channels is very important for performance.

Another important thing to consider is the size of the last-level-cache on the CPU; a larger LLC is better.

About the 4GHz clock, is it sustained (when all cores are active), or is it a boost clock (achieved only when a few cores are active)?[/QUOTE]

That raises a good point. So the processors I was going to compare are the AMD Ryzen 7 2700X and Intel Core i7-9700F. Let's look at each of them:

The AMD has 8 cores / 16 threads at a base speed of 3.7 GHz and a turbo (or max) speed of 4.3 GHz. I am not sure if you are supposed to overclock the processor to achieve the 4.3 GHz speed.

The Intel processor has 8 cores / 8 threads at a base speed of 3.0 GHz and a turbo speed of 4.7 GHz. As with the AMD, I am not sure if you are supposed to overclock.

I would consider overclocking both processors if necessary to achieve max speed, but do the benefits outweigh the risks?

For Xeon, I tried to find one that is compatible with DDR4. The Intel Xeon E5-2680 v3 has 12 cores / 24 threads at a base speed of 2.5 GHz and a max turbo speed of 3.3 GHz. If overclocking to 4.0 GHz were possible (and safe), I would pick this one.

I know the basics of overclocking but have never attempted it myself.

paulunderwood 2019-07-06 02:18

Another consideration is that good ECC memory would almost dispense with the need for double checking. :smile:

mackerel 2019-07-06 08:17

AMD Ryzen 1000 and 2000 series CPUs only have half the FP performance (per core, per clock) of most Intel CPUs, so you're working at a big disadvantage there. The 3000 series due out tomorrow is expected to reach parity with Intel, so they will be the ones to look at. They have the potential to be the fastest consumer and good value CPUs for general prime number finding. Note I say in general, which is mostly smaller tasks, and not necessarily for GIMPS in particular with massive tasks.

Ballpark peak per-core, per-clock performance relative to recent Intel CPUs, where not limited by RAM bandwidth or other factors. Reality will likely differ, but this gives a good indication:
200% Skylake-X, some expensive Xeons (with 2 unit AVX-512)
100% Skylake, Kaby Lake, Coffee Lake
100% (estimate) Zen 2 (Ryzen 3000 series, excluding APUs)
88% Haswell
82% Broadwell
58% Sandy Bridge
50%-ish Zen 1, Zen+ (Ryzen 1000, 2000)
Assume any Intel CPU older than Sandy Bridge is half that. I never tested in detail.
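Combining these factors with cores × clock gives a crude relative peak figure for the three CPUs discussed earlier in the thread. This is only a sketch: it uses the base clocks quoted above, and the mapping of each CPU to a row of the list (2700X to Zen+, i7-9700F to the Coffee Lake row, E5-2680 v3 to Haswell) is an assumption added for the example. It ignores RAM bandwidth, all-core turbo behaviour and threading losses.

```python
# Per-core, per-clock factors taken from the list above (1.00 = Skylake-class).
PER_CLOCK = {
    "Skylake/Coffee Lake": 1.00,
    "Haswell": 0.88,
    "Zen+": 0.50,
}

# (cores, base clock in GHz, microarchitecture row); the row assignment is
# an assumption made for this illustration.
CPUS = {
    "Ryzen 7 2700X": (8, 3.7, "Zen+"),
    "Core i7-9700F": (8, 3.0, "Skylake/Coffee Lake"),
    "Xeon E5-2680 v3": (12, 2.5, "Haswell"),
}

def relative_peak(cores: int, ghz: float, arch: str) -> float:
    """Cores x clock x per-clock factor, in arbitrary relative units."""
    return cores * ghz * PER_CLOCK[arch]

for name, spec in CPUS.items():
    print(f"{name}: {relative_peak(*spec):.1f}")
```

On these assumptions the 2700X lands well behind the other two despite its higher base clock, which is the per-clock disadvantage described above.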

As others have mentioned, if work doesn't fit in the CPU cache, you may be limited by RAM bandwidth.

Back to the original question. For a given cores × clock figure as an indication of potential performance, more clock generally works better than more cores, as there is some inefficiency in the multi-threading code, which is complicated. However, I have a 14-core Xeon which runs at 2.4 GHz, and it still eats bigger work tasks; it scales badly for small ones, though, where running each on one or two cores is generally better.

Specifically on the 2700X, don't expect it to overclock to 4.3 GHz. That generation usually hits a wall before then, and the voltage required to get it going means power consumption goes through the roof. Consider power efficiency as well.

And finally on HT/SMT, I've never proved it gives more peak throughput in this use case but in some cases it can reduce losses from elsewhere. It can also increase power consumption even when not giving more performance.

LaurV 2019-07-06 08:20

Re thread title, definitely more cores. With higher clock speed come higher losses, i.e. less efficiency.

R. Gerbicz 2019-07-06 17:24

[QUOTE=paulunderwood;520837]Another consideration is that good ECC memory would almost dispense with the need for double checking. :smile:[/QUOTE]

Or, even better, use my error check with PRP testing.

petrw1 2019-07-06 17:31

[QUOTE=LaurV;520867]Re thread title, definitely more cores. With higher clock speed come higher losses, i.e. less efficiency.[/QUOTE]

+1

a1call 2019-07-06 18:29

Number of cores/threads is an absolute value. Clock speed, on the other hand, is subject to throttling. I'd go with the higher number of cores. The inefficiency of multi-threaded calculation can be offset by running one task (candidate) per core rather than distributing a single task among multiple cores. It won't be suitable for world records in a rush, though. :smile:
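The trade-off can be sketched with a toy model. The 8 cores, 10-hour single-core test time and 80% multi-thread efficiency are all hypothetical numbers, and the model ignores the cache and memory-bandwidth contention discussed earlier in the thread, which can hurt the one-task-per-core case.

```python
def one_task_per_core(cores: int, hours_single: float) -> tuple[float, float]:
    """Each core runs its own test: returns (hours per test, tests per hour)."""
    return hours_single, cores / hours_single

def one_task_all_cores(cores: int, hours_single: float, efficiency: float) -> tuple[float, float]:
    """All cores share one test with imperfect scaling: returns (hours per test, tests per hour)."""
    latency = hours_single / (cores * efficiency)
    return latency, 1.0 / latency

# Hypothetical values chosen only to illustrate the shape of the trade-off.
CORES, HOURS, EFFICIENCY = 8, 10.0, 0.8

for label, (latency, throughput) in {
    "one task per core": one_task_per_core(CORES, HOURS),
    "one task on all cores": one_task_all_cores(CORES, HOURS, EFFICIENCY),
}.items():
    print(f"{label}: {latency:.2f} h per test, {throughput:.2f} tests/h")
```

In this model, one task per core gives more tests per hour, while one task across all cores finishes each individual test much sooner, which is the "world records in a rush" case.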

