#1 |
"Sam"
Nov 2016
2×163 Posts |
I have been researching processors I could buy to speed up programs such as LLR (or even GIMPS) that support multithreading.

My first question: should I go for more threads or more cores if the goal is to improve LLR computation times (or something similar)? For instance, should I buy a 10-core / 20-thread CPU @ 2.4 GHz or a 6-core / 12-thread CPU @ 4.0 GHz?

The second question relates to the price of certain processor brands. Here is a list of processors (and MB combos) I am considering buying at some point. Some of the affordable Xeon processors (such as the x series) have a low frequency but more cores.

Out of curiosity, does anyone have a machine with either an AMD Ryzen Threadripper or an i9 (9th gen) series CPU? Thanks for the help and advice.
#2 | |
"Mihai Preda"
Apr 2015
31·43 Posts |
The number of memory channels is very important; on consumer CPUs it is either 2 or 4. Choosing a system (CPU/motherboard) with 4 RAM channels matters a lot for performance.
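To put a rough number on the memory-channel point, here is a minimal sketch of theoretical peak bandwidth. It assumes DDR4-2400 modules (an illustrative choice, not something stated in the thread) and the standard 64-bit (8-byte) channel width; real sustained bandwidth will be lower.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s (10^9 bytes per second).

    channels             -- number of populated memory channels
    mega_transfers_per_s -- module rating, e.g. 2400 for DDR4-2400 (assumed here)
    bus_bytes            -- channel width in bytes (64-bit channel = 8 bytes)
    """
    return channels * mega_transfers_per_s * 1e6 * bus_bytes / 1e9


dual = peak_bandwidth_gbs(2, 2400)
quad = peak_bandwidth_gbs(4, 2400)
print(f"dual-channel DDR4-2400: {dual:.1f} GB/s")
print(f"quad-channel DDR4-2400: {quad:.1f} GB/s")
```

With the same modules, going from 2 to 4 channels doubles the theoretical ceiling, which is why it matters once the FFT data no longer fits in cache.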
Another important thing to consider is the size of the last-level cache (LLC) on the CPU; a larger LLC is better.

About the 4 GHz clock: is it sustained (when all cores are active), or is it a boost clock (achieved only when a few cores are active)?
#3 |
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2·2,897 Posts |
For LLR it is worth considering the FFT lengths you will be testing. Smaller FFT lengths may fit in the CPU's L3 cache, making memory speed irrelevant. This is one advantage of running a test on multiple threads, as the L3 cache is shared. However, smaller tests don't run multi-threaded so well, so it is a balancing act.
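A rough way to check whether an FFT might fit in L3 is sketched below. It assumes the FFT data is stored as 8-byte doubles and counts only the data array itself; lookup tables and other buffers are ignored, so the real footprint is larger. The FFT lengths and the 16 MiB cache size are hypothetical examples.

```python
def fft_working_set_mib(fft_length: int, bytes_per_element: int = 8) -> float:
    """Approximate size of the FFT data array in MiB (assumes 8-byte doubles)."""
    return fft_length * bytes_per_element / (1024 * 1024)


def fits_in_cache(fft_length: int, l3_mib: float) -> bool:
    """True if the FFT data array alone is no larger than the L3 cache."""
    return fft_working_set_mib(fft_length) <= l3_mib


L3_MIB = 16  # hypothetical L3 size, for illustration only

for length in (256 * 1024, 4 * 1024 * 1024):  # hypothetical 256K and 4M FFT lengths
    size = fft_working_set_mib(length)
    print(f"FFT length {length}: {size:.0f} MiB, fits in {L3_MIB} MiB L3: {fits_in_cache(length, L3_MIB)}")
```

Under these assumptions a 256K FFT needs about 2 MiB and fits comfortably, while a 4M FFT needs about 32 MiB and has to go out to RAM.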
#4 | |
"Curtis"
Feb 2005
Riverside, CA
2²·1,151 Posts |
Conveniently, this per-cycle speed boost in Xeons coincided with DDR4; so, if you go Xeon, I strongly suggest you stick to a chip modern enough to require DDR4.

Quad-channel memory makes a *big* difference for most FFT sizes, but if you're doing sub-top5000 size work (say, Sophie Germains or CRUS work with small exponents) then Henry's note applies: the FFT fits in cache and memory channels won't matter as much.

If you're looking at modern-generation CPUs, then for a given raw GHz total, fewer cores will serve you better; in your example I'd pick 6 @ 4 GHz over 10 @ 2.4 GHz. All items that don't thread well will be faster, while it's hard to think of a workload that prefers more, slower cores for the same total GHz.

If you have workloads that don't saturate the memory bus, then the most total GHz often gets the most work done. I'd rather have a 10-core 2.4 GHz than a 6-core 3.3 GHz, for instance, though the choice is close for my workloads.
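The "total GHz" comparison is just cores multiplied by clock. A minimal sketch for the three configurations mentioned in this thread (it deliberately ignores memory bandwidth, cache, and threading overhead, so treat it as a first-order indicator only):

```python
def total_ghz(cores: int, clock_ghz: float) -> float:
    """Naive aggregate clock: cores times per-core clock speed."""
    return cores * clock_ghz


# Configurations discussed in the thread
configs = {
    "10 cores @ 2.4 GHz": (10, 2.4),
    "6 cores @ 4.0 GHz": (6, 4.0),
    "6 cores @ 3.3 GHz": (6, 3.3),
}

for name, (cores, clock) in configs.items():
    print(f"{name}: {total_ghz(cores, clock):.1f} total GHz")
```

The first two come out equal at 24 total GHz, which is where the preference for fewer, faster cores applies; the 6-core 3.3 GHz part lands at about 19.8, which is why the 10-core chip can win when memory is not the bottleneck.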
#5 | |
"Sam"
Nov 2016
326₁₀ Posts |
The AMD has 8 cores / 16 threads at a base speed of 3.7 GHz and a turbo (or max) speed of 4.3 GHz. I am not sure if you are supposed to overclock the processor to achieve the 4.3 GHz speed. The Intel processor has 8 cores / 8 threads at a base speed of 3.0 GHz and a turbo speed of 4.7 GHz. As with the AMD, I am not sure if you are supposed to overclock. I would consider overclocking both processors if necessary to achieve max speed, but do the benefits outweigh the risks?

For Xeon, I tried to find one that is compatible with DDR4. The Intel Xeon E5-2680 V3 has 12 cores / 24 threads at a base speed of 2.5 GHz and a max turbo speed of 3.3 GHz. If overclocking to 4.0 GHz were possible (and safe), I would pick this one. I know the basics of overclocking but have never attempted it myself.
#6 |
Sep 2002
Database er0rr
3,527 Posts |
Another consideration is that good ECC memory would almost dispense with the need for double checking.
#7 |
Feb 2016
UK
3×7×19 Posts |
AMD Ryzen 1000 and 2000 series CPUs have only half the FP performance (per core, per clock) of most Intel CPUs, so you're working at a big disadvantage there. The 3000 series due out tomorrow is expected to reach parity with Intel, so they will be the ones to look at. They have the potential to be the fastest consumer CPUs, and good value, for general prime number finding. Note I say in general, which is mostly smaller tasks, and not necessarily for GIMPS in particular with massive tasks.

Ballpark peak per-core, per-clock performance relative to recent Intel CPUs, where not limited by RAM bandwidth or other factors. Reality will likely be different, but this gives a good indication.

200% Skylake-X, some expensive Xeons (with 2 unit AVX-512)
100% Skylake, Kaby Lake, Coffee Lake
100% (estimate) Zen 2 (Ryzen 3000 series, excluding APUs)
88% Haswell
82% Broadwell
58% Sandy Bridge
50%-ish Zen 1, Zen+ (Ryzen 1000, 2000)

Assume any Intel CPU older than Sandy Bridge is half that. I never tested in detail.

As others have mentioned, if work doesn't fit in the CPU cache, you may be limited by RAM bandwidth.

Back on the original question. For a given cores * clock indication of potential performance, it generally works better with more clock than more cores, as there is some inefficiency in the multi-threading code that is complicated. However, I have a 14-core Xeon which runs at 2.4 GHz, and it still eats bigger work tasks, but scales badly for small ones, where running them on one or two cores each is generally better.

Specifically on the 2700X, don't expect it to overclock to 4.3 GHz. That generation usually hits a wall before then, and the voltage required to get it going means power consumption is through the roof. Consider also power efficiency.

And finally on HT/SMT, I've never proved it gives more peak throughput in this use case, but in some cases it can reduce losses from elsewhere. It can also increase power consumption even when not giving more performance.
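The ballpark per-core, per-clock figures above can be combined with cores and clock into a crude relative score. This is only a sketch: the scoring formula (cores times clock times relative factor) is an assumption for illustration, it ignores RAM bandwidth, cache, and multi-threading losses, and the three example configurations use base clocks quoted earlier in the thread, with the architecture labels being my own assignment.

```python
# Relative per-core, per-clock factors taken from the list in this post
RELATIVE_PER_CLOCK = {
    "Skylake-X": 2.00,
    "Skylake": 1.00,
    "Kaby Lake": 1.00,
    "Coffee Lake": 1.00,
    "Zen 2": 1.00,  # estimate, per the post
    "Haswell": 0.88,
    "Broadwell": 0.82,
    "Sandy Bridge": 0.58,
    "Zen+": 0.50,
    "Zen 1": 0.50,
}


def estimate(cores: int, clock_ghz: float, arch: str) -> float:
    """Crude peak score: cores * clock * relative per-clock factor (assumed model)."""
    return cores * clock_ghz * RELATIVE_PER_CLOCK[arch]


# Example configurations (architecture labels are assumptions for illustration)
examples = [
    ("8-core Zen+ @ 3.7 GHz", 8, 3.7, "Zen+"),
    ("8-core Coffee Lake @ 3.0 GHz", 8, 3.0, "Coffee Lake"),
    ("12-core Haswell @ 2.5 GHz", 12, 2.5, "Haswell"),
]

for name, cores, clock, arch in examples:
    print(f"{name}: {estimate(cores, clock, arch):.1f}")
```

A higher score only indicates more peak potential; whether it is reached depends on the factors the post lists, especially whether the work fits in cache.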
#8 |
Romulan Interpreter
Jun 2011
Thailand
3²×5×7×29 Posts |
Re thread title, definitely more cores. With higher clock speed come higher losses, i.e. less efficiency.
#9 |
"Robert Gerbicz"
Oct 2005
Hungary
2²×3×7×17 Posts |
#10 |
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
11B2₁₆ Posts |
#11 |
"Rashid Naimi"
Oct 2015
Remote to Here/There
1,979 Posts |
The number of cores/threads is a fixed value. Clock speed, on the other hand, is subject to throttling. I'd go with the higher number of cores. The inefficiency of multi-threaded calculation can be offset by running a single task per core/candidate rather than distributing a single task among multiple cores. That won't be suitable for world records in a rush, though.