mersenneforum.org  

Great Internet Mersenne Prime Search > Hardware
2019-07-05, 19:03 #1
carpetpool
"Sam"
Nov 2016
2·163 Posts

More cores or higher clock speed?

So I have been researching processors I could buy to speed up multithreaded programs such as LLR (or even GIMPS).

The first question: should I go for more cores or a higher clock speed if the goal is to speed up LLR computation (or something similar)? For instance, should I buy a 10-core/20-thread CPU at 2.4 GHz or a 6-core/12-thread CPU at 4.0 GHz?

The second question relates to the price of certain processor brands. Here is a list of processors (and motherboard combos) I am considering buying at some point. Some of the affordable Xeon processors (such as the x series) have low frequencies but more cores. Out of curiosity, does anyone have a machine with either an AMD Ryzen Threadripper or a 9th-gen Core i9 CPU?

Thanks for help and advice.
2019-07-05, 21:01 #2
preda
"Mihai Preda"
Apr 2015
5·271 Posts

The number of memory channels is very important; on consumer CPUs it is either 2 or 4. Choosing a system (CPU/motherboard) with 4 RAM channels matters a lot for performance.

Another important thing to consider is the size of the CPU's last-level cache (LLC); a larger LLC is better.

About the 4GHz clock, is it sustained (when all cores are active), or is it a boost clock (achieved only when a few cores are active)?
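To put rough numbers on the memory-channel point, here is a back-of-the-envelope sketch of theoretical peak DRAM bandwidth. It assumes each DDR4 channel is 64 bits (8 bytes) wide and performs one transfer per MT/s; DDR4-2666 is only an example speed, and the bandwidth you can actually sustain is noticeably lower than this peak.

```python
def peak_bandwidth_gb_s(channels: int, mt_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (10^9 bytes per second).

    Assumes a 64-bit (8-byte) bus per DDR4 channel; real sustained
    bandwidth is lower than this figure.
    """
    return channels * bus_bytes * mt_per_s * 1e6 / 1e9


if __name__ == "__main__":
    for channels in (2, 4):
        bw = peak_bandwidth_gb_s(channels, 2666)  # DDR4-2666, an example speed
        print(f"{channels} channels of DDR4-2666: {bw:.1f} GB/s peak")
```

At the same memory speed, going from 2 to 4 channels doubles the peak, which is what helps FFT sizes that spill out of cache.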

2019-07-05, 21:42 #3
henryzz
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
3³·7·31 Posts

For LLR it is worth considering the FFT lengths you will be testing. Smaller FFT lengths may fit in the CPU's L3 cache, making memory speed largely irrelevant. This is one advantage of running a test on multiple threads: the L3 cache is shared, so one test's data is more likely to fit than the data of several tests running side by side. However, smaller tests don't scale as well across threads, so it is a balancing act.
2019-07-06, 00:07 #4
VBCurtis
"Curtis"
Feb 2005
Riverside, CA

128B₁₆ Posts

Quote:
Originally Posted by carpetpool
The first question would be, should I go for more threads or more cores if the goal is to improve the speed and timing of LLR computation (or something similar)? For instance, should I buy a (10 core / 20 thread) @ 2.4 GHz OR a (6 core / 12 thread) @ 4.0 GHz if I were to buy one?
Thanks for help and advice.

For LLR and other assembly-optimized tasks, the generation of the chip matters quite a lot. Haswell was quite an upgrade per cycle over Sandy Bridge/Ivy Bridge, for instance; so if you had a chance to buy, say, an HP Z620 with dual Sandy Bridge Xeon CPUs, you would be disappointed with its LLR performance.

Conveniently, this per-cycle speed boost in Xeons coincided with DDR4; so if you go Xeon, I strongly suggest you stick to a chip modern enough to require DDR4. Quad-channel memory makes a *big* difference for most FFT sizes, but if you're doing work below top-5000 size (say, Sophie Germain searches or CRUS work with small exponents), then Henry's note applies: the FFT fits in cache and memory channels won't matter as much.

If you're looking at modern-generation CPUs, then for a given total GHz, fewer cores will serve you better; in your example I'd pick 6 cores @ 4 GHz over 10 cores @ 2.4 GHz. Anything that doesn't thread well will be faster, while it's hard to think of a workload that prefers more, slower cores for the same total GHz.
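One way to see why fewer, faster cores win at equal total GHz is Amdahl's law (a model chosen here for illustration, not one named in the thread). If a fraction p of a single test's work parallelizes perfectly and the rest is serial, the speedup on n cores is 1 / ((1 - p) + p / n). The sketch compares the two CPUs from the question (6 × 4.0 = 10 × 2.4 = 24 GHz total), assuming sustained clocks, equal per-clock performance and no memory bottleneck; p = 0.9 is an arbitrary, hypothetical value.

```python
def amdahl_speedup(cores: int, parallel_fraction: float) -> float:
    """Amdahl's law: speedup of one job spread over `cores` cores."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


def single_test_rate(cores: int, ghz: float, parallel_fraction: float) -> float:
    """Relative speed of one multithreaded test: clock times parallel speedup."""
    return ghz * amdahl_speedup(cores, parallel_fraction)


if __name__ == "__main__":
    p = 0.9  # hypothetical parallel fraction
    print("6 cores @ 4.0 GHz:", round(single_test_rate(6, 4.0, p), 2))
    print("10 cores @ 2.4 GHz:", round(single_test_rate(10, 2.4, p), 2))
```

With p = 1 (perfect scaling) both come out at 24, and any serial fraction at all tips the balance toward the 6-core part, which matches the advice above.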

If you have workloads that don't saturate the memory bus, then the most total GHz often gets the most work done. I'd rather have a 10-core at 2.4 GHz than a 6-core at 3.3 GHz, for instance, though the choice is close for my workloads.
2019-07-06, 02:08 #5
carpetpool
"Sam"
Nov 2016
2×163 Posts

Quote:
Originally Posted by preda
The number of memory channels is very important; on consumer CPUs it is either 2 or 4. Choosing a system (CPU/motherboard) with 4 RAM channels matters a lot for performance.

Another important thing to consider is the size of the last-level-cache on the CPU; a larger LLC is better.

About the 4GHz clock, is it sustained (when all cores are active), or is it a boost clock (achieved only when a few cores are active)?

That raises a good point. The processors I was going to compare are the AMD Ryzen 7 2700X and the Intel Core i7-9700F. Let's look at each of them:

The AMD has 8 cores / 16 threads at a base speed of 3.7 GHz and a turbo (max) speed of 4.3 GHz. I am not sure whether you have to overclock the processor to reach 4.3 GHz.

The Intel processor has 8 cores / 8 threads at a base speed of 3.0 GHz and a turbo speed of 4.7 GHz. As with the AMD, I am not sure whether overclocking is needed.

I would consider overclocking both processors if necessary to reach max speed, but do the benefits outweigh the risks?

For Xeon, I tried to find one that is compatible with DDR4. The Intel Xeon E5-2680 v3 has 12 cores / 24 threads at a base speed of 2.5 GHz and a max turbo speed of 3.3 GHz. If overclocking it to 4.0 GHz were possible (and safe), I would pick this one.

I know the basics of overclocking but never attempted to do it myself.
2019-07-06, 02:18 #6
paulunderwood
Sep 2002
Database er0rr
19×191 Posts

Another consideration is that good ECC memory would almost dispense with the need for double checking.
2019-07-06, 08:17 #7
mackerel
Feb 2016
UK
3·139 Posts

AMD Ryzen 1000 and 2000 series CPUs have only half the FP performance (per core, per clock) of most Intel CPUs, so you're working at a big disadvantage there. The 3000 series, due out tomorrow, is expected to reach parity with Intel, so those are the ones to look at. They have the potential to be the fastest and best-value consumer CPUs for prime number finding in general. Note that I say "in general", which mostly means smaller tasks, and not necessarily GIMPS with its massive tasks.

Ballpark peak performance per core, per clock, relative to recent Intel CPUs, where not limited by RAM bandwidth or other factors. Reality will likely differ, but this gives a good indication:
200% Skylake-X, some expensive Xeons (with 2 unit AVX-512)
100% Skylake, Kaby Lake, Coffee Lake
100% (estimate) Zen 2 (Ryzen 3000 series, excluding APUs)
88% Haswell
82% Broadwell
58% Sandy Bridge
50%-ish Zen 1, Zen+ (Ryzen 1000, 2000)
Assume any Intel CPU older than Sandy Bridge is half that. I never tested in detail.
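The table above can be turned into a crude throughput estimate: cores × clock × per-clock factor, with Skylake = 1.0. The sketch applies it to the CPUs mentioned earlier in the thread. Assumptions: the factors are the ballpark numbers from the table (Zen 2 is itself an estimate), the i7-9700F is treated as Coffee Lake and the Xeon E5-2680 v3 as Haswell, base clocks stand in for sustained all-core clocks, and RAM bandwidth, cache size and SMT are ignored, so this gives only a rough ranking.

```python
# Ballpark per-core, per-clock factors from the table (Skylake = 1.00).
PER_CLOCK = {
    "Skylake-X (2-unit AVX-512)": 2.00,
    "Skylake/Kaby Lake/Coffee Lake": 1.00,
    "Zen 2 (estimate)": 1.00,
    "Haswell": 0.88,
    "Broadwell": 0.82,
    "Sandy Bridge": 0.58,
    "Zen 1/Zen+": 0.50,
}


def estimated_throughput(cores: int, ghz: float, arch: str) -> float:
    """Relative throughput in 'Skylake-core GHz'; ignores memory limits and SMT."""
    return cores * ghz * PER_CLOCK[arch]


if __name__ == "__main__":
    # Base clocks from the thread, used as a conservative stand-in for all-core clocks.
    candidates = [
        ("Ryzen 7 2700X", 8, 3.7, "Zen 1/Zen+"),
        ("Core i7-9700F", 8, 3.0, "Skylake/Kaby Lake/Coffee Lake"),
        ("Xeon E5-2680 v3", 12, 2.5, "Haswell"),
    ]
    for name, cores, ghz, arch in candidates:
        print(f"{name}: {estimated_throughput(cores, ghz, arch):.1f}")
```

Whether the 12-core Xeon's estimate holds up in practice would depend on keeping its memory channels fed for the FFT sizes being tested.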

As others have mentioned, if the work doesn't fit in the CPU cache, you may be limited by RAM bandwidth.

Back to the original question. For a given cores × clock figure of potential performance, a higher clock generally works better than more cores, because the multithreading code is complicated and has some inefficiency. That said, I have a 14-core Xeon running at 2.4 GHz, and it still eats bigger work tasks; it scales badly on small ones, though, where running each task on one or two cores is generally better.

Specifically on the 2700X, don't expect it to overclock to 4.3 GHz. That generation usually hits a wall before then, and the voltage required to get there sends power consumption through the roof. Consider power efficiency too.

Finally, on HT/SMT: I've never proved that it gives more peak throughput in this use case, but in some cases it can reduce losses from elsewhere. It can also increase power consumption even when it doesn't give more performance.
2019-07-06, 08:20 #8
LaurV
Romulan Interpreter
Jun 2011
Thailand

9386₁₀ Posts

Re the thread title: definitely more cores. Higher clock speed comes with higher losses, i.e. less efficiency.
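A common first-order model behind this: dynamic power is roughly P ≈ C·V²·f, and higher clocks usually need higher voltage. If voltage is assumed to scale linearly with frequency (a crude simplification that ignores static power and real voltage/frequency curves), power grows roughly as f³ while work done grows only as f. The sketch compares two configurations with the same total GHz, assuming perfectly independent per-core tasks.

```python
def relative_power(cores: int, ghz: float) -> float:
    """Relative dynamic power under the crude model P ∝ cores · f³ (V ∝ f assumed)."""
    return cores * ghz**3


def relative_work_rate(cores: int, ghz: float) -> float:
    """Relative throughput, assuming perfectly independent per-core tasks."""
    return cores * ghz


if __name__ == "__main__":
    for cores, ghz in ((6, 4.0), (10, 2.4)):
        work = relative_work_rate(cores, ghz)
        power = relative_power(cores, ghz)
        print(f"{cores} x {ghz} GHz: work {work:.1f}, power {power:.1f}, "
              f"work per unit power {work / power:.4f}")
```

Under this model the 6 × 4.0 GHz configuration draws about (4.0 / 2.4)² ≈ 2.8 times the power of 10 × 2.4 GHz for the same throughput. Real chips deviate from the f³ curve, but the direction of the effect is the same.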
2019-07-06, 17:24 #9
R. Gerbicz
"Robert Gerbicz"
Oct 2005
Hungary

2²×5×73 Posts

Quote:
Originally Posted by paulunderwood
Another consideration is that good ECC memory would almost dispense with the need for double checking.

Or, even better, use my error check with PRP testing.
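For readers unfamiliar with it, here is a minimal educational sketch of the idea behind the Gerbicz error check, applied to a base-3 Fermat PRP test of N = 2^p - 1 using plain Python integers. With u_0 = 3 and u_(i+1) = u_i^2 mod N, the checksum d_t = u_0 · u_L · u_2L · ... · u_tL satisfies d_(t+1) = u_0 · d_t^(2^L) mod N, so a corrupted u makes that identity fail. If N is prime then 3^(N-1) = 1 mod N, and since N + 1 = 2^p, the final residue 3^(2^p) mod N must equal 9. Real implementations work in FFT arithmetic, verify only occasionally (on the order of every L^2 iterations) to keep the overhead small, and roll back to the last verified state on failure; this sketch verifies after every block and leaves a final partial block unchecked. The `inject_error_at` parameter is a hypothetical hook for simulating a hardware error.

```python
from typing import Optional


def gerbicz_prp(p: int, block: int = 16, inject_error_at: Optional[int] = None) -> bool:
    """Base-3 Fermat PRP test of N = 2**p - 1 with a Gerbicz-style check.

    Returns True if 3**(2**p) == 9 (mod N), i.e. N is a base-3 probable prime.
    Raises RuntimeError if a completed block fails the check.
    """
    n = (1 << p) - 1
    u0 = 3 % n
    u = u0          # u_i = 3**(2**i) mod n
    d = u0          # checksum d_t = u_0 * u_L * ... * u_{tL} mod n
    done = 0        # squarings performed so far
    while done < p:
        steps = min(block, p - done)
        prev_d = d
        for _ in range(steps):
            u = u * u % n
            done += 1
            if done == inject_error_at:
                u = (u + 1) % n  # hypothetical simulated hardware error
        if steps == block:
            d = d * u % n
            # Gerbicz identity: d_{t+1} == u_0 * d_t**(2**L)  (mod n)
            if d != u0 * pow(prev_d, 1 << block, n) % n:
                raise RuntimeError(f"check failed in block ending at squaring {done}")
    return u == 9 % n


if __name__ == "__main__":
    print([q for q in (3, 5, 7, 11, 13, 17, 19, 31, 61) if gerbicz_prp(q)])
    try:
        gerbicz_prp(31, block=4, inject_error_at=5)
    except RuntimeError as err:
        print("detected:", err)
```

The check costs extra squarings (here L per block), which is why real programs space the verifications out instead of checking every block.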
2019-07-06, 17:31 #10
petrw1
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
3·29·53 Posts

Quote:
Originally Posted by LaurV
Re the thread title: definitely more cores. Higher clock speed comes with higher losses, i.e. less efficiency.

+1
2019-07-06, 18:29 #11
a1call
"Rashid Naimi"
Oct 2015
Remote to Here/There

11111011100₂ Posts

The number of cores/threads is fixed. Clock speed, on the other hand, is subject to throttling. I'd go with a higher number of cores. The inefficiency of multithreaded calculation can be offset by running one task per core/candidate rather than distributing a single task among multiple cores. That won't be suitable for chasing world records in a rush, though.

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.