mersenneforum.org  

2018-01-08, 01:21   #1
jbpace
"Jon Pace"
Jan 2018
Germantown, TN
13 Posts

Help needed: M77 232 917 celebration build

I'm building a new PC to celebrate 'discovery' of the 50th Mersenne prime. I've been reading threads on the board (that I honestly don't fully comprehend) about AVX512 throttling & memory bandwidth bottlenecks, which makes me question component choices.

I'm not looking to optimize iterations/$ (i.e., power draw is irrelevant); rather, I'm seeking a workstation-class PC that also tackles first-time LL tests. The motherboard has already been sourced, and I'm not inclined to swap it (unless there's a really compelling reason), so assume an ASRock X299 Taichi XE.

I need guidance on CPU & memory selection from those that understand applicable choke points. I was headed for an i9-7940X (14 core) with 32GB Quad Channel DDR4-3200 (i.e. not cheap), but now I'm wondering if I can achieve 95% as much performance for half the $.

Do high core counts improve performance very much, or is bandwidth saturated at 4 to 8 cores? I think I want a CPU with 44 PCIe lanes; does that matter for Prime95 (I'll still need them for other reasons)?

I assume quad-channel memory provides the most bandwidth, but at what point are faster speeds not worth the price and hassle? The motherboard claims DDR4-4400 capability; is faster RAM a better option than more cores?

Thanks for the help; life has left me pretty far behind on current tech.

Jon

2018-01-08, 02:11   #2
ewmayer
2ω=0
Sep 2002
República de California
11639₁₀ Posts

Hi, Jon: I suggest you have a look at the nearby "New PC dedicated to Mersenne Prime Search" thread.

2018-01-08, 02:12   #3
Mysticial
Sep 2016
2²×83 Posts

Quote:
I need guidance on CPU & memory selection from those that understand applicable choke points. I was headed for an i9-7940X (14 core) with 32GB Quad Channel DDR4-3200 (i.e. not cheap), but now I'm wondering if I can achieve 95% as much performance for half the $.

Do high core counts improve performance very much, or is bandwidth saturated at 4 to 8 cores? I think I want a CPU with 44 PCIe lanes, or does that matter for Prime95 (I'll still need them for other reasons)?
In my experience, for LL-type workloads you're going to be bottlenecked on memory bandwidth with anything more than about 8 cores. I imagine that even the 6-core could be bottlenecked by a hypothetical AVX512-optimized P95 if the memory is clocked any lower than about 3200 MT/s.

Quote:
I assume quad channel memory provides the most bandwidth, but at what point are faster speeds not worth the price and hassle? The MB claims DDR4-4400 capability - is faster RAM a better option than more cores?
Most people with the highest-binned Samsung B-die memory will be able to achieve benchmark stability at 4200 MT/s, but in reality, absolute stability will likely be difficult above 3200 MT/s.

On my own system with a 7900X, 3800 MT/s seems to be stable, but in reality anything above 3200 MT/s will error at least once a month under sustained load. Your results may vary depending on the capability of the IMC (integrated memory controller).

2018-01-08, 03:22   #4
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
2932₁₀ Posts

Basically, what Mysticial said.

I'd probably get the i9-7900X. At its all-core turbo of 4 GHz, it will still be a bit memory bottlenecked with only quad channel DDR4-3200.

The i7-7820X is probably a better fit for that memory, but then you'll lose PCIe lanes (28 instead of 44). Prime95 doesn't care about PCIe lanes.

2018-01-09, 15:19   #5
jbpace
"Jon Pace"
Jan 2018
Germantown, TN
15₈ Posts

Quote:
Originally Posted by ewmayer
Hi, Jon: suggest you have a look at the nearby New PC dedicated to Mersenne Prime Search thread.
That's one of the threads I'd studied, hence the quad-channel RAM.


Quote:
Originally Posted by Mark Rose
I'd probably get the i9-7900X. At its all-core turbo of 4 GHz, it will still be a bit memory bottlenecked with only quad channel DDR4-3200.
This is the type of information I wish I could calculate! Is there any way of calculating (or estimating) the MT/s needed for X workers using Y cores running at Z GHz, or is it all word of mouth as different people experiment with different CPU/RAM combinations?


Here's an example I'd like to understand better: Aaron's Xeon server double-checked the prime in 37 hours vs. the ~140 hours my PC required. How much of that reduction was driven by:
  • more CPUs assigned to the worker (two workers can't process the same number, can they?)
  • higher memory bandwidth in server class machines
  • setting Prime95 to a higher priority (I'm assuming)

If server class bandwidth is my ultimate bottleneck, my next build will be completely different!

2018-01-09, 16:22   #6
axn
Jun 2003
13BB₁₆ Posts

Quote:
Originally Posted by jbpace
Here's an example I'd like to better understand - Aaron's Xeon server double-checked the prime in 37 hours vs. the ~140 my PC required. How much of that reduction was driven by:
  • more CPUs assigned to the worker (two workers can't process the same number, can they?)
  • higher memory bandwidth in server class machines
  • setting Prime95 to a higher priority (I'm assuming)

If server class bandwidth is my ultimate bottleneck, my next build will be completely different!
Those Xeon chips have a humongous L3 cache that can fit an entire FFT, so memory bandwidth will not be a concern for them (provided you only run one LL test at a time).
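To put rough numbers on axn's point, here is a minimal Python sketch that compares the raw data size of an FFT (assuming one 8-byte double per element) with an L3 cache size. The 4096K length is the one Mysticial profiles later in this thread; the cache sizes are illustrative assumptions: 13.75 MB is the i9-7900X's L3, while the 77 MiB "big Xeon" figure is hypothetical and is not Aaron's actual server. Prime95's real working set is likely somewhat larger, since auxiliary data such as weights and sin/cos tables are not counted here.

```python
# Rough check: does the raw FFT data array fit in L3?
# ASSUMPTION: one 64-bit double per FFT element; auxiliary tables are ignored.
BYTES_PER_DOUBLE = 8
MIB = 1024 * 1024


def fft_data_mib(fft_length_k: int) -> float:
    """Raw size in MiB of an FFT of fft_length_k * 1024 doubles."""
    return fft_length_k * 1024 * BYTES_PER_DOUBLE / MIB


def fits_in_l3(fft_length_k: int, l3_mib: float) -> bool:
    """True if the raw FFT data alone is no larger than the given L3 size."""
    return fft_data_mib(fft_length_k) <= l3_mib


if __name__ == "__main__":
    fft_k = 4096  # the 4096K FFT profiled later in the thread
    # Illustrative L3 sizes, treated as MiB; the Xeon figure is hypothetical.
    caches = {"i9-7900X (13.75)": 13.75, "hypothetical big Xeon (77)": 77.0}
    print(f"{fft_k}K FFT data: {fft_data_mib(fft_k):.0f} MiB")
    for name, l3 in caches.items():
        print(f"  {name}: fits = {fits_in_l3(fft_k, l3)}")
```

On these assumptions a 4096K FFT needs 32 MiB for its data alone, which is more than a desktop Skylake-X L3 can hold but within reach of a large multi-socket Xeon's combined L3.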

2018-01-09, 17:09   #7
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
2932₁₀ Posts

Quote:
Originally Posted by jbpace
This is the type information I wish I could calculate! Is there any way of calculating (or estimating) MT/s needed for X workers using Y cores running at Z GHz, or is it all word of mouth as different people experiment with different CPU/RAM combinations?
Benchmarking and extrapolation. Experience shows the optimum is about a 3 to 2 ratio of Skylake core MHz to DDR4 MHz. So a quad-core at 3.3 GHz would be happy with dual channel DDR4-2200.
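Mark's rule of thumb can be written as a small Python helper. His example (and kladner's calculation below) is for a quad-core on dual-channel memory, i.e. two cores per channel. Scaling the result linearly with cores per channel for other layouts is an assumption added here, not something stated in the post, so treat those outputs as rough extrapolations.

```python
def suggested_ddr4_mts(core_mhz: float, cores: int = 4, channels: int = 2) -> float:
    """DDR4 data rate (MT/s) suggested by a 3:2 core-MHz-to-DDR4 ratio.

    The 3:2 ratio is stated for a quad-core on dual-channel memory
    (2 cores per channel). ASSUMPTION: for other layouts, scale linearly
    with cores per channel. That extrapolation is not from the original post.
    """
    base = core_mhz * 2 / 3              # quad-core, dual-channel case
    cores_per_channel = cores / channels
    return base * cores_per_channel / 2  # normalise to 2 cores per channel


if __name__ == "__main__":
    print(round(suggested_ddr4_mts(3300)))         # quad-core @ 3.3 GHz, dual channel
    print(round(suggested_ddr4_mts(4300)))         # 6700K @ 4.3 GHz
    print(round(suggested_ddr4_mts(4000, 10, 4)))  # 10 cores @ 4.0 GHz, quad channel
```

Under the linear-scaling assumption, the last case (a 10-core chip at 4 GHz on four channels) asks for slightly more than DDR4-3200, which would fit Mark's earlier remark that a 4 GHz i9-7900X is a bit memory-bottlenecked on quad-channel DDR4-3200.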

Quote:
  • setting Prime95 to a higher priority (I'm assuming)
This makes almost no difference.

2018-01-09, 17:23   #8
kladner
"Kieren"
Jul 2011
In My Own Galaxy!
2·3·1,693 Posts

Quote:
Originally Posted by Mark Rose
Benchmarking and extrapolation. Experience shows the optimum is about a 3 to 2 ratio of Skylake core MHz to DDR4 MHz. So a quad-core at 3.3 GHz would be happy with dual channel DDR4-2200.
Thanks for the ratio, Mark. It suggests that my 6700K at 4300 MHz has headroom with RAM at 3200 MHz (4300 × 2/3 ≈ 2867 MHz).

2018-01-09, 18:04   #9
mackerel
Feb 2016
UK
2⁴·3³ Posts

Quote:
Originally Posted by jbpace
This is the type information I wish I could calculate! Is there any way of calculating (or estimating) MT/s needed for X workers using Y cores running at Z GHz, or is it all word of mouth as different people experiment with different CPU/RAM combinations?
I did a similar exercise to Mark Rose in the past, but I ended up with a different conclusion. What I figured out was:
Score = (ideal RAM bandwidth in GB/s) / (all-core AVX-active clock in GHz * number of cores * CPU correction factor)

Ideal RAM bandwidth is approx. rated MT/s * channels * 8 / 1000 (each 64-bit channel moves 8 bytes per transfer).
CPU correction factor is 1 for Skylake and later, 0.88 for Haswell, 0.82 for Broadwell, 0.58 for Sandy Bridge, and I'm assuming 0.5 for Ryzen, though I haven't tested that in depth.

I don't have the exact values on me, but the score above is "higher is better" in a non-linear way. A score of around 3 would give about 90% of the performance of a practically RAM-unlimited system, and a score of around 4 about 95%. Performance dropped off quite steeply below 3. RAM ranks per channel should also be considered, but I can't remember whether the above included or excluded that effect. I have a figure in my head of up to a 20% benefit from dual-rank RAM vs. single-rank RAM in a severely RAM-bandwidth-limited scenario.
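The score can be sketched in Python as follows. The formula and correction factors are taken from mackerel's description; the example configurations are illustrative: 4.0 GHz for the i9-7900X is Mysticial's AVX clock from later in the thread, 3.8 GHz for the i7-7800X is mackerel's stock all-core AVX figure, and populating all four channels on the 7800X is an assumption. RAM rank is not modelled.

```python
# Sketch of mackerel's empirical RAM-headroom score, as described in the post.
# The correction factors are his; the Ryzen value is his own untested guess.
CPU_CORRECTION = {
    "skylake": 1.0,       # Skylake and later
    "haswell": 0.88,
    "broadwell": 0.82,
    "sandy_bridge": 0.58,
    "ryzen": 0.5,         # untested assumption per the post
}


def ideal_ram_bandwidth_gbs(mt_per_s: float, channels: int) -> float:
    """Ideal bandwidth in GB/s: 8 bytes per transfer on each 64-bit channel."""
    return mt_per_s * channels * 8 / 1000


def ram_score(mt_per_s: float, channels: int, avx_clock_ghz: float,
              cores: int, arch: str = "skylake") -> float:
    """Higher is better: ~3 was reported as ~90% of RAM-unlimited speed, ~4 as ~95%.

    RAM rank is not modelled.
    """
    bandwidth = ideal_ram_bandwidth_gbs(mt_per_s, channels)
    return bandwidth / (avx_clock_ghz * cores * CPU_CORRECTION[arch])


if __name__ == "__main__":
    configs = [
        # (label, MT/s, channels, all-core AVX GHz, cores)
        ("i9-7900X @ 4.0 GHz, quad DDR4-3200", 3200, 4, 4.0, 10),
        ("i7-7800X @ 3.8 GHz, DDR4-3200 (quad channel assumed)", 3200, 4, 3.8, 6),
        ("i7-6700K @ 4.0 GHz, dual DDR4-3200", 3200, 2, 4.0, 4),
    ]
    for label, mts, channels, ghz, cores in configs:
        print(f"{label}: score {ram_score(mts, channels, ghz, cores):.2f}")
```

With these inputs the 10-core 7900X scores about 2.6, below the point where mackerel saw performance fall off steeply, while the 6-core 7800X scores about 4.5. That lines up with the thread's advice, but the numbers are only as good as the empirical fit behind them.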

The following CPUs all have 4 cores, fixed at 4 GHz. My main system, a 6700K with dual-channel, dual-rank DDR4-3200, is noticeably faster than my other 6700K with dual-channel, dual-rank DDR4-2666, or my 6600K with dual-channel, single-rank DDR4-3000. I'm currently testing an 8350K with RAM manually overclocked to 3866 (dual channel, single rank), and it is faster than all of the others. L3 cache size doesn't make a significant difference between these for large FFT tasks. I can get some numbers later for a better indication. All these systems are currently running the SoB double-check challenge at PrimeGrid, and I compared timings for 2M FFT work units, which I can look up later.

2M FFT results per day (SR = single-rank, DR = dual-rank RAM):
8350K (3866 SR RAM) - 2.02 results per day
6700K (3200 DR RAM) - 1.85 results per day (this is my main system, which is in daily use, so its timings may be affected)
6700K (2666 DR RAM) - 1.78 results per day
6600K (3000 SR RAM) - 1.61 results per day
Timings for each of these varied by approx. 4% from slowest to fastest.

7800X (3200 SR RAM) - 2.60 results per day (stock clocks, 3.8 GHz all-core AVX)
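As a rough cross-check, the Python sketch below applies mackerel's score formula to the four quad-core systems above and prints each one's measured throughput relative to the fastest. The data are copied from the post; the 7800X is left out because its channel count isn't stated. The formula ignores RAM rank, which he flags separately: for example, the single-rank 6600K scores higher than the dual-rank DDR4-2666 6700K yet completes fewer results per day, which is consistent with his remark about rank.

```python
# mackerel's measured 2M-FFT throughput (results/day); all 4 cores fixed at 4 GHz.
# Data copied from the post; the score formula is his; rank is NOT modelled.
SYSTEMS = [
    # (name, DDR4 MT/s, channels, rank, results per day)
    ("8350K", 3866, 2, "SR", 2.02),
    ("6700K", 3200, 2, "DR", 1.85),
    ("6700K", 2666, 2, "DR", 1.78),
    ("6600K", 3000, 2, "SR", 1.61),
]
CORES, CLOCK_GHZ, CORRECTION = 4, 4.0, 1.0  # Skylake-family correction factor


def score(mt_per_s: float, channels: int) -> float:
    """Ideal bandwidth (GB/s) divided by clock * cores * correction factor."""
    bandwidth_gbs = mt_per_s * channels * 8 / 1000
    return bandwidth_gbs / (CLOCK_GHZ * CORES * CORRECTION)


if __name__ == "__main__":
    best = max(s[-1] for s in SYSTEMS)
    for name, mts, ch, rank, rpd in SYSTEMS:
        print(f"{name} {mts} {rank}: score {score(mts, ch):.2f}, "
              f"{rpd:.2f} results/day ({rpd / best:.0%} of fastest)")
```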

2018-01-09, 18:31   #10
Mysticial
Sep 2016
2²·83 Posts

So I ran Prime95 under VTune last night for an unrelated reason.

With the 4096K FFT, it was completely memory-bound.

Core i9 7900X @ 4.0 GHz AVX (11% overclock)
Quad Channel DDR4 @ 3200 MT/s (20% overclock)
Cache @ 3.0 GHz (25% overclock)

I don't have the bandwidth graph on hand to show it. But basically, the system was maxed out at 70+ GB/s for the entire test without ever dropping below that.
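For context, the theoretical peak of the memory setup listed above can be computed the same way mackerel computes "ideal" bandwidth (8 bytes per transfer per 64-bit channel). This minimal Python sketch takes the 70 GB/s figure from the post as a lower bound and shows it is roughly two-thirds of that peak.

```python
def peak_ddr4_bandwidth_gbs(mt_per_s: float, channels: int) -> float:
    """Theoretical peak in GB/s: 8 bytes per transfer on each 64-bit channel."""
    return mt_per_s * channels * 8 / 1000


if __name__ == "__main__":
    peak = peak_ddr4_bandwidth_gbs(3200, 4)  # quad-channel DDR4-3200, as in the post
    observed = 70.0                          # "70+ GB/s" reported under VTune
    print(f"peak {peak:.1f} GB/s, observed >= {observed:.0f} GB/s "
          f"({observed / peak:.0%} of peak)")
```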

So if you're looking to maximize price/performance for LL testing, I'd recommend the 7800X. It's already enough as it is, and it will be more than overkill once George gets AVX512 support in.

This is not unexpected since we've already calculated approximately a 4x performance gap between CPU and memory: http://www.mersenneforum.org/showthr...222#post469222

2018-01-09, 21:06   #11
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
2²·733 Posts

mackerel has done more testing than I have, so I would take his numbers over mine.

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.