mersenneforum.org  

Old 2020-07-25, 08:07   #1
lukerichards
"Luke Richards"
Jan 2018
Birmingham, UK

2⁵·3² Posts
AMD Ryzen 7 3700X?

Hi all,

So for reasons unbeknown to me, I've started binge watching Linus Tech Tips on YouTube. (I had no intention of building a new PC, although Linus may have given me the bug a bit.)

Anyhoo, he's been raving a lot recently about how AMD are having a revival and finally overtaking Intel in terms of gaming performance.

The frustrating thing for me when I hear these reviews is that I'm not interested in gaming performance. If I were to build a PC (and I'm not in the market right now, but I might be in 12 months) then my main focus for the CPU would be Prime95 performance. P95, as we all know, is heavily Intel-optimised.

So out of curiosity I went over to the CPU benchmarks section of the Mersenne.org website and was somewhat surprised by what I found.

AMD Ryzen™ 7 3700X - £261 on Amazon UK
https://www.mersenne.org/report_benc...ic_cpu=4384178

Intel(R) Core(TM) i9-10920X - £690 on Amazon UK
https://www.mersenne.org/report_benc...ic_cpu=4384796

Intel Core i9-9900K - £450 on Amazon UK
https://www.mersenne.org/report_benc...ic_cpu=4382946



So the Ryzen is clearly up there with these high end Intel CPUs.

What gives? Is there something I've missed here?
Old 2020-07-25, 08:44   #2
Viliam Furik
 
Jul 2018
Martin, Slovakia

134₁₀ Posts

Ryzen processors have a really good price/performance ratio, and also a larger L3 cache, which AFAIK results in better speeds in Prime95, because the whole FFT data can stay in the fast processor cache instead of wandering back and forth to the slower RAM. (If I am wrong, somebody, please, correct me on this.) The 3900X is especially good, with its 64 MB of L3 cache that can hold one test up to ~120M exponent size.

But if you want to build your computer a year from now, I would certainly go for Ryzen 4000, which will launch in the fall. Go for the top model in the not-yet-Threadripper line. I have done some calculations that show the 3900X having the best performance per watt and per unit cost, compared to the TR 3970X and TR 3990X. However, I didn't do the performance tests myself to get the numbers, so I just guesstimated the wattage of the Threadrippers.
Old 2020-07-25, 08:53   #3
mackerel
Feb 2016
UK

5×7×11 Posts

There are multiple factors that all feed into the overall performance.

For scenarios where the cores are not limited by cache/ram, normalised for cores and clock, Zen 2 is about 5.7% faster than Skylake (which includes Coffee Lake, Comet Lake, and more). Skylake-X/Cascade Lake-X has AVX-512. I've only recently tried to compare and don't fully trust the numbers as precise, but it is about 75-80% faster than Skylake. This is actually lower than I thought, as in the past I had seen closer to 100%. I don't know what's changed, or if I tested with a different methodology previously. The above tests were performed in the last few days using the SGS project at Primegrid. These are very small units so won't be ram or cache limited, but I do wonder if they're so small that the faster CPUs can't reach their best efficiency. I'll retest with some bigger tasks another time, as I'm running a challenge at Primegrid right now.

The 3700X has 2x16MB of L3 cache, with a further 0.5MB/core of L2 that could be added, as they run exclusive cache. The split CCX structure means best performance is achieved if tasks fit in each CCX and run separately (up to around 2048k FFT). For larger tasks this is not possible, and the internal bus speed (FCLK) and ram speed will affect performance. Best performance is usually achieved if these are kept in sync and as high as possible (typically 3600 ram).

10920X is more complicated. It has 19.25MB of L3 cache, but through observation it seems applicable to also count the non-inclusive L2 cache of 1MB/core, for a total effective 31.25MB. It should do well for most tasks, combined with its quad channel ram support. The single benchmark linked seems unremarkable, but we don't know how it was configured. A drawback of AVX-512 is that while it can provide massive throughput, it also takes massive power while doing so. Clocks tend to run lower while it is in operation, which offsets the gains. My 7920X runs around 2.9 GHz for this type of work.

The 9900K is relatively simple, with 16MB of L3 cache. So single tasks beyond 2048k will start hitting ram, and fast ram will provide a good benefit there.
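
As a rough sanity check on the cache sizes above, here is a back-of-envelope sketch in Python. It assumes the FFT data is stored as one 8-byte double per element and ignores twiddle factors and any other working data, so it is only an approximation. The function names are made up for illustration and are not part of Prime95.

```python
# Back-of-envelope check: does an FFT of a given length fit in a given cache?
# Assumption: one 8-byte double per FFT element; twiddle factors and other
# working data are ignored, so real requirements are somewhat higher.

BYTES_PER_ELEMENT = 8
MIB = 1024 * 1024


def fft_size_mib(fft_length_k: int) -> float:
    """Approximate FFT data size in MiB for a length given in K (1K = 1024)."""
    return fft_length_k * 1024 * BYTES_PER_ELEMENT / MIB


def fits_in_cache(fft_length_k: int, cache_mib: float) -> bool:
    """True if the estimated FFT data fits in a cache of cache_mib MiB."""
    return fft_size_mib(fft_length_k) <= cache_mib


# Cache sizes quoted in this thread (treating MB as MiB for simplicity).
caches = {
    "9900K L3": 16.0,
    "3700X L3, one CCX": 16.0,
    "10920X L3 + L2": 31.25,
}

for fft_k in (2048, 2800):
    for name, size in caches.items():
        verdict = "fits" if fits_in_cache(fft_k, size) else "spills to ram"
        print(f"{fft_k}K FFT (~{fft_size_mib(fft_k):.1f} MiB) in {name}: {verdict}")
```

Under these assumptions a 2048K FFT comes to 16 MiB, which lines up with the ~2048k limit mentioned for a 16MB L3.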

If I were asked what is the most cost effective system, it is a difficult one to answer as it will be influenced by what else you put around the system.

We also have rumours that next gen Ryzen (Zen 3) will have a bigger CCX of 8 cores. This removes a barrier and unifies the cache within a die, and should allow it to attain even more performance in more use cases. I'm excited to get one and try it out as soon as they're released.
Old 2020-07-25, 09:15   #4
M344587487
"Composite as Heck"
Oct 2017

631 Posts

P95 might be more optimised for Intel than AMD because Intel chips have been the dominant examples of x86 for the majority of recent memory, but P95 is really optimised for x86 in general. There was an effort to implement the Intel-only AVX512 and I think there were gains, but the downclocking and power usage that occur when AVX512 is used made the results a little anti-climactic compared to what you might expect. There are differences like certain instructions taking a different number of cycles to complete on different architectures, but there are wide differences even within Intel's numerous architectures, so I don't think even P95 optimises fully down to a per-cycle basis.

Any Ryzen from Zen 2 onwards has full AVX2 support and oodles of cache, so it's a good choice for P95. This post and the thread it's from have data on throughput vs Intel ( https://www.mersenneforum.org/showpo...3&postcount=61 ). In the end Intel and AMD both hit a memory bandwidth bottleneck, but AFAIK the cache makes AMD a throughput winner, as less bandwidth is required for a given amount of work (I know nothing about the latest Intel chips; they've made alterations to cache I'm not up to speed on). You may not be interested in gaming, but the gamers' favourite, the 3600, is probably the best Zen 2 chip from a price-to-performance viewpoint, though it depends on the rest of the system.

The CPUs coming out this year based on Zen 3 (Ryzen 4000) should be interesting. There are the usual generational IPC and clock gains, but they are also unifying the L3 cache, so you have 32MiB shared per chiplet instead of the current Zen 2, which has 16MiB shared per quad core. This means that the cache can be more effectively utilised on the same workload and may provide quite a performance boost over Zen 2. Worth watching at any rate.

But really if you're buying hardware specifically for PRP or P-1 you should get the Radeon VII, it kicks the crap out of anything else on a price-to-throughput, price-to-efficiency and price-to-density basis.

Last fiddled with by M344587487 on 2020-07-25 at 09:20 Reason: finish sentences
Old 2020-07-25, 10:14   #5
S485122
Sep 2006
Brussels, Belgium

3046₈ Posts

Quote:
Originally Posted by mackerel
...
10920X is more complicated. It has 19.25MB of L3 cache, but through observation it seems applicable to also count the non-inclusive L2 cache of 1MB/core, for a total effective 31.25MB. It should do well for most tasks, combined with its quad channel ram support. The single benchmark linked seems unremarkable, but we don't know how it was configured. A drawback of AVX-512 is that while it can provide massive throughput, it also takes massive power while doing so. Clocks tend to run lower while it is in operation, which offsets the gains. My 7920X runs around 2.9 GHz for this type of work.
...
I think that benchmark for the i9-10920X comes from my computer. It was submitted while the OS was Windows 7 which implies no support for Intel processors after the 6th generation and especially no AVX-512 as is revealed by the available features on the benchmark.

At the moment, on 2800K FFTs, the timing for one worker on twelve cores is 0.9 ms/iteration. The CPU is under-clocked (no turbo and a power cap of 140 W instead of 165 W) to keep the noise low, the temperature below 55 °C and the power usage reasonable at the same time.
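
To put that per-iteration time into perspective: a primality test of a Mersenne number with exponent p needs roughly p iterations. The sketch below turns a timing into an estimated duration. The exponent is a hypothetical value chosen only for illustration, not taken from Prime95's FFT size table, and `estimate_hours` is a made-up helper name.

```python
# Rough wall-clock estimate for one test from a per-iteration timing.
# Assumption: a test of exponent p takes about p iterations.


def estimate_hours(exponent: int, ms_per_iteration: float) -> float:
    """Estimated duration of one test in hours."""
    return exponent * ms_per_iteration / 1000.0 / 3600.0


# Hypothetical exponent, chosen only for illustration.
EXPONENT = 50_000_000
MS_PER_ITERATION = 0.9  # the timing quoted above

print(f"about {estimate_hours(EXPONENT, MS_PER_ITERATION):.1f} hours per test")
```

With these inputs the estimate comes out at about half a day per test.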

I didn't act on the fact that that benchmark had been uploaded : I was busy installing Windows 10 and trying to make the interface more ergonomic and also to close the data leaks (as far as possible :´-(

I just marked the benchmark as suspect.

Jacob
Old 2020-07-25, 18:33   #6
lukerichards
"Luke Richards"
Jan 2018
Birmingham, UK

120₁₆ Posts

Quote:
Originally Posted by S485122
I think that benchmark for the i9-10920X comes from my computer. It was submitted while the OS was Windows 7 which implies no support for Intel processors after the 6th generation and especially no AVX-512 as is revealed by the available features on the benchmark.

At the moment, on 2800K FFTs, the timing for one worker on twelve cores is 0.9 ms/iteration. The CPU is under-clocked (no turbo and a power cap of 140 W instead of 165 W) to keep the noise low, the temperature below 55 °C and the power usage reasonable at the same time.

I didn't act on the fact that that benchmark had been uploaded : I was busy installing Windows 10 and trying to make the interface more ergonomic and also to close the data leaks (as far as possible :´-(

I just marked the benchmark as suspect.

Jacob
I'm unaware how the benchmark database works... does this mean that there will be a new benchmark for i9-10920X soon?
Old 2020-07-25, 19:31   #7
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

1101111110110₂ Posts

Quote:
Originally Posted by lukerichards
I'm unaware how the benchmark database works... does this mean that there will be a new benchmark for i9-10920X soon?
Do not trust the benchmark database to make any purchasing decisions. It shows the per-iteration times for one exponent (I believe on just one core).

The more important number is throughput: how many iterations/second a CPU can produce with all cores running on the optimal number of workers.
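
A rough sketch of what that means in practice: total throughput is the sum over workers of 1000 / (ms per iteration), and the best configuration is whichever worker layout maximises that sum. The timings below are invented for a hypothetical 8-core CPU and are not measurements from any real chip. `throughput` is a made-up helper name.

```python
from typing import Dict, Sequence


def throughput(ms_per_iteration: Sequence[float]) -> float:
    """Total iterations/second across workers, given each worker's ms/iteration."""
    return sum(1000.0 / ms for ms in ms_per_iteration)


# Invented timings for a hypothetical 8-core CPU (not real benchmark data).
configs: Dict[str, Sequence[float]] = {
    "1 worker x 8 cores": [1.0],
    "2 workers x 4 cores": [1.6, 1.6],
    "8 workers x 1 core": [9.0] * 8,
}

# Pick the layout with the highest total throughput.
best = max(configs, key=lambda name: throughput(configs[name]))

for name, timings in configs.items():
    print(f"{name}: {throughput(timings):.0f} iterations/second")
print("best:", best)
```

In this made-up example the single worker has the fastest per-iteration time, yet two workers give more total throughput, which is why a one-exponent timing alone can mislead.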
Old 2020-07-25, 19:44   #8
lukerichards
"Luke Richards"
Jan 2018
Birmingham, UK

2⁵·3² Posts

Quote:
Originally Posted by Prime95
Do not trust the benchmark database to make any purchasing decisions. It shows the per-iteration times for one exponent (I believe on just one core).

The more important number is throughput. How many iterations/second can a CPU produce with all cores running on the optimal number of workers.
Thanks. I had assumed it was a single core value. That is, I'd assumed that the value on a dual core CPU should not be directly compared with one on an eight core CPU.
Old 2020-07-25, 19:47   #9
lukerichards
"Luke Richards"
Jan 2018
Birmingham, UK

2⁵×3² Posts

Quote:
Originally Posted by M344587487
But really if you're buying hardware specifically for PRP or P-1 you should get the Radeon VII, it kicks the crap out of anything else on a price-to-throughput, price-to-efficiency and price-to-density basis.
I have always been nervous about looking into buying a GPU for prime crunching on account of not having the foggiest idea what software can and should be used for what purpose, or even where to find this information.

I should probably get over that at some point.
Old 2020-07-25, 20:33   #10
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

2×3×1,193 Posts

Quote:
Originally Posted by lukerichards
I have always been nervous about looking into buying a GPU for prime crunching on account of not having the foggiest idea what software can and should be used for what purpose, or even where to find this information.

I should probably get over that at some point.
The problem is actually the opposite. If you open the gpuowl threads, prepare to be overwhelmed.

Just so you know, "kicks the crap" means 10 times the throughput of Intel and AMD offerings that use dual-channel memory. Maybe 5 times for Intel's pricey offerings with quad-channel memory. Not sure about AMD's large L3 cache offerings -- maybe 6 times the throughput.

The downside is getting your hands on a Radeon VII is no longer easy.
Old 2020-07-28, 03:53   #11
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013

7·409 Posts

Quote:
Originally Posted by Prime95
The downside is getting your hands on a Radeon VII is no longer easy.
There are two on eBay right now

All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.