mersenneforum.org


lukerichards 2020-07-25 08:07

AMD Ryzen 7 3700X?
 
Hi all,

So for reasons unbeknown to me, I've started binge-watching Linus Tech Tips on YouTube. (I had no intention of building a new PC, although Linus may have given me the bug a bit.)

Anyhoo, he's been raving a lot recently about how AMD are having a revival and finally overtaking Intel in terms of gaming performance.

The frustrating thing for me when I hear these reviews is that I'm not interested in gaming performance. If I were to build a PC (and I'm not in the market right now, but I might be in 12 months), then my main focus for the CPU would be Prime95 performance. P95, as we all know, is heavily Intel-optimised.

So out of curiosity I went over to the CPU benchmarks section of the Mersenne.org website and was somewhat surprised by what I found.

AMD Ryzen™ 7 3700X - £261 on Amazon UK
[url]https://www.mersenne.org/report_benchmarks/?exp_date=2016-01-01&64bit=1&exover=1&exbad=1&exv25=1&exv26=1&specific_cpu=4384178[/url]

Intel(R) Core(TM) i9-10920X - £690 on Amazon UK
[url]https://www.mersenne.org/report_benchmarks/?exp_date=2016-01-01&64bit=1&exover=1&exbad=1&exv25=1&exv26=1&specific_cpu=4384796[/url]

Intel Core i9-9900K - £450 on Amazon UK
[url]https://www.mersenne.org/report_benchmarks/?exp_date=2016-01-01&64bit=1&exover=1&exbad=1&exv25=1&exv26=1&specific_cpu=4382946[/url]



So the Ryzen is clearly up there with these high end Intel CPUs.

What gives? Is there something I've missed here?

Viliam Furik 2020-07-25 08:44

Ryzen processors have a really good price/performance ratio and also larger L3 caches, which AFAIK results in better speeds in Prime95, because the whole FFT data set can stay in the fast processor cache instead of wandering back and forth to the slower RAM. (If I am wrong, somebody please correct me on this.) The 3900X is especially great, with its 64 MB of L3 cache that can hold one test up to an exponent of ~120M.
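As a rough illustration of the cache point, here is a small Python sketch that estimates how much FFT data one test needs. It assumes about 18 bits of the exponent per double-precision FFT element, which is an illustrative figure and not Prime95's actual limit, and it ignores the extra tables a real FFT keeps. The function names are made up for this example.

```python
import math

# ASSUMPTION (illustrative only): ~18 bits of the exponent are packed into
# each double-precision FFT element. Prime95's real limits vary with the
# FFT length and the CPU's code path.
ASSUMED_BITS_PER_WORD = 18.0
BYTES_PER_DOUBLE = 8


def fft_words_needed(exponent: int, bits_per_word: float = ASSUMED_BITS_PER_WORD) -> int:
    """Approximate number of FFT elements needed for a test of 2^exponent - 1."""
    return math.ceil(exponent / bits_per_word)


def fft_data_mib(exponent: int, bits_per_word: float = ASSUMED_BITS_PER_WORD) -> float:
    """Approximate size of the FFT data alone, in MiB (ignores extra tables)."""
    return fft_words_needed(exponent, bits_per_word) * BYTES_PER_DOUBLE / 2**20


size = fft_data_mib(120_000_000)
print(f"~{size:.1f} MiB of FFT data; fits in 64 MiB: {size < 64}")
```

Under that assumption a 120M exponent needs roughly 51 MiB of FFT data, which is consistent with the ~120M figure for 64 MB of L3.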

But if you want to build your computer a year from now, I would certainly go for Ryzen 4000, which will launch in the fall. Go for the top model in the non-Threadripper line. I have done some calculations that show the 3900X having the best performance per watt and per unit cost, compared to the TR 3970X and TR 3990X. However, I didn't run the performance tests myself to get the numbers, so I just guesstimated the wattage of the Threadrippers.

mackerel 2020-07-25 08:53

There are multiple factors that all feed into the overall performance.

For scenarios where the cores are not limited by cache or RAM, normalised for core count and clock, Zen 2 is about 5.7% faster than Skylake (which includes Coffee Lake, Comet Lake and more). Skylake-X/Cascade Lake-X has AVX-512. I've only just tried to compare them recently and don't fully trust the numbers as precise, but with AVX-512 it is about 75-80% faster than Skylake. This is actually lower than I thought, as in the past I had seen closer to 100%. I don't know what's changed, or whether I used a different methodology previously. The above tests were performed in the last few days using the SGS project at PrimeGrid. These are very small units, so they won't be RAM or cache limited, but I do wonder if they're so small that the faster CPUs can't reach their best efficiency. I'll retest with some bigger tasks another time, as I'm running a challenge at PrimeGrid right now.
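The "normalised for cores and clock" comparison amounts to dividing throughput by cores × GHz before comparing two chips. A minimal sketch with made-up measurements (the function names and numbers are hypothetical); it only makes sense in the regime described above, where the work is not cache or RAM limited, so throughput scales with cores and clock.

```python
def per_core_per_ghz(throughput: float, cores: int, clock_ghz: float) -> float:
    """Throughput normalised by core count and clock speed."""
    return throughput / (cores * clock_ghz)


def percent_faster(a: float, b: float) -> float:
    """How much faster a is than b, in percent."""
    return (a / b - 1.0) * 100.0


# Hypothetical measurements (tasks per hour), not real benchmark data.
chip_a = per_core_per_ghz(throughput=84.0, cores=8, clock_ghz=4.0)
chip_b = per_core_per_ghz(throughput=100.0, cores=10, clock_ghz=4.0)
print(f"A vs B per core per GHz: {percent_faster(chip_a, chip_b):+.1f}%")
```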

The 3700X has 2x16MB of L3 cache, plus 0.5MB/core of L2 that could be added on top, as the caches are exclusive. The split CCX structure means the best performance is achieved if each task fits within one CCX's cache and the tasks run separately (up to around a 2048K FFT). For larger tasks this is not possible, and the internal bus speed (FCLK) and RAM speed will affect performance. Best performance is usually achieved if these are kept in sync and as high as possible (typically with DDR4-3600 RAM).
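To make the CCX point concrete, here is a sketch assuming 8 bytes per FFT element, 16 MiB of L3 per CCX plus 0.5 MiB of exclusive L2 per core (treating MB as MiB), and that "in sync" means a 1:1 ratio between FCLK and the memory clock. The function names are hypothetical.

```python
BYTES_PER_DOUBLE = 8
MIB = 2**20


def fft_mib(fft_len_k: int) -> float:
    """FFT data size in MiB for a length given in K (1K = 1024 elements)."""
    return fft_len_k * 1024 * BYTES_PER_DOUBLE / MIB


def ccx_cache_mib(l3_mib: float = 16.0, l2_mib_per_core: float = 0.5, cores: int = 4) -> float:
    """Cache one Zen 2 CCX offers if the exclusive L2 is counted on top of L3."""
    return l3_mib + l2_mib_per_core * cores


def fits_in_one_ccx(fft_len_k: int) -> bool:
    return fft_mib(fft_len_k) <= ccx_cache_mib()


def memclk_mhz(ddr_rating: int) -> float:
    """DDR moves data twice per clock, so DDR4-3600 means a 1800 MHz memory clock."""
    return ddr_rating / 2


for k in (2048, 2560):
    print(f"{k}K FFT: {fft_mib(k):.1f} MiB, fits in one CCX: {fits_in_one_ccx(k)}")
print(f"FCLK for 1:1 with DDR4-3600: {memclk_mhz(3600):.0f} MHz")
```

Other data competes for the cache as well, so the ~2048K figure above is a safer practical limit than the 18 MiB this arithmetic allows.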

10920X is more complicated. It has 19.25MB of L3 cache, but through observation it seems applicable to also count the non-inclusive L2 cache of 1MB/core, for a total effective 31.25MB. It should do well for most tasks, combined with its quad channel ram support. The single benchmark linked seems unremarkable, but we don't know how it was configured. A drawback of AVX-512 is that while it can provide massive throughput, it also takes massive power while doing so. Clocks tend to run lower while it is in operation, which offsets the gains. My 7920X runs around 2.9 GHz for this type of work.

The 9900K is relatively simple, with 16MB of L3 cache. So single tasks with FFTs beyond 2048K will start hitting RAM, and fast RAM will provide a good benefit there.
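The same cache arithmetic for the two Intel chips, counting the 10920X's non-inclusive L2 on top of its L3 as suggested above, and only the L3 for the 9900K. This is a sketch with illustrative names; MB is treated as MiB and 8 bytes per FFT element is assumed.

```python
MIB = 2**20
BYTES_PER_DOUBLE = 8


def effective_cache_mib(l3_mib: float, l2_mib_per_core: float = 0.0, cores: int = 0) -> float:
    """L3 plus any non-inclusive L2 that can be counted on top of it."""
    return l3_mib + l2_mib_per_core * cores


def largest_fft_k(cache_mib: float) -> int:
    """Largest FFT length, in K elements of 8 bytes, whose data fits in cache_mib."""
    return int(cache_mib * MIB // (BYTES_PER_DOUBLE * 1024))


i9_10920x = effective_cache_mib(19.25, l2_mib_per_core=1.0, cores=12)  # 31.25
i9_9900k = effective_cache_mib(16.0)  # L3 only, as in the post above

print(f"10920X: {i9_10920x} MiB -> about {largest_fft_k(i9_10920x)}K")
print(f"9900K:  {i9_9900k} MiB -> about {largest_fft_k(i9_9900k)}K")
```

Prime95 only uses certain FFT lengths, so these are upper bounds on the data size rather than exact FFT choices.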

If I were asked what is the most cost effective system, it is a difficult one to answer as it will be influenced by what else you put around the system.

We also have rumours that the next-gen Ryzen (Zen 3) will have a bigger CCX of 8 cores. This removes a barrier and unifies the cache within a die, and should allow it to attain even more performance in more use cases. I'm excited to get one and try it out as soon as they're released.

M344587487 2020-07-25 09:15

P95 might be more optimised for Intel than AMD because Intel has been the dominant example of x86 for most of recent memory, but P95 is really optimised for x86 in general. There was an effort to implement the Intel-only AVX-512 and I think there were gains, but the downclocking and power usage that occur when AVX-512 is used made the results a little anticlimactic compared to what you might expect. There are differences, like certain instructions taking a different number of cycles to complete on different architectures, but there are wide differences even within Intel's numerous architectures, so I don't think even P95 optimises fully down to a per-cycle basis.

Any Ryzen from Zen 2 onwards has full AVX2 support and oodles of cache, so it's a good choice for P95. This post and the thread it's from have data on throughput vs Intel ( [URL]https://www.mersenneforum.org/showpost.php?p=521143&postcount=61[/URL] ). In the end Intel and AMD both hit a memory bandwidth bottleneck, but AFAIK the cache makes AMD a throughput winner, as less bandwidth is required for a given amount of work (I know nothing about the latest Intel chips; they've made alterations to the cache that I'm not up to speed on). You may not be interested in gaming, but the gamers' favourite, the 3600, is probably the best Zen 2 chip from a price-to-performance viewpoint, though it depends on the rest of the system.
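A rough way to see why bandwidth becomes the limit once the data no longer fits in cache: peak DDR bandwidth is the transfer rate × 8 bytes per 64-bit channel × the number of channels, and dividing that by the bytes moved per iteration gives a ceiling on iterations per second. The "two read+write passes over the FFT data per iteration" figure below is an assumption for illustration, not a measured Prime95 number, and the function names are hypothetical.

```python
def peak_bandwidth_gbs(ddr_rating: int, channels: int) -> float:
    """Theoretical peak: transfers/s x 8 bytes per 64-bit channel x channels."""
    return ddr_rating * 1e6 * 8 * channels / 1e9


def max_iters_per_sec(bandwidth_gbs: float, fft_len_k: int, ram_passes: float = 2.0) -> float:
    """Ceiling on iterations/s when all FFT data must come from RAM.

    ASSUMPTION: each iteration reads and writes the whole FFT data
    `ram_passes` times. Real traffic depends on the FFT implementation,
    and anything served from cache does not count against RAM bandwidth.
    """
    fft_bytes = fft_len_k * 1024 * 8
    bytes_per_iter = fft_bytes * 2 * ram_passes  # read + write per pass
    return bandwidth_gbs * 1e9 / bytes_per_iter


for name, channels in (("dual-channel", 2), ("quad-channel", 4)):
    bw = peak_bandwidth_gbs(3200, channels)
    print(f"{name} DDR4-3200: {bw:.1f} GB/s, <= {max_iters_per_sec(bw, 4096):.0f} iter/s at 4096K")
```

Whatever part of the working set stays in cache reduces the bytes per iteration that RAM has to supply, which is the sense in which a bigger cache needs less bandwidth for the same work.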

The CPUs coming out this year based on Zen 3 (Ryzen 4000) should be interesting. There are the usual generational IPC and clock gains, but they are also unifying the L3 cache, so you have 32MiB shared per chiplet instead of the current Zen 2 arrangement of 16MiB shared per four-core CCX. This means the cache can be used more effectively on the same workload and may provide quite a performance boost over Zen 2; worth watching at any rate.

But really if you're buying hardware specifically for PRP or P-1 you should get the Radeon VII, it kicks the crap out of anything else on a price-to-throughput, price-to-efficiency and price-to-density basis.

S485122 2020-07-25 10:14

[QUOTE=mackerel;551542]...
10920X is more complicated. It has 19.25MB of L3 cache, but through observation it seems applicable to also count the non-inclusive L2 cache of 1MB/core, for a total effective 31.25MB. It should do well for most tasks, combined with its quad channel ram support. The single benchmark linked seems unremarkable, but we don't know how it was configured. A drawback of AVX-512 is that while it can provide massive throughput, it also takes massive power while doing so. Clocks tend to run lower while it is in operation, which offsets the gains. My 7920X runs around 2.9 GHz for this type of work.
...[/QUOTE]

I think that benchmark for the i9-10920X comes from my computer. It was submitted while the OS was Windows 7, which implies no support for Intel processors after the 6th generation and especially no AVX-512, as is revealed by the available features on the benchmark.

At the moment, on 2800K FFTs, the timing for one worker on twelve cores is 0.9 ms/iteration. The CPU is underclocked (no turbo and a power cap of 140 W instead of 165 W, to keep the noise low, the temperature below 55 °C and the power usage reasonable at the same time).

I didn't act on the fact that the benchmark had been uploaded: I was busy installing Windows 10, trying to make the interface more ergonomic and also closing the data leaks (as far as possible) :´-(

I just marked the benchmark as suspect.

Jacob

lukerichards 2020-07-25 18:33

[QUOTE=S485122;551547]I think that benchmark for the i9-10920X comes from my computer. It was submitted while the OS was Windows 7, which implies no support for Intel processors after the 6th generation and especially no AVX-512, as is revealed by the available features on the benchmark.

At the moment, on 2800K FFTs, the timing for one worker on twelve cores is 0.9 ms/iteration. The CPU is underclocked (no turbo and a power cap of 140 W instead of 165 W, to keep the noise low, the temperature below 55 °C and the power usage reasonable at the same time).

I didn't act on the fact that the benchmark had been uploaded: I was busy installing Windows 10, trying to make the interface more ergonomic and also closing the data leaks (as far as possible) :´-(

I just marked the benchmark as suspect.

Jacob[/QUOTE]

I'm unaware how the benchmark database works... does this mean that there will be a new benchmark for i9-10920X soon?

Prime95 2020-07-25 19:31

[QUOTE=lukerichards;551587]I'm unaware how the benchmark database works... does this mean that there will be a new benchmark for i9-10920X soon?[/QUOTE]

Do not trust the benchmark database to make any purchasing decisions. It shows the per-iteration times for one exponent (I believe on just one core).

The more important number is throughput: how many iterations/second a CPU can produce with all cores running on the optimal number of workers.
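A sketch of that throughput calculation: add up iterations per second across all workers running at the same time. The timings here are invented for illustration; real numbers have to be measured on the machine with every worker active, and the function name is hypothetical.

```python
def total_throughput(ms_per_iter_by_worker: list[float]) -> float:
    """Iterations per second summed over all workers running at once."""
    return sum(1000.0 / ms for ms in ms_per_iter_by_worker)


# Made-up timings for one 12-core CPU, each measured with all workers active.
one_worker = [0.9]        # one worker using all 12 cores
four_workers = [3.2] * 4  # four workers, 3 cores each

for label, timings in (("1 worker", one_worker), ("4 workers", four_workers)):
    print(f"{label}: {total_throughput(timings):.0f} iter/s")
```

A configuration with slower per-worker times can still win on total throughput, which is why a single per-iteration figure is a poor basis for comparing CPUs.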

lukerichards 2020-07-25 19:44

[QUOTE=Prime95;551595]Do not trust the benchmark database to make any purchasing decisions. It shows the per-iteration times for one exponent (I believe on just one core).

The more important number is throughput: how many iterations/second a CPU can produce with all cores running on the optimal number of workers.[/QUOTE]

Thanks. I had assumed it was a single-core value. That is, I'd assumed that the value for a dual-core CPU should not be directly compared with one for an eight-core CPU.

lukerichards 2020-07-25 19:47

[QUOTE=M344587487;551543]
But really if you're buying hardware specifically for PRP or P-1 you should get the Radeon VII, it kicks the crap out of anything else on a price-to-throughput, price-to-efficiency and price-to-density basis.[/QUOTE]

I have always been nervous about looking into buying a GPU for prime crunching on account of not having the foggiest idea what software can and should be used for what purpose, or even where to find this information.

I should probably get over that at some point.

Prime95 2020-07-25 20:33

[QUOTE=lukerichards;551600]I have always been nervous about looking into buying a GPU for prime crunching on account of not having the foggiest idea what software can and should be used for what purpose, or even where to find this information.

I should probably get over that at some point.[/QUOTE]

The problem is actually the opposite. If you open the gpuowl threads, prepare to be overwhelmed.

Just so you know "kicks the crap" means 10 times the throughput of Intel and AMD offerings that use dual-channel memory. Maybe 5 times for Intel's pricey offerings with quad-channel memory. Not sure about AMD large L3 cache offerings -- maybe 6 times the throughput.
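To put those multiples in price terms: a GPU delivering N times a CPU's throughput has the better throughput per pound as long as it costs less than N times the CPU. This sketch uses the prices from the first post and the rough multiples above (6× for the large-L3 3700X, 5× for the quad-channel 10920X, 10× for the dual-channel 9900K). It ignores the rest of the system and running costs, so treat it as back-of-envelope only; the function name is hypothetical.

```python
def break_even_price(cpu_price: float, throughput_multiple: float) -> float:
    """GPU price at which its throughput per pound equals the CPU's.

    Ignores motherboard, RAM, PSU and power costs.
    """
    return cpu_price * throughput_multiple


# (CPU, price in GBP from the first post, rough GPU throughput multiple)
comparisons = [
    ("Ryzen 7 3700X", 261, 6),
    ("i9-10920X", 690, 5),
    ("i9-9900K", 450, 10),
]

for name, price, multiple in comparisons:
    print(f"{name}: GPU is better value below £{break_even_price(price, multiple):.0f}")
```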

The downside is getting your hands on a Radeon VII is no longer easy.

Mark Rose 2020-07-28 03:53

[QUOTE=Prime95;551604]The downside is getting your hands on a Radeon VII is no longer easy.[/QUOTE]

There are two on eBay right now.


All times are UTC.
