mersenneforum.org  

Old 2021-07-30, 17:36   #34
Till
"Tilman Neumann"
Jan 2016
Germany

2^5·3·5 Posts

Quote:
Originally Posted by M344587487 View Post
Bandwidth per iteration is also affected by L3

Is this of interest in the given context? Zen3 with 192MB L3 cache...
https://www.extremetech.com/computin...cache-at-2tb-s
Old 2021-07-31, 09:18   #35
M344587487
"Composite as Heck"
Oct 2017

2·3·5·29 Posts

Quote:
Originally Posted by Till View Post
Is this of interest in the given context? Zen3 with 192MB L3 cache...
https://www.extremetech.com/computin...cache-at-2tb-s

Yes it is. It's AMD's next move, which will answer Alder Lake way ahead of Zen 4's release. Probably a decent uplift for normal workloads, but potentially a game changer for us.



https://mersenneforum.org/showthread.php?t=26864
Old 2021-07-31, 13:16   #36
Till
"Tilman Neumann"
Jan 2016
Germany

2^5·3·5 Posts

Quote:
Originally Posted by M344587487 View Post
Yes it is. It's AMD's next move, which will answer Alder Lake way ahead of Zen 4's release. Probably a decent uplift for normal workloads, but potentially a game changer for us.



https://mersenneforum.org/showthread.php?t=26864

I should have known that there is a proper thread on the topic ;-)
Old 2021-08-01, 23:40   #37
Viliam Furik
"Viliam Furík"
Jul 2018
Martin, Slovakia

1011000010_2 Posts

According to this video from the channel "Moore's Law Is Dead":


there should be a Threadripper 5000 3D lineup. It was not mentioned how much cache would be added, but if we assume +64 MiB per chiplet on top of the 32 MiB per chiplet already present, the 8-chiplet "5990X 3D" could end up with 768 MiB of L3 cache.

That amount of cache should be able to hold:
  • up to 16 FTCs (first-time checks) with 6M FFT (roughly 109M-115M exponents): a tight fit, more realistic for lower FFTs such as 5.5M
  • up to 32 DCs (double checks) with 3M FFT: again a tight fit
  • up to 27 DCs with 3.5M FFT (roughly 66M-69M exponents)
  • up to 128 PRP-CFs with 768K FFT (roughly 15M exponents): we've already run out of cores...

For the giants:
  • 5 tests of 100M-digit exponents with 18M FFT
  • 3 tests in the range of roughly 522M-559M exponents with 32M FFT
  • 2 tests in the range of roughly 885M-893M exponents with 48M FFT
  • 1 test in the range of roughly 1779M-1787M exponents with 96M FFT

In terms of what the cache allows, we could potentially run out not only of cores but also of the exponent range Prime95 currently supports (caused by CPU limits). However, it will still be a long time before we need to go beyond 1000M exponents.



I will soon post my 3990X benchmark analysis, with the data kindly provided by paulunderwood. It will be in the "Perpetual benchmark thread...".

Based on that data, a rough estimate of the performance increase from the Zen 3 generation, and the assumption that the 5990X 3D will indeed have 768 MiB of cache for 64 cores, I made this rough performance extrapolation (its inaccuracy grows with the FFT length):

FFT  | Best-configuration throughput (it/s) | Time per test | Test throughput | Simultaneous tests
768K | 20,000                               | 13.5 hours    | 115 tests/day   | 64
3.5M | 5,000                                | 4 days        | 180 tests/month | 24
6M   | 2,500                                | 8 days        | 60 tests/month  | 16
18M  | 575                                  | 33 days       | 60 tests/year   | 5
32M  | ~300                                 | 60 days       | 18 tests/year   | 3
Architectural limits reached, only theoretical extrapolation:
48M  | ~200                                 | 100 days      | 7 tests/year    | 2
96M  | ~100                                 | 200 days      | 3 tests/year    | 1

These extrapolations were hard to make because of weirdness in the data, which will be mentioned in the benchmark analysis. They could be very different based on other data.
Old 2021-08-02, 08:42   #38
mackerel
Feb 2016
UK

2^3·5·11 Posts

I haven't looked at it in a while, but I recall from testing on Zen 2 that scaling beyond a CCX was less than ideal. I don't see any reason for that to change with Zen 3, other than that a CCX is now bigger. Peak performance was for work that fit in the L3 cache of a single CCX. The bigger CCX increased the local L3 pool, and the v-cache increases it further. Breaking out of a CCX gave only a limited increase in performance, presumably due to the bandwidth constraint between CCDs. Sure, performance fell off more slowly than on an Intel CPU of similar core count, which has less cache and was memory bound, but it still wasn't the ideal scaling you would expect from a unified L3.

So for a hypothetical 64 core Zen 3 high cache CPU, I'd expect the following breakpoints. They're not exact, and it gets a bit questionable close to the boundaries:
3M-4M FFT: 24 workers
4M-6M FFT: 16 workers
6M-12M FFT: 8 workers
12M-24M FFT: 4 workers
24M-48M FFT: 2 workers
Beyond 48M: 1 worker

The same pattern extends to smaller FFTs. My reasoning for the above is to keep data local within one CCX, which works well up to 12M FFT assuming 96MB of cache per CCX. Beyond that I'm less confident, as internal bandwidth will be a limiting factor and I don't have a good understanding of it. I aim to keep traffic low by having each worker use as few CCXs as possible. Dividing the cores any other way will, I think, increase bandwidth usage and lower performance. This assumes affinity is set well. Fewer workers may be better if the internal bandwidth turns out to be a bigger constraint in practice.
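As a rough illustration of that reasoning, the breakpoints can be reproduced with a few lines of Python. This is only a sketch under assumptions: 8 CCXs with 96 MiB of L3 each (the hypothetical 64-core part), a working set of FFT length × 8 bytes, and the simple rule "pack whole workers into a CCX, or give a worker the fewest CCXs that hold its data". The function `suggested_workers` is a made-up name, not a Prime95 setting.

```python
import math


def suggested_workers(fft_len_m: float, ccx_count: int = 8,
                      cache_per_ccx_mib: int = 96) -> int:
    """Worker count that keeps each worker's data in as few CCXs as possible.

    fft_len_m is the FFT length in M (2**20 elements). With 8 bytes per
    element (an assumption), the working set in MiB is simply fft_len_m * 8.
    """
    working_set_mib = fft_len_m * 8
    if working_set_mib <= cache_per_ccx_mib:
        # One or more whole workers fit inside a single CCX.
        workers_per_ccx = int(cache_per_ccx_mib // working_set_mib)
        return ccx_count * workers_per_ccx
    # Otherwise each worker spans the minimum number of CCXs that holds its data.
    ccx_per_worker = math.ceil(working_set_mib / cache_per_ccx_mib)
    return max(1, ccx_count // ccx_per_worker)


for fft_m in (4, 6, 12, 24, 48, 96):
    print(f"{fft_m}M FFT: {suggested_workers(fft_m)} workers")
```

Evaluated at the top of each range (4M, 6M, 12M, 24M, 48M) this gives 24, 16, 8, 4 and 2 workers, and 1 worker beyond 48M. It says nothing about the inter-CCD bandwidth limits, which is the part that remains uncertain.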
Old 2021-08-03, 09:52   #39
M344587487
"Composite as Heck"
Oct 2017

2·3·5·29 Posts

Moore's Law Is Dead is pure speculation, just like any predictions we care to make. The main thing I got from the video is that they should definitely call Zen 3 + v-cache Zen3D. Then you can do marketing nonsense like Zen4D, indices like Zen3D^2 for multiple cache stacks, and an homage to Voodoo with 3DFX for a cheeky poke at Nvidia.


Putting v-cache on some but not all dies is not something I've considered as it makes the processor less uniform, but it is feasible as there is already the concept of some dies being stronger than others (and a preferred die and even preferred core). From AMD's perspective it could be a rather neat way of differentiating Threadripper from Epyc and having four tiers of product:
  • Ryzen with no v-cache except for the highest "gamer" SKU
  • Epyc with no v-cache as now
  • Threadripper with one die v-cache
  • EpycD with fully populated v-cache

That would justify making Threadripper more expensive than base Epyc, and hiking the price of EpycD to old-Intel proportions. Whatever AMD does, Intel had better get a move on before their roles are fully reversed.
Old 2021-08-06, 01:59   #40
tuckerkao
 
"Tucker Kao"
Jan 2020
Head Base M168202123

563 Posts

New APUs have become available for desktops: the AMD Ryzen 7 5700G and AMD Ryzen 5 5600G. Since there are "Radeon Vega Graphics" in the CPUs, will those allow me to run trial factoring in Prime95 at a speed similar to mfaktc on GPUs? Will they speed up PRP tests?

Last fiddled with by tuckerkao on 2021-08-06 at 02:01
Old 2021-08-06, 05:19   #41
Viliam Furik
 
Viliam Furik's Avatar
 
"Viliam Furík"
Jul 2018
Martin, Slovakia

2·353 Posts

Quote:
Originally Posted by tuckerkao View Post
New APUs have become available for desktops: the AMD Ryzen 7 5700G and AMD Ryzen 5 5600G. Since there are "Radeon Vega Graphics" in the CPUs, will those allow me to run trial factoring in Prime95 at a speed similar to mfaktc on GPUs? Will they speed up PRP tests?
The iGPUs in those APUs are not nearly as powerful as a true, dedicated GPU like the 2080 Ti or 6800 XT. They will give you relatively weak TF performance, and they are compatible only with mfakto, not mfaktc.

No, they cannot run PRP faster, nor can they help the CPU part with PRP. At least as far as I know.
Old 2021-10-11, 11:20   #42
tuckerkao
 
"Tucker Kao"
Jan 2020
Head Base M168202123

563 Posts

Looks like the release date of the AMD Threadripper 5970X has been delayed to January or February 2022. Supply seems to be running short now.
Old 2021-10-12, 05:26   #43
JWNoctis
 
"J. W."
Aug 2021

2×13 Posts

To be honest, a Zen 3+ 5800X-class CPU with 96MB L3 would have been much more enticing.

AMD has been quiet and tight-lipped lately. Right now it looks like open season for Alder Lake, assuming supply holds up.

Last fiddled with by JWNoctis on 2021-10-12 at 05:27
Old 2021-10-12, 19:42   #44
moebius
"CharlesgubOB"
Jul 2009
Germany

2×313 Posts

Quote:
Originally Posted by tuckerkao View Post
New APUs have become available for desktops: the AMD Ryzen 7 5700G and AMD Ryzen 5 5600G. Since there are "Radeon Vega Graphics" in the CPUs, will those allow me to run trial factoring in Prime95 at a speed similar to mfaktc on GPUs? Will they speed up PRP tests?
The Radeon RX Vega 7 Graphics in the Ryzen 5 5600G is similar in speed to the GeForce GTX 1650 at PRP testing.
https://mersenneforum.org/showpost.p...postcount=2729

Last fiddled with by moebius on 2021-10-12 at 19:43

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.