#34 |
|
"Tilman Neumann"
Jan 2016
Germany
2×3×7×11 Posts |
Is this of interest in the given context? Zen3 with 192MB L3 cache... https://www.extremetech.com/computin...cache-at-2tb-s |
|
#35 | |
|
"Composite as Heck"
Oct 2017
2²·3²·23 Posts |
Quote:
Yes it is; it's AMD's next move, which will answer Alder Lake well ahead of Zen 4's release. Probably a decent uplift for normal workloads, but potentially a game changer for us. https://mersenneforum.org/showthread.php?t=26864 |
|
#36 | |
|
"Tilman Neumann"
Jan 2016
Germany
462₁₀ Posts |
Quote:
I should have known that there is a proper thread on the topic ;-) |
|
#37 |
|
"Viliam Furík"
Jul 2018
Martin, Slovakia
5⁴ Posts |
According to this video from the channel "Moore's Law Is Dead", there should be a Threadripper 5000 3D lineup. It was not mentioned how much cache would be added, but if we assume +64 MiB per chiplet on top of the 32 MiB per chiplet already present, we could be getting 768 MiB of L3 cache for the "5990X 3D" (8 chiplets × 96 MiB).

That amount of cache should be able to contain:
- up to 16 FTCs with 6M FFT (roughly 109M-115M exponents); a tight fit, more realistic for lower FFTs such as 5.5M
- up to 32 DCs with 3M FFT; again a tight fit
- up to 27 DCs with 3.5M FFT (roughly 66M-69M exponents)
- up to 128 PRP-CFs with 768K FFT (roughly 15M exponents); we've already run out of cores...

For the giants:
- 5 tests of 100M-digit exponents with 18M FFT
- 3 tests in the range of roughly 522M-559M exponents (32M FFT)
- 2 tests in the range of roughly 885M-893M exponents (48M FFT)
- 1 test in the range of roughly 1779M-1787M exponents (96M FFT)

In terms of what the cache allows, we could potentially run out not only of cores but also of the exponent range Prime95 currently supports (caused by CPU limits). However, it will still be a long time until we need to expand beyond 1000M exponents.

I will soon post my 3990X benchmark analysis, with the data kindly provided by paulunderwood. It will be in the "Perpetual benchmark thread...".
Based on that data, a rough estimate of the performance increase from the Zen 3 generation, and the assumption that the 5990X 3D will indeed have 768 MiB of cache for 64 cores, I made this rough performance extrapolation (its inaccuracy grows with the FFT length):

| FFT | Best-configuration throughput (it/s) | Time per test | Test throughput | Simultaneous tests |
| 768K | 20,000 | 13.5 hours | 115 tests/day | 64 |
| 3.5M | 5,000 | 4 days | 180 tests/month | 24 |
| 6M | 2,500 | 8 days | 60 tests/month | 16 |
| 18M | 575 | 33 days | 60 tests/year | 5 |
| 32M | ~300 | 60 days | 18 tests/year | 3 |
| 48M | ~200 | 100 days | 7 tests/year | 2 |
| 96M | ~100 | 200 days | 3 tests/year | 1 |

For the 48M and 96M rows the architectural limits are reached, so they are only theoretical extrapolation.

These extrapolations were hard to estimate because of weirdness in the data, which will be mentioned in the benchmark analysis; they could be very different based on other data. |
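The time-per-test and test-throughput figures in the extrapolation above can be roughly reproduced from the quoted it/s values. This is a sketch under two assumptions that the post does not spell out: that the quoted throughput is the total across all simultaneous tests and is shared equally, and that one test needs about as many iterations as the exponent. The representative exponents are hypothetical mid-range picks from the ranges given earlier in the post.

```python
SECONDS_PER_DAY = 86_400


def per_test_estimate(exponent, total_iters_per_sec, workers):
    """Return (days per test, tests finished per day across all workers).

    Assumptions: a test takes about `exponent` iterations, and the
    total throughput is split equally between the workers.
    """
    per_worker_rate = total_iters_per_sec / workers
    days_per_test = exponent / per_worker_rate / SECONDS_PER_DAY
    return days_per_test, workers / days_per_test


# (FFT label, representative exponent, total it/s, simultaneous tests)
# The exponents are illustrative mid-range values, not from the post.
ROWS = [
    ("768K", 15_000_000, 20_000, 64),
    ("3.5M", 67_500_000, 5_000, 24),
    ("6M", 112_000_000, 2_500, 16),
    ("18M", 332_200_000, 575, 5),
    ("32M", 540_000_000, 300, 3),
    ("48M", 889_000_000, 200, 2),
    ("96M", 1_783_000_000, 100, 1),
]

for label, exponent, rate, workers in ROWS:
    days, per_day = per_test_estimate(exponent, rate, workers)
    print(f"{label:>5}: {days:7.2f} days/test, {per_day * 365:8.1f} tests/year")
```

With these inputs most rows land close to the posted figures (for example, about 13.3 hours per test at 768K). The 96M row comes out nearer 1.8 tests per year, so the posted value presumably rests on different inputs.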
|
#38 |
|
Feb 2016
UK
2²×109 Posts |
I haven't looked at it in a while, but I recall from testing on Zen 2 that scaling beyond a CCX was less than ideal. I don't see any reason for that to change with Zen 3, other than that a CCX is now bigger. Peak performance was for work that fit in the L3 cache of a single CCX. The bigger CCX increased the local L3 pool, and the v-cache increases it further. Breaking out of a CCX gave a limited increase in performance, presumably due to the bandwidth constraint between CCDs. Sure, performance fell off less quickly than on a similar-core-count Intel chip that has less cache and was memory bound, but it still wasn't the ideal scaling that would be expected from a unified L3.
So for a hypothetical 64-core Zen 3 high-cache CPU, I'd expect the following breakpoints. They're not exact, and it gets a bit questionable close to the boundaries:
- 3M-4M FFT: 24 workers
- 4M-6M FFT: 16 workers
- 6M-12M FFT: 8 workers
- 12M-24M FFT: 4 workers
- 24M-48M FFT: 2 workers
- Beyond 48M: 1 worker

The pattern extends to smaller FFTs in the same way. My reasoning for the above is to keep data local within one CCX, which works well up to 12M FFT assuming 96MB of cache per CCX. Beyond that I'm less confident, as internal bandwidth will be a limiting factor and I don't have a good understanding of it. I aim to keep possible traffic low by having each worker use as few CCXs as possible. Dividing the cores any other way will, I think, increase bandwidth usage and lower performance. This assumes affinity is set well. Fewer workers may be better if the internal bandwidth is a bigger constraint in practice. |
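The breakpoints above can be reproduced with a small rule: pack as many workers into one CCX as its cache holds, and once an FFT no longer fits in one CCX, give each worker a power-of-two group of CCXs. This is a sketch, not the poster's stated method; the 8 bytes per FFT element, the 8 CCXs, and the power-of-two grouping are assumptions chosen to match the listed numbers, and `suggested_workers` is a hypothetical name.

```python
def suggested_workers(fft_len_m, ccx_count=8, ccx_cache_mb=96,
                      bytes_per_element=8):
    """Worker count for an FFT of `fft_len_m` million elements.

    Assumptions: 8 bytes per FFT element, 8 CCXs with 96 MB of L3 each,
    and multi-CCX workers span 2, 4 or 8 CCXs.
    """
    fft_mb = fft_len_m * bytes_per_element  # e.g. 12M FFT -> 96 MB
    if fft_mb <= ccx_cache_mb:
        # One or more whole workers fit inside each CCX's cache.
        return ccx_count * int(ccx_cache_mb // fft_mb)
    # Otherwise each worker spans the smallest power-of-two CCX group
    # whose combined cache holds the FFT (capped at all CCXs).
    ccx_per_worker = 1
    while ccx_per_worker * ccx_cache_mb < fft_mb and ccx_per_worker < ccx_count:
        ccx_per_worker *= 2
    return ccx_count // ccx_per_worker


for fft in (3.5, 5, 8, 18, 32, 64):
    print(f"{fft}M FFT: {suggested_workers(fft)} workers")
```

Exactly at a boundary the rule picks the higher worker count (for instance 16 workers at 6M), which is where the post itself says the estimates get questionable.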
|
#39 |
|
"Composite as Heck"
Oct 2017
828₁₀ Posts |
Moore's Law Is Dead is pure speculation, just like any predictions we care to make. The main thing I got from the video is that they should definitely call Zen 3 + v-cache Zen3D. Then you can do marketing nonsense like Zen4D, indices like Zen3D^2 for multiple cache stacks, and an homage to Voodoo with 3DFX for a cheeky poke at Nvidia.
Putting v-cache on some but not all dies is not something I've considered as it makes the processor less uniform, but it is feasible as there is already the concept of some dies being stronger than others (and a preferred die and even preferred core). From AMD's perspective it could be a rather neat way of differentiating Threadripper from Epyc and having four tiers of product:
That would justify making Threadripper more expensive than base Epyc, and hiking the price of EpycD to old-Intel proportions. Whatever AMD does, Intel had better get a move on before their roles are fully reversed. |
|
#40 |
|
Jan 2020
3²×41 Posts |
New APUs have become available for desktops: the AMD Ryzen 7 5700G and AMD Ryzen 5 5600G. Since these CPUs include "Radeon Vega Graphics", will they allow me to run trial factoring in Prime95 at a speed similar to mfaktc on GPUs? Will they speed up PRP tests?
Last fiddled with by tuckerkao on 2021-08-06 at 02:01 |
|
#41 | |
|
"Viliam Furík"
Jul 2018
Martin, Slovakia
1161₈ Posts |
Quote:
No, they cannot run PRP faster, nor can they help the CPU with PRP. At least as far as I know. |