2021-07-30, 17:36 #34
"Tilman Neumann"
Jan 2016
Germany
481_{10} Posts 
Is this of interest in the given context? Zen3 with 192MB L3 cache... https://www.extremetech.com/computin...cacheat2tbs 
2021-07-31, 09:18 #35
"Composite as Heck"
Oct 2017
5^{3}×7 Posts 
Quote:
Yes it is; it's AMD's next move, which will answer Alder Lake well ahead of Zen 4's release. Probably a decent uplift for normal workloads, but potentially a game changer for us. https://mersenneforum.org/showthread.php?t=26864

2021-07-31, 13:16 #36
"Tilman Neumann"
Jan 2016
Germany
1E1_{16} Posts 
Quote:
I should have known that there is a proper thread on the topic ;) 

2021-08-01, 23:40 #37
"Viliam Furík"
Jul 2018
Martin, Slovakia
2×373 Posts 
According to this video from the channel "Moore's Law Is Dead", there should be a Threadripper 5000 3D lineup. It was not mentioned how much cache would be added, but if we assume +64 MiB per chiplet on top of the 32 MiB per chiplet already present, we could be getting 768 MiB of L3 cache on the "5990X 3D". That amount of cache should be able to hold:
- up to 16 FTCs with a 6M FFT (roughly 109M-115M exponents); a tight fit, more realistic for smaller FFTs such as 5.5M
- up to 32 DCs with a 3M FFT; again, a tight fit
- up to 27 DCs with a 3.5M FFT (roughly 66M-69M exponents)
- up to 128 PRP-CFs with a 768K FFT (roughly 15M exponents); we've already run out of cores...

For the giants:
- 5 tests on 100M-digit exponents with an 18M FFT
- 3 tests in the range of roughly 522M-559M exponents with a 32M FFT
- 2 tests in the range of roughly 885M to 893M exponents with a 48M FFT
- 1 test in the range of roughly 1779M to 1787M exponents with a 96M FFT

In terms of what the cache allows, we could run out not only of cores but also of the exponent range Prime95 currently supports (due to CPU limits). However, it will still be a long time before we need to go beyond 1000M exponents.

I will soon post my 3990X benchmark analysis, with the data kindly provided by paulunderwood. It will be in the "Perpetual benchmark thread...".
Based on paulunderwood's 3990X data, a rough estimate of the performance increase brought by the Zen 3 generation, and the assumption that the 5990X 3D will indeed have 768 MiB of cache for 64 cores, I made this rough performance extrapolation (its inaccuracy grows with FFT length):

| FFT  | Best-configuration throughput (it/s) | Time per test | Test throughput  | Simultaneous tests |
|------|--------------------------------------|---------------|------------------|--------------------|
| 768K | 20,000                               | 13.5 hours    | 115 tests/day    | 64                 |
| 3.5M | 5,000                                | 4 days        | 180 tests/month  | 24                 |
| 6M   | 2,500                                | 8 days        | 60 tests/month   | 16                 |
| 18M  | 575                                  | 33 days       | 60 tests/year    | 5                  |
| 32M  | ~300                                 | 60 days       | 18 tests/year    | 3                  |
| 48M* | ~200                                 | 100 days      | 7 tests/year     | 2                  |
| 96M* | ~100                                 | 200 days      | 3 tests/year     | 1                  |

Rows marked with *: architectural limits reached, so these are only theoretical extrapolations.

These extrapolations were hard to make because of some oddities in the data, which I will cover in the benchmark analysis; they could turn out very different with other data.
2021-08-02, 08:42 #38
Feb 2016
UK
2^{3}×5×11 Posts 
I haven't looked at it in a while, but I recall from testing on Zen 2 that scaling beyond a CCX was less than ideal. I don't see any reason for that to change with Zen 3, other than that a CCX is now bigger. Peak performance was for work that fit in the L3 cache of a single CCX. The bigger CCX increased the local L3 pool, and the vcache increases it further. Breaking out of a CCX gave only a limited increase in performance, presumably due to the bandwidth constraint between CCDs. Sure, performance fell off less quickly than on an Intel CPU with a similar core count, which has less cache and was memory-bound, but it still wasn't the ideal scaling you would expect from a unified L3.
So for a hypothetical 64-core Zen 3 high-cache CPU, I'd expect the following breakpoints. They're not exact, and things get a bit questionable close to the boundaries:
- 3M-4M FFT: 24 workers
- 4M-6M FFT: 16 workers
- 6M-12M FFT: 8 workers
- 12M-24M FFT: 4 workers
- 24M-48M FFT: 2 workers
- Beyond 48M: 1 worker

It extends to smaller FFTs; you can see the pattern. My reasoning is to keep data local within one CCX, which works well up to a 12M FFT assuming 96MB of cache per CCX. Beyond that I'm less confident, as internal bandwidth will be a limiting factor and I don't have a good understanding of it. I aim to keep potential traffic low by having each worker use as few CCXs as possible; dividing the cores any other way would, I think, increase bandwidth usage and lower performance. This assumes good affinity settings. Fewer workers may be better if internal bandwidth turns out to be a bigger constraint in practice.
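One way to read those breakpoints as a rule of thumb is sketched below. It assumes 8 CCXs of 8 cores, 96 MiB of L3 per CCX, and 8 bytes per FFT element; the function name and its defaults are hypothetical. It packs as many workers into each CCX as their FFT data allows (capped at one per core), and once a single FFT outgrows a CCX, gives each worker the smallest number of CCXs that can hold it. It reproduces the listed breakpoints but models none of the inter-CCD bandwidth effects the post warns about:

```python
# Hypothetical worker-count heuristic for a 64-core, 8-CCX high-cache CPU.
# Assumptions: 96 MiB L3 per CCX, 8 cores per CCX, 8 bytes per FFT element.
MIB = 1 << 20
FFT_K = 1 << 10   # "K" in FFT lengths = 1024 elements
FFT_M = 1 << 20   # "M" in FFT lengths = 1024 * 1024 elements


def suggested_workers(fft_len: int, ccx_count: int = 8, cores_per_ccx: int = 8,
                      l3_per_ccx_mib: int = 96, bytes_per_elem: int = 8) -> int:
    """Keep each worker's FFT data inside as few CCXs as possible."""
    fft_bytes = fft_len * bytes_per_elem
    l3_bytes = l3_per_ccx_mib * MIB
    if fft_bytes <= l3_bytes:
        # One or more workers per CCX, but never more workers than cores.
        per_ccx = min(l3_bytes // fft_bytes, cores_per_ccx)
        return ccx_count * per_ccx
    # A worker must span several CCXs: integer ceiling of bytes / CCX cache.
    ccx_per_worker = -(-fft_bytes // l3_bytes)
    return max(1, ccx_count // ccx_per_worker)


for label, fft_len in [("768K", 768 * FFT_K), ("3.5M", 7 * FFT_M // 2),
                       ("5M", 5 * FFT_M), ("8M", 8 * FFT_M),
                       ("16M", 16 * FFT_M), ("32M", 32 * FFT_M),
                       ("64M", 64 * FFT_M)]:
    print(f"{label:>5} FFT: {suggested_workers(fft_len)} workers")
```

Ranges are treated as including their upper bound (exactly 4M gives 24 workers, exactly 12M gives 8), which is one reasonable reading of the boundary cases the post calls questionable.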
2021-08-03, 09:52 #39
"Composite as Heck"
Oct 2017
5^{3}×7 Posts 
Moore's Law Is Dead is pure speculation, just like any predictions we care to make. The main thing I got from the video is that they should definitely call Zen 3 + vcache "Zen3D". Then you can do marketing nonsense like Zen4D, indices like Zen3D^2 for multiple cache stacks, and an homage to Voodoo with 3DFX as a cheeky poke at Nvidia.
Putting vcache on some but not all dies is not something I'd considered, as it makes the processor less uniform, but it is feasible since there is already the concept of some dies being stronger than others (a preferred die and even a preferred core). From AMD's perspective it could be a rather neat way of differentiating Threadripper from Epyc and having four tiers of product:
That would justify making Threadripper more expensive than base Epyc, and hiking the price of EpycD to old-Intel proportions. Whatever AMD does, Intel had better get a move on before their roles are fully reversed.
2021-08-06, 01:59 #40
"Tucker Kao"
Jan 2020
Head Base M168202123
2^{5}·3·7 Posts 
New desktop APUs are now available: the AMD Ryzen 7 5700G and the AMD Ryzen 5 5600G. Since these CPUs include Radeon Vega Graphics, will that let me run trial factoring in Prime95 at speeds similar to mfaktc on GPUs? Will it speed up PRP tests?
Last fiddled with by tuckerkao on 2021-08-06 at 02:01
2021-08-06, 05:19 #41
"Viliam Furík"
Jul 2018
Martin, Slovakia
2×373 Posts 
Quote:
No, they cannot run PRP tests faster, nor help the CPU cores with PRP, at least as far as I know.

2021-10-11, 11:20 #42
"Tucker Kao"
Jan 2020
Head Base M168202123
672_{10} Posts 
It looks like the release date of the AMD Threadripper 5970X has been delayed to January or February 2022. Supply seems to be running short right now.

2021-10-12, 05:26 #43
"J. W."
Aug 2021
2^{2}×7 Posts 
To be honest, a Zen 3+ 5800X-class CPU with 96MB of L3 would have been much more enticing.
AMD has been quiet and tight-lipped lately; right now it looks like open season for Alder Lake, assuming supply holds up.
Last fiddled with by JWNoctis on 2021-10-12 at 05:27
2021-10-12, 19:42 #44
"CharlesgubOB"
Jul 2009
Germany
2×313 Posts 
Quote:
https://mersenneforum.org/showpost.p...postcount=2729
Last fiddled with by moebius on 2021-10-12 at 19:43
