#23 |
Romulan Interpreter
Jun 2011
Thailand
22242₈ Posts |
#24 |
Apr 2017
2²·5 Posts |
Hello,
From the first architecture details:
- Still 4 cores per CCX, two CCXs per chiplet, two core chiplets and one IO die per CPU. Two cores on different dies communicate via the IO die.
- Cores now support one AVX-256 op per cycle.
- Memory performance is somewhat capped: the Infinity Fabric frequency is decoupled from the RAM frequency and halved from 3733MT/s and up.
- Still dual channel memory.

I had read that, on top of the AVX-256 emulation (it then took two cycles), Zen 1/+ was badly starved for memory bandwidth when used for GIMPS. Is that likely to remain the case? What else did I miss that will make a difference compared with previous Zen generations and Intel's current position? Core topology? I guess an 8/4 and a 6/6 3900X will not perform the same.

(Also stuffing in here a general question about the current GPGPU situation compared to high-end CPUs for GIMPS.)

Edit: damn, sorry, I should have bumped this thread instead; feel free to move this post...

Last fiddled with by maxzor on 2019-06-12 at 09:20 |
#25 |
Feb 2016
UK
110100000₂ Posts |
I'd expect a good uplift in performance, but it will depend on the size of test being run.
The beefed-up AVX performance should put it near enough at parity with Intel consumer CPUs, but still lagging behind implementations with two AVX-512 units. There are still two 4-core CCXs on each 8-core chiplet. Assuming the L3 cache is shared equally between the CCXs, I don't think that would have an impact. Cores don't need to talk to each other directly in this application?

The 32MB L3 cache per chiplet should help tests that can fit inside it. I have no idea what size GIMPS tests are at, but for other prime number finding with e.g. LLR/PFGW that will fit in the cache, I hope it is practically not limited by ram bandwidth and the cores can show their full potential. If it exceeds the cache, the dual channel ram will probably be the limit. Indications are that these CPUs will support faster ram more easily than before, but it really could use more channels.

The two-chiplet ones (12 and 16 cores) will be interesting to test out. I have no plans to get one myself. As far as I can tell, each chiplet is connected to the IO die at a rate of 32B/cycle, tied to the base ram clock. The overall effect is that bandwidth in each direction is comparable to the ram bandwidth (assuming dual channel). If data has to pass between chiplets, that could be a choke point. |
#26 |
Apr 2017
14₁₆ Posts |
Starting from mackerel's numbers and doing some back-of-the-envelope calculations:

From this slide of the AMD presentation: there are indeed 32B transferred to/from RAM per cycle. The clock speed of 3733MT/s DDR4 RAM is 18xx MHz. (Note that it might be possible to set it higher manually, we'll see...) So that would be a 90GB/s max bandwidth. One channel of DDR4-3733 memory should be around 45GB/s, so dual channel would not bottleneck much more than the quad channel of an incoming Threadripper?

One iteration of a 4M FFT needs 133MB of bandwidth. That means we could theoretically reach ~650 iter/s with 1 worker on 8 cores in one chiplet?

Last fiddled with by maxzor on 2019-06-19 at 18:16 |
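For anyone who wants to redo this arithmetic, here is a minimal Python sketch. It only uses the figures quoted in the post (DDR4-3733, 90GB/s, 133MB per iteration) and assumes memory traffic is the only limit, which is optimistic. The function names are made up for this illustration.

```python
# Back-of-the-envelope check using only the figures quoted in the post above.
# Function names are hypothetical, chosen for this illustration.

def ram_clock_mhz(transfer_rate_mts: float) -> float:
    """DDR memory does two transfers per clock, so clock = MT/s / 2."""
    return transfer_rate_mts / 2.0


def max_iter_per_sec(bandwidth_gb_s: float, mb_per_iteration: float) -> float:
    """Upper bound on iterations/s if memory traffic were the only limit."""
    return bandwidth_gb_s * 1000.0 / mb_per_iteration


clock = ram_clock_mhz(3733)             # DDR4-3733
iters = max_iter_per_sec(90.0, 133.0)   # 90GB/s and 133MB/iteration as quoted

print(f"RAM clock: {clock:.1f} MHz")
print(f"Upper bound: {iters:.0f} iter/s")
```

With those inputs the bound comes out a little under 680 iter/s, the same ballpark as the ~650 estimate. It scales linearly with whatever bandwidth figure turns out to be correct.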
#27 |
Feb 2016
UK
416₁₀ Posts |
I make dual channel 3733 ram about 58GB/s.
I got some conflicting info elsewhere that I haven't been able to verify. To my understanding, Zen and Zen+ have infinity fabric synced to the ram base clock. I assumed this was the same for Zen 2, but the unconfirmed claim I saw elsewhere was that it is now tied to the ram effective clock, i.e. effectively double what it was before. The main concern I had was that there is going to be an option to run IF at half or full ratio relative to ram speed (to allow higher ram speeds). If IF were at half speed relative to the ram base clock, it wouldn't have enough bandwidth even for the ram. So it would make sense if it was referenced off the effective ram speed. We'll just have to wait to get confirmation. Just over 2 weeks to go... |
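To put rough numbers on that concern, here is a small Python sketch comparing the two reference-clock hypotheses at the half ratio. It assumes a 64-bit (8-byte) bus per DDR4 channel and the 32B/cycle link figure mentioned earlier in the thread, which is unconfirmed. The function names are invented for the example.

```python
# Rough comparison of RAM bandwidth vs fabric link bandwidth at DDR4-3733.
# Assumptions: 8 bytes per transfer per channel, 32 bytes per fabric cycle
# (the unconfirmed figure quoted earlier in the thread).

def dual_channel_gb_s(transfer_rate_mts: float) -> float:
    """Theoretical peak of two 64-bit DDR4 channels, in GB/s."""
    return transfer_rate_mts * 1e6 * 8 * 2 / 1e9


def link_gb_s(ref_clock_mhz: float, ratio: float, bytes_per_cycle: int = 32) -> float:
    """Fabric link bandwidth for a given reference clock and ratio, in GB/s."""
    return ref_clock_mhz * 1e6 * ratio * bytes_per_cycle / 1e9


ram = dual_channel_gb_s(3733)
base_half = link_gb_s(3733 / 2, 0.5)   # hypothesis 1: referenced to the RAM base clock
eff_half = link_gb_s(3733, 0.5)        # hypothesis 2: referenced to the effective clock

print(f"dual channel RAM:       {ram:.1f} GB/s")
print(f"IF, base clock, half:   {base_half:.1f} GB/s")
print(f"IF, effective, half:    {eff_half:.1f} GB/s")
```

The RAM figure lands just under 60GB/s, close to the ~58GB/s above. Under the base-clock hypothesis the half-ratio link gets only about half of that, while under the effective-clock hypothesis it matches the RAM.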
#28 |
"Sam Laur"
Dec 2018
Turku, Finland
330₁₀ Posts |
IF at full ratio up to 3733 MHz RAM speed, half rate after that. See attached marketing slide.
#29 |
Feb 2016
UK
2⁵·13 Posts |
I have seen that, but it doesn't say what the actual speed is. Only that it changes. Is the reference clock (for full speed) the ram clock or the effective ram speed?
#30 |
"Sam Laur"
Dec 2018
Turku, Finland
101001010₂ Posts |
Should be the same as before, RAM clock, i.e. half the effective speed.
#31 |
Feb 2016
UK
640₈ Posts |
Unless I'm missing something or doing something very wrong, that would mean IF bandwidth is lower than ram once you have fast enough ram to hit the low ratio.
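A quick Python sweep over a few ram speeds shows where that would bite. This is only a sketch under the assumptions discussed in this thread: IF referenced to the ram clock, full ratio up to 3733 and half above it, and a 32B/cycle link (unconfirmed). The constant and function names are hypothetical.

```python
# Sweep of RAM speeds under the thread's assumptions (none confirmed by AMD here).
LINK_BYTES_PER_CYCLE = 32      # per-direction link figure quoted earlier in the thread
FULL_RATIO_LIMIT_MTS = 3733    # full ratio up to this speed, half ratio above it


def fabric_clock_mhz(ram_mts: float) -> float:
    """Fabric clock if it follows the RAM clock (half the MT/s rating)."""
    ram_clock = ram_mts / 2
    return ram_clock if ram_mts <= FULL_RATIO_LIMIT_MTS else ram_clock / 2


def fabric_gb_s(ram_mts: float) -> float:
    """Link bandwidth in GB/s at the resulting fabric clock."""
    return fabric_clock_mhz(ram_mts) * 1e6 * LINK_BYTES_PER_CYCLE / 1e9


def dual_channel_ram_gb_s(ram_mts: float) -> float:
    """Theoretical peak of two 64-bit DDR4 channels in GB/s."""
    return ram_mts * 1e6 * 8 * 2 / 1e9


for speed in (3200, 3600, 3733, 4000, 4400):
    ram_bw = dual_channel_ram_gb_s(speed)
    if_bw = fabric_gb_s(speed)
    print(f"DDR4-{speed}: ram {ram_bw:5.1f} GB/s, IF {if_bw:5.1f} GB/s, "
          f"IF/ram {if_bw / ram_bw:.2f}")
```

Under these assumptions the link matches dual channel ram up to 3733 and drops to half of it above that, which is the shortfall described above.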
#32 |
"Composite as Heck"
Oct 2017
3×263 Posts |
That's the scuttlebutt, and it's one of the major things that could impact a niche like ours. It's plausibly real: it likely doesn't affect 99% of use cases, so it seems a reasonable lever for them to pull.
#33 |
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2×29×101 Posts |
Doesn't it defeat the point of having faster RAM though?