mersenneforum.org  

Old 2023-01-13, 12:27   #1
M344587487
 
 
"Composite as Heck"
Oct 2017

2·5²·19 Posts
Sapphire Rapids +++

https://www.phoronix.com/review/inte...ire-rapids-max

https://www.phoronix.com/review/inte...platinum-8490h

I'll just cut to the chase and ask: how long before Intel is competitive in servers again? They seem to have lost general compute in both performance and efficiency, banking instead on accelerators in a big way to make up lost ground in key specialist compute. AMX seems to be the big win for ML workloads, and that's it? From what I read you mostly want to run ML on GPUs as they are a powerhouse, with ML on CPUs being more useful for rapid development, so even that win, while important, is not as important as it appears? Less bandwidth (8 vs 12 memory channels, both DDR5-4800), less efficiency, mostly less performance: not a great outlook. At least the HBM2e model is interesting for more cache and less burden on bandwidth (for GIMPS maybe a big win, but maybe not as much of a benefit as the poorer power efficiency is a detriment).
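As a rough back-of-envelope comparison of that bandwidth gap (nominal peak numbers only, a sketch rather than measurements; the ~1 TB/s HBM2e figure is an assumed order-of-magnitude value):

Code:
# Nominal peak DRAM bandwidth: channels x MT/s x 8 bytes per transfer.
def peak_bw_gbs(channels, mts, bytes_per_transfer=8):
    return channels * mts * bytes_per_transfer / 1000.0  # GB/s

spr_ddr5 = peak_bw_gbs(8, 4800)     # Sapphire Rapids, 8x DDR5-4800 ~= 307 GB/s
genoa_ddr5 = peak_bw_gbs(12, 4800)  # Genoa, 12x DDR5-4800 ~= 461 GB/s
xeon_max_hbm = 1000.0               # Xeon Max HBM2e, roughly 1 TB/s class (assumed)

print(f"SPR 8ch DDR5:    {spr_ddr5:.0f} GB/s")
print(f"Genoa 12ch DDR5: {genoa_ddr5:.0f} GB/s ({genoa_ddr5 / spr_ddr5:.1f}x SPR)")
print(f"Xeon Max HBM2e: ~{xeon_max_hbm:.0f} GB/s (~{xeon_max_hbm / spr_ddr5:.1f}x SPR DDR5)")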

The gen after Sapphire is Emerald, which should come pretty quickly; the new Xeons with HBM2e usable as L4 cache might be a preview. It doesn't look interesting aside from HBM2e, but there's limited information available, and it's on the same Intel 7 node, so efficiency prospects look grim. The generation after that splits into a P-core variant (Granite Rapids) and an E-core variant (Sierra Forest), on the Intel 3 node, which you'd suppose has enough efficiency gains to warrant a new name. Granite will probably be the more traditional generational upgrade, while Sierra looks like it'll ditch AVX-512 to try and compete with AMD on core count. That they are doing Sierra at all is potentially not a good sign that far out (would they be doing it if they didn't have to in order to compete on core count with AMD? If it's a good move, why not do it now?), but AVX-512 isn't useful everywhere, so max-core-optimised parts do make a lot of sense in general.

How, when, and with what impact Intel and AMD integrate FPGA tech is also up for debate. I assume Intel is going to be the first mover and that it may be a big part of the accelerator plan; it might at least play a big part in the DLC plan of catering to very niche workloads for a price. Zen5 is likely 2025 (they say 2024/2025, so 2025 for general consumption) and is the earliest we're likely to see AMD integrate Xilinx tech in a user-facing way.

AMD has Zen4, Zen4 with extra cache, and Zen4c cores (AFAIK reducing cache to fit more cores, allowing this gen to top out at 128 cores), and Zen5 is 2025, when TSMC's 3nm becomes viable. In that time it seems unlikely that Intel can even achieve parity with Zen4 in the general sense, let alone play leapfrog, but maybe I'm underestimating the impact of some of these moves.

Is that a reasonable analysis, or is there more to account for?
Old 2023-01-13, 12:49   #2
Luminescence
 
"Florian"
Oct 2021
Germany

11001110₂ Posts

Quote:
Originally Posted by M344587487 View Post
I'll just cut to the chase and ask: how long before Intel is competitive in servers again? They seem to have lost general compute in both performance and efficiency, banking instead on accelerators in a big way to make up lost ground in key specialist compute.

Is that a reasonable analysis, or is there more to account for?
You should account for the fact that those accelerators are not even fully available on most CPUs... and even when they are available, they are locked away behind a subscription paywall ("Intel On Demand").

The price points are even more ridiculous considering the paywall.
Old 2023-01-13, 21:44   #3
henryzz
Just call me Henry
 
 
"David"
Sep 2007
Liverpool (GMT/BST)

6141₁₀ Posts

The root of Intel's problems has been the stall on 14nm. Since then they have restructured and have been working on a number of nodes in parallel (7/4/3/20A/18A). Intel seem to be indicating that the majority of these nodes should be available on time (or ahead of it). I think we will see some cancelled products as they work through these nodes, with nodes skipped because better ones become available earlier (the time gaps were always narrow).
If things go to plan we should be seeing 18A CPUs in 2025, which should be on a node at the very least as good as TSMC's (and probably better).

https://www.tomshardware.com/news/in...18nm-pulled-in

There is also some pretty decent evidence that Intel are working on a redesigned core with Jim Keller in the 2024/2025 timeframe, which should see large IPC gains.

I suspect that around 2025 there will be fierce competition between Zen 5 and Intel's offering. Having competition again is only going to be good for CPU improvements.
Old 2023-01-15, 10:40   #4
mackerel
 
 
Feb 2016
UK

111000000₂ Posts

I don't keep too close an eye on server tier hardware, but I do wonder if SPR would have been better perceived had it been on time? If its successor is not similarly delayed it would be interesting to see how they line up there. Even if both are nominally on the "same" Intel 7 process, that doesn't rule out ongoing improvements to that process. Look at the difference between 12th and 13th gen consumer parts, where power efficiency has improved, and 13th gen isn't that far behind Zen 4 in perf/W in architecture terms, as opposed to product operating points. To me it feels like AMD are still supply constrained, and it still falls to Intel to provide the volume the industry needs. A 2nd position product is still better than no product.

Intel know the pains since 14nm have held them back, with plans to catch up to and even overtake TSMC in process technology in the next couple of years or so. With product delays everywhere it remains to be seen if they can execute.

On the GPU question, that's always been there. The way I understand it is that some workloads don't scale to GPUs. They might be incredibly fast if you can feed them, but the on-board VRAM is inadequate to do that. Thus CPUs, with possibly TBs of RAM, can outperform in that use case.
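A crude streaming model of that point (all figures below are assumed nominal peaks, not measurements): once the working set no longer fits in VRAM and has to be re-streamed across PCIe every pass, the GPU's effective bandwidth collapses to the link speed, while the CPU keeps reading at local DRAM speed.

Code:
# Time for one full pass over a working set that exceeds VRAM, so the GPU
# must pull it across PCIe each pass, versus a CPU reading local DDR5.
# Figures are assumed nominal peaks for illustration.
def pass_time_s(working_set_gb, bandwidth_gbs):
    return working_set_gb / bandwidth_gbs

working_set_gb = 500   # e.g. a dataset far larger than ~80 GB of VRAM
pcie5_x16_gbs = 63     # roughly PCIe 5.0 x16 payload bandwidth
ddr5_8ch_gbs = 307     # 8x DDR5-4800

print(f"GPU, PCIe-fed:   {pass_time_s(working_set_gb, pcie5_x16_gbs):.1f} s per pass")
print(f"CPU, local DDR5: {pass_time_s(working_set_gb, ddr5_8ch_gbs):.1f} s per pass")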

On FPGAs, I think there are two ways that could go. Would either side offer a general-purpose FPGA module? I feel it is too specialised for that, and those that know what to do with one can continue much as they do now. But what about an FPGA module offering a defined function? AMD might have already gone this route on their Phoenix mobile processors, which are reported to have an FPGA-implemented "AI Engine". I suspect that is to allow them to update it throughout the product lifetime for better performance, as it is a fast-moving area.
Old 2023-01-15, 13:15   #5
M344587487
 
 
"Composite as Heck"
Oct 2017

2×5²×19 Posts

The accelerators available look quite focused (compression, encryption, memory management, storage management) and there's very little public software that has implemented any of them; the ecosystem probably needs at least a few years in the oven to be useful beyond the big players: https://www.phoronix.com/review/inte...lerators-linux

Quote:
Originally Posted by mackerel View Post
I don't keep too close an eye on server tier hardware, but I do wonder if SPR would have been better perceived had it been on time? If its successor is not similarly delayed it would be interesting to see how they line up there. Even if both are nominally on the "same" Intel 7 process, that doesn't rule out ongoing improvements to that process. Look at the difference between 12th and 13th gen consumer parts, where power efficiency has improved, and 13th gen isn't that far behind Zen 4 in perf/W in architecture terms, as opposed to product operating points. To me it feels like AMD are still supply constrained, and it still falls to Intel to provide the volume the industry needs. A 2nd position product is still better than no product.
Ideally for Intel it would probably have been released between Zen2 and Zen3 for some leapfrog, the generally worse performance relative to Zen3 being offset by specialising. As it stands they have lost their big differentiator of AVX-512; they have the next differentiator in AMX, but without general efficiency or performance competitiveness it's nowhere near enough. If these accelerators were mature today and the ecosystem were bigger then they would feel more competitive IMO; from the looks of it they're talking a big game, but that game is in the far distance.

Quote:
Originally Posted by mackerel View Post
On the GPU question, that's always been there. The way I understand it is that some workloads don't scale to GPUs. They might be incredibly fast if you can feed them, but the on-board VRAM is inadequate to do that. Thus CPUs, with possibly TBs of RAM, can outperform in that use case.
You might be right, partially for caching the entire input in RAM to be pre/post-processed and partially to avoid PCIe bottlenecking. I think the next big leap will come from fat direct connections between storage and GPU: storage access could be parallelised to match the bandwidth of VRAM, which far exceeds the bandwidth of CPU RAM, and then petabytes of models could be fed as quickly as the GPU cores/VRAM allow.
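For a rough sense of the scale of parallelism that would need (assumed nominal figures only, hypothetical drive counts):

Code:
# How many NVMe drives, read in parallel, would it take to feed a GPU at
# VRAM-like rates? Assumed nominal sequential-read figures, ignoring overheads.
import math

vram_bw_gbs = 2000    # HBM on a current top-end accelerator, order of 2 TB/s
nvme_gen5_gbs = 14    # a fast PCIe 5.0 x4 SSD, sequential read

drives = math.ceil(vram_bw_gbs / nvme_gen5_gbs)
print(f"~{drives} Gen5 drives to match ~{vram_bw_gbs} GB/s of VRAM bandwidth")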

Quote:
Originally Posted by mackerel View Post
On FPGAs, I think there are two ways that could go. Would either side offer a general-purpose FPGA module? I feel it is too specialised for that, and those that know what to do with one can continue much as they do now. But what about an FPGA module offering a defined function? AMD might have already gone this route on their Phoenix mobile processors, which are reported to have an FPGA-implemented "AI Engine". I suspect that is to allow them to update it throughout the product lifetime for better performance, as it is a fast-moving area.
If the FPGA element is strong enough maybe a general FPGA module will exist, but gated behind an exorbitant paywall, as it would otherwise completely upend the FPGA market. What I am imagining of a general FPGA module, if it existed, is something smaller that allows you to define your own specialised instructions: not entirely unlike microcode updates (to my understanding), which they already use under the hood to tweak things, but a user-accessible version in its own paddling pool (which totally wouldn't be the next vector for Spectre/Meltdown-style exploits, honest guv).

With or without a general FPGA module, defined functions seem very likely, gated behind variable paywalls depending on niche/demand/value.

Quote:
Originally Posted by henryzz View Post
The root of Intel's problems has been the stall on 14nm. Since then they have restructured and have been working on a number of nodes in parallel (7/4/3/20A/18A). Intel seem to be indicating that the majority of these nodes should be available on time (or ahead of it). I think we will see some cancelled products as they work through these nodes, with nodes skipped because better ones become available earlier (the time gaps were always narrow).
If they can move up the timeline then great; however, time is limited as it is and 2 years is less than it appears. If they can cancel a step entirely it's probably Emerald, which seems mostly a "previous but better". If Granite/Sierra can come out before Zen5 to close the gap (if not leapfrog) then Intel is getting a handle on the situation IMO.

If Zen4c is just Zen4 with less L3, at some point AMD might be better served ditching ZenX and ZenX3D in favour of ZenXC3D (one very focused compute die for everything, stacked with a variable amount of cache). That might be necessary to cram in more cores; 1024 AVX-512 cores per socket this decade? Extrapolating to absurdity, a socket will end up the size of a paving slab and a server rack will just be paving slabs interleaved with water cooling and fibre-optic spaghetti. This will eventually be miniaturised and Skynet can then do its thing: https://static.wikia.nocookie.net/te...20081004025149 Too far?

https://www.youtube.com/watch?v=zIWlXjxyIuM

Old 2023-06-29, 13:01   #6
M344587487
 
 
"Composite as Heck"
Oct 2017

1110110110₂ Posts

The HBM2e processors at least look interesting in general, more so for prime hunting and anything that can fit in 64GB of RAM, as HBM-only mode seems to have the most performance: https://www.phoronix.com/review/xeon...468-9480-hbm2e

For prime hunting purposes it may be way better than those benchmarks imply, scaling mostly with bandwidth and having up to 56 AVX-512 cores. If it weren't for GPUs having eaten CPUs' lunch years ago for PRP it would probably have been the next big thing for GIMPS.
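A very crude bandwidth-bound model of why that scaling matters; everything below is an assumption for illustration (the FFT working-set size, the number of passes over it per iteration, and the bandwidth figures), so treat it as a ceiling estimate rather than a benchmark:

Code:
# If PRP iterations are memory-bandwidth-bound, per-socket throughput is
# capped at roughly bandwidth / (memory traffic per iteration).
fft_working_set_mb = 50    # assumed: ~6.5M-point double-precision FFT for a wavefront exponent
passes_per_iteration = 6   # assumed: forward FFT + squaring + inverse FFT + carry passes
traffic_gb = fft_working_set_mb * passes_per_iteration / 1000.0

for name, bw_gbs in [("8ch DDR5-4800", 307), ("Xeon Max HBM2e (~1 TB/s)", 1000)]:
    print(f"{name}: ~{bw_gbs / traffic_gb:.0f} iter/s ceiling per socket")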

If there's a decent second-hand market for these in 5 years (big if, probably US-only if anywhere), then with no need to buy scads of RAM and with the right compute/bandwidth ratio for the way general workloads seem to be heading, these might make nice custom workstations (possibly reasonably compact ones, if someone makes a motherboard with that in mind). Of course by then even top-end consumer parts might be on 32+ cores and 128MiB+ of cache or thereabouts on a newer node, but it's unlikely that so much progress will be made in that time that Intel's best server part now isn't competitive with something cost-optimised in 2028. IMO all it would take is for TSMC's N2 to be delayed to keep these in the running for a while.