mersenneforum.org  

2021-08-05, 23:19   #1
ixfd64
Bemusing Prompter
"Danny"
Dec 2002
California
2·17·71 Posts

Why do some Xeon processors use so much power?

Quick question for the hardware experts. Intel recently released five new CPUs in the Xeon W-3000 series: https://anandtech.com/show/16822/int...ations-38-core

However, one thing I noticed is that these new chips have a very high TDP compared to those with similar specs. For instance, the 12-core Xeon W-3323 uses up to 220 watts: https://ark.intel.com/content/www/us...-3-90-ghz.html

In comparison, the TDP of the i9-10920X (also 12 cores) is "only" 165 watts even though it has a higher maximum turbo frequency and is built on the more power-consuming 14 nm process: https://ark.intel.com/content/www/us...-3-50-ghz.html

Is there any special reason for this? Would a Xeon W-3323 (as an example) actually consume up to the full 220 watts?

Last fiddled with by ixfd64 on 2021-08-05 at 23:20

2021-08-06, 16:40   #2
Mysticial
Sep 2016
2³×43 Posts

Quote:
Originally Posted by ixfd64
Quick question for the hardware experts. Intel recently released five new CPUs in the Xeon W-3000 series: https://anandtech.com/show/16822/int...ations-38-core

However, one thing I noticed is that these new chips have a very high TDP compared to those with similar specs. For instance, the 12-core Xeon W-3323 uses up to 220 watts: https://ark.intel.com/content/www/us...-3-90-ghz.html

In comparison, the TDP of the i9-10920X (also 12 cores) is "only" 165 watts even though it has a higher maximum turbo frequency and is built on the more power-consuming 14 nm process: https://ark.intel.com/content/www/us...-3-50-ghz.html

Is there any special reason for this? Would a Xeon W-3323 (as an example) actually consume up to the full 220 watts?

My guess is that since these are turbo speeds, they will throttle down when actually pushed against the TDP limit.

IOW, even though the i9-10920X has higher turbo speeds, it would probably run well below max turbo whenever staying there would push power draw past 165W.
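
To make the throttling argument concrete, here is a rough Python sketch, not anything Intel publishes. It assumes package power grows with the cube of the clock (a simplification), that an all-core load draws exactly the TDP at a 3.5 GHz base clock, and a 4.6 GHz turbo ceiling. The function name and all three calibration choices are assumptions for illustration only.

```python
def sustained_clock_ghz(power_limit_w, ref_clock_ghz, ref_power_w,
                        max_turbo_ghz, exponent=3.0):
    """Highest clock (GHz) whose modelled package power fits the limit.

    Toy model (assumption): power = ref_power_w * (clock / ref_clock_ghz) ** exponent.
    """
    # Invert the power law to find the clock at which power equals the limit.
    clock_at_limit = ref_clock_ghz * (power_limit_w / ref_power_w) ** (1.0 / exponent)
    # The chip never runs above its rated maximum turbo.
    return min(max_turbo_ghz, clock_at_limit)


# Hypothetical calibration: 165 W at 3.5 GHz all-core, 4.6 GHz turbo ceiling.
for limit_w in (165.0, 220.0):
    clock = sustained_clock_ghz(limit_w, ref_clock_ghz=3.5, ref_power_w=165.0,
                                max_turbo_ghz=4.6)
    print(f"{limit_w:.0f} W limit -> sustained all-core clock {clock:.2f} GHz")
```

Under these made-up inputs the chip sits at base clock when held to 165 W, and a higher limit buys only a modest clock increase, which is the point being made above: the turbo figure on the spec sheet says little about what the chip sustains under a heavy load.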

2021-08-06, 17:19   #3
ixfd64
Bemusing Prompter
"Danny"
Dec 2002
California
2·17·71 Posts

I forgot to mention that both CPUs have the same 3.5 GHz base clock speed. However, the new series does have eight (up from six) memory channels. Would this have a significant effect on power usage?

2021-08-06, 17:30   #4
Mysticial
Sep 2016
2³·43 Posts

Quote:
Originally Posted by ixfd64
I forgot to mention that both CPUs have the same 3.5 GHz base clock speed. However, the new series does have eight (up from six) memory channels. Would this have a significant effect on power usage?

My guess is yes. But whether it makes that much of a difference is uh... above my pay grade.

A few years back I was in a company meeting with Intel where they mentioned caches and memory subsystem being a "significant part" of power consumption. But they didn't mention what the scope of it is (whether it was significant on-chip, or entire system).

We asked them if cache/uncore speeds would improve going from Skylake -> Cascade Lake. Not only did they say no, it won't increase, they also told us that cache/uncore would get slower in the future in order to reduce power consumption.

Needless to say, we weren't amused since cache/uncore was our bottleneck. But this was before AMD started kicking their ass so it's very possible things may have changed now that AMD is putting competitive pressure on them.

2021-08-06, 21:08   #5
retina
Undefined
"The unspeakable one"
Jun 2006
My evil lair
3³×233 Posts

Package size and design have an effect on how much power you can push through a CPU.

A physically larger package is easier/faster to cool, so that allows for more power dissipation.

A different (more expensive?) internal packaging design can allow for lower thermal resistance to the case.

More pin connections to the motherboard can allow more heat to flow out via that path.

2021-08-07, 08:48   #6
mackerel
Feb 2016
UK
2³×5×11 Posts

I don't have a complete picture here, but based on what we know about Ice Lake mobile and Ice Lake Xeons, my money would be on the process technology as the primary differentiator.

Cascade Lake is built on the well-matured 14nm process. Intel know how to control and balance its power and clocks well. Ice Lake, including the Xeons, is on 10nm. The older 10nm. It is known to have problems scaling clocks beyond 4 GHz or so. When you approach the clock wall, power consumption goes up a lot. Ice Lake Xeons are very late, and from an outsider's perspective it seems Intel struggled to overcome the process limitations. Given its lateness I had kinda hoped they would use 10SF (SuperFin) like Tiger Lake does, but it was not to be. 10SF significantly improves efficiency, and once again unlocks the potential to nudge close to 5 GHz boost speeds at more reasonable power.

Ice Lake at least has an architectural update to its advantage, giving a ballpark average 20% uplift per core, per clock over the previous generation, but it isn't enough to offset the much lower efficiency at base.

Because Ice Lake was so delayed, it could be a shorter than normal cycle to its successor, Sapphire Rapids, which is rumoured to be released next year. It will be based on a process newer than Intel's best in use today, one that is expected to give another improvement in efficiency over 10SF without impacting clocks. It will also be at least an architecture generation ahead of Ice Lake, depending on whether you consider Tiger Lake to be an architecture update over Ice Lake or not. They're essentially the same with the latter having much bigger caches.

Process    Mobile CPU    Server CPU
10nm       Ice Lake      Ice Lake
10SF       Tiger Lake    None
Intel 7    Alder Lake    Sapphire Rapids

Alder Lake is expected to hit the mainstream at the end of this year, and it will give an indication of how well Intel is doing in its attempts to catch up to AMD again. Note they don't expect to retake process leadership vs TSMC until 2025, but AMD are not using TSMC's leading-edge nodes, so Intel don't have that to fight as well.

2021-08-29, 11:59   #7
diep
Sep 2006
The Netherlands
5²×31 Posts

The short answer to why it eats so much power is very simple.

To put it simplistically:

power usage = clock * clock * clock

Or in words: power is a function of the clock speed of a CPU to the power 3.

The CPUs I have here are 145 watt TDP, and are ES (engineering sample) versions of the E5-2699 v4 with 22 cores.

Now the W-3300 series you refer to goes up to a whopping 38 cores.

The next problem you face with that many cores is the internal crossbar within a CPU.

So for example the 64-core Threadripper CPUs in fact consist of 8 smaller processors called CCDs, with a crossbar in the middle that connects them. So in reality it is an 8-socket system.

Then there is the ECC issue.

The normal 64-core Threadripper without ECC is clocked at 3.0 GHz, whereas there is a bunch of EPYC versions with ECC that are clocked at 2 GHz, with the highest one at 2.6 GHz.

Now this Intel chip doesn't consist of 8 CCDs as far as I know (though Intel might have taken a different course there nowadays), but it's much higher clocked AND has more cores.

The hardware is produced on ASML machines, made in The Netherlands. Without ASML you can forget much of the progress of the past 50 years everywhere, as they have something like a 90% market share, and all of TSMC and Intel use ASML machines to produce CPUs.

Now what I was told when I visited there is that it is all about the yields. Say you produce a round wafer of 300mm diameter (the largest that the machines ASML currently sells can handle, though there is an experiment with 450mm wafers as well). Producing some hundreds of CPUs at the same time on a single wafer is very expensive.

So you want nearly all of them to work properly. The ones not working correctly you can throw away.
Typically you want 90% yields.

Obviously an 8-core CCD from AMD is much easier to produce than a 38-core CPU from Intel. Just look at the total size of it.

Now obviously they will have tricks, but for CPUs those tricks are pretty limited. With manycore processors such as GPUs there are more tricks possible, like disabling some cores that do not work OK, because there is not much coherency between the cores other than through the L2 cache that interacts with the GPU RAM.

For many years the standard has been that producing a CPU of 300mm^2 is quite possible at good yields.
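
A common first-order way to see why die size matters so much for yield is the Poisson defect model, where the fraction of good dies is exp(-D * A) for defect density D and die area A. This is a textbook simplification, not something stated in the post. The defect density and the 75 and 600 mm^2 die areas below are hypothetical; only the 300 mm^2 figure comes from the text above.

```python
import math


def poisson_yield(die_area_mm2, defects_per_cm2):
    """Fraction of dies with zero defects under the simple Poisson model."""
    area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 = 1 cm^2
    return math.exp(-defects_per_cm2 * area_cm2)


DEFECT_DENSITY = 0.1  # defects per cm^2, hypothetical value

# 75 and 600 mm^2 are made-up sizes for a small chiplet and a big monolithic die.
for label, area_mm2 in (("small chiplet", 75.0),
                        ("300 mm^2 die", 300.0),
                        ("large monolithic die", 600.0)):
    good = poisson_yield(area_mm2, DEFECT_DENSITY)
    print(f"{label:>20}: {good:.0%} of dies defect-free")
```

With these made-up inputs the small die comes out at about 93%, the 300 mm^2 die at about 74% and the large die at about 55%. Real fabs use more refined models and can sometimes salvage partly defective dies, so treat this only as an illustration of the trend.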

So you can take a look at this chip and its size and you know quite a lot more about it.

Now I'm no electronics expert so I might say it wrong here: one way to clock something higher and still get good yields is of course to use thicker copper tracks. That might still give you the same yields, yet at a much higher power consumption.

It is all about the yields. Something with much worse yields will simply not even be considered for production.

All of the above is a simple representation, and it ignores the hard work it is to design a CPU and the hundreds of engineers who then optimize such a design. Reality is far more complex of course, with tricks left and right, but this sketches the big picture.

2021-08-29, 12:56   #8
M344587487
"Composite as Heck"
Oct 2017
1540₈ Posts

TDP/cores/cache/memory-speeds etc. are all trending up every generation as things get more dense (soon server liquid cooling will become more mainstream to allow TDP to get much higher). A chip's paper specs don't mean much when it'll boost until one of its limiting metrics (including some form of TDP) is saturated regardless, which is workload dependent. TDPs in a series tend to be similar so that a single chassis design (which is designed with an upper TDP in mind) can cater to all of them, and so that the paper specs are at least somewhat meaningful when comparing within the series. Comparing across different generations basically has to be done through benchmarks.

2021-09-05, 08:09   #9
mackerel
Feb 2016
UK
2³×5×11 Posts

Quote:
Originally Posted by diep
To put it simplistically:

power usage = clock * clock * clock

Or in words: power is a function of the clock speed of a CPU to the power 3.

Do you have a reference to this which goes into more depth?

I recognise you say it is simplistic, but as an illustration, using that "as is" then a 5.0 GHz CPU would be 95% more power usage compared to 4.0 GHz. For a more mild example of 4.5 GHz compared to 4.0 GHz, that would be 42%. I assume this would only apply to running the same CPU at different clocks. You can't use this to compare different CPUs at the same clock for example.
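
The two percentages above can be reproduced in a few lines of Python. This only evaluates the simplistic cubic relation as given; the function name is made up for the illustration, and nothing here says real chips follow that curve.

```python
def relative_power_increase(new_clock_ghz, old_clock_ghz, exponent=3):
    """Fractional power increase, assuming power ~ clock ** exponent."""
    return (new_clock_ghz / old_clock_ghz) ** exponent - 1.0


for new_clock in (4.5, 5.0):
    increase = relative_power_increase(new_clock, 4.0)
    print(f"{new_clock} GHz vs 4.0 GHz: +{increase:.0%}")
# prints:
# 4.5 GHz vs 4.0 GHz: +42%
# 5.0 GHz vs 4.0 GHz: +95%
```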

I further assume from the number of terms this takes into consideration two effects. One is that if all else is fixed (especially voltage), and ignoring static losses, then power consumption is proportional to clock. You do work on each clock cycle, which takes the power.

The 2nd effect is something I've long wondered about, which is well known especially for overclockers. But it is not so well described: the voltage-clock curve for stable operation. Generally speaking, as clocks go up, the voltage required to be stable would also go up in a non-linear manner. At this point I wish I paid more attention at university, in that I don't know how switched semiconductor power usage scales with voltage. My mind vaguely wanders back to the likes of the ideal diode equation, but at the end of the day I don't know. Maybe it is totally wrong, but the other thing I think of is for a resistive load, it is proportional to the voltage squared, so we have another non-linear term there.

I'm trying to find old power test data now to see if and how this fits. I think I tested Coffee Lake vs Zen 2 (vs Skylake-X?) as I had 6 core versions of each then.
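
For what it's worth, the two effects described above line up with the usual first-order textbook approximation for CMOS switching power, P = a * C * V^2 * f (activity factor, switched capacitance, supply voltage, clock). That relation is not taken from this thread, and it ignores leakage. The sketch below uses made-up operating points (the capacitance and both voltages are hypothetical) to show that if voltage has to rise in proportion to clock, power ends up growing with the cube of the clock.

```python
def dynamic_power_w(switched_capacitance_f, voltage_v, clock_hz, activity=1.0):
    """Textbook CMOS switching power: activity * C * V^2 * f (leakage ignored)."""
    return activity * switched_capacitance_f * voltage_v ** 2 * clock_hz


C_EFF = 1.0e-8  # farads, hypothetical effective switched capacitance

# Hypothetical operating points: voltage raised by the same 25% as the clock.
base = dynamic_power_w(C_EFF, voltage_v=1.00, clock_hz=4.0e9)
fast = dynamic_power_w(C_EFF, voltage_v=1.25, clock_hz=5.0e9)
same_voltage = dynamic_power_w(C_EFF, voltage_v=1.00, clock_hz=5.0e9)

print(f"4.0 GHz at 1.00 V: {base:.1f} W")
print(f"5.0 GHz at 1.25 V: {fast:.1f} W ({fast / base:.2f}x)")
print(f"5.0 GHz at 1.00 V: {same_voltage:.1f} W ({same_voltage / base:.2f}x)")
```

With the voltage held fixed the same function gives plain linear scaling with clock, which is the first effect on its own. How steeply voltage really has to rise with clock depends on the chip and the process, so the cubic figure is an illustration and not a law.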

2021-09-05, 11:20   #10
diep
Sep 2006
The Netherlands
775₁₀ Posts

It's big O, as in order: power is O( n * n * n ) in the clock n.

Now I'm not the right person to answer your question here, as I don't know much about this subject.
Still, to clock higher there are all sorts of variables in the equation. Basically, optimizing a CPU in order to clock it higher starts with the design of course, and with pipelining.

So the small design team of a new CPU, typically 3-5 guys who each make a couple of million or more when taping out a new CPU, basically takes this pipelining and the clock it's supposed to run at into account.

After some years they tape out a new design, then hand it over to a team of some hundreds of engineers who go and optimize the design.

Basically the struggle starts at 300 MHz; that's where automatically producing the design with software gets you easily, I was told by a guy who designed an important CPU of which I'm sure most of us have used one version or another. From there the huge team tries to clock it higher, and that takes years.

Clocking most CPUs quite a bit higher with a lot of cooling usually is not a great plan. You simply run into problems like timing issues in the cache levels. The first generations of i7 had trouble clocking higher than 4.4 GHz, as around 4.5 GHz the caches couldn't keep up with the timing. Especially the L1 instruction cache, the instruction stream of course being one of the major problems in modern CPUs.

So seen from a purely theoretical viewpoint, it's easier to produce a CCD of 32 cores at 1 GHz than a CCD of 8 cores clocked at 2.6 GHz. And as you know, AMD's 64-core Threadripper is internally an 8-socket CPU with 8 CCDs and 1 crossbar...

The problem is that not many will buy a CPU clocked at 1 GHz. Most software benefits a lot from a higher clocked CPU, and all OSes, especially Linux, handle so much sequentially.

For example incoming data over the internet: every packet you receive gets a hard lock. Whether that's TCP/IP or raw data, everything gets globally locked.

So a 4 GHz CPU will simply be way faster for internet because of how the OSes work.

They are pretty primitive there.

Think of this: how many 4 GHz CPUs do you know that eat less than 100 watts?
Whereas those quad-core ARMs at 1.4 GHz eat 2.2 watts.

The formula goes way worse than you guess. Why waste 100 watts on a CPU?
For 128 cores you could easily make do with 100 watts, if the cores were clocked a bit lower...

Yet no one buys those CPUs.

What you and I would want is a quad-core ARM CPU, but then 1024 of them on a single chip, with each quad-core CPU not connected via any sort of cache to any other CPU, and each one having its own spot in the RAM. So completely embarrassingly parallel.

Then we'll figure out some mechanism later on in hardware, some sort of lightweight thing to produce on a chip, to somehow have the CPUs communicate with each other...

If they could sell 100 million of such ARM CPUs, clocked at say 1 GHz with 1024 cores on a single chip, with the guarantee that each can be sold for 1000 dollars, they would produce it.
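
Taking the numbers in the post at face value, the per-core arithmetic works out as below. This is a back-of-the-envelope sketch only: it ignores the interconnect, caches and memory controllers a 128-core chip would need, it assumes the cubic clock relation from earlier in the thread, and the function name is invented for the example.

```python
ARM_QUAD_POWER_W = 2.2  # quad-core ARM at 1.4 GHz, figure quoted in the post
ARM_QUAD_CORES = 4
ARM_CLOCK_GHZ = 1.4


def scaled_core_power_w(core_power_w, old_clock_ghz, new_clock_ghz, exponent=3):
    """Per-core power after a clock change, assuming power ~ clock ** exponent."""
    return core_power_w * (new_clock_ghz / old_clock_ghz) ** exponent


watts_per_core = ARM_QUAD_POWER_W / ARM_QUAD_CORES
power_128_cores_w = 128 * watts_per_core
core_at_4ghz_w = scaled_core_power_w(watts_per_core, ARM_CLOCK_GHZ, 4.0)

print(f"per core at 1.4 GHz: {watts_per_core:.2f} W")
print(f"128 such cores:      {power_128_cores_w:.1f} W")
print(f"one core at 4 GHz under the cubic model: {core_at_4ghz_w:.1f} W")
```

That gives 0.55 W per core and roughly 70 W for 128 of them, which fits the 100 watt figure in the post as far as the cores alone go. The comparison is not like for like, since the cores being compared are very different designs.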

Last fiddled with by diep on 2021-09-05 at 11:42

2021-09-05, 13:22   #11
mackerel
Feb 2016
UK
2³·5·11 Posts

Quote:
Originally Posted by diep
It's big O, as in order: power is O( n * n * n ) in the clock n.

Thanks for the reply, but what I was looking for are the physical reasons why that is the case. Design history and the core tradeoffs don't address that.

To answer your question on 4 GHz CPUs that use less than 100W: you didn't specify constraints, so the answer could be complicated. Some CPUs that spring to mind include the 6700K, 7350K, 7700K, 8350K and 8086K. All of these have a base clock at or above 4 GHz and a TDP below 100W. I own 4 of those 5, and they all run Prime95-like workloads at base clock at around or below TDP. They are pretty much the same microarchitecture but differ in process, which can be seen to improve their efficiency slightly each time. If you expand the question to include CPUs with an all-core turbo above 4.0 GHz, the list would be a lot longer, but not necessarily all of those would stay below 100W.

The comparison with Arm is not like for like. They are optimised towards their use case, which does not entirely replace x86 usage.

On many slower cores vs fewer fast ones, we're going to see x86's first exploration of that area with Alder Lake, but that's another thread. There's one other big disadvantage to the many-slower-cores approach: cost. If they are of the same complexity, you're going to be paying a lot more for more silicon, and the capacity to process that stuff isn't exactly in surplus at the moment. If you simplify cores to make them smaller, you're basically moving towards a GPU-like model.

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.