mersenneforum.org  

Great Internet Mersenne Prime Search > Hardware

2023-01-01, 14:39   #1
diep
Sep 2006
The Netherlands

3·269 Posts
High energy price vs cheapest cruncher

Some time ago I built a CAD machine here (44 cores in total). With the GPU, the disk arrays, the water cooling and a bunch of fans added, the box draws 400+ watts.

Let me quote the current energy prices from Eneco for a variable contract in the Netherlands:

https://www.eneco.nl/duurzame-energie/modelcontract/

<pre>
Electricity per kWh, day tariff       € 0,91886
Electricity per kWh, night tariff     € 0,73654
Electricity per kWh, single tariff    € 0,82899
Gas per m^3                           € 3,06421
</pre>

In short: 0.83 euro per kWh, including most taxes.
At 400 watts (I believe the average is higher than that), that costs a sloppy 2908 euro a year.
For most of the winter this also heats the office just enough. With gas at 3 euro per m^3, the central heating in the office is of course turned off.
When it's freezing outside, all this isn't enough to heat the office while I'm there, but that's not the relevant discussion.
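
A quick check of the yearly figure, as a minimal sketch: it assumes a constant 24/7 draw, a non-leap year of 8760 hours, and the single-tariff price from the Eneco table. The 300 W and 500 W rows are only illustrative neighbours of the 400 W case.

```python
# Yearly electricity cost of a machine running 24/7.
# Assumes a constant power draw and a non-leap year (365 * 24 = 8760 hours).
HOURS_PER_YEAR = 365 * 24


def yearly_cost_eur(watts: float, eur_per_kwh: float) -> float:
    """Cost in euro of drawing `watts` continuously for one year."""
    kwh_per_year = watts * HOURS_PER_YEAR / 1000.0
    return kwh_per_year * eur_per_kwh


if __name__ == "__main__":
    single_tariff = 0.82899  # Eneco single tariff from the table above
    for watts in (300, 400, 500):
        print(f"{watts} W -> {yearly_cost_eur(watts, single_tariff):.0f} EUR/year")
```

At 400 W this is 3504 kWh a year: about 2908 euro at the rounded 0.83, and about 2905 euro at the exact 0.82899 single tariff.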

Now, we all realize that not having a machine turned on might seem cheapest to most.
Yet that's an illusion: something will need to run here anyway. I also do the CAD work for designing hardware components on that box.

Which CPU is currently the most efficient performer per watt for George Woltman's excellent DWT implementation?
So CPU, not GPU.

Then we can take it from there :)
2023-01-01, 19:14   #2
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
https://pedan.tech/

2⁴×199 Posts

My guess would be a 32-core 9004 series Epyc, such as the 9354P, with all 12 memory channels populated.
2023-01-01, 19:51   #3
diep
Sep 2006
The Netherlands

3×269 Posts

Quote:
Originally Posted by Mark Rose
My guess would be a 32-core 9004 series Epyc, such as the 9354P, with all 12 memory channels populated.

That's what I feared, yes. Does that chip exist at all? No one seems to offer it for sale. Is it a benchmark chip?

3.25 GHz @ 64 cores at just 280 watt TDP seems very, very, very high clocked for that TDP...

When chips get produced, the middle of the wafer sometimes has CPUs that can clock a lot higher. If they select 8 of those that can, they're home free.
2023-01-01, 20:56   #4
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
https://pedan.tech/

2⁴·199 Posts

I think the 64-core part would be memory-bandwidth starved. The supported DDR5 speed is 4800, I believe. That may be enough for the 48-core part, but the AVX-512 support makes the 32-core part likely a better fit.
2023-01-02, 00:11   #5
diep
Sep 2006
The Netherlands

3×269 Posts

Quote:
Originally Posted by Mark Rose
I think the 64-core part would be memory-bandwidth starved. The supported DDR5 speed is 4800, I believe. That may be enough for the 48-core part, but the AVX-512 support makes the 32-core part likely a better fit.

Doesn't look bad at first sight:

AMD claims 460 GB/s of bandwidth.

Which is an interesting claim, as I'd naively guess 32 GB/s × 12 channels = 384 GB/s (that's raw bandwidth; usable data bandwidth is usually about 80% of that, as a rule of thumb).
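
One way AMD's figure comes out, sketched under the assumption that each DDR5-4800 channel moves 4800 million transfers per second over a 64-bit (8-byte) data path: that is 38.4 GB/s per channel, and 32 GB/s per channel would correspond to 4000 MT/s instead. The 0.8 factor is only the rule of thumb from the post, not a measurement.

```python
# Peak DRAM bandwidth from transfer rate, bus width and channel count.
def peak_bandwidth_gbs(mt_per_s: int, bus_bytes: int, channels: int) -> float:
    """Peak bandwidth in GB/s (10**9 bytes per second)."""
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


DDR5_4800 = 4800       # mega-transfers per second
BUS_BYTES = 8          # 64-bit data path per channel (ECC bits not counted)
CHANNELS = 12          # EPYC 9004 memory channels
USABLE_FRACTION = 0.8  # rule of thumb from the post, not a measurement

per_channel = peak_bandwidth_gbs(DDR5_4800, BUS_BYTES, 1)
total = peak_bandwidth_gbs(DDR5_4800, BUS_BYTES, CHANNELS)
print(f"per channel: {per_channel:.1f} GB/s")
print(f"{CHANNELS} channels: {total:.1f} GB/s peak, ~{total * USABLE_FRACTION:.0f} GB/s usable")
```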

Yet a whopping 256 MB of L3 cache gives it good odds, provided your working set more or less fits in there, or when you execute quite a few instructions for every L3 cache miss.
2023-01-02, 00:28   #6
diep
Sep 2006
The Netherlands

1447₈ Posts

I notice here that the 9354P has 32 cores.

https://www.amd.com/en/processors/epyc-9004-series

Whereas the 9534p has 64 cores at 2.45 GHz @ 280 watts.

The 9634 has 84 cores at 2.25 GHz @ 290 watts; with no P behind it, it is also dual-socket capable (the P parts are the single-socket-only versions).

At first sight it's the one giving the most GHz per watt of TDP, though the 96-core parts are close.

What I find weird are the L3 caches. The 4 MB per core I assumed was wrong: the 9354 has 8 MB of L3 per core and the 96-core version 4 MB per core.

Yet at 84 and 96 cores the size of the L3 cache makes little sense to me, unless they manage to 'turn off' cores on the chip while keeping the L3 cache alive.

OK, the price makes sense now: 10k dollars for the 96-core ones. That will be out of my budget for a while, I'm afraid (unless I sell lots of 3D printers soon, which is months away).

The question there is more: are you willing to pay such a high price, rather than whether you'd buy one if you had the cash.

In any case, the 32-core versions have a LOT more RAM bandwidth and a much larger L3 cache per core. That is a pretty interesting observation indeed!
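
The per-core difference can be made concrete by dividing the shared resources by the core count. This is a rough sketch assuming every 9004 part gets the same 12-channel DDR5-4800 peak of 460.8 GB/s, and using the L3 sizes discussed in this thread (256 MB for the 32-core 9354, 384 MB for the 96-core part); sustained bandwidth is lower in practice.

```python
# Per-core share of peak memory bandwidth and L3 cache.
# Assumes every 9004 part gets the same 12-channel DDR5-4800 peak (460.8 GB/s).
PEAK_GBS = 460.8


def per_core(cores: int, l3_mb: float, peak_gbs: float = PEAK_GBS) -> tuple[float, float]:
    """Return (GB/s per core, MB of L3 per core)."""
    return peak_gbs / cores, l3_mb / cores


parts = {
    "32-core 9354": (32, 256),  # 8 chiplets x 32 MB
    "96-core part": (96, 384),  # 12 chiplets x 32 MB
}

for name, (cores, l3_mb) in parts.items():
    bw, l3 = per_core(cores, l3_mb)
    print(f"{name}: {bw:.1f} GB/s and {l3:.0f} MB of L3 per core")
```

So the 32-core part gets three times the peak bandwidth per core and twice the L3 per core.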

edit: It seems more likely that they use chiplets of 12 cores (just guessing), so that 84 cores is 7 chiplets and 96 cores is 8 chiplets, rather than that any cores have been 'turned off'.
Simply one chiplet fewer in the package (just a guess). It's a different crossbar in that case.

Last fiddled with by diep on 2023-01-02 at 00:31
2023-01-02, 00:48   #7
diep
Sep 2006
The Netherlands

3·269 Posts

It's interesting to compare with nearly 4 years ago:

https://www.amd.com/en/products/cpu/amd-epyc-7702

Take the constant = GHz × cores / TDP.

We get the same value as the recently launched 96-core versions, namely 0.64 GHz per watt.
Of course, the 9004 series has 50% more memory channels, which is very nice.

Yet there's obviously a huge price difference. The 7702 (64 cores @ 2.0 GHz @ 200 watt TDP) is available in huge quantities on AliExpress.

Very interesting for building a second-hand HPC machine now is of course the 7H12, which is also available in huge quantities on AliExpress.
Even though it's less efficient, it's clocked higher at 2.6 GHz (64 cores @ 280 watts).

The 7742 is also efficient at 0.64, yet still very expensive on AliExpress: far over 3000 euro. That's for those who would otherwise invest in cryptos. Better to buy a good chip than to flush the money down the toilet.
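
The constant is easy to tabulate. This sketch uses only the core counts, base clocks and TDPs quoted in this thread; TDP is a rating rather than measured wall power.

```python
# "Efficiency constant" from the post: base GHz * cores / TDP, in GHz per watt.
# Specs are the ones quoted in this thread; TDP is a rating, not measured draw.
chips = {
    # name: (cores, base GHz, TDP in watts)
    "EPYC 7702": (64, 2.0, 200),
    "EPYC 7H12": (64, 2.6, 280),
    "EPYC 9534": (64, 2.45, 280),
    "EPYC 9634": (84, 2.25, 290),
    "EPYC 9354P": (32, 3.25, 280),
}


def ghz_per_watt(cores: int, ghz: float, tdp_w: float) -> float:
    """Aggregate base clock per watt of TDP."""
    return ghz * cores / tdp_w


# Print from most to least GHz per watt.
for name, spec in sorted(chips.items(), key=lambda kv: -ghz_per_watt(*kv[1])):
    print(f"{name:11s} {ghz_per_watt(*spec):.3f} GHz/W")
```

The metric ignores IPC, boost behaviour and vector width (Zen 2 versus Zen 4 with AVX-512), so two chips with the same score need not deliver the same PRP throughput per watt.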

In short, AMD is selling lots of different chips.

Where is Intel?
2023-01-02, 05:42   #8
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
https://pedan.tech/

2⁴·199 Posts

Quote:
Originally Posted by diep
Doesn't look bad at first sight:

AMD claims 460 GB/s of bandwidth.

Which is an interesting claim, as I'd naively guess 32 GB/s × 12 channels = 384 GB/s (that's raw bandwidth; usable data bandwidth is usually about 80% of that, as a rule of thumb).

Yet a whopping 256 MB of L3 cache gives it good odds, provided your working set more or less fits in there, or when you execute quite a few instructions for every L3 cache miss.

Don't forget that 256 MB of L3 is segmented: each chiplet has 32 MB. It's not a unified L3 cache across all chiplets.

The Epyc Milan-X series have 96 MB of L3 per chiplet. That fits a lot of PRP work, but newer Genoa is more power efficient.

Genoa-X with up to 1152 MB of L3 is coming soon (96 MB per chiplet). It may be worth waiting for that.

Quote:
Originally Posted by diep
I notice here that the 9354P has 32 cores.

https://www.amd.com/en/processors/epyc-9004-series

Whereas the 9534p has 64 cores at 2.45 GHz @ 280 watts.

The 9634 has 84 cores at 2.25 GHz @ 290 watts; with no P behind it, it is also dual-socket capable (the P parts are the single-socket-only versions).

At first sight it's the one giving the most GHz per watt of TDP, though the 96-core parts are close.

What I find weird are the L3 caches. The 4 MB per core I assumed was wrong: the 9354 has 8 MB of L3 per core and the 96-core version 4 MB per core.

Yet at 84 and 96 cores the size of the L3 cache makes little sense to me, unless they manage to 'turn off' cores on the chip while keeping the L3 cache alive.

In my limited experience, my rule of thumb has been that 1 channel of memory is good for 2 cores before memory bandwidth saturation begins. Additional cores may be able to squeeze more out, but often they'll be spinning waiting for data: making a lot more heat for not much more throughput. With 12 channels, that would indicate a 24-core chip like the 9224 or 9254, but those have only 64 and 128 MB of L3 respectively. The high-watt 9274F gets 256 MB (3 cores and 32 MB L3 per chiplet), but it's not throughput/watt efficient. That's why I think the 9354P or 9354 is the way to go: 4 cores and 32 MB L3 per chiplet, running 8 workers, one per chiplet.

I really do think the 48-core and higher chips will be memory-bandwidth starved running wavefront PRP with George's amazingly efficient code.
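
The rule of thumb above (about 2 cores per memory channel before bandwidth saturates) can be written as a quick check. The ratio of 2 is the empirical rule from this post, not a hardware constant, and the core counts are simply the part sizes discussed in the thread.

```python
# Rule of thumb from the post: ~2 cores per DDR channel before saturation.
CORES_PER_CHANNEL = 2  # empirical rule of thumb, not a hardware limit
CHANNELS = 12          # EPYC 9004 memory channels


def cores_beyond_saturation(cores: int, channels: int = CHANNELS,
                            cores_per_channel: int = CORES_PER_CHANNEL) -> int:
    """How many cores exceed what the memory channels can comfortably feed."""
    return max(0, cores - channels * cores_per_channel)


knee = CHANNELS * CORES_PER_CHANNEL
for cores in (24, 32, 48, 64, 96):
    extra = cores_beyond_saturation(cores)
    print(f"{cores:2d} cores: {extra:2d} cores past the ~{knee}-core knee")
```

The 32-core part sits only modestly past the knee, while the 48-core and larger parts are well beyond it, which is the memory-starvation concern in a nutshell.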

Genoa-X with 96 MB of L3 per chiplet would probably be fine with the higher core count parts.

Quote:
OK, the price makes sense now: 10k dollars for the 96-core ones. That will be out of my budget for a while, I'm afraid (unless I sell lots of 3D printers soon, which is months away).

The question there is more: are you willing to pay such a high price, rather than whether you'd buy one if you had the cash.

In any case, the 32-core versions have a LOT more RAM bandwidth and a much larger L3 cache per core. That is a pretty interesting observation indeed!

edit: It seems more likely that they use chiplets of 12 cores (just guessing), so that 84 cores is 7 chiplets and 96 cores is 8 chiplets, rather than that any cores have been 'turned off'.
Simply one chiplet fewer in the package (just a guess). It's a different crossbar in that case.

The 84- and 96-core parts have 12 chiplets, each with 7 or 8 active cores and 32 MB of L3. Most of the 32 to 64 core parts have 8 chiplets, except the 9334, which has only 4 chiplets. The 9274F and 9174F have 8 chiplets. Chiplets with bad L3 cache get used for the 9224, which has 4 chiplets but only half the working L3 cache.

Quote:
Originally Posted by diep
It's interesting to compare with nearly 4 years ago:

https://www.amd.com/en/products/cpu/amd-epyc-7702

Take the constant = GHz × cores / TDP.

We get the same value as the recently launched 96-core versions, namely 0.64 GHz per watt.
Of course, the 9004 series has 50% more memory channels, which is very nice.

Yet there's obviously a huge price difference. The 7702 (64 cores @ 2.0 GHz @ 200 watt TDP) is available in huge quantities on AliExpress.

Very interesting for building a second-hand HPC machine now is of course the 7H12, which is also available in huge quantities on AliExpress.
Even though it's less efficient, it's clocked higher at 2.6 GHz (64 cores @ 280 watts).

The 7742 is also efficient at 0.64, yet still very expensive on AliExpress: far over 3000 euro. That's for those who would otherwise invest in cryptos. Better to buy a good chip than to flush the money down the toilet.

In short, AMD is selling lots of different chips.

Where is Intel?

Intel is five years behind.

AMD recently slowed down orders from TSMC, so they shouldn't be out of stock for long.

I wouldn't get 7002 series as that's Zen 2. Zen 3 is far more efficient. Zen 4 adds AVX512, which is even more power efficient.
2023-01-02, 10:30   #9
diep
Sep 2006
The Netherlands

3×269 Posts

Quote:
Originally Posted by Mark Rose
Don't forget that 256 MB of L3 is segmented: each chiplet has 32 MB. It's not a unified L3 cache across all chiplets.

Interesting. Is it a form of NUMA-type L3 cache (or SRAM)?
So a Non-Uniform Memory Access type of L3 cache, where they needed the 8th chiplet just to provide the L3 cache access?

Because otherwise you won't get to that size of L3, or the paper claim is incorrect.
2023-01-02, 10:35   #10
diep
Sep 2006
The Netherlands

3·269 Posts

Quote:
Originally Posted by Mark Rose
Don't forget that 256 MB of L3 is segmented: each chiplet has 32 MB. It's not a unified L3 cache across all chiplets.

Intel is five years behind.

AMD recently slowed down orders from TSMC, so they shouldn't be out of stock for long.

I wouldn't get 7002 series as that's Zen 2. Zen 3 is far more efficient. Zen 4 adds AVX512, which is even more power efficient.

5 years is a lot. There must be AMD patents somewhere that prevent Intel from using the chiplet idea. I thought the idea wasn't new, looking back at the Q6600 introduction years ago. Of course, that didn't have a special chip forming a central bridge between the chiplets. Some patent must be stopping Intel; otherwise, if I were an Intel shareholder, I'd fire the entire staff for incompetence.

p.s. If the explanation is a process one, i.e. that Intel doesn't have 7nm machines from ASML, then only the White House and/or the US Congress could have stopped Intel from buying the latest ASML machine technology, to keep Israel from building such a plant. I remember how here in the Netherlands the White House nearly declared some sort of financial war on the Netherlands when China wanted those machines to build a plant some years ago. Only independent Taiwan, with TSMC, was seemingly allowed to have them by the USA. But I didn't follow the latest there; I notice the Intel website quotes 10nm process technology for 3rd generation Xeon Scalable processors, with the latest release seemingly in 2021.

That's older than 7nm, though I read an article explaining that the difference between the latest technologies might not be so enormous.

A small 8- or 12-core chiplet is of course much cheaper to produce than a 96-core chip. Something like a factor 100 easier: it's about the yields, of course :)
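
The yield argument can be illustrated with the classic first-order Poisson yield model, Y = exp(-D·A), with D the defect density and A the die area: yield falls exponentially with area, so a big die is far more likely to catch a defect than a small chiplet. This is a textbook simplification (it ignores defect clustering and the salvaging of partly defective dies), and the defect density and chiplet area below are hypothetical numbers, not AMD or TSMC data.

```python
import math

# First-order Poisson yield model: fraction of defect-free dies = exp(-D * A).
# D and the die areas are hypothetical illustration values, not foundry data.
DEFECTS_PER_CM2 = 0.1  # hypothetical defect density


def poisson_yield(area_cm2: float, d: float = DEFECTS_PER_CM2) -> float:
    """Expected fraction of dies of the given area with zero defects."""
    return math.exp(-d * area_cm2)


chiplet_cm2 = 0.7  # hypothetical small chiplet (about 70 mm^2)
for n_chiplets in (1, 2, 4, 8, 12):
    area = n_chiplets * chiplet_cm2  # a monolithic die with the same silicon
    print(f"{area:4.1f} cm^2 die: {poisson_yield(area):6.1%} defect-free")
```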

Last fiddled with by diep on 2023-01-02 at 10:43
2023-01-02, 13:34   #11
M344587487
"Composite as Heck"
Oct 2017

3B6₁₆ Posts

Intel can get around any patent issues easily, if they exist; they're just going in a much more complex direction, trying to interpose dozens of chips together (EMIB?). They've had issues with their 10nm node onwards since forever and are on something like their dozenth+ stepping of Sapphire Rapids. Things haven't been going to plan for a long time, which wasn't a problem until Zen 2 onwards turned up. They hung on for a few years in desktop by adding cores and pumping power, and P+E cores are a way for them to compete in desktop longer term. In laptops they're probably fine, as they can use their clout to maintain orders and it doesn't seem to be AMD's focus anyway, but in servers AMD is eating their lunch.

Intel are focusing heavily on accelerators for common tasks as the next step, which is fine if you are one of those customers, and is something in Intel's wheelhouse as they are good at developing accompanying software; that is a key point of putting so many resources into oneAPI IMO. Less common accelerators they're gating behind paywalls: the hardware may have certain features built in, but just like Tesla's heated seats you may have to pay a fee or subscription to activate them.

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
