mersenneforum.org > Great Internet Mersenne Prime Search > Hardware
Old 2020-02-01, 21:55   #112
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados

Quote:
Originally Posted by chalsall
I'm going to run an experiment, and use most of it for a swap partition. I sometimes want to do Blender rendering jobs which won't fit in my main workstation's RAM, so I have to spin up a "cloud" instance.
So, my world finally stabilized a little bit, and I was just now able to install the 256 GB SSD. 100 GB for a Fedora 31 root partition, and 120 GB of swap. I left 36 GB unpartitioned for additional fail-over capacity.

I haven't yet tried running a big Blender job that causes swapping, but that will be very interesting. I have to say I've never worked on an SSD-based machine before. Wow! Snappy! It's nice to get away from the latency of the polished rust.
Old 2020-02-01, 23:09   #113
Runtime Error
Sep 2017
USA

Quote:
Originally Posted by VBCurtis
[...] when the FFT data fits into the CPU cache. That's why on some machines (and for small-enough FFT sizes), one can get nice timings without crushing memory bandwidth. This applies generally either to tests with FFT sizes far below the Prime95 interest level (say, exponents below 10M on other projects), or on Xeons with large L3 caches.
Thanks for the great explanation!

My next naive question is, of course, "Why don't CPUs come with larger caches?" Eons ago, I bought a wicked awesome custom computer from a local company and remember upgrading the cache. I'm not aware that this is a build option anymore. Do motherboards still support cache expansion? Is there a non-monetary benefit to having smaller caches? I imagine that if you had a (say) whole gigabyte of L4 cache it wouldn't be very efficient to quickly access the bits you need immediately next, but wouldn't it still be faster than going to RAM?

P.S. Sorry for taking this thread away from "big memory", and thanks for the ELI5 explanations; perhaps I should send in tuition payments. You folks rock.
Old 2020-02-01, 23:38   #114
M344587487
"Composite as Heck"
Oct 2017

Quote:
Originally Posted by Runtime Error
[...] Do motherboards still support cache expansion? Is there a non-monetary benefit to having smaller caches? [...]
Motherboards no longer support cache expansion. For L3 cache (and probably the other levels) there is a latency penalty associated with larger caches, but the benefits generally outweigh the negatives.
Old 2020-02-01, 23:41   #115
mackerel
Feb 2016
UK

Not an expert in the area, but I understand that a bigger cache = slower access. That's why there are usually 3 levels of cache on x86 CPUs: small but ultra-fast next to the cores, medium size and medium speed in the 2nd tier, and relatively big but slower in the 3rd tier. Then you reach RAM. There is some overhead to keeping track of what data is in the cache. I think L2 is generally tied to the core, but L3 can be shared between cores to some extent.
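A minimal sketch of the usual way to see this hierarchy from software: chase pointers through a random cycle in buffers of growing size and watch the average load time step up as the working set spills out of L1, then L2, then L3, and finally into RAM. The buffer sizes, iteration count, and gcc-style build line are assumptions for illustration, not anything from this thread.

Code:
/*
 * Sketch: measure average dependent-load latency for growing working sets.
 * Assumed build: gcc -O2 -o cachelat cachelat.c   (POSIX clock_gettime)
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

static uint64_t rng_state = 88172645463325252ULL;
static uint64_t xorshift64(void)                 /* small PRNG, good enough here */
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 7;
    rng_state ^= rng_state << 17;
    return rng_state;
}

int main(void)
{
    const size_t max_elems = (size_t)1 << 24;    /* 16M pointers = 128 MB */
    void **buf = malloc(max_elems * sizeof *buf);
    size_t *idx = malloc(max_elems * sizeof *idx);
    if (!buf || !idx) return 1;

    for (size_t elems = (size_t)1 << 11; elems <= max_elems; elems <<= 1) {
        /* Link the first `elems` slots into one random cycle so every load
         * depends on the previous one and the prefetchers can't hide RAM. */
        for (size_t i = 0; i < elems; i++) idx[i] = i;
        for (size_t i = elems - 1; i > 0; i--) { /* Fisher-Yates shuffle */
            size_t j = (size_t)(xorshift64() % (i + 1));
            size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
        }
        for (size_t i = 0; i + 1 < elems; i++) buf[idx[i]] = &buf[idx[i + 1]];
        buf[idx[elems - 1]] = &buf[idx[0]];

        const long iters = 20 * 1000 * 1000;
        void **p = &buf[0];
        double t0 = now_sec();
        for (long i = 0; i < iters; i++) p = (void **)*p;
        double t1 = now_sec();

        volatile void *sink = p; (void)sink;     /* keep the chase from being optimized away */
        printf("%9zu KB : %6.2f ns per load\n",
               elems * sizeof(void *) / 1024, (t1 - t0) / iters * 1e9);
    }
    free(idx);
    free(buf);
    return 0;
}

On a typical desktop the output shows a few plateaus: roughly a nanosecond while the set fits in L1, a handful of nanoseconds through L2/L3, and tens of nanoseconds once it only fits in RAM.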

Broadwell consumer desktop CPUs were an oddball, with 128 MB of L4 cache. For its time, that was great, as it was practically unconstrained by RAM bandwidth for prime number finding. I didn't lose performance even running a single stick of slow RAM, since it worked out of the cache. However, its cache speed isn't amazing today, so if the design were to be revisited, the cache would have to be much faster.

Removable cache isn't really a thing any more, unless you count Intel Optane, but that acts more like an extra tier between RAM and bulk storage, so it's not of direct help here.
Old 2020-02-02, 01:27   #116
xx005fs
"Eric"
Jan 2018
USA

Quote:
Originally Posted by Runtime Error
Why don't CPUs come with larger caches? [...] I imagine that if you had a (say) whole gigabyte of L4 cache it wouldn't be very efficient to quickly access the bits you need immediately next, but wouldn't it still be faster than going to RAM?
If you have ever seen a die shot of the Ryzen 3000 series, you realize that the cache blocks take up a significant amount of die space for the tiny amount of capacity provided. In the future, as cores get more complex and process technology improves, more cache can definitely be crammed onto the die, or even an L4 cache structure like Broadwell's that sits on the package (hell, HBM memory on video cards follows a similar idea to Broadwell's L4 cache). As the cache gets bigger, bandwidth increases, but on the other hand latency also increases, though not significantly so.

The whole point of having a cache integrated within a CPU is that its latency is significantly lower than memory latency, and the closer it is to the CPU, the lower the latency. Motherboard cache expansions would just defeat the whole purpose of a CPU cache, as the latency would end up similar to RAM's (for example, L3 cache latency can be in the single-digit nanoseconds, while memory latency is generally above 60 ns).
Old 2020-02-02, 05:44   #117
LaurV
Romulan Interpreter
Jun 2011
Thailand

Paraphrasing what previous posters said, CPUs do come with larger caches, but larger cache = slower access, and larger cache + faster access = buckets of money, because each row of cell blocks you add may double or triple the silicon and decrease the fabrication yield.

Re upgrading the cache: you are confusing different layers of cache. The fastest one was always inside the CPU (except for the very early models of CPUs, which had no cache). The slowest, least expensive one could sit outside the CPU and could still be upgradable on some mobos, but current RAM is fast enough and provides a wide enough bus to make external cache obsolete. Many systems have multiple layers of cache, with pipelines for both instructions and data, of which the fastest (and smallest) is always internal to the CPU and the slowest (and largest) sits nearest to your RAM stick.

What makes RAM "slow" is the multiplexing of the pins and the refresh cycle (dynamic RAM). Memories can be static or dynamic. In their quest to make larger memories with smaller dimensions at a cheaper price, manufacturers shrank the RAM cells to such small dimensions that a cell is no longer able to hold its information for long, so it needs a periodic "refresh". This means "somebody" must read the content of the memory every few milliseconds and write it back: if you found a zero, write a zero; if you found a one, write a one. If you forget to do that, after a few more milliseconds the content of the cell is lost (the charge stored there discharges through the parasitic circuits of the cell) and the memory will contain unreliable, random data. That is called "dynamic" memory, as opposed to "static" memory, where the cells are big enough to retain their charge as long as power is applied and no refresh cycle is needed.
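A back-of-the-envelope sketch of what that refresh obligation costs, using the usual DDR3/DDR4 figures of a 64 ms retention window spread over 8192 refresh commands; treat the numbers as typical rather than authoritative.

Code:
/* Rough arithmetic for the refresh bookkeeping described above. */
#include <stdio.h>

int main(void)
{
    const double retention_ms = 64.0;   /* every cell must be rewritten within ~64 ms */
    const int    refresh_cmds = 8192;   /* refresh commands spread across that window */

    double interval_us = retention_ms * 1000.0 / refresh_cmds;
    printf("one refresh command every %.4f us (tREFI)\n", interval_us);  /* ~7.8125 us */
    printf("so the DRAM spends a slice of every ~8 us servicing refresh "
           "instead of your reads and writes\n");
    return 0;
}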

Also, to reduce cost, manufacturers multiplex the access lines for data and addresses. Imagine the memory as a neighborhood map, with horizontal and vertical streets, and houses at the corners, arranged in a square grid. You can tell the postman "give this letter to the house at the intersection of horizontal street 17 and vertical street 23", and he will know exactly where to go. Alternatively, imagine the postman is a bit stupid and can't remember two numbers, only one. You have to tell him "go to horizontal street 17 and, when you are there, call me back". Then, when he is there, he calls you and you tell him "go along that street until you cross vertical street 23 and leave the letter there". This is called a "multiplexed address". It takes more time to reach the cell, and you need more communication to get there.

In silicon terms, what makes an integrated circuit expensive nowadays is not the amount of memory, but the package and the number of pins. For example, one ARM Cortex microcontroller (MCU for short) with 64 pins and 128K of memory costs $2; if you need to double the memory to 256K, the new MCU will cost $2.20, and if you need only half the memory, 64K, you might pay $1.80. That is because the memory is just one grain of sand/silicon more or less inside the package. But if you only need a few inputs/outputs and move to a package with 48 pins, the price drops to $0.80, and if you need more I/Os, say to drive a lot of peripherals or LCDs, you will pay $4 or $5 for the 80-pin or 100-pin packages, which are much larger and contain a lot more metal (the pins) and bonded wires (made of gold, or gold bumps for dies that are not wire-bonded), despite the fact that exactly the same chip/die is used in all four packages (in the low-pin-count packages, some MCU pins are simply not used and not bonded internally).

Back to memories: manufacturers reduce costs by making integrated circuits (ICs) with fewer pins and smaller packages, but the drawback is that you have less bandwidth to communicate with them; your postman can only remember one number. In memory terms, you have to send the address of the cell you want to access in two (or more) chunks: the first half now, the second half a bit later, because you do not have enough communication lines (channels) to transmit it all at once. The same goes for reading the data back.
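A toy sketch of the "stupid postman" in code: the chip has too few address pins to take the whole address at once, so the controller drives the row half in one cycle and the column half in the next over the same pins. The 16-bit address and 8-pin bus here are made-up sizes for illustration, not any real part's geometry.

Code:
/* Toy illustration of a multiplexed (row/column) address, RAS/CAS style. */
#include <stdint.h>
#include <stdio.h>

#define ADDR_PINS 8                      /* width of the shared address bus */

static uint8_t row_phase(uint16_t addr) { return (uint8_t)(addr >> ADDR_PINS); } /* "go to street 17"  */
static uint8_t col_phase(uint16_t addr) { return (uint8_t)(addr & 0xFF); }       /* "...now house 23"  */

int main(void)
{
    uint16_t flat = 0x1723;              /* the full address we actually want */

    printf("cycle 1: put row    0x%02X on the %d pins\n", row_phase(flat), ADDR_PINS);
    printf("cycle 2: put column 0x%02X on the same pins\n", col_phase(flat));
    printf("reassembled inside the chip: 0x%04X\n",
           (unsigned)((row_phase(flat) << ADDR_PINS) | col_phase(flat)));
    return 0;
}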

So, static memories are much faster (no refresh needed, and most of them don't need any multiplexing of addresses or data, though there are static RAMs that are multiplexed too). But they are bulky and expensive. Cache memories are derived from static memories, interposed between your dynamic RAM and your CPU. Every time you read something from the dynamic RAM, the information is stored in the cache too. The next time you need the same data, it is already in the cache and is read from there, without accessing the (slow) dynamic RAM. That is the whole trick. Well, mainly.
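A toy direct-mapped cache, just to make that trick concrete; the sizes are invented for illustration and this is not how any particular CPU implements it. The first touch of each line misses and is filled from "RAM"; repeat touches, including the whole second pass over the array, are served from the copy.

Code:
/* Toy direct-mapped cache: index picks the slot, the tag says who lives there. */
#include <stdint.h>
#include <stdio.h>
#include <stdbool.h>

#define LINES      64            /* number of cache lines (toy size) */
#define LINE_BYTES 64            /* bytes per line                   */

struct line { bool valid; uint32_t tag; };
static struct line cache[LINES];
static long hits, misses;

static void cache_access(uint32_t addr)
{
    uint32_t line_no = addr / LINE_BYTES;
    uint32_t index   = line_no % LINES;   /* which slot the line must live in  */
    uint32_t tag     = line_no / LINES;   /* identifies which line occupies it */

    if (cache[index].valid && cache[index].tag == tag) {
        hits++;                           /* served from the SRAM copy          */
    } else {
        misses++;                         /* go to DRAM, then keep a copy       */
        cache[index].valid = true;
        cache[index].tag   = tag;
    }
}

int main(void)
{
    /* Walk a 2 KB array twice, 4 bytes at a time: only the first touch of
     * each 64-byte line misses; everything else, including pass two, hits. */
    for (int pass = 0; pass < 2; pass++)
        for (uint32_t addr = 0; addr < 2 * 1024; addr += 4)
            cache_access(addr);

    printf("hits: %ld  misses: %ld\n", hits, misses);
    return 0;
}

Real caches add associativity, replacement policies, and write handling on top of this, but the lookup is the same index-plus-tag comparison.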

You can have one or more layers of cache: the fastest, most expensive, and smallest in size toward the CPU; the slower, cheaper ones, in larger amounts, toward the RAM. There are also systems with completely static RAM and a bus wide enough to make multiplexing unnecessary. You can consider that these systems have "only cache memory", as they have essentially zero latency to read and write the static cells. But they can get bloody expensive. Imagine that it is not only the 512 pins needed to read the data non-muxed: you also need 512 pins on the CPU/GPU side and 512 copper tracks on the PCB/mobo/chipset, whatever... A lot of metal, a lot of money, not to mention how much EMI (electromagnetic interference) all these parallel lines generate, and the investment you need to protect and shield against such things... That is why new/fast GPUs are so bloody expensive...

Last fiddled with by LaurV on 2020-02-02 at 07:11
Old 2020-02-02, 07:03   #118
VBCurtis
"Curtis"
Feb 2005
Riverside, CA

Quote:
Originally Posted by Runtime Error
Thanks for the great explanation!
....
P.S. Sorry for taking this thread away from "big memory", and thanks for the ELI5 explanations, perhaps I should send in tuition payments. You folks rock.
You're quite welcome, and thank you for the kind words. This place is full of inquisitive idiots, some of whom enjoy sharing their findings with the freshly arrived. Welcome!
Old 2020-02-02, 09:37   #119
M344587487
"Composite as Heck"
Oct 2017

Quote:
Originally Posted by xx005fs
...
In the future, as cores get more complex and process technology improves, more cache can definitely be crammed onto the die, or even an L4 cache structure like Broadwell's that sits on the package (hell, HBM memory on video cards follows a similar idea to Broadwell's L4 cache).
....
I can see HBM potentially becoming an L4 of sorts, particularly as we try to break the bandwidth limit for iGPUs. Currently an iGPU shares DDR4 memory bandwidth with the CPU cores, which heavily caps how performant iGPUs can be. A stack of HBM would solve that problem and has the potential to be used as a victim cache (or something like it) by the CPU cores.



I'm probably giving Intel/AMD too much credit; such a thing might exist one day in a mobile form factor that does away with DDR and a discrete card altogether, but it's unlikely to exist in a desktop form factor.
Old 2020-02-02, 11:35   #120
Xyzzy
"Mike"
Aug 2002

Quote:
Originally Posted by LaurV
[...] manufacturers shrank the RAM cells to such small dimensions that a cell is no longer able to hold its information for long, so it needs a periodic "refresh". [...] If you forget to do that, after a few more milliseconds the content of the cell is lost (the charge stored there discharges through the parasitic circuits of the cell) and the memory will contain unreliable, random data.
We use a memory testing program that does this sub-test:
Quote:
Bit fade test, 2 patterns
The bit fade test initializes all of memory with a pattern and then sleeps for 5 minutes (or a custom user-specified time interval). Then memory is examined to see if any memory bits have changed. All-ones and all-zeros patterns are used.
Do you think the memory can hold its contents that long? We've never experienced an error with this particular sub-test.
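For the curious, a user-space sketch of the same idea as the quoted sub-test (the real tester runs bare-metal and scans all of memory; the 256 MB region and five-minute waits below are arbitrary choices): fill a buffer with a pattern, wait, and count bytes that changed while the DRAM refresh logic was doing its job in the background.

Code:
/* User-space bit-fade-style check: write a pattern, sleep, verify. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static long check_pattern(unsigned char *buf, size_t len, unsigned char pat,
                          unsigned wait_seconds)
{
    memset(buf, pat, len);          /* write the pattern everywhere       */
    sleep(wait_seconds);            /* let it "fade" (or not) for a while */

    long bad = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] != pat) bad++;   /* count bytes that changed           */
    return bad;
}

int main(void)
{
    const size_t len = 256UL * 1024 * 1024;   /* 256 MB test region */
    unsigned char *buf = malloc(len);
    if (!buf) return 1;

    /* The quoted test uses all-ones and all-zeros; 5 minutes = 300 s. */
    long bad = check_pattern(buf, len, 0xFF, 300)
             + check_pattern(buf, len, 0x00, 300);

    printf("%ld bad bytes\n", bad);
    free(buf);
    return 0;
}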

Old 2020-02-02, 11:45   #121
mackerel
Feb 2016
UK

Quote:
Originally Posted by M344587487
I can see HBM potentially becoming an L4 of sorts particularly as we try to break the bandwidth limit for iGPU's.
The strength and weakness of HBM is that it is very wide but clocked relatively low. For a highly parallel workload like a GPU's, that's not a problem. For a CPU it might be too wide to be effective unless you have a lot of cores working on the same kind of data.

Quote:
I'm probably giving intel/AMD too much credit, such a thing might exist one day in a mobile form factor that does away with DDR and a discrete card altogether but it's unlikely to exist in a desktop form factor.
Intel Foveros might someday turn into that, as might a future implementation of AMD's chiplet strategy. It'll be interesting to see how things go.