mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   Increasing memory channels, but not RAM, slows mprime. (https://www.mersenneforum.org/showthread.php?t=26988)

drkirkby 2021-07-09 09:26

Increasing memory channels, but not RAM, slows mprime.
 
Due to a fault on my motherboard, my computer (Dell 7920) would not power on if RAM modules were inserted in certain sockets. The CPUs have 6 memory channels each, but the only way to use 6 DIMMs on the first CPU was to place them in sockets which resulted in only 4 memory channels being used. (There are 12 DIMM sockets per CPU, so I had the flexibility to use the wrong sockets.) The second CPU had no such issues, but the BIOS indicated only 4 of the 12 memory channels were being used, even though there were 12 DIMMs. (I would have expected 4+4=8 or 4+6=10 memory channels, but the BIOS said only 4.) I would expect this to be a non-optimal configuration. With this sub-optimal memory configuration, I ran the mprime benchmarks and found the optimal throughput was with 2 workers. With 2 workers, each iteration of a PRP test of a 104 million exponent was taking about 1.5 ms.

The Dell motherboard was changed for an identical model and the fault went away. With the same 12 DIMMs as before, the memory channels have been increased from 4 to 12. Much to my surprise, this [B]increased[/B] the time per iteration of mprime 30.6b4 from around 1.5 to 2.0 ms for the same exponents! Essentially the throughput had decreased. Can anyone explain why this might happen?

I re-ran the benchmarks and found the configuration giving optimal throughput had changed from 2 workers to 4 workers. So I increased the number of workers from 2 to 4. Unsurprisingly, reducing the number of cores per worker from 26 to 13 increased the iteration time further. The iteration time is now 3 ms.

Since now 4 exponents are being tested simultaneously, at 3 ms/iteration, this is essentially the same total throughput as testing 2 exponents at 1.5 ms/iteration. So the increase in memory channels from 4 to 12 has not resulted in any change in total throughput, but has resulted in the optimal number of workers changing from 2 to 4.
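The comparison above can be checked with a small sketch. It assumes total throughput is simply the number of workers divided by the time per iteration, and it uses only the timings quoted in this post; the function name is made up for illustration.

```python
def iterations_per_second(workers: int, ms_per_iteration: float) -> float:
    """Total iterations per second summed over all workers running in parallel."""
    return workers * 1000.0 / ms_per_iteration


# Timings quoted in the post (approximate figures).
before = iterations_per_second(2, 1.5)   # 4 channels, 2 workers
after_2 = iterations_per_second(2, 2.0)  # 12 channels, still 2 workers
after_4 = iterations_per_second(4, 3.0)  # 12 channels, 4 workers

print(f"{before:.0f} {after_2:.0f} {after_4:.0f}")  # prints "1333 1000 1333"
```

On these figures, 2 workers at 2.0 ms give about 25% less total throughput than the original setup, while 4 workers at 3 ms match it.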

The CPUs are only clocking the RAM at 2400 MHz, not the 2933 MHz that the motherboard is capable of. But that's due to the CPUs I have. I'm assuming this means that the performance of the computer is not set by memory bandwidth, but by CPU speed (only 2.0 GHz). [B]But I'm still puzzled why changing the memory channels from 4 to 12 resulted in decreased throughput until I increased the number of workers from 2 to 4. [/B]
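For scale, here is a rough sketch of the theoretical peak memory bandwidth for the channel counts mentioned. It assumes the "2400 MHz" figure means DDR4-2400 (2400 mega-transfers per second) and that each channel moves 8 bytes per transfer over a 64-bit data bus. Sustained bandwidth is lower in practice, and the 12-channel figure is split across two sockets, so these are upper bounds only.

```python
BYTES_PER_TRANSFER = 8  # assumption: one DDR4 channel has a 64-bit data bus


def peak_gb_per_s(channels: int, mega_transfers_per_s: float) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return channels * mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER / 1e9


for channels in (4, 6, 12):
    print(channels, "channels:", round(peak_gb_per_s(channels, 2400), 1), "GB/s")
# 4 channels: 76.8 GB/s, 6 channels: 115.2 GB/s, 12 channels: 230.4 GB/s
```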

Any thoughts?

paulunderwood 2021-07-09 09:48

I doubt your Dell runs "12 channel" -- most likely it will be a quad channel box. Check your motherboard manual for optimal channel settings. (If you find a manual online please provide a link to it.)

axn 2021-07-09 10:21

How much L3 do your CPUs have? The more L3 you have, the less the impact of poor memory bandwidth.

Also, previously, was the memory running at 2400 only or was it faster?

Finally, if truly 12 ram channels are active, you might look at other configurations with more workers.

ATH 2021-07-09 10:22

1 Attachment(s)
I have only heard about up to 8-channel memory so far on the newest servers.
Searching for Dell 7920:
[url]https://i.dell.com/sites/csdocuments/Shared-Content_data-Sheets_Documents/en/us/Precision-7920-Tower-Spec-Sheet.pdf[/url]
[QUOTE]
Memory Options: [B][I]Six channel memory[/I][/B] up to 1.5TB 2666MHz DDR4 ECC memory with dual CPUs, up to 3TB with select CPU SKUs
Up to 768GB of 2933MHz DDR4 ECC memory
24 DIMM Slots (12 DIMMs per CPU).
Note: memory speed is dependent on specific Intel Xeon Scalable Processor CPU installed[/QUOTE]

To get 6-channel memory, I think you need to install 6 DIMMs in every 2nd slot or all 12 DIMMs, and it should be the exact same type of RAM.


Try CPU-Z: [url]https://www.cpuid.com/[/url]
You do not have to install anything, there is a zip file with CPU-Z:
[url]https://download.cpuid.com/cpu-z/cpu-z_1.96-en.zip[/url]

Just run the "cpuz_x64.exe" and on the Memory tab it will show you how many memory channels you use, mine is running 4-channel memory (quad):

drkirkby 2021-07-09 10:35

1 Attachment(s)
[QUOTE=paulunderwood;582872]I doubt your Dell runs "12 channel" -- most likely it will be a quad channel box. Check your motherboard manual for optimal channel settings. (If you find a manual online please provide a link to it.)[/QUOTE]
Attached is a photograph I took of the BIOS, showing 12 memory channels in use. The particular CPU is not generally available, so there's no information on the Intel website, but other sources indicate the CPU has 6 memory channels.
[URL="https://www.cpu-world.com/CPUs/Xeon/Intel-Xeon%208167M.html"]https://www.cpu-world.com/CPUs/Xeon/Intel-Xeon%208167M.html[/URL]

Here's the [B]user[/B] manual.
[URL]https://dl.dell.com/topicspdf/precision-7920-workstation_owners-manual_en-us.pdf[/URL]
Page 98 says up to 6 memory channels per CPU. Someone kindly sent me a technical reference on this workstation, which Dell don't make public, which I can't send at the moment, but I will send a link later. But I think the photograph shows it is using 12 memory channels.

Note there's a rackmount version of this, so if you Google it, ignore the rackmount version, although I think they are pretty similar. The rackmount version has dual PSUs and enhanced security features, whereas the tower does not.

drkirkby 2021-07-09 10:49

[QUOTE=axn;582873]How much L3 do your CPUs have? The more L3 you have, the less the impact of poor memory bandwidth.

Also, previously, was the memory running at 2400 only or was it faster?

Finally, if truly 12 ram channels are active, you might look at other configurations with more workers.[/QUOTE]
There's 35.75 MB of L3 cache per CPU. The 2400 MHz is a limitation of the CPU - other CPUs in the Xeon gold or platinum range run the RAM up to 2933 MHz, but they are quite expensive CPUs, whereas these CPUs are quite cheap. I've benchmarked more workers (I tried, 1, 2, 3 .. 52). But 4 workers gives optimal throughput.

axn 2021-07-09 10:55

[QUOTE=drkirkby;582877]There's 35.75 MB of L3 cache per CPU. The 2400 MHz is a limitation of the CPU - other CPUs in the Xeon gold or platinum range run the RAM up to 2933 MHz, but they are quite expensive CPUs, whereas these CPUs are quite cheap. I've benchmarked more workers (I tried, 1, 2, 3 .. 52). But 4 workers gives optimal throughput.[/QUOTE]

I'm interested in (3 workers x 8 threads) x 2 config (yes, two cores will be idle in each CPU). How does it compare to the (2x13)x2?
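The core accounting behind that suggestion can be sketched as follows. It assumes 26 cores per CPU and 2 CPUs, which matches the 52 cores mentioned earlier in the thread; the `layout` helper is made up for illustration.

```python
CORES_PER_CPU = 26  # from the thread: 52 cores in total across 2 CPUs
CPUS = 2


def layout(workers_per_cpu: int, threads_per_worker: int) -> dict:
    """Summarise a (workers x threads) per-CPU layout replicated on every CPU."""
    used = workers_per_cpu * threads_per_worker
    if used > CORES_PER_CPU:
        raise ValueError("layout needs more cores than one CPU has")
    return {
        "workers": workers_per_cpu * CPUS,
        "cores_used": used * CPUS,
        "idle_per_cpu": CORES_PER_CPU - used,
    }


print(layout(3, 8))   # {'workers': 6, 'cores_used': 48, 'idle_per_cpu': 2}
print(layout(2, 13))  # {'workers': 4, 'cores_used': 52, 'idle_per_cpu': 0}
```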

drkirkby 2021-07-09 10:59

[QUOTE=ATH;582874]I have only heard about up to 8-channel memory so far on the newest servers.
Searching for Dell 7920:
[URL]https://i.dell.com/sites/csdocuments/Shared-Content_data-Sheets_Documents/en/us/Precision-7920-Tower-Spec-Sheet.pdf[/URL]

To get 6-channel memory, I think you need to install 6 DIMMs in every 2nd slot or all 12 DIMMs, and it should be the exact same type of RAM.

Try CPU-Z: [URL]https://www.cpuid.com/[/URL]
You do not have to install anything, there is a zip file with CPU-Z:
[URL]https://download.cpuid.com/cpu-z/cpu-z_1.96-en.zip[/URL]

Just run the "cpuz_x64.exe" and on the Memory tab it will show you how many memory channels you use, mine is running 4-channel memory (quad):[/QUOTE]
There are 12 sockets of the 24 occupied. One DIMM is Dell, the other 11 are Kingston. All DIMMs are the same size. Perhaps I should swap the Dell DIMM for a Kingston one, but there are clearly 12 channels in use, as the photograph from the BIOS shows. Even the Kingston DIMMs, although cheaper than Dell, are not exactly cheap. If there were thought to be a benefit from changing it I would do so, but I doubt there is one. But maybe others feel otherwise - I'm open to suggestions.

[B]Edit - I will try those utilities later, but I need to reboot into Windows, and I have some real work to do just now. [/B]

drkirkby 2021-07-09 11:01

[QUOTE=axn;582878]I'm interested in (3 workers x 8 threads) x 2 config (yes, two cores will be idle in each CPU). How does it compare to the (2x13)x2?[/QUOTE]I'll test this out later - I have to do some real work now, that pays the bills.

paulunderwood 2021-07-09 12:57

It is two hex-channel (six-channel) chips. It looks like you have installed the DIMMs correctly. You can run mprime/Prime95's benchmarking to automatically get the maximum throughput based on the number of workers and the current wavefront FFT sizes.

kriesel 2021-07-09 13:46

[QUOTE=drkirkby;582879]I will try those utilities later, but I need to reboot into Windows, and I have some real work to do just now. [/QUOTE]Surely Linux has the capability tucked away somewhere. (Quick DuckDuckGo search later...)
dmidecode appears to provide the necessary info, at least indirectly.
"Bank locator: CHAN A DIMM 0" [URL]https://www.cyberciti.biz/faq/check-ram-speed-linux/[/URL]

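As a minimal sketch of the dmidecode idea, the snippet below counts the distinct channels that have a module installed. The sample text is a hypothetical excerpt in the style of `dmidecode --type 17` output, and the "CHAN A DIMM 0" wording of the Bank Locator field is an assumption taken from the example above. The field is vendor-specific and may look different, or be unspecified, on a given machine.

```python
import re

# Hypothetical excerpt in the style of `dmidecode --type 17` output.
# The exact "Bank Locator" wording is vendor-specific and assumed here.
SAMPLE = """\
Memory Device
    Size: 16 GB
    Locator: DIMM1
    Bank Locator: CHAN A DIMM 0
Memory Device
    Size: No Module Installed
    Locator: DIMM2
    Bank Locator: CHAN A DIMM 1
Memory Device
    Size: 16 GB
    Locator: DIMM3
    Bank Locator: CHAN B DIMM 0
"""


def populated_channels(text: str) -> set:
    """Return the channel labels of all slots that have a module installed."""
    channels = set()
    # Each "Memory Device" block describes one DIMM slot.
    for block in text.split("Memory Device")[1:]:
        size = re.search(r"Size:\s*(.+)", block)
        bank = re.search(r"Bank Locator:\s*CHAN\s+(\S+)", block)
        if size and bank and "No Module" not in size.group(1):
            channels.add(bank.group(1))
    return channels


print(sorted(populated_channels(SAMPLE)))  # prints ['A', 'B']
```

On a real machine the text would come from running `dmidecode --type 17` as root rather than from the sample string, and on a dual-CPU board the labels may need to be combined with a socket or node identifier to tell the two CPUs apart.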

All times are UTC.