mersenneforum.org

Running both CPUs on dual CPU system is slower than using one CPU (https://www.mersenneforum.org/showthread.php?t=26694)

drkirkby 2021-04-11 18:05

Running both CPUs on dual CPU system is slower than using one CPU
 
I have a Dell 7920 with dual Intel Xeon 8167M CPUs (2.0 GHz, 24-core). Running one worker with 26 cores results in an iteration time of around 2.5 ms on a 110 million exponent. However, if I try to use the other 26 cores as well, the time increases to about 8 ms or so. I've tried running two completely independent copies of mprime in different directories. That still screws things up. I have tried forcing one copy of mprime to use a particular CPU, and the memory associated with that CPU, but it does not help.
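For readers who want to repeat the pinning experiment described above, one common way to do it on Linux is `numactl`. The following is only a minimal sketch: it assumes `numactl` is installed and that NUMA node numbers correspond to the CPU sockets. The directory names and the `./mprime` path are hypothetical, and the post does not say which tool was actually used.

```python
import shlex

def numactl_command(node, program, *args):
    """Build an argv that runs `program` with both its threads and its
    memory allocations restricted to one NUMA node."""
    return [
        "numactl",
        f"--cpunodebind={node}",  # run only on CPUs belonging to this node
        f"--membind={node}",      # allocate memory only from this node
        program,
        *args,
    ]

# Hypothetical layout: one mprime copy per socket, each in its own directory.
copies = {0: "/home/user/mprime0", 1: "/home/user/mprime1"}

for node, workdir in copies.items():
    argv = numactl_command(node, "./mprime")
    # Printed rather than executed; to launch for real, something like
    # subprocess.Popen(argv, cwd=workdir) could be used instead.
    print(f"(cd {shlex.quote(workdir)} && {shlex.join(argv)})")
```

Printing the command first makes it easy to check the binding before anything is started.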



The best throughput seems to be to run multiple workers, with the number of cores not exceeding 26.



There is something wrong with this Dell 7920. I believe there's a fault on the motherboard, but I can't get Dell to look at it, despite it being under warranty. The fault only shows up when the 4th DIMM slot on CPU0 is occupied. But I have 3rd party RAM. I only have two pieces of Dell RAM, so I am stuck. The memory configuration is



CPU0 4 x 32 GB
CPU1 6 x 32 GB.



I'm guessing the memory problem might be the reason the two CPUs work slowly together, but any other suggestions that might improve things would be welcome.



Dave

kriesel 2021-04-11 18:55

[URL]https://ark.intel.com/content/www/us/en/ark/products/120502/intel-xeon-platinum-8160m-processor-33m-cache-2-10-ghz.html[/URL] 24 cores, hyperthreading won't help with P-1 or PRP or LL throughput. Cache is cpu-package-local. Communication between packages is slower than within a cpu package.
It's common knowledge to run at least as many prime95 / mprime workers as cpu packages, for performance.
I'd expect a dual cpu system with 24 cores per cpu to be max throughput at all cores running, no use of hyperthreading, and the number of workers optimal to be a function of fft length. See the benchmarks attached at [URL]https://www.mersenneforum.org/showpost.php?p=504218&postcount=4[/URL] [URL]https://www.mersenneforum.org/showpost.php?p=504219&postcount=5[/URL] [URL]https://www.mersenneforum.org/showpost.php?p=563304&postcount=11[/URL]
Do not overload the CPU cores: there are 24x2, not 26+26 as you posted.
Madpoo has tested dual 14-core and found a fastest iteration timing around 20 cores there in one worker; that means 6 cores communicating with the rest over the connection between cpu sockets. Fastest iteration time, most energy-efficient, and maximum throughput for a system are different conditions.
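To decide how to spread workers across packages, it helps to know which logical CPUs belong to which socket. The sketch below reads the standard Linux sysfs NUMA files. It assumes one NUMA node per CPU package, which may not hold if sub-NUMA clustering is enabled, and with hyperthreading on the lists include both hardware threads of each core.

```python
from pathlib import Path

def parse_cpulist(text):
    """Expand a kernel CPU list such as '0-3,8,10-11' into a sorted list."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)

def numa_nodes(root="/sys/devices/system/node"):
    """Map NUMA node number -> list of logical CPU numbers."""
    nodes = {}
    for node_dir in Path(root).glob("node[0-9]*"):
        cpulist = node_dir / "cpulist"
        if cpulist.is_file():
            nodes[int(node_dir.name[4:])] = parse_cpulist(cpulist.read_text())
    return nodes

if __name__ == "__main__":
    for node, cpus in sorted(numa_nodes().items()):
        print(f"node {node}: {len(cpus)} logical CPUs: {cpus}")
```

The `root` parameter exists only so the function can be pointed at a test directory; on a real system the default path is the one to use.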

a1call 2021-04-11 18:58

This may or may not work, but I would set up a virtual machine and run the program in there. I assume you will be able to pool all the cores into a single virtual CPU that way.
Others might correct me on that.

ETA Never mind then, this was a cross-post.

ETA II
:goodposting:

ETA III On my Ryzen 9 system with 16 cores and 32 hyper-threads the fastest total performance seems to be at just above 16 threads. For example, 19 threads is marginally faster than either 29 or 16 threads, but the CPU runs significantly hotter than at 29 threads. As a result I prefer to run a higher number of threads to keep the CPU at about 58° C. Total performance is not significantly reduced in my observation.
The caveats are that I don't run GIMPS and that I run everything on virtual machines.

drkirkby 2021-04-11 19:41

[QUOTE=kriesel;575731][URL]https://ark.intel.com/content/www/us/en/ark/products/120502/intel-xeon-platinum-8160m-processor-33m-cache-2-10-ghz.html[/URL] 24 cores, hyperthreading won't help with P-1 or PRP or LL throughput. [/QUOTE]

Hyperthreading was not being used.

[QUOTE=kriesel;575731]
Do not overload the CPU cores: there are 24x2, not 26+26 as you posted.
Madpoo has tested dual 14-core and found a fastest iteration timing around 20 cores there in one worker; that means 6 cores communicating with the rest over the connection between cpu sockets. Fastest iteration time, most energy-efficient, and maximum throughput for a system are different conditions.[/QUOTE]

The 24 was a typo. The CPUs have 26 cores each.

[URL="https://www.cpubenchmark.net/cpu.php?cpu=Intel+Xeon+Platinum+8167M+%40+2.00GHz&id=3389"]https://www.cpubenchmark.net/cpu.php?cpu=Intel+Xeon+Platinum+8167M+%40+2.00GHz&id=3389[/URL]


This is a "secret" OEM CPU for which Intel will release no information. It might have some unusual characteristics. I believe some of the CPUs in this range have two links between them, and others three. I suspect this has two, but I don't know.

I really need to get the motherboard sorted out, but Dell charge a fortune for RAM. The cheapest way I could reproduce the problem with Dell RAM would be to buy two new 8 GB RAM modules. These are over £250 (around $300 USD) in the UK. Absolutely crazy for 8 GB modules. Dell support is absolutely ****.

a1call 2021-04-11 20:33

Could throttling due to excessive heat be the reason for the significantly increased timings per iteration?
Do the CPUs maintain their temperature when all cores are utilized?

drkirkby 2021-04-18 18:19

[QUOTE=a1call;575737]Could throttling due to excessive heat be the reason for the significantly increased timings per iteration?
Do the CPUs maintain their temperature when all cores are utilized?[/QUOTE]

I have not checked the temperature, but I would doubt that to be an issue. The Dell 7920 is sold by Dell with CPUs up to 205 W, but the machines are often on eBay with 250 W CPUs. The CPUs I have are only 150 W. The machine is nowhere near fully loaded.

I seem to have sorted out an optimal strategy, at least for exponents around 110 million:

1) Run one copy of mprime.
2) Run two workers.

Each worker takes about two days to complete a PRP test of an exponent around 110 million, so I should be able to do about 1 per day on average.

I've currently got one worker doing 110904847 and the other doing 332646233, so very different sizes of FFTs. I have not tried benchmarking to see if there's any advantage working any differently. Once the 332646233 is finished, it will be the last large exponent I attempt for a very long time. As exponents for testing get larger, it may be that a different strategy will work better.

I thought running two copies of mprime, and forcing one to use one CPU and the RAM from that CPU was logical. But it actually works quite poorly.

Dave

LaurV 2021-04-20 02:03

Just to point out that "the amount of power it sucks" and "how hot it gets" are, in theory, and 99.9999 percent of the time in practice too, [U][B]unrelated[/B][/U]. I can put "one volt/one ampere" through a "one ohm/one watt" resistor and make it 2000°C "one red hot son of a pepper" in a few minutes, with no fan blowing on it. How hot your CPU gets has nothing to do with the power it consumes, but with metal blocks, thermal paste, fans, air, water, radiators, thermal paste (did I say that?), dust clogs, dead cockroaches under the chipset carcass, etc., i.e. how able you are to remove the heat it produces, quickly and efficiently. Do you think the 5000 MEGA-Watts power generators at the power factories get so much hotter than your computer? :razz:
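One common first-order way to relate the two quantities is the steady-state thermal resistance model, in which the temperature rise above ambient is the dissipated power multiplied by the thermal resistance of the cooling path. The sketch below uses that model. Every thermal resistance in it is hypothetical and chosen only for illustration, not measured on any machine in this thread.

```python
def steady_state_temp(power_w, theta_c_per_w, ambient_c=25.0):
    """First-order estimate: temperature rise = dissipated power x thermal resistance."""
    return ambient_c + power_w * theta_c_per_w

# All thermal resistances (degrees C per watt) below are hypothetical.
cases = [
    ("150 W CPU, well-seated heatsink", 150.0, 0.20),
    ("150 W CPU, poorly cooled", 150.0, 0.40),
    ("1 W resistor in still air", 1.0, 200.0),
]
for label, power, theta in cases:
    print(f"{label}: about {steady_state_temp(power, theta):.0f} °C")
```

Under this model a small power can produce a high temperature when the cooling path is poor, and the same CPU power can give very different temperatures depending on the heatsink, so neither power nor cooling alone determines the result.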

drkirkby 2021-04-20 13:06

[QUOTE=LaurV;576224]Just to point out that "the amount of power it sucks" and "how hot it gets" are, in theory, and 99.9999 percent of the time in practice too, [U][B]unrelated[/B][/U]. I can put "one volt/one ampere" through a "one ohm/one watt" resistor and make it 2000°C "one red hot son of a pepper" in a few minutes, with no fan blowing on it. How hot your CPU gets has nothing to do with the power it consumes, but with metal blocks, thermal paste, fans, air, water, radiators, thermal paste (did I say that?), dust clogs, dead cockroaches under the chipset carcass, etc., i.e. how able you are to remove the heat it produces, quickly and efficiently. Do you think the 5000 MEGA-Watts power generators at the power factories get so much hotter than your computer? :razz:[/QUOTE]I will see if I can find a way of checking the temperature. I'm not aware of a Linux tool for that offhand, but I'm sure I will be able to find something.
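One way to check temperatures on Linux without extra tools is to read the kernel's hwmon files under `/sys/class/hwmon`, where `temp*_input` values are reported in millidegrees Celsius. The sketch below assumes the relevant driver (for Intel CPUs, typically `coretemp`) is loaded; which sensors actually appear depends on the machine.

```python
from pathlib import Path

def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {(chip, label): degrees_C} from the Linux hwmon sysfs tree."""
    temps = {}
    for chip in sorted(Path(root).glob("hwmon*")):
        name_file = chip / "name"
        chip_name = name_file.read_text().strip() if name_file.is_file() else "unknown"
        for temp_input in sorted(chip.glob("temp*_input")):
            label_file = chip / temp_input.name.replace("_input", "_label")
            if label_file.is_file():
                label = label_file.read_text().strip()
            else:
                label = temp_input.name
            try:
                millideg = int(temp_input.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor listed but not readable right now
            # Include the hwmonN directory name so two packages with the
            # same driver name and labels do not overwrite each other.
            temps[(f"{chip.name}:{chip_name}", label)] = millideg / 1000.0
    return temps

if __name__ == "__main__":
    for (chip, label), value in sorted(read_hwmon_temps().items()):
        print(f"{chip:24s} {label:16s} {value:5.1f} °C")
```

The `root` parameter is there only so the function can be tried against a test directory.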

I accept that fans can get blocked but there are several things that make me think it's not a thermal problem.

a) It's a Dell 7920 workstation in a very large case, that's designed to take a lot of parts - about 10 disks, 3 TB RAM, 2 CPUs, 24 RDIMMS, loads of PCI slots. A fully configured one of these is over $100,000. Mine is virtually empty.

b) There are around 10 fans, which speed up and get very noisy if the machine is pushed. I have only heard that when running diagnostics. It is not making much noise.

c) It's not overclocked.

FWIW, here's a video about it. There's also a rackmount version which has dual power supplies.

[URL]https://www.youtube.com/watch?v=jP65i_Iqml8[/URL]


Dave

Uncwilly 2021-04-20 13:30

[QUOTE=drkirkby;576248]b) There are around 10 fans, which speed up and get very noisy if the machine is pushed. I have only heard that when running diagnostics. It is not making much noise. [/QUOTE]The CPU or GPU coolers might be more of an issue than the case fans.

drkirkby 2021-04-20 13:47

[QUOTE=Uncwilly;576249]The CPU or GPU coolers might be more of an issue than the case fans.[/QUOTE]Yes, something does seem to be amiss. I installed lm-sensors a minute ago, and see the 2nd CPU is significantly hotter than the first - see results at bottom of the post. Although both are within acceptable limits, the 2nd CPU is around 30 deg C hotter than the first. However, there's something amiss with the program, as it's showing 29 cores for each CPU, but the CPUs are only 26 cores, not 29. Perhaps the other 3 cores are not really cores, but sensors in other parts of the CPU - L2 and L3 cache for example.

There is something a bit weird about the heatsinks in this machine, as the arrows on them both show the airflow towards the centre of the chassis. However, my attempts to get from Dell the correct part number of the heatsink have failed.

a) They gave me a part number, but they did not have any. When I looked on eBay for the part, I saw they were all marked "CPU0" and not "CPU1" as the second one should be.

b) I found another part number on eBay, which looked as though it was right.

c) I went back to Dell. They said they did not have any of that part either, but they were equivalent.

However, I do note that the direction of airflow shown on the 2nd CPU appears to be from back to front, rather than front to back. But the heatsinks don't have fans on them - they are purely passive.



[CODE][dkirkby@jackdaw ~]$ sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +56.0°C (high = +89.0°C, crit = +99.0°C)
Core 0: +56.0°C (high = +89.0°C, crit = +99.0°C)
Core 1: +56.0°C (high = +89.0°C, crit = +99.0°C)
Core 2: +55.0°C (high = +89.0°C, crit = +99.0°C)
Core 3: +54.0°C (high = +89.0°C, crit = +99.0°C)
Core 4: +54.0°C (high = +89.0°C, crit = +99.0°C)
Core 5: +54.0°C (high = +89.0°C, crit = +99.0°C)
Core 6: +53.0°C (high = +89.0°C, crit = +99.0°C)
Core 8: +52.0°C (high = +89.0°C, crit = +99.0°C)
Core 9: +55.0°C (high = +89.0°C, crit = +99.0°C)
Core 10: +55.0°C (high = +89.0°C, crit = +99.0°C)
Core 11: +54.0°C (high = +89.0°C, crit = +99.0°C)
Core 12: +53.0°C (high = +89.0°C, crit = +99.0°C)
Core 13: +52.0°C (high = +89.0°C, crit = +99.0°C)
Core 16: +54.0°C (high = +89.0°C, crit = +99.0°C)
Core 17: +55.0°C (high = +89.0°C, crit = +99.0°C)
Core 18: +56.0°C (high = +89.0°C, crit = +99.0°C)
Core 19: +54.0°C (high = +89.0°C, crit = +99.0°C)
Core 20: +52.0°C (high = +89.0°C, crit = +99.0°C)
Core 21: +54.0°C (high = +89.0°C, crit = +99.0°C)
Core 22: +50.0°C (high = +89.0°C, crit = +99.0°C)
Core 24: +54.0°C (high = +89.0°C, crit = +99.0°C)
Core 25: +53.0°C (high = +89.0°C, crit = +99.0°C)
Core 26: +54.0°C (high = +89.0°C, crit = +99.0°C)
Core 27: +54.0°C (high = +89.0°C, crit = +99.0°C)
Core 28: +54.0°C (high = +89.0°C, crit = +99.0°C)
Core 29: +53.0°C (high = +89.0°C, crit = +99.0°C)

coretemp-isa-0001
Adapter: ISA adapter
Package id 1: +85.0°C (high = +89.0°C, crit = +99.0°C)
Core 0: +81.0°C (high = +89.0°C, crit = +99.0°C)
Core 1: +79.0°C (high = +89.0°C, crit = +99.0°C)
Core 2: +79.0°C (high = +89.0°C, crit = +99.0°C)
Core 3: +84.0°C (high = +89.0°C, crit = +99.0°C)
Core 4: +83.0°C (high = +89.0°C, crit = +99.0°C)
Core 5: +80.0°C (high = +89.0°C, crit = +99.0°C)
Core 6: +76.0°C (high = +89.0°C, crit = +99.0°C)
Core 8: +82.0°C (high = +89.0°C, crit = +99.0°C)
Core 9: +82.0°C (high = +89.0°C, crit = +99.0°C)
Core 10: +81.0°C (high = +89.0°C, crit = +99.0°C)
Core 11: +75.0°C (high = +89.0°C, crit = +99.0°C)
Core 12: +75.0°C (high = +89.0°C, crit = +99.0°C)
Core 13: +75.0°C (high = +89.0°C, crit = +99.0°C)
Core 16: +82.0°C (high = +89.0°C, crit = +99.0°C)
Core 17: +81.0°C (high = +89.0°C, crit = +99.0°C)
Core 18: +80.0°C (high = +89.0°C, crit = +99.0°C)
Core 19: +74.0°C (high = +89.0°C, crit = +99.0°C)
Core 20: +74.0°C (high = +89.0°C, crit = +99.0°C)
Core 21: +81.0°C (high = +89.0°C, crit = +99.0°C)
Core 22: +77.0°C (high = +89.0°C, crit = +99.0°C)
Core 24: +82.0°C (high = +89.0°C, crit = +99.0°C)
Core 25: +85.0°C (high = +89.0°C, crit = +99.0°C)
Core 26: +83.0°C (high = +89.0°C, crit = +99.0°C)
Core 27: +81.0°C (high = +89.0°C, crit = +99.0°C)
Core 28: +76.0°C (high = +89.0°C, crit = +99.0°C)
Core 29: +75.0°C (high = +89.0°C, crit = +99.0°C)

[/CODE]
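A listing like the one above is easier to compare when reduced to a per-package summary. The sketch below parses text in the `sensors` format shown and reports how many `Core` lines each package has, which IDs they carry, and the temperature range. The sample string is a short excerpt of the listing, and the function name is made up for illustration.

```python
import re
from statistics import mean

CHIP = re.compile(r"^#?\s*(coretemp-\S+)\s*$")
CORE = re.compile(r"^Core\s+(\d+):\s*\+?(-?[\d.]+)\s*°C")

def summarize(text):
    """Summarize per-core temperatures for each coretemp chip in `sensors` output."""
    chips = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        m = CHIP.match(line)
        if m:
            current = chips.setdefault(m.group(1), {})
            continue
        m = CORE.match(line)
        if m and current is not None:
            current[int(m.group(1))] = float(m.group(2))
    return {
        chip: {
            "cores": len(t),
            "ids": sorted(t),
            "min": min(t.values()),
            "max": max(t.values()),
            "mean": round(mean(t.values()), 1),
        }
        for chip, t in chips.items() if t
    }

SAMPLE = """coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +56.0°C (high = +89.0°C, crit = +99.0°C)
Core 0: +56.0°C (high = +89.0°C, crit = +99.0°C)
Core 6: +53.0°C (high = +89.0°C, crit = +99.0°C)
Core 8: +52.0°C (high = +89.0°C, crit = +99.0°C)
"""

if __name__ == "__main__":
    for chip, stats in summarize(SAMPLE).items():
        print(chip, stats)
```

Counting the `Core` lines in the full listing above gives 26 per package, with IDs running from 0 to 29 and gaps at 7, 14, 15, and 23. That suggests the labels are core IDs, which need not be contiguous, rather than a count of 29 cores.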

Xyzzy 2021-04-20 17:21

As root, did you run [C]sensors-detect --auto[/C]?


All times are UTC.