mersenneforum.org  

Old 2021-07-10, 18:50   #23
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England

Quote:
Originally Posted by drkirkby
I intend swapping out the 8167Ms at a later date for higher performance CPUs when the prices fall. But currently the fast gold or platinum CPUs with a lot of cores are very expensive, but the 8167M offers a lot of bang for the buck.
I'd be surprised if you could get anything much higher performance ... the board won't support the bizarre Platinum 92xx dual-chip packages, and the absolutely highest performance Platinum 82xx is 28 cores at 2.7GHz so only 40% faster. I'd expect it to be three or four years before those chips hit the second-hand market and another two or three to get trivially cheap - the Ivy Bridge Xeons have only just hit the sub-£100 mark and supply for the 12-core versions isn't there.
Old 2021-07-13, 08:16   #24
drkirkby
"David Kirkby"
Jan 2021
Althorne, Essex, UK

Quote:
Originally Posted by fivemack
I'd be surprised if you could get anything much higher performance ... the board won't support the bizarre Platinum 92xx dual-chip packages, and the absolutely highest performance Platinum 82xx is 28 cores at 2.7GHz so only 40% faster. I'd expect it to be three or four years before those chips hit the second-hand market and another two or three to get trivially cheap - the Ivy Bridge Xeons have only just hit the sub-£100 mark and supply for the 12-core versions isn't there.
I partially agree with you. It's clear I could not get CPUs with much higher performance, but the Platinum 8167M is a first-generation Skylake CPU, so I suspect a bit more than a 40% improvement will be possible, which would be worthwhile - even 40% I would consider worthwhile.

As far as I can determine, the Platinum processors offer no advantage over Gold in dual-CPU systems - the advantage of the Platinum CPUs is that they can be used in boards taking up to 8 processors.

The second-generation 24-core 3.0 GHz Gold 6248R is being sold by Intel at around $2700 in quantities of 1000. That's considerably less than the $10,000 of the 28-core 2.7 GHz Platinum 8280. For this reason a large number of 6248Rs are being bought from Intel, and there are plenty of 6248Rs on the second-hand market today. There should not be much difference in performance between 28 cores at 2.7 GHz and 24 cores at 3.0 GHz - in fact I would prefer the higher clock speed even at a reduced number of cores. Currently the 6248Rs are selling for around £1,600 each. When they fall to £500 each I will buy a couple.

Old 2021-07-14, 18:06   #25
drkirkby
"David Kirkby"
Jan 2021
Althorne, Essex, UK

Quote:
Originally Posted by axn
I'm interested in (3 workers x 8 threads) x 2 config (yes, two cores will be idle in each CPU). How does it compare to the (2x13)x2?
I ran some benchmarks, including the configuration you wanted to see, and got some quite surprising results. kriesel's view that a single copy of mprime handles multiple processors well appears to be true - running two copies always results in lower total throughput. Hyperthreading was not used - I don't believe it is beneficial for PRP tests, though I should verify that is indeed the case, as some of the results surprised me.
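(For reference, a quick way on Linux to confirm whether hyperthreading is actually enabled - "Thread(s) per core: 1" means it is off or disabled in the BIOS, while 2 means each physical core presents two logical CPUs:)
Code:
# Print the socket, core and thread counts as the kernel sees them.
lscpu | grep -E '^(Socket|Core|Thread|CPU\(s\))'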

1) One copy of mprime, 52 cores
Code:
[Worker #1 Jul 14 16:25] Timing 5760K FFT, 52 cores, 2 workers.  Average times:  2.07,  2.10 ms.  Total throughput: 959.54 iter/sec.
[Worker #1 Jul 14 16:26] Timing 5760K FFT, 52 cores, 3 workers.  Average times:  3.92,  3.92,  2.12 ms.  Total throughput: 981.91 iter/sec.
[Worker #1 Jul 14 16:28] Timing 5760K FFT, 52 cores, 4 workers.  Average times:  3.92,  3.91,  3.93,  3.94 ms.  Total throughput: 1019.05 iter/sec.
The highest throughput, 1019.05 iter/sec, is achieved with 4 workers. (I had previously benchmarked higher numbers of workers and found 4 workers to give the highest throughput.)

2) One copy of mprime, 48 cores
Code:
[Worker #1 Jul 14 16:31] Timing 5760K FFT, 48 cores, 2 workers.  Average times:  2.02,  2.02 ms.  Total throughput: 991.00 iter/sec.
[Worker #1 Jul 14 16:32] Timing 5760K FFT, 48 cores, 3 workers.  Average times:  4.00,  3.99,  2.03 ms.  Total throughput: 992.50 iter/sec.
[Worker #1 Jul 14 16:33] Timing 5760K FFT, 48 cores, 4 workers.  Average times:  4.00,  3.99,  3.98,  3.99 ms.  Total throughput: 1002.56 iter/sec.
I'm surprised, but 48 cores has higher throughput than 52 cores when using 2 or 3 workers! However, when using 4 workers, 52 cores has slightly (about 1.5%) greater throughput than 48 cores.

3) Two copies of mprime running, in two different directories. 26 cores for each process.
Code:
[Worker #1 Jul 14 15:56] Timing 5760K FFT, 26 cores, 2 workers.  Average times:  6.27,  6.19 ms.  Total throughput: 320.92 iter/sec.
[Worker #1 Jul 14 15:57] Timing 5760K FFT, 26 cores, 3 workers.  Average times: 13.04, 12.49,  6.11 ms.  Total throughput: 320.51 iter/sec.
[Worker #1 Jul 14 15:58] Timing 5760K FFT, 26 cores, 4 workers.  Average times: 13.25, 12.72, 13.36, 12.85 ms.  Total throughput: 306.80 iter/sec.

[Worker #1 Jul 14 15:56] Timing 5760K FFT, 26 cores, 2 workers.  Average times:  6.27,  6.19 ms.  Total throughput: 320.92 iter/sec.
[Worker #1 Jul 14 15:57] Timing 5760K FFT, 26 cores, 3 workers.  Average times: 13.04, 12.49,  6.11 ms.  Total throughput: 320.51 iter/sec.
[Worker #1 Jul 14 15:58] Timing 5760K FFT, 26 cores, 4 workers.  Average times: 13.25, 12.72, 13.36, 12.85 ms.  Total throughput: 306.80 iter/sec.
So total throughputs with all 52 cores in use are
2 workers on each process 320.92+321.10=642.02 iter/sec
3 workers on each process 320.51+320.34=640.85 iter/sec
4 workers on each process 306.80+305.99=612.79 iter/sec
The best result, using two workers, gives 642.02 iter/sec, which is significantly poorer than the configuration with highest throughput (1 mprime process, 52 cores, 4 workers).

4) Two copies of mprime running, in two different directories. 24 cores for each process.
Code:
[Worker #1 Jul 14 16:08] Timing 5760K FFT, 24 cores, 2 workers.  Average times:  6.44,  6.38 ms.  Total throughput: 311.93 iter/sec.
[Worker #1 Jul 14 16:10] Timing 5760K FFT, 24 cores, 3 workers.  Average times: 13.36, 12.82,  6.32 ms.  Total throughput: 311.13 iter/sec.
[Worker #1 Jul 14 16:11] Timing 5760K FFT, 24 cores, 4 workers.  Average times: 12.98, 13.07, 13.03, 12.97 ms.  Total throughput: 307.36 iter/sec.

[Worker #1 Jul 14 16:08] Timing 5760K FFT, 24 cores, 2 workers.  Average times:  6.44,  6.38 ms.  Total throughput: 311.93 iter/sec.
[Worker #1 Jul 14 16:10] Timing 5760K FFT, 24 cores, 3 workers.  Average times: 13.36, 12.82,  6.32 ms.  Total throughput: 311.13 iter/sec.
[Worker #1 Jul 14 16:11] Timing 5760K FFT, 24 cores, 4 workers.  Average times: 12.98, 13.07, 13.03, 12.97 ms.  Total throughput: 307.36 iter/sec.
Calculating the total throughput:

2 workers per process: 311.93+311.69 = 623.62 iter/sec
3 workers per process: 311.13+310.32 = 621.45 iter/sec
4 workers per process: 307.36+306.75 = 614.11 iter/sec
The best result (2 workers per process) is still far from optimal. Using two copies of mprime appears to kill the performance.
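(As an aside, rather than adding these up by hand, something along these lines should pair up and sum the "Total throughput" lines from the two copies' results files. The directory names mprime_a and mprime_b are just placeholders for wherever the two copies live, and it assumes both copies were benchmarked with the same worker counts in the same order.)
Code:
# Pair the "Total throughput" figures from the two copies line by line and sum them.
# mprime_a/ and mprime_b/ are placeholder directory names.
paste <(grep 'Total throughput' mprime_a/results.bench.txt | awk '{print $(NF-1)}') \
      <(grep 'Total throughput' mprime_b/results.bench.txt | awk '{print $(NF-1)}') |
  awk '{printf "%s + %s = %.2f iter/sec\n", $1, $2, $1 + $2}'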

5) Two copies of mprime running, in two different directories. 52 cores for each process - trying to use twice as many cores as available.
I did not mean to benchmark this - I got the result by accident, and would not have bothered reporting it except that it gives a surprising result. Trying to use 104 cores, when there are only 52 available, actually gives better throughput than the two-copy runs that used only the available cores. But it is still not as good as using one copy of mprime.

Code:
[Worker #1 Jul 14 15:43] Timing 5760K FFT, 52 cores, 2 workers.  Average times:  4.89,  5.03 ms.  Total throughput: 403.19 iter/sec.
[Worker #1 Jul 14 15:44] Timing 5760K FFT, 52 cores, 3 workers.  Average times:  9.78,  9.78,  4.95 ms.  Total throughput: 406.50 iter/sec.
[Worker #1 Jul 14 15:45] Timing 5760K FFT, 52 cores, 4 workers.  Average times:  9.36,  9.25,  9.31,  9.29 ms.  Total throughput: 430.14 iter/sec.

[Worker #1 Jul 14 15:43] Timing 5760K FFT, 52 cores, 2 workers.  Average times:  4.82,  5.00 ms.  Total throughput: 407.47 iter/sec.
[Worker #1 Jul 14 15:44] Timing 5760K FFT, 52 cores, 3 workers.  Average times:  9.82,  9.74,  4.93 ms.  Total throughput: 407.12 iter/sec.
[Worker #1 Jul 14 15:45] Timing 5760K FFT, 52 cores, 4 workers.  Average times:  9.28,  9.18,  9.24,  9.33 ms.  Total throughput: 431.98 iter/sec.
So the total throughputs were
2 workers 403.19 + 407.47 = 810.66 iter/sec
3 workers 406.50 + 407.12 = 813.62 iter/sec
4 workers 430.14 + 431.98 = 862.12 iter/sec

Conclusions.
Based on just the one FFT size (5760K), the best results (1019.05 iter/sec) are achieved using

* One copy of mprime
* All 52 cores
* 4 workers.

Running two copies of mprime does not seem effective. The combined throughput of both copies is much less than is obtainable with one copy. This supports kriesel's view that mprime handles multiple processors well.

A couple of results surprise me:
a) Using 48 cores gives higher throughput than 52 cores with 2 or 3 workers. However, with 4 workers, using 52 cores is best.
b) Running two copies of mprime, each using 52 cores (twice as many as available), actually gives better results than using two copies of mprime with only the available cores. However, the maximum throughput with that weird configuration is 862.12 iter/sec, which is much less than the 1019.05 iter/sec possible with one copy of mprime using all 52 cores.

Any comments - particularly on some of the weird results, like 48 cores outperforming 52 cores with 2 or 3 workers?

Old 2021-07-14, 18:34   #26
VBCurtis
"Curtis"
Feb 2005
Riverside, CA

Three.
Old 2021-07-15, 02:46   #27
axn
Jun 2003

Quote:
Originally Posted by drkirkby
Running two copies of mprime does not seem effective. The combined throughput of both copies is much less than obtainable with one copy.
What steps did you take to ensure that both copies were not trying to run on the same cores?


Also, when I said (3x8)x2, that would mean 6 workers total on 48 cores. Do you have those results as well?
Old 2021-07-15, 08:00   #28
drkirkby
"David Kirkby"
Jan 2021
Althorne, Essex, UK

Quote:
Originally Posted by axn
What steps did you take to ensure that both copies were not trying to run on the same cores?
I did not take any special steps to ensure that both copies were not trying to run on the same cores. I have tested this in the past, taking care that each process ran on a particular CPU and used a particular block of memory, with
Code:
$numactl --cpunodebind=0 --membind=0 mprime1
$numactl --cpunodebind=1 --membind=1 mprime2
(At that time I copied the mprime binary to new names, so the two copies showed up easily in "top" as different process names.) Using numactl did not change any results, but I don't know how many memory channels were active at that time, other than to say there were fewer than 12. I thought Ubuntu would manage this sort of thing quite well, but I can certainly try again, forcing the use of a particular CPU and/or a particular section of memory.
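(A check I could add next time, rather than relying on core temperatures: numastat can report on which NUMA node each copy's memory actually ended up, which would show whether --membind took effect.)
Code:
# Per-node memory usage of each renamed mprime copy, while they are running.
numastat -p mprime1
numastat -p mprime2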
Quote:
Originally Posted by axn
Also, when I said (3x8)x2, that would mean 6 workers total on 48 cores. Do you have those results as well?
Either you missed the results, or I misunderstood you, but I interpreted your (3x8)x2 as meaning:
  • 2 processes
  • Each process having 3 workers
  • Each worker having 8 cores
  • Each process would have used 3*8=24 cores
  • Each CPU would have had 26-24=2 cores inactive
If that is what you meant, the results are in section 4. The total throughput was a dismal 621.45 iter/sec. If instead you meant something else, I can try it if you clarify what you mean.
Old 2021-07-15, 08:37   #29
axn
Jun 2003

Quote:
Originally Posted by drkirkby
If instead you meant something else, I can try it if you clarify what you mean.
Single process. 6 workers. Each worker using 8 cores. The assumption is that 3 workers will be mapped to one socket and the other 3 to the second socket.
Old 2021-07-15, 08:42   #30
axn
Jun 2003

Quote:
Originally Posted by drkirkby
I did not take any special steps to ensure that both copies were not trying to run on the same cores. I have tested this in the past
Without taking such steps, they will both try to run on the first socket. Your results would be meaningless.

In fact, the whole point of doing this test was the inability of P95 to take advantage of the increase from 4 to 12 memory channels, and George brought up the issue of mprime not being NUMA-aware.

So you need to redo the test with numactl. But... you might also need to set the affinity flags inside each mprime copy so that the program doesn't try to fight the OS.
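As a concrete sketch, the pinning could also be done with taskset and explicit core lists. The 0-25 / 26-51 split below is only an assumption about how the cores are numbered across the two sockets (check with lscpu -e or numactl --hardware first), and the two directories are just placeholders for wherever the copies live.
Code:
# In one terminal: first copy pinned to what is assumed to be socket 0's cores.
cd ~/mprime_a && taskset -c 0-25 ./mprime1 -m

# In a second terminal: second copy pinned to the assumed socket 1 cores.
cd ~/mprime_b && taskset -c 26-51 ./mprime2 -m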
Old 2021-07-15, 18:13   #31
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

Such a situation could be detected on Windows using the CPU pane of Task Manager (Win 7) or Resource Monitor (Win 10) with the caveat that each hyperthread is represented, so it becomes very unwieldy on Xeon Phi. (7250 is 68 cores x 4-way HT.)
Old 2021-07-15, 21:04   #32
drkirkby
"David Kirkby"
Jan 2021
Althorne, Essex, UK

Quote:
Originally Posted by axn
Single process. 6 workers. Each worker using 8 cores. The assumption is that 3 workers will be mapped to one socket and the other 3 to the second socket.
Okay, here are a few more results, all using 48 cores, so in all of them 2 cores per CPU are idle. I thought it sensible to continue the numbering from the previous results, so that a particular result can be referred to. Hence these results start at 6.

I'm not sure how to set the affinity properly - I have attached the benchmark results, as maybe someone can point out how to control this. I realise I need to put something in local.txt.

numactl is not binding processes to particular CPUs - I infer this from the temperature of the cores. Perhaps I am using numactl incorrectly.
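(One way to check this without relying on temperatures is to ask the kernel which CPUs each mprime process is actually allowed to run on:)
Code:
# Print the allowed-CPU list for every running mprime process.
for pid in $(pgrep mprime); do
    printf 'PID %s: ' "$pid"
    grep Cpus_allowed_list "/proc/$pid/status"
done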

6) Testing with both 4 workers (12 cores each) and 6 workers (8 cores each). No attempt was made to control where workers run.
Code:
[Worker #1 Jul 15 21:14] Timing 5760K FFT, 48 cores, 4 workers.  Average times:  3.98,  3.98,  4.07,  4.07 ms.  Total throughput: 993.78 iter/sec.
[Worker #1 Jul 15 21:15] Timing 5760K FFT, 48 cores, 6 workers.  Average times:  6.44,  6.38,  6.38,  6.61,  6.62,  6.62 ms.  Total throughput: 921.81 iter/sec.
The 4-worker configuration has about 8% higher throughput than the 6-worker one.

7) Two processes, with each process tested using 2, 3 and 4 workers. Attempting (UNSUCCESSFULLY) to tie processes to specific CPUs using numactl.
Code:
$ numactl --physcpubind=0 mprime0 -m
[Worker #1 Jul 15 15:32] Timing 5760K FFT, 24 cores, 2 workers.  Average times:  6.24,  6.51 ms.  Total throughput: 313.79 iter/sec.
[Worker #1 Jul 15 15:33] Timing 5760K FFT, 24 cores, 3 workers.  Average times: 13.07, 13.00,  6.53 ms.  Total throughput: 306.46 iter/sec.
[Worker #1 Jul 15 15:34] Timing 5760K FFT, 24 cores, 4 workers.  Average times: 13.35, 13.36, 13.42, 13.57 ms.  Total throughput: 298.06 iter/sec.
and
Code:
$ numactl --physcpubind=1 mprime1 -m
[Worker #1 Jul 15 15:32] Timing 5760K FFT, 24 cores, 2 workers.  Average times:  6.24,  6.49 ms.  Total throughput: 314.49 iter/sec.
[Worker #1 Jul 15 15:33] Timing 5760K FFT, 24 cores, 3 workers.  Average times: 13.08, 12.98,  6.54 ms.  Total throughput: 306.48 iter/sec.
[Worker #1 Jul 15 15:34] Timing 5760K FFT, 24 cores, 4 workers.  Average times: 13.34, 13.35, 13.41, 13.55 ms.  Total throughput: 298.19 iter/sec.
Hence the total throughputs are as follows:
4 workers in total (2 per process): 313.79+314.49 = 628.28 iter/sec
6 workers in total (3 per process): 306.46+306.48 = 612.94 iter/sec
8 workers in total (4 per process): 298.06+298.19 = 596.25 iter/sec

Using the "sensors" command I can determine the temperature of the CPU cores. Based on that, I'm convinced that using one process with 24 cores, and trying to bind them to CPU1 (2nd CPU)

Code:
$ numactl --physcpubind=1 mprime
does not work, as the first CPU gets hot and the second one stays cold.
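If I'm reading the numactl man page correctly, --physcpubind takes logical CPU numbers rather than socket numbers, so --physcpubind=1 confines the process to logical CPU 1 alone. Binding a copy to the whole of the second socket would look more like this (assuming NUMA node 1 corresponds to the second socket):
Code:
# Show which logical CPUs and how much memory belong to each NUMA node.
numactl --hardware

# Run on all the cores of node 1 and allocate memory only from node 1.
numactl --cpunodebind=1 --membind=1 ./mprime -m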

8) Two processes. Each process tested using 2, 3 and 4 workers. (Like #7 above, but not attempting to tie processes to cores.)

Code:
[Worker #1 Jul 15 21:42] Timing 5760K FFT, 24 cores, 2 workers.  Average times:  6.10,  6.18 ms.  Total throughput: 325.85 iter/sec.
[Worker #1 Jul 15 21:43] Timing 5760K FFT, 24 cores, 3 workers.  Average times: 12.67, 12.64,  6.22 ms.  Total throughput: 318.77 iter/sec.
[Worker #1 Jul 15 21:45] Timing 5760K FFT, 24 cores, 4 workers.  Average times: 12.82, 12.82, 12.82, 12.80 ms.  Total throughput: 312.16 iter/sec.
and
Code:
[Worker #1 Jul 15 21:42] Timing 5760K FFT, 24 cores, 2 workers.  Average times:  6.10,  6.17 ms.  Total throughput: 325.81 iter/sec.
[Worker #1 Jul 15 21:44] Timing 5760K FFT, 24 cores, 3 workers.  Average times: 12.65, 12.67,  6.24 ms.  Total throughput: 318.35 iter/sec.
[Worker #1 Jul 15 21:45] Timing 5760K FFT, 24 cores, 4 workers.  Average times: 12.75, 12.72, 12.72, 12.79 ms.  Total throughput: 313.82 iter/sec.
Hence the total throughputs are as follows:
4 workers in total (2 per process): 325.85+325.81 = 651.66 iter/sec
6 workers in total (3 per process): 318.77+318.35 = 637.12 iter/sec
8 workers in total (4 per process): 312.16+313.82 = 625.98 iter/sec
The throughput with 2 processes is still lower than with one process, but these figures are higher than in test 7 above.

Conclusions.
  1. Using two processes, each with 24 cores, results in significantly poorer performance than using one process with 48 cores.
  2. Attempting to tie a process to a CPU with numactl makes the performance drop even further than just letting the kernel decide where processes run.
  3. Temperature measurements (not shown) indicate numactl is not keeping processes on the CPU I want.
  4. Using 1 process, 52 cores and 4 workers has not been beaten by any 48-core configuration tried to date.
Attached Files
results.bench.txt (14.4 KB)
Old 2021-07-15, 21:29   #33
drkirkby
"David Kirkby"
Jan 2021
Althorne, Essex, UK

Quote:
Originally Posted by kriesel
Such a situation could be detected on Windows using the CPU pane of Task Manager (Win 7) or Resource Monitor (Win 10) with the caveat that each hyperthread is represented, so it becomes very unwieldy on Xeon Phi. (7250 is 68 cores x 4-way HT.)
It can probably be done on Linux, but I'm not sure how. To convince myself which CPU the processes were running on, I used the "sensors" command to measure the core temperatures.
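(On Linux, something along these lines should show the placement directly rather than inferring it from temperatures - psr is the logical CPU each thread last ran on:)
Code:
# Refresh once a second, listing the logical CPU (psr) each mprime thread last ran on.
watch -n 1 'ps -eLo psr,pcpu,comm | grep mprime | sort -n'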


9) Temperature sensing - only 12 cores in use.

Here's a set of temperature readings taken when running mprime to test for primes - not benchmarking. I used only 12 cores - 4 workers, each with 12 cores. CPU0 is hotter than CPU1, despite the fact that the hot air from CPU0 blows over CPU1. However, it's far from clear that 12 cores are hot on CPU0 and 14 are cool, so it seems the work is moving from core to core.

Code:
Package id 0:  +48.0°C  (high = +89.0°C, crit = +99.0°C)
Core 0:        +42.0°C  (high = +89.0°C, crit = +99.0°C)
Core 1:        +43.0°C  (high = +89.0°C, crit = +99.0°C)
Core 2:        +43.0°C  (high = +89.0°C, crit = +99.0°C)
Core 3:        +42.0°C  (high = +89.0°C, crit = +99.0°C)
Core 4:        +42.0°C  (high = +89.0°C, crit = +99.0°C)
Core 5:        +42.0°C  (high = +89.0°C, crit = +99.0°C)
Core 6:        +47.0°C  (high = +89.0°C, crit = +99.0°C)
Core 8:        +43.0°C  (high = +89.0°C, crit = +99.0°C)
Core 9:        +42.0°C  (high = +89.0°C, crit = +99.0°C)
Core 10:       +43.0°C  (high = +89.0°C, crit = +99.0°C)
Core 11:       +43.0°C  (high = +89.0°C, crit = +99.0°C)
Core 12:       +43.0°C  (high = +89.0°C, crit = +99.0°C)
Core 13:       +41.0°C  (high = +89.0°C, crit = +99.0°C)
Core 16:       +40.0°C  (high = +89.0°C, crit = +99.0°C)
Core 17:       +41.0°C  (high = +89.0°C, crit = +99.0°C)
Core 18:       +41.0°C  (high = +89.0°C, crit = +99.0°C)
Core 19:       +41.0°C  (high = +89.0°C, crit = +99.0°C)
Core 20:       +39.0°C  (high = +89.0°C, crit = +99.0°C)
Core 21:       +41.0°C  (high = +89.0°C, crit = +99.0°C)
Core 22:       +39.0°C  (high = +89.0°C, crit = +99.0°C)
Core 24:       +41.0°C  (high = +89.0°C, crit = +99.0°C)
Core 25:       +39.0°C  (high = +89.0°C, crit = +99.0°C)
Core 26:       +39.0°C  (high = +89.0°C, crit = +99.0°C)
Core 27:       +41.0°C  (high = +89.0°C, crit = +99.0°C)
Core 28:       +41.0°C  (high = +89.0°C, crit = +99.0°C)
Core 29:       +46.0°C  (high = +89.0°C, crit = +99.0°C)

dell_smm-virtual-0
Adapter: Virtual device
fan1:           0 RPM
fan2:         802 RPM
fan3:         808 RPM

nvme-pci-10200
Adapter: PCI adapter
Composite:    +33.9°C  (low  = -20.1°C, high = +77.8°C)
                       (crit = +81.8°C)
Sensor 1:     +33.9°C  (low  = -273.1°C, high = +65261.8°C)

coretemp-isa-0001
Adapter: ISA adapter
Package id 1:  +42.0°C  (high = +89.0°C, crit = +99.0°C)
Core 0:        +39.0°C  (high = +89.0°C, crit = +99.0°C)
Core 1:        +39.0°C  (high = +89.0°C, crit = +99.0°C)
Core 2:        +39.0°C  (high = +89.0°C, crit = +99.0°C)
Core 3:        +37.0°C  (high = +89.0°C, crit = +99.0°C)
Core 4:        +37.0°C  (high = +89.0°C, crit = +99.0°C)
Core 5:        +38.0°C  (high = +89.0°C, crit = +99.0°C)
Core 6:        +38.0°C  (high = +89.0°C, crit = +99.0°C)
Core 8:        +37.0°C  (high = +89.0°C, crit = +99.0°C)
Core 9:        +38.0°C  (high = +89.0°C, crit = +99.0°C)
Core 10:       +38.0°C  (high = +89.0°C, crit = +99.0°C)
Core 11:       +38.0°C  (high = +89.0°C, crit = +99.0°C)
Core 12:       +38.0°C  (high = +89.0°C, crit = +99.0°C)
Core 13:       +39.0°C  (high = +89.0°C, crit = +99.0°C)
Core 16:       +39.0°C  (high = +89.0°C, crit = +99.0°C)
Core 17:       +38.0°C  (high = +89.0°C, crit = +99.0°C)
Core 18:       +38.0°C  (high = +89.0°C, crit = +99.0°C)
Core 19:       +39.0°C  (high = +89.0°C, crit = +99.0°C)
Core 20:       +38.0°C  (high = +89.0°C, crit = +99.0°C)
Core 21:       +36.0°C  (high = +89.0°C, crit = +99.0°C)
Core 22:       +37.0°C  (high = +89.0°C, crit = +99.0°C)
Core 24:       +37.0°C  (high = +89.0°C, crit = +99.0°C)
Core 25:       +37.0°C  (high = +89.0°C, crit = +99.0°C)
Core 26:       +38.0°C  (high = +89.0°C, crit = +99.0°C)
Core 27:       +38.0°C  (high = +89.0°C, crit = +99.0°C)
Core 28:       +38.0°C  (high = +89.0°C, crit = +99.0°C)
Core 29:       +39.0°C  (high = +89.0°C, crit = +99.0°C)
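(A convenient way to keep an eye on these while a test is running is to wrap sensors in watch, filtering to just the package and per-core lines:)
Code:
# Refresh the package and per-core temperatures every 5 seconds.
watch -n 5 "sensors | grep -E 'Package|Core'"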
10) Temperature sensing - all 52 cores in use.

Code:
drkirkby@canary:~/Desktop$ sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +73.0°C  (high = +89.0°C, crit = +99.0°C)
Core 0:        +69.0°C  (high = +89.0°C, crit = +99.0°C)
Core 1:        +65.0°C  (high = +89.0°C, crit = +99.0°C)
Core 2:        +65.0°C  (high = +89.0°C, crit = +99.0°C)
Core 3:        +64.0°C  (high = +89.0°C, crit = +99.0°C)
Core 4:        +64.0°C  (high = +89.0°C, crit = +99.0°C)
Core 5:        +65.0°C  (high = +89.0°C, crit = +99.0°C)
Core 6:        +69.0°C  (high = +89.0°C, crit = +99.0°C)
Core 8:        +63.0°C  (high = +89.0°C, crit = +99.0°C)
Core 9:        +65.0°C  (high = +89.0°C, crit = +99.0°C)
Core 10:       +65.0°C  (high = +89.0°C, crit = +99.0°C)
Core 11:       +65.0°C  (high = +89.0°C, crit = +99.0°C)
Core 12:       +65.0°C  (high = +89.0°C, crit = +99.0°C)
Core 13:       +63.0°C  (high = +89.0°C, crit = +99.0°C)
Core 16:       +64.0°C  (high = +89.0°C, crit = +99.0°C)
Core 17:       +66.0°C  (high = +89.0°C, crit = +99.0°C)
Core 18:       +66.0°C  (high = +89.0°C, crit = +99.0°C)
Core 19:       +65.0°C  (high = +89.0°C, crit = +99.0°C)
Core 20:       +63.0°C  (high = +89.0°C, crit = +99.0°C)
Core 21:       +64.0°C  (high = +89.0°C, crit = +99.0°C)
Core 22:       +64.0°C  (high = +89.0°C, crit = +99.0°C)
Core 24:       +65.0°C  (high = +89.0°C, crit = +99.0°C)
Core 25:       +63.0°C  (high = +89.0°C, crit = +99.0°C)
Core 26:       +64.0°C  (high = +89.0°C, crit = +99.0°C)
Core 27:       +65.0°C  (high = +89.0°C, crit = +99.0°C)
Core 28:       +65.0°C  (high = +89.0°C, crit = +99.0°C)
Core 29:       +74.0°C  (high = +89.0°C, crit = +99.0°C)

dell_smm-virtual-0
Adapter: Virtual device
fan1:           0 RPM
fan2:        1409 RPM
fan3:         806 RPM

nvme-pci-10200
Adapter: PCI adapter
Composite:    +33.9°C  (low  = -20.1°C, high = +77.8°C)
                       (crit = +81.8°C)
Sensor 1:     +33.9°C  (low  = -273.1°C, high = +65261.8°C)

coretemp-isa-0001
Adapter: ISA adapter
Package id 1:  +82.0°C  (high = +89.0°C, crit = +99.0°C)
Core 0:        +77.0°C  (high = +89.0°C, crit = +99.0°C)
Core 1:        +79.0°C  (high = +89.0°C, crit = +99.0°C)
Core 2:        +80.0°C  (high = +89.0°C, crit = +99.0°C)
Core 3:        +78.0°C  (high = +89.0°C, crit = +99.0°C)
Core 4:        +80.0°C  (high = +89.0°C, crit = +99.0°C)
Core 5:        +82.0°C  (high = +89.0°C, crit = +99.0°C)
Core 6:        +80.0°C  (high = +89.0°C, crit = +99.0°C)
Core 8:        +78.0°C  (high = +89.0°C, crit = +99.0°C)
Core 9:        +80.0°C  (high = +89.0°C, crit = +99.0°C)
Core 10:       +81.0°C  (high = +89.0°C, crit = +99.0°C)
Core 11:       +81.0°C  (high = +89.0°C, crit = +99.0°C)
Core 12:       +83.0°C  (high = +89.0°C, crit = +99.0°C)
Core 13:       +81.0°C  (high = +89.0°C, crit = +99.0°C)
Core 16:       +80.0°C  (high = +89.0°C, crit = +99.0°C)
Core 17:       +80.0°C  (high = +89.0°C, crit = +99.0°C)
Core 18:       +81.0°C  (high = +89.0°C, crit = +99.0°C)
Core 19:       +81.0°C  (high = +89.0°C, crit = +99.0°C)
Core 20:       +81.0°C  (high = +89.0°C, crit = +99.0°C)
Core 21:       +78.0°C  (high = +89.0°C, crit = +99.0°C)
Core 22:       +79.0°C  (high = +89.0°C, crit = +99.0°C)
Core 24:       +77.0°C  (high = +89.0°C, crit = +99.0°C)
Core 25:       +78.0°C  (high = +89.0°C, crit = +99.0°C)
Core 26:       +80.0°C  (high = +89.0°C, crit = +99.0°C)
Core 27:       +82.0°C  (high = +89.0°C, crit = +99.0°C)
Core 28:       +81.0°C  (high = +89.0°C, crit = +99.0°C)
Core 29:       +82.0°C  (high = +89.0°C, crit = +99.0°C)

nouveau-pci-7300
Adapter: PCI adapter
fan1:        2473 RPM
temp1:        +30.0°C  (high = +95.0°C, hyst =  +3.0°C)
                       (crit = +105.0°C, hyst =  +5.0°C)
                       (emerg = +135.0°C, hyst =  +5.0°C)

nvme-pci-0100
Adapter: PCI adapter
Composite:    +31.9°C  (low  =  -0.1°C, high = +85.8°C)
                       (crit = +86.8°C)
Sensor 1:     +31.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +29.9°C  (low  = -273.1°C, high = +65261.8°C)
Conclusions
  1. Running on 12 cores does not result in 12 hot cores and 40 cool ones.
  2. Running on 12 cores makes CPU0 warmer than CPU1, suggesting the work is being done on CPU0.
  3. Running on all 52 cores makes CPU1 warmer than CPU0, which is probably because the hot air from CPU0 passes over CPU1. (It seems a bit of an odd design decision to me.)