#34
"David Kirkby"
Jan 2021
Althorne, Essex, UK
2⁴×3³ Posts
Code:
drkirkby@canary:~$ lstopo-no-graphics
Machine (377GB total)
Package L#0
NUMANode L#0 (P#0 188GB)
L3 L#0 (36MB)
L2 L#0 (1024KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
L2 L#1 (1024KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
L2 L#2 (1024KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
L2 L#3 (1024KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
L2 L#4 (1024KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#4)
L2 L#5 (1024KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#5)
L2 L#6 (1024KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#6)
L2 L#7 (1024KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#7)
L2 L#8 (1024KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#8)
L2 L#9 (1024KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#9)
L2 L#10 (1024KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 (P#10)
L2 L#11 (1024KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 (P#11)
L2 L#12 (1024KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12 + PU L#12 (P#12)
L2 L#13 (1024KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13 + PU L#13 (P#13)
L2 L#14 (1024KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14 + PU L#14 (P#14)
L2 L#15 (1024KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15 + PU L#15 (P#15)
L2 L#16 (1024KB) + L1d L#16 (32KB) + L1i L#16 (32KB) + Core L#16 + PU L#16 (P#16)
L2 L#17 (1024KB) + L1d L#17 (32KB) + L1i L#17 (32KB) + Core L#17 + PU L#17 (P#17)
L2 L#18 (1024KB) + L1d L#18 (32KB) + L1i L#18 (32KB) + Core L#18 + PU L#18 (P#18)
L2 L#19 (1024KB) + L1d L#19 (32KB) + L1i L#19 (32KB) + Core L#19 + PU L#19 (P#19)
L2 L#20 (1024KB) + L1d L#20 (32KB) + L1i L#20 (32KB) + Core L#20 + PU L#20 (P#20)
L2 L#21 (1024KB) + L1d L#21 (32KB) + L1i L#21 (32KB) + Core L#21 + PU L#21 (P#21)
L2 L#22 (1024KB) + L1d L#22 (32KB) + L1i L#22 (32KB) + Core L#22 + PU L#22 (P#22)
L2 L#23 (1024KB) + L1d L#23 (32KB) + L1i L#23 (32KB) + Core L#23 + PU L#23 (P#23)
L2 L#24 (1024KB) + L1d L#24 (32KB) + L1i L#24 (32KB) + Core L#24 + PU L#24 (P#24)
L2 L#25 (1024KB) + L1d L#25 (32KB) + L1i L#25 (32KB) + Core L#25 + PU L#25 (P#25)
HostBridge
PCI 00:11.5 (SATA)
PCI 00:16.2 (IDE)
PCI 00:17.0 (RAID)
Block(Disk) "sdb"
Block(Disk) "sda"
PCIBridge
PCI 02:00.0 (Ethernet)
Net "enp2s0"
PCIBridge
PCI 03:00.0 (Ethernet)
Net "enp3s0f0"
PCI 03:00.1 (Ethernet)
Net "enp3s0f1"
PCI 00:1f.6 (Ethernet)
Net "enp0s31f6"
HostBridge
PCI 44:05.5 (RAID)
Block(Disk) "nvme0n1"
HostBridge
PCIBridge
PCI 73:00.0 (VGA)
Package L#1
NUMANode L#1 (P#1 189GB)
L3 L#1 (36MB)
L2 L#26 (1024KB) + L1d L#26 (32KB) + L1i L#26 (32KB) + Core L#26 + PU L#26 (P#26)
L2 L#27 (1024KB) + L1d L#27 (32KB) + L1i L#27 (32KB) + Core L#27 + PU L#27 (P#27)
L2 L#28 (1024KB) + L1d L#28 (32KB) + L1i L#28 (32KB) + Core L#28 + PU L#28 (P#28)
L2 L#29 (1024KB) + L1d L#29 (32KB) + L1i L#29 (32KB) + Core L#29 + PU L#29 (P#29)
L2 L#30 (1024KB) + L1d L#30 (32KB) + L1i L#30 (32KB) + Core L#30 + PU L#30 (P#30)
L2 L#31 (1024KB) + L1d L#31 (32KB) + L1i L#31 (32KB) + Core L#31 + PU L#31 (P#31)
L2 L#32 (1024KB) + L1d L#32 (32KB) + L1i L#32 (32KB) + Core L#32 + PU L#32 (P#32)
L2 L#33 (1024KB) + L1d L#33 (32KB) + L1i L#33 (32KB) + Core L#33 + PU L#33 (P#33)
L2 L#34 (1024KB) + L1d L#34 (32KB) + L1i L#34 (32KB) + Core L#34 + PU L#34 (P#34)
L2 L#35 (1024KB) + L1d L#35 (32KB) + L1i L#35 (32KB) + Core L#35 + PU L#35 (P#35)
L2 L#36 (1024KB) + L1d L#36 (32KB) + L1i L#36 (32KB) + Core L#36 + PU L#36 (P#36)
L2 L#37 (1024KB) + L1d L#37 (32KB) + L1i L#37 (32KB) + Core L#37 + PU L#37 (P#37)
L2 L#38 (1024KB) + L1d L#38 (32KB) + L1i L#38 (32KB) + Core L#38 + PU L#38 (P#38)
L2 L#39 (1024KB) + L1d L#39 (32KB) + L1i L#39 (32KB) + Core L#39 + PU L#39 (P#39)
L2 L#40 (1024KB) + L1d L#40 (32KB) + L1i L#40 (32KB) + Core L#40 + PU L#40 (P#40)
L2 L#41 (1024KB) + L1d L#41 (32KB) + L1i L#41 (32KB) + Core L#41 + PU L#41 (P#41)
L2 L#42 (1024KB) + L1d L#42 (32KB) + L1i L#42 (32KB) + Core L#42 + PU L#42 (P#42)
L2 L#43 (1024KB) + L1d L#43 (32KB) + L1i L#43 (32KB) + Core L#43 + PU L#43 (P#43)
L2 L#44 (1024KB) + L1d L#44 (32KB) + L1i L#44 (32KB) + Core L#44 + PU L#44 (P#44)
L2 L#45 (1024KB) + L1d L#45 (32KB) + L1i L#45 (32KB) + Core L#45 + PU L#45 (P#45)
L2 L#46 (1024KB) + L1d L#46 (32KB) + L1i L#46 (32KB) + Core L#46 + PU L#46 (P#46)
L2 L#47 (1024KB) + L1d L#47 (32KB) + L1i L#47 (32KB) + Core L#47 + PU L#47 (P#47)
L2 L#48 (1024KB) + L1d L#48 (32KB) + L1i L#48 (32KB) + Core L#48 + PU L#48 (P#48)
L2 L#49 (1024KB) + L1d L#49 (32KB) + L1i L#49 (32KB) + Core L#49 + PU L#49 (P#49)
L2 L#50 (1024KB) + L1d L#50 (32KB) + L1i L#50 (32KB) + Core L#50 + PU L#50 (P#50)
L2 L#51 (1024KB) + L1d L#51 (32KB) + L1i L#51 (32KB) + Core L#51 + PU L#51 (P#51)
HostBridge
PCI d1:05.5 (RAID)
Block(Disk) "nvme1n1"
drkirkby@canary:~$
Last fiddled with by drkirkby on 2021-07-24 at 11:02

#35
Jun 2003
2·2,543 Posts
Is hyperthreading (HT) turned off, or is there no HT? lstopo is only reporting 1 logical processor per core, and a total of 52 logical cores.
Taking that at face value, the affinities should be:
local.txt #1, Worker #1: Affinity=0,1,2,3,4,5,6,7,8,9,10,11,12
local.txt #1, Worker #2: Affinity=13,14,15,16,17,18,19,20,21,22,23,24,25
local.txt #2, Worker #1: Affinity=26,27,28,29,30,31,32,33,34,35,36,37,38
local.txt #2, Worker #2: Affinity=39,40,41,42,43,44,45,46,47,48,49,50,51
Last fiddled with by axn on 2021-07-24 at 13:08 Reason: Affinity syntax is confusing.
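The split above is plain arithmetic: 52 logical CPUs, two Prime95 instances (one per socket), two workers each, so 52 / 4 = 13 consecutive CPU numbers per worker. A minimal Python sketch of that calculation, assuming CPU numbers run contiguously per socket as in the lstopo output above (P#0 to P#25 on Package L#0, P#26 to P#51 on Package L#1). The function name `affinity_lines` is hypothetical, and it only prints the lines; it does not write any local.txt.

```python
def affinity_lines(total_cpus: int, instances: int, workers_per_instance: int):
    """Split CPUs 0..total_cpus-1 into equal contiguous blocks, one per worker.

    Assumes CPU numbering is contiguous per socket (as lstopo reports here),
    so instance 1 gets the first half of the numbers and instance 2 the second.
    """
    blocks = instances * workers_per_instance
    if total_cpus % blocks:
        raise ValueError("CPUs do not divide evenly among the workers")
    per_worker = total_cpus // blocks  # 52 // 4 == 13 on this machine

    lines = []
    for inst in range(instances):
        for w in range(workers_per_instance):
            start = (inst * workers_per_instance + w) * per_worker
            cpus = ",".join(str(c) for c in range(start, start + per_worker))
            lines.append(f"local.txt #{inst + 1}, Worker #{w + 1}: Affinity={cpus}")
    return lines


if __name__ == "__main__":
    for line in affinity_lines(52, 2, 2):
        print(line)
```

The same function gives the split for other layouts, e.g. `affinity_lines(52, 1, 4)` for a single instance with 4 workers of 13 cores each.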

#36
"David Kirkby"
Jan 2021
Althorne, Essex, UK
2⁴×3³ Posts
Quote:
Code:
numactl -H
https://www.mersenneforum.org/showpo...9&postcount=35
That shows CPUs numbered 0 to 103. So do I need to use numactl or not now? Of course, whilst 4 workers is optimal with one process running, it may well not be if two processes are running. I'll give that a try.

#37
Jun 2003
2×2,543 Posts
We still need numactl to evaluate how stage 2 performs with local versus non-local memory. If it turns out that using only half the total memory, but all of it local, makes stage 2 much faster, then that is the way to go. Hopefully, with the numactl and Affinity settings, we'll be able to run two instances of P95, each using local memory, that together fully utilize the cores. If that does work, I have no doubt that you'll get the best performance. It may or may not be significantly better than running a single instance, but that's what we want to find out.
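For the local-memory experiment, each P95 instance can be started under numactl bound to one NUMA node, so its threads run on that node's cores and its allocations (including stage 2 buffers) come from that node's RAM. A minimal Python sketch that builds those command lines, assuming the Linux build `mprime` is run from a separate, hypothetical per-instance directory (`p95-node0`, `p95-node1`) that holds that instance's own local.txt. `--cpunodebind` and `--membind` are standard numactl options; the binary path and directory names are illustrative only.

```python
import shlex
import subprocess


def numa_commands(nodes=(0, 1), binary="./mprime", workdir_fmt="p95-node{}"):
    """Return (working directory, argv) pairs, one P95 instance per NUMA node.

    --cpunodebind=N keeps the instance's threads on node N's CPUs;
    --membind=N restricts its memory allocations to node N.
    binary and workdir_fmt are hypothetical placeholders.
    """
    return [
        (workdir_fmt.format(n),
         ["numactl", f"--cpunodebind={n}", f"--membind={n}", binary])
        for n in nodes
    ]


def launch(commands):
    """Start each instance in its own directory (which holds its own local.txt)."""
    return [subprocess.Popen(argv, cwd=wd) for wd, argv in commands]


if __name__ == "__main__":
    # Only print the equivalent shell commands; nothing is started here.
    for wd, argv in numa_commands():
        print(f"(cd {wd} && {shlex.join(argv)})")
    # launch(numa_commands())  # uncomment on a machine with numactl and mprime
```

Note that `--membind` is strict: per the numactl documentation, allocation fails when the bound node runs out of memory instead of spilling over to the other node, so each instance's stage 2 memory allowance needs to fit within one node (about 188GB here).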

#38
"David Kirkby"
Jan 2021
Althorne, Essex, UK
2⁴×3³ Posts
(kriesel: Caution, next post indicates there was an undisclosed error affecting this post.)
I tried what you suggested, but performance was not that great. Then I tried running one process with 2 workers, with Affinity set as you suggested, and benchmarking in another process. The benchmark was run with 24-26 cores and 2-4 workers.
Code:
[Worker #1 Jul 24 16:57] Timing 5760K FFT, 26 cores, 4 workers. Average times: 8.29, 7.12, 6.24, 5.37 ms. Total throughput: 607.58 iter/sec.
The 607.58 iter/sec is almost double the throughput obtained when running 4 workers in each of two processes, with neither process constrained in any way. Here are the results from running two benchmarks with nothing constrained:
Code:
[Worker #1 Jul 24 18:14] Benchmarking multiple workers to measure the impact of memory bandwidth
[Worker #1 Jul 24 18:15] Timing 5760K FFT, 26 cores, 4 workers. Average times: 13.27, 11.03, 13.31, 11.08 ms. Total throughput: 331.31 iter/sec.
Code:
[Worker #1 Jul 24 18:15] Timing 5760K FFT, 26 cores, 4 workers. Average times: 12.80, 11.30, 12.78, 11.42 ms. Total throughput: 332.36 iter/sec.
One does better running a single process:
Code:
[Worker #1 Jul 24 18:22] Benchmarking multiple workers to measure the impact of memory bandwidth
[Worker #1 Jul 24 18:22] Timing 5760K FFT, 52 cores, 4 workers. Average times: 3.85, 3.84, 3.86, 3.86 ms. Total throughput: 1038.20 iter/sec.
Last fiddled with by kriesel on 2021-07-25 at 00:05 Reason: error indicated in next post
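The "Total throughput" figures in these benchmark lines are consistent with simply summing each worker's rate, 1000 / (average ms per iteration): recomputing from the rounded average times reproduces every reported total to within about 0.1 iter/sec. A minimal Python sketch of that check; the function name `total_throughput` is hypothetical and the data are copied from the benchmark lines above.

```python
def total_throughput(avg_ms):
    """Iterations/sec summed over workers, given each worker's average ms per iteration."""
    return sum(1000.0 / t for t in avg_ms)


# (per-worker average times in ms, reported "Total throughput"), from the lines above
RUNS = {
    "Jul 24 16:57, 26 cores, 4 workers": ([8.29, 7.12, 6.24, 5.37], 607.58),
    "Jul 24 18:15 (a), 26 cores, 4 workers": ([13.27, 11.03, 13.31, 11.08], 331.31),
    "Jul 24 18:15 (b), 26 cores, 4 workers": ([12.80, 11.30, 12.78, 11.42], 332.36),
    "Jul 24 18:22, 52 cores, 4 workers": ([3.85, 3.84, 3.86, 3.86], 1038.20),
}

if __name__ == "__main__":
    for name, (times, reported) in RUNS.items():
        computed = total_throughput(times)
        print(f"{name}: computed {computed:.2f}, reported {reported:.2f} iter/sec")
```

The small residual differences come from the average times being printed to only two decimal places.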

#39
"David Kirkby"
Jan 2021
Althorne, Essex, UK
2⁴·3³ Posts
Ignore my last post (at 2021-07-24 17:47); I found an error.

#40
Jun 2003
2·2,543 Posts

#41
"David Kirkby"
Jan 2021
Althorne, Essex, UK
2⁴×3³ Posts