mersenneforum.org  

2020-05-13, 12:27   #144
Xyzzy
Aug 2002
7×1,237 Posts

Quote:
Originally Posted by Xyzzy
We mentioned earlier that the motherboard's CPU temperature display was offset wrong. Now that we are running Windows we have a temperature monitoring app. Under full load the system runs at 60°C and it idles at 28°C.

We had Prime95 (1 thread) and msieve (12 threads) running when we wrote the above quote. As an experiment we ran just msieve on 12 threads and the temperature dropped to 40°C!

[Attached thumbnails: cpu.png (87.7 KB), temp.png (24.7 KB)]

2020-05-17, 19:51   #145
kruoli
"Oliver"
Sep 2017
Porta Westfalica, DE
23×71 Posts

Quote:
Originally Posted by Xyzzy
We had Prime95 (1 thread) and msieve (12 threads) running when we wrote the above quote. As an experiment we ran just msieve on 12 threads and the temperature dropped to 40°C!

I guess no current application scales nicely on Threadripper or Ryzen 1xxx, because of their CCXs. When I try to scale mlucas, Prime95, etc. beyond one CCX on my 1950X, performance and efficiency take a big hit and power consumption drops.

2020-05-17, 23:06   #146
Xyzzy
Aug 2002
7·1,237 Posts

We had no idea what a CCX was, so we found this: https://www.tomshardware.com/reviews...ined,6338.html

Is there a way to lock a process and its threads to a particular CCX?


2020-05-17, 23:58   #147
kladner
"Kieren"
Jul 2011
In My Own Galaxy!
2×3×1,693 Posts

Quote:
Originally Posted by Xyzzy
We had no idea what a CCX was, so we found this: https://www.tomshardware.com/reviews...ined,6338.html

Is there a way to lock a process and its threads to a particular CCX?

Thanks for the lookup!

2020-05-18, 01:05   #148
VBCurtis
"Curtis"
Feb 2005
Riverside, CA
2×2,927 Posts

Quote:
Originally Posted by Xyzzy
We had no idea what a CCX was, so we found this: https://www.tomshardware.com/reviews...ined,6338.html

Is there a way to lock a process and its threads to a particular CCX?

You can use taskset to lock a process to specific thread numbers; you would merely need to learn which threads go with which CCX to do a couple of timing tests.

I invoke msieve with: taskset -c 0-11 ./msieve -t 12 -nc2

That puts the 12 msieve threads on 12 different cores (on my machine, 12-23 are hyperthreads of 0-11). Once you know which threads are part of a single CCX, you can use taskset to hit just those threads. You can use a comma also, if the threads aren't contiguous: taskset -c 0-2,6-8 will run 6-threaded if you wish for some reason (e.g. to see if using all cores + hyperthreads on a single CCX gains you anything).
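
To "learn which threads go with which CCX" on Linux, one option is to read the cache topology the kernel exposes under sysfs. The following is a minimal sketch, assuming the usual /sys/devices/system/cpu/cpuN/cache/indexM layout with `level` and `shared_cpu_list` files; the helper names `parse_cpu_list` and `l3_groups` are made up for illustration. It prints one taskset CPU list per group of logical CPUs that share an L3 cache.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel-style CPU list such as "0-2,6-8" into [0, 1, 2, 6, 7, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def l3_groups(sysfs_root="/sys/devices/system/cpu"):
    """Return the distinct sets of logical CPUs that share an L3 cache.

    Assumes a Linux sysfs layout; yields an empty list if no L3 entries are found.
    """
    groups = set()
    for cache_dir in Path(sysfs_root).glob("cpu[0-9]*/cache/index*"):
        level_file = cache_dir / "level"
        shared_file = cache_dir / "shared_cpu_list"
        if not (level_file.is_file() and shared_file.is_file()):
            continue
        # Only level-3 caches are of interest here.
        if level_file.read_text().strip() == "3":
            groups.add(tuple(parse_cpu_list(shared_file.read_text())))
    return sorted(groups)


if __name__ == "__main__":
    for cpus in l3_groups():
        print("taskset -c " + ",".join(map(str, cpus)) + " <command>")
```

Each printed list includes both hardware threads of every core in that L3 group; whether to use all of them or only one thread per core is something the timing tests would have to decide.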

2020-05-18, 08:31   #149
kruoli
"Oliver"
Sep 2017
Porta Westfalica, DE
23×71 Posts

Quote:
Originally Posted by Xyzzy
Is there a way to lock a process and its threads to a particular CCX?

On my end, I would first disable node interleaving if you primarily run PRP and LL tests.

When I run a benchmark in Prime95, the hwloc library writes detailed topology information to results.bench.txt:
Code:
AMD Ryzen Threadripper 1950X 16-Core Processor 
CPU speed: 3432.97 MHz, 16 hyperthreaded cores
CPU features: 3DNow! Prefetch, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 16x32 KB, L2 cache size: 16x512 KB, L3 cache size: 4x8 MB
L1 cache line size: 64 bytes, L2 cache line size: 64 bytes
Machine topology as determined by hwloc library:
 Machine#0 (total=111520316KB, Backend=Windows, hwlocVersion=2.0.4, ProcessName=prime95.exe)
  Package (total=111520316KB, CPUVendor=AuthenticAMD, CPUFamilyNumber=23, CPUModelNumber=1, CPUModel="AMD Ryzen Threadripper 1950X 16-Core Processor ", CPUStepping=1)
    Group0 (total=64141560KB)
      L3 (size=8192KB, linesize=64, ways=16, Inclusive=0)
        L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
          L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
            Core (cpuset: 0x00000003)
              PU#0 (cpuset: 0x00000001)
              PU#1 (cpuset: 0x00000002)
        L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
          L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
            Core (cpuset: 0x0000000c)
              PU#2 (cpuset: 0x00000004)
              PU#3 (cpuset: 0x00000008)
        L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
          L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
            Core (cpuset: 0x00000030)
              PU#4 (cpuset: 0x00000010)
              PU#5 (cpuset: 0x00000020)
        L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
          L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
            Core (cpuset: 0x000000c0)
              PU#6 (cpuset: 0x00000040)
              PU#7 (cpuset: 0x00000080)
      L3 (size=8192KB, linesize=64, ways=16, Inclusive=0)
// here starts a new CCX!
<snip>
You have to look for the L3 sections. Each L3 cache corresponds to one CCX on Threadripper 1st Gen. So in my case, physical cores 1-4 are one CCX, 5-8 are one CCX and so on.
If you then set Prime95 to use workers with a maximum of 4 cores, that should be fine. Interestingly, I get the best throughput with one double-check (DC) exponent per physical core. That's totally different on my Intel machines.

Your 1920X should have 4 CCXs with 3 cores each:
Quote:
The Ryzen Threadripper 1920X uses two dies, each one with 3+3 configuration (one disabled core on each CCX), for a total of 12 cores
In general, if the time to finish a single exponent is your biggest concern, there are situations where using all cores at once for one exponent can speed it up slightly, but doing so will decrease total throughput significantly.
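
The "one L3 section per CCX" reading of the hwloc output can be automated. This is a rough sketch, assuming the text format shown in the benchmark excerpt above (an "L3 (" line opens a section and "PU#n" lines follow it); the function name `pus_per_l3` and the shortened `SAMPLE` are illustrative only.

```python
import re

PU_RE = re.compile(r"PU#(\d+)")


def pus_per_l3(hwloc_text):
    """Group PU numbers by the L3 section they appear under.

    Each line starting with "L3 (" opens a new group; every following
    "PU#n" line is added to the most recently opened group.
    """
    groups = []
    for line in hwloc_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("L3 ("):
            groups.append([])
        else:
            match = PU_RE.match(stripped)
            if match and groups:
                groups[-1].append(int(match.group(1)))
    return groups


# Shortened sample in the same format as the hwloc section of results.bench.txt.
SAMPLE = """\
    L3 (size=8192KB, linesize=64, ways=16, Inclusive=0)
      L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x00000003)
            PU#0 (cpuset: 0x00000001)
            PU#1 (cpuset: 0x00000002)
    L3 (size=8192KB, linesize=64, ways=16, Inclusive=0)
      L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x000000c0)
            PU#6 (cpuset: 0x00000040)
            PU#7 (cpuset: 0x00000080)
"""

if __name__ == "__main__":
    for index, pus in enumerate(pus_per_l3(SAMPLE), start=1):
        print(f"CCX {index}: PUs {pus}")
```

Feeding it the full hwloc section instead of `SAMPLE` should give one list per L3 cache, which on this CPU generation means one list per CCX.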

2020-05-18, 12:18   #150
Xyzzy
Aug 2002
7×1,237 Posts

Here is the hwloc info for our CPU:
Code:
AMD Ryzen Threadripper 1920X 12-Core Processor 
CPU speed: 3493.43 MHz, 12 hyperthreaded cores
CPU features: 3DNow! Prefetch, SSE, SSE2, SSE4, AVX, AVX2, FMA
L1 cache size: 12x32 KB, L2 cache size: 12x512 KB, L3 cache size: 4x8 MB
L1 cache line size: 64 bytes, L2 cache line size: 64 bytes
Machine topology as determined by hwloc library:
 Machine#0 (total=7457252KB, Backend=Windows, hwlocVersion=2.0.4, ProcessName=prime95.exe)
  Package (total=7457252KB, CPUVendor=AuthenticAMD, CPUFamilyNumber=23, CPUModelNumber=1, CPUModel="AMD Ryzen Threadripper 1920X 12-Core Processor ", CPUStepping=1)
    L3 (size=8192KB, linesize=64, ways=16, Inclusive=0)
      L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x00000003)
            PU#0 (cpuset: 0x00000001)
            PU#1 (cpuset: 0x00000002)
      L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x0000000c)
            PU#2 (cpuset: 0x00000004)
            PU#3 (cpuset: 0x00000008)
      L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x00000030)
            PU#4 (cpuset: 0x00000010)
            PU#5 (cpuset: 0x00000020)
    L3 (size=8192KB, linesize=64, ways=16, Inclusive=0)
      L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x000000c0)
            PU#6 (cpuset: 0x00000040)
            PU#7 (cpuset: 0x00000080)
      L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x00000300)
            PU#8 (cpuset: 0x00000100)
            PU#9 (cpuset: 0x00000200)
      L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x00000c00)
            PU#10 (cpuset: 0x00000400)
            PU#11 (cpuset: 0x00000800)
    L3 (size=8192KB, linesize=64, ways=16, Inclusive=0)
      L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x00003000)
            PU#12 (cpuset: 0x00001000)
            PU#13 (cpuset: 0x00002000)
      L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x0000c000)
            PU#14 (cpuset: 0x00004000)
            PU#15 (cpuset: 0x00008000)
      L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x00030000)
            PU#16 (cpuset: 0x00010000)
            PU#17 (cpuset: 0x00020000)
    L3 (size=8192KB, linesize=64, ways=16, Inclusive=0)
      L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x000c0000)
            PU#18 (cpuset: 0x00040000)
            PU#19 (cpuset: 0x00080000)
      L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x00300000)
            PU#20 (cpuset: 0x00100000)
            PU#21 (cpuset: 0x00200000)
      L2 (size=512KB, linesize=64, ways=8, Inclusive=1)
        L1d (size=32KB, linesize=64, ways=8, Inclusive=0)
          Core (cpuset: 0x00c00000)
            PU#22 (cpuset: 0x00400000)
            PU#23 (cpuset: 0x00800000)
We think that our CCX layout is:
Code:
╔═════╦══════════╦══════════╗
║ CCX ║   CORE   ║    HT    ║
╠═════╬══════════╬══════════╣
║  1  ║  0 2 4   ║  1 3 5   ║
╠═════╬══════════╬══════════╣
║  2  ║  6 8 10  ║  7 9 11  ║
╠═════╬══════════╬══════════╣
║  3  ║ 12 14 16 ║ 13 15 17 ║
╠═════╬══════════╬══════════╣
║  4  ║ 18 20 22 ║ 19 21 23 ║
╚═════╩══════════╩══════════╝
If we want to use the two CCXs that belong to the same CCD, how do we know which pair that is? Is there an advantage to staying on the same CCD?
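
Since this machine runs Windows, a CCX from the layout table can also be expressed as an affinity bitmask, the same form as the cpuset values in the hwloc output (bit n set means logical processor n is allowed). A small sketch, assuming the layout in the table is correct; the `CCX_LAYOUT` dictionary and function name are illustrative, and the `start /affinity` line assumes the cmd.exe built-in accepts a hexadecimal mask (check `start /?`), with `msieve.exe` as a placeholder program name.

```python
def affinity_mask(logical_cpus):
    """Return an integer bitmask with one bit set per logical processor."""
    mask = 0
    for cpu in logical_cpus:
        mask |= 1 << cpu
    return mask


# Logical processors per CCX, taken from the layout table (cores and HT siblings).
CCX_LAYOUT = {
    1: range(0, 6),
    2: range(6, 12),
    3: range(12, 18),
    4: range(18, 24),
}

if __name__ == "__main__":
    for ccx, cpus in CCX_LAYOUT.items():
        mask = affinity_mask(cpus)
        # Hypothetical command line; verify the syntax on your system first.
        print(f"CCX {ccx}: start /affinity {mask:X} msieve.exe")
```

For CCX 2 this gives the mask FC0, which is the bitwise OR of the three Core cpuset values 0xc0, 0x300 and 0xc00 listed above.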


2020-05-18, 12:25   #151
Xyzzy
Aug 2002
7·1,237 Posts

Possibly useful?

https://www.passmark.com/forum/perfo...d-threadripper
https://blog.michael.kuron-germany.d...-and-htcondor/

More hardware info: https://docs.microsoft.com/en-us/sys...loads/coreinfo
Code:
AMD Ryzen Threadripper 1920X 12-Core Processor 
AMD64 Family 23 Model 1 Stepping 1, AuthenticAMD
Microcode signature: 00000000
HTT           *    Multicore
HYPERVISOR    -    Hypervisor is present
VMX           -    Supports Intel hardware-assisted virtualization
SVM           *    Supports AMD hardware-assisted virtualization
X64           *    Supports 64-bit mode

SMX           -    Supports Intel trusted execution
SKINIT        *    Supports AMD SKINIT

NX            *    Supports no-execute page protection
SMEP          *    Supports Supervisor Mode Execution Prevention
SMAP          *    Supports Supervisor Mode Access Prevention
PAGE1GB       *    Supports 1 GB large pages
PAE           *    Supports > 32-bit physical addresses
PAT           *    Supports Page Attribute Table
PSE           *    Supports 4 MB pages
PSE36         *    Supports > 32-bit address 4 MB pages
PGE           *    Supports global bit in page tables
SS            -    Supports bus snooping for cache operations
VME           *    Supports Virtual-8086 mode
RDWRFSGSBASE  *    Supports direct GS/FS base access

FPU           *    Implements i387 floating point instructions
MMX           *    Supports MMX instruction set
MMXEXT        *    Implements AMD MMX extensions
3DNOW         -    Supports 3DNow! instructions
3DNOWEXT      -    Supports 3DNow! extension instructions
SSE           *    Supports Streaming SIMD Extensions
SSE2          *    Supports Streaming SIMD Extensions 2
SSE3          *    Supports Streaming SIMD Extensions 3
SSSE3         *    Supports Supplemental SIMD Extensions 3
SSE4a         *    Supports Streaming SIMDR Extensions 4a
SSE4.1        *    Supports Streaming SIMD Extensions 4.1
SSE4.2        *    Supports Streaming SIMD Extensions 4.2

AES           *    Supports AES extensions
AVX           *    Supports AVX instruction extensions
FMA           *    Supports FMA extensions using YMM state
MSR           *    Implements RDMSR/WRMSR instructions
MTRR          *    Supports Memory Type Range Registers
XSAVE         *    Supports XSAVE/XRSTOR instructions
OSXSAVE       *    Supports XSETBV/XGETBV instructions
RDRAND        *    Supports RDRAND instruction
RDSEED        *    Supports RDSEED instruction

CMOV          *    Supports CMOVcc instruction
CLFSH         *    Supports CLFLUSH instruction
CX8           *    Supports compare and exchange 8-byte instructions
CX16          *    Supports CMPXCHG16B instruction
BMI1          *    Supports bit manipulation extensions 1
BMI2          *    Supports bit manipulation extensions 2
ADX           *    Supports ADCX/ADOX instructions
DCA           -    Supports prefetch from memory-mapped device
F16C          *    Supports half-precision instruction
FXSR          *    Supports FXSAVE/FXSTOR instructions
FFXSR         *    Supports optimized FXSAVE/FSRSTOR instruction
MONITOR       *    Supports MONITOR and MWAIT instructions
MOVBE         *    Supports MOVBE instruction
ERMSB         -    Supports Enhanced REP MOVSB/STOSB
PCLMULDQ      *    Supports PCLMULDQ instruction
POPCNT        *    Supports POPCNT instruction
LZCNT         *    Supports LZCNT instruction
SEP           *    Supports fast system call instructions
LAHF-SAHF     *    Supports LAHF/SAHF instructions in 64-bit mode
HLE           -    Supports Hardware Lock Elision instructions
RTM           -    Supports Restricted Transactional Memory instructions

DE            *    Supports I/O breakpoints including CR4.DE
DTES64        -    Can write history of 64-bit branch addresses
DS            -    Implements memory-resident debug buffer
DS-CPL        -    Supports Debug Store feature with CPL
PCID          -    Supports PCIDs and settable CR4.PCIDE
INVPCID       -    Supports INVPCID instruction
PDCM          -    Supports Performance Capabilities MSR
RDTSCP        *    Supports RDTSCP instruction
TSC           *    Supports RDTSC instruction
TSC-DEADLINE  -    Local APIC supports one-shot deadline timer
TSC-INVARIANT *    TSC runs at constant rate
xTPR          -    Supports disabling task priority messages

EIST          -    Supports Enhanced Intel Speedstep
ACPI          -    Implements MSR for power management
TM            -    Implements thermal monitor circuitry
TM2           -    Implements Thermal Monitor 2 control
APIC          *    Implements software-accessible local APIC
x2APIC        -    Supports x2APIC

CNXT-ID       -    L1 data cache mode adaptive or BIOS

MCE           *    Supports Machine Check, INT18 and CR4.MCE
MCA           *    Implements Machine Check Architecture
PBE           -    Supports use of FERR#/PBE# pin

PSN           -    Implements 96-bit processor serial number

PREFETCHW     *    Supports PREFETCHW instruction

Maximum implemented CPUID leaves: 0000000D (Basic), 8000001F (Extended).
Maximum implemented address width: 48 bits (virtual), 48 bits (physical).

Processor signature: 00800F11

Logical to Physical Processor Map:
**----------------------  Physical Processor 0 (Hyperthreaded)
--**--------------------  Physical Processor 1 (Hyperthreaded)
----**------------------  Physical Processor 2 (Hyperthreaded)
------**----------------  Physical Processor 3 (Hyperthreaded)
--------**--------------  Physical Processor 4 (Hyperthreaded)
----------**------------  Physical Processor 5 (Hyperthreaded)
------------**----------  Physical Processor 6 (Hyperthreaded)
--------------**--------  Physical Processor 7 (Hyperthreaded)
----------------**------  Physical Processor 8 (Hyperthreaded)
------------------**----  Physical Processor 9 (Hyperthreaded)
--------------------**--  Physical Processor 10 (Hyperthreaded)
----------------------**  Physical Processor 11 (Hyperthreaded)

Logical Processor to Socket Map:
************************  Socket 0

Logical Processor to NUMA Node Map:
************************  NUMA Node 0
-  NUMA Node 1
Calculating Cross-NUMA Node Access Cost...
                                          
Approximate Cross-NUMA Node Access Cost (relative to fastest):
     00  01
00: 1.2 1.0
01: 0.0 0.0

Logical Processor to Cache Map:
**----------------------  Data Cache          0, Level 1,   32 KB, Assoc   8, LineSize  64
**----------------------  Instruction Cache   0, Level 1,   64 KB, Assoc   4, LineSize  64
**----------------------  Unified Cache       0, Level 2,  512 KB, Assoc   8, LineSize  64
******------------------  Unified Cache       1, Level 3,    8 MB, Assoc  16, LineSize  64
--**--------------------  Data Cache          1, Level 1,   32 KB, Assoc   8, LineSize  64
--**--------------------  Instruction Cache   1, Level 1,   64 KB, Assoc   4, LineSize  64
--**--------------------  Unified Cache       2, Level 2,  512 KB, Assoc   8, LineSize  64
----**------------------  Data Cache          2, Level 1,   32 KB, Assoc   8, LineSize  64
----**------------------  Instruction Cache   2, Level 1,   64 KB, Assoc   4, LineSize  64
----**------------------  Unified Cache       3, Level 2,  512 KB, Assoc   8, LineSize  64
------**----------------  Data Cache          3, Level 1,   32 KB, Assoc   8, LineSize  64
------**----------------  Instruction Cache   3, Level 1,   64 KB, Assoc   4, LineSize  64
------**----------------  Unified Cache       4, Level 2,  512 KB, Assoc   8, LineSize  64
------******------------  Unified Cache       5, Level 3,    8 MB, Assoc  16, LineSize  64
--------**--------------  Data Cache          4, Level 1,   32 KB, Assoc   8, LineSize  64
--------**--------------  Instruction Cache   4, Level 1,   64 KB, Assoc   4, LineSize  64
--------**--------------  Unified Cache       6, Level 2,  512 KB, Assoc   8, LineSize  64
----------**------------  Data Cache          5, Level 1,   32 KB, Assoc   8, LineSize  64
----------**------------  Instruction Cache   5, Level 1,   64 KB, Assoc   4, LineSize  64
----------**------------  Unified Cache       7, Level 2,  512 KB, Assoc   8, LineSize  64
------------**----------  Data Cache          6, Level 1,   32 KB, Assoc   8, LineSize  64
------------**----------  Instruction Cache   6, Level 1,   64 KB, Assoc   4, LineSize  64
------------**----------  Unified Cache       8, Level 2,  512 KB, Assoc   8, LineSize  64
------------******------  Unified Cache       9, Level 3,    8 MB, Assoc  16, LineSize  64
--------------**--------  Data Cache          7, Level 1,   32 KB, Assoc   8, LineSize  64
--------------**--------  Instruction Cache   7, Level 1,   64 KB, Assoc   4, LineSize  64
--------------**--------  Unified Cache      10, Level 2,  512 KB, Assoc   8, LineSize  64
----------------**------  Data Cache          8, Level 1,   32 KB, Assoc   8, LineSize  64
----------------**------  Instruction Cache   8, Level 1,   64 KB, Assoc   4, LineSize  64
----------------**------  Unified Cache      11, Level 2,  512 KB, Assoc   8, LineSize  64
------------------**----  Data Cache          9, Level 1,   32 KB, Assoc   8, LineSize  64
------------------**----  Instruction Cache   9, Level 1,   64 KB, Assoc   4, LineSize  64
------------------**----  Unified Cache      12, Level 2,  512 KB, Assoc   8, LineSize  64
------------------******  Unified Cache      13, Level 3,    8 MB, Assoc  16, LineSize  64
--------------------**--  Data Cache         10, Level 1,   32 KB, Assoc   8, LineSize  64
--------------------**--  Instruction Cache  10, Level 1,   64 KB, Assoc   4, LineSize  64
--------------------**--  Unified Cache      14, Level 2,  512 KB, Assoc   8, LineSize  64
----------------------**  Data Cache         11, Level 1,   32 KB, Assoc   8, LineSize  64
----------------------**  Instruction Cache  11, Level 1,   64 KB, Assoc   4, LineSize  64
----------------------**  Unified Cache      15, Level 2,  512 KB, Assoc   8, LineSize  64

Logical Processor to Group Map:
************************  Group 0

2020-05-18, 12:51   #152
Xyzzy
Aug 2002
7×1,237 Posts

We set the memory interleaving option in the BIOS from "AUTO" to "CHANNEL".

Before:
Code:
Logical Processor to NUMA Node Map:
************************  NUMA Node 0
-  NUMA Node 1
Calculating Cross-NUMA Node Access Cost...
                                          
Approximate Cross-NUMA Node Access Cost (relative to fastest):
     00  01
00: 1.2 1.0
01: 0.0 0.0
After:
Code:
Logical Processor to NUMA Node Map:
************------------  NUMA Node 0
------------************  NUMA Node 1
Calculating Cross-NUMA Node Access Cost...
                                          
Approximate Cross-NUMA Node Access Cost (relative to fastest):
     00  01
00: 1.0 1.2
01: 1.4 1.3
It appears that logical processors 0-11 (physical cores 0-5) are on one CCD and 12-23 (physical cores 6-11) are on the other CCD.
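
The star/dash rows in the Coreinfo maps can be turned into processor lists mechanically: position n in the pattern is logical processor n. A minimal sketch, assuming each row is a pattern followed by spaces and a label, as in the output above; `parse_map_row` is a made-up helper name.

```python
def parse_map_row(row):
    """Split a Coreinfo map row into (logical processor indices, label)."""
    pattern, _, label = row.strip().partition(" ")
    cpus = [index for index, char in enumerate(pattern) if char == "*"]
    return cpus, label.strip()


ROWS = [
    "************------------  NUMA Node 0",
    "------------************  NUMA Node 1",
]

if __name__ == "__main__":
    for row in ROWS:
        cpus, label = parse_map_row(row)
        print(f"{label}: logical processors {cpus[0]}-{cpus[-1]}")
```

The same function works on the "Logical Processor to Cache Map" rows, where the Level 3 entries mark the CCX boundaries.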


2020-05-18, 13:05   #153
Xyzzy
Aug 2002
7×1,237 Posts

So to put a 6 thread job on only the "real" cores of the second CCD, we are using this command: taskset -c 12,14,16,18,20,22 nice -19 ../msieve -ncr -t 6 -v target_density=120 -i 107331526897849_17m1.ini

After this job is done we will run the benchmark on 6 cores again.
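
The CPU list in that command can be generated rather than typed by hand. This is a sketch under one loudly stated assumption: SMT siblings are numbered adjacently (0/1, 2/3, ...), as in the layout table earlier in the thread. Other systems number them differently (VBCurtis's machine pairs N with N+12), so check the topology first. Both function names are illustrative, and `os.sched_setaffinity` is only available on Linux.

```python
import os


def first_thread_per_core(first_cpu, last_cpu, threads_per_core=2):
    """List the first logical CPU of each core in [first_cpu, last_cpu].

    Assumes SMT siblings are numbered adjacently (0/1, 2/3, ...).
    """
    return list(range(first_cpu, last_cpu + 1, threads_per_core))


def pin_current_process(cpus):
    """Pin this process to the given logical CPUs where the OS supports it.

    Returns False on platforms without os.sched_setaffinity. On Linux the
    call raises OSError if none of the requested CPUs exist.
    """
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, set(cpus))
        return True
    return False


if __name__ == "__main__":
    cpus = first_thread_per_core(12, 23)
    print("taskset -c " + ",".join(map(str, cpus)))
```

Run as a script, it prints the same CPU list used in the command above; `pin_current_process` is left uncalled so the sketch does not change affinity on a machine with fewer CPUs.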

Note that our BIOS offers the following memory interleaving options:
Quote:
Memory interleaving

Controls fabric level memory interleaving (AUTO, none, channel, die, socket). Note that channel, die, and socket has requirements on memory populations and it will be ignored if the memory doesn't support the selected option.

2020-05-18, 13:14   #154
kruoli
"Oliver"
Sep 2017
Porta Westfalica, DE
23×71 Posts

Quote:
Originally Posted by Xyzzy
If we want to use the two CCXs that belong to the same CCD how do we know which set that is? Is there an advantage with staying on the same CCD?

Yes, that's definitely faster!

Your hwloc output was captured before you changed the interleaving setting, which is why hwloc was not able to detect the CCDs. But I think you worked it out correctly after that.

In contrast to Prime95, y-cruncher runs much better with node interleaving activated.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
