mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   Prime95 version 28.6 / 28.7 (28.7 now available!) (https://www.mersenneforum.org/showthread.php?t=20156)

Prime95 2015-05-21 15:10

[QUOTE=Mark Rose;402743]I'd like to run mprime on cores 1-3 and leave core 0 free.[/QUOTE]

This should be possible. In the "dialog" box where you specify priority and type of work to get, you can specify the specific CPU each worker should run on.

TObject 2015-05-21 15:26

Can we make that field a comma-delimited list, so that the affinity of helper threads can optionally be specified as well?

Mark Rose 2015-05-21 15:48

Thanks, guys! I got it working as wanted with your help!

Dubslow 2015-05-21 17:42

[QUOTE=TObject;402756]Can we make that field a comma delimited list, so that affinity of helper threads can be optionally specified as well?[/QUOTE]

That's what AffinityScramble2 is for; it maps helper threads to the primary threads. So if you have a thread on core 2, and AffinityScramble2=xyxyxy26xyxyxy, the helper thread will be on core 6.
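As a rough sketch of the encoding (an assumption based on the examples in this thread, not official documentation): each position in the AffinityScramble2 string names one CPU as a single character, with digits covering CPUs 0-9 and letters covering higher numbers. In Dubslow's "xyxyxy26xyxyxy" the x/y characters just stand in for positions he isn't specifying; only the '2' and '6' matter.

```python
# Hypothetical decoder for one AffinityScramble2 character.
# Assumed encoding: '0'-'9' -> CPU 0-9, 'A'-'Z' -> CPU 10-35,
# 'a'-'z' -> CPU 36-61 (matches Madpoo's 40-logical-CPU string later
# in this thread).

def char_to_cpu(ch: str) -> int:
    """Decode one scramble character to a CPU number."""
    if ch.isdigit():
        return int(ch)                  # '0'-'9' -> CPU 0-9
    if 'A' <= ch <= 'Z':
        return ord(ch) - ord('A') + 10  # 'A'-'Z' -> CPU 10-35
    return ord(ch) - ord('a') + 36      # 'a'-'z' -> CPU 36-61

print(char_to_cpu('2'), char_to_cpu('6'))   # -> 2 6
```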

TObject 2015-05-21 18:25

It appears that AffinityScramble2 is ignored on non-hyperthreaded CPUs.

TObject 2015-05-21 18:43

[QUOTE=Dubslow;402764]That's what AffinityScramble2 is for; it maps helper threads to the primary threads. So if you have a thread on core 2, and AffinityScramble2=xyxyxy26xyxyxy, the helper thread will be on core 6.[/QUOTE]

Here is an example for a 4-core, non-hyperthreaded CPU:
Goal:
Worker one: CPU4 plus helper on CPU3
Worker two: CPU2 plus helper on CPU1

AffinityScramble2=3210

What Prime95 actually does with these settings under Windows 8.1:
Worker one: CPU4 plus helper on “any” CPU
Worker two: CPU2 plus helper on CPU3

Edit: in other words, it behaves as if there were no AffinityScramble2 at all.

Madpoo 2015-05-22 01:08

[QUOTE=TObject;402771]Here is an example for a 4 core non-hyperthreading:
Goal:
Worker one: CPU4 plus helper on CPU3
Worker two: CPU2 plus helper on CPU1

AffinityScramble2=3210

What Prime95 actually does with these settings under Windows 8.1:
Worker one: CPU4 plus helper on “any” CPU
Worker two: CPU2 plus helper on CPU3

Edit: in other words it behaves as if there is no AffinityScramble2 at all.[/QUOTE]

As an example in wonky affinity fun, here's what I have on a dual 10-core (with HT enabled) system:

AffinityScramble2=02468ACEGIKMOQSUWYac13579BDFHJLNPRTVXZbd
WorkerThreads=2
ThreadsPerTest=10

[Worker #1]
Affinity=0

[Worker #2]
Affinity=10

In Windows, each physical core and its HT sibling are numbered as a pair (like 0,1). I just throw all those HT cores at the end of my affinity list and ignore them.

I could just as easily set worker #1 there to Affinity=1 and give it its own ThreadsPerTest=9 line.
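The string above can be regenerated mechanically. This sketch (an assumption about the character encoding, inferred from the string itself) lists the even logical CPU numbers first, so every physical core sits at the front of the map and all the HT siblings are parked at the end, unused:

```python
# Sketch: rebuild Madpoo's AffinityScramble2 string for a dual 10-core,
# HT-enabled box (40 logical CPUs). Windows numbers each physical core
# and its HT sibling as an adjacent pair (0,1), (2,3), ... so listing
# the even IDs first puts the physical cores at the front of the map.

def cpu_char(n: int) -> str:
    """Encode CPU n as one character: '0'-'9', then 'A'-'Z', then 'a'-'z'."""
    if n < 10:
        return str(n)
    if n < 36:
        return chr(ord('A') + n - 10)
    return chr(ord('a') + n - 36)

def scramble_physical_first(logical_cpus: int) -> str:
    physical = range(0, logical_cpus, 2)   # even IDs: physical cores
    ht = range(1, logical_cpus, 2)         # odd IDs: HT siblings
    return ''.join(cpu_char(n) for n in list(physical) + list(ht))

print(scramble_physical_first(40))
# -> 02468ACEGIKMOQSUWYac13579BDFHJLNPRTVXZbd
```

The output matches the AffinityScramble2 line quoted above character for character.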

At least on the Proliant servers, I haven't seen any indication that CPU #0 has to handle all of the interrupts. I know that was the case on old generic multi-CPU systems. But back in the day, Compaq had their own custom HAL for Windows that allowed interrupt handling on any CPU. I just tried to find more info on that but I'm not finding anything right away. That was definitely my recollection, though, about the advantage of the custom HAL over the normal Windows NT MPS HAL. :smile:

I have this notion that in modern Windows systems that's a standard thing.

The only exception on server systems that comes to mind are things like the Proliant PCI slots might have an affinity to one CPU socket or another, so you can balance things by putting disk controllers in certain slots to maximize some DMA features, but that's a little different than just INT handling.

I'm probably mangling all kinds of things since I'm trying to remember this all off the top of my head and doing a terrible job. So I could be all wrong.

Madpoo 2015-05-22 01:12

[QUOTE=Madpoo;402787]...But back in the day, Compaq had their own custom HAL for Windows that allowed interrupt handling on any CPU. I just tried to find out more info on that but not finding any info right away. That was definitely my recollection though about the advantage of the custom HAL over the normal Windows NT mps hal. :smile:[/QUOTE]

Ah, here we go:
"One of the feature enhancements of the Compaq MP HAL is that it
provides support for multiprocessing servers. This includes supporting
distributed interrupts across multiple processors for Compaq ProLiant
and Compaq Systempro/XL servers. The MP HAL included with the
Windows NT base product services interrupts on the first processor
only. Distributing interrupts among multiple processors provides more
efficient use of system resources."

So advanced for 1996.

Madpoo 2015-05-22 01:23

[QUOTE=Madpoo;402787]...I have this notion that in modern Windows systems that's a standard thing.[/QUOTE]

Here's kind of what I was thinking:
[QUOTE]Prior to Windows Server 2008, the interrupt and associated deferred procedure call (DPC) for a storage or network I/O can execute on any CPU, including ones from a different node than the one on which the I/O initiated, potentially causing data read or written in the I/O operation to be in a different node's memory than the one where the data is accessed.
To avoid this, the Windows Server 2008 I/O system directs DPC execution to a CPU in the node that initiated the I/O, and systems that have devices that support PCI bus MSI-X (an extension to the Message Signaled Interrupt standard) can further localize I/O completion by using device drivers that take advantage of Windows Server 2008 APIs to direct an I/O's interrupt to the processor that initiated the I/O.[/QUOTE]

From [URL="https://technet.microsoft.com/en-us/magazine/2008.03.kernel.aspx"]https://technet.microsoft.com/en-us/magazine/2008.03.kernel.aspx[/URL]

And as far as I know, Linux has always distributed interrupts across CPUs in some way or another.

Mark Rose 2015-05-22 02:10

[QUOTE=Madpoo;402789]And as far as I know, Linux has always distributed interrupts across CPUs in some way or another.[/QUOTE]

It depends on the particular service or component. Log into a busy box that's been up a while and cat /proc/interrupts. On my box in question, most interrupts are balanced, but not all:

[code]
            CPU0       CPU1       CPU2       CPU3
  0:     366764    1903729   30053431 2365898043   IO-APIC-edge      timer
<snip>
 14:        369       5060     283465   15348084   IO-APIC-edge      pata_atiixp
<snip>
 16:        180       2681     132926    8147669   IO-APIC-fasteoi   sata_sil24, pata_via, firewire_ohci, snd_hda_intel, snd_hda_intel
<snip>
 46: 1238333810          0          3        328   PCI-MSI-edge      eth0
 47:       5624      35179    4190215  530568557   PCI-MSI-edge      nvidia
 48:        355       2075     553210   44392539   PCI-MSI-edge      nvidia
[/code]

You can see in particular that eth0 hammers the first CPU. And by default, the CPU that gets the interrupt for the network device also processes the packet. (You can change that by enabling [url=https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/network-rps.html]Receive Packet Steering[/url], which also reduces latency and jitter if you have significant network traffic.)
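The imbalance is easy to check programmatically. A minimal sketch, using an inlined sample of the eth0/nvidia lines above (on a real box you would read the contents of /proc/interrupts instead):

```python
# Tally per-CPU interrupt counts from /proc/interrupts-style text to
# spot imbalance like the eth0 line above. SAMPLE is a trimmed copy of
# the output quoted in this thread.

SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 46: 1238333810          0          3        328  PCI-MSI-edge  eth0
 47:       5624      35179    4190215  530568557  PCI-MSI-edge  nvidia
 48:        355       2075     553210   44392539  PCI-MSI-edge  nvidia
"""

def per_cpu_counts(text):
    """Map each IRQ number to (device name, per-CPU count list)."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())          # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        fields = line.split()
        irq = fields[0].rstrip(':')
        table[irq] = (fields[-1], [int(f) for f in fields[1:1 + ncpus]])
    return table

device, counts = per_cpu_counts(SAMPLE)['46']
print(device, counts)
# -> eth0 [1238333810, 0, 3, 328]  (nearly everything lands on CPU0)
```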

Madpoo 2015-05-22 05:59

[QUOTE=Mark Rose;402792]It depends on the particular service or component. Log into a busy box that's been up a while and cat /proc/interrupts. On my box in question, most interrupts are balanced, but not all...

...You can see in particular that eth0 hammers the first CPU. And by default, the CPU that gets the interrupt for the network device also processes the packet. (You can change by enabling [url=https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/network-rps.html]Receive Packet Steering[/url], which also reduces latency and jitter if you have significant network traffic.)[/QUOTE]

That must be an effect of how the chipset connects the various devices through one CPU or the other.

Yeah, I guess it depends a lot more, then, on how the system was designed and whether it had the ability to spread the love at all.

On the Proliant servers I know and love, if you populate the 2nd CPU socket then you can use more PCI slots that tie to that other socket. When setting up a system like that for max performance, you would put cards in certain slots accordingly so that the interrupts are handled nicely. The built-in things like the internal array controller and net adapters probably have to use socket 0.

That affects the NUMA nodes anyway... on-chip you have multi-cores and I think any one of the cores can handle interrupts on that node.

Ideally you would want most of that stuff offloaded... Ethernet can use TOE, and I suppose at some point we'll move past SAS/SATA and start working with NVMe for best performance. "MSI-X allows the NVMe device to direct its interrupts to a particular core."

Ah, the times they are a changin'. Good things to look forward to in future tech.

