mprime (Linux) doesn't do "affinity" correctly... (https://www.mersenneforum.org/showthread.php?t=19047)

chalsall 2014-01-02 22:49

[QUOTE=sdbardwick;363620]Although I note that the first example looks like the default assignment, and your second 2 examples involve impossible core configs such that mprime might be smart enough to ignore them and revert to the default. I haven't looked at the source code for error handling, so just speculation on my part.[/QUOTE]

When I tell my software to do something, I expect it to do what I say.

I'm sentient; it isn't.

Aramis Wyler 2014-01-02 23:30

The program will always bow before the will of its programmer before it listens to the will of its end user.

NBtarheel_33 2014-01-03 10:11

*Yawn* Yet another Linux hyperthreading affinity thread. I identified this sort of problem last year whilst testing on the NVIDIA big iron. See [URL=http://mersenneforum.org/showthread.php?t=18499]here[/URL], and [URL=http://www.mersenneforum.org/showthread.php?t=18134]here[/URL], among others.

TL;DR: This is a known issue/bug/feature. I am sure it is on George's list, just maybe not near the top. It is also, IIRC, not a readily replicated condition.

NBtarheel_33 2014-01-03 10:20

[QUOTE=chalsall;363616]
This is, instead, mprime making a much bigger mistake. Read: Not understanding how to deal with multi-socket-CPU environments.[/QUOTE]

Definitely. That was the question that I had last year: Perhaps mprime knows how to handle multi-[I]core[/I] systems, but does it know how to handle multi-[I]socket[/I] systems? And from my experience, I think the answer (at this moment in time) ranges from "absolutely not" to "quite poorly".

Do keep in mind that multi-[I]core[/I] support in Prime95/mprime is a relatively recent innovation. There are very few users clamoring for multi-[I]socket[/I] support.

One thought that I had last year was to try running as many copies of mprime as there are sockets, and see if that works somehow.
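A minimal sketch of that one-copy-per-socket idea, assuming a Linux box where sysfs exposes `topology/physical_package_id` and where `taskset` (from util-linux) is installed. The helper names and the `./mprime` path are hypothetical, and the script only prints the command lines rather than launching anything. Each copy would presumably also need its own working directory and configuration files.

```python
import glob
import os
import re
from collections import defaultdict


def cpus_by_socket(sysfs_root="/sys/devices/system/cpu"):
    """Group logical CPU numbers by physical package (socket) id from sysfs."""
    sockets = defaultdict(list)
    for path in glob.glob(os.path.join(sysfs_root, "cpu[0-9]*")):
        match = re.fullmatch(r"cpu(\d+)", os.path.basename(path))
        if not match:
            continue
        id_file = os.path.join(path, "topology", "physical_package_id")
        try:
            with open(id_file) as f:
                socket_id = int(f.read().strip())
        except (OSError, ValueError):
            continue  # offline CPU or no topology information exposed
        sockets[socket_id].append(int(match.group(1)))
    return {s: sorted(cpus) for s, cpus in sorted(sockets.items())}


def per_socket_commands(sockets, binary="./mprime"):
    """Build one taskset command line per socket (binary path is an assumption)."""
    return {
        s: ["taskset", "-c", ",".join(str(c) for c in cpus), binary]
        for s, cpus in sockets.items()
    }


if __name__ == "__main__":
    for socket_id, argv in per_socket_commands(cpus_by_socket()).items():
        print(f"socket {socket_id}: {' '.join(argv)}")
```

Note that `taskset` only restricts which CPUs a process may run on; it does not control where its memory is allocated. On a NUMA machine, `numactl --cpunodebind=N --membind=N` may be the better wrapper, though whether either approach helps mprime here is untested.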

Mark Rose 2014-01-03 15:45

[QUOTE=chalsall;363593]
Final note: all of these tests were under CentOS 6.4. There may be different behavior under other versions of Linux.[/QUOTE]

CentOS 6.4 has a four-year-old kernel, 2.6.32, which is pretty ancient in Linux terms. Linux NUMA support isn't perfect, but a lot of work has gone into it over the last four years. I would try 3.12 and see if it helps.

chalsall 2014-01-03 15:55

[QUOTE=Mark Rose;363710]I would try 3.12 and see if it helps.[/QUOTE]

Interesting point.

Unfortunately the R720s are "production", and intentionally use older, stable software which can be maintained by others if I'm "hit by a bus". Further, they are the only multi-socket machines I have access to.

Lastly, I would argue that this is a bug in mprime. It seems that it ignores the AffinityScramble2 settings, and while it sets the affinity correctly for the first four cores of each CPU (when each Worker section has an Affinity statement), it incorrectly sets it for the latter four cores.

I'm going to drill down on this further over the weekend and examine the mprime source code, and hopefully be able to provide George with some code delta suggestions.
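For anyone wanting to reproduce the affinity observations above without reading the mprime source, here is a rough sketch that lists the CPUs each thread of a running process is allowed to use. It assumes Linux and reads the `Cpus_allowed_list` field from `/proc/<pid>/task/<tid>/status`; the function names and the script name in the usage line are made up for illustration, and nothing here is taken from mprime itself.

```python
import os
import sys


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-3,8,10-11' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def thread_affinities(pid):
    """Return {tid: (thread_name, allowed_cpus)} for every thread of the process."""
    result = {}
    task_dir = f"/proc/{pid}/task"
    for tid in sorted(os.listdir(task_dir), key=int):
        fields = {}
        try:
            with open(os.path.join(task_dir, tid, "status")) as f:
                for line in f:
                    key, _, value = line.partition(":")
                    fields[key] = value.strip()
        except OSError:
            continue  # the thread exited while we were reading
        allowed = parse_cpu_list(fields.get("Cpus_allowed_list", ""))
        result[int(tid)] = (fields.get("Name", "?"), allowed)
    return result


if __name__ == "__main__":
    # Pass the PID of the process to inspect as the only argument.
    for tid, (name, cpus) in thread_affinities(int(sys.argv[1])).items():
        print(f"tid {tid} ({name}): allowed CPUs {cpus}")
```

Something like `python3 check_affinity.py $(pidof mprime)` would then show, per worker thread, whether the allowed CPU set matches what the Affinity statements asked for.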

