mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   Prime95 version 27.7 / 27.9 (https://www.mersenneforum.org/showthread.php?t=16779)

Jwb52z 2012-05-20 17:51

[QUOTE=NBtarheel_33;299878]Just a heads-up: The GIMPS home page is still advertising version 26 as the current release build.[/QUOTE]Yes, it is supposed to stay that way until version 27.x is confirmed stable, needs no further changes or enhancements, and its creators say so.

Dubslow 2012-05-22 09:58

Possible bug?
 
[code]bill@Gravemind:~/MPrime$ mprime -d
[Main thread May 22 04:52] Mersenne number primality test program version 27.7
[Main thread May 22 04:52:35] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 256 KB, L3 cache size: 8 MB
[Main thread May 22 04:52:35] [U]Logical CPUs 1,5 form one physical CPU.[/U]
[Main thread May 22 04:52:35] [U]Logical CPUs 2,6 form one physical CPU.[/U]
[Main thread May 22 04:52:35] [U]Logical CPUs 3,7 form one physical CPU.[/U]
[Main thread May 22 04:52:35] [U]Logical CPUs 4,8 form one physical CPU.[/U]
[Main thread May 22 04:52:35] Starting workers.
[Comm thread May 22 04:52:35] Exchanging program options with server
[Worker #1 May 22 04:52:35] Worker starting
[Worker #1 May 22 04:52:35] [U]Setting affinity to run worker on logical CPU #1[/U]
[Worker #2 May 22 04:52:35] Waiting 5 seconds to stagger worker starts.
[Worker #3 May 22 04:52:35] Waiting 10 seconds to stagger worker starts.
[Worker #4 May 22 04:52:35] Waiting 15 seconds to stagger worker starts.
[Comm thread May 22 04:52:35] Done communicating with server.
[Worker #1 May 22 04:52:36] [U]Setting affinity to run helper thread 1 on logical CPU #5[/U]
[Worker #1 May 22 04:52:36] Resuming primality test of M54197029 using AVX FFT length 2880K, Pass1=384, Pass2=7680, 2 threads
[Worker #1 May 22 04:52:36] Iteration: 44942274 / 54197029 [82.9238%].
[Worker #2 May 22 04:52:40] Worker starting
[Worker #2 May 22 04:52:40] [U]Setting affinity to run worker on logical CPU #5[/U]
[Worker #2 May 22 04:52:41] [U]Setting affinity to run helper thread 1 on logical CPU #2[/U]
[Worker #2 May 22 04:52:41] Resuming primality test of M25318487 using AVX FFT length 1344K, Pass1=448, Pass2=3K, 2 threads
[Worker #2 May 22 04:52:41] Iteration: 175637 / 25318487 [0.6937%].
[Worker #3 May 22 04:52:45] Worker starting
[Worker #3 May 22 04:52:45] [U]Setting affinity to run worker on logical CPU #2[/U]
[Worker #3 May 22 04:52:45] [U]Setting affinity to run helper thread 1 on logical CPU #6[/U]
[Worker #3 May 22 04:52:46] Resuming primality test of M25572683 using AVX FFT length 1344K, Pass1=448, Pass2=3K, 2 threads
[Worker #3 May 22 04:52:46] Iteration: 14725704 / 25572683 [57.5837%].
[Worker #4 May 22 04:52:50] Worker starting
[Worker #4 May 22 04:52:50] [U]Setting affinity to run worker on logical CPU #6[/U]
[Worker #4 May 22 04:52:50] [U]Setting affinity to run helper thread 1 on logical CPU #3[/U]
[Worker #4 May 22 04:52:51] Resuming primality test of M25353589 using AVX FFT length 1344K, Pass1=448, Pass2=3K, 2 threads
[Worker #4 May 22 04:52:51] Iteration: 11108729 / 25353589 [43.8152%].[/code]
I have no idea why in the world it does this. It doesn't matter whether or not I use the AffinityScramble2 override -- same result.
[code]bill@Gravemind:~/MPrime$ cat local.txt
<snip>
WorkerThreads=4
NumCPUs=4
ThreadsPerTest=2
<snip>

[Worker #1]
Affinity=0

[Worker #2]
Affinity=1

[Worker #3]
Affinity=2

[Worker #4]
Affinity=3[/code]
It gets neither the affinity nor the hyperthreading correct for workers 2-4.

I remember this happening once when I first installed MPrime (v27) on my laptop, and it was very frustrating; however, I also recall figuring out some stupid user error, after which it started working again, so I never said anything. But these are exactly the same sort of symptoms, and I can't for the life of me figure out what I'm doing wrong this time.

Edit: Another example:
[code]bill@Gravemind:~/MPrime$ cat local.txt
<snip>
WorkerThreads=4
NumCPUs=4
ThreadsPerTest=2
<snip>

[Worker #1]
[U]Affinity=1[/U]

[Worker #2]
[U]Affinity=2[/U]

[Worker #3]
[U]Affinity=3[/U]

[Worker #4]
[U]Affinity=4[/U]
bill@Gravemind:~/MPrime$ mprime -d
[Main thread May 22 04:58] Mersenne number primality test program version 27.7
[Main thread May 22 04:58:44] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 256 KB, L3 cache size: 8 MB
[Main thread May 22 04:58:44] Unable to detect some of the hyperthreaded logical CPUs.
[Main thread May 22 04:58:44] Enough information obtained to make a reasonable guess.
[Main thread May 22 04:58:44] [U]Logical CPUs 1,5 form one physical CPU.[/U]
[Main thread May 22 04:58:44] [U]Logical CPUs 2,6 form one physical CPU.[/U]
[Main thread May 22 04:58:44] [U]Logical CPUs 3,7 form one physical CPU.[/U]
[Main thread May 22 04:58:44] [U]Logical CPUs 4,8 form one physical CPU.[/U]
[Main thread May 22 04:58:44] Starting workers.
[Worker #1 May 22 04:58:44] Worker starting
[Worker #3 May 22 04:58:44] Waiting 10 seconds to stagger worker starts.
[Worker #1 May 22 04:58:44] [U]Setting affinity to run worker on logical CPU #5[/U]
[Worker #2 May 22 04:58:44] Waiting 5 seconds to stagger worker starts.
[Worker #4 May 22 04:58:44] Waiting 15 seconds to stagger worker starts.
[Worker #1 May 22 04:58:45] [U]Setting affinity to run helper thread 1 on logical CPU #2[/U]
[Worker #1 May 22 04:58:45] Resuming primality test of M54197029 using AVX FFT length 2880K, Pass1=384, Pass2=7680, 2 threads
[Worker #1 May 22 04:58:45] Iteration: 44947544 / 54197029 [82.9335%].
[Worker #2 May 22 04:58:49] Worker starting
[Worker #2 May 22 04:58:49] [U]Setting affinity to run worker on logical CPU #2[/U]
[Worker #2 May 22 04:58:50] [U]Setting affinity to run helper thread 1 on logical CPU #6[/U]
[Worker #2 May 22 04:58:50] Resuming primality test of M25318487 using AVX FFT length 1344K, Pass1=448, Pass2=3K, 2 threads
[Worker #2 May 22 04:58:50] Iteration: 183276 / 25318487 [0.7238%].
[Worker #3 May 22 04:58:54] Worker starting
[Worker #3 May 22 04:58:54] [U]Setting affinity to run worker on logical CPU #6[/U]
[Worker #3 May 22 04:58:55] [U]Setting affinity to run helper thread 1 on logical CPU #3[/U]
[Worker #3 May 22 04:58:55] Resuming primality test of M25572683 using AVX FFT length 1344K, Pass1=448, Pass2=3K, 2 threads
[Worker #3 May 22 04:58:55] Iteration: 14733012 / 25572683 [57.6123%].
[Worker #4 May 22 04:58:59] Worker starting
[Worker #4 May 22 04:58:59] [U]Setting affinity to run worker on logical CPU #3[/U]
[Worker #4 May 22 04:59:00] [U]Setting affinity to run helper thread 1 on logical CPU #7[/U]
[Worker #4 May 22 04:59:00] Resuming primality test of M25353589 using AVX FFT length 1344K, Pass1=448, Pass2=3K, 2 threads
[Worker #4 May 22 04:59:00] Iteration: 11123987 / 25353589 [43.8753%].[/code]
Notice that it screws up with the same pattern, though this time Worker 1 isn't even set properly.

If I set Affinity(1)=0, Affinity(2)=5, Affinity(3)=2, Affinity(4)=7, then what I wind up getting is 04 (correct), 62 (should be 51), 15 (should be 26), 8* (should be 73) respectively, where * is "[Worker #4 May 22 05:03:04] Setting affinity to run helper thread 1 on any logical CPU." I'll leave it like this overnight, since at least there are no overlaps as in the first two examples.

Prime95 2012-05-22 16:41

[QUOTE=Dubslow;300029]
I have no idea why in the world it does this.
[/QUOTE]

Here is what is happening. First, consider a CPU with 8 physical cores and no hyperthreading. These cores are numbered from 1 to 8. If you use two threads for your four workers, worker 1 gets assigned CPUs 1&2, worker 2 gets assigned CPUs 3&4, etc.

Now assume you have 4 cores with hyperthreading and logical CPUs 1 & 2 form physical CPU 1. Again, using two threads for your four workers, worker 1 gets assigned logical CPUs 1&2, worker 2 gets assigned logical CPUs 3&4, etc.

Now assume you have 4 cores with hyperthreading and logical CPUs 1 & 5 form physical CPU 1. An affinity scramble mask of "05162738" is generated. Again, using two threads for your four workers, worker 1 gets assigned scrambled CPUs 1&2 which maps to logical CPUs 0&5, worker 2 gets assigned scrambled CPUs 3&4 which maps to logical CPUs 1&6, etc.

Now look at your case. An affinity scramble mask of "05162738" was generated. You specifically told prime95 to have worker 1 use scrambled CPUs 1 (add 1 to the Affinity= setting) and 2 (hyperthreading always uses the next scrambled CPU number). Worker 2 was told to use scrambled CPUs 2 & 3, etc. This explains the assignments you are seeing.
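That mapping can be sketched as follows (my own illustration in Python, not prime95 code; the mask and the 1-indexed "scrambled CPU" numbering follow the description above):

```python
# Illustration only: how v27 maps Affinity= settings through an affinity
# scramble mask. Assumed machine: 4 cores + HT where logical CPUs n and n+4
# share a core, giving the generated mask "05162738".
MASK = "05162738"

def assigned_cpus(affinity, threads=2):
    """Worker with Affinity=N uses scrambled CPUs N+1, N+2, ...;
    scrambled CPU k is the k-th mask character (1-indexed)."""
    return [int(MASK[affinity + t]) for t in range(threads)]

# Affinity=0,1,2,3 (the config above): the pairs shift by one and overlap.
print([assigned_cpus(a) for a in range(4)])
# -> [[0, 5], [5, 1], [1, 6], [6, 2]]

# Affinity=0,2,4,6 instead: each worker gets both halves of one core.
print([assigned_cpus(a) for a in (0, 2, 4, 6)])
# -> [[0, 5], [1, 6], [2, 7], [3, 8]]
```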

In local.txt, remove all the Affinity= settings. Things should get better.

Dubslow 2012-05-22 18:32

[QUOTE=Prime95;300052]
Now look at your case. An affinity scramble mask of "04152637" was generated. You specifically told prime95 to have worker 1 use scrambled CPUs 1 (add 1 to the Affinity= setting) and 2 (hyperthreading always uses the next scrambled CPU number). Worker 2 was told to use scrambled CPUs 2 & 3, etc. This explains the assignments you are seeing.[/quote]I don't understand this. If I have Affinity=0 under Worker #1, shouldn't that mean it sets one thread to CPU0 and the helper thread to the next number, CPU4? Then if Worker 2 has Affinity=1, it gets assigned CPU1, looks at the mask, sees the matching 5, so the helper goes on CPU5, etc.?

The thing is, I've had it exactly like this before and it worked just fine.
[QUOTE=Prime95;300052]
In local.txt, remove all the Affinity= settings. Things should get better.[/QUOTE]
I did this, and got things like "Setting affinity to run worker on logical CPUs 3,7" and "Setting affinity to run helper thread 1 on logical CPUs 3,7", with each worker getting a different physical core; however, CPU usage is not quite at 100% anymore, presumably because threads still occasionally switch between the pair they're assigned. That's why I used Affinity= for each thread before. As I said above, it worked fine like that before.

Prime95 2012-05-22 20:16

[QUOTE=Dubslow;300057]I did this, and got things like "Setting affinity to run worker on logical CPUs 3,7" and "Setting affinity to run helper thread 1 on logical CPUs 3,7", with each worker getting a different physical core; however, CPU usage is not quite at 100% anymore, presumably because threads still occasionally switch between the pair they're assigned.[/quote]

Switching threads between logical CPUs 3 and 7 should not impact performance. The two share the same L1 and L2 cache.

Any use of multithreading may well cause CPU usage to drop below 100% as occasionally one thread must wait on the other to finish up. BTW, if you are doing P-1, prime95 will do both big multiplies and big adds. The multiplies are multithreaded, the adds are not (further degrading the CPU utilization figure).

[quote]That's why I used Affinity= for each thread before. As I said above, it worked fine like that before.[/QUOTE]

You can test setting each worker affinity with Affinity=0 (use scrambled CPUs 1&2), Affinity=2 (use scrambled CPUs 3&4), Affinity=4 (use scrambled CPUs 5&6), etc.
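In local.txt terms, that suggestion looks like this (a sketch; the values assume four 2-threaded workers as above):

```
[Worker #1]
Affinity=0

[Worker #2]
Affinity=2

[Worker #3]
Affinity=4

[Worker #4]
Affinity=6
```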

You'll probably get best throughput by not using multithreading at all.

LaurV 2012-05-23 04:16

I would agree with George. For me the best is 4 workers in 4 threads, no helpers (on an 8-HT-core machine). Hyper-threading generally generates too much heat and consumes too much energy for the small performance gain it brings, especially with programs as cache-optimized as p95. Running 8 workers (or 4 workers on 8 cores, one main plus one helper thread per worker) generally brings about 20% more performance for 50-80% more energy (and heat!). Additionally, running 4 single-threaded workers leaves some free firing-power for other daily work (no, I don't mean writing/sending mails and browsing the forum).

Dubslow 2012-05-23 05:09

I figured out (thanks to fivemack) that the other stuff I do "south of here" is generally not optimized like P95, so HT does help it. I realized, though, that ATM I'm not running any of that, so I did turn off HT for now. (My statement about it working before (some months ago) still stands, though.)

sdbardwick 2012-05-23 05:40

Windows is not 100% consistent in enumerating cores. For example, I have a dual Opteron 6128 box (16 physical cores). Prime64 runs 16 individual worker threads, each assigned to a unique core. Windows and Prime64 cores correlate like this under the current Win7Pro install: 1-8 match, Windows 9-12 => Prime 13-16, Windows 13-16 => Prime 9-12.
On different OSes (Win2k3 Server, Win7Pro without SP1, and W2K8R2 [IIRC, might have been plain W2K8]) but the EXACT same hardware and BIOS settings, they correlated exactly. Another Win7 install swapped 5-8 and 1-4.

With hyperthreading, the permutations get even weirder. Luckily, the enumerations appear to remain consistent once established; once you figure out the particular setup it doesn't change until a new OS is installed.

Ubuntu 10.4(? LTS) enumeration matched Mprime numbering the one time I installed it.
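On Linux, the kernel's own view of which logical CPUs share a core can be read from sysfs, which makes it easy to compare against what mprime reports (a sketch; the sysfs paths assume a reasonably modern Linux kernel):

```python
# Print, for each logical CPU, which logical CPUs share its physical core,
# as reported by the Linux kernel via sysfs.
from pathlib import Path

for topo in sorted(Path("/sys/devices/system/cpu").glob("cpu[0-9]*/topology")):
    siblings = (topo / "thread_siblings_list").read_text().strip()
    print(f"{topo.parent.name}: shares a core with logical CPUs {siblings}")
```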

TheJudger 2012-05-27 16:00

CPU affinity gone wild
 
Some affinity "fun" on some big iron...
prime95 v27.7 x86-64, Linux

Wrong usage of Affinity + AffinityScramble2, or a bug? There seems to be an issue with AffinityScramble2 and lowercase letters; uppercase letters work fine!

local.txt
[CODE]
WorkerThreads=1
ThreadsPerTest=10
Affinity=0
AffinityScramble2=0123456789
[...]
Pid=30642
[/CODE]

[CODE]pid 30642's current affinity mask: ffffffffffffffffffff # main thread
pid 30806's current affinity mask: ffffffffffffffffffff # communication thread
pid 30807's current affinity mask: 1
pid 30869's current affinity mask: 2
pid 30870's current affinity mask: 4
pid 30871's current affinity mask: 8
pid 30872's current affinity mask: 10
pid 30873's current affinity mask: 20
pid 30874's current affinity mask: 40
pid 30875's current affinity mask: 80
pid 30876's current affinity mask: 100
pid 30877's current affinity mask: 200
[/CODE]


local.txt
[CODE]
WorkerThreads=1
ThreadsPerTest=10
Affinity=0
AffinityScramble2=UVWXYZabcd
[...]
Pid=31147
[/CODE]

[CODE]pid 31147's current affinity mask: ffffffffffffffffffff # main thread
pid 31238's current affinity mask: ffffffffffffffffffff # communication thread
pid 31239's current affinity mask: 40000000
pid 31382's current affinity mask: 80000000
pid 31383's current affinity mask: 100000000 # not limited to 32 cores anymore!
pid 31384's current affinity mask: 200000000 # not limited to 32 cores anymore!
pid 31385's current affinity mask: 400000000 # not limited to 32 cores anymore!
pid 31386's current affinity mask: 800000000 # not limited to 32 cores anymore!
[COLOR="Red"]pid 31387's current affinity mask: 40000000
pid 31388's current affinity mask: 40000000
pid 31389's current affinity mask: 40000000
pid 31390's current affinity mask: 40000000[/COLOR]
[/CODE]


local.txt
[CODE]
WorkerThreads=1
ThreadsPerTest=10
Affinity=0
AffinityScramble2=efghijklmn
[...]
Pid=31317
[/CODE]

[CODE]
pid 31317's current affinity mask: ffffffffffffffffffff # main thread
pid 31408's current affinity mask: ffffffffffffffffffff # communication thread
[COLOR="Red"]pid 31409's current affinity mask: ffffffffffffffffffff
pid 31552's current affinity mask: ffffffffffffffffffff
pid 31553's current affinity mask: ffffffffffffffffffff
pid 31554's current affinity mask: ffffffffffffffffffff
pid 31555's current affinity mask: ffffffffffffffffffff
pid 31556's current affinity mask: ffffffffffffffffffff
pid 31557's current affinity mask: ffffffffffffffffffff
pid 31558's current affinity mask: ffffffffffffffffffff
pid 31559's current affinity mask: ffffffffffffffffffff
pid 31560's current affinity mask: ffffffffffffffffffff[/COLOR]
[/CODE]

TheJudger 2012-05-27 17:34

In commonb.c, lines 576 to 589:
[CODE]
for (i = 0; i < MAX_NUM_WORKER_THREADS; i++) {
    if (scramble[i] >= '0' && scramble[i] <= '9')
        AFFINITY_SCRAMBLE[i] = scramble[i] - '0';
    else if (scramble[i] >= 'A' && scramble[i] <= 'Z')
        AFFINITY_SCRAMBLE[i] = scramble[i] - 'A' + 10;
    else if (scramble[i] >= 'a' && scramble[i] <= 'z')
        AFFINITY_SCRAMBLE[i] = scramble[i] - '[B][COLOR="Red"]A[/COLOR][/B]' + 36;
    else if (scramble[i] == '(')
        AFFINITY_SCRAMBLE[i] = 62;
    else if (scramble[i] == ')')
        AFFINITY_SCRAMBLE[i] = 63;
    else
        AFFINITY_SCRAMBLE[i] = i; /* Illegal entry = no mapping */
}
[/CODE]

I guess the red [B][COLOR="Red"]A[/COLOR][/B] should be lowercase, right?

Oliver

Prime95 2012-05-27 18:32

[QUOTE=TheJudger;300397]

I guess the red [B][COLOR="Red"]A[/COLOR][/B] should be lowercase, right?

[/QUOTE]


Good catch. The bug effectively limits AffinityScramble's usefulness to 36 cores.
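A quick sketch of the arithmetic (my own illustration in Python, mirroring the C logic quoted above, not prime95 source) shows why the lowercase characters misbehave:

```python
# Mirror of the commonb.c decode logic, including the stray uppercase 'A'
# in the lowercase branch, to show what it does to a lowercase character.
def decode(ch):
    if '0' <= ch <= '9':
        return ord(ch) - ord('0')
    if 'A' <= ch <= 'Z':
        return ord(ch) - ord('A') + 10
    if 'a' <= ch <= 'z':
        return ord(ch) - ord('A') + 36   # buggy: should be ord('a')
    return None

print(decode('Z'))   # 35: the last character that decodes correctly
print(decode('a'))   # 68 with the bug; the intended value is 36
# Shifting a 64-bit affinity mask by 68 bits is out of range, which would
# account for the bogus masks logged for the lowercase characters above.
```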

