mersenneforum.org  

Old 2012-05-20, 17:51   #23
Jwb52z
 
Sep 2002

1100011111₂ Posts

Quote:
Originally Posted by NBtarheel_33
Just a heads-up: The GIMPS home page is still advertising version 26 as the current release build.

Yes, that is intentional. The home page will keep listing version 26 until the developers declare version 27.x stable and in need of no further changes or enhancements.
Old 2012-05-22, 09:58   #24
Dubslow
Basketry That Evening!
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3·29·83 Posts
Possible bug?

Code:
bill@Gravemind:~/MPrime∰∂ mprime -d
[Main thread May 22 04:52] Mersenne number primality test program version 27.7
[Main thread May 22 04:52:35] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 256 KB, L3 cache size: 8 MB
[Main thread May 22 04:52:35] Logical CPUs 1,5 form one physical CPU.
[Main thread May 22 04:52:35] Logical CPUs 2,6 form one physical CPU.
[Main thread May 22 04:52:35] Logical CPUs 3,7 form one physical CPU.
[Main thread May 22 04:52:35] Logical CPUs 4,8 form one physical CPU.
[Main thread May 22 04:52:35] Starting workers.
[Comm thread May 22 04:52:35] Exchanging program options with server
[Worker #1 May 22 04:52:35] Worker starting
[Worker #1 May 22 04:52:35] Setting affinity to run worker on logical CPU #1
[Worker #2 May 22 04:52:35] Waiting 5 seconds to stagger worker starts.
[Worker #3 May 22 04:52:35] Waiting 10 seconds to stagger worker starts.
[Worker #4 May 22 04:52:35] Waiting 15 seconds to stagger worker starts.
[Comm thread May 22 04:52:35] Done communicating with server.
[Worker #1 May 22 04:52:36] Setting affinity to run helper thread 1 on logical CPU #5
[Worker #1 May 22 04:52:36] Resuming primality test of M54197029 using AVX FFT length 2880K, Pass1=384, Pass2=7680, 2 threads
[Worker #1 May 22 04:52:36] Iteration: 44942274 / 54197029 [82.9238%].
[Worker #2 May 22 04:52:40] Worker starting
[Worker #2 May 22 04:52:40] Setting affinity to run worker on logical CPU #5
[Worker #2 May 22 04:52:41] Setting affinity to run helper thread 1 on logical CPU #2
[Worker #2 May 22 04:52:41] Resuming primality test of M25318487 using AVX FFT length 1344K, Pass1=448, Pass2=3K, 2 threads
[Worker #2 May 22 04:52:41] Iteration: 175637 / 25318487 [0.6937%].
[Worker #3 May 22 04:52:45] Worker starting
[Worker #3 May 22 04:52:45] Setting affinity to run worker on logical CPU #2
[Worker #3 May 22 04:52:45] Setting affinity to run helper thread 1 on logical CPU #6
[Worker #3 May 22 04:52:46] Resuming primality test of M25572683 using AVX FFT length 1344K, Pass1=448, Pass2=3K, 2 threads
[Worker #3 May 22 04:52:46] Iteration: 14725704 / 25572683 [57.5837%].
[Worker #4 May 22 04:52:50] Worker starting
[Worker #4 May 22 04:52:50] Setting affinity to run worker on logical CPU #6
[Worker #4 May 22 04:52:50] Setting affinity to run helper thread 1 on logical CPU #3
[Worker #4 May 22 04:52:51] Resuming primality test of M25353589 using AVX FFT length 1344K, Pass1=448, Pass2=3K, 2 threads
[Worker #4 May 22 04:52:51] Iteration: 11108729 / 25353589 [43.8152%].
I have no idea why in the world it does this. It doesn't matter whether I use the AffinityScramble2 override or not: the result is the same.
Code:
bill@Gravemind:~/MPrime∰∂ cat local.txt
<snip>
WorkerThreads=4
NumCPUs=4
ThreadsPerTest=2
<snip>

[Worker #1]
Affinity=0

[Worker #2]
Affinity=1

[Worker #3]
Affinity=2

[Worker #4]
Affinity=3
It gets neither the affinity nor the hyperthreading correct for workers 2-4.

I remember this happening once when I first installed MPrime (v27) on my laptop, and it was very frustrating; however, I also recall tracking it down to some stupid user error, after which it started working again, so I never said anything. But these are exactly the same sort of symptoms, and I can't figure out for the life of me what I'm doing wrong this time.

Edit: Another example:
Code:
bill@Gravemind:~/MPrime∰∂ cat local.txt 
<snip>
WorkerThreads=4
NumCPUs=4
ThreadsPerTest=2
<snip>

[Worker #1]
Affinity=1

[Worker #2]
Affinity=2

[Worker #3]
Affinity=3

[Worker #4]
Affinity=4
bill@Gravemind:~/MPrime∰∂ mprime -d
[Main thread May 22 04:58] Mersenne number primality test program version 27.7
[Main thread May 22 04:58:44] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 256 KB, L3 cache size: 8 MB
[Main thread May 22 04:58:44] Unable to detect some of the hyperthreaded logical CPUs.
[Main thread May 22 04:58:44] Enough information obtained to make a reasonable guess.
[Main thread May 22 04:58:44] Logical CPUs 1,5 form one physical CPU.
[Main thread May 22 04:58:44] Logical CPUs 2,6 form one physical CPU.
[Main thread May 22 04:58:44] Logical CPUs 3,7 form one physical CPU.
[Main thread May 22 04:58:44] Logical CPUs 4,8 form one physical CPU.
[Main thread May 22 04:58:44] Starting workers.
[Worker #1 May 22 04:58:44] Worker starting
[Worker #3 May 22 04:58:44] Waiting 10 seconds to stagger worker starts.
[Worker #1 May 22 04:58:44] Setting affinity to run worker on logical CPU #5
[Worker #2 May 22 04:58:44] Waiting 5 seconds to stagger worker starts.
[Worker #4 May 22 04:58:44] Waiting 15 seconds to stagger worker starts.
[Worker #1 May 22 04:58:45] Setting affinity to run helper thread 1 on logical CPU #2
[Worker #1 May 22 04:58:45] Resuming primality test of M54197029 using AVX FFT length 2880K, Pass1=384, Pass2=7680, 2 threads
[Worker #1 May 22 04:58:45] Iteration: 44947544 / 54197029 [82.9335%].
[Worker #2 May 22 04:58:49] Worker starting
[Worker #2 May 22 04:58:49] Setting affinity to run worker on logical CPU #2
[Worker #2 May 22 04:58:50] Setting affinity to run helper thread 1 on logical CPU #6
[Worker #2 May 22 04:58:50] Resuming primality test of M25318487 using AVX FFT length 1344K, Pass1=448, Pass2=3K, 2 threads
[Worker #2 May 22 04:58:50] Iteration: 183276 / 25318487 [0.7238%].
[Worker #3 May 22 04:58:54] Worker starting
[Worker #3 May 22 04:58:54] Setting affinity to run worker on logical CPU #6
[Worker #3 May 22 04:58:55] Setting affinity to run helper thread 1 on logical CPU #3
[Worker #3 May 22 04:58:55] Resuming primality test of M25572683 using AVX FFT length 1344K, Pass1=448, Pass2=3K, 2 threads
[Worker #3 May 22 04:58:55] Iteration: 14733012 / 25572683 [57.6123%].
[Worker #4 May 22 04:58:59] Worker starting
[Worker #4 May 22 04:58:59] Setting affinity to run worker on logical CPU #3
[Worker #4 May 22 04:59:00] Setting affinity to run helper thread 1 on logical CPU #7
[Worker #4 May 22 04:59:00] Resuming primality test of M25353589 using AVX FFT length 1344K, Pass1=448, Pass2=3K, 2 threads
[Worker #4 May 22 04:59:00] Iteration: 11123987 / 25353589 [43.8753%].
Notice that it screws up with the same pattern, though this time Worker 1 isn't even set properly.

If I set Affinity(1)=0, Affinity(2)=5, Affinity(3)=2, Affinity(4)=7, then what I wind up getting is 04 (correct), 62 (should be 51), 15 (should be 26), 8* (should be 73) respectively, where * is "[Worker #4 May 22 05:03:04] Setting affinity to run helper thread 1 on any logical CPU." I'll leave it like this overnight, since at least there are no overlaps as in the first two examples.

Last fiddled with by Dubslow on 2012-05-22 at 10:07
Old 2012-05-22, 16:41   #25
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

7,537 Posts

Quote:
Originally Posted by Dubslow
I have no idea why in the world it does this.
Here is what is happening. First, consider a CPU with 8 physical cores and no hyperthreading. These cores are numbered from 1 to 8. If you use two threads for your four workers, worker 1 gets assigned CPUs 1&2, worker 2 gets assigned CPUs 3&4, etc.

Now assume you have 4 cores with hyperthreading and logical CPUs 1 & 2 form physical CPU 1. Again, using two threads for your four workers, worker 1 gets assigned logical CPUs 1&2, worker 2 gets assigned logical CPUs 3&4, etc.

Now assume you have 4 cores with hyperthreading and logical CPUs 1 & 5 form physical CPU 1. An affinity scramble mask of "04152637" is generated (the mask counts logical CPUs from 0, so logical CPUs 1 & 5 appear as 0 & 4). Again, using two threads for your four workers, worker 1 gets assigned scrambled CPUs 1&2 which maps to logical CPUs 0&4, worker 2 gets assigned scrambled CPUs 3&4 which maps to logical CPUs 1&5, etc.

Now look at your case. An affinity scramble mask of "04152637" was generated. You specifically told prime95 to have worker 1 use scrambled CPU 1 (add 1 to the Affinity= setting) and 2 (hyperthreading always uses the next scrambled CPU number). Worker 2 was told to use scrambled CPUs 2 & 3, etc. This explains the assignments you are seeing.

In local.txt, remove all the Affinity= settings. Things should get better.
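
The lookup described in this post can be sketched in a few lines of C. This is only an illustration reconstructed from the explanation and from the logs earlier in the thread, not prime95's actual source. The function name `scrambled_cpu` and the -1 return value are hypothetical, and the mask "04152637" is the one that matches the "Logical CPUs 1,5 form one physical CPU" log lines.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not taken from the prime95 source.
 * mask     - scramble mask, one digit per scrambled CPU, e.g. "04152637"
 * affinity - the Affinity= value from local.txt (0-based scrambled CPU)
 * thread   - 0 for the worker thread, 1 for the first helper thread, ...
 * Returns the 0-based logical CPU, or -1 when the index runs past the
 * end of the mask (assumed here to correspond to the "any logical CPU"
 * log line). The log prints CPUs 1-based, so add 1 before comparing. */
int scrambled_cpu(const char *mask, int affinity, int thread)
{
    assert(mask != NULL && affinity >= 0 && thread >= 0);
    size_t idx = (size_t)affinity + (size_t)thread;
    if (idx >= strlen(mask))
        return -1;
    return mask[idx] - '0';   /* digits only: enough for 8 logical CPUs */
}
```

With Affinity=1 for worker 2, `scrambled_cpu("04152637", 1, 0)` gives 4 and `scrambled_cpu("04152637", 1, 1)` gives 1, which the log prints as logical CPUs #5 and #2. That is the overlap with worker 1 seen in the first example.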
Old 2012-05-22, 18:32   #26
Dubslow
Basketry That Evening!
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3·29·83 Posts

Quote:
Originally Posted by Prime95
Now look at your case. An affinity scramble mask of "04152637" was generated. You specifically told prime95 to have worker 1 use scrambled CPU 1 (add 1 to the Affinity= setting) and 2 (hyperthreading always uses the next scrambled CPU number). Worker 2 was told to use scrambled CPUs 2 & 3, etc. This explains the assignments you are seeing.
I don't understand this. If I have Affinity=0 under Worker #1, shouldn't that mean it sets one thread to CPU0, and the helper thread to the next number = CPU4? Then if Worker 2 has Affinity=1, then it gets assigned CPU1, looks at the mask and sees the matching 5 so the helper goes on CPU5, etc.?

The thing is, I've had it exactly like this before and it worked just fine.
Quote:
Originally Posted by Prime95
In local.txt, remove all the Affinity= settings. Things should get better.
I did this, and got things like "Setting affinity to run worker on logical CPUs 3,7" and "Setting affinity to run helper thread 1 on logical CPUs 3,7" with each worker getting a different physical core; however CPU usage is not quite at 100% anymore, presumably because threads are occasionally still switching between each of the pair they're assigned. That's why I used Affinity= for each thread before. As I said above, it worked fine like that before.
Old 2012-05-22, 20:16   #27
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

7,537 Posts

Quote:
Originally Posted by Dubslow
I did this, and got things like "Setting affinity to run worker on logical CPUs 3,7" and "Setting affinity to run helper thread 1 on logical CPUs 3,7" with each worker getting a different physical core; however CPU usage is not quite at 100% anymore, presumably because threads are occasionally still switching between each of the pair they're assigned.
Switching threads between logical CPUs 3 and 7 should not impact performance. The two share the same L1 and L2 cache.

Any use of multithreading may well cause CPU usage to drop below 100% as occasionally one thread must wait on the other to finish up. BTW, if you are doing P-1, prime95 will do both big multiplies and big adds. The multiplies are multithreaded, the adds are not (further degrading the CPU utilization figure).

Quote:
That's why I used Affinity= for each thread before. As I said above, it worked fine like that before.
You can test setting each worker affinity with Affinity=0 (use scrambled CPUs 1&2), Affinity=2 (use scrambled CPUs 3&4), Affinity=4 (use scrambled CPUs 5&6), etc.
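
The Affinity= values suggested here follow a simple rule: each worker starts at a multiple of the thread count, so no two workers share a scrambled CPU. A minimal sketch, assuming every worker uses the same ThreadsPerTest (the helper names are made up for illustration and are not part of prime95):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: first scrambled CPU for a worker when every worker
 * takes threads_per_test consecutive scrambled CPUs.
 * worker is 1-based, matching the [Worker #n] section names. */
int worker_affinity(int worker, int threads_per_test)
{
    assert(worker >= 1 && threads_per_test >= 1);
    return (worker - 1) * threads_per_test;
}

/* Writes one local.txt section into buf and returns snprintf's result. */
int format_worker_section(char *buf, size_t size, int worker,
                          int threads_per_test)
{
    return snprintf(buf, size, "[Worker #%d]\nAffinity=%d\n",
                    worker, worker_affinity(worker, threads_per_test));
}
```

For four workers with two threads each this yields Affinity=0, 2, 4 and 6.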

You'll probably get best throughput by not using multithreading at all.

Last fiddled with by Prime95 on 2012-05-22 at 20:16
Old 2012-05-23, 04:16   #28
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

7²×197 Posts

I would agree with George. For me the best setup is 4 workers with 4 threads and no helpers (on a machine with 8 HT cores). Hyper-threading generally generates too much heat and takes too much energy for the extra performance it brings, especially for programs as cache-optimized as p95. Running 8 workers (or 4 workers on 8 cores, one main plus one helper thread for each worker) generally brings about 20% more performance for 50-80% more energy (and heat!). Additionally, running 4 single-threaded workers leaves some firepower free for other daily work (no, I'm not talking about writing/sending mails and browsing the forum).
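
To put those rough figures in perspective, here is a small sketch of the arithmetic. It takes the 20% and 50-80% estimates above at face value, and the function name is only illustrative.

```c
#include <assert.h>

/* Throughput per unit of energy relative to the baseline (1.0 = unchanged).
 * Both arguments are fractional increases, e.g. 0.20 for +20%. */
double relative_efficiency(double perf_gain, double energy_gain)
{
    assert(energy_gain > -1.0);   /* energy use must stay positive */
    return (1.0 + perf_gain) / (1.0 + energy_gain);
}
```

`relative_efficiency(0.20, 0.50)` is 0.80 and `relative_efficiency(0.20, 0.80)` is about 0.67, so by these estimates the work done per unit of energy drops by roughly 20% to 33% with hyper-threading on.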
Old 2012-05-23, 05:09   #29
Dubslow
Basketry That Evening!
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

1C35₁₆ Posts

I figured out (thanks to fivemack) that the other stuff to do "south of here" is generally not optimized like P95, so HT does help there. I realized, though, that ATM I'm not running any of it, so I did turn off HT for now. (My statement about it working before, some months ago, still stands.)
Old 2012-05-23, 05:40   #30
sdbardwick
 
Aug 2002
North San Diego County

5·137 Posts

Windows is not 100% consistent in enumerating cores. For example, I have a dual Opteron 6128 box (16 physical cores). Prime64 runs 16 individual worker threads, each assigned to a unique core. Windows/Prime64 cores correlate like this under the current Win7Pro install: 1-8 match, Windows 9-12 => Prime 13-16, Windows 13-16 => Prime 9-12.
On different OSes (Win2k3 server, Win7Pro without SP1, and W2K8R2 [IIRC, might have been plain W2K8]) but the EXACT same hardware and BIOS settings, they correlate exactly. Another Win7 install swapped 5-8 and 1-4.

With hyperthreading, the permutations get even weirder. Luckily, the enumerations appear to remain consistent once established; once you figure out the particular setup it doesn't change until a new OS is installed.

Ubuntu 10.4(? LTS) enumeration matched Mprime numbering the one time I installed it.

Last fiddled with by sdbardwick on 2012-05-23 at 05:43 Reason: Ubuntu info.
Old 2012-05-27, 16:00   #31
TheJudger
 
"Oliver"
Mar 2005
Germany

11×101 Posts
CPU affinity gone wild

Some affinity "fun" on some big iron...
prime95 v27.7 x86-64, Linux

Wrong usage of Affinity + AffinityScramble2, or a bug? There seems to be an issue with AffinityScramble2 and lowercase letters; capital letters work fine!

local.txt
Code:
WorkerThreads=1
ThreadsPerTest=10
Affinity=0
AffinityScramble2=0123456789
[...]
Pid=30642
Code:
pid 30642's current affinity mask: ffffffffffffffffffff # main thread
pid 30806's current affinity mask: ffffffffffffffffffff # communication thread
pid 30807's current affinity mask: 1
pid 30869's current affinity mask: 2
pid 30870's current affinity mask: 4
pid 30871's current affinity mask: 8
pid 30872's current affinity mask: 10
pid 30873's current affinity mask: 20
pid 30874's current affinity mask: 40
pid 30875's current affinity mask: 80
pid 30876's current affinity mask: 100
pid 30877's current affinity mask: 200

local.txt
Code:
WorkerThreads=1
ThreadsPerTest=10
Affinity=0
AffinityScramble2=UVWXYZabcd
[...]
Pid=31147
Code:
pid 31147's current affinity mask: ffffffffffffffffffff # main thread
pid 31238's current affinity mask: ffffffffffffffffffff # communication thread
pid 31239's current affinity mask: 40000000
pid 31382's current affinity mask: 80000000
pid 31383's current affinity mask: 100000000 # not limited to 32 cores anymore!
pid 31384's current affinity mask: 200000000 # not limited to 32 cores anymore!
pid 31385's current affinity mask: 400000000 # not limited to 32 cores anymore!
pid 31386's current affinity mask: 800000000 # not limited to 32 cores anymore!
pid 31387's current affinity mask: 40000000
pid 31388's current affinity mask: 40000000
pid 31389's current affinity mask: 40000000
pid 31390's current affinity mask: 40000000

local.txt
Code:
WorkerThreads=1
ThreadsPerTest=10
Affinity=0
AffinityScramble2=efghijklmn
[...]
Pid=31317
Code:
pid 31317's current affinity mask: ffffffffffffffffffff # main thread
pid 31408's current affinity mask: ffffffffffffffffffff # communication thread
pid 31409's current affinity mask: ffffffffffffffffffff
pid 31552's current affinity mask: ffffffffffffffffffff
pid 31553's current affinity mask: ffffffffffffffffffff
pid 31554's current affinity mask: ffffffffffffffffffff
pid 31555's current affinity mask: ffffffffffffffffffff
pid 31556's current affinity mask: ffffffffffffffffffff
pid 31557's current affinity mask: ffffffffffffffffffff
pid 31558's current affinity mask: ffffffffffffffffffff
pid 31559's current affinity mask: ffffffffffffffffffff
pid 31560's current affinity mask: ffffffffffffffffffff
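
A quick way to read the listings above: each single-CPU mask has exactly one bit set, and the bit number is the logical CPU. The sketch below turns such a mask back into the AffinityScramble2 character it should correspond to. The function names are hypothetical. The character table follows the commonb.c excerpt quoted later in this thread, with the lowercase range read as 36-61, which is an assumption about the intended behavior. A `uint64_t` is enough for the single-bit masks shown here but not for the 80-bit `ffff...` masks.

```c
#include <stdint.h>

/* Returns the 0-based CPU index if exactly one bit is set in mask,
 * otherwise -1 (no CPU, or more than one CPU allowed). */
int single_cpu_from_mask(uint64_t mask)
{
    if (mask == 0 || (mask & (mask - 1)) != 0)
        return -1;
    int cpu = 0;
    while ((mask >>= 1) != 0)
        cpu++;
    return cpu;
}

/* Assumed intended AffinityScramble2 character for a CPU index:
 * '0'-'9' = 0-9, 'A'-'Z' = 10-35, 'a'-'z' = 36-61, '(' = 62, ')' = 63.
 * Returns 0 for an index outside 0-63. */
char scramble_char_for_cpu(int cpu)
{
    if (cpu < 0 || cpu > 63) return 0;
    if (cpu < 10) return (char)('0' + cpu);
    if (cpu < 36) return (char)('A' + (cpu - 10));
    if (cpu < 62) return (char)('a' + (cpu - 36));
    return cpu == 62 ? '(' : ')';
}
```

For the second listing, mask 40000000 is bit 30, which is 'U', and mask 800000000 is bit 35, which is 'Z'. So the six uppercase characters were honored and only the four lowercase ones were not.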
Old 2012-05-27, 17:34   #32
TheJudger
 
"Oliver"
Mar 2005
Germany

11·101 Posts

In commonb.c, lines 576 to 589:
Code:
                for (i = 0; i < MAX_NUM_WORKER_THREADS; i++) {
                        if (scramble[i] >= '0' && scramble[i] <= '9')
                                AFFINITY_SCRAMBLE[i] = scramble[i] - '0';
                        else if (scramble[i] >= 'A' && scramble[i] <= 'Z')
                                AFFINITY_SCRAMBLE[i] = scramble[i] - 'A' + 10;
                        else if (scramble[i] >= 'a' && scramble[i] <= 'z')
                                AFFINITY_SCRAMBLE[i] = scramble[i] - 'A' + 36;
                        else if (scramble[i] == '(')
                                AFFINITY_SCRAMBLE[i] = 62;
                        else if (scramble[i] == ')')
                                AFFINITY_SCRAMBLE[i] = 63;
                        else
                                AFFINITY_SCRAMBLE[i] = i;  /* Illegal entry = no mapping */
                }
I guess the 'A' in the lowercase branch (scramble[i] - 'A' + 36) should be lowercase, right?

Oliver
Old 2012-05-27, 18:32   #33
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

7537₁₀ Posts

Quote:
Originally Posted by TheJudger
I guess the 'A' in the lowercase branch (scramble[i] - 'A' + 36) should be lowercase, right?

Good catch. The bug effectively limits AffinityScramble's usefulness to 36 cores.
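
For reference, a standalone sketch of the per-character decoding with the lowercase branch corrected. It assumes the fix is simply to subtract 'a' instead of 'A'. It is a rewrite for illustration, not the actual patch, and the function name `decode_scramble_char` is made up.

```c
#include <assert.h>

/* Decodes one AffinityScramble2 character to a logical CPU number.
 * position is the character's index in the string; it is returned for an
 * illegal character, mirroring the "no mapping" fallback in the loop
 * quoted earlier in the thread. */
int decode_scramble_char(char c, int position)
{
    assert(position >= 0);
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
    if (c >= 'a' && c <= 'z') return c - 'a' + 36;   /* was: c - 'A' + 36 */
    if (c == '(') return 62;
    if (c == ')') return 63;
    return position;
}
```

With this change 'a' decodes to 36 and 'z' to 61. With the old expression, 'a' - 'A' + 36 comes out as 68 in ASCII, which is outside the 0-63 range the table is meant to cover.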
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.