mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Software

Reply
 
Thread Tools
Old 2017-02-24, 07:56   #1
Explorer09
 
May 2014

3·11 Posts
Default Prime95 potential race on WORKER_THREADS_ACTIVE and WORKER_THREADS_STOPPING

Upon trying to hack the mprime source code. I discovered a potential data race
condition on the thread-shared global flags WORKER_THREADS_ACTIVE and
WORKER_THREADS_STOPPING.

In particular, its about the Launcher() code:

commonb.c, Launcher() function
Code:
/* If worker threads are active then stop them all.  This can */
/* happen when we choose Torture Test, Benchmark, or Advanced/Time from */
/* the menus while the worker threads are running */
if (WORKER_THREADS_ACTIVE) {
        stop_workers_for_escape ();
        while (WORKER_THREADS_STOPPING) Sleep (50);
}

/* Set flags so that GUI knows worker threads are active */
WORKER_THREADS_ACTIVE = ld->num_threads;
WORKER_THREADS_STOPPING = FALSE;

... launch other workers, convert Launcher thread itself into a worker,
read stop conditions, stop, maybe restart ...

if (LAUNCH_TYPE == LD_CONTINUE || LAUNCH_TYPE == LD_TORTURE)
        OutputStr (MAIN_THREAD_NUM, "Execution halted.\n");
if (LAUNCH_TYPE == LD_CONTINUE)
        OutputStr (MAIN_THREAD_NUM, "Choose Test/Continue to restart.\n");

/* Clear flags so that GUI knows worker threads are not active */
WORKER_THREADS_ACTIVE = 0;
WORKER_THREADS_STOPPING = FALSE;
And here the Active and Stopping flags are modified at the same time, but
without mutex. And so there is a potential race.

Example 1 (TOCTTOU in stop_workers_for_escape()):

Code:
Launcher / worker #1 (thread_num=0) | Menu/GUI thread
------------------------------------+----------------------------------------
// In Launcher() ..                 | 
OutputStr (MAIN_THREAD_NUM,         |
"Execution halted.\n");             |
                                    | // Upon *second* call of
                                    | // stop_workers_for_escape ()
                                    | if (WORKER_THREADS_ACTIVE)
                                    | // result: 1
                                    | OutputStr (MAIN_THREAD_NUM,
                                    | "Stopping all worker threads.\n");
WORKER_THREADS_ACTIVE = 0;          |
WORKER_THREADS_STOPPING = FALSE;    |
                                    | WORKER_THREADS_STOPPING = TRUE;
                                    | ... return to exiting routine that
                                    | waits for WORKER_THREADS_STOPPING ...
                                    |
                                    | while (WORKER_THREADS_STOPPING) {
                                    |     Sleep (50);
                                    | } // Loops forever!!!
------------------------------------+----------------------------------------
Example 2 (when launching Torture Test, Benchmark, or Advanced/Time just as
the worker threads are launched) (rare but possible):

Code:
Launcher for worker (thread_num=0)  | Launcher for benchmark (thread_num=0)
------------------------------------+----------------------------------------
// Launched by "Test/Continue"      |
// from menu or GUI thread first    |
                                    | // Launched by "Options/Benchmark"
                                    | // from menu or GUI thread later
// In Launcher() ...                | // In Launcher() ...
WORKER_THREADS_ACTIVE =             |
    ld->num_threads;                |
                                    | if (WORKER_THREADS_ACTIVE) { // result: 1
                                    |     stop_workers_for_escape ();
WORKER_THREADS_STOPPING = FALSE;    |
                                    |     while (WORKER_THREADS_STOPPING) {
                                    |     // result: 0; break; didn't Sleep()
                                    |     }
                                    | }
                                    | WORKER_THREADS_ACTIVE = ld->num_threads;
                                    | WORKER_THREADS_STOPPING = FALSE;
// WORKER_THREADS_ACTIVE hijacked!  |
// And threads here didn't stop.    |
------------------------------------+----------------------------------------
And so something weird could happen.
Explorer09 is offline   Reply With Quote
Old 2017-02-24, 07:58   #2
Explorer09
 
May 2014

3×11 Posts
Default

Excuse for the large patch. Because I discovered subsequent problems
when I try to fix the data race, and so I try addressed them all.

The patch fixes these things:

1. Make all writes on WORKER_THREADS_ACTIVE and WORKER_THREADS_STOPPING variables be guarded by a mutex to prevent write contention and ensure the conditional (WORKER_THREADS_ACTIVE && !WORKER_THREADS_STOPPING) be evaluated synchronously.

Introduces: LAUNCHER_MUTEX and workers_active_and_running() function.

stop_workers_for_escape() now tests-and-sets WORKER_THREADS_STOPPING to prevent more contention.

Launcher() also now tests-and-sets WORKER_THREADS_ACTIVE to prevent contention.

2. For the "while (a_global_var) Sleep (50);" construct, make the variable access volatile to prevent compiler surprise. (I.e. Cache the variable reading and turn the loop into infinite loop.)

Note that *modification* of the variables may be non-volatile.

3. Remove the dead THREAD_ACTIVE symbol in prime95/service.c and replace that with working code. (Theoretically working; didn't test.)

4. In Linux / OS X build of mprime, when choosing "Test/Stop" in the menu to stop workers one by one, the "Launcher()" thread didn't issue a terminate on all worker threads upon stopping the last worker, but kept waiting instead. This is a flaw in the menu interface code, and so does not affect Windows GUI or OS X GUI builds.

5. The "Test/Continue" actions in all UIs will now wait for WORKER_THREADS_STOPPING flag, and present "which workers to stop" question only after the flag has been cleared. The reason for the wait is to delay the active_workers_count() call until the state where the call can get more reliable result.
Attached Files
File Type: txt mprime-launcher-race.patch.txt (27.1 KB, 131 views)
Explorer09 is offline   Reply With Quote
Old 2017-03-09, 08:17   #3
Explorer09
 
May 2014

418 Posts
Default

Patch Revised! v2 Changes:

2b. Simplify the syntax of volatile casting.

2c. Make accesses of WORKER_THREADS_STOPPING in two more functions volatile.
Attached Files
File Type: txt mprime-launcher-race-v2.patch.txt (28.0 KB, 198 views)
Explorer09 is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
The human race, magnifier of the improbable jasong jasong 0 2015-08-24 06:11
A potential cause of Windows low-memory messages cheesehead Software 14 2013-05-16 00:45
Price is Right Race Game c10ck3r Puzzles 19 2013-03-27 14:00
Boat race davieddy Puzzles 13 2011-11-03 15:49
Horse Race Gary Edstrom Puzzles 10 2003-08-31 14:25

All times are UTC. The time now is 12:56.


Sun Oct 24 12:56:45 UTC 2021 up 93 days, 7:25, 1 user, load averages: 0.85, 0.98, 1.20

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.