mersenneforum.org  

2019-10-05, 20:22   #1
ewmayer
2ω=0
 
Sep 2002
República de California

7·11·151 Posts
Need cronscript to check if user job is running, and if not, to restart it

My old Haswell-quad system is still capable of rendering good crunching work, but it's always been flaky in terms of system stability. I've been hitting near-daily 'mystery crashes' running Mlucas @5120K on all 4 cores recently. My little Broadwell NUC is running the same build 4-threaded with no stability issues, as is a host of Galaxy S7 beater-phones (running the ARM SIMD build), so I believe it's a hardware issue. The system is running at stock with good ventilation (the case side panel allowing access to the mobo is removed, allowing extra ventilation and making it easy to monitor dust accumulation). I do get not-infrequent throttling messages on the console, but nothing untoward based on runtimes, which rise perhaps 2-3% in the first ~10 minutes following program start, then stabilize there.

What I need is a cron script which every so often - say every 15 minutes - checks whether Mlucas is running, and if not, executes this startup command (directory is relative to my regular-user account):

cd ~/RUN && nohup nice ~/*19/obj/mlucas_v19 -cpu 0:3 -fftlen 5120 &

This page has a bash-script solution: https://www.digitalocean.com/communi...erver-programs

But:
https://stackoverflow.com/questions/...ervice-if-dead -- The answers here note that /etc/init.d-based solutions are problematic because the OS deletes/refreshes that file periodically.

And, can I do this sort of thing as regular user, or does it have to be as root? (The latter is not a problem, I just want to know what my options are.)

Last fiddled with by ewmayer on 2019-10-05 at 20:24
2019-10-05, 22:04   #2
paulunderwood
 
Sep 2002
Database er0rr

2^4·229 Posts

Code:
# Group cd and the restart so they run only when pidof finds no mlucas_v19 process
*/10 * * * * pidof mlucas_v19 >/dev/null || (cd ~/RUN && nohup nice ~/*19/obj/mlucas_v19 -cpu 0:3 -fftlen 5120 &)
This ought to do it, but I have not tested it. Check that you have pidof on your Linux system.
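
A note on chaining || and && in an entry of this kind: POSIX shells give the two operators equal precedence and evaluate them left to right, so A || B && C is read as (A || B) && C. The restart therefore has to be grouped together with the cd if it should run only when the pidof check fails. Here is a minimal sketch of the difference, with true standing in for a successful pidof and echo standing in for the real commands (the function names are made up for illustration):

```shell
#!/bin/sh
# 'true' stands in for "pidof found the process";
# 'echo' stands in for the cd and the restart command.

# Ungrouped: parsed as (true || echo cd) && echo restart,
# so the restart step still runs although the check succeeded.
ungrouped() {
    true || echo "cd" && echo "restart"
}

# Grouped: the braces make cd-and-restart one right-hand side of ||,
# so nothing runs when the check succeeds.
grouped() {
    true || { echo "cd" && echo "restart"; }
}

ungrouped   # prints: restart
grouped     # prints nothing
```

Parentheses work as well as braces for the grouping; they just run the group in a subshell.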

Last fiddled with by paulunderwood on 2019-10-05 at 22:50
2019-10-06, 09:22   #3
Nick
 
Dec 2012
The Netherlands

689_16 Posts

The cleanest solution is to use the wait() system call as then you know immediately when the child exits.
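
For illustration, the same idea can be sketched in shell with the wait builtin instead of calling wait() from C (a substitution made here, not something specified in the post). The function name supervise and the start limit are hypothetical; the limit only keeps the sketch from looping forever, and this covers the job exiting only while the supervising shell itself stays alive.

```shell
#!/bin/sh
# supervise MAX CMD...: start CMD, block until it exits, then start it again,
# for at most MAX starts in total.
supervise() {
    max_starts=$1
    shift
    starts=0
    while [ "$starts" -lt "$max_starts" ]; do
        "$@" &                  # launch the job as a child of this shell
        child=$!
        starts=$((starts + 1))
        wait "$child"           # returns as soon as that child exits
        status=$?
        echo "start $starts exited with status $status"
    done
}

# Hypothetical usage, with the job from the first post:
#   cd ~/RUN && supervise 100 nice ~/*19/obj/mlucas_v19 -cpu 0:3 -fftlen 5120
```

Unlike a polling check, nothing here runs between exits; the shell simply sleeps in wait until the child is gone.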
2019-10-06, 15:58   #4
chris2be8
 
Sep 2009

2^3·3·5·17 Posts

Quote:
Originally Posted by Nick
The cleanest solution is to use the wait() system call as then you know immediately when the child exits.
That won't work if the system suddenly reboots.

paulunderwood's cron entry should work. It should work from his crontab (i.e. not root). An @reboot entry to check after the system reboots could also be useful (but think about what happens if that runs just as the every-10-minutes entry is about to run).
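
One possible way to keep two overlapping checks from both starting the job is to serialize them with a lock. The sketch below uses mkdir, which either creates the directory or fails atomically; the function name with_lock, the lock path, the script paths, and the exit code 75 are all hypothetical choices made for illustration. A lock directory left behind by a crash would block later runs until removed, so it would need to live somewhere that is cleared at boot (whether /tmp qualifies depends on the distribution). Where flock(1) from util-linux is available, it avoids that problem because the kernel drops the lock when the holding process dies.

```shell
#!/bin/sh
# with_lock LOCKDIR CMD...: run CMD only if LOCKDIR can be created.
# mkdir is atomic, so two overlapping runs cannot both get the lock.
with_lock() {
    lockdir=$1
    shift
    if mkdir "$lockdir" 2>/dev/null; then
        "$@"
        status=$?
        rmdir "$lockdir"        # release the lock once CMD has returned
        return "$status"
    fi
    echo "lock $lockdir is held, skipping this run" >&2
    return 75                   # arbitrary "try again later" code
}

# Hypothetical crontab entries, both going through the same wrapper script
# (which would define with_lock and call it around the pidof check and restart):
#   @reboot       ~/bin/mlucas_check.sh
#   */10 * * * *  ~/bin/mlucas_check.sh
```

The lock only covers the time the wrapped command takes to return, so a restart that backgrounds the job releases it almost immediately; that is enough to separate two checks that fire at the same moment.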

Chris
2019-10-06, 19:46   #5
ewmayer
2ω=0
 
Sep 2002
República de California

7×11×151 Posts

Quote:
Originally Posted by paulunderwood
Code:
# Group cd and the restart so they run only when pidof finds no mlucas_v19 process
*/10 * * * * pidof mlucas_v19 >/dev/null || (cd ~/RUN && nohup nice ~/*19/obj/mlucas_v19 -cpu 0:3 -fftlen 5120 &)
This ought to do it, but I have not tested it. Check that you have pidof on your Linux system.
Thanks, the system in question does have pidof. How do I use the above? A followup comment says to put it in a crontab file; for a non-root regular user, where should said file go?
2019-10-06, 20:07   #6
paulunderwood
 
Sep 2002
Database er0rr

2^4·229 Posts

Quote:
Originally Posted by ewmayer
Thanks, the system in question does have pidof. How do I use the above? A followup comment says to put it in a crontab file; for a non-root regular user, where should said file go?
Run (as user) crontab -e to edit the file and, when done, crontab -l to list the file.

Append the line by pasting.
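
If pasting in an editor is inconvenient, the append can also be scripted. The sketch below is one way to do it, assuming a cron implementation whose crontab command accepts - to read the new table from stdin (Vixie cron and cronie do). The helper name add_entry is hypothetical; it copies the existing table through unchanged and adds the new line only if an identical line is not already there, so running it twice does not duplicate the entry.

```shell
#!/bin/sh
# add_entry ENTRY: copy stdin to stdout, appending ENTRY unless an
# identical line is already present.
add_entry() {
    entry=$1
    existing=$(cat)
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"
    fi
    # -x: match whole lines only, -F: treat ENTRY as a fixed string, not a regex
    if ! printf '%s\n' "$existing" | grep -qxF -- "$entry"; then
        printf '%s\n' "$entry"
    fi
}

# Hypothetical usage, run as the regular user (ENTRY holds the full cron line):
#   crontab -l 2>/dev/null | add_entry "$ENTRY" | crontab -
#   crontab -l    # list the result
```

Because the pipeline replaces the whole table with whatever add_entry prints, it is worth saving the output of crontab -l somewhere first.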
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.