mersenneforum.org Need cronscript to check if user job is running, and if not, to restart it

2019-10-05, 20:22   #1
ewmayer
∂²ω=0

Sep 2002
República de California

7·11·151 Posts

My old Haswell-quad system is still capable of rendering good crunching work, but it's always been flaky in terms of system stability. I've been hitting near-daily 'mystery crashes' running Mlucas @5120K on all 4 cores recently. My little Broadwell NUC is running the same build 4-threaded with no stability issues, as is a host of Galaxy S7 beater-phones (running the ARM SIMD build), so I believe it's a hardware issue. The system is running at stock with good ventilation (the case side panel allowing access to the mobo is removed, allowing extra ventilation and making it easy to monitor dust accumulation). I do get not-infrequent throttling messages on the console, but nothing untoward based on runtimes, which rise perhaps 2-3% in the first ~10 minutes following program start, then stabilize there.

What I need is a cron script which every so often (say every 15 minutes) checks whether Mlucas is running, and if not, executes this startup command (directory is relative to my regular-user account):

cd ~/RUN && nohup nice ~/*19/obj/mlucas_v19 -cpu 0:3 -fftlen 5120 &

This page has a bash-script solution: https://www.digitalocean.com/communi...erver-programs

But: https://stackoverflow.com/questions/...ervice-if-dead -- The answers here note that /etc/init.d-based solutions are problematic because the OS deletes/refreshes that file periodically.

And can I do this sort of thing as a regular user, or does it have to be as root? (The latter is not a problem, I just want to know what my options are.)

Last fiddled with by ewmayer on 2019-10-05 at 20:24

2019-10-05, 22:04   #2
paulunderwood

Sep 2002
Database er0rr

2⁴·229 Posts

Code:
*/10 * * * * pidof mlucas_v19 >/dev/null || (cd ~/RUN && nohup nice ~/*19/obj/mlucas_v19 -cpu 0:3 -fftlen 5120 &)

This ought to do it, but I have not tested it. Check that you have pidof on your Linux system.

Last fiddled with by paulunderwood on 2019-10-05 at 22:50
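
The same check can also live in a small script that cron calls, which keeps the crontab line short and makes the "only start if not running" logic explicit. This is a minimal sketch, assuming pidof is available as noted in the post; the function names is_running and ensure_running are hypothetical, and the example call is left commented out so that sourcing the file has no side effects.

```shell
#!/bin/sh
# Hypothetical helper: start a command only if no process with the given
# name is currently running.

# Succeeds if at least one process named "$1" exists.
is_running() {
    pidof "$1" >/dev/null 2>&1
}

# ensure_running NAME CMD [ARGS...]
ensure_running() {
    name=$1
    shift
    if is_running "$name"; then
        return 0        # already running: nothing to do
    fi
    "$@"                # not running: run the start command
}

# Example call for the job in this thread (commented out on purpose):
# ensure_running mlucas_v19 sh -c 'cd ~/RUN && nohup nice ~/*19/obj/mlucas_v19 -cpu 0:3 -fftlen 5120 &'
```

Putting the check in an if statement avoids any reliance on how the shell groups || and && in a one-liner.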

2019-10-06, 09:22   #3
Nick

Dec 2012
The Netherlands

68916 Posts

The cleanest solution is to use the wait() system call, as then you know immediately when the child exits.

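
As a rough illustration of the wait()-based idea in shell terms: the parent starts the job, blocks in the shell's wait builtin until it exits, then starts it again. This is only a sketch. The function name supervise and its restart-limit and delay arguments are hypothetical, added to keep the loop bounded and to stop a job that crashes at startup from spinning.

```shell
#!/bin/sh
# supervise MAX_RUNS DELAY_SECONDS CMD [ARGS...]
# Hypothetical helper: run CMD, wait for it to exit, and start it again,
# at most MAX_RUNS times in total.
supervise() {
    max=$1
    delay=$2
    shift 2
    runs=0
    while [ "$runs" -lt "$max" ]; do
        "$@" &                          # start the child in the background
        if wait "$!"; then              # block until that child exits
            status=0
        else
            status=$?                   # non-zero exit status of the child
        fi
        runs=$((runs + 1))
        echo "run $runs exited with status $status"
        sleep "$delay"                  # pause before the next restart
    done
}

# Example for the job in this thread (commented out; it would run for real):
# cd ~/RUN && supervise 1000 60 nice ~/*19/obj/mlucas_v19 -cpu 0:3 -fftlen 5120
```

The supervising shell itself still has to be started by something after a reboot.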
2019-10-06, 15:58   #4
chris2be8

Sep 2009

2³·3·5·17 Posts

Quote:
 Originally Posted by Nick The cleanest solution is to use the wait() system call as then you know immediately when the child exits.
That won't work if the system suddenly reboots.

paulunderwood's cron entry should work, and it should work from the user's own crontab (i.e. not root's). An @reboot entry to check after the system reboots could also be useful (but think about what happens if it runs just as the every-10-minutes entry is about to run).

Chris
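
One way to make that overlap harmless is to have both crontab entries take the same lock before starting the job, so whichever fires second simply gives up. This is a sketch assuming the flock utility from util-linux is installed (the thread does not mention it); the function name run_exclusive and the lock-file path in the example are hypothetical.

```shell
#!/bin/sh
# run_exclusive LOCKFILE CMD [ARGS...]
# Hypothetical helper: run CMD in the foreground while holding an exclusive
# lock on LOCKFILE. With -n, flock exits non-zero immediately instead of
# waiting when another process already holds the lock.
run_exclusive() {
    lockfile=$1
    shift
    flock -n "$lockfile" "$@"
}

# Both the @reboot and the periodic entry could then run the same command,
# for example (commented out; the lock path is an assumption):
# cd ~/RUN && run_exclusive ~/RUN/mlucas.lock nice ~/*19/obj/mlucas_v19 -cpu 0:3 -fftlen 5120
```

Because the lock is released when the job exits or the machine reboots, no stale lock file needs cleaning up, and holding the lock doubles as the "is it running" check.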

2019-10-06, 19:46   #5
ewmayer
∂²ω=0

Sep 2002
República de California

7×11×151 Posts

Quote:
 Originally Posted by paulunderwood Code: */10 * * * * pidof mlucas_v19 >/dev/null || (cd ~/RUN && nohup nice ~/*19/obj/mlucas_v19 -cpu 0:3 -fftlen 5120 &) This ought to do it, but I have not tested it. Check that you have pidof on your Linux system.
Thanks, the system in question does have pidof. How do I use the above? A follow-up comment says to put it in a crontab file; for a non-root regular user, where should said file go?

2019-10-06, 20:07   #6
paulunderwood

Sep 2002
Database er0rr

2⁴·229 Posts

Quote:
 Originally Posted by ewmayer Thanks, the system in question does have pidof. How do I use the above? A follow-up comment says to put it in a crontab file; for a non-root regular user, where should said file go?
Run crontab -e (as your regular user) to edit the file, and append the line by pasting it in. When done, run crontab -l to list the file and confirm the entry is there.
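
For a non-interactive alternative to pasting in the editor, the existing crontab can be piped through a small filter and fed back to crontab. This is a sketch only: the function name append_once and the script path in the example entry are hypothetical, and the pipeline that would actually install the result is left commented out.

```shell
#!/bin/sh
# append_once LINE
# Hypothetical filter: copy the crontab text on stdin to stdout, adding LINE
# at the end unless an identical line is already present.
append_once() {
    new_line=$1
    current=$(cat)
    if printf '%s\n' "$current" | grep -qxF -- "$new_line"; then
        printf '%s\n' "$current"        # already there: leave unchanged
    else
        if [ -n "$current" ]; then
            printf '%s\n' "$current"    # keep the existing entries
        fi
        printf '%s\n' "$new_line"       # then add the new one
    fi
}

# Example (commented out; it would modify your crontab, and the script
# path is an assumption):
# entry='*/10 * * * * $HOME/bin/check_mlucas.sh'
# crontab -l 2>/dev/null | append_once "$entry" | crontab -
```

Running the pipeline twice leaves a single copy of the entry, so it is safe to repeat.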

All times are UTC.