mersenneforum.org

Prime95 2020-07-29 02:52

Linux crash (very strange)
 
Dream machine has decided to spontaneously reboot every 19 minutes!?
Yes, 19 minutes like clockwork.

It is not overheating (nor are any of the 4 CPUs that are doing a network boot)
dmesg shows nothing strange right up until crash

I am at a loss as to what to do next. Ideas?

paulunderwood 2020-07-29 02:58

Freshly installed Debian has to be coaxed into not [URL="https://wiki.debian.org/Suspend#Disable_suspend_and_hibernation"]suspending[/URL] etc after ~20 minutes with a mask command. I wonder if your problem is a similar thing.

Prime95 2020-07-29 03:19

Ubuntu 16.04 -- hasn't changed since the day it was built ~5 years ago.

paulunderwood 2020-07-29 03:31

[QUOTE=Prime95;551844]Ubuntu 16.04 -- hasn't changed since the day it was built ~5 years ago.[/QUOTE]

[URL="https://serverfault.com/questions/884597/how-to-find-reason-for-system-reboot-or-shutdown-in-ubuntu-16-04-system"]This[/URL] says run the command [C]egrep -ir "(shut|reboot)" /var/log/*[/C]

Prime95 2020-07-29 03:58

Since 9:00 tonight

[CODE]/var/log/syslog:Jul 28 21:27:17 z170itx systemd[1]: Starting Update UTMP about System Boot/Shutdown...
/var/log/syslog:Jul 28 21:27:17 z170itx systemd[1]: Started Update UTMP about System Boot/Shutdown.
/var/log/syslog:Jul 28 21:27:17 z170itx cron[907]: (CRON) INFO (Running @reboot jobs)
/var/log/syslog:Jul 28 21:27:17 z170itx systemd[1]: Starting LXD - container startup/shutdown...
/var/log/syslog:Jul 28 21:27:18 z170itx systemd[1]: Started LXD - container startup/shutdown.
/var/log/syslog:Jul 28 21:27:23 z170itx systemd[1]: Started Unattended Upgrades Shutdown.
/var/log/syslog:Jul 28 21:47:13 z170itx systemd[1]: Starting Update UTMP about System Boot/Shutdown...
/var/log/syslog:Jul 28 21:47:13 z170itx systemd[1]: Started Update UTMP about System Boot/Shutdown.
/var/log/syslog:Jul 28 21:47:13 z170itx systemd[1]: Starting LXD - container startup/shutdown...
/var/log/syslog:Jul 28 21:47:13 z170itx cron[941]: (CRON) INFO (Running @reboot jobs)
/var/log/syslog:Jul 28 21:47:13 z170itx systemd[1]: Started LXD - container startup/shutdown.
/var/log/syslog:Jul 28 21:47:17 z170itx systemd[1]: Started Unattended Upgrades Shutdown.
/var/log/syslog:Jul 28 22:07:12 z170itx systemd[1]: Starting Update UTMP about System Boot/Shutdown...
/var/log/syslog:Jul 28 22:07:12 z170itx systemd[1]: Started Update UTMP about System Boot/Shutdown.
/var/log/syslog:Jul 28 22:07:12 z170itx systemd[1]: Starting LXD - container startup/shutdown...
/var/log/syslog:Jul 28 22:07:12 z170itx cron[927]: (CRON) INFO (Running @reboot jobs)
/var/log/syslog:Jul 28 22:07:13 z170itx systemd[1]: Started LXD - container startup/shutdown.
/var/log/syslog:Jul 28 22:07:18 z170itx systemd[1]: Started Unattended Upgrades Shutdown.
/var/log/syslog:Jul 28 22:24:37 z170itx systemd[1]: Starting Update UTMP about System Boot/Shutdown...
/var/log/syslog:Jul 28 22:24:37 z170itx systemd[1]: Started Update UTMP about System Boot/Shutdown.
/var/log/syslog:Jul 28 22:24:37 z170itx systemd[1]: Starting LXD - container startup/shutdown...
/var/log/syslog:Jul 28 22:24:37 z170itx cron[970]: (CRON) INFO (Running @reboot jobs)
/var/log/syslog:Jul 28 22:24:38 z170itx systemd[1]: Started LXD - container startup/shutdown.
/var/log/syslog:Jul 28 22:24:43 z170itx systemd[1]: Started Unattended Upgrades Shutdown.
/var/log/syslog:Jul 28 22:40:11 z170itx systemd[1]: Starting Update UTMP about System Boot/Shutdown...
/var/log/syslog:Jul 28 22:40:11 z170itx systemd[1]: Started Update UTMP about System Boot/Shutdown.
/var/log/syslog:Jul 28 22:40:11 z170itx systemd[1]: Starting LXD - container startup/shutdown...
/var/log/syslog:Jul 28 22:40:11 z170itx cron[926]: (CRON) INFO (Running @reboot jobs)
/var/log/syslog:Jul 28 22:40:12 z170itx systemd[1]: Started LXD - container startup/shutdown.
/var/log/syslog:Jul 28 22:40:17 z170itx systemd[1]: Started Unattended Upgrades Shutdown.
/var/log/syslog:Jul 28 23:04:53 z170itx systemd[1]: Starting Update UTMP about System Boot/Shutdown...
/var/log/syslog:Jul 28 23:04:53 z170itx systemd[1]: Started Update UTMP about System Boot/Shutdown.
/var/log/syslog:Jul 28 23:04:53 z170itx cron[949]: (CRON) INFO (Running @reboot jobs)
/var/log/syslog:Jul 28 23:04:53 z170itx systemd[1]: Starting LXD - container startup/shutdown...
/var/log/syslog:Jul 28 23:04:53 z170itx systemd[1]: Started LXD - container startup/shutdown.
/var/log/syslog:Jul 28 23:04:58 z170itx systemd[1]: Started Unattended Upgrades Shutdown.
/var/log/syslog:Jul 28 23:42:23 z170itx systemd[1]: Starting Update UTMP about System Boot/Shutdown...
/var/log/syslog:Jul 28 23:42:23 z170itx systemd[1]: Started Update UTMP about System Boot/Shutdown.
/var/log/syslog:Jul 28 23:42:23 z170itx cron[852]: (CRON) INFO (Running @reboot jobs)
/var/log/syslog:Jul 28 23:42:23 z170itx systemd[1]: Starting LXD - container startup/shutdown...
/var/log/syslog:Jul 28 23:42:24 z170itx systemd[1]: Started LXD - container startup/shutdown.
/var/log/syslog:Jul 28 23:42:29 z170itx systemd[1]: Started Unattended Upgrades Shutdown.
[/CODE]
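
To see how regular the reboots really are, the boot timestamps in the log above can be turned into gaps. This is only a rough sketch: the year is an assumption (syslog lines do not carry one), the function name is made up for illustration, and the month abbreviation parsing assumes an English/C locale.

```python
from datetime import datetime

# Boot times copied from the "Starting Update UTMP about System Boot/Shutdown"
# lines in the log above, one per boot.
BOOT_TIMES = [
    "Jul 28 21:27:17",
    "Jul 28 21:47:13",
    "Jul 28 22:07:12",
    "Jul 28 22:24:37",
    "Jul 28 22:40:11",
    "Jul 28 23:04:53",
    "Jul 28 23:42:23",
]


def boot_intervals(stamps, year=2020):
    """Return the gaps between consecutive boots as timedelta objects."""
    # syslog timestamps have no year, so one is assumed (2020, from the post date).
    times = [datetime.strptime(f"{year} {s}", "%Y %b %d %H:%M:%S") for s in stamps]
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for gap in boot_intervals(BOOT_TIMES):
        minutes, seconds = divmod(int(gap.total_seconds()), 60)
        print(f"{minutes}m{seconds:02d}s")
```

These are boot-to-boot gaps, so each one includes the time the machine spent booting. The first two come out just under 20 minutes, but the later ones range from about 15.5 to 37.5 minutes.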

paulunderwood 2020-07-29 04:04

This looks like a problem with unattended upgrades. [URL="https://ostechnix.com/how-to-disable-unattended-upgrades-on-ubuntu/"]Remove it[/URL] with [C]sudo apt remove unattended-upgrades[/C] or disable it via [C]sudo dpkg-reconfigure unattended-upgrades[/C]

Prime95 2020-07-29 04:20

[QUOTE=Prime95;551840]Yes, 19 minutes like clockwork.[/QUOTE]

Ooh, I just got 32 minutes before reboot.

Trying Paul's suggestion.
Then, I'm going to unplug mobos one at a time -- my guess is it's a "NIC gone nuts" problem.

EdH 2020-07-29 13:39

This is interesting! I started having similar troubles with a couple of i7s running Ubuntu 16.04. One of them has totally died and I haven't gone further with troubleshooting it yet - the other one has lengthened its intervals now that it has a working fan, but still reboots more than once a day. But, my machines are old and the one had a bad case fan, so I attributed the reboots to heat. I wonder. . .

OTOH, I have an i7 laptop that I suspend before bed (as I do many others), that wakes itself up during the night. I often find it already at work when I awaken the others.

rogue 2020-07-29 14:01

Is it possible that you have a bad CPU? This happened to my wife's computer a couple of months ago. The computer worked without issues for two years, then all of a sudden I could not boot into Windows. It would just restart before I got to the desktop. Thinking that Windows was corrupted, I was able to log into a command prompt and back up all personal documents. I then reinstalled Windows on another disk. I could get to the desktop on that disk, but within a minute or so it would restart with no blue screen. I took it in to a nearby shop and they eventually determined that the issue was with the CPU. It was under warranty from Intel, thus I could get a new CPU at no charge. Intel could not provide that specific CPU, so they provided a discounted upgraded CPU (7700 to 9700 or something like that).

Prime95 2020-07-30 01:20

What I've discovered thus far: This is a 5-CPU system. One CPU has a disk, the other four netboot off the first CPU. They all connect to the same network switch, which is crammed inside the CPU case.

No problems if main CPU and netboot CPUs #1 and #4 are running.
Adding netboot CPU#2 fails.
Adding netboot CPU#3 fails.
Now I may have used the same internet cable for #2 and #3, because the cable ordinarily used for #3 isn't working, so I chose a different one at random (random being one of 3 other cables, as there used to be 6 netboot CPUs).

Now testing #3 with a different cable.

New working theory: Recent work on the machine jostled or loosened one (or more) of the Internet cables causing noise on one of the lines. The OS is trying to process the noise and falling behind, until 19 minutes later a buffer overflows, and the OS dies. /var/log/syslog has no entries right before each reboot.
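
One way to look for evidence of a noisy line, sketched here under assumptions: Linux exposes per-interface receive counters under /sys/class/net/<iface>/statistics/, and a bad cable often shows up as growing error counts there. The function names and the choice of counters are illustrative only, and this does not confirm or refute the buffer overflow theory by itself.

```python
from pathlib import Path

# Receive-side counters that tend to grow on a noisy link. These file names
# exist under /sys/class/net/<iface>/statistics/ on Linux.
COUNTERS = ("rx_errors", "rx_crc_errors", "rx_frame_errors", "rx_dropped")


def read_nic_counters(iface, root="/sys/class/net"):
    """Return {counter: value} for one interface, skipping files that are missing."""
    stats_dir = Path(root) / iface / "statistics"
    values = {}
    for name in COUNTERS:
        path = stats_dir / name
        if path.is_file():
            values[name] = int(path.read_text().strip())
    return values


def counter_growth(before, after):
    """Return how much each counter grew between two snapshots."""
    return {name: after[name] - before[name] for name in after if name in before}


if __name__ == "__main__":
    # "lo" is used only because it exists on most Linux systems; substitute the
    # real interface name (see `ip link`), which is not given in this thread.
    print(read_nic_counters("lo"))
```

Taking one snapshot, waiting a minute or so, taking a second one, and passing both to `counter_growth` shows whether errors are accumulating while a suspect cable is plugged in.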

Prime95 2020-07-30 03:53

netboot cpu #3 on a different cable: 2.5 hours and going strong


All times are UTC.