#1 |
|
Mar 2006
Germany
5530₈ Posts |
All LLRnet and PRPnet servers have been offline for 15 minutes now!
Max could not reach the server and Gary is not available, so we can only wait for now! |
#2 |
|
A Sunny Moo
Aug 2007
USA (GMT-5)
3·2,083 Posts |
I just got a hold of Gary via text message--my guess of a thunderstorm that I mentioned in a PM turned out to be right. As of 8 minutes ago he said he's checking to confirm that the internet is out. (Not that there's much he can do if it is out besides call the cable company and wait for them to come by...)
#3 |
|
May 2007
Kansas; USA
24233₈ Posts |
Hum. It had nothing to do with the thunderstorm. All of my other machines are internet connected and running. Jeepford just spontaneously shut down. In booting it back up, it appears to have a few disk errors and would not boot up. After a few attempts at the shell or whatever it is that is the Linux equivalent of a DOS C-prompt, I keep getting just a little further each time. The boot originally only got to 2-3% and the last attempt got to 34% before stopping. The fsck utility appears to have corrected some errors.
Thanks for the very quick notification Max. I'm all over it now. If it's quickly fixable with fsck or other obvious utility, I'll have it fixed within the next 15-30 mins.
Edit: My perception now is that server machines are hard on hard drives. The last one developed a few errors here-and-there and is still working fine as a "normal" machine. All of the constant disk writes from the PRPnet server during the rally may have stressed it to the point that it got some errors that need to be written around.
Last fiddled with by gd_barnes on 2010-08-21 at 01:42 |
#4 |
|
May 2007
Kansas; USA
3³·5·7·11 Posts |
For reference, here is what I get when I run the fsck utility:
"Inodes that were part of a corrupted orphan linked list found. Fix(y)?" I then tell it yes and it seems to fix some and then hesitates for an extended period, apparently looking for more of them. I think I wasn't patient enough before and just rebooted it after the first group of errors. I'll just keep letting it find them now and then hopefully when I try the next reboot, it will work completely correctly.
Edit: Does Karsten ever sleep? lol
Last fiddled with by gd_barnes on 2010-08-21 at 01:46 |
#5 |
|
May 2007
Kansas; USA
10395₁₀ Posts |
Everything is working now.
My take on the issue: A few small hard disk errors had crept in from the tremendous amount of reading/writing of PRPnet server messages during the rally, done in an attempt to isolate the cause of the "to many connections" error. One of the errors somehow crept into some root or system file, causing the machine to shut itself down. It took the aforementioned utility a couple of attempts to write the system files around the bad hard drive sectors. Thankfully Linux is robust in that regard.
Sorry about the problem. Fortunately the fix wasn't too bad. It was strange to see the server machine sitting there shut down while my modem/router continued to blink away, all of my other machines were still on, and the battery backup was still in good shape and apparently never used. There was no power flicker here. |
#6 |
|
A Sunny Moo
Aug 2007
USA (GMT-5)
3·2,083 Posts |
Hmm...interesting. I hadn't ever really considered the impact on the hard drive of all the beating it takes on a daily basis running not only PRPnet and LLRnet servers, but also the stats DB. For each pair that's handed out, it has to write to the disk once when it's sent to the client, once when it's returned, and probably a few more times when it's imported into the DB. Plus, there's various overhead not specifically tied to the number of pairs processed, such as LLRnet's pruning and PRPnet's every-10-minute checks to see if it needs to send out any new emails.
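As a rough back-of-the-envelope sketch of that estimate in Python: the function name, the figure of 3 DB writes per pair, and the assumption that each 10-minute check touches the disk once are all made-up placeholders for illustration, not measured values from the server.

```python
def estimated_daily_writes(pairs_per_day: int,
                           db_writes_per_pair: int = 3,
                           periodic_checks_per_day: int = 144) -> int:
    """Rough count of disk writes per day for a work-unit server.

    All defaults are hypothetical:
    - db_writes_per_pair stands in for "a few more times" on DB import.
    - periodic_checks_per_day = 24 * 60 / 10 models one write per
      10-minute check, which may overstate the real overhead.
    """
    # One write when the pair is sent out, one when it is returned.
    writes_per_pair = 2 + db_writes_per_pair
    return pairs_per_day * writes_per_pair + periodic_checks_per_day
```

The point is only that the per-pair term dominates: with these placeholder values the periodic overhead is a fixed 144 writes a day, while the rest scales linearly with the number of pairs handed out.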
While in this case everything worked out OK since the errors were minor, it is definitely a striking reminder of the need to have a backup system. At this time, none of the server stuff is being backed up on a regular basis--at least not to a location outside of the server's primary hard drive. (I believe Dave does something to back up the DB, but it just puts it elsewhere on the same disk.) I think it's high time I started looking into options for backing up the server to some kind of external location... Possibly an external (USB?) hard drive for regular backups, with a DVD or something similar for periodic (monthly?) backups of that.
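A minimal sketch of what such a regular backup could look like in Python, assuming the external drive is already mounted as an ordinary directory. The function name, the `backup-` folder prefix, and the retention count are hypothetical choices for illustration, not anything currently set up on the server.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def backup_directory(source: Path, dest_root: Path, keep: int = 7,
                     stamp: Optional[str] = None) -> Path:
    """Copy `source` into a new timestamped folder under `dest_root`
    and delete all but the newest `keep` snapshots."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # Timestamped names sort chronologically as plain strings.
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    target = dest_root / f"backup-{stamp}"
    # copytree creates missing parent directories and fails if the
    # target already exists, so an old snapshot is never overwritten.
    shutil.copytree(source, target)
    snapshots = sorted(p for p in dest_root.glob("backup-*") if p.is_dir())
    for old in snapshots[:-keep]:
        shutil.rmtree(old)
    return target
```

Run from a daily scheduled job with `dest_root` pointing at the external drive's mount point, this would keep a rolling week of full copies. A real setup would also need the DB dumped to a file first so that the copy is consistent, and the monthly DVD copy would still be a separate manual step.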
#7 |
|
Jan 2006
deep in a while-loop
658₁₀ Posts |
Max, please refer to the email of 31-05-2010 and reply to it.
You need to tell me which destination I can copy off to. Thx. |
#8 |
|
May 2007
Kansas; USA
10395₁₀ Posts |
You didn't mention the most obvious thing that might have caused the problem: the fact that we were doing a huge amount of logging for the PRPnet server, resulting in 100s of MB of file writes.
#9 | |
|
A Sunny Moo
Aug 2007
USA (GMT-5)
14151₈ Posts |
Quote:
Despite the huge log files it creates, I do prefer to keep all the PRPnet servers on maximum debug level (which while verbose, is not quite as much so as the special version we used during the rally). Those elusive high-load bugs that only show up once in a while are almost impossible to catch otherwise--to apply an analogy you used a few days ago in a PM, it's like a car problem that suddenly stops happening when you take the car into the shop.
And besides all this, we really are quite overdue to get a real external backup system set up for jeepford: all it would take is one dead hard drive (which is not an entirely uncommon scenario even in a non-server setting) for us to lose the entire stats DB (and of course the active contents of the servers as well, though that's not as hard to recover from).
Last fiddled with by mdettweiler on 2010-08-21 at 04:45 |
#10 |
|
Mar 2006
Germany
2³·3·11² Posts |
I was just about to go to sleep when I heard the beep sound!
I thought I was lucky enough to have found a prime, but it was 'only' that issue, so I stayed online a while, waiting for the server to come back. I continued another local effort, so there was not much time left. But now, after 5 hours of sleep, I've rested enough!
#11 | |
|
I quite division it
"Chris"
Feb 2005
England
2077₁₀ Posts |
Quote:
I know you guys are the experts but please tell me there are multiple backups of all the NPLB and CRUS results and sieve files.
I've had about 4 HD failures here in 10 years. All my important data here is on 4 HDs on 3 PCs, then it is automatically backed up online. 678GB so far. |