mersenneforum.org  

2008-12-11, 02:27   #474
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)
1100001101001₂ Posts

Quote:
Originally Posted by gd_barnes
Is the server completely down?

No, it's just going down every 15 minutes (when it prunes the joblist and knpairs files), but I'm manually restarting it every time as soon as I see it. (I've got a VNC window open in the background so I can monitor it.) I'm currently working on a workaround to make it automatically restart the server every time it goes down (to serve as sort of a band-aid fix).

2008-12-11, 02:30   #475
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)
3·2,083 Posts

Quote:
Originally Posted by mdettweiler
No, it's just going down every 15 minutes (when it prunes the joblist and knpairs files), but I'm manually restarting it every time as soon as I see it. (I've got a VNC window open in the background so I can monitor it.) I'm currently working on a workaround to make it automatically restart the server every time it goes down (to serve as sort of a band-aid fix).

Okay, I think I've got a temporary workaround (using "while" loops in bash to restart the server every time it exits) that should keep things going without me restarting it manually every 15 minutes. I'll continue to monitor it, of course, to make sure that the workaround functions properly.
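
For anyone wanting to set up a similar band-aid, a restart loop of the kind described here might look roughly like the sketch below. This is not the actual script used on the server. The `run_forever` function, the `MAX_RESTARTS` and `RESTART_DELAY` variables, the `RUN_COUNT` counter, and the `./server` command name are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch of a restart-on-exit loop (a band-aid, not a fix for the crash).
# All names here are hypothetical; "./server" stands in for the real command.

# run_forever CMD [ARGS...]
#   Runs CMD, and runs it again each time it exits.
#   MAX_RESTARTS:  stop after this many runs (0 = never stop). Assumed knob.
#   RESTART_DELAY: seconds to wait between runs (default 5). Assumed knob.
#   RUN_COUNT:     set by the function to the number of completed runs.
run_forever() {
    local max="${MAX_RESTARTS:-0}"
    local delay="${RESTART_DELAY:-5}"
    local status
    RUN_COUNT=0
    while true; do
        # Capture the exit status without tripping "set -e" in the caller.
        "$@" && status=0 || status=$?
        RUN_COUNT=$((RUN_COUNT + 1))
        echo "$(date '+%Y-%m-%d %H:%M:%S') run ${RUN_COUNT} exited with status ${status}" >&2
        if [ "$max" -gt 0 ] && [ "$RUN_COUNT" -ge "$max" ]; then
            break
        fi
        sleep "$delay"
    done
}

# Typical use (commented out so that sourcing this file does not start a loop):
# run_forever ./server
```

Logging the time and exit status of each run makes it easier to see afterwards whether the exits really line up with the 15-minute pruning. The pause between runs is a guess at a sensible default rather than a measured value.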

2008-12-11, 02:45   #476
gd_barnes
May 2007
Kansas; USA
3²·13·89 Posts

I'm typing from the machine right now. Is there anything I can do in the next 10 mins. to monitor it?

This has been, by far, my most stable machine. It's never been turned off and never run above 68 C (67 C right now) since I started running it in early May. Taking the cover off to swap out the hard drive was the first time I even had the cover off of it.

Last fiddled with by gd_barnes on 2008-12-11 at 02:48

2008-12-11, 02:49   #477
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)
3×2,083 Posts

Quote:
Originally Posted by gd_barnes
I'm typing from the machine right now. Is there anything I can do in the next 10 mins. to monitor it?

This has been, by far, my most stable machine. It's never been turned off and never run above 68 C since I started running it in early May. Taking the cover off to swap out the hard drive was the first time I even had the cover off of it.

Okay...the workaround seems to be holding. Should be OK as a band-aid fix until we can figure out the root of the problem.

As for what's causing this: I honestly don't know. I doubt it could have anything to do with swapping the hard drive; I'm thinking something more along the lines of a messed-up binary. But then again, I don't know--it could be anything.

In the meantime, nobody should need to move any machines off G4000; the workaround should keep it running OK for now.

2008-12-11, 02:52   #478
IronBits
I ♥ BOINC!
Oct 2002
Glendale, AZ. (USA)
2131₈ Posts

Check the perms; chmod 664 *.txt should fix it.
It happens when a lot of folks are trying to get in and the server doesn't come up cleanly while trying to lock onto the socket.
When folks are hitting mine real heavy and I take it down to give it some more pairs, it might take me 15 tries to get it to come back up. Sometimes I just let it sit for a minute or two and then try again, too.
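
The permission check suggested here can be wrapped in a small helper, sketched below. Only `chmod 664 *.txt` comes from the post itself. The `fix_perms` name and the idea of passing the server directory as an argument are assumptions for illustration, and so is the guess that the files the server needs to write end in `.txt`.

```shell
#!/usr/bin/env bash
# fix_perms [DIR]
#   Sets mode 664 (rw-rw-r--: owner and group may read and write, others may
#   only read) on every *.txt file in DIR (default: current directory), then
#   lists the files so the result can be checked by eye.
#   The function name and the DIR argument are hypothetical.
fix_perms() {
    local dir="${1:-.}"
    chmod 664 "$dir"/*.txt || return 1
    ls -l "$dir"/*.txt
}
```

Run from the server's directory with the server stopped, `fix_perms` with no argument does the same thing as typing the `chmod` by hand, with the listing as a quick confirmation.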

2008-12-11, 02:55   #479
Mini-Geek
Account Deleted
"Tim Sorbera"
Aug 2006
San Antonio, TX USA
17×251 Posts

When I run out of work on the 18th (possibly 19th), assuming the drive is still running and I haven't lined up new files yet, is IB400 probably the most stable and best for me to run on?

2008-12-11, 02:58   #480
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)
6249₁₀ Posts

Quote:
Originally Posted by IronBits
Check the perms; chmod 664 *.txt should fix it.
It happens when a lot of folks are trying to get in and the server doesn't come up cleanly while trying to lock onto the socket.
When folks are hitting mine real heavy and I take it down to give it some more pairs, it might take me 15 tries to get it to come back up. Sometimes I just let it sit for a minute or two and then try again, too.

Okay, thanks--I shut down the server, ran "chmod 664 *.txt", and restarted it. Hopefully that will do the trick.

As for it needing a few more minutes before it can be restarted: yeah, I've had that happen a few times, especially today with needing to restart the server so much. I've been doing the same thing you suggested--letting it sit for a few minutes and then trying again.

2008-12-11, 02:58   #481
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)
14151₈ Posts

Quote:
Originally Posted by Mini-Geek
When I run out of work on the 18th (possibly 19th), assuming the drive is still running and I haven't lined up new files yet, is IB400 probably the most stable and best for me to run on?

Probably. If we time everything right, it will be the last 1st Drive server running at the end.

2008-12-11, 03:20   #482
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)
3×2,083 Posts

Quote:
Originally Posted by mdettweiler
Okay, thanks--I shut down the server, ran "chmod 664 *.txt", and restarted it. Hopefully that will do the trick.

As for it needing a few more minutes before it can be restarted: yeah, I've had that happen a few times, especially today with needing to restart the server so much. I've been doing the same thing you suggested--letting it sit for a few minutes and then trying again.

Hmm...it crashed again, just as before. I've put it back on the while-loop workaround that I used before.

Maybe a "chmod 777 *" would work better? I'll try that momentarily...

Edit: Okay, I've tried "chmod 777 *". We'll see if it still crashes after that. (Just to play it safe, though, I've enabled the while-loop thing once again, to ensure that the server isn't down for long periods of time if I'm not right at the computer when it crashes.)

Last fiddled with by mdettweiler on 2008-12-11 at 03:22

2008-12-11, 05:02   #483
gd_barnes
May 2007
Kansas; USA
3²·13·89 Posts

Quote:
Originally Posted by mdettweiler
Hmm...it crashed again, just as before. I've put it back on the while-loop workaround that I used before.

Maybe a "chmod 777 *" would work better? I'll try that momentarily...

Edit: Okay, I've tried "chmod 777 *". We'll see if it still crashes after that. (Just to play it safe, though, I've enabled the while-loop thing once again, to ensure that the server isn't down for long periods of time if I'm not right at the computer when it crashes.)

Thanks for your attention to detail on this, Max. Sounds like a mess.

Any luck with stopping the crashing with this last attempt?

2008-12-11, 05:09   #484
gd_barnes
May 2007
Kansas; USA
10100010101101₂ Posts

Quote:
Originally Posted by Mini-Geek
When I run out of work on the 18th (possibly 19th), assuming the drive is still running and I haven't lined up new files yet, is IB400 probably the most stable and best for me to run on?

I'll add to what Max said here:

Yes, IB400 is the most stable at this point. But the question will be: Where should you connect at that point? Likely it will be IB400 but it could be one of the others if they are not dried before IB400 is.

Around the 16th or 17th, keep checking the threads. We'll post what needs to be finished off by that point. I'll attempt to balance my machines such that IB400 is the last remaining server with pairs for the 1st drive but I can't guarantee it.

On another note: You know what the coolest thing about this is? Actually having a general idea of when we will complete something! How many projects out there can estimate when they will complete an entire effort that is being run by 10+ people?! This is great! It allows for excellent forward planning on future efforts.


Gary

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.