[quote=gd_barnes;152814]Is the server completely down?[/quote]
No, it's just going down every 15 minutes (when it prunes the joblist and knpairs files), but I'm manually restarting it every time as soon as I see it. (I've got a VNC window open in the background so I can monitor it.) I'm currently working on a workaround to make it automatically restart the server every time it goes down (to serve as sort of a band-aid fix). |
[quote=mdettweiler;152815]No, it's just going down every 15 minutes (when it prunes the joblist and knpairs files), but I'm manually restarting it every time as soon as I see it. (I've got a VNC window open in the background so I can monitor it.) I'm currently working on a workaround to make it automatically restart the server every time it goes down (to serve as sort of a band-aid fix).[/quote]
Okay, I think I've got a temporary workaround (using "while" loops in bash syntax to continually re-start the server every time it exits) that should keep things going without me restarting it manually every 15 minutes. I'll continue to monitor it, of course, to make sure that the workaround functions properly. |
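For anyone wanting to set up something similar, here is a minimal sketch of that kind of "while" loop restart wrapper in bash. It is not the exact script running on the server: the function name `keep_alive`, the restart cap, the delay, and the `./server` path in the usage comment are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch of a band-aid restart loop: rerun a command every time it exits.
# keep_alive is a hypothetical helper name, not part of any server package.
# Usage: keep_alive MAX_RESTARTS DELAY_SECONDS command [args...]
#   MAX_RESTARTS=0 means "loop forever".
keep_alive() {
    local max_restarts=$1 delay=$2 runs=0 rc
    shift 2
    while true; do
        "$@"                      # run the server in the foreground
        rc=$?                     # remember how it exited
        runs=$((runs + 1))
        echo "$(date '+%H:%M:%S') exited with status $rc (run $runs)" >&2
        # Stop once the cap is reached (only when a cap was given).
        if [ "$max_restarts" -gt 0 ] && [ "$runs" -ge "$max_restarts" ]; then
            break
        fi
        sleep "$delay"            # give the socket a moment to be released
    done
}

# Example (hypothetical path): restart forever, waiting 10 seconds each time.
# keep_alive 0 10 ./server
```

The pause between runs is there because an immediate restart can fail if the old socket has not been released yet. The log line on stderr makes it easy to see afterwards how often the server went down.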
I'm typing from the machine right now. Is there anything I can do in the next 10 mins. to monitor it?
This has been, by far, my most stable machine. It's never been turned off and never run above 68 C (67 C right now) since I started running it in early May. Taking the cover off to swap out the hard drive was the first time I even had the cover off of it. |
[quote=gd_barnes;152817]I'm typing from the machine right now. Is there anything I can do in the next 10 mins. to monitor it?
This has been, by far, my most stable machine. It's never been turned off and never run above 68 C since I started running it in early May. Taking the cover off to swap out the hard drive was the first time I even had the cover off of it.[/quote]
Okay...the workaround seems to be holding. Should be OK as a band-aid fix until we can figure out the root of the problem.
As for what's causing this: I honestly don't know. I doubt it could have anything to do with swapping the hard drive; I'm thinking something more along the lines of a messed-up binary. But then again, I don't know--it could be anything.
In the meantime, nobody should need to move any machines off G4000; the workaround should keep it running OK for now. |
Check the perms; "chmod 664 *.txt" should fix it.
It happens when a lot of folks are trying to get in, and it doesn't come up cleanly trying to lock onto the socket. When folks are hitting mine real heavy, and I take it down to give it some more pairs, it might take me 15 tries to get it to come back up. Sometimes I just let it sit for a minute or two, then try again too. |
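The "try again after a pause" routine described above can be sketched as a small bash helper. This is only an illustration under one loud assumption: that the server exits with a nonzero status when it cannot lock onto the socket. The name `retry_start`, the try count, the delay, and the `./server` path are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: retry a start command a limited number of times, pausing between tries.
# Assumes the command returns nonzero when it fails to come up (e.g. socket busy).
# Usage: retry_start MAX_TRIES DELAY_SECONDS command [args...]
retry_start() {
    local max_tries=$1 delay=$2 attempt=1
    shift 2
    while [ "$attempt" -le "$max_tries" ]; do
        if "$@"; then
            return 0              # command reported success
        fi
        echo "attempt $attempt of $max_tries failed" >&2
        # Wait before the next try, but not after the final one.
        if [ "$attempt" -lt "$max_tries" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1                      # every attempt failed
}

# Example (hypothetical path): up to 15 tries, 60 seconds apart.
# retry_start 15 60 ./server
```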
When I run out of work on the 18th (possibly 19th), assuming the drive is still running and I haven't lined up new files yet, is IB400 probably the most stable and best for me to run on?
|
[quote=IronBits;152820]Check the perms; "chmod 664 *.txt" should fix it.
It happens when a lot of folks are trying to get in, and it doesn't come up cleanly trying to lock onto the socket. When folks are hitting mine real heavy, and I take it down to give it some more pairs, it might take me 15 tries to get it to come back up. Sometimes I just let it sit for a minute or two, then try again too.[/quote]
Okay, thanks--I shut down the server, ran "chmod 664 *.txt", and restarted it. Hopefully that will do the trick. :smile:
As for it needing a few more minutes before it can be restarted: yeah, I've had that happen a few times, especially today with needing to restart the server so much. I've been doing the same thing you suggested--letting it sit for a few minutes and then trying again. :smile: |
[quote=Mini-Geek;152822]When I run out of work on the 18th (possibly 19th), assuming the drive is still running and I haven't lined up new files yet, is IB400 probably the most stable and best for me to run on?[/quote]
Probably. If we time everything right, it will be the last 1st Drive server running at the end. |
[quote=mdettweiler;152823]Okay, thanks--I shut down the server, ran "chmod 664 *.txt", and restarted it. Hopefully that will do the trick. :smile:
As for it needing a few more minutes before it can be restarted: yeah, I've had that happen a few times, especially today with needing to restart the server so much. I've been doing the same thing you suggested--letting it sit for a few minutes and then trying again. :smile:[/quote]
Hmm...it crashed again, just as before. I've put it back on the while-loop workaround that I used before.
Maybe a "chmod 777 *" would work better? I'll try that momentarily...
Edit: Okay, I've tried "chmod 777 *". We'll see if it still crashes after that. :smile: (Just to play it safe, though, I've enabled the while-loop thing once again, to ensure that the server isn't down for long periods of time if I'm not right at the computer when it crashes.) |
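For reference, the difference between the two modes tried above can be checked on a scratch file rather than the live server files. In this sketch, `perm_string` is a hypothetical helper name; it just prints the 10-character type and permission field from `ls -l`.

```shell
#!/usr/bin/env bash
# Sketch: compare mode 664 and mode 777 on a temporary file.
perm_string() {
    # Print the file type and permission bits (first 10 characters of ls -l).
    ls -ld "$1" | cut -c1-10
}

scratch=$(mktemp)          # throwaway file, not a real server file
chmod 664 "$scratch"
perm_string "$scratch"     # prints -rw-rw-r-- (owner and group read/write, others read)
chmod 777 "$scratch"
perm_string "$scratch"     # prints -rwxrwxrwx (everyone read/write/execute)
rm -f "$scratch"
```

Mode 777 also makes every file writable by any user on the machine, so it is better treated as a diagnostic step than as a permanent setting.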
[quote=mdettweiler;152827]Hmm...it crashed again, just as before. I've put it back on the while-loop workaround that I used before.
Maybe a "chmod 777 *" would work better? I'll try that momentarily... Edit: Okay, I've tried "chmod 777 *". We'll see if it still crashes after that. :smile: (Just to play it safe, though, I've enabled the while-loop thing once again, to ensure that the server isn't down for long periods of time if I'm not right at the computer when it crashes.)[/quote]
Thanks for your attention to detail on this, Max. Sounds like a mess. Any luck with stopping the crashing with this last attempt? |
[quote=Mini-Geek;152822]When I run out of work on the 18th (possibly 19th), assuming the drive is still running and I haven't lined up new files yet, is IB400 probably the most stable and best for me to run on?[/quote]
I'll add to what Max said here: Yes, IB400 is the most stable at this point. But the question will be: Where should you connect at that point? Likely it will be IB400, but it could be one of the others if they are not dried before IB400 is.
Around the 16th or 17th, keep checking the threads. We'll post what needs to be finished off by that point. I'll attempt to balance my machines such that IB400 is the last remaining server with pairs for the 1st drive, but I can't guarantee it.
On another note: You know what the coolest thing about this is? Actually having a general idea of when we will complete something! How many projects are out there that can estimate when they will complete an entire effort that is being run by 10+ people?! This is great! :smile: It allows for excellent forward planning on future efforts.
Gary |