#1035
May 2008
Wilmington, DE
2²·23·31 Posts
#1036
May 2007
Kansas; USA
2847₁₆ Posts
Well, it appears to keep sending you different pairs, but once again it stopped after its hourly pruning process at 2 AM, just like it did at midnight. (It didn't do the 1 AM one.) I've once again restarted it.

I checked it more closely this time and it shows that it is sending you different pairs than before. You must be getting something. Isn't there some command that you've run on your end before? Can you try restarting the affected machines? That's the only thing I can suggest at this point. Something is hung somewhere.

This may be something that Max is going to have to fix. He has some weird loop script that keeps restarting the server if it stops right after an outage like this. Max, what can be done to fix this permanently instead of continuing to patch it with this looping script that you run?

Gary
#1037
May 2007
Kansas; USA
3·7·491 Posts
I'm going to attempt to run port 4000 on my upstairs Windows laptop and see if it gets pairs. I'll edit this post with whether it works or not.

Edit: I just now saw more pairs being proposed to you for the first time in a while. Perhaps you did something to unhang your clients?

Edit 2: I just now ran port 4000 on my upstairs laptop. I received a pair just fine. Instead of checking it every 15-20 mins. this time, I'll leave the server status window open so that I can see immediately if it goes down.

Last fiddled with by gd_barnes on 2009-06-08 at 07:29
#1038
May 2008
Wilmington, DE
2²·23·31 Posts
I'm getting pairs now. I never touched anything on this end. It just started processing again about 5 minutes ago (3:20AM Eastern).
#1039
May 2007
Kansas; USA
24107₈ Posts
Weird. I've had mine do that too, even with David's server after he's had it down for a short while for maintenance. They'll all of a sudden just start up after 2 hours even though he only had the server offline for 5-15 mins. or so.
#1040
May 2007
Kansas; USA
10311₁₀ Posts
AH HA!! I caught it this time. I don't get this pruning process thing. That's only supposed to be once/hour but it did it on the half hour. Right after it did it, the server went down again. This time, I immediately restarted it.

BTW, there is quite a lightning storm here. Don't be surprised if there's another outage. I won't be up but another hour so you may want to move your machines to another port for the night. Even though I restarted it again, it's possible that they're hung again after the server just stopped itself again for no reason 3 mins. ago.

Edit: I see it proposing pairs to you again and actually scrolling down the page, so hopefully things are OK again until it goes down again.

Edit 2: More strangeness: I see it do its pruning on port 8000 and it has no problems. I just had an idea: If this thing crashes at 3 AM CDT, I'm going to change the pruning period to 24 hours. Perhaps that will stop the problem until Max can look at it.

Last fiddled with by gd_barnes on 2009-06-08 at 07:39
#1041
May 2007
Kansas; USA
3·7·491 Posts
OH!! I get it now! The prune period is every 15 mins. OK, in 3 mins., I'll be able to watch it again. If it goes down, the prune period is going to 24 hours. The joblist and knpairs files don't need to be cleaned THAT often!

Edit: It just did it again; pruned and went down. Prune period is getting changed now.

Edit 2: Max, when you look into this issue later Monday, I'd suggest setting the prune period to 1 hour minimum. I'm now setting it to 24 hours. Ian, hopefully it will stop crashing now.

Edit 3: Server has been restarted with a prune period of 86400 secs. (24 hours). Hopefully that will be the last of the crashes for tonight. I'll watch it to make sure it doesn't prune shortly after 3 AM. I just now ran a client and it received a pair OK. We'll see if that holds in about a half hour.

Last fiddled with by gd_barnes on 2009-06-08 at 07:53
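For reference, the three prune periods discussed in this thread work out as follows in seconds. This is only a sketch of the arithmetic: the dictionary labels and the helper name `prunes_per_day` are made up for illustration and are not LLRnet settings.

```python
# Prune periods mentioned in this thread, expressed in seconds.
# The labels and the helper below are illustrative, not LLRnet settings.
PRUNE_PERIODS_SECS = {
    "15 minutes": 15 * 60,
    "1 hour": 60 * 60,
    "24 hours": 24 * 60 * 60,
}

SECS_PER_DAY = 24 * 60 * 60


def prunes_per_day(period_secs: int) -> int:
    """Return how many whole prune cycles fit into one day."""
    if period_secs <= 0:
        raise ValueError("prune period must be positive")
    return SECS_PER_DAY // period_secs


if __name__ == "__main__":
    for label, secs in PRUNE_PERIODS_SECS.items():
        print(f"{label}: {secs} secs, {prunes_per_day(secs)} prune(s) per day")
```

If every prune really does take the server down, going from 15 minutes to 24 hours cuts the number of chances for a crash from 96 per day to 1.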
#1042
May 2008
Wilmington, DE
2²·23·31 Posts
We may have 2 problems here. The first is the script either not running at all and hanging the server, or running and then hanging. The second is an IP address change. We had this scenario before when you had a power blip and the IP address changed. It seems that your ISP assigns a new address when you reconnect after a power blip. It then takes a few hours for all the DNS crap to catch up.

I usually flush the DNS anytime that port hangs. I did that when it first went down and never touched it again. Very strange.
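One way a client-side user could tell the IP-change case apart from a hung server is to compare what the server's hostname resolves to now against the address noted the last time things worked. A minimal sketch, assuming the server is reached by hostname; the function name `ip_changed` and the injectable `resolve` parameter are made up for illustration, and only the standard-library `socket.gethostbyname` call is real.

```python
import socket
from typing import Callable


def ip_changed(
    hostname: str,
    last_known_ip: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> bool:
    """Return True if the hostname now resolves to a different IPv4 address.

    `resolve` defaults to the system resolver, so a stale local DNS cache
    can still report the old address until it is flushed or expires.
    """
    return resolve(hostname) != last_known_ip
```

Call it with the server's hostname and the last address you recorded. If it returns False while the port is still hung, the trouble is more likely on the server side, or in a stale cache, than a fresh address change.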
#1043
May 2008
Wilmington, DE
2²×23×31 Posts
If the prune works on 8000 then the scripts for 4000 and 8000 must be different. That's where I would look first.

If we only prune once a day it will probably mess up the email notification process.
#1044
May 2007
Kansas; USA
2847₁₆ Posts
Beats me on the notification process. Hopefully Max will get to it before there is a problem there. David's servers prune every hour so I'm not sure why mine need to prune every 15 mins.

It's been about 25 mins. with no prune, no dropped server, and a nice scrolling of pairs being proposed to you. That, as well as one pair proposed to me, which the server also correctly showed as cancelled and handed out to you when I did the llrnet -c command on my client. My work is done here. lol Seriously, I'll check it once again after 3:30 AM CDT.

Edit: One more thing. Although I'm sure it's changed because I have had to recycle my router 2-3 times in the last month in addition to 2 short power outages, AFAIK we haven't had a problem with any changing IP address in a long time.

Last fiddled with by gd_barnes on 2009-06-08 at 08:18
#1045
A Sunny Moo
Aug 2007
USA (GMT-5)
3×2,083 Posts
Holy cow. Why do all these problems happen when I'm asleep?

First of all, regarding why prunePeriod was set to 15 minutes: I have the status page update every 15 minutes, and I figured it would be good to have it prune at least that often to ensure that knpairs.txt is always kept updated with the latest results. That way, the lowest outstanding n figure on the web page is always current. Thus, if someone's processing results for G4000 or G8000, and, say, they submit one or two last k/n pairs to the server to finish off a range, they don't have to wait an entire hour to proceed. That's turned out to be not as much of a big deal any more since now Gary is requesting results from Karsten a while after that range is done, rather than me doing the results as soon as I see the range complete.

As for the crashing LLRnet servers: I'm not sure why they're doing this. I brought it up in the forum a while back when I first ran into this. David, who'd encountered the same thing on his servers once or twice, said that it was probably a corrupted binary. Possibly the rather abrupt shutdown messed up the binary. (Just grasping at straws here--as I said earlier, I don't have much of a clue as to what's going on here. Also, assuming that my theory of a corrupted binary is correct, it could have even been corrupted during an earlier outage and just strung along with the loop thingy until now.)

The solution? I'm going to try swapping out all the servers' LLRnet binaries with a fresh one pulled from my computer. If the problem is indeed due to a corrupted binary, that should fix it. I'll also set prunePeriod back to 1 hour so that if it still is unstable, it won't crash as often. (And in case it still does, I'll put that loop thingy in place to band-aid it up.)

Max

Edit: Okay, servers swapped out, restarted w/loop thingy, and prunePeriod set to 1 hour.

Last fiddled with by mdettweiler on 2009-06-08 at 13:33
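The "loop thingy" itself isn't shown anywhere in this thread, so the following is only a guess at the general idea: relaunch the server whenever it exits. The function name `keep_running`, its parameters, and the pause between launches are all assumptions made for illustration, not Max's actual script.

```python
import subprocess
import sys
import time
from typing import Optional, Sequence


def keep_running(
    command: Sequence[str],
    pause_secs: float = 5.0,
    max_restarts: Optional[int] = None,
) -> int:
    """Run `command`, relaunching it each time it exits.

    With max_restarts=None the loop never ends (the band-aid behaviour).
    Otherwise it stops after the initial launch plus `max_restarts`
    relaunches. Returns the total number of launches.
    """
    launches = 0
    while True:
        launches += 1
        exit_code = subprocess.call(command)  # blocks until the process exits
        print(f"launch {launches} exited with code {exit_code}", file=sys.stderr)
        if max_restarts is not None and launches > max_restarts:
            return launches
        time.sleep(pause_secs)  # brief pause so a crash loop doesn't spin
```

Pass the server's launch command as a list of arguments, for example `keep_running(["path/to/server"])`, where the path is a placeholder. A loop like this only hides the crash, which is why it is a band-aid rather than a fix.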