mersenneforum.org  

Prime Search Projects > No Prime Left Behind

2009-06-08, 07:09   #1035
MyDogBuster
May 2008
Wilmington, DE
2²×709 Posts

Quote:
That's odd. It just stopped itself after doing what appears to be its hourly pruning process. I just now started it again. It's now proposing pairs to you again.
I'm not getting a thing.

2009-06-08, 07:23   #1036
gd_barnes
May 2007
Kansas; USA
2793₁₆ Posts

Well, it appears to keep sending you different pairs, but once again it stopped after its hourly pruning process at 2 AM, just like it did at midnight. (It didn't do the 1 AM one.) I've restarted it once again.

I checked it more closely this time and it shows that it is sending you different pairs than before. You must be getting something. Isn't there some command that you've run on your end before? Can you try restarting the affected machines? That's the only thing I can suggest at this point. Something is hung somewhere.

This may be something that Max is going to have to fix. He runs a weird loop script that keeps restarting the server if it stops right after an outage like this.

Max, what can be done to fix this permanently instead of continuing to patch it with this looping script that you run?
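The thread never shows Max's actual loop script. What follows is only a minimal Python sketch of the general idea: a watchdog that relaunches the server whenever its process exits. The function name `run_with_restart`, its parameters, and the command it launches are hypothetical placeholders, not LLRnet's own tooling.

```python
import subprocess
import sys
import time


def run_with_restart(cmd, max_restarts=None, delay_s=5.0):
    """Hypothetical watchdog: run cmd, and each time it exits, wait
    delay_s seconds and start it again.

    max_restarts=None loops forever (the usual "band-aid" mode). A number
    caps the restarts so the function returns how many it performed.
    """
    restarts = 0
    while True:
        # Blocks until the (placeholder) server process exits for any reason.
        result = subprocess.run(cmd)
        print(f"server exited with code {result.returncode}", flush=True)
        if max_restarts is not None and restarts >= max_restarts:
            return restarts
        restarts += 1
        time.sleep(delay_s)  # brief pause so a crash loop doesn't spin the CPU


if __name__ == "__main__":
    # Demo with a stand-in "server" that exits immediately.
    fake_server = [sys.executable, "-c", "print('pretend server ran')"]
    print("restarts:", run_with_restart(fake_server, max_restarts=2, delay_s=0.1))
```

A loop like this only hides the symptom. It keeps the port answering but does nothing about why the server stops, which is why a permanent fix is still needed.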


Gary

2009-06-08, 07:24   #1037
gd_barnes
May 2007
Kansas; USA
3·11·307 Posts

I'm going to try running port 4000 on my upstairs Windows laptop and see if it gets pairs. I'll edit this post with whether it works or not.

Edit: I just now saw more pairs being proposed to you for the 1st time in a while. Perhaps you did something to unhang your clients?

Edit 2: I just now ran port 4000 on my upstairs laptop. I received a pair just fine. Instead of checking it every 15-20 mins. this time, I'll leave the server status window open so that I can see if it goes down immediately.

Last fiddled with by gd_barnes on 2009-06-08 at 07:29

2009-06-08, 07:28   #1038
MyDogBuster
May 2008
Wilmington, DE
2²×709 Posts

I'm getting pairs now. I never touched anything on this end. It just started processing again about 5 minutes ago (3:20 AM Eastern).

2009-06-08, 07:30   #1039
gd_barnes
May 2007
Kansas; USA
10131₁₀ Posts

Weird. I've had mine do that too, even with David's server after he's had it down for a short while for maintenance. They'll suddenly start up again after 2 hours even though he only had the server offline for 5-15 minutes or so.

2009-06-08, 07:35   #1040
gd_barnes
May 2007
Kansas; USA
3·11·307 Posts

AH HA!! I caught it this time. I don't get this pruning process thing. That's only supposed to be once/hour but it did it on the half hour. Right after it did it, the server went down again. This time, I immediately restarted it.

BTW, there is quite a lightning storm here. Don't be surprised if there's another outage.

I won't be up but another hour so you may want to move your machines to another port for the night. Even though I restarted it again, it's possible that they're hung again after the server just stopped itself again for no reason 3 mins. ago.

Edit: I see it proposing pairs to you again and actually scrolling down the page, so hopefully things are OK until it goes down again.

Edit 2: More strangeness: I see it do its pruning on port 8000 with no problems. I just had an idea: if this thing crashes at 3 AM CDT, I'm going to change the pruning period to 24 hours. Perhaps that will stop the problem until Max can look at it.

Last fiddled with by gd_barnes on 2009-06-08 at 07:39

2009-06-08, 07:43   #1041
gd_barnes
May 2007
Kansas; USA
3×11×307 Posts

OH!! I get it now! The prune period is every 15 mins. OK, in 3 mins., I'll be able to watch it again. If it goes down, the prune period is going to 24 hours. The joblist and knpairs files don't need to be cleaned THAT often!

Edit: It just did it again; pruned and went down. Prune period is getting changed now.

Edit 2: Max, when you look into this issue later Monday, I'd suggest setting the prune period to 1 hour minimum. I'm now setting it to 24 hours. Ian, hopefully it will stop crashing now.

Edit 3: Server has been restarted with a prune period of 86400 secs. (24 hours). Hopefully that will be the last of the crashes for tonight. I'll watch it to make sure it doesn't prune shortly after 3 AM. I just now ran a client and it received a pair OK. We'll see if that holds in about a half hour.
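Here is a quick Python calculation of prune passes per day for the three prune periods mentioned in this thread. It assumes that each prune pass is an opportunity for the crash, so fewer passes should mean fewer crashes. The observations above suggest this but do not prove it. `prunes_per_day` is a name made up for this sketch.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400, the value used for the 24-hour setting


def prunes_per_day(prune_period_s: int) -> float:
    """Number of prune passes a server makes in one day at the given period."""
    return SECONDS_PER_DAY / prune_period_s


for label, period_s in [("15 min", 15 * 60), ("1 hour", 60 * 60), ("24 hours", 86400)]:
    print(f"{label:>8} ({period_s:>5} s): {prunes_per_day(period_s):g} prunes/day")
```

This prints 96, 24, and 1 passes per day. The 24-hour setting therefore trades crash exposure for a knpairs.txt that is updated far less often.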

Last fiddled with by gd_barnes on 2009-06-08 at 07:53

2009-06-08, 07:44   #1042
MyDogBuster
May 2008
Wilmington, DE
2²×709 Posts

We may have 2 problems here. The first is the script: either it doesn't run and the server hangs, or it runs and then hangs. The second is an IP address change. We had this scenario before when you had a power blip and the IP address changed. It seems that your ISP must assign a new address when you reconnect after a power blip. It then takes a few hours for all the DNS crap to catch up.

I usually flush the DNS anytime that port hangs. I did that when it first went down and never touched it again. Very strange.
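If the IP-address theory is right, one quick check is whether the server's hostname now resolves to a different address than before. Below is a minimal Python sketch using the standard `socket.gethostbyname` call. The helper names are hypothetical. The check asks the local resolver, so a stale local DNS cache can still report the old address. That cache is what flushing clears, e.g. with `ipconfig /flushdns` on Windows.

```python
import socket


def resolve_ipv4(host: str) -> str:
    """Return the IPv4 address the local resolver currently gives for host."""
    return socket.gethostbyname(host)


def ip_changed(host: str, last_known_ip: str) -> bool:
    """True if host now resolves to something other than last_known_ip."""
    return resolve_ipv4(host) != last_known_ip
```

Usage would look like `ip_changed("llrnet.example.org", "203.0.113.7")`, with the placeholder hostname and address replaced by the real server name and its previously known address. An unresolvable name raises `socket.gaierror`.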

2009-06-08, 07:51   #1043
MyDogBuster
May 2008
Wilmington, DE
2²·709 Posts

If the prune works on 8000, then the scripts for 4000 and 8000 must be different. That's where I would look first.

If we only prune once a day it will probably mess up the email notification process.

2009-06-08, 08:16   #1044
gd_barnes
May 2007
Kansas; USA
3·11·307 Posts

Beats me on the notification process. Hopefully Max will get to it before there is a problem there. David's servers prune every hour so I'm not sure why mine need to prune every 15 mins.

It's been about 25 mins. with no prune, no dropped server, and a nice scrolling of pairs being proposed to you. One pair was also proposed to me, and when I ran the llrnet -c command on my client, the server correctly showed it as cancelled and handed it out to you.

My work is done here. lol Seriously, I'll check it once again after 3:30 AM CDT.

Edit: One more thing. Although I'm sure my IP address has changed, since I've had to recycle my router 2-3 times in the last month in addition to 2 short power outages, AFAIK we haven't had a problem with a changing IP address in a long time.

Last fiddled with by gd_barnes on 2009-06-08 at 08:18

2009-06-08, 13:23   #1045
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)
1100001101001₂ Posts

Holy cow. Why do all these problems happen when I'm asleep?

First of all, regarding why prunePeriod was set to 15 minutes: I have the status page update every 15 minutes, and I figured it would be good to have it prune at least that often to ensure that knpairs.txt is always kept updated with the latest results. That way, the lowest outstanding n figure on the web page is always current. Thus, if someone's processing results for G4000 or G8000 and, say, they submit one or two last k/n pairs to the server to finish off a range, they don't have to wait an entire hour to proceed. That has turned out to matter less now that Gary requests results from Karsten a while after a range is done, rather than me doing the results as soon as I see the range complete.
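The reasoning about the prune period can be made concrete with a little arithmetic. Assume pruning and the status-page refresh run on independent fixed schedules. A result submitted just after a prune then waits up to one prune period to reach knpairs.txt. It waits up to one more status-page period before the page shows it. The Python sketch below computes that upper bound for a 15-minute status page. The function name is hypothetical.

```python
STATUS_PERIOD_MIN = 15  # the status page refresh interval described above


def worst_case_staleness_min(prune_period_min: float,
                             status_period_min: float = STATUS_PERIOD_MIN) -> float:
    """Upper bound on the delay before a submitted result appears on the
    status page: up to one prune period until knpairs.txt is updated,
    plus up to one status period until the page is regenerated."""
    return prune_period_min + status_period_min


for prune_min in (15, 60, 24 * 60):
    print(f"prune every {prune_min:>4} min -> page may lag up to "
          f"{worst_case_staleness_min(prune_min):g} min")
```

That gives bounds of 30, 75, and 1455 minutes. This shows why a 15-minute prune kept the lowest outstanding n current, while a 1-hour prune is a reasonable compromise.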

As for the crashing LLRnet servers: I'm not sure why they're doing this. I brought it up in the forum a while back when I first ran into this. David, who'd encountered the same thing on his servers once or twice, said that it was probably a corrupted binary. Possibly the rather abrupt shutdown messed up the binary. (Just grasping at straws here--as I said earlier, I don't have much of a clue to what's going on here. Also, assuming that my theory of a corrupted binary is correct, it could have even been corrupted during an earlier outage and just strung along with the loop thingy until now.)

The solution? I'm going to try swapping out all the servers' LLRnet binaries with a fresh one pulled from my computer. If the problem is indeed due to a corrupted binary, that should fix it. I'll also set prunePeriod back to 1 hour so that if it still is unstable, it won't crash as often. (And in case it still does, I'll put that loop thingy in place to band-aid it up.)

Max

Edit: Okay, servers swapped out, restarted w/loop thingy, and prunePeriod set to 1 hour.

Last fiddled with by mdettweiler on 2009-06-08 at 13:33

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.