2009-06-08, 13:23   #1045
A Sunny Moo
mdettweiler
Aug 2007

Holy cow. Why do all these problems happen when I'm asleep?

First of all, regarding why prunePeriod was set to 15 minutes: the status page updates every 15 minutes, and I figured it would be good to have the server prune at least that often so that knpairs.txt always reflects the latest results. That way, the lowest outstanding n figure on the web page is always current. Thus, if someone is processing results for G4000 or G8000 and, say, the last one or two k/n pairs that finish off a range get submitted to the server, they don't have to wait an entire hour to proceed. That has turned out to matter less now, since Gary requests results from Karsten some time after a range is done, rather than me processing the results as soon as I see the range complete.

As for the crashing LLRnet servers: I'm not sure why they're doing this. I brought it up in the forum a while back when I first ran into it. David, who had encountered the same thing on his servers once or twice, said it was probably a corrupted binary. Possibly the rather abrupt shutdown corrupted the binary. (I'm just grasping at straws here; as I said earlier, I don't have much of a clue what's going on. Also, if my corrupted-binary theory is correct, the binary could even have been corrupted during an earlier outage and kept limping along thanks to the loop thingy until now.)
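One way to test the corrupted-binary theory is to compare each server's binary against a known-good copy by checksum: any file whose SHA-256 digest differs from the reference has been altered. This is a minimal sketch, assuming the binaries are reachable as local files; the paths passed in are hypothetical placeholders, not the actual server layout.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with Path(path).open("rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_mismatched_binaries(reference, candidates):
    """Return the candidate paths whose contents differ from the reference binary."""
    ref_digest = sha256_of(reference)
    return [p for p in candidates if sha256_of(p) != ref_digest]
```

For example, `find_mismatched_binaries("good/llrserver", ["srv1/llrserver", "srv2/llrserver"])` (hypothetical paths) would list only the copies that don't match the known-good binary.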

The solution? I'm going to try swapping out all the servers' LLRnet binaries for a fresh one pulled from my computer. If the problem is indeed due to a corrupted binary, that should fix it. I'll also set prunePeriod back to 1 hour so that, if the server is still unstable, it won't crash as often. (And in case it still crashes, I'll put that loop thingy in place as a band-aid.)
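The "loop thingy" amounts to a wrapper that restarts the server whenever it exits. Here is a minimal Python sketch of that idea, assuming the server runs as a foreground process; the command, the 5-second delay, and the optional restart cap are hypothetical choices, not the actual script used on these servers.

```python
import subprocess
import sys
import time


def run_with_restarts(cmd, restart_delay=5.0, max_restarts=None):
    """Run cmd; whenever it exits, wait restart_delay seconds and start it again.

    With max_restarts=None this loops forever. Otherwise it stops after
    max_restarts restarts and returns how many restarts were performed.
    """
    restarts = 0
    while True:
        # Start the server and block until it exits (crash or clean shutdown).
        exit_code = subprocess.call(cmd)
        print(f"server exited with code {exit_code}", file=sys.stderr)
        if max_restarts is not None and restarts >= max_restarts:
            return restarts
        restarts += 1
        # Brief pause so a server that dies immediately doesn't spin the CPU.
        time.sleep(restart_delay)
```

Usage would look like `run_with_restarts(["./llrserver"])`, where `./llrserver` is a placeholder for the real LLRnet server command. Note that this only masks crashes; it doesn't address their cause.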


Edit: Okay, binaries swapped out, servers restarted with the loop thingy, and prunePeriod set to 1 hour.

Last fiddled with by mdettweiler on 2009-06-08 at 13:33
