2009-08-19, 23:21   #1200
A Sunny Moo
mdettweiler
Aug 2007

1100001101001₂ Posts

Originally Posted by gd_barnes
Here's why: Likely the originals and the duplicates were handed out at about the same time.
Hmm...I wouldn't be so sure about that. What's more likely is a situation like this:

1) server hands out original to client A
2) client A doesn't return in time for the jobMaxTime
3) server hands out duplicate to client B
4) client A returns a little late, but before client B returns (results are credited to client B)
5) client B returns and is rejected since the server no longer has information on the tests

As you can see, 1) and 3) would be quite a ways apart. Of course, this is only one of a number of situations that could lead to rejected results, but it is the most common. Anyway, long story short, it's not a given that the duplicates and originals were handed out at the same time.

Even in the case of the power outage, the two handouts would have been some time apart: one right before the outage, and the other soon after the servers came back online.
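
To make steps 1) to 5) concrete, here is a rough Python sketch of that sequence. It is a toy model, not the actual server code: the class `ToyServer`, its methods, the test name `t1`, and the time units are all made up for illustration. The only things taken from the steps above are the `jobMaxTime` expiry and the rule that a returned result is credited to whichever client currently holds the test.

```python
class ToyServer:
    """Toy model of the hand-out/return sequence (hypothetical, not real server code)."""

    def __init__(self, job_max_time):
        self.job_max_time = job_max_time  # stands in for jobMaxTime
        self.assigned = {}                # test -> (client, hand-out time)
        self.done = set()                 # tests whose results were accepted

    def request(self, test, client, now):
        """Hand out `test` unless it is finished or still reserved."""
        if test in self.done:
            return False
        holder = self.assigned.get(test)
        if holder is not None and now - holder[1] < self.job_max_time:
            return False  # current holder has not timed out yet
        self.assigned[test] = (client, now)
        return True

    def submit(self, test, client, now):
        """Accept a result; credit goes to the current holder (assumed rule)."""
        holder = self.assigned.pop(test, None)
        if holder is None:
            return "rejected"  # server no longer has information on the test
        self.done.add(test)
        return "credited to " + holder[0]


server = ToyServer(job_max_time=100)

got_a = server.request("t1", "A", now=0)        # 1) original goes to client A
too_early = server.request("t1", "B", now=50)   # A has not timed out yet
got_b = server.request("t1", "B", now=150)      # 2) + 3) A expired, duplicate to B
result_a = server.submit("t1", "A", now=160)    # 4) A returns late, before B
result_b = server.submit("t1", "B", now=200)    # 5) B returns and is rejected

handout_gap = 150 - 0  # time between the original and the duplicate handout
print(got_a, too_early, got_b, result_a, result_b, handout_gap)
```

In this model the duplicate can never go out sooner than `job_max_time` after the original, which is the point: the two handouts are at least that far apart.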

One more question: Will getting David's code on to my servers mean that we can avoid the "loop thing" code to restart the servers? If so, that will prevent quite a bit of this "after outage" multiple crashes that we keep encountering.
I don't know. David, have you ever had problems with servers crashing, especially after things like power outages? (Not that you usually encounter those since you have a UPS, but...) If so, how do you deal with them?
