Well, crap. I just connected to port G8000 10 mins. ago since David's servers weren't working on my end. I only got about 10-15 pairs but several were older, i.e. about n=957K. But why would it have expired them and handed them to me? Max, any thoughts?
Karsten, since I'm able to connect to David's machines now, I'll stop my port G8000 connections and return those pairs to the server. You might try re-sending them back to the server after I edit this post to say that I have returned them. What a friggin mess! Gary |
[quote=kar_bon;186113]GB7000 and GB8000 (only tested those) are online again.
BUT: for port 7000 i got 2 rejected pairs (perhaps more in the next 15 minutes) but this shouldn't happen, because those pairs were assigned max. 1 hour ago! the same for port 8000: and here it's quite heavier! many pairs at n=969k! so much time lost! could someone explain this?![/quote] I think that's because the servers don't factor in the fact that they were down when determining whether a pair should expire. They only take the raw assignment time and subtract it from the current time. Thus, if the k/n pair was assigned more than 24 hours ago, then it will be expired, regardless of server downtime. Generally, if a server is down for a long period of time, we either a) temporarily change the jobMaxTime to something longer to avoid such cancellations; or b) tell people to avoid grabbing new work from the server until everyone's had the chance to return their results (we'll usually do this if, say, just one person has a large # of k/n pairs that need to be returned). @Gary: Ah, that explains it. When you pulled down a couple of k/n pairs to test the servers, the server expired exactly that many k/n pairs of Karsten's that were more than 24 hours old, and gave them to you. |
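The expiry rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual LLRnet server code: the function names, the 1-day `JOB_MAX_TIME`, and the timestamps are all hypothetical. It shows how a raw wall-clock check expires a pair that a downtime-aware check would have kept.

```python
from datetime import datetime, timedelta

# Assumed value: the 1-day limit discussed in this thread.
JOB_MAX_TIME = timedelta(days=1)


def is_expired(assigned_at, now, job_max_time=JOB_MAX_TIME):
    """Raw wall-clock check: server downtime is not subtracted."""
    return now - assigned_at > job_max_time


def is_expired_downtime_aware(assigned_at, now, downtime, job_max_time=JOB_MAX_TIME):
    """Hypothetical variant that credits the downtime back to the client."""
    return (now - assigned_at) - downtime > job_max_time


# Hypothetical timestamps, chosen only to illustrate the effect.
assigned_at = datetime(2009, 8, 16, 3, 30, 0)
now = datetime(2009, 8, 17, 4, 0, 0)  # 24.5 hours after assignment
downtime = timedelta(hours=1)         # server unreachable for one hour

raw_result = is_expired(assigned_at, now)
aware_result = is_expired_downtime_aware(assigned_at, now, downtime)
print(raw_result, aware_result)
```

With these made-up numbers the raw check expires the pair while the downtime-aware check does not, which is the kind of gap that temporarily raising jobMaxTime is meant to cover.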
n=957k? on port GB8000?
i processed that n-range on 2009-08-07! so those pairs must have been handed out several times before you got them now! another thing, in the rejected file: [code] user=kar_bon [2009-08-17 04:06:20] 327*2^969546-1 is not prime. Res64: 0D67CB813245264B Time : 66189.0 sec. [/code] if the jobMaxTime is 1 day (86400 secs), why did i get this pair rejected after 66000 secs? where are the Gremlins in here? |
[quote=mdettweiler;186116]I think that's because the servers don't factor in the fact that they were down when determining whether a pair should expire. They only look at the raw times and subtract them from the current time. Thus, if the k/n pair was assigned more than 24 hours ago, then it will be expired, regardless of server downtime.
Generally, if a server is down for a long period of time, we either a) temporarily change the jobMaxTime to something longer to avoid such cancellations; or b) tell people to avoid grabbing new work from the server until everyone's had the chance to return their results (we'll usually do this if, say, just one person has a large # of k/n pairs that need to be returned). @Gary: Ah, that explains it. When you pulled down a couple of k/n pairs to test the servers, the server expired exactly that many k/n pairs of Karsten's that were more than 24 hours old, and gave them to you.[/quote] HUH?????????????????????????????? 1. My servers have a JobMaxTime of 3 days. 2. My servers were offline for a grand total of ONE hour. Another explanation is in order please. Thanks. Gary |
[quote=kar_bon;186118]n=957k? on port GB8000?
i processed that n-range on 2009-08-07! so those pairs must have been handed out several times before you got them now! another thing, in the rejected file: [code] user=kar_bon [2009-08-17 04:06:20] 327*2^969546-1 is not prime. Res64: 0D67CB813245264B Time : 66189.0 sec. [/code] if the jobMaxTime is 1 day (86400 secs), why did i get this pair rejected after 66000 secs? where are the Gremlins in here?[/quote] Uh...according to the status page on [URL]http://nplb-gb1.no-ip.org/llrnet/[/URL], G8000's lowest outstanding n is around 957K. That sounds about right given what you're describing. |
[quote=gd_barnes;186120]HUH??????????????????????????????
1. My servers have a JobMaxTime of 3 days. 2. My servers were offline for a grand total of ONE hour. Another explanation is in order please. Thanks. Gary[/quote] Hmm...I see. First of all, your servers have been at 1 day for a while (we set them to that a while back for reasons I don't remember off the top of my head, and we never bothered to set them back). As for the servers being off for one hour, if Karsten had cached pairs from over 24 hours ago, that 1 hour might have been just enough to throw a wrench into his plans to return them before the deadline. |
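To make the "1 hour might have been just enough" point concrete, here is a rough back-of-the-envelope sketch. The 86400-second limit and the 66189-second test time come from the rejected-file entry quoted earlier in the thread; the 5-hour cache wait and the helper name `returned_in_time` are assumptions made up purely for illustration.

```python
JOB_MAX_TIME = 86400   # 1 day in seconds, the limit discussed above
TEST_TIME = 66189.0    # seconds, from the rejected result for 327*2^969546-1


def returned_in_time(cache_wait, test_time, outage, job_max_time=JOB_MAX_TIME):
    """True if the result can be returned before the deadline.

    cache_wait: seconds the pair sat in the client's cache before testing
    outage:     seconds the client had to wait because the server was down
    """
    return cache_wait + test_time + outage <= job_max_time


# Time left over for everything other than the test itself.
slack = JOB_MAX_TIME - TEST_TIME

# Hypothetical: the pair waited 5 hours in the cache before testing began.
cache_wait = 5 * 3600

ok_without_outage = returned_in_time(cache_wait, TEST_TIME, outage=0)
ok_with_outage = returned_in_time(cache_wait, TEST_TIME, outage=3600)
print(slack, ok_without_outage, ok_with_outage)
```

Under these assumed numbers the slack is only about 5.6 hours, so a pair that sat cached for 5 hours would have made the deadline without the outage but missed it with a 1-hour outage.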
Karsten,
I've now returned about 10 unprocessed pairs to port G8000. I ended up returning residues on a total of 4 of them before stopping. But I'm still baffled...all the ones I processed were n=~970K. I haven't a clue as to what is going on. I KNOW I had some pairs around n=~957K in my queue. Perhaps it handed out n=~970K, then n=~957K, then more n=~970K. Heck, I don't know. It doesn't matter. There's nothing we can do about it now. If you can try re-returning your rejected results to the server, go for it. I swear, I'm getting just "this" close to running this entire project with manual files. Gary |
[quote=mdettweiler;186123]Hmm...I see. First of all, your servers have been at 1 day for a while (we set them to that a while back for reasons I don't remember off the top of my head, and we never bothered to set them back). As for the servers being off for one hour, if Karsten had cached pairs from over 24 hours ago, that 1 hour might have been just enough to throw a wrench into his plans to return them before the deadline.[/quote]
You told me they were back at 3 days again quite a while ago after everyone had this big argument over that. The agreement was that David's would be 1 day and mine 3 days except for IB9000 that was put at 2 days. Oh well, never mind. In the future, I'll check the JobMaxTime myself and set it at whatever I deem appropriate and simply let everyone know what that is. The democratic process on that has not worked at all. |
the pairs from GB7000 were assigned to me about 1 hour ago, just before the outage!
pairs at n=117k (like the two rejected ones) are processed in 30-40 seconds and my WUCacheSize = 3! so this shouldn't happen! if Gary got some pairs from GB7000, there must be something wrong with joblist.txt not being saved regularly. and the pair (345 957466) from port GB8000: i don't know why this happens. this pair is not in the results file! i processed that n-range on 2009-08-06 (see the results file), so why wasn't this pair handed out earlier? so please look in the joblist to see who has this pair assigned! it's still in the status report as the first unprocessed one! |
[quote=gd_barnes;186124]Karsten,
I've now returned about 10 unprocessed pairs to port G8000. I ended up returning residues on a total of 4 of them before stopping. But I'm still baffled...all the ones I processed were n=~970K. I haven't a clue as to what is going on. I KNOW I had some pairs around n=~957K in my queue. Perhaps it handed out n=~970K, then n=~957K, then more n=~970K. Heck, I don't know. It doesn't matter. There's nothing we can do about it now. If you can try re-returning your rejected results to the server, go for it. I swear, I'm getting just "this" close to running this entire project with manual files. Gary[/quote] I think it had something to do with the fact that when MooMoo unreserved the various reservations he had at the tail end of the mini-drive, we loaded some stuff in a funny order. I don't exactly remember how we did it all at the time, but at any rate, it seems that's the explanation for it. Nothing to worry about, everything should straighten itself out in the end. :smile: |
[quote=kar_bon;186132]the pairs from GB7000 were assigned to me about 1 hour ago, just before the outage!
pairs at n=117k (like the two rejected ones) are processed in 30-40 seconds and my WUCacheSize = 3! so this shouldn't happen! if Gary got some pairs from GB7000, there must be something wrong with joblist.txt not being saved regularly. and the pair (345 957466) from port GB8000: i don't know why this happens. this pair is not in the results file! i processed that n-range on 2009-08-06 (see the results file), so why wasn't this pair handed out earlier?[/quote] Regarding the G7000 pairs, my guess is that due to the sudden interruption of power from the outage, the server didn't have the chance to update joblist.txt with the latest happenings, and thus it lost the last minute or two of data. That would quite believably cause a few rejected pairs. No big deal, they're small enough that hardly any work was wasted. As for G8000, see my last message for an explanation of that. |