mersenneforum.org  

mersenneforum.org > Prime Search Projects > No Prime Left Behind
2009-08-17, 21:40   #1156
gd_barnes
May 2007
Kansas; USA

23621₈ Posts

Well, crap. I just connected to port G8000 10 mins. ago since David's servers weren't working on my end. I only got about 10-15 pairs but several were older, i.e. about n=957K. But why would it have expired them and handed them to me? Max, any thoughts?

Karsten, since I'm able to connect to David's machines now, I'll stop my port G8000 connections and return those pairs to the server. You might try re-sending them to the server after I edit this post to say that I have returned them.

What a friggin mess!


Gary
2009-08-17, 21:46   #1157
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)

14151₈ Posts

Quote:
Originally Posted by kar_bon
GB7000 and GB8000 (only tested those) are online again.

BUT:

For port 7000 I got 2 rejected pairs (perhaps more in the next 15 minutes), but this shouldn't happen, because those pairs were assigned at most 1 hour ago!

The same for port 8000, and here it's much worse! Many pairs at n=969k! So much time lost!

Could someone explain this?!

I think that's because the servers don't factor in the fact that they were down when determining whether a pair should expire. They only look at the raw times and subtract them from the current time. Thus, if the k/n pair was assigned more than 24 hours ago, then it will be expired, regardless of server downtime.
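A minimal sketch of the expiry rule as described here, assuming a 24-hour jobMaxTime and modelling times as Python `datetime` values. The function names and the constant are hypothetical; this is not the LLRnet server's actual code. The second function is a downtime-aware alternative that, per the explanation in this post, the servers do not implement.

```python
from datetime import datetime, timedelta

# Assumed 1-day jobMaxTime (hypothetical constant, not LLRnet's config syntax).
JOB_MAX_TIME = timedelta(hours=24)


def is_expired(assigned_at: datetime, now: datetime,
               job_max_time: timedelta = JOB_MAX_TIME) -> bool:
    """Naive rule: raw wall-clock age only, so server downtime still counts."""
    return now - assigned_at > job_max_time


def is_expired_downtime_aware(assigned_at: datetime, now: datetime,
                              downtime: timedelta,
                              job_max_time: timedelta = JOB_MAX_TIME) -> bool:
    """Hypothetical alternative: do not charge known server downtime to the client."""
    return (now - assigned_at) - downtime > job_max_time


if __name__ == "__main__":
    assigned = datetime(2009, 8, 16, 21, 0)  # illustrative timestamp
    # The result comes back 24.5 h later because a 1-hour outage delayed the upload.
    returned = assigned + timedelta(hours=24, minutes=30)
    outage = timedelta(hours=1)
    print(is_expired(assigned, returned))                         # True
    print(is_expired_downtime_aware(assigned, returned, outage))  # False
    # Temporarily raising jobMaxTime also avoids the cancellation.
    print(is_expired(assigned, returned, timedelta(hours=48)))    # False
```

The naive check is what makes a short outage dangerous: any pair already close to its deadline when the server goes down is expired as soon as the server comes back.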

Generally, if a server is down for a long period of time, we either a) temporarily change the jobMaxTime to something longer to avoid such cancellations; or b) tell people to avoid grabbing new work from the server until everyone's had the chance to return their results (we'll usually do this if, say, just one person has a large # of k/n pairs that need to be returned).

@Gary: Ah, that explains it. When you pulled down a couple of k/n pairs to test the servers, the server expired exactly that many k/n pairs of Karsten's that were more than 24 hours old, and gave them to you.
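The behaviour described here (stale assignments reclaimed only when someone asks for work, one per requested slot, and handed straight to the requester) can be sketched as a toy server. This is an illustrative model under that assumption, not LLRnet's implementation; the classes, their fields and the placeholder k/n values are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Pair:
    k: int
    n: int
    user: Optional[str] = None
    assigned_at: Optional[float] = None  # seconds on the server clock


@dataclass
class ToyServer:
    job_max_time: float                                 # seconds
    fresh: deque = field(default_factory=deque)         # never-assigned pairs
    assigned: List[Pair] = field(default_factory=list)  # outstanding pairs

    def request(self, user: str, count: int, now: float) -> List[Pair]:
        out: List[Pair] = []
        # Lazy expiry: reclaim stale pairs only to fill this request.
        for p in list(self.assigned):
            if len(out) == count:
                break
            if now - p.assigned_at > self.job_max_time:
                self.assigned.remove(p)
                out.append(p)
        # Top up with fresh work if too few pairs had expired.
        while len(out) < count and self.fresh:
            out.append(self.fresh.popleft())
        for p in out:
            p.user, p.assigned_at = user, now
            self.assigned.append(p)
        return out


if __name__ == "__main__":
    placeholders = [Pair(k=3 + 2 * i, n=970_000 + i) for i in range(5)]
    server = ToyServer(job_max_time=86_400, fresh=deque(placeholders))
    server.request("kar_bon", 3, now=0)               # cached by one client
    got = server.request("gd_barnes", 2, now=90_000)  # more than 1 day later
    print([(p.k, p.n, p.user) for p in got])
    # [(3, 970000, 'gd_barnes'), (5, 970001, 'gd_barnes')]
```

In this model the second client receives exactly as many of the first client's stale pairs as it asked for, and the fresh pairs are left untouched.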

Last fiddled with by mdettweiler on 2009-08-17 at 21:46
2009-08-17, 21:47   #1158
kar_bon
Mar 2006
Germany

2⁴·5²·7 Posts

n=957k? On port GB8000?

I processed that n-range on 2009-08-07! So those pairs must have been handed out several times before you got them now!

Another thing:
In the rejected file:
Code:
user=kar_bon
[2009-08-17 04:06:20]
327*2^969546-1 is not prime.  Res64: 0D67CB813245264B  Time : 66189.0 sec.
If the jobMaxTime is 1 day (86400 secs), why did I get this pair rejected after 66000 secs?
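As a rough check on these numbers, assuming (per the explanations later in this thread) that the deadline is counted from the moment of assignment and that the `Time` field covers only the LLR test itself, the slack left for the pair to sit in the cache or wait out an outage is:

```latex
86400\,\text{s} - 66189\,\text{s} = 20211\,\text{s} \approx 5.6\,\text{h}
```

So a pair that waited more than about 5.6 hours between assignment and the start of its test, or whose finished result was delayed by that much before reaching the server, would exceed a 1-day jobMaxTime.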

Where are the gremlins in here?

Last fiddled with by kar_bon on 2009-08-17 at 21:49
2009-08-17, 21:52   #1159
gd_barnes
May 2007
Kansas; USA

7×1,447 Posts

Quote:
Originally Posted by mdettweiler
I think that's because the servers don't factor in the fact that they were down when determining whether a pair should expire. They only look at the raw times and subtract them from the current time. Thus, if the k/n pair was assigned more than 24 hours ago, then it will be expired, regardless of server downtime.

Generally, if a server is down for a long period of time, we either a) temporarily change the jobMaxTime to something longer to avoid such cancellations; or b) tell people to avoid grabbing new work from the server until everyone's had the chance to return their results (we'll usually do this if, say, just one person has a large # of k/n pairs that need to be returned).

@Gary: Ah, that explains it. When you pulled down a couple of k/n pairs to test the servers, the server expired exactly that many k/n pairs of Karsten's that were more than 24 hours old, and gave them to you.


HUH??????????????????????????????

1. My servers have a JobMaxTime of 3 days.

2. My servers were offline for a grand total of ONE hour.


Another explanation is in order please. Thanks.


Gary
2009-08-17, 21:52   #1160
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)

3·2,083 Posts

Quote:
Originally Posted by kar_bon
n=957k? On port GB8000?

I processed that n-range on 2009-08-07! So those pairs must have been handed out several times before you got them now!

Another thing:
In the rejected file:
Code:
user=kar_bon
[2009-08-17 04:06:20]
327*2^969546-1 is not prime.  Res64: 0D67CB813245264B  Time : 66189.0 sec.
If the jobMaxTime is 1 day (86400 secs), why did I get this pair rejected after 66000 secs?

Where are the gremlins in here?

Uh...according to the status page on http://nplb-gb1.no-ip.org/llrnet/, G8000's lowest outstanding n is around 957K. That sounds about right given what you're describing.
2009-08-17, 21:54   #1161
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)

3×2,083 Posts

Quote:
Originally Posted by gd_barnes
HUH??????????????????????????????

1. My servers have a JobMaxTime of 3 days.

2. My servers were offline for a grand total of ONE hour.


Another explanation is in order please. Thanks.


Gary

Hmm...I see. First of all, your servers have been at 1 day for a while (we set them to that a while back for reasons I don't remember off the top of my head, and we never bothered to set them back). As for the servers being off for one hour, if Karsten had cached pairs from over 24 hours ago, that 1 hour might have been just enough to throw a wrench into his plans to return them before the deadline.
2009-08-17, 21:57   #1162
gd_barnes
May 2007
Kansas; USA

7×1,447 Posts

Karsten,

I've now returned about 10 unprocessed pairs to port G8000. I ended up returning residues on a total of 4 of them before stopping.

But I'm still baffled...all the ones I processed were n=~970K. I haven't a clue as to what is going on. I KNOW I had some pairs around n=~957K in my queue. Perhaps it handed out n=~970K, then n=~957K, then more n=~970K. Heck, I don't know. It doesn't matter. There's nothing we can do about it now.

If you can try re-returning your rejected results to the server, go for it.

I swear, I'm getting just "this" close to running this entire project with manual files.


Gary

Last fiddled with by gd_barnes on 2009-08-17 at 22:00
2009-08-17, 22:01   #1163
gd_barnes
May 2007
Kansas; USA

23621₈ Posts

Quote:
Originally Posted by mdettweiler
Hmm...I see. First of all, your servers have been at 1 day for a while (we set them to that a while back for reasons I don't remember off the top of my head, and we never bothered to set them back). As for the servers being off for one hour, if Karsten had cached pairs from over 24 hours ago, that 1 hour might have been just enough to throw a wrench into his plans to return them before the deadline.

You told me they were back at 3 days again quite a while ago after everyone had this big argument over that. The agreement was that David's would be 1 day and mine 3 days except for IB9000 that was put at 2 days. Oh well, never mind. In the future, I'll check the JobMaxTime myself and set it at whatever I deem appropriate and simply let everyone know what that is. The democratic process on that has not worked at all.

Last fiddled with by gd_barnes on 2009-08-17 at 22:05
2009-08-17, 22:04   #1164
kar_bon
Mar 2006
Germany

2⁴·5²·7 Posts

The pairs from GB7000 were assigned to me about 1 hour ago, just before the outage!

Pairs at n=117k (like the two rejected ones) are processed in 30-40 seconds, and my WUCacheSize = 3!

So this shouldn't happen!
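Karsten's reasoning can be put as a back-of-the-envelope bound, assuming each cached pair takes at most 40 seconds, the whole cache is fetched at once, and nothing can be uploaded during an outage of roughly an hour. The function and parameter names are made up for illustration.

```python
def worst_case_age_seconds(cache_size: int, seconds_per_pair: float,
                           outage_seconds: float = 0.0) -> float:
    """Upper bound on how long a cached pair is held before its result is returned.

    Assumes the whole cache is fetched at once and processed back to back,
    and that no results can be returned while the server is down.
    """
    return cache_size * seconds_per_pair + outage_seconds


if __name__ == "__main__":
    age = worst_case_age_seconds(cache_size=3, seconds_per_pair=40, outage_seconds=3600)
    print(age)           # 3720
    print(age < 86_400)  # True: far inside a 1-day jobMaxTime
```

Even with a generous outage allowance the bound stays far below 86400 s, so a plain expiry does not account for these two rejections, which is why the post turns to joblist.txt next.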

If Gary got some pairs from GB7000, there must be something wrong with joblist.txt not being saved regularly.

And the pair (345 957466) from port GB8000:
I don't know why this happens. This pair is not in the results file!
I processed that n-range on 2009-08-06 (see that results file), so why wasn't this pair handed out earlier?

So please look in the joblist to see who has this pair assigned!
It's still in the status report as the first unprocessed one!

Last fiddled with by kar_bon on 2009-08-17 at 22:07
2009-08-17, 22:04   #1165
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)

1869₁₆ Posts

Quote:
Originally Posted by gd_barnes
Karsten,

I've now returned about 10 unprocessed pairs to port G8000. I ended up returning residues on a total of 4 of them before stopping.

But I'm still baffled...all the ones I processed were n=~970K. I haven't a clue as to what is going on. I KNOW I had some pairs around n=~957K in my queue. Perhaps it handed out n=~970K, then n=~957K, then more n=~970K. Heck, I don't know. It doesn't matter. There's nothing we can do about it now.

If you can try re-returning your rejected results to the server, go for it.

I swear, I'm getting just "this" close to running this entire project with manual files.


Gary

I think it had something to do with the fact that when MooMoo unreserved the various reservations he had at the tail end of the mini-drive, we loaded some stuff in a funny order. I don't exactly remember how we did it all at the time, but at any rate, it seems that's the explanation for it.

Nothing to worry about, everything should straighten itself out in the end.
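One way a "funny order" could produce what Gary saw (n=~970K, then n=~957K, then n=~970K again) is sketched below, under the assumption that the server hands pairs out in job-file order while the status page reports the smallest n still outstanding. The functions are hypothetical; (327, 969546) and (345, 957466) are pairs mentioned in this thread, and (351, 970012) is a made-up placeholder.

```python
from typing import Iterable, List, Optional, Set, Tuple

KN = Tuple[int, int]


def parse_job_file(lines: Iterable[str]) -> List[KN]:
    """Parse 'k n' lines, keeping file order (assumed to be the hand-out order)."""
    return [(int(k), int(n)) for k, n in (line.split() for line in lines)]


def lowest_outstanding_n(jobs: List[KN], returned: Set[KN]) -> Optional[int]:
    """Smallest n not yet returned, as a status page might report it."""
    return min((n for k, n in jobs if (k, n) not in returned), default=None)


if __name__ == "__main__":
    # A lower-n pair sitting between higher-n pairs in file order.
    jobs = parse_job_file(["327 969546", "345 957466", "351 970012"])
    print([n for _, n in jobs])                         # [969546, 957466, 970012]
    print(lowest_outstanding_n(jobs, {(327, 969546)}))  # 957466
```

In this model a pair such as (345, 957466) that is assigned but never returned stays at the top of the status report until someone finally returns it.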
2009-08-17, 22:08   #1166
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)

1869₁₆ Posts

Quote:
Originally Posted by kar_bon
The pairs from GB7000 were assigned to me about 1 hour ago, just before the outage!

Pairs at n=117k (like the two rejected ones) are processed in 30-40 seconds, and my WUCacheSize = 3!

So this shouldn't happen!

If Gary got some pairs from GB7000, there must be something wrong with joblist.txt not being saved regularly.

And the pair (345 957466) from port GB8000:
I don't know why this happens. This pair is not in the results file!
I processed that n-range on 2009-08-06 (see that results file), so why wasn't this pair handed out earlier?

Regarding the G7000 pairs, my guess is that due to the sudden interruption of power from the outage, the server didn't have the chance to update joblist.txt with the latest happenings, and thus it lost the last minute or two of data. That would quite believably cause a few rejected pairs. No big deal, they're small enough that hardly any work was wasted.
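A toy illustration of the failure mode guessed at here: if the server keeps assignments in memory and writes them to disk only periodically, a power cut discards everything since the last save, and results for those pairs are then rejected. The class, the JSON file and the "k n" keys are hypothetical; the real joblist.txt format is not shown in this thread.

```python
import json
import os
import tempfile


class ToyJobList:
    """In-memory assignments, persisted only when save() is called."""

    def __init__(self, path: str):
        self.path = path
        self.assignments = {}  # "k n" -> user

    def assign(self, pair: str, user: str) -> None:
        self.assignments[pair] = user

    def save(self) -> None:
        with open(self.path, "w") as f:
            json.dump(self.assignments, f)

    @classmethod
    def load(cls, path: str) -> "ToyJobList":
        jobs = cls(path)
        if os.path.exists(path):
            with open(path) as f:
                jobs.assignments = json.load(f)
        return jobs

    def accepts(self, pair: str, user: str) -> bool:
        """A result is accepted only if the server still knows the assignment."""
        return self.assignments.get(pair) == user


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "joblist.json")
        server = ToyJobList(path)
        server.assign("3 117000", "kar_bon")
        server.save()                         # periodic checkpoint
        server.assign("5 117001", "kar_bon")  # happens after the last save
        # Power cut: in-memory state is gone; the restarted server reads disk.
        restarted = ToyJobList.load(path)
        print(restarted.accepts("3 117000", "kar_bon"))  # True
        print(restarted.accepts("5 117001", "kar_bon"))  # False -> rejected
```

Only assignments made between the last save and the power cut are lost, which fits a handful of rejected small pairs rather than a large batch.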

As for G8000, see my last message for an explanation of that.


Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
PRPnet servers for NPLB | mdettweiler | No Prime Left Behind | 228 | 2018-12-26 04:50
Servers for NPLB | gd_barnes | No Prime Left Behind | 0 | 2009-08-10 19:21
LLRnet servers for CRUS | gd_barnes | Conjectures 'R Us | 39 | 2008-07-15 10:26
NPLB LLRnet server discussion | em99010pepe | No Prime Left Behind | 229 | 2008-04-30 19:13
NPLB LLRnet server #1 - dried | em99010pepe | No Prime Left Behind | 19 | 2008-03-26 06:19

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.