mersenneforum.org  

Old 2009-08-03, 23:04   #34
rogue
 
"Mark"
Apr 2003
Between here and the

2⁴×397 Posts

Lennart, can you tell me what the client does with the workunits when this happens? Does it delete them or save them and try again?
Old 2009-08-03, 23:33   #35
Lennart
 
"Lennart"
Jun 2007

2⁵·5·7 Posts

Code:
[2009-08-03 16:44:08 GMT] Total Time:  2:12:11  Total Tests: 15  Total PRPs Found: 0
[2009-08-03 16:44:53 GMT] crus: Returning work to server nplb-gb1.no-ip.org at port 3000
[2009-08-03 16:47:10 GMT] nplb-gb1.no-ip.org:3000 connect to socket failed
[2009-08-03 16:47:10 GMT] nplb-gb1.no-ip.org:3000 connect to socket failed
[2009-08-03 16:47:10 GMT] nplb-gb1.no-ip.org:3000 connect to socket failed
[2009-08-03 16:47:11 GMT] nplb-gb1.no-ip.org:3000 connect to socket failed
[2009-08-03 16:47:11 GMT] nplb-gb1.no-ip.org:3000 connect to socket failed
[2009-08-03 16:47:11 GMT] 27121: Getting work from server prpnet.primegrid.com at port 12006
[2009-08-03 17:49:36 GMT] 27121: 27*2^1543462+1 is not prime.  Residue 2D44561896DD41CE
[2009-08-03 17:49:36 GMT] Total Time:  3:17:39  Total Tests: 16  Total PRPs Found: 0
[2009-08-03 17:49:36 GMT] 27121: Returning work to server prpnet.primegrid.com at port 12006
[2009-08-03 17:49:38 GMT] 27121: INFO: Test for candidate 27*2^1543462+1 accepted
[2009-08-03 17:49:38 GMT] 27121: INFO: All 1 test results were accepted
[2009-08-03 17:49:38 GMT] crus: Returning work to server nplb-gb1.no-ip.org at port 3000
[2009-08-03 17:49:43 GMT] crus: ERROR: Workunit 124221*6^148285+1 not found on server
[2009-08-03 17:49:43 GMT] crus: The client will delete this workunit
[2009-08-03 17:49:44 GMT] crus: INFO: Test for candidate 74612*6^148287+1 accepted
[2009-08-03 17:49:45 GMT] crus: INFO: Test for candidate 172257*6^148286+1 accepted
[2009-08-03 17:49:45 GMT] crus: INFO: 2 of 3 test results were accepted
[2009-08-03 17:49:46 GMT] crus: Getting work from server nplb-gb1.no-ip.org at port 3000
[2009-08-03 17:49:47 GMT] crus: INFO: No available candidates are left on this server.
[2009-08-03 17:49:48 GMT] crus: Getting work from server nplb-gb1.no-ip.org at port 3000
[2009-08-03 17:49:49 GMT] crus: INFO: No available candidates are left on this server.
[2009-08-03 17:49:50 GMT] crus: Getting work from server nplb-gb1.no-ip.org at port 3000
[2009-08-03 17:49:51 GMT] crus: INFO: No available candidates are left on this server.
[2009-08-03 17:49:52 GMT] crus: Getting work from server nplb-gb1.no-ip.org at port 3000
[2009-08-03 17:49:53 GMT] crus: INFO: No available candidates are left on this server.
[2009-08-03 17:49:54 GMT] crus: Getting work from server nplb-gb1.no-ip.org at port 3000
[2009-08-03 17:49:55 GMT] crus: INFO: No available candidates are left on this server.
[2009-08-03 17:49:56 GMT] crus: Getting work from server nplb-gb1.no-ip.org at port 3000
[2009-08-03 17:49:57 GMT] crus: INFO: No available candidates are left on this server.
[2009-08-03 17:49:57 GMT] 27121: Getting work from server prpnet.primegrid.com at port 12006
[2009-08-03 18:43:33 GMT] 27121: 27*2^1543856+1 is not prime.  Residue DE6D07D9F6EA4450
[2009-08-03 18:43:33 GMT] Total Time:  4:11:36  Total Tests: 17  Total PRPs Found: 0
[2009-08-03 18:43:34 GMT] 27121: Returning work to server prpnet.primegrid.com at port 12006
[2009-08-03 18:43:37 GMT] 27121: INFO: Test for candidate 27*2^1543856+1 accepted
Here is the log. As you can see, my clock was not correct.

I have started setting all clients to debug=1.

Lennart
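
For anyone who wants to scan a client log like the one above for the interesting events, here is a minimal sketch in Python. It assumes only the line layout visible in the excerpt (a `[date time GMT]` prefix followed by the message); the function name `summarize` and the category labels are made up for illustration and are not part of PRPNet.

```python
import re
from collections import Counter

# Matches the timestamp prefix seen in the log excerpt, e.g.
# "[2009-08-03 17:49:43 GMT] crus: ERROR: ..."
LINE_RE = re.compile(r"^\[(?P<ts>[\d\-]+ [\d:]+) GMT\] (?P<msg>.*)$")
WORKUNIT_RE = re.compile(r"Workunit (\S+) not found")


def summarize(lines):
    """Count notable events and collect workunits the server did not recognize."""
    counts = Counter()
    rejected = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a timestamped log line
        msg = m.group("msg")
        if "connect to socket failed" in msg:
            counts["socket_failed"] += 1
        elif "not found on server" in msg:
            counts["not_found"] += 1
            wu = WORKUNIT_RE.search(msg)
            if wu:
                rejected.append((m.group("ts"), wu.group(1)))
        elif "No available candidates" in msg:
            counts["no_work"] += 1
    return counts, rejected


# Three lines copied from the log above.
sample = [
    "[2009-08-03 16:47:10 GMT] nplb-gb1.no-ip.org:3000 connect to socket failed",
    "[2009-08-03 17:49:43 GMT] crus: ERROR: Workunit 124221*6^148285+1 not found on server",
    "[2009-08-03 17:49:47 GMT] crus: INFO: No available candidates are left on this server.",
]
counts, rejected = summarize(sample)
print(dict(counts))
print(rejected)
```

Run against a full log file, the `rejected` list would give the timestamps and candidates for every "not found on server" event, which is the part the client then deletes.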
Old 2009-08-03, 23:42   #36
rogue
 
"Mark"
Apr 2003
Between here and the

2⁴×397 Posts

Quote:
Originally Posted by Lennart View Post
Here is the log. As you can see, my clock was not correct.

I have started setting all clients to debug=1.

Lennart
As soon as you get the log, that would be great. There isn't enough information in these lines to give me a clear picture of what happened.
Old 2009-08-04, 00:08   #37
Lennart
 
"Lennart"
Jun 2007

2⁵·5·7 Posts

Quote:
Originally Posted by mdettweiler View Post
I'm afraid I haven't encountered this issue myself on the client end, so I can't help you there; Lennart, could you possibly put all of your G3000 clients on level 1 debug logging, if they aren't already?

They are now

Lennart
Old 2009-08-04, 04:27   #38
gd_barnes
 
May 2007
Kansas; USA

28A3₁₆ Posts

Quote:
Originally Posted by mdettweiler View Post
All right, 140K-150K has been reloaded exactly as described above. I see Lennart's hungry machines have already swooped in and grabbed a bunch of work.

Checking the various web pages:
http://nplb-gb1.no-ip.org:3000/ (a.k.a. server_stats.html) checks out except for the "Min N" label on both the Min N and Max N columns, which is a known bug for Sierpinski/Riesel mode in PRPnet and will be fixed in a future release, but is only a cosmetic error for now.

http://nplb-gb1.no-ip.org:3000/server_status.html still looks kind of weird:

It seems that this page is displaying inaccurately, due to a bug in PRPnet. However, as before, this is a cosmetic error (even if a bit more serious than the extra "Min N" label), and regardless of how the words seem to come out on the page, the basic information does line up with the following (as of when I pulled down this page):
-k's remaining: 30
-n's remaining: 4065
-min n remaining: 140005
The # of digits doesn't seem to be presented at all, regardless of the wording.

http://nplb-gb1.no-ip.org:3000/user_stats.html checks out.

I'll report the bug on the server_status page to Mark. However, since all the information on that page can be gathered from the server_stats/home page anyway, it's somewhat less of a big deal since it's not affecting anything except the display of the data. So, we should be OK even with that bug present.

Max

Gotta have all interfaces (as well as barfing) fixed including grammar/spacing/clarity/etc. before we load n>150K even if it requires a new PRPnet release. We'll keep rerunning n=140K-150K until we get a clean test. The "n's remaining" of 4065 is misleading. It should say something like "pairs remaining" (assuming that is what it is referring to.)

Thanks for getting that going. It's good to see the # of k's remaining is correct now.
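
To illustrate why "pairs remaining" would be the clearer label, here is a small Python sketch. It assumes the remaining candidates can be treated as a list of (k, n) pairs; the list below is a made-up example (using k and n values mentioned elsewhere in this thread), not actual server data, and `count_remaining` is a hypothetical helper.

```python
def count_remaining(pairs):
    """Return (number of distinct k's, number of (k, n) pairs) still untested."""
    distinct_ks = {k for k, _ in pairs}
    return len(distinct_ks), len(pairs)


# Hypothetical remaining candidates; each tuple is one k/n pair.
remaining = [
    (74612, 148287),
    (172257, 148286),
    (124221, 148285),
    (124221, 140005),
]
ks_left, pairs_left = count_remaining(remaining)
print(f"k's remaining: {ks_left}, pairs remaining: {pairs_left}")
```

The same n can occur for several k's, and one k usually has many n's left, so the second figure counts tests still to run rather than distinct n values.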


Thanks,
Gary
Old 2009-08-04, 04:30   #39
gd_barnes
 
May 2007
Kansas; USA

24243₈ Posts

Could these problems with "barfing" be a result of my servers not being able to handle a very big load? That's quite a bit of crunching power on there by Lennart. (I have the equivalent of 2 cores on there, i.e. a 50-50 split with another effort on a full quad.)

We should definitively know about load being a possible problem when port G5000 at NPLB gets rolling with the very teeny tests that will only take a few secs. each. That should be a big load even with just a few quads on it!
Old 2009-08-04, 04:42   #40
gd_barnes
 
May 2007
Kansas; USA

10403₁₀ Posts

Something new for this time around:

First, I believe this drive is being processed by n-value. Based on that, I see that at http://nplb-gb1.no-ip.org:3000/ k=124125 and 124221 have a min n of 140006 and 140005 respectively even though most of this testing effort is at n>143K. Could it be because someone has received some pairs that haven't been returned to the server in a long time? I'm trying to determine if the "min n" is updating properly on all k's.

Second, the max n is showing as n=~148.7K for nearly all k's. (~147.6K for a few k's, perhaps because they are lower weight?) It should be showing as n=~150K for all k's unless the full n=140K-150K range was not loaded. This looked correct last time. How come it doesn't look correct this time?


Final question: How frequently is the "min n" and "max n" by k page updated?


Thanks,
Gary

Last fiddled with by gd_barnes on 2009-08-04 at 04:48
Old 2009-08-04, 04:50   #41
mdettweiler
A Sunny Moo
 
Aug 2007
USA (GMT-5)

1100001101001₂ Posts

Quote:
Originally Posted by gd_barnes View Post
Something new for this time around:

First, I believe this drive is being processed by n-value. Based on that, I see that at http://nplb-gb1.no-ip.org:3000/ k=124125 and 124221 have a min n of 140006 and 140005 respectively even though most of this testing effort is at n>143K. Could it be because someone has received some pairs that haven't been returned to the server in a long time? I'm trying to determine if the "min n" is updating properly on all k's.

Second, the max n is showing as n=~148.7K for all k's. It should be showing as n=~150K unless the full n=140K-150K range was not loaded. This looked correct last time. How come it doesn't look correct this time?


Final question: How frequently is the "min n" and "max n" by k page updated?


Thanks,
Gary
Min n and Max n are updated every 5 minutes; intermediate changes are shown in the columns to the right of those. (The intermediate changes are absorbed into the larger Min n and Max n columns at the 5 minute updates when completed tests are removed from the prpserver.candidates file.)

Regarding the barfing possibly being related to server stress: no, I've talked with Mark and he's definitely confirmed that there's a server bug that needs to be fixed, as well as possibly a client bug pending further investigation of debug.log files. He's sent me a fix for the server side of things (which should definitely fix the barfing), though he asked me not to apply the fixed version to the server yet so Lennart's clients can get a chance to catch a log of their end of the barfing in their debug.log files for him to examine and see if there's a bug in the client as well.
Old 2009-08-04, 04:58   #42
gd_barnes
 
May 2007
Kansas; USA

101·103 Posts

OK, great. I'm glad to hear we've nailed down the "server barfing" problems.

Thanks for the "absorbing of changes" to min/max n explanation. It's a little clearer to me now.

It seems we still have a "min n" and "max n" problem though; mainly "max n". The "min n" issue could be as a result of a few pairs not having been returned yet, although that seems a little suspect since Lennart and I are the main ones on there and our machines have remained connected (I think). The "max n" should be n=~150K for all k's. Can you look into that? Is the full n=140K-150K file loaded into the server?


Gary

Last fiddled with by gd_barnes on 2009-08-04 at 05:00
Old 2009-08-04, 13:06   #43
mdettweiler
A Sunny Moo
 
Aug 2007
USA (GMT-5)

3×2,083 Posts

Quote:
Originally Posted by gd_barnes View Post
It seems we still have a "min n" and "max n" problem though; mainly "max n". The "min n" issue could be as a result of a few pairs not having been returned yet, although that seems a little suspect since Lennart and I are the main ones on there and our machines have remained connected (I think). The "max n" should be n=~150K for all k's. Can you look into that? Is the full n=140K-150K file loaded into the server?
Look at the # of candidates left for each k--you'll notice that it's a very small number. The "max n" is not quite up to ~150K because most of the work has been completed; all you're seeing now are just whatever the min and max of any stragglers happen to be.
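
As a rough illustration of that effect, here is a Python sketch that computes min and max n per k from whatever candidates are still outstanding. The candidate list is invented for the example, and `min_max_by_k` is a hypothetical helper; this is not how prpserver itself is implemented, only a model of the behavior described above.

```python
def min_max_by_k(pairs):
    """Map each k to the (min n, max n) among its remaining (k, n) pairs."""
    bounds = {}
    for k, n in pairs:
        lo, hi = bounds.get(k, (n, n))
        bounds[k] = (min(lo, n), max(hi, n))
    return bounds


# Hypothetical full load for one k across the n=140K-150K range.
k = 74612
loaded = [(k, n) for n in range(140005, 150000, 500)]
full_bounds = min_max_by_k(loaded)[k]

# Once most tests are completed and removed, only a few stragglers remain,
# so the reported min/max no longer spans the whole loaded range.
stragglers = [(kk, n) for kk, n in loaded if 148000 <= n < 148800]
straggler_bounds = min_max_by_k(stragglers)[k]

print("fully loaded:", full_bounds)
print("stragglers only:", straggler_bounds)
```

With the full range loaded the max n sits just under 150K; with only stragglers left it drops to wherever the last outstanding pair happens to be, which matches a display of n=~148.7K near the end of a run.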

BTW, Lennart, did you catch the barfing in one of your client debug.log's? I see a small bit of barfing on the server from around August 3, 21:15 GMT; here are the clients involved: _31, _206, _162, _127, _71, _31, and last but not least, humpford (one of Gary's finest). (Of course since Gary doesn't have his clients set to debug logging, that last one is rather irrelevant; no big deal, there should be plenty of data from Lennart's logs.)

Interestingly enough, last night doesn't seem to be a big one for barfing; I had to go all the way back to the time of the abovementioned barf in order to find any instance of it.

Last fiddled with by mdettweiler on 2009-08-04 at 13:07
Old 2009-08-04, 13:24   #44
Lennart
 
"Lennart"
Jun 2007

2140₈ Posts

Quote:
Originally Posted by mdettweiler View Post
Look at the # of candidates left for each k--you'll notice that it's a very small number. The "max n" is not quite up to ~150K because most of the work has been completed; all you're seeing now are just whatever the min and max of any stragglers happen to be.

BTW, Lennart, did you catch the barfing in one of your client debug.log's? I see a small bit of barfing on the server from around August 3, 21:15 GMT; here are the clients involved: _31, _206, _162, _127, _71, _31, and last but not least, humpford (one of Gary's finest). (Of course since Gary doesn't have his clients set to debug logging, that last one is rather irrelevant; no big deal, there should be plenty of data from Lennart's logs.)

Interestingly enough, last night doesn't seem to be a big one for barfing; I had to go all the way back to the time of the abovementioned barf in order to find any instance of it.
Debug was not on that early. I enabled debug between 22:00 GMT and 23:45 GMT.

Lennart
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.