mersenneforum.org  

2009-12-05, 22:09   #67
Brucifer
Dec 2005
313 Posts

It was probably past the max point where it could quickly handle connects, so clients were having to wait...? I say that because if it's working well again, it's because I pulled several cores off it, so the load on it lightened a bit.

Last fiddled with by Brucifer on 2009-12-05 at 22:10

2009-12-05, 22:51   #68
rogue
"Mark"
Apr 2003
Between here and the
3×11×173 Posts

Quote:
Originally Posted by Brucifer View Post
It was probably past the max point where it could quickly handle connects, so clients were having to wait...? I say that because if it's working well again, it's because I pulled several cores off it, so the load on it lightened a bit.
I suggest that you up maxworkunits on your clients. That will also reduce the load on the server. Max should also up the maxworkunits on the server so that clients can do a lot more work between communications to the server.
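
To put rough numbers on the maxworkunits suggestion above, here is a minimal sketch. It is not PRPnet code: the function name, the client counts, the test rates, and the assumption of two contacts per batch (one to fetch work, one to return results) are all hypothetical and only illustrate why bigger batches mean fewer connections.

```python
def server_contacts_per_hour(clients: int,
                             tests_per_hour_per_client: float,
                             maxworkunits: int,
                             contacts_per_batch: int = 2) -> float:
    """Rough count of client-to-server contacts per hour.

    Assumption (not taken from PRPnet itself): a client fetches
    `maxworkunits` tests at once and needs `contacts_per_batch`
    contacts per batch (e.g. one to get work, one to report results).
    """
    if maxworkunits < 1:
        raise ValueError("maxworkunits must be at least 1")
    batches_per_hour = tests_per_hour_per_client / maxworkunits
    return clients * batches_per_hour * contacts_per_batch


if __name__ == "__main__":
    # Hypothetical load: 200 cores, each finishing 100 small tests per hour.
    for batch in (10, 20, 50):
        contacts = server_contacts_per_hour(200, 100, batch)
        print(f"maxworkunits={batch:2d}: {contacts:.0f} contacts/hour")
```

With these made-up figures, going from 10 to 50 cuts the contacts from 4000 to 800 per hour, which is the effect being suggested here. The server-side setting caps what a client can actually request, so both ends need to be raised.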

2009-12-06, 00:08   #69
Brucifer
Dec 2005
313 Posts

I've had mine at 20, but the server was at 10. The machines I'd left on G7000 were hanging up too. Apparently sockets aren't being released, or at least that's what it acts like, because the clients don't get released, so they hang waiting for the server to do something. I have taken all my clients off G7000 and am running an llrnet server here at home on a manual reservation.

2009-12-06, 01:50   #70
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)
3×2,083 Posts

Quote:
Originally Posted by Brucifer View Post
I've had mine at 20, but the server was at 10. The machines I'd left on G7000 were hanging up too. Apparently sockets aren't being released, or at least that's what it acts like, because the clients don't get released, so they hang waiting for the server to do something. I have taken all my clients off G7000 and am running an llrnet server here at home on a manual reservation.
Do you have the latest version of the client (2.4.6)?

Meanwhile, I've upped maxworkunits on the server to 50. If you're feeling bold, give it a try at 20.

BTW Bruce, could you possibly send me a debug log snippet from a client that demonstrates the problematic behavior if you see it again? If I can get an exact timestamp for when the problem's occurring, I can compare what happened on the server's debug log at that same time and possibly get a better idea of what's going on.

Last fiddled with by mdettweiler on 2009-12-06 at 01:53

2009-12-06, 04:34   #71
Brucifer
Dec 2005
313₁₀ Posts

yup latest client.

edit: I'm just going to run my llrnet stuff and not mess anymore with the prpnet.

Last fiddled with by Brucifer on 2009-12-06 at 04:59

2009-12-08, 00:48   #72
gd_barnes
May 2007
Kansas; USA
7·1,447 Posts

I'm sure it's due to the load on the server. I just looked at it now. It's like it's in an endless loop of getting and receiving nothing from Lennart. I'll stop and restart the server.

We're all tired of the PRPnet problems and I'm sure PrimeGrid is too. It is my conclusion that PRPnet cannot handle a large load and will not be able to do so in the foreseeable future until memory is no longer utilized for most of it.

Bruce, your machines on PRPnet port 4000 for the 5th drive had no problem that I observed. I'm a pretty big sceptic in this situation but am comfortable enough to put my own machines on PRPnet port 4000. That's because tests take much longer than the 12th drive. Also the long tests on CRUS PRPnet port 1300 have made that quite a stable server over there. So I'm comfortable in saying that you can run port 4000 here and not have problems.

Max, based on the problems, here is how I would like to proceed:

1. (Must be done by late Tuesday): Set up a PRPnet server for the 6th drive and load n=743K-750K. The LLRNet server for that drive is going to be drying over the next few days.

2. Set up an LLRNet server for the 12th drive on port 4000 and load it up with k=2600-2800/n=50K-250K (please make sure extra headers are removed). We'll go ahead and finish k=2400-2600 on the current PRPnet server and then remove that server. PRPnet does not work for small tests with any kind of large load and it's the large load we need to knock off this large range of work.

3. For the 11th drive, set up an LLRNet server on my machine at your convenience and load n=520K-530K in it. It will be several weeks before David's server dries so no rush there. I want this on LLRnet because I want the possibility for rallies on it in the future. PRPnet is not ready for rallies. Also, the somewhat smaller tests make it more vulnerable than the 5th/6th drives.

Here is the effect of all of this for the future:

5th drive: PRPnet server
6th drive: PRPnet server (currently LLRnet)
7th drive: LLRnet server
11th drive: LLRnet server
12th drive: LLRnet server (currently PRPnet)
doublecheck drive: PRPnet server*

*We may have to go back to an LLRnet server at some point if some heavy hitters come in on it.

In other words, only the n>700K tests will be on PRPnet servers and we're leaving one of those on PRPnet for people's comfort level. People can take their choice.

My apologies to Bruce and Lennart and thank you for helping us out with these small n-ranges. Lennart, if you stay on the 12th drive PRPnet server with your current load, I'll check it several times a day. If it has problems, I'll stop and restart it like I'm about to do right now.

I'm sorry everyone, PRPnet is just not fully ready for the big time yet; at least not at the lower n-ranges, which I'm classifying as anything n<700K right now. I as well as PrimeGrid have been more than patient with it. It's time to finally deal with the reality of the situation.

Edit: The 10th drive was left out of this post. See discussion below.


Gary

Last fiddled with by gd_barnes on 2009-12-08 at 04:58 Reason: edit

2009-12-08, 03:25   #73
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)
3×2,083 Posts

Actually, I think you might be underestimating what PRPnet can handle a bit. From what I've observed, I think PRPnet should be able to handle anything over 200K fine, even with lots of cores. I think once we finish off the lowest tier of the 12th Drive, that one will actually be OK to continue on with PRPnet.

This is not due to problems with PRPnet per se but rather the inherent limitations of a single-threaded server application. That said, those inherent limitations are rather high. Keep in mind that we didn't start encountering problems until we had Lennart and Bruce's combined might on tiny n=50K candidates. Also, Bruce has recently been testing the G2000 (n=~300K) server with loads of clients, and it's been holding up quite nicely.
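
The single-threaded limit described above can be sketched with the basic utilization law from queueing theory (utilization = request arrival rate × time to service one request). This is only a back-of-the-envelope model, not a measurement of PRPnet: the function name, the client count, the test durations, the service time per request, and the two requests per test are all hypothetical.

```python
def server_utilization(clients: int,
                       seconds_per_test: float,
                       service_seconds_per_request: float,
                       requests_per_test: float = 2.0) -> float:
    """Fraction of a single server thread's time spent handling requests.

    Assumptions (hypothetical, for illustration only): every client works
    continuously, each test costs `requests_per_test` server requests, and
    requests are handled one at a time. A result at or above 1.0 means
    requests arrive faster than one thread can serve them, so clients
    queue up and wait.
    """
    if seconds_per_test <= 0 or service_seconds_per_request < 0:
        raise ValueError("durations must be positive")
    arrival_rate = clients * requests_per_test / seconds_per_test  # requests/sec
    return arrival_rate * service_seconds_per_request


if __name__ == "__main__":
    # Made-up figures: 400 cores, 0.05 s of server time per request.
    tiny = server_utilization(400, seconds_per_test=20, service_seconds_per_request=0.05)
    large = server_utilization(400, seconds_per_test=600, service_seconds_per_request=0.05)
    print(f"short tests: utilization {tiny:.2f}")
    print(f"long tests:  utilization {large:.2f}")
```

With these invented numbers the short tests push utilization well above 1 while the long tests leave it far below, which matches the pattern described: the same number of cores is harmless on large n and overwhelming on tiny n.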

Here's how I suggest we proceed:

-Finish off the currently loaded range in G7000, but don't do any more n=50K-250K work through PRPnet. Bruce has offered to finish off this portion of the drive with his internal LLRnet server, and I estimate he can have it done within a month.

-Continue in PRPnet G7000 with the n>250K portion of the 12th Drive.

-Continue transferring all the other servers over to PRPnet except G4000/7th Drive, which will permanently remain on LLRnet.

Essentially, the only work we have that is more than PRPnet can handle, even under stressful rally conditions, is the tiny 50K-250K stuff. Everything else should be A-OK. And I'm not just saying that out of wishful thinking; this time we have real data from Bruce's tests on G2000 to back it up.

Max

2009-12-08, 04:32   #74
gd_barnes
May 2007
Kansas; USA
2791₁₆ Posts

Sorry Max. Not this time around. We've had enough. It's not ready yet. I'd like to stick with what I suggested. Our heavy hitters are frustrated and rightfully so. Thanks.

You can say you know what the problem is but there is no way to know without large-scale beta testing. Yes, it worked for Bruce on smaller tests for a while but there have been large changes since then. And as we've seen, changes to one thing usually affect other things on PRPnet.

IMHO, running a PRPNet rally is tantamount to a disaster, especially if Lauren from Free-DC showed up with his 200 cores.

If the 3 PRPnet servers go well for 6 months, then we'll move some more over. I want to make it a more gradual process.


Gary

Last fiddled with by gd_barnes on 2009-12-08 at 04:59

2009-12-08, 04:35   #75
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)
14151₈ Posts

Quote:
Originally Posted by gd_barnes View Post
Sorry Max. Not this time around. We've had enough. It's not ready yet. I'd like to stick with what I suggested. Thanks.

If that goes well for 6 months, then we'll move some more over. I want to make it a more gradual process.


Gary
Okay, if we have to...

BTW, what about the 10th Drive? That one's at about 664K right now, and it should be plenty big enough for PRPnet to handle easily. We'd still be keeping all the smaller stuff (11th, 12th Drives) on LLRnet, but at least all the stuff that we know can work with PRPnet will be on there.

2009-12-08, 04:51   #76
gd_barnes
May 2007
Kansas; USA
7·1,447 Posts

Quote:
Originally Posted by mdettweiler View Post
Okay, if we have to...

BTW, what about the 10th Drive? That one's at about 664K right now, and it should be plenty big enough for PRPnet to handle easily. We'd still be keeping all the smaller stuff (11th, 12th Drives) on LLRnet, but at least all the stuff that we know can work with PRPnet will be on there.
Oh, I completely forgot about that one. lol OK, I'll relent there. With little interest on it of late, we can put it on PRPnet. It will become more popular as the 11th drive passes n=600K but that will be quite a while.

Edit: I just now updated applicable posts to include the 10th drive for PRPnet in the future.

Last fiddled with by gd_barnes on 2009-12-08 at 04:53

2009-12-08, 13:03   #77
gd_barnes
May 2007
Kansas; USA
7·1,447 Posts

Max,

I just now had to shut down and restart my server machine. The screen had gone partially blank with no icons, many of the apps on the top tool bar had little red x's through them, and the servers were doing nothing. I'm not sure as to what could have caused this.

I restarted the servers the same way as before, except that I did it directly on the machine instead of remotely.

I can confirm that ports 3000 and 4000 have started handing out work again. I can't tell yet on the others. The down time appeared to be a little over an hour.


Gary

Last fiddled with by gd_barnes on 2009-12-08 at 13:05

All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.