mersenneforum.org  

Old 2010-05-23, 19:23   #144
mdettweiler
A Sunny Moo
 
Aug 2007
USA (GMT-5)

1100001101001₂ Posts

Quote:
Originally Posted by rogue
Let me know what stats you want to see and I can look into the effort to display them.
Okay, here's what I'm looking for:

-Total # of untested pairs remaining
-First untested pair remaining
-Last untested pair remaining
(the last two would both be ordered by whatever the server's sortoption= dictates--i.e., first/last pairs in the order they'll be handed out)
-A downloadable file containing a complete list of the pairs remaining (untested) in the server

The last one is a "might be nice to have", since right now we don't even have that set up for our LLRnet servers.
Old 2010-05-23, 23:03   #145
rogue
 
"Mark"
Apr 2003
Between here and the

2·2,857 Posts

Quote:
Originally Posted by mdettweiler
Okay, here's what I'm looking for:

-Total # of untested pairs remaining
-First untested pair remaining
-Last untested pair remaining
(the last two would both be ordered by whatever the server's sortoption= dictates--i.e., first/last pairs in the order they'll be handed out)
-A downloadable file containing a complete list of the pairs remaining (untested) in the server

The last one is a "might be nice to have", since right now we don't even have that set up for our LLRnet servers.
The first won't be too difficult to do. I have considered adding a "Trailing Edge" to address the second. "Max in Group" is closest to the third, but is group specific. This is by decimal length, though, not sortoption. I'm uncertain of the difficulty of the last one, although I don't understand the need for it.
Old 2010-05-23, 23:38   #146
mdettweiler
A Sunny Moo
 
Aug 2007
USA (GMT-5)

14151₈ Posts

Quote:
Originally Posted by rogue
The first won't be too difficult to do. I have considered adding a "Trailing Edge" to address the second. "Max in Group" is closest to the third, but is group specific. This is by decimal length, though, not sortoption. I'm uncertain of the difficulty of the last one, although I don't understand the need for it.
Okay, sounds good. As for the last one, it would mainly be for the sake of curiosity. That's why I marked it as "might be nice to have". At any rate, don't worry about it; it should be easy enough for me to code up in a script anyway. (Simple enough: just select the requisite columns from the DB's Candidate table and dump the results to a web-accessible text file.)
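A minimal sketch of what such a script could look like, in Python. Everything specific here is an assumption for illustration: the column names (name, decimal_length, tested), the sort order, and the sample candidates are made up, and an in-memory SQLite database stands in for the real server DB. Only the Candidate table name comes from the post above.

```python
import sqlite3
import tempfile
from pathlib import Path


def dump_untested(conn, out_path):
    """Write one untested candidate per line; return how many were written."""
    rows = conn.execute(
        # Hypothetical columns; the real Candidate table may differ.
        "SELECT name FROM Candidate WHERE tested = 0 "
        "ORDER BY decimal_length, name"
    ).fetchall()
    Path(out_path).write_text("".join(f"{name}\n" for (name,) in rows))
    return len(rows)


# Tiny in-memory stand-in for the server database (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Candidate (name TEXT, decimal_length INTEGER, tested INTEGER)"
)
conn.executemany(
    "INSERT INTO Candidate VALUES (?, ?, ?)",
    [
        ("1005*2^910001-1", 273941, 0),
        ("1003*2^910000-1", 273940, 1),  # already tested, so not dumped
        ("1007*2^910002-1", 273941, 0),
    ],
)

# In practice out_file would live somewhere the web server can read.
out_file = Path(tempfile.gettempdir()) / "untested_pairs.txt"
count = dump_untested(conn, out_file)
lines = out_file.read_text().splitlines()
print("untested:", count, "first:", lines[0], "last:", lines[-1])
```

The same query result also yields the other requested stats: the row count is the total remaining, and the first and last rows are the first and last untested pairs, provided the ORDER BY mirrors whatever the server's sortoption= dictates.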
Old 2010-07-22, 15:50   #147
mdettweiler
A Sunny Moo
 
Aug 2007
USA (GMT-5)

6249₁₀ Posts

Yesterday I upgraded all the NPLB/CRUS PRPnet servers to version 3.3.5 (though they identify themselves as 3.3.4...it's a long story). With this release, all PRPnet applications now report times in local time (in the servers' case, CDT/GMT-5 during the summer and CST/GMT-6 during the winter). Therefore, G9000 is now (finally!) in sync with the rest of the servers in the stats, not 5 hours "ahead" of them all!

This should be quite handy in future rallies, since we'll only need one offset time for all servers in the DB.
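To illustrate the single-offset idea, here is a small Python sketch that converts a server-local timestamp to UTC. The two offsets are the ones named above (GMT-5 in summer, GMT-6 in winter); the function name and the sample timestamp are hypothetical and not taken from PRPnet or the stats code.

```python
from datetime import datetime, timedelta, timezone

CDT = timezone(timedelta(hours=-5), "CDT")  # summer offset named in the post
CST = timezone(timedelta(hours=-6), "CST")  # winter offset named in the post


def to_utc(local_naive, server_tz):
    """Treat a naive server-local timestamp as server_tz and convert it to UTC."""
    return local_naive.replace(tzinfo=server_tz).astimezone(timezone.utc)


# Hypothetical timestamp as any of the servers would now log it in summer.
logged = datetime(2010, 7, 22, 13, 0, 0)
print(to_utc(logged, CDT))  # 2010-07-22 18:00:00+00:00
```

Because every server now logs in the same local time, one such conversion covers all of them; before the upgrade a server reporting in a different zone would have needed its own offset.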

Last fiddled with by mdettweiler on 2010-07-22 at 15:50
Old 2010-09-03, 07:42   #148
gd_barnes
 
May 2007
Kansas; USA

2·5·1,013 Posts

Max,

Lennart just came on port 9000 with quite a few cores, can you load n=910K-912K the first thing in the morning? Thanks.


Gary
Old 2010-09-03, 17:08   #149
mdettweiler
A Sunny Moo
 
Aug 2007
USA (GMT-5)

3·2,083 Posts

Quote:
Originally Posted by gd_barnes
Max,

Lennart just came on port 9000 with quite a few cores, can you load n=910K-912K the first thing in the morning? Thanks.


Gary
Looks like he's done now. I've seen this over the last few days: he comes on with a load of cores, then drops back off a few hours later. A while back he did something similar off and on, and he later explained that he was doing small-n work on a server that dried out periodically, with port 9000 as his fallback server. I expect something similar is the case now.

Do you still want me to load 910K-912K? Port 9000 seems to have settled out at n=~891.7K (for now).

Edit: I see now that he did a similar dump on the TPS port 12000 server shortly after it came up, thereby drying it out in very short order (there wasn't much left in it anyway). My revised theory is that he went on there, then fell back to port 9000 when that ran out. I'll go ahead and load 910K-912K after all since he may be back for more.

Edit 2: 910K-912K is now loaded.

Last fiddled with by mdettweiler on 2010-09-03 at 17:19
Old 2010-09-18, 22:32   #150
rogue
 
"Mark"
Apr 2003
Between here and the

2×2,857 Posts

I found the memory leak in the server. It was code that I had commented out some time ago because it was causing a crash. I don't know if the crash was due to a bug on my part or a bug in the MySQL ODBC driver. Anyways, if you uncomment this line:

SQLFreeHandle(SQL_HANDLE_DBC, sqlConnectionHandle);

in the Disconnect() function of DBInterface.cpp, the memory leak will go away. I have been able to verify this on Mac at home. I have yet to test it on Windows, which is where I suspect I ran into the leak originally. I will put out a patched release of the server in a couple of days.
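Why one missing call leaks on every connection can be shown with a toy model in Python. This is not ODBC and not the PRPnet source: FakeDriver, serve_clients, and the client count are all hypothetical stand-ins. The only real-world fact relied on is that in ODBC a connection handle obtained with SQLAllocHandle(SQL_HANDLE_DBC, ...) stays allocated until SQLFreeHandle is called on it.

```python
class FakeDriver:
    """Toy stand-in for a driver manager that tracks live connection handles."""

    def __init__(self):
        self.live_handles = set()
        self._next_id = 0

    def alloc_handle(self):
        # Models allocating a connection handle when a client connects.
        self._next_id += 1
        self.live_handles.add(self._next_id)
        return self._next_id

    def free_handle(self, handle):
        # Models the free call that was commented out in Disconnect().
        self.live_handles.discard(handle)


def serve_clients(driver, n_clients, free_on_disconnect):
    """Simulate n_clients connect/disconnect cycles; return handles still alive."""
    for _ in range(n_clients):
        handle = driver.alloc_handle()
        # ... the server would talk to the database here ...
        if free_on_disconnect:
            driver.free_handle(handle)
    return len(driver.live_handles)


leaked = serve_clients(FakeDriver(), 1000, free_on_disconnect=False)
clean = serve_clients(FakeDriver(), 1000, free_on_disconnect=True)
print("handles left without the free:", leaked)
print("handles left with the free:", clean)
```

In the model, each served client leaves one handle behind unless the disconnect path frees it, which matches the pattern of a leak that grows with the number of connections rather than with uptime.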
Old 2010-09-19, 02:40   #151
mdettweiler
A Sunny Moo
 
mdettweiler's Avatar
 
Aug 2007
USA (GMT-5)

3·2,083 Posts

Quote:
Originally Posted by rogue
I found the memory leak in the server. It was code that I had commented out some time ago because it was causing a crash. I don't know if the crash was due to a bug on my part or a bug in the MySQL ODBC driver. Anyways, if you uncomment this line:

SQLFreeHandle(SQL_HANDLE_DBC, sqlConnectionHandle);

in the Disconnect() function of DBInterface.cpp, the memory leak will go away. I have been able to verify this on Mac at home. I have yet to test it on Windows, which is where I suspect I ran into the leak originally. I will put out a patched release of the server in a couple of days.
Great, thanks! Only problem is, it doesn't build on the NPLB server machine. I've attached the console output from "make prpserver".
Attached Files
File Type: txt prpserver-3.3.7-build-log.txt (14.0 KB, 236 views)
Old 2010-09-19, 12:56   #152
rogue
 
"Mark"
Apr 2003
Between here and the

1652₁₆ Posts

Quote:
Originally Posted by mdettweiler
Great, thanks! Only problem is, it doesn't build on the NPLB server machine. I've attached the console output from "make prpserver".
Is something borked with your system? The error I see is:

Code:
HelperThread.cpp:61: error: ‘rc’ was not declared in this scope
which makes no sense. You must have built it on another system, because with the makefile only one file would have been recompiled.

Can you verify the version you are building (in defs.h)? What is on line 61 in HelperThread.cpp?
Old 2010-09-19, 18:55   #153
mdettweiler
A Sunny Moo
 
Aug 2007
USA (GMT-5)

3×2,083 Posts

Quote:
Originally Posted by rogue
Is something borked with your system? The error I see is:

Code:
HelperThread.cpp:61: error: ‘rc’ was not declared in this scope
which makes no sense. You must have built it on another system, because with the makefile only one file would have been recompiled.

Can you verify the version you are building (in defs.h)? What is on line 61 in HelperThread.cpp?
Yeah, you're right--it does look like somehow my file got messed up. The local copy of the PRPnet 3.3.6 source had been modified with the extra debug logging we put in earlier to track down this problem, so I got a fresh copy of the source and used that. (Well, not entirely fresh--it came off my primary computer's local copy. So I'm not sure where the goof got in there, but rest assured it is NOT in your posted zip file.)

Downloading a fresh copy of the source from your website, applying the fix to that, and building prpserver worked. All noprimeleftbehind.net servers are now running the patched version.
Old 2010-12-24, 21:04   #154
mdettweiler
A Sunny Moo
 
mdettweiler's Avatar
 
Aug 2007
USA (GMT-5)

3·2,083 Posts
No more "too many connections" crashes (I hope!)

As those of you who follow the PRPnet 4.0.x announcement thread in the Software forum will have noticed, v4.0.5 was released today with fixes for a rather major memory leak. As Mark, Lars (a.k.a. ltd) and I have discovered over the course of our investigations over the last week or so, this is actually the cause of the mysterious and extremely vexing "too many connections" bug that has plagued every NPLB rally since we added PRPnet to them (all except the last rally, in which we only managed to avoid crashes by restarting the PRPnet server every 12 hours).

It seems that whenever a client finishes talking to the server, it neglects to properly release memory from its communication with the database--leading to a memory leak of a few MB for each connection that goes by. What made this leak hard to spot, however, is that most of the leak goes straight into virtual memory, rather than active memory. The VM allocation would keep building up until it reached 256 GB (!), which was apparently the tipping point, at which the server would just stop responding to any communications. Incoming connections would then build up until whatever the admin-specified limit was (in the case of the NPLB servers, 1000), and then the server would respond to all queries with "too many connections". The server would continue to be unreachable until someone manually restarted it. During high-load periods like rallies, such crashes could occur quite frequently, sometimes on the order of once every day or two. The simplest workaround was to just restart the server every 12 hours as a preemptive measure (though we didn't know exactly why this worked at the time, just that it somehow prevented the crashes).

As it turned out, Lars was able to find the root of the problem and come up with a patch which, while not completely addressing the memory leak, cuts it down from a few MB per connection to a few KB. Now, we should be able to survive an entire rally (and then some) without any crashes, no server restarts needed.
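For a rough sense of scale, the Python arithmetic below estimates how many connections it takes to reach the 256 GB tipping point before and after the patch. The post only says "a few MB" and "a few KB", so the figure of 3 per connection is an assumption, as is the use of binary units (1 GB = 1024^3 bytes).

```python
GIB = 1024 ** 3
MIB = 1024 ** 2
KIB = 1024


def connections_until(limit_bytes, leak_per_connection_bytes):
    """Whole connections that fit under the limit at a fixed leak per connection."""
    return limit_bytes // leak_per_connection_bytes


limit = 256 * GIB                              # tipping point named in the post
before = connections_until(limit, 3 * MIB)     # assumed ~3 MB leaked per connection
after = connections_until(limit, 3 * KIB)      # assumed ~3 KB leaked per connection

print("connections to hit the limit before the patch:", before)
print("connections to hit the limit after the patch:", after)
print("improvement factor:", after // before)
```

Under these assumptions the unpatched server reaches the limit after roughly 87 thousand connections, while the patched one would need roughly 89 million, which is consistent with crashes every day or two during a rally turning into no crashes at all.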

The patch can be applied equally well to v3.3.6 or 4.0.4 (it's included in 4.0.5), so now all the noprimeleftbehind.net servers (NPLB, CRUS, and private) have been patched. Port 9000 is still running 3.3.6, and I've been meaning to upgrade it to 4.0.x sometime soon; the DB conversion takes a bit of time, however, so whether I will be able to get it upgraded before the upcoming rally will depend on how my schedule works out. Regardless, though, the memory leak is under sufficient control that it shouldn't give us any trouble in the rally.

@Gary: this memory leak, I discovered, is actually the cause of the mysterious sluggishness on jeepford. As much as I would like to blame the absolutely horrible Ubuntu 9.04 as per our original theory, it seems that the many GB of leaked memory from the multiple running prpservers caused the system to be tied up while it shuffled data from active memory into virtual memory (to make room for normal GUI activities) every time someone used the computer after it had sat for a while. With the memory leaks now under control, most of the sluggishness should be gone from here on out. (Not to mention that it will spare the hard drive quite a bit of beating...)

Last fiddled with by mdettweiler on 2010-12-24 at 21:05

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.