mersenneforum.org  

2010-08-19, 04:11   #67
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)
1100001101001₂ Posts

Quote:
Originally Posted by Mini-Geek
Starting 18 minutes ago (first core 10:09 PM, fourth core 10:22; they all tried at slightly different times and failed) I wasn't able to connect to the PRPnet server. I got "nothing was received on socket after 10 seconds" and then "Could not verify connection to noprimeleftbehind.net. Will try again later." in the logs (repeated three times before it fell back to my local server). Is the server down?

Yup, it's the "too many connections" bug. Right now it's in the process of building up to the 1000 connections limit at which it will actually start saying "too many connections"; it's at around 450 right now. As soon as that's done I can restart the server, as at that point the bug will have progressed far enough for Mark to get a complete picture of what happened.

Stay tuned...
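
For anyone curious what the fallback Mini-Geek describes might look like, here is a minimal sketch. It is not the PRPnet client's code: `can_connect`, `pick_server`, the three-attempt count, and the local server address are all hypothetical names and assumptions for illustration, and the probe only checks that a TCP connection opens within 10 seconds (the real client reports a receive timeout, which this does not model).

```python
import socket
from typing import Callable, Optional, Sequence, Tuple

Address = Tuple[str, int]


def can_connect(address: Address, timeout: float = 10.0) -> bool:
    """Return True if a TCP connection to `address` opens within `timeout` seconds."""
    try:
        with socket.create_connection(address, timeout=timeout):
            return True
    except OSError:  # covers refused connections, DNS failures, and timeouts
        return False


def pick_server(
    servers: Sequence[Address],
    attempts: int = 3,  # assumption: the log showed three failures before fallback
    probe: Callable[[Address], bool] = can_connect,
) -> Optional[Address]:
    """Try each server in priority order; return the first one that answers."""
    for address in servers:
        for _ in range(attempts):
            if probe(address):
                return address
    return None  # nothing reachable; the caller decides when to retry


if __name__ == "__main__":
    # Primary server first, then a (hypothetical) local backup server.
    candidates = [("noprimeleftbehind.net", 9000), ("localhost", 9000)]
    print(pick_server(candidates))
```

Passing the probe in as a parameter keeps the retry logic testable without touching the network.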

2010-08-19, 04:49   #68
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)
14151₈ Posts

Okay, we've reached 1000 connections and the server is rejecting clients. I have now restarted the server and it is once more functioning normally. I'll send the log file to Mark for analysis.

Now we can get on with the rest of the rally!

(BTW: Gary, you can resume restarting the server on a twice-daily basis as you were doing before. Now that we don't want it to crash any more, that seemed to be an effective precaution so we may as well keep using it until a fix is found.)

Last fiddled with by mdettweiler on 2010-08-19 at 04:50

2010-08-19, 05:23   #69
gd_barnes
May 2007
Kansas; USA
3³·5·7·11 Posts

In looking at the logs, it looks like we lost 2 hours this time around. Therefore, we'll extend the PRPnet part of the rally by a total of 7 hours. To make it official, the PRPnet part will run until 9 PM CDT today (Aug. 19th) [2 AM GMT on the 20th]. There is no change in the LLRnet part.

I'll send Dave a PM to make sure he's aware of this.

Last fiddled with by gd_barnes on 2010-08-19 at 05:26

2010-08-19, 06:16   #70
gd_barnes
May 2007
Kansas; USA
3³·5·7·11 Posts

OK, guys, I'd like to offer up one more "side line" competition here. Here it is:

LLRnet port 3000 vs. PRPnet port 9000



Pick your favorite software and unload all of your cores on it for 24 hours today starting at midnight CDT (5 AM GMT) tonight (a little over an hour ago from this posting). The winner is the port that processes the most pairs today. What makes the race interesting is that the 2 have been quite close the last several days of the rally whenever there has not been a PRPnet outage.

Note that this is only an informal race. It will not affect the rally stats. The rally still ends at 2 PM CDT for LLRnet and 9 PM CDT for PRPnet (7 PM and 2 AM GMT). We'll use the "regular" daily stats to determine a winner; not the rally stats.

To hopefully prevent the "too many connections" issue on PRPnet, I will restart the server twice during the day. The PRPnet folks should aim to have a backup PRPnet server with extremely short tests, so that when I do restart it and your clients pick up tests from another port, they won't be gone for long. But...I will stop and restart it within 5 seconds, so hopefully this will be minimized.

The guys running the winning software get bragging rights for the next rally.

Edit: Likely due to the PRPnet outage that occurred between about 10 PM and midnight, LLRnet has jumped out to a modest 120+ pair lead at the end of the first hour. But I have confidence that PRPnet will come back quickly. If the PRPnet guys object too much that it started shortly after an outage, we can make it run 1 AM to 1 AM or 2 AM to 2 AM.

Vaughan, you better bring 'em to LLRnet if Lennart starts to bring 'em to PRPnet. I'm maxed out minus ~2 cores. :-)

Last fiddled with by gd_barnes on 2010-08-19 at 06:34

2010-08-19, 10:17   #71
AMDave
Jan 2006
deep in a while-loop
2·7·47 Posts

Quote:
Originally Posted by gd_barnes
In looking at the logs, it looks like we lost 2 hours this time around. Therefore, we'll extend the PRPnet part of the rally by a total of 7 hours. To make it official, the PRPnet part will run until 9 PM CDT today (Aug. 19th) [2 AM GMT on the 20th]. There is no change in the LLRnet part.

I'll send Dave a PM to make sure he's aware of this.

Implemented.

2010-08-19, 20:18   #72
gd_barnes
May 2007
Kansas; USA
3³×5×7×11 Posts

With about 5-1/2 hours left in the PRPnet portion of the rally, we have recovered nicely on the # of primes found. After just 3 primes in the first 80,000 pairs, we've garnered 7 primes in the last 50,000 pairs. The 10 primes are now only ~2 below the expected ~12.

2010-08-20, 06:02   #73
gd_barnes
May 2007
Kansas; USA
3³×5×7×11 Posts

The PRPnet part of the rally ended about 4 hours ago. As expected, PrimeSearchTeam won the team competition for the 2nd straight rally, 55,622 to 42,958 over Raiders of the Lost Primes. The total pairs processed was 133,079, or 19,011 per day. The expected # of primes was ~12.1. The actual # of primes was 10. Very nice work everyone. That's a lot of pairs done for such a high n-range.
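
As a quick sanity check on those figures, the sketch below recomputes the per-day rate and asks how unusual 10 primes is when ~12.1 are expected. Two assumptions are mine, not the post's: the 7-day length is inferred from 133,079 / 19,011, and the prime count is modeled as a Poisson variable, which is a common approximation for rare independent hits but not something stated above.

```python
import math

# Figures quoted in the post above.
total_pairs = 133_079
expected_primes = 12.1
actual_primes = 10

# Assumption: rally length inferred from 133,079 / 19,011, not stated in the post.
rally_days = 7

pairs_per_day = total_pairs / rally_days


def poisson_cdf(k: int, mean: float) -> float:
    """P(X <= k) for X ~ Poisson(mean), summed term by term."""
    return sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k + 1))


# Assumption: the number of primes found is roughly Poisson-distributed.
p_at_most_actual = poisson_cdf(actual_primes, expected_primes)

print(f"pairs per day: {pairs_per_day:,.0f}")
print(f"P(at most {actual_primes} primes | mean {expected_primes}): {p_at_most_actual:.2f}")
```

Under that model the chance of finding 10 or fewer primes comes out at about a third, so ending ~2 below expectation is well within ordinary luck.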

In the final fun rally over the 24 hours that includes the entire day of Aug. 19th CDT U.S., the LLRnet guys had the PRPnet guys for lunch, 10,022 to 5,462. For the next rally, we'll make it official before the rally starts to have a final 24-hour rally between the software types that ends when the rally ends so that people can plan for it. The idea is to keep everyone around until nearly the end, even if the team competition has already been mostly decided.

Thank you very much to all who participated. Feel free to stick around a while if you see other things that you like here. I hope to add at least one more PRPnet server for one of the other drives within the next 1-2 weeks. We'll discuss it openly when we're ready to start the process.


Gary

Last fiddled with by gd_barnes on 2010-08-20 at 06:10

2010-08-22, 22:39   #74
rogue
"Mark"
Apr 2003
Between here and the
11·577 Posts

I got back from vacation today and will be looking at the log tomorrow. I'll post updates in the PRPNet released thread in the CRUS forum.

On a sad note my main workhorse computer has been flaky. I suspect the power supply (which is only a year old) got zapped during a power outage, but I won't know until tomorrow. I first noticed the problem the morning I went on vacation, but couldn't do anything about it.

2010-08-23, 00:36   #75
rogue
"Mark"
Apr 2003
Between here and the
11·577 Posts

Quote:
Originally Posted by rogue
I got back from vacation today and will be looking at the log tomorrow.

I had some time to look into the issue tonight. The server is running out of memory, so it is unable to create new threads. I need to investigate to determine where the memory leak is occurring. It must be in *nix-specific code because I clean up any Windows memory leaks that I find before I release it. Most code is shared, but there are a couple of places that have separate Windows and *nix branches. With my main computer gone for a while, this could take some time to discover and fix. Detecting memory leaks on *nix is a bit harder than on Windows. It is also possible that the memory leak is in a *nix library and not in PRPNet. I suggest that because PrimeGrid hasn't run into this problem (that I am aware of).

2010-08-23, 01:14   #76
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)
3×2,083 Posts

Quote:
Originally Posted by rogue
I had some time to look into the issue tonight. The server is running out of memory, so it is unable to create new threads. I need to investigate to determine where the memory leak is occurring. It must be in *nix-specific code because I clean up any Windows memory leaks that I find before I release it. Most code is shared, but there are a couple of places that have separate Windows and *nix branches. With my main computer gone for a while, this could take some time to discover and fix. Detecting memory leaks on *nix is a bit harder than on Windows. It is also possible that the memory leak is in a *nix library and not in PRPNet. I suggest that because PrimeGrid hasn't run into this problem (that I am aware of).

Interesting. But I thought PrimeGrid was running on Linux as well--or has that changed?

2010-08-23, 01:23   #77
Lennart
"Lennart"
Jun 2007
1120₁₀ Posts

Quote:
Originally Posted by mdettweiler
Interesting. But I thought PrimeGrid was running on Linux as well--or has that changed?

Most of the servers are on a Windows server. Then we have 2 servers on a second Windows server and 5 servers on an Ubuntu server, plus 2 here at my home, also on Ubuntu.

I have seen the problem one time. It was on the Ubuntu server. The thing was that I had too little memory (only 2G). I have added 2 more and now it works OK with 4G.

I thought the problem was the memory because since I upgraded to 4G I have never seen it happen again.

Lennart

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.