mersenneforum.org  

Old 2008-09-10, 06:39   #12
em99010pepe
 
Sep 2004
2·5·283 Posts

Quote:
Originally Posted by gd_barnes View Post
About how frequently is a k/n pair being returned?
A pair every 30 secs and with dumps from my side every 6 hours or so.

Why don't you host a server just for you?

Last fiddled with by em99010pepe on 2008-09-10 at 06:41
Old 2008-09-10, 06:42   #13
gd_barnes
 
May 2007
Kansas; USA
24233₈ Posts

Quote:
Originally Posted by em99010pepe View Post
A pair every 30 secs and with dumps from my side every 6 hours or so.

I think a test would take ~475-525 secs. or so. So perhaps 500 / 30 = 16-17 cores are on it right now.

That's not really very many. Hmm.

Dumps every 6 hours? I wonder if something went haywire with the last dump? If you drop the server and bring it back up, will it cause everyone on it to be dropped? If so, don't do that until more people are around.
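
For anyone who wants to redo the core estimate above with other numbers, here is a minimal sketch of the arithmetic. It assumes every core returns one result per test and that results arrive at a steady rate; the function name is only illustrative.

```python
def estimate_cores(test_seconds: float, seconds_between_results: float) -> float:
    """Estimate how many cores feed a server, given steady-state throughput.

    If one test takes test_seconds on a core and the server sees a result
    every seconds_between_results, about test_seconds / seconds_between_results
    cores must be working in parallel.
    """
    if seconds_between_results <= 0:
        raise ValueError("seconds_between_results must be positive")
    return test_seconds / seconds_between_results


# Test times of ~475-525 secs with one pair returned every 30 secs.
low = estimate_cores(475, 30)
high = estimate_cores(525, 30)
print(f"roughly {low:.1f} to {high:.1f} cores")
```

The estimate is only as good as the two inputs: mixed hardware or clients that cache several pairs would shift it.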
Old 2008-09-10, 06:43   #14
gd_barnes
 
May 2007
Kansas; USA
3³·5·7·11 Posts

Quote:
Originally Posted by em99010pepe View Post
Why don't you host a server just for you?

I've tried it. I can't figure it out. It may be easy for you guys but I spent hours trying to get one to work several months ago. Don't have time right now.

Also, that's not the point. We need the main servers on this drive to work correctly for everyone.
Old 2008-09-10, 06:46   #15
gd_barnes
 
May 2007
Kansas; USA
24233₈ Posts

OK, it just started working. lol

What did you do? I didn't do anything differently on my end. It just now sent that result that had been stuck in my 'tosend' file and got a new k/n pair to test...after I hit 'llrnet.exe' twice.
Old 2008-09-10, 06:47   #16
em99010pepe
 
Sep 2004
2×5×283 Posts

Got your pair after rebooting the server and forcing a prune. Now I am going back to bed. Bye.

Edit: Check your task manager to see if you have more than one instance of llrnet running per core.

Last fiddled with by em99010pepe on 2008-09-10 at 06:48
Old 2008-09-10, 06:48   #17
gd_barnes
 
May 2007
Kansas; USA
3³×5×7×11 Posts

Quote:
Originally Posted by em99010pepe View Post
Got your pair after rebooting the server and forcing a prune. Now I am going back to bed. Bye.
I hope no one else lost their connection as a result of the reboot. Everyone, check your connections to C443 when you're on next...

Thanks for doing...

Last fiddled with by gd_barnes on 2008-09-10 at 06:48
Old 2008-09-10, 06:51   #18
gd_barnes
 
May 2007
Kansas; USA
3³·5·7·11 Posts

Quote:
Originally Posted by em99010pepe View Post
Got your pair after rebooting the server and forcing a prune. Now I am going back to bed. Bye.

Edit: Check your task manager to see if you have more than one instance of llrnet running per core.
I had already done that. I was having a problem on all machines when trying to connect the 1st core of a quad with nothing else running.

My Windows machine is connected now. I'll try the quads now.

Edit: This has now been moved to a different thread.

Last fiddled with by gd_barnes on 2008-09-10 at 06:56 Reason: Edit
Old 2008-09-10, 07:32   #19
IronBits
I ♥ BOINC!
 
Oct 2002
Glendale, AZ. (USA)
3·7·53 Posts

From my observations when I first started running an llrnet server under Windows, I noticed it had a limited supply of sockets per port.
The problem stems from when a socket hangs and the OS then begins enumerating sockets until the port stops working.
Rebooting always releases the whole mess, but it begins to happen again...

As I recall, it happens when a remote client tries to connect but fails (sometimes over and over). Before it can release that socket, another connection comes in, then another, and so on. By the time the whole sequence is finished, a socket has locked up, and eventually the port number runs out of sockets.

Rebooting or restarting the llrnet server in no way 'disconnects' clients. The clients just keep asking for work until the server comes back online.
There appears to be a length of time after which the client will cease to function if it can't talk to the server, but the time it takes to reboot a computer or restart an llrnet server should not cause clients to hang.
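
The "keep asking until the server comes back" behaviour described above can be sketched as a retry loop with a growing delay. This is not llrnet's actual code; the function name, the attempt count and the delays are all assumptions chosen for illustration. Backing off between attempts also avoids the rapid connect/fail/connect pattern that seems to lock up sockets on the server side.

```python
import socket
import time
from typing import Callable, Optional


def connect_with_retry(
    host: str,
    port: int,
    attempts: int = 5,
    delay: float = 1.0,
    max_delay: float = 60.0,
    connect: Callable = socket.create_connection,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[socket.socket]:
    """Try to connect, doubling the wait after each failure.

    Returns the connected socket, or None if every attempt failed.
    The connect and sleep parameters exist so the loop can be tested
    without a real network or real waiting.
    """
    for attempt in range(attempts):
        try:
            # create_connection closes its own socket when an attempt
            # fails, so a failed try leaves nothing open on this side.
            return connect((host, port), timeout=10)
        except OSError:
            if attempt == attempts - 1:
                break  # out of attempts; no point sleeping again
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return None
```

A call would look something like `connect_with_retry("server.example", 7000)`, where the host and port are placeholders. A real client would probably retry indefinitely rather than give up after a fixed number of attempts.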

I moved all my servers over to Linux (CentOS 64bit) to get away from that problem. Linux appears to handle that situation much better.
I know the llrnet server software cannot handle too many cores, because the high connect/disconnect rate will eventually cause it to fail.
I do not believe it was written to handle hundreds of cores at once. The only option is to run many llrnet servers, preferably under Linux, which is probably the better solution.

I am currently running 6 llrnet servers over here, each using a different port of course, on just one computer, and it seems to be doing great so far.
If you need/want me to run another llrnet server on port 443 to help balance the load, for those that need to use that port, I can do that too.
The llrnet Server software has very little impact on the Server resources itself.

Last fiddled with by IronBits on 2008-09-10 at 07:51
Old 2008-09-10, 08:47   #20
gd_barnes
 
May 2007
Kansas; USA
24233₈ Posts

Thanks, David. I've observed that your servers seem able to handle up to ~50-60 cores at once at n=~500000, but that should increase as the n-values (the 2nd number in each line of the file that we send you) get larger, because the testing times increase quite a bit and each core reports back less often.
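
As a rough illustration of why capacity should grow with n, here is a sketch that scales the ~50-60 core observation at n=~500000. It assumes the server's limit is a fixed number of connections per second and that test time grows roughly with n squared; that exponent is a rule-of-thumb assumption, not something measured on this drive, and the function name is made up.

```python
def cores_supported(
    n: int,
    n_ref: int = 500_000,    # n at which the capacity was observed
    cores_ref: float = 55,   # midpoint of the ~50-60 cores observation
    exponent: float = 2.0,   # ASSUMPTION: test time grows about like n**2
) -> float:
    """Scale an observed core capacity to a different n.

    If the server can absorb a fixed rate of results, then the number of
    cores it can serve grows in proportion to how long each test takes.
    """
    return cores_ref * (n / n_ref) ** exponent


for n in (500_000, 600_000, 800_000, 1_000_000):
    print(f"n={n}: about {cores_supported(n):.0f} cores")
```

Treat the output as a trend rather than a prediction; the real limit also depends on the OS and on how clients behave when a connection fails.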

We'll see how port 443 holds up over time and if there are too many issues, we may have you set up another one for us.

I've kept my 2 quads on your port 7000 for now. The others are blowing through port 443 pretty fast as it is so my 2 quads will help balance things out.

At our current rate, I think we'll blow through n=5K-8K per day combined on the 2 servers! Looking good!


Gary
Old 2008-09-10, 09:26   #21
gd_barnes
 
May 2007
Kansas; USA
10395₁₀ Posts

C443 clearly has issues with too many clients.

I just now noticed that the last single k/n pair that I got from it ~2 hours ago is sitting in my 'tosend' file and once again, the results file says 'Could not send result'. The k/n pair is 175*2^601991-1.
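
For readers new to the notation, a k/n pair such as the one above stands for the number k*2^n-1. The sketch below parses that text form and estimates the size of the number; the function names are illustrative and not part of llrnet.

```python
import math
import re

# Matches the text form used in this thread, e.g. "175*2^601991-1".
PAIR_RE = re.compile(r"^(\d+)\*2\^(\d+)-1$")


def parse_pair(text: str) -> tuple[int, int]:
    """Return (k, n) from a string of the form 'k*2^n-1'."""
    match = PAIR_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a k*2^n-1 expression: {text!r}")
    return int(match.group(1)), int(match.group(2))


def decimal_digits(k: int, n: int) -> int:
    """Number of decimal digits of k*2^n-1, computed with logarithms.

    This avoids building the huge integer. It is off by one only when
    k*2^n is an exact power of ten (for example k=5, n=1), which cannot
    happen for k=175 because 175 has a factor of 7.
    """
    return math.floor(math.log10(k) + n * math.log10(2)) + 1


k, n = parse_pair("175*2^601991-1")
print(k, n, decimal_digits(k, n))
```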

I tried starting it again twice...no luck.

David, hold off until you get the official word, but I think we're going to need you to set up a 2nd server for this drive. We should probably run C443 dry and then have everyone move over their machines.

Carlos, is there anything you can think of to permanently fix this problem? Maybe run it on Linux, as David suggested? If not, we should probably have David running the 2 main servers for the drive.

You could still keep your C443 server intact for you and/or your buds as long as you kept it at perhaps 16 cores or less at all times. Everyone else such as Anon, me, etc. should probably use David's 2 servers, assuming that we go that route. In effect, it'd be just like our 1st drive is now.


Gary
Old 2008-09-10, 09:43   #22
em99010pepe
 
Sep 2004
B0E₁₆ Posts

Quote:
Originally Posted by gd_barnes View Post

Carlos, is there anything you can think of to permanently fix this problem? Maybe run it on Linux, as David suggested? If not, we should probably have David running the 2 main servers for the drive.
I have a machine here with Ubuntu, but I can't turn it on because I'm trying to save energy. When I get the time, I'll look into a few tweaks, such as opening more connections.
I'll keep the server for Jokern...

Carlos
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.