mersenneforum.org  

Old 2010-01-22, 20:39   #1
mdettweiler
A Sunny Moo
 
Aug 2007
USA (GMT-5)

14151₈ Posts
PRPnet 3.1.3 stress-test server

Hi all,

As I've mentioned a few times, Gary and I have been planning for a while to run a stress test on PRPnet 3.1.3 with lots of cores and very small candidates, to make sure the latest PRPnet can handle high loads. Based on the testing done at PrimeGrid I expect it to cope well, but testing here is still valuable because it will show whether Gary's setup in particular can handle the load.

To that end, I have set up a new PRPnet 3.1.3 server and loaded it with work from k=2000-2200, n=50K-250K--i.e., a doublecheck of the 12th Drive. The server info is as follows:

server = nplb-gb1.no-ip.org
port = 7465

Or, in terms of a server= line for use in prpclient.ini:
server=G7465:100:1:nplb-gb1.no-ip.org:7465

Note that in the above line I've set the batch size to 1. Normally this is NOT what you'd want for tests this small, since the per-request overhead adds up to a non-negligible amount of wasted CPU time. It also puts a much higher load on the server than, say, a batch size of 20 would. But in this case, high load is what we're aiming for.
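
For anyone scripting their client setup, here is a minimal sketch of pulling the fields out of that server= line. It assumes the five colon-separated fields are, in order, the label the client prints in its log, a percentage, the batch size, the host, and the port. Only the batch size, host, and port are confirmed by this post; the meaning of the second field and all names in the code are assumptions for illustration.

```python
from typing import NamedTuple


class ServerEntry(NamedTuple):
    suffix: str      # label seen as a prefix in client log lines, e.g. "G7465"
    percent: int     # second field; its meaning is assumed, not stated in the post
    batch_size: int  # tests fetched per request, per the post
    host: str
    port: int


def parse_server_line(line: str) -> ServerEntry:
    """Split 'server=a:b:c:host:port' into named fields."""
    key, _, value = line.strip().partition("=")
    if key != "server":
        raise ValueError(f"not a server= line: {line!r}")
    suffix, percent, batch, host, port = value.split(":")
    return ServerEntry(suffix, int(percent), int(batch), host, int(port))


entry = parse_server_line("server=G7465:100:1:nplb-gb1.no-ip.org:7465")
print(entry.batch_size, entry.host, entry.port)
```

Changing the third field from 1 to 20 would give the larger batch size mentioned above.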

Gary, as soon as I can get a Linux client package ready for PRPnet 3.1.3, I can send you a preconfigured package to drop on all of your quads. 12 quads * 4 cores = 48 cores, plus whatever I and anyone else can throw on there, so we should be over the magic number of 50.

If anyone else wants to put a few cores on the server, go right ahead--the more load, the better. Visit our PRPnet thread for client download links and setup instructions. If the server can hold up for at least 6 hours or so with 50+ cores hammering away on it, then we can be quite confident in the server's capabilities for future rallies and the like.

Max
Old 2010-01-22, 20:44   #2
Mini-Geek
Account Deleted
 
"Tim Sorbera"
Aug 2006
San Antonio, TX USA

17×251 Posts

Maybe I was just too quick, but I just set two Intel cores on it and they couldn't connect to the server. Are you still trying to get it set up?
Edit: I tried from another computer, and got this:
Code:
[2010-01-22 20:49:33 GMT] PRPNet Client application v3.1.3 started
[2010-01-22 20:49:33 GMT] User name Mini-Geek at email address is tim.sorbera@gmail.com
[2010-01-22 20:49:36 GMT] G7465: Getting work from server nplb-gb1.no-ip.org at port 7465
[2010-01-22 20:49:45 GMT] G7465: PRPNet server is version 3.1.3
[2010-01-22 20:49:56 GMT] G7465: 2001*2^50150-1 is not prime.  Residue 3A543F7AC1D6C29D
[2010-01-22 20:49:56 GMT] Total Time:  0:00:23  Total Tests: 1  Total PRPs Found: 0
[2010-01-22 20:49:56 GMT] G7465: Returning work to server nplb-gb1.no-ip.org at port 7465
[2010-01-22 20:50:06 GMT] Nothing was received on socket 368, therefore the socket was closed
[2010-01-22 20:50:07 GMT] Nothing was received on socket 364, therefore the socket was closed
[2010-01-22 20:50:07 GMT] Total Time:  0:00:34  Total Tests: 1  Total PRPs Found: 0
[2010-01-22 20:50:07 GMT] G7465: Returning work to server nplb-gb1.no-ip.org at port 7465
[2010-01-22 20:50:07 GMT] G7465: INFO: Test for 2001*2^50150-1 was ignored.  Candidate and/or test was not found
[2010-01-22 20:50:07 GMT] G7465: INFO: 0 of 1 test results were accepted
[2010-01-22 20:50:07 GMT] G7465: Getting work from server nplb-gb1.no-ip.org at port 7465
[2010-01-22 20:50:18 GMT] Nothing was received on socket 1656, therefore the socket was closed
[2010-01-22 20:50:28 GMT] Nothing was received on socket 368, therefore the socket was closed
[2010-01-22 20:50:28 GMT] Could not verify connection to nplb-gb1.no-ip.org.  Will try again later.
[2010-01-22 20:50:29 GMT] nplb-gb1.no-ip.org:7000 connect to socket failed
[2010-01-22 20:50:33 GMT] NPLB5thDrive: Getting work from server nplb-gb1.no-ip.org at port 3000
(synopsis: got work without a problem, tried returning it without getting through twice, then got through properly, and the test was ignored because "Candidate and/or test was not found", then was unable to connect to get work, and moved on to another server.)
The first computer still has yet to make a connection, so I've stopped it for now.
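
The waits can be read straight off the timestamps in a client log like the one above. A minimal sketch, assuming the `[YYYY-MM-DD HH:MM:SS GMT]` prefix shown in the log; the helper names are made up for illustration and are not part of PRPNet, and only four of the log lines are embedded to keep it short.

```python
import re
from datetime import datetime

LOG = """\
[2010-01-22 20:49:36 GMT] G7465: Getting work from server nplb-gb1.no-ip.org at port 7465
[2010-01-22 20:49:45 GMT] G7465: PRPNet server is version 3.1.3
[2010-01-22 20:49:56 GMT] G7465: Returning work to server nplb-gb1.no-ip.org at port 7465
[2010-01-22 20:50:06 GMT] Nothing was received on socket 368, therefore the socket was closed
"""

LINE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) GMT\] (.*)$")


def parse(log: str):
    """Yield (timestamp, message) pairs for each well-formed log line."""
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m:
            yield datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S"), m.group(2)


def gaps_after_requests(log: str):
    """Seconds between each 'Getting work'/'Returning work' line and the next line."""
    entries = list(parse(log))
    gaps = []
    for (t0, msg), (t1, _) in zip(entries, entries[1:]):
        if "Getting work" in msg or "Returning work" in msg:
            gaps.append((t1 - t0).total_seconds())
    return gaps


print(gaps_after_requests(LOG))
```

On these four lines, the work request was answered after 9 seconds, and the return attempt sat for 10 seconds before the client reported that nothing was received.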

Last fiddled with by Mini-Geek on 2010-01-22 at 20:55
Old 2010-01-22, 20:56   #3
mdettweiler
A Sunny Moo
 
Aug 2007
USA (GMT-5)

3×2,083 Posts

Quote:
Originally Posted by Mini-Geek View Post
Maybe I was just too quick, but I just set two Intel cores on it and they couldn't connect to the server. Are you still trying to get it set up?
Hmm...that's strange. I can access the web page at http://nplb-gb1.no-ip.org:7465/ just fine, so the server is definitely accessible from the outside.

I just looked at the server and I see your clients' requests for work, but it appears that they aren't being given candidates to test. Definitely strange. Upon looking back through debug.log, I'm seeing some messages that make me wonder if there's a problem with how the server's connecting to the DB. I'll look into it now.
Old 2010-01-22, 20:56   #4
mdettweiler
A Sunny Moo
 
Aug 2007
USA (GMT-5)

3·2,083 Posts

Quote:
Originally Posted by Mini-Geek View Post
Edit: I tried from another computer, and got this:
Just saw your edit. This is definitely very strange. I'll look into it and see what's up.
Old 2010-01-22, 21:03   #5
mdettweiler
A Sunny Moo
 
Aug 2007
USA (GMT-5)

3·2,083 Posts

Okay, I just took a look at the server, and while I didn't find much of particular interest, I did notice that the VNC connection would sometimes work at a normal speed, and other times it would be really slow.

I wonder if maybe Gary is hogging the internet connection with something...though I can't fathom what would be quite this hoggish. Nor would that really explain the "Candidate and/or test was not found" message you got.

Anyway, I've restarted the server. It may be that this problem is due to some stupid mistake I made; give it a try now and see how it works.
Old 2010-01-22, 21:12   #6
mdettweiler
A Sunny Moo
 
Aug 2007
USA (GMT-5)

3·2,083 Posts

Oh! I think I know what's going on. I just tried putting a client (Windows, 3.1.3) of my own on the server, and observed it behaving like this:

-It would ask for work from the server.
-About 9 seconds later, the server would respond with a test.
-Within 3 or 4 seconds, the client would finish the test. (Hey, they're pretty small.)
-The client would send the test back to the server.
-The server would accept it almost immediately.

Note that 9 seconds is a really long time for the server to respond. I think this has to do with the fact that I loaded the server with an absolutely enormous number of candidates. Methinks it's taking a while for the MySQL server to look in the database and come up with a test. It took about that long when I tried to view the Candidate table manually from the console.

Sometimes, though, normal variation would push the delay over the critical 10 second mark--which means the client's timeout kicks in and it gives up. Hence the problems Mini-Geek was seeing; I think he was getting timeouts a tad more often than me, probably due to small differences in the latency of his internet connection vs. mine.
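
To make the timeout mechanism concrete, here is a small self-contained sketch of a client with a receive timeout talking to a deliberately slow local server. It is not PRPNet's networking code, which isn't shown in this thread; the function names are hypothetical, and the delays are scaled down to fractions of a second so it runs quickly.

```python
import socket
import threading
import time


def slow_server(listener: socket.socket, delay: float) -> None:
    """Accept one connection, wait `delay` seconds, then send a reply."""
    conn, _ = listener.accept()
    with conn:
        time.sleep(delay)
        try:
            conn.sendall(b"work\n")
        except OSError:
            pass  # the client already gave up and closed its end


def request(delay: float, timeout: float) -> str:
    """Return 'ok' if a reply arrives within `timeout` seconds, else 'timeout'."""
    with socket.socket() as listener:
        listener.bind(("127.0.0.1", 0))  # any free local port
        listener.listen(1)
        port = listener.getsockname()[1]
        server = threading.Thread(target=slow_server, args=(listener, delay))
        server.start()
        try:
            with socket.create_connection(("127.0.0.1", port)) as client:
                client.settimeout(timeout)
                try:
                    client.recv(64)
                    return "ok"
                except socket.timeout:
                    return "timeout"
        finally:
            server.join()  # wait for the server thread before closing the listener


# Stand-ins for "server answers under the limit" and "server answers over it".
print(request(delay=0.05, timeout=0.2))
print(request(delay=0.4, timeout=0.2))
```

The same server produces either outcome depending only on which side of the limit its response time lands, which is why a delay hovering around 9 seconds against a 10 second limit fails intermittently rather than every time.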

I'm going to try re-loading the server with a smaller batch of work--say, k=2000-2050 instead of 2000-2200. That should make for a less bloated database and hopefully fix this problem. Note: This means the server will be offline for up to 10 minutes or so.

Last fiddled with by mdettweiler on 2010-01-22 at 21:13
Old 2010-01-22, 21:15   #7
rogue
 
"Mark"
Apr 2003
Between here and the

2×3,019 Posts

How many candidates were loaded into the server when you had the problems? I wonder if the database needs an index or two.
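
To illustrate the index idea, here is a sketch using SQLite from Python's standard library as a stand-in for MySQL. The table layout, column names, index name, and the "next candidate" query are all hypothetical, since the real PRPNet schema isn't shown in this thread, and the rows are synthetic. It only shows how the query plan changes once a matching index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; the real PRPNet tables are not shown in this thread.
conn.execute(
    "CREATE TABLE Candidate (name TEXT, k INTEGER, n INTEGER, has_pending_test INTEGER)"
)
# Synthetic rows, not the actual sieve file.
rows = (
    (f"{k}*2^{n}-1", k, n, 0)
    for k in range(2001, 2201, 2)
    for n in range(50000, 50100)
)
conn.executemany("INSERT INTO Candidate VALUES (?, ?, ?, ?)", rows)

# A guess at what "give me the next test" might look like.
QUERY = "SELECT name FROM Candidate WHERE has_pending_test = 0 ORDER BY n, k LIMIT 1"


def plan(sql: str) -> str:
    """Return SQLite's query plan for `sql` as one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


before = plan(QUERY)
conn.execute("CREATE INDEX idx_next ON Candidate (has_pending_test, n, k)")
after = plan(QUERY)

print("before:", before)
print("after: ", after)
print("next:  ", conn.execute(QUERY).fetchone()[0])
```

Without the index the plan is a full table scan plus a sort; with it, the lookup goes through `idx_next`. MySQL's `EXPLAIN` output looks different, but the same comparison can be made there.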
Old 2010-01-22, 21:18   #8
mdettweiler
A Sunny Moo
 
Aug 2007
USA (GMT-5)

14151₈ Posts

Quote:
Originally Posted by rogue View Post
How many candidates were loaded into the server when you had the problems? I wonder if the database needs an index or two.
There were about 726,000 candidates loaded at the time. I'm currently loading up about a quarter of that after having dumped out the server's DB.
Old 2010-01-22, 21:23   #9
mdettweiler
A Sunny Moo
 
Aug 2007
USA (GMT-5)

3×2,083 Posts

Okay, I've now got the server loaded with just k=2000-2050. Let's see how that one works.
Old 2010-01-22, 21:32   #10
mdettweiler
A Sunny Moo
 
Aug 2007
USA (GMT-5)

3×2,083 Posts

It seems to be working all right now--the server's taking about 2 seconds to respond to requests for work, which is pretty much perfect. I'll refrain from making hasty generalizations lest I be forced to later put my foot in my mouth...we'll see how it goes.

Last fiddled with by mdettweiler on 2010-01-22 at 21:33 Reason: corrected # of seconds
Old 2010-01-22, 21:34   #11
Mini-Geek
Account Deleted
 
"Tim Sorbera"
Aug 2006
San Antonio, TX USA

17·251 Posts

I thought I should post my experiences with 3.1.3:
I've been using PRPnet 3.1.3 a bit, and while it all worked just fine sending and receiving work on my local box, when I tried to run it from another machine on my network, the client would only understand a portion of what the server was sending. The rest were, as is apparently happening here, being marked as reserved on the server but never received and run on the client. Getting "Candidate and/or test was not found" was pretty rare, but when it happened it hit the entire batch (4 such batches over ~4000 tests). I checked the server logs for one such candidate, and here's its story (as the server knows it):
Code:
prpserver.log:
[2010-01-21 18:05:42 GMT] 809997332*3^10319-1 sent to Email: tim.sorbera@gmail.com  User: Mini-Geek  Client: dad2
[2010-01-21 18:11:27 GMT] Test of 809997332*3^10319-1 for user tim.sorbera@gmail.com and client dad2 has expired.
(a few more assignments/expirations every 5 minutes, then:)
[2010-01-21 18:33:05 GMT] Test of 809997332*3^10319-1 for user tim.sorbera@gmail.com and client dad2 has expired.
[2010-01-21 18:33:08 GMT] 809997332*3^10319-1 sent to Email: tim.sorbera@gmail.com  User: Mini-Geek  Client: dad1
[2010-01-21 18:40:09 GMT] tim.sorbera@gmail.com (dad2): Test 1264098464 for candidate 809997332*3^10319-1 was not found

completed_tests.log:
[2010-01-21 18:33:22 GMT] 809997332*3^10319-1 received by Email: tim.sorbera@gmail.com  User: Mini-Geek  Client: dad1 Program: pfgw.exe  Residue: A1599B2980DBDA7B
I had about 22000 candidates loaded into the server.
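
One way to read that log is that the server accepts a result only while the matching test is still outstanding, so a result for an expired test comes back as "not found". The sketch below is a toy model of that reading and nothing more: it is not PRPNet's code, the function names are hypothetical, and the small test IDs are placeholders.

```python
from typing import Dict, Tuple

# Outstanding tests keyed by (candidate, test_id); the value is the client name.
Outstanding = Dict[Tuple[str, int], str]


def assign(tests: Outstanding, candidate: str, test_id: int, client: str) -> None:
    """Record that `client` has been sent `candidate` under `test_id`."""
    tests[(candidate, test_id)] = client


def expire(tests: Outstanding, candidate: str, test_id: int) -> None:
    """Drop an outstanding test that was never returned in time."""
    tests.pop((candidate, test_id), None)


def report(tests: Outstanding, candidate: str, test_id: int) -> str:
    """Accept a result only if its test is still outstanding."""
    if tests.pop((candidate, test_id), None) is None:
        return "not found"
    return "accepted"


tests: Outstanding = {}
cand = "809997332*3^10319-1"

assign(tests, cand, 1, "dad2")   # sent to dad2, which never gets the reply
expire(tests, cand, 1)           # server expires the test
assign(tests, cand, 2, "dad1")   # candidate handed out again
print("dad1:", report(tests, cand, 2))
print("dad2:", report(tests, cand, 1))  # late result for the expired test
```

In this model dad1's result is accepted and dad2's late result is rejected, which matches the order of events in the server log above.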

Now, in response to the recent change:
I can now get and return work on both computers, but on both I sometimes (pretty often, maybe every 5-10 times I communicate) get the "No available candidates" message. I know there are fewer candidates now, but not THAT many fewer!

Last fiddled with by Mini-Geek on 2010-01-22 at 21:35
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.