#1 |
A Sunny Moo
Aug 2007
USA (GMT-5)
3×2,083 Posts |
Hi all,
As I've mentioned on a few occasions, Gary and I have been planning for a while to run a stress test on PRPnet 3.1.3 with lots of cores and very small candidates, to ensure that the latest PRPnet can handle high loads. I do expect it to cope well based on testing performed at PrimeGrid, but the testing done here will still be valuable, as it will show us whether Gary's setup in particular can handle the load.

To this end, I have set up a new PRPnet 3.1.3 server and loaded it with work from k=2000-2200, n=50K-250K--i.e., a doublecheck of the 12th Drive. The server info is as follows:

server = nplb-gb1.no-ip.org
port = 7465

Or, in terms of a server= line for use in prpclient.ini:

server=G7465:100:1:nplb-gb1.no-ip.org:7465

Note that in the above line I've set the batch size to 1. Normally this is NOT what you'd want to do for small tests like this, as for numbers this small the overhead adds up to a non-negligible amount of wasted CPU power. It also puts a much higher load on the server than, say, a batch size of 20 would. But in this case, high load is what we're aiming for.

Gary, as soon as I can get a Linux client package ready for PRPnet 3.1.3, I can send you a preconfigured package to drop on all of your quads. 12 quads * 4 cores = 48 cores, plus whatever I and anyone else can throw on there, so we should be over the magic number of 50.

If anyone else wants to put a few cores on the server, go right ahead--the more load, the better.

Max |
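For anyone who wants to double-check a server= line before dropping it into prpclient.ini, here is a minimal Python sketch that splits the line above into its colon-separated fields. The field names `suffix` and `percent` are my own guesses at what the first two fields mean; only the batch size, host, and port are confirmed by the post above. `ServerLine` and `parse_server_line` are hypothetical helpers, not part of PRPnet.

```python
from typing import NamedTuple


class ServerLine(NamedTuple):
    suffix: str      # assumed: the label that prefixes client log lines (e.g. G7465)
    percent: int     # assumed: share of work taken from this server
    batch_size: int  # tests requested per contact with the server
    host: str
    port: int


def parse_server_line(line: str) -> ServerLine:
    """Split a 'server=a:b:c:host:port' line into named fields."""
    key, _, value = line.partition("=")
    if key.strip() != "server":
        raise ValueError(f"not a server= line: {line!r}")
    parts = value.strip().split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(parts)}")
    suffix, percent, batch, host, port = parts
    return ServerLine(suffix, int(percent), int(batch), host, int(port))


cfg = parse_server_line("server=G7465:100:1:nplb-gb1.no-ip.org:7465")
print(cfg.batch_size, cfg.host, cfg.port)
```

With a batch size of 1, every single test costs one "get work" and one "return work" exchange, which is exactly the kind of load this stress test is after.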
#2 |
Account Deleted
"Tim Sorbera"
Aug 2006
San Antonio, TX USA
10AB16 Posts |
Maybe I was just too quick, but I just set two Intel cores on it and they couldn't connect to the server. Are you still trying to get it set up?
Edit: I tried from another computer, and got this: Code:
[2010-01-22 20:49:33 GMT] PRPNet Client application v3.1.3 started
[2010-01-22 20:49:33 GMT] User name Mini-Geek at email address is tim.sorbera@gmail.com
[2010-01-22 20:49:36 GMT] G7465: Getting work from server nplb-gb1.no-ip.org at port 7465
[2010-01-22 20:49:45 GMT] G7465: PRPNet server is version 3.1.3
[2010-01-22 20:49:56 GMT] G7465: 2001*2^50150-1 is not prime. Residue 3A543F7AC1D6C29D
[2010-01-22 20:49:56 GMT] Total Time: 0:00:23 Total Tests: 1 Total PRPs Found: 0
[2010-01-22 20:49:56 GMT] G7465: Returning work to server nplb-gb1.no-ip.org at port 7465
[2010-01-22 20:50:06 GMT] Nothing was received on socket 368, therefore the socket was closed
[2010-01-22 20:50:07 GMT] Nothing was received on socket 364, therefore the socket was closed
[2010-01-22 20:50:07 GMT] Total Time: 0:00:34 Total Tests: 1 Total PRPs Found: 0
[2010-01-22 20:50:07 GMT] G7465: Returning work to server nplb-gb1.no-ip.org at port 7465
[2010-01-22 20:50:07 GMT] G7465: INFO: Test for 2001*2^50150-1 was ignored. Candidate and/or test was not found
[2010-01-22 20:50:07 GMT] G7465: INFO: 0 of 1 test results were accepted
[2010-01-22 20:50:07 GMT] G7465: Getting work from server nplb-gb1.no-ip.org at port 7465
[2010-01-22 20:50:18 GMT] Nothing was received on socket 1656, therefore the socket was closed
[2010-01-22 20:50:28 GMT] Nothing was received on socket 368, therefore the socket was closed
[2010-01-22 20:50:28 GMT] Could not verify connection to nplb-gb1.no-ip.org. Will try again later.
[2010-01-22 20:50:29 GMT] nplb-gb1.no-ip.org:7000 connect to socket failed
[2010-01-22 20:50:33 GMT] NPLB5thDrive: Getting work from server nplb-gb1.no-ip.org at port 3000

The first computer still has yet to make a connection, so I've stopped it for now.

Last fiddled with by Mini-Geek on 2010-01-22 at 20:55 |
#3 | |
A Sunny Moo
Aug 2007
USA (GMT-5)
3×2,083 Posts |
![]() Quote:
I just looked at the server and I see your clients' requests for work, but it appears that they aren't being given candidates to test. Definitely strange. Upon looking back through debug.log, I'm seeing some messages that make me wonder if there's a problem with how the server's connecting to the DB. I'll look into it now. |
|
#4 | |
A Sunny Moo
Aug 2007
USA (GMT-5)
3×2,083 Posts |
![]() Quote:
|
|
#5 |
A Sunny Moo
Aug 2007
USA (GMT-5)
3×2,083 Posts |
Okay, I just took a look at the server, and while I didn't find much of particular interest, I did notice that the VNC connection would sometimes work at a normal speed, and other times it would be really slow.
I wonder if maybe Gary is hogging up the internet connection with something...though I can't fathom what would be quite this hoggish. Neither can that really explain the issue of the "Candidate and/or test was not found" message you got. Anyway, I've restarted the server. It may be that this problem is due to some stupid mistake I made; give it a try now and see how it works. |
#6 |
A Sunny Moo
Aug 2007
USA (GMT-5)
3·2,083 Posts |
Oh! I think I know what's going on. I just tried putting a client (Windows, 3.1.3) of my own on the server, and observed it behaving like this:
- It would ask for work from the server.
- About 9 seconds later, the server would respond with a test.
- Within 3 or 4 seconds, the client would finish the test. (Hey, they're pretty small.)
- The client would send the test back to the server.
- The server would accept it almost immediately.

Note that 9 seconds is a really long time for the server to respond. I think this has to do with the fact that I loaded the server with an absolutely enormous number of candidates. Methinks it's taking a while for the MySQL server to look in the database and come up with a test. It took about that long when I tried to view the Candidate table manually from the console.

Sometimes, though, normal variation would push the delay over the critical 10 second mark--which means the client's timeout kicks in and it gives up. Hence the problems Mini-Geek was seeing; I think he was getting timeouts a tad more often than me, probably due to small differences in the latency of his internet connection vs. mine.

I'm going to try re-loading the server with a smaller batch of work--say, k=2000-2050 instead of 2000-2200. That should make for a less bloated database and hopefully fix this problem.

Note: This means the server will be offline for up to 10 minutes or so.

Last fiddled with by mdettweiler on 2010-01-22 at 21:13 |
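To put rough numbers on the timeout theory above, here is a small Python sketch. It assumes the server's response time is normally distributed around the observed average, with a standard deviation of 1 second; both the distribution and the 1 second figure are made up purely for illustration and were not measured. `timeout_probability` is a hypothetical helper.

```python
import math


def timeout_probability(mean_s: float, stddev_s: float, timeout_s: float) -> float:
    """Chance that a normally distributed response time exceeds the timeout."""
    z = (timeout_s - mean_s) / (stddev_s * math.sqrt(2))
    return 0.5 * math.erfc(z)


# Assumed spread of 1 second around a ~9 second average, 10 second client timeout.
p_slow = timeout_probability(mean_s=9.0, stddev_s=1.0, timeout_s=10.0)

# Same assumed spread, but with the server answering in about 2 seconds.
p_fast = timeout_probability(mean_s=2.0, stddev_s=1.0, timeout_s=10.0)

print(f"{p_slow:.1%} of requests time out at a 9 s average")
print(f"{p_fast:.2e} of requests time out at a 2 s average")
```

Under those assumptions, roughly one request in six would hit the timeout at a 9 second average, while at a 2 second average timeouts essentially vanish. A slightly slower connection on the client side shifts the average up and makes things worse, which would fit Mini-Geek seeing more failures than I did.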
#7 | |
"Mark"
Apr 2003
Between here and the
11000100100102 Posts |
![]() Quote:
|
|
#8 |
A Sunny Moo
Aug 2007
USA (GMT-5)
3×2,083 Posts |
There were about 726,000 candidates loaded at the time. I'm currently loading up about a quarter of that after having dumped out the server's DB.
|
#9 |
A Sunny Moo
Aug 2007
USA (GMT-5)
3·2,083 Posts |
Okay, I've now got the server loaded with just k=2000-2050. Let's see how that one works.
![]() |
#10 |
A Sunny Moo
Aug 2007
USA (GMT-5)
186916 Posts |
It seems to be working all right now--the server's taking about 2 seconds to respond to requests for work, which is pretty much perfect. I'll refrain from making hasty generalizations lest I be forced to later put my foot in my mouth...we'll see how it goes.
![]() Last fiddled with by mdettweiler on 2010-01-22 at 21:33 Reason: corrected # of seconds |
#11 |
Account Deleted
"Tim Sorbera"
Aug 2006
San Antonio, TX USA
102538 Posts |
I thought I should post my experiences with 3.1.3:
I've been using PRPnet 3.1.3 a bit, and while it all worked just fine in sending and receiving work on my local box, when I tried to run it from another machine on my network, the client would only understand a portion of what the server was sending. The rest were, like what is apparently happening here, being marked as reserved on the server, but not being received and run on the client. Getting "Candidate and/or test was not found" was pretty rare, but would happen with the entire batch whenever it did (4 such batches over ~4000 tests). I checked the server logs for one such candidate, and here's its story (as the server knows it): Code:
prpserver.log:
[2010-01-21 18:05:42 GMT] 809997332*3^10319-1 sent to Email: tim.sorbera@gmail.com User: Mini-Geek Client: dad2
[2010-01-21 18:11:27 GMT] Test of 809997332*3^10319-1 for user tim.sorbera@gmail.com and client dad2 has expired.
(a few more assignments/expirations every 5 minutes, then:)
[2010-01-21 18:33:05 GMT] Test of 809997332*3^10319-1 for user tim.sorbera@gmail.com and client dad2 has expired.
[2010-01-21 18:33:08 GMT] 809997332*3^10319-1 sent to Email: tim.sorbera@gmail.com User: Mini-Geek Client: dad1
[2010-01-21 18:40:09 GMT] tim.sorbera@gmail.com (dad2): Test 1264098464 for candidate 809997332*3^10319-1 was not found
completed_tests.log:
[2010-01-21 18:33:22 GMT] 809997332*3^10319-1 received by Email: tim.sorbera@gmail.com User: Mini-Geek Client: dad1 Program: pfgw.exe Residue: A1599B2980DBDA7B

Now, in response to the recent change: I can now get and return work on both computers, but now on both I sometimes (pretty often, maybe every 5-10 times I communicate) get the "No available candidates" message. I know there are fewer candidates, but it's not THAT much less!

Last fiddled with by Mini-Geek on 2010-01-22 at 21:35 |
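For anyone digging through their own prpserver.log, a minimal Python sketch of how the timestamps above can be compared. The messages in the example are abbreviated by hand from the log lines in this post; `parse_entry` and `seconds_between` are hypothetical helpers, and the only thing assumed about the log format is the leading "[YYYY-MM-DD HH:MM:SS GMT]" stamp seen above.

```python
import re
from datetime import datetime

# Matches the leading "[2010-01-21 18:05:42 GMT] " stamp and captures the rest.
STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) GMT\] (.*)$")


def parse_entry(line: str) -> tuple[datetime, str]:
    """Return (timestamp, message) for one log line."""
    match = STAMP.match(line.strip())
    if match is None:
        raise ValueError(f"no timestamp found in: {line!r}")
    when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return when, match.group(2)


def seconds_between(first: str, second: str) -> int:
    """Elapsed whole seconds from the first log line to the second."""
    t1, _ = parse_entry(first)
    t2, _ = parse_entry(second)
    return int((t2 - t1).total_seconds())


# Messages abbreviated from the log excerpt in this post.
sent_dad2 = "[2010-01-21 18:05:42 GMT] 809997332*3^10319-1 sent to client dad2"
expired = "[2010-01-21 18:11:27 GMT] Test of 809997332*3^10319-1 has expired."
sent_dad1 = "[2010-01-21 18:33:08 GMT] 809997332*3^10319-1 sent to client dad1"
late_report = "[2010-01-21 18:40:09 GMT] (dad2): Test 1264098464 was not found"

print(seconds_between(sent_dad2, expired))      # time until the first expiry
print(seconds_between(sent_dad1, late_report))  # dad2 reports after reassignment
```

The point of the comparison is that dad2's report arrives only after the candidate had already expired and been handed to dad1, which is consistent with the "was not found" message.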