mersenneforum.org PRPnet 3.1.3 stress-test server

2010-01-22, 21:35   #12
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

3·2,083 Posts

In other news, the server seems to be responding quite well to multiple simultaneous connections: I tried refreshing the main web page at the same time my client hit the server, and neither was impacted.

2010-01-22, 21:41   #13
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

3·2,083 Posts

Quote:
 Originally Posted by Mini-Geek Now, in response to the recent change: I can now get and return work on both computers, but now on both I sometimes (pretty often, maybe every 5-10 times I communicate) get the "No available candidates" message. I know there are fewer candidates, but it's not THAT much less!
Yeah, same here:
Code:
[2010-01-22 21:34:47 GMT] G7465: Returning work to server nplb-gb1.no-ip.org at port 7465
[2010-01-22 21:34:48 GMT] G7465: INFO: Test for 2001*2^52481-1 was accepted
[2010-01-22 21:34:48 GMT] G7465: INFO: All 1 test results were accepted
[2010-01-22 21:34:48 GMT] G7465: Getting work from server nplb-gb1.no-ip.org at port 7465
[2010-01-22 21:34:51 GMT] G7465: INFO: No available candidates are left on this server.
It seems that while the timeout issue was fixed, these still pop up once in a while. I wonder what's causing them? They seem to be somewhat harmless as the client can generally grab a pair successfully on the next try, but nonetheless they are a problem. I wonder if this is because (say) two clients are trying to connect simultaneously and the DB can't give them both a new pair at the same time?
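To put a number on "once in a while", one option is to scan the client log and compare work requests against "No available candidates" replies. The following is only a rough sketch: it assumes every client log line has the `[timestamp] ServerId: message` shape shown in the excerpt above, and the function name `count_empty_fetches` is made up for illustration.

```python
import re

# Assumed line shape, taken from the log excerpt above:
# [2010-01-22 21:34:51 GMT] G7465: INFO: No available candidates are left on this server.
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<server>\w+):\s+(?P<msg>.*)$")


def count_empty_fetches(lines):
    """Return {server_id: (work_requests, no_candidate_replies)}."""
    stats = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # ignore anything that does not look like a client log line
        requests, empties = stats.get(m["server"], (0, 0))
        if m["msg"].startswith("Getting work from server"):
            requests += 1
        elif "No available candidates" in m["msg"]:
            empties += 1
        stats[m["server"]] = (requests, empties)
    return stats


sample = [
    "[2010-01-22 21:34:47 GMT] G7465: Returning work to server nplb-gb1.no-ip.org at port 7465",
    "[2010-01-22 21:34:48 GMT] G7465: INFO: Test for 2001*2^52481-1 was accepted",
    "[2010-01-22 21:34:48 GMT] G7465: INFO: All 1 test results were accepted",
    "[2010-01-22 21:34:48 GMT] G7465: Getting work from server nplb-gb1.no-ip.org at port 7465",
    "[2010-01-22 21:34:51 GMT] G7465: INFO: No available candidates are left on this server.",
]
print(count_empty_fetches(sample))  # {'G7465': (1, 1)}
```

Run over a full log, a ratio near 1 in 5 to 1 in 10 would match what Mini-Geek reported.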

BTW, I eat my words about 3.1.2 clients being cross-compatible with 2.4.6 servers. After one of these "no available candidates" thingies, my client pulled a candidate from G2000, which is on 2.4.6. That worked all right (albeit with one or two little errors that didn't seem to impact anything), but now the client's driving G2000 nuts trying to return the result. It would appear that 2.4.6 doesn't take well to 3.1.2's trying to send it the test time, a new feature added in version 3.

2010-01-22, 21:42   #14
gd_barnes

May 2007
Kansas; USA

2^6·3·53 Posts

Hum. Max, can you send me a care package for the newest server? I won't be able to dogpile on it until later this evening.

I'm seeing a couple of error messages coming across:

"Could not open file: Greeting.txt" (Looks like no big deal. Do you need a greeting file there, Max?)

And of course our favorite, which should be a concern:

"Nothing was received on socket 5, therefore the socket was closed."

Better check into that last one.

Gary

2010-01-22, 21:49   #15
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

14151₈ Posts

Quote:
 Originally Posted by gd_barnes Hum. Max, can you send me a care package for the newest server? I won't be able to dogpile on it until later this evening.
Sure. But, actually, you may want to hold off on the dogpile for a wee bit; I'd kind of like to nail down this problem with the "no candidates on server" message first.

Quote:
 I'm seeing a couple of error messages coming across: "Could not open file: Greeting.txt" (Looks like no big deal. Do you need a greeting file there, Max?)
Correct, no big deal. I could put something there but there's no particular need for it.

Quote:
 And of course our favorite, which should be a concern: "Nothing was received on socket 5, therefore the socket was closed." Better check into that last one.
I just took a look at the server (you probably noticed) and didn't see any of those, though surely they are out there. I'm not sure exactly what, if any, connection that has to the "no candidates on server" message we're seeing here, though it might be the server's end of that. I'll keep checking the server and see if I can spot it.

2010-01-22, 22:04   #16
rogue

"Mark"
Apr 2003
Between here and the

5×1,171 Posts

Quote:
 Originally Posted by Mini-Geek Now, in response to the recent change: I can now get and return work on both computers, but now on both I sometimes (pretty often, maybe every 5-10 times I communicate) get the "No available candidates" message. I know there are fewer candidates, but it's not THAT much less!
What OS are you using? 3.1.4 has a patch that is specific to some instances of *nix (notably Ubuntu) that peg the CPU when using select() on a socket.

Regarding test expiration, what do you have in prpserver.delay? It seems that tests are expiring too quickly.

I have rarely seen the "No available candidates" message, but not after I switched the database engine. Are you using the InnoDB database engine? I suggest turning on debugging (set debuglevel=3 in prpserver.ini) and sending the log to me when you see this happen again. I suspect it to be a database issue (and not a code issue), unless there is some fundamental misunderstanding I have regarding MySQL.

Another thing you could try is to add this index to the Candidate table.

alter table Candidate add index ix_test (HasPendingTest, CompletedTests, DoubleChecked, DecimalLength);

Maybe this will also address slowdowns in communications when a client gets work.
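To see how a composite index like this one can be picked up by a query planner, here is a small stand-in experiment using Python's built-in sqlite3 module. Treat it as a sketch under loud assumptions: SQLite is not MySQL (it uses CREATE INDEX rather than the alter table form above), the `Name` column and the "next candidate" query are hypothetical, and the real PRPNet schema and query may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in Candidate table: only the four indexed columns come from the post;
# Name is a hypothetical identifier column.
conn.execute(
    "CREATE TABLE Candidate ("
    " Name TEXT PRIMARY KEY,"
    " HasPendingTest INTEGER,"
    " CompletedTests INTEGER,"
    " DoubleChecked INTEGER,"
    " DecimalLength REAL)"
)

# SQLite equivalent of the suggested MySQL index (same name, same column order).
conn.execute(
    "CREATE INDEX ix_test ON Candidate"
    " (HasPendingTest, CompletedTests, DoubleChecked, DecimalLength)"
)

# Hypothetical "hand out the smallest untested candidate" query.
query = (
    "SELECT Name FROM Candidate"
    " WHERE HasPendingTest = 0 AND CompletedTests = 0 AND DoubleChecked = 0"
    " ORDER BY DecimalLength LIMIT 1"
)

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query))
print(plan)
```

If the plan text mentions ix_test, the planner is searching the index instead of scanning the whole table. On the MySQL side, prefixing the server's actual query with EXPLAIN would be the corresponding check.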

2010-01-22, 22:27   #17
Mini-Geek
Account Deleted

"Tim Sorbera"
Aug 2006
San Antonio, TX USA

17·251 Posts

Quote:
 Originally Posted by rogue What OS are you using? 3.1.4 has a patch that is specific to some instances of *nix (notably Ubuntu) that peg the CPU when using select() on a socket.
All of my clients and servers are Windows.
I don't know if it's related to using select() or what, but I've noticed that (with 3.1.3 on Windows, at least) the client basically pegs a core while it's communicating with the server (which keeps the core pegged almost constantly, not just when it has work for its helper apps).
I think the GB servers are run on Linux.

Last fiddled with by Mini-Geek on 2010-01-22 at 22:57

2010-01-22, 23:16   #18
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

14151₈ Posts

Quote:
 Originally Posted by rogue What OS are you using? 3.1.4 has a patch that is specific to some instances of *nix (notably Ubuntu) that peg the CPU when using select() on a socket.
As Mini-Geek said, the GB servers are all running on Linux, specifically Ubuntu in fact. (Unless this is just an issue with the client?)

Quote:
 Regarding test expiration, what do you have in prpserver.delay? It seems that tests are expiring too quickly.
prpserver.delay is set to 2 days for all candidate sizes.
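For reference, a 2-day delay should mean a handed-out test only expires once it has been outstanding for more than 48 hours. The sketch below illustrates that rule and the kind of detail an expiry message could carry. It is not PRPNet's ExpireTests() logic; the function name, arguments, and message format are all hypothetical.

```python
from datetime import datetime, timedelta


def expiry_report(assigned_at, now, delay=timedelta(days=2)):
    """Return (expired, message) for a test handed out at assigned_at.

    Hypothetical helper: a test counts as expired only when its age
    is strictly greater than the configured delay.
    """
    age = now - assigned_at
    expired = age > delay
    verdict = "expired" if expired else "still pending"
    message = f"{verdict}: assigned {assigned_at:%Y-%m-%d %H:%M}, age {age}, delay {delay}"
    return expired, message


now = datetime(2010, 1, 22, 23, 16)
# Handed out about 1.5 hours ago: should not expire with a 2-day delay.
print(expiry_report(datetime(2010, 1, 22, 21, 34), now))
# Handed out 3 days ago: should expire.
print(expiry_report(datetime(2010, 1, 19, 23, 16), now))
```

If tests younger than the delay are showing up as expired, the age and delay printed side by side would make that obvious in the log.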

Quote:
 I have rarely seen the "No available candidates" message, but not after I switched the database engine. Are you using the InnoDB database engine? I suggest turning on debugging (set debuglevel=3 in prpserver.ini) and sending the log to me when you see this happen again. I suspect it to be a database issue (and not a code issue), unless there is some fundamental misunderstanding I have regarding MySQL.
I don't think I'm using InnoDB, though admittedly I have absolutely no idea what InnoDB is, so if it's the default then I might be using it.

I currently have the debug level set to log socket communication only (level 2). I don't see a level 3 in the choices; I presume you mean level 1 (socket+database)? I've set the server to do that now and will send you the log if I see any more of those errors. It seems all the other clients except mine have dropped off, so if the problem occurs only when multiple clients are hitting the server simultaneously as I'm suspecting, it probably won't occur just yet. (Hey, if anyone else wants to chuck a core or two back on there for a little while, it would be greatly appreciated...shouldn't take too long to trigger one of those errors on somebody's client somewhere.)
Quote:
 Another thing you could try is to add this index to the Candidate table. alter table Candidate add index ix_test (HasPendingTest, CompletedTests, DoubleChecked, DecimalLength); Maybe this will also address slowdowns in communications when a client gets work.
Okay, I've added the index. I didn't notice any speedup when I looked at the client immediately after applying the index, but it's not like it was too terribly bad with this smaller number of candidates anyway, so there may not be much improvement to be had at this point. Once we've nailed down the issue with the "no available candidates" I'll try loading the full gamut of tests into the server and see if the index helps with that.

2010-01-22, 23:28   #19
rogue

"Mark"
Apr 2003
Between here and the

5·1,171 Posts

Quote:
 Originally Posted by mdettweiler I don't think I'm using InnoDB, though admittedly I have absolutely no idea what InnoDB is, so if it's the default then I might be using it. I currently have the debug level set to log socket communication only (level 2). I don't see a level 3 in the choices; I presume you mean level 1 (socket+database)? I've set the server to do that now and will send you the log if I see any more of those errors. It seems all the other clients except mine have dropped off, so if the problem occurs only when multiple clients are hitting the server simultaneously as I'm suspecting, it probably won't occur just yet. (Hey, if anyone else wants to chuck a core or two back on there for a little while, it would be greatly appreciated...shouldn't take too long to trigger one of those errors on somebody's client somewhere.) Okay, I've added the index. I didn't notice any speedup when I looked at the client immediately after applying the index, but it's not like it was too terribly bad with this smaller number of candidates anyway, so there may not be much improvement to be had at this point. Once we've nailed down the issue with the "no available candidates" I'll try loading the full gamut of tests into the server and see if the index helps with that.
I posted in PRPNet announcements thread how to convert the tables to the InnoDB database engine. It's fairly straightforward. It is not the default. If you created the tables with the current script (instead of upgrading), they should be InnoDB. Use "show table status;" from the MySQL client to tell you which database engine is being used on the tables.
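As a rough illustration of the check described here: "show table status;" returns one row per table with, among others, a Name and an Engine column. Given those rows, a helper can list the ALTER TABLE statements needed for any table not yet on InnoDB. This is a sketch, not necessarily the procedure from the announcements thread; the helper name is invented, and of the table names below only Candidate appears in this thread (CandidateTest is a placeholder).

```python
def conversion_statements(table_status_rows, wanted="InnoDB"):
    """Build ALTER TABLE statements for tables whose Engine is not `wanted`.

    table_status_rows: iterable of dicts with at least "Name" and "Engine",
    mirroring two of the columns MySQL's SHOW TABLE STATUS returns.
    """
    statements = []
    for row in table_status_rows:
        engine = row.get("Engine")
        if engine is None:
            continue  # views report a NULL engine; nothing to convert
        if engine.lower() != wanted.lower():
            statements.append(f"ALTER TABLE `{row['Name']}` ENGINE={wanted};")
    return statements


# Example rows as they might come back from "show table status;".
rows = [
    {"Name": "Candidate", "Engine": "MyISAM"},
    {"Name": "CandidateTest", "Engine": "InnoDB"},  # placeholder table name
]
for stmt in conversion_statements(rows):
    print(stmt)
```

An empty result means every table already uses InnoDB, which is what a fresh install from the current create script should give.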

debuglevel=3 is not documented due to an oversight on my part. I'll post 3.1.4 later tonight. It will address the pegging CPU issue.

I need to modify ExpireTests() in prpserver.cpp to give more details behind tests expiring.

2010-01-23, 02:21   #20
gd_barnes

May 2007
Kansas; USA

2^6×3×53 Posts

OK, just let me know when to dogpile on it.

Max, I saw the "socket not closed" message early on not long after the server started but I saw it again 5-10 mins. later after candidates were coming through "normally".

2010-01-23, 05:27   #21
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

1100001101001₂ Posts

Quote:
 Originally Posted by rogue I posted in PRPNet announcements thread how to convert the tables to the InnoDB database engine. It's fairly straightforward. It is not the default. If you created the tables with the current script (instead of upgrading), they should be InnoDB. Use "show table status;" from the MySQL client to tell you which database engine is being used on the tables. debuglevel=3 is not documented due to an oversight on my part. I'll post 3.1.4 later tonight. It will address the pegging CPU issue. I need to modify ExpireTests() in prpserver.cpp to give more details behind tests expiring.
Ah, okay. In that case, then, yes, I'm using InnoDB, since I created the tables with the create_tables.sql script that came with 3.1.3.

BTW, what exactly does debuglevel=3 do?

Last fiddled with by mdettweiler on 2010-01-23 at 05:28

2010-01-23, 14:49   #22
rogue

"Mark"
Apr 2003
Between here and the

5855₁₀ Posts

Quote:
 Originally Posted by mdettweiler Ah, okay. In that case, then, yes, I'm using InnoDB, since I created the tables with the create_tables.sql script that came with 3.1.3. BTW, what exactly does debuglevel=3 do?
It will log the candidates selected by the candidate selector (for handing out tests). Although it won't state why a candidate is not sent to a client, it can at least indicate if candidates with no pending tests or completed tests are getting selected by the cursor.


All times are UTC.