mersenneforum.org  

Old 2010-01-24, 03:39   #56
rogue
 
"Mark"
Apr 2003
Between here and the

2·3²·5²·13 Posts

Quote:
Originally Posted by mdettweiler
Hmm, right. Mark, could you possibly add that info to prpclient.log in 3.1.5?

Speaking of which, it would be helpful to have that in the server's logs as well. Heck, I'm not even seeing anything about returned tests in prpserver.log--is this an error?


I'll forewarn you, this may not be the definitive dogpile--I don't think this release has even addressed the "no candidates on server" message yet (Mark, correct me if I'm wrong here), so we'll need to re-test after that's fixed.
In regards to your first question, it could be added, but I don't understand the need.

As for the second question, the tests should be logged to prpserver.log. I know they are written to completed_tests.log, so the issue is rather minor. I probably did something stupid. I'll investigate.

Regarding your last question, I don't have enough data to start looking into the issue.

For Gary's issue, it appears that the client disconnected while the server was trying to send a test. Gary, did the client disconnect during the connection? The duplicate keys are occurring on the CandidateTest table. I made a number of changes in 3.1.2 which should have addressed that problem, so I don't understand why it is happening. Can you send me more of the log so that I can see the communication between server and clients?
Old 2010-01-24, 04:36   #57
gd_barnes
 
May 2007
Kansas; USA

5²×11×37 Posts

Quote:
Originally Posted by rogue
In regards to your first question, it could be added, but I don't understand the need.
I got confused as to your "1st question", "2nd question" numbering. I don't really know what question you're referring to.

I hope this response is NOT you referring to the need for testing time on each candidate, because I really don't want to keep going into the reason. Both Ian and I have brought that up, and it has now been mentioned at least 3 times. We need the testing time for a candidate on both the client and server side to determine workflow on our machines.

What we're saying is that it doesn't work to have the test time in a temporary file. It needs to be in a permanent file. LLR, PFGW, Proth, and LLRnet have the test time in a permanent file. PRPnet needs that test time too.

We are respectfully requesting that you add testing time per candidate to both the server and client side in permanent files with release 3.1.5. It is what we have asked for in the "requirements for PRPnet" thread.

That said, assuming you're not referring to the testing time, perhaps my server ignorance is getting the best of me here. I don't know what Max really means by the fact that he is "not even seeing anything about returned tests in prpserver.log--is this an error?".

Is the previous sentence about returned tests what you are referring to when you said you don't see the need? If so, I'll leave that to you and him.

I'll see what I can get you as far as more of those error messages go. I stopped my client one time, simply to change it from batching 3 tests to 1 test. I then restarted it shortly afterwards. I'm running Linux 8.04 64-bit.

I have a suggestion: After making changes for a release but before releasing it to us, why not put 2 quads on it at a low n-range and let it rip for testing? Lennart and I each only had about a quad on it and these problems are coming out. I'm pretty sure you would have had the same.

We could send you a file. That way, you can see quite a few problems ahead of time before a public release. I know you only have a Windows setup, but to me these problems appear independent of the OS being used. It would save an immense amount of everyone's time if we didn't have to keep testing one change at a time.


Thank you,
Gary

Last fiddled with by gd_barnes on 2010-01-24 at 06:34
Old 2010-01-24, 06:51   #58
gd_barnes
 
May 2007
Kansas; USA

23677₈ Posts

Now a different error is appearing. I think that is because it's taking a very long time (10-15 secs.) each time it needs to retrieve a batch. In my case, the batch is 1 pair so we can stress test it as much as possible. The error message that is appearing is the dreaded:

"Nothing was received on socket 3, therefore the socket was closed."

This seems to happen every time it returns a batch.

Because I don't know all of what you need, I'm attaching the entire prpserver.log file. In the next post, I'll attach the entire client log file from one of my clients.

You'll also notice that the "MySQL" error referred to in my recent post is sprinkled out fairly regularly throughout the entire file.


Gary
Attached Files
File Type: bz2 prpserver.log.tar.bz2 (117.5 KB, 65 views)

Last fiddled with by gd_barnes on 2010-01-24 at 07:20
Old 2010-01-24, 07:20   #59
gd_barnes
 
May 2007
Kansas; USA

10175₁₀ Posts

Well, I've discovered something kind of interesting. Max, in the PRPnet thread, you've configured the .ini file in the 3.1.4 client to include most of NPLB's servers, even though this server is SQL and all prior servers are not.

Since all the servers were in the .ini file (with 100% on G7465), whenever there was a connect problem (or some problem of that nature) on this test, the client would go and retrieve a pair from the 0% servers on port 3000 or port 5000, which you have listed as 2nd and 3rd. The problem is that those servers run PRPnet 2.4.6, so a 3.1.4 client was talking to a 2.4.6 server. It kept trying to return the pairs and it didn't work.

But...don't stop reading...

That said, I was still getting the dreaded "nothing was received" message on some of the port 7465 connects so the problem still exists.

In looking at the previous MySQL error messages and the nothing was received error messages, they are clearly still related to our current test. It appears that Lennart is still getting them too.

I'm bringing this up because I'm not going to attach the prpclient.log file just yet. I'm stopping my clients, changing the .ini file to only go after port 7465 and restarting them. But before restarting them, I'm going to rename the prpclient.log file so that we can tell what was happening before and after the change. I don't want prior "valid" error messages to be lost.

In a little while, I'll attach a prpclient.log file without any possible testing on ports 3000 and 5000.

I hope I haven't messed up any pairs on ports 3000 and 5000. In other words, if the server couldn't handle the pairs, I hope they'll eventually be handed back out to someone.

Max, can you please change the .ini file in the PRPnet client in the PRPnet thread so that this doesn't happen to others? Just listing one server there is all that is needed. Perhaps a comment right above it to indicate that more than one server with percentages can be added if needed.


Gary
Old 2010-01-24, 08:11   #60
gd_barnes
 
May 2007
Kansas; USA

5²·11·37 Posts

With Lennart now throwing a whole bunch of cores on there, we may be able to isolate some things. I can say that we're still getting the MySQL error messages and "nothing returned" error messages from time to time. On my clients, since they are only configured to port 7465 now, whenever they can't connect within about 3 seconds, I get a "No available candidates are left on this server." and a "Could not connect to any servers and no work is pending. Pausing 1 minute." error message. It then waits a minute and seems to connect OK.

I'm now attaching a prpclient.log file.

Here is a thought: Shouldn't it wait more than 3 seconds before either looking for another server or before pausing for a minute? I know we don't want to wait too long because there are usually secondary servers to access, but IMHO it seems like it should wait as long as 10 seconds before looking elsewhere or pausing.
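To make the suggestion concrete, here is a minimal Python sketch of a client that waits up to 10 seconds per server before moving on to the next one. It is not PRPnet's code; the function name `connect_with_fallback`, the `servers` list, and the injectable `connect` parameter are all hypothetical and only illustrate the idea.

```python
import socket


def connect_with_fallback(servers, timeout=10.0, connect=socket.create_connection):
    """Try each (host, port) in order and return (server, sock) for the first success.

    Returns (None, None) when every server fails, so the caller can pause
    (for example for a minute) and try the whole list again.
    """
    for server in servers:
        try:
            sock = connect(server, timeout=timeout)
            return server, sock
        except OSError:
            # Refused, unreachable, or timed out: move on to the next server.
            continue
    return None, None
```

With only port 7465 in the list, a failed connect falls straight through to the "pause and retry" branch in the caller, which matches the behavior described above but with a longer wait before giving up.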


Gary
Attached Files
File Type: bz2 prpclient.log.tar.bz2 (3.4 KB, 58 views)
Old 2010-01-24, 13:49   #61
rogue
 
"Mark"
Apr 2003
Between here and the

13332₈ Posts

Quote:
Originally Posted by gd_barnes
I hope this response is NOT you referring to the need for testing time on each candidate, because I really don't want to keep going into the reason. Both Ian and I have brought that up, and it has now been mentioned at least 3 times. We need the testing time for a candidate on both the client and server side to determine workflow on our machines.

What we're saying is that it doesn't work to have the test time in a temporary file. It needs to be in a permanent file. LLR, PFGW, Proth, and LLRnet have the test time in a permanent file. PRPnet needs that test time too.
The test time is in the database, but is not written to either the client or server log (unless debugging is enabled). I was trying to understand why the client would need to log that time. Note that the client does not log completed tests locally. Is that what you are really asking for?

I looked at the log files. Unfortunately I need the test run with debuglevel=4 (on the server) to diagnose. I see the errors (duplicate keys), but I don't understand why that is happening. When using InnoDB, MySQL supports transactions. I turn off autocommit and lock the database row to prevent other threads from accessing it. This is all done before the thread updates the database. It appears (at first glance) that the database row is not getting locked or that there is no transaction. I have a way around this, BUT I want to see that debug log before I can be certain that the problem is what I think it is.
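For readers following along, the "lock first, then update inside one transaction" pattern can be sketched as below. This is only an illustration, not the PRPNet server code. It uses Python's built-in sqlite3 as a stand-in, where `BEGIN IMMEDIATE` locks the whole database for writing; on MySQL with InnoDB the comparable step would be a transaction with a `SELECT ... FOR UPDATE` on the row. Only the CandidateTest table name comes from this thread; the Candidate table and all column names are assumptions.

```python
import sqlite3


def open_demo_db():
    # isolation_level=None turns off the module's implicit transactions,
    # so the explicit BEGIN/COMMIT below are the only ones in play.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE Candidate (name TEXT PRIMARY KEY, assigned_to TEXT)")
    conn.execute("CREATE TABLE CandidateTest (name TEXT PRIMARY KEY, client TEXT)")
    return conn


def claim_candidate(conn, client):
    """Hand the next unassigned candidate to `client`, or return None."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT name FROM Candidate WHERE assigned_to IS NULL "
            "ORDER BY name LIMIT 1"
        ).fetchone()
        if row is not None:
            # Both writes happen under the same lock, so no other writer
            # can pick the same candidate in between.
            conn.execute(
                "INSERT INTO CandidateTest (name, client) VALUES (?, ?)",
                (row[0], client),
            )
            conn.execute(
                "UPDATE Candidate SET assigned_to = ? WHERE name = ?",
                (client, row[0]),
            )
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        raise
    return row[0] if row is not None else None
```

If the lock or the transaction is silently missing, two threads can read the same unassigned row and both try to insert it, which would surface as a duplicate key error like the one reported here.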

Note that with debuglevel=4, that file will grow very quickly. It is best to get the "duplicate key" error a few times, then terminate the server and set it back to debuglevel=0.

I also need that same setting on the server to understand why you are getting the "No available candidates on server" message. I don't understand why the server is not finding any candidates for the client and setting debuglevel=4 on the server will help me diagnose that.

Last fiddled with by rogue on 2010-01-24 at 14:30
Old 2010-01-24, 17:45   #62
mdettweiler
A Sunny Moo
 
Aug 2007
USA (GMT-5)

3·2,083 Posts

Quote:
Originally Posted by rogue
I looked at the log files. Unfortunately I need the test run with debuglevel=4 (on the server) to diagnose. I see the errors (duplicate keys), but I don't understand why that is happening. When using InnoDB, MySQL supports transactions. I turn off autocommit and lock the database row to prevent other threads from accessing it. This is all done before the thread updates the database. It appears (at first glance) that the database row is not getting locked or that there is no transaction. I have a way around this, BUT I want to see that debug log before I can be certain that the problem is what I think it is.

Note that with debuglevel=4, that file will grow very quickly. It is best to get the "duplicate key" error a few times, then terminate the server and set it back to debuglevel=0.

I also need that same setting on the server to understand why you are getting the "No available candidates on server" message. I don't understand why the server is not finding any candidates for the client and setting debuglevel=4 on the server will help me diagnose that.
Mark, I've sent you a debug log from the last day or so on the server with debuglevel=3 as previously requested. Is this good, or should I set it to debuglevel=4 and get some more data?
Old 2010-01-24, 17:48   #63
rogue
 
"Mark"
Apr 2003
Between here and the

2×3²×5²×13 Posts

Quote:
Originally Posted by mdettweiler
Mark, I've sent you a debug log from the last day or so on the server with debuglevel=3 as previously requested. Is this good, or should I set it to debuglevel=4 and get some more data?
I received your e-mail, so I'll look into it. Hopefully it is enough when combined with the other logs.
Old 2010-01-24, 17:54   #64
mdettweiler
A Sunny Moo
 
Aug 2007
USA (GMT-5)

3·2,083 Posts

BTW @Gary regarding the 2.4.6 servers with the 3.1.4 clients: oops, my bad. When I put them in there like that I was under the mistaken impression that they were backwards-compatible. No, it shouldn't mess up anything on the server; what's going to happen is just that those k/n pairs will be left stranded, and will expire in 2 days (maybe 1 day, I forget what they were set to) to be handed out to someone else.

I'd recommend that you, and anyone else in a similar situation, stop your 3.1.4 clients, delete any "work_G3000.save" or "work_G5000.save" files (i.e. any 2.4.6 servers); comment out the lines for those servers in prpclient.ini (put a // in front of them); and restart the clients. That will ensure that they won't keep trying to return work futilely to servers they're not compatible with.
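For anyone with many clients, the file cleanup above could be scripted roughly as follows. This is a hedged sketch: it assumes the server lines in prpclient.ini start with `server=<ID>:`, which is a guess at the layout, and the function name `detach_servers` is made up. Stopping and restarting the clients is still done by hand.

```python
from pathlib import Path


def detach_servers(client_dir, server_ids):
    """Delete saved work and comment out ini lines for the given server IDs.

    Assumes ini lines look like 'server=<ID>:...' (hypothetical layout).
    Returns the number of ini lines that were commented out.
    """
    client_dir = Path(client_dir)

    # Remove pending work files such as work_G3000.save, if present.
    for sid in server_ids:
        (client_dir / f"work_{sid}.save").unlink(missing_ok=True)

    ini = client_dir / "prpclient.ini"
    changed = 0
    out = []
    for line in ini.read_text().splitlines():
        if any(line.startswith(f"server={sid}:") for sid in server_ids):
            out.append("//" + line)  # a leading // comments the line out
            changed += 1
        else:
            out.append(line)
    ini.write_text("\n".join(out) + "\n")
    return changed
```

Run it once per client directory while the client is stopped, for example `detach_servers("/path/to/client", ["G3000", "G5000"])`, where the path is a placeholder.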
Old 2010-01-24, 22:55   #65
gd_barnes
 
May 2007
Kansas; USA

5²×11×37 Posts

Quote:
Originally Posted by rogue
The test time is in the database, but is not written to either the client or server log (unless debugging is enabled). I was trying to understand why the client would need to log that time. Note that the client does not log completed tests locally. Is that what you are really asking for?
We are not communicating because this should not be such a big issue. We've asked for this many times in order for PRPnet to replace LLRnet. Have you looked at an LLRnet server and the results that it puts out on both the client and server side? Those are almost exactly what we need.

We need the test time in the prpclient.log file and preferably in the equivalent file on the server side.

I have a question for you. If you are running a client and you don't have access to the server/database, how would you get the test time? You couldn't without jumping through hoops to calculate it based on the difference in the time of day from the last test. How is a person with 40-50 clients supposed to see how long their tests take?

Let me spell this out in detail. Here is a cut-and-paste of what LLRnet gives us:

Client lresults.txt file:
2959*2^522293-1 is not prime. Res64: 31FE419D5EEE8F44 Time : 1800.608 sec.
Result 2959/522293 succesfully sent to the server.

Server results.txt file:
user=gd_barnes
[2010-01-24 00:01:02]
2757*2^526348-1 is not prime. Res64: 4EE56376A6B5239E Time : 2305.0 sec.


Now, you see the test times of 1800 secs. and 2305 secs.? That's exactly what we need.
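To show why a permanent results file is handy, here is a small Python sketch that pulls the candidate and test time out of lines in the LLRnet format quoted above. The regular expression was written against those sample lines only, so treat it as an assumption about the format rather than a complete parser. The names `RESULT_RE` and `parse_test_times` are made up for this example.

```python
import re

# Captures the candidate at the start of the line and the "Time : <secs> sec." field.
RESULT_RE = re.compile(r"^(\S+) is .*?Time : ([0-9]+(?:\.[0-9]+)?) sec\.")


def parse_test_times(lines):
    """Return a list of (candidate, seconds) for every line that carries a test time."""
    results = []
    for line in lines:
        match = RESULT_RE.match(line.strip())
        if match:
            results.append((match.group(1), float(match.group(2))))
    return results
```

Someone with 40-50 clients could run this over each client's results file and average the seconds per machine, with no access to the server or its database.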

I'll take it one step further. The above is ALL that we need in the prpclient.log file.

We don't need info. about a candidate being sent or one being received. That should be optional info. and it should be shown in another file.

In other words, we need specific info. about the test in an "official" results file. All of the other info. about sending and receiving and other server messages/errors, IMHO, should be in another file.

Would it make sense for you to look at and run an LLRnet server? Since what you're wanting to accomplish is having PRPnet replace LLRnet, I think that is what should be done. If you look at one, I think you'll see why the transition has been so extremely difficult for us. The file names in LLRnet are clear and concise, but I've never gotten a warm fuzzy about what the PRPnet server file names mean. What LLRnet is missing is the flexibility, a newer version of LLR, and the detailed server messages that we need. That is why we are excited about PRPnet. We'd really rather have those server messages separate from the results.

That said, at this point, we're OK if you want to leave all of that info. in the prpclient.log file now. But we must have the testing time right there.

I hope this finally puts this issue to rest and clarifies exactly what we need.


Thank you,
Gary

Last fiddled with by gd_barnes on 2010-01-24 at 22:55
Old 2010-01-25, 00:15   #66
rogue
 
"Mark"
Apr 2003
Between here and the

2×3²×5²×13 Posts

I have not run an LLRNet server. Although the goal is for PRPNet to replace LLRNet, it doesn't mean that PRPNet has to replicate every feature of LLRNet.

You stated that you wanted test time. I added it, but only as a data point collected by the server. I made that clear in the release notes. Nobody told me that I misinterpreted the requirement. PRPNet has never recorded completed tests on the client side, so adding test time only made sense (to me) on the server. If the client recorded tests, then I would have added it there as well because it would have been an easy thing to infer from the requirement. What I'm saying is that the real requirement here is that the client has to record tests locally and those tests need to include the test time. This is not how you stated the requirement.

Anyways, I will add the test time to the client, but I need to know what pieces of information you want logged. Unlike LLRNet, PRPNet supports multiple helper programs. The log on the client will need to use a consolidated format. I also need to know if the client wants to record the PRP test alone or both the PRP and primality tests. For NPLB, everything is base 2, so LLR is already doing a primality test. Other projects, such as those by PrimeGrid and CRUS, are more than just base 2, so this is an important question to answer.
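As a starting point for that discussion, one possible shape for a consolidated client-side record is sketched below in Python. Everything in it is hypothetical: the field names, the `ResultRecord` class, and the pipe-separated layout are illustrations only, not a proposal for what PRPNet does or will do.

```python
from dataclasses import dataclass


@dataclass
class ResultRecord:
    candidate: str   # e.g. "2959*2^522293-1"
    program: str     # helper program that ran the test, e.g. "LLR" or "PFGW"
    test_type: str   # "PRP" or "primality"
    result: str      # outcome text reported by the helper program
    seconds: float   # wall-clock test time

    def to_log_line(self):
        """Render the record as one line, independent of the helper program."""
        return (
            f"{self.candidate} | {self.program} | {self.test_type} | "
            f"{self.result} | {self.seconds:.3f} sec"
        )
```

A format like this keeps the test time next to the candidate regardless of which helper ran, and a project that wants both the PRP and the primality test logged would simply get two lines per candidate.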
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.