mersenneforum.org  

Old 2010-03-29, 17:01   #34
Mini-Geek
Account Deleted
 
 
"Tim Sorbera"
Aug 2006
San Antonio, TX USA

4267₁₀ Posts

Quote:
Originally Posted by mdettweiler
Actually, that might be a tad premature. Tim, I see from your attachment that you don't have cllr.exe in your directory. That's needed for do.pl to work on Windows, and I just confirmed that it is included in the client package; did you accidentally delete it by chance? You might want to try again after putting it back.
Sorry, not that simple. Like I said, I removed the .exe's from the folder before attaching it, and it works sometimes, which it wouldn't if cllr.exe weren't there. cllr.exe is there. See:
Quote:
Originally Posted by Mini-Geek
I'm getting this a lot of the time when I try to run the LLRnet script: (it also will do it after it's ran properly for some time, but does it most of the time)
....
...I'm attaching all the non-exe files from the folder...
I've just started using do.bat, which seems to work so far. Edit: Hm, not so fast. I had finished 2 of 3 numbers, used do -c to report/cancel them, and it didn't seem to have reported the first two. From what it outputted, it seems to have canceled all 3 without returning the results. Here's those results:
Code:
2221*2^548899-1 is not prime.  LLR Res64: D2B93491410C1AE1  Time : 363.222 sec.
2401*2^548899-1 is not prime.  LLR Res64: 132E13C16414CEBF  Time : 365.859 sec.
Can you check if those numbers were reported as complete in the DB? I can't find any indication on the pages that they were. I don't care too much about the credit for two numbers this size, and they're probably already assigned elsewhere, so I guess we'll just have a spot double check...
If it was indeed returned, the output should really be changed to reassure you that they were returned and not canceled (like the Perl version, IIRC).

Whenever you have an idea for me to check something to troubleshoot either of these things, you can tell me (here or in PM) and I'll try it. I'd like to get this worked out.

Last fiddled with by Mini-Geek on 2010-03-29 at 17:40
Old 2010-03-29, 18:02   #35
mdettweiler
A Sunny Moo
 
 
Aug 2007
USA (GMT-5)

3×2,083 Posts

Quote:
Originally Posted by Mini-Geek
Sorry, not that simple. Like I said, I removed the .exe's from the folder before attaching it, and it works sometimes, which it wouldn't if cllr.exe weren't there. cllr.exe is there. See:
Ah, whoops, missed that.
Quote:
I've just started using do.bat, which seems to work so far. Edit: Hm, not so fast. I had finished 2 of 3 numbers, used do -c to report/cancel them, and it didn't seem to have reported the first two. From what it outputted, it seems to have canceled all 3 without returning the results. Here's those results:
Code:
2221*2^548899-1 is not prime.  LLR Res64: D2B93491410C1AE1  Time : 363.222 sec.
2401*2^548899-1 is not prime.  LLR Res64: 132E13C16414CEBF  Time : 365.859 sec.
Can you check if those numbers were reported as complete in the DB? I can't find any indication on the pages that they were. I don't care too much about the credit for two numbers this size, and they're probably already assigned elsewhere, so I guess we'll just have a spot double check...
If it was indeed returned, the output should really be changed to reassure you that they were returned and not canceled (like the Perl version, IIRC).

Whenever you have an idea for me to check something to troubleshoot either of these things, you can tell me (here or in PM) and I'll try it. I'd like to get this worked out.
It seems both of those results were canceled; I have them listed in port 6000's results.txt as completed by Gary, so they must have been canceled and reassigned.

BTW, even if they weren't successfully canceled or submitted, the server would eventually reassign them in 2 days; so they wouldn't be "missed" per se, i.e. no need for a later spot doublecheck of them.
Old 2010-03-29, 18:04   #36
em99010pepe
 
 
Sep 2004

2×5×283 Posts

The DOS script will go down if:

a) you get "ERROR: SUM(INPUTS) != SUM(OUTPUTS)".....it is a cllr.exe issue.
b) your internet connection goes down while you upload the results.

Carlos

Last fiddled with by em99010pepe on 2010-03-29 at 18:05
Old 2010-03-29, 18:14   #37
Mini-Geek
Account Deleted
 
 
"Tim Sorbera"
Aug 2006
San Antonio, TX USA

10253₈ Posts

Quote:
Originally Posted by mdettweiler
BTW, even if they weren't successfully canceled or submitted, the server would eventually reassign them in 2 days; so they wouldn't be "missed" per se, i.e. no need for a later spot doublecheck of them.
I meant that my results plus the results from the reassignment (Gary's results) would make a spot doublecheck, which it already has (though I don't know the result of it; do the residues match?).

Last fiddled with by Mini-Geek on 2010-03-29 at 18:14
Old 2010-03-29, 18:25   #38
mdettweiler
A Sunny Moo
 
 
Aug 2007
USA (GMT-5)

3·2,083 Posts

Quote:
Originally Posted by Mini-Geek
I meant that my results plus the results from the reassignment (Gary's results) would make a spot doublecheck, which it already has (though I don't know the result of it; do the residues match?).
Ah, whoops, I see what you mean now. Yes, the residues do match:
Code:
user=gd_barnes
[2010-03-29 12:43:25]
2221*2^548899-1 is not prime.  Res64: D2B93491410C1AE1  Time : 849.0 sec.
user=gd_barnes
[2010-03-29 12:43:25]
2401*2^548899-1 is not prime.  Res64: 132E13C16414CEBF  Time : 849.0 sec.
BTW, you don't have to have direct access to the server to see these; each server's results.txt file is updated at http://www.noprimeleftbehind.net/llrnet/ every 15 minutes. In this case I got it from here.
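For anyone who wants to run the same residue comparison themselves, here is a minimal Python sketch. It is not part of the LLRnet client or the do scripts; the function names are made up for illustration, and the line format is assumed from the two snippets quoted in this thread (client output with "LLR Res64:" and server results.txt with "Res64:").

```python
import re

# Assumed line format, taken from the snippets in this thread:
#   "<candidate> is not prime.  [LLR ]Res64: <16 hex digits>  Time : ..."
RESULT_RE = re.compile(
    r"^(\S+) is not prime\.\s+(?:LLR )?Res64: ([0-9A-Fa-f]{16})"
)


def parse_residues(text):
    """Return {candidate: residue} for every 'is not prime' line in text."""
    residues = {}
    for line in text.splitlines():
        match = RESULT_RE.match(line.strip())
        if match:
            residues[match.group(1)] = match.group(2).upper()
    return residues


def compare_residues(first, second):
    """Return {candidate: bool} for candidates that appear in both inputs."""
    a, b = parse_residues(first), parse_residues(second)
    return {name: a[name] == b[name] for name in a if name in b}


client_output = """\
2221*2^548899-1 is not prime.  LLR Res64: D2B93491410C1AE1  Time : 363.222 sec.
2401*2^548899-1 is not prime.  LLR Res64: 132E13C16414CEBF  Time : 365.859 sec.
"""

server_results = """\
user=gd_barnes
[2010-03-29 12:43:25]
2221*2^548899-1 is not prime.  Res64: D2B93491410C1AE1  Time : 849.0 sec.
user=gd_barnes
[2010-03-29 12:43:25]
2401*2^548899-1 is not prime.  Res64: 132E13C16414CEBF  Time : 849.0 sec.
"""

print(compare_residues(client_output, server_results))
```

Lines that do not match the pattern (the user= and timestamp lines) are simply skipped, so the same parser can be fed either file.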
Old 2010-03-29, 19:56   #39
gd_barnes
 
 
May 2007
Kansas; USA

2×3×19×89 Posts

Karsten,

On the DOS script:

Tim is saying that do -c does not work with the Windows DOS client.

Carlos has 2 problems.

Can you check into those please?

I just also recently noticed that the Linux do.pl script will not return completed results to the server if the server goes down during OR BEFORE the time in which they are completed. The key is "or before" there and only applies if the server is STILL down when the batch completes. It "attempts" to send them, assumes they've been sent, deletes tosend.txt, and waits for the server to come back up to get new pairs. It never seems to know that the previous results had not been sent. I see the problem and will work on fixing it today. This problem seems to be the same or similar to the one that Carlos is experiencing on the DOS script.

Fortunately, it seems that these 3 problems are not related to load, which we did quite a bit of alpha testing on, but to exception situations that we either did not think to alpha test or did not test enough. Karsten, any thoughts on how the problem with do -c got missed? I did extensive testing on do.pl -c on the Linux side and it definitely works. It shows the # of pairs returned to the server and the # of pairs cancelled.
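To make the expected do -c behavior concrete, here is a rough Python sketch of the bookkeeping described above: return whatever is finished, cancel the rest, and say how many of each. This is not the actual do.pl or do.bat logic; plan_cancel, summary, and the PENDING_PAIR placeholder are hypothetical names used only for illustration.

```python
def plan_cancel(assigned, completed):
    """Split assigned pairs into (results to return, pairs to cancel).

    assigned:  list of candidate names handed out by the server.
    completed: dict mapping a finished candidate to its result line.
    """
    to_return = [completed[pair] for pair in assigned if pair in completed]
    to_cancel = [pair for pair in assigned if pair not in completed]
    return to_return, to_cancel


def summary(to_return, to_cancel):
    """Build the reassuring one-line report for the user."""
    return (
        f"{len(to_return)} pair(s) returned to the server, "
        f"{len(to_cancel)} pair(s) cancelled."
    )


# Tim's situation: 3 pairs assigned, 2 finished.
# PENDING_PAIR is a hypothetical stand-in; the thread does not name the third pair.
assigned = ["2221*2^548899-1", "2401*2^548899-1", "PENDING_PAIR"]
completed = {
    "2221*2^548899-1": "2221*2^548899-1 is not prime.  LLR Res64: D2B93491410C1AE1",
    "2401*2^548899-1": "2401*2^548899-1 is not prime.  LLR Res64: 132E13C16414CEBF",
}

to_return, to_cancel = plan_cancel(assigned, completed)
print(summary(to_return, to_cancel))
```

Printing both counts separately is the point: the user can see at a glance that finished work was returned rather than thrown away with the cancelled pairs.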

I would suggest documenting the fixes in README or wherever and increasing the version # after this. Is README where we are showing fixes and new versions?

Max, I haven't responded to your Email suggestion for an upcoming rally yet because I didn't feel like we have properly beta tested everything yet. I want this thread to go "dry" with problems for a week before we have a rally.


Gary

Last fiddled with by gd_barnes on 2010-03-29 at 20:31
Old 2010-03-29, 19:58   #40
gd_barnes
 
 
May 2007
Kansas; USA

2×3×19×89 Posts

Quote:
Originally Posted by mdettweiler
Ah, whoops, I see what you mean now. Yes, the residues do match:
Code:
user=gd_barnes
[2010-03-29 12:43:25]
2221*2^548899-1 is not prime.  Res64: D2B93491410C1AE1  Time : 849.0 sec.
user=gd_barnes
[2010-03-29 12:43:25]
2401*2^548899-1 is not prime.  Res64: 132E13C16414CEBF  Time : 849.0 sec.
BTW, you don't have to have direct access to the server to see these; each server's results.txt file is updated at http://www.noprimeleftbehind.net/llrnet/ every 15 minutes. In this case I got it from here.
I didn't feel you responded to what he implied. I think he was hoping that these would go in the DB as a doublecheck. Unfortunately...not possible: Since it's the same server, these results were rejected by the server. I confirmed as much.

Sorry about the problems Tim.

Last fiddled with by gd_barnes on 2010-03-29 at 20:26
Old 2010-03-29, 20:05   #41
gd_barnes
 
 
May 2007
Kansas; USA

2·3·19·89 Posts

Quote:
Originally Posted by mdettweiler
Actually, that might be a tad premature. Tim, I see from your attachment that you don't have cllr.exe in your directory. That's needed for do.pl to work on Windows, and I just confirmed that it is included in the client package; did you accidentally delete it by chance? You might want to try again after putting it back.
Premature? How's that? Both Tim and Carlos (in the base 5 thread) have had problems with the Windows do.pl script. You can't be publicly posting something that hasn't been tested in the environment for which it was intended. That was a blunder on our part.

We should just stick with the DOS do script for Windows and do.pl script for Linux only. There is less testing that way. With version 0.71 of the do.pl Linux script, I will tweak the do.pl README to remove the part that says it "should" work with Windows. It "should" work if we had tested it, which we didn't.


Gary

Last fiddled with by gd_barnes on 2010-03-29 at 20:05
Old 2010-03-29, 20:44   #42
gd_barnes
 
 
May 2007
Kansas; USA

2·3·19·89 Posts

I'd like to get a problem log working here so that nothing gets missed.

Everyone, please chime in if I am missing or misstating any known problem here.

Problems in the clients that need to be fixed:

1. Windows DOS, Tim says do -c does not return completed results to the server. It cancels all pairs instead.
Resolution: This is a non-issue. It was an error in the testing environment.

2. Windows DOS, Carlos says the script will go down while returning completed results if your internet connection drops while doing so.
Same issue as #6? Is the script going down the same as the pairs not being returned?

3. Windows DOS, Carlos is getting the ERROR: SUM issue in CLLR. Is that something that we should be able to fix? If not, we'll just list it as a known "feature" in the documentation.

4. Windows do.pl, various problems due to lack of testing of exception situations. I've removed the link in the 1st post here and suggest that we not attempt to maintain it.
Resolution: Maintenance of the Windows do.pl client is not being done.

5. Linux do.pl, if the server goes down during or before results are completed, the clients will: complete the tests, create the tosend.txt file, attempt to send the tosend.txt file, assume that it has been correctly sent and delete it, and wait for the server to come back up to get new pairs to test. It needs to avoid deletion of the tosend.txt file if the server is down or the internet connection is lost. That way, it will send them when the server comes back up.
Resolution: Solved by Gary in Version 0.71. The tosend.txt is not deleted until there is confirmation that the pairs are successfully sent.

6. Windows DOS, same issue as Linux do.pl #5.
Resolution: Solved by Karsten in Version 0.72: The tosend.txt is not deleted anymore and a note on screen is displayed.
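The fix described in #5 and #6 boils down to "do not delete tosend.txt until the send is confirmed". A minimal Python sketch of that idea follows. It is not the actual change made in do.pl 0.71 or the DOS script 0.72; flush_results and the send callable are hypothetical, and send is assumed to return True only when the server has acknowledged the results.

```python
import os
from typing import Callable


def flush_results(path: str, send: Callable[[str], bool]) -> bool:
    """Send the contents of `path`; delete the file only after confirmed success.

    `send` is a hypothetical callable: it returns True only when the server
    acknowledged the results, and may raise OSError if the connection is down.
    """
    if not os.path.exists(path):
        return True  # nothing pending, nothing to lose

    with open(path, "r", encoding="utf-8") as handle:
        payload = handle.read()

    try:
        confirmed = send(payload)
    except OSError:
        confirmed = False  # server down or internet connection lost

    if confirmed:
        os.remove(path)  # safe: the server has the results
    else:
        # Keep the file so the results go out on the next attempt.
        print(f"Results NOT sent; keeping {path} for the next attempt.")
    return confirmed
```

A caller would try flush_results("tosend.txt", send) again before asking for new pairs, so results finished while the server was down are returned as soon as it comes back up.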

Karsten:
I will test #2 with a batch of more pairs done at once and try to disconnect the server while the client is sending those results!
For #3, as I mentioned: I need more info to handle this.

Thanks Carlos and Tim for testing and posting known issues.

Gary

Last fiddled with by gd_barnes on 2010-03-31 at 06:27 Reason: resolution updates
Old 2010-03-29, 20:56   #43
em99010pepe
 
 
Sep 2004

101100001110₂ Posts

3.

It is due to overclocking, but the client should keep testing from the last save point, as the LLR GUI version does. Anyway, this breaks the scripts.
Old 2010-03-29, 20:57   #44
mdettweiler
A Sunny Moo
 
 
Aug 2007
USA (GMT-5)

3·2,083 Posts

Quote:
Originally Posted by gd_barnes
Premature? How's that? Both Tim and Carlos (in the base 5 thread) have had problems with the Windows do.pl script. You can't be publicly posting something that hasn't been tested in the environment for which it was intended. That was a blunder on our part.

We should just stick with the DOS do script for Windows and do.pl script for Linux only. There is less testing that way. With version 0.71 of the do.pl Linux script, I will tweak the do.pl README to remove the part that says it "should" work with Windows. It "should" work if we had tested it, which we didn't.
I did quite a bit of testing with do.pl on my own Windows setup; I had it run for a number of days straight on a "production" server and it worked great. I'm not sure what could have gone wrong here. You mentioned a couple posts up that you see where the problem is; could you point me to it?

Almost all of do.pl should be OS-independent. I wonder if Tim's problem with it on Windows is related to the dropped-connection issue: possibly his connection cut out somewhere along the way on the times where he got "could not find lresults.txt" errors? It might be another manifestation of the same problem.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.