mersenneforum.org LLRnet supports LLR V3.8! (LLRnet2010 V0.73L)

2010-03-29, 17:01   #34
Mini-Geek
Account Deleted

"Tim Sorbera"
Aug 2006
San Antonio, TX USA

17·251 Posts

Quote:
 Originally Posted by mdettweiler Actually, that might be a tad premature. Tim, I see from your attachment that you don't have cllr.exe in your directory. That's needed for do.pl to work on Windows, and I just confirmed that it is included in the client package; did you accidentally delete it by chance? You might want to try again after putting it back.
Sorry, not that simple. Like I said, I removed the .exe's from the folder before attaching it, and it works sometimes, which it wouldn't if cllr.exe weren't there. cllr.exe is there. See:
Quote:
 Originally Posted by Mini-Geek I'm getting this a lot of the time when I try to run the LLRnet script: (it also will do it after it's ran properly for some time, but does it most of the time) .... ...I'm attaching all the non-exe files from the folder...
I've just started using do.bat, which seems to work so far. Edit: Hm, not so fast. I had finished 2 of 3 numbers, used do -c to report/cancel them, and it didn't seem to have reported the first two. From what it outputted, it seems to have canceled all 3 without returning the results. Here's those results:
Code:
2221*2^548899-1 is not prime.  LLR Res64: D2B93491410C1AE1  Time : 363.222 sec.
2401*2^548899-1 is not prime.  LLR Res64: 132E13C16414CEBF  Time : 365.859 sec.
Can you check if those numbers were reported as complete in the DB? I can't find any indication on the pages that they were. I don't care too much about the credit for two numbers this size, and they're probably already assigned elsewhere, so I guess we'll just have a spot double check...
If they were indeed returned, the output should really be changed to reassure you that they were returned and not canceled (like the Perl version, IIRC).

Whenever you have an idea for me to check something to troubleshoot either of these things, you can tell me (here or in PM) and I'll try it. I'd like to get this worked out.

Last fiddled with by Mini-Geek on 2010-03-29 at 17:40

2010-03-29, 18:02   #35
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

3·2,083 Posts

Quote:
 Originally Posted by Mini-Geek Sorry, not that simple. Like I said, I removed the .exe's from the folder before attaching it, and it works sometimes, which it wouldn't if cllr.exe weren't there. cllr.exe is there. See:
Ah, whoops, missed that.
Quote:
 I've just started using do.bat, which seems to work so far. Edit: Hm, not so fast. I had finished 2 of 3 numbers, used do -c to report/cancel them, and it didn't seem to have reported the first two. From what it outputted, it seems to have canceled all 3 without returning the results. Here's those results: Code: 2221*2^548899-1 is not prime. LLR Res64: D2B93491410C1AE1 Time : 363.222 sec. 2401*2^548899-1 is not prime. LLR Res64: 132E13C16414CEBF Time : 365.859 sec. Can you check if those numbers were reported as complete in the DB? I can't find any indication on the pages that they were. I don't care too much about the credit for two numbers this size, and they're probably already assigned elsewhere, so I guess we'll just have a spot double check... If it was indeed returned, the output should really be changed to reassure you that they were returned and not canceled (like the Perl version, IIRC). Whenever you have an idea for me to check something to troubleshoot either of these things, you can tell me (here or in PM) and I'll try it. I'd like to get this worked out.
It seems both of those results were canceled; I have them listed in port 6000's results.txt as completed by Gary, so they must have been canceled and reassigned.

BTW, even if they weren't successfully canceled or submitted, the server would eventually reassign them in 2 days; so they wouldn't be "missed" per se, i.e. no need for a later spot doublecheck of them.

2010-03-29, 18:04   #36
em99010pepe

Sep 2004

2×5×283 Posts

The DOS script will go down if:
a) you get "ERROR: SUM(INPUTS) != SUM(OUTPUTS)". This is a cllr.exe issue.
b) your internet connection goes down while you upload the results.

Carlos

Last fiddled with by em99010pepe on 2010-03-29 at 18:05

2010-03-29, 18:14   #37
Mini-Geek
Account Deleted

"Tim Sorbera"
Aug 2006
San Antonio, TX USA

10253₈ Posts

Quote:
 Originally Posted by mdettweiler BTW, even if they weren't successfully canceled or submitted, the server would eventually reassign them in 2 days; so they wouldn't be "missed" per se, i.e. no need for a later spot doublecheck of them.
I meant that my results plus the results from the reassignment (Gary's results) would make a spot doublecheck, which it already has (though I don't know the result of it; do the residues match?).

Last fiddled with by Mini-Geek on 2010-03-29 at 18:14

2010-03-29, 18:25   #38
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

14151₈ Posts

Quote:
 Originally Posted by Mini-Geek I meant that my results plus the results from the reassignment (Gary's results) would make a spot doublecheck, which it already has (though I don't know the result of it; do the residues match?).
Ah, whoops, I see what you mean now. Yes, the residues do match:
Code:
user=gd_barnes
[2010-03-29 12:43:25]
2221*2^548899-1 is not prime.  Res64: D2B93491410C1AE1  Time : 849.0 sec.
user=gd_barnes
[2010-03-29 12:43:25]
2401*2^548899-1 is not prime.  Res64: 132E13C16414CEBF  Time : 849.0 sec.
BTW, you don't have to have direct access to the server to see these; each server's results.txt file is updated at http://www.noprimeleftbehind.net/llrnet/ every 15 minutes. In this case I got it from here.
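
The doublecheck discussed here comes down to comparing the Res64 residue that two independent runs report for the same candidate. Below is a minimal sketch of that comparison in Python, assuming result lines in the two formats quoted in this thread (with and without the "LLR" prefix). The names `parse_residues` and `compare_residues` are hypothetical and not part of LLRnet or its scripts.

```python
import re

# Matches result lines in either format seen in this thread, e.g.
#   2221*2^548899-1 is not prime.  LLR Res64: D2B93491410C1AE1  Time : 363.222 sec.
#   2221*2^548899-1 is not prime.  Res64: D2B93491410C1AE1  Time : 849.0 sec.
RESULT_RE = re.compile(r"^(\S+) is not prime\.\s+(?:LLR )?Res64: ([0-9A-Fa-f]{16})")


def parse_residues(lines):
    """Map each candidate (e.g. '2221*2^548899-1') to its Res64 residue."""
    residues = {}
    for line in lines:
        match = RESULT_RE.match(line.strip())
        if match:  # lines such as 'user=...' or timestamps are skipped
            residues[match.group(1)] = match.group(2).upper()
    return residues


def compare_residues(first, second):
    """Return {candidate: bool} for candidates that appear in both runs."""
    a, b = parse_residues(first), parse_residues(second)
    return {name: a[name] == b[name] for name in a if name in b}


client_lines = [
    "2221*2^548899-1 is not prime.  LLR Res64: D2B93491410C1AE1  Time : 363.222 sec.",
    "2401*2^548899-1 is not prime.  LLR Res64: 132E13C16414CEBF  Time : 365.859 sec.",
]
server_lines = [
    "user=gd_barnes",
    "[2010-03-29 12:43:25]",
    "2221*2^548899-1 is not prime.  Res64: D2B93491410C1AE1  Time : 849.0 sec.",
    "user=gd_barnes",
    "[2010-03-29 12:43:25]",
    "2401*2^548899-1 is not prime.  Res64: 132E13C16414CEBF  Time : 849.0 sec.",
]

matches = compare_residues(client_lines, server_lines)
for candidate, same in matches.items():
    print(candidate, "match" if same else "MISMATCH")
```

Only non-prime result lines are handled; a line reporting a prime carries no residue and would need separate treatment.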

2010-03-29, 19:56   #39
gd_barnes

May 2007
Kansas; USA

13·19·41 Posts

Karsten,

On the DOS script: Tim is saying that do -c does not work with the Windows DOS client. Carlos has 2 problems. Can you check into those please?

I just also recently noticed that the Linux do.pl script will not return completed results to the server if the server goes down during OR BEFORE the time in which they are completed. The key is "or before" there, and it only applies if the server is STILL down when the batch completes. It "attempts" to send them, assumes they've been sent, deletes tosend.txt, and waits for the server to come back up to get new pairs. It never seems to know that the previous results had not been sent. I see the problem and will work on fixing it today. This problem seems to be the same as or similar to the one that Carlos is experiencing on the DOS script.

Fortunately, it seems that these 3 problems are not related to load, which we alpha tested quite a bit, but to exception situations that we either did not think to alpha test or did not test enough.

Karsten, any thoughts on how the problem with do -c got missed? I did extensive testing on do.pl -c on the Linux side and it definitely works. It shows the # of pairs returned to the server and the # of pairs cancelled.

I would suggest documenting the fixes in README or wherever and increasing the version # after this. Is README where we are showing fixes and new versions?

Max, I haven't responded to your Email suggestion for an upcoming rally yet because I didn't feel like we have properly beta tested everything yet. I want this thread to go "dry" with problems for a week before we have a rally.

Gary

Last fiddled with by gd_barnes on 2010-03-29 at 20:31

2010-03-29, 19:58   #40
gd_barnes

May 2007
Kansas; USA

278F₁₆ Posts

Quote:
 Originally Posted by mdettweiler Ah, whoops, I see what you mean now. Yes, the residues do match: Code: user=gd_barnes [2010-03-29 12:43:25] 2221*2^548899-1 is not prime. Res64: D2B93491410C1AE1 Time : 849.0 sec. user=gd_barnes [2010-03-29 12:43:25] 2401*2^548899-1 is not prime. Res64: 132E13C16414CEBF Time : 849.0 sec. BTW, you don't have to have direct access to the server to see these; each server's results.txt file is updated at http://www.noprimeleftbehind.net/llrnet/ every 15 minutes. In this case I got it from here.
I didn't feel you responded to what he implied. I think he was hoping that these would go in the DB as a doublecheck. Unfortunately...not possible: Since it's the same server, these results were rejected by the server. I confirmed as much.

Last fiddled with by gd_barnes on 2010-03-29 at 20:26

2010-03-29, 20:05   #41
gd_barnes

May 2007
Kansas; USA

13×19×41 Posts

Quote:
 Originally Posted by mdettweiler Actually, that might be a tad premature. Tim, I see from your attachment that you don't have cllr.exe in your directory. That's needed for do.pl to work on Windows, and I just confirmed that it is included in the client package; did you accidentally delete it by chance? You might want to try again after putting it back.
Premature? How's that? Both Tim and Carlos (in the base 5 thread) have had problems with the Windows do.pl script. You can't be publicly posting something that hasn't been tested in the environment for which it was intended. That was a blunder on our part.

We should just stick with the DOS do script for Windows and do.pl script for Linux only. There is less testing that way. With version 7.1 of the do.pl Linux script, I will tweak the do.pl README to remove the part that says it "should" work with Windows. It "should" work if we had tested it, which we didn't.

Gary

Last fiddled with by gd_barnes on 2010-03-29 at 20:05

2010-03-29, 20:44   #42
gd_barnes

May 2007
Kansas; USA

13×19×41 Posts

I'd like to get a problem log working here so that nothing gets missed. Everyone, please chime in if I am missing or misstating any known problem here.

Problems in the clients that need to be fixed:

1. Windows DOS, Tim says do -c does not return completed results to the server. It cancels all pairs instead.
Resolution: This is a non-issue. It was an error in the testing environment.

2. Windows DOS, Carlos says the script will go down while returning completed results if your internet connection drops while doing so. Same issue as #6? Is the script going down the same as the pairs not being returned?

3. Windows DOS, Carlos is getting the ERROR: SUM issue in CLLR. Is that something that we should be able to fix? If not, we'll just list it as a known "feature" in the documentation.

4. Windows do.pl, various problems due to lack of testing of exception situations. I've removed the link in the 1st post here and suggest that we not attempt to maintain it.
Resolution: Maintenance of the Windows do.pl client is not being done.

5. Linux do.pl, if the server goes down during or before results are completed, the clients will: complete the tests, create the tosend.txt file, attempt to send the tosend.txt file, assume that it has been correctly sent and delete it, and wait for the server to come back up to get new pairs to test. It needs to avoid deletion of the tosend.txt file if the server is down or the internet connection is lost. That way, it will send them when the server comes back up.
Resolution: Solved by Gary in Version 0.71. The tosend.txt is not deleted until there is confirmation that the pairs are successfully sent.

6. Windows DOS, same issue as Linux do.pl #5.
Resolution: Solved by Karsten in Version 0.72: The tosend.txt is not deleted anymore and a note on screen is displayed.

Karsten: I will test #2 with a batch of more pairs done at once and try to disconnect the server while the client is sending those results! For #3, as I mentioned: I need more info to handle this.

Thanks Carlos and Tim for testing and posting known issues.

Gary

Last fiddled with by gd_barnes on 2010-03-31 at 06:27
Reason: resolution updates

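
The fix described for #5 (keep tosend.txt until the server confirms receipt) can be illustrated with a short Python sketch. This is not the actual do.pl or DOS script code. `flush_results` and the `submit` callback are hypothetical names, and the sketch assumes the transport either returns True on a confirmed upload or raises an `OSError` subclass when the server or connection is down.

```python
import os
import tempfile


def flush_results(path, submit):
    """Send queued results; delete the queue file only after confirmation.

    `submit` is a hypothetical callable that takes the list of result lines
    and returns True only when the server acknowledged them.
    """
    if not os.path.exists(path):
        return False  # nothing queued
    with open(path) as handle:
        lines = handle.read().splitlines()
    try:
        confirmed = submit(lines)
    except OSError:
        confirmed = False  # server down or connection lost: keep the file
    if confirmed:
        os.remove(path)  # safe to delete only now
    return confirmed


def server_down(lines):
    # Stand-in for a failed upload (ConnectionError is an OSError subclass).
    raise ConnectionError("server unreachable")


def server_up(lines):
    # Stand-in for an upload that the server acknowledged.
    return True


workdir = tempfile.mkdtemp()
queue = os.path.join(workdir, "tosend.txt")
with open(queue, "w") as handle:
    handle.write("2221*2^548899-1 is not prime.  LLR Res64: D2B93491410C1AE1\n")

first_try = flush_results(queue, server_down)
kept_after_failure = os.path.exists(queue)
second_try = flush_results(queue, server_up)
kept_after_success = os.path.exists(queue)
print(first_try, kept_after_failure, second_try, kept_after_success)
# prints: False True True False
```

The point of the design is that the failure path leaves the file untouched, so the next attempt after the server comes back resends the same results.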
2010-03-29, 20:56   #43
em99010pepe

Sep 2004

B0E₁₆ Posts

3. It is due to the overclocking, but the client, as happens with the LLR GUI version, should keep testing from the last save point. Anyway, this breaks the scripts.

2010-03-29, 20:57   #44
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

3×2,083 Posts

Quote:
 Originally Posted by gd_barnes Premature? How's that? Both Tim and Carlos (in the base 5 thread) have had problems with the Windows do.pl script. You can't be publicly posting something that hasn't been tested in the environment for which it was intended. That was a blunder on our part. We should just stick with the DOS do script for Windows and do.pl script for Linux only. There is less testing that way. With version 7.1 of the do.pl Linux script, I will tweak the do.pl README to remove the part that says it "should" work with Windows. It "should" work if we had tested it, which we didn't.
I did quite a bit of testing with do.pl on my own Windows setup; I had it run for a number of days straight on a "production" server and it worked great. I'm not sure what could have gone wrong here. You mentioned a couple posts up that you see where the problem is; could you point me to it?

Almost all of do.pl should be OS-independent. I wonder if Tim's problem with it on Windows is related to the dropped-connection issue: possibly his connection cut out somewhere along the way on the times where he got "could not find lresults.txt" errors? It might be another manifestation of the same problem.
