mersenneforum.org (https://www.mersenneforum.org/index.php)
-   No Prime Left Behind (https://www.mersenneforum.org/forumdisplay.php?f=82)
-   -   LLRnet supports LLR V3.8! (LLRnet2010 V0.73L) (https://www.mersenneforum.org/showthread.php?t=13165)

kar_bon 2010-08-18 21:22

OK, then stop it and convert the results.txt by hand.

Will post the needed steps here.

To do:

- edit your "lresults.txt": delete the date/time so that you only have the 5 result lines, as LLR would write them
- copy "workfile.txt" to "workfile.bak" -> it's needed
- in the DOS box, from that folder, run: "gawk.exe -f do_tosend.awk lresults.txt"

The file "tosend.txt" will be created with 5 lines: <header> <k> <n> -2 <residue>
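
For anyone who wants to see what this conversion amounts to, here is a minimal sketch in Python. It is not the do_tosend.awk from the zip. It assumes result lines look like the cLLR output quoted further down in this thread ("2053*2^257487-1 is not prime. LLR Res64: ..."), it only handles "is not prime" lines, and the header is whatever the server expects, so it is passed in by the caller ("HEADER" in the usage note is only a placeholder).

```python
import re

# Matches a cLLR "not prime" result line such as:
#   2053*2^257487-1 is not prime. LLR Res64: 998E5CEEB2E97EF3 Time : 108.885 sec.
# Assumption: only the k*b^n-1 form with a 16-digit hex residue is handled.
RESULT_RE = re.compile(
    r"^(?P<k>\d+)\*(?P<b>\d+)\^(?P<n>\d+)-1 is not prime\.\s+"
    r"LLR Res64: (?P<res>[0-9A-Fa-f]{16})"
)


def to_tosend_line(result_line, header):
    """Return '<header> <k> <n> -2 <residue>', or None if the line is not a result."""
    m = RESULT_RE.match(result_line.strip())
    if m is None:
        return None
    return f"{header} {m['k']} {m['n']} -2 {m['res']}"


def convert(lines, header):
    """Convert every recognised result line; silently skip everything else."""
    out = []
    for line in lines:
        converted = to_tosend_line(line, header)
        if converted is not None:
            out.append(converted)
    return out
```

Calling `convert(open("lresults.txt"), "HEADER")` would give one tosend-style line per result. Lines that do not match (dates, blank lines, prime reports) are skipped rather than guessed at.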

Run "llrnet.exe": the results will be sent to the server and 5 new pairs will be reserved.

Now delete only z*****, lresults.txt, workfile.bak. The new pairs are in workfile.txt.

Start the script by calling "do"; all should work fine now.

kar_bon 2010-08-18 23:47

Did this solve your problem?

kar_bon 2010-08-19 19:58

Without any response from Joe O, it seems that once again it was the daughter of Mister Use -> Miss Use (or rather, an accidental misuse).

When starting a new LLRnet2010 client, it's recommended to

- clear an old folder by calling "_new.bat"
- make a new folder with the files in the zip from post #1

This issue can be avoided by doing so.

Another option: to be sure this won't happen when users forget to clear the folder, the change to the do-script from post #128 should be included in the next version, 0.74!

---------------

Since some users miss the GUI in the Windows version, I've come up with this extension:

[code]
:do_cllr
if not exist workfile.txt goto error_nowork
rem New: put the last line of workfile.txt into the window title (needs 'tail' from UnxUtils)
for /F "delims=" %%i in ('tail -n 1 workfile.txt') do set title="%%i testing..."
title %title%
cllr.exe -d workfile.txt
[/code]

This will give the DOS box a special title like "xxx yyyyyy testing...", which means the pair xxx*<any base>^yyyyyy-1 is currently being tested.

But: this is only true if the WUCacheSize is 1; otherwise only the last entry is displayed.
(Note: this feature needs the command 'tail' from UnxUtils!)

So if say 4 clients are running on a quad and all are minimized, you can see the current test in the title of the DOS-box!

This is the easiest way to get this information displayed.
For WUCacheSize <> 1 more coding is needed!
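
To illustrate why the title only matches the running test when WUCacheSize is 1, here is a small Python sketch of what the two new batch lines compute. It is only a model of the batch snippet, not part of the client, and the function name is made up for this example.

```python
def window_title(workfile_lines):
    """Build the title from the last non-empty line, roughly like `tail -n 1`."""
    entries = [ln.strip() for ln in workfile_lines if ln.strip()]
    if not entries:
        return None  # empty workfile: nothing to show
    return f"{entries[-1]} testing..."
```

With a cache of several pairs, for example the lines "2053 266019" and "2053 266067", the result is "2053 266067 testing...", whichever pair cLLR is actually working on. That is the limitation described above.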

Joe O 2010-08-19 20:34

[QUOTE=kar_bon;226085]Did this solve your problem?[/QUOTE]

[QUOTE=kar_bon;226230]Without any response from Joe O, it seems that once again it was the daughter of Mister Use -> Miss Use (or rather, an accidental misuse).[/QUOTE]

You might allow for time zone differences and for real life. I'm in the process of consolidating the 6 results.txt to one machine so that I can fix this more easily. I spotted your post because I was coming back for your instructions from the other post.

This has already cost me 20+ man hours and 240+ cpu hours between the two different times that it has not worked. I'm not sure that I am going to give it a third try. I also just had another machine spontaneously stop processing which is not included in those figures. Before you loosely start accusing the machine of having problems, I should tell you that it has run LLR flawlessly before, for many more hours than it just ran your script. I will, of course, thoroughly check out the machine before starting another run. It will be PRPnet though, and not LLRnet.

What is the llrnet.exe program that you are distributing? It shows as modified 6/22/2010 12:37 PM. Is the source available? Where?

If you are supplying scripts or programs for people to use, they should be robust enough to survive power outages and casual users. I realize that the first attempt will not always be perfect, but you should not treat "bug reports" as personal attacks. I would normally offer this advice in a PM or email, but you have chosen the venue with your post.

kar_bon 2010-08-19 20:49

To answer here:

I've been running this script on several of my computers for about 4 months now. I stop and restart the script every day on 12 cores without any issue!
I've never encountered any problem such as an intermediate file not being deleted or a *.bak file remaining.

There's also a ReadMe.txt included with LLRnetV2010 V0.73 which tells you that I've patched cLLR.exe.
I've changed the date/time of all files (including the *.exe) to correspond with the version, to be sure they're not mixed up.

In its first version this script was 11 lines of code!
All the errors that can be caused by a user are not worth guarding against in such a DOS script!
In a program, pressing CTRL-C should be handled and tested before the program stops, but in a script it's the only way to break out of the run.
Power outages or overclocking are not a normal way to use such a script, and handling them is not its job either!

gd_barnes 2010-08-19 20:50

Joe,

Wouldn't it be easier to just start over? That seems like a lot of time to spend trying to make it work with old workfile.txt, z***, lresults.txt, lresults_hist.txt files in your folders.

My suggestion: Delete everything you have, download everything new, and start from scratch. Personal time is much more valuable than CPU time and you're still losing much more of the latter trying to make the program work from in the middle of where it went down.

2nd suggestion: Only cache 1 or 2 pairs at a time. That way, you won't have too much lost work if you have to start over. There's no reason to cache 5 pairs at a time unless your internet connection is spotty.

We have many casual users who use it with no problem at all. I've had many lost internet connections/power outages while running the script with little problem since the 2nd version of it.

Criticizing someone who spent a lot of time trying to help you is not going to help get the problem fixed. Perhaps PRPnet would be less problematic for you to run.


Gary

Joe O 2010-08-19 22:06

[QUOTE=gd_barnes;226261]Joe,

Wouldn't it be easier to just start over? That seems like a lot of time to spend trying to make it work with old workfile.txt, z***, lresults.txt, lresults_hist.txt files in your folders.

My suggestion: Delete everything you have, download everything new, and start from scratch. Personal time is much more valuable than CPU time and you're still losing much more of the latter trying to make the program work from in the middle of where it went down.

2nd suggestion: Only cache 1 or 2 pairs at a time. That way, you won't have too much lost work if you have to start over. There's no reason to cache 5 pairs at a time unless your internet connection is spotty.

We have many casual users who use it with no problem at all. I've had many lost internet connections/power outages while running the script with little problem since the 2nd version of it.

Criticizing someone who spent a lot of time trying to help you is not going to help get the problem fixed. Perhaps PRPnet would be less problematic for you to run.


Gary[/QUOTE]

Gary,
I did start from scratch with this test. I took a working directory and copied it, then cleared out z*, lresults*, workfile.* etc. Then I copied it to 8 different machines and ran do.bat. On some machines it started and ran; on others it started and stopped. On those where it stopped I started it again, and on some of those it worked and on some it looped. I used the 5 units because that is the way it is distributed and I am sure that users will just download it and run. I was evaluating this for three different projects, and had planned to offer customized downloads but knew that some people would go to the original source. I also tried fresh starts caching 1 and 2 units. The 1 unit test finished the unit, converted the results and then stopped. The 2 unit test looped.
I didn't criticize Kar_bon, I criticized his response to me.
This is not the first time I have tried to use this program. The first time it looped on 2 out of 3 machines spontaneously. I credited that to Murphy and tried again with four instances on a quad. Only one instance ran smoothly. This time I was responding to a request for help. I just happened to be offered 8 machines differing only in serial #, consecutive serial numbers at that. My original intent was to use PRPnet, but I was prevailed upon to use LLRnet to drain a large LLRnet backlog.

kar_bon 2010-08-19 22:16

So what is the difference between your clients and the many others running without any problem?

Which OS are you using? Are you overclocking? What speed are you running at?

I need more info to determine the errors.

For the client with one pair: after it stops, please send me the whole folder, *.exe files too!

And again: to install, copy the original files from the zip into a new folder and run again, with no old files in there.

You can send me all your lresults.txt files and I can convert them for you; running them in one new folder should work then!

Joe O 2010-08-20 01:12

[QUOTE=kar_bon;226291]So what is the difference between your clients and the many others running without any problem?

Which OS are you using? Are you overclocking? What speed are you running at?

I need more info to determine the errors.

For the client with one pair: after it stops, please send me the whole folder, *.exe files too!

And again: to install, copy the original files from the zip into a new folder and run again, with no old files in there.

You can send me all your lresults.txt files and I can convert them for you; running them in one new folder should work then![/QUOTE]

If I knew what the difference was I would tell you. If it gets past the first refresh it usually works for a long time. There are some exceptions to this where it runs a day or two and then loops. Maybe something about how it sends in results and gets new tests. The other thing is, many people are running Linux. I've yet to hear of a problem with Linux.
Windows XP Home with all maintenance applied. No overclock, 1.6GHz Celeron.
I'll email the whole one unit folder tomorrow. I've since been able to start two machines with only 1 unit queued.
I had already combined the 6 results into one file so I just ran it. It worked. How do you normally strip the date from the file?

kar_bon 2010-08-20 05:34

[QUOTE=Joe O;226314]Windows XP Home with all maintenance applied. No overclock, 1.6GHz Celeron.
I'll email the whole one unit folder tomorrow. I've since been able to start two machines with only 1 unit queued.
I had already combined the 6 results into one file so I just ran it. It worked. How do you normally strip the date from the file?[/QUOTE]

Ok.

The conversion of the lresults.txt (output of cLLR) to tosend.txt (the LLRnet format) is done by calling the awk script.
Perhaps I should make a small batch-call to do this manually if it helps.
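
On the question of stripping the date: one way to do it outside the script is sketched below in Python. This is not what the awk script does internally. It assumes the date/time appears as a bracketed stamp of the form "[2010-09-22 19:55:00]", either on its own line or in front of a result; if your lresults.txt uses another layout, the pattern has to be adapted.

```python
import re

# Assumed timestamp shape, e.g. "[2010-09-22 19:55:00]".
STAMP_RE = re.compile(r"\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\]")


def strip_timestamps(lines):
    """Remove bracketed date/time stamps and drop lines that end up empty."""
    cleaned = []
    for line in lines:
        text = STAMP_RE.sub("", line).strip()
        if text:
            cleaned.append(text)
    return cleaned
```

The cleaned lines can then be written back to lresults.txt before the awk script is run. Keep a copy of the original file, since the stamp layout is only an assumption here.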

Flatlander 2010-09-22 19:30

I'm getting occasional drops in my internet connection so I've seen the odd 'waiting 60 seconds' message, but I've just received errors that are strange:
[CODE]2053*2^257487-1 is not prime. LLR Res64: 998E5CEEB2E97EF3 Time : 108.885 sec.
[2010-09-22 19:55:00]
unknown host www.noprimeleftbehind.net
unknown host www.noprimeleftbehind.net
[2010-09-22 19:55:24]
unknown host www.noprimeleftbehind.net
No pairs are available at this time.
Either the server has dried out or there is a connecting problem.
The file 'tosend.txt' contains unsent pairs!
Starting this script will submit those pairs to the server.
Sleeping 60 seconds before trying again.
[2010-09-22 19:56:38]
unknown host www.noprimeleftbehind.net
[2010-09-22 19:58:43]
[2010-09-22 19:58:46]
net_Recv : bad header 'OKR '
The server refused your new result :
either someone else computed it already,
either the server is now configured to
work on other numbers.
client_server.lua:279: attempt to call a nil value
The server refused your new result :
either someone else computed it already,
either the server is now configured to
work on other numbers.
lua_thread error : 'client_server.lua:279: attempt to call a nil value'
No pairs are available at this time.
Either the server has dried out or there is a connecting problem.
The file 'tosend.txt' contains unsent pairs!
Starting this script will submit those pairs to the server.
Sleeping 60 seconds before trying again.
[2010-09-22 20:00:43]
recv error res=-1, errno=0
The server refused your new result :
either someone else computed it already,
either the server is now configured to
work on other numbers.
client_server.lua:279: attempt to call a nil value
No pairs are available at this time.
Either the server has dried out or there is a connecting problem.
The file 'tosend.txt' contains unsent pairs!
Starting this script will submit those pairs to the server.
Sleeping 60 seconds before trying again.
[2010-09-22 20:09:01]
recv error res=-1, errno=0
Could not log on to the server
Fetching WU #1/10: 2053 266019
Fetching WU #2/10: 2053 266067
Fetching WU #3/10: 2053 266155
Fetching WU #4/10: 2053 266215
Fetching WU #5/10: 2053 266307
Fetching WU #6/10: 2053 266575
Fetching WU #7/10: 2053 266731
Fetching WU #8/10: 2053 266839
Fetching WU #9/10: 2053 266911
Fetching WU #10/10: 2053 267127
[2010-09-22 20:11:19]
[2010-09-22 20:11:20]
net_Recv : bad header 'OKLL'
The server refused your new result :
either someone else computed it already,
either the server is now configured to
work on other numbers.
client_server.lua:279: attempt to call a nil value
The server refused your new result :
either someone else computed it already,
either the server is now configured to
work on other numbers.
lua_thread error : 'client_server.lua:279: attempt to call a nil value'
No pairs are available at this time.
Either the server has dried out or there is a connecting problem.
The file 'tosend.txt' contains unsent pairs!
Starting this script will submit those pairs to the server.
Sleeping 60 seconds before trying again.
[/CODE]
I only had these pairs reserved for 20 mins so they shouldn't have timed out. And the 'bad header' messages are a concern.
It's now working. (Apparently.)
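
To make a long console log like this easier to report, the messages can be tallied by type. The Python sketch below only counts substrings that appear in the output above. The category names are made up for this example and are not part of LLRnet.

```python
from collections import Counter

# Substrings taken from the console output above; the keys are invented labels.
PATTERNS = {
    "dns": "unknown host",
    "bad_header": "bad header",
    "recv": "recv error",
    "refused": "The server refused your new result",
    "lua": "attempt to call a nil value",
}


def count_errors(log_lines):
    """Count how many log lines contain each known error message."""
    counts = Counter()
    for line in log_lines:
        for name, needle in PATTERNS.items():
            if needle in line:
                counts[name] += 1
    return counts
```

Feeding it the saved console output line by line gives a per-message total, which shows at a glance whether the trouble was mostly name resolution ("unknown host") or the later 'bad header' replies.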


All times are UTC.