LLRnet supports LLR V3.8! (LLRnet2010 V0.73L) (https://www.mersenneforum.org/showthread.php?t=13165)

Flatlander 2010-06-10 14:12

Windows XP Home SP3 32bit. C2Q 6700 not overclocked.

This has happened once or twice before but I assumed I was doing something wrong. Is it significant that both errors were caused when the client was processing the last line of workfile.txt? (Though there were only two lines.)

Should I try increasing the WUCacheSize to see if it always occurs on the last line? Then I can ctrl-c while it is on different lines to see what happens.

I will try the 'waits 1' fix later.

kar_bon 2010-06-10 17:00

[QUOTE=Flatlander;218027]This has happened once or twice before but I assumed I was doing something wrong. Is it significant that both errors were caused when the client was processing the last line of workfile.txt? (Though there were only two lines.)
[/quote]

No.

[QUOTE=Flatlander;218027]
Should I try increasing the WUCacheSize to see if it always occurs on the last line? Then I can ctrl-c while it is on different lines to see what happens.
[/QUOTE]

No, that doesn't matter.

If it occurs again, look in the folder: if there is no workfile.txt, rename workfile.bak to workfile.txt and call 'do' again. That's all.
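For anyone who would rather script that recovery than do it by hand, here is a minimal sketch in Python. The file names workfile.txt and workfile.bak come from the post; the function name `restore_workfile` is made up for illustration, and renaming the backup to workfile.txt is an assumption about what the rename is meant to produce. It is not part of LLRnet or of the 'do' script.

```python
from pathlib import Path


def restore_workfile(folder):
    """Rename workfile.bak back to workfile.txt if only the backup is left.

    Returns True if a rename happened, False if nothing needed doing.
    (Hypothetical helper, not part of LLRnet.)
    """
    folder = Path(folder)
    work = folder / "workfile.txt"
    backup = folder / "workfile.bak"
    # Only act when the work file is gone and a backup exists,
    # so an intact workfile.txt is never overwritten.
    if not work.exists() and backup.exists():
        backup.rename(work)
        return True
    return False
```

After it returns True, the client would still have to be started again by calling 'do' as described above.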

Flatlander 2010-06-10 17:10

Ok, thanks.

Joe O 2010-07-03 01:49

I've had two machines go into a loop. I caught one yesterday and stopped it.
Here is the output from that one:
[CODE][2010-01-07 11:39:36] 138172*5^365107-1 is not prime. RES64: 84EC98CD13D333DE. OLD64: 08AC5CA6C9017801 Time : 1507.269 sec.
259072*5^365107-1 is not prime. RES64: C98B5B25CC6A083D. OLD64: 5CA21171653E18B4 Time : 1829.064 sec.
71084*5^365108-1 is not prime. RES64: 95ED3012F3473012. OLD64: C1C79038D9D59033 Time : 1585.047 sec.
162434*5^365108-1 is not prime. RES64: AC127AAFB4C2A4C6. OLD64: 16800DAEE3AEF06E Time : 1888.927 sec.
[2010-01-07 13:33:36] 138172*5^365107-1 is not prime. RES64: 84EC98CD13D333DE. OLD64: 08AC5CA6C9017801 Time : 1522.320 sec.

[2010-01-07 12:05:22] 131848*5^365109-1 is not prime. RES64: 010950066C988E98. OLD64: 481E2801E8E32677 Time : 1523.812 sec.
183916*5^365109-1 is not prime. RES64: 33BDC0C420337729. OLD64: 5F18B910197DBE1D Time : 1896.132 sec.
296024*5^365110-1 is not prime. RES64: E2E01E0EB116ECFD. OLD64: 41F71F534AFD39DD Time : 1870.961 sec.
[2010-01-07 13:37:31]
52922*5^365112-1 is not prime. RES64: FF38EF9E48FCC73A. OLD64: 3C127E71631EC5B9 Time : 1585.423 sec.
92936*5^365112-1 is not prime. RES64: 81EB74E44982B40D. OLD64: E187B9AA334C281D Time : 1581.199 sec.
[2010-01-07 13:28:28]

30994*5^365109-1 is not prime. RES64: FF21EA84994A085D. OLD64: FD65BF8DCBDE1914 Time : 1609.796 sec.
[2010-01-07 13:13:28]
[/CODE]
and the console output
[CODE]
[2010-01-07 11:39:36] 138172*5^365107-1 is not prime. RES64: 84EC98CD13D333DE. OLD64: 08AC5CA6C9017801 Time : 1507.269 sec.
259072*5^365107-1 is not prime. RES64: C98B5B25CC6A083D. OLD64: 5CA21171653E18B4 Time : 1829.064 sec.
71084*5^365108-1 is not prime. RES64: 95ED3012F3473012. OLD64: C1C79038D9D59033 Time : 1585.047 sec.
162434*5^365108-1 is not prime. RES64: AC127AAFB4C2A4C6. OLD64: 16800DAEE3AEF06E Time : 1888.927 sec.
[2010-01-07 13:33:36] 138172*5^365107-1 is not prime. RES64: 84EC98CD13D333DE. OLD64: 08AC5CA6C9017801 Time : 1522.320 sec.
+-------------------------------------+
| LLRnet client V0.9b7 with cLLR V3.8 |
| K.Bonath, 2010-05-15, Version 0.73 |
+-------------------------------------+
Current configuration:
server = "www.sr5.psp-project.de"
port = 12925
username = "Joe_O"
WUCacheSize = 0
Base prime factor(s) taken : 5
Resuming N+1 prime test of 71084*5^365108-1 at bit 37786 [4.45%]
Using FFT length 80K, a = 3

71084*5^365108-1 is not prime. RES64: 95ED3012F3473012. OLD64: C1C79038D9D59033 Time : 1585.047 sec.
Base prime factor(s) taken : 5
Starting N+1 prime test of 162434*5^365108-1
Using zero-padded FFT length 96K, a = 3

162434*5^365108-1 is not prime. RES64: AC127AAFB4C2A4C6. OLD64: 16800DAEE3AEF06E Time : 1888.927 sec.
A duplicate file name exists, or the file
cannot be found.
Base prime factor(s) taken : 5
Starting N+1 prime test of 138172*5^365107-1
Using FFT length 80K, a = 3
138172*5^365107-1, bit: 200000 / 847770 [23.59%]. Time per bit: 1.763 ms.

[/CODE]
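The visible sign of the loop in the output above is that 138172*5^365107-1 shows up twice (11:39:36 and 13:33:36) with the same RES64. A rough way to spot this in a results file is to count how often each candidate appears. The sketch below is Python, assumes result lines look like the ones quoted in this post, and uses made-up names (`RESULT_RE`, `repeated_candidates`); it is not an LLRnet tool.

```python
import re
from collections import Counter

# Matches lines such as "138172*5^365107-1 is not prime. RES64: ...".
# An optional "[YYYY-MM-DD hh:mm:ss] " timestamp may precede the candidate.
RESULT_RE = re.compile(
    r"^(?:\[[^\]]*\]\s*)?(\d+\*\d+\^\d+[+-]\d+) is (?:not )?prime"
)


def repeated_candidates(lines):
    """Return {candidate: count} for every candidate reported more than once."""
    counts = Counter()
    for line in lines:
        match = RESULT_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return {cand: n for cand, n in counts.items() if n > 1}
```

Feeding it the lines of the results file should list only candidates that were tested more than once; an empty result means no repeats were found in that file.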

kar_bon 2010-07-03 07:16

You're using the script and so LLR for base 5.

So I hope you've changed this line in 'llr-serverconfig.txt' on the server side:
[code]
displayFormat = "%s*5^%s-1" -- use this for LLR type test (default)
[/code]

Another thing: your console output says WUCacheSize = 0! Why? It should be 1 or greater.

Please send me the whole client folder with all files after that error.

PS:
Apparently your WUCacheSize was 4 and you stopped the script and restarted it.
After LLR had tested all 4 candidates, the script could not find a file, or the file already existed.

Questions:
Have you used this script for base 5 before, and did it work fine?
Have you stopped and continued the script before?

PPS:
I've done one pair, then another pair after stopping and continuing the script, and cancelled a third pair: no problems on my side.
Pairs: 146264*5^365234-1, 48394*5^365235-1. Cancelled pair: 100186 365235

Joe O 2010-07-04 00:55

No, I haven't changed the display format, and the other machines seem to have worked correctly. In fact, 3 other directories on that quad worked correctly. I changed WUCacheSize to 0 to drain that machine. As I said, the other 3 directories on that machine worked correctly. I no longer have those directories, but I do have the directory on a second machine that failed on its own, i.e. I wasn't trying to drain it, it just looped. I've temporarily stopped it on a third machine that is mostly unattended, but still have it working on a fourth machine. I'll check that later and see what is happening. It was working correctly last night. I've stopped and started multiple clients on multiple machines before without this problem. All the same client, and all Windows XP. Mostly Pro but some Home. All with up-to-date maintenance.
Where do you want the directory sent? PM me or email me at factrange at yahoo dot com with contact information.

kar_bon 2010-07-04 12:48

I've checked the folder you sent.

Problem:
There are two LLR save files in the folder.
At the top, the script checks whether an LLR save file exists (named like z1786704).
If so, it jumps ahead to complete the tests from workfile.txt with cllr.
cllr deletes the save file when the test is done.
But: because of the other save file, which is never removed, the script thinks there are more tests to do. So the script does the same candidates again and again.
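The effect of a leftover save file can be shown with a very small model in Python. This is only an illustration of the check described above, not the real batch script: the names `has_savefile` and `simulate_passes`, the pass limit, and the second file name z0000001 in the usage note are all made up for the example, and matching any name that starts with "z" is an assumption about how the check works.

```python
def has_savefile(filenames):
    """Model of the check at the top of the script: is any z* file present?"""
    return any(name.startswith("z") for name in filenames)


def simulate_passes(filenames, active_savefile, max_passes=5):
    """Count how many cllr passes the modelled script makes.

    Each pass removes only the save file of the test that was actually
    running (active_savefile). Any other z* file stays behind, so the
    check keeps succeeding until max_passes cuts the simulation off.
    """
    files = set(filenames)
    passes = 0
    while has_savefile(files) and passes < max_passes:
        passes += 1
        files.discard(active_savefile)
    return passes
```

With a single save file, `simulate_passes({"z1786704"}, "z1786704")` stops after one pass. With a stale second file, `simulate_passes({"z1786704", "z0000001"}, "z1786704")` runs until the pass limit, which is the looping seen on the affected machines.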

I don't know exactly how a second save file was created.
This could have happened: you stopped the script and changed llr-clientconfig.txt to do other work. After those 5 new candidates were done, the old save file was never deleted.

So before doing new work, or after changing server/port in llr-clientconfig.txt, execute the '_new.bat' script: this will delete all files created during a test. The folder is then clean for a new batch of work. Be sure to back up 'lresults_hist.txt' if you need it.

I've inserted 2 new lines in 'do.bat' to avoid this issue:
[code]
:no_prime
[color=red]if exist workfile.bak del workfile.bak[/color]
[color=red]if exist z* del z*[/color]
ren workfile.txt workfile.bak
goto start
[/code]

These will be added in the next version of LLRnet.
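To see why the two new lines help, here is a small Python model of the `:no_prime` step. It works on a set of file names instead of real files. The one fact it relies on is that `ren` in cmd refuses to rename when the source is missing or the target name already exists, which is the "A duplicate file name exists, or the file cannot be found." message. The function names `ren` and `no_prime_step` are made up for the sketch, and it is not a translation of the whole 'do.bat'.

```python
def ren(files, src, dst):
    """Model of cmd's `ren`: fail if src is missing or dst already exists."""
    if src not in files or dst in files:
        return False
    files.remove(src)
    files.add(dst)
    return True


def no_prime_step(files, patched):
    """Model of :no_prime; returns True if workfile.txt was renamed."""
    if patched:
        # if exist workfile.bak del workfile.bak
        files.discard("workfile.bak")
        # if exist z* del z*
        for name in [f for f in files if f.startswith("z")]:
            files.remove(name)
    # ren workfile.txt workfile.bak
    return ren(files, "workfile.txt", "workfile.bak")
```

Starting from a folder that holds workfile.txt, an old workfile.bak and a save file, the unpatched step fails and leaves workfile.txt in place, so the same work is picked up again. The patched step leaves only workfile.bak behind.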

Note:
I've been thinking about a new name for this LLRnet client/server and would like to call it

[b]LLRnet2010 V0.73L[/b]

to distinguish it from the V0.9b7 by J.Penné from 2005.
The 'L' stands for LLR and means it uses the latest version of it.

Since Jean has released his new V3.8.1 with some more options, I think I will include this new version, with small changes, in a V0.74L.

kar_bon 2010-08-17 23:22

I've updated the batch files: there is now one for starting all clients at once, and each DOS box gets its own name.

See the first post (the download zip and the ReadMe.txt have changed, too).

Joe O 2010-08-18 15:49

I've sent you a zip of a looping client. I set up 8 exact copies of a working client on 8 identical machines, and only 2 did not loop after completing their initial 5 tests.

kar_bon 2010-08-18 18:18

Your folder contains the file "workfile.bak", but it shouldn't be there.

Did you stop and restart the script, and if so, how?

There is no "lresults_hist.txt" at all, which is a hint that the script never escaped the cLLR loop:

At the start it checks whether an intermediate cLLR file (like z******) or the "workfile.txt" exists.
If so, the script jumps to the cLLR testing.
After the 5 pairs are done it looks for found primes; there are none, so it jumps to ":no_prime".

Here "workfile.txt" is renamed to "workfile.bak" and the script goes back to the start.

Now there should be no "workfile.txt" and no z******, so "lresults.txt" is converted into the LLRnet format and sent to the server, new pairs are received, and cLLR runs again.

But: "workfile.txt" still exists, because the rename fails here (a "workfile.bak" is already there)!

So to continue correctly:
- delete "workfile.bak", "lresults.txt" and "z*****"
- run the script by calling "do" again

-> Works correctly now!

I don't know how this happened, but you must have stopped and restarted the script, or used a folder with old files in it.
To delete all old files and make sure you have a clean folder, run "_new.bat": this will delete all old files generated by a previous run of the batch!
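Before deleting anything by hand, it can help to see which of the files named in the manual steps above are actually present. The Python sketch below only lists them and removes nothing. The function name `leftover_files` is made up, and the list of names is taken from the manual steps in this post, not from the contents of "_new.bat", which may delete more.

```python
from pathlib import Path


def leftover_files(folder):
    """List files from the manual cleanup steps that exist in folder.

    Checks for workfile.bak, lresults.txt and any z* save files.
    Nothing is deleted; the caller decides what to do with the list.
    """
    folder = Path(folder)
    found = []
    for name in ("workfile.bak", "lresults.txt"):
        if (folder / name).is_file():
            found.append(name)
    # Save files are matched by the same z* pattern used in the post.
    found.extend(sorted(p.name for p in folder.glob("z*") if p.is_file()))
    return found
```

Since "lresults.txt" holds finished results, copying it somewhere else before removing it keeps those results available.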

Joe O 2010-08-18 21:19

If I delete lresults.txt I lose all the work. That is unacceptable.

The client stopped on its own immediately after starting up. But to say that I can't ctrl-c is also unacceptable. I do it all the time on the machines that have no problem.

When I deleted workfile and restarted, it got the work and stopped. It left tosend.txt containing only 4 of the 5 results, with the other result garbled:
[CODE] 1000000000:P:0:5:258 [2010 -2 B8CEFD9C23AC1268
[/CODE]
I've restarted it and am waiting to see what happens.


All times are UTC.