mersenneforum.org  

Old 2010-06-10, 14:12   #122
Flatlander

Windows XP Home SP3 32bit. C2Q 6700 not overclocked.

This has happened once or twice before but I assumed I was doing something wrong. Is it significant that both errors were caused when the client was processing the last line of workfile.txt? (Though there were only two lines.)

Should I try increasing the WUCacheSize to see if it always occurs on the last line? Then I can ctrl-c while it is on different lines to see what happens.

I will try the 'waits 1' fix later.

Last fiddled with by Flatlander on 2010-06-10 at 14:13
Old 2010-06-10, 17:00   #123
kar_bon
 
Quote:
Originally Posted by Flatlander
This has happened once or twice before but I assumed I was doing something wrong. Is it significant that both errors were caused when the client was processing the last line of workfile.txt? (Though there were only two lines.)

No.

Quote:
Originally Posted by Flatlander
Should I try increasing the WUCacheSize to see if it always occurs on the last line? Then I can ctrl-c while it is on different lines to see what happens.

No, that doesn't matter.

If it occurs again, look in the client folder: if there is no workfile.txt, rename workfile.bak to workfile.txt and call 'do' again. That's all.
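
The recovery step described here can be sketched as a small helper. This is only an illustration in Python, assuming the backup is meant to be renamed back to workfile.txt; the function name `restore_workfile` and its folder argument are hypothetical and are not part of LLRnet or do.bat.

```python
from pathlib import Path


def restore_workfile(client_dir):
    """Rename workfile.bak back to workfile.txt if workfile.txt is missing.

    Returns True if the backup was restored, False if nothing was done.
    (Hypothetical helper, not part of the LLRnet client.)
    """
    folder = Path(client_dir)
    work = folder / "workfile.txt"
    backup = folder / "workfile.bak"
    # Only act in the situation described: no workfile.txt, but a backup exists.
    if work.exists() or not backup.exists():
        return False
    backup.rename(work)
    return True
```

After a successful restore, the script would still have to be started again by calling 'do'.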
Old 2010-06-10, 17:10   #124
Flatlander

Ok, thanks.
Old 2010-07-03, 01:49   #125
Joe O
 
I've had two machines go into a loop. I caught one yesterday and stopped it.
Here is the output from that one
Code:
[2010-01-07 11:39:36] 138172*5^365107-1 is not prime.  RES64: 84EC98CD13D333DE.  OLD64: 08AC5CA6C9017801  Time : 1507.269 sec.
259072*5^365107-1 is not prime.  RES64: C98B5B25CC6A083D.  OLD64: 5CA21171653E18B4  Time : 1829.064 sec.
71084*5^365108-1 is not prime.  RES64: 95ED3012F3473012.  OLD64: C1C79038D9D59033  Time : 1585.047 sec.
162434*5^365108-1 is not prime.  RES64: AC127AAFB4C2A4C6.  OLD64: 16800DAEE3AEF06E  Time : 1888.927 sec.
[2010-01-07 13:33:36] 138172*5^365107-1 is not prime.  RES64: 84EC98CD13D333DE.  OLD64: 08AC5CA6C9017801  Time : 1522.320 sec.

[2010-01-07 12:05:22] 131848*5^365109-1 is not prime.  RES64: 010950066C988E98.  OLD64: 481E2801E8E32677  Time : 1523.812 sec.
183916*5^365109-1 is not prime.  RES64: 33BDC0C420337729.  OLD64: 5F18B910197DBE1D  Time : 1896.132 sec.
296024*5^365110-1 is not prime.  RES64: E2E01E0EB116ECFD.  OLD64: 41F71F534AFD39DD  Time : 1870.961 sec.
[2010-01-07 13:37:31] 
52922*5^365112-1 is not prime.  RES64: FF38EF9E48FCC73A.  OLD64: 3C127E71631EC5B9  Time : 1585.423 sec.
92936*5^365112-1 is not prime.  RES64: 81EB74E44982B40D.  OLD64: E187B9AA334C281D  Time : 1581.199 sec.
[2010-01-07 13:28:28] 
 
30994*5^365109-1 is not prime.  RES64: FF21EA84994A085D.  OLD64: FD65BF8DCBDE1914  Time : 1609.796 sec.
[2010-01-07 13:13:28]
and the console output
Code:
 
[2010-01-07 11:39:36] 138172*5^365107-1 is not prime.  RES64: 84EC98CD13D333DE.  OLD64: 08AC5CA6C9017801  Time : 1507.269 sec.
259072*5^365107-1 is not prime.  RES64: C98B5B25CC6A083D.  OLD64: 5CA21171653E18B4  Time : 1829.064 sec.
71084*5^365108-1 is not prime.  RES64: 95ED3012F3473012.  OLD64: C1C79038D9D59033  Time : 1585.047 sec.
162434*5^365108-1 is not prime.  RES64: AC127AAFB4C2A4C6.  OLD64: 16800DAEE3AEF06E  Time : 1888.927 sec.
[2010-01-07 13:33:36] 138172*5^365107-1 is not prime.  RES64: 84EC98CD13D333DE.  OLD64: 08AC5CA6C9017801  Time : 1522.320 sec.
+-------------------------------------+
| LLRnet client V0.9b7 with cLLR V3.8 |
| K.Bonath, 2010-05-15, Version 0.73  |
+-------------------------------------+
Current configuration:
server = "www.sr5.psp-project.de"
port =   12925
username = "Joe_O"
WUCacheSize = 0
Base prime factor(s) taken : 5
Resuming N+1 prime test of 71084*5^365108-1 at bit 37786 [4.45%]
Using FFT length 80K, a = 3

71084*5^365108-1 is not prime.  RES64: 95ED3012F3473012.  OLD64: C1C79038D9D5903
3  Time : 1585.047 sec.
Base prime factor(s) taken : 5
Starting N+1 prime test of 162434*5^365108-1
Using zero-padded FFT length 96K, a = 3

162434*5^365108-1 is not prime.  RES64: AC127AAFB4C2A4C6.  OLD64: 16800DAEE3AEF0
6E  Time : 1888.927 sec.
A duplicate file name exists, or the file
cannot be found.
Base prime factor(s) taken : 5
Starting N+1 prime test of 138172*5^365107-1
Using FFT length 80K, a = 3
138172*5^365107-1, bit: 200000 / 847770 [23.59%].  Time per bit: 1.763 ms.
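
A loop like this shows up in the results as the same candidate being reported more than once. The following is a rough Python sketch for spotting that, not anything the client does itself: the function name `repeated_candidates` is made up, and the pattern assumes result lines look like the ones posted here.

```python
import re
from collections import Counter

# Assumed line shape: "<k>*<b>^<n>-1 is not prime. ..." with an optional
# timestamp in front, as in the output posted in this thread.
RESULT_RE = re.compile(r"(\d+\*\d+\^\d+[+-]\d+) is (?:not )?prime")


def repeated_candidates(lines):
    """Return candidates that appear in more than one result line, first-seen order."""
    counts = Counter()
    for line in lines:
        match = RESULT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return [cand for cand, n in counts.items() if n > 1]


# Three lines copied from the output in this post.
log = [
    "[2010-01-07 11:39:36] 138172*5^365107-1 is not prime.  RES64: 84EC98CD13D333DE.  OLD64: 08AC5CA6C9017801  Time : 1507.269 sec.",
    "259072*5^365107-1 is not prime.  RES64: C98B5B25CC6A083D.  OLD64: 5CA21171653E18B4  Time : 1829.064 sec.",
    "[2010-01-07 13:33:36] 138172*5^365107-1 is not prime.  RES64: 84EC98CD13D333DE.  OLD64: 08AC5CA6C9017801  Time : 1522.320 sec.",
]
print(repeated_candidates(log))  # ['138172*5^365107-1']
```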
Old 2010-07-03, 07:16   #126
kar_bon
 
You're using the script, and therefore LLR, for base 5.

So I hope that, on the server side, you've changed the following line in 'llr-serverconfig.txt':
Code:
displayFormat = "%s*5^%s-1"      -- use this for LLR type test (default)
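
To illustrate what such a format string produces, here is a tiny Python sketch. It assumes the two %s placeholders are filled with k and n in that order, which is a guess about how the server uses the setting; `format_candidate` is a made-up name, and the pair 100186 365235 is the cancelled pair mentioned further down in this post.

```python
# Format string copied from the config line above. Filling the placeholders
# with k and n in that order is an assumption, not confirmed server behaviour.
display_format = "%s*5^%s-1"


def format_candidate(k, n):
    """Render a k/n pair as a candidate expression (hypothetical helper)."""
    return display_format % (k, n)


print(format_candidate(100186, 365235))  # 100186*5^365235-1
```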
Another thing: your console output says WUCacheSize = 0! Why? It should be 1 or greater.

Please send me the whole client folder with all files after that error.

PS:
Apparently your WUCacheSize is 4 and you stopped the script and restarted it.
After LLR tested all 4 candidates, the script hit the error that a file could not be found or already exists.

Questions:
Did you use this script for base 5 before, and did it work fine?
Did you stop and continue the script before?

PPS:
I've done one pair, then another pair after stopping and continuing the script, and cancelled a third pair: no problems on my side.
Pairs: 146264*5^365234-1, 48394*5^365235-1; cancelled pair: 100186 365235

Last fiddled with by kar_bon on 2010-07-03 at 08:43
Old 2010-07-04, 00:55   #127
Joe O
 
No, I haven't changed the display format, and the other machines seem to have worked correctly. In fact, 3 other directories on that quad worked correctly. I changed WUCacheSize to 0 to drain that machine. As I said, the other 3 directories on that machine worked correctly. I no longer have those directories, but I do have the directory on a second machine that failed on its own, i.e. I wasn't trying to drain it; it just looped. I've temporarily stopped it on a third machine that is mostly unattended, but still have it working on a fourth machine. I'll check that later and see what is happening. It was working correctly last night. I've stopped and started multiple clients on multiple machines before without this problem. All the same client, and all Windows XP, mostly Pro but some Home, all with up-to-date maintenance.
Where do you want the directory sent? PM me or email me at factrange at yahoo dot com with contact information.
Old 2010-07-04, 12:48   #128
kar_bon
 
I've checked the folder you sent.

Problem:
There are two LLR savefiles in the folder.
At the top, the script checks whether an LLR savefile exists (named like z1786704).
If so, it jumps ahead to complete the tests from workfile.txt with cllr.
cllr deletes the savefile when the test is done.
But: because there is a second savefile, which is never removed, the script thinks there are more tests to do. So the script runs the same candidates again and again.

I don't know exactly how a second savefile was created.
This could have happened: you stopped the script and changed llr-clientconfig.txt to do other work. After those 5 new candidates were done, the old savefile was never deleted.

So before doing new work, or after changing server/port in llr-clientconfig.txt, execute the '_new.bat' script: this will delete all files created during a test. The folder is then clean for a new batch of work. Be sure to back up 'lresults_hist.txt' if you need it.

I've inserted 2 new lines in 'do.bat' to avoid this issue:
Code:
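:no_prime
rem remove an old backup so the 'ren' below cannot fail on an existing target
if exist workfile.bak del workfile.bak
rem remove leftover LLR savefiles (z*)
if exist z* del z*
ren workfile.txt workfile.bak
goto start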
These will be added in the next version of LLRnet.
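
For readers who don't use batch files, the same cleanup can be sketched in Python. This is a loose mirror of the ':no_prime' steps for illustration only: the function name `no_prime_cleanup` is made up, and the guard around the final rename is an addition that the batch version does not have.

```python
from pathlib import Path


def no_prime_cleanup(client_dir):
    """Loose Python mirror of the :no_prime steps (illustrative sketch only)."""
    folder = Path(client_dir)
    work = folder / "workfile.txt"
    backup = folder / "workfile.bak"
    # Remove an old backup first so the rename below cannot collide with it.
    if backup.exists():
        backup.unlink()
    # Remove leftover savefiles, i.e. files whose names start with "z".
    for savefile in folder.glob("z*"):
        if savefile.is_file():
            savefile.unlink()
    # Keep the finished workfile as the new backup (guard added in this sketch).
    if work.exists():
        work.rename(backup)
```

With both deletions in place, a stale workfile.bak or a forgotten savefile no longer survives the end of a batch, which is the situation described in this post.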

Note:
I thought about a new name for this LLRnet client/server and would like to name it

LLRnet2010 V0.73L

to distinguish it from V0.9b7 by J. Penné from 2005.
The 'L' stands for LLR, meaning it uses the latest version of it.

Now that Jean has released his new V3.8.1 with some more options, I think I will include this new version, with small changes, in a V0.74L.
Old 2010-08-17, 23:22   #129
kar_bon
 
I've updated the batch files: there is now one for starting all clients at once, and each DOS box gets its own name.

See the first post (the download zip and the ReadMe.txt have changed, too).
Old 2010-08-18, 15:49   #130
Joe O
 
I've sent you a zip of a looping client. I set up 8 exact copies of a working client on 8 identical machines, and only 2 did not loop after completing their initial 5 tests.
Old 2010-08-18, 18:18   #131
kar_bon
 
Your folder contains the file "workfile.bak", but it shouldn't be there.

Did you stop and restart the script, and if so, how?

There's no "lresults_hist.txt" at all, which hints that the script never escaped the cLLR loop:

At start, it checks whether an intermediate cLLR file (like z******) or "workfile.txt" exists.
If so, the script jumps to the cLLR testing.
After the 5 pairs are done, it looks for found primes -> none, so it jumps to ":no_prime".

Here "workfile.txt" is renamed to "workfile.bak" and the script goes to start again.

Now there should be no "workfile.txt" and no z******, so "lresults.txt" is converted into the LLRnet format and sent to the server, new pairs are received, and cLLR runs again.

But: "workfile.txt" still exists and the renaming doesn't work here!

So to continue correctly:
- delete "workfile.bak", "lresults.txt" and "z*****"
- run the script by calling "do" again

-> Works correctly now!

I don't know how this happened, but you must have stopped and restarted the script, or used a folder with old files in it.
To delete all old files and make sure you have a clean folder, run "_new.bat": this will delete all old files generated by a previous run of the batch!
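
Before deleting anything, it can help to see which of these files are actually present. Here is a minimal, non-destructive Python sketch of such a check. It only lists files and removes nothing; the function name `find_leftovers` is made up, and which files count as leftovers is an assumption based on the names mentioned in this thread.

```python
from pathlib import Path

# File names taken from this thread; treating exactly these as files that
# affect what do.bat does at start-up is an assumption of this sketch.
LEFTOVER_PATTERNS = ("workfile.txt", "workfile.bak", "lresults.txt", "z*")


def find_leftovers(client_dir):
    """Return the sorted names of matching files found in client_dir."""
    folder = Path(client_dir)
    found = set()
    for pattern in LEFTOVER_PATTERNS:
        for path in folder.glob(pattern):
            if path.is_file():
                found.add(path.name)
    return sorted(found)
```

An empty result would suggest the folder is clean; anything else is worth a look (and a backup, where results matter) before calling "do" again.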
Old 2010-08-18, 21:19   #132
Joe O
 
If I delete lresults.txt, I lose all the work. That is unacceptable.

The client stopped on its own immediately after starting up. But to say that I can't ctrl-c is also unacceptable. I do it all the time on the machines that have no problem.

When I deleted workfile and restarted, it got the work and stopped. It left tosend.txt containing only 4 of the 5 results, with the other result being garbage:
Code:
 1000000000:P:0:5:258  [2010 -2 B8CEFD9C23AC1268
I've restarted it and am waiting to see what happens.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.