mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Raiders of the Lost Primes (https://www.mersenneforum.org/forumdisplay.php?f=87)
-   -   Testing.... (https://www.mersenneforum.org/showthread.php?t=13099)

gd_barnes 2010-02-24 20:34

[quote=kar_bon;206563]it's not so easy as thought, but the following lines will do the trick.
[code]
-- run the primality test; a result of 0 means a prime was found
result, residue = primeTest(t, format("%s %s", k, n))
if result == 0 then
  residue = "0" -- report primes with residue "0"
end
[/code]

so, if a prime is found, set the residue to '0' and all is ok!

Note: not needed for the script, only for the 'old' version of the LLRnet-client.[/quote]

Which file in the old client needs to be changed? We should go ahead and get this corrected for users who like to "fiddle" with code and stuff.

My opinion on what to do when the server dries or goes down: Keep trying. Karsten, this would be a modification to your script. I know firsthand how maddening it would be if all of my clients stopped after a small 5-minute internet blip. IMHO, that should not be an option. It should be the default to keep trying.

Max,

It took me a while but I finally concluded what you did: It's much better to "kill" the Linux client with the system manager than it is to do Ctl-C. There are several times I noticed when it took 3-4 Ctl-C's to kill it; usually on small tests -or- when the server had dried. (Don't quote me on the exact scenarios but I do know that sometimes it didn't want to "die" on the first Ctl-C.)

Can you please put something in the documentation about it being best to kill the clients when stopping them?


Karsten,

Can you please modify the Windows script to do the following?:

1. Keep trying to connect when the server is dried or down.

2. Add an option to allow the user to change how often it tries to connect when the server is down. Both of you have that as 60 secs. but the Linux client allows the user to change it. Regardless, I think that is a good default for the value of a variable field that the user can change.

3. To sync up the Windows and Linux clients, would it be too much effort to allow the user an option to set the beep on or off and to put the primes in the current folder or the parent folder?
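
For illustration only, requests 1 and 2 above (keep trying, with a user-settable wait) could look roughly like this. It is a Python sketch rather than the Perl/Lua of the real clients, and `fetch_with_retry`, `connect`, `retry_seconds` and `max_attempts` are hypothetical names, not anything taken from do.pl or the Windows script.

```python
import time
from typing import Callable, Optional


def fetch_with_retry(
    connect: Callable[[], bool],
    retry_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    max_attempts: Optional[int] = None,
) -> int:
    """Call connect() until it succeeds and return the number of attempts.

    retry_seconds is the user-configurable wait between attempts (60 s default,
    as in the thread). max_attempts=None means "keep trying forever".
    """
    attempts = 0
    while True:
        attempts += 1
        if connect():
            return attempts
        if max_attempts is not None and attempts >= max_attempts:
            raise ConnectionError("server still dry or down after %d attempts" % attempts)
        sleep(retry_seconds)  # wait, then try again instead of exiting
```

Passing `sleep` in as a parameter just makes the loop easy to try out without really waiting; the default behaviour is to retry indefinitely.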


I think it makes sense to have the 2 clients be as close to the same as possible. The above would accomplish that. Note: I think Karsten probably stopped adding options earlier because we had said that we don't want to add any new features. Sorry we're being a little wishy-washy here Karsten.

One more thing Karsten: There have been so many posts and changes here. Can you provide a link in the next post to your latest client? Sometime today or tomorrow after you make the above changes, I'd like to run a short test on it using 4 cores on my I7.

Gotta run...busy day for me. I'll probably get a little bit of testing in on cancelling pairs on the Linux client before about 5 PM CST.

Thanks everyone.


Gary

kar_bon 2010-02-24 20:48

[QUOTE=gd_barnes;206569]Which file in the old client needs to be changed? We should go ahead and get this corrected for users who like to "fiddle" with code and stuff.
[/quote]

see post #97 in the first code-block: it's llrnet.lua.

[QUOTE=gd_barnes;206569]
One more thing Karsten: There have been so many posts and changes here. Can you provide a link in the next post to your latest client?[/QUOTE]

i'll use the same link as in post #1 for any new version.

i'll try to implement the other options next; not sure if i'll get to all of them today.

BTW: i thought about another helpful output:
when starting the script, print the most important settings from llr-clientconfig.txt first!

[code]
+-------------------------------------+
| LLRnet client V0.9b7 with cLLR V3.8 |
| K.Bonath, 2010-02-10, Version 0.61  |
+-------------------------------------+

Current configuration:
server = "nplb-gb1.no-ip.org"
port = 9950
username = "kar_bon"
WUCacheSize=1
[/code]

that's what would have saved some time on running and checking for errors at the first tests with the script (you know: forgot to change my username in your settings).

suggestions?
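
A rough sketch of that startup output, for illustration. It is written in Python rather than the script's own language, it assumes the config file consists of simple `key = value` lines with `--` comment lines, and `parse_config`, `format_banner` and `IMPORTANT_KEYS` are made-up names.

```python
import re

# Settings worth echoing at startup (the four shown in the example output above).
IMPORTANT_KEYS = ("server", "port", "username", "WUCacheSize")


def parse_config(text):
    """Collect key = value pairs from config text; values are kept as strings."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("--"):  # skip blanks and comment lines (assumed syntax)
            continue
        match = re.match(r"^(\w+)\s*=\s*(.+)$", line)
        if match:
            settings[match.group(1)] = match.group(2).strip().strip('"')
    return settings


def format_banner(settings, keys=IMPORTANT_KEYS):
    """Build the 'Current configuration' block, flagging anything missing."""
    lines = ["Current configuration:"]
    for key in keys:
        lines.append("  %s = %s" % (key, settings.get(key, "<not set>")))
    return "\n".join(lines)
```

Printing `format_banner(parse_config(open("llr-clientconfig.txt").read()))` before any work starts would make a forgotten username visible right away.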

gd_barnes 2010-02-24 23:10

I'm having a serious concern about the most recent stress test now. With the aforementioned 5 errors fixed, there should be no problems. But something happened that has now happened 3 times in a row:

1. The server "loses" several pairs right at the very beginning. In this case, it is pairs numbered 6 thru 10. The first 5 went through OK. (BTW, my cache was set to 10 for this test.)

2. The server "loses" a large # of pairs at the very end. In this case, 42 of them, which is the fewest that it's lost of any of the 3 stress tests I've run. (Likely because I was only running 4 cores vs. 31 cores.) Checking confirmed that it was the final 42 pairs.


They are just sitting in knpairs.txt and joblist.txt as though they were handed out and never processed. Yet checking my clients confirmed that they were.

I don't know if this is stress-related or related to problems in the Linux client/script. Since this occurred when running just one quad, which is effectively like 1000+ clients at n=~400K, which makes it a pretty decent stress test, I may need to run the Windows client to see if it has the same problem. I can simulate a similar load with 4 cores of my I7 with the same knpairs loaded in the server.

I initially thought that it might be related to the fact that all of the first few pairs are prime except that the same issue seems to be happening at the beginning of the file as at the end.

For reference, I'm attaching the final knpairs that didn't process and the joblist. See a few posts back where I posted the entire knpairs file. The prune period was set to 15 mins and the server dried some 9 hours ago so these are not just some straggling pairs that still need to be received by the server.


Gary

gd_barnes 2010-02-24 23:15

[quote=kar_bon;206570]see post #97 in the first code-block: it's llrnet.lua.



i'll use the same link as in post #1 for any new version.

i'll try to implement the other options next; not sure if i'll get to all of them today.

BTW: i thought about another helpful output:
when starting the script, print the most important settings from llr-clientconfig.txt first!

[code]
+-------------------------------------+
| LLRnet client V0.9b7 with cLLR V3.8 |
| K.Bonath, 2010-02-10, Version 0.61  |
+-------------------------------------+

Current configuration:
server = "nplb-gb1.no-ip.org"
port = 9950
username = "kar_bon"
WUCacheSize=1
[/code]

that's what would have saved some time on running and checking for errors at the first tests with the script (you know: forgot to change my username in your settings).

suggestions?[/quote]


That's a good idea on displaying that info. But if you do it, we need to make sure Max agrees and that he can change the Linux client.


Gary

mdettweiler 2010-02-25 01:02

[quote=gd_barnes;206579]I'm having a serious concern about the most recent stress test now. With the aforementioned 5 errors fixed, there should be no problems. But something happened that has now happened 3 times in a row:

1. The server "loses" several pairs right at the very beginning. In this case, it is pairs numbered 6 thru 10. The first 5 went through OK. (BTW, my cache was set to 10 for this test.)

2. The server "loses" a large # of pairs at the very end. In this case, 42 of them, which is the fewest that it's lost of any of the 3 stress tests I've run. (Likely because I was only running 4 cores vs. 31 cores.) Checking confirmed that it was the final 42 pairs.


They are just sitting in knpairs.txt and joblist.txt as though they were handed out and never processed. Yet checking my clients confirmed that they were.

I don't know if this is stress-related or related to problems in the Linux client/script. Since this occurred when running just one quad, which is effectively like 1000+ clients at n=~400K, which makes it a pretty decent stress test, I may need to run the Windows client to see if it has the same problem. I can simulate a similar load with 4 cores of my I7 with the same knpairs loaded in the server.

I initially thought that it might be related to the fact that all of the first few pairs are prime except that the same issue seems to be happening at the beginning of the file as at the end.

For reference, I'm attaching the final knpairs that didn't process and the joblist. See a few posts back where I posted the entire knpairs file. The prune period was set to 15 mins and the server dried some 9 hours ago so these are not just some straggling pairs that still need to be received by the server.


Gary[/quote]
I have to wonder if this has something to do with what I suggested before, that the server might not "know" it's time to prune unless there's actually activity happening. I doubt it has a separate thread devoted to monitoring such things, so that kind of behavior would indeed be expected. In fact, come to think of it, in the past I was able to "trigger" an overdue prune by sending in a completely bogus result with the intent that it would be rejected--that's enough to "wake it up".

mdettweiler 2010-02-25 01:13

[quote=gd_barnes;206569]Max,

It took me a while but I finally concluded what you did: It's much better to "kill" the Linux client with the system manager than it is to do Ctl-C. There are several times I noticed when it took 3-4 Ctl-C's to kill it; usually on small tests -or- when the server had dried. (Don't quote me on the exact scenarios but I do know that sometimes it didn't want to "die" on the first Ctl-C.)

Can you please put something in the documentation about it being best to kill the clients when stopping them?[/quote]
I wouldn't recommend using kill as a matter of course since that won't give LLR a chance to save its checkpoint file; for small tests like the ones we're testing with it's not a terribly big deal, but it would not be a good thing to recommend to users in general. Possibly it would be better to just say in the readme that sometimes you have to Ctrl-C it a few times to kill it (especially with small tests), or when it's getting connection errors. For connection errors, though, it shouldn't hurt to just kill it the "hard" way.
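
To make the difference concrete, here is a small Python sketch (not the actual client or LLR code) of why an interrupt like Ctrl-C can be handled gracefully: the handler only sets a flag, and the work loop checks it at a safe point and saves state before stopping. A hard kill such as SIGKILL cannot be caught by the process at all, so no save step ever runs. `StopFlag`, `run_tests`, `run_one` and `save_checkpoint` are hypothetical names.

```python
import signal


class StopFlag:
    """Remembers that the user asked the client to stop."""

    def __init__(self):
        self.requested = False

    def handle(self, signum, frame):
        # Signal handlers should do as little as possible: just record the request.
        self.requested = True


def install(stop):
    """Route Ctrl-C (SIGINT) to the flag; call this from the main thread."""
    signal.signal(signal.SIGINT, stop.handle)


def run_tests(work_items, stop, run_one, save_checkpoint):
    """Process items until done or until a stop was requested."""
    done = []
    for item in work_items:
        if stop.requested:
            save_checkpoint(item)  # record where to resume before exiting
            break
        done.append(run_one(item))
    return done
```

Because the flag is only checked between items in this sketch, a stop request takes effect after the current item finishes.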

mdettweiler 2010-02-25 03:42

[quote=gd_barnes;206581]That's a good idea on displaying that info. But if you do it, we need to make sure Max agrees and that he can change the Linux client.[/quote]
Yeah, I suppose I could do that--shouldn't be too hard.

gd_barnes 2010-02-25 06:04

[quote=mdettweiler;206587]I have to wonder if this has something to do with what I suggested before, that the server might not "know" it's time to prune unless there's actually activity happening. I doubt it has a separate thread devoted to monitoring such things, so that kind of behavior would indeed be expected. In fact, come to think of it, in the past I was able to "trigger" an overdue prune by sending in a completely bogus result with the intent that it would be rejected--that's enough to "wake it up".[/quote]

Hum. That doesn't quite hold water here. First, the first few pairs in the knpairs would have been processed long before. Second, the pairs are being shown as processed and sent to the server by the client yet they are not showing up in the results. For the prune to work, wouldn't the results have to be there? You might take a peek at port 9985. The files from yesterday's run including joblist, knpairs, results, and stdout all have a "-1st" extension on them. Shortly I'm going to run it again but I'll make the file much smaller this go around.

Unfortunately I've already stopped the server; saved off the applicable files and reloaded it. I'll have to try a smaller file to retest it in < 1 hour or so instead of waiting 6-7 hours for it to dry.

[quote=mdettweiler;206590]I wouldn't recommend using kill as a matter of course since that won't give LLR a chance to save its checkpoint file; for small tests like the ones we're testing with it's not a terribly big deal, but it would not be a good thing to recommend to users in general. Possibly it would be better to just say in the readme that sometimes you have to Ctrl-C it a few times to kill it (especially with small tests), or when it's getting connection errors. For connection errors, though, it shouldn't hurt to just kill it the "hard" way.[/quote]

OK, agreed. Makes sense. For the most part, I've been able to make it stop by the 2nd Ctl-C and sometimes on the 1st. So yeah, just commenting that you might have to hit Ctl-C something like 2-4 times to get it to stop should be OK.


Gary

mdettweiler 2010-02-25 06:05

I've now tested the do.pl script on Windows for most of today (since around 10 AM EST), and have encountered no problems except the small factor issue. BTW, I did a bit of investigating on that and found a couple things:

-The first of the four results (which came before the small factor in the batch) was accepted fine.
-On the server end, the small-factor result was received and accepted, though with the NewPGen header (!) put in place of the residual.
-The remaining 3 results in the batch, all of which came after the small-factor one, were rejected and subsequently thrown out by the client.

I'm not positive, but I think normal LLRnet is designed to be able to handle small factors correctly (though I haven't actually tested it). At any rate, though, no properly sieved file should ever have factors in it small enough for LLR to turn up; I imagine it wouldn't be a big deal if we didn't bother to fix it, since if there's small factors in the server, then there's a much bigger problem than just a few abandoned tests. Not to mention that if LLRnet doesn't have a precedent for handling these (as I said, I'm not sure if it does), then we wouldn't be able to fix it at all without adding code for it on the server end (which we probably don't want to get into).

Other than that, though, do.pl seems to be working perfectly. Gary, have you gotten the chance to test the latest version yet on Linux and do the stress test you were planning?

gd_barnes 2010-02-25 06:14

[quote=mdettweiler;206611]I've now tested the do.pl script on Windows for most of today (since around 10 AM EST), and have encountered no problems except the small factor issue. BTW, I did a bit of investigating on that and found a couple things:

-The first of the four results (which came before the small factor in the batch) was accepted fine.
-On the server end, the small-factor result was received and accepted, though with the NewPGen header (!) put in place of the residual.
-The remaining 3 results in the batch, all of which came after the small-factor one, were rejected and subsequently thrown out by the client.

I'm not positive, but I think normal LLRnet is designed to be able to handle small factors correctly (though I haven't actually tested it). At any rate, though, no properly sieved file should ever have factors in it small enough for LLR to turn up; I imagine it wouldn't be a big deal if we didn't bother to fix it, since if there's small factors in the server, then there's a much bigger problem than just a few abandoned tests. Not to mention that if LLRnet doesn't have a precedent for handling these (as I said, I'm not sure if it does), then we wouldn't be able to fix it at all without adding code for it on the server end (which we probably don't want to get into).

Other than that, though, do.pl seems to be working perfectly. Gary, have you gotten the chance to test the latest version yet on Linux and do the stress test you were planning?[/quote]


The version that I posted yesterday is the latest version. Correct? lol It is that latest version that I ran my big stress test on yesterday. It was about halfway through the stress test that I changed a prime residue from 16 x's to a single digit of "0" like the Windows client. What I want to test today is the same script for the cancellation of pairs and the problem with the pairs not processed at the beginning and end of the file by the server.

I just got back in after a long day and need to do a couple of things yet. But I plan to test in the wee hours here for 2-4 hours.

BTW, I also observed what you did on a pair that had a factor of 5. It put the file header in the residue. You know what? I think that might explain why the 4-5 pairs right after it were not accepted by the server even though the client processed them. Bingo! And...if what you said about the final pruning is causing them not to be processed at the end, well...that might explain completely what happened yesterday with the pairs that weren't processed by the server. That said, the server never showed the results for the missing pairs at the end so I'm questioning how a final prune would actually be able to work.

Agreed that a small factor should never happen on a reasonably sieved file. As a programmer though, it would be nice to code around it but not at the expense of a lot of extra time/testing. I'll see what the code looks like.


Gary

gd_barnes 2010-02-25 06:24

[quote=kar_bon;206563]it's not so easy as thought, but the following lines will do the trick.
[code]
-- run the primality test; a result of 0 means a prime was found
result, residue = primeTest(t, format("%s %s", k, n))
if result == 0 then
  residue = "0" -- report primes with residue "0"
end
[/code]

so, if a prime is found, set the residue to '0' and all is ok!

Note: not needed for the script, only for the 'old' version of the LLRnet-client.[/quote]


Karsten,

I was looking to make this change to the residue for a prime in llrnet.lua on the Linux side but it appears to already default to a "0". Here is the code:

[code]
  -- perform prime test !
  if not asynchronous then
    Logout() -- logout before performing computation
  end
  -- UpdateStatus(format("Working on : %s/%s (%s)", k, n, t))
  -- print(format("Working on : %s/%s (%s)", k, n, t))
  -- result, residue = primeTest(t, format("%s %s", k, n))
  result, residue = 0, "0"
  -- check user interruption
  if stopCheck() then
    return -- return with no error
  end
end
SemaWait(semaphore)
[/code]


What change is needed to accomplish what you are talking about?

Edit: The code in the Windows client is the same. Please enlighten me.

kar_bon 2010-02-25 06:31

[QUOTE=gd_barnes;206612]BTW, I also observed what you did on a pair that had a factor of 5. It put the file header in the residue. You know what? I think that might explain why the 4-5 pairs right after it were not accepted by the server even though the client processed them. Bingo! And...if what you said about the final pruning is causing them not to be processed at the end, well...that might explain completely what happened yesterday with the pairs that weren't processed by the server. That said, the server never showed the results for the missing pairs at the end so I'm questioning how a final prune would actually be able to work.

Agreed that a small factor should never happen on a reasonably sieved file. As a programmer though, it would be nice to code around it but not at the expense of a lot of extra time/testing. I'll see what the code looks like.
[/QUOTE]

i had a look at the llrserver.lua:
try this: there are functions called PrunePairs() and PruneJoblist() (called in function ProxyUpdate). add an output on the server like print("PrunePairs Call 1") just before every place where that function is called (same for the other one), and give every call its own number, so you can tell which call invoked the function.
even better: put the date/time into it:
[code]
print(format("PrunePairs Call #1: [%s] ", date("%Y-%m-%d %H:%M:%S")))
[/code]

and test a small amount of pairs and a prune time of 15 mins.

this lets you see where and when the server pruned; perhaps this is the issue: it only prunes when results are received.

gd_barnes 2010-02-25 06:34

[quote=kar_bon;206614]i had a look at the llrserver.lua:
try this: there are functions called PrunePairs() and PruneJoblist() (called in function ProxyUpdate). add an output on the server like print("PrunePairs Call 1") just before every place where that function is called (same for the other one), and give every call its own number, so you can tell which call invoked the function.
even better: put the date/time into it:
[code]
print(format("PrunePairs Call #1: [%s] ", date("%Y-%m-%d %H:%M:%S")))
[/code]

and test a small amount of pairs and a prune time of 15 mins.[/quote]


Don't you ever sleep? lol

I'm going to need some help with this. I'm not clear on where in llrserver.lua that it goes. Can you post an updated llrserver.lua file with this change in it?

mdettweiler 2010-02-25 06:39

[quote=gd_barnes;206612]The version that I posted yesterday is the latest version. Correct? lol It is that latest version that I ran my big stress test on yesterday. It was about halfway through the stress test that I changed a prime residue from 16 x's to a single digit of "0" like the Windows client. What I want to test today is the same script for the cancellation of pairs and the problem with the pairs not processed at the beginning and end of the file by the server.[/quote]
Ah, right, duh. :rolleyes: BTW, I tested the cancellation function about half an hour ago on Windows and it seems it worked perfectly.

kar_bon 2010-02-25 06:41

[QUOTE=gd_barnes;206613]What change is needed to accomplish what you are talking about?

Edit: The code in the Windows client is the same. Please enlighten me.[/QUOTE]

see my note in that post: [b]only[/b] in the 'old' llrnet.lua file, when LLRnet also does the prime test. this
[code]
result, residue = primeTest(t, format("%s %s", k, n))
--result, residue = 0, "0"

-- check user interruption
if stopCheck() then
  return -- return with no error
end
[/code]

is what i see in the lua file. the line setting result/residue to 0 is commented out; it's only there for testing.
but it seems that when a prime is found, primeTest does not set the residue to "0" properly, hence my code!

kar_bon 2010-02-25 06:42

[QUOTE=gd_barnes;206615]Don't you ever sleep? lol

I'm going to need some help with this. I'm not clear on where in llrserver.lua that it goes. Can you post an updated llrserver.lua file with this change in it?[/QUOTE]

just awake and preparing for work. perhaps i can do that then.

try a 'search for...' in an editor :grin:

gd_barnes 2010-02-25 08:04

Mr. former maintenance programmer still at your service here. :smile:

Not that it should ever happen but I've now fixed the problem when a pair has a small factor, which caused some subsequent pairs not to be sent (I believe the remainder of the batch). What was important about this, though, is that I've now verified that that is exactly what was causing several pairs at the beginning of each of my runs to not be processed to the server. Checking my latest run confirmed that all pairs at the beginning of the file were processed.

What I ended up doing is putting the term "factored" in the residue when there was a small factor. I had to add an elsif to that section to do a separate parsing of the line looking for the word "factor".
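
Roughly, the parsing described above could be sketched like this. This is Python for illustration, not the actual Perl in do.pl, and the exact wording of the LLR result lines ("is prime", "Res64:", "has a small factor") is an assumption here, as is the name `residue_for`.

```python
import re


def residue_for(result_line):
    """Map one result line to the residue string reported to the server.

    Returns None for lines that are not results (for example a file header),
    so that such a line is never sent in place of a residue.
    """
    if "is prime" in result_line:
        return "0"  # primes are reported with residue "0"
    elif "factor" in result_line:
        return "factored"  # small factor found: use a fixed marker word
    match = re.search(r"Res64:\s*([0-9A-Fa-f]{16})", result_line)
    if match:
        return match.group(1).upper()  # ordinary composite: 16 hex digits
    return None
```

The point of the separate `elif` branch is that a small-factor line carries no residue at all, so without it the parser has nothing valid to pick up.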

An important discovery: I've confirmed that the fact that the last few pairs aren't being sent when the server dries is a bug in the Linux client script. It processes them all in the client but because the server has dried, it can't retrieve more pairs. And if it cannot retrieve more pairs, it doesn't want to send that final batch. The fact is, it should be sending those pairs BEFORE attempting to retrieve additional pairs. That is the issue.

What this confirms is that what Max encountered, where he had to send one final "bogus" pair/result to the server in order to get the final pairs to process, was unrelated to the pruning. The fact is, the results never got there because they were sitting in tosend.txt. Once it was able to process one final bogus pair, it sent them all except the bogus pair, which is exactly what was needed.
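
The ordering described above (send finished results BEFORE asking for more work) can be sketched as follows. This is an illustrative Python model, not the real script; `client_cycle`, `send_results` and `fetch_pairs` are hypothetical stand-ins for the tosend.txt upload and the workfile request, and it only shows the intended order of operations.

```python
def client_cycle(pending_results, send_results, fetch_pairs):
    """One pass of the client: flush finished results, then request new work.

    pending_results plays the role of tosend.txt. It is emptied only if the
    send succeeded, so nothing is lost when the server is unreachable.
    """
    if pending_results:
        if send_results(list(pending_results)):
            pending_results.clear()
    # A dried server simply returns no pairs; results were already delivered.
    return fetch_pairs()
```

With this order, a dried server that hands out no new pairs can no longer hold back the final batch.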

Karsten, on the Windows client, have you run tests where you've dried out a server? If so, was there any problem with it sending the final pairs to the server? In other words, did all of the pairs in knpairs.txt clear themselves out with an automatic pruning?

Bottom line:
6 errors corrected
1 error to go
:smile:

I suspect this last error will be the most difficult for me to fix because it involves the structure and timing of when things are processed in the script; which may require a complete change in the order of things; which could result in several other things being impacted. If there's one thing I will do; it is be VERY careful and test the heck out of it if I think I have it right.


Gary

kar_bon 2010-02-25 08:57

[QUOTE=gd_barnes;206620]An important discovery: I've confirmed that the fact that the last few pairs aren't being sent when the server dries is a bug in the Linux client script. It processes them all in the client but because the server has dried, it can't retrieve more pairs. And if it cannot retrieve more pairs, it doesn't want to send that final batch. The fact is, it should be sending those pairs BEFORE attempting to retrieve additional pairs. That is the issue.

What this confirms is that what Max encountered, where he had to send one final "bogus" pair/result to the server in order to get the final pairs to process, was unrelated to the pruning. The fact is, the results never got there because they were sitting in tosend.txt. Once it was able to process one final bogus pair, it sent them all except the bogus pair, which is exactly what was needed.

Karsten, on the Windows client, have you run tests where you've dried out a server? If so, was there any problem with it sending the final pairs to the server? In other words, did all of the pairs in knpairs.txt clear themselves out with an automatic pruning?

I suspect this last error will be the most difficult for me to fix because it involves the structure and timing of when things are processed in the script; which may require a complete change in the order of things; which could result in several other things being impacted. If there's one thing I will do; it is be VERY careful and test the heck out of it if I think I have it right.
[/QUOTE]

- my script does the cLLR-work first
- when done, convert lresults.txt into tosend.txt
- call LLRnet
- if no (new) workfile.txt exists, goto ERROR (no connection)

i've not tested this with a real server, so perhaps we could do this later today.

but:
when i process a workfile on my offline PC (same script), the script will end with that connection error and the tosend.txt remains.
i then start only llrnet.exe (folder on stick into online PC): LLRnet will first get new pairs and then submit the tosend.txt!

to test:
what happens when the server has dried and llrnet.exe is called with a tosend.txt there

looking at llrnet.lua the function DialogWithServer:
(some unimportant lines deleted)
[code]
function DialogWithServer()
  (...)
  -- first check if we have some job to work on
  (...)
  -- if no unfinished job, then ask a new job to the server
  if not t or not k or not n then
    --print("Requesting new job from the server ...")
    t, k, n = GetPair()
    changed = t and k and n
  end
  (...)
  print(format("Fetching %d additional Work Unit(s) ...", num))
  (...)

  if changed then
    WriteWorkfile(t, k, n, more)
    SendStructureToAllGUIs("WUs", more, 1)
  end

  SendAllResults()
  (...)
end
[/code]

SendAllResults is called last, so perhaps move this line to the top here and test again!

so if the script is ok, the problem is in LLRnet not sending the last results when the server has dried.

@Gary: you can test this easily, too.
a folder with normal testing, 2nd folder with some pairs done but no connection so only the tosend.txt exists.
let the first folder dry out the server and then start the script/llrnet.exe (2 tests!) in the 2nd folder!
i'm home again in about 7 hours from now so can't test it (and got no dried server :smile:).

gd_barnes 2010-02-25 10:15

OK guys, after much effort, I've been unable to fix the problem with the server not returning the final few pairs as the server is near drying. Oddly, I even switched the order of things and it would show that it had sent the final tosend.txt pairs to the server but the server would never accept them. In other words, they just never showed up in the server results.txt or rejected.txt. It's like they just went off into cyberspace somewhere.

I suppose we could live with that at NPLB and just make sure a server never dries but there's one problem: I checked it against the current client that we run in "production". It returns all pairs perfectly. So despite correcting a couple of bugs that LLRnet previously had, we have introduced a new bug that it did not previously have, which I find difficult to stomach.

Karsten, by your explanation here, I suspect the Windows client will have the same problem.

So we have some work to do.

In a little while, I will edit this post to attach the latest do.pl Linux script. Besides the factoring issue that I've fixed, I've tweaked the order of when tosend.txt can be processed and not made it dependent on getting a workfile.txt file from LLRnet. Now it will immediately check for tosend.txt when starting the script and send the pairs immediately. This is good if the server goes down, your machine has subsequently processed and formatted the pairs into tosend.txt, and you decide to stop your client. Then after the server comes back up and you start your client back up, it will send them all right away. That closely simulates what LLRnet currently does.

I've also tweaked the timing of when a time stamp is written to lresults_hist stating when pairs were sent to the server. Previously that time stamp was written when lresults was added to lresults_hist. That was incorrect because the results have not actually been sent to the server at that point. It is now written immediately after the time in which they are actually sent.
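
As a sketch of that timing change (Python for illustration, not the actual do.pl code): the results are appended to the history first, but the "sent" time stamp is written only after the send has actually succeeded. `submit_and_log` is a made-up name, and a plain list stands in for the lresults_hist file.

```python
from datetime import datetime


def submit_and_log(results, send_results, history, now=datetime.now):
    """Append results to the history, then stamp it only on a successful send."""
    history.extend(results)
    sent = send_results(results)
    if sent:
        # The stamp is written after the send, so it never claims more than happened.
        history.append("Sent to server at " + now().strftime("%Y-%m-%d %H:%M:%S"))
    return sent
```

If the send fails, the history still holds the results but carries no misleading time stamp.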


Gary

kar_bon 2010-02-25 10:36

[QUOTE=gd_barnes;206632]I suppose we could live with that at NPLB and just make sure a server never dries but there's one problem: I checked it against the current client that we run in "production". It returns all pairs perfectly. So despite correcting a couple of bugs that LLRnet previously had, we have introduced a new bug that it did not previously have, which I find difficult to stomach.

Karsten, by your explanation here, I suspect the Windows client will have the same problem.

So we have some work to do.
[/QUOTE]

i've changed nothing in client.lua (only 3 lines with output), client_server.lua or llrserver.lua. so the error has to be on the client side.

i see in do.pl the function doLLRnet() is called in 2 different places.
the tosend.txt is deleted only just before the end of the main loop (and in the cancel function, of course). i call llrnet.exe only once.

try to test as i mentioned above!

gd_barnes 2010-02-25 12:52

Karsten,

I was somewhat unable to follow what you meant about testing in the last post. I'm extremely bleary eyed now after an all night testing session. But in looking at what I could understand, it appears that I'm doing very close to what you are suggesting. My tests tonight have involved one core running about 200 small pairs over and over.

I do have some good news finally:

I was able to get it down to where the server is failing to receive only about the final 1/2 of a batch of pairs at the very end of the file. This amounted to not receiving the final 2 pairs instead of 7 pairs with the cache set to 5. I found that it was erroring out on the last batch when it had fewer than 5 pairs when it was trying to display "WU 3/5 xx xxxxx", which caused the next-to-last batch to not be received by the server.

What I did was comment out 2 lines in the llrnet.lua file. That caused it to stop the error message and allowed it to process the next-to-last batch. Now it's only the final batch (in this case 2 pairs) that it won't accept.

Attached are the modified llrnet.lua file and do.pl script.

Changes:
1. Commented out 2 lines in llrnet.lua where it was displaying unneeded info. about what WU's it was fetching. This was causing an error when less than a full batch was remaining on the server.

2. Corrected the problem with a pair that has a small factor. The residue will contain the simple word "factored" in it. Currently LLRnet just puts the residue of the previous pair in there, which is a bug...the same bug it does on a prime and that can cause a problem if the first pair in the file is a prime.

3. Moved the timing of when the time stamp referencing pairs being submitted to the server is written to the lresults_hist file. It was previously written several lines of code before they were actually submitted, which in some cases meant that it was being written when the results were not actually received by the server due to other issues.

4. Something you had mentioned that I had earlier concluded: I changed the execution of doLLRnet() at the end of the main loop to simply run LLRnet once. The problem was that doLLRnet() requires that at least one pair be fetched because it requires the existence of the workfile.txt in order to exit the loop. On the next-to-last or last batch, that was causing a problem. It could even happen sooner if many clients are connected. The doLLRnet() loop, which requires the existence of workfile.txt in order to exit, should only be executed when the main intent is to fetch pairs. That is now done only one time towards the beginning of the main loop.

Based on what you said, we are now more synced up on #4.

Note that I opted against moving the processing of tosend.txt to the top of the main loop like I had suggested that I would do earlier. I thought of a couple of scenarios where that would not be desirable.

Please take a look at the logic where I handle a small factor. LLRnet as it currently exists incorrectly handles it by leaving the residue from the previous pair in there. But our scripts were worse than that because they would put the file header in there and end up skipping several subsequent pairs. You should probably code for that in the Windows client.

Stats:
8 errors corrected (including new problems #1 & #4 above. #2 was already counted and #3 is a nitpick more than a bug.)
1 still to go
:-)


Gary

kar_bon 2010-02-25 13:14

i had a quick look at your llrnet.lua:

you can put the printings (print "Fetching...") below the test, if GetPair was ok, like:

before
[code]
t, k, n = GetPair()
-- this print runs even when GetPair() returned nothing
print(format(" Fetching WU #%d/%d: %s %s",i,WUCacheSize,k,n))
if t and k and n then
    tinsert(more, { k = k, n = n })
else
    break
end
[/code]

after
[code]
t, k, n = GetPair()
if t and k and n then
    -- only print once the pair is known to be valid
    print(format(" Fetching WU #%d/%d: %s %s",i,WUCacheSize,k,n))
    tinsert(more, { k = k, n = n })
else
    print("No more pairs on the server!")
    break
end
[/code]

and perhaps the message, no pairs available on the server.

try it. i have to look for that at home, there're several points to do so!

testing:
- getting new pairs when server is already dried
- getting the last pairs from the server, not enough then
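
To make the reordering above easier to test away from the client, here is a minimal Python sketch of the same idea. It is an illustration only, not the llrnet.lua code: `fetch_batch` and `make_server` are hypothetical names, `make_server` is a stand-in for GetPair(), and the sketch fetches the whole batch in one loop, whereas the real script fetches WU #1 separately.

```python
def fetch_batch(get_pair, cache_size):
    """Fetch up to cache_size pairs; stop cleanly when the server runs dry."""
    batch, log = [], []
    for i in range(1, cache_size + 1):
        t, k, n = get_pair()
        if t is not None and k is not None and n is not None:
            # Format the message only after the pair is known to be valid.
            log.append(" Fetching WU #%d/%d: %s %s" % (i, cache_size, k, n))
            batch.append((k, n))
        else:
            log.append("No more pairs on the server!")
            break
    return batch, log


def make_server(pairs):
    """Hypothetical stand-in for GetPair(): hands out pairs, then nothing."""
    remaining = list(pairs)

    def get_pair():
        if remaining:
            k, n = remaining.pop(0)
            return "M", k, n
        return None, None, None

    return get_pair
```

With a cache size of 5 and only 3 pairs left, the sketch returns the 3 pairs plus the "No more pairs" message instead of failing on the fourth fetch, which is the second test case listed above.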

mdettweiler 2010-02-25 15:15

[quote=gd_barnes;206641]Karsten,

I was somewhat unable to follow what you meant about testing in the last post. I'm extremely bleary eyed now after an all night testing session. But in looking at what I could understand, it appears that I'm doing very close to what you are suggesting. My tests tonight have involved one core running about 200 small pairs over and over.

I do have some good news finally:

I was able to get it down to where the server fails to receive only about the final half batch of pairs at the very end of the file. This amounted to not receiving the final 2 pairs instead of 7 pairs with the cache set to 5. I found that it was erroring out on the last batch when it had fewer than 5 pairs and was trying to display "WU 3/5 xx xxxxx", which caused the next-to-last batch not to be received by the server.

What I did was comment out 2 lines in the llrnet.lua file. That caused it to stop the error message and allowed it to process the next-to-last batch. Now it's only the final batch (in this case 2 pairs) that it won't accept.

Attached are the modified llrnet.lua file and do.pl script.

Changes:
1. Commented out 2 lines in llrnet.lua where it was displaying unneeded info. about what WU's it was fetching. This was causing an error when less than a full batch was remaining on the server.

2. Corrected the problem with a pair that has a small factor. The residue will contain the simple word "factored" in it. Currently LLRnet just puts the residue of the previous pair in there, which is a bug...the same bug it does on a prime and that can cause a problem if the first pair in the file is a prime.

3. Moved the timing of when the time stamp referencing pairs being submitted to the server is written to the lresults_hist file. It was previously written several lines of code before they were actually submitted, which in some cases meant that it was being written when the results were not actually received by the server due to other issues.

4. Something you had mentioned that I had earlier concluded: I changed the execution of doLLRnet() at the end of the main loop to simply run LLRnet once. The problem was that doLLRnet() requires that at least one pair be fetched because it requires the existence of the workfile.txt in order to exit the loop. On the next-to-last or last batch, that was causing a problem. It could even happen sooner if many clients are connected. The doLLRnet() loop, which requires the existence of workfile.txt in order to exit, should only be executed when the main intent is to fetch pairs. That is now done only one time towards the beginning of the main loop.

Based on what you said, we are now more synced up on #4.

Note that I opted against moving the processing of tosend.txt to the top of the main loop like I had suggested that I would do earlier. I thought of a couple of scenarios where that would not be desirable.

Please take a look at the logic where I handle a small factor. LLRnet as it currently exists incorrectly handles it by leaving the residue from the previous pair in there. But our scripts were worse than that because they would put the file header in there and end up skipping several subsequent pairs. You should probably code for that in the Windows client.

Stats:
8 errors corrected (including new problems #1 & #4 above. #2 was already counted and #3 is a nitpick more than a bug.)
1 still to go
:-)


Gary[/quote]
Nice work--I read through the code and it looks good. I've uploaded the latest packages to the following links due to problems with the noprimeleftbehind.net server:

[URL]http://nplb-gb1.no-ip.org/sieves/llrnet-script-perl-0.61-win32.zip[/URL]
[URL]http://nplb-gb1.no-ip.org/sieves/llrnet-script-perl-0.61-linux32.zip[/URL]

kar_bon 2010-02-25 19:18

[b]Status of WIN-script V0.63[/b]

i've just uploaded a new version of the WIN-script with the latest lua-files (link see post #1).

llrnet.lua:
- no errors when printing "Fetching ..." new pairs -> included again

do.bat:
- displaying important settings from llr-clientconfig.txt
- trying to connect to server if not responding (every 60 seconds) forever

llr-clientconfig.txt:
- reduced to the mostly needed settings; all others will be set in the lua-files before reading these

ToDo:
- parsing several options in batch-mode is very annoying and several lines per option would be needed with error-finding. so i will do it with setting an option at top of the do.bat with the set-command from DOS. every user can put his own settings (ON or OFF) in there.

- option OutputIterations (10000 by default)
- option primes.txt in same folder or folder above (above by default)
- option beep on prime / batch ended (beep by default)

- to test: dried server /dry when receiving pairs

do_tosend.awk:
- support pairs with small factors found or probable primes

more work to do now than thought!

gd_barnes 2010-02-25 22:58

[quote=kar_bon;206643]i had a quick look at your llrnet.lua:

you can put the printings (print"Fetching...) below the test, if GetPair was ok, like:

before
[code]
t, k, n = GetPair()
print(format(" Fetching WU #%d/%d: %s %s",i,WUCacheSize,k,n))
if t and k and n then
    tinsert(more, { k = k, n = n })
else
    break
end
[/code]

after
[code]
t, k, n = GetPair()
if t and k and n then
    print(format(" Fetching WU #%d/%d: %s %s",i,WUCacheSize,k,n))
    tinsert(more, { k = k, n = n })
else
    print("No more pairs on the server!")
    break
end
[/code]

and perhaps the message, no pairs available on the server.

try it. i have to look for that at home, there're several points to do so!

testing:
- getting new pairs when server is already dried
- getting the last pairs from the server, not enough then[/quote]


OK, I'll try that. It passes a code review. :-)

But there is something that doesn't pass a code review. It is one of the most common programming errors that I have encountered in my programming career. You cannot have "NOT" logic with "OR" in between. Hence it will be executed 100% of the time.

This code:

[code]
if not t or not k or not n then
    -- print("Requesting new job from the server ...")
    t, k, n = GetPair()
    if t and k and n then
        -- print(format(" Fetching WU #1/%d: %s %s",WUCacheSize,k,n))
        changed = 1
    else
        return
    end
end
[/code]

This statement:
if not t or not k or not n then

Would be executed 100% of the time!

Look at it like this statement:

IF t not = 0 or t not = 1 or t not = 2

Now, for t to be NOT 0, then it must have a value of 1 or 2 or 3, etc. So in order for "NOT 0" to be false (hence allowing the remainder of the "OR" statement to be checked), the value of t must be 0. Next you are asking if it is "NOT 1". Since the value of t is 0, then of course "NOT 1" is true. Hence one of the 3 "OR" conditions has been satisfied and the statement is ALWAYS true.

So my question to you is: Should this IF statement be the following?:

if not t AND not k AND not n then

In Cobol, we would avoid such confusing negative logic and use:

[code]
if t and k and n then
(next sentence -or- continue)
else
(the above code that you wanted to execute)
[/code]

Is there something that I'm not understanding on how lua processes logic in IF statements?


Gary

kar_bon 2010-02-25 23:15

ok, here's a bit more code from the function DialogWithServer:

[code]
local t, k, n, more
local changed

-- first check if we have some job to work on
[color=red]1[/color] t, k, n, more = ReadWorkfile()
more = more or { }

-- if no unfinished job, then ask a new job to the server
if not t or not k or not n then
-- print("Requesting new job from the server ...")
[color=red]2[/color] t, k, n = GetPair()
if t and k and n then
print(format(" Fetching WU #1/%d: %s %s",WUCacheSize,k,n))
changed = 1
else
print("No more pairs available from server!")
return
end
end
[/code]

t: type of processing the prime-test (30000000000000:M:1:2:258)
k: k-value
n: n-value
more: list of k/n-pairs

so in line 1) the function ReadWorkfile will set k and n with the first pair and the list with the remaining pairs (if none are left, the list is empty: "more or { }")

if k or n or t are not filled -> no pair in workfile left -> reserve new pair(s)

in 2) a new pair k/n-values and t is read from the server (GetPair)

if this was successful (t AND k AND n not nil) print the message and set option to write the Workfile (was changed)
if not, print error "No more..." and return this function.

so i think, it's correct!

helpful to read: [url=http://www.lua.org/pil/3.3.html]LUA Logical Operators[/url]:
"The operator or returns its first argument if it is not false; otherwise, it returns its second argument:"

so if ReadWorkfile returns no pair, t, k and n are nil (see function read() in init.lua).
so the expression "if not t" is true and the rest is negligible; if t got a value (perhaps an error) and the others are nil -> true by those values! so it's correct.

gd_barnes 2010-02-26 00:00

As Max would say:

Duh! Bangs head! :-)

The NOT logic with ORs that I was referring to is only bad if you are testing the SAME variable as in if t not = 0 or t not = 1 or t not = 2. That would be a bug every time. Clearly you can have ANDs or ORs combined with NOTs when testing different variables such as in t, k, and n here.
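
The distinction can be checked mechanically. Below is a small Python sketch, for illustration only: `needs_new_job`, `always_true`, and `de_morgan_holds` are hypothetical names, and `None` stands in for Lua's nil. The first function mirrors the `if not t or not k or not n` test on three different variables; the second is the same-variable pattern that really is always true.

```python
from itertools import product


def needs_new_job(t, k, n):
    # Mirrors Lua's `if not t or not k or not n`:
    # true when ANY of the three values is missing.
    return t is None or k is None or n is None


def always_true(t):
    # The genuinely buggy pattern: ONE variable compared against
    # several different values, joined with OR.
    return t != 0 or t != 1 or t != 2


def de_morgan_holds():
    # `not t or not k or not n` is equivalent to `not (t and k and n)`.
    for t, k, n in product([None, "x"], repeat=3):
        all_present = t is not None and k is not None and n is not None
        if needs_new_job(t, k, n) != (not all_present):
            return False
    return True
```

So the three-variable test is false only when all of t, k, and n are filled, which is exactly the case where no new job is needed.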

I can tell I've been out of the programming loop for almost 2 years but a lot of it is coming back to me. This has been kind of a fun process. I wouldn't have wanted to write the script from scratch but debugging an existing one is right up my alley and is a lot of what I used to do in my former job. I was always the maintenance programmer; not the new development guy. You and Max are good at the new development stuff. :-)

Thanks for the explanation. I can very clearly tell the logic is correct now. I can even remember doing similar coding when checking that fields were properly populated in a "flat" file before loading them to a DB2 database.

Edit: I just ran your suggested change on llrnet.lua. It worked! So the W/Us will continue to display on the screen without an error coming on the last batch before a server dries. I still have the existing issue with the final 1/2 batch not being RECEIVED by the server when the server dries out. I emphasize RECEIVED because the client clearly shows that it has processed the final few pairs and has attempted to send it to the server. I'm still not sure why it isn't being accepted. It's not in rejected.txt nor in results.txt and just sits there in joblist.txt. I will be curious to see if you get the same problem with the Windows client.


Gary

gd_barnes 2010-02-26 00:16

Quick question Karsten:

Are there any values that are returned from the server to the script?

As an example in the Linux script, when ./LLRnet is executed, are there any variables that can be returned from the server such that do.pl will have some info. about what it did?

Useful info. might be # of work units sent back to the client and # of formatted tosend.txt results processed by the server.

I might be able to check such a field to finally resolve this final irritating problem with the server not accepting the final batch of pairs.

One more thing: Can you attach the source code that you used to build the new llrnet.exe? To understand this a little better, I need to see all of what you commented out.


Gary

kar_bon 2010-02-26 00:18

the last two weeks i've learned much about LUA programming and some things are different than in other languages.

now we're down to only one (better half the last one) problem to solve?

as you said, there must be an issue in the server-client communication, more on the server side. i could do some printings in the server.lua to see what/when will be received from the client and post here.

for testing i suggest: put 15 normal (no primes yet) pairs in the knpairs.txt of the server and set WUCacheSize to 10 on the client and run the batch.

gd_barnes 2010-02-26 00:23

[quote=kar_bon;206698]the last two weeks i've learned much about LUA programming and some things are different than in other languages.

now we're down to only one (better half the last one) problem to solve?

as you said, there must be an issue in the server-client communication, more on the server side. i could do some printings in the server.lua to see what/when will be received from the client and post here.

for testing i suggest: put 15 normal (no primes yet) pairs in the knpairs.txt of the server and set WUCachesize to 10 on the client and run the batch.[/quote]

I have an idea: Can you run my port 9985 with the Windows client until it is dry? There are 123 pairs in there and it should finish in < 10 mins on one core. There are many primes at the beginning as well as a pair that contains a factor of 5.

I want to see if the Windows client is doing the same thing as the Linux client. The server is set up and ready to go. I will stay off of it for now until I hear back from you.

This should be a good test to see if we are "in sync" on our existing code since it contains a variety of different tests in it.

Edit: Hold on. It's not ready.

Edit 2: OK, it's ready. Fire away

kar_bon 2010-02-26 00:25

[QUOTE=gd_barnes;206697]Quick question Karsten:

Are there any values that are returned from the server to the script?

As an example in the Linux script, when ./LLRnet is executed, are there any variables that can be returned from the server such that do.pl will have some info. about what it did?
[/quote]
no, llrnet.exe doesn't return any values, but i can put some printings in the server.lua

[quote]
Useful info. might be # of work units sent back to the client and # of formatted tosend.txt results processed by the server.

I might be able to check such a field to finally resolve this final irritating problem with the server not accepting the final batch of pairs.
[/quote]

[quote]
One more thing: Can you attach the source code that you used to build the new llrnet.exe? To understand this a little better, I need to see all of what you commented out.
[/QUOTE]

no, misunderstood! i have not built a new llrnet.exe! the only thing is editing the included lua-files, nothing else! those lua's are all in the download in the first post!

gd_barnes 2010-02-26 00:29

[quote=kar_bon;206700]no, llrnet.exe doesn't return any values, but i can put some printings in the server.lua

Useful info. might be # of work units sent back to the client and # of formatted tosend.txt results processed by the server.

I might be able to check such a field to finally resolve this final irritating problem with the server not accepting the final batch of pairs.



no, misunderstood! i have not built a new llrnet.exe! the only thing is editing the included lua-files, nothing else! those lua's are all in the download in the first post![/quote]


OH!! Yeah, that was a misunderstanding. Well, darn. I was hoping it was something with what was commented out of the LLRNet binary/executable.

Port 9985 is ready for your Windows client(s). One core would be sufficient and should take < 10 mins.

kar_bon: and running!

kar_bon 2010-02-26 00:38

ok, it's done.
i've set WUCacheSize to 5 and the last lines look like this:
[code] Fetching WU #1/5: 187 10009
Fetching WU #2/5: 199 10009
Fetching WU #3/5: 27 10010
Fetching WU #4/5: 71 10010
Fetching WU #5/5: 95 10010
[2010-02-26 01:35:41]
Using Irrational Base DWT : Mersenne fftlen = 512, Used fftlen = 640
187*2^10009-1 is not prime. LLR Res64: 137F03639A5E4D5D Time : 82.482 ms. ms.
Using Irrational Base DWT : Mersenne fftlen = 512, Used fftlen = 640
199*2^10009-1 is not prime. LLR Res64: C302ABAF8A776C7E Time : 80.909 ms. ms.
Using Irrational Base DWT : Mersenne fftlen = 512, Used fftlen = 512
27*2^10010-1 is not prime. LLR Res64: E0DD104B60A07E67 Time : 60.571 ms. s.
Using Irrational Base DWT : Mersenne fftlen = 512, Used fftlen = 640
71*2^10010-1 is not prime. LLR Res64: 90EDA18FE281C61E Time : 80.905 ms. s.
Using Irrational Base DWT : Mersenne fftlen = 512, Used fftlen = 640
95*2^10010-1 is not prime. LLR Res64: D5C51D612D8FB72F Time : 80.863 ms. s.
Fetching WU #1/5: 141 10010
Fetching WU #2/5: 147 10010
Fetching WU #3/5: 167 10010
No more pairs available from server!
[2010-02-26 01:35:45]
Using Irrational Base DWT : Mersenne fftlen = 512, Used fftlen = 640
141*2^10010-1 is not prime. LLR Res64: 2472009A9411CAD2 Time : 83.818 ms. ms.
Using Irrational Base DWT : Mersenne fftlen = 512, Used fftlen = 640
147*2^10010-1 is not prime. LLR Res64: 093D24DA6996BB27 Time : 81.091 ms. ms.
Using Irrational Base DWT : Mersenne fftlen = 512, Used fftlen = 640
167*2^10010-1 is not prime. LLR Res64: 9618D1691A2D1144 Time : 81.912 ms. ms.
No more pairs available from server!
WARNING: No new work received from LLRnet-Server!
Trying to connect again ... 1
(...)
[/code]

the last batch contains only 3 pairs and should be sent correctly (i hope).

gd_barnes 2010-02-26 00:53

That was a very good test to sync up the two clients.

It looks like we're close to in sync. Like with me, your last batch was not accepted by the server and there are no results for it.

There are still 2 small differences between the Windows and Linux clients:

First, there is one difference that you'll definitely need to code for: The candidate that had a factor of 5 did not get processed. The pair is: 3 41625.

I just end up handling it like any other composite except that you have to parse the line out looking for a word of "factor" instead of " is ". I then put "factored" in the residue.

Here is the Linux script code:
[code]
# Prime or probable prime: report result 0 with residue 0.
if ($JustRead =~ "prime!" or $JustRead =~ "probable") {
    print TOSEND "$header $k $n 0 0\n";
}
# Small factor found: there is no residue, so report the word "factored".
elsif ($JustRead =~ "factor") {
    print TOSEND "$header $k $n -2 factored\n";
}
# Ordinary composite: pull the 64-bit residue out of the LLR result line.
else {
    ($foo, $res64time) = split(/64: /, $JustRead);
    ($res64, $time) = split(/ Time : /, $res64time);
    print TOSEND "$header $k $n -2 $res64\n";
}
[/code]

The second difference is that the Linux script does not say "no more pairs available from server". It says:
"Error: could not connect to server after 5 tries."
"Most likely there is a problem either with your connection or the server."
"Sleeping 60 seconds."

My question is: Which is more accurate? After all, the same situation would happen if the server went down or it dried out. Perhaps a 3rd choice is better. Perhaps something like:

"No pairs are available at this time."
"Either the server has dried out or there is a problem connecting to the server."
"Sleeping 60 seconds before trying again."

That would cover both a server outage or the server drying out. What would you think of changing your script to that? If you agree, I'll change the Linux script to that also.

Also, on the Linux client, Max allows the user to modify the 60 second time frame for attempted reconnection. Does the Windows client allow the user to modify that?
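
As a rough sketch of the behavior proposed here (keep trying, one message that covers both a dry server and an outage, and a user-adjustable wait), consider the following Python illustration. It is not code from either script: `wait_for_work`, `try_fetch`, `retry_seconds`, `max_attempts`, `sleep`, and `report` are all hypothetical names. The real clients would loop forever; `max_attempts` and the injectable `sleep` exist only so the sketch can be tested without waiting.

```python
import time

DRY_MESSAGE = (
    "No pairs are available at this time.\n"
    "Either the server has dried out or there is a problem connecting to the server.\n"
    "Sleeping %d seconds before trying again."
)


def wait_for_work(try_fetch, retry_seconds=60, max_attempts=None,
                  sleep=time.sleep, report=print):
    """Call try_fetch() until it returns work, pausing between attempts."""
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        work = try_fetch()
        if work:
            return work
        # Same message whether the server is dry or unreachable.
        report(DRY_MESSAGE % retry_seconds)
        sleep(retry_seconds)
    return None
```

Leaving `max_attempts` at `None` gives the "keep trying forever" default, and `retry_seconds` plays the role of the 60-second setting that the user can change.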


Gary

gd_barnes 2010-02-26 01:02

At the risk of being an extreme nitpick here, I want to change something. This thinking comes from a couple of database "tuning" classes that I had at my former employer. I want to "tune" this to be as fast as possible.

Because 99.9%+ of all candidates are composite, in the code where we parse out whether it is prime, factored, or a composite with a residue, we should check for the composite first. This will make the code incrementally faster since the first choice will be chosen most every time.

I'll just check that the line contains "not prime" for a standard composite since that verbiage is not in a small-factored candidate. I would be hesitant to check for the verbiage "LLR Res" or something like that. If I remember right, there are sometimes when testing other bases where it might say something like "old res" or something similar.
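
A minimal sketch of that ordering, written in Python purely for illustration (the real change goes into do.pl). `to_tosend_line` and the `header` value are hypothetical, and the exact wording of a small-factor line is an assumption; only the substrings "not prime", "factor", "prime!" and "probable" come from the discussion above.

```python
def to_tosend_line(header, k, n, result_line):
    """Build one tosend.txt-style line from one LLR result line."""
    if "not prime" in result_line:
        # Most common case first: a composite with a residue after "64: ".
        after_res = result_line.split("64: ", 1)[1]
        residue = after_res.split(" Time : ", 1)[0].strip()
        result = "-2"
    elif "factor" in result_line:
        # Small factor found: no residue exists, so use the word "factored".
        result, residue = "-2", "factored"
    elif "prime!" in result_line or "probable" in result_line:
        result, residue = "0", "0"
    else:
        raise ValueError("unrecognised result line: %r" % result_line)
    # Single place where the output line is assembled.
    return "%s %s %s %s %s" % (header, k, n, result, residue)
```

The speed gain from checking composites first is tiny next to the cost of an LLR test, but the single return statement also means the output line is built in only one place.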

I'll post the code after I've finished and tested with port 9985 again.


Gary

kar_bon 2010-02-26 01:05

[QUOTE=gd_barnes;206703]It looks like we're close to in sync. Like with me, your last batch was not accepted by the server and there are no results for it.

There are still 2 small differences between the Windows and Linux clients:

First, There is one difference that you'll definitely need to code for: The candidate that had a factor of 5 did not get processed. The pair is: 3 41625.
[/quote]
yes, as i mentioned, i have to change my do_tosend.awk script (i saw your code in the linux-client).

[quote]
The second difference is that the Linux script does not say "no more pairs available from server". It says:
"Error: could not connect to server after 5 tries."
"Most likely there is a problem either with your connection or the server."
"Sleeping 60 seconds."

My question is: Which is more accurate? After all, the same situation would happen if the server went down or it dried out. Perhaps a 3rd choice is better. Perhaps something like:

"No pairs are available at this time."
"Either the server has dried out or there is a problem connecting to the server."
"Sleeping 60 seconds before trying again."
[/quote]

the "No more pairs..." message is from my latest llrnet.lua (see some posts above!). i can change this back and let the script print the whole message.

[quote]
Also, on the Linux client, Max allows the user to modify the 60 second time frame for attempted reconnection. Does the Windows client allow the user to modify that?
[/quote]

i will do it like this:
[code]
@echo off
set waitloop=0
set Iterations=10000
cls
echo +-------------------------------------+
echo ^| LLRnet client V0.9b7 with cLLR V3.8 ^|
echo ^| K.Bonath, 2010-02-10, Version 0.63 ^|
echo +-------------------------------------+
echo.
(...)
[/code]

so the user has to change the set command for 'Iterations' and that value will be taken in the script. same handling for all other options!

it's late and i think we can tomorrow test a llrserver.lua with some printings (which/how many pairs received or sent from server).

could you please send me a server-output to see how it looks like?

gd_barnes 2010-02-26 01:07

[quote=kar_bon;206705]yes, as i mentioned, i have to change my do_tosend.awk script (i saw your code in the linux-client),



the 'No more pairs..." is from my latest llrnet.lua (see some posts above!). i can change this back and let the script print the whole message.



i will do it like this:
[code]
@echo off
set waitloop=0
set Iterations=10000
cls
echo +-------------------------------------+
echo ^| LLRnet client V0.9b7 with cLLR V3.8 ^|
echo ^| K.Bonath, 2010-02-10, Version 0.63 ^|
echo +-------------------------------------+
echo.
(...)
[/code]

so the user has to change the set command for 'Iterations' and that value will be taken in the script. same handling for all other options!

it's late and i think we can tomorrow test a llrserver.lua with some printings (which/how many pairs received or sent from server).

could you please send me an server-output to see how it looks like?[/quote]


Which files?

Do you need joblist, knpairs, results? All of those? Any others?

kar_bon 2010-02-26 01:09

[QUOTE=gd_barnes;206706]Which files?

Do you need joblist, knpairs, results? All of those? Any others?[/QUOTE]

no, i'm thinking of the output of the server, like that what i'm getting with the client when starting in a DOS-box. is that possible?

there should be outputs with "Removing solved pair" for example!
or when pruning "Removing solved job"!

gd_barnes 2010-02-26 01:11

1 Attachment(s)
[quote=kar_bon;206707]no, i'm thinking of the output of the server, like that what i'm getting with the client when starting in a DOS-box. is that possible?[/quote]

OH! You mean the stdout.txt file? [B][COLOR=red]<-- yep, this one from the started server.exe![/COLOR][/B]

Or do you mean at the command prompt on the client machine?

[COLOR=Red]Consider it done. :-)[/COLOR]

Note:
i've found in llr-serverconfig.txt the line
[code]
--logOutput = 1
[/code]

so remove the commenting and the output will be logged in "stdout.txt"!

Edit: I went ahead and attached all of the applicable files from your run. The knpairs-save file is the beginning knpairs. The knpairs-kar file is what was left at the end of your run.

gd_barnes 2010-02-26 09:43

1 Attachment(s)
Behold my masterpiece. :-)

The main fix from tonight's work is that when pairs are cancelled, the Linux client will now send all completed but unsent results to the server, remove those pairs from the set that gets returned as unprocessed, and then return all remaining unprocessed pairs in one fell swoop.

This took some serious learning to get it right but I'm glad that I jumped through the hoops.

I did several other clean ups/corrections as follows:

1. Client.lua: Corrected spelling of the word "successful" and its various forms and corrected the grammar on the rejected message that is returned to the screen. The latter one had irritated me for a long time. It had the word "either" in there at the beginning of consecutive sentences, which didn't make sense. It's now one sentence with an "or" in the middle.

2. Llrnet.lua: As per discussion with Karsten, uncommented the fetching statement and moved it under an "if" statement. This allowed it to still be displayed yet not error out when there was less than a batch remaining in the server.

3. do.pl:
--(a) Move the checking of composites ahead of primes or factors when converting the results to the tosend file. (slightly more efficient) Also changed the structure of that area so that the "PRINT" to tosend.txt code only had to happen once. (less code & easier to follow)
--(b) Change display when the server is dry or cannot connect. Instead of the "error connecting after 5 tries" message, I send a "server is dried out or there is a problem in connecting" message to allow for either situation that would hit the same code.
--(c) Place the new line character (\n) in several places right before the program exits. This was a small bug that caused the command prompt to be on the right side of the screen whenever an error or unexpected end condition would occur. I only saw it when hitting Ctl-C but noticed that it would happen in several places with other kinds of "real" errors.
--(d) Move the JobCancel subroutine from the end towards the middle of the program and switch places of two other subroutines. Reasoning: Code should not execute routines HIGHER up in the program because it creates spaghetti code that is much harder to follow. With the changes in (e) below, JobCancel now executes several subroutines that would have been higher in the program had I not moved it.
--(e) The big one: When the user executes do.pl -c, unprocessed results are accounted for and sent to the server and the same pairs are removed from workfile.txt to avoid loss of processing. This is accomplished approximately as follows:
----[1] Run the ConvertResults routine to create the tosend.txt file with a counter that counts the # of results.
----[2] If the # of results is > 0, write the results to the history and rewrite the workfile.txt file removing the same # of pairs from the beginning of the file as there are # of results.
----[3] Use the shortened workfile.txt in a looping process to return all of its pairs to the server.
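
The core of steps [1] through [3] can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the do.pl code: `split_workfile_on_cancel` is a hypothetical name, the workfile is modeled as a plain list of pairs with no header line, and results are assumed to exist for the first pairs of the workfile in order.

```python
def split_workfile_on_cancel(workfile_pairs, results):
    """Split the workfile into finished pairs and pairs to hand back.

    Assumes results correspond, in order, to the first pairs in the workfile.
    """
    done = len(results)
    if done > len(workfile_pairs):
        raise ValueError("more results than pairs in the workfile")
    finished = workfile_pairs[:done]      # results go to the server and history
    to_return = workfile_pairs[done:]     # shortened workfile, returned unprocessed
    return finished, to_return
```

For example, with 5 pairs in the workfile and 2 results waiting, the first 2 pairs are submitted as results and only the remaining 3 are returned to the server as unprocessed.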


Attached are the 3 changed files.

I'm very happy with everything in this routine now except that the server still refuses to accept the final batch of pairs from each client when the server dries out. Karsten, I looked at that for a couple of hours with little headway. It seems like it is on the server side but I cannot tell. I can tell you that the results are there and properly formatted in the client but they seem to go into cyberspace when being submitted to the server.

Since we don't allow the public servers to dry except when we decommission them, I'm now comfortable enough to put the new Linux client on all of my machines. I'll probably do that sometime on Friday for the 10%+ speedup.

I'm about out of time for coding and testing this week. Weekends are my busier time. I can do some sporadic testing on similar changes to the Windows client as needed this weekend.


Gary

mdettweiler 2010-02-26 17:30

@Gary: Wow, that's awesome! :w00t: The cancellation thing had been the one big thing in there that I'd wanted to fix later, so that's great that you've got it all set! A code review of all the changes you made seems to check out, and I now have the client running on one of my cores (Windows) as a secondary check.

If further testing goes OK, do you think this version will be the one we can finally release to the public (alongside a parallel release from Karsten)? We'd previously discussed timing that strategically before a rally; what's the plan on that? (Or do we want to wait until the web pages are back online before even speaking of a rally publicly?)

BTW, I've uploaded the latest versions of the various files to the nplb-gb1 links as last time.

henryzz 2010-02-26 18:04

[quote=mdettweiler;206746](Or do we want to wait until the web pages are back online before even speaking of a rally publicly?)[/quote]
BTW this is sort of public:wink:

mdettweiler 2010-02-26 18:24

[quote=henryzz;206747]BTW this is sort of public:wink:[/quote]
It is?!? I verified that it wasn't accessible to normal users or guests...oh, wait, I think I know now. From my knowledge of vBulletin configuration, I think when Xyzzy specified that mods wouldn't need a password to get in, there was no way for him to specify them to only be NPLB mods. :smile: So you can get in, as can other mods, but not "normal" users. (The password seems to have been "lost" and no longer works after the latest system upgrade.)

I'm not sure why I hadn't thought of that before...I just assumed that it was limited to NPLB mods, but I hadn't considered that that degree of specificity may actually not be possible with vBulletin. That's the whole reason why we were doing our coordinating in here, since PMs were getting to be a pain in the butt with everything going three ways.

henryzz 2010-02-26 18:56

[quote=mdettweiler;206748]It is?!? I verified that it wasn't accessible to normal users or guests...oh, wait, I think I know now. From my knowledge of vBulletin configuration, I think when Xyzzy specified that mods wouldn't need a password to get in, there was no way for him to specify them to only be NPLB mods. :smile: So you can get in, as can other mods, but not "normal" users. (The password seems to have been "lost" and no longer works after the latest system upgrade.)

I'm not sure why I hadn't thought of that before...I just assumed that it was limited to NPLB mods, but I hadn't considered that that degree of specificity may actually not be possible with vBulletin. That's the whole reason why we were doing our coordinating in here, since PMs were getting to be a pain in the butt with everything going three ways.[/quote]
actually i have been using the password [SPOILER]******[/SPOILER]to get in
i cant view without the password
except that it isnt quite ready yet why not use prpnet instead of putting lots of effort into this?

kar_bon 2010-02-26 19:34

[QUOTE=henryzz;206751]except that it isnt quite ready yet why not use prpnet instead of putting lots of effort into this?[/QUOTE]

PRPnet isn't ready either! there're many issues to solve (memory leaks; don't accept 50 results submitted at once!) and we can change LLRnet to NPLB's requirements needed (many years in use as server/client prime-testing-program; even eliminated first issues from old LLRnet; screen outputs for client or server customizable; newer versions of LLR also supported; best timings for k*2^n-1).

this effort only lasts for about 3 weeks! and almost ready to use for more contributors. i'm using this from almost when i started with the script and it's running fine!

mdettweiler 2010-02-26 19:45

[quote=henryzz;206751]actually i have been using the password [spoiler]******[/spoiler]to get in
i cant view without the password
except that it isnt quite ready yet why not use prpnet instead of putting lots of effort into this?[/quote]
Hey, I didn't know that--I thought the password was still 6g:Qieoj. Where'd you get the new one?
[quote=kar_bon;206755]PRPnet isn't ready either! there're many issues to solve (memory leaks; don't accept 50 results submitted at once!) and we can change LLRnet to NPLB's requirements needed (many years in use as server/client prime-testing-program; even eliminated first issues from old LLRnet; screen outputs for client or server customizable; newer versions of LLR also supported; best timings for k*2^n-1).

this effort only lasts for about 3 weeks! and almost ready to use for more contributors. i'm using this from almost when i started with the script and it's running fine![/quote]
Yes, PRPnet 2.4.7 is great right now for smaller efforts with non-tiny n-ranges, but it can't handle the kinds of load we need for some NPLB efforts, and 3.x, which is intended to fix that (among other things) is still a work in progress.

The idea is that this will supplement PRPnet by bringing LLRnet up to date and giving us other options. PRPnet will still be ideally suited to conjecture searches and the like due to its greater flexibility, while we can use this to speed up our NPLB public drives until PRPnet 3.x is ready.

henryzz 2010-02-26 20:15

[quote=mdettweiler;206758]Hey, I didn't know that--I thought the password was still 6g:Qieoj. Where'd you get the new one?[/quote]
AFAIK it was the first password we ever used or at least used at one point.
It was the only one i remembered so i tried it and succeeded.
IMHO too much sorting stuff out (like upgrading llrnet here) is done in private by pm at NPLB and sometimes CRUS. Often it is an interesting read. Also sometimes extra opinions help.
Please do a rally at some point; it was great fun, all of us piling our machines onto one server, or at least one drive if one server can't cope. I miss it. Would Free-Dc consider joining us for a weekend rally?
Another idea to throw out: would NPLB be willing to run a rally on a CRUS server (maybe base 6?) or would CRUS do its own rally?

kar_bon 2010-02-26 20:28

[QUOTE=henryzz;206759]IMHO too much sorting stuff out(like upgrading llrnet here) is done in private by pm at NPLB and sometimes CRUS. Often it is an interesting read. Also sometimes extra opinions help.[/QUOTE]

this was done because we see what happens, when a program is not tested deeply (like prpnet):
it's in use but with many issues to live with before they are resolved.

after all is done and many tests run fine, then it's time to make it official and other users can suggest their opinions/needs to make it even better!

gd_barnes 2010-02-26 21:14

David,

We are alpha testing right now. Not beta testing. This is the way public software should be tested. That is alpha testing by a few very isolated and knowledgeable users. Continuing to issue releases before proper alpha testing is done across all platforms is both time consuming and frustrating.

Within the next week after our final small bug is fixed, we will open it up to public beta testing.

Can you see the difference?

We were aware that some people would be able to access this thread but we've chosen to keep it semi-private so we are not attempting to coordinate 10s of users during alpha testing when there are numerous small bugs still present. It takes too long and is frustrating. We have just one remaining final small bug. After that, we'll open it up.

LLRnet will now be a tremendous piece of work and will give NPLB and other DC projects who search base 2 a tremendous lift.

We've spent nearly a year trying to get PRPnet up to par. It wasn't going to happen. It's just one thing after another and now I read that it has numerous memory leaks. We've been testing the new LLRnet for 2 weeks and I can guarantee you that it will handle an immense load with no problems. I have put 31 cores on n=~10K tests, which is the equivalent of 10,000+ cores running n>400K. It passed with flying colors.

The current plan is to do a rally not long after this is released and after the noprimeleftbehind pages are back up, the latter of which I estimate will be a week or so.


Thank you,
Gary

gd_barnes 2010-02-28 00:00

After much testing, it appears that the final bug related to pairs not being returned to the server when the server dries has been resolved. I think we have a keeper.

I'll be doing a little more testing on the Windows client tonight and doing verification that the Linux and Windows clients are properly synced up. Then it will be time for the big roll out.

Exciting times lie ahead! :smile:

gd_barnes 2010-02-28 09:02

1 Attachment(s)
I've made changes to the do.pl script and README.txt document for the Linux client as follows:

1. Change do.pl to allow for several different scenarios related to very small tests.

2. Change do.pl program comments and README.txt documentation from 1st to 3rd person and remove many "filler" words. I also updated the version to 0.7.

Attached are the updated versions.


Gary

gd_barnes 2010-03-01 10:50

large problem
 
Karsten,

I'm now running tests for Riesel and Proth bases 2 and 3 for a large k for n<=10K without sieving. This should cover all possible scenarios of k/n pairs that could happen.

There has been a problem that has existed, I believe, for the entire time that we've been testing but I had not realized it. The problem is caused by the fact that whenever there is a rejected pair, it writes out the statement: "The server refused your new result: (etc.)" in the lresults.txt file. It writes out a total of 4 lines.

Here is what happens: Whenever the script then goes to convert the lresults.txt to tosend.txt, these "rejected comment lines" in lresults cause a major problem. They cause MORE pairs to be rejected. Even if the code is otherwise good, this would be very bad in public use. If someone had a pair that was rejected because he waited too long to return it, it would cause subsequent good pairs to be rejected.

I didn't notice this problem before because, whenever it wrote something to lresults.txt saying that it had rejected something, the cause was incorrect coding for special situations on the pairs just prior to the rejected comments. But now I realize that it is also rejecting pairs after that point, where the coding is good for them.

But here is the interesting part: The server is not actually rejecting the good or badly coded pairs, per se. It is just not accepting them but saying nothing about it. It's like the same issue where the pairs were being dropped at the very end of a dried server but the server never said they were rejected. They just end up going out into cyberspace somewhere.

To demonstrate this to yourself, test n=1 thru n=1000 for your k=100542585 without sieving but do the following:
1. Make sure you still have some "bad code" in place that doesn't account for all of the strange situations that come up for the teensy tests.
2. In the code, before the tosend.txt file is sent to the server (which automatically deletes it), add it (concatenate) to a tosend_hist.txt file like you would do for the lresults.txt to lresults_hist.txt file.
3. Set the cache to at least 25 because otherwise the communication time takes too long.
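
Step 2 could be sketched roughly like this in Python. This is only an illustration: the real client script is Perl and that part of it is not shown in this thread, so the function name `archive_tosend` and its arguments are made up.

```python
from pathlib import Path

def archive_tosend(tosend="tosend.txt", history="tosend_hist.txt"):
    """Append the pending tosend file to a history file.

    Returns the number of lines copied (0 if there is nothing to send).
    Both file names are defaults taken from the post; the function itself
    is hypothetical.
    """
    src = Path(tosend)
    if not src.exists():
        return 0
    text = src.read_text()
    # Append rather than overwrite, so every batch ever sent stays visible.
    with open(history, "a") as hist:
        hist.write(text)
    return len(text.splitlines())
```

Calling it just before the send step keeps a copy of every record that was about to go to the server, even though the send itself deletes tosend.txt.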

You should see in your tosend_hist.txt that it is trying to send residues that contain some of the verbiage of the rejected comments in lresults.txt.

This is very bad. There is already quite a bit of code for special situations when converting lresults.txt to tosend.txt. This will require several more lines of special code. But it doesn't seem right. It's as if the program has to code around a problem that it is itself creating.

I believe this is something that none of us ever thought of. We just assumed that the results would always be just that: results. But now we are coming to realize that the lresults.txt file can contain a ton of different things: Regular residues, OLD residues, primes, PRP lines, factor lines that show the factor but not the "no prime" verbiage, factor lines that do not show the factor and DO show the "no prime" verbiage, lines that show "we can only do a PRP test" (because the k is bigger than 2^n), lines that contain "is base 3-strong PRP", lines that contain "Frobenius PRP!", etc. (That's all that I can think of right now.)

I'm beginning to wonder if we should just make LLRnet only applicable to tests where n>1000. That would also resolve the problem of k>2^n.

What's somewhat maddening about this is that I now have code in place (not yet posted here) that takes into account all of the above scenarios (except the rejected comments) because I encountered all of them in my testing of your k-value for n<10K and I believe you encountered most of it also. After getting all of the correct code in place, visually, I now have a clean test. But the problem where a pair would be rejected still exists. I just did not happen to encounter it when I accounted for all of the above situations.

The problem that I am having is that, after taking into account all of the scenarios above, now having to code around the "rejected" comment lines in lresults.txt is getting a little ridiculous. It's too much complication and forces the server to test a lot of things that will happen in the "real world" < 0.1% of the time.

My question is: Is there a way to make it NOT write the rejected message to the results? If that is possible, is that something that we really want to do? If it is not possible, can you think of an easy way to code around those lines without having to check for 3-4 more different sets of verbiage?


Gary

kar_bon 2010-03-01 12:04

i did the test with cllr-input like this:
[code]
43228319159:M:1:2:258
100542585 1
100542585 2
100542585 3
100542585 4
100542585 5
100542585 6
100542585 7
100542585 8
100542585 9
100542585 10
[/code]

up to n=1000.

i ran cllr on that as input and got this as 'lresults.txt':
[code]
100542585*2^1-1 = 201085169 is not prime. (trial divisions)
100542585*2^2-1 = 402170339 is prime! (trial divisions)
100542585*2^3-1 = 804340679 is not prime. (trial divisions)
100542585*2^4-1 = 1608681359 is not prime. (trial divisions)
100542585*2^5-1 = 3217362719 is not prime. (trial divisions)
100542585 > 2^6, so we can only do a PRP test for 100542585*2^6-1.
100542585*2^6-1 is not prime. RES64: 00000000654D5CE7. OLD64: 000000012FE816B2 Time : 14.823 ms.
100542585 > 2^7, so we can only do a PRP test for 100542585*2^7-1.
100542585*2^7-1 is base 3-Strong Fermat PRP! Time : 15.818 ms.
100542585*2^7-1 is Frobenius PRP! (P = 5, Q = 2, D = 17) Time : 17.631 ms.
100542585 > 2^8, so we can only do a PRP test for 100542585*2^8-1.
100542585*2^8-1 is base 3-Strong Fermat PRP! Time : 4.035 ms.
100542585*2^8-1 is Frobenius PRP! (P = 5, Q = 2, D = 17) Time : 34.765 ms.
100542585 > 2^9, so we can only do a PRP test for 100542585*2^9-1.
100542585*2^9-1 is not prime. RES64: 0000000082AE6185. OLD64: 00000001880B248C Time : 13.268 ms.
100542585 > 2^10, so we can only do a PRP test for 100542585*2^10-1.
100542585*2^10-1 is not prime. RES64: 0000000DAC3640D5. OLD64: 000000110C00DE7D Time : 18.761 ms.
[/code]

then i used this lresults.txt as input in my conversion script 'do_tosend.awk' and got this as 'tosend.txt':
[code]
5000000000000:M:1:2:258 100542585 1 -2 trial_factored
5000000000000:M:1:2:258 100542585 2 0 0
5000000000000:M:1:2:258 100542585 3 -2 trial_factored
5000000000000:M:1:2:258 100542585 4 -2 trial_factored
5000000000000:M:1:2:258 100542585 5 -2 trial_factored
5000000000000:M:1:2:258 100542585 6 -2 00000000654D5CE7
5000000000000:M:1:2:258 100542585 7 0 Frobenius_PRP
5000000000000:M:1:2:258 100542585 8 0 Frobenius_PRP
5000000000000:M:1:2:258 100542585 9 -2 0000000082AE6185
5000000000000:M:1:2:258 100542585 10 -2 0000000DAC3640D5
[/code]
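
for anyone following along without the awk-script (it only went out by PM): the mapping from the client 'lresults.txt' to the 'tosend.txt' lines above can be sketched in Python as follows. the rules are inferred only from the sample input and output in this post, the function name `to_tosend` is made up, and the header is passed in as a parameter because the sample output uses a different header than the input file.

```python
import re

# k*b^n+-c at the very start of a result line, e.g. "100542585*2^6-1 is ..."
PAIR = re.compile(r"^(\d+)\*(\d+)\^(\d+)([+-]\d+) ")
RES64 = re.compile(r"RES64: ([0-9A-F]{16})")

def to_tosend(line, header):
    """Return one tosend record for a client result line, or None to skip it.

    Hypothetical helper: the rules mirror the sample output only.
    """
    m = PAIR.match(line)
    if m is None:
        # Notices such as "k > 2^n, so we can only do a PRP test for ..."
        return None
    k, n = m.group(1), m.group(3)
    if "is not prime" in line:
        r = RES64.search(line)
        if r:
            return f"{header} {k} {n} -2 {r.group(1)}"
        if "(trial divisions)" in line:
            return f"{header} {k} {n} -2 trial_factored"
        return None
    if "is prime!" in line:
        return f"{header} {k} {n} 0 0"
    if "Frobenius PRP!" in line:
        return f"{header} {k} {n} 0 Frobenius_PRP"
    # "is base 3-Strong Fermat PRP!" is skipped: the Frobenius line follows it.
    return None
```

feeding it the 16 lines of the client 'lresults.txt' above, one at a time, gives 10 records and 6 skipped lines, which matches the 10 'tosend.txt' lines shown.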

i've also tested it with my local server/client-installation and the server 'lresults.txt' is:
[code]
user=kar_bon
[03/01/10 12:50:59]
100542585*2^1-1 is not prime. Res64: trial_factored Time : 2.0 sec.
user=kar_bon
[03/01/10 12:50:59]
100542585*2^2-1 is prime! Time : 2.0 sec.
user=kar_bon
[03/01/10 12:50:59]
100542585*2^3-1 is not prime. Res64: trial_factored Time : 2.0 sec.
user=kar_bon
[03/01/10 12:50:59]
100542585*2^4-1 is not prime. Res64: trial_factored Time : 2.0 sec.
user=kar_bon
[03/01/10 12:50:59]
100542585*2^5-1 is not prime. Res64: trial_factored Time : 2.0 sec.
user=kar_bon
[03/01/10 12:51:00]
100542585*2^6-1 is not prime. Res64: 00000000654D5CE7 Time : 1.0 sec.
user=kar_bon
[03/01/10 12:51:00]
100542585*2^7-1 is prime! Time : 1.0 sec.
user=kar_bon
[03/01/10 12:51:00]
100542585*2^8-1 is prime! Time : 1.0 sec.
user=kar_bon
[03/01/10 12:51:00]
100542585*2^9-1 is not prime. Res64: 0000000082AE6185 Time : 1.0 sec.
user=kar_bon
[03/01/10 12:51:00]
100542585*2^10-1 is not prime. Res64: 0000000DAC3640D5 Time : 1.0 sec.

(...)
user=kar_bon
[03/01/10 12:52:26]
100542585*2^340-1 is not prime. Res64: small_factor Time : 1.0 sec.
[/code]

although the 'Res64' results for trial factored/small factor pairs (small or not sieved n-values) look a bit strange (llrserver only distinguishes between prime (Res64=0) and not prime (Res64 set to the given value)), all 1000 n-values are there and none are in the rejected.txt!

the 'primes.txt' contains all 44 primes in that range.

so, please check my conversion awk-script i PM'ed yesterday in the V07-zip!

Karsten

PS: as you can see i forgot to change back the date-format on my local server!

mdettweiler 2010-03-01 22:13

@Gary regarding the rejected pairs issue: how about something like this?
[code]if($JustRead =~ /\*/) {
    # Only lines containing a "*" (as in k*b^n+-c) are treated as results.
    if($JustRead =~ /not prime/) {
        if($JustRead =~ /LLR Res64/) {
            ($foo, $res64time) = split(/Res64: /, $JustRead);
            ($res64, $time) = split(/ Time /, $res64time);
        }
        else {
            ($foo, $res64time) = split(/RES64: /, $JustRead);
            # The dot is escaped so that it matches a literal "." only.
            ($res64, $time) = split(/\. OLD64/, $res64time);
        }
        print TOSEND "$header $k $n -2 $res64\n";
    }
    elsif($JustRead =~ /prime!/ or $JustRead =~ /probable/) {
        print TOSEND "$header $k $n 0 0\n";
    }
    elsif($JustRead =~ /is base/) {
        # Skip PRP line by doing nothing.
        # Next line will prove primality of same pair.
    }
    elsif($JustRead =~ /factor/) {
        print TOSEND "$header $k $n -2 factored\n";
    }
    else {
        print TOSEND "$header $k $n -2 error\n";
    }
}[/code]
Essentially what I did was wrap the whole thing in an if() statement that checks whether the line contains a * character. I don't believe there's anything that would be put in lresults.txt containing that that isn't an actual number (that is, it's matching on the * in k*b^n+-c). That should screen out the "rejected comment lines" as well as any other unexpected such lines.
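
As a quick cross-check of that idea, here is a small Python sketch (not part of do.pl; the helper names are made up). It compares the plain "contains a *" test with a stricter variant that requires the number to lead the line. The stricter variant is my own assumption, not something from the script. Only the first line of the rejection message is quoted in this thread, so whether its other three lines contain a * is unknown.

```python
import re

# A genuine result line starts with the number itself: "100542585*2^6-1 ...".
RESULT_START = re.compile(r"^\d+\*\d+\^\d+[+-]\d+ ")

def has_star(line):
    """The check described above: the line contains a '*' anywhere."""
    return "*" in line

def starts_with_number(line):
    """Stricter variant (an assumption): k*b^n+-c must lead the line."""
    return RESULT_START.match(line) is not None

samples = [
    "100542585*2^9-1 is not prime. RES64: 0000000082AE6185. OLD64: 00000001880B248C Time : 13.268 ms.",
    "The server refused your new result: (etc.)",
    "100542585 > 2^6, so we can only do a PRP test for 100542585*2^6-1.",
]
for s in samples:
    print(has_star(s), starts_with_number(s), s[:45])
```

Both checks screen out the rejection line. The "so we can only do a PRP test" notice from Karsten's sample does contain a *, though, so with the plain check it would fall through to the final else branch and produce an "error" record, while the stricter check skips it.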

gd_barnes 2010-03-02 00:49

[quote=mdettweiler;207017]@Gary regarding the rejected pairs issue: how about something like this?
[code] if($JustRead =~ "\*") {
if($JustRead =~ "not prime") {
if($JustRead =~ "LLR Res64") {
($foo, $res64time) = split(/Res64: /, $JustRead);
($res64, $time) = split(/ Time /, $res64time);
}
else {
($foo, $res64time) = split(/RES64: /, $JustRead);
($res64, $time) = split(/. OLD64/, $res64time);
}
print TOSEND "$header $k $n -2 $res64\n";
}
elsif($JustRead =~ "prime!" or $JustRead =~ "probable") {
print TOSEND "$header $k $n 0 0\n";
}
elsif($JustRead =~ "is base") {
# Skip PRP line by doing nothing.
# Next line will prove primality of same pair.
}
elsif($JustRead =~ "factor") {
print TOSEND "$header $k $n -2 factored\n";
}
else {
print TOSEND "$header $k $n -2 error\n";
}
}[/code]Essentially what I did was wrap the whole thing in an if() statement that checks whether the line contains a * character. I don't believe there's anything that would be put in lresults.txt containing that that isn't an actual number (that is, it's matching on the * in k*b^n+-c). That should screen out the "rejected comment lines" as well as any other unexpected such lines.[/quote]

I went off the deep end a little. I was pretty concerned about finding such a deal breaker this late in the game and continuing to add complication to the results condition checking.

Max, my objective is always to execute the most commonly occurring condition first within the large if statement to optimize program speed. Therefore, I came up with a better way: just remove the "error" check at the very end, which effectively causes it to write nothing. If a line doesn't hit one of the prior conditions, it simply does not create a TOSEND record. I'd hate to check for the "*" condition first (actually I was initially thinking "^" after I got up today, but either way would work), because well under 0.1% of all lines in the results would be rejected lines.

In the new code that I will post later tonight, I end up checking for "64: " instead (vs. "LLR Res64") so that both Riesel and Proth results are taken into account. That will occur 99.9%+ of the time. If that is true, then I check for "OLD64" to determine how to pull the residue out of the line. If it is false, then I check for other various conditions that have come out as a result of doing teeny Riesel and Proth tests on large k's.

In doing this, I realized I had a bug in my $numresults++ code. That is the counter for the number of results if the user decides to cancel pairs. I had it above all of the if statements. But because there are some situations where a record will not be written to the TOSEND file, it must be within the if's, right at the time that a record is written to TOSEND.
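
To make the ordering and the counter fix concrete before the Perl is posted, here is a rough Python sketch of the logic just described. It is not the do.pl code: the names `extract_residue` and `build_tosend` are made up, and the only "other condition" shown is a plain prime line.

```python
import re

PAIR = re.compile(r"^(\d+)\*\d+\^(\d+)")   # k and n from "k*b^n..." at line start

def extract_residue(line):
    """Return the residue text from a result line, or None if there is none."""
    if "64: " not in line:                 # the 99.9%+ case is tested first
        return None
    _, _, tail = line.partition("64: ")    # text after "Res64: " or "RES64: "
    if "OLD64" in tail:
        # "<residue>. OLD64: <old> Time : ..."
        return tail.split(". OLD64")[0]
    # "<residue> Time : ..."
    return tail.split(" Time ")[0].strip()

def build_tosend(lines, header):
    """Convert result lines to tosend records; return (records, numresults)."""
    records, numresults = [], 0
    for line in lines:
        pair = PAIR.match(line)
        residue = extract_residue(line)
        if pair and residue is not None:
            records.append(f"{header} {pair.group(1)} {pair.group(2)} -2 {residue}")
            numresults += 1                # counted only when a record is written
        elif pair and "is prime!" in line:
            records.append(f"{header} {pair.group(1)} {pair.group(2)} 0 0")
            numresults += 1
        # Anything else (notices, rejected-result comments) writes nothing
        # and therefore is not counted.
    return records, numresults
```

With the counter inside the branches, a batch containing rejected-comment lines reports only the records actually written, which is what matters when pairs are canceled.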

Karsten, I've now done extensive changes to that section and we'll definitely want to sync up yours with mine. Later tonight, I'll post mine. Our code doesn't have to look the same. It just has to be logically the same. Shortly after posting mine, I'll code review yours.

My testing of Riesel and Sierp bases 2 and 3 for all n=1 to 10K with no sieving on a large k that is much larger than b^n has helped resolve all possible final scenario issues. Thanks Karsten, for putting us on the right path with that.

It is now my hope that LLRnet will be the fastest that it can be and will work for any sized Riesel or Proth tests for any base.

Perhaps in the future, it could test k*b^n-c and k*b^n+c where c is > 1. I think sr(x)sieve would have difficulty sieving those where k>1 and c>1 so we'll do that some other year. :-)


Gary

gd_barnes 2010-03-02 01:46

Reference the date issue that we have been talking about in PMs.

As previously discussed, Karsten had to make some changes to the Windows server so that the date shows up correctly in Windows results files (either on the server or client; can't remember which). We do know that the date on all Linux servers and clients has been showing up correctly for a long time. Since NPLB has always used Linux servers, this has not been an issue.

Karsten made the point that the two formatting methods should probably be synced up; I believe because a person with a Windows client running a Linux server could get incorrect date formats on his client. Or perhaps in case a Windows user ends up getting ahold of the Linux script and using part of its files for setting up Windows servers and clients on his machines. I'm not extremely clear on that.

I'm sorry if I'm missing the reason here because it's so difficult to follow everything in PMs.

I will re-review the PMs and make the change to the Linux server to be like the Windows server. Presumably this will have no effect on the Linux date formats in the results of either a Linux server or client. Of course I will test it to be sure. Hopefully this will fix any existing issue if a Windows client runs a Linux server; assuming that has been the problem. If I have time, I will also run a Windows test against the Linux server.


Gary

mdettweiler 2010-03-02 03:29

[quote=gd_barnes;207036]Reference the date issue that we have been talking about in PMs.

As previously discussed, Karsten had to make some changes to the Windows server so that the date shows up correctly in Windows results files (either on the server or client; can't remember which). We do know that the date on all Linux servers and clients has been showing up correctly for a long time. Since NPLB has always used Linux servers, this has not been an issue.

Karsten made the point that the two formatting methods should probably be synced up; I believe because a person with a Windows client running a Linux server could get incorrect date formats on his client. Or perhaps in case a Windows user ends up getting ahold of the Linux script and using part of its files for setting up Windows servers and clients on his machines. I'm not extremely clear on that.

I'm sorry if I'm missing the reason here because it's so difficult to follow everything in PMs.

I will re-review the PMs and make the change to the Linux server to be like the Windows server. Presumably this will have no effect on the Linux date formats in the results of either a Linux server or client. Of course I will test it to be sure. Hopefully this will fix any existing issue if a Windows client runs a Linux server; assuming that has been the problem. If I have time, I will also run a Windows test against the Linux server.


Gary[/quote]
NO! Don't do that. :smile: If anything, we should change the Windows server to be like the Linux one. The database is already set up to parse and import stuff in the Linux format.

BTW, clients and servers are completely separate as far as date formats go. They don't have to be synced up; the only thing that really needs to be consistent is on the server end, so that when we import it into the DB it works right. On the client end, it's entirely for human reading since there'd be no reason to have that imported into the DB.

gd_barnes 2010-03-02 04:20

[quote=mdettweiler;207043]NO! Don't do that. :smile: If anything, we should change the Windows server to be like the Linux one. The database is already set up to parse and import stuff in the Linux format.

BTW, clients and servers are completely separate as far as date formats go. They don't have to be synced up; the only thing that really needs to be consistent is on the server end, so that when we import it into the DB it works right. On the client end, it's entirely for human reading since there'd be no reason to have that imported into the DB.[/quote]

I'm completely lost. You and Karsten fight it out about the dates. I'm tired of hearing about the issue. I thought I understood Karsten with his latest PM about it. That's why I asked him specifically about it.

You have to trust me on this. Karsten knows what he is talking about. He does a lot of behind-the-scenes conversion of results.

Perhaps you are misunderstanding. We will NOT be changing the Linux date formats as they are written to files anywhere. We will just be changing the CODE that formats them. In other words:

1. We will change Linux CODE to be like CORRECTED Windows CODE.
2. We will NOT change Linux results files output. (In other words, there are at least 2 different ways to display Linux dates in the same format.)
3. Karsten will change the Windows CODE and RESULTS so that the Windows results will look like the current Linux results.
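
To illustrate point 2 (different code, identical output), here is a tiny Python sketch. It is only an analogy: the real formatting code lives in the server and is not shown here, and the bracketed date layout is an assumed example rather than the confirmed server format.

```python
from datetime import datetime

def stamp_strftime(t):
    """Format with a single strftime pattern."""
    return t.strftime("[%Y-%m-%d %H:%M:%S]")

def stamp_fields(t):
    """Format by assembling the individual fields by hand."""
    return "[%04d-%02d-%02d %02d:%02d:%02d]" % (
        t.year, t.month, t.day, t.hour, t.minute, t.second)

t = datetime(2010, 3, 1, 22, 6, 55)
print(stamp_strftime(t))
print(stamp_fields(t))
```

Both functions print the same string for the same time, so swapping one for the other changes the code without changing anything that is written to the results files.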

Karsten, am I right on this?

Max, are you clear yet?

I wish this issue had started in this thread instead of PMs. I'm sure we would not still be discussing it.

(caps for emphasis)


Gary

gd_barnes 2010-03-02 04:23

1 Attachment(s)
My script is working like a charm now for base 2. No problem with strange results output from teeny tests for large k's and no problem when there are rejected results. I even tried canceling pairs when there were rejected results lines within the most recent batch. That was a good test of the code. It worked perfectly!

Now, we have another issue for base 3 (and for other bases) and it's really unrelated to LLRnet: Rounding errors. Rounding errors apparently go into an infinite loop on LLR. Here is an example of my recent test:

[quote]
100542584*3^323-1 is not prime. RES64: B516DDAC3C2DA591. OLD64: C681FF88490C1762 Time : 1.233 ms.
100542584*3^324-1 is not prime. RES64: B9EAF440B2630D00. OLD64: A89CF68775EDE106 Time : 1.250 ms.
100542584*3^325-1 is not prime. RES64: 57B2D8B312BADF70. OLD64: E84124B970CCFA7F Time : 2.213 ms.
Submitted to server at [2010-03-01 22:06:55]
100542584*3^326-1 is not prime. RES64: FDCBF11CBD949DB8. OLD64: 9CDDA336E292EDB7 Time : 5.832 ms.
100542584*3^327-1 is not prime. RES64: 8828751C67D74542. OLD64: 98795F553785CFC3 Time : 1.112 ms.
100542584*3^328-1 is not prime. RES64: AF0F967B018B1A05. OLD64: 0D2EC37104A14E0C Time : 23.712 ms.
100542584*3^329-1 is not prime. RES64: 0BC6192E5B9C413B. OLD64: 612B383CFC4DEEE0 Time : 1.098 ms.
100542584*3^330-1 is not prime. RES64: 858763DE10557C9C. OLD64: 4A20F1AFED6BF763 Time : 1.736 ms.
100542584*3^331-1 is not prime. RES64: F62BB139C363FC20. OLD64: 78D33CCDE4CD36B6 Time : 1.118 ms.
100542584*3^332-1 is not prime. RES64: 16D3F5810CDB888A. OLD64: CA5CD746C65A27AD Time : 1.086 ms.
100542584*3^333-1 is not prime. RES64: D161A7B4A11D97EA. OLD64: BCF6694353041CD4 Time : 1.102 ms.
100542584*3^334-1 is not prime. RES64: 8AC3F5B355AAD00E. OLD64: 7AC0378A50026F70 Time : 1.708 ms.
100542584*3^335-1 is not prime. RES64: 6736C9D7E97D4CCF. OLD64: C50160D8A97DE443 Time : 1.125 ms.
100542584*3^336-1 is not prime. RES64: B65E42722A31566F. OLD64: D131D14945A5FCD3 Time : 1.146 ms.
100542584*3^337-1 is not prime. RES64: CDFAB4C26501BE9F. OLD64: 7E7A59F7D971150C Time : 32.833 ms.
100542584*3^338-1 is not prime. RES64: 9CCAA4CC820D465F. OLD64: F52F47EE85C998E3 Time : 1.927 ms.
100542584*3^339-1 is not prime. RES64: 068CBF65D4E28A02. OLD64: 13A63E317EA79E03 Time : 1.445 ms.
100542584*3^340-1 is not prime. RES64: A2DAE029B63CEEF2. OLD64: FDDAC64E1F66C0DC Time : 2.059 ms.
100542584*3^341-1 is not prime. RES64: 615F2CD76A644608. OLD64: 63FBF7F9353CAE2E Time : 1.331 ms.
100542584*3^342-1 is not prime. RES64: 5996C0C773651BB4. OLD64: 8BFAEB081E8E7BAB Time : 1.789 ms.
100542584*3^343-1 is not prime. RES64: 4813E4310051A928. OLD64: D83BAC9300F4FB75 Time : 1.383 ms.
100542584*3^344-1 is not prime. RES64: 49FF3A27FFC4CA16. OLD64: 56E99CB7E6A6CB51 Time : 1.402 ms.
100542584*3^345-1 is not prime. RES64: ABC93293D1BC0131. OLD64: 6E1F627B2B3D4AC2 Time : 1.322 ms.
Iter: 27/575, ERROR: ROUND OFF (0.5) > 0.40
Continuing from last save file.
Disregard last error. Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Disregard last error. Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Disregard last error. Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Disregard last error. Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
[/quote]

The last 3 lines appear to continue forever. Although I didn't let it continue more than a minute or 2, it doesn't really matter if it's an infinite loop or just a very long one. The lresults.txt file was 4.8 MB by the time that I stopped it, which would be unacceptable to many users.

Interestingly, this was the test where I returned untested pairs to the server. 20 of the 25 pairs in my batch had been tested. Despite 1000s of lines of extra output from the loop, it correctly returned the 20 good results and the 5 untested pairs beginning with n=346.

Thoughts anyone?

My feeling: Forget about other bases with this release. There's more involved than we should care to get into at this point. As far as I know, base 2 does not have rounding errors. Someone correct me if I'm wrong there. If base 2 does have rounding errors, we have to ask: Are they only on small tests or where k is > 2^n? If so, we should probably ignore them. But if anyone can come up with a quick solution for coding around such a loop, then that would be OK. I just don't want to get into any special code for base 3 and other bases.
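
For what it's worth, one possible way to code around such a loop would be to watch the output for the repeated retry message. Below is a rough Python sketch of that idea only. It is not in do.pl, the function name and the limit of 3 are arbitrary assumptions, and it says nothing about how the script would then stop LLR.

```python
import re

RESULT_LINE = re.compile(r"^\d+\*\d+\^\d+")   # a line starting with k*b^n

def stuck_in_retry_loop(lines, marker="Continuing from last save file.", limit=3):
    """Return True if `marker` shows up more than `limit` times in a row
    without any result line in between (hypothetical helper)."""
    repeats = 0
    for line in lines:
        if marker in line:
            repeats += 1
            if repeats > limit:
                return True
        elif RESULT_LINE.match(line):
            repeats = 0                        # a finished test resets the count
    return False
```

On the output quoted above, the marker appears five times after the n=345 result with no further result line, so a limit of 3 would flag it, while a single retry followed by a normal result would not.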

Other than the rounding error issue, it looks like base 3 was testing well. I did Riesel base 3 for the above k for n=1 thru 345. Unfortunately I believe that rounding errors occur regardless of n-size on other bases. They are quite common in PFGW; although Mark has coded an automatic -a1 test that retests them. I suppose we could see if Jean and co. could do the same for LLR. But pursuing that with this release of LLRnet, IMHO, is a waste of time for an excellent tool that is badly needed.

Edit: Karsten, so you have something to refer to, attached is the latest do.pl script. It should be 100% bugfree for base 2 for Riesel and Proth tests of all sizes. I now anticipate that this will be the final version of it.

I'll be looking at the incorrect statements in README.txt and the date issue next. I'll also do a code review of the Windows script to see if it is close to being synced up with the Linux do.pl script.


Gary

mdettweiler 2010-03-02 05:16

[quote=gd_barnes;207048]I'm completely lost. You and Karsten fight it out about the dates. I'm tired of hearing about the issue. I thought I understood Karsten with his latest PM about it. That's why I asked him specifically about it.

You have to trust me on this. Karsten knows what he is talking about. He does a lot of behind-the-scenes conversion of results.

Perhaps you are misunderstanding. We will NOT be changing the Linux date formats as they are written to files anywhere. We will just be changing the CODE that formats them. In other words:

1. We will change Linux CODE to be like CORRECTED Windows CODE.
2. We will NOT change Linux results files output. (In other words, there are at least 2 different ways to display Linux dates in the same format.)
3. Karsten will change the Windows CODE and RESULTS so that the Windows results will look like the current Linux results.

Karsten, am I right on this?

Max, are you clear yet?

I wish this issue had started in this thread instead of PMs. I'm sure we would not still be discussing it.

(caps for emphasis)


Gary[/quote]
Okay, yes, that makes sense now. :smile:

mdettweiler 2010-03-02 05:21

[quote=gd_barnes;207049]My feeling: Forget about other bases with this release. There's more involved than we should care to get into at this point. As far as I know, base 2 does not have rounding errors. Someone correct me if I'm wrong there. If base 2 does have rounding errors, we have to ask: Are they only on small tests or where k is > 2^n? If so, we should probably ignore them. But if anyone can come up with a quick solution for coding around such a loop, then that would be OK. I just don't want to get into any special code for base 3 and other bases.

Other than the rounding error issue, it looks like base 3 was testing well. I did Riesel base 3 for the above k for n=1 thru 345. Unfortunately I believe that rounding errors occur regardless of n-size on other bases. They are quite common in PFGW; although Mark has coded an automatic -a1 test that retests them. I suppose we could see if Jean and co. could do the same for LLR. But pursuing that with this release of LLRnet, IMHO, is a waste of time for an excellent tool that is badly needed.[/quote]
Actually, LLR has already had an automatic next-FFT retest for many versions (since in many ways it's more closely related to Prime95, which has that feature as well, than is PFGW). In fact it's a bit more advanced than PFGW's rendition of that feature since it just re-does the affected portion of the test, rather than starting it over, potentially saving a lot of time on big tests.

What this looks like here is a bug in LLR 3.8.0. My guess is that it's backing up and retrying, but forgetting to raise the FFT in the process--so it keeps running into trouble over and over. Hence an infinite loop. I'm pretty sure that earlier versions of LLR have handled this correctly, so it would definitely be worth reporting to Jean even though we're not going to hold up the LLRnet client release on it.

BTW, base 2 can have rounding errors once in a while, but they're a lot less common than in other bases; these kinds of things are fixed by trial-and-error tuning AFAIK, so the fact that the majority of the base 2 code in gwnum has been around a lot longer than the non-base-2 code makes for a lot fewer roundoff errors in base 2. Note that since this is a gwnum issue rather than an LLR one, LLR should theoretically error on exactly the same tests in exactly the same ways that PFGW does (ditto for Prime95).

[B]On another subject[/B]: the latest do.pl is working well and I'll upload the latest version to the no-IP pages shortly. The only thing I'd change is to add a test condition in convertResults() for the text "probably prime"; as it is it can, ironically enough, catch a positive result from the little-used Frobenius PRP but not a "regular" PRP test. :smile:

gd_barnes 2010-03-02 05:37

[quote=mdettweiler;207054][B]On another subject[/B]: the latest do.pl is working well and I'll upload the latest version to the no-IP pages shortly. The only thing I'd change is to add a test condition in convertResults() for the text "probably prime"; as it is it can, ironically enough, catch a positive result from the little-used Frobenius PRP but not a "regular" PRP test. :smile:[/quote]

I removed the phrase "probable" from the results conversion and changed it to "Frobenius PRP" because it appears the term "probable" or "probable prime" is no longer used with LLR version 3.8. Also, keep in mind that we only want to consider the result of the FINAL test of a pair. If we get "probable prime" followed by "Frobenius PRP" or simply "is prime!", then we have to ignore the first test.

That was the big challenge: Having it skip results lines when the same pair would be more thoroughly tested with a subsequent line.
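A minimal sketch of that "last line wins" idea, written in Python rather than the Perl of do.pl, and not the actual do.pl logic. It assumes every result line starts with the candidate written as k*b^n+1 or k*b^n-1, as in the LLR output quoted elsewhere in this thread. The names PAIR and final_lines are made up for illustration.

```python
import re

# First token of a result line, e.g. "345074*3^11+1" (format assumed, see above).
PAIR = re.compile(r"^\d+\*\d+\^\d+[+-]1$")


def final_lines(lines):
    """Keep only the last result line seen for each candidate."""
    last = {}
    for line in lines:
        parts = line.split()
        if not parts or not PAIR.match(parts[0]):
            continue  # informational line, not a result for a pair
        last[parts[0]] = line  # a later line replaces an earlier one
    return list(last.values())
```

With this approach a Fermat PRP line that is followed by a Frobenius or "but composite" line for the same pair never reaches the conversion step.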

Can you demonstrate any cases where the term "probable" is ever output? And if so, is there another line for the same pair where it shows "is PRP" or "is Frobenius PRP" or "is prime!"?

I did a whole lot of teeny base 2 and base 3 prime tests and didn't encounter one. I also did medium and large-sized base 2 tests. I haven't done medium and larger base 3 tests after encountering the roundoff loop.

If you have time, you might run a bunch of small and medium base 2 and base 3 tests yourself and attempt to break the code.

You know where the applicable forums/threads are. Can you report the issue to Jean and co.?


Gary

mdettweiler 2010-03-02 05:49

[quote=gd_barnes;207056]I removed the phrase "probable" from the results conversion and changed it to "Frobenius PRP" because it appears the term "probable" or "probable prime" is no longer used with LLR version 3.8. Also, keep in mind that we only want to consider the result of the FINAL test of a pair. If we get "probable prime" followed by "Frobenius PRP" or simply "is prime!", then we have to ignore the first test.

That was the big challenge: Having it skip results lines when the same pair would be more thoroughly tested with a subsequent line.

Can you demonstrate any cases where the term "probable" is ever output? And if so, is there another line for the same pair where it shows "is PRP" or "is Frobenius PRP" or "is prime!"?

I did a whole lot of teeny base 2 and base 3 prime tests and didn't encounter one. I also did medium and large-sized base 2 tests. I haven't done medium and larger base 3 tests after encountering the roundoff loop.

If you have time, you might run a bunch of small and medium base 2 and base 3 tests yourself and attempt to break the code.[/quote]
From what I'm seeing here and from reading LLR's readme, it sounds like it does the following:

Base 2:
-LLR/Proth test if possible, producing a final result.
-If not (due to k>b^n or whatever), do a Fermat SPRP test (i.e., the "regular" PRP test we're used to), then a Frobenius PRP if that comes back positive. Presumably Frobenius is a stronger PRP test.

Other bases:
-N-1/N+1 test (final result)

The LLR/Proth or N-1/N+1 tests, respectively, can be bypassed and a PRP forced with the ForcePRP=1 option, which should make it do Fermat then Frobenius if positive.
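As a rough sketch, the flow summarized above could be written down like this in Python. It is only a toy model of my reading of the readme, not LLR's actual code: the function name planned_tests is made up, and the "k < b^n" condition is a simplification of "if possible".

```python
def planned_tests(k, b, n, force_prp=False):
    """Return the sequence of tests described above for k*b^n+/-1 (toy model)."""
    prp_chain = [
        "Fermat SPRP",
        "Frobenius PRP (only if the Fermat test is positive)",
    ]
    if force_prp:
        # ForcePRP=1 bypasses the deterministic test.
        return prp_chain
    if b == 2:
        # Assumption: the LLR/Proth test is "possible" when k < 2^n.
        return ["LLR/Proth"] if k < b ** n else prp_chain
    return ["N-1/N+1"]
```

For example, planned_tests(100542585, 2, 6) gives the two-step PRP chain because 100542585 > 2^6.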

At any rate, you're right, I don't believe 3.8 uses "probable prime" lingo any more. Now it just says "base 3-Strong Fermat PRP!" or "Frobenius PRP!" instead. Essentially "base 3-Strong Fermat PRP!" is the equivalent of 3.7.1c's "is probably prime" (since it just did the Fermat PRP test when appropriate).

The only reason why we'd possibly want "is probably prime" to be supported is in case somebody wanted to use 3.7.1c with the script. Not likely, I'll admit...probably not worth the bother.

[quote]You know where the applicable forums/threads are. Can you report the issue to Jean and co.?[/quote]
Okay, I'll poke around with it a bit more and then report my findings.

gd_barnes 2010-03-02 06:08

[quote=mdettweiler;207057]From what I'm seeing here and from reading LLR's readme, it sounds like it does the following:

Base 2:
-LLR/Proth test if possible, producing a final result.
-If not (due to k>b^n or whatever), do a Fermat SPRP test (i.e., the "regular" PRP test we're used to), then a Frobenius PRP if that comes back positive. Presumably Frobenius is a stronger PRP test.

Other bases:
-N-1/N+1 test (final result)

The LLR/Proth or N-1/N+1 tests, respectively, can be bypassed and a PRP forced with the ForcePRP=1 option, which should make it do Fermat then Frobenius if positive.

At any rate, you're right, I don't believe 3.8 uses "probable prime" lingo any more. Now it just says "base 3-Strong Fermat PRP!" or "Frobenius PRP!" instead. Essentially "base 3-Strong Fermat PRP!" is the equivalent of 3.7.1c's "is probably prime" (since it just did the Fermat PRP test when appropriate).

The only reason why we'd possibly want "is probably prime" to be supported is in case somebody wanted to use 3.7.1c with the script. Not likely, I'll admit...probably not worth the bother.


Okay, I'll poke around with it a bit more and then report my findings.[/quote]

I suppose that wouldn't hurt to add the term "probably prime". But is that really the term? I thought it was "probable prime". If you can confirm that for sure, I could put it in without testing it because I don't want to mess with testing 3.7.1.

But...on the other hand, this script is unlikely to work well at all with 3.7.1 because the tests done on small tests, PRPs, and primes of different bases are so much different. So using that argument, it isn't particularly helpful to add the probably prime lingo in there because the script would not work in many other situations.

They really did a lot of modifications for 3.8 on the smaller and PRP tests. I have to say they are excellent. There was just a big learning curve on them when coding the script.

I forgot to address one of the main things you brought up. You said that checking for "Frobenius PRP!" wouldn't hit other kinds of PRPs. But we can't just check for the more simple "PRP!" because that hits "Fermat PRP!" and as you stated, after a Fermat PRP, it does another better test for the "Frobenius PRP". We would end up causing the pair to be submitted twice causing a rejected pair and possibly an incorrect prime being reported if the Frobenius test comes back composite after the Fermat test came back as PRP.

I'm now testing medium and larger base 3 tests. All kinds of interesting output being put out by LLR on the medium sized base 3 primes. A lot of lines that show "may be prime," followed finally by "is prime!". Fortunately only the "is prime!" line shows up in the results. But even if it didn't, "may be prime," would not hit any of the results conditions and the line would be ignored as it should.

Whew, this is a lot more involved on testing than I had originally imagined. We can blame Karsten. Ha-ha. :smile:


Gary

gd_barnes 2010-03-02 06:19

1 Attachment(s)
Max and Karsten,

I thought it might be helpful if I showed the actual lresults_hist.txt files from a couple of my tests recently. They are attached. These particular tests really show how many different scenarios that we are dealing with and that I have coded around.

I think quite a few of the scenarios came up in the PMs that Karsten sent but I'm not sure if all of them did.


Gary

mdettweiler 2010-03-02 08:02

[quote=gd_barnes;207058]I suppose that wouldn't hurt to add the term "probably prime". But is that really the term? I thought it was "probable prime". If you can confirm that for sure, I could put it in without testing it because I don't want to mess with testing 3.7.1.[/quote]
Hmm, you're right:
[code]$ ./cllr371c.exe -d -q"4436*28^6242-1"
LLR tests only k*2^n+/-1 numbers, so, we will do a PRP test of 4436*28^6242-1
Starting probable prime test of 4436*28^6242-1
Using generic reduction FFT length 3072
4436*28^6242-1 is a probable prime. Time : 4.946 sec.
Please credit George Woltman's PRP for this result![/code]
Note that the "Please credit..." line is printed to lresults.txt as well; it would of course be skipped over by convertResults().

[quote]But...on the other hand, this script is unlikely to work well at all with 3.7.1 because the tests done on small tests, PRPs, and primes of different bases are so much different. So using that argument, it isn't particularly helpful to add the probably prime lingo in there because the script would not work in many other situations.[/quote]
Yeah, I'd suggest just putting in the probable prime lingo check as the last thing it checks--why not. :smile: Beyond that, though, if users want to use 3.7.1c or other pre-3.8 versions, they do so at their own risk.

[quote]They really did a lot of modifications for 3.8 on the smaller and PRP tests. I have to say they are excellent. There was just a big learning curve on them when coding the script.

I forgot to address one of the main things you brought up. You said that checking for "Frobenius PRP!" wouldn't hit other kinds of PRPs. But we can't just check for the more simple "PRP!" because that hits "Fermat PRP!" and as you stated, after a Fermat PRP, it does another better test for the "Frobenius PRP". We would end up causing the pair to be submitted twice causing a rejected pair and possibly an incorrect prime being reported if the Frobenius test comes back composite after the Fermat test came back as PRP.

I'm now testing medium and larger base 3 tests. All kinds of interesting output being put out by LLR on the medium sized base 3 primes. A lot of lines that show "may be prime," followed finally by "is prime!". Fortunately only the "is prime!" line shows up in the results. But even if it didn't, "may be prime," would not hit any of the results conditions and the line would be ignored as it should.[/quote]
Yeah, the "may be prime" messages are a result of how LLR 3.8 has to regularly re-do its N-1/N+1 tests at least once, as analyzed in great detail in the LLR 3.8 thread I started at CRUS. I'd forgotten that it put those in the lresults file too, though.

Speaking of which, I wonder how PRPnet handles those extra "may be prime" messages (not to mention all the Frobenius stuff)--I should check that when I get the chance. It obviously has some coding to avoid extraneous lines in results files since in my experience it hasn't had a problem cutting through various other dross (such as roundoff errors from PFGW or "non-base-2 number, doing PRP test" messages from LLR), so it will be interesting to see how it handles these.

mdettweiler 2010-03-02 08:17

And I thought we had -c working...
 
Just now I tried running do.pl -c on a queue of 2 completed and 3 incomplete workunits. Here's what happened:
-The 2 completed WUs were submitted correctly.
-The first two incomplete WUs were canceled correctly.
-The last incomplete WU was abandoned and the server still lists it as in progress.

Here's the console output I got from the script:
[code][2010-03-02 03:08:10]
Cancelling : 195/142804 (600000000000:M:1:2:258)
[2010-03-02 03:08:14]
Cancelling : 195/142825 (600000000000:M:1:2:258)
[2010-03-02 03:08:15]
Cancelling : 195/142826 (600000000000:M:1:2:258)[/code]
So, all three incomplete pairs apparently made it at least as far as the LLRnet client. The strange thing is why the last one didn't actually get canceled on the server. Any idea why this happened?

FYI, this was with the latest do.pl and llrnet.lua files.

gd_barnes 2010-03-02 08:42

[quote=mdettweiler;207070]Just now I tried running do.pl -c on a queue of 2 completed and 3 incomplete workunits. Here's what happened:
-The 2 completed WUs were submitted correctly.
-The first two incomplete WUs were canceled correctly.
-The last incomplete WU was abandoned and the server still lists it as in progress.

Here's the console output I got from the script:
[code][2010-03-02 03:08:10]
Cancelling : 195/142804 (600000000000:M:1:2:258)
[2010-03-02 03:08:14]
Cancelling : 195/142825 (600000000000:M:1:2:258)
[2010-03-02 03:08:15]
Cancelling : 195/142826 (600000000000:M:1:2:258)[/code]So, all three incomplete pairs apparently made it at least as far as the LLRnet client. The strange thing is why the last one didn't actually get canceled on the server. Any idea why this happened?

FYI, this was with the latest do.pl and llrnet.lua files.[/quote]


Hum. Very good catch. I had not closely inspected the joblist.txt when canceling pairs previously. My mistake there. I'm looking into it now.

This looks like the same type of problem that we were getting before where the final results of a dried server would just go out into cyberspace with nothing to show what happened to them. But I'll have to do some file prints to verify why it's happening.

Do you REALLY want to add "probable prime"? Is this script going to work in all circumstances for 3.7.1 and before? Keep in mind that we'll automatically put LLR 3.8 in the client that we send people or that we post in our threads.

Tell you what: If you can run some 3.7.1 tests for all n=1 to 1000 for a large k (perhaps the one that I did) and everything works except for the "probable prime" tests, I'll add it. The run takes around 3-5 mins. If there are other problems, then I think it's just added code for an LLR version that people should not be using for this release. The bottom line is that we should be able to tell people that it works for only 3.8 or that it works for 3.8 and 3.7.1 ALL of the time; not all of the time for one and some of the time for the other.


Gary

kar_bon 2010-03-02 09:47

to the conversion-scripts:

yesterday i mailed Jean to ask if he can tell us which output lines LLR 3.8 can produce, and perhaps give an example of each.

i got a look at the llr-source and found some other things:

please check this testfile with the conversion:
[code]
43228319159:P:1:3:257
345074 11
[/code]

this is 345074*3^11+1 from the CRUS-thread about PRP which are composite.

cllr's output:
[code]
345074 > 3^11, so, only a Strong PRP test is done for 345074*3^11+1.
345074*3^11+1 is base 3-Strong Fermat PRP! Time : 4.315 ms.
345074*3^11+1 is strong-Fermat PSP, but composite!! (P = 5, Q = 2), Lucas RES64: 00000003F2AC7784 Time : 5.752 ms.
[/code]

so my script won't handle this correctly!

the llr-source contains this:
[code]
"%s is Lucas PSP (P = %d, Q = %d), but composite!!. Frobenius RES64: %s",
[/code]

but i haven't found an example of this to test.


to the cancel-option:
perhaps i can test this with the WIN-version today.

question@Max: did the server dry out or were these WUs only canceled?

try this first, when server dried OR stopped:
- stop server (if not)
- notice the contents of knpairs.txt and joblist.txt
- call "llrserver -s"
- notice the contents of knpairs.txt and joblist.txt again
- call "llrserver -s" 2nd time!
- notice the contents of knpairs.txt and joblist.txt again

post the result here.

what i do:
for a box that is not online 24/7 and is testing with that script, the do.bat (cllr) will test all workunits and quit the script -> option set to not connect to the internet again ("op_connect=FALSE" in WIN-script, i think this will not work for UNIX!?).
i put the folder on a stick for an online box. then i call "llrnet.exe" (without any option and not the do-script): llrnet will send the tosend.txt, receive new workunits and end immediately!

so try this on Unix:
- box not online
- run script doing all work -> tosend exists
- run "llrnet" (not the script!) while online
-> are there new workunits? are the old results sent properly? does llrnet end properly?

gd_barnes 2010-03-02 09:56

Karsten,

I have confirmed it with a detailed look and test: The do.pl Linux script handles cancellation of the pairs correctly with the version that I posted in post #156 and that Max tested.

What we have here is the same issue that we had before where the server was dropping the final results coming in when it dried out. You fixed that with a correction of one of the .lua files. (I can't remember which one.)

The problem with it not sending a "CANCELLED" status to the joblist.txt on the final pair only when running do.pl -c has to be in one of the .lua files somewhere.

Can you look into that please? Also, can you see if you have the same problem with your Windows script?

OK guys, Karsten's test proves something to me now. Scope creep is over!! We are so far outside of the original scope of this project that it is crazy.

For the first release; base 2 only please. Other bases are much too convoluted right now.

I don't want to wait weeks more while we keep testing weird situations on other bases. This is getting kind of ridiculous.

Let's get this cancellation issue fixed, get the scripts synced up, get all server and client files synced up, run a few final tests, and be done with this.

Otherwise we'll be testing this thing for weeks trying to get every single situation on all bases to work. LLR just has too much varied output on too many bases to attempt to account for everything on the 1st release. It's important for the project and for individuals at this point to have just a good base 2 server that is easy to use.

Please!


Thank you,
Gary


P.S. I'm really beat tonight otherwise I'd keep going. I'm going to bed now.

kar_bon 2010-03-02 10:48

ok, i try to test this with the Win-script today.

to conversion:
the example with
[code]
345074 > 3^11, so, only a Strong PRP test is done for 345074*3^11+1.
345074*3^11+1 is base 3-Strong Fermat PRP! Time : 4.315 ms.
345074*3^11+1 is strong-Fermat PSP, but composite!! (P = 5, Q = 2), Lucas RES64: 00000003F2AC7784 Time : 5.752 ms.
[/code]

i changed my script like this:
[code]
k = substr($1,1,index($1,"*")-1) # read k- and n-value from line
sign = index($1,"-")
if (sign == 0) {sign = index($1,"+")}
n = substr($1,index($1,"^")+1,sign-index($1,"^")-1)
[/code]

in awk '$0' means the whole input line, which is what i used before the change.
but as you can see in the third line, '-' also appears in 'strong-Fermat', not only as the sign of the kn-pair. so "sign" was set wrongly in the old version.
now i use '$1', which means the first field (all chars up to the first blank), and find the sign there!

and i changed
if (index($0,"not prime")>0)
into
if ((index($0,"not prime")>0) || (index($0,"but composite")>0))

now i get this as output for tosend:
[code]
5000000000000:M:1:2:258 345074 11 -2 00000003F2AC7784
[/code]
and it's ok now!
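for anyone who doesn't read awk, here is the same idea as a small Python sketch. it is only an illustration of the '$0' versus '$1' difference, not part of the script; both function names are made up.

```python
def parse_kn_whole_line(line):
    """Old behaviour: look for the sign anywhere in the line (awk's $0)."""
    star = line.index("*")
    caret = line.index("^")
    sign = line.find("-")
    if sign == -1:
        sign = line.find("+")
    return line[:star], line[caret + 1:sign]


def parse_kn_first_token(line):
    """New behaviour: look for the sign only in the first field (awk's $1)."""
    token = line.split()[0]
    return parse_kn_whole_line(token)


line = ("345074*3^11+1 is strong-Fermat PSP, but composite!! "
        "(P = 5, Q = 2), Lucas RES64: 00000003F2AC7784 Time : 5.752 ms.")
print(parse_kn_whole_line(line))   # ('345074', '11+1 is strong')
print(parse_kn_first_token(line))  # ('345074', '11')
```

the first call picks up the '-' in 'strong-Fermat', so n comes out as garbage; the second call only ever sees "345074*3^11+1".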

please check this in the do.pl, because i think you use the same way!

mdettweiler 2010-03-02 17:24

[quote=kar_bon;207077]question@Max: did the server dry out or were these WUs only canceled?[/quote]
No, there's plenty of work still in the server, I was just canceling these so that I could move that core to something else.
[quote]try this first, when server dried OR stopped:
- stop server (if not)
- notice the contents of knpairs.txt and joblist.txt
- call "llrserver -s"
- notice the contents of knpairs.txt and joblist.txt again
- call "llrserver -s" 2nd time!
- notice the contents of knpairs.txt and joblist.txt again

post the result here.[/quote]
Ah, right, I forgot about how you can use the -s option to force the server to prune. Gary, we'll have to keep that in mind for the future when something's kind of "stuck" in knpairs or joblist longer than it's supposed to.
[quote=gd_barnes;207079]Karsten,

I have confirmed it with a detailed look and test: The do.pl Linux script handles cancellation of the pairs correctly with the version that I posted in post #156 and that Max tested.

What we have here is the same issue that we had before where the server was dropping the final results coming in when it dried out. You fixed that with a correction of one of the .lua files. (I can't remember which one.)

The problem with it not sending a "CANCELLED" status to the joblist.txt on the final pair only when running do.pl -c has to be in one of the .lua files somewhere.[/quote]
Hmm, that is indeed strange. The weird thing is, in my experience cancellation always works perfectly with "normal" LLRnet--and as such I kind of doubt it's an issue on the server end.

As far as do.pl goes, note that the final pair to be canceled obviously has been passed to the LLRnet executable (since the "Canceling x/x" message comes from LLRnet). So the problem has to be in there somewhere, i.e. in llrnet.lua.

[quote]Can you look into that please? Also, can you see if you have the same problem with your Windows script?

OK guys, Karsten's test proves something to me now. Scope creep is over!! We are so far outside of the original scope of this project that it is crazy.

For the first release; base 2 only please. Other bases are much too convoluted right now.

I don't want to wait weeks more while we keep testing weird situations on other bases. This is getting kind of ridiculous.

Let's get this cancellation issue fixed, get the scripts synced up, get all server and client files synced up, run a few final tests, and be done with this.

Otherwise we'll be testing this thing for weeks trying to get every single situation on all bases to work. LLR just has too much varied output on too many bases to attempt to account for everything on the 1st release. It's important for the project and for individuals at this point to have just a good base 2 server that is easy to use.

Please!


Thank you,
Gary


P.S. I'm really beat tonight otherwise I'd keep going. I'm going to bed now.[/quote]
Agreed, better to focus just on base 2 n>1000 for now. Usually when people are running other bases they're doing a conjecture search anyway, in which case PRPnet is generally a better choice.

Also, forget about 3.7.1c support--agreed, no reason to bother with that. It's not like 3.7.1c will work on a platform that 3.8.0 doesn't work just as well on.

kar_bon 2010-03-02 18:34

i've the cancel-issue left in the ToDo-list in the first posts, but left the focus over all of those small primes!

this is not so simple as it seems:
the '-c' option from the script will cancel [b]all[/b] jobs the client reserved, done or not done!
perhaps this could be handled by the given things (converting the done results, submit with llrnet, calling llrserver with option '-s' 2 times).

but i have to test if this is all ok on the server-side:
- the knpairs.txt is ok
- the results.txt is ok
- the joblist.txt is ok
- no rejected pairs

if this won't work:
- a second script for deleting the work done from the llrnet-client-workfile and try again

another workaround:
- submit the done results by llrnet
-> the rest of the reserved pairs will be resubmitted after jobMaxTime lasted

i think this would be the easiest way to go (conversion and submitting pairs is done so far by the script).

suggestions?

gd_barnes 2010-03-02 20:44

[quote=kar_bon;207138]i've the cancel-issue left in the ToDo-list in the first posts, but left the focus over all of those small primes!

this is not so simple as it seems:
the '-c' option from the script will cancel [B]all[/B] jobs the client reserved, done or not done!
perhaps this could be handled by the given things (converting the done results, submit with llrnet, calling llrserver with option '-s' 2 times).

but i have to test if this is all ok on the server-side:
- the knpairs.txt is ok
- the results.txt is ok
- the joblist.txt is ok
- no rejected pairs

if this won't work:
- a second script for deleting the work done from the llrnet-client-workfile and try again

another workaround:
- submit the done results by llrnet
-> the rest of the reserved pairs will be resubmitted after jobMaxTime lasted

i think this would be the easiest way to go (conversion and submitting pairs is done so far by the script).

suggestions?[/quote]

Karsten, you said the cancel option will return all pairs done or undone. That is incorrect on the Linux side. I have it "mostly" working except for what Max discovered. What it will do is return all unprocessed results to the server, remove those pairs from workfile.txt, and then cancel the pairs remaining in workfile.txt. But it has a bug in that it doesn't write a "CANCELLED" record to the joblist.txt file for the very last cancelled pair. And that is not a bug in the script itself. It's in one of the .lua files somewhere. That's where I need your help.

Can you please write the same code for the Windows script? I think this is an important feature. I'd like to get this problem resolved before we find any more different kinds of scenarios on the results. I'd like to get the scripts at least synced up in that regard. Thanks.

We don't want the script to cancel all pairs, processed and unprocessed. We want it to submit the results for the done pairs, remove those pairs from workfile.txt, and then cancel the remaining pairs.
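A minimal sketch of that intended behavior, in Python rather than Perl or DOS batch. It assumes the pairs are processed in workfile.txt order and that a counter of finished results is available. The function name plan_cancel and the (k, n) tuple layout are made up for illustration and say nothing about the real workfile.txt format.

```python
def plan_cancel(workfile_pairs, num_results):
    """Split reserved pairs into (pairs to submit, pairs to cancel).

    workfile_pairs: pairs in the order they are processed (assumed).
    num_results:    how many of them already have a result.
    """
    if not 0 <= num_results <= len(workfile_pairs):
        raise ValueError("num_results out of range")
    done = workfile_pairs[:num_results]       # submit these results
    remaining = workfile_pairs[num_results:]  # cancel these on the server
    return done, remaining
```

With 5 reserved pairs and 2 results done, this yields 2 pairs to submit and 3 to cancel, which is the case Max ran into.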

No, we don't want to just wait for the pairs to be returned to the server. Then it's not a 'cancel' option. It's a 'submit' option.

I'm going to try something shortly. I'm going to try changing the cancel code such that it has to be executed twice and twice only as follows:
1. On the first cancel, it returns good results to the server and removes those pairs from workfile.txt
2. On the second cancel, it returns all pairs to the server.

Let's see if that will work.

If that doesn't work, we could just live with it. You were saying to just let all of the cancelled pairs wait to be returned to the server. Not a good choice. But if we go with the code I have, although it still contains a small bug, it won't hurt if that one remaining final pair isn't officially cancelled in the server. It will just be handed back out again after JobMaxTime. At least that's only one pair and not a bunch of them.

Another choice would be:
1. Have a "submit" option that returns all unprocessed results to the server and removes those pairs from workfile.txt. (It will not cancel any pairs.)
2. Have the "old-style" cancel option that cancels one pair at a time.

But I really want to avoid that if possible. I'd actually prefer to leave it as is.

In other news, I can see that I am going to have to code for the "composite PRP" issue. I suppose I don't have a choice since it's possible to get a small composite PRP on base 2 also.

I don't have a lot of time today and so am starting earlier. I need to leave about 5:30 PM (1:30 AM GMT) and won't be available again until Weds. afternoon.


Gary

kar_bon 2010-03-02 20:59

the cancel option as it is now will cancel all pairs in the workfile.txt in the WIN-script!
i've not yet done this there!

porting the same code from the UNIX script to the WIN-script will take me some time in DOS-batch.

i try this now, too:
say, workfile got 5 pairs in it
- submit 2 pairs done by cllr with "llrnet"
- cancel the whole (5 pairs) workfile with "llrnet -c"

i'll look also in the lua-files for the last pair not canceled!

gd_barnes 2010-03-02 21:13

[quote=kar_bon;207081]ok, i try to test this with the Win-script today.

to conversion:
the example with
[code]
345074 > 3^11, so, only a Strong PRP test is done for 345074*3^11+1.
345074*3^11+1 is base 3-Strong Fermat PRP! Time : 4.315 ms.
345074*3^11+1 is strong-Fermat PSP, but composite!! (P = 5, Q = 2), Lucas RES64: 00000003F2AC7784 Time : 5.752 ms.
[/code]i changed my script like this:
[code]
k = substr($1,1,index($1,"*")-1) # read k- and n-value from line
sign = index($1,"-")
if (sign == 0) {sign = index($1,"+")}
n = substr($1,index($1,"^")+1,sign-index($1,"^")-1)
[/code]in awk '$0' means the whole input line, which is what i used before the change.
but as you can see in the third line, '-' also appears in 'strong-Fermat', not only as the sign of the kn-pair. so "sign" was set wrongly in the old version.
now i use '$1', which means the first field (all chars up to the first blank), and find the sign there!

and i changed
if (index($0,"not prime")>0)
into
if ((index($0,"not prime")>0) || (index($0,"but composite")>0))

now i get this as output for tosend:
[code]
5000000000000:M:1:2:258 345074 11 -2 00000003F2AC7784
[/code]and it's ok now!

please check this in the do.pl, because i think you use the same way![/quote]

Karsten,

I think the code in the Linux do.pl is much easier. I do not need to add any code for this condition. I'll state why in a minute. IMPORTANT: I also realized that you still have a bug in yours. Here's why:

[quote]
100542585*2^1+1 = 201085171 is not prime. (trial divisions)
100542585*2^2+1 = 402170341 is not prime. (trial divisions)
100542585*2^3+1 = 804340681 is not prime. (trial divisions)
100542585*2^4+1 = 1608681361 is not prime. (trial divisions)
100542585*2^5+1 = 3217362721 is not prime. (trial divisions)
100542585 > 2^6, so we can only do a PRP test for 100542585*2^6+1.
100542585*2^6+1 is base 3-Strong Fermat PRP! Time : 0.170 ms.
100542585*2^6-1 is Frobenius PRP! (P = 7, Q = 3, D = 37) Time : 0.445 ms.
[/quote]You see the first 5 lines? They contain the verbiage "not prime" but they do not have a residue. If you don't have a residue in the tosend.txt file, the server will not accept it. I found that out the hard way.

What my code does is check for "64: ". That catches any line with a residue and quickly concludes that it is not prime. I then have subsequent code that checks for "factor" or "trial". If it hits one of those conditions, then it simply writes "factored" in the residue.

Therefore, I would suggest that you strip out the "not prime" code and check for "64: " instead. I believe what I have in do.pl is the most efficient way to go. For ease of reference, here it is:

[code]
if($nc =~ "\-") {
    ($n, $c) = split(/\-/, $nc);
}
else {
    ($n, $c) = split(/\+/, $nc);
}
# Any line containing "64: " carries a residue, so the number is not prime.
if($JustRead =~ "64: ") {
    if($JustRead =~ "OLD64") {
        ($foo, $res64time) = split(/RES64: /, $JustRead);
        ($res64, $time) = split(/. OLD64/, $res64time);
    }
    else {
        ($foo, $res64time) = split(/64: /, $JustRead);
        ($res64, $time) = split(/ Time /, $res64time);
    }
    print TOSEND "$header $k $n -2 $res64\n";
    $numResults++;
}
elsif($JustRead =~ "prime!" or $JustRead =~ "Frobenius PRP!") {
    print TOSEND "$header $k $n 0 0\n";
    $numResults++;
}
elsif($JustRead =~ "factor" or $JustRead =~ "trial") {
    print TOSEND "$header $k $n -2 factored\n";
    $numResults++;
}
}
[/code]If the results line doesn't hit any of the conditions, it writes nothing. The $numResults variable tells how many results have been processed so far, in case the user decides to cancel the remaining pairs.

This code has now been tested for all n-sizes on base 2 (including n=1 thru n=1000, n=10K, and n=500K), for composite PRPs, for k's much > 2^n, and for small n-sizes on base 3; also where k much > 3^n.

As you can see, it will correctly handle the composite PRP scenario without any further changes. It will ignore the first few lines where it writes out unneeded stuff and then it will catch the residue on the final line. If the PRP was prime or a "stronger" PRP, it would catch either of those because it would contain "prime!" or "Frobenius PRP!".
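
To make the order of those checks easier to follow, here is a rough Python paraphrase of the branching in the Perl excerpt above. It is only a sketch: the function name `to_tosend` is invented, the OLD64 branch is left out for brevity, and the residue is taken as the first blank-delimited token after "64: " instead of splitting on " Time ".

```python
def to_tosend(header, k, n, line):
    """Return the tosend.txt line for one lresults.txt line, or None to skip it."""
    if "64: " in line:
        # A residue is present, so the number is not prime (result code -2).
        residue = line.split("64: ", 1)[1].split()[0]
        return f"{header} {k} {n} -2 {residue}"
    if "prime!" in line or "Frobenius PRP!" in line:
        # Prime or Frobenius PRP: result code 0 with residue 0.
        return f"{header} {k} {n} 0 0"
    if "factor" in line or "trial" in line:
        # Eliminated by trial division: no residue exists, so send a placeholder.
        return f"{header} {k} {n} -2 factored"
    return None  # informational line, nothing is written


if __name__ == "__main__":
    header = "5000000000000:M:1:2:258"
    line = ("345074*3^11+1 is strong-Fermat PSP, but composite!! "
            "(P = 5, Q = 2), Lucas RES64: 00000003F2AC7784 Time : 5.752 ms.")
    print(to_tosend(header, 345074, 11, line))
```

Because the residue check comes first, the "but composite!!" line is handled without any special case, while the intermediate "Strong Fermat PRP!" line matches nothing and is skipped.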

BTW, Max I was not saying that we should only focus on n>1000. I was saying that we should only focus on base 2. Let's get base 2 correct for all sizes. The code in the do.pl script will handle any size test and even handle composite PRPs such as this.


Gary

kar_bon 2010-03-02 21:28

[quote]Karsten,

I think the code in the Linux do.pl is much easier. I do not need to add any code for this condition. I'll state why in a minute. IMPORTANT: I also realized that you still have a bug in yours. Here's why:

[code]
100542585*2^1+1 = 201085171 is not prime. (trial divisions)
100542585*2^2+1 = 402170341 is not prime. (trial divisions)
100542585*2^3+1 = 804340681 is not prime. (trial divisions)
100542585*2^4+1 = 1608681361 is not prime. (trial divisions)
100542585*2^5+1 = 3217362721 is not prime. (trial divisions)
100542585 > 2^6, so we can only do a PRP test for 100542585*2^6+1.
100542585*2^6+1 is base 3-Strong Fermat PRP! Time : 0.170 ms.
100542585*2^6-1 is Frobenius PRP! (P = 7, Q = 3, D = 37) Time : 0.445 ms.
[/code]

You see the first 5 lines? They contain the verbiage "not prime" but they do not have a residue. If you don't have a residue in the tosend.txt file, the server will not accept it. I found that out the hard way.

What my code does is check for "64: ". That catches any line with a residue and quickly concludes that it is not prime. I then have subsequent code that checks for "factor" or "trial". If it hits one of those conditions, then it simply writes "factored" in the residue.
[/quote]

so read again post #150 carefully!

the server-result-file is also given there!

the code in llrserver.lua for the result-file printings is:
[code]
write(file, format("user=%s\n", job.user))
write(file, format("[%s]\n", job.resultdate))
if job.result ~= "0" then
    write(file, format(displayFormat.." is not prime. Res64: %s Time : %d.0 sec.\n",
        job.k, job.n, job.result, Seconds() - job.seconds))
else
    write(file, format(displayFormat.." is prime! Time : %d.0 sec.\n",
        job.k, job.n, Seconds() - job.seconds))
end
[/code]

so if the result is not '0' -> is '-2' for non-primes, the code will write:
"... is not prime. Res64: <expression given from llrnet-client> Time:..."

and that expression does not have to be a 16-char Residue!
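
to make that concrete, here is a rough Python rendering of just the branch quoted above from llrserver.lua. it is a sketch only: `display_format` stands in for `displayFormat`, whose real value is not shown in this thread, so the "%s*2^%s-1" default is purely an assumption, and `result_line` is an invented name.

```python
def result_line(k, n, result, seconds, display_format="%s*2^%s-1"):
    """Format one server result line the way the quoted Lua branch does.

    `result` is whatever string the client sent; anything other than "0"
    is printed verbatim in the Res64 slot, 16 hex chars or not.
    """
    number = display_format % (k, n)
    if result != "0":
        return "%s is not prime. Res64: %s Time : %d.0 sec." % (number, result, seconds)
    return "%s is prime! Time : %d.0 sec." % (number, seconds)


if __name__ == "__main__":
    print(result_line(100542585, 1, "trial_factored", 3))
    print(result_line(100542585, 6, "0", 3))
```

so a placeholder such as "trial_factored" passes straight through into the result file.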

here's my code for that in do_tosend.awk:
[code]
if ((index($0,"not prime")>0) || (index($0,"but composite")>0))   # not a prime
{
    prim = "-2"
    if (index($0,"RES64:") > 0)
        res = substr($0,index($0,"RES64:")+7,16)
    else if (index($0,"Res64:") > 0)
        res = substr($0,index($0,"Res64:")+7,16)
    else if (index($0,"trial") > 0)
        res = "trial_factored"
    ok = 1
}
[/code]

the residue is set to 'trial_factored'!

'my' tosend.txt with above examples is:
[code]
5000000000000:M:1:2:258 100542585 1 -2 trial_factored
5000000000000:M:1:2:258 100542585 2 -2 trial_factored
5000000000000:M:1:2:258 100542585 3 -2 trial_factored
5000000000000:M:1:2:258 100542585 4 -2 trial_factored
5000000000000:M:1:2:258 100542585 5 -2 trial_factored
5000000000000:M:1:2:258 100542585 6 0 Frobenius_PRP
[/code]

BTW:
the sign in
100542585*2^6-1 is Frobenius PRP! (P = 7, Q = 3, D = 37) Time : 0.445 ms.
can't be '-'!

gd_barnes 2010-03-02 23:44

OK, I'll change my script to show "trial_factored" and "Frobenius_PRP". I had looked in #150 but I guess I didn't pick up on the logic that you had in your script for trial factoring.

What do you mean the sign can't be "-" on a Frobenius PRP? It can be Frobenius PRP regardless of the sign. Examples:

[quote]
100542585*2^20-1 is base 3-Strong Fermat PRP! Time : 0.304 ms.
100542585*2^20-1 is Frobenius PRP! (P = 6, Q = 2, D = 28) Time : 0.821 ms.
100542585 > 2^21, so we can only do a PRP test for 100542585*2^21-1.
100542585*2^21-1 is base 3-Strong Fermat PRP! Time : 0.306 ms.
100542585*2^21-1 is Frobenius PRP! (P = 6, Q = 2, D = 28) Time : 0.869 ms.

100542585 > 2^6, so we can only do a PRP test for 100542585*2^6+1.
100542585*2^6+1 is base 3-Strong Fermat PRP! Time : 9.144 ms.
100542585*2^6-1 is Frobenius PRP! (P = 7, Q = 3, D = 37) Time : 1.220 ms.
[/quote]

I guess the cancel pairs issue is just going to have to wait. I'm out of time. I've been looking in llrnet.lua all afternoon and testing various things. No luck.

BTW, here is my cancel pairs code:
[code]
sub jobCancel
{
convertResults();

# Check to see if there are unprocessed results before returning
# pairs to the server. If so, return results and remove the
# applicable pairs that would have been returned as unprocessed.
if($numResults > 0) {
# See comments for similar code below in the main looping process.
$timestamp = getTimestamp();
system("cat lresults.txt >> lresults_hist.txt");
checkForPrimes();
unlink('lresults.txt');
unlink('workfile.res');
unlink('llr.ini');
system($llrnetPath);
open(HIST, ">>lresults_hist.txt");
print HIST "Submitted to server at $timestamp\n";
close(HIST);
unlink('tosend.txt');

# Rewrite workfile.txt without the first $numResults lines in
# preparation for returning all subsequent pairs to the server.
rename("workfile.txt", "workfileb.txt");
open(WKFB, "workfileb.txt");
open(WKF, ">workfile.txt");
# write header
$line = <WKFB>;
chomp($line);
print WKF "$line\n";
for($lineCnt = 0; $lineCnt < $numResults; $lineCnt++) {
# skip lines
$line = <WKFB>;
}
while(<WKFB>) {
# write rest of file
$line = $_;
chomp($line);
print WKF "$line\n";
}
close(WKFB);
close(WKF);
unlink('workfileb.txt');
}
[/code]


Now I sure wish we could figure out why llrnet.lua is not writing the final "CANCELLED" record to joblist.txt. That's the only existing problem with it. The script code is fine and it displays all pairs as cancelled on the screen so it's hitting the llrnet.lua code. But something is dropping somewhere.

The cancellation code is what I would like to get us synced up on.

So much for getting LLRnet going before I leave town. I'm leaving on a business trip Thursday for 11 days. Oh well. Hopefully it will happen sooner or later.


Gary

mdettweiler 2010-03-02 23:53

[quote=gd_barnes;207174]OK, I'll change my script to show "trial_factored" and "Frobenius_PRP". I had looked in #150 but I guess I didn't pick up on the logic that you had in your script for trial factoring.

What do you mean the sign can't be "-" on a Frobenius PRP? It can be Frobenius PRP regardless of the sign. Examples:
[code]100542585*2^20-1 is base 3-Strong Fermat PRP! Time : 0.304 ms.
100542585*2^20-1 is Frobenius PRP! (P = 6, Q = 2, D = 28) Time : 0.821 ms.
100542585 > 2^21, so we can only do a PRP test for 100542585*2^21-1.
100542585*2^21-1 is base 3-Strong Fermat PRP! Time : 0.306 ms.
100542585*2^21-1 is Frobenius PRP! (P = 6, Q = 2, D = 28) Time : 0.869 ms.

100542585 > 2^6, so we can only do a PRP test for 100542585*2^6+1.
100542585*2^6+1 is base 3-Strong Fermat PRP! Time : 9.144 ms.
100542585*2^6-1 is Frobenius PRP! (P = 7, Q = 3, D = 37) Time : 1.220 ms. [/code][/quote]
Looks to me like LLR erroneously changed the sign of the n=6 example to -1 in the Frobenius result! :exclaim: (Unless that was a typographical error of some sort?)

[quote]Now I sure wish we could figure out why llrnet.lua is not writing the final "CANCELLED" record to joblist.txt. That's the only existing problem with it. The script code is fine and it displays all pairs as cancelled on the screen so it's hitting the llrnet.lua code. But something is dropping somewhere.[/quote]
Just to clarify: llrnet.lua is NOT writing to joblist.txt, per se. What it does is talk to the server (i.e. llrserver.lua) which then writes to joblist.txt. Therefore it seems that llrnet.lua (the client) isn't sending the final pair to be canceled to the server.

kar_bon 2010-03-03 00:33

[QUOTE=gd_barnes;207174]What do you mean the sign can't be "-" on a Frobenius PRP? It can be Frobenius PRP regardless of the sign.[/QUOTE]

the examples in the lresults.txt contain plus [b]and[/b] minus signs in the pairs, but this can only occur when testing twins!
so i think you copied different llr-outputs together to test the script.

but the conversion script only handles one header, so
5000000000000:M:1:2:258 for the minus or
5000000000000:P:1:2:257 for the plus side!

llrnet.lua handles the lists (workfile, tosend) containing the kn-pairs, and inserts, deletes, and writes pairs when work is done or cancelled.
llrserver.lua 'only' handles the GiveResult and AskPair functions to communicate with the client, pruning knpairs.txt and joblist.txt.

with time, i'm understanding more and more of the whole communication and the specialities of Lua. i think with some more time (not in the next weeks), i can add a field for the real test time of a pair from cllr (right now the server result file only shows the time between receiving the pair from the server and submitting it back)!

i've tested an idea but without success. i'll keep testing till i find a solution!

gd_barnes 2010-03-03 08:53

Guys, I have no idea why it did that plus and minus thing. It has to be a bug in LLR. It was the only Frobenius PRP that I found on the Proth side so maybe LLR has a display issue with it. I cut-and-pasted it right out of my results. I checked all of my other results and it didn't do that.

Let's please stop looking for bugs in LLR and concentrate on bugs in LLRnet! We should be testing only LLRnet for base 2 yet we're trying to debug all bases and LLR itself. It seems like we are worrying about everything BUT this final canceled pairs issue. We must stop the scope creep! What other issue is there? My testing shows that the Linux script works in all scenarios for base 2 except the cancellation. Let's please concentrate on that. It is a bug that we have created and I can prove it:

Karsten, here is what I want you to do to prove this problem to yourself:

With several pairs remaining in workfile.txt, simply do:

llrnet -c

As many times as it takes to cancel all the pairs. Even with your changed code in llrnet.lua, this should still work.

Now: It APPEARS to work for every pair. One at a time, it will show them cancelled all the way up until you run out of pairs. That's exactly what it shows when I run the -c option on do.pl. It shows them all being cancelled.

BUT...if you go to joblist.txt in the server, you will see that the FINAL pair ONLY did not get a "CANCELLED" record written for it.

Below is the code that I'm fairly certain is somehow incorrect in llrnet.lua on the client side. I tried many different things Tues. afternoon to make it work and only proceeded to make the problem worse so it's back the way it was. Here it is:

[code]
if cancelJob then
-- local i
-- for i=1, getn(more), 1 do
-- print(format("getn1 #%s: %s %s\n", i, more[i].k, more[i].n))
-- end

print(format("Cancelling : %s/%s (%s)", k, n, t))
result, residue = -2, "CANCEL"
-- print(format("cancel1: %s/%s", k, n))
local tbl = { t = t, k = k, n = n, result = result, residue = residue, date = date("%c") }
tinsert(tosend,tbl)
WriteTosendfile()
if more[1] then
local p = more[1]
local k, n = p.k, p.n
-- print(format("cancel1: %s/%s", k, n))
local tbl = { t = t, k = k, n = n, result = result, residue = residue, date = date("%c") }
-- tinsert(tosend,tbl)
-- WriteTosendfile()

tremove(more, 1)
if WriteWorkfile(t, k, n, more) then
print(format("Remove : (%s) %s/%s", k, n, t))
print("Could not write to workfile.txt file !")
SemaSignal(semaphore)
return -1
end
else
ClearWorkfile()
end
else
-- perform prime test !
[/code]Can one of you look at that and see why it will not properly communicate with the server such that the server will write a "CANCELLED" record in joblist.txt? Thank you.


Gary

kar_bon 2010-03-03 09:31

[QUOTE=gd_barnes;207211]Can one of you look at that and see why it will not properly communicate with the server such that the server will write a "CANCELLED" record in joblist.txt? Thank you.
[/QUOTE]

i've tested yesterday many things in that same code-segment but not found the issue yet.
i'll try this today again, got another idea!

gd_barnes 2010-03-03 10:33

Cancelled pairs issue fixed!!!
 
1 Attachment(s)
[quote=kar_bon;207214]i've tested yesterday many things in that same code-segment but not found the issue yet.
i'll try this today again, got another idea![/quote]

They say necessity is the mother of invention:

[code]
sub jobCancel
{
    convertResults();

    # Check to see if there are unprocessed results before returning
    # pairs to the server. If so, return results and remove the
    # applicable pairs that would have been returned as unprocessed.
    if($numResults > 0) {
        # See comments for similar code below in the main looping process.
        $timestamp = getTimestamp();
        system("cat lresults.txt >> lresults_hist.txt");
        checkForPrimes();
        unlink('lresults.txt');
        unlink('workfile.res');
        unlink('llr.ini');
        system($llrnetPath);
        open(HIST, ">>lresults_hist.txt");
        print HIST "Submitted to server at $timestamp\n";
        close(HIST);
        unlink('tosend.txt');
    }

    # Rewrite workfile.txt without the first $numResults lines in
    # preparation for returning all subsequent pairs to the server.
    rename("workfile.txt", "workfileb.txt");
    open(WKFB, "workfileb.txt");
    open(WKF, ">workfile.txt");
    # write header
    $line = <WKFB>;
    chomp($line);
    print WKF "$line\n";
    for($lineCnt = 0; $lineCnt < $numResults; $lineCnt++) {
        # skip lines
        $line = <WKFB>;
    }
    $numCancel = 0;
    while(<WKFB>) {
        # write rest of file
[COLOR=SeaGreen]        $numCancel++;[/COLOR]
        $line = $_;
        chomp($line);
        print WKF "$line\n";
    }
[COLOR=Red]    # Write an extra null line due to a quirk in the
    # server communication on the final cancelled pair.
    print WKF "\n";[/COLOR]
[COLOR=SeaGreen]    $numCancel++;[/COLOR]
    close(WKFB);
    unlink('workfileb.txt');

    # Return all remaining pairs to the server.
[COLOR=SeaGreen]    for($lineCnt = 0; $lineCnt < $numCancel; $lineCnt++) {[/COLOR]
        system($llrnetPath . " -c");
    }
    close(WKF);
    unlink('tosend.txt');
    unlink('workfile.txt');
    exit;
}
[/code]Red = work around

Green = other more efficient changed code

Forget looking at the confusing lua code! I coded a work-around right into the do.pl script. It required writing a final "null" record at the end of the workfile.txt and then attempting to send that null record. When it attempted to do that, it properly sent the final cancelled pair to the server such that it wrote a "CANCELLED" record in joblist.txt. Then, since it was trying to send the null record, it gave the message "No more job to cancel !". While that's an unnecessary message, it's quite clear and it is correct so this work-around appears to be a keeper.

Here was the final screen output:
[code]
[2010-03-03 04:02:09]
Cancelling : 100542854/346 (1000:M:1:3:258)
[2010-03-03 04:02:10]
Cancelling : 100542854/347 (1000:M:1:3:258)
[2010-03-03 04:02:10]
Cancelling : 100542854/348 (1000:M:1:3:258)
[2010-03-03 04:02:10]
Cancelling : 100542854/349 (1000:M:1:3:258)
[2010-03-03 04:02:10]
Cancelling : 100542854/350 (1000:M:1:3:258)
[2010-03-03 04:02:11]
No more job to cancel !
[/code]Perfecto! And it wrote the final cancelled pair with a "CANCELLED" record to the joblist.txt!!!

Even better: As a result of trying to simplify the code due to adding the extra null record, I came up with a better way to loop until all pairs were cancelled. See the green code above. If you look in my previous code, it had a convoluted way of opening and closing workfile.txt and then deleting it at the end so that the loop would end.

I can't explain it. Maybe you can Karsten. But for some reason, it has to have one final null cancelled record sent in order to correctly write the final "CANCELLED" record to joblist.txt.
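
For anyone who wants to reproduce the file manipulation outside of Perl, the rewrite step can be sketched in Python roughly like this. It is only an illustration of the logic in jobCancel, not part of do.pl: the function name `rewrite_workfile` is made up, the file is modelled as a list of lines, and the "k n" layout of the pair lines is an assumption.

```python
def rewrite_workfile(lines, num_results):
    """Drop already-processed pairs and append the extra empty line.

    lines: contents of workfile.txt, header first, without newlines.
    Returns (new_lines, num_cancel), where num_cancel is how many times
    "llrnet -c" would then be invoked (remaining pairs plus the null line).
    """
    header, pairs = lines[0], lines[1:]
    remaining = pairs[num_results:]           # skip the first num_results pairs
    new_lines = [header] + remaining + [""]   # "" is the work-around null line
    num_cancel = len(remaining) + 1
    return new_lines, num_cancel


if __name__ == "__main__":
    workfile = ["1000:M:1:3:258"] + [f"100542854 {n}" for n in range(344, 351)]
    new_lines, num_cancel = rewrite_workfile(workfile, 2)
    print(num_cancel)  # 6
```

With two results already processed and five pairs left, that gives six cancel calls, which lines up with the screen output above: five "Cancelling" lines followed by "No more job to cancel !".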

Houston, we have lift off! I can think of no other bugs at this moment in the Linux code. But to be sure, I'm going to rerun Riesel and Sierp base 2 and Riesel base 3. I'll also run a new test for Sierp base 3. I'd like to see this incorrect sign again on a Proth Frobenius PRP. If we see that more than once, that will be another issue to take up with Jean. We now have 2 of them: That one and the infinite loop on a rounding error.

Attached is the updated do.pl script. Karsten, you'll notice that I changed the verbiage to "trial_factored" and "Frobenius_PRP" (where applicable; from "factored" and "Prime!") in the results conversion to be in sync with your script code. I suppose technically my prior code did have a bug there. It was sending a PRP as though it was an actual prime. Not a good thing if someone ends up encountering a composite PRP.

Hopefully you can somewhat simulate the logic of the above code in your Windows script. I suspect it will be easier than trying to mess around with .lua code somewhere.

Later today after final testing, I'll start checking all of the files in the client vs. what is in the Windows client. I think I have them all the same. After that, the only remaining thing is to correct the README.txt file per your previous comments in the PMs.

Once we have everything synced up, I'll then want to run one final "parallel" test on these n=1 to 1000 files that I have for both the Windows and Linux clients to verify that we get exactly the same output in the results on the server. That's the important thing.

WHEW!!

Enjoy! :smile:


Gary

gd_barnes 2010-03-03 10:48

One thing I dare you guys to try: :smile:

A base that is a power of another base or a k-value that is a multiple of the base. Examples: bases 4, 9, or 16 -or- 21*3^n-1.

Trust me, it won't work a fairly high percentage of the time. :-)

I know this because I initially tried the same k-value that I had for base 2 (the one Karsten used) for base 3. But the k-value was divisible by 3. I ended up with quite a few rejected pairs that had nothing to do with the script.

The problem is, LLR "reduces" the tests to the lowest k-value or base (I think in some but not all situations). Well, don't quote me on that on base 9. Base 9 might work. But we all know that for any power-of-2 base, it uses base 2. Subsequently, when it does this conversion, the test doesn't match a pair in the server and it is rejected. :-(

This is another reason why we don't want to get too much into trying to debug other bases at this point.

That said, what we code here may be completely correct as far as it can be for other bases. That is, the only remaining issues after we're done here for other bases may just be as a result of LLR quirks that reduce bases or k's to their lowest possible. In other words, we may not be able to refine the code any further without actually putting "conversion patches" in place to convert the result back to the original k and base...but that would be kind of a ridiculous task to undertake.

Max, does PRPnet work for bases 4 or 16 when running LLR? If so, how does it match the pair when the result coming back from LLR is base 2?


Gary

kar_bon 2010-03-03 23:36

i've tried your suggestion of an extra line at end of the workfile.txt to cancel the last pair,
but it's not working on WIN!

another idea was the last pair twice in the workfile.txt and it seems to work!

mdettweiler 2010-03-04 01:30

First of all, way cool on getting the canceled pairs issue fixed! It sounds like even though Karsten's solution ended up being slightly different, they both should work equivalently on the outside so we should be all set now for both scripts. :w00t:
[quote=gd_barnes;207221]One thing I dare you guys to try: :smile:

A base that is a power of another base or a k-value that is a multiple of the base. Examples: bases 4, 9, or 16 -or- 21*3^n-1.

Trust me, it won't work a fairly high percentage of the time. :-)

I know this because I initially tried the same k-value that I had for base 2 (the one Karsten used) for base 3. But the k-value was divisible by 3. I ended up with quite a few rejected pairs that had nothing to do with the script.

The problem is, LLR "reduces" the tests to the lowest k-value or base (I think in some but not all situations). Well, don't quote me on that on base 9. Base 9 might work. But we all know that for any power-of-2 base, it uses base 2. Subsequently, when it does this conversion, the test doesn't match a pair in the server and it is rejected. :-(

This is another reason why we don't want to get too much into trying to debug other bases at this point.

That said, what we code here may be completely correct as far as it can be for other bases. That is, the only remaining issues after we're done here for other bases may just be as a result of LLR quirks that reduce bases or k's to their lowest possible. In other words, we may not be able to refine the code any further without actually putting "conversion patches" in place to convert the result back to the original k and base...but that would be kind of a ridiculous task to undertake.

Max, does PRPnet work for bases 4 or 16 when running LLR? If so, how does it match the pair when the result coming back from LLR is base 2?


Gary[/quote]
With PRPnet, since it handles queueing of multiple pairs in a batch itself rather than leaving that to LLR (i.e., it passes each individual pair to LLR by itself in an input file then stores the result in the server_name.save file), it just takes the residual and stores it for a test regardless of whether the actual test was done for a different base. That is, for base 16, the server would record the base 2 residual under the base 16 number.

For LLRnet, the situation's a little different; the old LLRnet client uses a version of LLR that doesn't convert power-of-2 bases anyway, so it just does a PRP test on base 16, but v3.8 converts to base 2 and outputs to lresults.txt in that format, so do.pl/do.bat would end up sending it to the server with the base 2 n-value (but with the base 16 NewPGen header). Needless to say, the results would be rejected. What I'd recommend doing as a standard course of action is to first convert the sieve file to base 2 before loading it into the server; that's what I've always done with power-of-2 bases on LLRnet, and that way it would ensure it works even with the old LLRnet client as well.
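
The conversion itself is just the identity k*(2^m)^n = k*2^(m*n). A minimal Python sketch of that arithmetic, assuming nothing about the sieve file layout (the function name `to_base2` is made up, and rewriting the NewPGen header is not shown):

```python
def to_base2(k, base, n):
    """Rewrite k*base^n as k*2^(m*n) when base == 2^m; returns (k, 2, m*n)."""
    m = base.bit_length() - 1
    if base < 2 or (1 << m) != base:
        raise ValueError("base is not a power of 2")
    return k, 2, m * n


if __name__ == "__main__":
    print(to_base2(5, 16, 10))  # (5, 2, 40)
```

So a base 16 pair with n=10 would be loaded into the server as the base 2 pair with n=40, and the n-value that v3.8 reports back would then match.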

BTW: I've uploaded the latest version of do.pl to the respective Windows and Linux client package links.

gd_barnes 2010-03-04 04:00

[quote=kar_bon;207280]i've tried your suggestion of an extra line at end of the workfile.txt to cancel the last pair,
but it's not working on WIN!

another idea was the last pair twice in the workfile.txt and it seems to work![/quote]

This will not necessarily look good to the user. At first I tried that with the Linux script and yes, it ended up correctly writing all of the "CANCEL" records to joblist.txt. But it also ends up displaying on the screen that the last pair was canceled twice. To me, that would be kind of unprofessional.

Are you sure that you actually tried writing a "null" line at the end and not just one with a single space in it? In Perl, the code is:

print WKF "\n";

Max, you said that the do.pl script should work in Windows. Can you try cancelling pairs with it on your Windows machine and see if it properly writes a "CANCEL" record to joblist.txt for all pairs including the final pair? I'm wondering if this is a Windows-related issue or if it is an Awk script related issue. I really don't want the same pair showing up on the screen as having been canceled twice.

gd_barnes 2010-03-04 04:05

[quote=mdettweiler;207289]
With PRPnet, since it handles queueing of multiple pairs in a batch itself rather than leaving that to LLR (i.e., it passes each individual pair to LLR by itself in an input file then stores the result in the server_name.save file), it just takes the residual and stores it for a test regardless of whether the actual test was done for a different base. That is, for base 16, the server would record the base 2 residual under the base 16 number.

For LLRnet, the situation's a little different; the old LLRnet client uses a version of LLR that doesn't convert power-of-2 bases anyway, so it just does a PRP test on base 16, but v3.8 converts to base 2 and outputs to lresults.txt in that format, so do.pl/do.bat would end up sending it to the server with the base 2 n-value (but with the base 16 NewPGen header). Needless to say, the results would be rejected. What I'd recommend doing as a standard course of action is to first convert the sieve file to base 2 before loading it into the server; that's what I've always done with power-of-2 bases on LLRnet, and that way it would ensure it works even with the old LLRnet client as well.

BTW: I've uploaded the latest version of do.pl to the respective Windows and Linux client package links.[/quote]

That answers only a small percentage of the problem that I posed. That is that it converts powers-of-2 bases to base 2.

But it doesn't answer the question of k's that are a multiple of the base. It also doesn't answer the question for bases that are powers of OTHER bases. That is bases like 9, 25, 36, etc. That said, LLR may not convert those other bases. It may only convert powers-of-2 bases. If so, the issue is only on powers-of-2 bases and k's that are a multiple of the base.

The user would have to convert many different things for LLRnet. For that reason, LLRnet should almost never be used for conjecture-type searches. It was never intended for that and our effort here certainly isn't to make it able to do so, so I feel we are in good shape now.

Also, for the above reason, users would have to convert even k's on base 2 to odd k's and increase the exponent. The only instance that I can think of where this is applicable is on Karsten's 2 even k's that are remaining on the Riesel base 2 conjecture. But I'm guessing that he probably already manually converts them when searching.
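
The manual conversion for k's that are a multiple of the base is simple arithmetic: k*b^n = (k/b)*b^(n+1), repeated until k is no longer divisible by b. A small Python sketch of that identity follows. It is not LLR's actual code, and I'm not claiming this is exactly when LLR applies the reduction; the function name `reduce_k` is made up.

```python
def reduce_k(k, base, n):
    """Move factors of `base` out of k and into the exponent.

    Returns (k', n') with k * base**n == k' * base**n' and k' not divisible by base.
    """
    if k <= 0 or base < 2:
        raise ValueError("need k > 0 and base >= 2")
    while k % base == 0:
        k //= base
        n += 1
    return k, n


if __name__ == "__main__":
    print(reduce_k(21, 3, 10))  # (7, 11)
    print(reduce_k(12, 2, 5))   # (3, 7)
```

So 21*3^n-1 would have to be loaded as 7*3^(n+1)-1, and an even k on base 2 becomes an odd k with a larger exponent, for the result coming back to match a pair in the server.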


Gary

gd_barnes 2010-03-04 04:13

Guys,

Another quirk:

Karsten, on your "Frobenius_PRP" code that is written to the residue in the tosend.txt file, LLRnet still converts that to "prime!" in the results on the server. Obviously it writes "prime!" to any record where the "prime code" is 0 (vs. -2). Therefore me changing my code to be synced up with yours for Frobenius PRPs had no effect. I'm sure the same issue is there for Windows unless you made some .lua code change somewhere; likely on the server side.

That said, I agree with leaving the script code like you have it. At some point, perhaps it can be fixed where the server shows the correct thing in the results. By having the correct script code in there, that won't have to be changed at that point. If the server results can be fixed easily with this release, then let's do it. If not, let's wait until a later release.

An enhancement:

I've added code to the pairs cancellation that I think is very beneficial to the user. I've added displays to the screen that show how many completed results are sent and how many pairs are canceled when the user does do.pl -c. Here is the way that the previous example looks now that I've made the change:

[code]
Sending 20 completed results.
[2010-03-03 04:02:09]
Cancelling 5 pairs.
Cancelling : 100542854/346 (1000:M:1:3:258)
[2010-03-03 04:02:10]
Cancelling : 100542854/347 (1000:M:1:3:258)
[2010-03-03 04:02:10]
Cancelling : 100542854/348 (1000:M:1:3:258)
[2010-03-03 04:02:10]
Cancelling : 100542854/349 (1000:M:1:3:258)
[2010-03-03 04:02:10]
Cancelling : 100542854/350 (1000:M:1:3:258)
[2010-03-03 04:02:11]
No more job to cancel !

[/code]Karsten, you might consider adding the same enhancement.

After I complete retesting my script changes for Riesel/Sierp bases 2 and 3 for n=1 to 1000, I will repost what I consider will be the "final" version of the Linux do.pl script for this release. I'll then load that puppy on ALL my quads and get that speed boost I've been waiting for for eons. :-)


Gary

mdettweiler 2010-03-04 05:28

[quote=gd_barnes;207304]That answers only a small percentage of the problem that I posed. That is that it converts powers-of-2 bases to base 2.

But it doesn't answer the question of k's that are a multiple of the base. It also doesn't answer the question for bases that are powers of OTHER bases. That is bases like 9, 25, 36, etc. That said, LLR may not convert those other bases. It may only convert powers-of-2 bases. If so, the issue is only on powers-of-2 bases and k's that are a multiple of the base.

The user would have to convert many different things for LLRnet. For that reason, LLRnet should almost never be used for conjecture-type searches. It was never intended for that and our effort here certainly isn't to make it able to do so, so I feel we are in good shape now.

Also, for the above reason, users would have to convert even k's on base 2 to odd k's and increase the exponent. The only instance that I can think of where this is applicable is on Karsten's 2 even k's that are remaining on the Riesel base 2 conjecture. But I'm guessing that he probably already manually converts them when searching.


Gary[/quote]
Actually, the reason why I didn't quite answer your whole question is that beyond what I did answer, I really have no idea. :smile: I'm not sure if it autoconverts other powers of bases.

As for multiples of bases, it would be treated the same way: PRPnet would handle it properly (take the residual and pair it with the right pair regardless of what LLR converts it to), while do.pl/do.bat due to their alternate method of batch handling would not (rather they would try to send the server results for the converted pairs, which would be rejected). Fortunately, as you said, that's not really an issue for the types of work LLRnet is best for (straight-up contiguous prime searches), for which nobody would be searching multiple-of-base k's anyway. The only times anybody would bother with those is for conjecture searches, for which you'd want to use PRPnet anyway since it can stop a k after a prime is found.

BTW, I do have a Perl script sitting around that I use for converting power-of-2 bases to base 2 (since until recently you had to do that even with PRPnet to ensure it tested with LLR instead of doing a PRP with PFGW). It's not pretty since it's got a lot of hardcoded stuff that I need to mess with manually for each run, but I imagine I could touch it up for more straightforward usage if there's sufficient interest.

mdettweiler 2010-03-04 05:58

[quote=gd_barnes;207303]Max, you said that the do.pl script should work in Windows. Can you try cancelling pairs with it on your Windows machine and see if it properly writes a "CANCEL" record to joblist.txt for all pairs including the final pair? I'm wondering if this is a Windows-related issue or if it is an Awk script related issue. I really don't want the same pair showing up on the screen as having been canceled twice.[/quote]
Okay, tested this and it works on Windows.

kar_bon 2010-03-04 06:40

[QUOTE=gd_barnes;207305]An enhancement:
I've added code to the pairs cancellation that I think is very beneficial to the user. I've added displays to the screen that show how many completed results are sent and how many pairs are canceled when the user does do.pl -c. Here is the way that the previous example looks now that I've made the change:
[/QUOTE]

the counting is not easy with normal DOS-batch commands ('find' can count lines but the result is a line of text, not just a number!).
i'll try to find another solution.

to the cancel-test:
i've paused the script during the cancellation and added a blank line manually. after the first pair was cancelled, the blank line was gone (-> llrnet rewrites the workfile.txt!), so i added it again. but i did not see the effect you described for perl.

suggestion:
let this be as it is in windows for now and i'll keep searching for the problem in the llrnet.lua. eliminating the issue there is the only right thing for 'real' programmers!

gd_barnes 2010-03-04 06:49

1 Attachment(s)
[quote=mdettweiler;207312]Okay, tested this and it works on Windows.[/quote]

Very good. Thanks for doing that Max. Karsten, is there a way to simulate a null line in the Awk script? If not, can you somehow suppress the 2nd display of the cancellation of the final pair on the screen?

In the meantime, I've now fully tested the Linux client on Riesel base 2, Sierp 2, Riesel 3, and Sierp 3. All is working as it should.

The only remaining quirks that I am aware of that cannot be easily coded for at this time are:
1. Bases that are powers of 2, and k's that are multiples of the base, won't work. (I haven't tested it, but after thinking a bit about it, I suspect that LLR does not convert powers of other bases such as bases 9, 25, 36, etc.)
2. A Frobenius PRP will show up as "Prime!" in the server results. (Perhaps Karsten can make a change to a server file for this issue.)

Attached is the final do.pl script. The only change from the last one was to add the displays of the # of results completed and # of pairs cancelled when doing the do.pl -c cancellation.

I'm now out of time to carefully review and sync up all files in the Linux and Windows clients and run some parallel tests before leaving on my trip. I'll have some limited time to do that while I'm gone.

Here is what I think would get this done as quickly as possible:

1. Karsten look into that final cancelled pair issue.
2. Karsten, I would suggest testing all n=1 to 1000 for Riesel base 2, Sierp 2, Riesel 3, and Sierp 3 for a large k. (Perhaps you've already done this.)
3. Karsten, is it possible to quickly fix the issue with "prime!" showing up in the server results for Frobenius PRPs? If not, don't worry about it.
4. I will Email both of you guys my entire client.
5. Max, please find the PM from Karsten where he says that some corrections are needed in README.txt on the Linux side. I did a partial rewrite of some of your wording to 3rd person and ended up with some incorrect statements in the 3rd para. as well as the word "code" used 3 different times in the 1st sentence of the 1st para.
6. Max, please download Karsten's latest client.
7. Max, for files with the same name, please run that Perl script that you have that compares files to one another looking for differences.
8. Discuss between you how to reconcile any differences between the files.
9. After the client files are synced up, Karsten, please inform Max if there are any changes to the server files that are needed. If so, please coordinate on syncing those up.
10. After all documentation correction/file sync up issues are resolved, Email me and I'll coordinate some parallel testing between the 2 clients. Likely I'll remotely load up 2 servers with the same pairs and I will run a Linux client against one of them and one of you guys can run a Windows client against the other.

I'll be pressed for time on the business trip but I still usually end up doing something for the projects for up to 2 hours each day while I'm gone.

The above is just my two cents so please don't take it as "orders". :-) I'm just trying to get a starting point for the final stuff that needs to be done before we release this publicly. If you guys don't have time or feel differently about who should do what, then by all means, change it up a little.

One final bit of information: For parallel testing on the gb servers, we'll use port 9950 for Windows clients and port 9985 for Linux clients. I'm actually going to load them up right now with some Riesel base 2 tests for n=1 to 1000, although I will not actually start the servers. When we're ready to start them, either Max or I can get them rolling.

Both of you, I wanted to thank you immensely. Karsten especially for conceiving of the idea to begin with. Max for a quick job of writing a Perl script that closely replicated Karsten's script. Even with some of the issues that I found, I was amazed at how close both scripts were in their original state before testing to what will likely be the final version of them before we release this.


Gary

mdettweiler 2010-03-04 08:16

@Gary: sounds like a plan. I've uploaded the latest files to the web.

Could you also send me your latest README.txt file for do.pl? I must not have the latest as mine still has the "code" in three places in one sentence. :smile: I looked in my PM box but couldn't find a PM from Karsten detailing changes that needed to be made to the readme; when I get your latest one I'll read it over in depth and fix any problems I find.

kar_bon 2010-03-04 08:36

[QUOTE=gd_barnes;207314]Karsten, is there a way to simulate a null line in the Awk script? If not, can you somehow suppress the 2nd display of the cancellation of the final pair on the screen?[/quote]

a 'null' line as you did it in the do.pl script ('\n'), i.e. a newline char, is the same in WIN!
as i mentioned, llrnet.exe (or the executable in Unix) will (so the llrnet.lua says) rewrite the workfile.txt after one pair was cancelled, so this 'newline' at the end should be deleted! please test this again: pause the script in the for-loop when "llrnet -c" is called!
look at workfile.txt to see if the last empty line is still there! (the "print WKF "\n";" is before the loop)

[quote]
2. A Frobenius PRP will show up as "Prime!" in the server results. (Perhaps Karsten can make a change to a server file for this issue.)
(...)

1. Karsten look into that final cancelled pair issue.
2. Karsten, I would suggest testing all n=1 to 1000 for Riesel base 2, Sierp 2, Riesel 3, and Sierp 3 for a large k. (Perhaps you've already done this.)
3. Karsten, is it possible to quickly fix the issue with "prime!" showing up in the server results for Frobenius PRPs? If not, don't worry about it.
[/QUOTE]

to 1:
i'll try to fix this problem. i think it has to do with how the pair list is handled in the whole server-client-thing: the queue holds all kn-pairs from workfile.txt except the first one, which is held in a global declaration of t/k/n values in the source. so if only one pair is in workfile.txt, the queue is empty and t/k/n is filled with that one pair!

to 2:
i will test when all is ready for the WIN-version.

to 3:
this is no problem, but what do you want the results to look like?
so we could say PRP is result="-3" and the server puts "... is PRP" instead of prime/not prime. but this has consequences for processing those result files: you have to support PRP in there, too (e.g. i've used my processing script for a long time -> changes would have to be made!).
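
To picture the layout described under "to 1", here is a minimal Python sketch. It is a toy model and not the actual llrnet.lua code: the whitespace-separated "t k n" line format, the placeholder values, and the names split_work, current and queue are all assumptions made for illustration.

```python
from collections import deque


def split_work(lines):
    """Split workfile lines into (current, queue): the first pair on its own, the rest queued."""
    pairs = [tuple(line.split()) for line in lines if line.strip()]
    if not pairs:
        return None, deque()
    return pairs[0], deque(pairs[1:])


# Placeholder lines, not real work units.
current, queue = split_work(["t 1001 500\n", "t 1001 501\n", "t 1001 502\n"])
print(current, len(queue))

# With a single pair, everything sits in 'current' and the queue is empty.
current, queue = split_work(["t 1001 500\n"])
print(current, len(queue))
```

In this model, a loop that only walks the queue never sees a lone last pair. Whether that is what really happens in llrnet.lua still has to be checked in the source.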

Karsten

kar_bon 2010-03-04 14:22

[QUOTE=mdettweiler;207319]@Gary: sounds like a plan. I've uploaded the latest files to the web.

Could you also send me your latest README.txt file for do.pl? I must not have the latest as mine still has the "code" in three places in one sentence. :smile: I looked in my PM box but couldn't find a PM from Karsten detailing changes that needed to be made to the readme; when I get your latest one I'll read it over in depth and fix any problems I find.[/QUOTE]

here:
[code]
sentence:
"Karsten also made small changes to the LLR 3.8.0 source code to remove some (...)"
is quite false! i've not made changes in the source code! i've only patched (means changed
some binary values) in the cLLR.exe directly!

sentence:
"modified Windows LLR executable, the changes to Linux LLR are as follows:
-------------
Patched "cllr.exe":
4560CC: 00 -> V1 = ... ; Computing U0...done.
456344: 00 -> Starting Lucas Lehmer Riesel prime test of...
-------------"

this is only true for the WIN cLLR.exe! the unix program is totally different. i have not
checked whether this would work in Unix, too. in compiled Win programs (executables),
a string for output ends (mostly, as far as i know) with a hex '00'. so i set only the beginning
(first char) of this text to '00' and the print function has nothing to print -> end of text!

perhaps a note/example for the llr-clientconfig.txt should be given (as it is in the testing).

the remaining text is for the Unix-version only so a separate ReadMe for Win i think?
it's quite better because i use other setting-variables.
[/code]

mdettweiler 2010-03-04 17:04

[quote=kar_bon;207320]a 'null' line as you did it in the do.pl script ('\n'), i.e. a newline char, is the same in WIN!
as i mentioned, llrnet.exe (or the executable in Unix) will (so the llrnet.lua says) rewrite the workfile.txt after one pair was cancelled, so this 'newline' at the end should be deleted! please test this again: pause the script in the for-loop when "llrnet -c" is called!
look at workfile.txt to see if the last empty line is still there! (the "print WKF "\n";" is before the loop)[/quote]
Hmm...well, the strange thing is, printing "\n" to the end of the file works perfectly with do.pl both on Windows and Linux. It definitely makes a difference from how it behaved without the "\n", since this way it actually cancels everything correctly.
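
In case it helps when comparing against the batch version, the step in question amounts to the following, shown as a small Python sketch rather than the Perl from do.pl. The function name is hypothetical. Only the file name workfile.txt and the appended "\n" come from the discussion above.

```python
def append_blank_line(path):
    """Append one newline to the end of the work file, like print WKF "\\n" in do.pl."""
    with open(path, "a") as wkf:
        wkf.write("\n")


# Usage, assuming workfile.txt is in the current directory:
# append_blank_line("workfile.txt")
```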
[quote]to 3:
this is no problem, but what do you want the results to look like?
so we could say PRP is result="-3" and the server puts "... is PRP" instead of prime/not prime. but this has consequences for processing those result files: you have to support PRP in there, too (e.g. i've used my processing script for a long time -> changes would have to be made!).[/quote]
What I might suggest is something like this: have it use result="-2" but put in the residual field "Frobenius PRP". Then have the server piece together the line as follows:
x is [result]! Time: x.x sec.
For a prime, you'd say "prime" in the residual field, or for Frobenius PRP you'd put that there. So it would produce:
x is prime! Time: x.x sec.
x is Frobenius PRP! Time: x.x sec.
Of course this still has the disadvantage of requiring results-handling scripts (not to mention the stats database) to be changed, which is not a particularly ideal prospect.
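
To make the cost concrete, here is a minimal Python sketch of both sides: building the line from the template above and reading it back in a results-handling script. The function names and the candidate string are placeholders, and the real server and result-file formats may differ from this template.

```python
import re


def format_result_line(candidate, label, seconds):
    """Build 'x is [result]! Time: x.x sec.' from the text carried in the residual field."""
    return f"{candidate} is {label}! Time: {seconds:.1f} sec."


# A results-handling script would have to accept the extra label as well.
RESULT_RE = re.compile(
    r"^(?P<candidate>\S+) is (?P<label>prime|Frobenius PRP)! "
    r"Time: (?P<seconds>[\d.]+) sec\.$"
)


def parse_result_line(line):
    """Return (candidate, label, seconds), or None if the line does not match."""
    m = RESULT_RE.match(line.strip())
    if m is None:
        return None
    return m["candidate"], m["label"], float(m["seconds"])


print(format_result_line("3*2^6-1", "prime", 0.5))
print(format_result_line("3*2^6-1", "Frobenius PRP", 0.5))
```

Any script that currently only looks for "is prime!" would skip the second kind of line until it is updated, which is the disadvantage mentioned above.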

Perhaps what might be best, at least for now, is just to leave it as is. The admins would then just know whether they're dealing with pairs that might come up PRP instead of prime; usually for anything small enough to do that, a re-verification run to prove them all for sure with PFGW would be quite trivial.
[quote=kar_bon;207339]here:
[code]
sentence:
"Karsten also made small changes to the LLR 3.8.0 source code to remove some (...)"
is quite false! i've not made changes in the source code! i've only patched (means changed
some binary values) in the cLLR.exe directly!

sentence:
"modified Windows LLR executable, the changes to Linux LLR are as follows:
-------------
Patched "cllr.exe":
4560CC: 00 -> V1 = ... ; Computing U0...done.
456344: 00 -> Starting Lucas Lehmer Riesel prime test of...
-------------"

this is only true for the WIN cLLR.exe! the unix program is totally different. i have not checked whether this
would work in Unix, too. in compiled Win programs (executables), a string for output ends (mostly, as far as i know)
with a hex '00'. so i set only the beginning (first char) of this text to '00' and the print function has nothing to
print -> end of text!

perhaps a note/example for the llr-clientconfig.txt should be given (as it is in the testing).

the remaining text is for the Unix-version only so a separate ReadMe for Win i think?
it's quite better because i use other setting-variables.
[/code][/quote]
Oh! Okay. I didn't realize you just patched the .exe directly. I'll fix that then.
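
For the README, the idea behind that kind of patch can be shown with a minimal Python sketch. It works on a made-up byte buffer, not on cLLR.exe, and it does not use the hex numbers from the excerpt above (it isn't clear from the thread whether those are file offsets or addresses). The function names are hypothetical.

```python
def read_c_string(image, offset):
    """Read the NUL-terminated ASCII string that starts at offset."""
    end = image.index(b"\x00", offset)
    return image[offset:end].decode("ascii")


def blank_c_string(image, offset):
    """Return a copy of image with the first byte of the string at offset set to 0x00."""
    patched = bytearray(image)
    patched[offset] = 0x00   # string now ends immediately, so nothing gets printed
    return bytes(patched)


# Synthetic stand-in for part of an executable: two filler bytes, then two strings.
image = b"\x90\x90Starting test of...\x00next string\x00"
offset = 2

print(repr(read_c_string(image, offset)))
patched = blank_c_string(image, offset)
print(repr(read_c_string(patched, offset)))
```

The file keeps its length and every other byte stays the same, so nothing else in the buffer shifts. Only the first character of the message is overwritten.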


All times are UTC.