mersenneforum.org

Raiders of the Lost Primes: Testing.... (https://www.mersenneforum.org/showthread.php?t=13099)

kar_bon 2010-02-24 09:07

[QUOTE=mdettweiler;206532]One other consideration that I can see being an issue with this is screen spam; the way LLR does its screen output, for most k*2^n-1 numbers the status line is just longer than the size of a standard console window, so each update rolls over into a new line. (This issue doesn't usually happen for PRP or Proth tests, since the word "bit" is a lot smaller than "iteration", which is enough to make it fit.) One way to fix this is to expand the console window to a larger width, say 90 characters (instead of the standard 80). Or one could just use the GUI LLR which I think has plenty of room to avoid rollovers anyway. :smile:[/QUOTE]

I've not yet tested the LLR-GUI version with this script, because you first have to put at least these options into llr.ini:
[code]
PgenInputFile=t17_b2.prp
PgenLine=1
[/code]

- PgenInputFile: the test file
- PgenLine: the line to start at

That seems to work without the script, but you have to close LLR-GUI by hand!?

The long line in the cLLR output is the result line, for example:

7095*2^137576-1 is not prime. LLR Res64: 7E4DD3D5AA6CE447 Time : 15.954 sec.

This stays under 80 characters only for smaller k values.
There are two spaces that could be omitted, which would help (but for larger k the line is again longer than 80 chars).
So the best way is to set the output window to a width of 100 or so.
In Windows, the properties of a CMD window can be saved for all windows (I do it that way, so no problems)!
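To put numbers on the 80-column problem, here is a small C sketch that measures a result line using the exact layout of the sample line above. The layout is taken from that one line only (other cLLR messages, e.g. times in ms, differ), and `result_line_length` is a hypothetical helper, not part of cLLR.

```c
#include <stdio.h>

/* Hypothetical helper: length of a cLLR "not prime" result line, assuming the
   exact layout of the sample line above. snprintf(NULL, 0, ...) only counts
   the characters; it writes nothing. */
int result_line_length(unsigned long long k, unsigned long n,
                       const char *res64, double seconds)
{
    return snprintf(NULL, 0,
                    "%llu*2^%lu-1 is not prime. LLR Res64: %s Time : %.3f sec.",
                    k, n, res64, seconds);
}
```

With k=7095, n=137576 this gives 76 characters; the same line with a 9-digit k such as 742837095 gives 81, one more than a standard 80-column console.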

kar_bon 2010-02-24 09:16

[QUOTE=gd_barnes;206534]Karsten, what about your Windows .awk script? How does it handle it? Will it work correctly if the first pair or pairs in the knpairs.txt file are prime? I don't remember it having a problem with any primes, even ones at the beginning of the file but I may not have been specifically looking for it because I didn't have the cores to stress test it a lot.
[/QUOTE]

I've just tested this.
lresults.txt contains this:
[code]
742837095*2^3190-1 is prime! Time : 38.325 ms.
742837095*2^3193-1 is not prime. LLR Res64: 9F1CECDF79EA85C4 Time : 37.998 ms.
742837095*2^3197-1 is not prime. LLR Res64: 711C29862057740B Time : 38.217 ms.
742837095*2^3201-1 is not prime. LLR Res64: 33EB063C96F7756C Time : 39.356 ms.
742837095*2^3205-1 is prime! Time : 39.301 ms.
742837095*2^3207-1 is not prime. LLR Res64: 689F3E5F8EA36E7F Time : 38.635 ms.
742837095*2^3210-1 is not prime. LLR Res64: 9876B6D0612180D6 Time : 38.365 ms.
[/code]

workfile.bak contains this:
5000000000000:M:1:2:258

and do.bat contains this:
gawk -f do_tosend.awk lresults.txt

Put the awk script (do_tosend.awk) and gawk.exe in the same folder and run do.bat.
The result file for LLRnet, tosend.txt, then contains:
[code]
5000000000000:M:1:2:258 742837095 3190 0 0
5000000000000:M:1:2:258 742837095 3193 -2 9F1CECDF79EA85C4
5000000000000:M:1:2:258 742837095 3197 -2 711C29862057740B
5000000000000:M:1:2:258 742837095 3201 -2 33EB063C96F7756C
5000000000000:M:1:2:258 742837095 3205 0 0
5000000000000:M:1:2:258 742837095 3207 -2 689F3E5F8EA36E7F
5000000000000:M:1:2:258 742837095 3210 -2 9876B6D0612180D6
[/code]
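The conversion step can be sketched in C. The awk script itself isn't shown in the thread, so the mapping here is read off the sample files above: each line is prefixed with the workfile.bak header, primes become `0 0`, and composites become `-2 <Res64>`. `to_tosend` is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical C version of the lresults.txt -> tosend.txt step.
   header is the workfile.bak line; out receives one tosend.txt line.
   Returns 1 for a prime/composite result line, 0 for anything else. */
int to_tosend(const char *header, const char *line, char *out, size_t outlen)
{
    unsigned long long k;
    unsigned n;
    int pos = 0;
    char res64[17];

    /* "<k>*2^<n>-1 is ..."; %n records where the verdict text starts */
    if (sscanf(line, "%llu*2^%u-1 is %n", &k, &n, &pos) != 2 || pos == 0)
        return 0;
    const char *verdict = line + pos;

    if (strncmp(verdict, "prime!", 6) == 0) {            /* prime: "0 0" */
        snprintf(out, outlen, "%s %llu %u 0 0", header, k, n);
        return 1;
    }
    if (sscanf(verdict, "not prime. LLR Res64: %16s", res64) == 1) {
        snprintf(out, outlen, "%s %llu %u -2 %s", header, k, n, res64);
        return 1;                                         /* composite: "-2 <Res64>" */
    }
    return 0;
}
```

Lines that are neither verdict (for example a "has a small factor" line) return 0 and are left for the caller to handle.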

All OK, nothing unusual!

gd_barnes 2010-02-24 09:41

Very good. So Karsten chose to put a "0" in the residue for a prime and I chose to put a "xxxxxxxxxxxxxxxx" in the residue for a prime. I chose that because I wasn't sure if it was expecting 16 digits.

Karsten, I got the same type of output as you did except with the 16 x's in the residue for primes.

I'll try it with just "0" also. If it works, I'll go with that to keep the Windows and Linux clients as close to the same as possible. "0" is technically more correct anyway since the residue really is zero if the number is prime.

Max, this confirms that the blank residue that you have in there doesn't work for primes. It just needs "something" in there.

One note of historical interest on all of this: We actually corrected a long-standing LLRnet bug with this. I confirmed by running the old client that if the first few pairs in the file were primes, the server would never get them, because the residue came out blank in the tosend.txt file. Once a composite came through, all later primes got through fine, because LLRnet just kept the previous pair's residue. That's pretty poor coding in the original design.
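A toy model of that bug in C: the residue sits in one variable that only composites overwrite. This is a simplification under stated assumptions (the struct and function names are made up, the result codes follow the tosend.txt sample above, and the real client is Lua plus llr2.c), meant only to show why leading primes came out blank while later primes inherited a stale residue.

```c
#include <stdio.h>
#include <string.h>

/* Simplified model of the old behaviour (hypothetical, not the actual
   LLRnet source): the residue is kept in one variable that only
   composites overwrite, so a prime reports whatever is left in it. */
struct client_state {
    char residue[17];                 /* 16 hex digits + NUL; starts empty */
};

void record_result_buggy(struct client_state *st, int is_prime,
                         const char *res64, char *out, size_t outlen)
{
    if (!is_prime)
        snprintf(st->residue, sizeof st->residue, "%s", res64);
    /* the bug: a prime reuses the stale (or still empty) residue */
    snprintf(out, outlen, "%d %s", is_prime ? 0 : -2, st->residue);
}
```

Fed prime, composite, prime, the first line comes out as `0 ` with an empty residue and the third as `0 9F1CECDF79EA85C4`, carrying the composite's residue.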


Gary

kar_bon 2010-02-24 10:29

[QUOTE=gd_barnes;206537]One note of historical interest on all of this: We actually corrected a long-standing LLRnet bug with this. I confirmed by running the old client that if the first few pairs of the files were primes, the server would never get them because the residue came out as blank in tosend.txt file. Once a composite came through, then all future primes would be good because LLRnet just kept the previous pair's residue. That's pretty poor coding in the original design.[/QUOTE]

I think I found the reason why!

In llrnet.lua the call is:
[code]
result, residue = primeTest(t, format("%s %s", k, n))
[/code]

In Lua a function can return more than one result (even a variable number of results; see [url=http://lua-users.org/wiki/FunctionCallTutorial]here[/url]).

So the Lua code expects two return values: result and residue.

In the LLRnet source, the primeTest function in llr2.c is declared as:
[code]
int primeTest (const char * type, const char * input, char * residue)
[/code]

and it ends with:

[code]
strcpy(residue, res64);
return retval;
[/code]

So in C only the result is returned, as an integer; the residue is copied into the buffer the caller passes in through the residue pointer (an output parameter, not call-by-value).
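The calling convention can be shown with a stand-in that has the same signature as primeTest() in llr2.c. Only the signature and the strcpy-into-residue pattern come from the snippets above; the body and the values it produces are invented. A C function returns one value, so for the Lua line `result, residue = primeTest(...)` to receive two values, the C glue registered with Lua has to push both onto the Lua stack and return 2; that glue isn't shown in this thread.

```c
#include <string.h>

/* Hypothetical stand-in with the same signature as primeTest() in llr2.c.
   Only the calling convention is real; the body and the values it
   produces are made up for illustration. */
int mock_primeTest(const char *type, const char *input, char *residue)
{
    const char *res64 = "9F1CECDF79EA85C4";   /* invented residue */
    (void)type;                                /* unused in this sketch */
    (void)input;
    strcpy(residue, res64);   /* the residue goes back through the pointer ... */
    return -2;                /* ... and only the result is the return value */
}
```

The caller owns the buffer (at least 17 bytes for a 16-digit residue) and reads it after the call.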

Perhaps changing the Lua line to
[code]
result = primeTest(t, format("%s %s", k, n), residue)
[/code]

could do the job right!

I will test it this afternoon.

gd_barnes 2010-02-24 13:14

I'll be interested to see what you come up with, Karsten.

In the meantime, I have confirmed that putting a single "0" (vs. 16 x's) in the residue for primes works on the Linux side. Therefore I have made that an official modification to the Linux do.pl script. In that regard, the Windows and Linux clients are now in sync.

Max, attached is the updated do.pl script. As discussed, I set it to 10000 iterations, beep to true, and individualPrimeLog to false.

I've calculated that a single quad running n=~20K tests is like 1000+ cores running n=~400K tests. Therefore a stress test with a single quad is not only sufficient, it's a lot easier to manage and find bugs from my perspective. I may try a 2nd quad for a while just to "feel like" I'm really stressing the server AFTER I can verify that all small bugs like what I've found so far are fixed.
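A back-of-envelope check of that equivalence in C, under a loudly stated assumption: an LLR test on k*2^n-1 is about n modular squarings of n-bit numbers, so with FFT multiplication its time grows roughly like n^2 log n, and the server load (results per hour) drops by the same factor. FFT size steps and the size of k are ignored, and the helper names are made up.

```c
/* Back-of-envelope model (assumption): test time ~ n^2 * log n, so the
   results per hour from `cores` cores at n_small match those from
   equivalent_cores(...) cores at n_big. */
unsigned ilog2(unsigned long x)
{
    unsigned r = 0;
    while (x >>= 1)
        r++;
    return r;             /* floor(log2(x)) for x >= 1 */
}

double equivalent_cores(double cores, unsigned long n_small, unsigned long n_big)
{
    double ratio = (double)n_big / (double)n_small;
    return cores * ratio * ratio * ilog2(n_big) / ilog2(n_small);
}
```

For one quad, equivalent_cores(4, 20000, 400000) comes out at about 2057 (1600 from the n^2 part alone), so "1000+ cores" looks conservative under this model.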

Max, if you want to see what I've been running the last several hours, look at port 9985. There were 180K+ pairs in there when I started it 3-4 hours ago, and it should dry out within the next 1-2 hours.

The last thing for me to test is the cancellation of pairs on the Linux client. I'll be fairly busy today (Weds.) but will make time to do that test.


Gary

mdettweiler 2010-02-24 15:37

[quote=gd_barnes;206534]On another topic: Should we use Karsten's client with the "awk" script or Max's client with the Perl script for the public Windows client?[/quote]
Definitely Karsten's--since mine requires Perl, it would be a pain in the butt for many Windows users to run. I mainly developed it with Linux in mind, but since Perl is fairly cross-platform, I figured it should work just as well on both and that I'd test it on both platforms. As I said in the readme file, it never hurts to have options. :smile:
[quote=gd_barnes;206548]I'll be interested to see what you come up with Karsten.

In the mean time, I have confirmed that putting a single "0" (vs. 16 x's) in the residue for primes works on the Linux side. Therefore I have made that an official modification to the Linux do.pl script. In that regard, the Windows and Linux clients are now in sync.

Max, attached is the updated do.pl script. Like discussed, I put it at 10000 iterations, beep at true, and individualPrimeLog at false.[/quote]
AMAZING! :w00t: Thanks for fixing that--I should have figured it was something simple like that. :smile: This should end up saving me quite a bit of time since I imagine it would have taken me a little while to find that.

I'll get the new version uploaded shortly. (Edit: done)

[quote]I've calculated that a single quad running n=~20K tests is like 1000+ cores running n=~400K tests. Therefore a stress test with a single quad is not only sufficient, it's a lot easier to manage and find bugs from my perspective. I may try a 2nd quad for a while just to "feel like" I'm really stressing the server AFTER I can verify that all small bugs like what I've found so far are fixed.[/quote]
Okay, that's good. Two quads, I'd think, would be good to make sure we account for even the biggest rally (I think our biggest ones so far may have gone over 1000 cores, with you, Lennart, Beyond, IronBits, and quite a number of others all banging away full-steam).

mdettweiler 2010-02-24 15:47

[quote=kar_bon;206535]i've not yet tested LLR-GUI version with this script because you have to put the llr.ini first to a minimum of the options set:
[code]
PgenInputFile=t17_b2.prp
PgenLine=1
[/code]

- the testfile
- and the line to begin

seems to work without script.
but: you have to end LLR-GUI by hand!?

the long line in the cLLR output is the result like:

7095*2^137576-1 is not prime. LLR Res64: 7E4DD3D5AA6CE447 Time : 15.954 sec.

and this is less than 80 characters only for smaller k-values.
there're two spaces, which could be omitted, so it's possible to do (but for greater k it's again longer than 80 chars).
so the best way is to set the output window to a width of 100 or so.
in Win the properties of a CMD-win can be set for all screens (i use it so no problems)![/quote]
Ah, whoops, that's not quite what I meant...when I said "or one could just use the GUI LLR" I was referring to it in the more general context of running manual LLR. Not that that's used that commonly nowadays, though...

I had actually intended to try the GUI-LLR version with the script just for the fun of it; I hadn't gotten around to it yet with all the "real" issues to worry about, but I imagine it could still be done if you set NoTrayIcon=0 and NoIcon=1 (I think those are the correct options) in llr.ini to have LLR run without a tray icon, and without displaying any window whatsoever. I think the combination of those two should do the trick; at any rate, I know it worked for Riesel Sieve and PrimeGrid, who had to use the GUI LLR in their BOINC setups prior to the introduction of cllr. Of course, they may well have been using a modified version of the GUI LLR.

Anyway, though, not a particularly big deal...as you said, it's not hard to just make the command window bigger. I imagine a similar "permanent" setting should be possible on Linux as well.

mdettweiler 2010-02-24 15:58

Hmm, I just discovered something interesting. It turns out you CAN kill the do.pl script when it can't connect to a server. You just have to get it while it's in "sleeping 60 seconds" mode, which it does after every 5 failed tries. :smile: Guys, which do you feel would be a better way to handle this: just have the script try 5 times then exit (like Karsten's script) or keep trying over and over like it does now (so that unattended clients don't go kapooey after a longish temporary outage)?

Or, the best of both worlds: add another option for it! :grin: That shouldn't be too hard...
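Both behaviours (give up after a few tries, or keep trying forever with a pause after every 5 failures) fit one small policy with an option. A hedged C sketch, not the code of do.pl or the awk client: the function and struct names are hypothetical, and the connect and sleep steps are passed in as callbacks so the loop can be exercised without a network (a fake server is included for that).

```c
#include <stdbool.h>

/* Hypothetical retry policy: make tries_per_round attempts, then wait
   pause_seconds and start over. max_rounds == 0 means "keep trying
   forever"; any other value gives up after that many rounds. */
typedef bool (*connect_fn)(void *ctx);
typedef void (*wait_fn)(unsigned seconds, void *ctx);

int connect_with_retry(connect_fn try_connect, wait_fn wait_for, void *ctx,
                       unsigned tries_per_round, unsigned pause_seconds,
                       unsigned max_rounds)
{
    int attempts = 0;
    for (unsigned r = 0; max_rounds == 0 || r < max_rounds; r++) {
        for (unsigned i = 0; i < tries_per_round; i++) {
            attempts++;
            if (try_connect(ctx))
                return attempts;                /* connected */
        }
        if (max_rounds != 0 && r + 1 >= max_rounds)
            break;                              /* out of rounds: no point sleeping */
        wait_for(pause_seconds, ctx);           /* e.g. "sleeping 60 seconds" */
    }
    return -1;                                  /* gave up */
}

/* Test double standing in for the server: unreachable until a given attempt. */
struct fake_server { int up_at_attempt; int calls; unsigned slept; };

bool fake_connect(void *ctx)
{
    struct fake_server *s = ctx;
    return ++s->calls >= s->up_at_attempt;
}

void fake_wait(unsigned seconds, void *ctx)
{
    ((struct fake_server *)ctx)->slept += seconds;
}
```

With max_rounds = 0 the loop never gives up, which suits an unattended client during a longish outage; with max_rounds = 1 it behaves like a 5-tries-then-exit client.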

P.S. @Gary: I have a Windows client running do.pl on 9975 now and I just ran into a pair with a small factor. :razz: It seemed to handle it pretty well, though--from what I can tell it only rejected the one with the small factor and accepted the rest. I don't have time right now as I'm leaving the house in 15 minutes, but can you check to see if all of these were received (except the small factor of course)?
[code]2013*2^221147-1 is prime! Time : 73.842 sec.
2015*2^50601-1 has a small factor : 3 !!
2015*2^56660-1 is prime! Time : 3.520 sec.
2015*2^63662-1 is prime! Time : 4.571 sec.
2015*2^104784-1 is prime! Time : 15.064 sec.
Submitted to server at [2010-02-24 10:56:39][/code]
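To tell apart the kinds of result lines in this log (prime, composite, small factor), a client could match the fixed phrases cLLR prints. A minimal C sketch, assuming those phrases are exactly as in the samples in this thread; `classify_result` and the enum are hypothetical, and what the server does with a small-factor pair is outside this sketch.

```c
#include <string.h>

/* Hypothetical classifier for cLLR result lines, keyed on the fixed
   phrases seen in the sample logs in this thread. */
enum llr_outcome { LLR_PRIME, LLR_COMPOSITE, LLR_SMALL_FACTOR, LLR_UNKNOWN };

enum llr_outcome classify_result(const char *line)
{
    if (strstr(line, " has a small factor") != NULL)
        return LLR_SMALL_FACTOR;
    if (strstr(line, " is not prime.") != NULL)
        return LLR_COMPOSITE;
    if (strstr(line, " is prime!") != NULL)
        return LLR_PRIME;
    return LLR_UNKNOWN;        /* e.g. the "Submitted to server at" line */
}
```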

kar_bon 2010-02-24 18:24

[QUOTE=kar_bon;206541]i think, i found the reason why!

in llrnet.lua the call is
[code]
result, residue = primeTest(t, format("%s %s", k, n))
[/code]
[/QUOTE]

It's not as easy as I thought, but the following lines will do the trick:
[code]
result, residue = primeTest(t, format("%s %s", k, n))
-- result 0 means prime: give primes the residue "0"
-- so tosend.txt never gets a blank residue
if result == 0 then
    residue = "0"
end
[/code]

So if a prime is found, the residue is set to '0' and all is OK!

Note: this is only needed for the 'old' version of the LLRnet client, not for the script.
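The same rule in C, as a hypothetical helper (`format_result` is not from LLRnet): the residue for a prime is always "0", whatever came before, which gives the `0 0` prime lines seen in the tosend.txt sample earlier in the thread.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper applying the rule of the Lua fix: a prime
   (result == 0) always reports residue "0", so nothing left over from an
   earlier pair can end up in tosend.txt. res64 is ignored for primes. */
void format_result(int result, const char *res64, char *out, size_t outlen)
{
    const char *residue = (result == 0) ? "0" : res64;
    snprintf(out, outlen, "%d %s", result, residue);
}
```

Because the helper keeps no state between pairs, the order of primes and composites in the file no longer matters.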

gd_barnes 2010-02-24 20:34

[quote=kar_bon;206563]it's not so easy as thought, but the following lines will do the trick.
[code]
result, residue = primeTest(t, format("%s %s", k, n))
if result == 0 then
residue = "0"
end
[/code]

so, if a prime is found, set the residue to '0' and all is ok!

Note: not needed for the script, only for the 'old' version of the LLRnet-client.[/quote]

Which file in the old client needs to be changed? We should go ahead and get this corrected for users who like to "fiddle" with code and stuff.

My opinion on what to do when the server dries or goes down: Keep trying. Karsten, this would be a modification to your script. I know firsthand how maddening it would be if all of my clients stopped after a small 5-minute internet blip. IMHO, that should not be an option. It should be the default to keep trying.

Max,

It took me a while, but I finally concluded what you did: it's much better to "kill" the Linux client with the system manager than to use Ctrl-C. Several times I noticed it took 3-4 Ctrl-Cs to kill it, usually on small tests -or- when the server had dried. (Don't quote me on the exact scenarios, but I do know that sometimes it didn't want to "die" on the first Ctrl-C.)

Can you please put something in the documentation about it being best to kill the clients when stopping them?


Karsten,

Can you please modify the Windows script to do the following?:

1. Keep trying to connect when the server is dried or down.

2. Add an option to let the user change how often it tries to connect when the server is down. Both of you use 60 secs., but only the Linux client lets the user change it. Either way, I think 60 secs. is a good default for a value the user can change.

3. To sync up the Windows and Linux clients, would it be too much effort to allow the user an option to set the beep on or off and to put the primes in the current folder or the parent folder?
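The three requests above map naturally onto a small option set with defaults. A hedged C sketch: the field names, the primes.txt file name and the "current folder" default are assumptions; the 60-second retry and beep-on defaults follow values mentioned in this thread.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical option set for the Windows client, mirroring items 1-3 above.
   Field names, the primes file name and the default folder are assumptions. */
struct client_options {
    bool     retry_forever;     /* 1. keep trying when the server is dry or down */
    unsigned retry_seconds;     /* 2. wait between attempts, user-adjustable */
    bool     beep_on_prime;     /* 3. beep on/off */
    bool     primes_in_parent;  /* 3. log primes in ".." instead of "." */
};

struct client_options default_options(void)
{
    struct client_options o = {
        .retry_forever = true,
        .retry_seconds = 60,
        .beep_on_prime = true,
        .primes_in_parent = false,
    };
    return o;
}

/* Build the path of the primes file from the folder option. */
void primes_path(const struct client_options *o, char *out, size_t outlen)
{
    snprintf(out, outlen, "%s/primes.txt", o->primes_in_parent ? ".." : ".");
}
```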


I think it makes sense to have the 2 clients be as close to the same as possible. The above would accomplish that. Note: I think Karsten probably stopped adding options earlier because we had said that we don't want to add any new features. Sorry we're being a little wishy-washy here, Karsten.

One more thing Karsten: There have been so many posts and changes here. Can you provide a link in the next post to your latest client? Sometime today or tomorrow after you make the above changes, I'd like to run a short test on it using 4 cores on my I7.

Gotta run...busy day for me. I'll probably get a little bit of testing in on cancelling pairs on the Linux client before about 5 PM CST.

Thanks everyone.


Gary

kar_bon 2010-02-24 20:48

[QUOTE=gd_barnes;206569]Which file in the old client needs to be changed? We should go ahead and get this corrected for users who like to "fiddle" with code and stuff.
[/quote]

See post #97, first code block: it's llrnet.lua.

[QUOTE=gd_barnes;206569]
One more thing Karsten: There have been so many posts and changes here. Can you provide a link in the next post to your latest client?[/QUOTE]

I'll use the same link as in post #1 for every new version.

I'll try to implement the other options next time; not sure I'll manage all of them today.

BTW: I thought of another helpful output:
when the script starts, first print the most important settings from llr-clientconfig.txt!

[code]
+-------------------------------------+
| LLRnet client V0.9b7 with cLLR V3.8 |
| K.Bonath, 2010-02-10, Version 0.61 |
+-------------------------------------+

Current configuration:
server = "nplb-gb1.no-ip.org"
port = 9950
username = "kar_bon"
WUCacheSize=1
[/code]

That would have saved some time running and checking for errors during the first tests with the script (you know: forgetting to change my username in your settings).
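A minimal C sketch of that startup echo, assuming llr-clientconfig.txt holds one `key = value` setting per line as in the banner above (quotes around values are stripped, lines without `=` are skipped). `parse_setting` and `show_if_important` are hypothetical helpers, and the list of important keys is just the four shown above.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Split one "key = value" line into key and value, trimming blanks and
   surrounding double quotes. Returns 1 for a setting line, 0 otherwise. */
int parse_setting(const char *line, char *key, size_t klen, char *val, size_t vlen)
{
    const char *eq = strchr(line, '=');
    if (eq == NULL)
        return 0;

    const char *ks = line, *ke = eq;                    /* key: [ks, ke) */
    while (ks < ke && isspace((unsigned char)*ks)) ks++;
    while (ke > ks && isspace((unsigned char)ke[-1])) ke--;
    if (ke == ks)
        return 0;                                       /* no key name */

    const char *vs = eq + 1, *ve = eq + strlen(eq);     /* value: [vs, ve) */
    while (vs < ve && isspace((unsigned char)*vs)) vs++;
    while (ve > vs && isspace((unsigned char)ve[-1])) ve--;
    if (ve - vs >= 2 && vs[0] == '"' && ve[-1] == '"') { vs++; ve--; }

    snprintf(key, klen, "%.*s", (int)(ke - ks), ks);
    snprintf(val, vlen, "%.*s", (int)(ve - vs), vs);
    return 1;
}

/* Print the setting if it is one worth checking at startup.
   Returns 1 if something was printed. */
int show_if_important(const char *line)
{
    static const char *important[] = { "server", "port", "username", "WUCacheSize" };
    char key[64], val[256];

    if (!parse_setting(line, key, sizeof key, val, sizeof val))
        return 0;
    for (size_t i = 0; i < sizeof important / sizeof important[0]; i++) {
        if (strcmp(key, important[i]) == 0) {
            printf("%s = %s\n", key, val);
            return 1;
        }
    }
    return 0;
}
```

Calling show_if_important on every line read from the config file before the first test would print just those four settings.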

Suggestions?


All times are UTC.