mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Raiders of the Lost Primes (https://www.mersenneforum.org/forumdisplay.php?f=87)
-   -   Testing.... (https://www.mersenneforum.org/showthread.php?t=13099)

gd_barnes 2010-02-23 23:47

[quote=kar_bon;206511]check, if you deleted all old files like tosend.txt, workfile.txt, workfile.res and check the entries in llr-clientconfig.txt again![/quote]

Oh, I started completely fresh. I've isolated it to a problem in the client code. Here is the message I'm getting from llrnet.lua:

./llrnet.lua:536: "end" expected (to close 'function' at line 61) near '<eof>'

It then tries 5 times and waits 60 seconds.

Max, did you change something in llrnet.lua without testing it or did I do something dumb? lol

I'm looking into it now.


Gary

kar_bon 2010-02-23 23:53

there's an 'end' missing:

[code]
-- if no unfinished job, then ask a new job to the server
if not t or not k or not n then
    --print("Requesting new job from the server ...")
    t, k, n = GetPair()
    if t and k and n then
        print(format(" Fetching WU #1/%d: %s %s", WUCacheSize, k, n))
        changed = 1
    else
        return
    end
end -- this is the 'end' that was missing
[/code]

gd_barnes 2010-02-24 00:17

1 Attachment(s)
Thanks Karsten. I was also able to figure that out myself and got it working finally.

Max, please test any changes that you make before making a public posting. Nothing extensive. Just simply starting a client to make sure it runs would have saved me a lot of time here. You could connect to any server and start running a pair without completing it. Thanks.

The issue that won't go away: The primes.txt issue is still not fixed. It's still writing primes to the individual directories instead of one level up. Please test that on one of my Linux quads. It has to be tested on a Linux machine. Feel free to use Jeepford or whatever. You can see the options that I'm using in the clients on Jeepford.

Final thing: I was able to fix the problem with the execution of the ./do.pl command. That works now. I found the fix after googling it. Here is what I typed at the command prompt:

perl -i -pe's/\r$//;' do.pl

That did the trick. What must have happened is that it somehow changed the properties of the program itself. I say that because when I copied do.pl from the directory in which I executed the above command to the other directories, they all worked great.

Based on the above two corrections, I am attaching corrected llrnet.lua and do.pl programs to this posting. Please incorporate them with your Linux client and make the same changes to your Windows client.


Gary

gd_barnes 2010-02-24 01:18

1 Attachment(s)
All 31 cores are now finally running on port 9500. We'll see what happens. That's still a heck of a load and I can see the clients waiting several seconds in between the batches of 5 to get pairs; longer than what I would expect. So it's laboring to get them handed out.

It appears the server is still having issues with not accepting some pairs that it should. Even though the prune period is now set to only 15 mins., it's still showing some early pairs in the joblist.txt from over an hour ago by me and nearly 2 hours ago by Karsten. Only one pair has made it to the rejected.txt file so far.

This may be related to my own setup. Even with a commercial internet account, the inner workings of the server may not be robust enough to handle the load even if the LLRnet software could fully handle it. The highest socket that it has used so far is 29. All of the pairs were n=10K-25K with the exception of some initial primes that were n=10K-100K.

Amazingly, I'm going to blast through this in about 4 hours. Even though I loaded ~180K pairs this time around vs. ~30K last time, the tests just move so fast.

For everyone's reference, I'm attaching the knpairs that I loaded into port 9500 this time. IMPORTANT note: This is a tar.lzma compressed file. Linux chose that because it made it smaller than .bz2 or .gz compressed files that it typically creates. And the good news is that it made it just small enough to attach here. But there was still one problem: The attachments here don't accept a .lzma file extension so I had to change it to a .bz2 extension. The full name of the compressed file created was: knpairs-save.txt.tar.lzma. My guess is that you'll have to somehow get it renamed to that for your compression software to be able to read it. Hopefully someone will be able to read it. Please keep in mind that it's NOT a .bz2 compressed file!
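
For anyone trying to open the attachment: a minimal Python sketch, not part of the original post, of one way to check what a mis-named archive really is and to unpack it without renaming. The helper names `sniff_compression` and `list_tar_members` are made up for illustration. The signatures tested are the standard ones for gzip, bzip2 and xz; legacy .lzma streams have no fixed signature, so they fall through to "unknown".

```python
import io
import lzma
import tarfile


def sniff_compression(head: bytes) -> str:
    """Guess the compression format from the first bytes of a file."""
    if head.startswith(b"\x1f\x8b"):
        return "gzip"
    if head.startswith(b"BZh"):
        return "bzip2"
    if head.startswith(b"\xfd7zXZ\x00"):
        return "xz"
    # Legacy .lzma ("LZMA alone") has no magic number to test for.
    return "unknown (possibly legacy .lzma)"


def list_tar_members(data: bytes) -> list:
    """Decompress xz or legacy .lzma data and list the tar members inside."""
    # lzma.decompress auto-detects both the .xz and the legacy .lzma container.
    raw = lzma.decompress(data)
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:") as tar:
        return tar.getnames()
```

To use it, read the downloaded file in binary mode and pass its bytes to `list_tar_members`; the file extension plays no part, so the .bz2 name does not matter.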


Gary

mdettweiler 2010-02-24 03:21

Whoa, what the heck? That's weird. For one, I haven't made any changes to any of the .lua files; that's strictly in Karsten's purview. As for the "perl -i -pe's/\r$//;' do.pl" thing, now THAT is weird. I just tried doing a diff on your file to compare it to mine and it listed every line as different--yet there's no apparent difference.

Oh, wait a minute! Now I know what was up. Yours has Unix line endings, mine has DOS ones. I can see why that would be enough to throw off the ./do.pl thing. I'm not sure how the command you executed fixed it, but hey, I guess it did. :smile:
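
As a rough illustration of what that one-liner does (a Python sketch added for clarity, not code from the thread): `-p` loops over the file line by line, `-i` writes the result back in place, and `s/\r$//` drops a carriage return at the end of each line. The function name `strip_trailing_cr` and the sample script text are made up. A likely reason ./do.pl failed beforehand, though the thread does not confirm it, is that a DOS-format `#!` line makes the kernel look for an interpreter whose name ends in a carriage return.

```python
import re


def strip_trailing_cr(data: bytes) -> bytes:
    # Same effect as Perl's s/\r$// run on every line: remove a CR that
    # sits directly before a newline or at the very end of the data.
    return re.sub(rb"\r$", b"", data, flags=re.MULTILINE)


# Assumed sample content with DOS (CRLF) line endings.
dos = b"#!/usr/bin/perl\r\nprint 1;\r\n"
unix = strip_trailing_cr(dos)

# Before conversion every line differs from its Unix twin, which is why
# diff flagged the whole file even though it looked identical on screen.
every_line_differs = all(
    a != b
    for a, b in zip(dos.splitlines(keepends=True), unix.splitlines(keepends=True))
)
```

A carriage return in the middle of a line is left alone; only line-final ones are removed.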

Now that I tried converting the line endings in my Linux script to Unix format with a text editor, a diff comparison of the two reveals only two very small differences:
[code]18,19c18,19
< $individualPrimeLog = true;
< $iniOptions = "OutputIterations=10000\n";
---
> $individualPrimeLog = false;
> $iniOptions = "OutputIterations=1000000\n";[/code]
Namely, just differences in configuration. So it seems the line endings were the issue; now that's fixed, and will remain so for future uploaded versions.

Speaking of the individual prime log, though, that's weird...I can see you have the latest version since the "diff" lined up with mine, and I could have sworn that you mentioned before that the fix I applied last night on jeepford did the trick. Are you sure you've got the latest version on whatever machine(s) you're seeing this on?

At any rate, though, yes, I'll test this on one of your quads, later tonight if I can. I can't see a reason why it shouldn't work but perhaps I'll turn something up. :smile:

gd_barnes 2010-02-24 05:14

1 Attachment(s)
No matter. I have to fix all of your problems. LMAO

And no, I didn't say the primes issue had been fixed. You found the variable prefix of $ problem and "thought" that was the problem. I didn't have a chance to test it before you left. Obviously the variable prefix was only part of the problem. Clearly it had something to do with the boolean TRUE. That was just NOT going to work no matter what was tried. See below.

I have now fixed and corrected the following 4 problems. 3 existing ones that have already been mentioned and another one:

1. The earlier issue with the missing END in llrnet.lua. (BTW you said you didn't mess with that. You must have. I pulled it right out of your latest upload and it was different than your upload from yesterday. Even Karsten pulled it right out of your upload and independently found the same problem.)

2. The problem with ./do.pl not working.

3. The problem with primes.txt not writing one level up.

4. New problem: You didn't have an IF statement around the beepOnPrime so it always beeped no matter what you set the variable to.

I fixed the primes issue by changing the variable definition to a literal "TRUE" and checking the value of the literal. I repeatedly tried different things using the way you did it with the boolean TRUE. No luck at all. I did the same with the beepOnPrime except that I simply had to add an IF statement. Where was that IF statement anyway?

But we have one potential MAJOR PROBLEM: The server appears to be rejecting a very large majority of primes. This does not appear to be a load related issue. I'm thinking that somehow the formatting that is sent to the server in the tosend.txt file for primes is incorrect.

Attached are updated do.pl and README.txt files. README had to be changed to put quotes around the "TRUE".

BTW, I'm going to suggest a default of "FALSE" on both of these. I think most people will want the beeps off and the primes all in one folder. That's the way I've set them in do.pl.

You don't need to mess with anything on Jeepford. The script is virtually perfect as is. I've been testing just on that box using a completely different server.

#1 PRIORITY NOW:

Now to see if I can figure out what is causing the server to not accept so many primes. Max, Karsten, and whoever is available; that is a deal breaker whereas these other things were nitpicky issues. Please take a look at how a prime is being formatted on the Linux client when put into tosend.txt to send to the server. I'm fairly confident of that now and that it's a problem only on the Linux client. I was seeing all kinds of rejected messages on the client but virtually no pairs in rejected.txt on the server. Whenever that message occurred, I would look up in the group of 5 pairs processed and there it was; at least one prime every time. Interestingly, not all primes are rejected. But the reverse situation happens all of the time; that is: If there is a rejected message on the client, there is ALWAYS a prime in the group of pairs processed.


Gary

mdettweiler 2010-02-24 06:19

[quote=gd_barnes;206526]1. The earlier issue with the missing END in llrnet.lua. (BTW you said you didn't mess with that. You must have. I pulled it right out of your latest upload and it was different than your upload from yesterday. Even Karsten pulled it right out of your upload and independently found the same problem.)[/quote]
Actually, I didn't mess with it; all I did was swap in Karsten's then-latest llrnet.lua version for yesterday's upload, and I didn't test it since I figured he'd already done that. Then when he pointed out the typo today, I fixed it and included that in my upload. :smile:

[quote]2. The problem with ./do.pl not working.

3. The problem with primes.txt not writing one level up.

4. New problem: You didn't have an IF statement around the beepOnPrime so it always beeped no matter what you set the variable to.

I fixed the primes issue by changing the variable definition to a literal "TRUE" and checking the value of the literal. I repeatedly tried different things using the way you did it with the boolean TRUE. No luck at all. I did the same with the beepOnPrime except that I simply had to add an IF statement. Where was that IF statement anyway?[/quote]
Ah, whoops, thanks for fixing the beepOnPrime thingy--not sure how I missed that one. :rolleyes: I'm not sure why the straight boolean shouldn't work, though. Sticking a boolean variable in a conditional test should work...unless I'm getting my programming languages confused and Perl can't actually do that. :geek: (It would be rather strange if it couldn't, though, as it would seem to be a basic programming concept.)

Come to think of it, though, I believe I did have similar issues with boolean variables in some earlier scripts I wrote (the GB server status page scripts, I think). I'm beginning to wonder if it really is just an idiosyncrasy of Perl.

At any rate, okay, as long as the string literal's working, then we may as well stick with that.
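
One plausible explanation, offered as an assumption since the thread never shows whether do.pl ran under `use strict`: without strict mode Perl treats the barewords `true` and `false` as the strings "true" and "false", and any non-empty string other than "0" counts as true. The sketch below models that rule in Python (the function names are invented for illustration) and shows why comparing against a string literal, as in the fix quoted above, behaves as intended.

```python
def perl_truthy(value):
    """Approximate Perl's scalar truth test: undef, 0, "0" and "" are false."""
    if value is None:                      # stands in for undef
        return False
    if isinstance(value, (int, float)):
        return value != 0
    return value not in ("", "0")


def flag_enabled(value):
    """String-literal check: only the exact text "TRUE" switches the flag on."""
    return value == "TRUE"


# Without strict mode, `$flag = false;` stores the string "false".
for setting in ("true", "false"):
    print(setting, "->", perl_truthy(setting))    # both print True

for setting in ("TRUE", "FALSE"):
    print(setting, "->", flag_enabled(setting))   # True, then False
```

Under this model a flag set with a bare `false` can never switch anything off, which matches the symptom of the prime log always being written to the individual directories. Integer flags (1 and 0) avoid the problem as well.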

[quote]But we have one potential MAJOR PROBLEM: The server appears to be rejecting a very large majority of primes. This does not appear to be a load related issue. I'm thinking that somehow the formatting that is sent to the server in the tosend.txt file for primes is incorrect.[/quote]
Oh, that's right! The code for generating tosend.txt is taken verbatim from an earlier standalone script I wrote a long while back to do that job for manually cached pairs; as I recall, it never did work well for primes, so I always had to submit those separately with "normal" LLRnet. It was a long enough time ago that I'd completely forgotten.

I'll try to cook up a solution (probably re-write the whole section, it's not too big) either tonight or tomorrow.

[quote]Attached are updated do.pl and README.txt files. README had to be changed to put quotes around the "TRUE".

BTW, I'm going to suggest a default of "FALSE" on both of these. I think most people will want the beeps off and the primes all in one folder. That's the way I've set them in do.pl.[/quote]
Okay. I set it to beep by default since Karsten's script does that (and it's not even configurable there). :smile: But I can see why many users wouldn't want that. As for the primes in one folder, I guess the reason why I set that to false was because that's how I'd use it--my clients are all in subfolders under a main prime/ directory, which includes folders for all my other prime search applications, so a primes.txt file would kind of get lost there. :smile:

I've updated the do.pl and readme.txt files on my local copy and uploaded the latest version to the web site. BTW, I changed OutputIterations back to 10,000 rather than 1,000,000 as it was in your version of the script; 1M is rather big for a default value in a client that's going to be distributed to users, since for tests of the sizes that NPLB is primarily doing, having no interim progress reports is definitely not what most people will want.

[quote]You don't need to mess with anything on Jeepford. The script is virtually perfect as is. I've been testing just on that box using a completely different server.

#1 PRIORITY NOW:

Now to see if I can figure out what is causing the server to not accept so many primes. Max, Karsten, and whoever is available; that is a deal breaker whereas these other things were nitpicky issues. Please take a look at how a prime is being formatted on the Linux client when put into tosend.txt to send to the server. I'm fairly confident of that now and that it's a problem only on the Linux client. I was seeing all kinds of rejected messages on the client but virtually no pairs in rejected.txt on the server. Whenever that message occurred, I would look up in the group of 5 pairs processed and there it was; at least one prime every time. Interestingly, not all primes are rejected. But the reverse situation happens all of the time; that is: If there is a rejected message on the client, there is ALWAYS a prime in the group of pairs processed.[/quote]
As I mentioned above, I'm quite sure this is an issue with my script. I should be able to fix it pretty easily.

gd_barnes 2010-02-24 06:35

What's up with that llrnet.lua bug Karsten? lol

I'm catching up with your Perl knowledge Max. Better watch out. lol (not likely) Regardless, in looking at the Perl tutorial I've been using, it says nothing about boolean variables like there are in other languages like C and Pascal. Clearly they either don't exist or you need to define them in an entirely different way.

You set the prime thingie to FALSE before? Haven't you had that as TRUE all along? It was me who just now set it to FALSE so it would finally write one level up after I fixed the code. It took forever to get it to write a level up like Karsten's script. Personally the only reason I could ever see for it to be set to TRUE is if you are only running one core on a machine.

Agreed on the beep default being the same as Karsten's script and the default being 10,000 iterations. Karsten, by chance, would you want to add an option to allow the user to turn the beep on or off on the Windows client? Personally I find the beep annoying especially if I was about to go to sleep and I was searching small n-ranges on my 2 machines in my bedroom. :-)

About the iterations, I'm not sure why people would want continual status updates on an individual pair. It only sucks CPU cycles. You have a good idea how long each pair takes by looking in lresults.txt. Now, if you're running tests that take > 1 hour, I could see it. But for our typical 7 to 20 minute tests, I can't see the point. Just my 2 cents. I guess you like to see the iterations so perhaps others do too.

I think the reason I don't like the beep is because it reeks of a warning or doing something wrong like a file being deleted or the delete key hit too many times. Other times, it means a program has gone astray.

It's nice to know that I wasn't hallucinating on the servers not processing many of the pairs. I was getting a little worried that we were getting some serious load-related issues. I could see just a couple of straggling missing pairs with that last load but not the several hundred that I was seeing. At first, I didn't realize that it was mostly primes. (It appears to be some composites too but we'll only know when the formatting of the tosend.txt file is fixed.)


Gary

kar_bon 2010-02-24 06:58

[QUOTE=gd_barnes;206529]Agreed on the beep default being the same as Karsten's script and the default being 10,000 iterations. Karsten, by chance, would you want to add an option to allow the user to turn the beep on or off on the Windows client? Personally I find the beep annoying especially if I was about to go to sleep and I was searching small n-ranges on my 2 machines in my bedroom. :-)
[/QUOTE]

yes, i can add 2 options to do.bat:
- one for the iterations counting (10000 by default)
- one for the beeping (ON/OFF)

and Gary, this beeping is the same thing as with snoring:
if you can't stand it, go to a side room! :grin:

for me it's quite good to beep when do.bat ends. my offline pc has a screen but it's not always switched on. so a beep tells me when do.bat is complete. i don't have to estimate when it will be finished, and the pc doesn't spend much time idle without my noticing.

other things to do for me?

mdettweiler 2010-02-24 07:35

[quote=gd_barnes;206529]What's up with that llrnet.lua bug Karsten? lol

I'm catching up with your Perl knowledge Max. Better watch out. lol (not likely) Regardless, in looking at the Perl tutorial I've been using, it says nothing about boolean variables like there are in other languages like C and Pascal. Clearly they either don't exist or you need to define them in an entirely different way.[/quote]
Yeah, that just about settles it then...no boolean variables in perl. That would explain why a lot of scripts I've seen use "1" and "0" integer variables instead.

BTW, way cool on catching up with Perl. It's a really powerful language and since you have somewhat more time available to you to do such things than I do, I imagine you'll be coming up with all sorts of handy-dandy programs. :smile:

[quote]You set the prime thingie to FALSE before? Haven't you had that as TRUE all along? It was me who just now set it to FALSE so it would finally write one level up after I fixed the code. It took forever to get it to write a level up like Karsten's script. Personally the only reason I could ever see for it to be set to TRUE is if you are only running one core on a machine.[/quote]
Ah, whoops, I meant I set it to [i]true[/i], which meant it would [i]not[/i] put the prime log in the parent directory. :smile:

Yes, most of the time when I'm running an LLRnet or PRPnet client on my dualcore, it's just one core; since my dualcore generally does all my "specialty" jobs (as opposed to my quad which just does straight PRPnet), I rarely have both cores doing full-automatic stuff. The only time I do that is during rallies.

But yes, agreed that most people will probably want to have it in the parent directory. That's how I'd do it on my quad, for example. I guess my dualcore is more of a rare case in that regard, so "false" for that setting makes better sense as a default. :smile:

[quote]Agreed on the beep default being the same as Karsten's script and the default being 10,000 iterations. Karsten, by chance, would you want to add an option to allow the user to turn the beep on or off on the Windows client? Personally I find the beep annoying especially if I was about to go to sleep and I was searching small n-ranges on my 2 machines in my bedroom. :-)[/quote]
lol, yeah, I can see how it would be annoying. I'll sometimes use beeps as a way of telling myself that a job is done (since I usually turn off my monitor when I'm not actively using the computer), though that's fallen somewhat into disuse now that I've discovered just how handy stringing up multiple jobs on the command line can be. :smile:

[quote]About the iterations, I'm not sure why people would want continual status updates on an individual pair. It only sucks CPU cycles. You have a good idea how long each pair takes by looking in lresults.txt. Now, if you're running tests that take > 1 hour, I could see it. But for our typical 7 to 20 minute tests, I can't see the point. Just my 2 cents. I guess you like to see the iterations so perhaps others do too.[/quote]
Hmm, I actually guessed you'd want it the opposite way, so you can see the progress. Not having a somewhat-longish test (>30 seconds or so) give progress updates at least periodically kind of drives me nuts in the same sort of way really high n-ranges drive you nuts. :smile: I figure that even when updating every 10000 iterations on a 50K test, the extra CPU time used to do the updates is so negligible that it's not worth worrying about.
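
To put rough numbers on that, a small Python sketch added for illustration. It assumes an LLR test of k*2^n-1 runs for about n iterations and prints one status line every `OutputIterations` iterations; the function name is made up.

```python
def progress_updates(n: int, output_iterations: int) -> int:
    """Approximate number of status lines printed during one test."""
    return n // output_iterations


print(progress_updates(50_000, 10_000))      # 5
print(progress_updates(50_000, 1_000_000))   # 0
```

So at the 10,000 setting a 50K test prints only a handful of lines, while at 1,000,000 it prints none at all.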

One other consideration that I can see being an issue with this is screen spam; the way LLR does its screen output, for most k*2^n-1 numbers the status line is just longer than the size of a standard console window, so each update rolls over into a new line. (This issue doesn't usually happen for PRP or Proth tests, since the word "bit" is a lot smaller than "iteration", which is enough to make it fit.) One way to fix this is to expand the console window to a larger width, say 90 characters (instead of the standard 80). Or one could just use the GUI LLR which I think has plenty of room to avoid rollovers anyway. :smile:

gd_barnes 2010-02-24 08:53

Max, you like to sit there and watch any test > 30 secs. update its iterations? You must have a lot of time on your hands. hardy-har-he-he

I believe I have the primes submission issue fixed. It appears to be just a matter of putting "something" in the residue field, which is subsequently ignored.

Current LLRnet just puts the residue from the previous pair in there. One problem with that: If the first pair or pairs in the file are prime, the residue field is blank and it loses the prime(s)! My solution: Put 16 x's in the residue field on every prime. I'm still testing to see if it works. Initial testing seems to indicate that it does.
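
A minimal Python sketch of the difference between the two approaches, added for illustration. The record layout (k, n, residue, with `None` marking a prime), the sample numbers and the function names are all assumptions; the thread does not show the actual tosend.txt format.

```python
PLACEHOLDER = "x" * 16  # dummy residue for primes, as proposed above


def residues_carry_previous(results):
    """Behaviour as described: a prime reuses the previous pair's residue."""
    out, previous = [], ""
    for k, n, residue in results:
        if residue is not None:       # composite: it has a real residue
            previous = residue
        out.append((k, n, previous))
    return out


def residues_with_placeholder(results):
    """Proposed fix: every prime gets a fixed 16-character placeholder."""
    return [(k, n, PLACEHOLDER if residue is None else residue)
            for k, n, residue in results]


# Hypothetical batch in which the first pair is prime (residue None).
batch = [(1003, 12000, None), (1005, 12000, "0123456789ABCDEF")]
print(residues_carry_previous(batch)[0])     # residue field is empty
print(residues_with_placeholder(batch)[0])   # residue field is PLACEHOLDER
```

With the carry-over approach the first record ends up with an empty residue field, which is the case described as losing the prime; the placeholder version never emits an empty field.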

Karsten, what about your Windows .awk script? How does it handle it? Will it work correctly if the first pair or pairs in the knpairs.txt file are prime? I don't remember it having a problem with any primes, even ones at the beginning of the file but I may not have been specifically looking for it because I didn't have the cores to stress test it a lot.

On another topic: Should we use Karsten's client with the "awk" script or Max's client with the Perl script for the public Windows client?

Yet another topic: I haven't yet tested the do.pl -c logic in the Linux client. I've encountered too many other issues to test it.

A mistake I made in all of this was assuming that the Linux client would be virtually perfect since the Windows client was, but it turned out to have several problems. I started out assuming that many of the issues were load related, which took quite a bit of extra time. What I should have done is just continue testing on one quad until all of the issues were out of the way. After encountering all of the pairs (that mostly turned out to be primes) not submitting on the big test today, I've found and been able to fix all 5 issues (including this prime submission one) just by testing on one quad.

I'll test this prime submission issue a little more tonight and if it looks good, I'll post an updated script. The defaults as discussed by Max will be set to 10000 iterations with the beep set to true and individualPrimeLog set to false. I'll then be able to (hopefully) get a clean major stress test on Weds.


Gary


All times are UTC.