mersenneforum.org Testing....

2010-02-22, 21:10 #45 gd_barnes     May 2007 Kansas; USA 10011110111111₂ Posts

I'm getting more confused by the minute but I'll just push forth.

Unfortunately guys, you're going to have to completely ignore my prior timeframes. They were applicable only if I encountered no confusion, but I am immediately having a problem with the Linux client. This is on Jeepford only, with the cache set to 5. I haven't tried the others yet.

1. When I type: ./do.pl at the command prompt, I get:
Code:
 bash: ./do.pl: /usr/bin/perl^M: bad interpreter: No such file or directory

2. When I type: perl do.pl at the command prompt, I get:
Code:
 (general greeting message) followed by:
 llr-clientconfig.txt.3: unexpected symbol near "="
 llr-clientconfig.txt.3: unexpected symbol near "="
 llr-clientconfig.txt.3: unexpected symbol near "="
 llr-clientconfig.txt.3: unexpected symbol near "="
 llr-clientconfig.txt.3: unexpected symbol near "="
 Error: could not connect to server after 5 tries. Most likely there is a problem either with your connection or the server.
 Sleeping 60 seconds.

Jeepford has been running LLRnet on gb port 6000, so I know its connection is good.

I'll look into it a little more, but I just wanted people to know it's going to be a while before I get the real test cases loaded into the server and even longer before I get all machines going.

Max, one more thing. I don't see how you could truly test a Linux client on a Windows machine. The executables (binaries) are entirely different except for the Perl script. I would have thought that you would have used one of my quads to test it. I'm mentioning this because the above errors appear to be related to the fact that it was not tested on a Linux machine. (sorry if I'm wrong; just a feeling) I read the README but I can't see anything that I missed.

Assuming that I get these problems on a 2nd machine, I'm going to have to ask that Max run a test on a Linux machine before I load it on any more of my machines.

Gary
Last fiddled with by gd_barnes on 2010-02-22 at 21:21
2010-02-22, 21:15 #46 kar_bon     Mar 2006 Germany 17×167 Posts

check this: in Unix the text files are formatted differently than in WIN: LF versus CR+LF!

username properly set? LLRnet will quit when the user is nobody!

Last fiddled with by kar_bon on 2010-02-22 at 21:16
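The `^M` in the "bad interpreter" error from post #45 is how a terminal displays a carriage return, which would be consistent with do.pl having Windows (CR+LF) line endings: the kernel then looks for an interpreter whose name ends in that extra character. Running `perl do.pl` sidesteps the first-line lookup, which would explain why only `./do.pl` fails. This is a reading of the error text, not something confirmed in the thread. A minimal Python sketch of checking for and removing the carriage returns follows; the function names are made up for illustration.

```python
from pathlib import Path


def has_crlf_shebang(data: bytes) -> bool:
    """Return True if the first line is a #! line that ends in CR+LF."""
    first_line = data.split(b"\n", 1)[0]
    return first_line.startswith(b"#!") and first_line.endswith(b"\r")


def to_unix_line_endings(data: bytes) -> bytes:
    """Convert CR+LF line endings to LF; lines already ending in LF are untouched."""
    return data.replace(b"\r\n", b"\n")


def fix_script(path: str) -> bool:
    """Rewrite the file at `path` with LF endings; return True if it was changed."""
    script = Path(path)
    original = script.read_bytes()
    converted = to_unix_line_endings(original)
    if converted != original:
        script.write_bytes(converted)
        return True
    return False
```

Calling `fix_script("do.pl")` from the client directory would rewrite the script in place, so it is worth keeping a copy of the original first.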
2010-02-22, 21:43 #47 gd_barnes     May 2007 Kansas; USA 5²×11×37 Posts

I thought the same thing, Karsten, but haven't found that it makes any difference.

BUT...I did find my problem. I had assumed that Max put a server in there. It was left blank, causing the unexpected symbol near "=" message. I could not figure out where that was coming from right away. I completely missed it, so my bad. When I put in 9950, it worked.

That said, it only worked with the command:
perl do.pl

But it still gave the same bad interpreter error message when I tried:
./do.pl

Max, I checked that Perl is in the appropriate /usr/bin directory. I also checked that the properties allow the do script to be executed as a program. Is there some other setting that needs to be tweaked on my machines? I've actually run Perl scripts before on my machines using that structure of command with no problem. I think that most experienced Linux users will expect the ./do.pl command to work.

I'm loading it onto the lion's share of my machines now. After that, I'll set up the appropriate test cases in the servers.

Last fiddled with by gd_barnes on 2010-02-22 at 21:44
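A setting left blank is easy to miss by eye, so a quick check before starting the client can help. The sketch below assumes llr-clientconfig.txt consists of `name = value` lines with `--` starting a comment; that format is an assumption here, and the setting names in the example are placeholders rather than the file's real contents.

```python
from typing import List, Tuple


def find_blank_settings(config_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, name) for every `name = value` line with an empty value."""
    blanks = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        # Drop a trailing "--" comment (assumed comment syntax), then whitespace.
        # Limitation: a value that itself contains "--" would be cut short.
        code = line.split("--", 1)[0].strip()
        if "=" not in code:
            continue
        name, value = code.split("=", 1)
        if name.strip() and not value.strip():
            blanks.append((number, name.strip()))
    return blanks


# Hypothetical config text: the port is left blank on line 2.
example = 'server = "example.invalid"\nport = \nWUCacheSize = 5\n'
for line_number, name in find_blank_settings(example):
    print("line %d: %s has no value" % (line_number, name))
```

Reporting the line number and setting name up front is friendlier than the parser's own "unexpected symbol" message, which does not say which setting is empty.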
2010-02-22, 21:50 #48 kar_bon     Mar 2006 Germany 17×167 Posts

that's why i used 'standard' DOS-batch in WIN! gawk and wait are small programs without any setup.

when the servers are ready, give the server/port details!
2010-02-22, 22:32 #49 gd_barnes     May 2007 Kansas; USA 5²×11×37 Posts

OK, after various interruptions from "real life", I have all of the clients on my machines. I'm working on the servers now.

Karsten, the ports will be 9950 and 9975. I'll let you know when they are ready to go. I will shut down 9950 shortly so that I can reload it with some different stuff.
2010-02-22, 23:18 #50 gd_barnes     May 2007 Kansas; USA 5²×11×37 Posts

OK, guys, I have ports 9950 and 9975 loaded with various misc. tests.

Note on 9950: I've deleted all previous pairs and emptied out the joblist.txt. If you try to return a pair from before, it will be rejected. My suggestion is to delete your workfile.txt and tosend.txt files and begin anew.

Port 9950 has:
 A whole bunch of small primes (I've already tested those; see problem below.)
 A bunch of k=3 thru 99 pairs for n=5000 to n=7500.
 A bunch of n=100K-120K tests.

Port 9975 has:
 Essentially what is in the 11th drive.
 A bunch of n=~540K pairs.

Port 9950 is the stress test port.

Max, on your primes.txt file, it is in the incorrect place. It needs to be one level above the clients like what Karsten has. As it is, you would still have to look in all of your cores to see the primes. With the way Karsten has it, you would only have to look once per machine, regardless of the # of cores.

Sure got a lot of beeps...that's good. I put a lot of primes in there.

Gary
2010-02-22, 23:25 #51 mdettweiler A Sunny Moo     Aug 2007 USA (GMT-5) 6249₁₀ Posts

Okay, there have been way too many messages here since I last checked the forum for me to respond to each one individually, so I'll try to summarize:

@Gary: Both scripts should work on Windows equally well, but since Karsten's only uses built-in Windows functions (as opposed to mine, which requires Perl), I expect that his will be more readily usable for most Windows users. Mine's primarily targeted at Linux setups, where Perl is commonly available.

The reason why ./do.pl doesn't work is not entirely clear to me at this moment; obviously the privileges are set properly since it's at least attempting to interpret it, but something's amiss with the first line of the file. At any rate, though, don't worry about it; "perl do.pl" will always work, so if I can't get the problems with ./do.pl figured out in short enough order I'll just change that part of the documentation.

BTW, as for the stuff that needs to be configured in llr-clientconfig.txt: it's all right there in the documentation.

About how the client behaves when it finds a prime: yay, the beeps worked! (Those don't work on my setup so I couldn't test them.) As for the primes.txt file being in the same directory, that's in the documentation too. My script has the option to do it either that way, or in the parent directory like Karsten's script does.
2010-02-22, 23:45 #52 gd_barnes     May 2007 Kansas; USA 5²·11·37 Posts

Max, I read the documentation but it was so long that I missed the part about where the primes would be written. :) Cool option. Gonna have to remove some of the "fluff" there. Regardless, nice work on all of the details. Sorry I missed that one and the fact that the clients didn't already have the port in them.

OK, I found what appears to be a "real" problem: When the server is offline and you try to start a client, it just sits there with a few cryptic messages and won't exit to the command prompt. Multiple attempts at hitting Ctrl-C wouldn't cause it to properly exit. It took 3 kills from the system manager to finally really kill it. That's probably something that needs to be looked into.

I have 31 cores running port 9950 right now. Unfortunately I gotta go now and will be away for about 5 hours. Max, can you check port 9950? My 31 cores are blowing through it. It may need to be reloaded with some stuff. Not sure what at this point.

Also, can you load up NPLB port 6000? It's running a little low. It would be critically low if my cores weren't pulled off of it right now.

Sorry, gotta run but my cores will be on 9950 for the next few hours.

Gary
2010-02-23, 00:14   #53
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

3·2,083 Posts

Quote:
 Originally Posted by gd_barnes
 Max, I read the documentation but it was so long that I missed the part about where the primes would be written. :) Cool option. Gonna have to remove some of the "fluff" there. Regardless, nice work on all of the details. Sorry I missed that one and the fact that the clients didn't already have the port in them.
 OK, I found what appears to be a "real" problem: When the server is offline and you try to start a client, it just sits there with a few cryptic messages and won't exit to the command prompt. Multiple attempts at hitting Ctrl-C wouldn't cause it to properly exit. It took 3 kills from the system manager to finally really kill it. That's probably something that needs to be looked into.
Okay, I'll look into it.
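The thread doesn't say why the client ignored Ctrl-C, so the following is only an illustration of the behavior one would want: a bounded number of connection attempts, a pause between them, and nothing that swallows the interrupt. It is a Python sketch, not the actual do.pl or llrnet code; `connect_with_retries` and its parameters are hypothetical names, with the defaults mirroring the "5 tries" and "Sleeping 60 seconds" messages quoted earlier.

```python
import time
from typing import Callable


def connect_with_retries(
    connect: Callable[[], object],
    tries: int = 5,
    delay_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> object:
    """Call `connect` up to `tries` times, sleeping between failed attempts.

    Returns whatever `connect` returns on success and raises ConnectionError
    once every attempt has failed. KeyboardInterrupt (Ctrl-C) is deliberately
    not caught, so it propagates and ends the loop immediately.
    """
    last_error = None
    for attempt in range(1, tries + 1):
        try:
            return connect()
        except OSError as error:
            last_error = error
            if attempt < tries:
                sleep(delay_seconds)
    raise ConnectionError("could not connect after %d tries" % tries) from last_error
```

A caller might pass something like `lambda: socket.create_connection((host, port), timeout=10)`, where `host` and `port` stand in for the real server details. Only `OSError` is caught, so an interrupt from the keyboard is never mistaken for a failed connection and retried.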

Quote:
 I have 31 cores running port 9950 right now. Unfortunately I gotta go now and will be away for about 5 hours. Max, can you check port 9950? My 31 cores are blowing through it. It may need to be reloaded with some stuff. Not sure what at this point.
I just looked at it and it's at about n=153K on k=2009 (which appears at first glance to be the only k loaded in for that range). My guess is that it will last for the 5 hours until you get home and can make a decision as to what to put in there after that. I'll keep an eye on it, though; if it runs out, I've got a sieve file of smallish base 3 numbers that I use for testing PRPnet that I can load in there. That would be quite an interesting test, though the server would have to run dry before I could load it due to the change in NewPGen header.

Of course, that's assuming I can even find the base 3 file on my hard drive somewhere...it seems to have gotten lost. Oh well, if I can't find it then I'm sure there's plenty of base 2 stuff I could come up with.

Quote:
 Also, can you load up NPLB port 6000? It's running a little low. It would be critically low if my cores weren't pulled off of it right now.
Okay, I'll do that later today if I get the chance. (If it isn't done by ~1 AM your time tonight, go ahead and load it yourself.)

2010-02-23, 03:31 #54 kar_bon     Mar 2006 Germany 17·167 Posts

just getting an error after k=2009 at port 9950 is done:

llrnet.lua:80: bad argument #3 to `format' (string expected, got nil)

the line is

print(format(" Fetching WU #1/%d: %s %s",WUCacheSize,k,n))

so k/n is empty! no pairs? other header in the server?

restart brings up the same error!

the script ended after 5 retries at getting a new job from the server, but this error in llrnet.lua i have to look at then!

about 1650 pairs were done with WUCacheSize=2 without error!

it's 4:30 AM here and i only woke up 20 min ago and thought i should look at the test! and nobody else is online :-( where are the admins when they're needed! i think we have to talk about this

i have to go to bed again. up again in 3 hours!

Last fiddled with by kar_bon on 2010-02-23 at 03:51
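In the quoted call, the format string is argument #1 and WUCacheSize is #2, so "bad argument #3" points at k being nil, which fits the guess that the server handed back no pair. One way to keep the client from crashing is to check for a missing pair before building the message. Here is that check sketched in Python rather than Lua; `fetch_message` is a hypothetical helper, not a function from llrnet.lua.

```python
from typing import Optional


def fetch_message(index: int, cache_size: int,
                  k: Optional[str], n: Optional[str]) -> Optional[str]:
    """Build the "Fetching WU" status line, or return None if there is no pair."""
    if k is None or n is None:
        # No work unit came back; let the caller decide whether to retry or stop.
        return None
    return " Fetching WU #%d/%d: %s %s" % (index, cache_size, k, n)


message = fetch_message(1, 2, None, None)
if message is None:
    print("Server returned no k/n pair; nothing to fetch.")
else:
    print(message)
```

Python's `%s` would happily print the word "None" instead of raising an error, so the explicit check matters even more here than in Lua: without it the client would report a work unit that does not exist.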
2010-02-23, 05:20   #55
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

3×2,083 Posts

Quote:
 Originally Posted by kar_bon
 just getting an error after k=2009 at port 9950 is done:
 llrnet.lua:80: bad argument #3 to `format' (string expected, got nil)
 the line is
 print(format(" Fetching WU #1/%d: %s %s",WUCacheSize,k,n))
 so k/n is empty! no pairs? other header in the server?
 restart brings up the same error!
 the script ended after 5 retries at getting a new job from the server, but this error in llrnet.lua i have to look at then!
 about 1650 pairs were done with WUCacheSize=2 without error!
 it's 4:30 AM here and i only woke up 20 min ago and thought i should look at the test! and nobody else is online :-( where are the admins when they're needed! i think we have to talk about this
 i have to go to bed again. up again in 3 hours!
There seem to be quite a number of candidates in the server, though perhaps they just haven't been pruned yet. Gary, I'm guessing you're home by now; I'll leave it up to you to decide what to load in there next.


All times are UTC.
