mersenneforum.org

-   -   Testing.... (https://www.mersenneforum.org/showthread.php?t=13099)

gd_barnes 2010-02-23 06:02

Just got home now. Looking at things now.

Sorry you missed all the admins when you were online a couple of hours ago, Karsten. I was on for 3-1/2 hours this afternoon and will be on for another 3 hours now.

Max, this pruning thing is really starting to bother me, but I think it's something that's existed in LLRnet for a long time. It seems to take far longer to prune pairs than it should.

Anyway, here is what I'm not understanding:
1. The first few pairs in the file, all small k=3 primes (n=1K-10K), were immediately shown as rejected by the client.
2. When I look in the rejected file on the server for the results the client rejected in #1, they aren't there.
3. When I look in the regular results on the server for those same results, 1 of the 5 IS there.
4. When I look in joblist.txt for those same results, 4 of the 5 ARE there.

The rejected client pairs are:
3 1274
3 3276
3 4204
3 5134
3 7559

Pairs still in joblist.txt and knpairs.txt:
3 3276
3 4204
3 5134
3 7559

Pair in results.txt on the server:
3 1274

So for some reason, the server wouldn't "take" 4 of the 5 small k=3 prime results.

Please note that these are NOT in the rejected SERVER pairs. They only show as rejected on the client.

I think what I'm going to do is stop the server, clear everything out completely, and reload the server. Unfortunately, I didn't save the pairs that I loaded. (Big mistake. I don't know what I was thinking.) I'll make sure I save them this time and possibly post them here. I'll also keep the files from this first big run, adding a "-1st" extension to their file names. I'm also changing that primes.txt file option: I'd like to see all of the primes from all 4 cores in one directory on each machine. That will be cool. :-)


Gary

kar_bon 2010-02-23 06:27

So it's a pruning error?

I thought of this: maybe the knpairs file on the server contains a blank line or something else, because this error occurred almost instantly once k=209 was at n=250K (with your 31 cores grabbing pairs, my last unsent result was 30000000000000:M:1:2:258 2009 249720 -2 0AE06F5C6CAB155A).

I then started the script for G4000 and got connection errors just half an hour ago!
Why?

I can't connect to G4000!

mdettweiler 2010-02-23 06:46

[quote=gd_barnes;206430]Max, this pruning thing is really starting to bother me but I think it's something that's existed in LLRnet for a long time. It seems that it takes far longer to prune pairs than it should.

[...]

I think what I'm going to do is stop the server, clear everything out completely, and reload the server.[/quote]
I'm not entirely sure what happened here, so agreed: it's probably best to clean out and reload the server to make sure this wasn't a fluke caused by some boo-boo in one of the files. BTW, when you restart the server, try changing prunePeriod to 15 minutes in llr-serverconfig.txt. That should make pruning less of an issue.

mdettweiler 2010-02-23 06:46

[quote=kar_bon;206432]so it's a pruning error?

i thought of this: the knpairs-file on the server contains a blank line or something else because this error occurs almost instantly when k=209 was at n=250k (with your 31 cores grabbing pairs my last results not sent was 30000000000000:M:1:2:258 2009 249720 -2 0AE06F5C6CAB155A).

i started the script for G4000 then and just got connection errors half an hour ago!
why?

can't connect to G4000![/quote]
Man, you're right, that is weird... I can't connect to the server machine at all. I wonder if something went kapooey on Gary's end?

kar_bon 2010-02-23 06:52

I wanted to receive 100 WUs for an offline PC but got only 50 at once, plus an error that the pairs wouldn't be accepted because someone else had already done them. And I could not send all of my results!

Got to go to work, and there are no new pairs for my i7 and laptop! sh...

mdettweiler 2010-02-23 06:55

[quote=kar_bon;206436]want receive 100 WU's for offline pc, got only 50 at once, and error that pairs won't accepted: someone others did them. and i could not send all results!

got to go to work and no new pairs for my i7 and laptop! sh...[/quote]
Eh? That's weird. I'm not even getting into the server, so I'm not sure how you even got 50 workunits, let alone 100.

gd_barnes 2010-02-23 07:11

OK, guys, you way jumped the gun on me. In my last post, I hadn't said that I had cleared everything out and reloaded the server yet. I've been playing around with some things: starting and stopping the server a couple of times and re-clearing some things. I didn't think anyone was around. Sorry.

Anyway, port 9950 has now officially been reloaded and will stay up. Max, attached are the pairs that I loaded into it.

2 problems:

1. I changed the appropriate option to false in the do.pl program, but the primes are still being written to primes.txt in the individual directories instead of the parent directory. Can you run a specific test on that on your end?

2. I changed the iterations setting to 1000000 in do.pl, yet it's still displaying progress every 10000 iterations. (This sure seems like a tough thing to get rid of! Why is the default so small?) The constant extra output is driving me batty. lol Anyway, I made sure there was no pre-existing .ini file in any of the directories.

One more thing: Don't forget about the problem of quitting the clients when they can't get pairs. It is a major hassle to stop them, and it's part of the reason it took me a while to stop and restart all of my clients. After hitting Ctrl-C several times on each (which turned out not to be necessary), what I finally had to do was go into the system manager and kill all 4 instances of do.pl, followed by all 4 instances of llrnet. If I only killed do.pl, the clients would try to "come back". It was really weird.


Gary

mdettweiler 2010-02-23 07:54

[quote=gd_barnes;206438]OK, guys, you way jumped the gun on me. From my last post, I hadn't stated that I had cleared everything out and reloaded the server yet. I've been playing around with some things; starting and stopping the server a couple of times and re-clearing some things. I didn't think anyone was around. Sorry.

Anyway, port 9950 has now been officially loaded back up and will remain going now. Max, attached are the pairs that I loaded into it.

2 problems:

1. I changed the appropriate option to false in the do.pl program but the primes are still writing to primes.txt in the individual directories instead of one directory above. Can you run a specific test on that on your end?[/quote]
Oh! Duh, I see it now. You see this bit of code down in the checkForPrimes() subroutine?
[code] # If individualPrimeLog is set to true, we put primes.txt in the working directory.
# Otherwise, we put it in the parent directory.
if([B]individualPrimeLog[/B]) { open(PRIMELOG, ">>primes.txt"); }
else { open(PRIMELOG, ">>", "../primes.txt"); }
print PRIMELOG $line . "\n";
close(PRIMELOG);
# If beepOnPrime is set to true, then beep (note: may not be supported on all configurations)
print "\a";[/code]
The part that I put in bold needs to be [B]$individualPrimeLog[/B] instead. I had a brain fart and forgot I was programming in Perl for a moment. :rolleyes: I'll upload corrected files shortly.
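For anyone following along, here is a minimal sketch of what the corrected branch might look like as a standalone Perl subroutine. It assumes $individualPrimeLog and $beepOnPrime are ordinary script-level variables (the real do.pl may declare them differently), and the helper name log_prime is hypothetical. It also suggests why the bug behaved as it did: without use strict, Perl reads the bare word individualPrimeLog as the string "individualPrimeLog", which is always true, so every prime would land in the per-core primes.txt. Under use strict, that line fails to compile instead of silently misbehaving. The sketch gates the beep on $beepOnPrime, as the comment in the snippet above describes.

```perl
use strict;
use warnings;

# Hypothetical settings mirroring the options described for do.pl.
our $individualPrimeLog = 0;  # false: append to ../primes.txt (shared by all cores)
our $beepOnPrime        = 1;  # true: ring the terminal bell when a prime is found

# Append one prime line to the appropriate primes.txt and return the path used.
sub log_prime {
    my ($line) = @_;
    # The sigil matters: $individualPrimeLog, not the bareword individualPrimeLog.
    my $path = $individualPrimeLog ? 'primes.txt' : '../primes.txt';
    open(my $fh, '>>', $path) or die "Cannot open $path: $!";
    print {$fh} "$line\n";
    close($fh) or die "Cannot close $path: $!";
    # Beep only when requested (may not be supported on all terminals).
    print "\a" if $beepOnPrime;
    return $path;
}
```

With $individualPrimeLog false and each core running in its own subdirectory, all cores append to a single primes.txt in the shared parent directory, which is the layout Gary was after.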

[quote]2. I changed the iterations to 1000000 in do.pl yet it's still displaying every 10000 iterations. (This sure seems like a tough thing to get rid of! Why is the default so small?) The continual extra display is driving me batty. lol Anyway, I made sure there was no previously existing .ini file in each directory.[/quote]
Did you stop and restart do.pl after making the change? It won't take effect until you do. Also, keep in mind that it won't take effect until the next k/n pair [i]after[/i] the one that was in progress when you stopped the program to change it; the script only writes out llr.ini at the beginning of each batch (otherwise it would mess up processing of the batch).
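A rough sketch of the "write llr.ini once per batch" behavior described above, to show why a mid-batch edit only applies later. The helpers write_llr_ini and run_batch are hypothetical, and the OutputIterations key name is an assumption about LLR's ini format (check the LLR documentation for your version). For simplicity the sketch writes only this one key, whereas a real llr.ini may hold other settings that should be preserved.

```perl
use strict;
use warnings;

# Hypothetical: write llr.ini once, before a batch of k/n pairs is processed.
# The key name OutputIterations is an assumption about LLR's ini format.
sub write_llr_ini {
    my ($path, $iterations) = @_;
    open(my $fh, '>', $path) or die "Cannot write $path: $!";
    print {$fh} "OutputIterations=$iterations\n";
    close($fh) or die "Cannot close $path: $!";
    return;
}

# Hypothetical per-batch driver: the ini is refreshed only at batch start,
# so a setting edited while a batch is running applies from the next batch on.
sub run_batch {
    my ($ini_path, $iterations, @pairs) = @_;
    write_llr_ini($ini_path, $iterations);
    for my $pair (@pairs) {
        # ... run LLR on $pair here (omitted in this sketch) ...
    }
    return scalar @pairs;
}
```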

[quote]One more thing: Don't forget about the problem trying to quit out of the clients when they can't get pairs. It is a serious major hassle to stop them and is part of the reason it took me a while to stop-start all of my clients. What I finally had to do after hitting Ctl-C several times on each (which turned out to not be necessary) is go to the system manager and kill all 4 instances of do.pl followed by killing all 4 instances of llrnet. If I only killed do.pl, the clients would try to "come back". It was really weird.[/quote]
Yes, as I mentioned before, I'll look into that; I haven't had time yet, but hopefully I can do it tomorrow.

BTW, on a completely different topic, I never did get the chance to load up G6000; can you do that? Thanks. :smile:

gd_barnes 2010-02-23 07:57

OK on #1. Glad that's an easy fix.

On #2, I've stopped and restarted the clients many times in all of this. I changed the number of iterations much earlier in the evening. There is definitely an issue there.

I'll load port 6000 tomorrow. Vaughan has also pulled most of his cores off it, and since I won't be on it until tomorrow, it can wait.

mdettweiler 2010-02-23 08:38

[quote=gd_barnes;206440]OK on #1. Glad that's an easy fix.

On #2, I've started-stopped clients many times in all of this. I changed the # of iterations way earlier in the evening. There is definitely an issue there.

I'll load port 6000 tomorrow.[/quote]
After some further discussion with Gary over chat, I was able to squash #2. Gary, as you saw, I applied the fix to the clients on jeepford, but those don't have the #1 fix; I'd recommend downloading the latest do.pl (which I just uploaded) and swapping it in.

gd_barnes 2010-02-23 08:50

[quote=mdettweiler;206443]After some further discussion with Gary over chat, I was able to squash #2. Gary, as you saw I applied the fix to the clients on jeepford, but those don't have #1 fixed; I'd recommend downloading the latest do.pl (which I just uploaded) and swapping them out.[/quote]

I thought you had fixed #1 on jeepford too.


All times are UTC.