mersenneforum.org  

Old 2009-12-01, 16:43   #1255
mdettweiler
A Sunny Moo
 
 
Aug 2007
USA (GMT-5)

3·2,083 Posts

Quote:
Originally Posted by gd_barnes View Post
OK, I didn't shut it down.

See the edit in my above post. The problem happens only every other k-value and it is continuing right up to this moment. How could a carriage control get messed up on each change in k-value? Did you do an unusual sorting routine? My take is that there is a missing or extra carriage control whenever there is a change in k. That causes it to go from the correct order to the incorrect order and back again.

One more thing: We need to somehow back off all k>2400 results and primes from the NPLB DB counts and scores. The teeny primes will skew those big time. (Well, maybe not the scores, but they will skew the counts.)


Gary
Hmm...strange. It looks like something may have gone awry when I loaded k=2400-2600 into the server. That would explain Bruce's problems too--the server would have been going nuts trying to hand out all those really tiny k/n pairs, not to mention the extra headers sprinkled throughout the file. Since the problem started right at k=2400, the easiest thing to do would be to re-do everything from that point on.

When Dave gets back from wherever he is, I'll send him some criteria for what to remove from the DB. That is, everything that's from G7000 and has n<50000, or k>2400.
Quote:
Originally Posted by gd_barnes View Post
I assume you mean server; as in singular port 7000. I assume there's no reason to shut down more than just this one server.

Edit: Getting off now. I pulled an all-nighter getting the CRUS web pages updated. I still have more to go. Good luck!
Yes, I only shut down G7000. However, the process of shutting down an LLRnet server with the loop thingy actually involves shutting down all the servers, restarting the VNC server (i.e. log off/log on), and then restarting the appropriate servers (in this case, everything but G7000). And because of the weird problems I mentioned earlier with the VNC server, I've put the servers on the console session.
Old 2009-12-01, 16:54   #1256
Brucifer
 
 
Dec 2005

313 Posts

Well not that I'm happy to see problems by any means at all, but I'm glad that you found something strange, cause it just didn't/doesn't make any sense to me at all from this side of the fence.

So what might be the chances of getting a comparable PRPnet server for stuff like the 7000 pairs? I'm curious now how PRPnet would handle tons of the small stuff, and of course rather than running 3 pairs per load, it would need to be like 20 or so. If you could do that max, I'd put some systems testing that hard. Since I'm the only one running 7000 stuff anyway, it might just be a good time to try it, with having to re-do some of the pairs and all anyway.
Old 2009-12-01, 17:01   #1257
mdettweiler
A Sunny Moo
 
 
Aug 2007
USA (GMT-5)

3·2,083 Posts

Quote:
Originally Posted by Brucifer View Post
Well not that I'm happy to see problems by any means at all, but I'm glad that you found something strange, cause it just didn't/doesn't make any sense to me at all from this side of the fence.

So what might be the chances of getting a comparable PRPnet server for stuff like the 7000 pairs? I'm curious now how PRPnet would handle tons of the small stuff, and of course rather than running 3 pairs per load, it would need to be like 20 or so. If you could do that max, I'd put some systems testing that hard. Since I'm the only one running 7000 stuff anyway, it might just be a good time to try it, with having to re-do some of the pairs and all anyway.
I was thinking the same thing myself. Since we're going to have to re-do everything from k=2400 up, it makes little difference whether we do it in the existing LLRnet server or in a new PRPnet one. My guess is that PRPnet should handle the small k/n pairs pretty well, especially if you set your queue size to something like 20 as you suggested.

Gary, what do you think of turning G7000 into a PRPnet server for k=2400+? I know we don't have email notification working yet, but we've never used email notification for G7000 anyway since it's non-top-5000 work.
Old 2009-12-02, 02:39   #1258
gd_barnes
 
 
May 2007
Kansas; USA

23623₈ Posts

Quote:
Originally Posted by mdettweiler View Post
I was thinking the same thing myself. Since we're going to have to re-do everything from k=2400 up, it makes little difference whether we do it in the existing LLRnet server or in a new PRPnet one. My guess is that PRPnet should handle the small k/n pairs pretty well, especially if you set your queue size to something like 20 as you suggested.

Gary, what do you think of turning G7000 into a PRPnet server for k=2400+? I know we don't have email notification working yet, but we've never used email notification for G7000 anyway since it's non-top-5000 work.
Funny thing is, that crossed my mind too. OK, let's do it. Set it up. This will be an excellent stress test for a PRPnet server and we don't have to worry about email notification. When Bruce is ready and wants to, he can move a number of machines to it. I'll probably even chip in for a day or so with 20-30 cores. We want to see if PRPnet can handle 60+ cores at a low n-range on my server. If it can, we may be able to have the 1st ever...PRPnet rally on a large n-range!

Hopefully PRPnet won't have problems with carriage control characters (CCCs) like LLRnet does. Max, I'm fairly certain this is what happened when you loaded the file because it is almost the only explanation:

1. You took the original k=2400-2600 file that was sorted by n-value and did an srfile with the -g switch, which creates a separate file for each k (or in some other manner ended up with a separate file for each k).
2. You manually copied or used a script to merge all of the files and removed the extraneous headers.
3. In the manual or automated removal of the extraneous headers in one big file, you either didn't do a carriage return or otherwise somehow left a rogue or extraneous CCC in between each change in k value (as a result of there originally being one file for each k).

There is virtually no other explanation for what would cause every other k-value to process in reverse order. It's a rogue CCC with each change in k-value.

I'm confident of this because I encountered a problem like this with CCCs way back with Ironbits. Only my problem wasn't nearly as bad. It was only at the end of a large file (it happened twice before I figured out what happened). So subsequent pairs ALL processed incorrectly. What I had to do each time I sent Ironbits a file was make sure there was NO carriage return at the BEGINNING of the file but make sure there WAS a carriage return at the END of the file. That worked perfectly every time. On the times that it went bad, I didn't have a CCC at the end of the file.
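The begin/end rule above can be applied mechanically before a file is sent anywhere. Here is a minimal Python sketch, assuming a plain-text pairs file with one entry per line; `normalize_pairs_text` is a hypothetical helper name, not something from LLRnet or srfile.

```python
def normalize_pairs_text(text: str) -> str:
    """Return text with uniform line endings, no blank lines, and one final newline."""
    # Treat Windows (CR LF) and old Mac (CR) line endings the same as LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Strip trailing spaces and drop blank lines, including any at the very start.
    lines = [line.rstrip() for line in text.split("\n")]
    lines = [line for line in lines if line]
    # Exactly one newline after the last entry, none before the first.
    return "\n".join(lines) + "\n" if lines else ""
```

Running a file through something like this before loading it should at least rule out a stray or missing line ending as the cause.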

As for the way you did it, that's the way I would change a large file from n-value to k-value order, that is, using srfile to create a separate file for each k. I'd then go to the DOS prompt and copy them all together into one file (DOS copies in file name order so that works well). I'd then copy the big file into Excel, use a few IF statements to remove the extraneous headers, and re-sort it to make sure it was in k-value order and to remove the blank lines caused by the IF statements. Doing it that way, there was never a bad CCC.

IMHO, this is a serious bug in LLRnet's parsing that should have been corrected a very long time ago. There's no excuse for LLRnet picking up the 1st value in a k/n pair as the n-value and the 2nd value as the k-value. It should easily be able to tell that a new line is the same as a space regardless of the CCC.
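To illustrate what "a new line is the same as a space" could look like, here is a rough Python sketch. It is not LLRnet's actual parsing code. It assumes the input contains only k and n values with any header lines already removed, and `parse_pairs` is a hypothetical name.

```python
def parse_pairs(text: str) -> list[tuple[int, int]]:
    """Read k/n pairs, treating spaces, tabs, CR and LF as equivalent separators."""
    # str.split() with no argument splits on any run of whitespace,
    # so the kind of line ending in the file does not matter.
    tokens = text.split()
    if len(tokens) % 2 != 0:
        # An odd count means a k or an n is missing somewhere; fail loudly
        # rather than silently pairing each n with the following k.
        raise ValueError("odd number of values: a k or n is missing")
    return [(int(tokens[i]), int(tokens[i + 1])) for i in range(0, len(tokens), 2)]
```

The check on the token count matters: without it, one missing value would shift every later pair by one position, which is the kind of k/n swap described above.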

Alas, I hope this issue won't exist with PRPnet.

Bruce, I'm sorry you'll wind up losing what amounted to a day or more processing on a large # of machines as far as pairs/scores are concerned in the DB. But to leave them in there would make a big mess out of the DB. We'd wind up with a lot of duplicates as we retest and many primes that aren't applicable to the NPLB. The problems that you likely had were because the server was trying to hand out teeny tests (n=2400-2600 somewhere, which process in < 1 sec.) and it couldn't handle the load.

Max, please put it on a stickie or something to remind yourself to make sure that all appropriate results and primes get deleted from the DB. Also before starting this process, please make sure that ALL k<2400 have been processed. Because it would be easy to inadvertently delete more pairs than intended (say from some other process that we can't remember at the moment), below is programmatically what I think should be deleted:

Date must be > 2009-11-20
-and-
[
  ( 2400 < k < 2600  -and-  50K <= n <= 250K )
  -or-
  ( k >= 50K  -and-  2400 < n < 2600 )
]

Please be very careful that these criteria are followed exactly. We don't want to delete more (or less) than necessary.
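The criteria above can be written as a single predicate, which makes them easy to dry-run against a copy of the results before anything is deleted. This is only a sketch in Python: the function name and the idea of passing a date, k, and n per result are assumptions, since the real DB schema isn't described here.

```python
from datetime import date

CUTOFF = date(2009, 11, 20)

def should_delete(result_date: date, k: int, n: int) -> bool:
    """True if a result matches the deletion criteria listed above."""
    if not result_date > CUTOFF:
        return False
    # Pairs from the intended range: k=2400-2600, n=50K-250K.
    intended_range = 2400 < k < 2600 and 50_000 <= n <= 250_000
    # Pairs where the two values were read the wrong way round.
    swapped_range = k >= 50_000 and 2400 < n < 2600
    return intended_range or swapped_range
```

Counting how many rows return True first, and comparing that to the expected number of pairs, would be a cheap sanity check before the real delete.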

(caps for emphasis; not yelling)


Gary
Old 2009-12-02, 02:48   #1259
gd_barnes
 
 
May 2007
Kansas; USA

10011110010011₂ Posts

Max,

A few more things:

Since we're creating a new PRPnet server for the 12th drive, cleaning up the DB becomes a higher priority than interfacing the PRPnet results and primes stats to the DB. If we interface the stats first, we'll wind up with duplicated results and primes, which will make for an even bigger mess. The cleanup must occur first.

This would be easy to forget when setting up the new PRPnet server: Please set whatever switch is necessary to make sure it is handed out in k-value order.

We previously talked about doing Tim's change to the various PRPnet servers so that the prpserver.candidates file is sorted by n instead of k. Obviously this one would be an exception. We want it sorted by k. It's the only server (at this point anyway) that we really need sorted by k.

I will change the appropriate threads to reflect the removal of k=2400-2600 from the LLRnet server.

What port will the new server be?


Gary
Old 2009-12-02, 06:18   #1260
Brucifer
 
 
Dec 2005

313 Posts

Quote:
Originally Posted by gd_barnes View Post
Bruce, I'm sorry you'll wind up losing what amounted to a day or more processing on a large # of machines as far as pairs/scores are concerned in the DB. But to leave them in there would make a big mess out of the DB. We'd wind up with a lot of duplicates as we retest and many primes that aren't applicable to the NPLB. The problems that you likely had were because the server was trying to hand out teeny tests (n=2400-2600 somewhere, which process in < 1 sec.) and it couldn't handle the load.
Gary
Ya sure............... that's it Gary........... I was getting too close to you and you couldn't hack the pressure so you had to come up with some extravagant plan to cut me out of points and trash some primes.................... You may have the others fooled....... but you haven't fooled me................ LOL

As I said before, I'm just glad that there is an identifiable culprit for this stuff cause it really had me scratching my head. No idea where to start looking. And luckily there was only a day and a half or so of running on the new load there, so it isn't that bad a hit.
Old 2009-12-02, 15:30   #1261
mdettweiler
A Sunny Moo
 
 
Aug 2007
USA (GMT-5)

3·2,083 Posts

Quote:
Originally Posted by gd_barnes View Post
Funny thing is, that crossed my mind too. OK, let's do it. Set it up. This will be an excellent stress test for a PRPnet server and we don't have to worry about email notification. When Bruce is ready and wants to, he can move a number of machines to it. I'll probably even chip in for a day or so with 20-30 cores. We want to see if PRPnet can handle 60+ cores at a low n-range on my server. If it can, we may be able to have the 1st ever...PRPnet rally on a large n-range!

Hopefully PRPnet won't have problems with carriage control characters (CCCs) like LLRnet does. Max, I'm fairly certain this is what happened when you loaded the file because it is almost the only explanation:

1. You took the original k=2400-2600 file that was sorted by n-value and did an srfile with the -g switch, which creates a separate file for each k (or in some other manner ended up with a separate file for each k).
2. You manually copied or used a script to merge all of the files and removed the extraneous headers.
3. In the manual or automated removal of the extraneous headers in one big file, you either didn't do a carriage return or otherwise somehow left a rogue or extraneous CCC in between each change in k value (as a result of there originally being one file for each k).

There is virtually no other explanation for what would cause every other k-value to process in reverse order. It's a rogue CCC with each change in k-value.

I'm confident of this because I encountered a problem like this with CCCs way back with Ironbits. Only my problem wasn't nearly as bad. It was only at the end of a large file (it happened twice before I figured out what happened). So subsequent pairs ALL processed incorrectly. What I had to do each time I sent Ironbits a file was make sure there was NO carriage return at the BEGINNING of the file but make sure there WAS a carriage return at the END of the file. That worked perfectly every time. On the times that it went bad, I didn't have a CCC at the end of the file.

As for the way you did it, that's the way I would change a large file from n-value to k-value order, that is, using srfile to create a separate file for each k. I'd then go to the DOS prompt and copy them all together into one file (DOS copies in file name order so that works well). I'd then copy the big file into Excel, use a few IF statements to remove the extraneous headers, and re-sort it to make sure it was in k-value order and to remove the blank lines caused by the IF statements. Doing it that way, there was never a bad CCC.

IMHO, this is a serious bug in LLRnet's parsing that should have been corrected a very long time ago. There's no excuse for LLRnet picking up the 1st value in a k/n pair as the n-value and the 2nd value as the k-value. It should easily be able to tell that a new line is the same as a space regardless of the CCC.

Alas, I hope this issue won't exist with PRPnet.
Actually, I'm not so sure the problem was what you described. I just looked at my original k=2400-2600, n=50K-250K sieve file, which you'd premade a while back according to the method you described, and it was in perfect condition. I believe it was in perfect condition when it reached the server as well. However, I wonder if possibly when the server went down due to the UPS test, it did so in the middle of saving knpairs.txt, which I imagine could easily cause a big mess like this.
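For what it's worth, the usual defence against a half-written file after a power cut is to write to a temporary file and then rename it over the real one. The Python sketch below shows that general technique only. It is not how LLRnet saves knpairs.txt (I haven't checked its source), and `save_atomically` is a hypothetical name.

```python
import os
import tempfile

def save_atomically(path: str, lines: list[str]) -> None:
    """Write lines to path so a crash leaves either the old file or the new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="\n") as tmp:
            for line in lines:
                tmp.write(line + "\n")
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # swap the new file in as a single step
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the machine dies before the `os.replace` call, the previous file is still intact and only the leftover `.tmp` file needs cleaning up.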

PRPnet will probably be a bit "smarter" at least in regard to CCCs. The easiest way to add candidates to a PRPnet server is through the prpadmin tool, which transmits an ABC format file to the server according to a specific protocol. The server then receives the file and parses it one line at a time, which I think would eliminate any CCC problems.

Quote:
Bruce, I'm sorry you'll wind up losing what amounted to a day or more processing on a large # of machines as far as pairs/scores are concerned in the DB. But to leave them in there would make a big mess out of the DB. We'd wind up with a lot of duplicates as we retest and many primes that aren't applicable to the NPLB. The problems that you likely had were because the server was trying to hand out teeny tests (n=2400-2600 somewhere, which process in < 1 sec.) and it couldn't handle the load.

Max, please put it on a stickie or something to remind yourself to make sure that all appropriate results and primes get deleted from the DB. Also before starting this process, please make sure that ALL k<2400 have been processed. Because it would be easy to inadvertently delete more pairs than intended (say from some other process that we can't remember at the moment), below is programmatically what I think should be deleted:

Date must be > 2009-11-20
-and-
[
  ( 2400 < k < 2600  -and-  50K <= n <= 250K )
  -or-
  ( k >= 50K  -and-  2400 < n < 2600 )
]

Please be very careful that these criteria are followed exactly. We don't want to delete more (or less) than necessary.

(caps for emphasis; not yelling)


Gary
Okay, sounds good. I had considered that possibly we could just leave the extra stuff in and let it count for doublechecks, but then again there's no telling how the LLRnet server might have scrambled which pair goes with which residual. Definitely better to clean it out first.
Quote:
Originally Posted by gd_barnes View Post
Max,

A few more things:

Since we're creating a new PRPnet server for the 12th drive, cleaning up the DB becomes a higher priority than interfacing the PRPnet results and primes stats to the DB. If we interface the stats first, we'll wind up with duplicated results and primes, which will make for an even bigger mess. The cleanup must occur first.

This would be easy to forget when setting up the new PRPnet server: Please set whatever switch is necessary to make sure it is handed out in k-value order.

We previously talked about doing Tim's change to the various PRPnet servers so that the prpserver.candidates file is sorted by n instead of k. Obviously this one would be an exception. We want it sorted by k. It's the only server (at this point anyway) that we really need sorted by k.

I will change the appropriate threads to reflect the removal of k=2400-2600 from the LLRnet server.

What port will the new server be?


Gary
My plan is to put the new server on port 7000, just like the old one. However, I won't turn on the results copying-off until Dave's gotten the chance to clean out the old stuff from the DB. This is because port 7000 is already wired into the DB for import, so as soon as I start copying off files they'll be sucked down and imported.
Old 2009-12-02, 17:53   #1262
mdettweiler
A Sunny Moo
 
 
Aug 2007
USA (GMT-5)

3·2,083 Posts

Hey Gary, I think I figured out the cause of all the weird problems we'd been having on G7000 with the k=2400-2600 range. It seems that for the other files in this range that you sorted by k and sent to me, you remembered to remove the extra NewPGen header lines, but you forgot for this one. I discovered this when attempting to load the file into the new PRPnet G7000 just now; I immediately got an "unable to parse line [NewPGen header] in ABC file" from prpserver. I've now removed the extraneous lines and will load the file into PRPnet shortly.
Old 2009-12-02, 18:11   #1263
mdettweiler
A Sunny Moo
 
 
Aug 2007
USA (GMT-5)

3×2,083 Posts

Well, it looks like PRPnet G7000 will have to wait a wee bit longer. I tried making the tweak that I'd done before to make it hand out work by k primary and n secondary, but it still handed work out by n (well, by decimal size to be more precise) when I tried running a client on it.
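To make the two orderings concrete, here is a small Python sketch. It says nothing about where the sort lives in the PRPnet source. It assumes candidates of the form k*2^n-1, and both function names are hypothetical.

```python
import math

def by_k_then_n(pair: tuple[int, int]) -> tuple[int, int]:
    """Sort key: k is primary, n is secondary."""
    k, n = pair
    return (k, n)

def approx_log10_size(pair: tuple[int, int]) -> float:
    """Sort key: approximate decimal size, i.e. log10 of k*2^n."""
    k, n = pair
    return math.log10(k) + n * math.log10(2)

pairs = [(2403, 50002), (2401, 50007), (2401, 50001)]
k_order = sorted(pairs, key=by_k_then_n)
size_order = sorted(pairs, key=approx_log10_size)
```

With these three pairs, `k_order` keeps both k=2401 tests together, while `size_order` puts (2403, 50002) between them because n dominates the size of the number.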

I've sent an email to Mark asking what part of the source code I need to tweak to make this change. (I'm thinking maybe I got the wrong part.) As soon as I hear back from him I'll get the new server started.
Old 2009-12-02, 19:51   #1264
gd_barnes
 
 
May 2007
Kansas; USA

3·11·307 Posts

Quote:
Originally Posted by mdettweiler View Post
Hey Gary, I think I figured out the cause of all the weird problems we'd been having on G7000 with the k=2400-2600 range. It seems that for the other files in this range that you sorted by k and sent to me, you remembered to remove the extra NewPGen header lines, but you forgot for this one. I discovered this when attempting to load the file into the new PRPnet G7000 just now; I immediately got an "unable to parse line [NewPGen header] in ABC file" from prpserver. I've now removed the extraneous lines and will load the file into PRPnet shortly.
When did I send you the file? Didn't you just load it up last week without asking for anything from me? I didn't send you anything recently. Case in point: Do you have a file from me for k=2600-2800 or 2800-3000? If I sent you k=2400-2600 a long time ago, I would have also sent those. If so, you better check those also.

Edit: Oh, I remember that now. I sent the files several months ago. I had no programmatic means to remove the headers and the files were too big to use Excel formulas and sorting. It was taking me 10-15 mins. per file to remove them manually so I stopped after the first two but sent all 5 files. I was pretty sure that I let you know that they still needed to be removed from the last 3. I think k=2600-2800 and 2800-3000 will have the same issue. When you get a chance, can you check those and remove the extra headers?
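For the remaining files, the header removal could be scripted instead of done by hand. Below is a rough Python sketch, assuming the first line of the merged file is the one header to keep and every real entry is a line with exactly two integers (k, then n). `strip_extra_headers` is a hypothetical helper, and the check should be adjusted if the actual file layout differs.

```python
import re
from typing import Iterable, Iterator

# A data line is assumed to be two unsigned integers separated by whitespace.
PAIR_LINE = re.compile(r"^\d+\s+\d+$")

def strip_extra_headers(lines: Iterable[str]) -> Iterator[str]:
    """Keep the first line as the header, then keep only k/n pair lines."""
    first = True
    for raw in lines:
        line = raw.strip()
        if first:
            first = False
            yield line          # the single header we want to keep
        elif PAIR_LINE.match(line):
            yield line          # a normal "k n" entry
        # anything else (repeated headers, blank lines) is dropped
```

Because it works one line at a time, it can be fed an open file object directly, so file size shouldn't be a problem the way it was in Excel.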

Sorry about any confusion here.


Gary

Last fiddled with by gd_barnes on 2009-12-02 at 20:37 Reason: edit
Old 2009-12-02, 20:46   #1265
gd_barnes
 
 
May 2007
Kansas; USA

3·11·307 Posts

Quote:
Originally Posted by mdettweiler View Post
Well, it looks like PRPnet G7000 will have to wait a wee bit longer. I tried making the tweak that I'd done before to make it hand out work by k primary and n secondary, but it still handed work out by n (well, by decimal size to be more precise) when I tried running a client on it.

I've sent an email to Mark asking what part of the source code I need to tweak to make this change. (I'm thinking maybe I got the wrong part.) As soon as I hear back from him I'll get the new server started.
When you make that tweak for the new port 7000, can you do the tweak suggested by Tim on the other PRPnet servers to sort the prpserver.candidates by n-value?

Also, when you get the candidates file sorted by n-value correctly, please check the prpnet status page, i.e. http://nplb-gb1.no-ip.org:3000/, which shows candidates by k-value. Although the page has limited value at NPLB (more value at CRUS), if we're going to have a link to it, it needs to display correctly. If it doesn't, let's talk more about it.

The bottom lines are these:

1. The prpserver.candidates file should be sorted the same way that the pairs are handed out for all PRPnet servers. Having them shown one way and handed out another way has been a major source of annoyance for me. It's too difficult to tell what is next to be handed out.

2. When making a change to a server like this (actually any program), everything must be checked. In this case, that would be: pairs are still being handed out in the correct order, appearance of status page, appearance of scores page, effect on other prpnet files, etc. The problem that we continue to have is that when one tweak is made, other problems are caused. We need to rein that in a little.


Gary

Last fiddled with by gd_barnes on 2009-12-02 at 20:51
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.