mersenneforum.org

Thread: LLRnet servers for NPLB (https://www.mersenneforum.org/showthread.php?t=10042), in the No Prime Left Behind subforum (https://www.mersenneforum.org/forumdisplay.php?f=82)

IronBits 2008-11-24 04:48

Thanks Gary, I'm still working out the numbers to take (total done for the day) - (the previous hour's total), so I can post the real total amount of knpairs returned each hour so you don't have to do the math. This way you can find out if you have boxen not working :wink:

If that doesn't work out, then I'll just leave it showing the total knpairs returned for each user and let you folks do the math, until I can figure something else out.

Starting the web page stats 2 minutes after midnight will simplify the math. :wink:
If no knpairs are returned within the first 2 minutes after results.txt is moved, then those users won't show up until the next run at 0100 hrs.
Make sense?

The web page will update at 2 minutes after midnight.
Results.txt will be moved out of there and processed, with .csv sent to the import directory at midnight.
I'll have to go over to that Server and edit crontab so it processes the .csv sometime after midnight as well.

edit:
crontab done
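The per-hour figure described above is just today's running total minus the previous hour's running total, per user. A minimal Python sketch, assuming each hourly snapshot is a dict of {username: pairs completed so far today}; the data layout and user names are hypothetical, not the actual stats scripts:

```python
def hourly_deltas(snapshots):
    """Turn hourly snapshots of cumulative per-user totals into per-hour counts.

    snapshots: list of dicts {user: pairs completed so far today}, one per hour,
    oldest first. A user missing from the previous snapshot is counted from 0.
    """
    deltas = []
    previous = {}
    for snap in snapshots:
        deltas.append({user: total - previous.get(user, 0)
                       for user, total in snap.items()})
        previous = snap
    return deltas


def idle_users(deltas):
    """Users who returned nothing in the most recent hour (possible dead boxen)."""
    if not deltas:
        return []
    return sorted(user for user, n in deltas[-1].items() if n == 0)


# Hypothetical example data: three hourly snapshots.
snaps = [{"user_a": 100},
         {"user_a": 180, "user_b": 40},
         {"user_a": 180, "user_b": 95}]
print(hourly_deltas(snaps))
# [{'user_a': 100}, {'user_a': 80, 'user_b': 40}, {'user_a': 0, 'user_b': 55}]
print(idle_users(hourly_deltas(snaps)))  # ['user_a']
```

A zero for a user who was active earlier in the day is exactly the "boxen not working" signal.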

gd_barnes 2008-11-24 21:20

[quote=IronBits;150457]Thanks Gary, I'm still working out the numbers to take (total done for the day) - (the previous hour's total), so I can post the real total amount of knpairs returned each hour so you don't have to do the math. This way you can find out if you have boxen not working :wink:

If that doesn't work out, then I'll just leave it showing the total knpairs returned for each user and let you folks do the math, until I can figure something else out.

Starting the web page stats 2 minutes after midnight will simplify the math. :wink:
If no knpairs are returned within the first 2 minutes after results.txt is moved, then those users won't show up until the next run at 0100 hrs.
Make sense?

The web page will update at 2 minutes after midnight.
Results.txt will be moved out of there and processed, with .csv sent to the import directory at midnight.
I'll have to go over to that Server and edit crontab so it processes the .csv sometime after midnight as well.

edit:
crontab done[/quote]


Hmm...I'm having the same problem again...with the date showing on 3 lines as well as multiple days showing. It's strange that it was correct for a while and then went back to the old way.

IronBits 2008-11-24 21:40

I cannot get any of the columns to resize (in FF 3+ or IE 7+) when I shrink the browser window.

Fine, I'll remove the DATE column then. :wink:

[url]http://nplb.ironbits.net/progress_400.html[/url]

This is only temporary until I can figure out how to get it into a database and/or get PHP to parse a .csv properly...
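On the .csv side, the usual pitfall is splitting lines on commas by hand, which breaks as soon as a quoted field contains a comma. A sketch in Python using the standard csv module (PHP's own fgetcsv()/str_getcsv() handle quoting the same way). The column names and sample rows are made up for illustration, since the real export format isn't shown in the thread:

```python
import csv
import io
from collections import Counter

# Made-up sample in a hypothetical user,k,n,result layout; the result field
# is quoted because it contains a comma.
SAMPLE = """user,k,n,result
user_a,405,560001,"not prime, Res64: 0123456789ABCDEF"
user_b,407,560001,"not prime, Res64: FEDCBA9876543210"
user_a,411,560002,"not prime, Res64: 00000000DEADBEEF"
"""


def pairs_per_user(csv_text):
    """Count returned k/n pairs per user; the csv module handles the quoting."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["user"] for row in reader)


print(pairs_per_user(SAMPLE))  # Counter({'user_a': 2, 'user_b': 1})
```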

gd_barnes 2008-11-24 21:49

Would it create a problem if I renumbered the servers here? Port 5000 should be near the bottom, perhaps #4, with the others moved up.


Gary

mdettweiler 2008-11-24 22:02

[quote=gd_barnes;150567]Would it create a problem if I renumbered the servers here? Port 5000 should be near the bottom, perhaps #4, with the others moved up.


Gary[/quote]
No, no problem--as far as I know, there isn't anything that depends on the numbers of the servers in this thread. :smile:

em99010pepe 2008-11-25 23:04

C443 has ~9200 pairs left, although its available work is at lower n than G4000's (which has 43K pairs left). I advise people to move a few cores to the latter. Thank you.

Carlos

gd_barnes 2008-11-25 23:17

[quote=em99010pepe;150721]C443 has ~9200 pairs left, although its available work is at lower n than G4000's (which has 43K pairs left). I advise people to move a few cores to the latter. Thank you.

Carlos[/quote]


I'm not sure what you mean. Are there ranges in C443 that are both lower and higher than G4000?

I just now checked and C443 is currently handing out n=569344 so it has quite a bit to go to get to n=570K.


Gary

em99010pepe 2008-11-25 23:19

[quote=gd_barnes;150723]I'm not sure what you mean. Are there ranges in C443 that are both lower and higher than G4000?

I just now checked and C443 is currently handing out n=569344 so it has quite a bit to go to get to n=570K.


Gary[/quote]

All ranges are lower; I just thought we could move a few cores to help G4000.
Not too much, it's only 9 days on my Q6600.

gd_barnes 2008-11-25 23:56

[quote=em99010pepe;150724]All ranges are lower; I just thought we could move a few cores to help G4000.
Not too much, it's only 9 days on my Q6600.[/quote]

Oh I see. IB400 has a huge amount of work in it right now and is still well below the other 2 servers so my machines are staying there for a while. You know I don't like gaps. :smile:

IronBits 2008-11-26 04:27

And I wonder whose fault it is that IB400 has such a huge pile of knpairs to chew through? :razz:
:rant: :lol:

gd_barnes 2008-11-26 06:42

[quote=IronBits;150757]And I wonder whose fault it is that IB400 has such a huge pile of knpairs to chew through? :razz:
:rant: :lol:[/quote]

lol, and I sent that less than 3 minutes before it went down, assuming that Lennart was going to stay on it for the entire rally.

Not a problem...we'll chew through it quickly enough. It's near n=560K now, which means there is n=7K to go. We frequently reserve n=5K ranges for the servers anyway on this drive, so what's left is only somewhat larger than a range we'd normally load.

Hey...one thing I just now noticed on your latest hourly stats page: You show the hours up through 23 and then start over at hour 00 for the new day without actually showing the total for the entire day. Just thought I'd mention it.


Gary
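Adding the missing daily total amounts to summing each day's hourly rows and emitting one extra row whenever the date changes. A sketch, assuming the hourly stats are available as (date, hour, pairs) tuples already sorted by date and hour; the layout and the counts below are hypothetical:

```python
from itertools import groupby


def with_daily_totals(rows):
    """rows: (date, hour, pairs) tuples sorted by date, then hour.

    Returns the rows with a (date, "total", sum) row after each day's hours;
    the current, still-running day gets a running total.
    """
    out = []
    for date, day_rows in groupby(rows, key=lambda row: row[0]):
        day_rows = list(day_rows)
        out.extend(day_rows)
        out.append((date, "total", sum(pairs for _, _, pairs in day_rows)))
    return out


rows = [("2008-11-25", 22, 400), ("2008-11-25", 23, 420), ("2008-11-26", 0, 410)]
for row in with_daily_totals(rows):
    print(row)
```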

henryzz 2008-11-26 17:56

I am just guessing, but I reckon mdettweiler probably wants his stats combined with Anonymous :smile:

IronBits 2008-11-26 18:50

Good guess :wink:
Just something else that needs to be done to the database records, along with removing the TeamName from the UserNames.
Speaking of which, someone needs to get a hold of BlisteringSheep, his TeamName is back in the UserName.
He must have a few stray boxen without the updated name change.
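Both clean-ups come down to mapping each recorded name to one canonical user before totals are summed. A sketch with a hypothetical alias table and a hypothetical "[TeamName]" tag format; the thread doesn't show how the team name actually appears in the UserName:

```python
from collections import Counter

# Hypothetical alias table: names whose results should be credited to one user.
ALIASES = {"Anonymous": "mdettweiler"}
# Hypothetical team tags that some clients append to the username.
TEAM_TAGS = ("[TeamName]",)


def canonical_user(name):
    """Strip any team tag, then map known aliases to the canonical user."""
    for tag in TEAM_TAGS:
        name = name.replace(tag, "")
    name = name.strip()
    return ALIASES.get(name, name)


def merge_stats(per_user):
    """per_user: {recorded name: pairs}. Returns totals keyed by canonical user."""
    merged = Counter()
    for name, pairs in per_user.items():
        merged[canonical_user(name)] += pairs
    return merged


print(merge_stats({"mdettweiler": 100, "Anonymous": 50,
                   "BlisteringSheep [TeamName]": 30, "BlisteringSheep": 70}))
```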

mdettweiler 2008-11-27 03:14

[quote=henryzz;150832]I am just guessing, but I reckon mdettweiler probably wants his stats combined with Anonymous :smile:[/quote]
[quote=IronBits;150848]Good guess :wink:
Just something else that needs to be done to the database records, along with removing the TeamName from the UserNames.
Speaking of which, someone needs to get a hold of BlisteringSheep, his TeamName is back in the UserName.
He must have a few stray boxen without the updated name change.[/quote]
Yep, you guys guessed right. :smile:

About BlisteringSheep's stray client(s): I think he mentioned that he has LLRnet running on his mother's computer and that he only has access to it sporadically, and that for the time being it would be stuck with the team name in the username. Presumably that's what's causing this?

IronBits 2008-11-27 06:43

Well, hopefully he will find it and take care of it tomorrow then :wink:
It just started back up a few days ago...

gd_barnes 2008-11-27 17:05

David,

I don't seem to have access to the most recent results file.


Gary

IronBits 2008-11-27 17:48

fixed

BlisteringSheep 2008-11-28 04:15

[QUOTE=IronBits;150848]Speaking of which, someone needs to get a hold of BlisteringSheep, his TeamName is back in the UserName.
He must have a few stray boxen without the updated name change.[/QUOTE]


Sorry about that. :redface: I had changed the dedicated box, but forgot to change my octocore when I restarted some. I just fixed all of my llr-clientconfig.txt files on every machine, even ones not running llrnet right now, so it shouldn't be an issue again.

I've been so busy with work & life that I don't get to the/any forums very often. If you do need to get in touch with me, a PM here will send me an e-mail, or you should be able to e-mail directly from [URL="http://www.mersenneforum.org/member.php?u=2760"]my user page[/URL].

IronBits 2008-11-28 05:44

Excellent! I knew you would get a round toit one of these days :smile:

MyDogBuster 2008-11-29 20:29

Max, GB4000 may have had a slight problem.

Earlier today I posted finding a confirmed prime 861*2^572559-1.

It has not posted yet on your stats page or on the Primes found page.

I did find it on yesterday's log just before the daily cleanup.


user=MyDogBuster
[2008-11-29 06:43:14]
615*2^572587-1 is not prime. Res64: B6F5C537D8023AC6 Time : 449.0 sec.
user=MyDogBuster
[2008-11-29 06:43:18]
619*2^572587-1 is not prime. Res64: D60DBED8F4928B9D Time : 449.0 sec.
user=MyDogBuster
[2008-11-29 06:45:47]
861*2^572559-1 is prime! Time : 6796.0 sec.
user=MyDogBuster
[2008-11-29 06:45:49]
895*2^572559-1 is not prime. Res64: 6C90AA8402FC1737 Time : 6798.0 sec.
user=MyDogBuster
[2008-11-29 06:45:56]
713*2^572546-1 is not prime. Res64: 58FD9F95322C534D Time : 10028.0 sec.

mdettweiler 2008-11-29 20:41

[quote=MyDogBuster;151284]Max, GB4000 may have had a slight problem.

Earlier today I posted finding a confirmed prime 861*2^572559-1.

It has not posted yet on your stats page or on the Primes found page.

I did find it on yesterday's log just before the daily cleanup.


user=MyDogBuster
[2008-11-29 06:43:14]
615*2^572587-1 is not prime. Res64: B6F5C537D8023AC6 Time : 449.0 sec.
user=MyDogBuster
[2008-11-29 06:43:18]
619*2^572587-1 is not prime. Res64: D60DBED8F4928B9D Time : 449.0 sec.
user=MyDogBuster
[2008-11-29 06:45:47]
861*2^572559-1 is prime! Time : 6796.0 sec.
user=MyDogBuster
[2008-11-29 06:45:49]
895*2^572559-1 is not prime. Res64: 6C90AA8402FC1737 Time : 6798.0 sec.
user=MyDogBuster
[2008-11-29 06:45:56]
713*2^572546-1 is not prime. Res64: 58FD9F95322C534D Time : 10028.0 sec.[/quote]
Ah-ha! I think I know what caused that problem. The copy-off script runs at 6:59 AM CST daily, and the status page script (which handles keeping track of primes) runs every 15 minutes. However, the way it's set up, when it does its run for 7:00 AM (actually, it runs at 7:01 AM), it runs *after* the copy-off script has essentially blanked out the files it works with. Hence, it will miss any prime that just happens to be found between 6:45 AM and 6:59 AM.

I'll go and fix this shortly so that the copy-off script doesn't run until *after* the status page script has had a chance to run. :smile: I'll also retroactively add the missed prime into the recurring log of primes.

mdettweiler 2008-11-29 20:46

[quote=mdettweiler;151286]Ah-ha! I think I know what caused that problem. The copy-off script runs at 6:59 AM CST daily, and the status page script (which handles keeping track of primes) runs every 15 minutes. However, the way it's set up, when it does its run for 7:00 AM (actually, it runs at 7:01 AM), it runs *after* the copy-off script has essentially blanked out the files it works with. Hence, it will miss any prime that just happens to be found between 6:45 AM and 6:59 AM.

I'll go and fix this shortly so that the copy-off script doesn't run until *after* the status page script has had a chance to run. :smile: I'll also retroactively add the missed prime into the recurring log of primes.[/quote]
Okay, I've fixed the crontab so that the status page script runs at every :00, :15, :30, and :45 of every hour, and the copy-off script runs at 7:01 AM. That should close the gap. I'll fix the primes list shortly...
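The gap can be checked by brute force: a prime found before the copy-off is only recorded if a status run happens between its discovery and the copy-off, which blanks the files the status script reads. A minute-resolution sketch, assuming the old status runs fell at :01, :16, :31 and :46, the new ones at :00, :15, :30 and :45 (a 15-minute cadence), and that each run finishes before the copy-off starts; primes found after the copy-off are treated as landing in the fresh files. The exact minutes are assumptions, not read from the real crontab:

```python
def is_recorded(found_at, status_minutes, copy_off):
    """found_at, copy_off: minutes since midnight.
    status_minutes: minutes past each hour at which the status script runs.
    A prime found before the copy-off is recorded only if some status run
    happens between its discovery and the copy-off (same minute counts)."""
    if found_at >= copy_off:
        return True  # lands in the fresh files; a later run picks it up
    return any(t % 60 in status_minutes for t in range(found_at, copy_off))


OLD = dict(status_minutes={1, 16, 31, 46}, copy_off=6 * 60 + 59)  # copy-off 6:59
NEW = dict(status_minutes={0, 15, 30, 45}, copy_off=7 * 60 + 1)   # copy-off 7:01


def fmt(minute):
    return f"{minute // 60}:{minute % 60:02d}"


missed_old = [m for m in range(24 * 60) if not is_recorded(m, **OLD)]
missed_new = [m for m in range(24 * 60) if not is_recorded(m, **NEW)]
print("old schedule misses", fmt(missed_old[0]), "to", fmt(missed_old[-1]))  # 6:47 to 6:58
print("new schedule misses", len(missed_new), "minutes")                     # 0 minutes
```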

MyDogBuster 2008-11-29 20:52

[QUOTE]Okay, I've fixed the crontab so that the status page script runs at every :00, :15, :30, and :45 of every hour, and the copy-off script runs at 7:01 AM. That should close the gap. I'll fix the primes list shortly...
[/QUOTE]

Nice job. I figured it had something to do with the turnover. I just didn't want anyone to miss a prime. Aren't 'puters fun?

em99010pepe 2008-12-01 16:01

C443 currently processing at n= [COLOR=#0000ff]~577.21K[/COLOR]

gd_barnes 2008-12-03 09:34

Max,

Please load some more pairs in port 4000 within a day or so. I think we had calculated that Ian/you were processing 4200 pairs/day there so loading ~30000 pairs or an n=3K range, which would take 7-8 days, would be a good amount for now.

We should probably also send a file to David for port 400 within 1-2 days too. At about 8000 pairs/day, we could send about an n=5K file on it for the time being.


Gary
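The sizing above is simple arithmetic: days = pairs / (pairs per day). A sketch; the density of ~10 pairs per unit of n comes from Gary's "~30000 pairs or an n=3K range" for port 4000, and applying that same density to port 400 is an assumption, since its k-range may differ:

```python
def days_to_clear(pairs, pairs_per_day):
    """Days a server's queue lasts at a given throughput."""
    return pairs / pairs_per_day


# Rough equivalence from the post: ~30,000 pairs is about an n=3K range.
PAIRS_PER_UNIT_N = 30_000 / 3_000  # ~10 pairs per unit of n (port 4000)


def pairs_in_range(n_span, pairs_per_unit_n=PAIRS_PER_UNIT_N):
    """Estimated pairs in an n-range, assuming roughly uniform density."""
    return n_span * pairs_per_unit_n


print(round(days_to_clear(30_000, 4_200), 2))       # 7.14 days on port 4000
print(days_to_clear(pairs_in_range(5_000), 8_000))  # 6.25 days on port 400, if densities match
```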

mdettweiler 2008-12-03 13:53

[quote=gd_barnes;151761]Max,

Please load some more pairs in port 4000 within a day or so. I think we had calculated that Ian/you were processing 4200 pairs/day there so loading ~30000 pairs or an n=3K range, which would take 7-8 days, would be a good amount for now.

We should probably also send a file to David for port 400 within 1-2 days too. At about 8000 pairs/day, we could send about an n=5K file on it for the time being.


Gary[/quote]
Okay, yep. I'll do that within the next few hours. :smile:

nuggetprime 2008-12-03 15:02

Will C443 get a stats page like the one for IB400/G4000?

IronBits 2008-12-03 23:59

I've given him lots of code, but I haven't heard back from Carlos, and I'm not sure he has a web presence.
I can work something up over here for him, it will just be delayed by at least a day's worth of work.

gd_barnes 2008-12-05 10:32

Max,

I noticed that [URL]http://nplb-gb1.no-ip.org/llrnet/[/URL] is down at the moment so not only can we not view the status of port 4000, I can't access any of my machines remotely. I hope the machine that the server is on is not down. I checked my k/n pairs per hour on port 400 and it appears that there might have been a drop of one quad a few hours ago...Or it could be just some processing glitch in something. I'm not sure. (BTW, I added 2 more cores to port 4000 on Thurs. afternoon after my Riesel base 256 effort finished early Thurs. morning.)

I think I have the temps regulated pretty well on my machines now but unfortunately "crunchford", the machine that runs the server, is still one of the warmer running ones. (~70-71 C I think.) You might remember me mentioning that it was not one of the best choices for the server.

When you get a chance this morning, can you check things and make sure that you can get back on the above link and that port 4000 is working OK? I likely will be on next around 1 PM CST. (7 PM GMT)

Ian, you might also check and make sure you're still processing work on port 4000.


Thanks,
Gary

em99010pepe 2008-12-05 10:44

G4000 has been down for the last 5 hours.

gd_barnes 2008-12-05 11:06

[quote=em99010pepe;152050]G4000 has been down for the last 5 hours.[/quote]

Damn. Of all the luck. I'm sure the machine is down then. And what's worse is that I'm sure that it is the ONLY one of my 10 machines that is down.

Crunchford is one of the 8 machines (plus 3 more cores) that were running port 400 when it must have gone down, and I see a drop of ~10-12% in k/n pairs processed on that port around 4-5 hours ago. That would equate to 4 cores out of 35 and coincides with Carlos saying that it has been down 5 hours.

Max, it looks like we're screwed on port 4000 until late next Tuesday when I get back and see what is wrong with it unless you can somehow make the "master machine" another one of my machines remotely and then somehow set up port 4000 on it instead. (That sounds like a huge headache to me.) I think most of the others are running <= 68 C but if you do set it up on one of the others, please check the temps first. Like I said, Crunchford was definitely the warmest of my AMD machines.

Everyone, if you were on port 4000, please move to port 400. It can easily handle the load. Sorry.

This shouldn't delay us in total by more than 1/2-day to 1 day on finishing this drive if people move their machines within a day. Port 400 will just do more of the work and later on, I may need to move a few of my machines to port 4000 to clear it out more quickly once we get it going again.


Gary
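The "~10-12% drop equals 4 of 35 cores" estimate can be checked both ways. A sketch that assumes all 35 cores return work at roughly the same rate (they don't exactly, so this is only a rough check):

```python
def expected_drop(cores_down, total_cores):
    """Fractional throughput drop if cores_down of total_cores stop returning
    work, assuming all cores are roughly equally fast."""
    return cores_down / total_cores


def implied_cores_down(drop_fraction, total_cores):
    """Inverse: how many cores a measured fractional drop corresponds to."""
    return drop_fraction * total_cores


print(f"{expected_drop(4, 35):.1%}")  # 11.4%
for drop in (0.10, 0.12):
    # prints "0.1 3.5" then "0.12 4.2"
    print(drop, round(implied_cores_down(drop, 35), 1))
```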

em99010pepe 2008-12-05 11:08

C443 is also available with lots of work to process.

MyDogBuster 2008-12-05 14:31

[QUOTE]Everyone, if you were on port 4000, please move to port 400. It can easily handle the load. Sorry.
[/QUOTE]

Just woke up. I'll start moving stuff shortly. Looks like it was down about 1AM EST last night.

I have got to find an easy way of changing servers. Switching 25 cores will not be fun.
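Switching many cores mostly means editing the server and port lines of each client's llr-clientconfig.txt. A Python sketch, assuming every core runs from its own directory under one common folder and that the config uses `server = "..."` and `port = ...` lines like the ones in the server listings in this thread. Stop the clients first and keep any unsent results, since the script only edits files:

```python
import re
from pathlib import Path

SERVER_LINE = re.compile(r"(?m)^(\s*server\s*=\s*).*$")
PORT_LINE = re.compile(r"(?m)^(\s*port\s*=\s*).*$")


def switch_server(root, server, port):
    """Point every llr-clientconfig.txt under root at server:port.

    Returns the list of config files that were actually changed."""
    changed = []
    for cfg in Path(root).rglob("llr-clientconfig.txt"):
        text = cfg.read_text()
        # Keep each line's "server = " / "port = " prefix, replace the value.
        new = SERVER_LINE.sub(lambda m: f'{m.group(1)}"{server}"', text)
        new = PORT_LINE.sub(lambda m: f"{m.group(1)}{port}", new)
        if new != text:
            cfg.write_text(new)
            changed.append(cfg)
    return changed


# Hypothetical usage: one directory per core under C:\llrnet
# switch_server(r"C:\llrnet", "nplb.dynip.telepac.pt", 443)
```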

MyDogBuster 2008-12-05 15:34

Halfway into switching the ports, my service provider went down. It hasn't been down since August. This just ain't my day. Time to go back to bed.

Okay it's back up and the switch is finished.

henryzz 2008-12-05 16:53

What happened to hour 24 yesterday?
[URL]http://nplb.ironbits.net/progress_400.html[/URL]

Edit: could all the new stats pages be added to the first post of this thread?

mdettweiler 2008-12-05 17:09

[quote=gd_barnes;152049]Max,

I noticed that [URL]http://nplb-gb1.no-ip.org/llrnet/[/URL] is down at the moment so not only can we not view the status of port 4000, I can't access any of my machines remotely. I hope the machine that the server is on is not down. I checked my k/n pairs per hour on port 400 and it appears that there might have been a drop of one quad a few hours ago...Or it could be just some processing glitch in something. I'm not sure. (BTW, I added 2 more cores to port 4000 on Thurs. afternoon after my Riesel base 256 effort finished early Thurs. morning.)

I think I have the temps regulated pretty well on my machines now but unfortunately "crunchford", the machine that runs the server, is still one of the warmer running ones. (~70-71 C I think.) You might remember me mentioning that it was not one of the best choices for the server.

When you get a chance this morning, can you check things and make sure that you can get back on the above link and that port 4000 is working OK? I likely will be on next around 1 PM CST. (7 PM GMT)

Ian, you might also check and make sure you're still processing work on port 4000.


Thanks,
Gary[/quote]
Ouch. However, I have some good news: based on what I'm seeing in the IB400 results file for today, most likely some or all of the rest of your machines are still up and crunching. :smile:

I doubt that thermal problems are the issue here; even if it went all the way up to 80 C, it should still run, though it would crunch somewhat slower. (A while ago I had my dualcore hovering at 83 C for about a month or two and I still used it as my primary machine.) I'm thinking something more along the lines of a power flicker (which can, depending on the duration, take out some machines and not others).

Hmm...if only I knew crunchford's MAC address I could try feeding it a Wake on LAN signal through port 4000 (since that port is already open on your router). Though even if I could do that, it's a tossup as to whether that would actually make the machine start up. (I can't get it to work on my machines, either.) Anyway, though, when you get it restarted I'll see about getting the MAC addresses of all your machines written down (I'll be able to obtain that information once I can get SSH access) in case we ever need to try a Wake on LAN in the future.

Max :mellow:
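For reference, a Wake on LAN "magic packet" is six 0xFF bytes followed by the target's MAC address repeated 16 times, normally sent as a UDP datagram. A sketch; the MAC below is a placeholder, not crunchford's real address. If the router's forward for port 4000 covers only TCP (LLRnet's traffic), the UDP packet won't get through, and whether a router delivers it to a powered-off machine at all is the tossup Max mentions:

```python
import socket


def magic_packet(mac):
    """Build a Wake-on-LAN magic packet: six 0xFF bytes followed by the
    target's 6-byte MAC address repeated 16 times (102 bytes in total)."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError(f"not a 6-byte MAC address: {mac!r}")
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac, host, port):
    """Send the packet as a single UDP datagram to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(magic_packet(mac), (host, port))


# Hypothetical usage with a placeholder MAC address:
# send_magic_packet("00:11:22:33:44:55", "nplb-gb1.no-ip.org", 4000)
```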

gd_barnes 2008-12-05 20:12

[quote=mdettweiler;152085]Ouch. However, I have some good news: based on what I'm seeing in the IB400 results file for today, most likely some or all of the rest of your machines are still up and crunching. :smile:

I doubt that thermal problems are the issue here; even if it went all the way up to 80 C, it should still run, though it would crunch somewhat slower. (A while ago I had my dualcore hovering at 83 C for about a month or two and I still used it as my primary machine.) I'm thinking something more along the lines of a power flicker (which can, depending on the duration, take out some machines and not others).

Hmm...if only I knew crunchford's MAC address I could try feeding it a Wake on LAN signal through port 4000 (since that port is already open on your router). Though even if I could do that, it's a tossup as to whether that would actually make the machine start up. (I can't get it to work on my machines, either.) Anyway, though, when you get it restarted I'll see about getting the MAC addresses of all your machines written down (I'll be able to obtain that information once I can get SSH access) in case we ever need to try a Wake on LAN in the future.

Max :mellow:[/quote]


English please. lol

You say you have good news? Didn't I just state that all of my machines were likely up except Crunchford in the first 2 paras. of my post and provide stats from port 400 to prove it? Are you skimming my posts again? (lmao)

On my AMD's, for the ones that previously ran consistently above about 74-75 C, the motherboard eventually shot craps so I'm just speculating on this one. Hopefully it was just a power flicker. It's oddly coincidental that it happened to the warmest and most important machine of the group.

Regardless, is it possible that you can switch the 'master machine' over to another one of my machines so at least I can remotely view the other machines before next Tuesday? For CRUS, I have Sierp base 256 and Sierp base 16 running on a couple of them.


Thanks,
Gary

em99010pepe 2008-12-05 20:19

[quote=nuggetprime;151795]Will C443 get a stats page like the one for IB400/G4000?[/quote]

[quote=IronBits;151878]I've given him lots of code, but I haven't heard back from Carlos, and I'm not sure he has a web presence.
I can work something up over here for him, it will just be delayed by at least a day's worth of work.[/quote]

Too busy with real life! I have to see that again with IB.

Carlos

mdettweiler 2008-12-05 21:04

[quote=gd_barnes;152114]English please. lol

You say you have good news? Didn't I just state that all of my machines were likely up except Crunchford in the first 2 paras. of my post and provide stats from port 400 to prove it? Are you skimming my posts again? (lmao)[/quote]
LOL--yes, I was skimming your post, I must admit. :rolleyes:

[quote]On my AMD's, for the ones that previously ran consistently above about 74-75 C, the motherboard eventually shot craps so I'm just speculating on this one. Hopefully it was just a power flicker. It's oddly coincidental that it happened to the warmest and most important machine of the group.

Regardless, is it possible that you can switch the 'master machine' over to another one of my machines so at least I can remotely view the other machines before next Tuesday? For CRUS, I have Sierp base 256 and Sierp base 16 running on a couple of them.[/quote]
Unfortunately, I can't do anything until crunchford is back online again--all remote access into your network is through that machine. If it was just a power flicker, then all it needs is a reboot and I can get back in and get everything running again; however, if crunchford *did* blow its motherboard, then we can't recover all the LLRnet server stuff until you get it fixed. (If that does turn out to be the case, I'd recommend switching the hard drive into a machine with a good motherboard, so that we can at least get it online long enough for me to grab the LLRnet files and switch the "master machine" over to another box.)

After you get back and I can get in again, I'll see about setting up a "secondary master" so that if the master ever goes down again, we can still get in through an alternate port to a different machine.

In the meantime, maybe you could have your ex-wife stop by and reboot crunchford like you used to do before we got the remote desktop thing set up? :smile: Then, assuming it still works, I could get in and re-start the server stuff (and back it up, and set up a secondary master while I'm at it).

Max :smile:

gd_barnes 2008-12-06 10:55

[quote=mdettweiler;152129]LOL--yes, I was skimming your post, I must admit. :rolleyes:


Unfortunately, I can't do anything until crunchford is back online again--all remote access into your network is through that machine. If it was just a power flicker, then all it needs is a reboot and I can get back in and get everything running again; however, if crunchford *did* blow its motherboard, then we can't recover all the LLRnet server stuff until you get it fixed. (If that does turn out to be the case, I'd recommend switching the hard drive into a machine with a good motherboard, so that we can at least get it online long enough for me to grab the LLRnet files and switch the "master machine" over to another box.)

After you get back and I can get in again, I'll see about setting up a "secondary master" so that if the master ever goes down again, we can still get in through an alternate port to a different machine.

In the meantime, maybe you could have your ex-wife stop by and reboot crunchford like you used to do before we got the remote desktop thing set up? :smile: Then, assuming it still works, I could get in and re-start the server stuff (and back it up, and set up a secondary master while I'm at it).

Max :smile:[/quote]


The danger about having Sherri go by and turn it on is that is how I fried a motherboard before myself. That is...the fact that it shut itself down was a 'warning' sign that something was amiss. I turned it back on, started crunching again and a few days later it went off again. I did it again and it went off again in about a day. That was it...it had fried itself at that point.

I'm not going to turn it on and start crunching on it until I verify temps and stuff. Well, I suppose I could have her turn it on but not start crunching on it (assuming it will even come on, which I'd give less than a 50% chance). Since the server actually does no crunching, it shouldn't heat up the machine. I'll see if she can do it. I hate to burden her with messing with stuff again, though. She's already been by my house twice to make sure everything is OK and I told her that should be enough. Oh well, I'll see what I can do.

If the machine won't turn on, yes, after I get back I will move its hard drive into the coolest-running machine so that the server is on what is likely the most stable machine I have. Actually, I've done that twice already, based on the priority of the stuff that was running on a machine that went down, even after you got the remote access set up. You just didn't know it. lol

Stupid machines!


Gary

em99010pepe 2008-12-06 12:00

Lennart,

You have cores doing duplicated work on C443. Please check them.

Meanwhile I moved 4 cores to IB400 to help to clean the lower ranges, 3 cores are still on C443.

Carlos

mdettweiler 2008-12-06 12:24

[quote=gd_barnes;152201]The danger about having Sherri go by and turn it on is that is how I fried a motherboard before myself. That is...the fact that it shut itself down was a 'warning' sign that something was amiss. I turned it back on, started crunching again and a few days later it went off again. I did it again and it went off again in about a day. That was it...it had fried itself at that point.

I'm not going to turn it on and start crunching on it until I verify temps and stuff. Well, I suppose I could have her turn it on but not start crunching on it (assuming it will even come on, which I'd give less than a 50% chance). Since the server actually does no crunching, it shouldn't heat up the machine. I'll see if she can do it. I hate to burden her with messing with stuff again, though. She's already been by my house twice to make sure everything is OK and I told her that should be enough. Oh well, I'll see what I can do.

If the machine won't turn on, yes, after I get back I will move its hard drive into the coolest-running machine so that the server is on what is likely the most stable machine I have. Actually, I've done that twice already, based on the priority of the stuff that was running on a machine that went down, even after you got the remote access set up. You just didn't know it. lol

Stupid machines!


Gary[/quote]
Ah, I see...in that case, don't worry about getting it restarted just yet. Since all the clients working on G4000 have been moved to other servers by now, it shouldn't hurt if it's down just a few more days--rather than taking the risk of frying yet another motherboard. The main significant thing that is waiting on the server coming back online is about 500 results that I've got sitting on my computer that I had pulled down from G4000 and crunched with manual LLR, but that can wait. :smile:

em99010pepe 2008-12-06 13:04

Looks like Bliss is back. Hey Flatlander, only 10 tests per hour? Those 6 cores run too slow...lol I can't see either Henryzz or Max on the stats...

em99010pepe 2008-12-06 13:16

[B]NPLB LLRnet server #2 (updated 2008-12-02 08:00 GMT):[/B]
maintained by em99010pepe
Short identification: C443
server = "nplb.dynip.telepac.pt"
port = 443
k-range: 401 <= k <= 1001
n-range: 577K-578K, 587.4K-588K, 598K-600K
currently processing at n= [COLOR=#0000ff]~598.23K[/COLOR]

MyDogBuster 2008-12-06 14:20

[QUOTE] The main significant thing that is waiting on the server coming back online is about 500 results that I've got sitting on my computer that I had pulled down from G4000 and crunched with manual LLR, but that can wait. :smile:
[/QUOTE]

I also have about 300 results that were in my caches and complete. I saved the tosend files before I switched to IB400. Will their reservations expire when you bring GB4000 back up?

henryzz 2008-12-06 15:17

[quote=em99010pepe;152213]Looks like Bliss is back. Hey Flatlander, only 10 tests per hour? Those 6 cores run too slow...lol I can't see either Henryzz or Max on the stats...[/quote]
I was finishing a GNFS factorization this morning in safe mode so that I had access to more memory, which meant I didn't have internet access then.

Out of interest, I started the postprocessing using GGNFS, which worked straight away once I had gone into safe mode to free up memory.
Once I finally got to the linear algebra, its estimate was about 10.5 hours on one core.
If I had gone with that, it would still be going.
I thought that was way too long to bother with, as msieve is faster.
I ran msieve and it wouldn't solve the matrix because the weight was way too low, about 15 per cycle.
I fiddled for ages and eventually got it to work by removing some of the relations so the data was less oversieved; the linear algebra with msieve then took 1 hour 40 minutes on four cores instead of one.
That's somewhat of an improvement over GGNFS, although not being able to ridiculously oversieve with msieve drives me crazy sometimes.

IronBits 2008-12-06 16:40

We set a record yesterday with 10,583 knpairs completed. :smile:

gd_barnes 2008-12-06 20:03

[quote=MyDogBuster;152226]I also have about 300 results that were in my caches and complete. I saved the tosend files before I switched to IB400. Will their reservations expire when you bring GB4000 back up?[/quote]


Yes, technically, they would expire. BUT...no, as long as they are not handed out to someone else, you should be OK. What I'll do is tell you exactly when I expect to bring it back up and coordinate it with you being online. When I bring it back up, you can then connect to the server and it will immediately send those results and not hand them back out to someone else or, even "funnier", try to hand them back out to YOU again. (lol)

Max, is that your understanding about what will happen if Ian connects to port 4000 after I bring it up and no one else is connected to it?


Gary

MyDogBuster 2008-12-06 20:32

[QUOTE]
When I bring it back up, you can then connect to the server and it will immediately send those results and not hand them back out to someone else or, even "funnier", try to hand them back out to YOU again. (lol)

[/QUOTE]

What I'll do is gather all of them up into 1 tosend file so that I only have to connect once when it comes back up. That way it should go lots faster and not have to wait till I get to all the cores.
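Gathering the saved results into one file is plain concatenation. A sketch, assuming each saved file is a text file of finished results; the file names are hypothetical, and whether the LLRnet client will accept a merged file and send it in one connection is Ian's plan, not something this script checks. It deliberately doesn't de-duplicate lines, since a single result entry may span several lines:

```python
from pathlib import Path


def merge_tosend(sources, dest):
    """Concatenate several saved result files into one, in the order given.

    Lines are not de-duplicated, because one result entry can span several lines."""
    with open(dest, "w") as out:
        for src in sources:
            text = Path(src).read_text()
            if text and not text.endswith("\n"):
                text += "\n"  # keep the next file's first line separate
            out.write(text)


# Hypothetical layout: one saved file per core.
# merge_tosend([f"core{i}/tosend.txt" for i in range(1, 26)], "tosend_all.txt")
```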

mdettweiler 2008-12-06 20:54

[quote=MyDogBuster;152226]I also have about 300 results that were in my caches and complete. I saved the tosend files before I switched to IB400. Will their reservations expire when you bring GB4000 back up?[/quote]
[quote=gd_barnes;152283]Yes, technically, they would expire. BUT...no, as long as they are not handed out to someone else, you should be OK. What I'll do is tell you exactly when I expect to bring it back up and coordinate it with you being online. When I bring it back up, you can then connect to the server and it will immediately send those results and not hand them back out to someone else or, even "funnier", try to hand them back out to YOU again. (lol)

Max, is that your understanding about what will happen if Ian connects to port 4000 after I bring it up and no one else is connected to it?


Gary[/quote]
[quote=MyDogBuster;152285]What I'll do is gather all of them up into 1 tosend file so that I only have to connect once when it comes back up. That way it should go lots faster and not have to wait till I get to all the cores.[/quote]
As Gary said--yes, they probably would be expired by the time I get the server started up, but I'm planning to, before restarting the server, set jobMaxTime to 20 days or so to give everyone a chance to return their old results. Then, after Ian and I can both confirm that we've tied up any loose ends, I can change it back to the normal setting of 5 days. :smile:
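A minimal sketch of why raising the deadline protects cached results like Ian's ~300 pairs, assuming a pair simply expires once the time since it was handed out exceeds jobMaxTime. The function name and dates are hypothetical, and the thread does not show how LLRnet actually implements expiry or which unit jobMaxTime is stored in; days are used here to match the discussion.

```python
from datetime import datetime, timedelta

def is_expired(assigned_at: datetime, now: datetime, job_max_days: float) -> bool:
    """Return True once a pair has been out longer than the deadline.

    job_max_days stands in for jobMaxTime; the unit LLRnet really uses and its
    exact expiry rule are not shown in this thread (assumption).
    """
    return now - assigned_at > timedelta(days=job_max_days)

# Hypothetical dates: pairs cached before the outage, server restarted Dec. 10.
assigned = datetime(2008, 11, 28)
restart = datetime(2008, 12, 10)
print(is_expired(assigned, restart, 5))   # True: 12 days exceeds a 5-day deadline
print(is_expired(assigned, restart, 20))  # False: still inside a 20-day window
```

Switching back to 5 days afterwards only affects pairs handed out later, which is why the temporary 20-day window is enough to let everyone return old work.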

gd_barnes 2008-12-07 09:52

[quote=mdettweiler;152292]As Gary said--yes, they probably would be expired by the time I get the server started up, but I'm planning to, before restarting the server, set jobMaxTime to 20 days or so to give everyone a chance to return their old results. Then, after Ian and I can both confirm that we've tied up any loose ends, I can change it back to the normal setting of 5 days. :smile:[/quote]


How about setting it to 3 days like port 400 after we get Ian's (and others') already processed results returned to the server?

mdettweiler 2008-12-07 14:34

[quote=gd_barnes;152340]How about setting it to 3 days like port 400 after we get Ian's (and others') already processed results returned to the server?[/quote]
Well, I usually like to have it set to 5 days so that I can manually cache k/n pairs from the server and crunch them manually. So far I haven't had any problems with bottlenecked k/n pairs, though yes, I will keep an eye on things and be sure to decrease the time a bit if it seems necessary.

gd_barnes 2008-12-07 19:59

[quote=mdettweiler;152362]Well, I usually like to have it set to 5 days so that I can manually cache k/n pairs from the server and crunch them manually. So far I haven't had any problems with bottlenecked k/n pairs, though yes, I will keep an eye on things and be sure to decrease the time a bit if it seems necessary.[/quote]


OK, but you better be nice to me or I'll set it to 1 day at random times just for my own entertainment! lol

mdettweiler 2008-12-07 20:02

[quote=gd_barnes;152406]OK, but you better be nice to me or I'll set it to 1 day at random times just for my own entertainment! lol[/quote]
LOL :wink: Though of course if you started doing that then I could just start running the server from the max username instead of the gary username that it's being run from now, so you wouldn't be able to change anything...hey, wait a minute, why am I telling you this stuff? :missingteeth:

gd_barnes 2008-12-08 10:00

Sherri went over to my place tonight. The computer that has been hosting the server is down. She turned it on and the green light came on but nothing else. She confirmed that the other 9 machines are still running as well as my slower Windows desktop and very slow borrowed laptop upstairs that I have running various low-intensity CRUS efforts. Murphy's law rules as usual: The server machine is the only one that went down. 11 total are running just fine.

As usual, it looks like another bad mobo. My 5th now. Once this one is replaced, assuming that is the issue, it should put all of my machines under 70 C after I reapply the thermal goo so I HOPE that will be the last problem I have with them.

Max, when I get back on Tuesday night, I'll swap hard drives with another good machine and I'll order whatever part has gone down on the bad machine. I'll let you know what machine it is so you can get the server running again.

To all: Assuming that Max can do this within a day, I would anticipate that port 4000 will be running again by Weds. night. I'll likely move most of my machines over to it to process the lower n-range at that time.


Gary

mdettweiler 2008-12-08 17:21

[quote=gd_barnes;152462]Sherri went over to my place tonight. The computer that has been hosting the server is down. She turned it on and the green light came on but nothing else. She confirmed that the other 9 machines are still running as well as my slower Windows desktop and very slow borrowed laptop upstairs that I have running various low-intensity CRUS efforts. Murphy's law rules as usual: The server machine is the only one that went down. 11 total are running just fine.

As usual, it looks like another bad mobo. My 5th now. Once this one is replaced, assuming that is the issue, it should put all of my machines under 70 C after I reapply the thermal goo so I HOPE that will be the last problem I have with them.

Max, when I get back on Tuesday night, I'll swap hard drives with another good machine and I'll order whatever part has gone down on the bad machine. I'll let you know what machine it is so you can get the server running again.

To all: Assuming that Max can do this within a day, I would anticipate that port 4000 will be running again by Weds. night. I'll likely move most of my machines over to it to process the lower n-range at that time.


Gary[/quote]
Okay, cool. As for notifying me which machine you swap the hard drive with, that won't be necessary, because the "heart and soul" of crunchford, so to speak, is in its hard drive. Thus, when you swap the hard drive, the new machine essentially becomes crunchford. It will have an IP address of 192.168.2.100 and should automatically fit right into crunchford's previous role as gateway machine. :smile:

As for how fast I can get port 4000 up and running again: no problem, that shouldn't take long at all. I can probably have it going within 5 minutes of receiving word that you've got the machine back online. :smile:

Max :smile:

gd_barnes 2008-12-09 09:01

[quote=mdettweiler;152508]Okay, cool. As for notifying me which machine you swap the hard drive with, that won't be necessary, because the "heart and soul" of crunchford, so to speak, is in its hard drive. Thus, when you swap the hard drive, the new machine essentially becomes crunchford. It will have an IP address of 192.168.2.100 and should automatically fit right into crunchford's previous role as gateway machine. :smile:

As for how fast I can get port 4000 up and running again: no problem, that shouldn't take long at all. I can probably have it going within 5 minutes of receiving word that you've got the machine back online. :smile:

Max :smile:[/quote]


Well, duh. I've swapped a couple of hard drives already and knew that. lol Brain fart again. I'm thinking I'll be home by 8 PM my time Tuesday so should have the hard drive swapped out by 10 PM after unpacking and stuff. I'll let you know.

In the meantime, please quickly send another n=5K file to port 400. We have less than 2 days' work in it right now. I'll pull all of my machines off to dry port 4000 when it comes back up and after that, arrange them as necessary to dry any remaining ranges as needed in the various servers.


Gary

gd_barnes 2008-12-10 04:07

Back from my business trip now...

Max, it'll be an hour or so before I get the hard drive swapped out of the bad machine.

gd_barnes 2008-12-10 04:09

[quote=gd_barnes;152596]I'll pull all of my machines off to dry port 4000 when it comes back up and after that, arrange them as necessary to dry any remaining ranges as needed in the various servers.


Gary[/quote]


Per my subsequent comments in another thread, I'll leave all my machines on port 400 and Ian will move his back to port 4000 since he was running that one originally.

I'll post in this thread when port 4000 is up and running again.


Gary

MyDogBuster 2008-12-10 07:49

I'm going to assume you are having problems getting G4000 up again.

I'll stay on IB400 till I get the word. Going to take a nap.

gd_barnes 2008-12-10 07:57

[quote=MyDogBuster;152712]I'm going to assume you are having problems getting G4000 up again.

I'll stay on IB400 till I get the word. Going to take a nap.[/quote]


No, no problems. I didn't get home from my trip until close to 10 CST. I'm just now swapping out the hard drive between the bad machine and a good one. Max said he was going to bed an hour or so ago. So it will likely be the morning or early afternoon on Weds.

My bad machine, once again, is a bad mobo. I see the popped tops. On the last 2 mobos that went bad, there was no physical problem that I could see with them so I had to test several things. At least I don't have to do a bunch of testing to know where the problem lies this time.


Gary

MyDogBuster 2008-12-10 08:03

Okay, I'm off to bed then. Sorry to hear about the mobo. At least quad boards aren't as expensive as they used to be.

em99010pepe 2008-12-10 08:30

If you get G4000 up I'll move 7 cores to it (~1600 candidates per day).

gd_barnes 2008-12-10 09:22

I have now swapped the hard drives so that the server is on one of my coolest and most stable machines. Max will get it running sometime later today (Weds).


Gary

gd_barnes 2008-12-10 09:53

[quote=em99010pepe;152716]If you get G4000 up I'll move 7 cores to it (~1600 candidates per day).[/quote]

Hold off until after Ian has stated that he has connected to it and returned all of his processed pairs. I think he said he had 300+ of them. We don't want a bunch of double-work done on them.

After that, fire away with your cores if you want.

mdettweiler 2008-12-10 14:48

[quote=gd_barnes;152722]Hold off until after Ian has stated that he has connected to it and returned all of his processed pairs. I think he said he had 300+ of them. We don't want a bunch of double-work done on them.

After that, fire away with your cores if you want.[/quote]
Actually, no need to worry about that. I'll be setting jobMaxTime to 20 days until everyone's given me the OK on their queued results. :smile:

Going to go work on it right now...

mdettweiler 2008-12-10 14:56

Okay, G4000 is now officially back online! :banana:

I've submitted all my queued k/n pairs; Ian, let me know when you're all set so I can then change the deadline back to 5 days. (As mentioned in my last post, it's currently set to 20 days.)

Carlos and others: feel free to connect as soon as you wish. With the deadline set to 20 days there's no danger of anybody's results expiring for the time being. :smile:

em99010pepe 2008-12-10 15:36

Moved 3 cores, 4 more later when I get home.

MyDogBuster 2008-12-10 17:23

[quote] Ian, let me know when you're all set so I can then change the deadline back to 5 days.
[/quote]

Max, I've submitted them (some of them twice) and they don't show up anywhere. I just tried one again about 5 minutes ago, just before the 15-minute deadline, and nothing. ??????? The one I submitted twice doesn't even show up as a rejected pair.

MyDogBuster 2008-12-10 19:21

Okay, right out of an episode of the Twilight Zone, da da da da,
I tried again and this time they were accepted. So I guess I'm all caught up.

I haven't got a clue as to why they worked the second time and not the first.

em99010pepe 2008-12-10 19:24

I'm full power on G4000.
Ian, good to know the results were accepted.

MyDogBuster 2008-12-10 20:33

[QUOTE]Ian, good to know the results were accepted
[/QUOTE]

Thanks, I just wish I knew what I did right the second time. 260 results were just too much to throw away.

em99010pepe 2008-12-10 21:09

Hey Lennart, a little push on G4000?

gd_barnes 2008-12-10 21:18

We aren't going to add any more work to G4000 with this drive. Having Carlos and Ian on it should be sufficient. If it looks like IB400 is going to finish before G4000, I'll start moving machines. Regardless of that, we still have the final n=1.8K range that we'll load into IB400 before I would move any of my machines.

As long as you can calculate that port G4000 will finish by Dec. 23rd-24th with its current work and cores, then we won't need anyone more on it. If later, I can always move a few machines.
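A back-of-the-envelope version of that calculation, assuming a constant return rate: divide the pairs left in the server by the pairs returned per day and round up. The function name and the 20,000-pair backlog are made up for illustration; 1,600 pairs/day is Carlos's figure for his 7 cores.

```python
import math
from datetime import date, timedelta

def estimated_finish(start: date, remaining_pairs: int, pairs_per_day: float) -> date:
    """Project the day a server runs dry, assuming a constant return rate."""
    if pairs_per_day <= 0:
        raise ValueError("pairs_per_day must be positive")
    days_needed = math.ceil(remaining_pairs / pairs_per_day)
    return start + timedelta(days=days_needed)

# Hypothetical backlog of 20,000 pairs at 1,600 pairs/day.
finish = estimated_finish(date(2008, 12, 10), remaining_pairs=20_000, pairs_per_day=1_600)
print(finish, finish <= date(2008, 12, 24))  # 2008-12-23 True
```

In practice the rate changes as cores move between ports, so the estimate needs rerunning whenever someone switches servers.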

Edit: What do people think about a rally at this point? I'm thinking now that one won't be necessary. It might even be overly risky. With port G4000 back up, we should have stable ports until the 1st drive is complete (estimated ~1 week ago at around Dec. 23rd-24th), and we have plenty of resources to get there now. The estimate had assumed that Sheep wouldn't be doing any processing, and I see now that he is doing ~1500 pairs/day.

Edit 2: Perhaps a rally to kick off the new n>600K drives and to get them rolling fast would be a good option; perhaps the last weekend of Dec. or first weekend of Jan.


Gary

MyDogBuster 2008-12-10 21:26

[QUOTE]We aren't going to add any more work to G4000 with this drive.
[/QUOTE]

Let me know if and when I have to move again. I kinda like it over here though. I see no problem with 2 servers running. It gives everyone another choice of where to put their assets. 2 servers doing different slices of the pie (not the same drive) would be nice.

gd_barnes 2008-12-10 21:38

[quote=MyDogBuster;152781]Let me know if and when I have to move again. I kinda like it over here though. I see no problem with 2 servers running. It gives everyone another choice of where to put their assets. 2 servers doing different slices of the pie (not the same drive) would be nice.[/quote]


Cool. That works for us.

If you like different servers on different drives, you'll like our n>600K effort. We'll have 3 servers: one each on k=400-600, k=600-800, and k=800-1001. Likely they'll be at different n-ranges at different times. Initially we'll push k=400-600 more in rallies and at other times; since they are lower k's, they will generally be at a higher search range, but people will still be free to search any of the k-ranges.
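A sketch of how the planned three-server split partitions the k values, assuming (as in NPLB's k*2^n-1 searches, where an even k reduces to a smaller k) that only odd k are tested, so the shared endpoints 600 and 800 never need a tie-break. The function and its labels are hypothetical; elsewhere the thread only names port 8000 for k=800-1001.

```python
def server_for_k(k: int) -> str:
    """Return the planned n>600K k-range (hypothetical label) that covers k.

    Assumes only odd k are searched, so the shared endpoints 600 and 800
    can never come up.
    """
    if k % 2 == 0:
        raise ValueError("only odd k are searched")
    if 400 < k < 600:
        return "k=400-600"
    if 600 < k < 800:
        return "k=600-800"
    if 800 < k <= 1001:
        return "k=800-1001"
    raise ValueError(f"k={k} is outside the planned n>600K ranges")

for k in (401, 599, 601, 999, 1001):
    print(k, server_for_k(k))
```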

Also, we'll have k=1005-2000 for n=50K-100K to start with and n=350K-500K later on as well as our double-check drive for k<=1001 and n=100K-260K. I stated in an RPS thread that I'll take k=27-31 from the double-check drive after our 1st drive is complete, due to the missing primes found for k=31 for n>400K.

Completing all k=300-1001 to n=600K is just the beginning! :smile:


Gary

em99010pepe 2008-12-10 21:39

I will stay on G4000 for three, four days, then I have to finish my manual range and help C443.

BTW, Glenn will only join us with more power if we do a rally.

MyDogBuster 2008-12-10 22:00

[QUOTE]If you like different servers on different drives, you'll like our n>600K effort. We'll have 3 servers: one each on k=400-600, k=600-800, and k=800-1001. Likely they'll be at different n-ranges at different times. Initially we'll push k=400-600 more in rallies and at other times; since they are lower k's, they will generally be at a higher search range, but people will still be free to search any of the k-ranges.

Also, we'll have k=1005-2000 for n=50K-100K to start with and n=350K-500K later on as well as our double-check drive for k<=1001 and n=100K-260K. I stated in an RPS thread that I'll take k=27-31 from the double-check drive after our 1st drive is complete, due to the missing primes found for k=31 for n>400K.

[/QUOTE]

Great ideas. And here I was thinking about resting a bit on our laurels.
Looks like enough work to choke a horse.

mdettweiler 2008-12-11 00:47

[quote=MyDogBuster;152758]Okay, right out of an episode of the Twilight Zone, da da da da,
I tried again and this time they were accepted. So I guess I'm all caught up.

I haven't got a clue as to why they worked the second time and not the first.[/quote]
Hmm...that's odd. I don't know why it would do that. Anyway, now that you've got your results in, I'll go and change jobMaxTime to 5 days again...

mdettweiler 2008-12-11 01:03

Hi all,

In case you're all wondering what this new "port 8000/NPLB 7th Drive" thing that's just shown up on the [URL]http://nplb-gb1.no-ip.org/llrnet/[/URL] status page is, that's a new, empty server that we will eventually be loading work for k=800-1001, n>600K into after we're done with the 1st Drive, as Gary, David and I had discussed via PM. I just got all the behind-the-scenes stuff for it taken care of ahead of time so that when the time comes, all I have to do is pop in the knpairs and let 'er rip. :smile:

Anyway, just wanted to let you guys know, because otherwise I'm sure there would have been somebody posting a message wondering what the heck this new thing was. :smile:

In the meantime, this has helped me spot a previously-unnoticed bug that causes the status page to display "-1 remaining knpairs" for a brand-new server. :smile:

Max :smile:

mdettweiler 2008-12-11 01:29

Hmm...G4000 apparently crashed at least two times within the past hour or so. Here's what the console output said:
[code]llrnet: net.cxx:138: static void* net_Server_t::connection_thread(void*): Assertion `thd->socket >= 0' failed.
net_signal called with code 6
shutting down listening socket
accept: errorno=22
waiting thread #0 exits
shutdown: errno 9
thread #2 exited[/code]
This console output is different from what I saw last time the server kept crashing, though it may very well be a corrupted executable again. I'll swap in a fresh binary and see if that fixes anything.
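The log only gives raw numbers. Assuming the usual Linux numbering (the thread does not state the OS, though the bash and chmod discussion suggests it), Python's standard `signal` and `errno` modules translate them. "code 6" is presumably the signal number, which is SIGABRT, the signal a failed `assert()` raises; errno 22 is EINVAL (reported by `accept`) and errno 9 is EBADF (reported by `shutdown`). This only decodes the log lines; it does not explain why the socket was invalid in the first place.

```python
import errno
import signal
from typing import Dict, List

def describe_crash_codes(sig: int, errnos: List[int]) -> Dict[str, str]:
    """Translate the raw numbers in the llrnet crash log into symbolic names."""
    names = {f"signal {sig}": signal.Signals(sig).name}
    for code in errnos:
        names[f"errno {code}"] = errno.errorcode[code]
    return names

# Codes copied from the console output above (Linux numbering assumed).
print(describe_crash_codes(6, [22, 9]))
```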

David, Carlos--any of you server gurus have any idea what's happening here?

mdettweiler 2008-12-11 01:33

[quote=mdettweiler;152809]Hmm...G4000 apparently crashed at least two times within the past hour or so. Here's what the console output said:
[code]llrnet: net.cxx:138: static void* net_Server_t::connection_thread(void*): Assertion `thd->socket >= 0' failed.
net_signal called with code 6
shutting down listening socket
accept: errorno=22
waiting thread #0 exits
shutdown: errno 9
thread #2 exited[/code]This console output is different from what I saw last time the server kept crashing, though it may very well be a corrupted executable again. I'll swap in a fresh binary and see if that fixes anything.

David, Carlos--any of you server gurus have any idea what's happening here?[/quote]
Okay, I've got the binary swapped out. Hopefully it won't crash any more...

mdettweiler 2008-12-11 02:20

[quote=mdettweiler;152810]Okay, I've got the binary swapped out. Hopefully it won't crash any more...[/quote]
Hmm...it's still crashing. This is weird; I have absolutely no idea why it's doing this. The only thing I do know is that it seems to crash whenever it prunes the joblist and knpairs files.

Anyone know why this is happening?

gd_barnes 2008-12-11 02:22

[quote=MyDogBuster;152791]Great ideas. And here I was thinking about resting a bit on our laurels.
Looks like enough work to choke a horse.[/quote]


The pressure will definitely be off. After the 1st drive is done, I'll likely drop from my current 8-9 quads on NPLB to about 6-7, which is what I had on it prior to the last 1-2 months. There are several CRUS efforts that I'd like to put more CPU power into.

NPLB and CRUS, like most prime-search efforts, are infinite projects. (CRUS was finite to start with, although extremely huge, when we were only including bases <= 32, but others, and now I, have started processing much larger bases because they are fun.) It's all a matter of how big a piece we choose to process at any one time. We prefer to process large numbers of k's at once because there are great efficiency gains in processing and it's far easier to set goals for entire swaths of k and n-ranges. The challenge is keeping it fun for everyone, so we have to vary our efforts enough and have enough different ones going at a time to make things interesting. At this particular point in time, we likely have fewer different efforts going at once than we will have for the foreseeable future.


Gary

gd_barnes 2008-12-11 02:25

[quote=mdettweiler;152812]Hmm...it's still crashing. This is weird; I have absolutely no idea why it's doing this. The only thing I do know is that it seems to crash whenever it prunes the joblist and knpairs files.

Anyone know why this is happening?[/quote]


Is the server completely down?

mdettweiler 2008-12-11 02:27

[quote=gd_barnes;152814]Is the server completely down?[/quote]
No, it's just going down every 15 minutes (when it prunes the joblist and knpairs files), but I'm manually restarting it every time as soon as I see it. (I've got a VNC window open in the background so I can monitor it.) I'm currently working on a workaround to make it automatically restart the server every time it goes down (to serve as sort of a band-aid fix).

mdettweiler 2008-12-11 02:30

[quote=mdettweiler;152815]No, it's just going down every 15 minutes (when it prunes the joblist and knpairs files), but I'm manually restarting it every time as soon as I see it. (I've got a VNC window open in the background so I can monitor it.) I'm currently working on a workaround to make it automatically restart the server every time it goes down (to serve as sort of a band-aid fix).[/quote]
Okay, I think I've got a temporary workaround (using "while" loops in bash syntax to continually re-start the server every time it exits) that should keep things going without me restarting it manually every 15 minutes. I'll continue to monitor it, of course, to make sure that the workaround functions properly.
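The band-aid is simply "run the server, and when it exits, run it again." A rough Python equivalent of that bash while loop is sketched here, assuming the server runs in the foreground and returns when it crashes. The function name and the command in the usage comment are placeholders, not the real LLRnet binary name; the short pause before relaunching reflects the later advice that the server sometimes needs a minute before it can grab its socket again.

```python
import subprocess
import sys
import time
from typing import List, Optional

def run_forever(cmd: List[str], restart_delay: float = 5.0,
                max_launches: Optional[int] = None) -> int:
    """Relaunch cmd every time it exits, like a bash `while true` wrapper.

    max_launches=None loops forever; a finite value is mainly useful for testing.
    Returns how many times cmd was launched.
    """
    launches = 0
    while max_launches is None or launches < max_launches:
        launches += 1
        result = subprocess.run(cmd)  # blocks until the server process exits
        print(f"server exited with code {result.returncode}", file=sys.stderr)
        # Pause before relaunching so the old listening socket has a chance to be released.
        time.sleep(restart_delay)
    return launches

# Usage (hypothetical command; substitute the real server invocation):
# run_forever(["./llrnet-server"])
```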

gd_barnes 2008-12-11 02:45

I'm typing from the machine right now. Is there anything I can do in the next 10 mins. to monitor it?

This has been, by far, my most stable machine. It's never been turned off and never run above 68 C (67 C right now) since I started running it in early May. Taking the cover off to swap out the hard drive was the first time I even had the cover off of it.

mdettweiler 2008-12-11 02:49

[quote=gd_barnes;152817]I'm typing from the machine right now. Is there anything I can do in the next 10 mins. to monitor it?

This has been, by far, my most stable machine. It's never been turned off and never run above 68 C since I started running it in early May. Taking the cover off to swap out the hard drive was the first time I even had the cover off of it.[/quote]
Okay...the workaround seems to be holding. Should be OK as a band-aid fix until we can figure out the root of the problem.

As for what's causing this: I honestly don't know. I doubt it could have anything to do with swapping the hard drive; I'm thinking something more along the lines of a messed-up binary. But then again, I don't know--it could be anything.

In the meantime, nobody should need to move any machines off G4000; the workaround should keep it running OK for now.

IronBits 2008-12-11 02:52

Check the perms; chmod 664 *.txt should fix it.
Happens when a lot of folks are trying to get in, and it doesn't come up cleanly trying to lock onto the socket.
When folks are hitting mine real heavy, and I take it down to give it some more pairs, it might take me 15 tries to get it to come back up. Sometimes I just let it sit for a minute or two and then try again.
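For reference, a Python equivalent of `chmod 664 *.txt`, sketched under the assumption that the server's data files are named knpairs.txt and joblist.txt (the thread only says "joblist and knpairs files"). Octal 664 gives owner and group read/write and everyone else read-only; `stat.filemode` spells that out, and also shows what a broader `chmod 777` would grant.

```python
import stat
import tempfile
from pathlib import Path
from typing import List

def chmod_txt_files(directory: str, mode: int = 0o664) -> List[str]:
    """Apply mode to every non-hidden *.txt file in directory, like `chmod 664 *.txt`."""
    changed = []
    for path in sorted(Path(directory).glob("*.txt")):
        if path.name.startswith("."):  # the shell's * skips dotfiles, so do the same
            continue
        path.chmod(mode)
        changed.append(path.name)
    return changed

# What the two octal modes mean for a regular file.
print(stat.filemode(stat.S_IFREG | 0o664))  # -rw-rw-r--
print(stat.filemode(stat.S_IFREG | 0o777))  # -rwxrwxrwx

# Demo in a throwaway directory (file names assumed, not taken from the server).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("knpairs.txt", "joblist.txt", "server.log"):
        Path(tmp, name).write_text("")
    print(chmod_txt_files(tmp))  # ['joblist.txt', 'knpairs.txt']
```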

Mini-Geek 2008-12-11 02:55

When I run out of work on the 18th (possibly 19th), assuming the drive is still running and I haven't lined up new files yet, is IB400 probably the most stable and best for me to run on?

mdettweiler 2008-12-11 02:58

[quote=IronBits;152820]Check the perms; chmod 664 *.txt should fix it.
Happens when a lot of folks are trying to get in, and it doesn't come up cleanly trying to lock onto the socket.
When folks are hitting mine real heavy, and I take it down to give it some more pairs, it might take me 15 tries to get it to come back up. Sometimes I just let it sit for a minute or two and then try again.[/quote]
Okay, thanks--I shut down the server, ran "chmod 664 *.txt", and restarted it. Hopefully that will do the trick. :smile:

As for it needing a few more minutes before it can be restarted: yeah, I've had that happen a few times, especially today with needing to restart the server so much. I've been doing the same thing you suggested--letting it sit for a few minutes and then trying again. :smile:

mdettweiler 2008-12-11 02:58

[quote=Mini-Geek;152822]When I run out of work on the 18th (possibly 19th), assuming the drive is still running and I haven't lined up new files yet, is IB400 probably the most stable and best for me to run on?[/quote]
Probably. If we time everything right, it will be the last 1st Drive server running at the end.

mdettweiler 2008-12-11 03:20

[quote=mdettweiler;152823]Okay, thanks--I shut down the server, ran "chmod 664 *.txt", and restarted it. Hopefully that will do the trick. :smile:

As for it needing a few more minutes before it can be restarted: yeah, I've had that happen a few times, especially today with needing to restart the server so much. I've been doing the same thing you suggested--letting it sit for a few minutes and then trying again. :smile:[/quote]
Hmm...it crashed again, just as before. I've put it back on the while-loop workaround that I used before.

Maybe a "chmod 777 *" would work better? I'll try that momentarily...

Edit: Okay, I've tried "chmod 777 *". We'll see if it still crashes after that. :smile: (Just to play it safe, though, I've enabled the while-loop thing once again, to ensure that the server isn't down for long periods of time if I'm not right at the computer when it crashes.)

gd_barnes 2008-12-11 05:02

[quote=mdettweiler;152827]Hmm...it crashed again, just as before. I've put it back on the while-loop workaround that I used before.

Maybe a "chmod 777 *" would work better? I'll try that momentarily...

Edit: Okay, I've tried "chmod 777 *". We'll see if it still crashes after that. :smile: (Just to play it safe, though, I've enabled the while-loop thing once again, to ensure that the server isn't down for long periods of time if I'm not right at the computer when it crashes.)[/quote]


Thanks for your attention to detail on this Max. Sounds like a mess.

Any luck with stopping the crashing with this last attempt?

gd_barnes 2008-12-11 05:09

[quote=Mini-Geek;152822]When I run out of work on the 18th (possibly 19th), assuming the drive is still running and I haven't lined up new files yet, is IB400 probably the most stable and best for me to run on?[/quote]


I'll add to what Max said here:

Yes, IB400 is the most stable at this point. But the question will be: Where should you connect at that point? Likely it will be IB400 but it could be one of the others if they are not dried before IB400 is.

Around the 16th or 17th, keep checking the threads. We'll post what needs to be finished off by that point. I'll attempt to balance my machines such that IB400 is the last remaining server with pairs for the 1st drive but I can't guarantee it.

On another note: You know what the coolest thing about this is? Actually having a general idea of when we will complete something! How many projects are out there that can estimate when they will complete an entire effort that is being run by 10+ people?! This is great! :smile: It allows for excellent forward planning on future efforts.


Gary

mdettweiler 2008-12-11 05:18

[quote=gd_barnes;152837]Thanks for your attention to detail on this Max. Sounds like a mess.

Any luck with stopping the crashing with this last attempt?[/quote]
I just checked the server...and it looks like all is working well now! :grin: I'll still leave it on the while-loop thingie so that it will automatically restart if it does go down for whatever reason, but it looks like it should be good now. :smile:

Thanks, David, for your help! :smile:


All times are UTC.
