|
|
#419 |
|
May 2007
Kansas; USA
10100010101101₂ Posts |
Damn. Of all the luck. I'm sure the machine is down then. And what's worse, I'm sure that it is the ONLY one of my 10 machines that is down.

It is one of the 8 machines plus 3 more cores that were running port 400 at the point that it must have gone down, and I see a drop of ~10-12% in k/n pairs processed on that port around 4-5 hours ago. That would equate to 4 cores out of 35 and coincides with Carlos saying that it has been down 5 hours.

Max, it looks like we're screwed on port 4000 until late next Tuesday, when I get back and can see what is wrong with it, unless you can somehow remotely make another one of my machines the "master machine" and then set up port 4000 on it instead. (That sounds like a huge headache to me.) I think most of the others are running <= 68 C, but if you do set it up on one of the others, please check the temps first. Like I said, Crunchford was definitely the warmest of my AMD machines.

Everyone, if you were on port 4000, please move to port 400. It can easily handle the load. Sorry. This shouldn't delay us in total by more than 1/2 day to 1 day on finishing this drive if people move their machines within a day. Port 400 will just do more of the work, and later on I may need to move a few of my machines to port 4000 to clear it out more quickly once we get it going again.

Gary

Last fiddled with by gd_barnes on 2008-12-05 at 11:09 |
|
|
|
|
|
#420 |
|
Sep 2004
2·5·283 Posts |
C443 is also available, with lots of work to process.
|
|
|
|
|
|
#421 | |
|
May 2008
Wilmington, DE
2²×23×31 Posts |
Quote:
I have got to find an easy way of changing servers. Switching 25 cores will not be fun. |
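One way to make a bulk switch less painful is a small script that walks the client folders and rewrites the port line in each config file. The following is only a sketch: the file name `llr-clientconfig.txt`, the one-folder-per-core layout, and the `port = N` line format are assumptions for illustration, so check them against your actual client folders before running anything like it.

```python
import re
from pathlib import Path

# Assumed (hypothetical) layout: one folder per core, each holding a text
# config file with a line such as `port = 4000`.
CONFIG_NAME = "llr-clientconfig.txt"
PORT_LINE = re.compile(r"^([ \t]*port[ \t]*=[ \t]*)(\d+)(.*)$", re.MULTILINE)


def switch_port(root, old_port, new_port, config_name=CONFIG_NAME):
    """Rewrite `port = old_port` to `port = new_port` in every config under root.

    Returns the list of files that were changed.
    """
    changed = []
    for path in sorted(Path(root).rglob(config_name)):
        text = path.read_text()

        def swap(match):
            if int(match.group(2)) != old_port:
                return match.group(0)  # leave clients on other ports alone
            return f"{match.group(1)}{new_port}{match.group(3)}"

        new_text = PORT_LINE.sub(swap, text)
        if new_text != text:
            path.write_text(new_text)
            changed.append(path)
    return changed
```

Calling `switch_port("C:/llrnet", 4000, 400)` (the path is a made-up example) would touch only the configs currently pointing at port 4000. The clients would presumably still need to be stopped and restarted by hand to pick up the change.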
|
|
|
|
|
|
#422 |
|
May 2008
Wilmington, DE
2²×23×31 Posts |
Halfway into switching the ports, my service provider went down. It hasn't been down since August. This just ain't my day. Time to go back to bed.
Okay, it's back up and the switch is finished. |
|
|
|
|
|
#423 |
|
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
5886₁₀ Posts |
What happened to hour 24 yesterday?
http://nplb.ironbits.net/progress_400.html

Edit: could all the new stats pages be added to the first post of this thread?

Last fiddled with by henryzz on 2008-12-05 at 16:54 |
|
|
|
|
|
#424 | |
|
A Sunny Moo
Aug 2007
USA (GMT-5)
3·2,083 Posts |
Quote:
I doubt that thermal problems are the issue here; even if it went all the way up to 80 C, it should still run, though it would crunch somewhat slower. (A while ago I had my dualcore hovering at 83 C for about a month or two and I still used it as my primary machine.) I'm thinking something more along the lines of a power flicker (which can, depending on the duration, take out some machines and not others).

Hmm...if only I knew crunchford's MAC address, I could try feeding it a Wake on LAN signal through port 4000 (since that port is already open on your router). Though even if I could do that, it's a tossup as to whether that would actually make the machine start up. (I can't get it to work on my machines, either.)

Anyway, though, when you get it restarted I'll see about getting the MAC addresses of all your machines written down (I'll be able to obtain that information once I can get SSH access) in case we ever need to try a Wake on LAN in the future.

Max
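For reference, the Wake on LAN signal mentioned here is a small UDP payload known as a "magic packet": six 0xFF bytes followed by the target's MAC address repeated 16 times. Below is a minimal sketch of building and sending one. The default host and port are placeholder assumptions, and whether a router passes the packet on to a powered-off machine depends on the router, which fits the "tossup" above.

```python
import socket


def build_magic_packet(mac):
    """Return the 102-byte Wake on LAN payload for a MAC like '00:11:22:33:44:55'."""
    digits = mac.replace(":", "").replace("-", "")
    if len(digits) != 12:
        raise ValueError(f"expected 12 hex digits, got {mac!r}")
    mac_bytes = bytes.fromhex(digits)  # raises ValueError on non-hex input
    # Six 0xFF bytes, then the target MAC repeated 16 times.
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac, host="255.255.255.255", port=9):
    """Send the packet over UDP. The host and port defaults are assumptions."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Allow sending to a broadcast address such as 255.255.255.255.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (host, port))
```

To try it from outside the network, one would call something like `send_magic_packet(mac, host=router_address, port=4000)`, where both the MAC and the router address would have to be filled in with real values. The target machine's BIOS and network card also need Wake on LAN enabled for the packet to do anything.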
|
|
|
|
|
|
|
#425 | |
|
May 2007
Kansas; USA
3²·13·89 Posts |
Quote:
English please. lol

You say you have good news? Didn't I just state in the first 2 paras. of my post that all of my machines were likely up except Crunchford, and provide stats from port 400 to prove it? Are you skimming my posts again? (lmao)

On my AMDs, for the ones that previously ran consistently above about 74-75 C, the motherboard eventually shot craps, so I'm just speculating on this one. Hopefully it was just a power flicker. It's oddly coincidental that it happened to the warmest and most important machine of the group.

Regardless, is it possible for you to switch the 'master machine' over to another one of my machines so that at least I can remotely view the other machines before next Tuesday? For CRUS, I have Sierp base 256 and Sierp base 16 running on a couple of them.

Thanks,
Gary

Last fiddled with by gd_barnes on 2008-12-05 at 20:22 |
|
|
|
|
|
|
#426 | ||
|
Sep 2004
2·5·283 Posts |
Quote:
Quote:
Carlos |
||
|
|
|
|
|
#427 | ||
|
A Sunny Moo
Aug 2007
USA (GMT-5)
1869₁₆ Posts |
Quote:
Quote:
After you get back and I can get in again, I'll see about setting up a "secondary master" so that if the master ever goes down again, we can still get in through an alternate port to a different machine.

In the meantime, maybe you could have your ex-wife stop by and reboot crunchford like you used to do before we got the remote desktop thing set up? Then, assuming it still works, I could get in and re-start the server stuff (and back it up, and set up a secondary master while I'm at it).

Max
|
||
|
|
|
|
|
#428 | |
|
May 2007
Kansas; USA
28AD₁₆ Posts |
Quote:
The danger in having Sherri go by and turn it on is that that is how I fried a motherboard myself before. That is, the fact that it shut itself down was a 'warning' sign that something was amiss. I turned it back on, started crunching again, and a few days later it went off again. I did it again and it went off again in about a day. That was it...it had fried itself at that point. I'm not going to turn it on and start crunching on it until I verify temps and stuff.

Well, I suppose I could have her turn it on but not start crunching on it (assuming it will even come on, which I suspect there is < 50% chance of). Since the server actually does no crunching, it shouldn't heat up the machine. I'll see if she can do it. I hate to burden her with messing with stuff again though. She's already been by my house twice to make sure everything is OK and I told her that should be enough. Oh well, I'll see what I can do.

If the machine won't turn on, then yes, I will swap hard drives with the coolest-running machine after I get back, so that the server is on what is likely the most stable machine that I have. Actually, I've done that twice already based on the priority of stuff that was running on a machine that went down, even after you got the remote access set up. You just didn't know it. lol

Stupid machines!

Gary |
|
|
|
|
|
|
#429 |
|
Sep 2004
2·5·283 Posts |
Lennart,
You have cores doing duplicated work on C443. Please check them. Meanwhile, I moved 4 cores to IB400 to help clean up the lower ranges; 3 cores are still on C443.

Carlos

Last fiddled with by em99010pepe on 2008-12-06 at 12:04 |
|
|
|