mersenneforum.org > Great Internet Mersenne Prime Search > PrimeNet > GPU to 72

2017-12-04, 19:21   #4093
chalsall
If I May

"Chris Halsall"
Sep 2002
Barbados

10010001100010₂ Posts

Quote:
Originally Posted by James Heinrich View Post
If I had RAID1 with two Seagate drives and one failed, I would expect the second one to also fail before the array could be rebuilt.
Yeah. That's why I always insist on different manufacturers for the drives in RAID arrays provisioned for my clients. MTBF can become a bit of a problem when all of them fail right around the same time....

2017-12-04, 20:46   #4094
chalsall
If I May

"Chris Halsall"
Sep 2002
Barbados

22142₈ Posts

Quote:
Originally Posted by chalsall View Post
It is scheduled to be replaced in the next hour or so. It's "hot-swappable", and one of the drives in a RAID1 set, so there should be no downtime.
OK. The One and One techs have just now removed the /dev/sda drive.

2017-12-04, 21:43   #4095
chalsall
If I May

"Chris Halsall"
Sep 2002
Barbados

2×4,657 Posts

Quote:
Originally Posted by chalsall View Post
OK. The One and One techs have just now removed the /dev/sda drive.
And then all hell broke loose. "Could I please speak to your supervisor?" "Sure, please let me put you on hold..."

2017-12-04, 22:25   #4096
chalsall
If I May

"Chris Halsall"
Sep 2002
Barbados

2×4,657 Posts

Quote:
Originally Posted by chalsall View Post
And then all hell broke loose. "Could I please speak to your supervisor?" "Sure, please let me put you on hold..."
And then I finally got to do "real-time" with a level 3 tech, who understood what I was saying and laughed at my jokes.

Everything should be back to nominal. Please let me know if anyone sees anything odd.

2017-12-05, 17:33   #4097
chalsall
If I May

"Chris Halsall"
Sep 2002
Barbados

9314₁₀ Posts

Quote:
Originally Posted by chalsall View Post
Everything should be back to nominal. Please let me know if anyone sees anything odd.
Just so everyone knows, the RAID array is rebuilding; should complete in about three days... (!) Things are a little sluggish because of this.

But the good news is the new drive is a different model (unfortunately still a Seagate), and the surviving drive in the array has zero non-recoverable read errors. Plus I have a script taking hourly snapshots of the DB mirrored to another server.

Edit: Just because it amuses my sorry ass...

Last fiddled with by chalsall on 2017-12-05 at 17:39

2017-12-10, 12:43   #4098
kladner

"Kieren"
Jul 2011
In My Own Galaxy!

2718₁₆ Posts

Is gpu72.com down?

It's not just you! gpu72.com looks down from here.

2017-12-10, 14:01   #4099
ET_
Banned

"Luigi"
Aug 2002
Team Italia

2×2,383 Posts

Quote:
Originally Posted by kladner View Post
Is gpu72.com down?

It's not just you! gpu72.com looks down from here.
Back up.

2017-12-10, 14:01   #4100
James Heinrich

"James Heinrich"
May 2004
ex-Northern Ontario

2·7²·31 Posts

Looks up here and now.

2017-12-11, 01:54   #4101
chalsall
If I May

"Chris Halsall"
Sep 2002
Barbados

2×4,657 Posts

Quote:
Originally Posted by James Heinrich View Post
Looks up here and now.
Sorry guys.

As I said before, the /dev/sda drive was failing hard, so I had the techs replace it. It was hot-swappable, but that didn't go so well.

Six days later (this Sunday morning) I woke up to the server not responding to anything. Even the serial console was non-responsive. A shame, since the RAID1 rebuild was up to 39.1% complete.

A hard reset later, and the machine was back. But not responding well.

TL;DR: Don't run mprime on a Linux server during a RAID1 rebuild. With mprime stopped, the rebuild estimate has dropped from two weeks to two days. Plus the server is a whole lot more responsive to HTTP and MySQL requests.

Oh, the irony!
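
For anyone wanting to watch a rebuild like this from a script, here is a minimal Python sketch that pulls the percentage, the finish estimate and the speed out of the progress line Linux md prints in /proc/mdstat. The sample text, the device names in it and the function name `rebuild_progress` are illustrative assumptions, not taken from the actual server.

```python
import re

# Hypothetical sample of /proc/mdstat during a RAID1 rebuild (not the real server's output).
SAMPLE = """md3 : active raid1 sdb3[1] sda3[2]
      1948792832 blocks super 1.2 [2/1] [_U]
      [=======>.............]  recovery = 39.1% (761978000/1948792832) finish=2880.0min speed=6866K/sec
"""

# md reports "resync" or "recovery", a percentage, an ETA in minutes and a speed in K/sec.
PROGRESS = re.compile(
    r"(?P<kind>resync|recovery)\s*=\s*(?P<pct>[\d.]+)%.*?"
    r"finish=(?P<finish>[\d.]+)min\s+speed=(?P<speed>\d+)K/sec"
)


def rebuild_progress(mdstat_text):
    """Return the first rebuild progress entry found, or None if no rebuild is running."""
    m = PROGRESS.search(mdstat_text)
    if m is None:
        return None
    return {
        "kind": m.group("kind"),
        "percent": float(m.group("pct")),
        "finish_days": float(m.group("finish")) / (60 * 24),  # minutes -> days
        "speed_k_per_sec": int(m.group("speed")),
    }


if __name__ == "__main__":
    try:
        with open("/proc/mdstat") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # fall back to the sample on machines without md
    print(rebuild_progress(text))
```

Sampling this every few minutes, with and without the competing workload running, would show directly how much the finish estimate moves.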

2017-12-11, 02:03   #4102
retina
Undefined

"The unspeakable one"
Jun 2006
My evil lair

2²×5×7×41 Posts

Quote:
Originally Posted by chalsall View Post
TL;DR: Don't run mprime on a Linux server during a RAID1 rebuild. With mprime stopped, the rebuild estimate has dropped from two weeks to two days.
So the rebuild process has an equal or lower priority than mprime?

2017-12-11, 15:01   #4103
chalsall
If I May

"Chris Halsall"
Sep 2002
Barbados

9314₁₀ Posts

Quote:
Originally Posted by retina View Post
So the rebuild process has an equal or lower priority than mprime?
The rebuild process (in this case, md3_resync) is at priority 20, nice of 0, while mprime is priority 30, nice of 10.

But it seems that because mprime is so incredibly efficient at saturating the CPUs, the OS wasn't able to prevent it from degrading the md3_resync process significantly.
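
The "priority 20, nice 0" and "priority 30, nice 10" pairs are really one number shown twice: for ordinary (non-real-time) tasks, the priority that top displays is 20 plus the nice value. A minimal Python sketch, assuming the Linux /proc/<pid>/stat layout documented in proc(5); the sample line and the helper names `parse_stat` and `top_priority` are made up for illustration.

```python
# Hypothetical /proc/<pid>/stat line, truncated after field 20 (num_threads).
SAMPLE_STAT = "1234 (mprime) R 1 1234 1234 0 -1 4194304 100 0 0 0 5000 10 0 0 30 10 4"


def parse_stat(stat_line):
    """Extract pid, command name, priority (field 18) and nice (field 19)."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the LAST ')' in the line.
    head, _, tail = stat_line.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    fields = tail.split()
    # 'tail' starts at field 3 (state), so field n of proc(5) is fields[n - 3].
    return {
        "pid": int(pid_str),
        "comm": comm,
        "priority": int(fields[18 - 3]),
        "nice": int(fields[19 - 3]),
    }


def top_priority(nice):
    """Priority as shown for a normal (non-real-time) task with the given nice value."""
    return 20 + nice


if __name__ == "__main__":
    info = parse_stat(SAMPLE_STAT)
    print(info, top_priority(info["nice"]) == info["priority"])
```

Nice values only weight how CPU time is shared between runnable tasks, so a niced job still takes every cycle that nothing else claims.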

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.