mersenneforum.org > Great Internet Mersenne Prime Search > PrimeNet > GPU to 72
2017-12-11, 15:06   #4104
Dubslow
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3×29×83 Posts

Quote:
Originally Posted by chalsall
The rebuild process (in this case, md3_resync) is at priority 20, nice of 0, while mprime is priority 30, nice of 10.

But it seems that because mprime is so incredibly efficient at saturating the CPUs, the OS wasn't able to prevent it from degrading the md3_resync process significantly.
I think it's just that modern OSs are really bad at actually implementing minimum priority. For instance, I can't run yafu or msieve at nice -n 19 and, e.g., play a video game at the same time, and those are hardly efficient at saturating the CPU. (Well, maybe yafu's SIQS has inline assembly, as do GGNFS' sievers, but even those get ~20% gains from hyperthreading. Linear algebra surely saturates the CPU even less well, and even then I still can't truly do two things at once.)
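The priority figures quoted above follow top's convention (PR = 20 + nice). As a rough back-of-the-envelope check, assuming Linux's CFS scheduler and two CPU-bound tasks competing for the same core, each task's share is its weight divided by the sum of both weights. The sketch uses three entries from the kernel's nice-to-weight table. It is an idealized model: it ignores I/O waits, multi-core load balancing, and contention for shared cache and memory bandwidth, none of which nice controls.

```python
# Subset of the Linux CFS nice-to-weight table (sched_prio_to_weight).
# Each nice step changes the weight by roughly 1.25x; only the entries
# needed for this thread are listed.
NICE_TO_WEIGHT = {0: 1024, 10: 110, 19: 15}


def cpu_share(nice_a: int, nice_b: int) -> float:
    """Fraction of one CPU that task A receives when tasks A and B are both
    always runnable on the same CPU (idealized CFS, no sleeping, no I/O)."""
    w_a = NICE_TO_WEIGHT[nice_a]
    w_b = NICE_TO_WEIGHT[nice_b]
    return w_a / (w_a + w_b)


if __name__ == "__main__":
    # e.g. md3_resync (nice 0) vs mprime (nice 10) sharing one core
    print(f"nice 0 vs nice 10: {cpu_share(0, 10):.1%}")
    # e.g. a foreground program (nice 0) vs yafu/msieve at nice 19
    print(f"nice 0 vs nice 19: {cpu_share(0, 19):.1%}")
```

By this model the nice-0 task should get about 90% of a shared core against nice 10, and about 98.6% against nice 19, so a large slowdown points at something other than raw CPU time slicing.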
2017-12-11, 15:09   #4105
retina
Undefined
"The unspeakable one"
Jun 2006
My evil lair
11·509 Posts

Quote:
Originally Posted by chalsall
The rebuild process (in this case, md3_resync) is at priority 20, nice of 0, while mprime is priority 30, nice of 10.

But it seems that because mprime is so incredibly efficient at saturating the CPUs, the OS wasn't able to prevent it from degrading the md3_resync process significantly.
I think the scheduler is broken. Is it putting two threads of different priorities onto one CPU because it got the SMT (aka hyperthreading) map wrong? Or maybe it thinks SMT is no issue at all and just allocates without caring.
2017-12-11, 16:18   #4106
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
2³×17×67 Posts

Quote:
Originally Posted by retina
I think the scheduler is broken. Is it putting two threads of different priorities onto one CPU because it got the SMT (aka hyperthreading) map wrong? Or maybe it thinks SMT is no issue at all and just allocates without caring.
To be perfectly honest, I'm still trying to figure out what was/is going on. Mostly using empirical testing.

Hyperthreading is not the issue; the CPU (Intel(R) Xeon(R) CPU E31220 @ 3.10GHz) has four cores and doesn't support hyperthreading.

I did have mprime configured to use all four CPUs in parallel on a single P-1 workload. Memory usage (up to 8 GB of 12 GB available) had no effect on the /proc/mdstat estimated completion.

I could try spinning mprime up again using only one CPU, and see how the scheduler handles this. But this is a mission-critical server (not only for GPU72, but also for other clients), so I'm a bit reluctant to do so.
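For reference, one OS-level way to run a program on a single CPU at reduced priority on Linux is to set the child's CPU affinity and raise its nice value between fork and exec. This is a minimal sketch using Python's subprocess, not an mprime feature; mprime's own worker/thread settings are not covered here, and the `./mprime` path in the comment is a hypothetical placeholder. The shell equivalent is `taskset -c 3 nice -n 10 ./mprime`. Note that pinning only confines CPU time; it does not stop contention for shared cache, memory bandwidth, or disk I/O.

```python
import os
import subprocess
import sys
from typing import Any, List


def launch_pinned_low_priority(cmd: List[str], cpu: int, niceness: int = 10,
                               **popen_kwargs: Any) -> subprocess.Popen:
    """Start `cmd` confined to one CPU with its nice value raised by `niceness`.

    Linux-only: os.sched_setaffinity does not exist on macOS or Windows.
    preexec_fn is not safe to use from a multi-threaded parent process.
    """
    def _in_child() -> None:
        # Runs in the forked child just before exec(), so only the new
        # process (and anything it later spawns) is affected.
        os.sched_setaffinity(0, {cpu})  # pid 0 = the calling (child) process
        os.nice(niceness)               # adds to the inherited nice value, capped at 19

    return subprocess.Popen(cmd, preexec_fn=_in_child, **popen_kwargs)


if __name__ == "__main__":
    # Demo: start a child Python that reports its own affinity and nice value.
    first_cpu = min(os.sched_getaffinity(0))
    probe = "import os; print(sorted(os.sched_getaffinity(0)), os.nice(0))"
    launch_pinned_low_priority([sys.executable, "-c", probe], cpu=first_cpu).wait()
    # Hypothetical real use: launch_pinned_low_priority(["./mprime"], cpu=3)
```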
2017-12-18, 16:01   #4107
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
2³·17·67 Posts

Man, when it rains, it pours...

So, after all the hell we went through earlier with the failing Seagate drive, the hotswap (which didn't go well), and then trying to rebuild the RAID1 array twice, it turns out the *brand* *new* Seagate drive is bad!!!

They've just swapped in ANOTHER Seagate drive, and we have to go through the whole process AGAIN!

We are going to have to reboot the server to get the hotswapped drive to show up as /dev/sda instead of /dev/sdc, so there will (hopefully) be a minute or two of downtime shortly (longer if things don't go well)...

If you have the choice, NEVER use Seagate drives. They are CRAP!!!!
2017-12-18, 16:04   #4108
James Heinrich
"James Heinrich"
May 2004
ex-Northern Ontario
101101110001₂ Posts

That's not quite what I predicted, but certainly confirms my opinion of Seagate
2017-12-18, 17:59   #4109
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
Ͳօɾօղէօ
53² Posts

I still use them, but only because I was getting them at a very low cost. Would I trust one? Nope.
2017-12-18, 18:05   #4110
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
2³·17·67 Posts

Quote:
Originally Posted by James Heinrich
That's not quite what I predicted, but certainly confirms my opinion of Seagate
Yeah...

This is a direct quote from the email I received from 1and1 about this:
Quote:
In a datacenter, Hard disks are "consumables" and are changed out like tires on a car when they wear out. Your dedicated server is designed to have two "twins" or mirror drives. When one goes down, we [replace it].
No shit, Sherlock... But clearly 1and1 haven't learned the lessons of Ford.

Just so everyone knows, the reboot was cleanly successful, and the RAID1 rebuild is underway. /proc/mdstat is reporting a rebuild speed about ten times faster than with the last "new" Seagate drive...

I took the opportunity to test if mprime still impacted the rebuild speed, and it does.
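Rebuild-speed comparisons like the ones above come from the progress line that /proc/mdstat prints for an array under resync or recovery (fields such as `recovery =  0.3%`, `finish=301.5min`, `speed=105757K/sec`). A minimal sketch for pulling those numbers out, e.g. to log them with and without mprime running. The regex assumes the line layout in the sample, which is illustrative with made-up numbers (not from the GPU72 server), and may need adjusting for other kernel versions.

```python
import re
from typing import Dict, NamedTuple, Optional


class RebuildProgress(NamedTuple):
    kind: str          # "resync" or "recovery"
    percent: float     # percent complete
    finish_min: float  # kernel's estimate of minutes remaining
    speed_kib_s: int   # current rate; mdstat prints it as "K/sec" (KiB/s)


_DEVICE_RE = re.compile(r"^(md\d+)\s*:")
_PROGRESS_RE = re.compile(
    r"(resync|recovery)\s*=\s*([\d.]+)%.*?finish=([\d.]+)min\s+speed=(\d+)K/sec"
)


def parse_mdstat(text: str) -> Dict[str, RebuildProgress]:
    """Map each md array that has a rebuild in progress to its progress figures."""
    progress: Dict[str, RebuildProgress] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        dev = _DEVICE_RE.match(line)
        if dev:
            current = dev.group(1)  # e.g. "md3"; following lines describe this array
            continue
        m = _PROGRESS_RE.search(line)
        if m and current is not None:
            progress[current] = RebuildProgress(
                kind=m.group(1),
                percent=float(m.group(2)),
                finish_min=float(m.group(3)),
                speed_kib_s=int(m.group(4)),
            )
    return progress


# Illustrative sample in the usual /proc/mdstat layout (made-up numbers).
SAMPLE = """Personalities : [raid1]
md3 : active raid1 sda3[2] sdb3[1]
      1919989568 blocks super 1.2 [2/1] [_U]
      [>....................]  recovery =  0.3% (6233088/1919989568) finish=301.5min speed=105757K/sec

unused devices: <none>
"""

if __name__ == "__main__":
    print(parse_mdstat(SAMPLE))
    # On a live Linux box: parse_mdstat(open("/proc/mdstat").read())
```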

P.S. Just to say, while this has been a frustrating experience, I do generally find 1and1 to be quite good. Very inexpensive for truly dedicated servers, and their higher-level techs do actually know their stuff.
2017-12-18, 18:13   #4111
James Heinrich
"James Heinrich"
May 2004
ex-Northern Ontario
B71₁₆ Posts

I also use 1and1 for mersenne.ca and fortunately I've never had cause to test their tech support
2017-12-18, 18:43   #4112
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
2³·17·67 Posts

Quote:
Originally Posted by Mark Rose
I still use them, but only because I was getting them at a very low cost. Would I trust one? Nope.
I will _sometimes_ include a single Seagate drive in a RAID6 array (because I like a mixture of manufacturers and models in such an array).

Whenever someone asks me if they should buy an external USB hard drive, I answer "Yes, unless you see Seagate anywhere on the box".
2017-12-23, 01:56   #4113
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
2³·17·67 Posts

Quote:
Originally Posted by chalsall
We are going to have to reboot the server to get the hotswapped drive to show up as /dev/sda instead of /dev/sdc, so there will (hopefully) be a minute or two of downtime shortly (longer if things don't go well)...
So, five days later, the RAID1 resync/rebuild completed successfully.

I have learnt a few things about this:

1. Never trust Seagate. (I already knew this, but it was just reconfirmed.)

2. Don't map most of your drive into a RAID1, and then subdivide using LVM.

2.1. Instead, make an educated guess as to what your storage needs are going to be, and create multiple arrays and partitions to suit.

2.2. The reason this rebuild took so long (beyond the fact that the first newly replaced Seagate drive was bad) was that every time the server answered a network request, one or more blocks had to be written, which slowed down the resync process.
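Point 2.2 is consistent with how Linux md throttles rebuilds: when it sees other I/O on an array, the resync backs off toward the floor in /proc/sys/dev/raid/speed_limit_min (a per-device rate in KiB/s, default 1000), while speed_limit_max (default 200000) caps it when the array is otherwise idle. Whether that was the dominant factor on this server isn't established in the thread. The sketch below just reads the two limits and estimates the worst-case rebuild time at the floor rate; the 1 TiB component size in the test is a hypothetical example.

```python
from pathlib import Path
from typing import Tuple


def read_md_speed_limits(root: str = "/proc/sys/dev/raid") -> Tuple[int, int]:
    """Return (speed_limit_min, speed_limit_max) in KiB/s per device."""
    base = Path(root)
    low = int((base / "speed_limit_min").read_text().strip())
    high = int((base / "speed_limit_max").read_text().strip())
    return low, high


def worst_case_hours(component_kib: int, speed_limit_min_kib_s: int) -> float:
    """Hours to rebuild one component device if md is held at its floor rate
    the whole time (i.e. the array never goes idle during the rebuild)."""
    return component_kib / speed_limit_min_kib_s / 3600


if __name__ == "__main__":
    if Path("/proc/sys/dev/raid/speed_limit_min").exists():
        print(read_md_speed_limits())
```

At the default 1000 KiB/s floor, a hypothetical 1 TiB component takes roughly 298 hours (over 12 days) if the array is never idle. Raising speed_limit_min (as root) trades foreground I/O latency for a faster rebuild.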
2018-01-13, 14:39   #4114
Mini-Geek
Account Deleted
"Tim Sorbera"
Aug 2006
San Antonio, TX USA
17×251 Posts

https://www.gpu72.com/ is trying to use a self-signed certificate right now, which my MISFIT is, understandably, complaining about. Has something gone wrong with the server configuration?

All times are UTC.


This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.