mersenneforum.org (https://www.mersenneforum.org/index.php)
-   No Prime Left Behind (https://www.mersenneforum.org/forumdisplay.php?f=82)
-   -   Server outrages (https://www.mersenneforum.org/showthread.php?t=13840)

gd_barnes 2014-06-30 03:28

Wow. Bad timing. I'm out of town until Thursday morning. My home phone line is out too. I'll try contacting my internet provider to see if they see anything on their end.

When I get it reconnected, I'll set all of the servers to 2 weeks for a period of 2-3 days to allow everyone to return their work.

Sorry everyone.

AMDave 2014-06-30 09:31

ITMT I restored the backups from 2014-06-13 on my DRP server. The restore is confirmed successful.

DRP URL for viewing only - [url]http://nplb.no-ip.org/stats/index.php?content=port[/url]
To those not in the know: do not attempt to use DRP ports - like your brain-sucker you will starve ;)
The ports remain closed and untested on the DRP server and some - as yet untested - config would be required to make it active.

Have a safe trip, Gary. Hopefully all comes back up ok when you get home.

EDIT -
I got asked the last time I mentioned DRP:
DRP = Disaster Recovery Plan - [url]http://en.wikipedia.org/wiki/Disaster_recovery_plan[/url]
Due to the differences between the current server and the DRP server I have not yet been able to upgrade the DRP plan to a BCP plan as the DRP config is not backward compatible to the current host. I could not implement the automatic fail-over as it could not fail-back. This may be resolved at some point in the future if necessary.
Our DRP plan is tested about twice per year to keep it valid and up to date. Although the full administrative functionality has not yet been fully tested under the DRP, there is a high level of confidence that the 'automagic wand' of linux admin SMACK-FU will 'Make it so.' - Yes. I like Picard quotes too ;)

mdettweiler 2014-06-30 16:34

Ah, thanks for getting that set up Dave. As you mentioned, the PRPnet/LLRnet ports are obviously not open on the DRP server - that would indeed be somewhat tricky to implement, since there's not really a "clean" way to communicate assigned pairs and results back to the original ports when they come back up.

This got me thinking...Dave, what did you have in mind for implementing the automatic fail-back? In this case, the most recent backup we had available to restore onto the DRP server was from over 2 weeks ago (the last monthly backup). That's great if the main server were to fail completely (i.e. if the hard drive bailed and we lost everything on it) - rolling back 2 weeks is certainly better than losing it entirely - but, in situations like the present one where the main server is expected to come back online soon, with no loss of data, the two servers would be completely out of sync if we attempted to run PRP/LLRnet ports on the DRP server in the meantime. For a (relatively) short downtime like we anticipate this one to be, by the time the main server came back up we'd still be processing "historical" work on the DRP server that's long been completed.

The only way I can think of to do this practically (i.e. so we're not spending days on end spinning our wheels on historical work) would be to:
[LIST]
[*]Have the backup server running continuously, keeping its stats database updated daily from the master server's results files.
[*]Likewise, download a snapshot of the PRPnet port directories and databases to the DRP server on a daily basis.
[*]In case of main server failure, the DRP server would be activated, and clients could fall back to it. (This is easy for PRPnet since it allows backup servers to be configured; with LLRnet it would require manual intervention.)
[*]When the main server comes back online, instead of bringing its own ports online immediately, it would redirect (port forward) to the DRP server, meanwhile getting its own stats database back in sync by pulling down daily and hourly files from the DRP server.
[*]We'd then need to manually migrate the current port state back from the DRP server to the main server. This would involve shutting down the port, taking a mysqldump of the respective database and a tarball of the port directory, SCPing the files to jeepford, restoring the database and port directory, then restarting the server.
[/LIST]
Aside from the logistical hassles of keeping the second server "in touch" with the main one on a daily basis to be ready for fallback (which may or may not be practical), the last one would seem to be the kicker - sometimes those mysqldumps can get kind of big. The raw size of the dump file is not so much the issue; they can be compressed quite effectively, but it can take quite a while (>5 minutes) just to make the dump. When you include the time to do the SCP, and to restore the backup on the main server, the port could realistically be offline for an hour (and it does have to be offline, because you need to transfer the PRPnet database and directory together to keep them in sync and make sure no results are lost).
This would need to be done individually for each port, and would probably entail a fair amount of manual admin intervention to pull off smoothly. In this light, it [i]may[/i] not be worthwhile to even attempt to have the ports run on the DRP server.
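
To make the manual migration step a bit more concrete, here is a rough sketch of how the per-port sequence could be written down as a dry-run plan. It only builds the commands and prints them; it does not run anything. The function name, the port/database names, the paths, the host and the stop/start commands are all made-up placeholders (none of them come from the real NPLB setup), and it assumes the port directory lives at the same path on both servers.

```python
import shlex
from pathlib import PurePosixPath


def build_migration_plan(port, db, port_dir, main_host, stop_cmd, start_cmd,
                         workdir="/tmp"):
    """Return ordered (description, argv) steps for moving one port back.

    All arguments are hypothetical placeholders supplied by the admin.
    stop_cmd runs locally on the DRP server; start_cmd runs on main_host.
    """
    port_dir = PurePosixPath(port_dir)
    dump = PurePosixPath(workdir) / f"{port}.sql"
    tarball = PurePosixPath(workdir) / f"{port}.tar.gz"
    q = shlex.quote
    # Remote restore: load the compressed dump into MySQL, then unpack the
    # port directory next to where it was archived from.
    restore = (f"gunzip -c {q(str(dump) + '.gz')} | mysql {q(db)} && "
               f"tar -xzf {q(str(tarball))} -C {q(str(port_dir.parent))}")
    return [
        ("stop the port on the DRP server", shlex.split(stop_cmd)),
        ("dump the database", ["mysqldump", f"--result-file={dump}", db]),
        ("compress the dump", ["gzip", "-f", str(dump)]),
        ("archive the port directory",
         ["tar", "-czf", str(tarball), "-C", str(port_dir.parent),
          port_dir.name]),
        ("copy both files to the main server",
         ["scp", f"{dump}.gz", str(tarball), f"{main_host}:{workdir}/"]),
        ("restore database and directory on the main server",
         ["ssh", main_host, restore]),
        ("start the port on the main server", ["ssh", main_host, start_cmd]),
    ]


if __name__ == "__main__":
    # Placeholder values only; substitute the real ones per port.
    plan = build_migration_plan(
        port="port9000", db="prpnet9000",
        port_dir="/home/prpnet/port9000", main_host="admin@main.example",
        stop_cmd="echo stop-port-here", start_cmd="echo start-port-here",
    )
    for description, argv in plan:
        print(f"# {description}")
        print(shlex.join(argv))
```

The database dump and the directory tarball are both taken after the port is stopped and travel in the same scp call, which is the point of keeping them in sync. Doing this for every port would still mean one such plan per port.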

Just thinking out loud...obviously none of this is of immediate critical importance, but, since we were talking about DRP...

gd_barnes 2014-07-01 06:04

Well, I had a friend go over to my house. There is definitely internet access there because my main desktop machine was able to connect just fine. The server machine was a different story. It rebooted itself for some reason. I had him log onto it but he could not get internet access from the machine itself even after doing some recycling of routers and modems. So I don't know why it cannot connect and why it rebooted itself. Max or Dave, you might try remote access to the machine now and see if you can get to it.

I'll be home around 3 AM CDT Thursday morning. I'll look at it right away when I get home. Hopefully it's as simple as messing with the cable in the wall or tightening a loose connection.

Sorry again for the problems.

AMDave 2014-07-01 09:39

Negative from me unfortunately.
I had set up a 'call home' trace from the server via the log file emails, a while back, so that even if the IP address changed I could trace it back, but that message is not coming out.
The IP on the DNS provider has also not updated.
So I must conclude that the server cannot communicate outbound.
If I have the IP I can connect and fix things but not until then.
EDIT - yep. a loose CAT5 terminator would do it.

odicin 2014-07-01 10:07

The backup website handled by no-ip is also unreachable now, because no-ip was taken down by MS: [URL]https://www.noip.com/blog/2014/06/30/ips-formal-statement-microsoft-takedown/[/URL]

Regards Odi

AMDave 2014-07-01 10:25

That is an interesting development.
We stopped using NO-IP as the secondary domain back on post [URL="http://www.mersenneforum.org/showthread.php?t=13840&page=24"]#257[/URL] in this thread, circa 14 Aug 2013.
"Decision made - [url]http://nplb-gb1.no-ip.org/[/url] will not be renewed.
Since the DNS update for [url]www.noprimeleftbehind.net[/url] has been fixed and has been more reliable, it will be the only address moving forward."

I test and observe that the DRP link via NO-IP is currently working and confirm that the NO-IP issue [I]should not[/I] be affecting us. However, that doesn't mean that it couldn't:
"In the meantime, NO-IP / Vitalwerks have published their answer online:
Apparently, the Microsoft infrastructure is not able to handle the billions of queries from our customers. Millions of innocent users are experiencing outages to their services because of Microsoft’s attempt to remediate hostnames associated with a few bad actors."
The un-seized dynamic dns ".net" domains may have become caught up in this filtering overload, although I am not clear on the 'how'.
So this is potentially a feasible hypothesis from Odi.

edit - the ".net" domain may well be caught up in this after further reading, even though we are using a different provider. I can't connect via the IP, so I still suspect something else.

edit -
ITMT (M$) are boasting about it - [url]http://blogs.technet.com/b/microsoft_blog/archive/2014/06/30/microsoft-takes-on-global-cybercrime-epidemic-in-tenth-malware-disruption.aspx[/url]
Reminds me of Judge Dredd ... "I AM the law!"
May the deities help us, because justice is out to lunch.
/me facepalms

Who the heck made M$ the Internet's policeman ... oh looky there. A judge did.
SlashDot thread - [url]http://yro.slashdot.org/story/14/07/01/0025220/microsoft-takes-down-no-ipcom-domains?utm_source=rss1.0moreanon&utm_medium=feed[/url]

AMDave 2014-07-01 12:02

update - the no-ip addresses to NPLB DRP site and also to the FDCPS project are no longer working.
They are hosted in Australia on a linux server.
{bad words}
If it were a car company, they'd have to issue a recall, but some US judge said sure, you can hijack the international highway so you can find and stop the defective vehicles you made and sold.
I have issues with the disregard for jurisdiction and for the incorrect placement of responsibility.

mdettweiler 2014-07-01 16:10

Yeah, this whole Microsoft/No-IP episode has been rather disappointing - Microsoft's other botnet takedowns prior to this had generally been against unequivocally shady providers, and they seem to be using the same tactics in this case, clueless to the fact that No-IP is actually a legit provider heavily used by good guys. The weird thing is that, over the last number of years I'd noticed Microsoft becoming [i]more[/i] clueful as a company in general, but this proves a shining exception. It doesn't help that the judges involved are often quite clueless about the technological ramifications, and thus inevitably base a large part of these subpoena/court order decisions on the reputation of the party asking for it rather than the technical and situational merits of the case...definitely a recipe for disaster if there ever was one.

Interestingly, I can still access the DRP server via its No-IP address, as well as one of my own boxes which also has an address at no-ip.org. I wonder if the "temporary infrastructure" bottlenecks are affecting international links more than the U.S. - that could be plausible, and would explain why it works for some but not others.

odicin 2014-07-01 19:44

Hmm... curious. I can't look up the No-IP address of the DRP site here from Germany. I tested it with different DNS servers from different providers.

Maybe I should try an American DNS ;)

Regards Odi

gd_barnes 2014-07-02 06:48

I just realized today that my home phone line is not working either. This may end up requiring a call to Time Warner after all. I'll try my best to get it working early Thursday morning after I get home. If I can't get my phone line and internet connection on the server machine working myself, it will have to wait until Thursday afternoon when I can put in the call.


All times are UTC.