mersenneforum.org (https://www.mersenneforum.org/index.php)
-   No Prime Left Behind (https://www.mersenneforum.org/forumdisplay.php?f=82)
-   -   Server outrages (https://www.mersenneforum.org/showthread.php?t=13840)

MyDogBuster 2013-02-13 00:15

[QUOTE]edit - I checked the local weather. Very cold but otherwise nothing remarkable, so I'm going with the network/ISP hiccup probability. [/QUOTE]

You're probably right. The router looms large in this mess, but I think it's localized to this server machine. Actually, this is the only thing I can access.

[url]http://www.noprimeleftbehind.net/crus/Riesel-conjectures.htm[/url]

AMDave 2013-02-13 00:22

It doesn't load. I think you have a cached page there. Hit refresh.

Lennart 2013-02-13 01:11

[QUOTE=AMDave;329207]It doesn't load. I think you have a cached page there. Hit refresh.[/QUOTE]

The problem started many days ago and it has been getting worse every day.

Lennart

AMDave 2013-02-13 01:56

Ahh. So that's what MyDogBuster meant by "Becoming very unreliable". That prpnet port has an issue.
I sent an email to Gary, Max and Karsten 2 days ago about a problem on the prpnet server you two are using which my monitoring scripts were picking up and emailed to me. I was not aware of the symptom that you have been experiencing.
The rest of the server performance has been going swimmingly according to the performance logs that I set up for NPLB stats processing.

That port db is not over-sized so it is not due for a purge, but the indexes may need to be refreshed. Responsiveness falls off the cliff after a while. Once the server is back up I will re-optimize the prpnet databases and you should get much better performance. I have not scheduled an automated optimization of the prpnet databases yet as it locks the tables while they get refreshed and that causes communication error messages on the client end. Maybe it is time to add them to my NPLB optimize script which runs once each week on Sunday morning at about 00:15. A couple of minutes of suffering once a week is better than several days of bad performance every month or two.

I cannot see that this issue would cause the outage though.
More information will come to bear once the server is back and I can dissect the logs.
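
As a rough sketch of the weekly schedule described above (Sunday morning at about 00:15), the next run time could be computed as follows. The function name `next_weekly_run` is hypothetical, and the actual NPLB optimize script is not shown in this thread; it may simply be driven by a cron entry.

```python
from datetime import datetime, timedelta

def next_weekly_run(now, weekday=6, hour=0, minute=15):
    """Return the next datetime strictly after `now` that falls on the
    given weekday (Monday=0 ... Sunday=6) at hour:minute.

    The defaults (Sunday, 00:15) mirror the schedule mentioned in the post;
    everything else here is an illustrative assumption.
    """
    # Start from today's date at the target time of day.
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # Move forward to the target weekday (0 days if today is that weekday).
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    # If that moment has already passed (or is exactly now), use next week.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate

if __name__ == "__main__":
    # Timestamp of the post above, used only as an example input.
    print(next_weekly_run(datetime(2013, 2, 13, 1, 56)))
```

Because the optimization locks the tables while it runs, picking a fixed low-traffic slot like this keeps the client-side communication errors confined to a short, predictable window.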

Lennart 2013-02-13 02:08

[QUOTE=AMDave;329228]I sent an email to Gary, Max and Karsten 2 days ago about a problem on the prpnet server you two are using which my monitoring scripts were picking up and emailed to me. I was not aware of the symptom that you have been experiencing.

That port db is not over-sized so it is not due for a purge, but the indexes may need to be refreshed. Responsiveness falls off the cliff after a while. Once the server is back up I will re-optimize the prpnet databases and you should get much better performance. I have not scheduled an automated optimization of the prpnet databases yet as it locks the tables while they get refreshed and that causes communication error messages on the client end. Maybe it is time to add them to my NPLB optimize script which runs once each week on Sunday morning at about 00:15. A couple of minutes of suffering once a week is better than several days of bad performance every month or two.

I cannot see that this issue would cause the outage though.
More information will come to bear once the server is back and I can dissect the logs.[/QUOTE]

Thanks Dave


Lennart

MyDogBuster 2013-02-13 03:59

Really weird happening. It returned to normal for about 5 minutes and I was able to return all my LLR tests and retrieve about 600 more, BUT, it did not do anything with the PRP stuff. Why would the router discriminate? LOL

I think maybe we have 2 problems, the router and PRPNET/MYSQL.

Anyway, it's back to no access again.

AMDave 2013-02-13 05:17

I got logged in.
Monitoring processes.
The optimization of all tables in all databases is in progress.
Hunting for other info now.

AMDave 2013-02-13 06:27

Server itself is ok. Server uptime is > 83 days.
DNS is OK. Router is OK. I cannot vouch for the network/ISP, but it is not likely to be in a 'bouncing' state.
Likely the database table problem is causing the server to get too busy to talk to us.
There is a problem with one of the 1468 tables.
It could be causing transactions to build up behind it, using more RAM than it should.
I noticed about 1.5GB of swap space is occupied.
I was rebuilding the table when I got kicked out again.
All other tables in all DBs rebuilt successfully.
Just having a problem with this one.
While I was doing it I observed transactions going through so it is working - all ports are working in fact.
Just appears to be an issue with this table in the database.
Got to wait for the condition to resolve again so I can log in again and get it finished.
werkin on it :)
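
For anyone wanting to watch the swap figure mentioned above, here is a minimal sketch. It assumes a Linux host that exposes `SwapTotal` and `SwapFree` in `/proc/meminfo`; the thread does not say what OS the server runs, and the function name `swap_used_kib` is hypothetical.

```python
def swap_used_kib(meminfo_text):
    """Return swap in use, in KiB, from text in /proc/meminfo format.

    Assumes lines of the form 'SwapTotal:   2097152 kB' (Linux convention).
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            # The first token after the colon is the numeric value.
            fields[key.strip()] = int(parts[0])
    return fields["SwapTotal"] - fields["SwapFree"]
```

Passing it the contents of `/proc/meminfo` and dividing the result by `1024**2` gives the figure in GiB, which could be logged alongside the other performance data.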

MyDogBuster 2013-02-13 06:58

[QUOTE]Just having a problem with this one.[/QUOTE]

Point of interest. Which table? I'd guess candidatetestresult. It's built on the fly at the most critical time.

AMDave 2013-02-13 07:08

Candidate was the table it was choking on.
CandidateTest also took a very long time.
The rebuild took a long time.
The check+repair+optimize process has finished successfully this time.
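
The check+repair+optimize pass could be sketched like this, assuming MySQL's `CHECK TABLE`, `REPAIR TABLE` and `OPTIMIZE TABLE` statements. The helper name and the database name `prpnet_example` are hypothetical, and `REPAIR TABLE` only applies to certain storage engines such as MyISAM; the thread does not say which engine these tables use.

```python
def maintenance_statements(database, tables):
    """Build CHECK, REPAIR and OPTIMIZE statements for each table, in that order.

    This only generates SQL text; running it against a server is left to
    whatever client or script is already in use.
    """
    def quote(identifier):
        # Backtick-quote an identifier, doubling any embedded backticks.
        return "`" + identifier.replace("`", "``") + "`"

    statements = []
    for table in tables:
        name = f"{quote(database)}.{quote(table)}"
        for verb in ("CHECK", "REPAIR", "OPTIMIZE"):
            statements.append(f"{verb} TABLE {name};")
    return statements

if __name__ == "__main__":
    # Table names taken from the post above; the database name is made up.
    for sql in maintenance_statements("prpnet_example", ["Candidate", "CandidateTest"]):
        print(sql)
```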

I'm monitoring.
I'll trawl through the logs a bit later.

Your clients should be getting busy again.
There are a few more things to do.
I'm checking the HDD for bad blocks atm.

AMDave 2013-02-13 08:39

HDD is clean and healthy and running at 37 C (98.6 F).
Server responsiveness is good.
Comms errors are low (currently nil).
Issue appears to be resolved.
I will implement preventative action this evening.


All times are UTC.