mersenneforum.org (https://www.mersenneforum.org/index.php)
-   PrimeNet (https://www.mersenneforum.org/forumdisplay.php?f=11)
-   -   OFFICIAL "SERVER PROBLEMS" THREAD (https://www.mersenneforum.org/showthread.php?t=5758)

srow7 2014-08-27 02:44

www.mersenne.ca
 
[url]www.mersenne.ca[/url] has been down all day
[url]http://www.mersenne.ca/[/url]

Service Temporarily Unavailable

The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.

Prime95 2014-08-27 04:35

[QUOTE=TheMawn;381482]Did you receive it, George? Not asking you to rush or anything, but I just want to make sure you're not still waiting because I misspelled the address.[/QUOTE]

I got it. Haven't looked at it yet, I was fairly busy today.

James Heinrich 2014-08-27 04:47

[QUOTE=srow7;381485][url]www.mersenne.ca[/url] has been down all day
The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.[/QUOTE]

The server is, in fact, down for maintenance. It should be up tomorrow morning, assuming all goes well. Like mersenne.org, I'm suffering from a disk space shortage, which is part of the reason I'm moving to a new server.

Prime95 2014-08-27 08:06

[QUOTE=Madpoo;381475]Here's what I could determine:
7 "Factor found" missing out of 17
11 "P-1 results" missing out of 47 (no factor found)
20 "ECM results" missing out of 43 (no factor found)
33 "LL results" missing out of 99 (not prime)
272 "TF results" missing out of 944 (no factor found).[/QUOTE]

Here is my rundown on the event.

The standard order of operations is to shut down IIS, then restart the database, then restart IIS. This is what Scott taught me to do, but I did not relay that info to Madpoo. This and/or the SQL Log Full condition caused our problem.
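For illustration only, the ordering described above could be scripted roughly as follows. This is a sketch, not the procedure actually used on the PrimeNet server: the `iisreset` and `net stop/start MSSQLSERVER` commands assume a default IIS and SQL Server installation, and `restart_in_order` and `real_runner` are hypothetical helper names.

```python
import subprocess
from typing import Callable, List, Sequence

# Assumed commands for a default IIS + SQL Server install; the thread
# does not say which service names or tools the server really uses.
RESTART_STEPS: List[Sequence[str]] = [
    ("iisreset", "/stop"),            # 1. stop IIS so no API requests arrive
    ("net", "stop", "MSSQLSERVER"),   # 2a. stop the database ...
    ("net", "start", "MSSQLSERVER"),  # 2b. ... and start it again
    ("iisreset", "/start"),           # 3. bring IIS back only once the DB is up
]


def restart_in_order(run: Callable[[Sequence[str]], int]) -> List[Sequence[str]]:
    """Run each step in order; stop at the first non-zero exit code.

    Returns the steps that completed successfully.
    """
    done: List[Sequence[str]] = []
    for step in RESTART_STEPS:
        if run(step) != 0:
            break
        done.append(step)
    return done


def real_runner(cmd: Sequence[str]) -> int:
    """Execute a command for real and return its exit code."""
    return subprocess.run(list(cmd)).returncode
```

Passing a fake runner (for example `lambda cmd: 0`) gives a dry run; passing `real_runner` would execute the commands. Stopping at the first failure keeps IIS down if the database does not come back, so clients see "cannot contact server" rather than a database error.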

The prime95 client handles a "cannot contact server" error quite well, retrying indefinitely. There are several other errors where the client does the same thing, like database busy.

However, there are some errors where the client retries a few times and then gives up. One such error is the one TheMawn got: PrimeNet error 11 - server database malfunction. The reasoning was that I didn't want the client blocked forever on a corrupt message that the server would never accept.

Madpoo and I are working on a way to resubmit the results that did not make it into the database.

I will upgrade prime95 to do a limited number of retries, but at daily intervals. If I can't get the message through after 5 days, then and only then, will prime95 discard the presumably corrupt result message.
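A minimal sketch of that retry policy, assuming the decision depends only on when the message was first and last attempted. This is not prime95 source code; `next_action`, `RETRY_INTERVAL`, and `MAX_AGE` are hypothetical names, and whether the discard happens exactly at the 5-day mark or after a fifth failed retry is an assumption.

```python
from datetime import datetime, timedelta

RETRY_INTERVAL = timedelta(days=1)  # "at daily intervals"
MAX_AGE = timedelta(days=5)         # give up "after 5 days" (assumed cutoff)


def next_action(first_attempt: datetime, last_attempt: datetime, now: datetime) -> str:
    """Decide what to do with a result message the server keeps rejecting.

    Returns "discard" once the message is at least MAX_AGE old,
    "retry" if at least RETRY_INTERVAL has passed since the last attempt,
    and "wait" otherwise.
    """
    if now - first_attempt >= MAX_AGE:
        return "discard"
    if now - last_attempt >= RETRY_INTERVAL:
        return "retry"
    return "wait"


if __name__ == "__main__":
    t0 = datetime(2014, 8, 27, 8, 0)
    print(next_action(t0, t0, t0 + timedelta(hours=3)))   # wait
    print(next_action(t0, t0, t0 + timedelta(days=1)))    # retry
    print(next_action(t0, t0 + timedelta(days=4), t0 + timedelta(days=5)))  # discard
```

Checking the age limit before the retry interval means a message is never retried once it has aged out, which matches the "then and only then" wording.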

TheMawn 2014-08-27 16:13

[QUOTE=Prime95;381498]
Madpoo and I are working on a way to resubmit the results that did not make it into the database.

I will upgrade prime95 to do a limited number of retries, but at daily intervals. If I can't get the message through after 5 days, then and only then, will prime95 discard the presumably corrupt result message.[/QUOTE]

This sounds good. Any idea on how often a message is corrupted?

Madpoo 2014-08-28 06:20

[QUOTE=TheMawn;381536]This sounds good. Any idea on how often a message is corrupted?[/QUOTE]

George has an idea: look back at the last few times the server had issues and do the same review of the web logs, capturing the attempts to check in a result, to see if anything needs adjusting.

We gathered a list of 338 entries from this most recent incident that will get fixed. I also have some ideas for server-side monitoring: if things look a little iffy, maybe just shut down the API website and fire off an email to George and me. Better that a client get a "can't connect" in those kinds of situations.

M29 2014-08-29 02:02

After years of inactivity, I recently re-started Prime95.

I am working on 4 exponents and have another 4 in the queue.

My worktodo.txt looks correct:
[Worker #1]
DoubleCheck=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,35776079,71,1 (Currently at 76%)
DoubleCheck=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,35498161,71,1

[Worker #2]
DoubleCheck=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,36158659,71,1 (Currently at 76%)
DoubleCheck=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,35533789,71,1

[Worker #3]
DoubleCheck=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,36342563,71,1 (Currently at 66%)
DoubleCheck=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,35553767,71,1

[Worker #4]
DoubleCheck=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,36680023,71,1 (Currently at 67%)
DoubleCheck=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,35553787,71,1

However, My Account Summary > Assignments says I have only 7 exponents reserved: working on 3, with 4 in the queue. 35776079 is missing.

[B][I]This table shows the worktodo.txt entries for your assignments. You need this only if your existing worktodo.txt file is accidentally deleted or corrupted.[/I][/B]

[b]M29 has 7 assignments[/b]
DoubleCheck=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,35498161,71,1
DoubleCheck=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,35533789,71,1
DoubleCheck=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,35553767,71,1
DoubleCheck=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,35553787,71,1
DoubleCheck=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,36158659,71,1
DoubleCheck=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,36342563,71,1
DoubleCheck=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,36680023,71,1
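The comparison between the two lists can be done mechanically. The sketch below is only an illustration: it assumes each entry has the form `DoubleCheck=<assignment id>,<exponent>,...` as shown above, and `exponents` is a hypothetical helper, not part of Prime95 or PrimeNet.

```python
def exponents(lines):
    """Collect exponents from 'DoubleCheck=<id>,<exponent>,...' lines."""
    found = set()
    for line in lines:
        line = line.strip()
        if not line.startswith("DoubleCheck="):
            continue  # skip "[Worker #n]" headers and blank lines
        fields = line.split("=", 1)[1].split(",")
        found.add(int(fields[1]))  # second field is the exponent
    return found


AID = "x" * 32  # placeholder for the masked assignment IDs

local = exponents(
    ["[Worker #1]"]
    + [f"DoubleCheck={AID},{p},71,1" for p in (
        35776079, 35498161, 36158659, 35533789,
        36342563, 35553767, 36680023, 35553787)]
)
server = exponents(
    f"DoubleCheck={AID},{p},71,1" for p in (
        35498161, 35533789, 35553767, 35553787,
        36158659, 36342563, 36680023)
)

missing = sorted(local - server)
print(missing)  # [35776079]
```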

snme2pm1 2014-08-29 02:08

35776079
 
[QUOTE=M29;381677]After years of inactivity ... 35776079 is missing[/QUOTE]

[url]http://www.mersenne.org/report_exponent/default.php?exp_lo=35776079&exp_hi=&full=1[/url]

That particular exponent shows double check completed.

M29 2014-08-29 02:31

[QUOTE=snme2pm1;381678][url]http://www.mersenne.org/report_exponent/default.php?exp_lo=35776079&exp_hi=&full=1[/url]

That particular exponent shows double check completed.[/QUOTE]
I should have done that myself. Thank you.

The exponent status page also says:
Verified 2014-08-20 Scott Kurowski 1C0D5781F8C25946
2014-08-19 M29 D expired on 2014-08-20
2014-05-22 ANONYMOUS D expired on 2014-08-20

Assigned one day, then the next day it expired twice and was verified once?

I'll just let it run, I suppose.

Prime95 2014-08-29 03:01

[QUOTE=M29;381679]
Verified 2014-08-20 Scott Kurowski 1C0D5781F8C25946
2014-08-19 M29 D expired on 2014-08-20
2014-05-22 ANONYMOUS D expired on 2014-08-20

Assigned one day, then the next day it expired twice and was verified once?[/QUOTE]

Some nitty gritty ugly details:

Scott did not do the double-check. There is a rogue version of prime95 where someone translated all strings to Chinese -- including the strings that the server processes. When these results are sent to the server, they are rejected as incomprehensible and dumped to a SQL table for human review at a later date. Once every couple of months I go through that table. There is enough recognizable pattern for me to convert it back to a processable result. For no particularly good reason, these reclaimed results are processed under Scott's GIMPS account. Unluckily for you, I did this on August 20 one day after your assignment. There were only 9 recovered LL results, so your particular problem does not happen very often.

chalsall 2014-08-29 18:33

I could be wrong, but something strange seems to be going on...
 
For the last few days, many more assignments have been made than released.

I wouldn't have thought anything about this except that one of my spiders was given over a thousand candidates according to PrimeNet, but not according to GPU72. 64148129 is just one of many examples.

Scott/George/James/Madpoo... Any ideas?


All times are UTC.
