mersenneforum.org (https://www.mersenneforum.org/index.php)
-   PrimeNet (https://www.mersenneforum.org/forumdisplay.php?f=11)
-   -   OFFICIAL "SERVER PROBLEMS" THREAD (https://www.mersenneforum.org/showthread.php?t=5758)

Prime95 2014-09-11 21:36

I'll let madpoo render a verdict. I did not see anything in the PHP logs; madpoo knows more places to look for evidence.

chalsall 2014-09-11 22:37

[QUOTE=Prime95;382846]I'll let madpoo render a verdict. I did not see anything in the PHP logs; madpoo knows more places to look for evidence.[/QUOTE]

OK... Happy to provide additional data.

[CODE]20140911_221714 ERR : Error from Network. Message: "500 read timeout"
20140911_221714 INFO: 100: 77003071,74,0 (5EB14E057AC239FC6FAC60A52E09FC76) -- Keep: 0
20140911_221714 INFO: 100: 77003089,74,0 (F8EA35171419A456EB6A8CFB9C3B4B13) -- Keep: 0
20140911_221715 INFO: 100: 77003281,74,0 (FA81467FDA9A0EFDA212F09FA2EA8438) -- Keep: 0
20140911_221715 INFO: 100: 77003383,74,0 (1A6931D8D909D8055A44D1F05B106EB0) -- Keep: 0
20140911_221715 INFO: 100: 77003389,74,0 (974DD223DA94D833EB2A82078A1E4CF2) -- Keep: 0
20140911_221715 INFO: 100: 77003429,74,0 (1F82E3F6C1A9250C6843AF41EF5573ED) -- Keep: 0
20140911_221715 INFO: Spider has finished. Exiting.


20140911_223002 INFO: Get Manual Work spider starting...
20140911_223003 INFO: 100: 67397093,74,0 (55FE20CE3DBD0498B4F4DD8A105B78D1) -- Keep: 0
20140911_223003 INFO: 100: 67397173,74,0 (957FD57F810D316AA6FBB58404D11154) -- Keep: 0
20140911_223003 INFO: 100: 67397203,74,0 (BC8B07D85C0AEDA1CDC37FD6FC4F8632) -- Keep: 0
20140911_223003 INFO: 100: 67397527,74,0 (C0F81B5653280EBAACDDADA04D170ED4) -- Keep: 0
20140911_223003 INFO: 100: 67397581,74,0 (42548F86FA4818B8CB91AEF57CBDE5C4) -- Keep: 0
20140911_223003 INFO: 100: 67397599,74,0 (063794E6F371632F15870A979EF732B1) -- Keep: 0
20140911_223003 INFO: 100: 67397611,74,0 (B66DBCD1924D88035FB1A904E3FF4133) -- Keep: 0
20140911_223004 INFO: 100: 67397623,74,0 (6502753AA0366EF5317B7B503A4DE043) -- Keep: 0
20140911_223004 INFO: 100: 67397777,74,0 (CA885A1E502909D62B47FDF602B91E82) -- Keep: 0
20140911_223004 INFO: 100: 67397879,74,0 (A3FB956BB5663B88C7BCADC74C741E9B) -- Keep: 0
20140911_223004 INFO: Category 3...
20140911_223004 INFO: 100: 64262729,74,1 (FCD4A161C2919226C1DCAD57DA7D14B4) -- Keep: 0
20140911_223004 INFO: 100: 64262893,74,1 (ADB390B47BC5B31245B52EB96EF2C436) -- Keep: 0
20140911_223204 ERR : Bad response:
500 read timeout

20140911_223204 ERR : Error from Network. Message: "500 read timeout"
20140911_223204 INFO: 100: 64262917,74,1 (4129E70FAE1AD408CA497C3AFCB7BC87) -- Keep: 0
20140911_223204 INFO: 100: 64262953,74,1 (7DB2832AA03D02911855437B8A18C58C) -- Keep: 0
20140911_223404 ERR : Bad response:
500 read timeout

20140911_223404 ERR : Error from Network. Message: "500 read timeout"
20140911_223405 INFO: 100: 64262981,74,1 (7842FA7F68D3117DA13153E7C258C163) -- Keep: 0
20140911_223405 INFO: 100: 64263593,74,1 (EBB6AE72DB86E8A231573C19D77785FA) -- Keep: 0
20140911_223405 INFO: Category 4...
20140911_223405 INFO: 100: 70109747,73,0 (FA04968641DF730781E733CD87617E00) -- Keep: 0
20140911_223405 INFO: 100: 70109749,73,0 (B4FC385769FDFE3AE9639F1BA46CC4E2) -- Keep: 0
20140911_223405 INFO: 100: 70109777,73,0 (B9F8EA48B510115645791AA2BAB56DA5) -- Keep: 0
20140911_223405 INFO: 100: 70109821,73,0 (5D929F0408DFAD2AF65860C324F2E89A) -- Keep: 0
20140911_223605 ERR : Bad response:
500 read timeout[/CODE]
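
One thing the excerpt above shows: each "Bad response" error is logged exactly 120 seconds after the previous log line (22:30:04 to 22:32:04, 22:32:04 to 22:34:04, 22:34:05 to 22:36:05). That would be consistent with a fixed two-minute read timeout in the spider, though that is only a guess and nothing in the thread confirms it. A minimal Python sketch that measures those gaps follows; the function name `stall_gaps` is hypothetical and the sample lines are copied from the log above.

```python
from datetime import datetime

# A few lines copied verbatim from the spider log above.
LOG = """\
20140911_223004 INFO: 100: 64262893,74,1 (ADB390B47BC5B31245B52EB96EF2C436) -- Keep: 0
20140911_223204 ERR : Bad response:
500 read timeout

20140911_223204 ERR : Error from Network. Message: "500 read timeout"
20140911_223204 INFO: 100: 64262953,74,1 (7DB2832AA03D02911855437B8A18C58C) -- Keep: 0
20140911_223404 ERR : Bad response:
500 read timeout
"""


def stall_gaps(log_text):
    """Return (timestamp, seconds since previous stamped line) for each 'Bad response' error."""
    gaps = []
    previous = None
    for line in log_text.splitlines():
        stamp, _, rest = line.partition(" ")
        try:
            when = datetime.strptime(stamp, "%Y%m%d_%H%M%S")
        except ValueError:
            # Blank lines and continuation lines ("500 read timeout") have no timestamp.
            continue
        if rest.startswith("ERR : Bad response") and previous is not None:
            gaps.append((stamp, int((when - previous).total_seconds())))
        previous = when
    return gaps


for stamp, seconds in stall_gaps(LOG):
    print(stamp, seconds)
```

Both errors in this sample come out at 120 seconds.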

Madpoo 2014-09-12 03:24

[QUOTE=tha;382778]The graph of the PrimeNet activity has become remarkably flat at 160 TFlops for the past 28 hours.[/QUOTE]

Yeah, that was from me doing something and keeping those hourly stats from running for a while. The graph is filling in with a flat line during those missing hours. It'll clear itself up in time.

Madpoo 2014-09-12 03:40

[QUOTE=chalsall;382848]OK... Happy to provide additional data.

[CODE]20140911_221714 ERR : Error from Network. Message: "500 read timeout"...[/CODE][/QUOTE]

What timezone would those timestamps correlate with? I can try to match it up with server logs to see if there was anything going on.

FYI, the server is set to run in UTC, just to keep it all standard no matter its location and without the hassles of converting when reading in different apps that handle local time conversions differently.

EDIT: I'm assuming your times are also UTC.

All I can see in the IIS logs are the hits coming from your system around the times you mentioned. The ones being logged were all status 200 and the time-taken values are all pretty normal, in the 300-400 millisecond range. There are some hits to the "/report_factoring_effort/" page that do take longer on average, like 7.5 seconds. That might be where the server is doing some factor checking on reported results? Not sure.

Nothing looks funny in the IIS or PHP logs around those times though. There's some of the typical PHP warnings and info that we've been seeing due to increased log levels, but I don't know for sure that any of them are related since they don't match those particular time periods and are from other parts of the website in most cases. Just more fun things to eventually crack down on. The PHP is a little loosey goosey with some old syntax, so it does tend to throw a lot of warnings about deprecated syntax. Kind of a chore to go through.

Let me know if the issue is recurring and I'll look some more.

With regard to some of the specific exponents from your own logs, take 67397093 as an example. I see that exponent was assigned out to 2 people: the first assignment expired on July 8 and the second just expired on Sep 7, and it hasn't been assigned to anyone else yet.

Whatever happened must have kept your spider from getting those assignments, I gather.

500 errors (the "500 read timeout" from your snippet) seem to imply it timed out on a connection to the server itself. I don't know how the manual assignment really works: is it possible to start getting assignments back, but then, if there's a server error/timeout, it won't actually reserve them for you?

Nothing on the server looks out of place. Network, CPU, memory, disk space, etc. have all been pretty normal today, and aside from me tinkering with an update to Google charts on a test page, I haven't done anything else on there, and I don't believe George or James have either, so that wouldn't explain it.

More info on what I see: between 22:17:14 and 22:17:15 there are 3 hits to the server from your spider:
[CODE]2014-09-11 22:17:14 /manual_assignment/ cores=1&num_to_get=2&pref=100&exp_lo=67675370&exp_hi=&B1=Get+Assignments/
2014-09-11 22:17:14 /manual_assignment/ cores=1&num_to_get=2&pref=100&exp_lo=67675370&exp_hi=&B1=Get+Assignments/
2014-09-11 22:17:15 /manual_assignment/ cores=1&num_to_get=2&pref=100&exp_lo=67675370&exp_hi=&B1=Get+Assignments/[/CODE]
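
For anyone wanting to compare such hits against the spider's own log, here is a minimal Python sketch that splits one of the lines above into its timestamp, path and form parameters. It assumes the layout is exactly as pasted (date, time, path, query string, separated by single spaces); the real IIS log may carry more fields, and the function name `parse_hit` is hypothetical.

```python
from urllib.parse import parse_qsl

# One line copied from the excerpt above.
HIT = ("2014-09-11 22:17:14 /manual_assignment/ "
       "cores=1&num_to_get=2&pref=100&exp_lo=67675370&exp_hi=&B1=Get+Assignments/")


def parse_hit(line):
    """Split a pasted log line into timestamp, path and decoded form parameters."""
    date, time, path, query = line.split(" ", 3)
    # keep_blank_values=True so that empty fields such as exp_hi are not dropped.
    params = dict(parse_qsl(query, keep_blank_values=True))
    return {"when": f"{date} {time}", "path": path, "params": params}


hit = parse_hit(HIT)
print(hit["when"], hit["path"])
for name, value in hit["params"].items():
    print(f"  {name} = {value!r}")
```

Note that the trailing "/" in the pasted line ends up inside the `B1` value, since it is part of the query string as shown.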

NBtarheel_33 2014-09-12 04:07

Barbados time = US EDT = UTC-4 if that is indeed the time zone to which his servers are synced. So 2217 would actually be 0217 GMT on 2014-09-12.
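
As a quick check of that arithmetic, a small Python sketch, assuming the spider log really were written in UTC-4 (the offset and the function name `spider_stamp_to_utc` are assumptions for illustration):

```python
from datetime import datetime, timedelta, timezone

# Assumption: the log timestamps are local time at UTC-4.
ASSUMED_OFFSET = timezone(timedelta(hours=-4))


def spider_stamp_to_utc(stamp, offset=ASSUMED_OFFSET):
    """Convert a spider log stamp such as '20140911_221714' to an aware UTC datetime."""
    local = datetime.strptime(stamp, "%Y%m%d_%H%M%S").replace(tzinfo=offset)
    return local.astimezone(timezone.utc)


print(spider_stamp_to_utc("20140911_221714").isoformat())
# prints 2014-09-12T02:17:14+00:00
```

If the log is already in UTC, pass `offset=timezone.utc` and the stamp comes back unchanged.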

error 2014-09-12 08:36

Manual results form does not recognize ECM factor found. It responds with

Notice: Undefined variable: b2 in C:\inetpub\www\manual_result\manual_result.inc.php on line 282
Error: Missing checksum. Correct the problem or email results to woltman at etc

chalsall 2014-09-12 13:14

[QUOTE=Madpoo;382871]FYI, the server is set to run in UTC, just to keep it all standard no matter its location and without the hassles of converting when reading in different apps that handle local time conversions differently.[/QUOTE]

Yup, ditto GPU72; UTC kept in sync via NTP.

[QUOTE=Madpoo;382871]Let me know if the issue is recurring and I'll look some more.[/QUOTE]

Seems to be fine again since early this morning.

[QUOTE=Madpoo;382871]500 errors (the "500 read timeout" from your snippet) seem to imply it timed out on a connection to the server itself.[/QUOTE]

If you're not seeing any related errors in the logs, then it definitely could have been a connectivity issue between the two servers. That said, my spiders were seeing the issue from 1and1 (in the eastern US of A), and I was seeing it myself from my workstation (Barbados).

Anyway, thanks for drilling down. Will keep an eye on it, and let you know if I see anything more (and try to get deeper info on the connectivity situation during the event).

James Heinrich 2014-09-12 14:33

[QUOTE=error;382881]Manual results form does not recognize ECM factor found.[/QUOTE]Fixed.

ET_ 2014-09-12 14:36

[QUOTE=James Heinrich;382896]Fixed.[/QUOTE]

Is there a way to address the issue for GMP-ECM results files having B2 values other than 100*B1?

Luigi

James Heinrich 2014-09-12 14:58

[QUOTE=ET_;382898]Is there a way to address the issue for GMP-ECM results files having B2 values other than 100*B1?[/QUOTE]Probably, but I'm afraid I don't understand the question. Can you give me an example of what you mean?

error 2014-09-12 15:14

[QUOTE=error;382881]Manual results form does not recognize ECM factor found.[/QUOTE]

[QUOTE=James Heinrich;382896]Fixed.[/QUOTE]

Thanks, but it still complains about the checksum and the factor is not accepted.


All times are UTC.