|
|
#23 |
|
A Sunny Moo
Aug 2007
USA (GMT-5)
3×2,083 Posts |
I just noticed that the G3000 server started "barfing" again about an hour ago. Like the last few times, it seems to have started doing so in response to what appears to be a communications error between the server and one of Lennart's boxes. Lennart, can you please check over your various boxes that you have on G3000 and see if any of them contain any clues to what might be happening? I'm afraid I can't narrow it down to a specific box from the logs.
I'm going to turn on "debug mode" on the server, which will have it log full socket communication data to a file. I don't usually like to use this for production servers since it produces extremely large logfiles, but I'll use it for now since it should allow us to pinpoint exactly which machine is causing this problem. It's possible that it's the same one every time, which would indicate a problem on Lennart's end rather than the server's. Interestingly enough, though, I have never seen this barfing problem happen on any other PRPnet server. Possibly there is something specific to G3000 that's causing the problem.
Last fiddled with by mdettweiler on 2009-08-03 at 18:21 |
|
|
|
|
|
#24 | |
|
"Lennart"
Jun 2007
2⁵·5·7 Posts |
Quote:
I keep looking. Lennart |
|
|
|
|
|
|
#25 |
|
A Sunny Moo
Aug 2007
USA (GMT-5)
3×2,083 Posts |
Okay, thanks. The server seems to be holding up OK since I last restarted it (right before I posted my last message), but if it happens again (which, given the track record we've been seeing, it probably will in time), I should be able to tell from the debug logs exactly which machine was talking to the server at the time.
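A minimal sketch of how that lookup could be scripted, assuming the debug log uses the line layout of the excerpts posted later in this thread (`[timestamp] socket N <<<< FROM email machine_id user` for a connecting client, and `socket N >>>> !!! send error !!!` for a failed send). The function name and regular expressions are hypothetical and are not part of PRPnet.

```python
import re

# Hypothetical helper; the line layout is an assumption taken from the
# debug log excerpts in this thread, not from PRPnet documentation.
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] (?P<msg>.*)$")
FROM_RE = re.compile(r"^socket (?P<sock>\d+) <<<< FROM (?P<email>\S+) (?P<machine>\S+)")
ERR_RE = re.compile(r"^socket (?P<sock>\d+) >>>> !!! send error !!!")


def clients_with_send_errors(lines):
    """Return (timestamp, email, machine) for the first send error of each session."""
    current = {}     # socket number -> (email, machine) of the connected client
    flagged = set()  # sockets whose current session was already reported
    hits = []
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue
        msg = m.group("msg")
        frm = FROM_RE.match(msg)
        if frm:
            # A FROM line starts a new session on this socket.
            current[frm.group("sock")] = (frm.group("email"), frm.group("machine"))
            flagged.discard(frm.group("sock"))
            continue
        err = ERR_RE.match(msg)
        if err and err.group("sock") not in flagged:
            flagged.add(err.group("sock"))
            email, machine = current.get(err.group("sock"), ("unknown", "unknown"))
            hits.append((m.group("ts"), email, machine))
    return hits
```

Feeding it the server's debug log (for example `clients_with_send_errors(open(path))`, where `path` is wherever the debug file is written) would list the client that was connected each time a send error first appeared.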
|
|
|
|
|
|
#26 |
|
A Sunny Moo
Aug 2007
USA (GMT-5)
3·2,083 Posts |
Looks like the server barfed again. I've restarted it to get it running again, and I'll start looking through the logs for clues to what caused it.
|
|
|
|
|
|
#27 |
|
A Sunny Moo
Aug 2007
USA (GMT-5)
1869₁₆ Posts |
I've looked through the debug log a bit, and while I can't find anything conclusively linking the problem to a particular machine, it does look like the problem may have started occurring when the server communicated with Lennart's machine "_207". Lennart, you may want to check the box with that ID and see if there's anything strange going on with it.
|
|
|
|
|
|
#28 | |
|
"Lennart"
Jun 2007
2⁵·5·7 Posts |
Quote:
Lennart |
|
|
|
|
|
|
#29 |
|
A Sunny Moo
Aug 2007
USA (GMT-5)
3·2,083 Posts |
Just curious, have you by chance been having any problems with your internet connection lately--sudden dropoffs, etc.? If so, that might explain why these problems are happening (say, if the connection gets cut during a communication with the server).
|
|
|
|
|
|
#30 | |
|
"Lennart"
Jun 2007
10001100000₂ Posts |
Quote:
Lennart |
|
|
|
|
|
|
#31 | |
|
A Sunny Moo
Aug 2007
USA (GMT-5)
3×2,083 Posts |
Quote:
Code:
[2009-08-03 18:20:13 GMT] Message coming on socket 5
[2009-08-03 18:20:13 GMT] socket 5 <<<< FROM sm5ymt@pekhult.se _153 sm5ymt
[2009-08-03 18:20:13 GMT] sm5ymt@pekhult.se connecting from *.*.*.*
[2009-08-03 18:20:13 GMT] socket 5 <<<< RETURNWORK 2.2.3
[2009-08-03 18:20:13 GMT] socket 5 <<<< WorkUnit: 31340*6^145004+1 1249318058
[2009-08-03 18:20:13 GMT] socket 5 >>>> INFO: Workunit found
[2009-08-03 18:20:13 GMT] socket 5 <<<< Test Result: pfgw BD78034F699566B1
[2009-08-03 18:20:13 GMT] socket 5 <<<< End of WorkUnit
[2009-08-03 18:20:13 GMT] socket 5 >>>> INFO: Test for candidate 31340*6^145004+1 accepted
[2009-08-03 18:20:13 GMT] 31340*6^145004+1: Test received by sm5ymt@pekhult.se at *.*.*.* Residue Residue: BD78034F699566B1
[2009-08-03 18:20:13 GMT] socket 5 >>>> End of Workunit Message
[2009-08-03 18:20:14 GMT] socket 5 <<<< WorkUnit: 124221*6^145005+1 1249318058
[2009-08-03 18:20:14 GMT] socket 5 >>>> INFO: Workunit found
[2009-08-03 18:20:14 GMT] socket 5 <<<< Test Result: pfgw 30DF4273EBA52CE2
[2009-08-03 18:20:14 GMT] socket 5 <<<< End of WorkUnit
[2009-08-03 18:20:14 GMT] socket 5 >>>> INFO: Test for candidate 124221*6^145005+1 accepted
[2009-08-03 18:20:14 GMT] 124221*6^145005+1: Test received by sm5ymt@pekhult.se at *.*.*.* Residue Residue: 30DF4273EBA52CE2
[2009-08-03 18:20:14 GMT] socket 5 >>>> End of Workunit Message
[2009-08-03 18:20:15 GMT] socket 5 <<<< End of Message
[2009-08-03 18:20:15 GMT] socket 5 >>>> INFO: All 2 test results were accepted
[2009-08-03 18:20:15 GMT] socket 5 >>>> End of Message
[2009-08-03 18:20:15 GMT] socket 5 <<<< QUIT
[2009-08-03 18:20:15 GMT] closing socket 5
Code:
[2009-08-03 21:19:35 GMT] Message coming on socket 5
[2009-08-03 21:19:35 GMT] socket 5 <<<< FROM sm5ymt@pekhult.se _31 sm5ymt
[2009-08-03 21:19:35 GMT] sm5ymt@pekhult.se connecting from *.*.*.*
[2009-08-03 21:19:35 GMT] socket 5 <<<< RETURNWORK 2.2.3
[2009-08-03 21:19:35 GMT] socket 5 <<<< WorkUnit: 124221*6^148285+1 1249333282
[2009-08-03 21:19:35 GMT] socket 5 >>>> INFO: Workunit found
[2009-08-03 21:19:35 GMT] socket 5 <<<< WorkUnit: 74612*6^148287+1 1249333282
[2009-08-03 21:19:35 GMT] socket 5 <<<< WorkUnit: 172257*6^148286+1 1249333282
[2009-08-03 21:19:35 GMT] socket 5 <<<< End of Message
[2009-08-03 21:19:35 GMT] socket 5 <<<< QUIT
[2009-08-03 21:19:46 GMT] socket 5 (nothing received)
[2009-08-03 21:19:46 GMT] socket 5 >>>> INFO: Test for candidate 124221*6^148285+1 accepted
[2009-08-03 21:19:46 GMT] Error sending <<INFO: Test for candidate 124221*6^148285+1 accepted>> to localhost:3000
[2009-08-03 21:19:46 GMT] socket 5 >>>> !!! send error !!!
[2009-08-03 21:19:46 GMT] 124221*6^148285+1: Test received by sm5ymt@pekhult.se at *.*.*.* Residue Residue:
[2009-08-03 21:19:46 GMT] socket 5 >>>> End of Workunit Message
[2009-08-03 21:19:46 GMT] Error sending <<End of Workunit Message>> to localhost:3000
[2009-08-03 21:19:46 GMT] socket 5 >>>> !!! send error !!!
[2009-08-03 21:19:57 GMT] socket 5 (nothing received)
[2009-08-03 21:19:57 GMT] socket 5 >>>> INFO: All 1 test results were accepted
[2009-08-03 21:19:57 GMT] Error sending <<INFO: All 1 test results were accepted>> to localhost:3000
[2009-08-03 21:19:57 GMT] socket 5 >>>> !!! send error !!!
[2009-08-03 21:19:57 GMT] socket 5 >>>> End of Message
[2009-08-03 21:19:57 GMT] Error sending <<End of Message>> to localhost:3000
[2009-08-03 21:19:57 GMT] socket 5 >>>> !!! send error !!!
[2009-08-03 21:20:08 GMT] socket 5 (nothing received)
[2009-08-03 21:20:08 GMT] closing socket 5
This all makes perfect sense now.
It is most definitely not a connection error like I thought earlier, but I was confused because connection errors also tend to produce those mysterious "Error sending <<x>> to localhost:3000" errors. Instead, it looks more like an odd bug in the client that is confusing the heck out of the server. I'll report it to Mark. |
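One reading of the second excerpt is that the client sent `WorkUnit:` lines with no `Test Result:` line after them, unlike the normal exchange in the first excerpt. The sketch below scans a debug log for that pattern. It assumes the `socket N <<<<` layout shown above; the function name is hypothetical, and this is not how the PRPnet server itself detects the problem.

```python
import re

# Matches only client-to-server lines ("<<<<"); layout assumed from the
# excerpts above.
MSG_RE = re.compile(r"^\[[^\]]+\] socket (\d+) <<<< (.*)$")


def workunits_missing_results(lines):
    """Return (machine_id, candidate) for each WorkUnit sent without a Test Result."""
    machine = {}  # socket number -> machine id from the FROM line
    pending = {}  # socket number -> candidate still waiting for its Test Result
    missing = []
    for raw in lines:
        m = MSG_RE.match(raw.strip())
        if not m:
            continue
        sock, msg = m.groups()
        if msg.startswith("FROM "):
            parts = msg.split()
            machine[sock] = parts[2] if len(parts) > 2 else "unknown"
            pending.pop(sock, None)
        elif msg.startswith("WorkUnit: "):
            # A new WorkUnit while one is still pending means the earlier
            # one never received a Test Result.
            if sock in pending:
                missing.append((machine.get(sock, "unknown"), pending[sock]))
            pending[sock] = msg.split()[1]
        elif msg.startswith("Test Result:"):
            pending.pop(sock, None)
        elif msg in ("End of Message", "QUIT"):
            if sock in pending:
                missing.append((machine.get(sock, "unknown"), pending.pop(sock)))
    return missing
```

Run against the second excerpt, this would flag all three candidates returned by machine `_31`; run against the first, it would flag none.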
|
|
|
|
|
|
#32 |
|
"Mark"
Apr 2003
Between here and the
1100011010000₂ Posts |
If someone could append a debug log from a client that has experienced this issue, that would be helpful.
I will put a separate fix into the server to address it losing the candidates since there is no valid test result. |
|
|
|
|
|
#33 |
|
A Sunny Moo
Aug 2007
USA (GMT-5)
3·2,083 Posts |
I'm afraid I haven't encountered this issue myself on the client end, so I can't help you there; Lennart, could you possibly put all of your G3000 clients on level 1 debug logging, if they aren't already?
|
|
|
|