mersenneforum.org  

2009-08-08, 18:23   #100
rogue
"Mark"
Apr 2003
Between here and the
2⁴·397 Posts

There is nothing weird in the code where it is hanging. The debug message is written before the send(). It is possible that the send() hung, but I really don't know. AFAICT, something happened in the network communication that I'm not capturing. If the send() fails then the server would output a message.
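
One way to keep a stalled peer from hanging the server silently is to bound the send() with a timeout, so that the failure path always produces a message. A minimal sketch in Python, not the PRPNet code itself; the helper name `send_with_timeout` and the 5-second default are assumptions for illustration:

```python
import socket

def send_with_timeout(sock, payload, timeout=5.0):
    """Send all of payload; return None on success or an error string on failure."""
    try:
        sock.settimeout(timeout)  # applies to each blocking call on this socket
        sock.sendall(payload)
    except socket.timeout:
        return "send timed out after %.1f seconds" % timeout
    except OSError as exc:
        return "send failed: %s" % exc
    return None
```

If the return value is not None, the caller can log it and close the socket rather than waiting on it indefinitely.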

As for the `À|ë¨ in the message on the client side, that is my bad. I'm not passing an appropriate parameter to the output string.
2009-08-09, 13:52   #101
MyDogBuster
May 2008
Wilmington, DE
101100100100₂ Posts

Okay guys, I don't know what category this problem falls under. I took a power hit last night from an electrical storm. When the power came back on an hour later, 3/4 of my candidates file was gone, leaving a big hole in the center. I rebuilt it and started things up again. Two hours later I took another power hit and was down for 2 hours. When I came back up, 3/4 of my candidates file was missing again; this time the whole back end was gone. This particular candidates file is 4.2MB, with 150K candidates.

What is strange is that I have another PRPNet server running a different test on the same machine. The second candidates file was not affected either time. The second candidates file is only 353KB, 13K candidates.

Is reading the candidates file into memory somehow causing part of the file to disappear when something happens to memory? There were no abnormal messages anywhere either time.

Last fiddled with by MyDogBuster on 2009-08-09 at 13:53
2009-08-09, 16:04   #102
rogue
"Mark"
Apr 2003
Between here and the
2⁴·397 Posts

Quote:
Originally Posted by MyDogBuster
Okay guys, I don't know what category this problem falls under. I took a power hit last night from an electrical storm. When the power came back on an hour later, 3/4 of my candidates file was gone, leaving a big hole in the center. I rebuilt it and started things up again. Two hours later I took another power hit and was down for 2 hours. When I came back up, 3/4 of my candidates file was missing again; this time the whole back end was gone. This particular candidates file is 4.2MB, with 150K candidates.

What is strange is that I have another PRPNet server running a different test on the same machine. The second candidates file was not affected either time. The second candidates file is only 353KB, 13K candidates.

Is reading the candidates file into memory somehow causing part of the file to disappear when something happens to memory? There were no abnormal messages anywhere either time.
You were missing stuff in the middle of the file? It sounds like a bug, yet you didn't see any messages and the candidates weren't written anywhere else. I suspect a memory leak or maybe you are using too much memory. How much RAM is being used by the larger one? Is memory being cached to disk?
2009-08-09, 17:54   #103
MyDogBuster
May 2008
Wilmington, DE
2²·23·31 Posts

Quote:
You were missing stuff in the middle of the file? It sounds like a bug, yet you didn't see any messages and the candidates weren't written anywhere else. I suspect a memory leak or maybe you are using too much memory. How much RAM is being used by the larger one? Is memory being cached to disk?
RAM used = 40548K, not cached to disk.

It looked like it was 3MB (3000K) of memory missing both times. The machine has 3GB of memory. The candidates were not written anywhere else.

I understand that I would have lost any changes still in memory since the last update because of the power blip, but I didn't expect 3/4 of the file to go missing. I actually noticed it only because I went into the directory to see if I have any new primes, when I noticed the candidates file went from 4500K to 1500K in size.

Should be an easy test to reconstruct. Build a large candidates file, start the server and a client or 2 and pull the plug.
2009-08-09, 18:50   #104
rogue
"Mark"
Apr 2003
Between here and the
2⁴×397 Posts

Quote:
Originally Posted by MyDogBuster
RAM used = 40548K, not cached to disk.

It looked like it was 3MB (3000K) of memory missing both times. The machine has 3GB of memory. The candidates were not written anywhere else.

I understand that I would have lost any changes still in memory since the last update because of the power blip, but I didn't expect 3/4 of the file to go missing. I actually noticed it only because I went into the directory to see if I have any new primes, when I noticed the candidates file went from 4500K to 1500K in size.

Should be an easy test to reconstruct. Build a large candidates file, start the server and a client or 2 and pull the plug.
The candidates file is only open when reading (at startup) and writing (when it saves). It seems very unusual that power would have gone out twice in the middle of saving, which should only take a few seconds.
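
If a save is ever interrupted partway, writing in place can leave a truncated file. A common safeguard is to write the new contents to a temporary file, flush it to disk, and only then rename it over the old one. The sketch below is in Python and does not describe PRPNet's actual save routine; `save_atomically` and the one-candidate-per-line layout are assumptions for illustration:

```python
import os
import tempfile

def save_atomically(path, lines):
    """Write lines to a temp file in the same directory, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".candidates-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            for line in lines:
                tmp.write(line + "\n")
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before the rename
        os.replace(tmp_path, path)  # the old file stays intact until this point
    except BaseException:
        # Clean up the temp file if anything went wrong before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The rename is atomic on POSIX filesystems. What survives a power cut still depends on the filesystem and the disk's write cache, so this narrows the window for damage rather than closing it.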
2009-08-09, 19:03   #105
henryzz
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
5,881 Posts

Quote:
Originally Posted by rogue
The candidates file is only open when reading (at startup) and writing (when it saves). It seems very unusual that power would have gone out twice in the middle of saving, which should only take a few seconds.
I seem to remember that MyDogBuster had his server writing after every single change, or something ridiculous like that, to help test the server in all situations.
2009-08-09, 20:07   #106
MyDogBuster
May 2008
Wilmington, DE
5444₈ Posts

Quote:
I seem to remember that MyDogBuster had his server writing after every single change, or something ridiculous like that, to help test the server in all situations.
It was at 0 (instant update), but I changed it to 10 minutes before all this started. Makes no sense to me either. I'm watching that file like a hawk now.
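
Watching the file by hand can be automated with a size check between saves. A small sketch in Python, assuming the file should only change gradually from one check to the next; the function name and the 10% threshold are arbitrary choices for illustration, not PRPNet settings:

```python
import os

def shrank_suspiciously(previous_size, path, max_loss=0.10):
    """Return True if the file at path lost more than max_loss of previous_size bytes."""
    current_size = os.path.getsize(path)
    if previous_size <= 0:
        return False  # nothing to compare against yet
    lost_fraction = (previous_size - current_size) / previous_size
    return lost_fraction > max_loss
```

With the sizes reported earlier in the thread, a drop from 4500K to 1500K is a loss of about 67% and would trip the check.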

Last fiddled with by MyDogBuster on 2009-08-09 at 20:08
2009-08-09, 22:50   #107
MyDogBuster
May 2008
Wilmington, DE
2²×23×31 Posts

Well, good news. I've gone through two more power blips and didn't lose anything. Getting a little tired of the power company switching grids, but it's 90 degrees with 100% humidity here. Air conditioner heaven.
2009-08-11, 00:51   #108
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)
3×2,083 Posts

I've loaded 140K-150K into G3000 again. Since no major bugs seem to still be open (excepting the weird problems Ian had, which don't seem to have occurred anywhere else), if this run goes well then we should be OK to proceed with the next range.
2009-08-11, 14:54   #109
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)
3·2,083 Posts

All servers have been upgraded to PRPnet 2.2.5. Meanwhile, though, I had another strange freeze on the server (this was while it was still running 2.2.4):

Code:
[2009-08-11 13:36:34 GMT] Message coming on socket 5
[2009-08-11 13:36:34 GMT] socket 5 <<<< FROM sm5ymt@pekhult.se _6 sm5ymt
[2009-08-11 13:36:34 GMT] sm5ymt@pekhult.se connecting from 91.149.43.243
[2009-08-11 13:36:34 GMT] socket 5 <<<< GETGREETING
[2009-08-11 13:36:34 GMT] socket 5 >>>> ############
[2009-08-11 13:36:34 GMT] socket 5 >>>> Welcome to the CRUS G3000 PRPnet beta test server! :-D
[2009-08-11 13:36:34 GMT] socket 5 >>>> Server is running PRPnet v2.2.3
[2009-08-11 13:36:34 GMT] socket 5 >>>> ############
[2009-08-11 13:36:34 GMT] socket 5 >>>> OK.
[2009-08-11 13:36:34 GMT] socket 5 <<<< QUIT
[2009-08-11 13:36:34 GMT] closing socket 5
[2009-08-11 13:36:34 GMT] Message coming on socket 5
[2009-08-11 13:36:34 GMT] socket 5 <<<< FROM sm5ymt@pekhult.se _6 sm5ymt
[2009-08-11 13:36:34 GMT] sm5ymt@pekhult.se connecting from 91.149.43.243
[2009-08-11 13:36:34 GMT] socket 5 <<<< RETURNWORK 2.2.4
[2009-08-11 13:36:34 GMT] socket 5 <<<< WorkUnit: 51255*6^144850+1 1249996661
[2009-08-11 13:36:34 GMT] socket 5 >>>> ERROR: Workunit 51255*6^144850+1 not found on server
[2009-08-11 13:36:34 GMT] socket 5 <<<< End of Message
[2009-08-11 13:36:34 GMT] socket 5 >>>> INFO: 0 of 1 test results were accepted
[2009-08-11 13:36:34 GMT] socket 5 >>>> End of Message
[2009-08-11 13:36:34 GMT] socket 5 <<<< QUIT
[2009-08-11 13:36:34 GMT] closing socket 5
[2009-08-11 13:36:34 GMT] Message coming on socket 5
[2009-08-11 13:36:34 GMT] socket 5 <<<< FROM sm5ymt@pekhult.se _6 sm5ymt
[2009-08-11 13:36:34 GMT] sm5ymt@pekhult.se connecting from 91.149.43.243
[2009-08-11 13:36:34 GMT] socket 5 <<<< GETWORK 2.2.4 1
[2009-08-11 13:36:34 GMT] socket 5 <<<< llr
[2009-08-11 13:36:34 GMT] socket 5 <<<< phrot
[2009-08-11 13:36:34 GMT] socket 5 <<<< pfgw
[2009-08-11 13:36:34 GMT] socket 5 <<<< End of Message
[2009-08-11 13:36:34 GMT] socket 5 >>>> ServerVersion: 2.2.4
[2009-08-11 13:36:34 GMT] First check candidate 22, 166753*6^144869+1
[2009-08-11 13:36:34 GMT] socket 5 >>>> ServerType: 1
[2009-08-11 13:36:34 GMT] socket 5 >>>> WorkUnit: 166753*6^144869+1 1249997794 166753 6 144869 1
[2009-08-11 13:36:34 GMT] sm5ymt@pekhult.se (_6) at 91.149.43.243: Sent 166753*6^144869+1
[2009-08-11 13:36:34 GMT] socket 5 >>>> End of Message
[2009-08-11 13:36:34 GMT] socket 5 <<<< QUIT
[2009-08-11 13:36:34 GMT] closing socket 5
[2009-08-11 13:36:34 GMT] Message coming on socket 5
[2009-08-11 13:36:34 GMT] socket 5 <<<< FROM sm5ymt@pekhult.se _6 sm5ymt
[2009-08-11 13:36:34 GMT] sm5ymt@pekhult.se connecting from 91.149.43.243
[2009-08-11 13:36:34 GMT] socket 5 <<<< RETURNWORK 2.2.4
[2009-08-11 13:36:34 GMT] socket 5 <<<< WorkUnit: 51255*6^144850+1 1249996661
[2009-08-11 13:36:34 GMT] socket 5 >>>> ERROR: Workunit 51255*6^144850+1 not found on server
[2009-08-11 13:36:34 GMT] socket 5 <<<< End of Message
[2009-08-11 13:36:34 GMT] socket 5 >>>> INFO: 0 of 1 test results were accepted
[2009-08-11 13:36:34 GMT] socket 5 >>>> End of Message
[2009-08-11 13:36:34 GMT] socket 5 <<<< QUIT
[2009-08-11 13:36:34 GMT] closing socket 5
[2009-08-11 13:36:34 GMT] Message coming on socket 5
[2009-08-11 13:36:34 GMT] socket 5 <<<< FROM sm5ymt@pekhult.se _6 sm5ymt
[2009-08-11 13:36:34 GMT] sm5ymt@pekhult.se connecting from 91.149.43.243
[2009-08-11 13:36:34 GMT] socket 5 <<<< GETGREETING
[2009-08-11 13:36:34 GMT] socket 5 >>>> ############
[2009-08-11 13:36:34 GMT] socket 5 >>>> Welcome to the CRUS G3000 PRPnet beta test server! :-D
[2009-08-11 13:36:34 GMT] socket 5 >>>> Server is running PRPnet v2.2.3
[2009-08-11 13:36:34 GMT] socket 5 >>>> ############
[2009-08-11 13:36:34 GMT] socket 5 >>>> OK.
[2009-08-11 13:36:34 GMT] socket 5 <<<< QUIT
[2009-08-11 13:36:34 GMT] closing socket 5
[2009-08-11 13:36:39 GMT] Message coming on socket 5
[2009-08-11 13:36:39 GMT] socket 5 <<<< FROM sm5ymt@pekhult.se _6 sm5ymt
[2009-08-11 13:36:39 GMT] sm5ymt@pekhult.se connecting from 91.149.43.243
[2009-08-11 13:36:51 GMT] socket 5 <<<< GETGREETING
[2009-08-11 13:36:51 GMT] socket 5 >>>> ############
[2009-08-11 13:36:51 GMT] socket 5 >>>> Welcome to the CRUS G3000 PRPnet beta test server! :-D
[2009-08-11 13:36:51 GMT] socket 5 >>>> Server is running PRPnet v2.2.3
[2009-08-11 13:36:51 GMT] socket 5 >>>> ############
[2009-08-11 13:36:51 GMT] socket 5 >>>> OK.
[2009-08-11 13:36:51 GMT] socket 5 <<<< QUIT
[2009-08-11 13:36:51 GMT] closing socket 5
[2009-08-11 13:36:51 GMT] Message coming on socket 5
[2009-08-11 13:36:51 GMT] socket 5 <<<< FROM sm5ymt@pekhult.se _6 sm5ymt
[2009-08-11 13:36:51 GMT] sm5ymt@pekhult.se connecting from 91.149.43.243
[2009-08-11 13:36:51 GMT] socket 5 <<<< GETGREETING
[2009-08-11 13:36:51 GMT] socket 5 >>>> ############
[2009-08-11 13:36:51 GMT] socket 5 >>>> Welcome to the CRUS G3000 PRPnet beta test server! :-D
[2009-08-11 13:36:51 GMT] socket 5 >>>> Server is running PRPnet v2.2.3
[2009-08-11 13:36:51 GMT] socket 5 >>>> ############
[2009-08-11 13:36:51 GMT] socket 5 >>>> OK.
[2009-08-11 13:36:51 GMT] socket 5 <<<< QUIT
[2009-08-11 13:36:51 GMT] closing socket 5
[2009-08-11 13:36:51 GMT] Message coming on socket 5
[2009-08-11 13:36:51 GMT] socket 5 <<<< FROM gbarnes017@gmail.com humpford gd_barnes
[2009-08-11 13:36:51 GMT] gbarnes017@gmail.com connecting from 76.92.253.11
[2009-08-11 13:36:51 GMT] socket 5 <<<< GETGREETING
[2009-08-11 13:36:51 GMT] socket 5 >>>> ############
[2009-08-11 13:36:51 GMT] socket 5 >>>> Welcome to the CRUS G3000 PRPnet beta test server! :-D
[2009-08-11 13:36:51 GMT] socket 5 >>>> Server is running PRPnet v2.2.3
[2009-08-11 13:36:51 GMT] socket 5 >>>> ############
[2009-08-11 13:36:51 GMT] socket 5 >>>> OK.
[2009-08-11 13:36:51 GMT] socket 5 <<<< QUIT
[2009-08-11 13:36:51 GMT] closing socket 5
[2009-08-11 13:36:51 GMT] Message coming on socket 5
[2009-08-11 13:36:51 GMT] socket 5 <<<< FROM sm5ymt@pekhult.se _6 sm5ymt
[2009-08-11 13:36:51 GMT] sm5ymt@pekhult.se connecting from 91.149.43.243
[2009-08-11 13:36:51 GMT] socket 5 <<<< RETURNWORK 2.2.4
[2009-08-11 13:36:51 GMT] socket 5 <<<< WorkUnit: 74612*6^144857+1 1249996678
[2009-08-11 13:36:51 GMT] socket 5 >>>> ERROR: Workunit 74612*6^144857+1 not found on server
[2009-08-11 13:36:52 GMT] socket 5 <<<< End of Message
[2009-08-11 13:36:52 GMT] socket 5 >>>> INFO: 0 of 1 test results were accepted
[2009-08-11 13:36:52 GMT] socket 5 >>>> End of Message
[2009-08-11 13:36:53 GMT] socket 5 <<<< QUIT
[2009-08-11 13:36:53 GMT] closing socket 5
[2009-08-11 13:36:53 GMT] Message coming on socket 5
[2009-08-11 14:41:14 GMT] Accepted force quit.  Waiting to close sockets before exiting
[2009-08-11 14:42:24 GMT] Accepted force quit.  Waiting to close sockets before exiting
[2009-08-11 14:42:54 GMT] Accepted force quit.  Waiting to close sockets before exiting
(at this point I had to kill the server)
Oddly enough, on numerous occasions in the log excerpt above, it seems that various clients connected to the server, grabbed the greeting, and exited. Mark, do you know why any clients would want to do that?

Also, there are a lot of candidates being returned that aren't found on the server; I'm not sure if this is something odd, or if there just happened to be a number of expired tests right around the time when I checked the server.
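
To see how many distinct tests were rejected, the error lines can be pulled out of the server log. A sketch in Python, assuming the log keeps the exact "ERROR: Workunit ... not found on server" wording shown in the excerpt above; `rejected_workunits` is a hypothetical helper:

```python
import re

NOT_FOUND = re.compile(r"ERROR: Workunit (\S+) not found on server")

def rejected_workunits(log_lines):
    """Return the distinct workunit names reported as not found, in first-seen order."""
    seen = []
    for line in log_lines:
        match = NOT_FOUND.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

The resulting names can then be searched for in the candidates file and the completed work file.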

At any rate, though, at 13:36:53 GMT it received a connection from a client, then promptly froze. It remained that way until over an hour later when I had to -SIGKILL the server.
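
A freeze like this shows up in the log as a long silence between consecutive timestamps. A sketch in Python that finds the longest one, assuming every log line starts with a "[YYYY-MM-DD HH:MM:SS GMT]" stamp as in the excerpt above; `largest_gap` is a hypothetical helper:

```python
from datetime import datetime

def largest_gap(log_lines):
    """Return (seconds, earlier_line, later_line) for the longest silence between stamped lines."""
    best = (0.0, None, None)
    previous = None  # (timestamp, line) of the last stamped line seen
    for line in log_lines:
        if not line.startswith("["):
            continue
        try:
            stamp = datetime.strptime(line[1:20], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped log line
        if previous is not None:
            gap = (stamp - previous[0]).total_seconds()
            if gap > best[0]:
                best = (gap, previous[1], line)
        previous = (stamp, line)
    return best
```

On the excerpt above, the longest silence is the 3861 seconds between 13:36:53 and 14:41:14.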

It's now restarted with v2.2.5.

P.S.: I just thought of something. There weren't any changes to the server in 2.2.5, right? If so, then it seems I just wasted my time changing the servers over.
2009-08-11, 16:48   #110
MyDogBuster
May 2008
Wilmington, DE
2²×23×31 Posts

Max, you might want to check to see if some of the candidates file is missing right around where Lennart's returned tests failed. Also look in the completed work file. They have to be somewhere. I think you'll find the server was looping because some of the file is gone.

Last fiddled with by MyDogBuster on 2009-08-11 at 17:05

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.