mersenneforum.org  

Old 2009-08-08, 02:33   #89
rogue
 
"Mark"
Apr 2003
Between here and the

14320₈ Posts

I would like to see both logs. Could you e-mail them to me?
Old 2009-08-08, 02:57   #90
mdettweiler
A Sunny Moo
 
Aug 2007
USA (GMT-5)

3×2,083 Posts

Quote:
Originally Posted by MyDogBuster
Nope same event. "to localhost:7102" is the server not the client.
Yes, but the server log says that it sent the client new workunits to test, whereas the client says it just successfully sent 20 results to the server. Possibly the clocks on the server and client machines are slightly out of sync?
Old 2009-08-08, 03:25   #91
MyDogBuster
 
May 2008
Wilmington, DE

2²×23×31 Posts

Sorry guys, Max was right. I misinterpreted the error message. It's an error of sending work to the clients. I was just matching times and not actual k/n pairs. DUH

So my bottleneck is timing out on sends to the clients. Let me watch it for a while.

I see how this works now. My bad. The server program is slicker than I thought.

Last fiddled with by MyDogBuster on 2009-08-08 at 03:34
Old 2009-08-08, 04:41   #92
mdettweiler
A Sunny Moo
 
Aug 2007
USA (GMT-5)

3·2,083 Posts

Okay, I've upgraded all servers to 2.2.4 and re-loaded 140K-150K into G3000 for another test run. Hopefully all will work well this time.

Edit: Mark, I see that the server pages still don't display HTML <title>s, even though this option is properly set in prpserver.ini. I can confirm that the "Max N" column header has been fixed, though.

Last fiddled with by mdettweiler on 2009-08-08 at 04:44
Old 2009-08-08, 07:42   #93
gd_barnes
 
May 2007
Kansas; USA

101·103 Posts

Outstanding testing, Ian. Keep ferreting out every little issue you can find. That is what I've been looking for.

Good job on getting some fixes in quickly, Mark and Max. This will be an amazing setup when it is all done and runs perfectly!
Old 2009-08-08, 11:10   #94
rogue
 
"Mark"
Apr 2003
Between here and the

2⁴·397 Posts

Quote:
Originally Posted by MyDogBuster
Sorry guys, Max was right. I misinterpreted the error message. It's an error of sending work to the clients. I was just matching times and not actual k/n pairs. DUH

So my bottleneck is timing out on sends to the clients. Let me watch it for a while.

I see how this works now. My bad. The server program is slicker than I thought.
The message might just be misworded. I still need both logs (with debuglevel=1) to know what is going on.

Max, which page isn't showing the title?
Old 2009-08-08, 13:16   #95
MyDogBuster
 
May 2008
Wilmington, DE

2²×23×31 Posts

Quote:
The message might just be misworded. I still need both logs (with debuglevel=1) to know what is going on.
I've deleted all the logs and restarted everything with debug on. As soon as I get the bottleneck error again, I'll send you everything. I can always force it to happen. Need some sleep.

server_stats.html - I get a blank page now under server v2.2.4

Server-status.html - I get no HTML heading. Same with user_status.html. Also, on user_status.html I get 0 for PRP even though I have a PRP file with 11 found primes.

Edited: Files emailed

Last fiddled with by MyDogBuster on 2009-08-08 at 13:55
Old 2009-08-08, 15:52   #96
mdettweiler
A Sunny Moo
 
Aug 2007
USA (GMT-5)

3·2,083 Posts

Quote:
Originally Posted by rogue
Max, which page isn't showing the title?
None of the pages are showing titles.

In other news, I'm seeing some rather weird behavior right now on the G3000 server. It's almost like the server "froze" at 6:00:04 GMT today (about 9 hours, 45 minutes ago), and in the middle of a communication with one of Lennart's boxes to boot. I had to kill the server manually (with -SIGKILL, since a regular -SIGTERM didn't do the trick) and restart it in order to fix it. Now it seems to be working OK.
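
For anyone scripting that kind of restart, the SIGTERM-then-SIGKILL escalation can be sketched roughly as follows. This is only an illustration in Python, assuming the server was started from the same script via `subprocess.Popen`; `stop_process` and `grace_seconds` are made-up names and have nothing to do with PRPNet itself.

```python
import subprocess


def stop_process(proc, grace_seconds=10.0):
    """Stop a child process: polite request first, forced kill as a fallback.

    `proc` is a subprocess.Popen object. On POSIX, terminate() sends SIGTERM
    and kill() sends SIGKILL. Hypothetical helper, for illustration only.
    """
    proc.terminate()  # polite request; a hung process may never act on it
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # cannot be caught or ignored by the target on POSIX
        proc.wait()  # reap the process so it does not linger as a zombie
        return "killed"
```

Working from a specific process handle (or PID) rather than a process name is what keeps the other servers on the same box from being killed along with the hung one.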

Also of note, there was one lone test with a missing residue that showed up right before the server froze:
Code:
[2009-08-08 05:58:31 GMT] Message coming on socket 5
[2009-08-08 05:58:31 GMT] socket 5 <<<< FROM sm5ymt@pekhult.se _79 sm5ymt
[2009-08-08 05:58:31 GMT] sm5ymt@pekhult.se connecting from 93.179.39.71
[2009-08-08 05:58:31 GMT] socket 5 <<<< RETURNWORK 2.2.4
[2009-08-08 05:58:31 GMT] socket 5 <<<< WorkUnit: 168610*6^141180+1 1249710466
[2009-08-08 05:58:31 GMT] socket 5 >>>> INFO: Workunit found
[2009-08-08 05:59:02 GMT] socket 5 (nothing received) 
[2009-08-08 05:59:02 GMT] socket 5 >>>> ERROR:  Test for 168610*6^141180+1 rejected.  No residue reported
[2009-08-08 05:59:02 GMT] Error sending <<ERROR:  Test for 168610*6^141180+1 rejected.  No residue reported>> to localhost:3000
[2009-08-08 05:59:02 GMT] socket 5 >>>> !!! send error !!!
[2009-08-08 05:59:02 GMT] sm5ymt@pekhult.se (_79) at 93.179.39.71: Rejected test on 168610*6^141180+1 due to no residue
[2009-08-08 05:59:02 GMT] socket 5 >>>> INFO: Test for candidate 168610*6^141180+1 accepted
[2009-08-08 05:59:02 GMT] Error sending <<INFO: Test for candidate 168610*6^141180+1 accepted>> to localhost:3000
[2009-08-08 05:59:02 GMT] socket 5 >>>> !!! send error !!!
[2009-08-08 05:59:02 GMT] 168610*6^141180+1: Test received by sm5ymt@pekhult.se at 93.179.39.71  Residue Residue: 
[2009-08-08 05:59:02 GMT] socket 5 >>>> End of Workunit Message
[2009-08-08 05:59:02 GMT] Error sending <<End of Workunit Message>> to localhost:3000
[2009-08-08 05:59:02 GMT] socket 5 >>>> !!! send error !!!
[2009-08-08 05:59:33 GMT] socket 5 (nothing received) 
[2009-08-08 05:59:33 GMT] socket 5 >>>> INFO: All 1 test results were accepted
[2009-08-08 05:59:33 GMT] Error sending <<INFO: All 1 test results were accepted>> to localhost:3000
[2009-08-08 05:59:33 GMT] socket 5 >>>> !!! send error !!!
[2009-08-08 05:59:33 GMT] socket 5 >>>> End of Message
[2009-08-08 05:59:33 GMT] Error sending <<End of Message>> to localhost:3000
[2009-08-08 05:59:33 GMT] socket 5 >>>> !!! send error !!!
[2009-08-08 06:00:04 GMT] socket 5 (nothing received) 
[2009-08-08 06:00:04 GMT] closing socket 5
Of note, the server did notice that no residue was reported, but marked down the test anyway.
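
One way to hunt for other tests in the same state is to scan the server log for candidates that appear in both a "rejected" and an "accepted" message. A minimal Python sketch, assuming the message wording matches the excerpt above; `rejected_and_accepted` is a hypothetical helper, not part of PRPNet:

```python
import re

# Patterns assume the message wording seen in the server log excerpt above.
REJECTED = re.compile(r"Test for (\S+) rejected")
ACCEPTED = re.compile(r"Test for candidate (\S+) accepted")


def rejected_and_accepted(lines):
    """Return the sorted candidates that were reported as both rejected and accepted."""
    rejected, accepted = set(), set()
    for line in lines:
        m = REJECTED.search(line)
        if m:
            rejected.add(m.group(1))
        m = ACCEPTED.search(line)
        if m:
            accepted.add(m.group(1))
    return sorted(rejected & accepted)
```

Run over the excerpt above, this flags 168610*6^141180+1, the one candidate that got both messages.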

After that the server went back to business as usual for a while, then ran into this:
Code:
[2009-08-08 06:00:04 GMT] Message coming on socket 5
[2009-08-08 06:00:04 GMT] socket 5 <<<< FROM sm5ymt@pekhult.se _29 sm5ymt
[2009-08-08 06:00:04 GMT] sm5ymt@pekhult.se connecting from 93.179.39.71
[2009-08-08 06:00:04 GMT] socket 5 <<<< RETURNWORK 2.2.4
[2009-08-08 06:00:04 GMT] socket 5 <<<< WorkUnit: 108527*6^141131+1 1249710211
[2009-08-08 06:00:04 GMT] socket 5 >>>> INFO: Workunit found
[2009-08-08 15:42:57 GMT] Accepted force quit.  Waiting to close sockets before exiting
[2009-08-08 15:43:44 GMT] Accepted force quit.  Waiting to close sockets before exiting
Without any particular explanation why, it simply froze right after sending the "INFO: Workunit found" message. The next time it did anything was when I hit Ctrl-C and it gave the first "Accepted force quit" message. The second such message was from when I -SIGTERM'd the process by PID to make sure I was killing the right one with -SIGKILL immediately after (so I didn't abruptly kill all the other servers at the same time).
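
A stall like this is easy to spot mechanically by looking for large jumps between consecutive timestamps. A rough Python sketch, assuming every log line starts with `[YYYY-MM-DD HH:MM:SS GMT]` as in the excerpts above; `find_gaps` and the 5-minute default threshold are arbitrary choices for illustration:

```python
import re
from datetime import datetime, timedelta

# Assumed prefix of every log line, e.g. "[2009-08-08 06:00:04 GMT] ..."
STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) GMT\]")


def find_gaps(lines, threshold=timedelta(minutes=5)):
    """Return (previous_time, next_time, gap) for each silence longer than threshold."""
    gaps = []
    prev = None
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue  # skip lines without a recognizable timestamp
        now = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        if prev is not None and now - prev > threshold:
            gaps.append((prev, now, now - prev))
        prev = now
    return gaps
```

On the second excerpt above it reports a single gap of 9:42:53, between the 06:00:04 "Workunit found" line and the first "Accepted force quit" line at 15:42:57.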

Since I've restarted the server just now, all seems to be working well.

Lennart, do you possibly have a debug.log file from client _29 around when this happened?
Old 2009-08-08, 16:11   #97
Lennart
 
"Lennart"
Jun 2007

2⁵·5·7 Posts

Code:
[2009-08-08 05:43:26 GMT] rps: Getting work from server nplb-gb1.no-ip.org at port 3000
[2009-08-08 05:43:26 GMT] socket 3 >>>> GETWORK 2.2.4 3
[2009-08-08 05:43:26 GMT] socket 3 >>>> llr
[2009-08-08 05:43:26 GMT] socket 3 >>>> phrot
[2009-08-08 05:43:26 GMT] socket 3 >>>> pfgw
[2009-08-08 05:43:26 GMT] socket 3 >>>> End of Message
[2009-08-08 05:43:31 GMT] socket 3 <<<< ServerVersion: 2.2.4
[2009-08-08 05:43:32 GMT] socket 3 <<<< ServerType: 1
[2009-08-08 05:43:32 GMT] socket 3 <<<< WorkUnit: 108527*6^141131+1 1249710211 108527 6 141131 1
[2009-08-08 05:43:32 GMT] socket 3 <<<< ServerType: 1
[2009-08-08 05:43:32 GMT] socket 3 <<<< WorkUnit: 87800*6^141133+1 1249710211 87800 6 141133 1
[2009-08-08 05:43:32 GMT] socket 3 <<<< ServerType: 1
[2009-08-08 05:43:32 GMT] socket 3 <<<< WorkUnit: 124125*6^141134+1 1249710211 124125 6 141134 1
[2009-08-08 05:43:32 GMT] socket 3 <<<< End of Message
[2009-08-08 05:43:32 GMT] socket 3 >>>> GETGREETING
[2009-08-08 05:43:32 GMT] socket 3 <<<< ############
[2009-08-08 05:43:32 GMT] socket 3 <<<< Welcome to the CRUS G3000 PRPnet beta test server! :-D
[2009-08-08 05:43:32 GMT] socket 3 <<<< Server is running PRPnet v2.2.3
[2009-08-08 05:43:32 GMT] socket 3 <<<< ############
[2009-08-08 05:43:32 GMT] socket 3 <<<< OK.
[2009-08-08 05:43:32 GMT] socket 3 >>>> QUIT
[2009-08-08 05:48:37 GMT] rps: 108527*6^141131+1 is not prime.  Residue 490957BF01DB7E2F
[2009-08-08 05:53:41 GMT] rps: 87800*6^141133+1 is not prime.  Residue B9EF669CBC119BEC
[2009-08-08 05:58:45 GMT] rps: 124125*6^141134+1 is not prime.  Residue 9F52401A6B78FAE8
[2009-08-08 05:58:45 GMT] Total Time: 16:39:12  Total Tests: 153  Total PRPs Found: 1
[2009-08-08 05:58:48 GMT] socket 3 >>>> FROM sm5ymt@pekhult.se _29 sm5ymt
[2009-08-08 05:58:48 GMT] rps: Returning work to server nplb-gb1.no-ip.org at port 3000
[2009-08-08 05:58:48 GMT] socket 3 >>>> RETURNWORK 2.2.4
[2009-08-08 05:58:48 GMT] socket 3 >>>> WorkUnit: 108527*6^141131+1 1249710211
[2009-08-08 05:59:19 GMT] socket 3 (nothing received) 
[2009-08-08 05:59:19 GMT] `À|ë¨: Count not verify existence of workunit on the server.
After this there is nothing more in the log. All my clients working on CRUS hung!!

Lennart
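
Since the clocks on the two machines need not agree, lining the server and client logs up by candidate rather than by timestamp is the safer comparison. A small Python sketch, assuming both logs contain `WorkUnit: k*b^n+1` lines like the ones above; the function names are invented for illustration:

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the debug logs quoted in this thread:
#   [YYYY-MM-DD HH:MM:SS GMT] ... WorkUnit: <k>*<b>^<n>+1 ...
WORKUNIT = re.compile(
    r"^\[(?P<ts>[^\]]+)\].*WorkUnit: (?P<cand>\d+\*\d+\^\d+[+-]\d+)"
)


def index_by_candidate(lines):
    """Map each candidate string to the list of timestamps at which it appears."""
    seen = defaultdict(list)
    for line in lines:
        m = WORKUNIT.match(line)
        if m:
            seen[m.group("cand")].append(m.group("ts"))
    return dict(seen)


def common_candidates(server_lines, client_lines):
    """Return {candidate: (server_timestamps, client_timestamps)} for shared candidates."""
    server = index_by_candidate(server_lines)
    client = index_by_candidate(client_lines)
    return {c: (server[c], client[c]) for c in server.keys() & client.keys()}
```

Given the RETURNWORK lines for 108527*6^141131+1 from both logs, it pairs the server's 06:00:04 entry with the client's 05:58:48 entry, which also shows how far apart the two clocks appear to be.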

Last fiddled with by Lennart on 2009-08-08 at 16:12
Old 2009-08-08, 16:12   #98
mdettweiler
A Sunny Moo
 
Aug 2007
USA (GMT-5)

14151₈ Posts

Quote:
Originally Posted by Lennart
After this there is nothing more in the log. All my clients working on CRUS hung!!

Lennart
Are you saying that the clients actually froze up, or just stopped getting work from CRUS?

Last fiddled with by mdettweiler on 2009-08-08 at 16:13 Reason: typo
Old 2009-08-08, 16:17   #99
Lennart
 
"Lennart"
Jun 2007

1120₁₀ Posts

Quote:
Originally Posted by mdettweiler
Are you saying that the clients actually froze up, or just stopped getting work from CRUS?

They froze, all of them. I have killed many clients now and restarted them. I had to kill them!!

Lennart

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.