mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Prime Sierpinski Project (https://www.mersenneforum.org/forumdisplay.php?f=48)
-   -   PRPNet servers down? (https://www.mersenneforum.org/showthread.php?t=12655)

opyrt 2009-11-03 08:37

PRPNet servers down?
 
Hi.

I'm unable to get any work from any of the PRPNet servers. Are they down, or did I manage to grab all the WUs again? :down:

Mini-Geek 2009-11-03 12:16

Well, both are saying there aren't any candidates left, so I'd guess that's what's happening. :smile:

opyrt 2009-11-03 12:28

There should have been several hundred candidates available on both servers, from what I understand, so I suspect a faulty client has managed to reserve all of them. I just hope it's not my fault again... I had a faulty switch here, so my computers Morbo, Fry and Zapp were unable to write to their config/checkpoint/log files, which are stored on a network share, but were still intermittently able to reach the internet. If prpclient behaves the same way as llrnet when this happens, it just keeps downloading candidates until the server is empty.

Hopefully ltd will find that that's not the case and that someone other than me is to blame for once. :redface:
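A toy model of the runaway behaviour described above, offered only as a minimal sketch under assumptions: it does not reproduce prpclient's or llrnet's actual code, and the `Server`, `Client` and `run` names are hypothetical. The client asks for work only when its saved state says it holds none; if that state cannot be written (as with the unreachable network share), the reservation is never recorded, so the client requests a fresh candidate on every pass.

```python
# Hypothetical simulation, not PRPNet/llrnet code: a client that cannot save
# its reservation state forgets it already holds work and keeps asking for more.


class Server:
    """Toy work server holding a queue of candidate IDs."""

    def __init__(self, n_candidates):
        self.queue = list(range(n_candidates))
        self.reserved = {}  # candidate ID -> name of the client holding it

    def request_work(self, client_name):
        if not self.queue:
            return None  # corresponds to "no candidates left"
        cand = self.queue.pop(0)
        self.reserved[cand] = client_name
        return cand


class Client:
    def __init__(self, name, can_save):
        self.name = name
        self.can_save = can_save  # False models the unwritable network share
        self.saved_work = None    # what would survive in the checkpoint file

    def step(self, server):
        # Only fetch new work when the saved state shows none in hand.
        if self.saved_work is None:
            cand = server.request_work(self.name)
            if cand is not None and self.can_save:
                self.saved_work = cand
            # If saving failed, saved_work stays None, so the next step
            # requests yet another candidate from the server.


def run(can_save, steps=1000, n_candidates=500):
    """Return (candidates left in queue, candidates reserved) after `steps` loops."""
    server = Server(n_candidates)
    client = Client("Morbo", can_save)
    for _ in range(steps):
        client.step(server)
    return len(server.queue), len(server.reserved)
```

With saving working, `run(True)` returns `(499, 1)`: one candidate reserved. With saving broken, `run(False)` returns `(0, 500)`: the single client drains the whole queue.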

Joe O 2009-11-03 12:53

[QUOTE=rogue;194619]I found a major bug in 2.4.3 that occurs when the server allows for more than 15 or so workunits at a time and the client grabs that many. This causes the client to crash when returning them.

Here is a consolidated list of changes from 2.4.3:
[list][*]all: Fix a crash that occurs when large messages (> 1000 bytes) are received.[*]server: Prevent server from double-checking primes.[*]server: Output a message if unable to open one of the .removed files and keep the candidate in the main file until the .removed file can be opened. This addresses a potential crash in which the server presumes that the .removed file could be opened.[/list][/QUOTE]

opyrt 2009-11-03 14:44

[quote=Joe O;194658][quote=rogue;194619]I found a major bug in 2.4.3 that occurs when the server allows for more than 15 or so workunits at a time and the client grabs that many. This causes the client to crash when returning them.

Here is a consolidated list of changes from 2.4.3:
[LIST][*]all: Fix a crash that occurs when large messages (> 1000 bytes) are received.[*]server: Prevent server from double-checking primes.[*]server: Output a message if unable to open one of the .removed files and keep the candidate in the main file until the .removed file can be opened. This addresses a potential crash in which the server presumes that the .removed file could be opened.[/LIST][/quote]
[/quote]

That could be it, but by default the clients are set to download only 1 WU at a time.

Joe O 2009-11-03 15:49

Try commenting out the double-check line, i.e. port 7101.
I can get to the port 7100 server but not the port 7101 server.

[CODE]k*b^n+/-c      Total N    Min N    Max N  FT Done  FT Done Thru  Max FT Done
79309*2^n+1         70  8397254  8499134        0             0            0
79817*2^n+1        167  8328351  8499791        0             0            0
90527*2^n+1        207  8324351  8499791        0             0            0
152267*2^n+1       106  8364867  8499963        0             0            0
156511*2^n+1        65  8346000  8498568        0             0            0
168451*2^n+1       114  8373240  8499528        0             0            0
222113*2^n+1       324  8380773  8499453        0             0            0
225931*2^n+1       122  8320568  8499656        0             0            0
237019*2^n+1       169  8388502  8499886        0             0            0[/CODE]

opyrt 2009-11-03 17:05

[quote=Joe O;194682]Try commenting out the double check line ie port 7101.
I can get to the port 7100 server but not the port 7101 server.[/quote]

I can't get any work from 7100 either... :-/

ltd 2009-11-03 22:12

The servers are working, but one client ran wild and reserved all tests from both machines within a very short time. I will load some tests into both queues.

For the DC (double-check) server, these will be low-n tests from another k. Both tasks will take some time.

Sorry I did not react earlier, but I had no time to check the forum before now.

By the way, it was not one of your machines, opyrt.

ltd 2009-11-03 23:11

Hopefully the servers will be up again within the next 10 minutes.
It took a little longer because the admin program refused to work for me for some unknown reason, so I had to edit some files by hand (I hope I made no errors).

I also used the downtime to upgrade the server to revision 2.4.4.

opyrt 2009-11-03 23:14

Great job ltd, thanks for fixing it so fast! :smile:

ltd 2009-11-03 23:23

Both servers are up again.
The main server has already handed out several tests and there have been no requests from the runaway client, so I hope the queues will not run empty again.

Sorry once more that I did not notice the problem earlier, but today I completely ignored the forum.


All times are UTC.