mersenneforum.org (https://www.mersenneforum.org/index.php)
-   No Prime Left Behind (https://www.mersenneforum.org/forumdisplay.php?f=82)
-   -   LLRnet servers for NPLB (https://www.mersenneforum.org/showthread.php?t=10042)

mdettweiler 2008-11-20 06:53

[quote=IronBits;149984]Looks like a total application crash.
Put a new copy of the binary in there and see if that fixes it.
It's not happy about something on the socket...
Could be another proxy server coming in.
I had all kinds of fits when Beyond was running his own LLRnet server to connect to my Windows server, which was the major reason for switching to Linux.
Whose MAC address is it that comes in when it crashes?
Good luck![/quote]
Hmm...it happened immediately after a prune, which was in turn preceded by a communication with one of Buster's clients.

I'll swap out the binary and see if it crashes any more after that. Nobody's been talking to the server recently except Buster and me, and I'm sure Buster isn't using a proxy server, so that can't be it. The first crash came not long after I had rebooted the server. Possibly the binary got corrupted somewhere in the process of rebooting? Anyway, I'll switch it with a new one, so if that's the problem it should be fixed shortly. :smile:

Edit: Okay, I've swapped out the binary with a copy of one that's currently in use on my dualcore, so it should be good. We'll see how it holds up. :smile:

mdettweiler 2008-11-20 15:53

Well, the server seems to have survived the night. Looks like it's good to go for the rally. :tu:

gd_barnes 2008-11-21 02:07

David,

Can you make sure that you load the file that I sent Wednesday morning into port 400 sometime this evening or Friday morning, before the rally starts?


Thanks,
Gary

IronBits 2008-11-21 04:28

Done

MyDogBuster 2008-11-21 16:22

Max,

I see a small problem with one of your scripts.
The Time: 4448.0 is really 444.80. All my cores running on 4000 are in the 440-second range.
I don't think your run times are in the 12K range either.
It looks to be off by one decimal place to the right.



user=MyDogBuster
[2008-11-20 06:59:12]
759*2^570284-1 is not prime. Res64: 887916AF14999EE1 Time : 4448.0 sec.
user=MyDogBuster
[2008-11-20 06:59:13]
605*2^570284-1 is not prime. Res64: C57B39193EA15AAA Time : 4559.0 sec.
user=MyDogBuster
[2008-11-20 07:00:10]
843*2^570284-1 is not prime. Res64: 67519D2631C816DD Time : 4457.0 sec.
user=mdettweiler
[2008-11-20 07:01:00]
809*2^570276-1 is not prime. Res64: 59D459B5DAE64BAB Time : 12154.0 sec.
user=MyDogBuster
[2008-11-20 07:03:09]
773*2^570284-1 is not prime. Res64: F8DC1C9F28AFF800 Time : 4656.0 sec.
user=MyDogBuster
[2008-11-20 07:05:46]
423*2^570284-1 is not prime. Res64: D9857189296C5476 Time : 5487.0 sec.
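
For anyone who wants to compare these server-side times per user, here is a minimal Python sketch that parses the three-line records shown above. The record layout is inferred only from this excerpt, the meaning of the bracketed timestamp (taken here as the time the result was logged) is an assumption, and the names `Result` and `parse_results` are made up for illustration; none of this is part of LLRnet.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Matches only the "is not prime" result lines seen in the excerpt;
# any other line shape is ignored rather than guessed at.
RESULT_RE = re.compile(
    r"^(?P<cand>\S+) is not prime\.\s+Res64:\s+(?P<res64>[0-9A-F]{16})"
    r"\s+Time\s*:\s*(?P<secs>\d+(?:\.\d+)?) sec\.$"
)


@dataclass
class Result:
    user: str
    logged_at: datetime   # assumption: bracketed timestamp = time logged
    candidate: str        # e.g. "759*2^570284-1"
    res64: str
    seconds: float        # the server-side "Time" value


def parse_results(text):
    """Parse 'user=' / '[timestamp]' / result-line triples into Result objects."""
    results = []
    user = None
    stamp = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("user="):
            user = line[len("user="):]
        elif line.startswith("[") and line.endswith("]"):
            stamp = datetime.strptime(line[1:-1], "%Y-%m-%d %H:%M:%S")
        else:
            match = RESULT_RE.match(line)
            if match and user is not None and stamp is not None:
                results.append(
                    Result(user, stamp, match["cand"], match["res64"],
                           float(match["secs"]))
                )
    return results


SAMPLE = """\
user=MyDogBuster
[2008-11-20 06:59:12]
759*2^570284-1 is not prime. Res64: 887916AF14999EE1 Time : 4448.0 sec.
user=mdettweiler
[2008-11-20 07:01:00]
809*2^570276-1 is not prime. Res64: 59D459B5DAE64BAB Time : 12154.0 sec.
"""

for result in parse_results(SAMPLE):
    print(result.user, result.candidate, result.seconds)
```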

mdettweiler 2008-11-21 16:31

That's because LLRnet doesn't record the *real* times on the server end. In fact, the client doesn't even send the times to the server, only the residue. So what is the server putting down there? It's just the time that elapsed between when the k/n pair was first handed out and when the client returned it. So if the client has a WUCacheSize of 1, the timings on the server will *approximately* match the ones on the client, and with a WUCacheSize of 2, the server's recorded timings will be approximately double the client's. (Of course, if you stop the client for a while, that downtime also ends up in the time on the server.)

At any rate, the timings on the server don't mean much beyond being occasionally useful for debugging. For example, sometimes they'll show up as "0.7 sec." or some other similarly unbelievable time, usually due to something complex happening with a k/n pair expiring and then being returned shortly after. And if a k/n pair is returned but had *not* been assigned, the server gives "0.0 sec." as the time for the result. (You'll notice wacky times like this especially in the rejected results file.)
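
To make the effect of the cache size concrete, here is a toy Python model of the bookkeeping described above. It is only a sketch under stated assumptions: `ToyServer` and `simulate_client` are hypothetical names, not LLRnet code, and the client is assumed to fill its cache once at the start and then fetch a replacement pair each time it returns a result.

```python
from collections import deque


class ToyServer:
    """Hypothetical stand-in for the server's bookkeeping (not LLRnet's code)."""

    def __init__(self):
        self.handed_out = {}  # pair -> time at which it was handed out

    def hand_out(self, pair, now):
        self.handed_out[pair] = now

    def record_result(self, pair, now):
        """Return the 'Time' value the server would log for a returned pair."""
        start = self.handed_out.pop(pair, None)
        if start is None:
            return 0.0  # returned but never assigned
        return now - start  # elapsed time between hand-out and return


def simulate_client(test_seconds, cache_size, num_results):
    """Log the server-side times for one client with the given cache size."""
    server = ToyServer()
    clock = 0.0
    next_pair = 0
    cache = deque()
    # Assumption: the client fills its whole cache at time zero.
    for _ in range(cache_size):
        server.hand_out(next_pair, clock)
        cache.append(next_pair)
        next_pair += 1
    logged = []
    for _ in range(num_results):
        pair = cache.popleft()
        clock += test_seconds  # the real test time on the client
        logged.append(server.record_result(pair, clock))
        # Assumption: the client refills its cache right after returning.
        server.hand_out(next_pair, clock)
        cache.append(next_pair)
        next_pair += 1
    return logged


print(simulate_client(100, 1, 4))  # [100.0, 100.0, 100.0, 100.0]
print(simulate_client(100, 2, 4))  # [100.0, 200.0, 200.0, 200.0]
print(ToyServer().record_result("never-assigned", 50.0))  # 0.0
```

In this model the first result with a cache of 2 still shows the real time, because that pair is tested immediately; every later pair waits behind one other pair, so its logged time is double the real test time.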

Hope this explains things a bit! :smile: It's kind of confusing at first, especially since these timings, which are hardly even related to the client's timings, are written in the same place in the results where manual LLR would put the *real* times. :smile:

Max :smile:

MyDogBuster 2008-11-21 16:39

[QUOTE]Hope this explains things a bit![/QUOTE]

Makes sense. Just looked weird.

IronBits 2008-11-22 20:05

I've added another stats item to the NPLB web page. It's at a preliminary stage and just gathering information right now...

2008-11-22 10:58,MyDogBuster,56,
2008-11-22 10:58,IronBits,26,
2008-11-22 10:58,BlisteringSheep,1,
2008-11-22 11:58,MyDogBuster,225,
2008-11-22 11:58,IronBits,86,
2008-11-22 11:58,BlisteringSheep,1,
2008-11-22 12:58,MyDogBuster,324,
2008-11-22 12:58,IronBits,122,
2008-11-22 12:58,BlisteringSheep,1,
2008-11-22 12:58,gd_barnes,21,

[URL]http://nplb.ironbits.net/progress_400.txt[/URL]

IronBits 2008-11-23 07:17

Ok, now it's much easier to read and follow. :smile:
[URL]http://nplb.ironbits.net/progress_400.html[/URL]

I also changed the reports so they are now generated on the hour, every hour.

gd_barnes 2008-11-23 18:12



Cool new report, David!

A couple of things:

1. You probably want to 'clear it out' once a day like you do the daily totals. It still has yesterday's and today's hourly stats in it. Otherwise it will get very big very fast. :smile:

2. The dashes in the date are causing the date to wrap onto 3 different lines, so only a few rows display per page (8 on my screen). Can you fix the HTML to give the date column a fixed width within the table? I think that will resolve the problem.

Thanks for the continued new cool features!


Gary

IronBits 2008-11-23 20:07

No problem, I just added NOWRAP to the td. :smile:
Now nothing will wrap no matter how you resize your browser window, how small your monitor is, or what your desktop resolution is.
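
As an illustration of that change, here is a small Python sketch that emits one table row with the `nowrap` attribute on every cell. How the real page is generated is not stated in this thread, so the function name `progress_row` and the three-column layout are assumptions based on the progress file shown earlier.

```python
from html import escape


def progress_row(timestamp, user, count):
    """Build one HTML table row whose cells will not wrap."""
    cells = (timestamp, user, str(count))
    # nowrap is the legacy HTML attribute; CSS "white-space: nowrap"
    # is the modern equivalent.
    return "<tr>" + "".join(f"<td nowrap>{escape(c)}</td>" for c in cells) + "</tr>"


print(progress_row("2008-11-22 10:58", "MyDogBuster", 56))
```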

I'm working on the stats output. Ideally I want:
[code]
$date (work was done on)
$userName 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23
$gd_barnes $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w
$ironbits $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w
($w = work completed in the past hour)

$date (work was done on)
$userName 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23
$gd_barnes $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w
$ironbits $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w $w
[/code]
Then totals for the week, then month, then year, etc.
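
One way the layout above could be produced from the hourly progress file is sketched below in Python. It rests on assumptions the thread does not confirm: that each `timestamp,user,count,` line is a running total for that day, that the counter starts from zero each day, and that each snapshot belongs in the column for its own hour. The names `hourly_work` and `format_day` are hypothetical.

```python
import csv
from collections import defaultdict
from io import StringIO


def hourly_work(snapshot_text):
    """Return {date: {user: [w_00 .. w_23]}} from 'timestamp,user,total,' lines."""
    table = defaultdict(lambda: defaultdict(lambda: [0] * 24))
    last_total = {}  # (date, user) -> previous running total that day
    # Timestamps in "YYYY-MM-DD HH:MM" form sort correctly as plain strings.
    rows = sorted(row for row in csv.reader(StringIO(snapshot_text)) if row)
    for row in rows:
        stamp, user, total = row[0], row[1], int(row[2])
        date, clock = stamp.split(" ")
        hour = int(clock.split(":")[0])
        key = (date, user)
        # Assumption: work in the past hour = change in the running total.
        table[date][user][hour] += total - last_total.get(key, 0)
        last_total[key] = total
    return table


def format_day(date, users):
    """Render one day's block: a header of hours, then one line per user."""
    lines = [date,
             "userName".ljust(16) + " ".join(f"{h:02d}".rjust(4) for h in range(24))]
    for user in sorted(users):
        lines.append(user.ljust(16) + " ".join(str(w).rjust(4) for w in users[user]))
    return "\n".join(lines)


SAMPLE = """\
2008-11-22 10:58,MyDogBuster,56,
2008-11-22 10:58,IronBits,26,
2008-11-22 10:58,BlisteringSheep,1,
2008-11-22 11:58,MyDogBuster,225,
2008-11-22 11:58,IronBits,86,
2008-11-22 11:58,BlisteringSheep,1,
2008-11-22 12:58,MyDogBuster,324,
2008-11-22 12:58,IronBits,122,
2008-11-22 12:58,BlisteringSheep,1,
2008-11-22 12:58,gd_barnes,21,
"""

table = hourly_work(SAMPLE)
for date in sorted(table):
    print(format_day(date, table[date]))
```

Weekly, monthly, and yearly totals could then be summed from the same per-hour table.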


All times are UTC.