mersenneforum.org (https://www.mersenneforum.org/index.php)
-   No Prime Left Behind (https://www.mersenneforum.org/forumdisplay.php?f=82)
-   -   LLRnet servers for NPLB (https://www.mersenneforum.org/showthread.php?t=10042)

MyDogBuster 2009-01-24 21:38

Vaughn and I were both crunching on it earlier for a few hours

IronBits 2009-01-24 21:39

I checked the routing, dns, and re-started the server, let's see what happens next.
I also ran ./llrnet llrserver.lua -s
It removed one solved pair 1401/319483

Re-started the server again, and I see connections coming in. It's alive :smile:
Thanks for the heads up.

MyDogBuster 2009-01-24 21:41

Working okay now. Thanks.

IronBits 2009-01-25 03:27

If we can identify the following individuals, some on teams, so we can get their email address in a PM to me, I can get the database completed.
a - in the team column means you are not listed on a team.

[code]
AES            -
Batalov        -
db597          Singapura
Death          Ukraine
DragonOrta     XtremeSystems
ET_            Team_Italia
grobie         Free-DC
jamers         -
jcool          XtremeSystems
JeffGilchrist  Free-DC
masser         -
McDuck         -
michaf         -
Renwood        -
ShishFree-DC   Free-DC
sm5ymt         -
SMH            -
smirre         -
smithdc        Ars Technica Team Prime Rib
spikey_richie  AMD Users
templus        -
tnerual        Free-DC
vfrey          -
Wabbit98       The Knights Who Say Ni!
[/code]

Lennart 2009-01-25 04:32

[quote=IronBits;160293]If we can identify the following individuals, some on teams, so we can get their email address in a PM to me, I can get the database completed.
a - in the team column means you are not listed on a team.

[code]
AES            -
Batalov        -
db597          Singapura
Death          Ukraine
DragonOrta     XtremeSystems
ET_            Team_Italia
grobie         Free-DC
jamers         -
jcool          XtremeSystems
JeffGilchrist  Free-DC
masser         -
McDuck         -
michaf         -
Renwood        -
ShishFree-DC   Free-DC
sm5ymt         -
SMH            -
smirre         -
smithdc        Ars Technica Team Prime Rib
spikey_richie  AMD Users
templus        -
tnerual        Free-DC
vfrey          -
Wabbit98       The Knights Who Say Ni!
[/code][/quote]


sm5ymt is me. It is my ham radio call sign.
smirre is my dog, so that is me too :smile:

sm5ymt a t pekhult d o t se

/Lennart

IronBits 2009-01-25 04:56

Ok, two down :smile:
Did you want to join a team or make up a team or be part of a team?
Or are you a one person team :wink: which is ok too :smile:

Lennart 2009-01-25 05:10

[quote=IronBits;160298]Ok, two down :smile:
Did you want to join a team or make up a team or be part of a team?
Or are you a one person team :wink: which is ok too :smile:[/quote]


See the team thread. :smile:

/Lennart

MyDogBuster 2009-01-25 05:23

David, never got an email on my last prime. Hope it's not contagious.

Bingo Thanks

IronBits 2009-01-25 07:48

hehehe - glad you got that email, it was the last vbscript generated one :wink:

The auto-notify is now working from the Server directly, under linux, thanks to AMDave.
I am not running the vbscript auto-notify any longer starting now.

If you do not get notified by email, let us know...

AMDave 2009-01-25 11:11

Milestone:

IB server clocked over the 5 million pair mark, today.
[url]http://stats.ironbits.net/statsnew/stats_by_server.php[/url]

gd_barnes 2009-01-26 06:25

David,

Please set the JobMaxTime for port 8000 at 1 day. I'll have you set it back to 3 days sometime late Monday night.

We have some low k/n pairs that need to be cleared out.


Thanks,
Gary

MyDogBuster 2009-01-26 08:28

GB4000 is down

AMDave 2009-01-26 09:21

confirmed

actually it appears to be GB as a whole, not just the port
I cannot access the web files either
(could be the dyndns thing again - but I don't get those problems on my own no-ip account so it is anyone's guess)

MyDogBuster 2009-01-26 10:44

Okay all you gurus out there. Don't laugh when you read this but I need a sort utility for WINDOWS that allows me to sort on more than 1 key at a time (i.e. sort by n value primary and k value secondary). I said don't laugh.

I have searched the web and can't find anything.

The DOS sort is limited to 1 key only (surprise).

AMDave 2009-01-26 11:09

excel and open office will do that
but if you mean in a script, then most scripting languages will let you do that as well
/ed -
oh, heh! DOS!? :(
not really a scripting language
yes there is a quick solution
concatenate both the keys together as a single string and then sort that as 1 key
(you have to be careful how you do that: make sure you pad each with enough zeros)
then split it open again
-ed/
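
The padding trick above can be sketched as follows. This is a minimal illustration in Python, not anything used by the project; the sample (k, n) pairs and the field width of 10 digits are assumptions, since the actual file layout is not shown in this thread. It also shows the direct tuple-key sort that most scripting languages offer, for comparison.

```python
# Hypothetical sample data: (k, n) pairs. The real file format is not shown here.
pairs = [(1401, 319483), (1005, 319483), (1999, 200001), (1005, 200001)]

def padded_key(k, n, width=10):
    """Build one sortable string: n (primary) then k (secondary), zero-padded."""
    return f"{n:0{width}d}{k:0{width}d}"

def split_key(key, width=10):
    """Split the combined key open again and recover (k, n)."""
    return int(key[width:]), int(key[:width])

# Concatenate both keys, sort as a single key, then split again.
by_concat = [split_key(s) for s in sorted(padded_key(k, n) for k, n in pairs)]

# Direct alternative: sort on a (n, k) tuple, no padding needed.
by_tuple = sorted(pairs, key=lambda p: (p[1], p[0]))
```

Without enough zero padding the string sort would compare digits position by position and put, say, 99999 after 100000, which is why the width has to cover the largest value.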

BTW - the GB server is back up and running

MyDogBuster 2009-01-26 11:28

[quote]
excel and open office will do that
but if you mean in a script, then most scripting languages will let you do that as well
/ed -
oh, heh! DOS!? :(
not really a scripting language
yes there is a quick solution
concatenate both the keys together as a single string and then sort that as 1 key
(you have to be careful how you do that: make sure you pad each with enough zeros)
then split it open again
-ed/

BTW - the GB server is back up and running
[/quote]

Thanks mate. I think I found a solution. MS Access will also do what I want.

The GB's are still not working for me??????? I can get the status page but I still can't get any work.

gd_barnes 2009-01-26 11:55

[quote=MyDogBuster;160464]Thanks mate. I think I found a solution. MS Access will also do what I want.

The GB's are still not working for me??????? I can get the status page but I still can't get any work.[/quote]


Bad machine again. I swapped the hard drive to another but don't know how to restart the server.

CRUS port 3000 seems to be up. All I had to do is hit enter on the "prpserver" executable. Ian, you might try that one to see if you can get pairs.

Going to bed in 30 mins. If I get an answer by then, I'll try to start it. Otherwise, Max is aware of it and will start it when he gets on or it will be 8 hrs. before I'm on.


G

gd_barnes 2009-01-26 12:11

[quote=gd_barnes;160467]Bad machine again. I swapped the hard drive to another but don't know how to restart the server.

CRUS port 3000 seems to be up. All I had to do is hit enter on the "prpserver" executable. Ian, you might try that one to see if you can get pairs.

Going to bed in 30 mins. If I get an answer by then, I'll try to start it. Otherwise, Max is aware of it and will start it when he gets on or it will be 8 hrs. before I'm on.


G[/quote]


OMG, I actually got lucky and figured it out after multiple different attempts. Hopefully I didn't jack anything up. Here is the command that I used to restart port G4000 at the command prompt:

./llrnet llrserver.lua

Can someone verify that I executed the correct command?

Knowing absolutely nothing about the syntax of the command or what file to execute other than to execute a program in Linux, you have to start with "./", I'm floored. Right after I did it, it started "proposing", i.e. sending, pairs to Ian and in the system monitor, it showed an occurrence of LLRnet that is sleeping, which is exactly what it always does when the server is up. Now I can sleep better.

When you're good, you're good. lmao (ya right!)

Edit: Now it just shut itself back down after doing some pruning for no particular reason that I can see so I restarted it and it's running again. I suspect that's nothing I've done. I seem to recall others having a problem when a server goes down. Sometimes you have to wait a while or something like that to get it to stay up. Oh well, if it goes down again, Max should bring it back up or I will later today.


Gary

mdettweiler 2009-01-26 18:05

[QUOTE=gd_barnes;160468]OMG, I actually got lucky and figured it out after multiple different attempts. Hopefully I didn't jack anything up. Here is the command that I used to restart port G4000 at the command prompt:

./llrnet llrserver.lua

Can someone verify that I executed the correct command?

Knowing absolutely nothing about the syntax of the command or what file to execute other than to execute a program in Linux, you have to start with "./", I'm floored. Right after I did it, it started "proposing", i.e. sending, pairs to Ian and in the system monitor, it showed an occurrence of LLRnet that is sleeping, which is exactly what it always does when the server is up. Now I can sleep better.

When you're good, you're good. lmao (ya right!)

Edit: Now it just shut itself back down after doing some pruning for no particular reason that I can see so I restarted it and it's running again. I suspect that's nothing I've done. I seem to recall others having a problem when a server goes down. Sometimes you have to wait a while or something like that to get it to stay up. Oh well, if it goes down again, Max should bring it back up or I will later today.


Gary[/QUOTE]
Yep, you got it pretty well right--other than G4000 crashing shortly after (which isn't your fault anyway), it sounds like it's all working fine after you started it up. :smile:

I've noticed you've sent me a PM about this--I'll read and reply to that shortly with more details.

IronBits 2009-01-27 00:07

Ok, stopped the server and flushed it out, took it to one day, flushed it out, put it back to 3 days and restarted it.

See what you got.

Can anyone confirm/deny
1) That we [B]have[/B] to take a server offline before appending more pairs to knpairs.txt ???
2) That the server only reads the llr-serverconfig.txt file ONCE at startup so any changes made to llr-serverconfig.txt after a server is running will be ignored until it is stopped and restarted?

IF the above are both true, then is there anything in a .lua file or something else that can be done so that if it sees a change, it re-reads it?
(thinking dnetc clients...)

[quote=gd_barnes;160436]David,

Please set the JobMaxTime for port 8000 at 1 day. I'll have you set it back to 3 days sometime late Monday night.

We have some low k/n pairs that need to be cleared out.


Thanks,
Gary[/quote]

mdettweiler 2009-01-27 02:16

[quote=IronBits;160579]Can anyone confirm/deny
1) That we [B]have[/B] to take a server offline before appending more pairs to knpairs.txt ???
2) That the server only reads the llr-serverconfig.txt file ONCE at startup so any changes made to llr-serverconfig.txt after a server is running will be ignored until it is stopped and restarted?

IF the above are both true, then is there anything in a .lua file or something else that can be done so that if it sees a change, it re-reads it?
(thinking dnetc clients...)[/quote]
Well, based on what I've seen, neither the client nor the server reads any of the config or LUA files after it's initially started. Though, I guess it would be possible to add some code that makes it automatically re-read the serverconfig file, say, every 5 minutes or so...

IronBits 2009-01-27 02:26

Once an hour, or better yet, every time it does the prunePeriod would be fine too.
I keep mine at hourly ;)
Adding another batch of pairs to knpairs.txt while the server was running would be a great feature too.
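
A minimal sketch of the re-read idea, written in Python rather than LLRnet's Lua and not taken from the LLRnet source. It assumes a simple `key = value` file with `--` comments; the real llr-serverconfig.txt syntax is not shown in this thread, so `parse_config` and `ReloadingConfig` are hypothetical names. The file is only re-parsed when its modification time has changed, so calling `refresh()` on every prune pass would be cheap.

```python
import os

def parse_config(text):
    """Parse 'key = value' lines; blank lines and '--' comments are skipped (assumed format)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("--") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings

class ReloadingConfig:
    """Holds the parsed settings and re-reads the file only when it has changed."""

    def __init__(self, path):
        self.path = path
        self.mtime = None
        self.settings = {}
        self.refresh()

    def refresh(self):
        """Call periodically, e.g. once per prune pass. Returns True if the file was re-read."""
        mtime = os.stat(self.path).st_mtime_ns
        if mtime == self.mtime:
            return False
        with open(self.path) as f:
            self.settings = parse_config(f.read())
        self.mtime = mtime
        return True
```

The same modification-time check could in principle be applied to knpairs.txt, though merging new pairs into a running queue is a harder problem than swapping a settings table.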

mdettweiler 2009-01-27 04:12

Okay, I just logged on to Gary's machine and verified that all the servers are fully up and running the way they should be (except for the while-loop thing that I use as a failsafe for server crashes, but which the servers can still function quite well without). All GB-servers are now officially open for business again. :smile:

Brucifer 2009-01-28 18:53

In linux it would seem that you should be able to make a change to the configuration file, and then do a "ps aux" to get the PID number of the running process for the llrnet server. Assuming that it is 2222, for example, do a
kill -HUP 2222
and hit return, and it should reread the config file and keep on running without interruption.
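
One caveat: `kill -HUP` only causes a re-read if the program installs a handler for SIGHUP; the default action for that signal is to terminate the process. Whether llrnet handles it is not established in this thread. A minimal Python sketch of what such a handler looks like, with `main_loop_step` and `load_config` as hypothetical names:

```python
import signal

reload_requested = False

def on_hup(signum, frame):
    """Signal handler: only record the request; the main loop does the re-read."""
    global reload_requested
    reload_requested = True

def install_handler():
    """Register the handler. SIGHUP exists on Unix-like systems only, hence the guard."""
    if hasattr(signal, "SIGHUP"):
        signal.signal(signal.SIGHUP, on_hup)

def main_loop_step(load_config):
    """One iteration of a hypothetical server loop; reloads if a HUP has arrived."""
    global reload_requested
    if reload_requested:
        reload_requested = False
        return load_config()
    return None

if __name__ == "__main__":
    install_handler()
```

Setting a flag in the handler and acting on it in the main loop keeps the handler short, which avoids re-reading the file in the middle of whatever the server was doing when the signal arrived.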

vaughan 2009-02-01 21:54

Are the 8000 and 9000 servers running? Today I'm seeing tasks accumulating in the "unsent result(s)" pane of the GUI.

Flatlander 2009-02-01 21:54

Port 8000 is sleeping here. (For over an hour.)

IronBits 2009-02-01 22:39

Damn it!!! /bangs head
I moved the motherboard box out from under the motherboard so it could get cooled off better by having the bottom of the motherboard exposed to cooler air.
ya, about an hour ago... sheesh
Rebooted, all servers back running again.
Sorry for the inconvenience it may have caused anyone.

vaughan 2009-02-02 01:17

Thanks IB.

Has anyone ever tried sitting the motherboard on a laptop cooler so the fans(s) can supply a gentle but constant airflow over the back of the mobo? The laptop cooler could run off the USB port on the mobo.

IronBits 2009-02-02 03:21

I have 24/7/365 dedicated A/C in the computer room.
The motherboards sit on slotted shelves where the air can circulate around everything.
All the Servers, switch, router and cable modem get clean power from a dedicated pure sine wave, instant switching, UPS.

It was one of the original ones that I never got around to removing the box out from underneath it.
I must have bumped the nic card setting it down or pulled on the cable when I tried to slide the box out.
I should have checked...

henryzz 2009-02-04 07:39

where is the last discussion we had about NPLB's goals?

gd_barnes 2009-02-05 23:20

Hum. I don't seem to be able to connect to:

[URL]http://nplb.ironbits.net[/URL].
[URL]http://stats.ironbits.net/statsnew/progress.php?a=reset[/URL]


David; any idea why there would be a problem?


Gary

gd_barnes 2009-02-05 23:21

[quote=henryzz;161506]where is the last discussion we had about NPLB's goals?[/quote]


Max, can you answer Henry here?

I honestly can't remember and won't have the time to find it for another day or so yet. (Just getting back from a business trip.)

mdettweiler 2009-02-06 01:01

[quote=gd_barnes;161734]Hum. I don't seem to be able to connect to:

[URL]http://nplb.ironbits.net[/URL].
[URL]http://stats.ironbits.net/statsnew/progress.php?a=reset[/URL]


David; any idea why there would be a problem?


Gary[/quote]
Hmm...works fine for me. Maybe it was only a brief downtime?

mdettweiler 2009-02-06 01:02

[quote=gd_barnes;161735]Max, can you answer Henry here?

I honestly can't remember and won't have the time to find it for another day or so yet. (Just getting back from a business trip.)[/quote]
Well, actually I honestly can't remember either, I was hoping you could. :smile:

IronBits 2009-02-06 04:01

I'm pretty sure the goal was to find primes and crunch as hard as we could to find them before anyone else :wink:

Web server ran out of swap space and crashed, Bok is taking care of it so it won't happen again...
The good news is, the Servers kept on going because they are on another box :grin:

MyDogBuster 2009-02-06 05:22

GB4000 rejecting all my results??????????

I also can get the status page. ????????????

mdettweiler 2009-02-06 05:31

[quote=MyDogBuster;161781]GB4000 rejecting all my results??????????

I also can get the status page. ????????????[/quote]
Hmm...that's weird. I can get to it just fine, both via the web page and via VNC/SSH.

Maybe something along the pipes was down briefly, and then came back up in time before I checked it? :smile:

henryzz 2009-02-06 07:47

[quote=henryzz;161506]where is the last discussion we had about NPLB's goals?[/quote]

[quote=gd_barnes;161735]Max, can you answer Henry here?

I honestly can't remember and won't have the time to find it for another day or so yet. (Just getting back from a business trip.)[/quote]

[quote=mdettweiler;161754]Well, actually I honestly can't remember either, I was hoping you could. :smile:[/quote]
i have just checked the 5th 6th 7th 8th and 9th drive threads
i have also checked both primes threads and the doublecheck
edit: done this thread as well

AMDave 2009-02-06 10:03

I found a few
this looks like one of the most recent
[url]http://www.mersenneforum.org/showpost.php?p=154849&postcount=21[/url]

also some notes in the news thread around september

henryzz 2009-02-06 16:50

[quote=AMDave;161808]I found a few
this looks like one of the most recent
[URL]http://www.mersenneforum.org/showpost.php?p=154849&postcount=21[/URL]

also some notes in the news thread around september[/quote]
thanks
i have found the one i was looking for
it starts [url]http://www.mersenneforum.org/showthread.php?p=154533#post154533[/url]

mdettweiler 2009-02-06 17:23

[quote=AMDave;161808]I found a few
this looks like one of the most recent
[URL]http://www.mersenneforum.org/showpost.php?p=154849&postcount=21[/URL]

also some notes in the news thread around september[/quote]
Hmm...interesting! According to that post, a suggested goal for k=1005-2000 was to have everything done up to n=450K by the end of 2009. We're almost to 450K already! :grin:

Of course, there's still a lot of work in the n=200K-350K range to be completed, but it sure won't take all year to complete. :wink:

If we combine the goals shown in the "Come join us!" thread, with the suggested ones in the above post, we get the following:

[LIST]
[*]Have the same k-values tested up to n=1M by 12/31/2011.
[*]Have all k-values double-checked to n=260K by 12/31/2009.
[*]Become the top prime search project on the web by number of top-5000 primes by 9/30/2009!
[*]Have all k<300 tested up to n=850K by 12/31/2009.
[*]Have all k=300-1001 tested up to n=800K by 12/31/2009.
[*]Have all k=1005-2000 tested up to n=450K by 12/31/2009.
[/LIST]
The last two could probably use some revision. The former of the two (k=300-1001 up to 800K) is quite reachable, though we might be able to get that to 850K if we hammer that range hard with a few good-sized rallies. And as for k=1005-2000, we'll be long past 450K by the end of the year; maybe we could tighten that goal to 450K (including all 200K-350K work) by the end of April? That would be a slight bit more of a stretch--and as we learned from the 1st Drive, there's nothing like a little pressure to get everyone motivated. :wink:

Max :smile:

gd_barnes 2009-02-06 22:30

[quote=mdettweiler;161857]Hmm...interesting! According to that post, a suggested goal for k=1005-2000 was to have everything done up to n=450K by the end of 2009. We're almost to 450K already! :grin:

Of course, there's still a lot of work in the n=200K-350K range to be completed, but it sure won't take all year to complete. :wink:

If we combine the goals shown in the "Come join us!" thread, with the suggested ones in the above post, we get the following:

[LIST]
[*]Have the same k-values tested up to n=1M by 12/31/2011.
[*]Have all k-values double-checked to n=260K by 12/31/2009.
[*]Become the top prime search project on the web by number of top-5000 primes by 9/30/2009!
[*]Have all k<300 tested up to n=850K by 12/31/2009.
[*]Have all k=300-1001 tested up to n=800K by 12/31/2009.
[*]Have all k=1005-2000 tested up to n=450K by 12/31/2009.
[/LIST]
The last two could probably use some revision. The former of the two (k=300-1001 up to 800K) is quite reachable, though we might be able to get that to 850K if we hammer that range hard with a few good-sized rallies. And as for k=1005-2000, we'll be long past 450K by the end of the year; maybe we could tighten that goal to 450K (including all 200K-350K work) by the end of April? That would be a slight bit more of a stretch--and as we learned from the 1st Drive, there's nothing like a little pressure to get everyone motivated. :wink:

Max :smile:[/quote]



A slight modification I think: n=800K for 5th-6th-7th drive, n=500K for the 8th drive, and completion of the n=200K-350K portion of the 9th drive by year-end. n=850K on all of the 5th-6th-7th drives would be difficult with all of the work yet to do on the 8th/9th/double-check/individual-k drives. It'd be preferable to push some unreserved individual k's 300-400 above n=600K before continuing to push all k=400-1001 much above n=700K. In other words, spread it out a little more so that we keep things moving up at all k-ranges without leaving holes.

That reminds me: We're going to plan another rally for 2 weekends from now. We'd like to do it just on our 5th drive for k=400-600 for n=600K-1M since we'll run out of sieve file on the 8th drive by n=500K. In the next few days, I'll be pulling all but one quad off of the 8th drive in favor of the 5th/6th drives.

Opinions? Too close to last one? Not often enough? A different drive instead? We're trying to find a happy medium that makes them somewhat "special" so that we have lots of fun, i.e. not more than once a month, but also often enough to keep things cooking along.

On another note, we'll open up the 6 heavy-weight k drive on the k=300-400 for n=600K-1M range here in the next few days. We'll run it on my port 8000.

On a final note: I'm currently sieving k=1003-2000 for n=500K-1M right now on one quad; increasing to 3 quads within a week. At that rate, it will still be several months before it is ready without a large team effort. I'll hit P=1T on Feb. 11th. At that point, if anyone has a spare core (or quad) or two to assist, I'd be grateful. With all the work that we have, I don't want to put a high-priority on it because smaller k's LLR faster so would like to push k=300-1001 higher before putting a big priority on going higher with k=1003-2000. The main objective of k=1003-2000 was to fill in the messy holes left behind in the k-range. At n=500K, that will have been mostly accomplished. Of course, we also would like to test ranges that have been little searched and n>500K would fill the bill so we definitely don't want to wait too long with it.


Gary

AMDave 2009-02-07 07:23

2 weeks time is great.

MyDogBuster 2009-02-07 16:15

[QUOTE]Opinions? Too close to last one? Not often enough? A different drive instead? We're trying to find a happy medium that makes them somewhat "special" so that we have lots of fun, i.e. not more than once a month, but also often enough to keep things cooking along.[/QUOTE]

I'm going to take a pass on any rally. The last one was way too much work for me 'flipping' all my cores to something different. With all my CRUS work and independent searches, it was a gigantic mess restarting those after the rally. I'm still not sure I got it right. I'll just plod along with Drive #7 and a few ranges and individual k's. I'll still be doing the same goal, just not in the same area.

As far as sieving goes, I'll have a quad available in about 4 days. It can sieve for as long as you need it.

MyDogBuster 2009-02-08 01:38

GB4000 seems to be down again, status page also

mdettweiler 2009-02-08 02:52

[quote=MyDogBuster;162031]GB4000 seems to be down again, status page also[/quote]
Hmm...works for me. :confused:

MyDogBuster 2009-02-12 06:33

Tried GB8000. Couldn't connect.

gd_barnes 2009-02-12 09:41

So much for moving 2 cores from port IB4000 to G8000. Port G8000 isn't working.

Max, I tried the "./llrnet llrserver.lua" command to attempt to start it...no go. It gives a cryptic error message. Can you look into the problem?


Gary

AMDave 2009-02-12 11:02

This should help
[url]http://www.nulon.com.au/products.php?productName=Start_Ya_Bastard_Instant_Engine_Starter[/url]

(Yes - they are for real :devil: )

mdettweiler 2009-02-12 15:20

[quote=gd_barnes;162557]So much for moving 2 cores from port IB4000 to G8000. Port G8000 isn't working.

Max, I tried the "./llrnet llrserver.lua" command to attempt to start it...no go. It gives a cryptic error message. Can you look into the problem?


Gary[/quote]
Hmm...I just checked the server and it is indeed still running. :huh:

Gary, one thing to keep in mind: if you ever need to restart any of the servers, make sure you do it by logging into the VNC desktop on crunchford, and doing it from there. That's where I've got all the terminal windows already open for the servers. It sounds like you tried to start up the server a *second* time on the main desktop, and (fortunately!) it gave you the cryptic error message and didn't start since there was already an instance of that same server running.

Oh, wait--doh! I think I know what may be causing the problem. I don't think I ever remember setting up the port forwarding for port 8000... *bangs head* :redface:

Okay, I've just checked and it turns out that the absence of port forwarding on that port was indeed the cause of the problem. It is now remedied. :smile:
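
A quick way to tell "server process not running" apart from "port not reachable" is to attempt a plain TCP connection. A minimal Python sketch; `port_is_open` is a hypothetical helper, and to test port forwarding it has to be run from a machine outside the server's local network.

```python
import socket

def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False
```

If this returns True from inside the network but False from outside, the server is up and the problem is somewhere on the route, such as a missing forwarding rule.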

BTW--one other thing I'd like to mention. I've got all the public servers running within a while loop, which means that if the server stops running for whatever reason, it will instantaneously and automatically be restarted. So, in most cases, even if the server crashes, it shouldn't need to be restarted manually; and Gary, if you do ever think it needs to be restarted manually and I'm not around at the time, make sure you check the VNC desktop *first* to see if it's still running on there. If, say, the server is frozen and thus didn't exit and allow the while loop to restart it, then just hit Ctrl-C on the server and it will shut off, then automatically be restarted anew. :smile:
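
The restart loop described above is presumably a shell `while` loop; the actual script is not shown in this thread. The same idea as a minimal Python sketch, where `supervise`, `max_restarts`, and `delay` are hypothetical names and the `run` parameter exists only so the loop can be exercised without launching a real server:

```python
import subprocess
import time

def supervise(command, max_restarts=None, delay=1.0, run=subprocess.call):
    """Run command; whenever it exits, start it again.

    max_restarts=None loops forever, like a shell 'while true' loop.
    Returns the number of times the command was started.
    """
    starts = 0
    while True:
        exit_code = run(command)  # blocks until the server process exits
        starts += 1
        print(f"server exited with code {exit_code} (start #{starts})")
        if max_restarts is not None and starts > max_restarts:
            return starts
        time.sleep(delay)  # brief pause so a crash loop does not spin the CPU

# Example, using the start command quoted earlier in this thread:
# supervise(["./llrnet", "llrserver.lua"])
```

As noted above, a loop like this only helps when the server actually exits; a frozen process never returns control to the loop. In this Python version, Ctrl-C would normally interrupt the supervisor as well as the server unless `KeyboardInterrupt` is handled.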

Max :smile:

gd_barnes 2009-02-13 22:48

[quote=mdettweiler;162579]Hmm...I just checked the server and it is indeed still running. :huh:

Gary, one thing to keep in mind: if you ever need to restart any of the servers, make sure you do it by logging into the VNC desktop on crunchford, and doing it from there. That's where I've got all the terminal windows already open for the servers. It sounds like you tried to start up the server a *second* time on the main desktop, and (fortunately!) it gave you the cryptic error message and didn't start since there was already an instance of that same server running.

Oh, wait--doh! I think I know what may be causing the problem. I don't think I ever remember setting up the port forwarding for port 8000... *bangs head* :redface:

Okay, I've just checked and it turns out that the absence of port forwarding on that port was indeed the cause of the problem. It is now remedied. :smile:

BTW--one other thing I'd like to mention. I've got all the public servers running within a while loop, which means that if the server stops running for whatever reason, it will instantaneously and automatically be restarted. So, in most cases, even if the server crashes, it shouldn't need to be restarted manually; and Gary, if you do ever think it needs to be restarted manually and I'm not around at the time, make sure you check the VNC desktop *first* to see if it's still running on there. If, say, the server is frozen and thus didn't exit and allow the while loop to restart it, then just hit Ctrl-C on the server and it will shut off, then automatically be restarted anew. :smile:

Max :smile:[/quote]


What do you mean it was "still" running? It was never running to begin with! I tried moving my 2 cores just a few hours after you presumably started it. lol Oh well, I'm glad you got it running. Unfortunately I don't have time to move my 2 cores over from port 7000 at the moment.

Can I make a request? When starting a new server, please run a couple of tests on it to make sure it works before opening it up to the public. Thanks.


Gary

gd_barnes 2009-02-13 23:00

Max,

Two problems on the port G8000 status page:

1. The "All primes found so far on this server" is a dead link. Perhaps it is because no primes have been found yet. If so, it should still link to a page that says "No primes found yet". (Be sure and delete the message once a prime is found.)

2. There is no previous results file in the "All results files for NPLB servers". There should be one because the first k/n pair processed today was n=600285 and I know there were some pairs for lower n's loaded into the server.

In the future, I'd like to work on a "test plan" for all of the scenarios on new servers before rolling them out publicly, especially when we do the PRPnet server.


Thanks,
Gary

mdettweiler 2009-02-14 12:01

[quote=gd_barnes;162750]Max,

Two problems on the port G8000 status page:

1. The "All primes found so far on this server" is a dead link. Perhaps it is because no primes have been found yet. If so, it should still link to a page that says "No primes found yet". (Be sure and delete the message once a prime is found.)

2. There is no previous results file in the "All results files for NPLB servers". There should be one because the first k/n pair processed today was n=600285 and I know there were some pairs for lower n's loaded into the server.

In the future, I'd like to work on a "test plan" for all of the scenarios on new servers before rolling them out publicly, especially when we do the PRPnet server.


Thanks,
Gary[/quote]
Regarding #1: ah, yes, I forgot about that. With G4000 we never ran into that problem because I'd run a few small known primes through the server anyway to test the functionality of my scripts. I'll see what I can do about it, though as you'll see below this is the least of our worries for now...

Regarding #2: Ouch! I just realized why that's happening: my copy-off script was accidentally set to use "4000" in the results file names for port 8000, instead of "8000"! And since the script does the port 8000 results files *after* it does the results for port 4000 from that day... the 8000 results file would overwrite the 4000 results file for that day! This means that we are essentially missing all of our results from port 4000 for 2/13 and 2/14. :ouch2:

I've got the results file name fixed so it won't happen again, and I also renamed the mislabeled files to their correct port #, but there's nothing I can do to bring back the two days' worth of results we lost from G4000. David, would it be possible for you to take all the G4000 results in the DB imported on 2/13 and 2/14, dump them to a file, and send them to me? We may be able to salvage most of the lost work that way. [I](Edit: Correction, make that the results from 2/12 and 2/13. I forgot for a moment that since the daily files are copied off at midnight, the datestamp on the file is actually a day later from what the individual entries in the DB will show.)[/I]
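
One way to guard against this kind of overwrite, sketched in Python. This is not the actual copy-off script, which is not shown in this thread; `results_filename`, `copy_off`, and the naming scheme are hypothetical. The port is passed in once and the file name is derived from it, and the destination is opened in exclusive-create mode so a name collision raises an error instead of silently replacing a file.

```python
import os
import shutil

def results_filename(port, datestamp):
    """Hypothetical naming scheme: the port in the name always comes from the argument."""
    return f"results_{port}_{datestamp}.txt"

def copy_off(source, dest_dir, port, datestamp):
    """Copy a results file into dest_dir, refusing to overwrite an existing file."""
    dest = os.path.join(dest_dir, results_filename(port, datestamp))
    # Mode "xb" raises FileExistsError if dest already exists.
    with open(source, "rb") as src, open(dest, "xb") as dst:
        shutil.copyfileobj(src, dst)
    return dest
```

With a check like this, the mislabeled port 8000 copy would have failed loudly on the first day instead of replacing the port 4000 file.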

Well, to look on the bright side, at least most of the missing results still made it into the DB. That's because David's server copies off my 15-minute results file updates, as well as the daily files. So, it sounds like at the most we've lost 30 minutes of results (15+15, from the two days) since that's the only thing that (theoretically) doesn't make it into the 15-minute update files. I say "theoretically" because sometimes in actual practice the last day's results will stick around for ~15 minutes after the daily copy-off since the status page updated just before the copy-off script ran. So, we may have been fortunate and didn't lose anything.

This also means that a few G8000 results were mistakenly imported as G4000. This shouldn't be a problem; all it will mean is that there'll be a few extra results showing up on the progress table for G4000 on 2/13 and 2/14. Once the numbers drop off the progress table, all that matters is that the results are in the database and have the "GB" server code on them; at that point, the port number isn't displayed on any chart and doesn't really make that much of a difference.

Gary, as you said in your previous post, you're right, I really should have done some more testing before telling everyone this server was online. :rolleyes: However, if it's any comfort, the only thing that would have fixed is the thing with the port forwarding not being online at first--the results file problem was somewhat buried and hard to detect. I missed it when I originally added port 8000 to the copy-off script a few months ago, and we never saw it give us any problems since there were no results from port 8000 to overwrite the port 4000 results--until now. Thus, that error, at least, most likely could not have been detected until we actually saw it mess up.

Next time I'll be sure to watch veeeeery closely when I make such changes to my scripts! :smile:

Max :smile:

AMDave 2009-02-14 13:06

csv.zip file sent via email

mdettweiler 2009-02-14 20:15

[quote=AMDave;162805]csv.zip file sent via email[/quote]
Thanks! I just reviewed the data and it looks pretty well complete. I'll get it converted back to the original LLRnet results format and posted on the [url]http://nplb-gb1.no-ip.org/llrnet/results/[/url] website shortly. :smile:

henryzz 2009-02-14 20:24

has the redirect from port 400 to port 4000 been removed today ironbits?
i have just had to change my client to 4000
i have just installed a second network card in my pc and i thought for ages that that was causing the problem but after lots of fiddling around i remembered the port number had changed and i hadn't changed my clients

mdettweiler 2009-02-14 20:34

[quote=mdettweiler;162827]Thanks! I just reviewed the data and it looks pretty well complete. I'll get it converted back to the original LLRnet results format and posted on the [URL]http://nplb-gb1.no-ip.org/llrnet/results/[/URL] website shortly. :smile:[/quote]
Okay, I've uploaded the reproduced results files to the web site. The results are a little out of order within the files (possibly due to how they're stored in the DB?), but they should all be in the correct day's results file. So, even though they may look a little funny at first glance, everything is still where it's supposed to be. :smile:

IronBits 2009-02-14 21:42

The redirect for port 400 is no longer there, as you found out.

MyDogBuster 2009-02-14 22:21

Looks like Gary's servers changed their IP address again. I had to do the flushdns thingy.

gd_barnes 2009-02-15 12:29

[quote=mdettweiler;162798]
Regarding #2: Ouch! I just realized why that's happening: I just realized that my copy-off script was accidentally set to use "4000" in the results file names for port 8000, instead of "8000"! And since the script does the port 8000 results files *after* it does the results for port 4000 from that day....the 8000 results file would overwrite the 4000 results file for that day! This means that we are essentially missing all of our results from port 4000 for 2/13 and 2/14.
[/quote]

So, how are Karsten or I supposed to balance the results? Ugh!

I have to beg to differ with you. Correct test plans would quickly catch this. I've worked in the programming industry long enough to know it. You pick what and when you want tested and run TEST data through it at the time that simulates the test that you want to conduct. In this case, run test data from 11:45 PM to 12:15 AM local time and verify that the results are properly being copied off. If not, it's not a big deal because it's misc. test data that we care nothing about.

When we say a test plan, we're talking about using test and not production data. If we "test in production" like this, we get burned.
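
A rough sketch of what such a test could look like, assuming the copy-off step were written in Python. Everything here (the directory layout, file names, and the `copy_off` function) is hypothetical and only illustrates the idea of pushing throwaway data through the step and checking that each port's results land in their own file.

```python
import tempfile
from pathlib import Path


def copy_off(live_dir: Path, archive_dir: Path, ports, stamp: str):
    """Copy each port's live results into a per-port archive file."""
    for port in ports:
        live = live_dir / f"results_{port}.txt"  # hypothetical layout
        dest = archive_dir / f"results_{port}_{stamp}.txt"
        if dest.exists():
            raise FileExistsError(dest)  # never overwrite an archived day
        dest.write_text(live.read_text())
        live.write_text("")  # start the new day with an empty file


def run_copy_off_test():
    """Feed throwaway data through copy_off and return what was archived."""
    with tempfile.TemporaryDirectory() as tmp:
        live = Path(tmp, "live")
        archive = Path(tmp, "archive")
        live.mkdir()
        archive.mkdir()
        fake = {4000: "test result A\n", 8000: "test result B\n"}
        for port, text in fake.items():
            (live / f"results_{port}.txt").write_text(text)
        copy_off(live, archive, fake, "2009-02-13")
        return {
            port: (archive / f"results_{port}_2009-02-13.txt").read_text()
            for port in fake
        }


archived = run_copy_off_test()
print(archived)
```

If the 8000 data had been written under the 4000 name, the comparison against the fake input would fail on test data instead of on real results.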

I know I'm beating a dead horse now but:

For the PRPnet server, I would suggest that we start putting a test plan together and creating some test data for it fairly soon and long before we actually get to loading production data into it.

BTW, I can break anything. lol I've tested legacy stuff for so many years; if there's a bug in something that I'm quite familiar with, I will find the scenario(s) where it won't work. Ian did the same type of work that I did on legacy systems. I'm sure he can do the same if he is familiar with the process being tested.

One final thing: Code reviews can help too. For the PRPnet server, I'd suggest having David and/or Rogue do a code-review on your code before beginning testing. Frequently that will catch more than testing can.


Gary

mdettweiler 2009-02-15 14:44

[quote=gd_barnes;162870]So, how are Karsten or I supposed to balance the results? Ugh![/quote]
Well, that actually wouldn't be too big a deal. Remember, all (or at least almost all) of the results *are* there, and in the correct daily files. The only difference from normal is that they're a little out of order within their correct daily file--but that's no big deal since the results come in from the server out of order in the first place, and will have to be sorted anyway during processing/balancing. Even if there are any missing k/n pairs, they'll be caught quite quickly and easily during processing. :smile:
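
As an illustration of why the ordering does not matter, here is a small sketch that sorts result lines and lists any k/n pairs with no result. It assumes the result line format seen in the server logs in this thread; the function names and the `expected` list are hypothetical and are not the actual processing scripts.

```python
import re

# Matches lines such as "1167*2^468538-1 is not prime. Res64: ..."
RESULT_RE = re.compile(r"^(\d+)\*2\^(\d+)-1 is (not prime|prime)")


def parse_pairs(lines):
    """Return the (k, n) pair from every result line, ignoring other lines."""
    pairs = []
    for line in lines:
        m = RESULT_RE.match(line.strip())
        if m:
            pairs.append((int(m.group(1)), int(m.group(2))))
    return pairs


def balance(expected, lines):
    """Sort returned pairs by (n, k) and list expected pairs with no result."""
    got = parse_pairs(lines)
    ordered = sorted(got, key=lambda kn: (kn[1], kn[0]))
    missing = sorted(set(expected) - set(got), key=lambda kn: (kn[1], kn[0]))
    return ordered, missing


lines = [
    "1111*2^468539-1 is not prime. Res64: FB70B65AF0823FDA Time : 406.0 sec.",
    "1167*2^468538-1 is not prime. Res64: D80357B4A66AA8F9 Time : 1079.0 sec.",
    "1053*2^468538-1 is not prime. Res64: 4C8913CD4DBF4CF7 Time : 1082.0 sec.",
]
# Hypothetical list of pairs that were handed out for testing.
expected = [(1053, 468538), (1167, 468538), (1111, 468539), (1173, 468539)]
ordered, missing = balance(expected, lines)
print(ordered)
print(missing)
```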

[quote]I have to beg to differ with you. Correct test plans would quickly catch this. I've worked in the programming industry long enough to know it. You pick what and when you want tested and run TEST data through it at the time that simulates the test that you want to conduct. In this case, run test data from 11:45 PM to 12:15 AM local time and verify that the results are properly being copied off. If not, it's not a big deal because it's misc. test data that we care nothing about.

When we say a test plan, we're talking about using test and not production data. If we "test in production" like this, we get burned.

I know I'm beating a dead horse now but:

For the PRPnet server, I would suggest that we start putting a test plan together and creating some test data for it fairly soon and long before we actually get to loading production data into it.

BTW, I can break anything. lol I've tested legacy stuff for so many years; if there's a bug in something that I'm quite familiar with, I will find the scenario(s) where it won't work. Ian did the same type of work that I did on legacy systems. I'm sure he can do the same if he is familiar with the process being tested.

One final thing: Code reviews can help too. For the PRPnet server, I'd suggest having David and/or Rogue do a code-review on your code before beginning testing. Frequently that will catch more than testing can.


Gary[/quote]Yeah, you're right--it definitely could have benefited from more testing, and I'll be sure to test the PRPnet server thoroughly before setting it up. However, I do have to beg to differ regarding your begging to differ :smile: about this results file blooper being caught by testing: yes, we may have been able to catch it with testing, but it still would have overwritten some G4000 results nonetheless. :smile:

Anyway, long story short: in the future when we try to set up something new we'll test it more thoroughly. :smile:

Max :smile:

gd_barnes 2009-02-17 11:27

[quote=mdettweiler;162888]Well, that actually wouldn't be too big a deal. Remember, all (or at least almost all) of the results *are* there, and in the correct daily files. The only difference from normal is that they're a little out of order within their correct daily file--but that's no big deal since the results come in from the server out of order in the first place, and will have to be sorted anyway during processing/balancing. Even if there are any missing k/n pairs, they'll be caught quite quickly and easily during processing. :smile:[/quote]


OH! Clearly I confused myself. (Nothing new there!) What is missing are the daily results and I assumed that was the only place to get the server results. Duh...server ignorance. Of course they can be gotten directly from the server results file(s). Thanks for the explanation.


Gary

AMDave 2009-02-17 12:16

Max,

can you help out kar_bon here?
[url]http://www.mersenneforum.org/newreply.php?do=newreply&noquote=1&p=163106[/url]

Please let me know if there are any files that need to be re-sent to the stats db.

kar_bon 2009-02-17 12:17

[QUOTE=AMDave;163115]Max,

can you help out kar_bon here?
[url]http://www.mersenneforum.org/newreply.php?do=newreply&noquote=1&p=163106[/url]

Please let me know if there are any files that need to be re-sent to the stats db.[/QUOTE]

i just did it. see there!

gd_barnes 2009-02-19 08:58

Port 2000 has now been dried and removed from the 1st post of this thread.

IronBits 2009-02-22 18:46

Not sure what's going on here, but Port 3000 is getting some weird log entries...
user=mdettweiler
[2009-02-22 10:31:19]
1167*2^468538-1 is not prime. Res64: D80357B4A66AA8F9 Time : 1079.0 sec.
[B]user=is prime![/B]
[2009-02-22 10:31:37]
1111*2^468539-1 is not prime. Res64: FB70B65AF0823FDA Time : 406.0 sec.
[B]user=is prime![/B]
[2009-02-22 10:31:50]
1173*2^468539-1 is not prime. Res64: 23D689C292263D8D Time : 406.0 sec.
[B]user=is prime![/B]
[2009-02-22 10:31:52]
1195*2^468539-1 is not prime. Res64: 64CCB477F93BB450 Time : 406.0 sec.
[B]user=is prime![/B]
[2009-02-22 10:31:55]
1233*2^468539-1 is not prime. Res64: C59D2E3D76083254 Time : 406.0 sec.
user=mdettweiler
[2009-02-22 10:34:48]
1221*2^468538-1 is not prime. Res64: F66FDDB862FD5766 Time : 1079.0 sec.

I made a quick script change to ignore this false positive when looking for "prime!" as indicator that a prime was found so they don't show up on the server status page.
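
A sketch of that kind of hardening, assuming the log layout shown above (a `user=` line, a timestamp line, then the result line). This is not the actual script, and both function names are made up for the example. The fragile check treats any line containing "prime!" as a hit; the stricter one only accepts a result line of the form `k*2^n-1 is prime!`.

```python
import re

# Only a result line may announce a prime; user= and timestamp lines never match.
PRIME_RE = re.compile(r"^\d+\*2\^\d+-1 is prime!")


def naive_find(log_lines):
    """Fragile approach: any line containing 'prime!' counts as a hit."""
    return [ln.strip() for ln in log_lines if "prime!" in ln]


def find_primes(log_lines):
    """Stricter approach: match the result line format itself."""
    return [ln.strip() for ln in log_lines if PRIME_RE.match(ln.strip())]


sample = [
    "user=is prime!",
    "[2009-02-22 10:31:37]",
    "1111*2^468539-1 is not prime. Res64: FB70B65AF0823FDA Time : 406.0 sec.",
]
print(naive_find(sample))
print(find_primes(sample))
```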

I also took the liberty to change it to 1 day instead of 3 in anticipation of it drying up soon... it's almost out of work now.
Port [B]3000[/B] has 1753 remaining knpairs!

gd_barnes 2009-02-22 21:09

Does anyone know who the user "Bismarck" is? By looking at the pattern of the results and their timings, I believe the "is prime!" user is the same person. Here is the pattern that I noticed:

[code]
user=mdettweiler
[2009-02-22 10:24:01]
1365*2^468537-1 is not prime. Res64: 49CED42A1D7673D7 Time : 1080.0 sec.
user=mdettweiler
[2009-02-22 10:24:10]
1053*2^468538-1 is not prime. Res64: 4C8913CD4DBF4CF7 Time : 1082.0 sec.
user=Bismarck
[2009-02-22 10:24:52]
1195*2^460229-1 is not prime. Res64: B349FE3B1A166E40 Time : 107909.0 sec.
user=Bismarck
[2009-02-22 10:25:04]
1309*2^460229-1 is not prime. Res64: 2E2CFEAA0DDF9E5E Time : 107889.0 sec.
user=Bismarck
[2009-02-22 10:25:06]
1119*2^460229-1 is not prime. Res64: 75018450EEFA2A75 Time : 107924.0 sec.
user=Bismarck
[2009-02-22 10:25:09]
1225*2^460229-1 is not prime. Res64: 7B45F74768F58D94 Time : 107907.0 sec.
user=mdettweiler
[2009-02-22 10:27:40]
1113*2^468538-1 is not prime. Res64: 3657B5123158EEC7 Time : 1083.0 sec.
user=mdettweiler
[2009-02-22 10:27:43]
1121*2^468538-1 is not prime. Res64: 701D982CF36626C9 Time : 1079.0 sec.
user=Bismarck
[2009-02-22 10:28:12]
1335*2^460230-1 is not prime. Res64: 14E7E351AD6E97DD Time : 107907.0 sec.
user=Bismarck
[2009-02-22 10:28:25]
1119*2^460231-1 is not prime. Res64: 57626131318408A5 Time : 107889.0 sec.
user=Bismarck
[2009-02-22 10:28:28]
1293*2^460230-1 is not prime. Res64: D66A56CC99908594 Time : 107925.0 sec.
user=Bismarck
[2009-02-22 10:28:31]
1071*2^460231-1 is not prime. Res64: D130A523E123368F Time : 107906.0 sec.
user=mdettweiler
[2009-02-22 10:31:13]
1127*2^468538-1 is not prime. Res64: 7E35DB3815B2BC71 Time : 1080.0 sec.
user=mdettweiler
[2009-02-22 10:31:19]
1167*2^468538-1 is not prime. Res64: D80357B4A66AA8F9 Time : 1079.0 sec.
user=is prime!
[2009-02-22 10:31:37]
1111*2^468539-1 is not prime. Res64: FB70B65AF0823FDA Time : 406.0 sec.
user=is prime!
[2009-02-22 10:31:50]
1173*2^468539-1 is not prime. Res64: 23D689C292263D8D Time : 406.0 sec.
user=is prime!
[2009-02-22 10:31:52]
1195*2^468539-1 is not prime. Res64: 64CCB477F93BB450 Time : 406.0 sec.
user=is prime!
[2009-02-22 10:31:55]
1233*2^468539-1 is not prime. Res64: C59D2E3D76083254 Time : 406.0 sec.
user=mdettweiler
[2009-02-22 10:34:48]
1221*2^468538-1 is not prime. Res64: F66FDDB862FD5766 Time : 1079.0 sec.
user=mdettweiler
[2009-02-22 10:34:55]
1307*2^468538-1 is not prime. Res64: 0E04A6BEEBF0D243 Time : 1080.0 sec.
user=is prime!
[2009-02-22 10:35:02]
1369*2^468539-1 is not prime. Res64: 281B7B78A6192801 Time : 410.0 sec.
user=is prime!
[2009-02-22 10:35:15]
1025*2^468540-1 is not prime. Res64: 7CC78FCCADE81C3F Time : 410.0 sec.
user=is prime!
[2009-02-22 10:35:17]
1095*2^468540-1 is not prime. Res64: 823AC77A46476EF8 Time : 410.0 sec.
user=is prime!
[2009-02-22 10:35:20]
1127*2^468540-1 is not prime. Res64: A02973ED780AAE35 Time : 409.0 sec.
user=mdettweiler
[2009-02-22 10:38:25]
1365*2^468538-1 is not prime. Res64: 39F739FD2D371D6B Time : 1080.0 sec.
user=is prime!
[2009-02-22 10:38:27]
1199*2^468540-1 is not prime. Res64: 3080A732CB3965F3 Time : 410.0 sec.
user=mdettweiler
[2009-02-22 10:38:31]
1371*2^468538-1 is not prime. Res64: FB100CA8753B4226 Time : 1080.0 sec.
user=is prime!
[2009-02-22 10:38:39]
1323*2^468540-1 is not prime. Res64: 0CF35D0CDEF5C469 Time : 409.0 sec.
user=is prime!
[2009-02-22 10:38:42]
1337*2^468540-1 is not prime. Res64: 4F29054BD609FBC2 Time : 410.0 sec.
user=is prime!
[2009-02-22 10:38:45]
1027*2^468541-1 is not prime. Res64: B42DE629C308B9F0 Time : 410.0 sec.
user=is prime!
[2009-02-22 10:41:53]
1167*2^468541-1 is not prime. Res64: 390DD11C3B031441 Time : 411.0 sec.
user=mdettweiler
[2009-02-22 10:42:01]
1051*2^468539-1 is not prime. Res64: 88A175D3FDE869D2 Time : 1081.0 sec.
user=is prime!
[2009-02-22 10:42:03]
1195*2^468541-1 is not prime. Res64: 63D899C14D728086 Time : 409.0 sec.
user=mdettweiler
[2009-02-22 10:42:06]
1105*2^468539-1 is not prime. Res64: DCDFBA1A81A362B5 Time : 1076.0 sec.
user=is prime!
[2009-02-22 10:42:08]
1219*2^468541-1 is not prime. Res64: 5751DF9ED9282FE2 Time : 411.0 sec.
user=is prime!
[2009-02-22 10:42:11]
1221*2^468541-1 is not prime. Res64: 3E22984B5F585632 Time : 411.0 sec.
user=is prime!
[2009-02-22 10:45:18]
1291*2^468541-1 is not prime. Res64: B6AC8C163E486A6C Time : 411.0 sec.
user=is prime!
[2009-02-22 10:45:32]
1005*2^468542-1 is not prime. Res64: B204ABDC9E52ED88 Time : 413.0 sec.
user=is prime!
[2009-02-22 10:45:34]
[/code]


Notice the pattern of 2 results by Max followed by 4 results by Bismarck, the pattern of which repeated twice. Directly after that, there were 2 results by Max followed by 4 results by "is prime!", the pattern of which repeated twice. It then went to a slightly different pattern with the "is prime!" user in between Max's 2 results but that appears to only be because the "is prime!" cores are gaining on Max's cores at just slightly greater than a 2 to 1 ratio.
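
The pattern can also be pulled out mechanically. The sketch below is only an illustration: it collapses consecutive `user=` lines into runs, and the sample is the sequence of user names from the first part of the log above with the timestamp and result lines left out.

```python
from itertools import groupby


def user_runs(log_lines):
    """Collapse consecutive results by the same user into (user, count) runs."""
    users = [
        ln.strip()[len("user="):]
        for ln in log_lines
        if ln.strip().startswith("user=")
    ]
    return [(user, len(list(group))) for user, group in groupby(users)]


sample = (
    ["user=mdettweiler"] * 2 + ["user=Bismarck"] * 4
    + ["user=mdettweiler"] * 2 + ["user=Bismarck"] * 4
    + ["user=mdettweiler"] * 2 + ["user=is prime!"] * 4
)
runs = user_runs(sample)
print(runs)
```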

Barring some bug that just now came into the server results conversion process, I believe this is a sabotage attempt of some kind. Whoever Bismarck is needs to cut it out! :mad: David, can you tell by the IP address where the "Bismarck" and "is prime!" connection is coming from and that it is indeed the same person?

I'm not sure how this will be scored in the stats but IMHO, unless it's a server error of some kind, the "is prime!" entries should not be scored at all. I'll leave that up to you server/stats guys.

BTW, nice work in suppressing the entries from showing as prime in your listing! :smile:


Gary

IronBits 2009-02-22 21:37

[B]If that is the case...
Bismarck[/B] or [B]is prime![/B] has not picked up any work since I posted.
I assume because 'he/she/they' got their kicks/laughs and then were found out, they went away.
Didn't mean to crash their party... :grin:

No big deal, found another spot in my script to harden :smile:

Be interesting how the rest of the scripts/stats/database handle it... not a big issue, we'll script around the 'kiddies' :big grin:

A quick search for Boinc and Bismark show someone on Team China...
and it shows a Q6600 computer using Vista64...
[URL]http://boinc.bio.wzw.tum.de/boincsimap/show_user.php?userid=25765[/URL]

Don't know if it's the same person or not however... and to be fair, on another Boinc site, searching for Bismark shows the following

[code]
User Name Country Team
Bismark Ukraine Ukraine
Bismark Australia Australia
BISMARK Spain Spain
bismark International International
bismark Canada Canada
[/code]

Max is the only person coming in now, so will keep an eye out...

AMDave 2009-02-23 01:12

[QUOTE=IronBits;163609]Be interesting how the rest of the scripts/stats/database handle it... not a big issue, we'll script around the 'kiddies' :big grin:
[/QUOTE]
I have modified the teams table and admin pages
For the admins with access, I am adding some notes at the top of the admin page to explain why.

IronBits 2009-02-23 01:54

I've put my 3 cores on port 3000 until it's gone.

I'll have to go take a look at what AMDave has up his sleeve next ;)

MyDogBuster 2009-02-23 04:37

GB4000 seems to be down again.

gd_barnes 2009-02-23 05:19

[quote=MyDogBuster;163633]GB4000 seems to be down again.[/quote]


Argh! I see that my cores are sleeping on port GB8000 also.

I had to unplug and plug back in a wireless router to get my kid's laptop to connect to the internet but that shouldn't have made a difference. I only unplugged and replugged the little black cord from the back of the router. I didn't mess with the modem or any other connections.

I'll go take a look at the servers and see if I can do anything. Have you tried that command that you always do that seems to work when my IP address changes?


Gary

gd_barnes 2009-02-23 05:23

[quote=IronBits;163609][B]If that is the case... [/B]
[B]Bismarck[/B] or [B]is prime![/B] has not picked up any work since I posted.
I assume because 'he/she/they' got their kicks/laughs and then were found out, they went away.
Didn't mean to crash their party... :grin:

No big deal, found another spot in my script to harden :smile:

Be interesting how the rest of the scripts/stats/database handle it... not a big issue, we'll script around the 'kiddies' :big grin:

A quick search for Boinc and Bismark show someone on Team China...
and it shows a Q6600 computer using Vista64...
[URL]http://boinc.bio.wzw.tum.de/boincsimap/show_user.php?userid=25765[/URL]

Don't know if it's the same person or not however... and to be fair, on another Boinc site, searching for Bismark shows the following

[code]
User Name Country Team
Bismark Ukraine Ukraine
Bismark Australia Australia
BISMARK Spain Spain
bismark International International
bismark Canada Canada
[/code]

Max is the only person coming in now, so will keep an eye out...[/quote]


BTW, the userid is Bismarck (with a "c"), not Bismark. Have you tried a search for Bismarck? That might narrow it down.

MyDogBuster 2009-02-23 05:31

[quote]Have you tried that command that you always do that seems to work when my IP address changes?

[/quote]

The command just now worked. I always try that first. Something interesting, I do not have to do the flushdns on 32 bit machines, just 64 bitters. Seems the 32 bitters just take off when the problem is resolved. Another hidden asset in Willies software. What a guy. LOL

BTW, unplugging the router and replugging it would fix a service problem. Your problem could have been between the router and the cable modem.

IronBits 2009-02-23 05:50

Ahhhh, my BAD
Further research on Bismarck points to that one individual on Team China...
[url]http://stats.kwsn.net/user.php?proj=all&cpid=c5fed7f1fbcec10ca395d4d698353e4f[/url]
No harm, no foul
Hope he chimes in if he wants to create a Team China in the database and then wants to join it :smile:

gd_barnes 2009-02-23 07:33

I can't get my clients to consistently connect to port GB8000. I've sent a PM to Max to take a look at the server. I tried stopping and restarting both the server and the clients with no luck. Sometimes a couple of them will run for a little while but then they will stop again. There appears to be no "flushdns" command for Linux. Does anyone have any thoughts along those lines?

If you're wanting to run the 6k drive on this port, you might wait until late Monday or Max gives the word that the server is working OK.


Gary

IronBits 2009-02-23 07:44

Because the Server is on your internal LAN, you should be able to just use the internal network IP address in your configs. I do here :wink:
Let's say you have
192.168.0.1 as your gateway
192.168.0.2 as the ip on your server running the llrnet servers

edit your config file and use server = 192.168.0.2

Your server should be using a fixed ip address within your LAN anyways...

mdettweiler 2009-02-23 15:46

Hi all,

It turns out the G8000 server had crashed and needed to be restarted. However, it seems that the crashed copy of the server somehow refused to release its socket. netstat -a shows that port 8000 is bound for listening, yet I saw the crashed server terminal window with my own eyes and exited it. :huh:
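
For reference, a quick way to ask the same question that netstat answers is to try binding the port yourself. This is only a sketch; the function name is made up, and it assumes an unprivileged port (above 1023) so that a permission error is not mistaken for "in use".

```python
import socket


def port_is_bound(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if binding host:port fails, i.e. something already holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use": another process owns the port.
            return True
    return False


print(port_is_bound(8000))
```

A result of True only says that some process holds the port. It does not say which one, so it cannot tell a hung server from a healthy one that was started somewhere else.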

David, do you have any idea what's happening? Usually when I restart a server the "cannot bind to port" error clears after a few minutes (possibly due to exactly when users are hitting the port, as you'd previously theorized), but this time it looks like something a little "bigger".

Anyway, as far as everyone else is concerned, the upshot of this is that I couldn't restart the server just yet. Stay tuned, though, as I'll try periodically. :smile:

Max :smile:

mdettweiler 2009-02-23 15:49

[quote=IronBits;163646]Because the Server is on your internal LAN, you should be able to just use the internal network IP address in your configs. I do here :wink:
Let's say you have
192.168.0.1 as your gateway
192.168.0.2 as the ip on your server running the llrnet servers

edit your config file and use server = 192.168.0.2

Your server should be using a fixed ip address within your LAN anyways...[/quote]
In fact, we *do* use fixed IPs for all of Gary's dedicated crunch machines, one of which is running all the servers. :smile:

Gary, as David said--whenever you have a client within your network running one of the nplb-gb1.no-ip.org servers, you can configure the client to say 192.168.2.100 instead. That's the *internal* IP address of the server and it will be a much more direct connection that will never cut off when any No-IP things have to change. :smile: (Note that you can't do this on your laptop, since then you wouldn't be able to connect to the server when away from your home network.)
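
A small sketch of that rule, using the hostname and internal address given above. The `/24` subnet is an assumption (the thread does not state the netmask) and the function name is hypothetical. A client inside the home network gets the internal address, and anything else gets the No-IP hostname.

```python
import ipaddress

PUBLIC_HOST = "nplb-gb1.no-ip.org"  # hostname from the thread
INTERNAL_IP = "192.168.2.100"       # internal server address from the thread
# Assumed home subnet; the actual netmask is not given in the thread.
LAN = ipaddress.ip_network("192.168.2.0/24")


def server_for(client_ip: str) -> str:
    """Pick the address a client should put in its config file."""
    if ipaddress.ip_address(client_ip) in LAN:
        return INTERNAL_IP
    return PUBLIC_HOST


print(server_for("192.168.2.57"))  # a machine on the home network
print(server_for("203.0.113.9"))   # documentation-range address, i.e. outside
```

192.168.2.100 is in a private address range, which is why it only works from machines inside the same network and not from a laptop on the road.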

IronBits 2009-02-23 19:26

You have to reboot the server if the port has hung after a crash.
You might want to try using an Intel Server NIC and disable the onboard motherboard NICs.
Also, if you have two onboard NICs, use the one and see if it behaves better.

mdettweiler 2009-02-23 19:32

[quote=IronBits;163707]You have to reboot the server if the port has hung after a crash.
You might want to try using an Intel Server NIC and disable the onboard motherboard NICs.
Also, if you have two onboard NICs, use the one and see if it behaves better.[/quote]
The only onboard NIC is the one on the motherboard, so not much of a choice there. Also, even if we did have more than one NIC installed I wouldn't be able to mess with any of that stuff from here, since any changes to NIC configuration would potentially mess up the SSH link that forms my only connection to that machine. :wink:

Anyway, I'll see if I can find out which prime-search applications Gary's got running on that machine right now and, if they're easy enough to re-start up from here, try rebooting it remotely. :smile: (Don't worry Gary, I won't even try it unless I'm reasonably sure that I can get everything up and running again so that you'll hardly notice anything happened. :smile:)

mdettweiler 2009-02-23 19:37

[quote=mdettweiler;163708]The only onboard NIC is the one on the motherboard, so not much of a choice there. Also, even if we did have more than one NIC installed I wouldn't be able to mess with any of that stuff from here, since any changes to NIC configuration would potentially mess up the SSH link that forms my only connection to that machine. :wink:

Anyway, I'll see if I can find out which prime-search applications Gary's got running on that machine right now and, if they're easy enough to re-start up from here, try rebooting it remotely. :smile: (Don't worry Gary, I won't even try it unless I'm reasonably sure that I can get everything up and running again so that you'll hardly notice anything happened. :smile:)[/quote]
OH MAN! I just figured out what happened. :rolleyes: When I tried logging on to the server's "terminal session" (as opposed to the session it gives you when you log on through VNC), lo and behold--there was a copy of the G8000 LLRnet server that had been running all along! No wonder port 8000 was already tied up by something--it was tied up by the G8000 LLRnet server which Gary had already restarted! :smile:

Gary, in the future, when you restart servers on crunchford could you possibly log into the machine via VNC, and then restart the server from there? Otherwise I can't see the restarted server's terminal window, and I think it's not even running. Thus I kept trying to restart it over on the VNC session, not realizing that it was already up and running on the console session and that that was what was hogging the port. (That would also explain why results have kept coming in for that port even after I thought the server went down. :wink:)

Okay, long story short: turns out the crashed server hadn't hung on the port after all. A new server was running the whole time! :smile:

I'll go and get the server moved over to the VNC session now so I can see it better, and get it into the while loop in case of crashes. :smile:
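
For anyone wondering what the "while loop" amounts to, here is a rough sketch in Python. The real loop is presumably a shell one-liner and is not shown in this thread, so the function name, its parameters, and the stand-in command below are all assumptions. The loop just starts the server again whenever it exits.

```python
import subprocess
import sys
import time


def keep_running(cmd, max_starts=None, delay=5.0):
    """Re-launch cmd each time it exits; return how many times it was started."""
    starts = 0
    while max_starts is None or starts < max_starts:
        starts += 1
        subprocess.run(cmd)  # blocks until the process exits or crashes
        time.sleep(delay)    # short pause so a crash loop does not spin
    return starts


# Hypothetical stand-in for the server: a process that exits straight away.
fake_server = [sys.executable, "-c", "raise SystemExit(1)"]
print(keep_running(fake_server, max_starts=3, delay=0))
```

With `max_starts=None` the loop runs forever, which is the point: a crashed server comes back on its own. It has nothing to do with IP address changes.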

gd_barnes 2009-02-23 23:43

[quote=mdettweiler;163711]OH MAN! I just figured out what happened. :rolleyes: When I tried logging on to the server's "terminal session" (as opposed to the session it gives you when you log on through VNC), lo and behold--there was a copy of the G8000 LLRnet server that had been running all along! No wonder port 8000 was already tied up by something--it was tied up by the G8000 LLRnet server which Gary had already restarted! :smile:

Gary, in the future, when you restart servers on crunchford could you possibly log into the machine via VNC, and then restart the server from there? Otherwise I can't see the restarted server's terminal window, and I think it's not even running. Thus I kept trying to restart it over on the VNC session, not realizing that it was already up and running on the console session and that that was what was hogging the port. (That would also explain why results have kept coming in for that port even after I thought the server went down. :wink:)

Okay, long story short: turns out the crashed server hadn't hung on the port after all. A new server was running the whole time! :smile:

I'll go and get the server moved over to the VNC session now so I can see it better, and get it into the while loop in case of crashes. :smile:[/quote]


OK, sorry about that. But after I stopped and restarted the server, only one of my clients would run it. I even waited 10-15 mins. one time after stopping the server and then restarted it. The other clients still hung and were sleeping, even after multiple attempts to stop and restart each one of them after stopping and restarting the server. I'll go downstairs and see if they are running now.

You were seeing results because I had ONE client running against the server but the other 6 that I was running against it wouldn't connect.

What's the deal with the VNC thing anyway? I've noticed the same thing that you have. If you have a terminal window up from directly messing with the machine, it won't show in VNC. When you have a terminal window up in VNC, it won't show when you are directly messing with the machine. That seems like a bug to me. It doesn't make sense because I pulled up the task manager (Linux version) and verified exactly what was running on the machine. I then killed the server, waited an appropriate amount of time and restarted it. I tried this twice to unhang my clients. Shouldn't that have killed any terminal window in VNC or non-VNC?

Is there a way around this confusing VNC (remote access) vs. non-VNC problem? English laymen's terms please.

Also, why do you have to keep running this "while" loop? Is that because of my constant IP address changes? If so, why didn't it work this time? It was well over an hour after the crash or IP address change before I tried starting and stopping it.

Gawd, this server stuff is confusing. My question is: Why did port 8000 have the problem and port 4000 didn't? Also, why should it be so difficult to kill the server from the task manager, wait 10-15 mins., and then restart it? These servers shouldn't be rocket science but they are from my perspective.

I think in the future, I'm not going to do any kind of attempt at stopping and restarting of the servers. I just end up creating more problems than there were originally. If there's a problem, I'll just move my machines to something else and you can fix them the next day.

Personally, I think having the servers on my machines has turned out to be a bad idea. My internet connection is quite stable but keeps changing addresses. I've had a mobo crash one time but I think I've gotten that problem resolved. Knock on wood. Ian has been very patient with port GB4000 and diligently does the flushdns thingy when the IP address changes. Others, I'm sure, won't be so patient.


Gary

gd_barnes 2009-02-23 23:59

Port 3000 has now dried out and the range is complete. It has been removed from the 1st post of this thread.


Gary

IronBits 2009-02-24 00:34

ps aux --forest is your friend. :wink:

AMDave 2009-02-24 02:06

ps -ef | grep llr :razz:

gd_barnes 2009-02-24 02:27

Are you guys answering anything in my post because I can't tell?

As stated above:

[quote]
English laymen's terms please.
[/quote]


:smile: Gary

gd_barnes 2009-02-24 03:02

[quote=mdettweiler;163687]In fact, we *do* use fixed IPs for all of Gary's dedicated crunch machines, one of which is running all the servers. :smile:

Gary, as David said--whenever you have a client within your network running one of the nplb-gb1.no-ip.org servers, you can configure the client to say 192.168.2.100 instead. That's the *internal* IP address of the server and it will be a much more direct connection that will never cut off when any No-IP things have to change. :smile: (Note that you can't do this on your laptop, since then you wouldn't be able to connect to the server when away from your home network.)[/quote]

OK, thanks guys. I did that. Hopefully that will resolve the reconnect problem when my public IP address changes.

I'm still wondering about some of the issues that came up per my last post.

Everything seems to be working fine now. Thanks for the help.


Gary

gd_barnes 2009-02-24 03:19

About 15 mins. ago, I just got inundated with 400+ Emails of old primes found. Can you guys look into that quickly? Thanks.

Lennart 2009-02-24 03:25

SPAM :)
 
What the f***k are you doing ?????????:shock:


Creating ads ? :big grin:

/Lennart

Brucifer 2009-02-24 03:40

You guys been hacked or something? I just got 31 notifications for primes found under ports that haven't been run for ages, and aren't running now, plus others that are but I don't have systems crunching on those ports. ???????????????????????????????????

SUM TING WONG

PCZ 2009-02-24 03:48

Holy Spam Batman !!!

AMDave 2009-02-24 03:58

No no no.
Everything is ok.

It was me.

Whoo.
When I fail, I fail spectacularly.

I forgot to update the mail_sent flag on the prime_list table when I re-activated the mail_notification script on the new server.

I completely missed it.
It was not on my checklist.
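
To illustrate the missed step, here is a sketch using an in-memory SQLite table as a stand-in. Only the `prime_list` table name and the `mail_sent` flag come from this post; the other columns, the function names, and the choice of SQLite are assumptions.

```python
import sqlite3


def mark_all_sent(conn):
    """Migration step: treat every prime already in the table as notified."""
    conn.execute("UPDATE prime_list SET mail_sent = 1")
    conn.commit()


def pending_notifications(conn):
    """Return primes not yet mailed, then flag them so they are sent once."""
    rows = conn.execute(
        "SELECT id, prime FROM prime_list WHERE mail_sent = 0 ORDER BY id"
    ).fetchall()
    conn.executemany(
        "UPDATE prime_list SET mail_sent = 1 WHERE id = ?",
        [(row[0],) for row in rows],
    )
    conn.commit()
    return [row[1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prime_list ("
    "id INTEGER PRIMARY KEY, prime TEXT, mail_sent INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO prime_list (prime) VALUES (?)",
    [("old prime A",), ("old prime B",)],
)
mark_all_sent(conn)  # the step that has to run before mail is switched on
conn.execute("INSERT INTO prime_list (prime) VALUES ('new prime C')")
first = pending_notifications(conn)
second = pending_notifications(conn)
print(first, second)
```

Without the `mark_all_sent` call, the first run would have returned the old rows as well, which is the flood of old notifications described here.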

409 emails went out - that's the difference between the snapshot I took when I started the database migration and the current status of the old database.


I do apologise to Gary, and Max and everyone who just got their notifications again.

Please delete all of the emails from nplb_stats received in the last hour.
The table is up to date and there are no more coming.

:sorry:

gd_barnes 2009-02-24 04:38

No problem Dave. Thanks for letting us know. I figured you were just doing some testing. I deleted all of mine.

gd_barnes 2009-02-24 04:50

I got a PM from Ian that port G4000 is down. I just checked. Actually, port G4000 is officially "quasi" down. (lol) By that I mean, I can connect to it just fine using my direct IP address of 192.168.2.100 but I absolutely cannot connect at all using the no-IP address site. The same applies to port G8000.

A question that I'll ask to the server guys:
Could Ian connect using the [ server = "192.168.2.100" ] command in the same manner that I can using that internal IP address? It sure would be nice to bypass the continuous problems thru the no-IP site that come up from my constant public IP address changes.

Max, I'm beginning to think that having servers on my machines isn't going to work. Ian has been more patient with G4000 than anyone else will probably be. He also has the time to check his machines several times a day, which others may not have the time to do.

I suppose the alternative may be to pay for a "fixed" public IP address. I run through $150 in electricity a month for this hobby so I suppose paying another $5-15 a month for a fixed IP address shouldn't be a big deal. It's just very irritating. Even though it's a small amount of money, I'll feel like I'm being ripped off. Actually, I'd just as soon pay a "one time" fee than to have to mess with paying something monthly as long as the fee is < ~$150-200.

Can anyone give me some "layman's terms" info on buying a fixed public IP address?

Ian, in the meantime, I'm going to suggest moving your machines to David's port 5000. There is very little activity there. The current n-range is n=~621K I think. I'm sorry you've had such problems with it. Thanks for being more than patient with it.


Gary

MyDogBuster 2009-02-24 05:10

[QUOTE]Ian, in the meantime, I'm going to suggest moving your machines to David's port 5000. There is very little activity there. The current n-range is n=~621K I think. I'm sorry you've had such problems with it. Thanks for being more than patient with it.[/QUOTE]

Don't worry about me. I'll slowly switch over time. No big rush.

I tried 192.168.2.100 with no luck.
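
That result is expected: 192.168.2.100 sits in the 192.168.0.0/16 private range, which only works inside the local network where the server lives. A quick way to check any address is Python's standard ipaddress module. This is only a sketch, and the function name is made up for illustration.

```python
import ipaddress


def is_publicly_routable(addr: str) -> bool:
    """True if addr is a globally routable IP, False for private/LAN ranges."""
    return ipaddress.ip_address(addr).is_global


if __name__ == "__main__":
    # The internal address from the posts above, then a well-known public one.
    for addr in ("192.168.2.100", "8.8.8.8"):
        print(addr, is_publicly_routable(addr))
```

A client outside the LAN needs the router's public IP (or a hostname that resolves to it), plus port forwarding to the server machine.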

Something is tripping the flip switch. Something has changed. Just so you know all the parameters, on the last 2 occurrences, I've had to flushdns AND restart the LLRNET client. Never had to restart the clients before. hmmmmmmmmm

gd_barnes 2009-02-24 05:40

[quote=MyDogBuster;163782]Don't worry about me. I'll slowly switch over time. No big rush.

I tried 192.168.2.100 with no luck.

Something is tripping the flip switch. Something has changed. Just so you know all the parameters, on the last 2 occurrences, I've had to flushdns AND restart the LLRNET client. Never had to restart the clients before. hmmmmmmmmm[/quote]


Well, late last night and early this morning, I "messed" with port G8000 by trying to stop and restart it. That may have made things "extra bad". :smile: Although the flushdns thing will likely be needed in the future, as long as yours truly doesn't try to mess with the servers, hopefully that is ALL that will be needed and you won't need to stop and restart the clients.

Also, perhaps unplugging and replugging the wireless router, which is something that I rarely have to do, "flipped" something last night.

I'd like to flip the finger to the servers. lol


Gary


All times are UTC. The time now is 06:02.
