mersenneforum.org  

mersenneforum.org > Prime Search Projects > No Prime Left Behind

2010-05-04, 06:38   #89
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

3·2,083 Posts

Quote:
Originally Posted by gd_barnes
One more thing that you might test: I seem to remember that the workfile.res file is not deleted on the Windows client either whenever do -c is executed but there are no completed pairs in lresults.txt. That might have been fixed with the latest Windows client but I'm not sure. Can you check that?
I noticed this behavior just recently with the latest version of do.bat, so it's still present in that version.
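A minimal Python sketch of the intended cleanup logic for "do -c" (not the actual do.bat code). Only the file names workfile.res and lresults.txt come from the thread; the function names and the choice to delete workfile.res unconditionally are assumptions for illustration.

```python
import os
import tempfile


def cleanup_after_submit(workdir: str) -> bool:
    """Hypothetical model of the 'do -c' cleanup step.

    Returns True if lresults.txt held results that would be submitted.
    ASSUMPTION: workfile.res must be removed whether or not there were
    completed pairs; the reported bug was that it survived when
    lresults.txt was empty.
    """
    results = os.path.join(workdir, "lresults.txt")
    res_file = os.path.join(workdir, "workfile.res")
    submitted = os.path.exists(results) and os.path.getsize(results) > 0
    # (the real client would send lresults.txt to the server here when submitted is True)
    # Delete workfile.res outside any "has results" branch.
    if os.path.exists(res_file):
        os.remove(res_file)
    return submitted


def demo():
    """Empty lresults.txt plus a leftover workfile.res: the .res file should go."""
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "lresults.txt"), "w").close()
        open(os.path.join(d, "workfile.res"), "w").close()
        submitted = cleanup_after_submit(d)
        return submitted, os.path.exists(os.path.join(d, "workfile.res"))
```

Running demo() with an empty lresults.txt should report nothing submitted and no workfile.res left behind.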

Last fiddled with by mdettweiler on 2010-05-04 at 06:38
2010-05-04, 06:42   #90
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

3·2,083 Posts

Quote:
Originally Posted by gd_barnes
OK, I still need a clarification because I remember reading that before and still couldn't quite get it. If I'm merrily running along with stopoption=2 and then stop the client with Ctrl-C, obviously I know that it returns all completed results. What I don't know is whether it properly cancels all remaining in-process and unprocessed pairs and lets the server know that they are no longer reserved.

To make the question clearer: If I'm running with stopoption=2 and then hit Ctrl-C, do I still need to do a final run with startoption=1 to return the in-process/untested pair(s) to the server so that they can be immediately handed back out for testing? If that is the case, then that is far from clear from reading the above.

To put it another way: if startoption (or stopoption) 1 returns incomplete pairs to the server -or- somehow "tells" the server that the pairs are no longer reserved, then "abandon the rest" is very misleading. That needs to be reworded to something like "cancel the rest" or "release the rest". "Abandoning" is effectively what people do when they don't run the "do -c" command on the LLRnet client. Those pairs are abandoned, so the server has to wait 2 days to hand them back out again. We do not want to "abandon" them; we want to return them to the server or tell the server that they are no longer reserved.


Gary
If you're running with stopoption=2 and hit Ctrl-C, the client will return anything that's completed and leave any incomplete tests in the queue. You would then need to do another run with startoption=1 to clean those out.

This is essentially just like how do.bat/pl behave; in fact with stopoption=3 they behave exactly the same. stopoption=2 only differs in that anything that's already done when you Ctrl-C will be returned to the server before shutdown; nothing else is touched.

Agreed that the "abandon" wording is a bit confusing. Mark, if you're reading this, can you change this to say "cancel" instead in the next version? That will hopefully be a little less confusing since "abandon" implies that they're just being dropped without any word to the server.
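To make the stopoption=2 / startoption=1 sequence concrete, here is a toy Python model of the behavior described in this post. The class and method names are hypothetical, and it models only what's stated here: Ctrl-C under stopoption=2 returns completed results and leaves incomplete pairs queued, and a later startoption=1 run cancels (releases) those pairs so the server can hand them out again.

```python
from dataclasses import dataclass, field


@dataclass
class ClientQueue:
    """Toy model of a client's local queue (hypothetical, for illustration)."""
    completed: list = field(default_factory=list)  # finished tests awaiting return
    pending: list = field(default_factory=list)    # in-process + untested pairs
    returned: list = field(default_factory=list)   # results accepted by the server
    released: list = field(default_factory=list)   # pairs the server knows are free again

    def ctrl_c_with_stopoption_2(self):
        # Return completed results; incomplete pairs stay queued locally
        # and remain reserved on the server.
        self.returned.extend(self.completed)
        self.completed.clear()

    def run_with_startoption_1(self):
        # Cancel the remaining pairs so the server can reissue them right away.
        self.released.extend(self.pending)
        self.pending.clear()
```

For example, after ctrl_c_with_stopoption_2() a queue holding one finished pair and two pending pairs still has both pending pairs; only the follow-up run_with_startoption_1() clears them.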
2010-05-04, 06:44   #91
gd_barnes

May 2007
Kansas; USA

2·5²·7·29 Posts

Very good. Thanks for the clarification. I'm glad to know that it takes no more steps to return everything than it does with LLRnet.
2010-05-04, 09:06   #92
kar_bon

Mar 2006
Germany

101100001000₂ Posts

Quote:
Originally Posted by mdettweiler
Quote:
Originally Posted by gd_barnes
One more thing that you might test: I seem to remember that the workfile.res file is not deleted on the Windows client either whenever do -c is executed but there are no completed pairs in lresults.txt. That might have been fixed with the latest Windows client but I'm not sure. Can you check that?
I noticed this behavior just recently with the latest version of do.bat, so it is present in that version.
Yes, I noticed this too. I've fixed it for the '-c' option, and the fix will be available along with all the other fixes in the next few days!
2010-05-10, 01:01   #93
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

3×2,083 Posts
With how port 3000 ran out of pairs and went idle for a couple of hours just now, it got me thinking about a possible feature we may want to consider adding to a future version of do.bat/pl: backup servers. One very handy feature of PRPnet is that if a server goes down and looks like it's going to stay down (i.e., two or three successive connection attempts have failed), the client can automatically fall back to an alternate server until the main one is back up, ensuring that it is never without work. I wonder how hard it would be to implement this in do.bat/pl? That way, for instance, somebody could configure port 3000 as the primary server, but if it runs out of work, the client could get work from port 6000 until 3000 is refilled.
2010-05-10, 06:10   #94
kar_bon

Mar 2006
Germany

101100001000₂ Posts

This is certainly possible, and it could be a good way to avoid idle clients.

In the batch version, I could add a second server as a new parameter, write it to llr-clientconfig.txt after a connection to the current server times out, and call llrnet again with the altered config file.
Another option: make a second llr-clientconfig.txt with the settings for another server, rename it to the current config file, and call llrnet.
But if the second server doesn't respond either, the client must terminate in that case, too.
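A rough Python sketch of the first option above (point the config at the backup server, and give up if neither server answers). This is not the batch implementation: the `server = "..."` / `port = ...` line format, the example host names, and the helper names are assumptions for illustration, and the reachability check is left as a caller-supplied function.

```python
import re
from typing import Callable, Optional, Sequence, Tuple

Server = Tuple[str, int]  # (host, port)


def point_config_at(config_text: str, host: str, port: int) -> str:
    """Rewrite the server/port lines of an llr-clientconfig.txt-style text.

    ASSUMPTION: the file uses lines like `server = "host"` and `port = 1234`;
    check your own config before relying on this.
    """
    text = re.sub(r'(?m)^server\s*=.*$', lambda m: f'server = "{host}"', config_text)
    return re.sub(r'(?m)^port\s*=.*$', lambda m: f'port = {port}', text)


def pick_server(servers: Sequence[Server],
                is_reachable: Callable[[str, int], bool]) -> Optional[Server]:
    """Try the current server, then the backup; None means 'terminate'."""
    for host, port in servers:
        if is_reachable(host, port):
            return host, port
    return None
```

The caller would write point_config_at(...) back to llr-clientconfig.txt and restart llrnet when pick_server returns the backup, and stop cleanly when it returns None.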

PS: I can implement this in the next update, but I have to test some other things too, so it could take a few days until it's ready.

Last fiddled with by kar_bon on 2010-05-10 at 06:14
2010-05-10, 06:14   #95
gd_barnes

May 2007
Kansas; USA

2×5²×7×29 Posts

Quote:
Originally Posted by kar_bon
This is certainly possible, and it could be a good way to avoid idle clients.

In the batch version, I could add a second server as a new parameter, write it to llr-clientconfig.txt after a connection to the current server times out, and call llrnet again with the altered config file.
Another option: make a second llr-clientconfig.txt with the settings for another server, rename it to the current config file, and call llrnet.
But if the second server doesn't respond either, the client must terminate in that case, too.
This will be tricky and will have to be thought out. How to handle it if the original server comes back online?

It's almost as if, with each attempt to get new pairs, you have to check the original server, and after whatever number of tries, you then go to the backup server. This sounds like what PRPnet does and if so, would be the way to go.

As for what to do if the second server is unavailable...if you are going to do this, I would make it keep going back and forth between the 2 servers. It could get kind of hairy trying more than 2 servers. That would be like an "initial" release of such a thing. Later, you could do an updated release for more than 2 servers.

BTW, before doing these, please make sure the previous fixes are in place, tested, and complete so that I can incorporate them in the Linux client. In other words...fixes before enhancements.

Last fiddled with by gd_barnes on 2010-05-10 at 06:16
2010-05-10, 06:20   #96
kar_bon

Mar 2006
Germany

2³×353 Posts

Quote:
Originally Posted by gd_barnes
This will be tricky and will have to be thought out. How to handle it if the original server comes back online?

It's almost as if, with each attempt to get new pairs, you have to check the original server, and after whatever number of tries, you then go to the backup server. This sounds like what PRPnet does and if so, would be the way to go.
I think we shouldn't make it so complex: this feature would only be used if 'the' server is offline or out of pairs, so that the client doesn't go idle. The user should keep an eye on his clients and can switch back to the original server when it's available again. I could also write a small script that, for many clients at once, cancels pairs/submits results to the second server and then continues with the original server.
2010-05-10, 12:59   #97
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

3×2,083 Posts

Quote:
Originally Posted by gd_barnes
This will be tricky and will have to be thought out. How to handle it if the original server comes back online?

It's almost as if, with each attempt to get new pairs, you have to check the original server, and after whatever number of tries, you then go to the backup server. This sounds like what PRPnet does and if so, would be the way to go.
Exactly--that's what PRPnet does and what I was pretty much thinking for this.

Quote:
As for what to do if the second server is unavailable...if you are going to do this, I would make it keep going back and forth between the 2 servers. It could get kind of hairy trying more than 2 servers. That would be like an "initial" release of such a thing. Later, you could do an updated release for more than 2 servers.
Yeah, I see what you mean. Essentially what PRPnet does is to try the first server twice, then if that doesn't work try the next one, etc. on down the list. When it runs out of servers, it sleeps 60 seconds (or whatever you've specified in the prpclient.ini file) and tries again from the top. It can sound a tad messy, but in the end it turns out to work pretty well.
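The PRPnet-style failover loop described above can be sketched in Python roughly as follows. This illustrates the described behavior and is not PRPnet's code; fetch is a hypothetical stand-in for whatever actually requests work from a server, and the 60-second default mirrors the delay mentioned above.

```python
import time
from typing import Callable, Optional, Sequence, Tuple


def get_work(servers: Sequence[str],
             fetch: Callable[[str], Optional[str]],
             attempts_per_server: int = 2,
             retry_delay: float = 60,
             sleep: Callable[[float], None] = time.sleep,
             max_rounds: Optional[int] = None) -> Optional[Tuple[str, str]]:
    """Return (server, work) from the first server that answers.

    Each server is tried `attempts_per_server` times, in list order.
    If every server fails, wait `retry_delay` seconds and start again
    from the top. `max_rounds` (None = forever) exists only to make
    the loop testable.
    """
    rounds = 0
    while True:
        for server in servers:
            for _ in range(attempts_per_server):
                work = fetch(server)
                if work is not None:
                    return server, work
        rounds += 1
        if max_rounds is not None and rounds >= max_rounds:
            return None
        sleep(retry_delay)
```

Because the list is always walked from the top, the primary server is tried first on every new request, so the client drifts back to it on its own once it's up again. That is one possible answer to Gary's question about the original server coming back online.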

I can think of one thing, though, that might make it a little trickier for LLRnet: whereas PRPnet keeps its save files separate by server (for instance, work_G9000.save), LLRnet just uses one file (workfile.txt). That could lead to confusion and mixed-up work unless the implementation is airtight on the backend. One possibility would be to change LLRnet to append the port # to the workfile.txt name à la PRPnet, though I imagine that would be tough to implement.
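To illustrate the port-suffix idea, here is a small Python sketch of how per-server workfile names might be built and later scanned. The naming scheme workfile_<port>.txt is hypothetical (LLRnet currently uses a single workfile.txt, per the post above); it just mirrors PRPnet's work_G9000.save pattern.

```python
from pathlib import Path
from typing import Iterable, List


def workfile_for(port: int, base: str = "workfile.txt") -> str:
    """HYPOTHETICAL per-server workfile name, e.g. workfile_3000.txt."""
    p = Path(base)
    return f"{p.stem}_{port}{p.suffix}"


def ports_with_pending_work(filenames: Iterable[str]) -> List[int]:
    """Ports that still have a per-server workfile, so leftover pairs
    and results can be sent back to the server that issued them."""
    ports = []
    for name in filenames:
        p = Path(name)
        prefix, _, suffix = p.stem.partition("_")
        if prefix == "workfile" and p.suffix == ".txt" and suffix.isdigit():
            ports.append(int(suffix))
    return sorted(ports)
```

In practice the scan would be fed os.listdir() of the client directory; keeping one file per server means a fallback to port 6000 can never mix its pairs into port 3000's work.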

Quote:
BTW, before doing these, please make sure the previous fixes are in place, tested, and complete so that I can incorporate them in the Linux client. In other words...fixes before enhancements.
Indeed, most definitely. I primarily suggested this as a possibility to consider for the future; it may not turn out to be feasible due to the complexity of changing the existing LLRnet code (after all, it took at least a couple of major PRPnet releases to get the multiple-server feature working in an airtight fashion, as I recall).
2010-05-10, 21:05   #98
gd_barnes

May 2007
Kansas; USA

2·5²·7·29 Posts

Max brought up some additional points that will make such a thing tough to both test and implement.

Karsten, I would suggest putting this option as low priority for now.

I think you have some other enhancements that you were working on that I believe would be better to work on first before attempting this.

Of course those few fixes mentioned previously would need to come first before anything.
2010-05-10, 21:09   #99
kar_bon

Mar 2006
Germany

2³·353 Posts

I have to run some additional tests for those special cases again, and it's taking more time than I thought, so the next update will perhaps be out on the weekend.

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.