mersenneforum.org > Prime Search Projects > No Prime Left Behind
2009-12-08, 14:13   #45
rogue
"Mark"
Apr 2003
Between here and the

2³·5²·29 Posts

Quote:
Originally Posted by gd_barnes
From what I can tell, it's based off of how many clients concurrently hit the server, and that is regardless of the n-value size. If one person starts 100 cores all at once at n=40M, the server may still barf, even though the tests take 20 days or more. We just have to ask ourselves: what is an acceptable level of risk? If we keep risking these problems, especially on an increasing percentage of our servers, our higher-resourced folks will find testing to do elsewhere.
Gary, I would prefer that you avoid the term "barf" because it is too generic.

There is one issue with the 2.4.6 server that Max has a patch for and has implemented. I have not released 2.4.7 yet. There are definitely no barfing issues of any kind in 2.4.6.

I did not write PRPNet to be multi-threaded or to compete directly with BOINC. It was intended for smaller projects. I would argue that having hundreds of clients is really pushing the limit of what PRPNet was intended for.

The issue here is that, because PRPNet is not multi-threaded, some clients need to wait to send work to or receive work from the server. The server should be able to handle up to about 10 clients connecting each minute. When you have a client connecting every 5 minutes to send/receive work and then add hundreds of clients doing the same thing every few minutes, that is problematic. Telling users to grab enough work to keep busy for a longer period of time greatly reduces the load on the server.
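To put rough numbers on this: if each client contacts the server once per work fetch, the average connection rate is simply the number of clients divided by the interval between fetches. The sketch below assumes that one-connection-per-fetch, steady-state model and uses the figure of about 10 connections per minute given above; the function names are made up for illustration and are not part of PRPNet.

```python
def connections_per_minute(clients: int, minutes_between_fetches: float) -> float:
    """Average server connections per minute if every client connects
    once per fetch interval (hypothetical steady-state model)."""
    return clients / minutes_between_fetches


def min_fetch_interval(clients: int, capacity_per_minute: float) -> float:
    """Shortest fetch interval (in minutes) that keeps the average load at
    or below the server's capacity, under the same model."""
    return clients / capacity_per_minute


if __name__ == "__main__":
    CAPACITY = 10  # ~10 client connections per minute, the figure quoted above
    for clients in (20, 100, 300):
        load = connections_per_minute(clients, minutes_between_fetches=5)
        needed = min_fetch_interval(clients, CAPACITY)
        print(f"{clients:4d} clients every 5 min -> {load:5.1f} conn/min; "
              f"need >= {needed:5.1f} min of work per fetch")
```

With 300 clients each connecting every 5 minutes the average is 60 connections per minute, six times the quoted capacity; to bring it down to 10 per minute, each client would need to cache roughly 30 minutes of work. This is only an average: a rally where many clients start at once produces bursts well above it.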

I don't intend to multi-thread PRPNet until there is a database behind it. That requires a lot of work, and considering that I am doing the development on my own in my limited free time, it is unlikely to happen anytime soon.
2009-12-08, 18:18   #46
Brucifer
Dec 2005

313 Posts

Quote:
Originally Posted by rogue
I did not write PRPNet to be multi-threaded or to compete directly with BOINC. It was intended for smaller projects. I would argue that having hundreds of clients is really pushing the limit of what PRPNet was intended for.
Before things get blasted into the stratosphere........ :) It would appear that there are multiple issues here that multiple people are looking at. The first thing though to keep in mind is the comment above about PRPNet being intended for smaller projects. I can understand that, and as for the issue of having hundreds of clients pushing the limits, I have seen that first hand with the very short tests, and have to agree with Rogue on that too.

So basically what we have here are a couple of opposing lines of thought. On the one hand is a project that likes to run rallies, with a wad of clients hitting a server to try to get the maximum number of tests completed by a bunch of people in a short, specified amount of time. That in itself is contrary to the author's intended purpose, as that is specifically not what he wrote the application for. So it would be reasonable to conclude that we shouldn't be pushing this application for a purpose it wasn't intended for. What that really boils down to is that for something like the Double Check work that prpnet is currently processing, and working just fine on, it's great. Very high-volume situations are much better handled by llrnet, which has been coping better with heavy traffic. However, let's not forget that llrnet has its issues too; they are just different issues. The llrnet server on port 7000 initially went nuts for some reason that still hasn't been explained well, and that port blew up before Gary got to the infamous "UPS Test" scenario.

I have had zero issues with prpnet on the doublecheck port or on the port 3000 work. So again, everything points back to the short tests. While we were hammering G7000, Lennart had better than a hundred cores hitting it while I was running around 30 cores. On a Q6600 each core is doing multiple tests per minute. That is just a horrendous hit on the server, not to mention the added latency from standard internet hassles thrown into the pot too. I would dare say that prpnet would work just fine on the short tests if only a small number of clients were hitting it at the same frequency as what hits ports 3000 and 2000.

So the long and short of this is that there is a time and place for everything...... and it is wise to keep that in mind even when dealing with llr testing LOL. Heavy hitters should really be running their own servers: it takes much of the stress off the project's servers, it significantly reduces the bandwidth drain across the internet, and it places the load on an internal network, which is much better at handling it. The admin can watch his own stuff, and if he (or she) needs to add more servers to the mix it is a simple thing to do.

Personally I think that all the port 7000 type work should just be put out for manual reservations. Not many people run that stuff anyway as most people are intent on hitting the top-5000 stuff. With the port 7000 stuff out of the way, then things would and should get back to their peaceful normal status quo.

Max is a geek and loves playing with this stuff. He has two apps to use now, llrnet and prpnet. Gives him lots of geeking time. Two apps give boys like AMDave/Bok more little projects to write their stats stuff for. Put up both llrnet and prpnet servers so people can use whatever turns their little old hearts on. Then everyone will have their own little space in the playground here to play in..... and everyone can do what they enjoy doing, as that's what we are here for anyway. :-) And through it all, we will still be finding primes, which is what Gary (and tribe) intended when creating this project.

As for Max, he's been working away hard at trying to keep us all happy, and your efforts don't go unnoticed Max. As for Rogue, I hope you don't get dismayed at all this controversy over Port 7000 tests. It is good you wrote PRPNet. It does what YOU intended for YOUR application to do. More power to you. I think it is cool that you have taken the step to let others use your app, especially since in doing that you open yourself up to stuff like all this controversy over the short tests, etc. I sure hope that this stuff doesn't tick you off so that you give up on it, because it's working really well for what you intended.

Meanwhile, if possible just get the rest of the port 7000 pairs ready and what Lennart doesn't want to do, I'll do, and we can get those puppies out of sight and out of mind and out of the way and move on down the road.
2009-12-08, 19:22   #47
rogue
"Mark"
Apr 2003
Between here and the

5800₁₀ Posts

Quote:
Originally Posted by Brucifer
As for Max, he's been working away hard at trying to keep us all happy, and your efforts don't go unnoticed Max. As for Rogue, I hope you don't get dismayed at all this controversy over Port 7000 tests. It is good you wrote PRPNet. It does what YOU intended for YOUR application to do. More power to you. I think it is cool that you have taken the step to let others use your app because in doing that you open yourself up to stuff like all this controversy over the short tests, etc. I sure hope that this stuff doesn't tick you off and you give up on it cause it's working really well for what you intended..
Thank you!

I'm starting a new thread that will be used to list requirements for PRPNet to replace LLRNet.
2009-12-09, 01:03   #48
gd_barnes
May 2007
Kansas; USA

2×5²×7×29 Posts

Let me also thank you, Mark, for your excellent contribution to prime searching. The new PFGW is a tremendous tool now. PRPnet is great for smaller volume operations.

On PRPnet, since it has been used at PrimeGrid, I was wholly under the impression that it was also for high volume use and as a smaller competitor for BOINC. I thought that was the intent of allowing the varying percentages against different servers.

Since it was your intent that PRPnet only be for smaller applications, I agree it is a great tool now. Unfortunately we've had to rein ourselves in a little here. It was our original intention to convert all NPLB servers except one to PRPnet, but we've now realized that we can't do that. We have to define what is small and what is big at NPLB. For the time being, we're calling n>600K small and n<600K big. That's the best we can do. The labels are reversed relative to the n-size because, from the server's perspective, smaller tests are bigger: they finish sooner, so clients come back more often and put more load on the server.
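A rough way to see why low-n ranges hit the server harder: an LLR test on k·2^n−1 with small k performs about n modular squarings, and with FFT-based multiplication each squaring costs roughly n·log n, so test time grows roughly like n²·log n. A core therefore finishes (and reports) low-n tests far more often. The sketch below compares only that asymptotic estimate; real timings also depend on FFT size steps, CPU cache and the k-value, so treat the ratios as ballpark figures. The function names are hypothetical.

```python
import math


def relative_test_time(n: int) -> float:
    """Rough cost model for one LLR test at exponent n: about n squarings,
    each ~n*log(n) with FFT multiplication (constant factors ignored)."""
    return n * n * math.log(n)


def relative_report_rate(n_small: int, n_large: int) -> float:
    """How many times more often a core finishes tests at n_small than at
    n_large, under the n^2*log(n) model above."""
    return relative_test_time(n_large) / relative_test_time(n_small)


if __name__ == "__main__":
    for n in (100_000, 300_000):
        ratio = relative_report_rate(n, 600_000)
        print(f"n={n:,}: about {ratio:.0f}x as many results per core as n=600,000")
```

Under this model a core at n=100K reports roughly 40 times as often as one at n=600K, which is consistent with treating the low-n ranges as the "big" (high-load) ones.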

Yes, I have been harsh on PRPnet, but none of it was ever intended as personal insults directed at you. I'm sorry if you felt that way. I was only under the impression that PRPnet was intended for high-volume use and have become extremely frustrated at Max saying that it was "ready for high volume use", yet it wasn't. Max, were you aware that it was not intended for such high use? Somewhere along the line, I think both of us were misled.

Regardless, I don't know the techie reason why there are still issues in higher-volume situations. Max has said that multi-threading is needed, so I have to go with that. I also heard earlier that memory utilization is a problem, especially when there is an unexpected outage of the server or clients. I know these are huge changes, so for the time being we shall go with the assumption that it can only be used in lower-volume situations.


Gary

Last fiddled with by gd_barnes on 2009-12-09 at 01:04
2010-01-22, 20:20   #49
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)

3×2,083 Posts

Hi all,

I have now posted a Windows client package for PRPnet 3.1.3. (The Linux counterpart is still pending an inquiry I sent to Mark via email; it should be available within a day or two.) I recommend that all users upgrade their clients to the latest version; if you're upgrading from anything older than 2.4.4 or so, make sure that you use the new prpclient.ini included with the package as it has new options in it.

Note that as of yet, all of our PRPnet servers are still on version 2.4.6. I will be upgrading those to 3.1.3 over the next few days. In the meantime, the 3.1.3 client is fully backwards compatible with 2.4 servers (and vice versa).

I will be posting a new thread shortly with details about a new PRPnet 3.1.3 server which I've set up and loaded with small 12th Drive doublecheck tests for stress testing purposes. Stay tuned.

Max
2010-01-23, 07:06   #50
gd_barnes
Kansas; USA

2×5²×7×29 Posts

Quote:
Originally Posted by mdettweiler
Hi all,

I have now posted a Windows client package for PRPnet 3.1.3. (The Linux counterpart is still pending an inquiry I sent to Mark via email; it should be available within a day or two.) I recommend that all users upgrade their clients to the latest version; if you're upgrading from anything older than 2.4.4 or so, make sure that you use the new prpclient.ini included with the package as it has new options in it.

Note that as of yet, all of our PRPnet servers are still on version 2.4.6. I will be upgrading those to 3.1.3 over the next few days. In the meantime, the 3.1.3 client is fully backwards compatible with 2.4 servers (and vice versa).

I will be posting a new thread shortly with details about a new PRPnet 3.1.3 server which I've set up and loaded with small 12th Drive doublecheck tests for stress testing purposes. Stay tuned.

Max
I assume we'll have a full stress test successfully completed before doing a wholesale upgrade of NPLB servers. 2.4.6 is running just fine. It can handle many clients on our higher n-ranges with no problem. It's just the lower n-ranges with many clients that it can't handle.


Gary
2010-01-23, 16:11   #51
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)

6249₁₀ Posts

Quote:
Originally Posted by gd_barnes
I assume we'll have a full stress test successfully completed before doing a wholesale upgrade of NPLB servers. 2.4.6 is running just fine. It can handle many clients on our higher n-ranges with no problem. It's just the lower n-ranges with many clients that it can't handle.


Gary
Right, I'm not sure what I was thinking when I wrote that. The only real problem with 2.4.6 is that its email notification is broken, but that's not a big deal since the DB handles that for all the NPLB public servers. That, and there were a couple of extra features (built-in sorting options, etc.) that were added in the also-stable 2.4.7, but I was too lazy to upgrade the servers since we didn't need any of those features right off the bat. I may as well keep them on 2.4.6 now until 3.1.4 (or 3.1.5, by the time we've nailed down these little issues here?) has passed testing.
2010-01-23, 17:35   #52
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)

1869₁₆ Posts

Client packages for PRPnet 3.1.4 are now available. This fixes a bug in the Windows version in which the CPU was erroneously tied up during server communications, and it is the first 3.1 version posted here for Linux. Come and get 'em!
2010-01-23, 21:52   #53
gd_barnes
Kansas; USA

2·5²·7·29 Posts

I just now ran a client on 3.1.4 for the first time. A couple of nits:

1. LLR is writing an output status every 10,000 iterations, which causes too much screen scrolling. That can be set in manual LLR to something higher. How can we change that in the client?

2. I see the timing for each candidate in the work_G7465.save file. We need that timing to be on the candidate in the prpclient.log file because the work_G7465.save file is only temporary.

As an example on #1, it is writing this:
2001*2^101941-1, iteration : 20000 / 101941 [19.61%]. Time per iteration : 0.15

I'll put a quad on it for a while. If it's working well, I'll dogpile most of my cores later tonight.

Max, I added a Greeting.txt file so that it stopped writing out those not found messages.


Gary

Last fiddled with by gd_barnes on 2010-01-23 at 21:53
2010-01-23, 22:42   #54
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)

1100001101001₂ Posts

Quote:
Originally Posted by gd_barnes
I just now ran a client on 3.1.4 for the first time. A couple of nits:

1. LLR is writing an output status every 10,000 iterations, which causes too much screen scrolling. That can be set in manual LLR to something higher. How can we change that in the client?
That's a little difficult because PRPnet recreates llr.ini for each candidate. However, LLR is actually designed NOT to scroll, but rather just overwrite the previous line; it's prevented from doing this, though, by the smaller width of the console window pushing it onto the next line. This can be remedied by making the console window wider: on Linux, just stretch it sideways like you'd resize any window; on Windows, click on the little "C:\" logo in the command prompt's titlebar and then click Defaults; set Width to 90, click OK, and you should be all set.
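The overwrite-in-place behavior described above is the usual carriage-return trick: the program writes "\r" plus the new status line without a newline, so the cursor returns to column 0 and the next update lands on top of the old one. If the line is wider than the console, the terminal wraps it, "\r" only returns to the start of the last wrapped row, and each update spills onto a new row, so the output appears to scroll. The Python sketch below illustrates that mechanism generically; it is not LLR's code, the function names are hypothetical, and the exact wrap rule (whether a line of exactly the console width already wraps) varies between consoles, so the width check is deliberately conservative.

```python
import sys


def fits_on_one_row(line: str, console_width: int) -> bool:
    """Conservative check: keep one spare column, because some consoles
    wrap as soon as the last column is filled."""
    return len(line) < console_width


def show_progress(expr: str, total: int, step: int) -> None:
    """Print an LLR-style status line and overwrite it in place.

    Each update starts with a carriage return and has no trailing newline,
    so it replaces the previous update as long as it fits on one row.
    """
    last_len = 0
    for it in range(step, total + 1, step):
        pct = 100.0 * it / total
        line = f"{expr}, iteration : {it} / {total} [{pct:.2f}%]."
        # Pad with spaces so a shorter line fully covers a longer previous one.
        sys.stdout.write("\r" + line.ljust(last_len))
        sys.stdout.flush()
        last_len = len(line)
    sys.stdout.write("\n")


if __name__ == "__main__":
    show_progress("2001*2^101941-1", total=101941, step=10000)
```

Widening the console raises the threshold in `fits_on_one_row`, which is why setting the window width to 90 lets the status line stay on a single row again.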

Quote:
2. I see the timing for each candidate in the work_G7465.save file. We need that timing to be on the candidate in the prpclient.log file because the work_G7465.save file is only temporary.
Hmm, right. Mark, could you possibly add that info to prpclient.log in 3.1.5?

Speaking of which, it would be helpful to have that in the server's logs as well. Heck, I'm not even seeing anything about returned tests in prpserver.log--is this an error?

Quote:
As an example on #1, it is writing this:
2001*2^101941-1, iteration : 20000 / 101941 [19.61%]. Time per iteration : 0.15

I'll put a quad on it for a while. If working well, I'll dogpile most my cores later tonight.
I'll forewarn you, this may not be the definitive dogpile. I don't think this release has even addressed the "no candidates on server" message yet (Mark, correct me if I'm wrong here), so we'll need to re-test after that's fixed.

Quote:
Max, I added a Greeting.txt file so that it stopped writing out those not found messages.
Okay, thanks.
2010-01-24, 01:50   #55
gd_barnes
Kansas; USA

2×5²×7×29 Posts

OK, I'll do that on the stretching of the window. I suppose it's no different than PFGW now so not a big deal.

Based on what you're saying, I think I'll hold off on the dogpile. I see that Lennart has brought some machines and we're starting to get some "volume related" type of errors. Here is a sampling:

Quote:
[2010-01-24 01:46:49 GMT] Error sending [WorkUnit: 2001*2^148837-1 1264297609 2001 2 148837 -1] to socket 17
[2010-01-24 01:46:49 GMT] ODBC Information: SQL_ERROR: [MySQL][ODBC 3.51 Driver][mysqld-5.0.51a-3ubuntu5.4]Duplicate entry '2001*2^148837-1-1264297609' for key 1
[2010-01-24 01:46:49 GMT] ODBC Information: SQL_ERROR: [MySQL][ODBC 3.51 Driver][mysqld-5.0.51a-3ubuntu5.4]Duplicate entry '2001*2^148837-1-1264297609' for key 1
[2010-01-24 01:46:49 GMT] Error sending [End of Message] to socket 17
[2010-01-24 01:46:49 GMT] ODBC Information: SQL_ERROR: [MySQL][ODBC 3.51 Driver][mysqld-5.0.51a-3ubuntu5.4]Duplicate entry '2001*2^148837-1-1264297609' for key 1
[2010-01-24 01:46:49 GMT] Error sending [Start Greeting] to socket 17
[2010-01-24 01:46:49 GMT] Error sending buffer to socket 17
[2010-01-24 01:46:49 GMT] ODBC Information: SQL_ERROR: [MySQL][ODBC 3.51 Driver][mysqld-5.0.51a-3ubuntu5.4]Duplicate entry '2001*2^148837-1-1264297609' for key 1
[2010-01-24 01:46:49 GMT] ODBC Information: SQL_ERROR: [MySQL][ODBC 3.51 Driver][mysqld-5.0.51a-3ubuntu5.4]Duplicate entry '2001*2^148837-1-1264297609' for key 1
[2010-01-24 01:46:49 GMT] ODBC Information: SQL_ERROR: [MySQL][ODBC 3.51 Driver][mysqld-5.0.51a-3ubuntu5.4]Duplicate entry '2001*2^148837-1-1264297609' for key 1
[2010-01-24 01:46:49 GMT] Error sending [End Greeting] to socket 17
[2010-01-24 01:46:49 GMT] ODBC Information: SQL_ERROR: [MySQL][ODBC 3.51 Driver][mysqld-5.0.51a-3ubuntu5.4]Duplicate entry '2001*2^148837-1-1264297609' for key 1
Looks like that needs to be looked into.
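One detail in the log that may help whoever looks into this: the number in the failing key, 1264297609, is the Unix timestamp for 2010-01-24 01:46:49 UTC, the same second as the log lines. If the key really is the candidate name plus an assignment time in whole seconds (an assumption read off the error text, not something confirmed from the PRPnet source), then recording the same candidate twice within one second produces exactly this kind of duplicate-key error. The sketch below reproduces that with Python's built-in sqlite3 module as a stand-in for MySQL; the table, columns and `record` helper are hypothetical. On MySQL, `INSERT IGNORE` or `INSERT ... ON DUPLICATE KEY UPDATE` play the role that `INSERT OR IGNORE` plays here.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per (candidate, assignment time in whole
# seconds), mirroring the two parts of the key in the MySQL error message.
SCHEMA = """
CREATE TABLE assignment (
    candidate   TEXT    NOT NULL,
    assigned_at INTEGER NOT NULL,   -- Unix time, whole seconds
    client      TEXT    NOT NULL,
    PRIMARY KEY (candidate, assigned_at)
)
"""


def record(conn: sqlite3.Connection, candidate: str, ts: int, client: str,
           ignore_duplicates: bool = False) -> bool:
    """Insert one assignment row; return True if a new row was written."""
    verb = "INSERT OR IGNORE" if ignore_duplicates else "INSERT"
    cur = conn.execute(
        f"{verb} INTO assignment (candidate, assigned_at, client) VALUES (?, ?, ?)",
        (candidate, ts, client),
    )
    return cur.rowcount == 1


if __name__ == "__main__":
    ts = 1264297609
    print(datetime.fromtimestamp(ts, tz=timezone.utc))  # 2010-01-24 01:46:49+00:00
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    record(conn, "2001*2^148837-1", ts, "client-A")
    try:
        record(conn, "2001*2^148837-1", ts, "client-B")  # same second -> same key
    except sqlite3.IntegrityError as exc:
        print("duplicate key:", exc)
    skipped = not record(conn, "2001*2^148837-1", ts, "client-B",
                         ignore_duplicates=True)
    print("second insert ignored:", skipped)
```

Ignoring the duplicate only hides the symptom; the more interesting question for the server side is why the same candidate is being recorded twice in the same second.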


Gary
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.