mersenneforum.org  

2009-09-02, 12:21   #144
rogue
"Mark"
Apr 2003
Between here and the

1100011010000₂ Posts

Quote:
Originally Posted by mdettweiler View Post
We found that on the Sierp. base 6 drive, we got loads and loads of barfs. For big stuff like base 22 it seems to be keeping the barfs to a minimal level, but base 6 is nonetheless a bit hard unless it's just a personal server (as we recommended for people with large #'s of cores in the first post of this thread).

Fortunately, for now PRPnet's barfs seem to be partially fixed to the point where they don't actually contaminate the results at all (which happened with some of the earlier barfs); though nonetheless, they can be rather annoying to fix. For instance, when I was processing some Sierp. base 6 results from a largish chunk of work Lennart did a few days ago, I came across a whole pile of barfs that took me about 6 hours to re-do. And that was only for a personal server.
I do have the one change to handle the buffering of messages. I know addressing that will solve some of the problems, but the others I don't have enough information on, i.e. early expiration and hanging. I'm hoping that you will eventually trigger one of those so that I can fix them.
2009-09-02, 14:25   #145
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)

3×2,083 Posts

Quote:
Originally Posted by gd_barnes View Post
Max, when the tests get extremely long, say at n>500K, would trying PRPnet for a smallish range such as n=5K or 10K make sense? I think Lennart is essentially doing that right now for his personal ranges on this drive.
Yes, at that point it should definitely be OK. In fact, come to think of it, as the tests are already getting rather large for this drive, we could probably start using PRPnet right now with no more ill effects than the occasional barfs we get from Lennart's ranges. The manual ranges have been quite popular, but if nobody has any objections to using PRPnet instead, then we could possibly take a whack at it. As I said in my last note, the problems seem to be fixed to the point where they won't contaminate the results any more, so the worst that could happen is a few tests that would need to be re-done.
Quote:
Originally Posted by rogue View Post
I do have the one change to handle the buffering of messages. I know addressing that will solve some of the problems, but the others I don't have enough information on, i.e. early expiration and hanging. I'm hoping that you will eventually trigger one of those so that I can fix them.
Are those changes in 2.3.0? I haven't gotten around to trying that version yet.
2009-09-02, 15:17   #146
rogue
"Mark"
Apr 2003
Between here and the

2⁴·397 Posts

Quote:
Originally Posted by mdettweiler View Post
Are those changes in 2.3.0? I haven't gotten around to trying that version yet.
No. The buffering of messages needs to wait for the next release.

For those of you not in "the know": when I refer to "buffering", I mean that the client will send all test results before waiting for the server to give positive acknowledgement for any of them. This way the client should not get out of sync with the server. The client will also send all info for each test in a single message rather than in multiple parts. This should help avoid issues where the server gets part of the info for a test, but not all of it.
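As a rough illustration of that idea (not the actual PRPnet code), here is a minimal Python sketch. The message layout, the `send` and `recv_ack` callbacks, and the field names are all hypothetical stand-ins for whatever transport and format the real client uses. It only shows the two points above: each test travels as one complete message, and every result is sent before any acknowledgement is read.

```python
import json


def encode_result(name, residue, seconds):
    """Pack every field of one test into a single newline-terminated message."""
    return json.dumps({"test": name, "residue": residue, "seconds": seconds}) + "\n"


def send_buffered(results, send, recv_ack):
    """Send all results first, then collect one acknowledgement per result.

    `results` is a list of (name, residue, seconds) tuples.
    `send` and `recv_ack` are hypothetical transport callbacks, e.g. thin
    wrappers around a socket. `recv_ack` returns the name of the test the
    server acknowledged, or None if nothing arrived.
    """
    # Phase 1: push every result without waiting for the server.
    for result in results:
        send(encode_result(*result))

    # Phase 2: read back the acknowledgements, one per result sent.
    acked_names = set()
    for _ in results:
        ack = recv_ack()
        if ack is not None:
            acked_names.add(ack)

    acked = [r[0] for r in results if r[0] in acked_names]
    pending = [r[0] for r in results if r[0] not in acked_names]
    return acked, pending
```

Anything left in `pending` stays in the client's queue and can be resent on the next contact, so a missed acknowledgement costs a retry rather than a lost result.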

Note that these issues are fairly rare. I have one clear example of the "out of sync" issue from Max, but have had no reports from PrimeGrid or anyone else regarding it. I can only presume that server load is somehow a piece of the puzzle behind the issue.
2009-09-02, 15:52   #147
Lennart
"Lennart"
Jun 2007

2⁵·5·7 Posts

Quote:
Originally Posted by rogue View Post
No. The buffering of messages needs to wait for the next release.

For those of you not in "the know": when I refer to "buffering", I mean that the client will send all test results before waiting for the server to give positive acknowledgement for any of them. This way the client should not get out of sync with the server. The client will also send all info for each test in a single message rather than in multiple parts. This should help avoid issues where the server gets part of the info for a test, but not all of it.

Note that these issues are fairly rare. I have one clear example of the "out of sync" issue from Max, but have had no reports from PrimeGrid or anyone else regarding it. I can only presume that server load is somehow a piece of the puzzle behind the issue.


The problem occurs if you have low n and many clients, i.e. many hits/min.
If you have 5 or more in the cache on the client, the problem comes up more often.

You need to leave one core free to give the servers some CPU power.

I have tested on a quad running 4 servers and 4 clients, and 4 servers and 3 clients. It works much better when running only 3 clients.

I have also tested running 5, 10 and 20 quads on a k*2^n-1 where n=250k.

Clients set to 100:3:

It works up to 10; when I use 20 I get the problem with missing residues.

This is done on server & client 2.3.0.

If I increase to 100:10: I get more problems.
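
To put a rough number on "many hits/min", here is a back-of-the-envelope estimate in Python. The test duration and the figure of two server contacts per test (one fetch, one report) are assumptions for illustration only, not measurements from PRPnet.

```python
def hits_per_minute(machines, cores_per_machine, minutes_per_test, hits_per_test=2):
    """Estimate how often the server is contacted per minute.

    All inputs are assumed values: `hits_per_test` counts one contact to
    fetch work and one to report it, and `minutes_per_test` depends on n.
    """
    workers = machines * cores_per_machine
    return workers * hits_per_test / minutes_per_test


# 20 quads, assuming a 10-minute test (hypothetical duration)
print(hits_per_minute(20, 4, 10))   # 16.0
# Same hardware at lower n, assuming a 1-minute test
print(hits_per_minute(20, 4, 1))    # 160.0
```

The load grows with the number of cores and shrinks as tests get longer, which matches the low-n, many-clients pattern described above.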

Lennart
2009-09-02, 16:30   #148
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)

3×2,083 Posts

Quote:
Originally Posted by Lennart View Post
The problem occurs if you have low n and many clients, i.e. many hits/min.
If you have 5 or more in the cache on the client, the problem comes up more often.

You need to leave one core free to give the servers some CPU power.

I have tested on a quad running 4 servers and 4 clients, and 4 servers and 3 clients. It works much better when running only 3 clients.

I have also tested running 5, 10 and 20 quads on a k*2^n-1 where n=250k.

Clients set to 100:3:

It works up to 10; when I use 20 I get the problem with missing residues.

This is done on server & client 2.3.0.

If I increase to 100:10: I get more problems.

Lennart
Hmm....that's interesting. I would think that having bigger cache sizes would decrease the load on the server, and lessen the barfing rather than increase it. But, then again, with bigger cache sizes, that makes for longer communications with the server, which gives a bigger window of time for the barfs to strike, which makes sense.

BTW Lennart, what size caches and how many cores were you using on your internal server for the Sierp. base 6 range 166K-200K? When I processed that range a couple of days ago I ran into about 30 blank residues, spread evenly throughout the file (thus the whole range seems to have been affected equally). However, if memory serves, I haven't encountered any blank residues at all in any of your other recent ranges either on the Riesel or Sierp. sides, so it may be something specific to how you had things set up for 166K-200K.
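
For anyone who wants to spot blank residues before processing a range, a minimal Python sketch follows. The line layout it assumes (candidate, whitespace, residue) is hypothetical and would need adjusting to whatever the server actually writes; the candidates in the example are made up.

```python
def find_blank_residues(lines):
    """Return the candidates whose result line carries no residue field.

    Assumes (hypothetically) one test per line: "<candidate> <residue>".
    """
    blanks = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue              # skip empty lines
        if len(fields) < 2:
            blanks.append(fields[0])
    return blanks


sample = [
    "10*6^170000+1 A1B2C3D4E5F60718",   # made-up candidate and residue
    "12*6^170001+1",                    # residue missing
]
print(find_blank_residues(sample))
```

The candidates it returns are the ones that would have to be re-tested.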
2009-09-02, 18:57   #149
Lennart
"Lennart"
Jun 2007

2⁵·5·7 Posts

Quote:
Originally Posted by mdettweiler View Post
Hmm....that's interesting. I would think that having bigger cache sizes would decrease the load on the server, and lessen the barfing rather than increase it. But, then again, with bigger cache sizes, that makes for longer communications with the server, which gives a bigger window of time for the barfs to strike, which makes sense.

BTW Lennart, what size caches and how many cores were you using on your internal server for the Sierp. base 6 range 166K-200K? When I processed that range a couple of days ago I ran into about 30 blank residues, spread evenly throughout the file (thus the whole range seems to have been affected equally). However, if memory serves, I haven't encountered any blank residues at all in any of your other recent ranges either on the Riesel or Sierp. sides, so it may be something specific to how you had things set up for 166K-200K.
I had 18 quads on it and that was too many.

The problem is that if you have a bigger cache and more than one client sending results, you can lose many at once.

Lennart
2009-09-02, 20:15   #150
rogue
"Mark"
Apr 2003
Between here and the

2⁴·397 Posts

Quote:
Originally Posted by Lennart View Post
I had 18 quads on it and that was too many.

The problem is that if you have a bigger cache and more than one client sending results, you can lose many at once.

Lennart
I will change the next version of the server to not run in IDLE mode. That way it won't have to compete with the client if the client is running on the same box. The server takes so little CPU, I don't expect this to be a problem. Does anyone think differently?
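For illustration only, here is how that split might look with Unix niceness in Python. This is a sketch and not how PRPnet sets its priority. The role names and both helper functions are hypothetical, and niceness 19 is only roughly analogous to an "idle" priority class.

```python
import os

IDLE_NICENESS = 19     # lowest priority on Linux; used for the number cruncher
NORMAL_NICENESS = 0    # default priority; used for the server


def target_niceness(role):
    """Server runs at normal priority, everything else at idle priority."""
    return NORMAL_NICENESS if role == "server" else IDLE_NICENESS


def lower_priority_to(niceness):
    """Raise this process's niceness to at least `niceness` (Unix only).

    An unprivileged process may only increase its niceness, so this
    never tries to decrease it. Returns the niceness now in effect.
    """
    current = os.getpriority(os.PRIO_PROCESS, 0)
    if niceness > current:
        os.setpriority(os.PRIO_PROCESS, 0, niceness)
    return os.getpriority(os.PRIO_PROCESS, 0)
```

A client would call `lower_priority_to(target_niceness("client"))` at startup, while the server would leave its priority alone, so the scheduler serves it ahead of the clients on the same box.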
2009-09-02, 20:24   #151
Lennart
"Lennart"
Jun 2007

2⁵·5·7 Posts

Quote:
Originally Posted by rogue View Post
I will change the next version of the server to not run in IDLE mode. That way it won't have to compete with the client if the client is running on the same box. The server takes so little CPU, I don't expect this to be a problem. Does anyone think differently?
I think that sounds ok.
Lennart
2009-09-03, 01:10   #152
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)

3·2,083 Posts

Quote:
Originally Posted by Lennart View Post
I think that sounds ok.
Lennart
Ditto here.
2009-09-03, 03:01   #153
rogue
"Mark"
Apr 2003
Between here and the

2⁴×397 Posts

I finally found the other issue with the blank residues. It will be fixed in the next release. I have also fixed a few other things that I discovered along the way. I'll explain when I release the code.

I am hard at work testing the release (I've had some time today since I worked on this rather than watching Man vs. Food).

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.