mersenneforum.org  

2013-08-05, 06:09   #1
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

3×2,083 Posts
PRPnet versions 5.x.x issues

Quote:
Originally Posted by gd_barnes View Post
Mark, here is my opinion regarding using any kind of new software: Because of issues that we've had in the past, and because this project must be so exacting (unlike other projects, where primes are simply "found" and don't affect future searches the way they do at CRUS), new software must work "in production" with no bugs for at least 2-3 months before we will consider upgrading to it. If someone can tell me the latest version of PRPnet that has run bug-free in production for at least 2 months, then I would be fine with upgrading the servers to that version.
I think we'll be safe on this one - the current version, 5.2.x, has been around for about 10 months now (since last November) and has held up extremely well at PrimeGrid. I haven't seen any reports of major issues over there for quite a while now. Looking through the changelog right now, most of the focus since 5.x has been on cleaning up minor usability/performance issues with the existing features. The memory leaks you mentioned to me by email earlier should be fixed (as well as decreased memory usage in general); and 5.0.2 introduced a client/server protocol change that I think should address the issues with occasional dropped results you'd also noticed.

We definitely do need to do this upgrade - the latest 5.2 clients being used over at PrimeGrid aren't fully backwards-compatible with the 4.1.4 version we're using on port 9000 over at NPLB, and 4.3.6 is getting pretty old now as well. There haven't been any major changes to the database schema in the meantime so the upgrades should be fairly low-risk (though of course I'll be sure to take a backup of the tables first).

Last fiddled with by mdettweiler on 2013-08-05 at 06:12

2013-08-05, 08:53   #2
Lennart

"Lennart"
Jun 2007

2⁵×5×7 Posts

Quote:
Originally Posted by mdettweiler View Post
I think we'll be safe on this one - the current version, 5.2.x, has been around for about 10 months now (since last November) and has held up extremely well at PrimeGrid. I haven't seen any reports of major issues over there for quite a while now. Looking through the changelog right now, most of the focus since 5.x has been on cleaning up minor usability/performance issues with the existing features. The memory leaks you mentioned to me by email earlier should be fixed (as well as decreased memory usage in general); and 5.0.2 introduced a client/server protocol change that I think should address the issues with occasional dropped results you'd also noticed.

We definitely do need to do this upgrade - the latest 5.2 clients being used over at PrimeGrid aren't fully backwards-compatible with the 4.1.4 version we're using on port 9000 over at NPLB, and 4.3.6 is getting pretty old now as well. There haven't been any major changes to the database schema in the meantime so the upgrades should be fairly low-risk (though of course I'll be sure to take a backup of the tables first).

5.2.5 has been tested for a couple of months now and there are no issues.
5.2.6 is a no-no.
5.2.7 I have just installed on my server, and I will do some tests on it,
especially DC work.

I will move the cores on Aug 11 to join the PG challenge.

Lennart

2013-08-10, 17:48   #3
rogue

"Mark"
Apr 2003
Between here and the

1100011010000₂ Posts

Quote:
Originally Posted by mdettweiler View Post
The memory leaks you mentioned to me by email earlier should be fixed (as well as decreased memory usage in general).
If there are significant memory leaks on *nix, then I would like to know about them. I know that the Windows server has a memory leak, but that is in the MySQL ODBC driver, not in PRPNet.

2013-08-10, 19:53   #4
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

1869₁₆ Posts

Quote:
Originally Posted by rogue View Post
If there are significant memory leaks on *nix, then I would like to know about them. I know that the Windows server has a memory leak, but that is in the MySQL ODBC driver, not in PRPNet.
I haven't observed them myself - I was referring to something Gary described to me via email early this year. He's been the main one keeping an eye on the servers lately since I've been busy with work for the last couple of years.

Looking at "top" command output on jeepford right now, I'm not seeing any particular signs of memory leaks - when sorting by virtual memory image size, none of the prpserver processes show up on the list (which puts all of their image sizes at least below 293 MB, the smallest on the list right now). mysqld is using about 1.2 GB, but that's not unexpected since we're also running the (very large) NPLB stats database off the same MySQL system. You'd have to ask Gary what he was referring to...
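
A single "top" snapshot only shows current size; a leak shows up as steady growth across repeated readings. As a minimal sketch (not something PRPNet provides), the helper below fits a least-squares slope to a series of memory readings for one process. The function name `growth_per_sample` and the sample values are hypothetical, and collecting the readings (for example from top's resident-size column at fixed intervals) is assumed to happen elsewhere.

```python
from typing import Sequence


def growth_per_sample(samples: Sequence[float]) -> float:
    """Least-squares slope of memory readings taken at equal intervals.

    A slope near zero suggests stable memory use; a persistently
    positive slope over a long run is a hint (not proof) of a leak.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2                     # mean of indices 0..n-1
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


# Hypothetical readings in MB, one per sampling interval.
steady = [100.0, 100.0, 100.0, 100.0]
growing = [100.0, 110.0, 120.0, 130.0]
print(growth_per_sample(steady), growth_per_sample(growing))
```

A process that caches data will also grow for a while after startup, so the slope is only meaningful over a run long enough for the server to reach steady state.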

2013-08-10, 20:19   #5
rogue

"Mark"
Apr 2003
Between here and the

1100011010000₂ Posts

Quote:
Originally Posted by mdettweiler View Post
I haven't observed them myself - I was referring to something Gary described to me via email early this year. He's been the main one keeping an eye on the servers lately since I've been busy with work for the last couple of years.

Looking at "top" command output on jeepford right now, I'm not seeing any particular signs of memory leaks - when sorting by virtual memory image size, none of the prpserver processes show up on the list (which puts all of their image sizes at least below 293 MB, the smallest on the list right now). mysqld is using about 1.2 GB, but that's not unexpected since we're also running the (very large) NPLB stats database off the same MySQL system. You'd have to ask Gary what he was referring to...
Okay.

IMO you can migrate port 9000 to 4.3.x at any time and then to 5.2.x when Gary gives his approval.

2013-08-11, 08:40   #6
gd_barnes

May 2007
Kansas; USA

101·103 Posts

In speaking with my main tester, he has found memory leak issues in versions 5.x.x and later of PRPnet. Therefore we will be staying with PRPnet 4.3.6 on this server. Even without the issues, there are no features that are needed at CRUS that were added with 5.x.x versions.

2013-08-11, 10:30   #7
rogue

"Mark"
Apr 2003
Between here and the

2⁴·397 Posts

Quote:
Originally Posted by gd_barnes View Post
In speaking with my main tester, he has found memory leak issues in versions 5.x.x and later of PRPnet. Therefore we will be staying with PRPnet 4.3.6 on this server. Even without the issues, there are no features that are needed at CRUS that were added with 5.x.x versions.
5.x.x is very generic. I would appreciate it if you or your tester could provide me with detailed information (via e-mail, not PM) about any memory leak so that I can investigate.

On a separate note it would have been fair to me and other users of my software to have provided this information when you first knew about it. I'm under the impression that you have known about this for months and have relayed that information to others, but not to me.

2013-08-11, 18:34   #8
gd_barnes

May 2007
Kansas; USA

10403₁₀ Posts

No, I just got word of it on Aug. 5th. I don't have the knowledge to do in-depth testing of PRPnet. My tester referred to "minor/major bugs untested after fixing. Some 5.x versions cause MYSQL to timeout under heavy use." Regardless, I've been informed that versions 5.x.x just have updates for GenFer and wwww stuff...things that CRUS doesn't need. Port 1400 is running 4.3.6, which has been stable for a long time and has good stats, so there's no reason to upgrade.

The main problem that I have is that "beta" versions of software that have had some minor testing done are being espoused as "alpha" versions, which have had rigorous testing done. It strikes me as just a little concerning that a piece of software goes from version 5.0 to 5.6 in 6 months.

I don't believe in upgrading stuff just for the sake of having the newest and greatest. I realize that my thoughts on software upgrades are deemed "conservative" and/or cautious. But I feel very strongly about the integrity of this project. We have to run software that is virtually flawless and has had rigorous testing done, even when only a minor upgrade is made to the software. I'm still very bothered about the numerous bad sieve files that I/we ran across due to using a new "upgraded" version of srsieve that removed algebraic factors. I realize that newer versions work well, but it is an example of what I am talking about here. I'm very confident that primes were missed due to the bad version of srsieve. I personally caught a completely bad set of primes for both S15 and S35 that were provided by one of our most accurate and trusted testers because he didn't realize he had a bad version of srsieve. The only reason that I caught them is that the prime count was way too low...like 50-65% fewer than expected. Had it only been 20-30% fewer than expected, I would have written it off as "variance" in the # of primes. The ranges had to be completely rerun.
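
The "too few primes" check described above can be made roughly quantitative. The sketch below assumes, purely for illustration, that the number of primes in a range behaves like a Poisson count, so its standard deviation is about the square root of the expected count. The function names, the 3-sigma threshold, and the example counts are all hypothetical and are not taken from the S15/S35 ranges.

```python
import math


def shortfall_sigma(expected: float, observed: int) -> float:
    """How many Poisson standard deviations `observed` falls below `expected`."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return (expected - observed) / math.sqrt(expected)


def looks_suspicious(expected: float, observed: int, threshold: float = 3.0) -> bool:
    """Flag a range whose prime count is implausibly low under the Poisson assumption."""
    return shortfall_sigma(expected, observed) > threshold


# Hypothetical counts: the same 25% shortfall at two different scales.
print(shortfall_sigma(100, 75))   # 2.5 sigma
print(shortfall_sigma(400, 300))  # 5.0 sigma
```

Under this assumption, whether a 20-30% shortfall is ordinary variance depends on how many primes were expected in the first place: the larger the expected count, the harder a fixed percentage shortfall is to explain by chance.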

Last fiddled with by gd_barnes on 2013-08-11 at 18:43

2013-08-11, 18:50   #9
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

3×2,083 Posts

Quote:
Originally Posted by gd_barnes View Post
The main problem that I have is that "beta" versions of software that have had some minor testing done are being espoused as "alpha" versions, which have had rigorous testing done. It strikes me as just a little concerning that a piece of software goes from version 5.0 to 5.6 in 6 months.
Ahem...the current version is 5.2.7.

2013-08-11, 20:16   #10
rogue

"Mark"
Apr 2003
Between here and the

2⁴·397 Posts

You take that risk with any software, not just mine.

That MySQL times out under heavy usage is more likely an issue with the MySQL configuration or ODBC than with PRPNet. In any case, if someone can point to a specific problem with PRPNet, hopefully one that I can reproduce, that would be helpful to me.

As for 5.2.x, there are some nice enhancements to how stats are shown in the browser, such as the ability to sort columns and links between pages. There are some changes in 5.2.0 for CRUS, albeit small.

How I number releases gives you an idea regarding what types of changes are in them. Given v.r.p (version, release, patch), I typically change the version when there are significant enhancements that are likely to break some compatibility between versions of the server. I change the release for smaller enhancements. I change the patch strictly to fix bugs. That might not be ideal, but it works fairly well.
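
As a small illustration of the v.r.p scheme just described, the sketch below compares two version strings and reports which field changed first. The function and class names are hypothetical; the only rule taken from the post is the meaning of the three fields.

```python
from typing import NamedTuple


class Version(NamedTuple):
    version: int   # significant enhancements, may break server compatibility
    release: int   # smaller enhancements
    patch: int     # bug fixes only


def parse_version(text: str) -> Version:
    """Parse a string such as '5.2.7' into its three numeric fields."""
    parts = text.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected v.r.p, got {text!r}")
    return Version(*(int(p) for p in parts))


def change_kind(old: str, new: str) -> str:
    """Return the most significant field that differs between two versions."""
    a, b = parse_version(old), parse_version(new)
    if a.version != b.version:
        return "version"
    if a.release != b.release:
        return "release"
    if a.patch != b.patch:
        return "patch"
    return "same"


print(change_kind("4.3.6", "5.2.7"))  # version
print(change_kind("5.2.5", "5.2.7"))  # patch
```

Read this way, going from 5.2.5 to 5.2.7 is two bug-fix patches, while going from 4.3.6 to any 5.x.x crosses a boundary where compatibility may break.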

2013-08-12, 01:39   #11
gd_barnes

May 2007
Kansas; USA

101·103 Posts

Oops, I meant version 5.2.6 (now 5.2.7) not 5.6.

What you guys don't seem to realize is that PrimeGrid, for the most part, does not need the degree of accuracy that CRUS requires.

Mark and Max, with my tester's permission, I will cut-and-paste an email that I just got from him, with some modification for clarity and such. This is referring to PRPnet versions 5.x.x and later.

Quote:
1. Under heavy load (heavy load defined as 20 cores testing a base loaded on PRPNET with over 1M tests), the server issues an error message that MYSQL timed out. PRPNET issued the error message. Even if it is MYSQL's problem, why is PRPNET issuing the error message? If PRPNET issues a message, it must know what is wrong.

I was doing some testing on a large range of R7 at the time. I had to bust up the input into multiple loads. Load 1st. Test. Delete it. Load 2nd. Test. etc.

2. (Referring to all versions of PRPNET, not just 5.x.x and later): PRPNET can't handle the loading of over 1.7M tests for a specific base. It seems that the admin program only passes tests to PRPSERVER. PRPSERVER has some kind of logic to account for duplicate tests. I'm assuming that this is causing the load to hang when some sort of limit is reached. A. What is the limit and why? B. Shouldn't the DB be set up to reject duplicates on its own? C. MYSQL can handle billions of entries in a table, why can't PRPNET load over 1.7M?

3. When running the PRPSERVER with a sort option that includes the "a" option (age), it takes a long time to retrieve tests to send to a client. When I change the sort option to not include a, the retrieval rate is acceptable. I'm assuming since there is no index on the candidate table for age, that the entire table is accessed looking for the earliest. There is a timestamp field; LastUpdateTime; but it is not a key. Why isn't it?

4. When doing new base tests, I frequently have a situation where 2 or more primes are sent from a client during a reporting cycle. The server will handle it properly. Minutes later, the server will try to handle the 2nd or even 3rd prime again. This causes multiple entries in the PRP output file. Again, this happens under the same heavy load I mentioned above. Something is not getting cleared out correctly from the initial upload from the client or the server is just lost.
He reports that problems #1, #3, and #4 are specific to PRPNET versions 5.x.x. Only problem #2 applies to all versions. FYI he is running Windows.
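
Questions 2B and 3 in the list above (letting the database reject duplicates, and indexing the timestamp used for age ordering) can be sketched generically. The example below uses Python's built-in sqlite3 module as a stand-in for MySQL and does not reflect PRPNet's actual schema: the table name `Candidate`, the column `CandidateName`, the index name, and the sample candidates are hypothetical; only `LastUpdateTime` is a name taken from the email. MySQL's counterpart to SQLite's `INSERT OR IGNORE` is `INSERT IGNORE`.

```python
import sqlite3
from typing import Iterable, Optional


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # PRIMARY KEY makes the database itself refuse duplicate candidates.
    conn.execute(
        "CREATE TABLE Candidate ("
        " CandidateName TEXT PRIMARY KEY,"
        " LastUpdateTime INTEGER NOT NULL)"
    )
    # An index on the timestamp lets 'oldest first' avoid a full table scan.
    conn.execute("CREATE INDEX ix_candidate_age ON Candidate (LastUpdateTime)")
    return conn


def load_candidates(conn: sqlite3.Connection, names: Iterable[str], now: int) -> int:
    """Insert candidates, silently skipping duplicates; return how many were new."""
    before = conn.execute("SELECT COUNT(*) FROM Candidate").fetchone()[0]
    conn.executemany(
        "INSERT OR IGNORE INTO Candidate (CandidateName, LastUpdateTime) VALUES (?, ?)",
        [(name, now) for name in names],
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM Candidate").fetchone()[0]
    return after - before


def oldest_candidate(conn: sqlite3.Connection) -> Optional[str]:
    """Return the candidate with the earliest timestamp (ties broken by name)."""
    row = conn.execute(
        "SELECT CandidateName FROM Candidate"
        " ORDER BY LastUpdateTime, CandidateName LIMIT 1"
    ).fetchone()
    return row[0] if row else None


conn = make_db()
print(load_candidates(conn, ["3*7^10-1", "5*7^10-1"], now=100))  # 2 new rows
print(load_candidates(conn, ["5*7^10-1", "9*7^10-1"], now=200))  # 1 new row
print(oldest_candidate(conn))
```

The same uniqueness constraint would also turn a repeated prime report (problem #4) into a rejected insert rather than a second row, if reported results were stored in a table keyed this way.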

Since the issue with the bad version of srsieve, whenever anyone suggests that we upgrade any software here, I talk to my tester first. He does what I refer to as "alpha" testing, which tests multiple different scenarios under varying degrees of load, similar to what a professional-level video game company would do before putting a product on the market.

I will move this discussion about PRPNET issues to a separate thread shortly. I will also unsticky this thread and move related posts to the S6 thread since we are now running a PRPNET server on it.


Gary

Last fiddled with by gd_barnes on 2013-08-12 at 02:03
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.