mersenneforum.org  

2013-08-12, 06:53   #12
Lennart
"Lennart"
Jun 2007
2⁵×5×7 Posts

Quote:
Originally Posted by gd_barnes View Post
Oops, I meant version 5.2.6 (now 5.2.7) not 5.6.

What you guys don't seem to realize is that PrimeGrid, for the most part, does not need the degree of accuracy that CRUS requires.

Mark and Max, with my tester's permission, I will cut-and-paste an Email that I just got from him with some modification for clarity and the such. This is referring to PRPnet versions 5.x.x and later.



He reports that problems #1, #3, and #4 are specific to PRPNET versions 5.x.x. Only problem #2 applies to all versions. FYI, he is running Windows.

Ever since the issue with the bad version of srsieve, whenever anyone suggests that we upgrade any software here, I talk to my tester first. He does what I refer to as "alpha" testing, which covers multiple scenarios under varying degrees of load, similar to what a professional video game company would do before putting a product on the market.

I will move this discussion about PRPNET issues to a separate thread shortly. I will also unsticky this thread and move related posts to the S6 thread since we are now running a PRPNET server on it.


Gary

None of this is a problem with the official servers here.

I agree that more than 1.7M candidates in the DB is an issue.

"I was doing some testing on a large range of R7 at the time. I had to bust up the input into multiple loads. Load 1st. Test. Delete it. Load 2nd. Test. etc."

I also have this problem when I have about 20-100 cores on it and I load 500k-1M candidates at the same time. Note!! Only when I run very small candidates, like base 3 with n<80k.
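The split-load workaround, together with the earlier question of whether the database could reject duplicates on its own, can be sketched roughly as follows. This is only an illustration: it uses Python's built-in sqlite3 as a stand-in for MySQL, and the table name `candidate`, the column `name`, the function `load_in_batches`, and the batch size are all hypothetical, not PRPNet's actual schema or admin code.

```python
import sqlite3
from itertools import islice


def load_in_batches(conn, names, batch_size=50_000):
    """Insert candidate names in fixed-size batches; return how many rows were new."""
    # The PRIMARY KEY makes the database itself responsible for rejecting duplicates.
    conn.execute("CREATE TABLE IF NOT EXISTS candidate (name TEXT PRIMARY KEY)")
    it = iter(names)
    added = 0
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        before = conn.total_changes
        # INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
        conn.executemany(
            "INSERT OR IGNORE INTO candidate (name) VALUES (?)",
            ((n,) for n in batch),
        )
        conn.commit()  # one commit per batch keeps each transaction small
        added += conn.total_changes - before
    return added
```

Calling it twice with overlapping input only adds the names that were not already present, so a load can be restarted without a separate duplicate check in the loader.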

"3. When running the PRPSERVER with a sort option that includes the "a" option (age), it takes a long time to retrieve tests to send to a client. When I change the sort option to not include a, the retrieval rate is acceptable. I'm assuming since there is no index on the candidate table for age, that the entire table is accessed looking for the earliest. There is a timestamp field; LastUpdateTime; but it is not a key. Why isn't it?"

This happens when the time between connections is less than 1 second.


"4. When doing new base tests, I frequently have a situation where 2 or more primes are sent from a client during a reporting cycle. The server will handle it properly. Minutes later, the server will try to handle the 2nd or even 3rd prime again. This causes multiple entries in the PRP output file. Again, this happens under the same heavy load I mentioned above. Something is not getting cleared out correctly from the initial upload from the client or the server is just lost. "

I don't have any problem with this. I always export what I need from the DB.

Maybe Mark can fix this or add a conj. primefile.


Lennart

2013-08-12, 08:29   #13
henryzz
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
5,881 Posts

I hit problem 4 a while ago. It would be nice to fix that one. That was with 5.0.8.

2013-08-12, 11:57   #14
rogue
"Mark"
Apr 2003
Between here and the
1100011010000₂ Posts

Thank you for reporting this. Now I'll try to address this as best I can.

Quote:
Originally Posted by gd_barnes View Post
1. Under heavy load (heavy load defined as 20 cores testing a base loaded on PRPNET with over 1M tests), the server issues an error message that MYSQL timed out. PRPNET issued the error message. Even if it is MYSQL's problem, why is PRPNET issuing the error message? If PRPNET issues a message, it must know what is wrong.

I was doing some testing on a large range of R7 at the time. I had to bust up the input into multiple loads. Load 1st. Test. Delete it. Load 2nd. Test. etc.

2. (Referring to all versions of PRPNET, not just 5.x.x and later): PRPNET can't handle the loading of over 1.7M tests for a specific base. It seems that the admin program only passes tests to PRPSERVER. PRPSERVER has some kind of logic to account for duplicate tests. I'm assuming that this is causing the load to hang when some sort of limit is reached. A. What is the limit and why? B. Shouldn't the DB be set up to reject duplicates on its own? C. MYSQL can handle billions of entries in a table, why can't PRPNET load over 1.7M?

3. When running the PRPSERVER with a sort option that includes the "a" option (age), it takes a long time to retrieve tests to send to a client. When I change the sort option to not include a, the retrieval rate is acceptable. I'm assuming since there is no index on the candidate table for age, that the entire table is accessed looking for the earliest. There is a timestamp field; LastUpdateTime; but it is not a key. Why isn't it?

4. When doing new base tests, I frequently have a situation where 2 or more primes are sent from a client during a reporting cycle. The server will handle it properly. Minutes later, the server will try to handle the 2nd or even 3rd prime again. This causes multiple entries in the PRP output file. Again, this happens under the same heavy load I mentioned above. Something is not getting cleared out correctly from the initial upload from the client or the server is just lost.

I will note that these problems have always existed with PRPNet. They are not specific to 5.x.

1) One thing that doesn't help is that the primary key of many tables is a string, not a number. When I originally wrote PRPNet I hadn't considered that some people would try to put over 1e6 candidates into a database. I can fix this, but it is not a simple fix. As for the message, PRPNet is only echoing a message from the database. Some problems reported by the database can be addressed by PRPNet, others cannot.

2) Like #1, this is tied to the use of a string for the primary key.

3) Adding an index for LastUpdateTime is easy. If this had been an issue you could have added the index and provided that to me as feedback. You would not be the first person to have provided such feedback.

4) This is the first I've heard of this issue. I would need both server and client logs that show the issue; then I can look for the cause.
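For point 3, the effect of an index on the age column can be illustrated with a small experiment. This is a sketch under stated assumptions, not PRPNet's schema: `LastUpdateTime` is the field named in the report, but the table layout, the `CandidateName` column, the index name `ix_candidate_age`, and the use of SQLite (via Python's sqlite3) instead of MySQL are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Candidate (CandidateName TEXT PRIMARY KEY, LastUpdateTime INTEGER)"
)
# 1000 made-up candidates; larger k gets an older (smaller) timestamp.
conn.executemany(
    "INSERT INTO Candidate VALUES (?, ?)",
    ((f"{k}*3^1000+1", 1_000_000 - k) for k in range(2, 2002, 2)),
)

# "Give me the candidate that has waited longest", as an age-based sort would.
OLDEST = "SELECT CandidateName FROM Candidate ORDER BY LastUpdateTime LIMIT 1"


def plan(sql):
    """Return the detail strings from SQLite's EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_before = plan(OLDEST)  # no index on LastUpdateTime yet
conn.execute("CREATE INDEX ix_candidate_age ON Candidate (LastUpdateTime)")
plan_after = plan(OLDEST)   # the planner can now walk the index in age order
oldest = conn.execute(OLDEST).fetchone()[0]
```

Comparing `plan_before` with `plan_after` shows whether the engine still has to read and sort the whole table. On MySQL the equivalent check would be `EXPLAIN` on the server's real query after a `CREATE INDEX` on the real table.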

My question to you and your tester is this: what requirements do you have of PRPNet WRT volume, i.e. both number of records and number of users? PrimeGrid pushed the boundaries of PRPNet's capabilities, most notably on PPSElow.

2013-08-12, 14:54   #15
henryzz
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
16F9₁₆ Posts

To clarify on 4: I hit it with only 4 cores. It doesn't take much (quite possibly 2 cores).

2013-08-12, 15:44   #16
rogue
"Mark"
Apr 2003
Between here and the
2⁴·397 Posts

Quote:
Originally Posted by henryzz View Post
To clarify on 4: I hit it with only 4 cores. It doesn't take much (quite possibly 2 cores).

I run 4 cores on my work computer and I haven't seen it; that is why I'm requesting the client and server logs so that I can investigate.

2013-08-12, 18:34   #17
gd_barnes
May 2007
Kansas; USA
101×103 Posts

Quote:
Originally Posted by rogue View Post
My question to you and your tester is this: what requirements do you have of PRPNet WRT volume, i.e. both number of records and number of users? PrimeGrid pushed the boundaries of PRPNet's capabilities, most notably on PPSElow.

I would say at least 10 million tests with at least 200 cores. Does anyone else have any input on that? At NPLB, it would be nice to load a huge range (say k=600-1000 for n=1.3M-2M) into the server and just forget about it. I don't think PPSElow had tests nearly as fast as we have here (say for base 3 at n=80K like what Lennart was talking about), and I doubt that they attempted to load much more than 1 million tests into the server because the server would have bogged down.

Can I make a couple of requests? First, when one or more of these issues is fixed, I'd like to hand it off to my tester before it is introduced as a new version to the public. If it is good, then add it as a new version and publicly release it. If not, we will relay the unfixed or new issues back to you and you can attempt to fix them again. Second, I'm asking that no new features be added at the same time that one or more of these issues is fixed.

My tester doesn't have to be the "official" tester. He is a programmer who was a former co-worker and knows what he is doing. It can be anyone with extensive knowledge of databases and testing.

I personally would feel much better if this was how future versions were introduced. It would greatly reduce the number of versions, and it would mean far fewer versions out there with problems.

And finally, there seems to be some question about whether we have known about this for a long time. Approximately a couple of weeks ago, just before I left on my trip, people were requesting here that we upgrade our servers to version 5.x.x. At that point, I asked my tester to start testing such versions and get back with me. I just found out about the issues around Aug. 5th and, after getting back Friday, finally found time to post them on Sunday.


Gary

2013-08-12, 20:10   #18
henryzz
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
5881₁₀ Posts

Quote:
Originally Posted by rogue View Post
I run 4 cores on my work computer and I haven't seen it; that is why I'm requesting the client and server logs so that I can investigate.

I found it while doing some base 15 testing, I think. I ran up to 2500 with pfgw and then sieved and ran prpnet.
I will do the same again to try to break it. I am away from Thursday, so I might not be able to finish before then as I am busy this week. It might end up being after I get back on the 24th that I can provide the data.

2013-08-12, 20:12   #19
rogue
"Mark"
Apr 2003
Between here and the
2⁴·397 Posts

10 million candidates?

I can't make any promises, but I will see what I can do. Changing the primary key will certainly help, but how much, I don't know.
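To make the primary-key change concrete, here is a minimal sketch of one common approach: a compact integer key, with the candidate string kept in a separate unique column. It is an assumption about what such a change could look like, not the actual PRPNet schema; the table and column names (`Candidate`, `CandidateTest`, `CandidateId`, `Residue`) and the helper `candidate_id` are hypothetical, and SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Candidate (
    CandidateId   INTEGER PRIMARY KEY,   -- compact numeric key
    CandidateName TEXT NOT NULL UNIQUE   -- the string form, still deduplicated
);
CREATE TABLE CandidateTest (
    TestId      INTEGER PRIMARY KEY,
    CandidateId INTEGER NOT NULL REFERENCES Candidate (CandidateId),
    Residue     TEXT
);
""")


def candidate_id(name):
    """Return the numeric key for a candidate name, inserting it on first sight."""
    conn.execute(
        "INSERT OR IGNORE INTO Candidate (CandidateName) VALUES (?)", (name,)
    )
    row = conn.execute(
        "SELECT CandidateId FROM Candidate WHERE CandidateName = ?", (name,)
    ).fetchone()
    return row[0]
```

Child tables then join on the integer instead of repeating the string. How much this helps at 10 million rows is exactly the open question in the post; the sketch does not measure it.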

200 cores?

I have no ability to test that number of cores. It is more of a question of the number of cores accessing the server concurrently than anything else.

I have a request of you. Instead of you acting as a go-between for me and your tester, I would appreciate it if that person communicated with me directly via e-mail. If that person wants to remain anonymous, i.e. not reveal their name, then they can set up a dummy e-mail account somewhere and communicate with me via that means.

I doubt that I will be able to wait for your tester for everything, as that person can't test everything and some fixes are rather obvious. I would prefer to label releases as alpha or beta so that other users have the opportunity to play with the software as I release it. Another option is to put the code into sourceforge. That would allow me to tag releases when they are stable.

Last fiddled with by rogue on 2013-08-12 at 20:16

2013-08-12, 20:55   #20
henryzz
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
5,881 Posts

I just upgraded from 5.0.8 to 5.2.7 and for some reason 5.2.7 can't connect to the database. Are there any changes needed that I should be aware of? The database.ini file is the same (as 5.0.8) except I replaced prpnet2 with prpnet3 for the database name.
5.0.8 connects fine.

2013-08-12, 21:21   #21
henryzz
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
5,881 Posts

I have discovered the problem. The 5.0.8 binary I had was 32-bit. The 5.2.7 binary is 64-bit. I need to install the 64-bit driver.
Not a terribly obvious problem.
edit: solution worked. Now waiting for pfgw to get up to 2500.
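Since a 32-bit versus 64-bit mismatch like this is easy to miss, one way to check a binary before hunting for drivers is to read its header. The sketch below assumes the binaries are Windows PE executables (the post does not say which platform was used) and relies only on the documented PE layout: the offset of the PE header is stored at 0x3C, and the machine field follows the `PE\0\0` signature. The function name `pe_bitness` is made up for the example.

```python
import struct

# Machine-type values from the PE/COFF header that matter here.
MACHINE_NAMES = {0x014C: "32-bit (x86)", 0x8664: "64-bit (x64)"}


def pe_bitness(data):
    """Classify a Windows PE image from its first bytes."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # The DOS header stores the offset of the PE header at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")
    # The 2-byte machine field immediately follows the 4-byte signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_NAMES.get(machine, f"unknown machine type 0x{machine:04x}")
```

Typical use would be to read the first few kilobytes of the server executable in binary mode and pass them to `pe_bitness`; the database driver then has to match the reported architecture.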

Last fiddled with by henryzz on 2013-08-12 at 21:35

2013-08-12, 21:54   #22
gd_barnes
May 2007
Kansas; USA
101·103 Posts

Quote:
Originally Posted by rogue View Post
10 million candidates?

I can't make any promises, but I will see what I can do. Changing the primary key will certainly help, but how much, I don't know.

200 cores?

I have no ability to test that number of cores. It is more of a question of the number of cores accessing the server concurrently than anything else.

I have a request of you. Instead of you acting as a go-between for me and your tester, I would appreciate it if that person communicated with me directly via e-mail. If that person wants to remain anonymous, i.e. not reveal their name, then they can set up a dummy e-mail account somewhere and communicate with me via that means.

I doubt that I will be able to wait for your tester for everything, as that person can't test everything and some fixes are rather obvious. I would prefer to label releases as alpha or beta so that other users have the opportunity to play with the software as I release it. Another option is to put the code into sourceforge. That would allow me to tag releases when they are stable.

I will ask my tester about that and get back with you. He can test quite a lot and can push nearly 100 cores for periods of time. He found all of those issues in just a few days...3 of which had never been mentioned anywhere before to the best of my knowledge.

200 cores can be tested without an "official" release...i.e. a beta version, by placing a set of code out there for some designated mass testers.

I think sourceforge sounds like a good idea. But my thought is that release numbers should not be changed until new features or fixes are extensively tested. Perhaps I am not understanding the version numbers. I feel like every version should be fully and extensively tested by large numbers of cores before it is considered a new "version". Is that a possibility? It bothers me, for instance, when I hear Lennart say: "5.2.6 is a no no.". Had 5.2.6 been extensively tested, it would not have been an "official version". In other words, what is now 5.2.7 would be 5.2.6 and there would be no "bad" version.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.