mersenneforum.org  

Old 2009-01-15, 04:10   #12
gd_barnes
 
May 2007
Kansas; USA

5·2,017 Posts

Max,

David and I are having an Email discussion about a formula for properly crediting results and primes.

The formula that Karsten used is a decent one but I want one that is more "generalized" and isn't specific to an n=333333 prime.

David, I'll get back to you later tonight or on Thurs. I'll come up with something that everyone can live with and that will properly score CPU time taken. I've been offline most of the day replacing my mobo, and Wednesdays are my night with my kids during the week.

This is not to say that we can't keep the # of total results and primes AND have a total score too. You can have separate categories/listings. The top-5000 site does that. Even if you're in the top 5 on primes, frequently you're not in the top 20-30 in score because your primes are a lot smaller than some of the people that have found gargantuan primes. There's nothing wrong with that. It gives the folks with smaller resources a shot at moving high on the list for # of primes.


Gary

Last fiddled with by gd_barnes on 2009-01-15 at 04:14
Old 2009-01-15, 04:15   #13
gd_barnes
 
May 2007
Kansas; USA

5×2,017 Posts

Quote:
Originally Posted by henryzz View Post
IMO this drive has messed up the stats a bit
drive 4 also did though for the opposite reasons

The scoring method that I will propose to David, if implemented, will fix all of that.
Old 2009-01-15, 04:40   #14
IronBits
I ♥ BOINC!
 
Oct 2002
Glendale, AZ. (USA)

3×7×53 Posts

My goal is to add another column of stats that contains the score, in addition to everything else you already see when you visit your nplb database.
... And ... only if everyone wants it.
I have no vested interest in how you guys want your database to work.
My initial goal was to get the database up and running; I have achieved that.
With AMDave's help with everything, you have what you see, and what you see is accurate to the best of our knowledge.
We are trying to add some nice stats features, as we feel any database should let you view as much of the data as you want to see or use, or not.

It is entirely up to all of you to decide what you would like to see and how the data could be presented to help you with what you do behind the scenes, like Karsten for example. Is there a certain display that you could use to help make your job easier?

Anyway, it's your stuff, you all decide...

I'm just passing through and making good on my promise to take over the original AES database and make it better and more accurate. I've reached that goal completely.

I've also been able to set up a super reliable LLRnet server for you guys to pound on.
/me toots own horn and walks away

Thanks to AMDave for all his hard work that made it all come together for you fine folks too. /tips hat

Last fiddled with by IronBits on 2009-01-15 at 04:42
Old 2009-01-15, 05:34   #15
gd_barnes
 
May 2007
Kansas; USA

2765₁₆ Posts

You have done an unbelievable job for us, David, and I think I can speak for all of us in saying: THANK YOU!!

I think everyone will agree that a "score" column is needed. If anyone thinks that it isn't, please speak now.

I need to go finish working on getting my final machine running now...may need to reload the O.S. due to a slightly different mobo. After that (likely a couple of hours), I'll post an official formula for you to use for the scoring. As indicated in the Email to you, it will be a little different for results than it is for primes.

After that, I'll get the whole k=341 situation sorted out, move a quad over to sieve Nugget's range in the sieving drive that he never sent me factors for (ARGH!!), and then I'll be able to rest easier. lol


Gary

Last fiddled with by gd_barnes on 2009-01-15 at 05:35
Old 2009-01-15, 10:50   #16
gd_barnes
 
May 2007
Kansas; USA

10011101100101₂ Posts

I have thought all along that the top-5000 site's scoring, which Karsten used as a basis for scoring our earlier drives, was WAY WAY too complicated! It can be accomplished in a far easier way without logarithms AND it is still completely fair.

Many of you may know the following:

The CPU time to process a k/n pair varies with the SQUARE of the n-value; that is:
If n=100K takes 15 secs. to process, n=200K will take ~60 secs. and n=400K will take ~240 secs.

Put more simply, a k/n pair at n=400K will take 240 / 15 = 16 times as long to process as a pair at n=100K because the n-value is 4 times as large and 4^2=16.

The CPU time to find a prime varies with the CUBE of the n-value; that is:
If a prime at n=100K takes 20 CPU hours to find, a prime at n=200K will take 160 CPU hours to find, and a prime at n=400K will take 1280 CPU hours to find.

Put more simply, a prime at n=400K will take 1280 / 20 = 64 times as long to find as a prime at n=100K because the n-value is 4 times as large and...4^3=64.
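
To spell out where the cube comes from (this is a reading of the rule of thumb above, not something stated elsewhere in the thread): the time per test grows like n^2, and for candidates sieved to a similar depth the chance that any one of them is prime falls off roughly like 1/n, so the expected number of tests per prime grows like n. Multiplying the two gives:

```latex
t_{\text{test}}(n) \propto n^{2}, \qquad
N_{\text{tests per prime}}(n) \propto n
\quad\Longrightarrow\quad
t_{\text{prime}}(n) \propto n^{2} \cdot n = n^{3}
```

Going from n=100K to n=400K therefore multiplies the time per test by 4^2=16 and the time per prime by 4^3=64, which matches the figures above.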

For those of you who didn't realize it before, now you know why you were able to find primes so quickly and in such bunches on this drive.

Now to the much improved easy formulas:

Score results in the following manner:

n-value^2 / 1e10

Score primes in the following manner:

n-value^3 / 1e15


If the notation confuses some people, 1e10 = 10^10 and 1e15 = 10^15.
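
For anyone who prefers to see it as code, here is a minimal sketch of the two formulas in Python. The function names are made up for illustration and nothing here is what the stats database actually runs; `Fraction` is used only so the arithmetic stays exact.

```python
from fractions import Fraction

RESULT_DIVISOR = 10**10  # 1e10, as given in the post
PRIME_DIVISOR = 10**15   # 1e15, as given in the post


def result_score(n: int) -> Fraction:
    """Score for one completed k/n test: n^2 / 1e10 (hypothetical helper)."""
    return Fraction(n**2, RESULT_DIVISOR)


def prime_score(n: int) -> Fraction:
    """Score for one prime found: n^3 / 1e15 (hypothetical helper)."""
    return Fraction(n**3, PRIME_DIVISOR)


# Print n, result score, prime score for a few n-values.
for n in (100_000, 200_000, 400_000, 800_000):
    print(n, float(result_score(n)), float(prime_score(n)))
```

Each doubling of n multiplies the result score by 4 and the prime score by 8, so n=800K gives 64 result points and 512 prime points.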

It's as simple as that. David, when implementing this, please make sure there are plenty of digits of internal accuracy. For instance, for a prime at n=800000, the calculation works out internally as 800000^3/1e15=5.12e17/1e15=512. Also, please make sure that the scoring field itself is quite large on the results side. If someone has 1 million results and their average test is at n=500K, they will have 25 million "results" points. I would suggest at least 12 full digits for the field (goes to 999 billion) and possibly 15 (goes to 999 trillion) for results. For primes, I think it could be 3 digits less.
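
A quick way to sanity-check the "internal accuracy" point, sketched in Python. This assumes the cube is computed in a signed 64-bit integer, which the thread does not say; the helper names are invented, and the total treats every test as sitting exactly at the average n.

```python
INT64_MAX = 2**63 - 1  # largest signed 64-bit integer


def cube_fits_int64(n: int) -> bool:
    """True if n^3 can be held in a signed 64-bit integer."""
    return n**3 <= INT64_MAX


def total_result_points(results: int, average_n: int) -> int:
    """Total 'results' points, simplifying every test to average_n."""
    return results * average_n**2 // 10**10


print(cube_fits_int64(800_000))    # 5.12e17 is well inside the range
print(cube_fits_int64(2_097_151))  # last n whose cube still fits
print(cube_fits_int64(2_097_152))  # 2^21 cubed is 2^63, one past the limit

total = total_result_points(1_000_000, 500_000)
print(total, len(str(total)))      # the 25 million points from the post, 8 digits
```

So under that assumption the intermediate n^3 is safe up to just under n=2.1M, and a 12-digit field leaves plenty of headroom over the 25 million example.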

This is so simple, it's child's play. A prime at n=100K scores 1 point, at n=200K scores 8 points, n=400K 64 points, and n=800K 512 points. A result at n=100K scores 1 point, at n=200K scores 4 points, n=400K 16 points, and n=800K 64 points.

If people object to such high total scores, we can always increase the divisor on the above but we certainly don't need to complicate things with logarithms.

Karsten, Max, Mini, Ian, and any other folks who like messing with math, please state here if you disagree with this in any manner.

David,

Let's give it a day for any objections. If there are none, please proceed with adding a column to both your results and primes stats for this scoring method.

I cannot think of a more simple way to fairly score things.


Thanks,
Gary
Old 2009-01-15, 16:50   #17
Mini-Geek
Account Deleted
 
"Tim Sorbera"
Aug 2006
San Antonio, TX USA

1000010101011₂ Posts

Sounds fine to me, but (and perhaps this is intended) a side effect of scoring this way is that in high n-ranges, luck will have a very large effect on your overall score. Consider for a moment that if a very lucky new person has his first result return prime at n=1.6M, they will receive 4096 points, while a normal result at that size would be 256, giving them the equivalent credit for 4096/256=16 results. Do we want to give such a preference in scores to blind chance? (On thinking about this a bit more, I don't think it's too big of a deal, but I still want to bring the topic up.)
The only thing I can think of to fix this would be to score primes no different from other results. (this would work just fine, but wouldn't give a nice bonus for actually finding what we're searching for)
Maybe in the future, scores for manual results and sieving could be counted as well. Maybe count sieving by the size of the number sieved out and/or the size of the factor. Manual results/primes would, of course, be counted the same way as LLRnet results/primes.
When somebody finds a prime at e.g. 1.6M (to reuse numbers from before), do they get 4096 points only or 4096+256=4352 for returning a result that happened to be prime? (i.e. is the normal result score given in addition to the bonus for finding a prime?)
Also keep in mind that with just squarings/cubings to figure the score, the units will, in the future, inflate to very large amounts, and the work of today's computers will be a tiny, tiny drop in the bucket (as, indeed, they are)...do we want to try to do something to give higher credits the earlier it happened or simply let old work's credit shrink to a negligible amount?

Edit: In regard to henry's post below mine, I think n=100K would be fine because it's a nice round number, and n=400K will seem as out-of-our-range/tiny as n=100K does now within a few years, and changing the value would make our algorithm more like top-5000's where instead of absolutes we have constantly (ok, monthly) changing scores. This reminds me of something else: when a result is returned, does the credit given round to an integer and then get stored as that, or does it continue out as a float, get added to the running float total, and then be rounded to the nearest integer only for display? It all being float in the back-end would be much better for rounding accuracy.
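
To illustrate the rounding question, here is a small Python sketch using the n^2 / 1e10 results formula from earlier in the thread. The n-value and the count of 1000 results are arbitrary picks for the example, and the function name is made up.

```python
def result_score(n: int) -> float:
    """Results formula from the thread: n^2 / 1e10."""
    return n**2 / 1e10


# 1000 results at n=350K, each worth 12.25 points.
scores = [result_score(350_000)] * 1000

# Option 1: round every result to an integer before storing it.
rounded_each_time = sum(round(s) for s in scores)

# Option 2: keep fractional scores in the back-end, round only for display.
rounded_for_display = round(sum(scores))

print(rounded_each_time, rounded_for_display)
```

Rounding each result first stores 12 points per test and loses 250 points over the 1000 results, while rounding only for display keeps the full 12250.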

Last fiddled with by Mini-Geek on 2009-01-15 at 17:03
Old 2009-01-15, 16:51   #18
henryzz
Just call me Henry
 
"David"
Sep 2007
Cambridge (GMT)

2·3·941 Posts

very good
one thing though:
what do we want 1 to be?
what you said would put it as n=100k but we don't do many tests at anywhere near n=100k so it is hard to pin a meaning to the value
i would say we should possibly have it as something recognizable like n=400k or maybe the number of digits of the lowest top 5000 prime updated monthly
Old 2009-01-15, 17:09   #19
mdettweiler
A Sunny Moo
 
Aug 2007
USA (GMT-5)

3·2,083 Posts

Quote:
Originally Posted by Mini-Geek View Post
Sounds fine to me, but (and perhaps this is intended) a side effect of scoring this way is that in high n-ranges, luck will have a very large effect on your overall score. Consider for a moment that if a very lucky new person has his first result return prime at n=1.6M, they will receive 4096 points, while a normal result at that size would be 256, giving them the equivalent credit for 4096/256=16 results. Do we want to give such a preference in scores to blind chance? (On thinking about this a bit more, I don't think it's too big of a deal, but I still want to bring the topic up.)
The only thing I can think of to fix this would be to score primes no different from other results. (this would work just fine, but wouldn't give a nice bonus for actually finding what we're searching for)
Maybe in the future, scores for manual results and sieving could be counted as well. Maybe count sieving by the size of the number sieved out and/or the size of the factor. Manual results/primes would, of course, be counted the same way as LLRnet results/primes.
When somebody finds a prime at e.g. 1.6M (to reuse numbers from before), do they get 4096 points only or 4096+256=4352 for returning a result that happened to be prime? (i.e. is the normal result score given in addition to the bonus for finding a prime?)
Also keep in mind that with just squarings/cubings to figure the score, the units will, in the future, inflate to very large amounts, and the work of today's computers will be a tiny, tiny drop in the bucket (as, indeed, they are)...do we want to try to do something to give higher credits the earlier it happened or simply let old work's credit shrink to a negligible amount?

Edit: In regard to henry's post below mine, I think n=100K would be fine because it's a nice round number, and n=400K will seem as out-of-our-range/tiny as n=100K does now within a few years, and changing the value would make our algorithm more like top-5000's where instead of absolutes we have constantly (ok, monthly) changing scores. This reminds me of something else: when a result is returned, does the credit given round to an integer and then get stored as that, or does it continue out as a float, get added to the running float total, and then be rounded to the nearest integer only for display? It all being float in the back-end would be much better for rounding accuracy.
Well, since David said before that he'd just be adding score columns to the existing stats tables, the k/n pairs and primes would still be scored separately--so we wouldn't have to worry about "lucky" users getting overly huge stats bonuses. Though, now that you mention it, having, say, a separate "overall stats" page where combined scores for both k/n pairs and primes are shown is an interesting possibility...
Old 2009-01-15, 17:13   #20
mdettweiler
A Sunny Moo
 
Aug 2007
USA (GMT-5)

3×2,083 Posts

Quote:
Originally Posted by gd_barnes View Post
I have thought all along that the top-5000 site's scoring, which Karsten used as a basis for scoring our earlier drives, was WAY WAY too complicated! It can be accomplished in a far easier way without logarithms AND it is still completely fair.

Many of you may know the following:

The CPU time to process a k/n pair varies with the SQUARE of the n-value; that is:
If n=100K takes 15 secs. to process, n=200K will take ~60 secs. and n=400K will take ~240 secs.

Put more simply, a k/n pair at n=400K will take 240 / 15 = 16 times as long to process as a pair at n=100K because the n-value is 4 times as large and 4^2=16.

The CPU time to find a prime varies with the CUBE of the n-value; that is:
If a prime at n=100K takes 20 CPU hours to find, a prime at n=200K will take 160 CPU hours to find, and a prime at n=400K will take 1280 CPU hours to find.

Put more simply, a prime at n=400K will take 1280 / 20 = 64 times as long to find as a prime at n=100K because the n-value is 4 times as large and...4^3=64.

For those of you who didn't realize it before, now you know why you were able to find primes so quickly and in such bunches on this drive.

Now to the much improved easy formulas:

Score results in the following manner:

n-value^2 / 1e10

Score primes in the following manner:

n-value^3 / 1e15


If the notation confuses some people, 1e10 = 10^10 and 1e15 = 10^15.

It's as simple as that. David, when implementing this, please make sure there are plenty of digits of internal accuracy. For instance, for a prime at n=800000, the calculation works out internally as 800000^3/1e15=5.12e17/1e15=512. Also, please make sure that the scoring field itself is quite large on the results side. If someone has 1 million results and their average test is at n=500K, they will have 25 million "results" points. I would suggest at least 12 full digits for the field (goes to 999 billion) and possibly 15 (goes to 999 trillion) for results. For primes, I think it could be 3 digits less.

This is so simple, it's child's play. A prime at n=100K scores 1 point, at n=200K scores 8 points, n=400K 64 points, and n=800K 512 points. A result at n=100K scores 1 point, at n=200K scores 4 points, n=400K 16 points, and n=800K 64 points.

If people object to such high total scores, we can always increase the divisor on the above but we certainly don't need to complicate things with logarithms.

Karsten, Max, Mini, Ian, and any other folks who like messing with math, please state here if you disagree with this in any manner.

David,

Let's give it a day for any objections. If there are none, please proceed with adding a column to both your results and primes stats for this scoring method.

I cannot think of a more simple way to fairly score things.


Thanks,
Gary
Looks good to me! However, one thing: you mentioned that we may possibly want to increase the divisors a bit for lower overall scores. I think that would be wise; otherwise, considering that most of our work is indeed *not* at n=100K, these scores are going to balloon really, really quickly. We don't want them to become unmanageable.

I'm thinking that, rather than having an n=100K k/n pair be worth 1 point, we should have our 1-point base value be at 400K--that's a little closer to the general spectrum in which we do most of our work, so the scores should be a bit more manageable.

Max
Old 2009-01-15, 17:27   #21
MyDogBuster
 
May 2008
Wilmington, DE

2×13×109 Posts

Quote:
Looks good to me! However, one thing: you mentioned that we may possibly want to increase the divisors a bit for lower overall scores. I think that would be wise; otherwise, considering that most of our work is indeed *not* at n=100K, these scores are going to balloon really, really quickly. We don't want them to become unmanageable.

I'm thinking that, rather than having an n=100K k/n pair be worth 1 point, we should have our 1-point base value be at 400K--that's a little closer to the general spectrum in which we do most of our work, so the scores should be a bit more manageable.
I agree with Max. Keep the score readable. Now if we just had a script or something that would read the manual files submitted and calculate a score, it would solve that end as well.
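
Something along these lines might do for the manual files, as a rough Python sketch only. It assumes a made-up input format of one "k n" pair per line (real manual result files will look different and would need their own parsing) and uses Max's suggestion of 1 point for a result at n=400K.

```python
from fractions import Fraction


def score_manual_results(lines, base_n=400_000):
    """Sum results scores, (n / base_n)^2 each, over 'k n' lines.

    The 'k n' line format and this function name are assumptions
    made for illustration, not an existing tool or file layout.
    """
    total = Fraction(0)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        _k_text, n_text = line.split()
        n = int(n_text)
        total += Fraction(n * n, base_n * base_n)
    return total


sample = ["341 400000", "341 200000", "", "343 800000"]
print(float(score_manual_results(sample)))  # 1 + 0.25 + 4 = 5.25
```

In practice you would pass it an open file instead of the sample list, since iterating over a file object yields lines the same way.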
Old 2009-01-16, 01:12   #22
gd_barnes
 
May 2007
Kansas; USA

5×2,017 Posts

Ref. Mini:

As Max said, the results and primes will be scored separately. The results scoring will technically be a more accurate reflection of total CPU effort expended because there is no luck involved. For the element of luck on huge primes, see the top-5000 site. It's a virtual impossibility to move into the top 15 on score unless you get lucky, even with 100-200 cores. There is an exception or two for institutions running 200-500 cores at all times but for the most part, I seriously doubt that most of the people in the top 15 are running that many because most only have 1 monster prime, hence got lucky for the # of cores that they are running. That's the nature of primes and is why we will have separate scores for results and primes.

Ref. all about making the score 1 for an n=400K prime:

That sounds good to me. The only reason I used 100K initially is because we'll get into teeny fractions of a point for the lower n-ranges. But...as you guys said, those efforts here are not the norm.

Revised formulas:

Results:
n^2 / 160e9

Primes:
n^3 / 64e15

Therefore a prime at n=100K will score 1/64th (.015625) of a point and a prime at n=50K will score 1/512th (.001953) of a point. [lol]

A result at n=100K will score 1/16th (.0625) of a point and a result at n=50K will score 1/64th (.015625) of a point.

...and of course a result or a prime at n=400K will score 1 point.
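
The revised formulas are easy to check with a few lines of Python. This is only an illustrative sketch with invented function names, using exact fractions so the 1/64 and 1/512 values come out cleanly.

```python
from fractions import Fraction

RESULT_DIVISOR = 160 * 10**9  # 160e9 = 400000^2
PRIME_DIVISOR = 64 * 10**15   # 64e15 = 400000^3


def result_score(n: int) -> Fraction:
    """Revised results formula: n^2 / 160e9."""
    return Fraction(n**2, RESULT_DIVISOR)


def prime_score(n: int) -> Fraction:
    """Revised primes formula: n^3 / 64e15."""
    return Fraction(n**3, PRIME_DIVISOR)


# Print n, result score, prime score as exact fractions.
for n in (50_000, 100_000, 400_000):
    print(n, result_score(n), prime_score(n))
```

Both divisors are just powers of 400000, which is what pins 1 point to n=400K for results and primes alike.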

David, the column will need to DISPLAY 12 digits to the left of the decimal point and 3 digits to the right, i.e. 999,999,999,999.999.

As for internally storing (not displaying) the score of each result/prime, we probably need it out to 6 decimal places. This should not be a problem because each result/prime will almost always have a score of < 16 (an n=1M prime will score 15.625), so each could be internally stored in 999.999999 format.

But when making that calculation on each result/prime, the numbers are large, i.e. an n=1M prime would be 1e6^3/64e15 or a 19-digit number divided by a 17-digit number. The division needs to be done so it is accurate to 6 decimal places.
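
One way to get a division that is accurate to 6 decimal places is decimal arithmetic rather than binary floats. The Python sketch below uses the standard `decimal` module; the function name, and the choice of `decimal` at all, are assumptions for illustration, since the thread does not say how the database does its math.

```python
from decimal import Decimal

PRIME_DIVISOR = Decimal(64 * 10**15)  # 64e15 from the revised formula
SIX_PLACES = Decimal("0.000001")      # storage precision from the post


def stored_prime_score(n: int) -> Decimal:
    """Prime score n^3 / 64e15, rounded to 6 decimal places for storage."""
    return (Decimal(n**3) / PRIME_DIVISOR).quantize(SIX_PLACES)


print(stored_prime_score(1_000_000))  # the n=1M example: 15.625000
print(stored_prime_score(50_000))     # 1/512 rounded to 6 places

# Display format with thousands separators and 3 decimals.
print(format(Decimal("25000000"), ",.3f"))
```

Python builds `n**3` as an exact integer before the division, so the 19-digit numerator loses nothing on the way in.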

[With the scores much lower, 9 digits to the left of the decimal might be sufficient for displaying folks' total scores but we may as well plan for the long-term!]

And finally:

Combining the 2 scores would not make sense. Using the above, the results scores will be so much higher that they will overwhelm the primes scores so as to make such a combined score virtually meaningless. For example: If there is a 1 in 5000 chance of an n=400K result being prime, on average, a person will score 5000 results points and 1 prime point. Hence they must be kept separate.
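
The 5000-to-1 imbalance can be reproduced in a few lines of Python. This is a sketch of the expected-value arithmetic only: the 1-in-5000 odds are the illustrative figure from the paragraph above, and the function names are invented.

```python
from fractions import Fraction


def result_score(n: int) -> Fraction:
    """Revised results formula: n^2 / 160e9."""
    return Fraction(n**2, 160 * 10**9)


def prime_score(n: int) -> Fraction:
    """Revised primes formula: n^3 / 64e15."""
    return Fraction(n**3, 64 * 10**15)


def expected_points(n: int, tests: int, odds_one_in: int):
    """Expected (results points, primes points) for `tests` tests at n."""
    result_points = tests * result_score(n)
    prime_points = Fraction(tests, odds_one_in) * prime_score(n)
    return result_points, prime_points


results, primes = expected_points(400_000, tests=5000, odds_one_in=5000)
print(results, primes)
```

With those inputs the expected totals are 5000 results points against 1 primes point, so a simple sum of the two would be almost entirely results.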


Gary

Last fiddled with by gd_barnes on 2009-01-16 at 01:20

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.