mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Conjectures 'R Us (https://www.mersenneforum.org/forumdisplay.php?f=81)
-   -   Odds of prime discussion (https://www.mersenneforum.org/showthread.php?t=18765)

 TheCount 2013-09-29 01:22

Odds of prime discussion

I calculated the chance of finding prime for Top 20 Conjectures with 1k Remaining by Highest and Lowest Conjectured k for their next 100k test ranges.
R620 has the highest chance, 11.42%. I've started testing it.
Of course the higher the weight and higher the range, the longer to test.

 gd_barnes 2013-09-29 04:10

[QUOTE=TheCount;354471]I calculated the chance of finding prime for Top 20 Conjectures with 1k Remaining by Highest and Lowest Conjectured k for their next 100k test ranges.
R620 has the highest chance, 11.42%. I've started testing it.
Of course the higher the weight and higher the range, the longer to test.[/QUOTE]

Don't R702 and R916 have a much higher weight? They also have one k remaining at n=100K.

 henryzz 2013-09-29 06:27

[QUOTE=gd_barnes;354490]Don't R702 and R916 have a much higher weight? They also have one k remaining at n=100K.[/QUOTE]
They are also higher bases which means larger tests. Maybe that swung the balance.

 gd_barnes 2013-09-29 08:24

[QUOTE=henryzz;354492]They are also higher bases which means larger tests. Maybe that swung the balance.[/QUOTE]

There is very little testing time difference between base 620 and 702. At n=150000, base 620 is ~419,000 digits and base 702 is ~427,000 digits.

 TheCount 2013-09-29 12:31

Probability

[QUOTE=gd_barnes;354490]Don't R702 and R916 have a much higher weight? They also have one k remaining at n=100K.[/QUOTE]
R702 and R916 are in the list of "Top 20 Conjectures with 1k Remaining by Highest Weight":
[URL]http://www.noprimeleftbehind.net/crus/vstats_new/crus-top20.htm#Table25[/URL]

As stated in my post I only calculated the probabilities based on these tables:
- "Top 20 Conjectures with 1k Remaining by Highest Conjectured k": [URL]http://www.noprimeleftbehind.net/crus/vstats_new/crus-top20.htm#Table61;[/URL] and,
- "Top 20 Conjectures with 1k Remaining by Lowest Conjectured k: [URL]http://www.noprimeleftbehind.net/crus/vstats_new/crus-top20.htm#Table62[/URL].

I've only spent a few hours looking at the CRUS website so I might not be optimally searching yet.
This table looks more comprehensive: [URL]http://www.noprimeleftbehind.net/crus/vstats_new/crus-unproven.htm[/URL]
I plan to start a 2k's search next, so maybe I'll base it on that table.

Anyway, since you bought it up 32*702^n-1 has weight 2338, 78*916^n-1 has weight 2313. They are both tested to 100k.
If you test R702 to 200k the chance of finding a prime and so proving the conjecture is 14.12%.
If you test R916 to 200k the chance of finding a prime and so proving the conjecture is 13.43%.

My calculations are based on the Prime Number Theory. I posted this method on PrimeGrid 6 months ago. No one's told me I am wrong (or right) yet:

Why don't you add probabilities to the CRUS tables?
If your looking for a result rely on probability. If your looking to see how quick you can test a range look at difficulty.
Probability does not take account length of time to test an n or tests to be done in a range, just the chance you'll find a great result.

 gd_barnes 2013-09-29 18:34

Oh, OK. I'll take a look at those links later today. Your percentages are in line with what I would expect. We have an "odds of prime" spreadsheet with formulas created by one of the math experts here. I'll check them against that.

I think I misunderstood you previously. With this statement:

[quote]
R620 has the highest chance, 11.42%
[/quote]

I thought you were looking for the 1k base with the largest chance of finding a prime by n=200K. But with these statements:

[quote]
If you test R702 to 200k the chance of finding a prime and so proving the conjecture is 14.12%.
If you test R916 to 200k the chance of finding a prime and so proving the conjecture is 13.43%.
[/quote]

You seem to have concluded otherwise.

So we are in agreement that R702 and R916 have a better chance of prime by n=200K.

Regardless it doesn't matter to us what base you test. I just wanted you to realize that there are bases with a better chance of prime by n=200K than the one that you chose.

Gary

 gd_barnes 2013-09-29 23:14

1 Attachment(s)
I looked at your links and I don't know enough math to verify them one way or another.

According to the odds of prime spreadsheet attached, which I created based on formulas given by one of our math experts here (ID axn), here are the chance of prime percentages that I came up with for a sieve depth of P=5T, which is how far our 3 files have been sieved:

Base / # tests / % chance

[code]
R620 3875 19.5%
R702 4896 23.7%
R916 5216 24.2%
[/code]

The spreadsheet only allows entry for an average n, which is not very accurate when the nmax / nmin > ~ 1.5. So what I did was break it up into 10 mini-ranges, i.e. n=100K-110K, 110K-120K, etc., to get the expected # of primes of each and add them all up.

I'm not sure why you are showing a less chance of prime for R916 vs. R702. With bases this high, the difference in base size has little impact on % chance of finding prime. For instance, if base R702 had 5216 tests like R916 does, R702 would have a 24.9% chance of prime (vs. 24.2% for R916) so you can see there is not a lot of difference in prime chance when a base is only 30% bigger than another one if all other things are equal.

Edit: If you are using only a Nash weight to compute your chances of prime, that may explain the problem. Nash weight only works off of a sieve to P=511. Obviously a sieve to P=5T is going to be much more accurate. On our "difficulty" stats, our programming guru, Mark (rogue), uses a sieve to P=1M, which is very clearly accurate enough for determing such a stat.

 henryzz 2013-09-30 10:56

For a quick glance everything looks correct with your calculations. I will look to extend the odds of prime spreadsheet at somepoint with your ideas for ranges of n.

 TheCount 2013-09-30 14:32

[QUOTE=gd_barnes;354538]I'm not sure why you are showing a less chance of prime for R916 vs. R702. With bases this high, the difference in base size has little impact on % chance of finding prime. For instance, if base R702 had 5216 tests like R916 does, R702 would have a 24.9% chance of prime (vs. 24.2% for R916) so you can see there is not a lot of difference in prime chance when a base is only 30% bigger than another one if all other things are equal.

Edit: If you are using only a Nash weight to compute your chances of prime, that may explain the problem. Nash weight only works off of a sieve to P=511. Obviously a sieve to P=5T is going to be much more accurate. On our "difficulty" stats, our programming guru, Mark (rogue), uses a sieve to P=1M, which is very clearly accurate enough for determing such a stat.[/QUOTE]
I agree Nash weight, sieving to P=511, is not as accurate as seiving to a deeper depth to determine your weight. The weight is just multiplied by the rest of my equation so you can interchange it with any weight you like. Just have to divide that weight by a factor so it scales the same.

The probabilities you show for R620, R702 and R916 are about x1.73 more than my values. That is significant, and would have shown up with all the crunching CRUS has done over the years by now for sure. That makes me go back to my initial assumptions and one I may have made in error is that base 2 has the same density of primes as a randomly chosen set of odd numbers of the same magnitude. I divide the Nash weight by 1751 to give w=1, but maybe I should be dividing by some other value 1751/1.73 = 1012 for instance. I am happy with the rest of the maths.

So if I increase the sieve depth (to increase accuracy) and properly scale the weight my equation should be all good. From what you've said the spreadsheet only allows for an average n, so might only be accurate over small ranges requiring many calculations by parts to remain accurate. My equation is the result of an integration, so remains accurate over any range and doesn't require more than one calculation. Plus you can rearrange the equation and solve for other variables, which is really cool.

I'll have to go over your odds of prime spreadsheet and see what's going on there.

 gd_barnes 2013-09-30 14:57

I agree that the spreadsheet needs some sort of integration (calculus) so that it is accurate over a wide n-range. I wouldn't begin to claim to now how that might be done, especially in Excel. The spreadsheet was originally designed for a large range of k with millions of candidates over a small range of n. For that it is highly accurate. Even with nmax/nmin = 2, it's not too far off.

The "calculation 1" and "calculation 2" formulas are the key. You may need to contact axn here to ask how he came up with them. There also might be a couple of people here at CRUS who might have insight into how they were derived.

 Mini-Geek 2013-09-30 15:12

[QUOTE=gd_barnes;354617]I agree that the spreadsheet needs some sort of integration (calculus) so that it is accurate over a wide n-range. I wouldn't begin to claim to now how that might be done, especially in Excel. The spreadsheet was originally designed for a large range of k with millions of candidates over a small range of n. For that it is highly accurate. Even with nmax/nmin = 2, it's not too far off.[/QUOTE]

Another option is [URL="http://www.mersenneforum.org/showthread.php?p=233457#post233457"]my calcPrimes jar[/URL]. It uses the math behind the spreadsheet, along with simple arithmetic going through each number in the given input file. It should be highly accurate.

 TheCount 2013-10-04 09:01

[QUOTE=gd_barnes;354538]On our "difficulty" stats, our programming guru, Mark (rogue), uses a sieve to P=1M, which is very clearly accurate enough for determing such a stat.[/QUOTE]
What range does rogue sieve for determining the "difficulty" stats? For Nash weight you sieve the range 100,001 to 110,000.

To determine the weight for a random number I can just sieve "k*n-1" for different k values (first 10,000 say) and find the average. srsieve does not support integer sequences of this type though.

I note R916 fell pretty quickly, and with a double hit.

 rogue 2013-10-04 11:00

[QUOTE=TheCount;355202]What range does rogue sieve for determining the "difficulty" stats? For Nash weight you sieve the range 100,001 to 110,000.

To determine the weight for a random number I can just sieve "k*n-1" for different k values (first 10,000 say) and find the average. srsieve does not support integer sequences of this type though.

I note R916 fell pretty quickly, and with a double hit.[/QUOTE]

IIRC, it sieves to 1e6, but also eliminates numbers with algebraic factorizations. The difficulty is a rough guess to the number of tests needed at 1e12. The reason it sieves to 1e6 is because many of the k are larger than 1e5. It isn't perfect, but hopefully more accurate.

 gd_barnes 2013-10-04 12:19

[QUOTE=rogue;355210]IIRC, it sieves to 1e6, but also eliminates numbers with algebraic factorizations. The difficulty is a rough guess to the number of tests needed at 1e12. The reason it sieves to 1e6 is because many of the k are larger than 1e5. It isn't perfect, but hopefully more accurate.[/QUOTE]

That's sieving P-depth. I think he wants to know what n-range that it sieves. In other words, does it sieve n=100001-110000 like for Nash weight? I'm not sure about that. Do you remember?

On the P-depth, it does sieve to P=1e6 and then uses a multiplier to estimate a sieve to P=1e12...and to me it's more than a "rough guess" on the number of tests at P=1e12. I've found it to be very accurate.

 rogue 2013-10-04 14:25

[QUOTE=gd_barnes;355215]That's sieving P-depth. I think he wants to know what n-range that it sieves. In other words, does it sieve n=100001-110000 like for Nash weight? I'm not sure about that. Do you remember?

On the P-depth, it does sieve to P=1e6 and then uses a multiplier to estimate a sieve to P=1e12...and to me it's more than a "rough guess" on the number of tests at P=1e12. I've found it to be very accurate.[/QUOTE]

Yes, that is correct, it uses that range for n. It's good to know that you have found it to be very accurate. Of course you would have let me know if it were wildly off. :smile:

 TheCount 2013-10-04 18:22

OK, so the "n per 25000 at 1e12" value in the Unproven Conjectures table is derived by calling:
> srsieve -n 100001 -N 110000 -P 1e6 "k*b^n+/-1"
For each k, multiplying the results by 1.25, and adding them all together.

The point is that P=1e6 gives a more accurate weight than P=511, so I will use that from now on.

 gd_barnes 2013-10-04 20:04

[QUOTE=TheCount;355239]OK, so the "n per 25000 at 1e12" value in the Unproven Conjectures table is derived by calling:
> srsieve -n 100001 -N 110000 -P 1e6 "k*b^n+/-1"
For each k, multiplying the results by 1.25, and adding them all together.

The point is that P=1e6 gives a more accurate weight than P=511, so I will use that from now on.[/QUOTE]

To clarify, he runs srsieve only one time for each base, which sieves all k's at once usually in just a few seconds. There is no multiple runs for all k's followed by adding them all together. Actually, srsieve gets run once only for bases that have changed since the last time the page was updated. It's very efficient.

 TheCount 2013-10-06 02:18

Part of the reason I like prime finding is the fun of learning some new maths and number theory.
In the Proth/Riesel probability equation I am working on the weight w was the Nash weight divided by 1751.542 giving a coefficient of 1.00 for the average k of base 2.
The trouble with that is it assumes base 2 has the same density of primes as a randomly chosen set of odd numbers of the same magnitude.
So I want to find weight of the a randomly chosen set of odd numbers of the same magnitude as k*b^n-1.

I tried by finding the weight of the term k*n-1.
I could just average sieve runs for the first 100,000 k's but sieve programs don't deal with the term k*n-1.
Instead I used Prime Number Theory which says that the chance of a random integer x being prime is about 1/log x
For x = k*n-1, chance is about 1/log (k*n-1)

To find the probability for a fixed k over a given range we integrate from n = A to n = B.
Integrating 1/log (k*n-1) [URL="http://integrals.wolfram.com/index.jsp?expr=1%2F+Log[kx-1]&random=false"]gives[/URL] li(k*n-1)/k, where li() is the logarithmic integral.
So the number of primes, or fraction thereof, expected in a given range n = A to B for k*n-1 is:
(li(k*B-1)-li(k*A-1))/k

There is a logarithmic integral calculator here:
[URL]http://keisan.casio.com/exec/system/1180573428[/URL]

Now I use this for R620, where b=620 and k=20.
The Unproven Conjectures table has the P=1e6 weights in the n per 25000 at 1e12 column if you divide by 1.25.
For R620 the P=1e6 weight is 1007/1.25=805.6
The weight of the term 20*n-1 is (li(k*B-1)-li(k*A-1))/k = (li(20*110000-1)-li(20*100001-1))/20 = 686.885
So w = 805.6/686.885 = 1.1728

This compares to my old version where the Nash weight was 1855, so w = 1855/1751.542 = 1.0591

So the probability increases from 11.42% to 12.64% using this new w.
This is still too low compared to 19.5%. I'll have to keep thinking on it. :stirpot:

 henryzz 2013-10-06 17:09

1 Attachment(s)
TheCount's calculations fail for b != 2 as base 2 is assumed in the calculation of 1751.

I have spent today delving though the paper linked in [URL]http://www.mersenneforum.org/showpost.php?p=4563&postcount=34[/URL] and have implemented the method it uses in the spreadsheet. It produces the same results as the original method.
I also extended both methods to support ranges of n based on integration. At some point in the future I might add ranges of k as well.

In the attached spreadsheet the first tab is the original method. The second is the new method and also includes a proth weight calculation. It wouldn't surprise me if either the new or old methods have regions where they give better/worse estimates than the other. If you find they are different for certain input then please let me know. I have been unable to find an example.
I left it set up for R620 using Gary's data sieved to 5T.

 TheCount 2013-10-13 09:08

Thanks for that post henryzz. I will read the paper you linked to see what a real Mathematician thinks.

I noted with the integrated probability equation that if you make the reasonable assumptions: ln k << B ln b, and ln k << A ln b, (where w is the weight, b is the base, k is the k, testing range from n=A to n=B)
and use some logarithmic identities that the equation: #Primes = w * (ln(ln k + B*ln b) - ln(ln k + A*ln b))/ln b
becomes #Primes = w * log[sub]b[/sub]B/A
Quote a bit simpler!

Also I used the term k*b*n-1 for the weight of a random set of odd numbers of the same magnitude and I got 18.23% for R620. Much closer to 19.5%.
I'll compare all the 1 k's for the old and new methods in your spreadsheet and see what I can find.

 henryzz 2013-10-14 14:12

In this kind of spreadsheet formula simplification of that form isn't necessary. I would prefer the accuracy myself.

 TheCount 2013-10-17 12:44

1 Attachment(s)
Well, I completed checking the probability formulas for the 1k's. What I found is that:
- henryzz new method is the same as the old. Same inputs and results.
- these methods require the range to be pre-sieved. Only 21% of the 1k's are sieved.
- my new method don't require pre-sieving and is 98.9% correlated on the Riesel side and 97.0% correlated on the Sierpinski side.
- I don't know which method is more accurate.

The weight (w) I used in the spreadsheet column O, attached, is:
w = k*b*P[sub]1e6[/sub] / (li(110,000*k*b)-li(100,001*k*b))

Where li(x) ∼ x/ln(x) + x/ln(x)² + 2x/ln(x)³ + 6x/ln(x)⁴ + 24x/ln(x)⁵ + ...

[ATTACH]10364[/ATTACH]

[I]In the early 1970s, following a counting session, the Count would laugh maniacally, "AH AH AH AH AH!"[/I]

 TheCount 2013-10-27 09:19

[QUOTE=henryzz;356192]In this kind of spreadsheet formula simplification of that form isn't necessary. I would prefer the accuracy myself.[/QUOTE]

I am working on probabilities for the 2k's.
I've been checking P1e6 weights and I got different values compared to column "n per 25000 at 1e12"/1.25 in the "[URL="http://www.noprimeleftbehind.net/crus/vstats_new/crus-unproven.htm"]Unproven Conjectures[/URL]" table for these Bases:
R463, R696, R774, R588, R828, S140, S533
I've been using >srsieve -n 100001 -N 110000 -P 1e6 equation_file.txt
Where equation_file.txt has the equation with each k on a separate line.
All the other 83 2k's bases matched.
Maybe there's an error in the scripts?

 gd_barnes 2013-10-27 09:42

I am working on probabilities for the 2k's.
I've been checking P1e6 weights and I got different values compared to column "n per 25000 at 1e12"/1.25 in the "[URL="http://www.noprimeleftbehind.net/crus/vstats_new/crus-unproven.htm"]Unproven Conjectures[/URL]" table for these Bases:
R463, R696, R774, R588, R828, S140, S533
I've been using >srsieve -n 100001 -N 110000 -P 1e6 equation_file.txt
Where equation_file.txt has the equation with each k on a separate line.
All the other 83 2k's bases matched.
Maybe there's an error in the scripts?[/QUOTE]

Are you using the latest version of srsieve? All of these bases have k's that are perfect squares (Riesel) or perfect cubes (Riesel & Sierp). Where the k is a perfect square, all n==(0 mod 2) are removed and where k is a perfect cube, all n==(0 mod 3) are removed due to algebraic factors. The latest version of srsieve takes this into account.

That is one thing that Mark and I checked closely and we went through a couple of rounds of corrections when the pages were implemented.

We do appreciate any kind of error checking by anyone anywhere on the web pages or 1st postings in several of our threads. There are a large number of things that have to remain synced up and usually at some point, errors and inconsistencies will creep in.

 TheCount 2013-10-27 12:53

I was using srsieve_0.6.17 which has been the latest version since May 31, 2010 as far as I can tell.
Yes all these bases have one or two k being a perfect cube or square.
Does srsieve need some special flag to take account of perfect cubes or squares?
Or do I need to do another step before/after using srsieve?
For R463:
>srsieve -n 100001 -N 110000 -P 1e6 "216*463^n-1", I get 269 terms remaining
>srsieve -n 100001 -N 110000 -P 1e6 "356*463^n-1", I get 252 terms remaining
For R696:
>srsieve -n 100001 -N 110000 -P 1e6 "152*696^n-1", I get 705 terms remaining
>srsieve -n 100001 -N 110000 -P 1e6 "225*696^n-1", I get 1014 terms remaining
For R774:
>srsieve -n 100001 -N 110000 -P 1e6 "25*774^n-1", I get 671 terms remaining
>srsieve -n 100001 -N 110000 -P 1e6 "30*774^n-1", I get 447 terms remaining
For R588:
>srsieve -n 100001 -N 110000 -P 1e6 "3*588^n-1", I get 795 terms remaining
>srsieve -n 100001 -N 110000 -P 1e6 "16*588^n-1", I get 664 terms remaining
For R828:
>srsieve -n 100001 -N 110000 -P 1e6 "64*828^n-1", I get 404 terms remaining
>srsieve -n 100001 -N 110000 -P 1e6 "68*828^n-1", I get 676 terms remaining
For S140:
>srsieve -n 100001 -N 110000 -P 1e6 "8*140^n+1", I get 328 terms remaining
>srsieve -n 100001 -N 110000 -P 1e6 "16*140^n+1", I get 642 terms remaining
For S533:
>srsieve -n 100001 -N 110000 -P 1e6 "38*533^n+1", I get 747 terms remaining
>srsieve -n 100001 -N 110000 -P 1e6 "64*533^n+1", I get 691 terms remaining
The values I get are always higher, so it's consistent that more factors needing to be eliminated.

Keeping on top of all these conjectures is a really big task and you and others obviously put in a massive effort.
I am sure everyone appreciates it. :tu:

 TheCount 2013-10-28 04:46

Splitting into a separate thread was a good idea. I was getting offtrack from 1k's.

I got hold of srsieve v 1.0.7 and it provides the correct factors remaining for those bases with perfect cubes or squares.

My next question is should we be using srsieve_1.0.7 or the srsieve_0.6.17 that comes in the CRUS Pack for prime-search efforts for the conjectures?

 gd_barnes 2013-10-28 05:06

[QUOTE=TheCount;357690]
My next question is should we be using srsieve_1.0.7 or the srsieve_0.6.17 that comes in the CRUS Pack for prime-search efforts for the conjectures?[/QUOTE]

The latest version has shown to be bug free. You can use it.

 TheCount 2013-11-03 02:11

I've done the probabilities for the 3k's.
I am going to put the results on a [URL="http://rogerkarpin.wix.com/thecount123ks"]web page[/URL].

I checked the P1e6 weights and I got different values for R814 (k=14,44,128) and R298 (k=27,92,105).
I am using srsieve v 1.0.7 so cubes etc. should be taken care of.

For R814:
>srsieve -n 100001 -N 110000 -P 1e6 "14*814^n-1", I get 522 terms remaining
>srsieve -n 100001 -N 110000 -P 1e6 "44*814^n-1", I get 869 terms remaining
>srsieve -n 100001 -N 110000 -P 1e6 "128*814^n-1", I get 964 terms remaining
Adds up to 2355 but expected total 2305 based on "Unproven Conjectures" table.

For R298:
>srsieve -n 100001 -N 110000 -P 1e6 "27*298^n-1", I get 279 terms remaining
>srsieve -n 100001 -N 110000 -P 1e6 "92*298^n-1", I get 297 terms remaining
>srsieve -n 100001 -N 110000 -P 1e6 "105*298^n-1", I get 735 terms remaining
Adds up to 1311 but expected total 1276 based on "Unproven Conjectures" table.

All the other 2 and 3k's match.

 TheCount 2014-08-17 22:38

1 Attachment(s)
I've updated my chance spreadsheet after the recent large update by Gary.
I was posting on a webpage with Wix, but its very fiddly and time consuming. Wix does not support tables well. So decided to just post the spreadsheet instead.

Shading shows reservation, same as the CRUS website.
1k's, 2k's, 3k's, 4k's and 13k's chances are calculated.
The "Next Test End Range" is 100k more than the current "Test Limit" but you can set this to whatever you like.
Best chance for a S/R k is highlighted yellow.
For >1k's I also calculate the chance any of the k's will be prime and the chance of all the k's being primed in the next test range.
The error column is a check of the Weight against the CRUS website value. Shaded Salmon colour if it doesn't match.
For the 1k's I also calculate the chance using the Nash weight (sieve for n=100001 to 110000 to P=511) in columns R, S, T.
In columns U to AB I also calculate the chance the same way as the "odds of prime" spreadsheet. Only one in 5 of the 1k's are currently sieved to make a comparison.
For the 19 1k's where we have all 3 chance values it appears that the deeper you sieve the more the chance value seems to approach an asymptote, i.e. the more work you do to refine your chance value the more accurate it will be.

 LaurV 2014-08-18 03:01

[offtopic] Excel Tips (I consider myself an Excel Expert :P)
you can use the "count" function (no pun intended :smile:) and get rid of the last two columns; also get rid of the two empty spreadsheets, etc. and few other tricks, then the size of the file is about half. Which is good when you attach files to a forum like this with limited quota.
[/offtopic]

 TheCount 2014-09-08 15:26

"count" function! Hha Ha ha

Yes is trivial. I use Open Office so I don't pay an MS tax for every install. Using count function and removing tabs with Open Office doesn't materially reduce the file size for me. I posted a zipped version. I think its worth the quota.

Look up "The Count Censored" on You Tube for a laugh.

 All times are UTC. The time now is 07:48.