Old 2008-03-08, 07:45   #1
robert44444uk (Jun 2003, Oxford, UK)

Twin primes expected first k formula

I am having problems trying to find an empirical formula for the expected value of the first k that makes k*2^n+/-1 a twin prime pair, for a given n. Can anyone help?

Old 2008-03-08, 20:33   #2
m_f_h (Feb 2007)

Quote:
Originally Posted by robert44444uk View Post
I am having problems trying to find an empirical formula for the expected value of the first k that makes k*2^n+/-1 a twin prime pair, for a given n. Can anyone help?
seems at least highly nontrivial to me...
but I don't know if the specification of the problem is sufficiently precise...
what does "expected" mean here? there must be some averaging done - over what? should the estimate just take into account the size of n, or also properties like numdiv(), ...?

Old 2008-03-09, 01:37   #3
wblipp ("William", May 2003, New Haven)

Do you mean empirical or heuristic?

For a quick and dirty heuristic we can ignore the additional information given by k, and observe that the probability that two odd numbers x and x+2 are a twin prime pair is about 4C/ln(x)^2, where C is the twin prime constant (about 0.6602). Since x = k*2^n, ln(x) is roughly n*ln(2), so the expected value of k is approximately

0.182 n^2
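
If you want to see where 0.182 comes from, here is a quick sketch of the calculation (the infinite product for C is truncated at primes below 10^6; the names are just illustrative):

Code:
import math

def primes_upto(limit):
    """Simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i*i::i] = bytearray(len(sieve[i*i::i]))
    return [p for p in range(limit + 1) if sieve[p]]

# C = product over p > 2 of p*(p-2)/(p-1)^2, truncated at 10^6;
# this converges quickly to the twin prime constant ~0.6601618.
C = 1.0
for p in primes_upto(10**6):
    if p > 2:
        C *= p * (p - 2) / (p - 1) ** 2

# x = k*2^n, so ln(x) is about n*ln(2).  Setting the pair
# probability 4C/ln(x)^2 equal to 1/k gives k ~ (ln 2)^2/(4C) * n^2.
print(C, math.log(2) ** 2 / (4 * C))   # ~0.66016  ~0.18195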

For a more refined estimate we can argue analogously to the twin prime constant. The twin prime constant can be developed by the argument that the probability that either number is prime is 1/ln(x), but the probabilities aren't independent. For p>2, the probability that each is not divisible by p is (p-1)/p, so the independent approximation would give the probability that neither is divisible by p as ((p-1)/p)^2. But the real probability that neither is divisible by p is (p-2)/p. So the independent probability needs to be increased by the factor p*(p-2)/(p-1)^2 to compensate for the error in divisibility by p. Doing the same thing for all values of p gives the twin prime constant. p=2 must be argued separately.
The presence of "k" in the fomula changes this argument to:
the probability that either number is prime is 1/ln(x). But the probabilities aren't independent. For p>3, the probability each is not divisible by p is (p-1)/p, so the independent approximation would have the probability that neither is divisible by p as ((p-1)/p)2. But the real probability that neither is divisible by p depends on k. With probability 1/p, p divides k and hence divides neither number. With probability (p-1)/p p does not divide k. In this case there are p-1 values near x with one divisible p, so the probability neither of our pair is divisible by p is (p-3)/(p-1). Hence the probability neither of these is divisible by p is

1/p * 0 + (p-1)/p * (p-3)/(p-1) = (p-3)/p

So the independent probability needs to increased by the factor p*(p-3)/(p-1)2 to compensate for the error in divisibility by p. Doing the same thing for all values of p gives an expression like the product expression for the twin prime constant, but with p-2 changed to p-3. p=2 and p=3 must be argued separately.
So you could evaluate this adjusted twin constant. Or, if you really meant empirical rather than heuristic, you could fit the known values of k to k(n) = x * n^2.
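
For the empirical route, the least-squares value of x has a closed form. A sketch, assuming the (n, first k) pairs are already in hand:

Code:
def fit_coefficient(pairs):
    """Least-squares x for k(n) = x * n^2: minimizing
    sum (k_i - x*n_i^2)^2 gives x = sum(k_i*n_i^2) / sum(n_i^4)."""
    num = sum(k * n * n for n, k in pairs)
    den = sum(n ** 4 for n, _ in pairs)
    return num / den

# x = fit_coefficient(data)   # data = [(n, first_k), ...]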

Old 2008-03-09, 07:20   #4
robert44444uk (Jun 2003, Oxford, UK)

Thank you for your responses.

Apologies for using empirical loosely. Actually I mean empirical as well as heuristic!

Posts 33 and 41 of http://www.mersenneforum.org/showthr...211#post128211 have the actual first k for n from 1 to 2000.

Using 0.182 n^2 has the advantage that it is simple and depends only on n, but it appears to give results that are too low. For example, at n=2000, the formula gives k=728000, but a moving average over the previous 50 n gives a k value in excess of 1 million for all n>1740.

An important observation is that the relationship of k to n is polynomial, not logarithmic.

The dependence on k only comes from the fact that we are looking for the first k. This k has the additional property that no smaller k provides a twin. So I would think some formula might exist that evaluates the product, over every k up to the chosen value, of the probability that that k does not give a twin, and then deduces a multiplier constant for the resulting product to match the average observed values. How one does this I do not know.
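
Perhaps something along these lines (a sketch only, treating each k independently with the twin probability from wblipp's quick heuristic; C is the twin prime constant):

Code:
import math

C = 0.6601618   # twin prime constant

def prob_no_twin_up_to(K, n):
    """P(no k <= K gives a twin pair), treating each k independently
    with twin probability q(k) = 4*C / ln(k*2^n)^2."""
    log_p = 0.0
    for k in range(1, K + 1):
        q = 4 * C / (math.log(k) + n * math.log(2)) ** 2
        log_p += math.log1p(-q)   # accumulate logs to avoid underflow
    return math.exp(log_p)

# The K at which this drops through 0.5 is the heuristic median
# first k for that n; a multiplier could then be fitted to the data.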

In terms of probability, one empirical hunch I had is that A = ln(ln(k/n)) --> 2. For n from 1501 to 2000, 87% of A values fall in the range 1.5 to 2.1 (most of the exceptions lying between 2 and 2.1), and the trend of A values looks to be increasing quite slowly - at n=2000, A is approximately 1.73 on a 50-value moving average and 1.77 on a logarithmic trend line.
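
For reference, the statistic is cheap to tabulate (a sketch; first_k stands for an assumed dict mapping n to its first twin k):

Code:
import math

def A(n, k):
    """The ad-hoc statistic A = ln(ln(k/n))."""
    return math.log(math.log(k / n))

# share = sum(1.5 <= A(n, first_k[n]) <= 2.1
#             for n in range(1501, 2001)) / 500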

Regarding variables, I am looking for a formula that depends only on n.

My reason for pursuing this is to prove (at least to myself) that randomly targeting n=333333, for example, is not the most efficient way to find a large twin. To begin with, 333333 might be a rogue n with no small k values that give twins. The A>1.5 observation suggests that it is not worth testing small k, which is what the Twin Prime Search has been doing. It might be more sensible to start testing for a twin close to the expected k and then expand outwards (up and down) from there. Certainly testing more than one n seems sensible, as it is possible to apply portfolio theory to overcome the reliance on a single n.

Old 2008-03-17, 13:36   #5
robert44444uk (Jun 2003, Oxford, UK)

And here is another thing -

An analysis of the first k for n from 1 to 3000 shows a preponderance of k=0mod5 and k=0mod7, and a lesser preponderance towards k=0mod11.

I can understand that k should be 0mod3 (except for n=2), since otherwise one of k*2^n+/-1 will be 0mod3 if the other is prime.

k=xmod5:

x count
0 1016
1 489
2 498
3 487
4 510

I suppose this is because k = 5#*Y/2, where Y is some integer, guarantees that if k*2^n-1 is prime, then k*2^n+1 is not 0mod3, whereas any other random k might produce this condition on average 50% of the time.

Results for xmod7:

0 591
1 387
2 400
3 428
4 417
5 403
6 374

and xmod11:

0 335
1 275
2 270
3 258
4 289
5 270
6 262
7 225
8 256
9 296
10 264
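
The tallies above are easy to reproduce - a sketch, again with first_k an assumed dict of n -> first twin k:

Code:
from collections import Counter

def residue_tally(first_k, m):
    """Count how often the first twin k falls in each class mod m."""
    return Counter(k % m for k in first_k.values())

# for m in (5, 7, 11):
#     print(m, sorted(residue_tally(first_k, m).items()))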


Old 2008-03-24, 12:03   #6
robert44444uk (Jun 2003, Oxford, UK)

An analysis of the first 4000 k values, taking the found first twin k as y and n as x, provides "best fit" equations according to Microsoft Excel:

y = 0.3049x^1.9514 with R^2 = 0.6926

The power fit seems to consistently underestimate the moving average (over 100 n), which appears to be increasing along its own curve, one which mimics the polynomial

y = 0.2566x^2 + 383.78x - 174906 (with R^2 = 0.2964)

rather well.

(I am mystified why the power equation produces such a high R^2 compared to the polynomial.)

Plotting the moving average (over the previous 100 n) provides a best fit:

y = 0.1498x^2.1119 with R^2 = 0.9947

And the average over 100 n, looking forward 50 n and back 50 n:

y = 0.5442x^1.951 with R^2 = 0.9956
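
Excel's power trendline is just least squares on the logs, so the same fit can be sketched directly (data being the assumed list of (n, first k) pairs):

Code:
import math

def power_fit(pairs):
    """Fit y = a * x^b by simple linear regression on
    (ln x, ln y), which is what Excel's power trendline does."""
    xs = [math.log(x) for x, _ in pairs]
    ys = [math.log(y) for _, y in pairs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    b = (sum((u - mx) * (v - my) for u, v in zip(xs, ys))
         / sum((u - mx) ** 2 for u in xs))
    a = math.exp(my - b * mx)
    return a, b

# a, b = power_fit(data)   # data = [(n, first_k), ...]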

I am taking the actual data to much higher levels to see whether I can get a better feel for it.

When the values are viewed graphically they appear like atoms bubbling off a substrate, especially when a relatively narrow band of values is taken, such as, say, n=3000 to n=4000.


Old 2008-03-25, 11:37   #7
R.D. Silverman (Nov 2003)

Quote:
Originally Posted by robert44444uk View Post
An analysis of the first 4000 k values, taking the found first twin k as y and n as x, provides "best fit" equations according to Microsoft Excel... <snip>

You are trying to apply "curve fitting" without any use of the underlying mathematics. Anyone can fit a curve to data. Whether that curve is *meaningful* depends on whether it models the underlying process(es).

In this case, your curves have no meaning.

I suggest that you look closely at Hardy & Wright's "An Introduction to the Theory of Numbers", where they make conjectures about the distribution of twin primes.

Theoretically, the number of twin primes of the form k*2^n +/- 1 less than N will be O(N / [log(k*2^n+1)]^2) as either k varies or as n varies while the other is held fixed. The implied constant in the O() estimate can be computed (in theory) as a singular infinite product. Look again at Hardy & Wright's derivation of the twin prime constant.

Old 2008-03-25, 12:44   #8
robert44444uk (Jun 2003, Oxford, UK)

Bob, thank you for the guidance and direction; I am certain it is spot on. I will look it up for sure in my e-copy of H&W, although I must admit I am skating at the very edge of what I can actually understand with that book. Large parts of it are unintelligible to the layman! I am slowly getting comfortable with pi_2 thanks to the excellent and confidence-building paper by D.A. Goldston, "Are there infinitely many twin primes?"

I will for sure come back when I understand the O and N in your formula.


Old 2008-03-25, 13:54   #9
R.D. Silverman (Nov 2003)

Quote:
Originally Posted by robert44444uk View Post
Bob, thank you for the guidance and direction; I am certain it is spot on. I will look it up for sure in my e-copy of H&W, although I must admit I am skating at the very edge of what I can actually understand with that book. Large parts of it are unintelligible to the layman!
Yes. I quite agree. Reading H&W requires a lot of mathematical maturity and will be hard to read for a "layman".

What I say now is no reflection on you personally:

I argue that a "layman" trying to do foundation-less curve
fitting to a mathematical process that the "layman" does not understand
is a case of "trying to run before you walk". It is NOT something that
a "layman" can reasonably do, any more than a "layman" can do (say)
brain surgery.

Old 2008-03-26, 04:32   #10
robert44444uk (Jun 2003, Oxford, UK)

Tried reading H&W last night, but maybe I should try in the mornings, and not with a glass of wine in my hand.

In answer to Bob: maybe that is why I posted this here, knowing that it was beyond my abilities! If I knew how to arrive at the answer, why would I post? That is why I really appreciate the responses to the post.

The reason for the curve fitting is to satisfy myself, at least, that a formula of the kind A*n^2, which has a basis in the underlying mathematics, would be appropriate. The fact that I am getting best-fit curves of the form a*n^1.95 and b*n^2.11 suggests that A*n^2 is probably the underlying form; even if it does not give the best fit, it would be close.

wblipp has derived A=0.182 from an underlying theoretical basis involving the twin prime constant, but the results suggest A is in fact larger than that. Curve fitting over many thousands of values will at least give an approximation to A that can be retrofitted into a theory, I would have thought.

I think I have the confidence for the next set of curve fitting: to search for a value of A by fitting a curve A*n^2 to moving average values that look forward 50 values and back 50 values. At least I will establish something to mull over.
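
A sketch of that procedure (first_k again an assumed dict of n -> first twin k):

Code:
def centered_fit(first_k, window=50):
    """Fit A in k = A*n^2 against a centered moving average of the
    first-k values (window values back, window values forward)."""
    ns = sorted(first_k)
    num = den = 0.0
    for i in range(window, len(ns) - window):
        n = ns[i]
        avg = (sum(first_k[ns[j]] for j in range(i - window, i + window + 1))
               / (2 * window + 1))
        num += avg * n * n   # least squares: A = sum(avg*n^2) / sum(n^4)
        den += n ** 4
    return num / den

# A = centered_fit(first_k)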


Old 2008-03-26, 07:23   #11
robert44444uk (Jun 2003, Oxford, UK)

The line of best fit of the form k = A*n^2 for the first 4621 n values, comparing against the moving average of 50 values from n-25 to n+25, appears to have A=0.368, which is close to twice wblipp's A=0.182.

Now it will be interesting to back into the theory.

Applying this to n=333333 with A=0.364, the expected first twin would be at k=40.4G on average. The TPS site is sieving to 100G, which is quoted as providing 90% certainty.
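
As a quick check of the arithmetic:

Code:
n, A = 333333, 0.364
print(A * n * n / 1e9)   # ~40.4, i.e. the expected first twin near k = 40.4G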
