Twin primes expected first k formula
I am having problems trying to find an empirical formula to calculate the expected k value of the first twin prime for a given n in k*2^n+/-1. Can anyone help?
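For concreteness, the quantity being asked about - the first k giving a twin for a fixed n - can be computed by brute force for small n. A sketch (assuming k ranges over all positive integers; searches sometimes restrict k to odd values, which would shift the results):

```python
def is_prime(m):
    """Trial-division primality test; adequate for the small n used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def first_twin_k(n):
    """Smallest k such that k*2^n - 1 and k*2^n + 1 are both prime."""
    k = 1
    while True:
        m = k * 2 ** n
        if is_prime(m - 1) and is_prime(m + 1):
            return k
        k += 1

print([(n, first_twin_k(n)) for n in range(1, 8)])
```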
[quote=robert44444uk;128139]I am having problems trying to find an empirical formula to calculate the expected k value of the first twin prime for a given n in k*2^n+/-1. Can anyone help?[/quote]
Seems at least highly nontrivial to me... but I don't know if the specification of the problem is sufficiently precise. What does "expected" mean here? There must be some averaging done - on what? Should the estimate just take into account the size of n, or also properties like numdiv(), ...?
Do you mean empirical or heuristic?
For a quick and dirty heuristic we can ignore the additional information given by k, and observe that the probability that any two odd numbers x and x+2 are a twin pair is 4C/ln(x)^2, so the expected value of k is approximately 0.182 n[sup]2[/sup].

For a more refined estimate we can argue analogously to the twin prime constant. The twin prime constant can be developed by the argument that

[INDENT]the probability that either number is prime is 1/ln(x). But the probabilities aren't independent. For p>2, the probability each is not divisible by p is (p-1)/p, so the independent approximation would have the probability that neither is divisible by p as ((p-1)/p)[sup]2[/sup]. But the real probability that neither is divisible by p is (p-2)/p. So the independent probability needs to be increased by the factor p*(p-2)/(p-1)[sup]2[/sup] to compensate for the error in divisibility by p. Doing the same thing for all values of p gives the twin prime constant. p=2 must be argued separately.[/INDENT]

The presence of "k" in the formula changes this argument to:

[INDENT]the probability that either number is prime is 1/ln(x). But the probabilities aren't independent. For p>3, the probability each is not divisible by p is (p-1)/p, so the independent approximation would have the probability that neither is divisible by p as ((p-1)/p)[sup]2[/sup]. But the real probability that neither is divisible by p depends on k. With probability 1/p, p divides k and hence divides neither number. With probability (p-1)/p, p does not divide k. In this case there are p-1 values near x with one divisible by p, so the probability that neither of our pair is divisible by p is (p-3)/(p-1). Hence the probability neither of these is divisible by p is 1/p * 0 + (p-1)/p * (p-3)/(p-1) = (p-3)/p. So the independent probability needs to be increased by the factor p*(p-3)/(p-1)[sup]2[/sup] to compensate for the error in divisibility by p. Doing the same thing for all values of p gives an expression like the product expression for the twin prime constant, but with p-2 changed to p-3. p=2 and p=3 must be argued separately.[/INDENT]

So you could evaluate this adjusted twin constant. Or, if you really meant empirical rather than heuristic, you could fit the known values of k to k(n) = x * n[sup]2[/sup].
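The "quick and dirty" coefficient can be reproduced numerically: with per-k success probability 4C/ln(x)^2 and ln(x) ~ n ln 2, the expected number of k values tried before the first twin is ln(2)^2/(4*C2) * n^2 ~ 0.182 n^2, where C2 is the twin prime constant. A sketch (the finite product bound 10^5 is an arbitrary truncation, good to several decimal places):

```python
from math import log

def twin_prime_constant(limit=10**5):
    """Approximate C2 = prod over odd primes p of p(p-2)/(p-1)^2 by a finite product."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    c2 = 1.0
    for p in range(2, limit + 1):
        if sieve[p]:
            for m in range(p * p, limit + 1, p):
                sieve[m] = 0
            if p > 2:
                c2 *= p * (p - 2) / (p - 1) ** 2
    return c2

C2 = twin_prime_constant()        # ~0.66016
coeff = log(2) ** 2 / (4 * C2)    # ~0.182: expected first k ~ coeff * n^2

def expected_first_k(n):
    return coeff * n * n

print(round(coeff, 3))            # 0.182
print(round(expected_first_k(2000)))
```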
Thank you for your responses.

Apologies for using empirical loosely. Actually I mean empirical as well as heuristic!

[url]http://www.mersenneforum.org/showthread.php?p=128211#post128211[/url] posts 33 and 41 have the factual first k for n from 1 to 2000.

Using 0.182 n^2 has the advantage that it is simple and depends only on n, but it appears to give results that are too low. For example, at n=2000 the formula gives k=728000, but a moving average over the previous 50 n gives a k value in excess of 1 million for all n>1740. An important observation is that the relationship of k to n is polynomial, not logarithmic.

The dependence on k only comes from the fact that we are looking for the first k. This k has the additional property that no smaller k provides a twin. So I would think some formula might exist that evaluates the product of the probabilities that every k up to the chosen value fails to give a twin, and then deduces a multiplier constant for the resulting probability product to match the average observed values. How one does this I do not know.

In terms of probability, one empirical hunch I had is that A=ln(ln(k/n)) --> 2. 87% of A values for n from 1501 to 2000 fall in the range 1.5 to 2.1, with most exceptions between 2 and 2.1, and the trend of A values looks to be increasing quite slowly - at n=2000, A is approximately 1.73 on a 50-value moving average and 1.77 on a logarithmic trend line.

Regarding variables, I am looking for a formula that depends only on n.

My reason for pursuing this is to prove (at least to myself) that randomly targeting n=333333, for example, is not the most efficient way to find a large twin. To begin with, 333333 might be a rogue n with no small k values that are twins. The A>1.5 observation suggests that it is not worth testing small k, which is what has been done by the Twin Prime Search. It might be more sensible to start testing for a twin close to the expected k and then expand out (up and down) from there.
Certainly testing more than one n seems sensible, as it is possible to apply portfolio theory to overcome the reliance on a single n.
And here is another thing -

An analysis of the first k for n from 1 to 3000 shows a preponderance of 0 mod 5 and 0 mod 7, and a lesser preponderance towards 0 mod 11.

I can understand that k should be 0 mod 3 (except for n=2): otherwise one of k*2^n+/-1 will be 0 mod 3 if the other is prime.

k = x mod 5:
x count
0 1016
1 489
2 498
3 487
4 510

I suppose this is because k = (5#/2)*Y = 15Y, for some integer Y, guarantees that if k*2^n-1 is prime then k*2^n+1 is not 0 mod 5, whereas any other random k might produce this condition on average 50% of the time.

Results for x mod 7:
0 591
1 387
2 400
3 428
4 417
5 403
6 374

and x mod 11:
0 335
1 275
2 270
3 258
4 289
5 270
6 262
7 225
8 256
9 296
10 264
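The "k must be 0 mod 3" observation is a pure covering argument and can be verified exhaustively for small cases: whenever 3 does not divide k, k*2^n is 1 or 2 mod 3, so one of k*2^n +/- 1 is divisible by 3 (and the pair can only be twin primes if that member is 3 itself, as for k=1, n=2 giving 3, 5). A small check:

```python
def has_mod3_obstruction(k, n):
    """True if one of k*2^n - 1, k*2^n + 1 is divisible by 3."""
    t = (k * pow(2, n, 3)) % 3
    return (t - 1) % 3 == 0 or (t + 1) % 3 == 0

for k in range(1, 100):
    for n in range(1, 50):
        if k % 3 != 0:
            assert has_mod3_obstruction(k, n)
        else:
            assert not has_mod3_obstruction(k, n)
print("verified")   # prints "verified"
```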
An analysis of the first 4000 k values, with found first twin k=y and n=x, provides "best fit" equations, according to Microsoft Excel:

y = 0.3049x^1.9514 with R2 = 0.6926

The power series seems to consistently underestimate the moving average (over 100 n), which appears to be increasing along its own curve, which mimics the polynomial

y = 0.2566x^2 + 383.78x - 174906 (with R2 = 0.2964)

rather well. (I am mystified why the power equation produces such a high R2 compared to the polynomial.)

Plotting the moving average (over the previous 100 n) provides a best fit:

y = 0.1498x^2.1119 with R2 = 0.9947

And the average over 100 n, looking forward 50 n and back 50 n:

y = 0.5442x^1.951 with R2 = 0.9956

I am taking the actual data to much higher levels to see whether I can get a better feel for it. When the values are viewed graphically they appear like atoms bubbling off a substrate, especially when a relatively narrow band of values is taken, such as, say, n=3000 to n=4000.
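Excel's power-law trend line y = a*x^b is just a least-squares fit on (ln x, ln y). A sketch of the same fit in code, on synthetic data (the coefficients and noise here are illustrative assumptions, not the thread's actual first-k data):

```python
import math
import random

def fit_power_law(xs, ys):
    """Fit y = a * x**b by ordinary least squares on (ln x, ln y)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx = sum(lx) / n
    my = sum(ly) / n
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / \
        sum((u - mx) ** 2 for u in lx)
    a = math.exp(my - b * mx)
    return a, b

random.seed(1)
xs = list(range(100, 4001, 100))
# noisy synthetic data following y ~ 0.3 * x^1.95
ys = [0.3 * x ** 1.95 * random.uniform(0.5, 2.0) for x in xs]
a, b = fit_power_law(xs, ys)
print(a, b)   # b should come out near 1.95
```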
[QUOTE=robert44444uk;129572]An analysis of the first 4000 k values, with found first twin k=y and n=x provides "best fit" equations, according to Microsoft Excel: y = 0.3049x^1.9514 with R2 = 0.6926 ... When the values are viewed graphically they appear like atoms bubbling off a substrate[/QUOTE]

You are trying to apply "curve fitting" without any use of the underlying mathematics. Anyone can fit a curve to data. Whether that curve is *meaningful* depends on whether it models the underlying process(es). In this case, your curves have no meaning.

I suggest that you look closely at Hardy & Wright's "An Introduction to the Theory of Numbers", where they make conjectures about the distribution of twin primes. Theoretically, the number of twin primes of the form k*2^n +/- 1 less than N will be O(N/[log(k*2^n+1)]^2) as either k varies or as n varies while the other is held fixed. The implied constant in the O() estimate can be computed (in theory) as a singular infinite product. Look again at Hardy & Wright's derivation of the twin prime constant.
Bob, thank you for the guidance and direction; I am certain it is spot on. I will look it up for sure in my e-copy of H&W, although I must admit I am skating at the very edge of what I can actually understand with that book. Large parts of it are unintelligible to the layman! I am slowly getting comfortable with pi_2 thanks to the excellent and confidence-building paper by DA Goldston, "Are there infinitely many twin primes?"

I will for sure come back when I understand the O and N in your formula.
[QUOTE=robert44444uk;129675]Bob, thank you for the guidance and direction, I am certain it is spot on. Will look it up for sure from my e-copy of H&W, although I must admit, I am skating at the very edge of what I can actually understand with that book. Large parts of it are unintelligible to the layman! [/QUOTE]
Yes. I quite agree. Reading H&W requires a lot of mathematical maturity and will be hard for a "layman". What I say now is no reflection on you personally: I argue that a "layman" trying to do foundation-less curve fitting to a mathematical process that the "layman" does not understand is a case of "trying to run before you can walk". It is NOT something that a "layman" can reasonably do, any more than a "layman" can do (say) brain surgery.
Tried reading H&W last night, but maybe I should try in the mornings, and not with a glass of wine in my hand.

In answer to Bob: maybe that is why I posted this here, knowing that it was beyond my abilities! If I knew how to arrive at the answer, why would I post? That is why I really appreciate the responses to the post.

The reason for curve fitting is to ensure, at least in my mind, that a formula of the kind A*n^2, which has a basis in the underlying mathematics, would be appropriate. The fact that I am getting best-fit curves of the form a*n^1.95 and b*n^2.11 suggests that A*n^2 is probably the underlying formula; even if it does not give the best fit, it would be close. wblipp has placed A=0.182 with an underlying theoretical basis involving the twin prime constant, but the results suggest A is in fact larger than that. Curve fitting over many thousands of values will at least give an approximation to A that can be retrofitted into a theory, I would have thought.

I have the confidence, I think, for the next round of curve fitting to search for a value of A. I will do this by fitting a curve A*n^2 to moving-average values that look forward 50 values and back 50 values. At least I will establish something to mull over.
A line of best fit of the form k=A*n^2 for the first 4621 n values, comparing the moving average of 50 values from n-25 to n+25, gives A=0.368, which is close to twice wblipp's A=0.182.

Now it will be interesting to back into the theory.

Applying this to n=333333, and using A=0.364, the expected first twin, on average, would be at k=40.4G. The TPS site is sieving to 100G, which is quoted as providing 90% certainty.
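The arithmetic behind that estimate, as a quick check (A=0.364 and the form k ~ A*n^2 are taken from the post above):

```python
A = 0.364          # empirically fitted coefficient from the post above
n = 333333
expected_k = A * n * n
print(f"{expected_k / 1e9:.1f}G")   # 40.4G, matching the figure quoted
```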
Looking at medians rather than averages: the median of 45 ranges of 100 is 66.3% of the average taken by adding the first ks and dividing by 100 (it looks as if it could be approximately 2/3rds), which would bring A down to 0.24, which is still higher than theory.

The median is a better measure. The average top value of k in each range of 100 is 546% of the average, or 824% of the median, so the rogue k values are providing a skew in the calculation. The measure now for n=333333 is k=26.7G - checking up to this range would provide a 50% chance of finding a twin.

Unless I am doing something horribly wrong, I think it makes more sense to check a wide range of n, and check only low values of k. Why? The 10th percentile in the blocks of 100 n has an average k value of 16.5% of the median. Checking 5 n values to 16.5% of the median k value is equivalent to checking one n to the median k value, and there is more chance a value will come up. Similar calculations at the 20th, 30th percentile etc. are:

percentile / k as % of median / effort
10 16.5% 82.4%
20 33.6% 84.0%
30 51.6% 86.0%
40 73.3% 91.6%
50 100.0% 100.0%
60 131.1% 109.2%
70 179.0% 128.8%
80 239.6% 149.8%
90 341.3% 189.6%
100 824.1% 412.0%

If the 333333 search is going as high as the 90% level then it appears that they are wasting a lot of effort. A better strategy, based on this analysis, would be to check 200 n values (333334 to 333533) up to 0.82% of the calculated k (219M). This would provide close to a 100% chance while checking only 43.8G of k. Oh, and checking lower values of k is faster!
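One reading that reproduces the effort column to within rounding: to match the 50% success chance of testing a single n up to its median first-k, test 50/q different n values each up to the q-th percentile depth, so the effort is (50/q) times the percentile's k as a fraction of the median. A sketch using the percentages quoted above (this interpretation of the effort column is my assumption):

```python
# q-th percentile of first-k within blocks of 100 n, as % of the block median
# (figures quoted in the post above)
k_pct_of_median = {10: 16.5, 20: 33.6, 30: 51.6, 40: 73.3, 50: 100.0,
                   60: 131.1, 70: 179.0, 80: 239.6, 90: 341.3, 100: 824.1}

def effort(q):
    """Relative effort of testing 50/q n-values to the q-th percentile depth,
    normalised so that one n tested to its median first-k = 100%."""
    return (50 / q) * k_pct_of_median[q]

for q in sorted(k_pct_of_median):
    print(f"{q:3d}th percentile: {effort(q):6.1f}% of effort")
```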
[QUOTE=robert44444uk;129769]The reason for curve fitting is to ensure, at least in my mind, that a formula of the kind A*n^2, which has basis in the underlying mathematics, would be appropriate. The fact that I am getting best fit curve fitting of the form a*n^1.95 and b*n^2.11 suggest that in fact, A*n^2 is probably the underlying formula, even if it does not give best fit, it would be close.[/QUOTE]

What are you measuring? What is n? The first thing in any mathematical effort is to define your variables! What is/are the dependent variable(s)? What is the independent variable?

The number of twin primes of the form k*2^n +/-1 less than N can NOT be a quadratic function of N. You say that your formula has a basis in the underlying mathematics. Please explain that basis to us.

If we allow both k and n to vary, the number of twin primes less than N of the form k*2^n +/-1 will be given by the double Stieltjes integral:

C * integral from 2 to N/2 integral from 2 to lg(N) of (1/log(k*2^n - 1))^2 d[k] d[n]

where [] designates the floor function. There will be an error term. A smaller error can be obtained by evaluating the double integral via Euler-Maclaurin. The constant C will be given by a singular series. This integral has no closed-form expression in terms of elementary functions, but can probably be expressed in terms of hypergeometric or polylog functions.
[QUOTE=R.D. Silverman;129792]... If we allow both k and n to vary, the number of twin primes less than N of the form k*2^n +/-1 will be given by the double Stieltjes integral ...[/QUOTE]

I cranked through the integral with Mathematica. Then I realized something. I confess to a mea culpa.

The actual double integral counts ALL twin primes. EVERY twin prime pair [p, p+2] is of the form [k*2^n - 1, k*2^n + 1] for SOME k. Thus, by allowing both k and n to vary, we count all of the prime pairs. One needs to fix either k or n to obtain a meaningful result.
[QUOTE=R.D. Silverman;129792]What are you measuring? What is n? The first thing in any mathematical effort is to define your variables! What is/are the dependent variable(s)? What is the independent variable?
... You say that your formula has a basis in the underlying mathematics. Please explain that basis to us.[/QUOTE]

The answers to all the questions, and the mathematical basis for a quadratic solution, are in the first three messages of this thread. Quadratic is a reasonable approximation when the fixed n is large, so that the value of ln(k*2[sup]n[/sup]-1) is nearly constant.
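The near-constancy claim is easy to check numerically: for a fixed large n, ln(k*2^n - 1) ~ ln k + n ln 2 varies only about a percent as k ranges over the realistic search region. A quick sketch (n=2000 and the k range are illustrative choices, not figures from the thread):

```python
from math import log

n = 2000                 # fixed, large
lo, hi = 1, 10**6        # illustrative range of k to search
base = log(2) * n        # ln(k * 2^n) ~ ln k + n ln 2

v_lo = log(lo) + base
v_hi = log(hi) + base
rel_variation = (v_hi - v_lo) / v_lo
print(f"{rel_variation:.3%}")   # about 1%: ln(k*2^n - 1) is nearly constant in k
```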
[QUOTE=R.D. Silverman;129792]What are you measuring? What is n? The first thing in any mathematical effort is to define your variables! What is/are the dependent variable(s)? What is the independent variable?[/QUOTE]

I am looking at the first k -- call it k(n1) -- to produce a twin in the series k*2^(n1)+/-1, with n1 fixed and k variable. So k(n1)*2^(n1)+/-1 is the first twin for the series with n1 fixed and k variable: no k<k(n1) produces a twin.

The value of k(n1) is indeterminate from a formula - otherwise twins would be rather easy to spot - but for a small range of n, say n(a) ... n1=n(a+49) ... n(a+99), I am purporting that the median value of k(n(a)) ... k(n1) ... k(n(a+99)) is approximately 0.24*(n1)^2 and the average value approximately 0.364*(n1)^2.

Hope this clarifies.

By the way, Bob, you did not define N or O; I had to guess those :smile: Seriously, I really appreciate your interest; I intend to follow up on your comments on double Stieltjes integrals.
[QUOTE=robert44444uk;129800]I am looking at the first k -- call it k(n1) -- to produce a twin in the series k*2^(n1)+/-1, n1 fixed, k variable. ... Hope this clarifies. By the way, Bob, you did not define N or O, had to guess those :smile:[/QUOTE]

O() needs no definition; it is standard notation. And I did say "number of twin primes less than N". This defines N.

The rest of your work is the same as a very well known problem, only in DISGUISE. By fixing n (or n1 in your notation), the set you are considering has the first prime of the pair in ARITHMETIC PROGRESSION. The question of when the first prime in an A.P. occurs is a very well studied problem. Here, we simply ask for the first TWIN prime when the first prime lies in a specific A.P. The theory is going to be quite similar. (Trivially the second prime is also in an A.P., offset by 2 from the first.)

Do a web search for "first prime in arithmetic progression".
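To make the A.P. connection concrete: for fixed n, the lower twin candidates k*2^n - 1 for k = 1, 2, 3, ... form the arithmetic progression with first term 2^n - 1 and common difference 2^n. A small sketch finding the first prime along such a progression (trial division is adequate at this size; Dirichlet's theorem guarantees a prime exists when gcd of the first term and difference is 1):

```python
def is_prime(m):
    """Trial-division primality test; fine for small m."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def first_prime_in_ap(a, d):
    """Smallest prime of the form a + i*d, i = 0, 1, 2, ..."""
    m = a
    while not is_prime(m):
        m += d
    return m

# The A.P. of lower twin candidates for n = 5: 31, 63, 95, 127, ...
n = 5
print(first_prime_in_ap(2 ** n - 1, 2 ** n))   # 31 = 1*2^5 - 1
```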
Your search - "first prime in arithmetic progression" - did not match any documents, per Google. But I appreciate where you are coming from; I will look into it.

Shows one should not take things too literally.
I'm doing some work on this as well. I have compiled the data of the 5000 k and n values posted in the Twin Prime Search area, and also did some calculations with Excel as to the curve of the average twin result. As was said above, though, they are pretty much meaningless when trying to use them as a guide.

I'll do some standard deviation calculations for the 80th, 90th, and 95th percentiles so as to find a more efficient range to search. If a twin is not found, the previous area can then be checked in a short time, and in any case this would have to be done to confirm it is the smallest. But this should at least shorten the time to find twins. I'll post the results in a couple of hours, I think.