mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Probability & Probabilistic Number Theory (https://www.mersenneforum.org/forumdisplay.php?f=115)
-   -   Expected number of primes in OEIS A007908 (https://www.mersenneforum.org/showthread.php?t=20535)

 ewmayer 2015-10-09 01:50

Expected number of primes in OEIS A007908

Recently Neil Sloane (curator of the OEIS) sent a message to the NMBRTHRY mailing list re. the sequence [url=https://oeis.org/A007908]A007908[/url]:

[quote]From: Neil Sloane [e-mail redacted to thwart site-scrapers]
Date: September 29, 2015 6:16:17 PM PDT
Subject: lovely open problem

To Number Theory List,
Consider the sequence with nth term equal to the
concatenation of the decimal numbers 1234...n ([url]https://oeis.org/A007908[/url]).
When is the first prime? The comments in A007908 say
that there should be infinitely many primes, and that there
are no primes among the first 64000 terms.
in A007908 saying that there are no primes among terms X through Y,
or, of course, that n = Z gives a (probable) prime, which would be
pretty exciting.

Best regards
Neil[/quote]
Based on the sequence comments, I see our own Charles Greathouse has done some work on the above.

Observations:

1. Based on the trivial observation that only terms ending in 1,3,7,9 have chance of being prime, of the first (say) 100 sequence terms, only 40 can possibly be prime, but in fact less than half of the 40 can be prime because

2. ...at least 2 of every 1/3/7/9-ending quartet are divisible by 3, and for some quartets every member is divisible by 3. Specifically the divisibility pattern for such 1/3/7/9-ending quartets is in the form of repeating triplets (where 0 indicates != 0 (mod 3), 1 indicates divisible by 3) ...0101 1010 1111..., thus precisely 4 of every 30 sequence terms starting with the first 30 can possibly be prime. (This is not terribly difficult to prove, but I'll let readers confirm it for themselves, as it's a fun little bit of maths.)

3. The factorizations of the remaining not-div-by-3 terms appear to be 'random', i.e. modelable by the statistics of randomly chosen odd integers of similar size.

4. Using [1-3] plus a few more simple observations and some basic number theory we can generate an expected number (or density, if one prefers) of primes for the sequence. However when I do this I get a result which is somewhat at odds with Charles' comment in the notes:
[quote]I checked that there are no primes in the first 5000 terms. Heuristically there are infinitely many, about 0.5 log log n through the n-th term.[/quote]
(I PMed CRG about the math behind his estimate but got no reply as yet.)

Here is what I get:

The odds of a randomly selected odd integer x being prime is ~2/ln(x) ... summing this for the odds-not-divisible-by-3 for terms 1-30, 31-60 and 61-90, &c, each of which intervals contains 4 possibly-prime terms, we get the expected #primes for the first few of said intervals to be:

1- 30: 0.22749
31- 60: 0.05150
61- 90: 0.02700
91-120: 0.01809
121-150: 0.01242
151-180: 0.00939
181-210: 0.00755

I did the above by hand (assisted by Pari) ... at this point in order to investigate the convergence (or not) of the estimated #primes I wrote some simple Pari code for playing with this sequence - uncomment the if(isprime...) code snip just below the update of nexpect if you want to check for prime terms, at the cost of drastically increased runtime:
[code]
ilog10 = 1/log(10);
n = 1; i = 2; nexpect = 0.;
while(i < 1000000,\
ndd = ceil(log(i+0.5)*ilog10); /* Need + 0.5 so e.g. ndd(100) comes out = 3 rather than 2 */\
pow10 = 10^ndd;\
n = pow10^2*n + pow10*i + (i+1);\
if(i%1000 == 0,\
print("i = ",i,"; nexpect = ",nexpect);\
);\
if(i%10 != 4, /* Skip terms divisible by 5 */\
if(n%3 != 0, /* Skip terms divisible by 3 */\
nexpect += 2/log(n);\
/* if(isprime(n),\
print("i = ",i+1,": n prime!");\
); */\
);\
);\
i += 2;\
);\
[/code]
(Hit <return> after pasting into a Pari shell to begin execution, and type 'nexpect' on loop exit to see the final value.)

Here are the results for successive powers of 10 from 10^3 to 10^6 - I use logarithmically constant increments here because if the resulting increments in the expectation value decrease from one power of 10 to the next we at least have hope that there may be a limit at infinity:

10^3: 0.4206922620678406265572242819
10^4: 0.4959359595134930290178514034
10^5: 0.5545675055579183966439241436
10^6: 0.6026039035873964125108005995

One might expect the summation to diverge as n --> oo based on divergence of the harmonic series - note that even knocking out fixed patterns of terms from the harmonic as we do here using divisibility patterns - does not alter the divergence property.

The reason I think the present summation may in fact converge is that due to increasing digit length of the appended numbers, the terms grow faster than log(T_n) ~ n. Thus, rather than the expected #primes being given by a harmonic sum(1/n), which diverges, it is rather given by something which is perhaps like sum(1/n^a) (a.k.a. the p-series or hyperharmonic series) or sum(1/(n * log(n)^a)) (a.k.a. the ln-series, where the summation starts at n = 2 rather than n = 1) with a > 1, both of which converge for all a > 0. Actually, on second thought, considering the logarithmic growth rate of appended numbers, perhaps log(T_n) ~ n log(n) (i.e. the expectation value governed by the ln-series with a = 1, which does in fact diverge, albeit slowly) is the correct asymptotic estimate.

However, even if the sum does diverge, it does so sufficiently slowly that the absence of primes in the ranges tested to date should not be surprising.

It's certainly an interesting problem, in any event - comments, corrections, further insights appreciated!

 science_man_88 2015-10-09 02:21

[QUOTE=ewmayer;412273]Recently Neil Sloane (curator of the OEIS) sent a message to the NMBRTHRY mailing list re. the sequence [url=https://oeis.org/A007908]A007908[/url]:
[/QUOTE]

and :

[QUOTE="https://oeis.org/A007908"]Elementary congruence arguments show that primes can occur only at indices congruent to 1, 7, 13, or 19 mod 30. - Roderick MacPhee, Oct 05 2015[/QUOTE]

 ewmayer 2015-10-09 04:00

Thanks, sm88 - did not see that thread previously, but see nothing useful about #primes estimation in it. The MacPhee comment restates my note about only 4 n's per 30 possibly yielding a prime.

 science_man_88 2015-10-09 12:32

you can make your Pari code even simpler other than to knock out the remaining multiples of five you can do:

[CODE]forstep(x=1,10000,6,...)[/CODE]
or:
[CODE]parfor(x=1,10000,...;if(x%5!=4,...))[/CODE]

for example.

because all the indexes that aren't even and don't divide by 3 are of form 6y+1 and then you can show they are 1,7,13,19,25 mod 30 and 25 mod 30 can always be taken away for the index because it will always end in 5.

 ewmayer 2015-10-17 21:24

The prime-odds have just been raised! (Actually the odds remain the same, it is my estimate thereof which has increased.)

[QUOTE=ewmayer;412273]3. The factorizations of the remaining not-div-by-3 terms appear to be 'random', i.e. modelable by the statistics of randomly chosen odd integers of similar size.[/QUOTE]

That statement neglects that by construction, the remaining terms are not divisible by 3 or 5, i.e. if the factorization is indeed random (or at least random as far as aggregate odds are concerned), it is so w.r.to primes > 5. Thus my Pari-loop estimates all need to multiplied by (3/2)*(5/4) = 15/8, i.e. are nearly doubled.

Of course those are the naive odds - the conditional odds given [url=http://www.mersenneforum.org/showthread.php?goto=newpost&t=20527]no primes found below 10^5[/url] are still quite low for the 10^5 - 10^6 range, around 10%.

 chalsall 2015-10-17 21:32

[QUOTE=ewmayer;412957]Of course those are the naive odds - the conditional odds given [url=http://www.mersenneforum.org/showthread.php?goto=newpost&t=20527]no primes found below 10^5[/url] are still quite low for the 10^5 - 10^6 range, around 10%.[/QUOTE]

I resonate with what you say. (:smile: :wink:)

 CRGreathouse 2015-11-10 16:33

My reasoning for the heuristic given above:

A random number would be coprime to 30 for 8/30 of the time. But if you look at the concatenations mod 30 there are only 4 which lead to numbers coprime to 30, so these numbers would be expected to be prime half as often as random numbers. Then all you need is to find the size of the n-th term, which is about (n times the average length of a number from 1 to n) decimal digits, or roughly 10^(n*log_{10} n) = exp(n log n), so putting the two together the expected 'chance' that the n-th concatenation is prime is 0.5 n log n. Now integrate from n = 1 to x and you get 0.5 log log x.

 All times are UTC. The time now is 07:04.