2015-10-09, 01:50   #1
ewmayer
Sep 2002
República de California

2×5×7×167 Posts
Expected number of primes in OEIS A007908

Recently Neil Sloane (curator of the OEIS) sent a message to the NMBRTHRY mailing list re. the sequence A007908:

From: Neil Sloane [e-mail redacted to thwart site-scrapers]
Date: September 29, 2015 6:16:17 PM PDT
Subject: lovely open problem

To Number Theory List,
Consider the sequence with nth term equal to the
concatenation of the decimal numbers 1234...n (
When is the first prime? The comments in A007908 say
that there should be infinitely many primes, and that there
are no primes among the first 64000 terms.
If you would like to help with this search, you could leave a comment
in A007908 saying that there are no primes among terms X through Y,
or, of course, that n = Z gives a (probable) prime, which would be
pretty exciting.

Best regards
Based on the sequence comments, I see our own Charles Greathouse has done some work on the above.


1. Based on the trivial observation that only terms ending in 1,3,7,9 have a chance of being prime, of the first (say) 100 sequence terms, only 40 can possibly be prime, but in fact fewer than half of the 40 can be prime because

2. at least 2 of every 1/3/7/9-ending quartet are divisible by 3, and for some quartets every member is divisible by 3. Specifically the divisibility pattern for such 1/3/7/9-ending quartets is in the form of repeating triplets (where 0 indicates != 0 (mod 3), 1 indicates divisible by 3) ...0101 1010 1111..., thus precisely 4 of every 30 sequence terms starting with the first 30 can possibly be prime. (This is not terribly difficult to prove, but I'll let readers confirm it for themselves, as it's a fun little bit of maths.)

3. The factorizations of the remaining not-div-by-3 terms appear to be 'random', i.e. modelable by the statistics of randomly chosen odd integers of similar size.

4. Using [1-3] plus a few more simple observations and some basic number theory we can generate an expected number (or density, if one prefers) of primes for the sequence. However when I do this I get a result which is somewhat at odds with Charles' comment in the notes:
I checked that there are no primes in the first 5000 terms. Heuristically there are infinitely many, about 0.5 log log n through the n-th term.
(I PMed CRG about the math behind his estimate but got no reply as yet.)
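
The mod-3 pattern claimed in item 2 is easy to check numerically. Below is a minimal Python sketch of such a check. It is not part of the original Pari session, and the helper names smarandache and mod3_pattern are made up for illustration. It builds each term by brute-force string concatenation and records whether 3 divides it.

```python
def smarandache(n):
    """Return T_n, the decimal concatenation 123...n, as an int."""
    return int("".join(str(k) for k in range(1, n + 1)))


def mod3_pattern(start, stop):
    """For each n in [start, stop] ending in 1, 3, 7 or 9,
    emit '1' if 3 divides T_n and '0' otherwise."""
    return "".join(
        "1" if smarandache(n) % 3 == 0 else "0"
        for n in range(start, stop + 1)
        if n % 10 in (1, 3, 7, 9)
    )


pattern = mod3_pattern(1, 60)
# Group into quartets, one quartet per decade of n.
print(" ".join(pattern[k:k + 4] for k in range(0, len(pattern), 4)))
# prints: 0101 1010 1111 0101 1010 1111
```

Each run of three quartets contains exactly four zeros, which matches the "4 of every 30" count above.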

Here is what I get:

The probability that a randomly selected odd integer x is prime is ~2/ln(x). Summing this over the odd, not-divisible-by-3 terms with indices 1-30, 31-60, 61-90, &c, each of which intervals contains 4 possibly-prime terms, we get the expected #primes for the first few of said intervals to be:

1- 30: 0.22749
31- 60: 0.05150
61- 90: 0.02700
91-120: 0.01809
121-150: 0.01242
151-180: 0.00939
181-210: 0.00755
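
The per-interval figures above can be reproduced with exact big-integer arithmetic. The following Python sketch is one way to do it. The function name interval_expectations is hypothetical, and the sketch assumes that the first term, T_1 = 1, is left out of the 1-30 sum, since ln(1) = 0 and 1 is not prime. With that assumption the first interval comes out at about 0.2275, in line with the table.

```python
import math


def interval_expectations(num_intervals, width=30):
    """Sum 2/ln(T_n) over the surviving indices n in each block of `width`.

    A term survives if n ends in 1, 3, 7 or 9 and T_n is not divisible by 3.
    """
    sums = [0.0] * num_intervals
    t = 0
    for n in range(1, num_intervals * width + 1):
        t = t * 10 ** len(str(n)) + n      # append the decimal digits of n
        if n == 1:
            continue                       # assumption: skip T_1 = 1, ln(1) = 0
        if n % 10 in (1, 3, 7, 9) and t % 3 != 0:
            sums[(n - 1) // width] += 2 / math.log(t)
    return sums


for k, s in enumerate(interval_expectations(7)):
    print(f"{30 * k + 1:3d}-{30 * k + 30:3d}: {s:.5f}")
```

Python's math.log accepts arbitrarily large integers, so no extended-precision library is needed at this size.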

I did the above by hand (assisted by Pari) ... at this point in order to investigate the convergence (or not) of the estimated #primes I wrote some simple Pari code for playing with this sequence - uncomment the if(isprime...) code snip just below the update of nexpect if you want to check for prime terms, at the cost of drastically increased runtime:
ilog10 = 1/log(10);
n = 1; i = 2; nexpect = 0.;
/* Each pass appends the even number i and the odd number i+1, so n is always an odd-index term */
while(i < 1000000,\
	ndd = ceil(log(i+0.5)*ilog10);	/* Need + 0.5 so e.g. ndd(100) comes out = 3 rather than 2 */\
	pow10 = 10^ndd;\
	n = pow10^2*n + pow10*i + (i+1);\
	if(i%1000 == 0,\
		print("i = ",i,"; nexpect = ",nexpect);\
	);\
	if(i%10 != 4,	/* Skip terms divisible by 5 */\
		if(n%3 != 0,	/* Skip terms divisible by 3 */\
			nexpect += 2/log(n);\
		/*	if(isprime(n),\
				print("i = ",i+1,": n prime!");\
			);	*/\
		);\
	);\
	i += 2;\
);
(Hit <return> after pasting into a Pari shell to begin execution, and type 'nexpect' on loop exit to see the final value.)

Here are the results for successive powers of 10 from 10^3 to 10^6 - I use logarithmically constant increments here because if the resulting increments in the expectation value decrease from one power of 10 to the next we at least have hope that there may be a limit at infinity:

10^3: 0.4206922620678406265572242819
10^4: 0.4959359595134930290178514034
10^5: 0.5545675055579183966439241436
10^6: 0.6026039035873964125108005995
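
As a cross-check on the values quoted above, here is a Python sketch that avoids multi-million-digit integers altogether. It swaps in two shortcuts that are not in the Pari code. First, a number is congruent to its digit sum mod 3, so T_n = 1+2+...+n = n(n+1)/2 (mod 3), and the surviving odd indices are exactly those with n = 1 (mod 6). Second, for n >= 13 the leading digits of T_n are always 1234567891011..., so ln(T_n) is approximated as (digits - 1)*ln(10) + ln(1.2345678910111213). All function names are made up for illustration.

```python
import math

LN10 = math.log(10)
# ln of the leading digits of T_n read as a mantissa in [1, 10); valid for n >= 13.
LEAD = math.log(1.2345678910111213)


def digit_count(n):
    """Number of decimal digits of T_n = 123...n (closed form)."""
    k = len(str(n))
    return k * (n + 1) - (10 ** k - 1) // 9


def log_term(n):
    """ln(T_n): exact for n < 13, mantissa-times-power-of-10 approximation otherwise."""
    if n < 13:
        return math.log(int("".join(str(j) for j in range(1, n + 1))))
    return (digit_count(n) - 1) * LN10 + LEAD


def expected_primes(limit):
    """Sum 2/ln(T_n) over n <= limit with n = 1 (mod 6), n > 1, n not ending in 5."""
    total = 0.0
    for n in range(7, limit + 1, 6):   # odd n with T_n not divisible by 3
        if n % 10 != 5:                # T_n ending in 5 is divisible by 5
            total += 2 / log_term(n)
    return total


for e in range(3, 7):
    print(f"10^{e}: {expected_primes(10 ** e):.10f}")
```

If the shortcuts are sound, the printed values should land close to the Pari results listed above; the whole run takes well under a minute in plain Python.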

One might expect the summation to diverge as n --> oo based on the divergence of the harmonic series; note that knocking out fixed patterns of terms from the harmonic series, as we do here using divisibility patterns, does not alter the divergence property.

The reason I think the present summation may in fact converge is that, due to the increasing digit length of the appended numbers, log(T_n) grows faster than n. Thus, rather than the expected #primes being given by a harmonic sum(1/n), which diverges, it is given by something which is perhaps like sum(1/n^a) (a.k.a. the p-series or hyperharmonic series) or sum(1/(n * log(n)^a)) (a.k.a. the ln-series, where the summation starts at n = 2 rather than n = 1), both of which converge for all a > 1. Actually, on second thought, considering the logarithmic growth rate of the appended numbers, perhaps log(T_n) ~ n log(n) (i.e. the expectation value governed by the ln-series with a = 1, which does in fact diverge, albeit slowly) is the correct asymptotic estimate.
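
The log(T_n) ~ n log(n) guess can be tested against the numbers already quoted. This is a rough sketch under stated assumptions: if ln(T_n) is about n*ln(n) and 4 of every 30 indices survive, then replacing the sum by an integral gives a cumulative expectation that grows like (8/30)*ln(ln(N)). The Python snippet below compares the increments that model predicts between successive powers of 10 with the increments in the Pari results. The observed values are copied from the post; the model and the function names are illustrative assumptions, not a derivation.

```python
import math

# Cumulative expectations quoted in the post, keyed by the exponent of 10.
OBSERVED = {
    3: 0.4206922620678406,
    4: 0.4959359595134930,
    5: 0.5545675055579184,
    6: 0.6026039035873964,
}

DENSITY = 4 / 30  # surviving indices per unit of n


def predicted_increment(lo, hi):
    """Integral of DENSITY * 2/(n ln n) from lo to hi, i.e. a difference of ln ln."""
    return 2 * DENSITY * (math.log(math.log(hi)) - math.log(math.log(lo)))


def compare():
    """Return (exponent, model increment, observed increment) for each decade."""
    rows = []
    for e in range(3, 6):
        model = predicted_increment(10 ** e, 10 ** (e + 1))
        actual = OBSERVED[e + 1] - OBSERVED[e]
        rows.append((e, model, actual))
    return rows


for e, model, actual in compare():
    print(f"10^{e} -> 10^{e + 1}: model {model:.5f}, observed {actual:.5f}")
```

The model increments shrink from decade to decade at about the same rate as the observed ones, which is what one would see for a sum that diverges like ln(ln(N)) rather than one that converges.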

However, even if the sum does diverge, it does so sufficiently slowly that the absence of primes in the ranges tested to date should not be surprising.

It's certainly an interesting problem, in any event - comments, corrections, further insights appreciated!

Last fiddled with by ewmayer on 2015-10-09 at 03:58 Reason: Added n = 10^6 #primes estimate