2020-09-17, 17:51   #6
kriesel
 
Benford's Law

Benford's Law https://en.wikipedia.org/wiki/Benford's_law comes from the observation that many observed or measured sets of numbers have a nonuniform distribution of leading digits. For base-ten numbers, 1 is the most common leading digit, at 30.1% rather than the uniform 11.1%, and 9 is the least common, at 4.6% rather than 11.1%. The law also applies to second and third digits, and to numbers expressed in other bases. Numbers following Benford's law have digit probabilities based on logarithms: in base ten, the probability of leading digit d is log10(1 + 1/d). The law requires that the numbers be distributed over several orders of magnitude. The Mersenne prime exponents, from 2 to 82589933, certainly are. So are the Mersenne primes' numbers of decimal digits, from 1 to 24862048, whether those counts are written in decimal or in other bases. Log10(82589933/2) = 7.616; log10(24862048/1) = 7.396.
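Here's a minimal Python sketch of the base-b first-digit probability log_b(1 + 1/d). It also reproduces the orders-of-magnitude arithmetic above. The function name benford_probability is just for illustration.

```python
import math


def benford_probability(d: int, base: int = 10) -> float:
    """Benford's first-digit probability of digit d in the given base: log_base(1 + 1/d)."""
    if not 1 <= d < base:
        raise ValueError("leading digit d must satisfy 1 <= d < base")
    return math.log(1 + 1 / d, base)


# Base-ten first-digit probabilities, to compare with a uniform 1/9 = 11.1% each.
for d in range(1, 10):
    print(f"{d}: {benford_probability(d):.1%}")

# Orders of magnitude spanned by the exponents and by the digit counts.
print(f"exponents:    {math.log10(82589933 / 2):.3f}")
print(f"digit counts: {math.log10(24862048 / 1):.3f}")
```

The same function covers other bases. For example, benford_probability(1, 16) is 1/4 (up to rounding), since log16(2) = 1/4. For any base, the probabilities for digits 1 through base-1 sum to 1.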

Benford's law is one of the tests used to detect fabricated numbers, such as in tax, accounting or research fraud. The numbers people invent when faking values, believing them random, generally are not random. They tend to avoid round numbers, repeated digits, ascending or descending runs, palindromes, etc., and may even reveal an individual's distinctive habits. (I have no reason to suspect fakery in the Mersenne primes, which have been repeatedly checked. So I expected them to exhibit the usual probabilistic behavior, including the likely deviations from expected statistical values, with sizable deviations expected because so few samples are available.)

Considering the 51 known Mersenne primes in base ten, their exponents follow Benford's law quite closely. Each leading digit's share is within 10 percentage points of its expected share, and its count is usually within ±1 of the expected count.
But the same is not true of the numbers of decimal digits of the Mersenne primes themselves.
https://www.mersenne.org/primes/
In decimal, the leading digit 4 is noticeably underrepresented (3 observed vs. about 4.9 expected), 5 is absent (about 4.0 expected), and 6 is substantially overrepresented (9 observed vs. about 3.4 expected). The absence of a leading 5 seems to line up with some of the larger ratio gaps between successive exponents.
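The leading-digit counts above can be rechecked with a short Python sketch. It computes each Mersenne prime's decimal digit count as floor(p·log10 2) + 1. That works because 2^p is never a power of ten, so 2^p − 1 has as many digits as 2^p. The exponent list is copied from the table in this post. The helper names are hypothetical, and high-precision decimal arithmetic is used only to make sure the floor is exact for exponents in the tens of millions.

```python
import math
from collections import Counter
from decimal import Decimal, getcontext

# Exponents of the 51 known Mersenne primes (from the table in this post).
EXPONENTS = [
    2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521, 607, 1279, 2203, 2281,
    3217, 4253, 4423, 9689, 9941, 11213, 19937, 21701, 23209, 44497, 86243,
    110503, 132049, 216091, 756839, 859433, 1257787, 1398269, 2976221, 3021377,
    6972593, 13466917, 20996011, 24036583, 25964951, 30402457, 32582657,
    37156667, 42643801, 43112609, 57885161, 74207281, 77232917, 82589933,
]

getcontext().prec = 50  # ample precision so floor(p * log10(2)) is exact here
LOG10_2 = Decimal(2).log10()


def decimal_digits(p: int) -> int:
    """Number of decimal digits of 2**p - 1, i.e. floor(p * log10(2)) + 1."""
    return int(p * LOG10_2) + 1


def leading_digit(n: int, base: int = 10) -> int:
    """Most significant digit of positive integer n written in the given base."""
    while n >= base:
        n //= base
    return n


def benford_table(values, base=10):
    """List of (digit, observed count, Benford-expected count) for digits 1..base-1."""
    counts = Counter(leading_digit(v, base) for v in values)
    n = len(values)
    return [(d, counts[d], n * math.log(1 + 1 / d, base)) for d in range(1, base)]


DIGIT_COUNTS = [decimal_digits(p) for p in EXPONENTS]

for label, values in (("exponents", EXPONENTS), ("digit counts", DIGIT_COUNTS)):
    print(label)
    for d, observed, expected in benford_table(values):
        print(f"  {d}: observed {observed:2d}, expected {expected:5.2f}")
```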
Based on the Lenstra–Pomerance–Wagstaff conjecture, we expect the ratio between successive Mersenne prime exponents to average about 1.47576. (https://primes.utm.edu/notes/faq/NextMersenne.html)
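For reference, here is where 1.47576 comes from. This assumes the usual heuristic behind the conjecture: about e^γ ≈ 1.781 Mersenne primes are expected with exponents between x and 2x, where γ is the Euler–Mascheroni constant. Spreading one doubling of the exponent over e^γ primes gives the average ratio per step, averaged in the sense of the mean of log2 of the ratio, i.e. a geometric mean. The notation p_n for the n-th Mersenne prime exponent is introduced here only for this formula.

```latex
\[
  \frac{p_{n+1}}{p_n} \approx 2^{1/e^{\gamma}} = 2^{e^{-\gamma}}
  \approx 2^{0.561459} \approx 1.47576,
  \qquad \gamma = 0.5772156\ldots,\quad e^{\gamma} \approx 1.781
\]
```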
Mersenne primes with exponents 127 and 521 (exponent ratio 4.102) have 39 and 157 decimal digits respectively;
1279 and 2203 (exponent ratio 1.722) have 386 and 664;
11213 and 19937 (exponent ratio 1.778) have 3376 and 6002;
132049 and 216091 (exponent ratio 1.636) have 39751 and 65050;
1398269 and 2976221 (exponent ratio 2.129) have 420921 and 895932;
13466917 and 20996011 (exponent ratio 1.559) have 4053946 and 6320430.
No known Mersenne prime falls in any of the ranges where its number of decimal digits would start with 5. All of those ranges except the first span a larger-than-expected exponent ratio. The exception is the first range, between exponents 13 and 17 (exponent ratio 1.308), whose Mersenne primes have 4 and 6 digits respectively. The average observed exponent ratio across these seven gaps, which skip over a leading 5 in the number of digits, is 2.033, about 38% larger than the expected average ratio.
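A short Python check of the gap-ratio arithmetic above, using the exponent pairs just listed. It assumes the expected 1.47576 is 2^(e^−γ). The Euler–Mascheroni constant γ is hard-coded because Python's math module does not provide it. The sketch also prints the geometric mean of the seven ratios. That is arguably the fairer comparison, since the conjectured figure is an average of log(ratio) rather than of the ratio itself.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant (not in Python's math module)
EXPECTED_RATIO = 2 ** math.exp(-EULER_GAMMA)  # about 1.47576

# Consecutive Mersenne prime exponents whose digit counts jump over a
# leading 5 (digit counts in the comments), from the lists above.
GAPS = [
    (13, 17),              # 4 -> 6 digits
    (127, 521),            # 39 -> 157
    (1279, 2203),          # 386 -> 664
    (11213, 19937),        # 3376 -> 6002
    (132049, 216091),      # 39751 -> 65050
    (1398269, 2976221),    # 420921 -> 895932
    (13466917, 20996011),  # 4053946 -> 6320430
]

ratios = [q / p for p, q in GAPS]
arithmetic_mean = sum(ratios) / len(ratios)
geometric_mean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))

for (p, q), r in zip(GAPS, ratios):
    print(f"{p:>9} -> {q:>9}: ratio {r:.3f}")
print(f"expected ratio:  {EXPECTED_RATIO:.5f}")
print(f"arithmetic mean: {arithmetic_mean:.3f} "
      f"({arithmetic_mean / EXPECTED_RATIO - 1:+.0%} vs expected)")
print(f"geometric mean:  {geometric_mean:.3f}")
```

The arithmetic mean of the unrounded ratios comes out near 2.0335. The 2.033 above averages the ratios after rounding them to three decimals.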

So let's convert that list of decimal digit counts into hexadecimal, and octal.
Following are, for each known Mersenne prime, the sequence number, the exponent p, the number of decimal digits of the Mersenne prime (written in decimal), and the hexadecimal and octal representations of that digit count:
Code:
 #         p   digits (hex)     (octal)
 1         2        1 0x1       o1
 2         3        1 0x1       o1
 3         5        2 0x2       o2
 4         7        3 0x3       o3
 5        13        4 0x4       o4
 6        17        6 0x6       o6
 7        19        6 0x6       o6
 8        31       10 0xA       o12
 9        61       19 0x13      o23
10        89       27 0x1B      o33
11       107       33 0x21      o41
12       127       39 0x27      o47
13       521      157 0x9D      o235
14       607      183 0xB7      o267
15      1279      386 0x182     o602
16      2203      664 0x298     o1230
17      2281      687 0x2AF     o1257
18      3217      969 0x3C9     o1711
19      4253     1281 0x501     o2401
20      4423     1332 0x534     o2464
21      9689     2917 0xB65     o5545
22      9941     2993 0xBB1     o5661
23     11213     3376 0xD30     o6460
24     19937     6002 0x1772    o13562
25     21701     6533 0x1985    o14605
26     23209     6987 0x1B4B    o15513 
27     44497    13395 0x3453    o32123
28     86243    25962 0x656A    o62552
29    110503    33265 0x81F1    o100761
30    132049    39751 0x9B47    o115507
31    216091    65050 0xFE1A    o177032
32    756839   227832 0x379F8   o674770
33    859433   258716 0x3F29C   o771234
34   1257787   378632 0x5C708   o1343410
35   1398269   420921 0x66C39   o1466071
36   2976221   895932 0xDABBC   o3325674
37   3021377   909526 0xDE0D6   o3360326
38   6972593  2098960 0x200710  o10003420
39  13466917  4053946 0x3DDBBA  o17355672
40  20996011  6320430 0x60712E  o30070456
41  24036583  7235733 0x6E6895  o33464225
42  25964951  7816230 0x774426  o35642046
43  30402457  9152052 0x8BA634  o42723064
44  32582657  9808358 0x95A9E6  o45324746
45  37156667 11185272 0xAAAC78  o52526170
46  42643801 12837064 0xC3E0C8  o60760310
47  43112609 12978189 0xC6080D  o61404015
48* 57885161 17425170 0x109E312 o102361422
49* 74207281 22338618 0x154DC3A o125156072
50* 77232917 23249425 0x162C211 o130541021
51* 82589933 24862048 0x17B5D60 o136656540
Again, leading digit 6 is over-represented in hexadecimal, at more than double the expected count (6 observed vs. about 2.8 expected). This is despite leading 1s being favored by the range the data happen to span. Disregarding deviations within ±1 of the expected count, 4, 7 and E are underrepresented; E is absent. Similarly, in octal, leading digit 6 is more than twice as frequent as expected (8 observed vs. about 3.8 expected).
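A minimal Python sketch for counting the leading digits of the digit counts in bases 8, 10 and 16 and comparing them with the Benford expectation for each base. The data are the "digits" column of the table. The helper names are hypothetical. Python's built-in format(n, "X") and format(n, "o") give the same hex and octal strings as the table, without the 0x and o prefixes.

```python
import math
from collections import Counter

# Decimal digit counts of the 51 known Mersenne primes (the "digits" column above).
DIGIT_COUNTS = [
    1, 1, 2, 3, 4, 6, 6, 10, 19, 27, 33, 39, 157, 183, 386, 664, 687, 969,
    1281, 1332, 2917, 2993, 3376, 6002, 6533, 6987, 13395, 25962, 33265, 39751,
    65050, 227832, 258716, 378632, 420921, 895932, 909526, 2098960, 4053946,
    6320430, 7235733, 7816230, 9152052, 9808358, 11185272, 12837064, 12978189,
    17425170, 22338618, 23249425, 24862048,
]


def leading_digit(n: int, base: int) -> int:
    """Most significant digit of positive integer n written in the given base."""
    while n >= base:
        n //= base
    return n


def compare_to_benford(values, base):
    """Yield (digit, observed count, Benford-expected count) for digits 1..base-1."""
    counts = Counter(leading_digit(v, base) for v in values)
    for d in range(1, base):
        yield d, counts[d], len(values) * math.log(1 + 1 / d, base)


for base in (8, 10, 16):
    print(f"base {base}")
    for d, observed, expected in compare_to_benford(DIGIT_COUNTS, base):
        # format(d, "X") shows 10..15 as A..F, matching the hex column above
        print(f"  {format(d, 'X')}: observed {observed:2d}, expected {expected:5.2f}")
```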
The substantial over-representation of leading digit 6 in bases 8, 10 and 16 seems odd. If there's some reason for it other than the statistics of small samples, please comment in a discussion thread.
Maybe it is surprising, maybe it isn't. Suppose that, by pure chance, one digit is the most over-represented in one base. Its chance of also being the most over-represented in a second base would seem to be about 1/base. That's only mildly low for base 10 or 16 (10% or about 6%), not very low.
The usual test criterion rejects the hypothesis that the difference between observed and expected counts arose by chance when that probability is below 5%. None of the Pearson chi-squared cases tabulated reaches that threshold, although one is borderline. The 3 measures, χ², m and d, gave widely different results for the same case.
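A sketch of the three goodness-of-fit measures, under stated assumptions:

- χ² is Pearson's chi-squared on leading-digit counts, with base − 2 degrees of freedom.
- m is assumed to be the largest absolute difference between observed and expected proportions, in the Leemis style.
- d is assumed to be the Euclidean distance between the observed and expected proportion vectors, in the Cho–Gaines style.

These are my guesses at what "m" and "d" mean here, and some references scale m and d by sqrt(N) before comparing them with critical values. For bases 8, 10 and 16 the degrees of freedom are even, so the chi-squared tail probability has a simple closed form and no statistics library is needed. Its numbers need not match the attached tabulation, which may have binned or computed things differently. Most expected counts here are also below 5, where the chi-squared approximation is rough.

```python
import math
from collections import Counter

# Decimal digit counts of the 51 known Mersenne primes (the "digits" column above).
DIGIT_COUNTS = [
    1, 1, 2, 3, 4, 6, 6, 10, 19, 27, 33, 39, 157, 183, 386, 664, 687, 969,
    1281, 1332, 2917, 2993, 3376, 6002, 6533, 6987, 13395, 25962, 33265, 39751,
    65050, 227832, 258716, 378632, 420921, 895932, 909526, 2098960, 4053946,
    6320430, 7235733, 7816230, 9152052, 9808358, 11185272, 12837064, 12978189,
    17425170, 22338618, 23249425, 24862048,
]


def leading_digit(n: int, base: int) -> int:
    """Most significant digit of positive integer n written in the given base."""
    while n >= base:
        n //= base
    return n


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(chi-squared with df degrees of freedom > x); closed form valid for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def benford_fit(values, base):
    """Return (chi2, p_value, m, d) for the leading digits of values vs Benford's law."""
    n = len(values)
    counts = Counter(leading_digit(v, base) for v in values)
    chi2, diffs = 0.0, []
    for digit in range(1, base):
        p = math.log(1 + 1 / digit, base)
        chi2 += (counts[digit] - n * p) ** 2 / (n * p)
        diffs.append(counts[digit] / n - p)
    m = max(abs(x) for x in diffs)            # largest gap in proportions (assumed "m")
    d = math.sqrt(sum(x * x for x in diffs))  # Euclidean distance of proportions (assumed "d")
    return chi2, chi2_sf_even_df(chi2, base - 2), m, d


for base in (8, 10, 16):
    chi2, p_value, m, d = benford_fit(DIGIT_COUNTS, base)
    print(f"base {base:2d}: chi2 = {chi2:6.2f} (df {base - 2}, p = {p_value:.3f}), "
          f"m = {m:.3f}, d = {d:.3f}")
```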


An interesting paper on Benford's law and Mersenne numbers is here. It advises against using powers of two as the number base. Oops.


Note to self: pursue further. Perhaps try to calculate the probability distribution of the counts of the different leading digits. It's not as "simple" as a single binomial distribution. The purpose is to put some numbers on how likely or unlikely the observed extent of digit-6 over-representation is in base 10 or 16. Or do some study of number theory and statistics.
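One crude way to start on that note is sketched below. It assumes the 51 digit counts behave like independent draws from Benford's distribution, which they don't strictly, since they come from one growing sequence. The sketch gives two numbers:

- an exact binomial tail for digit 6 alone;
- a Monte Carlo estimate of how often some digit, any digit, lands at least as far above its expected count as 6 did, which accounts for having picked 6 after looking at the data.

The helper names are hypothetical. The raw-count excess is only one possible criterion; it favors digit 1, whose expected count is largest, and a ratio or standardized excess would be alternatives.

```python
import math
import random

N = 51  # number of known Mersenne primes in the table above


def benford_probs(base):
    """Benford first-digit probabilities for digits 1..base-1."""
    return [math.log(1 + 1 / d, base) for d in range(1, base)]


def binomial_tail(n, p, k):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def prob_any_digit_excess(base, excess, trials=10000, seed=1):
    """Monte Carlo estimate of P(some digit's count >= its expected count + excess),
    assuming N independent draws from Benford's distribution."""
    rng = random.Random(seed)
    probs = benford_probs(base)
    digits = range(1, base)
    hits = 0
    for _ in range(trials):
        counts = [0] * base
        for d in rng.choices(digits, weights=probs, k=N):
            counts[d] += 1
        if any(counts[d] - N * probs[d - 1] >= excess for d in digits):
            hits += 1
    return hits / trials


# Observed leading-6 counts of the decimal digit counts: 9 written in base 10, 6 in base 16.
for base, observed in ((10, 9), (16, 6)):
    p6 = benford_probs(base)[5]
    excess = observed - N * p6
    print(f"base {base}: P(count of 6 >= {observed}) = {binomial_tail(N, p6, observed):.4f}; "
          f"P(some digit >= {excess:.2f} above expected) ~ {prob_any_digit_excess(base, excess):.3f}")
```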
There's a discussion thread I began, in which jwaltos makes some recommendations.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached file: benfords law.pdf (33.0 KB)

Last fiddled with by kriesel on 2021-09-16 at 22:40 Reason: fixed typo; added link to misc-math discussion thread of this topic