mersenneforum.org  

Go Back   mersenneforum.org > Math Stuff > Computer Science & Computational Number Theory

Reply
 
Thread Tools
Old 2019-05-09, 22:21   #1
hansl
 
hansl's Avatar
 
Apr 2019

3158 Posts
Default Efficient storage of all primes up to some n

I've been thinking about some methods of storing a large sequence of primes in a compact manner, while requiring trivial computation to "extract" them.

For example, say I want to store all primes up to 2^32. The simplest way would be a file of 32bit words, which would take up ~775.5 MiB

Code:
primepi(2^32)*4/1024/1024.
%5 = 775.45250320434570312500000000000000000
Now I'm just exploring some ideas using PARI to get some ballpark estimates. Some of the cases below technically would require some additional metadata, like a count for how many primes exist in each bitlength, etc., however this should mostly be negligible, so I'm ignoring that small added cost at the moment for simplicity.

So, one simple way to compact this would be to split this into separate chunks in byte ranges. Store all primes < 2^8 as one byte, then primes < 2^16 as 2 bytes, etc:
Code:
\\PARI
? (primepi(2^8)+2*(primepi(2^16)-primepi(2^8))+3*(primepi(2^24)-primepi(2^16))+4*(primepi(2^32)-primepi(2^24)))/1024/1024.
%8 = 774.41827487945556640625000000000000000
Ok, so we saved one MiB or 0.133% lol, not great.

What if we take it further to bit-packing and splitting our primes among every power of 2?
Code:
? sum(n=1,32,n*(primepi(2^n)-primepi(2^(n-1))))/8/1024/1024.
%10 = 749.43927836418151855468750000000000000
Now we save about 26MiB, or 3.35% over the original size. Marginally better.

Since all primes > 2 will be odd, lets say we can save that last bit and just store our values as x where p = (2x+1)
Code:
? sum(n=1,32,(n-1)*(primepi(2^n)-primepi(2^(n-1))))/8/1024/1024.
%11 = 725.20638763904571533203125000000000000
Now we save about 50MiB or 6.48%. Still not particularly compelling.

So, what if we just store the gap between adjacent primes? According to wikipedia, the largest gap for p < 2^32 is 336. This can be held within 9 bits.
Code:
9*primepi(2^32)/8/1024/1024.
%14 = 218.09601652622222900390625000000000000
With this we would compress the size by 71.87%

And again since all prime gaps after (2,3) are multiples of 2, we can actually shave off another bit per gap stored, assuming the extracting program handles the initial case 2,3 specially:
Code:
? 8*primepi(2^32)/8/1024/1024.
%15 = 193.86312580108642578125000000000000000
This brings us to exactly 75% the original size. Not bad!

We could extend this further by splitting the primes into ranges based on the bitsize of the maximum gaps in the range. I haven't yet attempted to estimate the size for such a scheme, but it might save a few more percent in the total size.

One limitation of this would be that extraction is now restricted to sequential only, as opposed to before, where we could have random access if needed. That is, unless we implement some sort of "keyframe" concept, where we periodically store the full prime in the file, costing additional metadata, but maybe useful in some scenario.

Anyways, we could extend the prime gap representation further, by splitting into ranges based on the bitsize of the maximum gaps in the range. I haven't yet attempted to estimate the size for such a scheme, but it might save a few more percent in the total size.

I am wondering if any sort of trial factoring application or something like that would have a use for this sort of purely sequential representation that could be quickly streamed out? Or also, since I'm guessing I'm not the first person in the world to think about such things if there is any names or references for these sort of domain-specific prime compression concepts?

Any other thoughts on ways to further compress this information, while still limiting the extraction complexity to simple,fast operations (all of the above ideas should be do-able with essentially just binary shifts and additions). Of course LZ compression or something like that could probably compact things even greater, but then it becomes a question of if you are even saving any computation over sieving it yourself.

I used the limit of 2^32 as a simple example which is easy to calculate, but of course, would be interested in potentially storing more. Some exercises in case anyone else finds this interesting:
Give your best imagined storage scheme, and how compactly could all p < 2^64 be stored?
Or alternatively, what's the largest n where all p < n could be stored within 1GB, or even say 1TB, etc?
hansl is offline   Reply With Quote
Old 2019-05-09, 22:34   #2
hansl
 
hansl's Avatar
 
Apr 2019

5×41 Posts
Default

Another idea would be to create a large sieve bitfield, where each 1 bit represents a prime.
By once again ignoring the only even prime 2, we could cut the bitfield in half
2^32/8/1024/1024/2 = 256 MiB

Simple, but not as effective as the prime gap idea.
hansl is offline   Reply With Quote
Old 2019-05-09, 22:39   #3
R. Gerbicz
 
R. Gerbicz's Avatar
 
"Robert Gerbicz"
Oct 2005
Hungary

72×31 Posts
Default

Wheel.
R. Gerbicz is offline   Reply With Quote
Old 2019-05-09, 22:59   #4
hansl
 
hansl's Avatar
 
Apr 2019

20510 Posts
Default

Quote:
Originally Posted by R. Gerbicz View Post
Wheel.
I have read a little bit about wheel factorization, and was honestly thinking about mentioning that too. It generates sequences of "mostly prime" numbers, but I'm curious what would be a good scheme to store pre-computed "misses" given some limited wheel size, or how to determine an optimum wheel size for a given n?

Also I would count the wheel itself as part of the pre-computed data.
I'll have to think about it some to determine what kind of optimal gains this would entail.
hansl is offline   Reply With Quote
Old 2019-05-09, 23:18   #5
a1call
 
a1call's Avatar
 
"Rashid Naimi"
Oct 2015
Remote to Here/There

2·1,087 Posts
Default

I believe Pari-GP stores the difference between the primes which you mentioned. Lossless compressions such as zipping can perhaps overdo any math based compressions.

But all these will be to slow to read for very large files and likely less efficient than regenerating them from scratch.
Another consideration is the enormous amount storage space required.

There are some related estimates in this thread.

https://www.mersenneforum.org/showthread.php?t=22076

ETA
Here is another experiment that suggests there might not be much in the way of efficiency to store primes rather than calculate them. But probably a significant factor in this is the very poor read/write capabilities of Pari-GP.

Last fiddled with by a1call on 2019-05-09 at 23:49
a1call is offline   Reply With Quote
Old 2019-05-10, 00:21   #6
a1call
 
a1call's Avatar
 
"Rashid Naimi"
Oct 2015
Remote to Here/There

1000011111102 Posts
Default

Some related insight can be found here:
https://pari-users.pari.math.u-borde...9Mw/primelimit
a1call is offline   Reply With Quote
Old 2019-05-10, 00:47   #7
CRGreathouse
 
CRGreathouse's Avatar
 
Aug 2006

3×1,993 Posts
Default

Quote:
Originally Posted by R. Gerbicz View Post
Wheel.
Using a mod-30 wheel the primes up to 2^32 take up about 136.5 MiB.
CRGreathouse is offline   Reply With Quote
Old 2019-05-10, 01:11   #8
hansl
 
hansl's Avatar
 
Apr 2019

3158 Posts
Default

Quote:
Originally Posted by CRGreathouse View Post
Using a mod-30 wheel the primes up to 2^32 take up about 136.5 MiB.
Could you explain how you got to that estimate? What exactly is being stored in this case?
hansl is offline   Reply With Quote
Old 2019-05-10, 02:55   #9
axn
 
axn's Avatar
 
Jun 2003

5,197 Posts
Default

Quote:
Originally Posted by hansl View Post
Could you explain how you got to that estimate? What exactly is being stored in this case?
There are 8 modular classes that don't have trivial factors (mod 30). These are {1,7,11,13,17,19,23,29}. So you use 8 bitmaps each of length 2^32/30 bits. The first bitmap will store all numbers of the form 1+30n -- bit 0 will be 1, bit 1 will be 31, bit 2 will be 61, etc..
Second bitmap will store 7+30n, and so on.

Total size for all these bitmaps would be (2^32/30)*8 bits or 2^32/30 byte which is about 136.5 MB.

To see if a number k > 5 and < 2^32 is prime, decompose k into
q = floor(k/30), r=k%30
if r is not one of {1,7,11,13,17,19,23,29}, it is not prime.
Else lookup qth bit from the appropriate bitmap.
axn is offline   Reply With Quote
Old 2019-05-10, 05:53   #10
ATH
Einyen
 
ATH's Avatar
 
Dec 2003
Denmark

27·52 Posts
Default

You could also store them sequentially: byte 0 will be 1-29, byte 1 will be 31-59, byte n will be n*30 + {1,7,11,13,17,19,23,29}

If you go to a mod-210 wheel there are 48 modular classes so 6 bytes for every 210 numbers so 2^32 in 117 MB (122.7M bytes).
ATH is online now   Reply With Quote
Old 2019-05-10, 06:31   #11
hansl
 
hansl's Avatar
 
Apr 2019

5·41 Posts
Default

Quote:
Originally Posted by axn View Post
There are 8 modular classes that don't have trivial factors (mod 30). These are {1,7,11,13,17,19,23,29}. So you use 8 bitmaps each of length 2^32/30 bits. The first bitmap will store all numbers of the form 1+30n -- bit 0 will be 1, bit 1 will be 31, bit 2 will be 61, etc..
Second bitmap will store 7+30n, and so on.

Total size for all these bitmaps would be (2^32/30)*8 bits or 2^32/30 byte which is about 136.5 MB.

To see if a number k > 5 and < 2^32 is prime, decompose k into
q = floor(k/30), r=k%30
if r is not one of {1,7,11,13,17,19,23,29}, it is not prime.
Else lookup qth bit from the appropriate bitmap.
Ah, thanks, I think I get it now.

I worked out some functions to help calculate storing up to various bit limits with different wheel sizes. I figured the modular classes for large wheels could be stored as gaps between classes, so many of the functions rely on looking up prime gaps from tables:
Code:
bits(n) = ceil(log(n)/log(2));

\\ Nth Primorial (product of the first n primes)
nthprimorial(n)=prod(i=1,n,prime(i));

\\ Primorial (product of all primes <= n)
primorial(n)=nthprimorial(primepi(n));

\\ number of primes < nthprimorial(n) (see https://oeis.org/A000849 )
nthprimorialpi(n)={my(a=[0,1,3,10,46,343,3248,42331,646029,12283531,300369796,8028643010,259488750744,9414916809095,362597750396740,15397728527812858,742238179058722891,40068968501510691894]); 
if(n>#a-1,print("error, unknown for n>",#a-1),a[n+1]);}

\\ number of primes < primorial(n)
primorialpi(n)=nthprimorialpi(primepi(n));

\\ nth maximal gap between primes (see https://oeis.org/A005250 )
maxgap(n)={my(gaps=[1,2,4,6,8,14,18,20,22,34,36,44,52,72,86,96,112,114,118,132,148,154,180,210,220,222,234,248,250,282,288,292,320,336,354,382,384,394,456,464,468,474,486,490,500,514,516,532,534,540,582,588,602,652,674,716,766,778,804,806,906,916,924,1132,1184,1198,1220,1224,1248,1272,1328,1356,1370,1442,1476,1488,1510,1526,1530,1550]);
gaps[n]}

\\ max gap between primes for all p <= prime(n) (see https://oeis.org/A005669 )
maxgap_nthprime(n)={my(gap_prime_ns=[1,2,4,9,24,30,99,154,189,217,1183,1831,2225,3385,14357,30802,31545,40933,103520,104071,149689,325852,1094421,1319945,2850174,6957876,10539432,10655462,20684332,23163298,64955634,72507380,112228683,182837804,203615628,486570087,910774004,981765347,1094330259,1820471368,5217031687,7322882472,9583057667,11723859927,11945986786,11992433550,16202238656,17883926781,23541455083,28106444830,50070452577,52302956123,72178455400,94906079600,251265078335,473258870471,662221289043,1411461642343,2921439731020,5394763455325,6822667965940,35315870460455,49573167413483,49749629143526,1175661926421598,1475067052906945,2133658100875638,5253374014230870,5605544222945291,7784313111002702,8952449214971382,10160960128667332,10570355884548334,20004097201301079,34952141021660495,135962332505694894,160332893561542066,360701908268316580,408333670434942092,423731791997205041]);
for(i=1,#gap_prime_ns,if(gap_prime_ns[i]>n,return(gap_prime_ns[i-1]),if(gap_prime_ns[i]==n,return(maxgap(i)))));print("n=",n," max gap unknown for n > ",gap_prime_ns[#gap_prime_ns]," guess 2047?"); 2047}

\\ max gap between primes prime gap for primes <= n (see https://oeis.org/A000101 )
maxgap_n(n)={my(gap_primes=[3,5,11,29,97,127,541,907,1151,1361,9587,15727,19661,31469,156007,360749,370373,492227,1349651,1357333,2010881,4652507,17051887,20831533,47326913,122164969,189695893,191913031,387096383,436273291,1294268779,1453168433,2300942869,3842611109,4302407713,10726905041,20678048681,22367085353,25056082543,42652618807,127976335139,182226896713,241160624629,297501076289,303371455741,304599509051,416608696337,461690510543,614487454057,738832928467,1346294311331,1408695494197,1968188557063,2614941711251,7177162612387,13829048560417,19581334193189,42842283926129,90874329412297,171231342421327,218209405437449,1189459969826399,1686994940956727,1693182318747503,43841547845542243,55350776431904441,80873624627236069,203986478517457213,218034721194215521,305405826521089141,352521223451365651,401429925999155063,418032645936713497,804212830686679111,1425172824437700887,5733241593241198219,6787988999657779307,15570628755536096243,17678654157568189057,18361375334787046697]);
for(i=1,#gap_primes,if(gap_primes[i]>n,return(gap_primes[i-1]),if(gap_primes[i]==n,return(maxgap(i)))));print("n=",n," max gap unknown for n > ",gap_primes[#gap_primes]," guess 2047?"); 2047}

\\ size of wheel factorization bitmap for storing all primes < 2^b, using a nthprimorial(n) base wheel
wheelbitmapMiB(b, n)=2.^(b-23)/nthprimorial(n)*eulerphi(nthprimorial(n));

\\ size of wheel itself for wheel factorization, using a nthprimorial(n) base wheel. Storing the gap between each modular class
wheelMiB(n)=(eulerphi(nthprimorial(n))-1)*bits(maxgap_n(nthprimorial(n))/2)/2.^23

\\ total size of wheel and bitmap for storing all primes < 2^b, using a nthprimorial(n) base wheel
wheeltotalMiB(b, n)=wheelMiB(n)+wheelbitmapMiB(b,n);

\\primorial_numeral(n)= \\ Convert to primorial number system, vector of digits
And here's the results I got
Code:
? wheeltotalMiB(32,3)
%305 = 136.5333366711934407552083333
? wheeltotalMiB(32,4)
%306 = 117.0286050455910818917410714
? wheeltotalMiB(32,5)
%307 = 106.3901814021073378525771104
? wheeltotalMiB(32,6)
%308 = 98.21540557397352708326829421
? wheeltotalMiB(32,7)
%309 = 92.62673454240674648048412984
? wheeltotalMiB(32,8)
%310 = 91.91488279250780099343562301
? wheeltotalMiB(32,9)
%311 = 201.2229731159129979738501068
? wheeltotalMiB(40,8)
%312 = 22420.81124958361143420233199
? wheeltotalMiB(40,9)
%313 = 21559.29775874218603843453359
? wheeltotalMiB(40,10)
%314 = 24600.58342260438659563560571
? wheeltotalMiB(40,11)
%315 = 155250.8481199464342168732928
? wheeltotalMiB(48,10)
%316 = 5303727.482159470771217090061
? wheeltotalMiB(48,11)
%317 = 5264083.330768526806431184056
? wheeltotalMiB(48,12)
%318 = 10515841.27706460812349303656
? wheeltotalMiB(56,11)
%319 = 1313125198.888805102093294739
? wheeltotalMiB(56,12)
%320 = 1283029359.117316411105306225
? wheeltotalMiB(56,13)
%321 = 1493681169.718112853646773902
? wheeltotalMiB(64,12)
%322 = 327046489926.2217779744494826
? wheeltotalMiB(64,13)
%323 = 319311691479.0883192569093850
? wheeltotalMiB(64,14)
%324 = 322695438533.3537066932409118
So, based on this code, best case for 2^32 could be stored in 91.9MiB, using a 8## wheel.

2^40 in ~21.6 GB using mod 9## wheel
2^48 in ~5.26 TB using mod 11## wheel
2^56 in ~1.28 PB using mod 12## wheel
2^64 in ~319 PB using mod 13## wheel

Quote:
Originally Posted by ATH
If you go to a mod-210 wheel there are 48 modular classes so 6 bytes for every 210 numbers so 2^32 in 117 MB (122.7M bytes).
Oops, and I just realized that I'm not calculating the modular classes correctly, I thought it would be primepi(nthprimorial(n))-n+1, but I guess that misses all the composites which should still be included? So actual estimates would be a bit(a lot?) higher than the above results.

Is there a formula to get the modular classes given the nth primorial?

edit: Fixed code and results, Thanks LaurV!

Last fiddled with by hansl on 2019-05-10 at 06:54 Reason: Updated code and data with proper modular class count
hansl is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Intel Compute Stick (1GB RAM/8GB storage w/ Ubuntu 64-bit) ssybesma Hardware 4 2019-03-10 06:19
Prime gaps and storage HellGauss Computer Science & Computational Number Theory 18 2015-11-16 14:21
feature request: P-1 file storage on server tha Software 13 2014-03-04 15:09
Two (unrelated) questions about storage and hypert Dubslow Hardware 11 2011-07-30 06:08
A Lot Of Storage moo Hardware 0 2005-11-23 03:16

All times are UTC. The time now is 23:00.


Sun Dec 5 23:00:54 UTC 2021 up 135 days, 17:29, 0 users, load averages: 2.32, 1.68, 1.45

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.