20120512, 10:51  #1 
Apr 2012
5 Posts 
Prime gaps and storage
From http://en.wikipedia.org/wiki/Prime_gap:
The prime gap between two succesive prime is always even (except for 23) and is always <512 for smallmedium number (up to 304G). We can therefore have a table of primes in which each prime occupy 1 unsigned byte. The following trial division code generates a table of the first 203 280 219 primehalfgap (there are 203280221 primes up to 2^32), starting from the halfgap(53)=1. It is not optimized (it will take up to 1hour). The .dat generated file has MD5=004ECB69C49B63D3A6CEEA0E1B94317B Code:
#include "stdio.h" #include "time.h" int main(void) { time_t start,end; time(&start); unsigned int* smallprime=new unsigned int [7000];//about 7000 primes<65536 unsigned int* pt; unsigned int* pt2; unsigned char* gap=new unsigned char [210000000];//about 210000000 primes<2^32 unsigned int max0; unsigned int max=4294967293;//4294967293 = 2^32  3 ; change this to obtain smaller dat file int sq=0;//to stop checking if prime^2>number unsigned int x,flag,t; int smallprimecount=1; unsigned int currprime; smallprime[0]=3;//counting only odd primes; FILE* datfile; unsigned int number; pt=smallprime; t=*pt;t*=t; for (number=5;number<65535;number+=2) { if (t==number) { pt++; t=*pt;t*=t; } else { flag=0; for(pt2=smallprime;pt2<=pt;pt2++) { if ((number%*pt2)==0) { flag=1; break; } } if(flag==0) { smallprime[smallprimecount]=number; smallprimecount++; } } } time(&end); printf("\n%d primes up to %d (t=%d sec)\n",smallprimecount+1,65536,endstart); currprime=smallprime[smallprimecount1]; max0=currprime;max0*=max0;//we need this to avoid overflow with t (see later) if (max0>max) max0=max; flag=smallprimecount1;//flag=temp var for(x=0;x<flag;x++) gap[x]=(unsigned char)((smallprime[x+1]smallprime[x])/2); for(number=65537;number<max0;number+=2) { if (t==number) { pt++; t=*pt;t*=t; } else { flag=0; for(pt2=smallprime;pt2<=pt;pt2++) { if ((number%*pt2)==0) { flag=1; break; } } if(flag==0) { gap[x]=(unsigned char)((numbercurrprime)/2); currprime=number; x++; } } } //check for primes near 2^32 for(number=max0+2;number<=max;number+=2) { flag=0; for(pt2=smallprime;pt2<=pt;pt2++) { if ((number%*pt2)==0) { flag=1; break; } } if(flag==0) { gap[x]=(unsigned char)((numbercurrprime)/2); currprime=number; x++; } } time(&end); printf("\n%d primes up to %u (t=%d sec)\n\nWriting %d halfgaps (wait)\n",x+2,max,endstart,x); datfile=fopen("gapprime.dat","wb"); fwrite(gap,1,x,datfile); fclose(datfile); delete [] smallprime; delete [] gap; time(&end); printf("\nDone! (t=%d sec)\n", endstart); getchar(); return 1; } Code:
#include "stdio.h" #include "time.h" int main(void) { time_t start,end; int x; printf("Loading primes... (t=0sec)");time(&start); FILE* datfile=fopen("gapprime.dat","rb"); unsigned char* hgap=new unsigned char [203280219]; fread(hgap,1,203280219,datfile); fclose(datfile); unsigned int* prime=new unsigned int [203280221]; prime[0]=2;prime[1]=3; for(x=0;x<203280219;x++) prime[x+2]=prime[x+1]+2*((unsigned int)(hgap[x])); delete [] hgap;//we no longer need gaps time(&end); printf("\nPrimes loaded (t=%dsec, n.primes=%d)\n\n",endstart,x+2); printf("%u\n%u\n%u\n%u\n%u\n%u\n", prime[0],prime[2],prime[203280220],prime[99999999],prime[50847533],prime[50847534]); delete [] prime; getchar(); return 1; } 
20120512, 13:57  #2  
Nov 2003
2^{2}×5×373 Posts 
Quote:


20120512, 14:46  #3 
"Forget I exist"
Jul 2009
Dumbassville
20300_{8} Posts 
my best idea so far is a forstep loop only hitting numbers that are 1 or 5 mod 6 and writing out only the status of if they are prime as 1 or 0, which if my math is correct should store it in under 171 MB, and that's without RLE etc.
Last fiddled with by science_man_88 on 20120512 at 14:46 
20120512, 14:59  #4 
Aug 2006
3×1,993 Posts 
In the extreme, they can be packed into 0 bytes by recomputing as needed. Different levels of compression are good for different things. But the OP's method is actually in use in at least one major system, so it's certainly useful for some tasks.

20120512, 15:07  #5 
Einyen
Dec 2003
Denmark
2×3×13×41 Posts 
http://www.rsok.com/~jrm/printprimes.html
Using this technique you can store primes for every 210 numbers in 6 bytes. I have a file taking 285,714,288 bytes with the first 455052515 primes up to 210*285714288/6 = 10,000,000,080. The first 4 primes 2,3,5,7 is not stored in the file. You can read the file like this (long long is 64bit integer): Code:
#include <stdio.h> #include <stdlib.h> void main () { long long a,b,c,d,e,f,i,prime; long long * primes; FILE *file; int pri[] = {0,1,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,121,127,131,137,139, 143,149,151,157,163,167,169,173,179,181,187,191,193,197,199,209}; primes = (long long*) malloc (455052600*sizeof(long long)); f=0; file = fopen("e:\\primes.dat","rb"); primes[1]=2; primes[2]=3; primes[3]=5; primes[4]=7; prime=4; for(a=0;a<(10000000080/210);a++) { for(b=0;b<6;b++) { d=fgetc(file); e=256; for(c=1;c<=8;c++) { e=e/2; if (d>=e) { d=de; prime++; primes[prime]=f+pri[b*8+c]; } } } f=f+210; } } 
20120512, 17:40  #6 
Apr 2012
5 Posts 
I was indeed looking for the 'best' tradeoff between loading speed and storage size.
My method as the advantage to do very few ops (some pointer increasing, an addition and a times2 operation) to calculate primes. For primes up to 2^32, the only two ways to do less ops is 1)Full store the primes (this require 800M instead of 200M) 2)Remove the *2 operation and store unsigned short gaps (400M) The ATH approach is also interesting, i will take a look. It requires lot of more ops, but i think that the weak point of loading primes is to open the file and read, so some more ops are welcome. There is also some issue on compressing: if i .7z my .dat I can compress to about 50%, since I usually do not have high gaps. I think this can be done also for the ATH .dat file (you will have lots of 0 in your .dat file). I realize that i have used the same ATH idea, but only for mod%2. Moreover ATH does not use gaps, so we can combine the two methods: here some ideas: 1)We can use a larger sieve: up to 13# (30030 is still a short). We can store this sieve in a sieve.dat file 2)We can store each prime with an average of 4bits (half byte). We will use the value 014 for small gaps, and the value 15 for higher gaps (we will read another (or more) halfbyte in this case). I will take a look next week. PS: @ATH: you should remove the first 0 from the sieve; also the first prime in c/c++ is primes[0]. Also i suggest you to use some more memory and load the .dat file at once, using the fread function once rather than 450MTimes fgetc. You can delete [] this array once you have finished to calculate primes. 
20120512, 21:18  #7 
Mar 2010
2^{6}·3 Posts 
This is exactly what I did to pack prime numbers from 2 to 2^32. The received file, I called if "dtst", has a size 203316597 bytes. I used it for my "Number Theory Calculator", which can be downloaded from http://www.literka.addr.com/rchelp/installthcalc.exe. It can be used for factorizations, but with this respect, program is not competitive with existing programs.
I have updates of this program, but I did not upload them. Last fiddled with by literka on 20120512 at 21:20 
20120513, 04:07  #8  
Einyen
Dec 2003
Denmark
2·3·13·41 Posts 
Quote:
I know arrays start at index 0 but if I put primes in an array I prefer to start at 1 so the n'th prime is at index n. Alternately you can create a RAM disk and put the primes.dat on it (it's just 285 Mb) and just access the primes you need directly with fseek. If you just need primes up to 2^32 it will take only round(2^32*6/210) + 1 = 122,713,352 bytes. Last fiddled with by ATH on 20120513 at 04:10 

20120513, 04:25  #9 
Einyen
Dec 2003
Denmark
2×3×13×41 Posts 
This will take (21)*(31)*(51)*(71)*(111)*(131) = 5760 bits = 720 bytes for each chunk of 30030 numbers. It will save 1(720/30030)/(6/210) ~ 16% space compared to the 210 in 6 bytes method but it will require a sieve up to 30030 instead of 210 to calculate the primes, not sure that is worth it for 16% less space but yes we can store the sieve in a file as well.
Last fiddled with by ATH on 20120513 at 04:27 
20120513, 09:30  #10 
Jan 2008
France
2·5^{2}·11 Posts 
I'm wondering if it is really worth the pain storing primes in a file given how fast a carefully implemented sieve can be.

20120513, 14:40  #11 
Apr 2012
5 Posts 
The issue is not only about hd storage. Suppose you want to find all primes (say) from 10000000000G To 10000000050G (unsigned int64). You need somewhere all primes up to 2^32. And you may want to have a good tradeoff between memory sotrage and computation efficency.

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Prime gaps  Terence Schraut  Miscellaneous Math  10  20200901 23:49 
Dr Nicely's 1st occurrence Prime Gaps  rudy235  Prime Gap Searches  192  20200206 09:48 
Welcome to the Prime Gaps Search Forum  robert44444uk  Prime Gap Searches  2  20190923 01:00 
Gaps and more gaps on <300 site  gd_barnes  Riesel Prime Search  11  20070627 04:12 
A Lot Of Storage  moo  Hardware  0  20051123 03:16 