mersenneforum.org Prime gaps and storage
 Register FAQ Search Today's Posts Mark Forums Read

 2012-05-12, 10:51 #1 HellGauss   Apr 2012 5 Posts Prime gaps and storage From http://en.wikipedia.org/wiki/Prime_gap: The prime gap between two succesive prime is always even (except for 2-3) and is always <512 for small-medium number (up to 304G). We can therefore have a table of primes in which each prime occupy 1 unsigned byte. The following trial division code generates a table of the first 203 280 219 prime-halfgap (there are 203280221 primes up to 2^32), starting from the halfgap(5-3)=1. It is not optimized (it will take up to 1hour). The .dat generated file has MD5=004ECB69C49B63D3A6CEEA0E1B94317B Code: #include "stdio.h" #include "time.h" int main(void) { time_t start,end; time(&start); unsigned int* smallprime=new unsigned int [7000];//about 7000 primes<65536 unsigned int* pt; unsigned int* pt2; unsigned char* gap=new unsigned char [210000000];//about 210000000 primes<2^32 unsigned int max0; unsigned int max=4294967293;//4294967293 = 2^32 - 3 ; change this to obtain smaller dat file int sq=0;//to stop checking if prime^2>number unsigned int x,flag,t; int smallprimecount=1; unsigned int currprime; smallprime[0]=3;//counting only odd primes; FILE* datfile; unsigned int number; pt=smallprime; t=*pt;t*=t; for (number=5;number<65535;number+=2) { if (t==number) { pt++; t=*pt;t*=t; } else { flag=0; for(pt2=smallprime;pt2<=pt;pt2++) { if ((number%*pt2)==0) { flag=1; break; } } if(flag==0) { smallprime[smallprimecount]=number; smallprimecount++; } } } time(&end); printf("\n%d primes up to %d (t=%d sec)\n",smallprimecount+1,65536,end-start); currprime=smallprime[smallprimecount-1]; max0=currprime;max0*=max0;//we need this to avoid overflow with t (see later) if (max0>max) max0=max; flag=smallprimecount-1;//flag=temp var for(x=0;x
2012-05-12, 13:57   #2
R.D. Silverman

Nov 2003

22·5·373 Posts

Quote:
 Originally Posted by HellGauss From http://en.wikipedia.org/wiki/Prime_gap: The prime gap between two succesive prime is always even (except for 2-3) and is always <512 for small-medium number (up to 304G). We can therefore have a table of primes in which each prime occupy 1 unsigned byte.
One can do a lot better.

2012-05-12, 14:46   #3
science_man_88

"Forget I exist"
Jul 2009
Dumbassville

100000110000002 Posts

Quote:
 Originally Posted by R.D. Silverman One can do a lot better.
my best idea so far is a forstep loop only hitting numbers that are 1 or 5 mod 6 and writing out only the status of if they are prime as 1 or 0, which if my math is correct should store it in under 171 MB, and that's without RLE etc.

Last fiddled with by science_man_88 on 2012-05-12 at 14:46

2012-05-12, 14:59   #4
CRGreathouse

Aug 2006

3×1,993 Posts

Quote:
 Originally Posted by R.D. Silverman One can do a lot better.
In the extreme, they can be packed into 0 bytes by recomputing as needed. Different levels of compression are good for different things. But the OP's method is actually in use in at least one major system, so it's certainly useful for some tasks.

 2012-05-12, 15:07 #5 ATH Einyen     Dec 2003 Denmark 3,313 Posts http://www.rsok.com/~jrm/printprimes.html Using this technique you can store primes for every 210 numbers in 6 bytes. I have a file taking 285,714,288 bytes with the first 455052515 primes up to 210*285714288/6 = 10,000,000,080. The first 4 primes 2,3,5,7 is not stored in the file. You can read the file like this (long long is 64bit integer): Code: #include #include void main () { long long a,b,c,d,e,f,i,prime; long long * primes; FILE *file; int pri[] = {0,1,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,121,127,131,137,139, 143,149,151,157,163,167,169,173,179,181,187,191,193,197,199,209}; primes = (long long*) malloc (455052600*sizeof(long long)); f=0; file = fopen("e:\\primes.dat","rb"); primes[1]=2; primes[2]=3; primes[3]=5; primes[4]=7; prime=4; for(a=0;a<(10000000080/210);a++) { for(b=0;b<6;b++) { d=fgetc(file); e=256; for(c=1;c<=8;c++) { e=e/2; if (d>=e) { d=d-e; prime++; primes[prime]=f+pri[b*8+c]; } } } f=f+210; } }`
 2012-05-12, 17:40 #6 HellGauss   Apr 2012 5 Posts I was indeed looking for the 'best' trade-off between loading speed and storage size. My method as the advantage to do very few ops (some pointer increasing, an addition and a times2 operation) to calculate primes. For primes up to 2^32, the only two ways to do less ops is 1)Full store the primes (this require 800M instead of 200M) 2)Remove the *2 operation and store unsigned short gaps (400M) The ATH approach is also interesting, i will take a look. It requires lot of more ops, but i think that the weak point of loading primes is to open the file and read, so some more ops are welcome. There is also some issue on compressing: if i .7z my .dat I can compress to about 50%, since I usually do not have high gaps. I think this can be done also for the ATH .dat file (you will have lots of 0 in your .dat file). I realize that i have used the same ATH idea, but only for mod%2. Moreover ATH does not use gaps, so we can combine the two methods: here some ideas: 1)We can use a larger sieve: up to 13# (30030 is still a short). We can store this sieve in a sieve.dat file 2)We can store each prime with an average of 4bits (half byte). We will use the value 0-14 for small gaps, and the value 15 for higher gaps (we will read another (or more) half-byte in this case). I will take a look next week. PS: @ATH: you should remove the first 0 from the sieve; also the first prime in c/c++ is primes[0]. Also i suggest you to use some more memory and load the .dat file at once, using the fread function once rather than 450MTimes fgetc. You can delete [] this array once you have finished to calculate primes.
 2012-05-12, 21:18 #7 literka     Mar 2010 3008 Posts This is exactly what I did to pack prime numbers from 2 to 2^32. The received file, I called if "dtst", has a size 203316597 bytes. I used it for my "Number Theory Calculator", which can be downloaded from http://www.literka.addr.com/rchelp/installthcalc.exe. It can be used for factorizations, but with this respect, program is not competitive with existing programs. I have updates of this program, but I did not upload them. Last fiddled with by literka on 2012-05-12 at 21:20
2012-05-13, 04:07   #8
ATH
Einyen

Dec 2003
Denmark

3,313 Posts

Quote:
 Originally Posted by HellGauss PS: @ATH: you should remove the first 0 from the sieve; also the first prime in c/c++ is primes[0]. Also i suggest you to use some more memory and load the .dat file at once, using the fread function once rather than 450MTimes fgetc. You can delete [] this array once you have finished to calculate primes.
This was just a simple demonstration of how to calculate the primes not the best optimized way. I haven't used fread before but I can see it just loads the entire file into a char array, but that doesn't calculate the actual primes for us. Btw it's not 450M times fgetc its 10^10 / 210 ~ 48M times.

I know arrays start at index 0 but if I put primes in an array I prefer to start at 1 so the n'th prime is at index n.

Alternately you can create a RAM disk and put the primes.dat on it (it's just 285 Mb) and just access the primes you need directly with fseek.

If you just need primes up to 2^32 it will take only round(2^32*6/210) + 1 = 122,713,352 bytes.

Last fiddled with by ATH on 2012-05-13 at 04:10

2012-05-13, 04:25   #9
ATH
Einyen

Dec 2003
Denmark

1100111100012 Posts

Quote:
 Originally Posted by HellGauss 1)We can use a larger sieve: up to 13# (30030 is still a short). We can store this sieve in a sieve.dat file
This will take (2-1)*(3-1)*(5-1)*(7-1)*(11-1)*(13-1) = 5760 bits = 720 bytes for each chunk of 30030 numbers. It will save 1-(720/30030)/(6/210) ~ 16% space compared to the 210 in 6 bytes method but it will require a sieve up to 30030 instead of 210 to calculate the primes, not sure that is worth it for 16% less space but yes we can store the sieve in a file as well.

Last fiddled with by ATH on 2012-05-13 at 04:27

 2012-05-13, 09:30 #10 ldesnogu     Jan 2008 France 2·281 Posts I'm wondering if it is really worth the pain storing primes in a file given how fast a carefully implemented sieve can be.
 2012-05-13, 14:40 #11 HellGauss   Apr 2012 5 Posts The issue is not only about hd storage. Suppose you want to find all primes (say) from 10000000000G To 10000000050G (unsigned int64). You need somewhere all primes up to 2^32. And you may want to have a good trade-off between memory sotrage and computation efficency.

 Similar Threads Thread Thread Starter Forum Replies Last Post Terence Schraut Miscellaneous Math 10 2020-09-01 23:49 rudy235 Prime Gap Searches 192 2020-02-06 09:48 robert44444uk Prime Gap Searches 2 2019-09-23 01:00 gd_barnes Riesel Prime Search 11 2007-06-27 04:12 moo Hardware 0 2005-11-23 03:16

All times are UTC. The time now is 14:34.

Fri May 27 14:34:30 UTC 2022 up 43 days, 12:35, 1 user, load averages: 1.16, 1.34, 1.51