mersenneforum.org  

Go Back   mersenneforum.org > Math Stuff > Computer Science & Computational Number Theory

Reply
 
Thread Tools
Old 2012-05-12, 10:51   #1
HellGauss
 
Apr 2012

516 Posts
Default Prime gaps and storage

From http://en.wikipedia.org/wiki/Prime_gap:

The prime gap between two succesive prime is always even (except for 2-3) and is always <512 for small-medium number (up to 304G). We can therefore have a table of primes in which each prime occupy 1 unsigned byte.

The following trial division code generates a table of the first 203 280 219 prime-halfgap (there are 203280221 primes up to 2^32), starting from the halfgap(5-3)=1. It is not optimized (it will take up to 1hour). The .dat generated file has MD5=004ECB69C49B63D3A6CEEA0E1B94317B

Code:
#include "stdio.h"
#include "time.h"

int main(void)
{
  time_t start,end;
  time(&start);
  unsigned int* smallprime=new unsigned int [7000];//about 7000 primes<65536
  unsigned int* pt;
  unsigned int* pt2;
  unsigned char* gap=new unsigned char [210000000];//about 210000000 primes<2^32
  unsigned int max0;
  unsigned int max=4294967293;//4294967293 = 2^32 - 3 ; change this to obtain smaller dat file
  int sq=0;//to stop checking if prime^2>number
  
  unsigned int x,flag,t;
  int smallprimecount=1;
  unsigned int currprime;
  smallprime[0]=3;//counting only odd primes;
  FILE* datfile;
    
  unsigned int number;
    
  pt=smallprime;
  t=*pt;t*=t; 
  for (number=5;number<65535;number+=2)
  {
    if (t==number)
    {
      pt++;
      t=*pt;t*=t; 
    }
    else
    {
      flag=0;
      for(pt2=smallprime;pt2<=pt;pt2++)
      {
       if ((number%*pt2)==0) 
       {
         flag=1;
         break;
       }
      }
      if(flag==0)
      {       
       smallprime[smallprimecount]=number;
       smallprimecount++;
      }
    }
  }
  
  time(&end);
  
  printf("\n%d primes up to %d (t=%d sec)\n",smallprimecount+1,65536,end-start);
  currprime=smallprime[smallprimecount-1];
  max0=currprime;max0*=max0;//we need this to avoid overflow with t (see later)
  if (max0>max) max0=max;
  
  flag=smallprimecount-1;//flag=temp var
  for(x=0;x<flag;x++)
    gap[x]=(unsigned char)((smallprime[x+1]-smallprime[x])/2);
  
  for(number=65537;number<max0;number+=2)
  {
    if (t==number)
    {
      pt++;
      t=*pt;t*=t;
    }
    else
    {
      flag=0;
      for(pt2=smallprime;pt2<=pt;pt2++)
      {
       if ((number%*pt2)==0) 
       {
         flag=1;
         break;
       }
      }
      if(flag==0)
      {
       gap[x]=(unsigned char)((number-currprime)/2);
       currprime=number;
       x++;
      }
    }
  }
  
  //check for primes near 2^32
  for(number=max0+2;number<=max;number+=2)
  {
    flag=0;
    for(pt2=smallprime;pt2<=pt;pt2++)
    {
     if ((number%*pt2)==0) 
     {
       flag=1;
       break;
     }
    }
    if(flag==0)
    {
     gap[x]=(unsigned char)((number-currprime)/2);
     currprime=number;
     x++;
    }
  }
  time(&end);
  
  printf("\n%d primes up to %u  (t=%d sec)\n\nWriting %d half-gaps (wait)\n",x+2,max,end-start,x);
  datfile=fopen("gapprime.dat","wb");
  fwrite(gap,1,x,datfile);
  fclose(datfile);
  
  delete [] smallprime;
  delete [] gap;
  
  time(&end);
  
  printf("\nDone! (t=%d sec)\n", end-start);
  
  getchar();
  
  return 1;
}
Providing that we know that 2 and 3 are the first two prime numbers, we can read the .dat file and obtain the array of all primes<2^32 (unsigned int) adding gaps from .dat file, as in the following test example:

Code:
#include "stdio.h"
#include "time.h"

int main(void)
{
  time_t start,end;
  int x;
  
  printf("Loading primes... (t=0sec)");time(&start);
  
  FILE* datfile=fopen("gapprime.dat","rb");
  unsigned char* hgap=new unsigned char [203280219];
  
  fread(hgap,1,203280219,datfile);
  fclose(datfile);
  
  unsigned int* prime=new unsigned int [203280221];
  prime[0]=2;prime[1]=3;
  
  for(x=0;x<203280219;x++)
    prime[x+2]=prime[x+1]+2*((unsigned int)(hgap[x]));
    
  delete [] hgap;//we no longer need gaps
  
  time(&end);  
  printf("\nPrimes loaded (t=%dsec, n.primes=%d)\n\n",end-start,x+2);
  printf("%u\n%u\n%u\n%u\n%u\n%u\n", prime[0],prime[2],prime[203280220],prime[99999999],prime[50847533],prime[50847534]);
  
delete [] prime;
  getchar();

return 1;
}
HellGauss is offline   Reply With Quote
Old 2012-05-12, 13:57   #2
R.D. Silverman
 
R.D. Silverman's Avatar
 
Nov 2003

26×113 Posts
Default

Quote:
Originally Posted by HellGauss View Post
From http://en.wikipedia.org/wiki/Prime_gap:

The prime gap between two succesive prime is always even (except for 2-3) and is always <512 for small-medium number (up to 304G). We can therefore have a table of primes in which each prime occupy 1 unsigned byte.
One can do a lot better.
R.D. Silverman is offline   Reply With Quote
Old 2012-05-12, 14:46   #3
science_man_88
 
science_man_88's Avatar
 
"Forget I exist"
Jul 2009
Dumbassville

8,369 Posts
Default

Quote:
Originally Posted by R.D. Silverman View Post
One can do a lot better.
my best idea so far is a forstep loop only hitting numbers that are 1 or 5 mod 6 and writing out only the status of if they are prime as 1 or 0, which if my math is correct should store it in under 171 MB, and that's without RLE etc.

Last fiddled with by science_man_88 on 2012-05-12 at 14:46
science_man_88 is offline   Reply With Quote
Old 2012-05-12, 14:59   #4
CRGreathouse
 
CRGreathouse's Avatar
 
Aug 2006

2×2,969 Posts
Default

Quote:
Originally Posted by R.D. Silverman View Post
One can do a lot better.
In the extreme, they can be packed into 0 bytes by recomputing as needed. Different levels of compression are good for different things. But the OP's method is actually in use in at least one major system, so it's certainly useful for some tasks.
CRGreathouse is offline   Reply With Quote
Old 2012-05-12, 15:07   #5
ATH
Einyen
 
ATH's Avatar
 
Dec 2003
Denmark

32·331 Posts
Default

http://www.rsok.com/~jrm/printprimes.html

Using this technique you can store primes for every 210 numbers in 6 bytes. I have a file taking 285,714,288 bytes with the first 455052515 primes up to 210*285714288/6 = 10,000,000,080.

The first 4 primes 2,3,5,7 is not stored in the file.

You can read the file like this (long long is 64bit integer):

Code:
#include <stdio.h>
#include <stdlib.h>

void main ()
{
long long a,b,c,d,e,f,i,prime;
long long * primes;
FILE *file;
int pri[] = {0,1,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,121,127,131,137,139,
             143,149,151,157,163,167,169,173,179,181,187,191,193,197,199,209};

	primes = (long long*) malloc (455052600*sizeof(long long));
	f=0;
	file = fopen("e:\\primes.dat","rb");
	primes[1]=2; primes[2]=3; primes[3]=5; primes[4]=7;
	prime=4;
	for(a=0;a<(10000000080/210);a++)
	{
		for(b=0;b<6;b++)
		{
			d=fgetc(file);
			e=256;
			for(c=1;c<=8;c++)
			{
				e=e/2;
				if (d>=e) { d=d-e; prime++; primes[prime]=f+pri[b*8+c]; }
			}
		}
		f=f+210;
	}
}
ATH is offline   Reply With Quote
Old 2012-05-12, 17:40   #6
HellGauss
 
Apr 2012

5 Posts
Default

I was indeed looking for the 'best' trade-off between loading speed and storage size.

My method as the advantage to do very few ops (some pointer increasing, an addition and a times2 operation) to calculate primes. For primes up to 2^32, the only two ways to do less ops is

1)Full store the primes (this require 800M instead of 200M)
2)Remove the *2 operation and store unsigned short gaps (400M)

The ATH approach is also interesting, i will take a look. It requires lot of more ops, but i think that the weak point of loading primes is to open the file and read, so some more ops are welcome.

There is also some issue on compressing: if i .7z my .dat I can compress to about 50%, since I usually do not have high gaps. I think this can be done also for the ATH .dat file (you will have lots of 0 in your .dat file).

I realize that i have used the same ATH idea, but only for mod%2. Moreover ATH does not use gaps, so we can combine the two methods: here some ideas:

1)We can use a larger sieve: up to 13# (30030 is still a short). We can store this sieve in a sieve.dat file
2)We can store each prime with an average of 4bits (half byte). We will use the value 0-14 for small gaps, and the value 15 for higher gaps (we will read another (or more) half-byte in this case).

I will take a look next week.

PS: @ATH: you should remove the first 0 from the sieve; also the first prime in c/c++ is primes[0]. Also i suggest you to use some more memory and load the .dat file at once, using the fread function once rather than 450MTimes fgetc. You can delete [] this array once you have finished to calculate primes.
HellGauss is offline   Reply With Quote
Old 2012-05-12, 21:18   #7
literka
 
literka's Avatar
 
Mar 2010

2·83 Posts
Default

This is exactly what I did to pack prime numbers from 2 to 2^32. The received file, I called if "dtst", has a size 203316597 bytes. I used it for my "Number Theory Calculator", which can be downloaded from http://www.literka.addr.com/rchelp/installthcalc.exe. It can be used for factorizations, but with this respect, program is not competitive with existing programs.
I have updates of this program, but I did not upload them.

Last fiddled with by literka on 2012-05-12 at 21:20
literka is offline   Reply With Quote
Old 2012-05-13, 04:07   #8
ATH
Einyen
 
ATH's Avatar
 
Dec 2003
Denmark

297910 Posts
Default

Quote:
Originally Posted by HellGauss View Post
PS: @ATH: you should remove the first 0 from the sieve; also the first prime in c/c++ is primes[0]. Also i suggest you to use some more memory and load the .dat file at once, using the fread function once rather than 450MTimes fgetc. You can delete [] this array once you have finished to calculate primes.
This was just a simple demonstration of how to calculate the primes not the best optimized way. I haven't used fread before but I can see it just loads the entire file into a char array, but that doesn't calculate the actual primes for us. Btw it's not 450M times fgetc its 10^10 / 210 ~ 48M times.

I know arrays start at index 0 but if I put primes in an array I prefer to start at 1 so the n'th prime is at index n.

Alternately you can create a RAM disk and put the primes.dat on it (it's just 285 Mb) and just access the primes you need directly with fseek.

If you just need primes up to 2^32 it will take only round(2^32*6/210) + 1 = 122,713,352 bytes.

Last fiddled with by ATH on 2012-05-13 at 04:10
ATH is offline   Reply With Quote
Old 2012-05-13, 04:25   #9
ATH
Einyen
 
ATH's Avatar
 
Dec 2003
Denmark

1011101000112 Posts
Default

Quote:
Originally Posted by HellGauss View Post
1)We can use a larger sieve: up to 13# (30030 is still a short). We can store this sieve in a sieve.dat file
This will take (2-1)*(3-1)*(5-1)*(7-1)*(11-1)*(13-1) = 5760 bits = 720 bytes for each chunk of 30030 numbers. It will save 1-(720/30030)/(6/210) ~ 16% space compared to the 210 in 6 bytes method but it will require a sieve up to 30030 instead of 210 to calculate the primes, not sure that is worth it for 16% less space but yes we can store the sieve in a file as well.

Last fiddled with by ATH on 2012-05-13 at 04:27
ATH is offline   Reply With Quote
Old 2012-05-13, 09:30   #10
ldesnogu
 
ldesnogu's Avatar
 
Jan 2008
France

3·179 Posts
Default

I'm wondering if it is really worth the pain storing primes in a file given how fast a carefully implemented sieve can be.
ldesnogu is offline   Reply With Quote
Old 2012-05-13, 14:40   #11
HellGauss
 
Apr 2012

1012 Posts
Default

The issue is not only about hd storage. Suppose you want to find all primes (say) from 10000000000G To 10000000050G (unsigned int64). You need somewhere all primes up to 2^32. And you may want to have a good trade-off between memory sotrage and computation efficency.
HellGauss is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Prime gaps Terence Schraut Miscellaneous Math 10 2020-09-01 23:49
Dr Nicely's 1st occurrence Prime Gaps rudy235 Prime Gap Searches 192 2020-02-06 09:48
Welcome to the Prime Gaps Search Forum robert44444uk Prime Gap Searches 2 2019-09-23 01:00
Gaps and more gaps on <300 site gd_barnes Riesel Prime Search 11 2007-06-27 04:12
A Lot Of Storage moo Hardware 0 2005-11-23 03:16

All times are UTC. The time now is 11:16.

Sat Nov 28 11:16:28 UTC 2020 up 79 days, 8:27, 3 users, load averages: 1.15, 1.30, 1.21

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.