20210128, 23:57  #1 
"Viliam Furík"
Jul 2018
Martin, Slovakia
1215_{8} Posts 
Prime representing constant program for calculation of the digits
Hello, not so long ago I have posted in the ycruncher subforum about a program for the Prime representing constant (PRC). Since then I have developed quite a fast program, thanks to the help of casevh, author of gmpy2 library, and successfully computed 1 billion (10^{9}) digits of PRC. I have also finally managed to figure out how to parallelize the algorithm, but that led me to another problem...
Next big milestone after 10^{9} digits is 10^{12}. That requires 1 TB of disk space to contain all digits in a text file (or less if any compression is used), computing through primes up to about 2,5 * 10^{12}, basically the same amount of RAM (that problem has a solution  writing the intermediary numbers to disk, and reading them from there, as ycruncher does, or at least how I understand it does it), but most importantly, it needs to be able to sieve out the primes themselves. Storing 2,5 * 10^{12} bits in RAM would require about 300 GB of RAM space, which is too much for what I seek. Sieving them in a file would bring up a lot of hassle with making a procedure that could do such thing and another that can read the actual values from there. These might not be extremely hard problems, but there may be easier ways to do this. So I would like to ask, whether anyone can help with some idea on how to effectively handle such gigantic sized numbers and amounts of data. BTW, I am also thinking about making it into a GPU program, it could play nicely with TF highperforming GPUs (basically INT32), as the primes are not that big, compared to the blob of prime jelly that is being computed. If somebody has some ideas about this too, I will be grateful for any advice. BTW2: Are there any volunteers who would like to test the upcoming C version of the code? It will have the multicore computation implemented in it. Last fiddled with by Viliam Furik on 20210129 at 12:49 
20210129, 09:53  #2 
"Oliver"
Sep 2017
Porta Westfalica, DE
239_{16} Posts 
What are the requirements for the way you want to store the primes ((either) in RAM or on disk?). If none, you should be able to cut the memory requirement roughly in half, maybe even more. There are nearly 91 billion primes up to 2.5\(\cdot\)10^{12}. If you would store the differences between primes like this instead of their exact values, it will save a lot of memory, but you can only access a single item by reading all values before it, so you would have random access in O(n). One could optimize this a lot by having intermediate values every so often, which might slightly increase memory demands, but will improve random access greatly.
Yes. 
20210129, 11:16  #3  
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
13·733 Posts 
Quote:
Which 'Prime generating constant' are you calculating? Mills'? Or are you calculating Euler–Mascheroni gamma? 

20210129, 12:24  #5 
Feb 2017
Nowhere
4,813 Posts 
Yes, I believe the original "Prime representing constant" was a better word choice.

20210129, 12:48  #6 
"Viliam Furík"
Jul 2018
Martin, Slovakia
653 Posts 

20210129, 13:13  #7 
"Viliam Furík"
Jul 2018
Martin, Slovakia
653 Posts 
@kruoli: As of now, the program uses a bit array for the primes, which it iterates. The parallelized algorithm relies on some way to store the primes, and then it chops the list of primes into about equal pieces (whether by the length of the range, or prime count in that range  about the same for these quantities of primes), saves the actual value of the last prime in each range (one range per worker), and when the actual values are needed in the computation, it calculates the from the position in the range, range number (used for finding the last prime in previous range) and if running from a savefile, also the last prime used in the computation of the savefile.
BTW, I forgot to mention, that I need a way to efficiently sieve them, too. Sieving is done quite easily in RAM, but sieving on disk is a bit more complicated, albeit not too complicated. To be clear, I am not thinking about using other methods than Eratosthenes (unless they are more efficient), but the storage of bits representing the integers is the tricky part when you need a lot of them. Storing only the prime gaps seems to be a lot better, but is there a better way to store the sieving data? @Batalov: As kruoli kindly explained, I am calculating the Prime representing constant, which can only generate (thus my confusion in the name) primes which were used in its computation, using the recursive formula. So 1 billion digits of the constant only suffice for primes under about 2 302 586 000. Using the formula to generate the primes back is perhaps the only way to confirm the correctness of the digits, apart from computing them with some different method (there is another formula, but it is equivalent to the one mentioned in the paper  it is basically the same, only the one in the paper has its terms grouped, thus has fewer terms for the same amount of digits, but the terms take longer to compute, so it equals out). 
20210129, 13:57  #8 
"Oliver"
Sep 2017
Porta Westfalica, DE
569 Posts 
Sieving itself should not be a problem. You should use primesieve if you are not eager to implement everything yourself:
Code:
$ /usr/bin/time v primesieve 2500000000000 Sieve size = 256 KiB Threads = 16 100% Seconds: 62.344 Primes: 90882915772 Command being timed: "primesieve 2500000000000" User time (seconds): 992.74 System time (seconds): 0.12 Percent of CPU this job got: 1592% Elapsed (wall clock) time (h:mm:ss or m:ss): 1:02.35 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 47436 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 6 Minor (reclaiming a frame) page faults: 15683 Voluntary context switches: 210 Involuntary context switches: 4092 Swaps: 0 File system inputs: 928 File system outputs: 0 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0 So my next question would be: Do you really need to store the primes? According to your last thread and the paper, you will only use one prime at a time, in order. Only in the division step in the end, you use the primorial up to your largest used prime, and you do not need all primes stored for that. Last fiddled with by kruoli on 20210129 at 14:44 Reason: Being pedantic. Removed something I'm not yet sure about. 
20210129, 15:55  #9 
"Viliam Furík"
Jul 2018
Martin, Slovakia
653 Posts 
I don't think I can do it without storing them. Primorial is the product of all the primes used. In the computation of the numerator, each prime is used once, that's true. I have looked at the primesieve_next_prime function, but I don't think it can be faster to compute the primes instead of storing some sort of a list of them.

20210129, 16:18  #10 
Feb 2017
Nowhere
4,813 Posts 

20210212, 19:22  #11 
"Viliam Furík"
Jul 2018
Martin, Slovakia
653_{10} Posts 
Quick question, does GMP, when using C on Windows, have to be compiled or something? It seems to me it does, as there is only a source code available on the website. I assume the answer will be the same for the primesieve library.

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Prime representing constant  Viliam Furik  ycruncher  13  20210114 09:43 
Did the P1 percentage calculation program change recently?  petrw1  Marin's Mersennearies  8  20200913 16:07 
Prime generating series  Citrix  Open Projects  18  20130824 05:24 
Program to TF Mersenne numbers with more than 1 sextillion digits?  Stargate38  Factoring  24  20111103 00:34 
Primegenerating polynomials  Orgasmic Troll  Math  28  20041202 16:37 