mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Viliam Furik (https://www.mersenneforum.org/forumdisplay.php?f=174)
-   -   Prime representing constant program for calculation of the digits (https://www.mersenneforum.org/showthread.php?t=26455)

 Viliam Furik 2021-01-28 23:57

Prime representing constant program for calculation of the digits

Hello, not so long ago I have posted in the y-cruncher subforum about a program for the Prime representing constant (PRC). Since then I have developed quite a fast program, thanks to the help of casevh, author of gmpy2 library, and successfully computed 1 billion (10[SUP]9[/SUP]) digits of PRC. I have also finally managed to figure out how to parallelize the algorithm, but that led me to another problem...

Next big milestone after 10[SUP]9[/SUP] digits is 10[SUP]12[/SUP]. That requires 1 TB of disk space to contain all digits in a text file (or less if any compression is used), computing through primes up to about 2,5 * 10[SUP]12[/SUP], basically the same amount of RAM (that problem has a solution - writing the intermediary numbers to disk, and reading them from there, as y-cruncher does, or at least how I understand it does it), but most importantly, it needs to be able to sieve out the primes themselves. Storing 2,5 * 10[SUP]12[/SUP] bits in RAM would require about 300 GB of RAM space, which is too much for what I seek. Sieving them in a file would bring up a lot of hassle with making a procedure that could do such thing and another that can read the actual values from there.

These might not be extremely hard problems, but there may be easier ways to do this. So I would like to ask, whether anyone can help with some idea on how to effectively handle such gigantic sized numbers and amounts of data.

BTW, I am also thinking about making it into a GPU program, it could play nicely with TF high-performing GPUs (basically INT32), as the primes are not that big, compared to the blob of prime jelly that is being computed. If somebody has some ideas about this too, I will be grateful for any advice.

BTW2: Are there any volunteers who would like to test the upcoming C version of the code? It will have the multi-core computation implemented in it.

 kruoli 2021-01-29 09:53

What are the requirements for the way you want to store the primes ((either) in RAM or on disk?). If none, you should be able to cut the memory requirement roughly in half, maybe even more. There are nearly [URL="http://www.wolframalpha.com/input/?i=primes+up+to+2.5e12"]91 billion primes[/URL] up to 2.5[$]\cdot[/$]10[SUP]12[/SUP]. If you would store the differences between primes like [URL="http://en.wikipedia.org/wiki/Variable-length_quantity"]this[/URL] instead of their exact values, it will save a lot of memory, but you can only access a single item by reading all values before it, so you would have random access in O(n). One could optimize this a lot by having intermediate values every so often, which might slightly increase memory demands, but will improve random access greatly.

[QUOTE=Viliam Furik;570370]BTW2: Are there any volunteers who would like to test the upcoming C version of the code? It will have the multi-core computation implemented in it.[/QUOTE]

Yes. :smile:

 Batalov 2021-01-29 11:16

[QUOTE=Viliam Furik;570370]Hello, not so long ago I have posted in the y-cruncher subforum about a program for the Prime generating constant (PGC). Since then I have developed quite a fast program, thanks to the help of casevh, author of gmpy2 library, and successfully computed 1 billion (10[SUP]9[/SUP]) digits of PGC.[/QUOTE]
...but then you can generate a 1 billion digits prime, can you?

Which 'Prime generating constant' are you calculating? Mills'?

Or are you calculating Euler–Mascheroni gamma?

 kruoli 2021-01-29 11:44

In the last thread he mentioned ([URL="http://mersenneforum.org/showthread.php?t=26298"]this one[/URL]), he wanted to compute the constant described in [URL="http://arxiv.org/pdf/2010.15882.pdf"]this[/URL] paper.

 Dr Sardonicus 2021-01-29 12:24

Yes, I believe the original "Prime [i][b]representing[/b][/i] constant" was a better word choice.

 Viliam Furik 2021-01-29 12:48

[QUOTE=Dr Sardonicus;570400]Yes, I believe the original "Prime [i][b]representing[/b][/i] constant" was a better word choice.[/QUOTE]

Oopsie... My bad, I confused the name. Will correct it.

 Viliam Furik 2021-01-29 13:13

@kruoli: As of now, the program uses a bit array for the primes, which it iterates. The parallelized algorithm relies on some way to store the primes, and then it chops the list of primes into about equal pieces (whether by the length of the range, or prime count in that range - about the same for these quantities of primes), saves the actual value of the last prime in each range (one range per worker), and when the actual values are needed in the computation, it calculates the from the position in the range, range number (used for finding the last prime in previous range) and if running from a savefile, also the last prime used in the computation of the savefile.

BTW, I forgot to mention, that I need a way to efficiently sieve them, too. Sieving is done quite easily in RAM, but sieving on disk is a bit more complicated, albeit not too complicated. To be clear, I am not thinking about using other methods than Eratosthenes (unless they are more efficient), but the storage of bits representing the integers is the tricky part when you need a lot of them.

Storing only the prime gaps seems to be a lot better, but is there a better way to store the sieving data?

@Batalov: As kruoli kindly explained, I am calculating the Prime representing constant, which can only generate (thus my confusion in the name) primes which were used in its computation, using the recursive formula. So 1 billion digits of the constant only suffice for primes under about 2 302 586 000.

Using the formula to generate the primes back is perhaps the only way to confirm the correctness of the digits, apart from computing them with some different method (there is another formula, but it is equivalent to the one mentioned in the paper - it is basically the same, only the one in the paper has its terms grouped, thus has fewer terms for the same amount of digits, but the terms take longer to compute, so it equals out).

 kruoli 2021-01-29 13:57

Sieving itself should not be a problem. You should use primesieve if you are not eager to implement everything yourself:

[CODE]$/usr/bin/time -v primesieve 2500000000000 Sieve size = 256 KiB Threads = 16 100% Seconds: 62.344 Primes: 90882915772 Command being timed: "primesieve 2500000000000" User time (seconds): 992.74 System time (seconds): 0.12 Percent of CPU this job got: 1592% Elapsed (wall clock) time (h:mm:ss or m:ss): 1:02.35 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 [B]Maximum resident set size (kbytes): 47436[/B] Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 6 Minor (reclaiming a frame) page faults: 15683 Voluntary context switches: 210 Involuntary context switches: 4092 Swaps: 0 File system inputs: 928 File system outputs: 0 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0[/CODE] That's not even 50 MB. The space requirement for that command was in [$]\mathcal{O}(\sqrt n)[/$]. If you use the [URL="http://primesieve.org/api/"]API[/URL] of primesieve and do execute a command that is not only counting, it will of course take a bit longer, but not in a way that would cause delays. So my next question would be: Do you really need to store the primes? According to your last thread and the paper, you will only use one prime at a time, in order. Only in the division step in the end, you use the primorial up to your largest used prime, and you do not need all primes stored for that. [STRIKE]For an efficient implementation, you'd have to store in the order of [$]\mathcal{O}(\log p_\text{count})[/\$] primes at any given time.[/STRIKE] The problem would be the result (divisor) getting large quickly.

 Viliam Furik 2021-01-29 15:55

[QUOTE=kruoli;570407]So my next question would be: Do you really need to store the primes?[/QUOTE]

I don't think I can do it without storing them. Primorial is the product of all the primes used. In the computation of the numerator, each prime is used once, that's true. I have looked at the primesieve_next_prime function, but I don't think it can be faster to compute the primes instead of storing some sort of a list of them.

 Dr Sardonicus 2021-01-29 16:18

[QUOTE=Viliam Furik;570401]Oopsie... My bad, I confused the name. Will correct it.[/QUOTE]:tu:

Blogging, as it should be! :grin:

 Viliam Furik 2021-02-12 19:22

Quick question, does GMP, when using C on Windows, have to be compiled or something? It seems to me it does, as there is only a source code available on the website. I assume the answer will be the same for the primesieve library.

 All times are UTC. The time now is 12:31.