mersenneforum.org  

Go Back   mersenneforum.org > Extra Stuff > Blogorrhea > Viliam Furik

Reply
 
Thread Tools
Old 2021-01-28, 23:57   #1
Viliam Furik
 
"Viliam Furík"
Jul 2018
Martin, Slovakia

2×3×113 Posts
Default Prime representing constant program for calculation of the digits

Hello, not so long ago I have posted in the y-cruncher subforum about a program for the Prime representing constant (PRC). Since then I have developed quite a fast program, thanks to the help of casevh, author of gmpy2 library, and successfully computed 1 billion (109) digits of PRC. I have also finally managed to figure out how to parallelize the algorithm, but that led me to another problem...

Next big milestone after 109 digits is 1012. That requires 1 TB of disk space to contain all digits in a text file (or less if any compression is used), computing through primes up to about 2,5 * 1012, basically the same amount of RAM (that problem has a solution - writing the intermediary numbers to disk, and reading them from there, as y-cruncher does, or at least how I understand it does it), but most importantly, it needs to be able to sieve out the primes themselves. Storing 2,5 * 1012 bits in RAM would require about 300 GB of RAM space, which is too much for what I seek. Sieving them in a file would bring up a lot of hassle with making a procedure that could do such thing and another that can read the actual values from there.

These might not be extremely hard problems, but there may be easier ways to do this. So I would like to ask, whether anyone can help with some idea on how to effectively handle such gigantic sized numbers and amounts of data.


BTW, I am also thinking about making it into a GPU program, it could play nicely with TF high-performing GPUs (basically INT32), as the primes are not that big, compared to the blob of prime jelly that is being computed. If somebody has some ideas about this too, I will be grateful for any advice.

BTW2: Are there any volunteers who would like to test the upcoming C version of the code? It will have the multi-core computation implemented in it.

Last fiddled with by Viliam Furik on 2021-01-29 at 12:49
Viliam Furik is offline   Reply With Quote
Old 2021-01-29, 09:53   #2
kruoli
 
kruoli's Avatar
 
"Oliver"
Sep 2017
Porta Westfalica, DE

3·5·43 Posts
Default

What are the requirements for the way you want to store the primes ((either) in RAM or on disk?). If none, you should be able to cut the memory requirement roughly in half, maybe even more. There are nearly 91 billion primes up to 2.5\(\cdot\)1012. If you would store the differences between primes like this instead of their exact values, it will save a lot of memory, but you can only access a single item by reading all values before it, so you would have random access in O(n). One could optimize this a lot by having intermediate values every so often, which might slightly increase memory demands, but will improve random access greatly.

Quote:
Originally Posted by Viliam Furik View Post
BTW2: Are there any volunteers who would like to test the upcoming C version of the code? It will have the multi-core computation implemented in it.
Yes.
kruoli is offline   Reply With Quote
Old 2021-01-29, 11:16   #3
Batalov
 
Batalov's Avatar
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

23·5·239 Posts
Question

Quote:
Originally Posted by Viliam Furik View Post
Hello, not so long ago I have posted in the y-cruncher subforum about a program for the Prime generating constant (PGC). Since then I have developed quite a fast program, thanks to the help of casevh, author of gmpy2 library, and successfully computed 1 billion (109) digits of PGC.
...but then you can generate a 1 billion digits prime, can you?

Which 'Prime generating constant' are you calculating? Mills'?

Or are you calculating Euler–Mascheroni gamma?
Batalov is offline   Reply With Quote
Old 2021-01-29, 11:44   #4
kruoli
 
kruoli's Avatar
 
"Oliver"
Sep 2017
Porta Westfalica, DE

3×5×43 Posts
Default

In the last thread he mentioned (this one), he wanted to compute the constant described in this paper.
kruoli is offline   Reply With Quote
Old 2021-01-29, 12:24   #5
Dr Sardonicus
 
Dr Sardonicus's Avatar
 
Feb 2017
Nowhere

3×5×331 Posts
Default

Yes, I believe the original "Prime representing constant" was a better word choice.
Dr Sardonicus is offline   Reply With Quote
Old 2021-01-29, 12:48   #6
Viliam Furik
 
"Viliam Furík"
Jul 2018
Martin, Slovakia

12468 Posts
Default

Quote:
Originally Posted by Dr Sardonicus View Post
Yes, I believe the original "Prime representing constant" was a better word choice.
Oopsie... My bad, I confused the name. Will correct it.
Viliam Furik is offline   Reply With Quote
Old 2021-01-29, 13:13   #7
Viliam Furik
 
"Viliam Furík"
Jul 2018
Martin, Slovakia

2·3·113 Posts
Default

@kruoli: As of now, the program uses a bit array for the primes, which it iterates. The parallelized algorithm relies on some way to store the primes, and then it chops the list of primes into about equal pieces (whether by the length of the range, or prime count in that range - about the same for these quantities of primes), saves the actual value of the last prime in each range (one range per worker), and when the actual values are needed in the computation, it calculates the from the position in the range, range number (used for finding the last prime in previous range) and if running from a savefile, also the last prime used in the computation of the savefile.

BTW, I forgot to mention, that I need a way to efficiently sieve them, too. Sieving is done quite easily in RAM, but sieving on disk is a bit more complicated, albeit not too complicated. To be clear, I am not thinking about using other methods than Eratosthenes (unless they are more efficient), but the storage of bits representing the integers is the tricky part when you need a lot of them.

Storing only the prime gaps seems to be a lot better, but is there a better way to store the sieving data?

@Batalov: As kruoli kindly explained, I am calculating the Prime representing constant, which can only generate (thus my confusion in the name) primes which were used in its computation, using the recursive formula. So 1 billion digits of the constant only suffice for primes under about 2 302 586 000.

Using the formula to generate the primes back is perhaps the only way to confirm the correctness of the digits, apart from computing them with some different method (there is another formula, but it is equivalent to the one mentioned in the paper - it is basically the same, only the one in the paper has its terms grouped, thus has fewer terms for the same amount of digits, but the terms take longer to compute, so it equals out).
Viliam Furik is offline   Reply With Quote
Old 2021-01-29, 13:57   #8
kruoli
 
kruoli's Avatar
 
"Oliver"
Sep 2017
Porta Westfalica, DE

3·5·43 Posts
Default

Sieving itself should not be a problem. You should use primesieve if you are not eager to implement everything yourself:

Code:
$ /usr/bin/time -v primesieve 2500000000000
Sieve size = 256 KiB
Threads = 16
100%
Seconds: 62.344
Primes: 90882915772
        Command being timed: "primesieve 2500000000000"
        User time (seconds): 992.74
        System time (seconds): 0.12
        Percent of CPU this job got: 1592%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:02.35
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 47436
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 6
        Minor (reclaiming a frame) page faults: 15683
        Voluntary context switches: 210
        Involuntary context switches: 4092
        Swaps: 0
        File system inputs: 928
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
That's not even 50 MB. The space requirement for that command was in \(\mathcal{O}(\sqrt n)\). If you use the API of primesieve and do execute a command that is not only counting, it will of course take a bit longer, but not in a way that would cause delays.

So my next question would be: Do you really need to store the primes? According to your last thread and the paper, you will only use one prime at a time, in order. Only in the division step in the end, you use the primorial up to your largest used prime, and you do not need all primes stored for that. For an efficient implementation, you'd have to store in the order of [$]\mathcal{O}(\log p_\text{count})[/$] primes at any given time. The problem would be the result (divisor) getting large quickly.

Last fiddled with by kruoli on 2021-01-29 at 14:44 Reason: Being pedantic. Removed something I'm not yet sure about.
kruoli is offline   Reply With Quote
Old 2021-01-29, 15:55   #9
Viliam Furik
 
"Viliam Furík"
Jul 2018
Martin, Slovakia

2A616 Posts
Default

Quote:
Originally Posted by kruoli View Post
So my next question would be: Do you really need to store the primes?
I don't think I can do it without storing them. Primorial is the product of all the primes used. In the computation of the numerator, each prime is used once, that's true. I have looked at the primesieve_next_prime function, but I don't think it can be faster to compute the primes instead of storing some sort of a list of them.
Viliam Furik is offline   Reply With Quote
Old 2021-01-29, 16:18   #10
Dr Sardonicus
 
Dr Sardonicus's Avatar
 
Feb 2017
Nowhere

115458 Posts
Default

Quote:
Originally Posted by Viliam Furik View Post
Oopsie... My bad, I confused the name. Will correct it.


Blogging, as it should be!
Dr Sardonicus is offline   Reply With Quote
Old 2021-02-12, 19:22   #11
Viliam Furik
 
"Viliam Furík"
Jul 2018
Martin, Slovakia

2·3·113 Posts
Default

Quick question, does GMP, when using C on Windows, have to be compiled or something? It seems to me it does, as there is only a source code available on the website. I assume the answer will be the same for the primesieve library.
Viliam Furik is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Prime representing constant Viliam Furik y-cruncher 13 2021-01-14 09:43
Did the P1 percentage calculation program change recently? petrw1 Marin's Mersenne-aries 8 2020-09-13 16:07
Prime generating series Citrix Open Projects 18 2013-08-24 05:24
Program to TF Mersenne numbers with more than 1 sextillion digits? Stargate38 Factoring 24 2011-11-03 00:34
Prime-generating polynomials Orgasmic Troll Math 28 2004-12-02 16:37

All times are UTC. The time now is 03:58.


Sat Oct 16 03:58:00 UTC 2021 up 84 days, 22:26, 0 users, load averages: 1.35, 1.07, 1.05

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.