Should my program use so much memory?
Last November, I wrote a program in C, for fun, that tests ranges of Mersenne numbers for primality, using the C standard library and the GNU Multiple Precision (GMP) library. I wanted to multithread it with the pthreads library, but could not get it to compile, so I put the program down until this weekend. I have now gotten multithreading working on both Unix and Windows (coarse multithreading: it runs separate Lucas-Lehmer tests in separate threads, rather than multithreading the Lucas-Lehmer tests themselves).
Anyway, I had never used the program to test large exponents (I have always tested low ranges, like M1 to M12000), and I decided to give M32582657 a try. I calculated that my program should use approximately 12.6 MB of RAM for data (1 MB = 2^20 bytes, with (32582657 * 2 + 1) / 64 bytes for the sieve and 32582657 / 8 * 3 bytes for the Lucas-Lehmer test) and assumed it would use another megabyte for its instructions. I was wrong. It uses approximately 30 MB to 55 MB of RAM, alternating between the two; it is not a true alternation, but that is the best way I can describe its behavior. If I plotted its memory consumption as a distribution, I would guess the mean would be approximately 44 to 45 MB.
To explain this, I reasoned that the GMP functions are allocating their own memory as scratch space to perform computations. Before realizing this, I tried a few things to lower my program's memory consumption, such as changing the GMP functions used in the Lucas-Lehmer test and switching back to the single-threaded version, which I am using to run the M32582657 test now. Digging through the code for GMP's mpz_powm_ui() function confirmed my reasoning. However, I cannot see how that would require 3 to 4 times the amount of memory I computed. I installed Prime95 to see how much memory it uses: it allocated exactly 22,780 KB, although that has declined to 18,512 KB since I started typing this. Does a program using GMP to do Lucas-Lehmer tests requiring 2 (+/- 0.5) times the amount of memory Prime95 uses sound reasonable to anyone? |
First, you shouldn't use mpz_powm_ui() for the modular exponentiation. You can do the reduction mod 2^n-1 with a shift and add, but mpz_powm_ui() does not know that. Just square and reduce repeatedly instead.
The Schönhage-Strassen multiplication in GMP makes a copy of your data cut into pieces, to act as the coefficients of polynomials that get multiplied. The input polynomials are zero padded to degree equal to that of the product polynomial, and each coefficient is zero-padded to the largest possible size of coefficients in the product polynomial, which explains a 4-fold increase over the size of the input numbers. You could avoid some of the zero padding by writing a DWT on top of the Schönhage-Strassen convolution, but that would require messing with some of the core GMP routines. Alex |
How would you do the reduction mod 2^n-1 with a shift and add? I knew how to do reductions mod 2^n with a binary AND, but I did not think there was any equivalent fast operation for mod 2^n-1.
|
[QUOTE=ShiningArcanine;126899]How would you do the reduction mod 2^n-1 with a shift and add? I knew how to do reductions mod 2^n with a binary AND, but I did not think there was any equivalent fast operation for mod 2^n-1.[/QUOTE]
Put N = 2^n A + B = 2^n A - A + A + B = A(2^n - 1) + (A + B), so N ≡ A + B (mod 2^n - 1). |
Another way of putting it is 2^n == 1 (mod 2^n-1)
So 2^n * A + B == 1 * A + B == A+B (mod 2^n-1) So take the unreduced product, separate high and low bits with mpz_tdiv_q_2exp() and mpz_tdiv_r_2exp() and add the two pieces. Alex |
[QUOTE=akruppa;126901]Another way of putting it is 2^n == 1 (mod 2^n-1)
So 2^n * A + B == 1 * A + B == A+B (mod 2^n-1) So take the unreduced product, separate high and low bits with mpz_tdiv_q_2exp() and mpz_tdiv_r_2exp() and add the two pieces. [/QUOTE]... and if there is an overflow past 2^n-2 (i.e. the sum does not fit in n bits), truncate to n bits and add 1. But this won't catch a final result of exactly 2^n-1; if that would be a problem, then instead: if the sum overflows past 2^n-2, subtract 2^n-1. |
I am an undergraduate Biochemistry/Computer Science double major and I did not quite understand what you guys said, so I asked one of the graduate mathematics students to look it over. After I explained binary operations to him, he said that what you were saying is that, in order to compute y, where y = x mod (2^n - 1), I sum every n bits AND (2^n - 1).
He used a square-bracket notation that is alien to me to indicate that a number was being reduced modulo another number, so I am assuming that the course where I would learn it is abstract/applied algebra. So to compute 111000 mod 10^n - 1 where n = 10, I do (000 AND 011) + (111 AND 011) to get 10. Is that what you were saying, or do I have things terribly wrong? |
The binary (binary as in base 2) AND operator does not work if you do it with decimal digits.
Saying "sum every n bits AND (2^n - 1)" seems correct (although a bit vague) but is misleading, there's never any AND operation actually performed. Your example doesn't look right to me, if n=10, you'd be reducing modulo 9999999999, so 111000 would remain unchanged. Simply look at the algebra: N = 2[sup]n[/sup] * A + B ≡ A+B (mod 2[sup]n[/sup]-1) If you want to reduce N modulo 2[sup]n[/sup]-1, cut N into two pieces: the lower n bits (=B) and the remaining upper bits (=A). Then add these pieces. There's nothing more to it. As retina pointed out, this sum is not necessarily < 2[sup]n[/sup]-1; if N ≤ 2[sup]2n[/sup]-1, then A, B ≤ 2[sup]n[/sup]-1 and A+B ≤ 2[sup]n+1[/sup]-1. If you square this again, you get something ≤ 2[sup]2n+2[/sup]-1, etc., i.e. the size can run away. You can prevent this by applying the reduction twice. Alex |
I should have mentioned that my example uses base 2 to perform computations, so 2 (base 10) = 10 (base 2) and 111000 (base 2) = 56 (base 10). The AND operator works here, as I am doing binary arithmetic.
|
Ah, ok... the 10^n had me confused.
Alex |
I have an update. I replaced mpz_mod(n->s, n->s, n->num); with the following in my code:
[QUOTE]mpz_tdiv_q_2exp(n->t, n->s, n->exp);
mpz_tdiv_r_2exp(n->s, n->s, n->exp);
mpz_add(n->s, n->s, n->t);
if (mpz_cmp(n->s, n->num) >= 0) {
    mpz_sub(n->s, n->s, n->num);
}[/QUOTE] Tests on my university's Unix server of all of the Mersenne numbers from 0 to 3217 now take 2 seconds instead of 7, and a test from 0 to 10000 now takes 90 seconds. Testing M21701 now takes 8 seconds instead of 31 seconds. On my computer, testing M21701 now takes 20 seconds instead of 90 seconds, and testing all of the Mersenne numbers from 0 to 3217 now takes 9 seconds instead of 30 seconds. I had no idea that GMP's modulus operation was so expensive.

Edit: I did some more testing. It takes 2-3 seconds for Prime95 to test M44497 on my computer and 124 seconds for my program, which uses GMP. Prior to this optimization, my program took 555 seconds on my computer to test M44497, so the speed-up is proportional (to 2 significant figures) to the one I observed for M21701, which surprised me. Anyway, this means that Prime95 is 40 to 60 times faster than my program at testing numbers around M44497, which is remarkable. |