![]() |
|
|
#1 |
|
"Yves Gallot"
Apr 2023
Toulouse
2×7 Posts |
I think that the following algorithm is faster than the current generic modular reduction included in gwnum for fast exponentiation.
The method is to compute Montgomery modular multiplication using R = 2n - 1 (a Mersenne number). Let z = x * y mod p, where p is an odd integer. R is a Mersenne number 2n - 1 >= p. q = 1/p (mod R). REDC algorithm is
The number of multiplications is two (N x N => 2N) and one (N x N => N), rather than three (N x N => 2N). We can go further. We have to compute tlo = t mod R, thi = t / R. t_lo is the cyclic convolution of x and y. t can be calculated using a cyclic convolution and a negacyclic convolution. Let (a_1·x + a_0)·(b_1·x + b_0) mod x^2 - 1 = (a_0·b_1 + a_1·b_0)·(x - 1) + (a_0·b_1 + a_1·b_0) + (a_0·b_0 + a_1·b_1) = h·(x - 1) + l. If C = (a_0 + a_1)·(b_0 + b_1) and N = (a_0 - a_1)·(b_0 - b_1) we have l = C and h = (C - N)/2. If the 2 wide multiplications are evaluated using this method, we don't have to compute t mod R, t / R and (r * p) / R. The number of operations is now three cyclic convolutions mod 2n - 1 and two negacyclic convolutions mod 2n + 1. All operations are now N x N => N where N is the size of the tested number p. The multiplication modulo any number is five times slower than the multiplication modulo a Mersenne number of the same size. I implemented a 2-prp test using this method and 64-bit integers. Of course big numbers and fast transforms must be used in practice. |
|
|
|
|
|
#2 |
|
Dec 2011
After 1.58M nines:)
1,699 Posts |
Is this algo only for Mersenne number, or can be used on others numbers and bases?
|
|
|
|
|
|
#3 | |
|
Sep 2002
Database er0rr
5×937 Posts |
Quote:
George, is this any good for GWNUM? If so, how long before it is implemented? |
|
|
|
|
|
|
#4 |
|
"Mark"
Apr 2003
Between here and the
3×2,447 Posts |
This could be helpful for primorial and factorial searches, such as what PrimeGrid is doing.
|
|
|
|
|
|
#5 |
|
Jun 2003
22·11·37 Posts |
This might be faster:-
https://www.ams.org/journals/mcom/20...03-01543-6.pdf Another faster approach for k*2^n+1 (especially larger k) would be to use the Chinese remainder FFT method using k, R=2^n-1 and coprime R'=2^(n+1)-1. The modulus step can be computed by addition. (Slight modification of section 7 in https://www.ams.org/journals/mcom/19...-1185244-1.pdf) Last fiddled with by Citrix on 2023-04-09 at 08:02 Reason: making more precise |
|
|
|
|
|
#6 | |
|
"Yves Gallot"
Apr 2023
Toulouse
11102 Posts |
Quote:
The number of operations is two cyclic convolutions mod 2n - 1 (a·b and (a·b)·N') and two negacyclic convolutions mod 2n + 1 (a·b and m·N). Then it is four times slower than the multiplication modulo a Mersenne number. Last fiddled with by Yves Gallot on 2023-04-09 at 09:58 |
|
|
|
|
|
|
#7 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
827910 Posts |
|
|
|
|
|
|
#8 | |
|
"Yves Gallot"
Apr 2023
Toulouse
2·7 Posts |
Quote:
If g = 1 we have R = 2n - 1, Q' = 2n+1 - 1: two cyclic transforms of the same size with different weights. Note that the last step of variation 1 is not correct: we have 0 <= t < 2Q' then t - N if t >= N is not enough for the modular reduction. A common optimisation of REDC is to set N' = 1/N (mod R) (and not -1/N (mod R)). We have here:
Last fiddled with by Yves Gallot on 2023-04-10 at 10:12 |
|
|
|
|
|
|
#9 |
|
Bemusing Prompter
"Danny"
Dec 2002
California
9C816 Posts |
Could this benefit programs like mfaktc and mfakto?
|
|
|
|
|
|
#10 | |
|
Sep 2016
38010 Posts |
Quote:
CRT of cyclic (2^N - 1) and negacyclic (2^N + 1) is mathematically the same as the first radix 2 reduction of a normal FFT. The difference is where you perform the carryout. Last fiddled with by Mysticial on 2023-04-11 at 12:30 |
|
|
|
|
|
|
#11 | |
|
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
3×23×89 Posts |
Quote:
|
|
|
|
|
![]() |
| Thread Tools | |
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Another factor algorithm using lattice reduction | Till | Factoring | 1 | 2021-08-24 14:03 |
| Manual assignments cpu/generic vs gpu TF | kriesel | PrimeNet | 1 | 2019-06-04 15:00 |
| Fast modular reduction for primes < 512 bits? | BenR | Computer Science & Computational Number Theory | 2 | 2016-03-27 00:37 |
| Efficient modular reduction with SSE | fivemack | Computer Science & Computational Number Theory | 15 | 2009-02-19 06:32 |
| CPU stats reduction | Prime95 | PrimeNet | 3 | 2008-11-17 19:57 |