mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Software

Reply
 
Thread Tools
Old 2023-04-08, 13:27   #1
Yves Gallot
 
"Yves Gallot"
Apr 2023
Toulouse

2·7 Posts
Default A new algorithm for generic modular reduction

I think that the following algorithm is faster than the current generic modular reduction included in gwnum for fast exponentiation.
The method is to compute Montgomery modular multiplication using R = 2n - 1 (a Mersenne number).

Let z = x * y mod p, where p is an odd integer. R is a Mersenne number 2n - 1 >= p. q = 1/p (mod R).
REDC algorithm is
  • t = x * y (a wide multiplication N x N => 2N).
  • tlo = t mod R, thi = t / R.
  • r = (tlo * q) mod R (a multiplication N x N => N, q is constant)
  • r' = (r * p) / R (a wide multiplication N x N => 2N, p is constant)
The result is thi - r'.
The number of multiplications is two (N x N => 2N) and one (N x N => N), rather than three (N x N => 2N).

We can go further. We have to compute tlo = t mod R, thi = t / R. t_lo is the cyclic convolution of x and y. t can be calculated using a cyclic convolution and a negacyclic convolution.
Let (a_1·x + a_0)·(b_1·x + b_0) mod x^2 - 1 = (a_0·b_1 + a_1·b_0)·(x - 1) + (a_0·b_1 + a_1·b_0) + (a_0·b_0 + a_1·b_1) = h·(x - 1) + l.
If C = (a_0 + a_1)·(b_0 + b_1) and N = (a_0 - a_1)·(b_0 - b_1) we have l = C and h = (C - N)/2.
If the 2 wide multiplications are evaluated using this method, we don't have to compute t mod R, t / R and (r * p) / R.
The number of operations is now three cyclic convolutions mod 2n - 1 and two negacyclic convolutions mod 2n + 1.
All operations are now N x N => N where N is the size of the tested number p.
The multiplication modulo any number is five times slower than the multiplication modulo a Mersenne number of the same size.

I implemented a 2-prp test using this method and 64-bit integers. Of course big numbers and fast transforms must be used in practice.
Attached Files
File Type: cpp modMersenne.cpp (2.7 KB, 33 views)
Yves Gallot is offline   Reply With Quote
Old 2023-04-08, 19:16   #2
pepi37
 
pepi37's Avatar
 
Dec 2011
After 1.58M nines:)

1,699 Posts
Default

Is this algo only for Mersenne number, or can be used on others numbers and bases?
pepi37 is online now   Reply With Quote
Old 2023-04-08, 19:42   #3
paulunderwood
 
paulunderwood's Avatar
 
Sep 2002
Database er0rr

5×937 Posts
Default

Quote:
Originally Posted by pepi37 View Post
Is this algo only for Mersenne number, or can be used on others numbers and bases?
It is not for Mersenne or simple k*2^n+-1. It is for other numbers which do not use "special" mod. It is for "generic" mod.

George, is this any good for GWNUM? If so, how long before it is implemented?
paulunderwood is offline   Reply With Quote
Old 2023-04-08, 21:27   #4
rogue
 
rogue's Avatar
 
"Mark"
Apr 2003
Between here and the

3·2,447 Posts
Default

This could be helpful for primorial and factorial searches, such as what PrimeGrid is doing.
rogue is online now   Reply With Quote
Old 2023-04-09, 07:48   #5
Citrix
 
Citrix's Avatar
 
Jun 2003

22·11·37 Posts
Default

This might be faster:-
https://www.ams.org/journals/mcom/20...03-01543-6.pdf

Another faster approach for k*2^n+1 (especially larger k) would be to use the Chinese remainder FFT method using k, R=2^n-1 and coprime R'=2^(n+1)-1. The modulus step can be computed by addition.

(Slight modification of section 7 in https://www.ams.org/journals/mcom/19...-1185244-1.pdf)

Last fiddled with by Citrix on 2023-04-09 at 08:02 Reason: making more precise
Citrix is offline   Reply With Quote
Old 2023-04-09, 09:57   #6
Yves Gallot
 
"Yves Gallot"
Apr 2023
Toulouse

2×7 Posts
Default

Quote:
Originally Posted by Citrix View Post
Yes, the last step of variation 2 is a single negacyclic convolution.
The number of operations is two cyclic convolutions mod 2n - 1 (a·b and (a·b)·N') and two negacyclic convolutions mod 2n + 1 (a·b and m·N).
Then it is four times slower than the multiplication modulo a Mersenne number.

Last fiddled with by Yves Gallot on 2023-04-09 at 09:58
Yves Gallot is offline   Reply With Quote
Old 2023-04-09, 17:13   #7
Prime95
P90 years forever!
 
Prime95's Avatar
 
Aug 2002
Yeehaw, FL

17·487 Posts
Default

Quote:
Originally Posted by paulunderwood View Post
George, is this any good for GWNUM? If so, how long before it is implemented?
It's a long-shot.

GWNUM is designed to do lots of operations using one FFT. Here, you are mixing results from 2 FFTs (cyclic and negacyclic).
Prime95 is offline   Reply With Quote
Old 2023-04-10, 09:35   #8
Yves Gallot
 
"Yves Gallot"
Apr 2023
Toulouse

2·7 Posts
Default

Quote:
Originally Posted by Prime95 View Post
GWNUM is designed to do lots of operations using one FFT. Here, you are mixing results from 2 FFTs (cyclic and negacyclic).
McLaughlin variation 1 should be an easiest option and is as fast as variation 2.

If g = 1 we have R = 2n - 1, Q' = 2n+1 - 1: two cyclic transforms of the same size with different weights.
Note that the last step of variation 1 is not correct: we have 0 <= t < 2Q' then t - N if t >= N is not enough for the modular reduction.
A common optimisation of REDC is to set N' = 1/N (mod R) (and not -1/N (mod R)). We have here:
  1. m = a·b·N' mod R.
  2. s = ((2m + 1)·N - (2a)·b) mod Q′.
  3. If a·b + (m + 1)·N = s (mod 2) then t = s else t = s + Q′.
  4. We have 0 <= t < 2Q', return t mod N.

Last fiddled with by Yves Gallot on 2023-04-10 at 10:12
Yves Gallot is offline   Reply With Quote
Old 2023-04-10, 17:10   #9
ixfd64
Bemusing Prompter
 
ixfd64's Avatar
 
"Danny"
Dec 2002
California

9C816 Posts
Default

Could this benefit programs like mfaktc and mfakto?
ixfd64 is offline   Reply With Quote
Old 2023-04-11, 12:30   #10
Mysticial
 
Mysticial's Avatar
 
Sep 2016

22×5×19 Posts
Default

Quote:
Originally Posted by Prime95 View Post
Here, you are mixing results from 2 FFTs (cyclic and negacyclic).
Off topic for this thread, but figured I'd mention that I do something similar in y-cruncher as a memory/storage reduction measure for the absolute largest of multiplications. It is slower due to needing to do additional passes over the dataset, but it almost halves the scratch memory/storage required since you halve the transform lengths.

CRT of cyclic (2^N - 1) and negacyclic (2^N + 1) is mathematically the same as the first radix 2 reduction of a normal FFT. The difference is where you perform the carryout.

Last fiddled with by Mysticial on 2023-04-11 at 12:30
Mysticial is offline   Reply With Quote
Old 2023-04-11, 13:13   #11
henryzz
Just call me Henry
 
henryzz's Avatar
 
"David"
Sep 2007
Liverpool (GMT/BST)

3×23×89 Posts
Default

Quote:
Originally Posted by Mysticial View Post
Off topic for this thread, but figured I'd mention that I do something similar in y-cruncher as a memory/storage reduction measure for the absolute largest of multiplications. It is slower due to needing to do additional passes over the dataset, but it almost halves the scratch memory/storage required since you halve the transform lengths.

CRT of cyclic (2^N - 1) and negacyclic (2^N + 1) is mathematically the same as the first radix 2 reduction of a normal FFT. The difference is where you perform the carryout.
Presumably this ~doubles the FFT length that will fit in a L3 cache. This could be very useful for some cache size/FFT length combinations.
henryzz is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Another factor algorithm using lattice reduction Till Factoring 1 2021-08-24 14:03
Manual assignments cpu/generic vs gpu TF kriesel PrimeNet 1 2019-06-04 15:00
Fast modular reduction for primes < 512 bits? BenR Computer Science & Computational Number Theory 2 2016-03-27 00:37
Efficient modular reduction with SSE fivemack Computer Science & Computational Number Theory 15 2009-02-19 06:32
CPU stats reduction Prime95 PrimeNet 3 2008-11-17 19:57

All times are UTC. The time now is 14:03.


Fri Jul 7 14:03:10 UTC 2023 up 323 days, 11:31, 0 users, load averages: 0.55, 0.97, 1.08

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2023, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.

≠ ± ∓ ÷ × · − √ ‰ ⊗ ⊕ ⊖ ⊘ ⊙ ≤ ≥ ≦ ≧ ≨ ≩ ≺ ≻ ≼ ≽ ⊏ ⊐ ⊑ ⊒ ² ³ °
∠ ∟ ° ≅ ~ ‖ ⟂ ⫛
≡ ≜ ≈ ∝ ∞ ≪ ≫ ⌊⌋ ⌈⌉ ∘ ∏ ∐ ∑ ∧ ∨ ∩ ∪ ⨀ ⊕ ⊗ 𝖕 𝖖 𝖗 ⊲ ⊳
∅ ∖ ∁ ↦ ↣ ∩ ∪ ⊆ ⊂ ⊄ ⊊ ⊇ ⊃ ⊅ ⊋ ⊖ ∈ ∉ ∋ ∌ ℕ ℤ ℚ ℝ ℂ ℵ ℶ ℷ ℸ 𝓟
¬ ∨ ∧ ⊕ → ← ⇒ ⇐ ⇔ ∀ ∃ ∄ ∴ ∵ ⊤ ⊥ ⊢ ⊨ ⫤ ⊣ … ⋯ ⋮ ⋰ ⋱
∫ ∬ ∭ ∮ ∯ ∰ ∇ ∆ δ ∂ ℱ ℒ ℓ
𝛢𝛼 𝛣𝛽 𝛤𝛾 𝛥𝛿 𝛦𝜀𝜖 𝛧𝜁 𝛨𝜂 𝛩𝜃𝜗 𝛪𝜄 𝛫𝜅 𝛬𝜆 𝛭𝜇 𝛮𝜈 𝛯𝜉 𝛰𝜊 𝛱𝜋 𝛲𝜌 𝛴𝜎𝜍 𝛵𝜏 𝛶𝜐 𝛷𝜙𝜑 𝛸𝜒 𝛹𝜓 𝛺𝜔