![]() |
![]() |
#1 |
Jun 2003
30538 Posts |
![]()
I am trying to multiply 2, 31 bit integers and then add a number to the result; modulo a prime p(which is 31 bit)
I am using a 64 bit integer to store the product, multiplication done mod p. Then adding the result to some other integer, mod p. Q. Is there a faster way to do this? After the multiplication step, I do not need 64 bit precision. Is there a way to reduce it to a 32 bit number? Assuming that 64 bit additions are slower than 32 bit additions? Would reducing the number to 32 bit help at all. Thanks for the answer in advance. ![]() edit: Doing this in VC++ 6.0 Last fiddled with by Citrix on 2006-05-26 at 08:52 |
![]() |
![]() |
![]() |
#2 |
Jul 2005
2×193 Posts |
![]()
Does p remain constant over many calculations?
If not:- Look down about 8 posts here:- http://groups.google.com/group/sci.c...s&rnum=5&hl=en If p does remain constant then look at Montgomery multiplication. The _Handbook of Applied Cryptography_ has a good description of Montgomery Multiplication. If you choose a word size of 32-bits then you can use a much simpler version of the code. |
![]() |
![]() |
![]() |
#3 |
Jun 2003
110001010112 Posts |
![]()
Thank you for the reply. P remains constant over several multiplications. I have read on how you calculate xy*2^-n to save over the divison step. But how do you optomize this further to speed the code up if you are using the same p for several multplications.
Since in each case x and y are different. For calculating b^e it might help, but if x, y are changing how do you optomize it. edit: Is the time to add 2 numbers stored as 32 bit integers the same as adding 2 numbers stored as 64 bit numbers, even when the numbers are less than say 2^16? Last fiddled with by Citrix on 2006-05-27 at 03:02 |
![]() |
![]() |
![]() |
#4 |
Sep 2002
2·331 Posts |
![]()
Using 32 bit ALU registers in assembly
the 64 bit adds require twice the instructions. ALIGN 16 num1_32 dd 30000 num2_32 dd 1000 num1_64 dd 30000, 0 ; could be one dq (64 bit number) num2_64 dd 1000, 0 num3_64 dq 30000 mov eax, num2_32 add num1_32, eax versus mov eax, num2_64 mov edx, num2_64 + 4 add num1_64, eax adc num1_64, edx If the values were all in registers it would be add eax, ebx versus add eax, ebx adc edx, ecx In SSE2 there are instructions that work on 64 bit integers. (There probably is better SSE2 code, I am far more familar with ALU instructions. There are instructions that add the two 64 bit halves of a 128 bit xmm register.) ALIGN 16 num1_64 dq 16386, 0 ; dq is 64 bit integer also called qword num2_64 dq 1024, 0 num1_128 do 00000000000040020000000000000400 movq mm0, num2_64 paddq mm0, num1_64 |
![]() |
![]() |
![]() |
#5 |
Sep 2002
2×331 Posts |
![]()
Also check a post in this forum by Dresdenboy
Fast calculations modulo small mersenne primes like M61 http://www.mersenneforum.org/showthread.php?t=1955 |
![]() |
![]() |
![]() |
#6 |
Jan 2005
Caught in a sieve
5×79 Posts |
![]()
Chris Monico's old ECCP-109 project would appear to describe the solution to your problem. The problem there was to multiply 2 128-bit (okay, 109-bit) numbers, then take them modulus a constant 109-bit number, so this should be a cinch by comparison!
Here's his comments in the assembly code after the multiplication is done: Code:
/***************************************************************/ /* The 218-bit product is now stored in t[0],...,t[6]. Our next mission, should we choose to accept it, is to reduce this number mod 'p'. We proceed in several steps: 1) Notice that we can write: t = (a 128-bit number) + 2^{128}t4 + 2^{160}t5 + 2^{196}t6 Where t4,t5,t6 each have 32-bits. Furthermore, we've precomputed the reductions : r4 := 2^128 mod p = 7f0 55455b5c 27a9ef92 540f3986 (107 bits) r5 := 2^160 mod p = acc f5a4e730 b2ec7424 32d0a6bf (108 bits) r6 := 2^192 mod p = a7c a3ea9cfb 531af0b6 b786c901 (108 bits) So that r4*t4 will have 32+107 = 139 bits r5*t5 will have 32+108 = 140 bits r6*t6 will have 22+108 = 130 bits Let's look carefully at what we'll do: t = (a 128-bit number) + 2^{128}t4 + 2^{160}t5 + 2^{196}t6 = (128 bits) + (139 bits) + (140 bits) + (130 bits) = (128 bits + 139 bits) + (140 bits + 130 bits) <= (140 bits) + (141 bits) <= (142 bits) 2) Use the precomputed tables (T) to reduce it to a number in [0,2p) 3) Perform a single subtraction, if necessary, to get it into [0,p) */ /***************************************************************/ You can get the entire mess from my web site. Unfortunately, the good comments are only in the AT&T-syntax assembly, which is backwards to the standard Intel assembly (last argument first), and only compiles with certain Linux-style assemblers (gcc, et al). Good luck! P.S. If your modulus system is good enough, and you know the sum fits in 64 bits, it can be worthwhile to add a 32-bit number to the 64-bit number before taking the modulus. See my final version for that kind of code. Last fiddled with by Ken_g6 on 2006-07-03 at 01:35 |
![]() |
![]() |
![]() |
#7 |
Mar 2003
New Zealand
22058 Posts |
![]()
In case anyone is still interested in multiplying a*b (mod p) for 0 <= a,b < p < 2^31, here are some SSE2 functions that work out faster than the usual mul/div code provided p is fixed for a reasonable number of calcualtions. If b is also fixed then it gets even faster.
|
![]() |
![]() |
![]() |
Thread Tools | |
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Possible addition for the smiling Smilies? | schickel | Forum Feedback | 40 | 2012-08-15 00:40 |
CPU clocks statistics (TF, modular multiplication) | tichy | Software | 1 | 2010-12-15 08:13 |
Best practices in addition chains | SPWorley | Programming | 10 | 2009-07-28 13:50 |
Addition to GMP-ECM... | WraithX | GMP-ECM | 6 | 2007-03-06 07:56 |
New addition to the family! | SlashDude | 15k Search | 5 | 2004-03-13 21:28 |