Go Back > Extra Stuff > Programming

Thread Tools
Old 2006-05-26, 08:52   #1
Citrix's Avatar
Jun 2003

1,579 Posts
Default Modular multiplication and addition

I am trying to multiply 2, 31 bit integers and then add a number to the result; modulo a prime p(which is 31 bit)

I am using a 64 bit integer to store the product, multiplication done mod p.
Then adding the result to some other integer, mod p.

Q. Is there a faster way to do this?
After the multiplication step, I do not need 64 bit precision. Is there a way to reduce it to a 32 bit number? Assuming that 64 bit additions are slower than 32 bit additions? Would reducing the number to 32 bit help at all.

Thanks for the answer in advance.

edit: Doing this in VC++ 6.0

Last fiddled with by Citrix on 2006-05-26 at 08:52
Citrix is offline   Reply With Quote
Old 2006-05-26, 13:29   #2
Greenbank's Avatar
Jul 2005

6028 Posts

Does p remain constant over many calculations?

If not:-

Look down about 8 posts here:-

If p does remain constant then look at Montgomery multiplication.

The _Handbook of Applied Cryptography_ has a good description of Montgomery Multiplication.

If you choose a word size of 32-bits then you can use a much simpler version of the code.
Greenbank is offline   Reply With Quote
Old 2006-05-27, 02:55   #3
Citrix's Avatar
Jun 2003

1,579 Posts

Thank you for the reply. P remains constant over several multiplications. I have read on how you calculate xy*2^-n to save over the divison step. But how do you optomize this further to speed the code up if you are using the same p for several multplications.

Since in each case x and y are different. For calculating b^e it might help, but if x, y are changing how do you optomize it.


Is the time to add 2 numbers stored as 32 bit integers the same as adding 2 numbers stored as 64 bit numbers, even when the numbers are less than say 2^16?

Last fiddled with by Citrix on 2006-05-27 at 03:02
Citrix is offline   Reply With Quote
Old 2006-05-27, 04:18   #4
dsouza123's Avatar
Sep 2002

12268 Posts

Using 32 bit ALU registers in assembly
the 64 bit adds require twice the instructions.

num1_32 dd 30000
num2_32 dd 1000

num1_64 dd 30000, 0 ; could be one dq (64 bit number)
num2_64 dd 1000, 0

num3_64 dq 30000

mov eax, num2_32
add num1_32, eax


mov eax, num2_64
mov edx, num2_64 + 4
add num1_64, eax
adc num1_64, edx

If the values were all in registers it would be

add eax, ebx


add eax, ebx
adc edx, ecx

In SSE2 there are instructions that work on 64 bit integers.
(There probably is better SSE2 code, I am far more familar with ALU instructions.
There are instructions that add the two 64 bit halves of a 128 bit xmm register.)

num1_64 dq 16386, 0 ; dq is 64 bit integer also called qword
num2_64 dq 1024, 0

num1_128 do 00000000000040020000000000000400

movq mm0, num2_64
paddq mm0, num1_64
dsouza123 is offline   Reply With Quote
Old 2006-05-27, 04:25   #5
dsouza123's Avatar
Sep 2002

2·331 Posts

Also check a post in this forum by Dresdenboy

Fast calculations modulo small mersenne primes like M61
dsouza123 is offline   Reply With Quote
Old 2006-07-03, 01:32   #6
Ken_g6's Avatar
Jan 2005
Caught in a sieve

1100010112 Posts

Chris Monico's old ECCP-109 project would appear to describe the solution to your problem. The problem there was to multiply 2 128-bit (okay, 109-bit) numbers, then take them modulus a constant 109-bit number, so this should be a cinch by comparison!

Here's his comments in the assembly code after the multiplication is done:
/* The 218-bit product is now stored in t[0],...,t[6].
   Our next mission, should we choose to accept it, is to
   reduce this number mod 'p'.
   We proceed in several steps:
   1) Notice that we can write:
      t = (a 128-bit number) + 2^{128}t4 + 2^{160}t5 + 2^{196}t6
      Where t4,t5,t6 each have 32-bits.
      Furthermore, we've precomputed the reductions :
        r4 := 2^128 mod p = 7f0 55455b5c 27a9ef92 540f3986 (107 bits)
        r5 := 2^160 mod p = acc f5a4e730 b2ec7424 32d0a6bf (108 bits)
        r6 := 2^192 mod p = a7c a3ea9cfb 531af0b6 b786c901 (108 bits)

      So that
      r4*t4 will have 32+107 = 139 bits
      r5*t5 will have 32+108 = 140 bits
      r6*t6 will have 22+108 = 130 bits

      Let's look carefully at what we'll do:
      t =  (a 128-bit number) + 2^{128}t4 + 2^{160}t5 + 2^{196}t6
        =  (128 bits) + (139 bits) + (140 bits) + (130 bits)
        =  (128 bits + 139 bits) + (140 bits + 130 bits)
        <= (140 bits)            + (141 bits)
        <= (142 bits)

   2) Use the precomputed tables (T) to reduce it to a number
      in [0,2p)
   3) Perform a single subtraction, if necessary, to get it into
I believe the code does this at least once more on the 142-bit number.

You can get the entire mess from my web site. Unfortunately, the good comments are only in the AT&T-syntax assembly, which is backwards to the standard Intel assembly (last argument first), and only compiles with certain Linux-style assemblers (gcc, et al).

Good luck!

P.S. If your modulus system is good enough, and you know the sum fits in 64 bits, it can be worthwhile to add a 32-bit number to the 64-bit number before taking the modulus. See my final version for that kind of code.

Last fiddled with by Ken_g6 on 2006-07-03 at 01:35
Ken_g6 is offline   Reply With Quote
Old 2007-01-26, 01:00   #7
geoff's Avatar
Mar 2003
New Zealand

13·89 Posts

In case anyone is still interested in multiplying a*b (mod p) for 0 <= a,b < p < 2^31, here are some SSE2 functions that work out faster than the usual mul/div code provided p is fixed for a reasonable number of calcualtions. If b is also fixed then it gets even faster.
Attached Files
File Type: zip (2.5 KB, 220 views)
geoff is offline   Reply With Quote

Thread Tools

Similar Threads
Thread Thread Starter Forum Replies Last Post
Possible addition for the smiling Smilies? schickel Forum Feedback 40 2012-08-15 00:40
CPU clocks statistics (TF, modular multiplication) tichy Software 1 2010-12-15 08:13
Best practices in addition chains SPWorley Programming 10 2009-07-28 13:50
Addition to GMP-ECM... WraithX GMP-ECM 6 2007-03-06 07:56
New addition to the family! SlashDude 15k Search 5 2004-03-13 21:28

All times are UTC. The time now is 22:28.

Wed May 5 22:28:24 UTC 2021 up 27 days, 17:09, 0 users, load averages: 2.10, 2.57, 2.67

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.