mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Software

Reply
 
Thread Tools
Old 2005-11-09, 16:39   #12
Greenbank
 
Greenbank's Avatar
 
Jul 2005

2×193 Posts
Default

fmadd certainly exists for the G4.

Google for "MPCFPE32B". The 3.5MB PDF that's first in the list is the Programming Environments Manual for the G4.

[EDIT] I've also spotted jasong's msieve comments and fast mulmod for < 2^50 bit numbers. It's faster than my mulmod code so it'll fit in nicely (if it all works properly) for sieving p < 2^50.

Last fiddled with by Greenbank on 2005-11-09 at 16:41
Greenbank is offline   Reply With Quote
Old 2005-11-10, 10:03   #13
fatphil
 
fatphil's Avatar
 
May 2003

3·7·11 Posts
Default

Quote:
Originally Posted by Greenbank
fmadd certainly exists for the G4.

Google for "MPCFPE32B". The 3.5MB PDF that's first in the list is the Programming Environments Manual for the G4.
Thanks, I'll grab that.

I'll also ask around here, and see if there are any 'internal' documents that I can get my hands on too. (G4 only, alas)
fatphil is offline   Reply With Quote
Old 2005-11-10, 11:34   #14
ET_
Banned
 
ET_'s Avatar
 
"Luigi"
Aug 2002
Team Italia

12D316 Posts
Default

Quote:
Originally Posted by Greenbank
I've also spotted jasong's msieve comments and fast mulmod for < 2^50 bit numbers. It's faster than my mulmod code so it'll fit in nicely (if it all works properly) for sieving p < 2^50.
Wasn't he jasonp?

Luigi

Last fiddled with by ET_ on 2005-11-10 at 11:34
ET_ is offline   Reply With Quote
Old 2005-11-10, 12:27   #15
Greenbank
 
Greenbank's Avatar
 
Jul 2005

2·193 Posts
Default

Indeed it was jasonp, my attribution was from memory.

On a related note, here's why G5's rock and G4's lead to madness (or at least hideous amounts of complication).

Looking at my 32-bit code for the core of the montgomery multiplication.

For a good reference see Handbook of Applied Cryptography Chapter 14.
Google for "HAC" and it is the second link. PDF's can be downloaded free for personal use (see copyright notice).

For calculating a*b (mod m) :-

It assumes:

a and b are numbers < 2^64 (i.e they fit in 2 32-bit words)
B = 2^32
n = 2 (since a and b have 2 words)
R= B^n = (2^32)^2 = 2^64
m' = (-m)^-1 (mod B) has been calculated
p = m (I tend to use these terms interchangably!)
x = a
y = bR (mod m) has been calculated.

The code is inside the (method == 1 ) code block of:

http://www.greenbank.org/sob/macasm/montmul.c

20 muls, loads of adds to handle numerous carries. A nightmare but still quite quick.

Most importantly, no division!

Now, to do this on a G5 you need this:-

Since the G5 is 64-bit we can use B=2^64
Since both values (a and b) are 64-bit they only fit in one double-word so n=1.
R = B^n = B^1 = B = 2^64.
This also means we don't need the loop inside the algorithm.
Since we only need one iteration of the loop we can assume that A is 0 all the way until we set it with the result at the end simplifying it more.

Now we just need to do (untested):-

mulld r2,x,y // r2 = low_word(x*y)
mulld r3,r2,m' // r3 = x * y * m' (mod B) : mod implicit as we only use low double-word of result
mulhdu r4,x,y // r4 = high_word(x*y)
mulld r5,r3,p // r5 = low_word(u*m)
mulhdu r6,r3,p // r6 = high_word(u*m)
add r7,r4,r6 // Worry about carry? Not if p < 2^63. What if any of a,b,p are 64-bit?
addco r2,r2,r5 // r2 now useless, we just want the carry bit
addze r7,r7 // Add carry from low_word addition on to r7

So for 64 bit processors it comes down to 5 multiplies and 3 adds. The only other bit to do is the final subtraction if r7 > p :-

cmpld r7,p // (Logical/unsigned) Compare result to p
blt done // If it is less than the skip the subtract
subf r7,r7,p // r7 = r7 - p
done: // the end

64-bit * 64-bit (mod 64-bit) with 5 muls, 3 adds, 1 comparison and a conditional subtract. No divides.

Nice.

Last fiddled with by Greenbank on 2005-11-10 at 12:31
Greenbank is offline   Reply With Quote
Old 2005-11-10, 19:00   #16
ewmayer
2ω=0
 
ewmayer's Avatar
 
Sep 2002
República de California

19×613 Posts
Default

Quote:
Originally Posted by Greenbank
Now we just need to do (untested):-

mulld r2,x,y // r2 = low_word(x*y)
mulld r3,r2,m' // r3 = x * y * m' (mod B) : mod implicit as we only use low double-word of result
mulhdu r4,x,y // r4 = high_word(x*y)
mulld r5,r3,p // r5 = low_word(u*m)
mulhdu r6,r3,p // r6 = high_word(u*m)
add r7,r4,r6 // Worry about carry? Not if p < 2^63. What if any of a,b,p are 64-bit?
addco r2,r2,r5 // r2 now useless, we just want the carry bit
addze r7,r7 // Add carry from low_word addition on to r7

So for 64 bit processors it comes down to 5 multiplies and 3 adds.
You should only need 4 MULs - I don't understand why you need both the low and high half of u*m. Here is a simple version of the code I use for 64-bit Montgomery modmul - q is the modulus, qinv*q == 1 mod 2^64, and x stores the datum one wants. This is taken from the guts of a left-to-right binary exponentiation loop, that's why the initial double-wide product is a squaring in my case. The (x > hi)? normalization is followed by a mod-q conditional doubling, depending on the current bit of the exponent of the modular powering - I've precomputed qhalf = q/2 for that:

Code:
		SQR_LOHI(x,lo,hi);	/* mulld, mulhdu */
		lo *= qinv;		/* mulld */
		MUL_HIGH(q,lo,lo);	/* mulhdu */
		x = hi - lo;

		if(x > hi)
		{
			x += q;	/* had a borrow */
		}

		if({current_bit_set})
		{
			if(x > qhalf)	/* Combines overflow-on-add and need-to-subtract-q-from-sum checks */
			{
				x = x + x;
				x -= q;
			}
			else
				x = x + x;
		}
ewmayer is online now   Reply With Quote
Old 2005-11-10, 19:11   #17
Orgasmic Troll
Cranksta Rap Ayatollah
 
Orgasmic Troll's Avatar
 
Jul 2003

10100000012 Posts
Default

hey phil, I've got a G5 iMac at home, can you put it to work?

(I'm also working on a Dual 2.7 PowerMac with 4GB of RAM here at work, but I don't think I can appropriate cycles. The best part is the 30" cinema display and the 23" off to the side.. fricken sweeeet )
Orgasmic Troll is offline   Reply With Quote
Old 2005-11-10, 19:51   #18
Greenbank
 
Greenbank's Avatar
 
Jul 2005

1100000102 Posts
Default

Quote:
Originally Posted by ewmayer
You should only need 4 MULs - I don't understand why you need both the low and high half of u*m.
OK, if a and b fit in one double-word then:-

m' = (-m)^-1 mod 2^64

u = x*y*m' mod 2^64

u = (x * y mod 2^64) * m' (mod 2^64)

So for calculating u you can get away with ignoring the high double-word.

However:-

A = (x*y + u*m ) / 2^64

does not imply you can do:-

high(x*y) + high(u*m)

You need the low half of u*m because if low(u*m)+low(x*y) > 2^64 then your answer will be off by one.

Let's see if can concoct an example with m=13. b=16 (2^4).

m' = (-13)^-1 (mod 16)
m' = 3^-1 (mod 16)

3*11 = 33 = 1 (mod 16)

m' = 11

Now pick a trivial a and b.

x=a=1
b=1

y = bR (mod m) = 1*16 (mod 13) = 3

u = x*y*m' mod 16

u = 1*3*11 mod 16 = 33 mod 16 = 1

A = (x*y + u*m ) / 16

A = (1*3 + 1*13) / 16

Without looking at the low word of (u*m) you're doing:-

A = (1*3)/16 + (1*13)/16 = 0 + 0 = 0

The correct way does:-

A = (1*3 + 1*13) / 16 = (3 + 13) / 16 = 16/16 = 1

Tomorrow I'll find an example for 50-bit values of a,b and m if you like.

Last fiddled with by Greenbank on 2005-11-10 at 19:52
Greenbank is offline   Reply With Quote
Old 2005-11-10, 20:21   #19
ewmayer
2ω=0
 
ewmayer's Avatar
 
Sep 2002
República de California

19×613 Posts
Default

My code is adapted from some sieving code I got several years ago from Peter Montgomery himself: here is the relevant core operation sequence from his code comments (his only allows operands < 2^63, so is a few ALU ops cheaper):
Code:
                   t0 = UMULH(power2, power2);
                   if (t0 >= q) t0 -= q;
                   power2 = UMULH(q, MULQ(qinv64, MULQ(power2, power2))) - t0;
                   power2 = (q >> (bitnow & power2)) - (power2 >> bitnow);
Again, just 4 MULs - 2 lower-half and 2 upper-half. My and Peter's codes have been extensively self-tested as far as the core modmul routines go, so I think we'd know if carries were getting dropped with O(1) probability.
ewmayer is online now   Reply With Quote
Old 2005-11-10, 21:02   #20
xilman
Bamboozled!
 
xilman's Avatar
 
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across

250428 Posts
Default

Quote:
Originally Posted by ewmayer
so I think we'd know if carries were getting dropped with O(1) probability.
Better than o(1) probability ...


Paul
xilman is offline   Reply With Quote
Old 2005-11-11, 08:58   #21
Greenbank
 
Greenbank's Avatar
 
Jul 2005

6028 Posts
Default

Heh, I'm willing to be proven wrong (and learn at the same time). Plus it means I could get rid of a MUL.

I'll find a 50-bit example and then we can trace through the code...
Greenbank is offline   Reply With Quote
Old 2005-11-11, 11:10   #22
Greenbank
 
Greenbank's Avatar
 
Jul 2005

18216 Posts
Default

Hmm, try as I might I can't find a situation where:

mullw(x*y)+mullw(u*m) < 2^64.

for m=882861709928411 and random a,b values. My test program has been running for 25 minutes and it checks 100,000 random values of a and b a second.

Which leads me to believe that it is assumed that the carry will always be present. However I can find a case for m=13 (will post that in a sec).

Here's an example where it is present:-

R=2^64
a=72185797819121
b=151612213027919
m=882861709928411
mprime=12466608506227884973

(a*b)%m
344317125598003

y=(b*2^64)%m
105419857829375

u=(a*y*mprime)%B
7194243690555063843

A=(a*y + u*m) / 2^64
= 6351529896101669193770756995022848 / 2^64 = 344317125598003

A_other = (x*y)/2^64 + (u*m)/2^64
= 412528981 + 344316713069021 = 344317125598002 (out by one)
Greenbank is offline   Reply With Quote
Reply

Thread Tools


All times are UTC. The time now is 22:44.


Fri Aug 6 22:44:06 UTC 2021 up 14 days, 17:13, 1 user, load averages: 4.35, 4.20, 3.79

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.