![]() |
|
|
#12 |
|
Jul 2005
38610 Posts |
fmadd certainly exists for the G4.
Google for "MPCFPE32B". The 3.5MB PDF that's first in the list is the Programming Environments Manual for the G4. [EDIT] I've also spotted jasong's msieve comments and fast mulmod for < 2^50 bit numbers. It's faster than my mulmod code so it'll fit in nicely (if it all works properly) for sieving p < 2^50. Last fiddled with by Greenbank on 2005-11-09 at 16:41 |
|
|
|
|
|
#13 | |
|
May 2003
3×7×11 Posts |
Quote:
I'll also ask around here, and see if there are any 'internal' documents that I can get my hands on too. (G4 only, alas) |
|
|
|
|
|
|
#14 | |
|
Banned
"Luigi"
Aug 2002
Team Italia
481910 Posts |
Quote:
Luigi Last fiddled with by ET_ on 2005-11-10 at 11:34 |
|
|
|
|
|
|
#15 |
|
Jul 2005
2·193 Posts |
Indeed it was jasonp, my attribution was from memory.
On a related note, here's why G5's rock and G4's lead to madness (or at least hideous amounts of complication). Looking at my 32-bit code for the core of the montgomery multiplication. For a good reference see Handbook of Applied Cryptography Chapter 14. Google for "HAC" and it is the second link. PDF's can be downloaded free for personal use (see copyright notice). For calculating a*b (mod m) :- It assumes: a and b are numbers < 2^64 (i.e they fit in 2 32-bit words) B = 2^32 n = 2 (since a and b have 2 words) R= B^n = (2^32)^2 = 2^64 m' = (-m)^-1 (mod B) has been calculated p = m (I tend to use these terms interchangably!) x = a y = bR (mod m) has been calculated. The code is inside the (method == 1 ) code block of: http://www.greenbank.org/sob/macasm/montmul.c 20 muls, loads of adds to handle numerous carries. A nightmare but still quite quick. Most importantly, no division! Now, to do this on a G5 you need this:- Since the G5 is 64-bit we can use B=2^64 Since both values (a and b) are 64-bit they only fit in one double-word so n=1. R = B^n = B^1 = B = 2^64. This also means we don't need the loop inside the algorithm. Since we only need one iteration of the loop we can assume that A is 0 all the way until we set it with the result at the end simplifying it more. Now we just need to do (untested):- mulld r2,x,y // r2 = low_word(x*y) mulld r3,r2,m' // r3 = x * y * m' (mod B) : mod implicit as we only use low double-word of result mulhdu r4,x,y // r4 = high_word(x*y) mulld r5,r3,p // r5 = low_word(u*m) mulhdu r6,r3,p // r6 = high_word(u*m) add r7,r4,r6 // Worry about carry? Not if p < 2^63. What if any of a,b,p are 64-bit? addco r2,r2,r5 // r2 now useless, we just want the carry bit addze r7,r7 // Add carry from low_word addition on to r7 So for 64 bit processors it comes down to 5 multiplies and 3 adds. The only other bit to do is the final subtraction if r7 > p :- cmpld r7,p // (Logical/unsigned) Compare result to p blt done // If it is less than the skip the subtract subf r7,r7,p // r7 = r7 - p done: // the end 64-bit * 64-bit (mod 64-bit) with 5 muls, 3 adds, 1 comparison and a conditional subtract. No divides. Nice. Last fiddled with by Greenbank on 2005-11-10 at 12:31 |
|
|
|
|
|
#16 | |
|
∂2ω=0
Sep 2002
República de California
1164710 Posts |
Quote:
Code:
SQR_LOHI(x,lo,hi); /* mulld, mulhdu */
lo *= qinv; /* mulld */
MUL_HIGH(q,lo,lo); /* mulhdu */
x = hi - lo;
if(x > hi)
{
x += q; /* had a borrow */
}
if({current_bit_set})
{
if(x > qhalf) /* Combines overflow-on-add and need-to-subtract-q-from-sum checks */
{
x = x + x;
x -= q;
}
else
x = x + x;
}
|
|
|
|
|
|
|
#17 |
|
Cranksta Rap Ayatollah
Jul 2003
641 Posts |
hey phil, I've got a G5 iMac at home, can you put it to work?
(I'm also working on a Dual 2.7 PowerMac with 4GB of RAM here at work, but I don't think I can appropriate cycles. The best part is the 30" cinema display and the 23" off to the side.. fricken sweeeet )
|
|
|
|
|
|
#18 | |
|
Jul 2005
2·193 Posts |
Quote:
m' = (-m)^-1 mod 2^64 u = x*y*m' mod 2^64 u = (x * y mod 2^64) * m' (mod 2^64) So for calculating u you can get away with ignoring the high double-word. However:- A = (x*y + u*m ) / 2^64 does not imply you can do:- high(x*y) + high(u*m) You need the low half of u*m because if low(u*m)+low(x*y) > 2^64 then your answer will be off by one. Let's see if can concoct an example with m=13. b=16 (2^4). m' = (-13)^-1 (mod 16) m' = 3^-1 (mod 16) 3*11 = 33 = 1 (mod 16) m' = 11 Now pick a trivial a and b. x=a=1 b=1 y = bR (mod m) = 1*16 (mod 13) = 3 u = x*y*m' mod 16 u = 1*3*11 mod 16 = 33 mod 16 = 1 A = (x*y + u*m ) / 16 A = (1*3 + 1*13) / 16 Without looking at the low word of (u*m) you're doing:- A = (1*3)/16 + (1*13)/16 = 0 + 0 = 0 The correct way does:- A = (1*3 + 1*13) / 16 = (3 + 13) / 16 = 16/16 = 1 Tomorrow I'll find an example for 50-bit values of a,b and m if you like. Last fiddled with by Greenbank on 2005-11-10 at 19:52 |
|
|
|
|
|
|
#19 |
|
∂2ω=0
Sep 2002
República de California
19·613 Posts |
My code is adapted from some sieving code I got several years ago from Peter Montgomery himself: here is the relevant core operation sequence from his code comments (his only allows operands < 2^63, so is a few ALU ops cheaper):
Code:
t0 = UMULH(power2, power2);
if (t0 >= q) t0 -= q;
power2 = UMULH(q, MULQ(qinv64, MULQ(power2, power2))) - t0;
power2 = (q >> (bitnow & power2)) - (power2 >> bitnow);
|
|
|
|
|
|
#20 | |
|
Bamboozled!
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across
2·5,393 Posts |
Quote:
Paul |
|
|
|
|
|
|
#21 |
|
Jul 2005
1100000102 Posts |
Heh, I'm willing to be proven wrong (and learn at the same time). Plus it means I could get rid of a MUL.
I'll find a 50-bit example and then we can trace through the code... |
|
|
|
|
|
#22 |
|
Jul 2005
2·193 Posts |
Hmm, try as I might I can't find a situation where:
mullw(x*y)+mullw(u*m) < 2^64. for m=882861709928411 and random a,b values. My test program has been running for 25 minutes and it checks 100,000 random values of a and b a second. Which leads me to believe that it is assumed that the carry will always be present. However I can find a case for m=13 (will post that in a sec). Here's an example where it is present:- R=2^64 a=72185797819121 b=151612213027919 m=882861709928411 mprime=12466608506227884973 (a*b)%m 344317125598003 y=(b*2^64)%m 105419857829375 u=(a*y*mprime)%B 7194243690555063843 A=(a*y + u*m) / 2^64 = 6351529896101669193770756995022848 / 2^64 = 344317125598003 A_other = (x*y)/2^64 + (u*m)/2^64 = 412528981 + 344316713069021 = 344317125598002 (out by one) |
|
|
|