![]() |
|
|
#12 |
|
Jul 2004
1000112 Posts |
Here are the results after implementing repeated subtraction.
You are right, it is much faster by using repeated subtraction. The table below shows how effective when the number of trials is increased. The first column is how far to trial subtract (0 means use divide only). The second column is the timing in clocks/inverse for p=65521. The column is timing for p=2^31-1. The correctness of my code is verified for the two cases of p. Code:
0 386 486 1 326 415 2 293 375 3 282 363 4 275 352 5 268 346 6 265 342 7 264 340 8 261 340 9 258 331 Last fiddled with by xenon on 2005-07-26 at 13:33 |
|
|
|
|
|
#13 | |
|
"Bob Silverman"
Nov 2003
North of Boston
5·17·89 Posts |
Quote:
very first mod reduction has a partial quotient of about 2000. This *immediately* reduces the problem by about 3 digits... Try running your benchmarks for (say) a = 1.1 x 10^9 to a = 1.1 x 10^9 + 1M The values of a that you tried are much smaller than the average value of a. As this routine is actually used in NFS, p is typically (on average) about 22 to 24 bits, rather than 31. In fact, if someone can produce a faster optimized routine that works for p < 2^25, this would have great value to me and others. This challenge is harder than people think at first glance. I suspect that improvements will require algorithmic as well as coding improvements. |
|
|
|
|
|
|
#14 |
|
"Nancy"
Aug 2002
Alexandria
2,467 Posts |
Here's what I could manage last night:
Code:
int
single_modinv_my (int a, int modulus)
{ /* start of single_modinv */
unsigned int ps, ps1, ps2 = 1, parity, dividend, divisor, rem, q;
q = modulus / a;
rem = modulus - a * q;
dividend = a;
divisor = rem;
ps1 = q;
parity = 0;
while (rem > 1)
{
q = ps1;
rem = dividend - divisor;
if (rem >= divisor)
{
q <<= 1;
rem -= divisor;
if (rem >= divisor)
{
q += ps1;
rem -= divisor;
if (rem >= divisor)
{
q += ps1;
rem -= divisor;
if (rem >= divisor)
{
q += ps1;
rem -= divisor;
}
if (rem >= divisor)
{
q += ps1;
rem -= divisor;
}
if (rem >= divisor)
{
q = dividend/divisor;
rem = dividend - q * divisor;
q *= ps1;
}
}
}
}
ps = q + ps2;
ps2 = ps1;
ps1 = ps;
dividend = divisor;
divisor = rem;
parity ^= 1;
}
if (parity == 0)
return(modulus - ps1);
else
return (ps1);
} /* end of single_modinv */
Alex |
|
|
|
|
|
#15 |
|
"Nancy"
Aug 2002
Alexandria
2,467 Posts |
Forgot to mention: my code expects a>1.
Alex |
|
|
|
|
|
#16 | |
|
"Bob Silverman"
Nov 2003
North of Boston
166158 Posts |
Quote:
Ah. Rather than setting q at each subtraction, then computing q*ps1 + ps2 you accumulate ps1 at each subtraction. This saves a multiplication. cute. Is flipping parity by xor faster than by subtraction? Both should take just 1 cycle. BTW, here is Lenstra's code: int binary_inv(int n, int p) { /* start of binary_inv */ register int n1,n2, preg, m1, m2; /* m1*n == n1 mod p, m2*n == n2 mod p */ /* n2 is odd, n1 is odd after initialization */ n1 = n; n2 = p; preg = p; m1 = 1; m2 = 0; while (!(n1 & 1)) { n1 >>= 1; m1 += preg & -(m1 & 1); // if (m1 & 1) m1 += preg; m1 >>= 1; } if (n1 == 1) return m1; while (n1 != n2) { if (n1 >n2) { n1 -= n2; m1 -= m2; if (m1 < 0) m1 += preg; do { n1 >>= 1; m1 += preg & -(m1 & 1); // if (m1 & 1) m1 += preg; m1 >>= 1; } while (!(n1 & 1)); if (n1 == 1) return m1; } else { n2 -= n1; m2 -= m1; if (m2 < 0) m2 += preg; do { n2 >>= 1; m2 += preg & -(m2 & 1); // if (m2 & 1) m2 += preg; m2 >>= 1; } while (!(n2 & 1)); if (n2 == 1) return m2; } } return 0; } /* end of binary_inv */ |
|
|
|
|
|
|
#17 |
|
"Nancy"
Aug 2002
Alexandria
46438 Posts |
>Is flipping parity by xor faster than by subtraction? Both should take
>just 1 cycle. They do, but the subtraction needs to take place in a register. On a register cripp^wchallenged architecture like x86 that means something else needs to be moved out first, so you get several loads and stores. The xor can operate right on memory. Alex /Godspeed, Discovery! |
|
|
|
|
|
#18 | |
|
"Bob Silverman"
Nov 2003
North of Boston
5×17×89 Posts |
Quote:
my own. We now have 3 alternate routines. Hack away.. /************************************************************************/ /* */ /* Fast modular inverse; uses extended binary method */ /* */ /************************************************************************/ new_modinv(input, modulus) int input, modulus; { /* start of new_modinv */ int u,v,B,D; if (!(modulus & 1)) {(void) printf("odd mod fail in inverse\n"); exit(0); } u = input; v = modulus; B = 0; D = 1; while (1) { while (! (u & 1)) { u = u >> 1; if (! (B & 1)) B = B << 1; else B = (B - input) << 1; } while (! (v & 1)) { v = v << 1; if (! (D & 1)) D = D << 1; else D = (D - input) << 1; } if (u >= v) { u = u - v; B = B - D; } else { v = v - u; D = D - B; } if (u == 0) return(D); } } /* end of new_modinv */ |
|
|
|
|
|
|
#19 | |
|
Jul 2004
5·7 Posts |
Quote:
I've coded the Lenstra binary inverse in assembly. At the moment, it's not as fast as the first code (Euclidean). I can post a timing comparison after I have coded all three alternate routines. |
|
|
|
|
|
|
#20 | |
|
"Bob Silverman"
Nov 2003
North of Boston
11101100011012 Posts |
Quote:
cobbled it together some time ago, but have not used it. You may be correct, but note that as you keep shifting, eventually you will get 0 in the LSB. |
|
|
|
|
|
|
#21 |
|
Jul 2004
5×7 Posts |
I've tried and the code doesn't work (at all).
while (! (v & 1)) means while (v is even). v = v << 1; means v = v * 2. It certainly locks up in the loop. I haven't figure out how the code works. But I tried changing << to >> and wrong result produced. |
|
|
|
|
|
#22 | |
|
"Bob Silverman"
Nov 2003
North of Boston
5·17·89 Posts |
Quote:
the siever stopped working. (infinite loop) There must be a bug somewhere. Perhaps my code somewhere calls it with a = 1. Does this cause an I.L. in your routine? Bob |
|
|
|
|
![]() |
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Calling airsquirrels | Prime95 | GPU Computing | 16 | 2015-09-29 18:06 |
| Help from coders | ET_ | GPU Computing | 5 | 2014-01-26 13:58 |
| Calling all 64-bit Linux sievers! | frmky | NFS@Home | 25 | 2013-10-16 15:58 |
| IA-32 Assembly Coders, anyone? | xenon | Programming | 6 | 2005-06-02 13:26 |
| Bob, I'm calling you out! | synergy | Miscellaneous Math | 17 | 2004-10-26 15:26 |