mersenneforum.org NFS algebraic square root questions

2006-11-22, 17:25 #1 jasonp Tribal Bullet Oct 2004 3³×131 Posts

NFS algebraic square root questions

I'm slowly grinding through the algebraic square root code for the GNFS module of msieve, and while all the pieces are in place, one thing has been bothering me. You take all the relations and multiply them together into a polynomial S modulo the (assumed monic) algebraic polynomial f. If f is of degree d, then S is a polynomial with d huge coefficients.

To compute the square root of S modulo f, you pick a small prime q that is larger than the number N to be factored (does it have to be *much* larger?), reduce the coefficients of S modulo q, then find the square root of S in a finite field of size $q^d$. Actually you find the inverse square root, to make the next step easier. Pretend that's done, yielding $S_1$; now, to get the actual square root of S, I'm planning on performing a Newton iteration:

$S_{k+1} = \frac{S_k (3 - S \cdot S_k^2)}{2} \pmod{q^{2^k}}$

where all the variables are polynomials with d coefficients, each reduced modulo $q^{2^k}$. This computes the inverse square root, and multiplying by S yields the square root of S modulo f.

First, are there hidden gotchas in implementing this? Will the iteration not converge under some circumstances? Second, how is division by 2 implemented in this context? Is it enough to just shift all the coefficients of the numerator down by one, or do you really have to compute the inverse of 2 modulo a huge power of a prime?

(Before you say this is impractical: I already have code that computes S for reasonably big NFS factorizations. For factoring a 100-digit N it multiplies 453000 relations together, producing an S with 16-million-digit coefficients, in 4.5 minutes on the opteron here. The memory use is less than is needed by the NFS linear algebra, and 95% of the time is spent performing FFTs that can be made up to 5x faster with decent FFT code. Allowing 10x more relations is still feasible, and should suffice for performing the square root for a C135 or C140.)

jasonp

Last fiddled with by jasonp on 2006-11-22 at 17:26 Reason: typo
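[Editor's sketch] The lifting scheme described in the post above can be illustrated with a toy scalar version, where "S" is a plain integer rather than a degree-d polynomial mod f. All concrete values (q = 11, S = 1234², k = 5) are illustrative assumptions; in the real algorithm every quantity below is a polynomial with d coefficients and all products are reduced mod f, but the control flow is the same.

```python
def inv_sqrt_mod_q(S, q):
    """Starting value: the inverse square root of S in GF(q), by brute force.
    (A real implementation would use a finite-field square root algorithm.)"""
    for r in range(1, q):
        if (r * r * S) % q == 1:
            return r
    raise ValueError("S is not a square mod q")

def lift_inverse_sqrt(S, q, k):
    """Newton-lift the inverse square root of S from mod q up to mod q^(2^k):
    r <- r*(3 - S*r^2)/2, doubling the precision at each step."""
    r = inv_sqrt_mod_q(S % q, q)
    m = q
    for _ in range(k):
        m = m * m                       # modulus q^(2^j) -> q^(2^(j+1))
        t = (r * (3 - S * r * r)) % m   # numerator of the Newton step
        if t % 2:                       # divide by 2 mod the odd modulus m:
            t += m                      # make t even first (adding m is free mod m)
        r = (t // 2) % m
    return r, m

# Toy run: S is a perfect square, q a small odd prime.
S = 1234 ** 2
r, m = lift_inverse_sqrt(S, 11, 5)      # work modulo 11^(2^5) = 11^32
assert (r * r * S) % m == 1             # r is the inverse square root mod m
root = (r * S) % m                      # multiply by S to get the square root
assert (root * root) % m == S % m
```

Note the toy ignores the real sizing constraint: in the actual square root step, q^(2^k) must exceed the coefficients being recovered.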
2006-11-22, 18:45   #2
alpertron

Aug 2002
Buenos Aires, Argentina

2510₈ Posts

Quote:
 Originally Posted by jasonp Second, how is division by 2 implemented in this context? Is it enough to just shift all the coefficients of the numerator down by one, or do you really have to compute the inverse of 2 modulo a huge power of a prime?
If you want to divide by 2 modulo an odd number m, you have to inspect the least significant bit: if it is zero, just perform a normal division by 2 by shifting right one bit; otherwise add m first and then divide by 2.
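[Editor's sketch] The halving trick above in a few lines, for odd m (adding m does not change x mod m, but makes x even so a plain shift works):

```python
def halve_mod(x, m):
    """Return x/2 mod m for odd m, without computing a modular inverse."""
    x %= m
    return ((x + m) // 2) % m if x % 2 else x // 2

assert halve_mod(6, 11) == 3                 # even case: plain shift
assert (halve_mod(7, 11) * 2) % 11 == 7      # odd case: (7 + 11) / 2 = 9
```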

2006-11-22, 19:58   #3
jasonp
Tribal Bullet

Oct 2004

3³×131 Posts

Quote:
 Originally Posted by alpertron If you want to divide by 2 modulo an odd number m, you have to inspect the least significant bit: if it is zero, just perform a normal division by 2 by shifting right one bit; otherwise add m first and then divide by 2.
Thanks, that makes perfect sense.

jasonp

2006-11-23, 10:58 #4 Chris Card Aug 2004 10000010₂ Posts

Hi Jason, this is very interesting. It sounds like you are calculating the square root directly, rather than following the Montgomery/Nguyen approach, and if it works, it's (mathematically) much simpler. Do you think this is practical now because of improvements in hardware compared to, say, 10 years ago? I thought it was considered impractical because of the size of the coefficients, but I must admit this opinion is only based on papers written some time ago and involves no original analysis of my own.

Chris
2006-11-23, 15:00   #5
jasonp
Tribal Bullet

Oct 2004

6721₈ Posts

Quote:
 Originally Posted by Chris Card this is very interesting. It sounds like you are calculating the square root directly, rather than following the Montgomery/Nguyen approach, and if it works, it's (mathematically) much simpler. Do you think this is practical now because of improvements in hardware compared to say 10 years ago? I thought it was considered impractical because of the size of the coefficients, but I must admit this opinion is only based on papers written some time ago and involves no original analysis of my own.
I waffled for several weeks deciding whether to try for Montgomery/Nguyen ("MN") or the direct approach, but the decision was easy when I realized how much code is really needed to implement MN. Not to mention trying to grind through Cohen's book and all the source code available, with disappointing results :)

The direct approach may not be applicable to the very largest problems, at least not without very specialized huge-FFT convolution algorithms, but I think the current preference to avoid the direct approach is a combination of

- an aversion to FFTs (especially floating point FFTs) for integer computations
- trusting that 'the problem is solved' because of the MN algorithms from 1995 and the fact that Pari implements all the machinery needed
- all the papers on the algebraic square root (except Nguyen's) date back to the time that hardware was expensive. Cripes, a gig of DDR2 memory is $124 at MicroCenter
- everyone looking at the theoretical best-case complexity of the direct approach and concluding immediately that it wasn't worth pursuing.

Especially in this forum, which lives and breathes large-integer multiplies and FFTs, somebody should have tried this a while ago. I must admit that a big reason I decided on the direct approach was because I've spent a long time studying fast FFTs, so the learning curve was much shorter for me.

Back-of-the-envelope calculation: for a 512-bit factorization, suppose you have a 7Mx7M linear system (i.e. what RSA155 needed). Suppose the matrix uses cycles with an average of 3 relations, and a dependency needs half of those cycles. That means having to multiply about 5 million relations together. Assuming a lattice siever, each relation has a ~40-bit 'a' value, and the leading algebraic poly coefficient has ~25 bits. This means each relation contributes about 65 bits to the coefficients of the product, which after doubling to allow for zero-padding would require an FFT of a little over 40 million 16-bit elements. So an FFT of $2^{26}$ elements should be sufficient. This really is not very big at all; the pi freaks have been computing multi-billion element FFTs for years now :)
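[Editor's sketch] The back-of-the-envelope estimate above, spelled out. All inputs are the figures quoted in the post (5 million relations, ~40-bit 'a' values, ~25-bit leading coefficient, 16-bit FFT elements); the sizes are rough by design.

```python
from math import ceil, log2

relations         = 5_000_000
bits_per_relation = 40 + 25              # 'a' value plus leading poly coefficient
product_bits      = relations * bits_per_relation
padded_bits       = 2 * product_bits     # zero-padding for the convolution
fft_elements      = ceil(padded_bits / 16)   # number of 16-bit FFT elements

# ~41 million elements, so the next power of two is 2^26
print(f"{fft_elements / 1e6:.0f}M elements -> a 2^{ceil(log2(fft_elements))} FFT suffices")
```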

jasonp

Last fiddled with by jasonp on 2006-11-23 at 15:11

2006-11-23, 16:11   #6
R.D. Silverman

Nov 2003

16444₈ Posts

Quote:
 Originally Posted by jasonp You take all the relations and multiply them together into a polynomial S modulo the (assumed monic) algebraic polynomial f. ... To compute the square root of S modulo f, you pick a small prime q that is larger than the number N to be factored ... I'm planning on performing a Newton iteration ... First, are there hidden gotchas in implementing this? Will the iteration not converge under some circumstances? ...
I tried this many (16) years ago, when factor bases only had 50-100 thousand primes. It was hopelessly slow, and the Newton iteration *frequently* failed to converge. See the comments in "The Development of the Number Field Sieve", edited by Lenstra & Lenstra. There is a discussion of Couveignes's method in the long paper by Buhler, Lenstra, and Pomerance.

[OTOH, computers 16 years ago were slow.]

There is also a comment about using the "prime at infinity" [i.e. the projective limit of what you are trying to do], as well as a comment about how often the Newton iteration will converge.

The bottom line is that it is OK from a theoretical point of view, but totally ineffective in practice.

2006-11-23, 20:26   #7
jasonp
Tribal Bullet

Oct 2004

6721₈ Posts

Quote:
 Originally Posted by R.D. Silverman I tried this, many (16) years ago when factor bases only had 50-100 thousand primes. It was hopelessly slow and the Newton's iteration *frequently* failed to converge. ... The bottom line is that it is OK from a theoretical point of view, but totally ineffective in practice.
I don't see any discussion in Buhler et al. of when the Newton iteration would not converge, though there is some discussion about what to do when no prime q is available for which the algebraic polynomial is irreducible. Is the failure to converge a consequence of Hensel lifting not being possible for a given combination of parameters?

The algorithmic description I'm working from is a combination of Buhler et al. and Per Leslie Jensen's thesis on the implementation of PGNFS; the latter does not discuss convergence of the Newton iteration at all. Honestly, the alternative is so much programming work that I may as well try out the direct method. Almost all the code needed can be reused for the conventional algorithm too.

jasonp

 2006-11-23, 20:59 #8 alpertron Aug 2002 Buenos Aires, Argentina 2510₈ Posts

Newton iteration never fails in this context. Suppose that we have a correct inverse square root of S modulo an odd number m (which will be a power of your number q); let's call this number r. Working mod $m^2$, the number r will have the form

$r = t + u m \pmod{m^2}$

where t is the correct inverse square root modulo $m^2$. This means that

$t^2 S \equiv 1 \pmod{m^2}$

Now apply the Newton formula for the inverse square root:

$v \equiv \frac{r(3 - S r^2)}{2} \pmod{m^2}$

$v \equiv \frac{(t + u m)(3 - S (t + u m)^2)}{2} \pmod{m^2}$

$v \equiv \frac{3t + 3 u m - S t^3 - 2 S t^2 u m - S t^2 u m}{2} \pmod{m^2}$

$v \equiv \frac{3t + 3 u m - t - 2 u m - u m}{2} \pmod{m^2}$

$v \equiv t \pmod{m^2}$

So the iteration is correct modulo $m^2$.

Last fiddled with by alpertron on 2006-11-23 at 21:00
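[Editor's sketch] The argument above can be checked numerically on toy values (m = 11, S = 5 are illustrative assumptions, with brute-force search in place of real square root code): take any r that is correct mod m, apply one Newton step mod m², and confirm the result is exactly the true lift t.

```python
def inverse_sqrts(S, mod):
    """All x with x^2 * S == 1 (mod mod), by brute force (toy sizes only)."""
    return [x for x in range(mod) if (x * x * S) % mod == 1]

m = 11
S = 5                                    # 5 is a QR mod 11 (4^2 = 16 = 5 mod 11)
t = inverse_sqrts(S, m * m)[0]           # a true inverse square root mod m^2

for u in range(m):                       # every lift r = t + u*m is correct mod m
    r = (t + u * m) % (m * m)
    num = (r * (3 - S * r * r)) % (m * m)
    if num % 2:                          # divide by 2 modulo the odd m^2
        num += m * m
    v = (num // 2) % (m * m)
    assert (v * v * S) % (m * m) == 1    # one step restores correctness mod m^2
    assert v == t                        # and lands exactly on the true lift
```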
2006-11-24, 02:32   #9
R.D. Silverman

Nov 2003

2²·5·373 Posts

Quote:
 Originally Posted by jasonp I don't see any discussion in Buhler et al. of when the Newton iteration would not converge ... Is the failure to converge a consequence of Hensel lifting not being possible for a given combination of parameters? ...
See page 73 of the Lenstra & Lenstra book:

"The polynomial time algorithm of [refs] does a Newton Iteration modulo the powers of a single prime ... ... A final possibility is to make use of the 'infinite prime' as was pointed out to us by V. Miller and R. Silverman........ For this algorithm the number of starting values to be tried is 2^(d-s-1) where s is 1/2 of the number of real embeddings of K into C....."

You start with a real embedding and guess the starting value of the Newton iteration. The number 2^(d-s-1) is the *expected* number of guesses to try before succeeding. I once heard H. Cohen give a brief explanation of why not all guesses converge, but I do not remember the explanation; it involved some material from algebraic geometry with which I was not (and am not now) familiar, concerning the spanning degree of elements within some scheme. The orbits of some elements contained the square root, while others did not [under the iterative map induced by the Newton iteration].

And yes, there are a number of conditions that need to be satisfied for the Hensel lifting to succeed. IIRC, the best explanation I saw of this was in Hua Loo Keng's "Number Theory", in the chapter on p-adics.

I did try the direct method a long time ago, as an alternative to Couveignes's CRT method; I wanted something that worked over fields of even degree. I took the code that I wrote for my paper "Parallel Polynomial Arithmetic Over Finite Rings" and adapted it. I started with single primes (ideals) and multiplied them in pairs as polynomials, then multiplied those products together in pairs, and so on. When the coefficients reached about 200 digits, I switched to integer-based polynomial convolutions. The final product typically was a quartic or quintic with coefficients near 1.2 million digits. I then guessed a square root approximation via floating point, rounded to integer, and proceeded with Newton-Raphson (iterating on 1/q(x) rather than q(x) directly, to avoid the divisions in ordinary N-R). It was done using all-integer code, and a single square root typically took about 36 hours to converge (when it did).
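[Editor's sketch] The pairwise-multiplication scheme described above is a binary product tree: combine factors in pairs, then pairs of pairs, so large multiplications always see operands of similar size. Shown here on plain integers as an illustration; the post does the same with polynomials.

```python
from math import prod

def product_tree(values):
    """Multiply a list of values by repeated pairwise passes."""
    while len(values) > 1:
        values = [values[i] * values[i + 1] if i + 1 < len(values) else values[i]
                  for i in range(0, len(values), 2)]
    return values[0]

nums = list(range(1, 1001))
assert product_tree(nums) == prod(nums)   # same result as a left-to-right product
```

The point of the balanced tree is that fast (FFT-based) multiplication stays efficient when operands are comparable in size, instead of repeatedly multiplying one huge accumulator by small factors.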

2006-11-24, 15:36   #10
jasonp
Tribal Bullet

Oct 2004

3³·131 Posts

Quote:
 Originally Posted by alpertron Newton iteration never fails in this context. Suppose that we have a correct inverse square root of S modulo an odd number m (which will be a power of your number q); let's call this number r.
Maybe the complications arise because what is sought is a square root in an algebraic number field. The final answer is a polynomial (modulo some monic f(x)) with coefficients reduced modulo a power of q, not simply an integer modulo a power of q.

jasonp

 2006-11-24, 19:09 #11 alpertron Aug 2002 Buenos Aires, Argentina 2³·13² Posts

I don't see the problem: just replace S, r, t and u by polynomials mod $m^2$ and the reasoning shown above should work. The only problem I see would occur when $t^2 S \not\equiv 1 \pmod{m^2}$. This would happen when $\gcd(S, m) > 1$, but in that case I think you can divide S by the proper power of q before attempting the Newton iterations.

Last fiddled with by alpertron on 2006-11-24 at 19:10
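[Editor's sketch] A toy check that "just replace everything by polynomials" does work. All choices here are illustrative assumptions, not taken from the thread: q = 5, f = x² + 2 (irreducible mod 5, so the base ring is GF(25)), and S built as a known square A² mod f. One Newton step lifts the inverse square root from mod q to mod q².

```python
def polmul(a, b, f, mod):
    """Multiply coefficient lists (lowest degree first), reduce mod the
    monic polynomial f and mod the integer mod."""
    d = len(f) - 1
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    for i in range(len(out) - 1, d - 1, -1):    # kill terms of degree >= d
        c = out[i]
        for j in range(d + 1):
            out[i - d + j] -= c * f[j]
    return [c % mod for c in out[:d]]

def poldiv2(p, mod):
    """Coefficient-wise division by 2 modulo an odd mod."""
    return [((c + mod) // 2 if c % 2 else c // 2) % mod for c in p]

q, f = 5, [2, 0, 1]             # work in GF(5)[x]/(x^2 + 2)
A = [2, 3]                      # A = 2 + 3x
S = polmul(A, A, f, q * q)      # S = A^2 is a square in the ring

# Starting value: inverse square root of S mod (f, q), by brute force.
Sq = [c % q for c in S]
r = next([a, b] for a in range(q) for b in range(q)
         if polmul(polmul([a, b], [a, b], f, q), Sq, f, q) == [1, 0])

# One Newton step: r <- r*(3 - S*r^2)/2, now correct mod (f, q^2).
m = q * q
P = polmul(S, polmul(r, r, f, m), f, m)            # S*r^2
G = [(3 - P[0]) % m] + [(-c) % m for c in P[1:]]   # 3 - S*r^2
r = poldiv2(polmul(r, G, f, m), m)

assert polmul(polmul(r, r, f, m), S, f, m) == [1, 0]   # r^2 * S == 1 mod (f, q^2)
```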
