mersenneforum.org NFS algebraic square root questions

2006-11-22, 17:25 #1 jasonp Tribal Bullet Oct 2004 3³×131 Posts

NFS algebraic square root questions

I'm slowly grinding through the algebraic square root code for the GNFS module of msieve, and while all the pieces are in place, one thing has been bothering me. You take all the relations and multiply them together into a polynomial S modulo the (assumed monic) algebraic polynomial f. If f is of degree d, then S is a polynomial with d huge coefficients.

To compute the square root of S modulo f, you pick a small prime q that is larger than the number N to be factored (does it have to be *much* larger?), reduce the coefficients of S modulo q, then find the square root of S in a finite field of size $q^d$. Actually you find the inverse square root, to make the next step easier. Pretend that's done, yielding $S_1$; now, to get the actual square root of S, I'm planning on performing a Newton iteration:

$S_{k+1} = \frac{S_k (3 - S \cdot S_k^2)}{2} \pmod{q^{2^k}}$

where all the variables are polynomials with d coefficients, each reduced modulo $q^{2^k}$. This computes the inverse square root, and multiplying by S yields the square root of S modulo f.

First, are there hidden gotchas in implementing this? Will the iteration not converge under some circumstances? Second, how is division by 2 implemented in this context? Is it enough to just shift all the coefficients of the numerator down by one, or do you really have to compute the inverse of 2 modulo a huge power of a prime?

(Before you say this is impractical: I already have code that computes S for reasonably big NFS factorizations. For factoring a 100-digit N it multiplies 453000 relations together, producing an S with 16-million-digit coefficients, in 4.5 minutes on the opteron here. The memory use is less than is needed by the NFS linear algebra, and 95% of the time is spent performing FFTs that can be made up to 5x faster with decent FFT code. Allowing 10x more relations is still feasible, and should suffice for performing the square root for a C135 or C140.)

jasonp

Last fiddled with by jasonp on 2006-11-22 at 17:26 Reason: typo
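[Editor's sketch] The lifting scheme described in the post above can be illustrated with a toy scalar version, where "S" is a plain integer rather than a degree-d polynomial mod f. All concrete values (q = 11, S = 1234², k = 5) are illustrative assumptions; in the real algorithm every quantity below is a polynomial with d coefficients and all products are reduced mod f, but the control flow is the same.

```python
def inv_sqrt_mod_q(S, q):
    """Starting value: the inverse square root of S in GF(q), by brute force.
    (A real implementation would use a finite-field square root algorithm.)"""
    for r in range(1, q):
        if (r * r * S) % q == 1:
            return r
    raise ValueError("S is not a square mod q")

def lift_inverse_sqrt(S, q, k):
    """Newton-lift the inverse square root of S from mod q up to mod q^(2^k):
    r <- r*(3 - S*r^2)/2, doubling the precision at each step."""
    r = inv_sqrt_mod_q(S % q, q)
    m = q
    for _ in range(k):
        m = m * m                       # modulus q^(2^j) -> q^(2^(j+1))
        t = (r * (3 - S * r * r)) % m   # numerator of the Newton step
        if t % 2:                       # divide by 2 mod the odd modulus m:
            t += m                      # make t even first (adding m is free mod m)
        r = (t // 2) % m
    return r, m

# Toy run: S is a perfect square, q a small odd prime.
S = 1234 ** 2
r, m = lift_inverse_sqrt(S, 11, 5)      # work modulo 11^(2^5) = 11^32
assert (r * r * S) % m == 1             # r is the inverse square root mod m
root = (r * S) % m                      # multiply by S to get the square root
assert (root * root) % m == S % m
```

Note the toy ignores the real sizing constraint: in the actual square root step, q^(2^k) must exceed the coefficients being recovered.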
2006-11-22, 18:45   #2
alpertron

Aug 2002
Buenos Aires, Argentina

2510₈ Posts

Quote:
 Originally Posted by jasonp Second, how is division by 2 implemented in this context? Is it enough to just shift all the coefficients of the numerator down by one, or do you really have to compute the inverse of 2 modulo a huge power of a prime?
If you want to divide by 2 modulo an odd number m, you have to inspect the least significant bit: if it is zero, just perform a normal division by 2 by shifting right one bit; otherwise add m first and then divide by 2.
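[Editor's sketch] The halving trick above in a few lines, for odd m (adding m does not change x mod m, but makes x even so a plain shift works):

```python
def halve_mod(x, m):
    """Return x/2 mod m for odd m, without computing a modular inverse."""
    x %= m
    return ((x + m) // 2) % m if x % 2 else x // 2

assert halve_mod(6, 11) == 3                 # even case: plain shift
assert (halve_mod(7, 11) * 2) % 11 == 7      # odd case: (7 + 11) / 2 = 9
```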

2006-11-22, 19:58   #3
jasonp
Tribal Bullet

Oct 2004

3³×131 Posts

Quote:
 Originally Posted by alpertron If you want to divide by 2 modulo an odd number m, you have to inspect the least significant bit: if it is zero, just perform a normal division by 2 by shifting right one bit; otherwise add m first and then divide by 2.
Thanks, that makes perfect sense.

jasonp

2006-11-23, 10:58 #4 Chris Card Aug 2004 10000010₂ Posts

Hi Jason, this is very interesting. It sounds like you are calculating the square root directly, rather than following the Montgomery/Nguyen approach, and if it works, it's (mathematically) much simpler. Do you think this is practical now because of improvements in hardware compared to, say, 10 years ago? I thought it was considered impractical because of the size of the coefficients, but I must admit this opinion is only based on papers written some time ago and involves no original analysis of my own.

Chris
2006-11-23, 15:00   #5
jasonp
Tribal Bullet

Oct 2004

6721₈ Posts

Quote:
 Originally Posted by Chris Card this is very interesting. It sounds like you are calculating the square root directly, rather than following the Montgomery/Nguyen approach, and if it works, it's (mathematically) much simpler. Do you think this is practical now because of improvements in hardware compared to say 10 years ago? I thought it was considered impractical because of the size of the coefficients, but I must admit this opinion is only based on papers written some time ago and involves no original analysis of my own.
I waffled for several weeks deciding whether to try for Montgomery/Nguyen ("MN") or the direct approach, but the decision was easy when I realized how much code is really needed to implement MN. Not to mention trying to grind through Cohen's book and all the source code available, with disappointing results :)

The direct approach may not be applicable to the very largest problems, at least not without very specialized huge-FFT convolution algorithms, but I think the current preference to avoid the direct approach is a combination of

- an aversion to FFTs (especially floating point FFTs) for integer computations
- trusting that 'the problem is solved' because of the MN algorithms from 1995 and the fact that Pari implements all the machinery needed
- all the papers on the algebraic square root (except Nguyen's) date back to the time that hardware was expensive. Cripes, a gig of DDR2 memory is $124 at MicroCenter
- everyone looking at the theoretical best-case complexity of the direct approach and concluding immediately that it wasn't worth pursuing.

Especially in this forum, which lives and breathes large-integer multiplies and FFTs, somebody should have tried this a while ago. I must admit that a big reason I decided on the direct approach was because I've spent a long time studying fast FFTs, so the learning curve was much shorter for me.

Back-of-the-envelope calculation: for a 512-bit factorization, suppose you have a 7Mx7M linear system (i.e. what RSA155 needed). Suppose the matrix uses cycles with an average of 3 relations, and a dependency needs half of those cycles. That means having to multiply about 5 million relations together. Assuming a lattice siever, each relation has a ~40-bit 'a' value, and the leading algebraic poly coefficient has ~25 bits. This means each relation contributes about 65 bits to the coefficients of the product, which after doubling to allow for zero-padding would require an FFT of a little over 40 million 16-bit elements. So an FFT of $2^{26}$ elements should be sufficient. This really is not very big at all; the pi freaks have been computing multi-billion element FFTs for years now :)
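[Editor's sketch] The back-of-the-envelope estimate above, spelled out. All inputs are the figures quoted in the post (5 million relations, ~40-bit 'a' values, ~25-bit leading coefficient, 16-bit FFT elements); the sizes are rough by design.

```python
from math import ceil, log2

relations         = 5_000_000
bits_per_relation = 40 + 25              # 'a' value plus leading poly coefficient
product_bits      = relations * bits_per_relation
padded_bits       = 2 * product_bits     # zero-padding for the convolution
fft_elements      = ceil(padded_bits / 16)   # number of 16-bit FFT elements

# ~41 million elements, so the next power of two is 2^26
print(f"{fft_elements / 1e6:.0f}M elements -> a 2^{ceil(log2(fft_elements))} FFT suffices")
```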

jasonp

Last fiddled with by jasonp on 2006-11-23 at 15:11

2006-11-23, 16:11   #6
R.D. Silverman

Nov 2003

16444₈ Posts

Quote:
 Originally Posted by jasonp You take all the relations and multiply them together into a polynomial S modulo the (assumed monic) algebraic polynomial f. ... To compute the square root of S modulo f, you pick a small prime q that is larger than the number N to be factored ... I'm planning on performing a Newton iteration ... First, are there hidden gotchas in implementing this? Will the iteration not converge under some circumstances? ...
I tried this many (16) years ago, when factor bases only had 50-100 thousand primes. It was hopelessly slow, and the Newton iteration *frequently* failed to converge. See the comments in "The Development of the Number Field Sieve", edited by Lenstra & Lenstra. There is a discussion of Couveignes's method in the long paper by Buhler, Lenstra, and Pomerance.

[OTOH, computers 16 years ago were slow.]

There is also a comment about using the "prime at infinity" [i.e. the projective limit of what you are trying to do], as well as a comment about how often the Newton iteration will converge.

The bottom line is that it is OK from a theoretical point of view, but totally ineffective in practice.

2006-11-23, 20:26   #7
jasonp
Tribal Bullet

Oct 2004

6721₈ Posts

Quote:
 Originally Posted by R.D. Silverman I tried this, many (16) years ago when factor bases only had 50-100 thousand primes. It was hopelessly slow and the Newton's iteration *frequently* failed to converge. ... The bottom line is that it is OK from a theoretical point of view, but totally ineffective in practice.
I don't see any discussion in Buhler et al. of when the Newton iteration would not converge, though there is some discussion about what to do when no prime q is available for which the algebraic polynomial is irreducible. Is the failure to converge a consequence of Hensel lifting not being possible for a given combination of parameters?

The algorithmic description I'm working from is a combination of Buhler et al. and Per Leslie Jensen's thesis on the implementation of PGNFS; the latter does not discuss convergence of the Newton iteration at all. Honestly, the alternative is so much programming work that I may as well try out the direct method. Almost all the code needed can be reused for the conventional algorithm too.

jasonp

 2006-11-23, 20:59 #8 alpertron Aug 2002 Buenos Aires, Argentina 2510₈ Posts

Newton iteration never fails in this context. Suppose that we have a correct inverse square root of S modulo an odd number m (which will be a power of your number q); let's call this number r. Working mod $m^2$, the number r will have the form

$r = t + u m \pmod{m^2}$

where t is the correct inverse square root modulo $m^2$. This means that

$t^2 S \equiv 1 \pmod{m^2}$

Now apply the Newton formula for the inverse square root:

$v \equiv \frac{r(3 - S r^2)}{2} \pmod{m^2}$

$v \equiv \frac{(t + u m)(3 - S (t + u m)^2)}{2} \pmod{m^2}$

$v \equiv \frac{3t + 3 u m - S t^3 - 2 S t^2 u m - S t^2 u m}{2} \pmod{m^2}$

$v \equiv \frac{3t + 3 u m - t - 2 u m - u m}{2} \pmod{m^2}$

$v \equiv t \pmod{m^2}$

So the iteration is correct modulo $m^2$.

Last fiddled with by alpertron on 2006-11-23 at 21:00
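[Editor's sketch] The argument above can be checked numerically on toy values (m = 11, S = 5 are illustrative assumptions, with brute-force search in place of real square root code): take any r that is correct mod m, apply one Newton step mod m², and confirm the result is exactly the true lift t.

```python
def inverse_sqrts(S, mod):
    """All x with x^2 * S == 1 (mod mod), by brute force (toy sizes only)."""
    return [x for x in range(mod) if (x * x * S) % mod == 1]

m = 11
S = 5                                    # 5 is a QR mod 11 (4^2 = 16 = 5 mod 11)
t = inverse_sqrts(S, m * m)[0]           # a true inverse square root mod m^2

for u in range(m):                       # every lift r = t + u*m is correct mod m
    r = (t + u * m) % (m * m)
    num = (r * (3 - S * r * r)) % (m * m)
    if num % 2:                          # divide by 2 modulo the odd m^2
        num += m * m
    v = (num // 2) % (m * m)
    assert (v * v * S) % (m * m) == 1    # one step restores correctness mod m^2
    assert v == t                        # and lands exactly on the true lift
```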
2006-11-24, 02:32   #9
R.D. Silverman

Nov 2003

2²·5·373 Posts

Quote:
 Originally Posted by jasonp I don't see any discussion in Buhler et al. of when the Newton iteration would not converge ... Is the failure to converge a consequence of Hensel lifting not being possible for a given combination of parameters? ...
See page 73 of the Lenstra & Lenstra book:

"The polynomial time algorithm of [refs] does a Newton Iteration modulo the powers of a single prime ... ... A final possibility is to make use of the 'infinite prime' as was pointed out to us by V. Miller and R. Silverman........ For this algorithm the number of starting values to be tried is 2^(d-s-1) where s is 1/2 of the number of real embeddings of K into C....."

You start with a real embedding and guess the starting value of the Newton iteration. The number 2^(d-s-1) is the *expected* number of guesses to try before succeeding. I once heard H. Cohen give a brief explanation of why not all guesses converge, but I do not remember the explanation; it involved some material from algebraic geometry with which I was not (and am not now) familiar, concerning the spanning degree of elements within some scheme. The orbits of some elements contained the square root, while others did not [under the iterative map induced by the Newton iteration].

And yes, there are a number of conditions that need to be satisfied for the Hensel lifting to succeed. IIRC, the best explanation I saw of this was in Hua Loo Keng's "Number Theory", in the chapter on p-adics.

I did try the direct method a long time ago, as an alternative to Couveignes's CRT method; I wanted something that worked over fields of even degree. I took the code that I wrote for my paper "Parallel Polynomial Arithmetic Over Finite Rings" and adapted it. I started with single primes (ideals) and multiplied them in pairs as polynomials, then multiplied those products together in pairs, and so on. When the coefficients reached about 200 digits, I switched to integer-based polynomial convolutions. The final product typically was a quartic or quintic with coefficients near 1.2 million digits. I then guessed a square root approximation via floating point, rounded to integer, and proceeded with Newton-Raphson (iterating on 1/q(x) rather than q(x) directly, to avoid the divisions in ordinary N-R). It was done using all-integer code, and a single square root typically took about 36 hours to converge (when it did).
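[Editor's sketch] The pairwise-multiplication scheme described above is a binary product tree: combine factors in pairs, then pairs of pairs, so large multiplications always see operands of similar size. Shown here on plain integers as an illustration; the post does the same with polynomials.

```python
from math import prod

def product_tree(values):
    """Multiply a list of values by repeated pairwise passes."""
    while len(values) > 1:
        values = [values[i] * values[i + 1] if i + 1 < len(values) else values[i]
                  for i in range(0, len(values), 2)]
    return values[0]

nums = list(range(1, 1001))
assert product_tree(nums) == prod(nums)   # same result as a left-to-right product
```

The point of the balanced tree is that fast (FFT-based) multiplication stays efficient when operands are comparable in size, instead of repeatedly multiplying one huge accumulator by small factors.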

2006-11-24, 15:36   #10
jasonp
Tribal Bullet

Oct 2004

3³·131 Posts

Quote:
 Originally Posted by alpertron Newton iteration never fails in this context. Suppose that we have a correct inverse square root of S modulo an odd number m (which will be a power of your number q); let's call this number r.
Maybe the complications arise because what is sought is a square root in an algebraic number field. The final answer is a polynomial (modulo some monic f(x)) with coefficients reduced modulo a power of q, not simply an integer modulo a power of q.

jasonp

 2006-11-24, 19:09 #11 alpertron Aug 2002 Buenos Aires, Argentina 2³·13² Posts

I don't see the problem: just replace S, r, t and u by polynomials mod $m^2$ and the reasoning shown above should work. The only problem I see would occur when $t^2 S \not\equiv 1 \pmod{m^2}$. This would happen when $\gcd(S, m) > 1$, but in that case I think you can divide S by the proper power of q before attempting the Newton iterations.

Last fiddled with by alpertron on 2006-11-24 at 19:10
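[Editor's sketch] A toy check that "just replace everything by polynomials" does work. All choices here are illustrative assumptions, not taken from the thread: q = 5, f = x² + 2 (irreducible mod 5, so the base ring is GF(25)), and S built as a known square A² mod f. One Newton step lifts the inverse square root from mod q to mod q².

```python
def polmul(a, b, f, mod):
    """Multiply coefficient lists (lowest degree first), reduce mod the
    monic polynomial f and mod the integer mod."""
    d = len(f) - 1
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    for i in range(len(out) - 1, d - 1, -1):    # kill terms of degree >= d
        c = out[i]
        for j in range(d + 1):
            out[i - d + j] -= c * f[j]
    return [c % mod for c in out[:d]]

def poldiv2(p, mod):
    """Coefficient-wise division by 2 modulo an odd mod."""
    return [((c + mod) // 2 if c % 2 else c // 2) % mod for c in p]

q, f = 5, [2, 0, 1]             # work in GF(5)[x]/(x^2 + 2)
A = [2, 3]                      # A = 2 + 3x
S = polmul(A, A, f, q * q)      # S = A^2 is a square in the ring

# Starting value: inverse square root of S mod (f, q), by brute force.
Sq = [c % q for c in S]
r = next([a, b] for a in range(q) for b in range(q)
         if polmul(polmul([a, b], [a, b], f, q), Sq, f, q) == [1, 0])

# One Newton step: r <- r*(3 - S*r^2)/2, now correct mod (f, q^2).
m = q * q
P = polmul(S, polmul(r, r, f, m), f, m)            # S*r^2
G = [(3 - P[0]) % m] + [(-c) % m for c in P[1:]]   # 3 - S*r^2
r = poldiv2(polmul(r, G, f, m), m)

assert polmul(polmul(r, r, f, m), S, f, m) == [1, 0]   # r^2 * S == 1 mod (f, q^2)
```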
