2006-11-22, 17:25  #1 
Tribal Bullet
Oct 2004
2·3·19·31 Posts 
NFS algebraic square root questions
I'm slowly grinding through the algebraic square root code for the GNFS module of msieve, and while all the pieces are in place one thing has been bothering me.
You take all the relations and multiply them together into a polynomial S modulo the (assumed monic) algebraic polynomial f. If f is of degree d then S is a polynomial with d huge coefficients. To compute the square root of S modulo f, you pick a small prime q that is larger than the number N to be factored (does it have to be *much* larger?), reduce the coefficients of S modulo q, then find the square root of S in a finite field of size . Actually you find the inverse square root to make the next step easier. Pretend that's done, yielding ; now to get the actual square root of S I'm planning on performing a Newton iteration: where all the variables are polynomials with d coefficients, each reduced modulo . This computes the inverse square root, and multiplying by S yields the square root of S modulo f. First, are there hidden gotchas in implementing this? Will the iteration not converge under some circumstances? Second, how is division by 2 implemented in this context? Is it enough to just shift all the coefficients of the numerator down by one, or do you really have to compute the inverse of 2 modulo a huge power of a prime? (Before you say this is impractical, I already have code that computes S for reasonably big NFS factorizations. For factoring a 100digit N it multiplies 453000 relations together, producing an S with 16milliondigit coefficients, in 4.5 minutes on the opteron here. The memory use is less than is needed by the NFS linear algebra, and 95% of the time is spent performing FFTs that can be made up to 5x faster with decent FFT code. Allowing 10x more relations is still feasible, and should suffice for performing the square root for a C135 or C140) jasonp Last fiddled with by jasonp on 20061122 at 17:26 Reason: typo 
2006-11-22, 18:45  #2 
Aug 2002
Buenos Aires, Argentina
2×11×61 Posts 
If you want to divide by 2 modulo an odd number m, you have to inspect the least significant bit: if it is zero, just perform a normal division by 2 by shifting right 1 bit; otherwise, add m and then divide by 2.
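In Python this is a one-liner (the helper name `halve_mod` is just for illustration):

```python
def halve_mod(x, m):
    """Divide x by 2 modulo an odd m: shift right if x is even,
    otherwise add m first so the shift is exact."""
    assert m % 2 == 1 and 0 <= x < m
    return x >> 1 if x % 2 == 0 else (x + m) >> 1
```

For example, halving 3 mod 9 gives 6, since 6*2 = 12 = 3 (mod 9).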

2006-11-22, 19:58  #3 
Tribal Bullet
Oct 2004
2×3×19×31 Posts 

2006-11-23, 10:58  #4 
Aug 2004
202_{8} Posts 
Hi Jason,

This is very interesting. It sounds like you are calculating the square root directly rather than following the Montgomery/Nguyen approach, and if it works, it's (mathematically) much simpler. Do you think this is practical now because of improvements in hardware compared to, say, 10 years ago? I thought it was considered impractical because of the size of the coefficients, but I must admit this opinion is only based on papers written some time ago and involves no original analysis of my own.

Chris 
2006-11-23, 15:00  #5  
Tribal Bullet
Oct 2004
6716_{8} Posts 
Quote:
The direct approach may not be applicable to the very largest problems, at least not without very specialized huge-FFT convolution algorithms, but I think the current preference to avoid the direct approach is a combination of:

- an aversion to FFTs (especially floating point FFTs) for integer computations
- trusting that 'the problem is solved' because of the MN algorithms from 1995 and the fact that Pari implements all the machinery needed
- all the papers on the algebraic square root (except Nguyen's) dating back to the time that hardware was expensive. Cripes, a gig of DDR2 memory is $124 at MicroCenter
- everyone looking at the theoretical best-case complexity of the direct approach and concluding immediately that it wasn't worth pursuing

Especially in this forum, which lives and breathes large-integer multiplies and FFTs, somebody should have tried this a while ago. I must admit that a big reason I decided on the direct approach was that I've spent a long time studying fast FFTs, so the learning curve was much shorter for me.

Back-of-the-envelope calculation: for a 512-bit factorization, suppose you have a 7M x 7M linear system (i.e. what RSA-155 needed). Suppose the matrix uses cycles with an average of 3 relations, and a dependency needs half of those cycles. That means having to multiply about 5 million relations together. Assuming a lattice siever, each relation has a ~40-bit 'a' value, and the leading algebraic poly coefficient has ~25 bits. This means each relation contributes about 65 bits to the coefficients of the product, which after doubling to allow for zero-padding would require an FFT of a little over 40 million 16-bit elements. So an FFT of 2^26 elements should be sufficient. This really is not very big at all; the pi freaks have been computing multi-billion element FFTs for years now :)

jasonp

Last fiddled with by jasonp on 2006-11-23 at 15:11 
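The back-of-the-envelope arithmetic can be replayed directly (all the sizes below are the estimates from this post, assuming the product is packed into 16-bit FFT elements):

```python
relations = 5_000_000             # ~half of 7M cycles at ~3 relations each
bits_per_relation = 40 + 25       # ~40-bit 'a' value + ~25-bit leading coefficient
product_bits = relations * bits_per_relation
elements = product_bits // 16     # packed into 16-bit FFT elements
padded = 2 * elements             # zero-padding doubles the transform length
fft_size = 1
while fft_size < padded:          # round up to a power of two
    fft_size *= 2
print(elements, padded, fft_size) # ~20.3M, ~40.6M, 2**26
```

So the padded product is "a little over 40 million" elements, and the next power of two is 2^26.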

2006-11-23, 16:11  #6  
Nov 2003
2^{2}·5·373 Posts 
Quote:
thousand primes. It was hopelessly slow, and the Newton iteration *frequently* failed to converge. See the comments in "The Development of the Number Field Sieve", edited by Lenstra & Lenstra. There is a discussion of Couveignes' method in the long paper by Buhler, Lenstra, Pomerance. [OTOH, computers 16 years ago were slow]

There is also a comment about using the "prime at infinity" [i.e. the projective limit of what you are trying to do], as well as a comment about how often Newton's method will converge. The bottom line is that it is OK from a theoretical point of view, but totally ineffective in practice. 

2006-11-23, 20:26  #7  
Tribal Bullet
Oct 2004
DCE_{16} Posts 
Quote:
The algorithmic description I'm working from is a combination of Buhler et al. and Per Leslie Jensen's thesis on the implementation of PGNFS; the latter does not discuss convergence of the Newton iteration at all. Honestly, the alternative is so much programming work that I may as well try out the direct method. Almost all the code needed can be reused for the conventional algorithm too.

jasonp 

2006-11-23, 20:59  #8 
Aug 2002
Buenos Aires, Argentina
2×11×61 Posts 
Newton iteration never fails in this context:

Suppose that we have a correct inverse square root of S modulo an odd number m (which will be a power of your prime q). Let us call this number r. Working mod m^2, the number r will have the form

r = t + u*m

where t is the correct inverse square root modulo m^2, i.e. S*t^2 = 1 (mod m^2). This means that

S*r^2 = S*t^2 + 2*S*t*u*m = 1 + 2*S*t*u*m (mod m^2)

Now applying the Newton formula for the inverse of the square root:

r' = r*(3 - S*r^2)/2 = (t + u*m)*(2 - 2*S*t*u*m)/2 = (t + u*m)*(1 - S*t*u*m)
   = t + u*m - S*t^2*u*m = t + u*m - u*m = t (mod m^2)

So the iteration is correct modulo m^2.

Last fiddled with by alpertron on 2006-11-23 at 21:00 
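A quick numeric spot-check of this argument in Python; the values m = 11, S = 5, t = 58, u = 4 are arbitrary small choices for illustration, not from the thread:

```python
m, S, u = 11, 5, 4
M = m * m
t = 58                       # satisfies t*t*S % M == 1, an inverse sqrt mod m**2
assert t * t * S % M == 1
r = t + u * m                # correct mod m, perturbed by u*m mod m**2
assert r * r * S % m == 1
# S*r^2 = 1 + 2*S*t*u*m (mod m^2), as claimed above
assert S * r * r % M == (1 + 2 * S * t * u * m) % M
inv2 = (M + 1) // 2          # 1/2 modulo the odd M
r_next = r * (3 - S * r * r) % M * inv2 % M
assert r_next == t           # one Newton step recovers t exactly mod m**2
```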
2006-11-24, 02:32  #9  
Nov 2003
2^{2}×5×373 Posts 
Quote:
"The polynomial time algorithm of [refs] does a Newton iteration modulo the powers of a single prime ... A final possibility is to make use of the 'infinite prime', as was pointed out to us by V. Miller and R. Silverman. ... For this algorithm the number of starting values to be tried is 2^(d-s-1), where s is 1/2 of the number of real embeddings of K into C...."

You start with a real embedding and guess the starting value of the Newton iteration. The number 2^(d-s-1) is the *expected* number of guesses to try before succeeding. I once heard H. Cohen give a brief explanation of why not all guesses converge, but I do not remember the explanation; it involved some stuff in algebraic geometry with which I was not (and am not now) familiar. It involved the spanning degree of elements within some scheme. The orbits of some elements contained the square root while others did not [under the iterative map induced by the Newton iteration].

And yes, there are a number of conditions that need to be satisfied for the Hensel lifting to succeed. IIRC, the best explanation I saw of this was in Hua Loo Keng's "Number Theory", in the chapter on p-adics.

I did try the direct method a long time ago, as an alternative to Couveignes' CRT method. I wanted something that worked over fields of even degree. I took the code that I wrote for my paper "Parallel Polynomial Arithmetic Over Finite Rings" and adapted it. I started with single primes (ideals) and multiplied them in pairs as polynomials. I then multiplied those together in pairs, etc. When the coefficients reached about 200 digits, I switched to polynomial convolutions (integer based). The final product typically was a quartic or quintic with coefficients near 1.2 million digits. I then guessed a square root approximation via floating point, rounded to integer, and proceeded with Newton-Raphson (using 1/q(x) rather than q(x) directly to avoid the divisions in ordinary NR).

It was done using all-integer code, and a single square root took (typically) about 36 hours to converge (when it did). 
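The pairwise multiplication scheme described here is a product tree; a minimal Python sketch, with integers standing in for the polynomial relations (the helper name `product_tree` is hypothetical):

```python
def product_tree(values):
    """Multiply a list in pairs, then pairs of pairs, and so on,
    so the expensive multiplications always have balanced operands."""
    while len(values) > 1:
        paired = [values[i] * values[i + 1]
                  for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:           # odd leftover passes through unchanged
            paired.append(values[-1])
        values = paired
    return values[0]
```

Balancing the operand sizes is what lets the big products at the top of the tree use fast (FFT-based) multiplication effectively.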

2006-11-24, 15:36  #10  
Tribal Bullet
Oct 2004
2×3×19×31 Posts 
Quote:
jasonp 

2006-11-24, 19:09  #11 
Aug 2002
Buenos Aires, Argentina
2×11×61 Posts 
I don't see the problem; just replace S, r, t and u by polynomials mod m^2 and the reasoning shown above should work.

The only problem I see would occur when the inverse square root does not exist modulo q, which would happen when q divides S; but in this case I think you can divide S by the proper power of q before attempting the Newton iterations.

Last fiddled with by alpertron on 2006-11-24 at 19:10 