20070324, 23:17  #1 
Mar 2007
4_{16} Posts 
Questions about the QS
Hi,
I wanted to understand how the quadratic sieve works. I have understand the most part of it but I have some questions because I don't understand everything. I am sixteen years, sorry if I maybe I ask some stupid things. I found this forum thanks to the good mersenne wiki :) 1) I understand the process of finding congruence of square, and the idea of finding smooth number so that we will have smaller vector but I don't understand the most important process : the sieving. How does that works ? I understand that we first need a factor base with the Legendre symbol 1 (n is the number i want to factorise, p a number thant (n/p) must be one. But after, how do we check that y's ( y(x) = (floor sqrt n + x)²  n) are smooths ?? 2) Why do we have to find one relation more than the size of the factor base to can resolve our matrix. 3) So my big question is about the smooth numbers and the sieving : / Please help me :) Thank you very much !! 
20070325, 14:04  #2  
Jun 2005
lehigh.edu
2^{10} Posts 
Quote:
explain QS, since, as you say, most explanations omit the sieving. As it happens, I was just explaining this in my math 396, intro crypto, course, so have recently reminded myself of the main points. If we use y(x) for the original QS polyn (as distinct from MPQS or SIQS versions), we're supposed to look at y(x) "mod p", for each prime p in our factor base (actually, I'm used to omitting the "tiny" primes, but that seems to depend upon who one asks). Anyway, we write y(x) = (x  r1)(x  r2) = x^2  (r1 + r2)x + (r1)(r2) where r1 and r2 are between 0 and p1 so that " 2 floor sqrt n" + (r1 + r2) is a multiple of p and ("floor sqrt n")^2 n  (r1)(r2) is a multiple of p. We're supposed to be looking for locations "i" so that y(i) is divisible by "sufficiently many" of the factor base primes we're sieving by. That's where the critical "fast memory access" that distinguishes sieving from other computationally intensive tasks come in, so we start by initiallizing all locations in a sieving interval  say [L, L] = L, (L)+1, ... , 1, 0, 1, ..., L  for the sake of discussion. One might consider just factoring the y(i)'s, but the factor base is supposed to be sufficiently small that y(i) is extremely unlikely to factor completely using only factor base primes. We'd be wasting way too much time; factoring way too many numbers, almost all of which have no chance of working. So here's the Key Fact: a factor base prime p divides y(i) if and only if the location i differs from one of the "mod p roots" r1 or r2 by a multiple of p. Another point, it's not just how many primes occur at a given location, but also their size, so we mark the locations given by the Key Fact with log(p). We only send y(i) off to be factored when the location has the sum of the log(p)'s that occur there sufficiently close to log(y(i)). If we set our "sufficently close" too large, we get too many falsepositives, and spend too much time factoring y(i)'s that don't work  too small, and we miss too many locations for which y(i) would have worked. As the numbers we try to factor get larger, we have to take larger factor base sizes. But, eventually, there's a diminshing return that sets in; further increases give fewerandfewer additional factor base primes, and they're lessandless likely to occur in factorizations. That's one of two reason's why it's also the MultiplePolynomials, in MPQS or SIQS, that's the Key Idea to getting QS to work well. There's not just the initial Quadratic Sieve polyn y(x) that has a given collection of primes as quadratic residues, but a large collection of polyn. So as we get to locations i further away from 0, the y(i) get larger/harder to factor; but since there are other polyn to use (for small i's), we just throw y(x) away, and go on to the next polyn. Even better, we break up the possible polyn into disjoint regions, and send different polyn to different computers, and/or to different people to use on their computers, which is what makes the sieving step a distributed computation (better than parallel; during sieving on a given polyn, there's no communication needed at all between cpus). Hope this helps. As I recall reading the early Math Comp papers (Bob's for one) and the factorbyemail ppmpqs code, there was a clear description of what needed to be done to make sure that all of the polyn used give the same collection of quad residue primes. Perhaps I didn't read closely enough, but iirc I wasn't able to see that being done for the g_a,b(x)'s in the wiki. bd postscript  as the numbers being factored get larger yet, even switching polyn doesn't have enough small primes for the factor base; and nfs comes to the rescue by adding small algebraic primes, as determined by the specific Number Field (the nf of nfs) being used. One of the key steps there  to get from RSA130 to RSA140/RSA155=RSA512 for example  is to pick the nf to have many more small primes than average. IIRC, for the (Lehigh) polyn used for RSA155, we have the usual 25 primes smaller than 100, and expect a random nf to also have 25 primes "smaller" than 100, but the nf used actually managed over 50, more than twice as many as a random nf. Quote:
Last fiddled with by bdodson on 20070325 at 14:10 Reason: typo on sieve interval, and a missing log, inconsistency in notation 

20070325, 17:23  #3  
Tribal Bullet
Oct 2004
2^{2}·881 Posts 
Quote:
You use the sieving technique to fill up an array of log values of y(x). Then you scan linearly through the array, looking for values that are 'close' in some sense to log(y(i)) . In order to do that, you can compute log(y(i)) for all the y(i) you're going to be checking, but that's too much work. A better way for ordinary QS is to compute log(y(i)) every once in a while (say, once every 10000 i values) and use that value as a cutoff for the next bunch of y(i). Any value in the sieve array that is almost this number gets trial factored; the best value of the cutoff should therefore allow a little room below log(y(i)), exactly how much room being implmentation dependent. As for your other question, linear algebra tells you that any time you have a matrix A with N linear equations, if those equations have at least N+1 variables then A*x = 0 has a nonzero solution for x. In the QS case, every row of the matrix represents a prime number, every column represents a relation from the sieving, and x is an array of bits that tell you whether or not a relation is used in the QS square root. If A*x = 0, then some combination of relations produce a product of prime numbers where all the primes have even exponents (this is what a zero vector means), and so the resulting product of relations is a square. Wikipedia has a good worked example of QS that attempts to explain all this. jasonp 

20070326, 17:02  #4  
Mar 2007
100_{2} Posts 
Thank you very much for your replies !
bdodson Thank you for this long reply. But it's a bit to far for me, I want first to understand everything in the basic QS of Pomerance and then I'll look for the MPQS. Now I understand how the sieving works. It's like the sieve of Erathostem but we cross the numbers, and the numbers that are the most crossed are smooth. Or if we divide, the smooth number we'll be the number that at the end five "1" because they were divided by all the numbers in the factor base. But what I don't understand, is how do we implant this in an algorithm based on squares to find the smooth numbers ??? I don't understand how do we get these strange roots. From wikipedia : Quote:
Now, for sieving purposes, solve the congruence x^2\equiv 1817\pmod{p} for each p in the factor base. The square roots are: Mod 2: 1 Mod 7: 2 and 5 Mod 13: 6 and 7 Do you have a piece of code, or some documents that I should read about the smooth numbers, quadratic sieve... ? jasonp Thank you now I understand. (I have to revise the matrixes...) Thank you very much for your help ! Last fiddled with by Carmichael on 20070326 at 17:03 

20070326, 18:08  #5  
Tribal Bullet
Oct 2004
6704_{8} Posts 
Quote:
jasonp Last fiddled with by jasonp on 20070326 at 18:10 

20070407, 16:16  #6 
Mar 2007
2^{2} Posts 
Ok,
Now I understand nearly all about the use and implementation of the quadratic sieve. But if I sieve with the trial division. So I mean, i can use and understand the quadratic sieve just for for short number. Larger numbers need the use of sieving with the Shank Tonelli algorithm. I still don't understand how these modular roots can help me in sieving and finding the smooth numbers y(x) :( Please help my brain 
20070407, 20:05  #7  
Tribal Bullet
Oct 2004
2^{2}·881 Posts 
Quote:


20070409, 12:47  #8  
Nov 2003
2^{5}·233 Posts 
Quote:
everything in detail. Let Q(x) be a [quadratic] polynomial. We seek values of x such that Q(x) is smooth with respect to a fixed set of primes {p1, p2, p3.......} Observe that if p divides Q(x), then it also divides Q(x + kp) for all integers k. Solve Q(x) = 0 mod p. There will be two roots (unless p divides the discriminant). Call these r1 and r2. Note now that Q(r1), Q(r1 + p), Q(r1 + 2p), .... and Q(r2), Q(r2 + p)... are ALL divisible by p. Thus, we add log(p) to sieve locations r1, r1+p, r1+2p, .... r2, r2+p, r2+2p.... for ALL primes in our fixed set of primes. Now observe that if the accumulated sum at any location x is close to the value of log(Q(x)), then Q(x) will factored over our fixed set of primes. Estimating log(Q(x)) is easy. 

20070410, 11:30  #9 
Mar 2007
2^{2} Posts 
Thank you very much for all your help, now I understand the QS algorithm

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Two questions:  Dubslow  GPU Computing  1  20110805 18:22 
5 questions  OmbooHankvald  Factoring  6  20050828 19:31 
Questions  OmbooHankvald  Prime Sierpinski Project  2  20050801 20:18 
LLR questions  OmbooHankvald  Math  6  20050623 11:42 
A few questions :)  xtreme2k  Lounge  59  20021031 06:20 