2002-11-01, 01:42 | #1 |
"Phil"
Sep 2002
Tracktown, U.S.A.
3·373 Posts |
Elliptic Curve Method - the theory
This is just a nutshell introduction to the details of Elliptic Curve Method (ECM) factoring for those who are interested. I used ECM in Prime95 for several months before I learned how it works, but the method is so elegant, I thought curious people might enjoy knowing a little bit more about the algorithm. A more complete introduction to this topic can be found in the book "Rational Points on Elliptic Curves" by Silverman and Tate, 1992, Springer, but if you want to know the details of various implementations, I would suggest looking at some of the original papers.
It is probably easiest to start with the P-1 method, as that was the forerunner of ECM. Fermat's little theorem says that if p is a prime number and p does not divide a, then a^(p-1) = 1 mod p. This means that 1 is the remainder upon dividing a^(p-1) by p, or that a^(p-1) - 1 is divisible by p. If we consider the p-1 non-zero remainders mod p, we find they form a "group" under multiplication, meaning that multiplication is associative, there is an identity represented by the remainder 1, and every remainder c has an inverse c' such that c*c' = 1 mod p. It is this last statement that requires p to be a prime number. A theorem of finite groups says that if a group has n elements and a is any element of the group, then a^n = 1, where ^ means repeated group operations (here, multiplication) and 1 is the identity. Since our group has p-1 elements, a^(p-1) = 1 mod p follows from this group theory theorem. Suppose that m is divisible by p-1, so that m = (p-1)q. Then a^m = a^((p-1)q) = [a^(p-1)]^q = 1^q = 1 mod p.

The idea of P-1 factoring is that if we choose m to be the product of all prime numbers and prime powers smaller than a certain bound B, then if p-1 happens to have all of its prime factors and prime power factors less than B, a^m - 1 will be divisible by p. The problem is that if m is large, a^m is enormous, and we quickly run into a computing problem. But suppose p is an unknown factor of some number N which we are trying to factor. We can compute a^m - 1 mod N, that is, only keep the remainder upon dividing by N at each step, which makes the computation manageable. Since p is a factor of both a^m - 1 and N, we can take the GCD (Greatest Common Divisor) of these two numbers by the Euclidean algorithm and hope that we get a GCD greater than 1 and less than N, which will then be a factor of N.
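As a toy illustration (with small numbers of my own choosing, not from the post): take N = 211 * 223. Since 211 - 1 = 2*3*5*7 is 10-smooth while 223 - 1 = 2*3*37 is not, the bound B = 10 isolates the factor 211:

```python
from math import gcd

N = 211 * 223          # = 47053; pretend we don't know the factors
m = 8 * 9 * 5 * 7      # = 2520, product of all prime powers <= B = 10

a = pow(2, m, N)       # 2^m mod N, never touching the enormous number 2^m
print(gcd(a - 1, N))   # -> 211, since 210 = 2*3*5*7 divides m
```

The GCD picks out 211 because a - 1 is divisible by 211 (210 divides m) but not by 223 (37 does not divide m).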
So take m = the product of q^d over all primes q < B, where d is the integer part of log B / log q, i.e., the largest exponent such that q^d <= B. Then we can pick any base b and compute b^m mod N. Actually, rather than compute m and then b^m, we just cycle through all primes q < B and compute b# = b^(q^d) mod N, then replace b by b# and go on to the next prime. Each exponentiation can be done efficiently by repeated squarings and multiplications in approximately log(q^d) / log 2 operations. GCD operations may be time-consuming, so we put off doing the GCD until we reach B.

One distinct advantage of P-1 factoring on Mersenne numbers is that we know the exponent itself must be a factor of p-1, so in addition to the primes q < B, we also throw in the exponent itself if it is greater than B. For example, M2796203 has the factor P = 17017419583182311, which is equal to 2*5*89*619*11047*2796203 + 1. Since we know that we have to include 2796203 anyway, we really only need to choose B to be at least 11047 to find this factor. For Fermat numbers 2^(2^n) + 1, any factor must be of the form k*2^(n+2) + 1, so we can also throw some extra large powers of 2 into the product m. We will see later that this is an advantage of P-1 that ECM does not have.

To find large factors, we may need to choose B to be large. If N is a large number, we may find it takes a long time to get to B, even with FFT multiplication. In this case, there is a shortcut called stage 2. We now rename B our stage 1 bound B1 and choose a new bound B2 > B1 for the second stage. Remember that stage 1 ended with a number b^m mod N, where m was the product of all primes and prime powers less than B1. Suppose that q is a prime with B1 < q < B2. We would like to compute b^(mq) = (b^m)^q mod N for each such prime q. (From now on, I will omit the mod N from each quantity.) We could do this by exponentiation as in stage 1, but this wouldn't be a shortcut.
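The stage 1 loop just described, together with the stage 2 continuation explained in the next paragraph, might look like this in Python (a toy sketch with function names and test numbers of my own choosing; real implementations use FFT-based modular arithmetic):

```python
from math import gcd

def primes_up_to(n):
    """Sieve of Eratosthenes."""
    s = bytearray([1]) * (n + 1)
    s[:2] = b'\x00\x00'
    for i in range(2, int(n ** 0.5) + 1):
        if s[i]:
            s[i * i::i] = bytearray(len(s[i * i::i]))
    return [i for i in range(2, n + 1) if s[i]]

def pminus1_stage1(N, B1, b=2, extra=()):
    """Raise b to q^d for every prime q <= B1 (d maximal with q^d <= B1),
    working mod N throughout; `extra` holds known factors of p-1, e.g.
    the exponent of a Mersenne number."""
    for q in primes_up_to(B1):
        qd = q
        while qd * q <= B1:    # largest power of q not exceeding B1
            qd *= q
        b = pow(b, qd, N)
    for e in extra:            # e.g. 2796203 for M2796203
        b = pow(b, e, N)
    return b, gcd(b - 1, N)    # one GCD at the end, as in the post

def pminus1_stage2(N, b1, B1, B2):
    """Walk the primes in (B1, B2] using a table of b1^gap for the small
    even prime gaps, accumulating (b1^q - 1) for one GCD at the end."""
    qs = [q for q in primes_up_to(B2) if q > B1]
    gap_pow = {g: pow(b1, g, N) for g in {b - a for a, b in zip(qs, qs[1:])}}
    x = pow(b1, qs[0], N)              # b1^q for the first prime past B1
    acc = (x - 1) % N
    for a, b in zip(qs, qs[1:]):
        x = x * gap_pow[b - a] % N     # one modular multiply per prime
        acc = acc * (x - 1) % N
    return gcd(acc, N)
```

With N = 223 * 509 and B1 = 10, stage 1 finds nothing (223 - 1 = 2*3*37 and 509 - 1 = 4*127 are not 10-smooth), but handing the stage 1 residue to stage 2 with B2 = 50 picks up the single remaining factor 37 of 223 - 1 and reveals 223.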
Stage 2 does this by first computing (b^m)^q1, where q1 is the first prime > B1, and then computing (b^m)^q2, where q2 is the next prime after q1, as (b^m)^q2 = (b^m)^q1 * (b^m)^(q2-q1). The advantage is that (b^m)^q1 has already been computed, and the exponent q2-q1 is the difference between two successive primes, usually of the order ln(q2), and must have one of the values 2, 4, 6, 8, etc. Stage 2 speeds up the process by computing (b^m)^2, (b^m)^4, (b^m)^6, etc. and storing these values. This is why stage 2 requires so much more storage than stage 1, but the precomputation means each step can be accomplished by one multiplication mod N. Actually two multiplications, because we don't want to do an expensive GCD for each q between B1 and B2; we just multiply all the values of (b^m)^q - 1 together and do one GCD at the end.

We get a large increase in speed, but the price we pay is that we only discover our factor p if p-1 has just one factor between B1 and B2, not more than one. For example, M2796203 has the factor 23349981773942355169801, which is equal to 2^3*3^2*5^2*99733*2796203*46516439 + 1. It could have been discovered with B1 set to 100,000 and B2 set to 50,000,000, since it has only one factor (other than 2796203) greater than B1. Since most large numbers tend to have one prime factor which is much larger than all the others, stage 2 processing often finds factors missed in stage 1. We say that a number F is (B1,B2)-smooth if all prime-power factors of F are less than B1, with the exception of possibly one factor between B1 and B2. If we don't find a factor with the P-1 method, the only option is to increase B1 or B2 or both. ECM also works by looking for smooth numbers, but offers additional options.

What is an elliptic curve?
"Most" second degree equations in two variables represent conic sections, where by "most", we mean that we want to exclude certain degenerate cases, such as an equation like y^2 - x^2 = 0 which actually consists of two straight lines. "Most" third or fourth degree equations in two variables can be equipped with a structure which turns them into elliptic curves. (We can exclude bad cases by throwing out the ones where the discriminant of the equation is equal to zero.) In the good cases, we can come up with a way of "adding" points on the curve, that is combining any two points on the curve to get another point on the curve in a way that the combination is associative, there is an identity element, and each element has an inverse. Talking about elliptic curves is simplified by the fact that we can change variables in a way that transforms the equation into what is known as the Weierstrass form: y^2 = x^3 + A*x^2 + B*x + C, where A, B, and C are constants. (Some variations set either A or C to zero.) The points on this elliptic curve are the solutions of this equation along with one more point called the "point at infinity", denoted O. We can pick this point at infinity to be the identity element. Then, the way that we add two points (x1,y1), (x2,y2) on this curve is: Step 1: Find the equation of the line which passes through these two points. If these two points are the same point, take the tangent line to the curve instead. Step 2: This straight line intersects the equation y^2 = x^3 + A*x^2 + B*x + C in three points unless it is a vertical line. Two of these points are (x1,y1) and (x2,y2). Call the third point (x3,y3). Step 3: Define the "group addition law sum" of (x1,y1) and (x2,y2) to be the point (x3,-y3). Note the minus sign in front of y3. Since y^2 = x^3 + A*x^2 + B*x + C is even in y, this point is also on the curve. If the straight line is vertical, define the sum to be the point O at infinity. This will be the case only when (x2,y2) = (x1,-y1). 
(We also need to define the sum of two points when one of them is the point at infinity, but since O is the identity, (x,y)+O = (x,y) and O+O = O finishes the definition.) That's it. That's an elliptic curve: the solutions of y^2 = x^3 + A*x^2 + B*x + C together with the group law. We can let the variables x and y represent real numbers, rational numbers, even complex numbers. Where these groups get interesting from our point of view is when we do our arithmetic mod p for some prime number p.

Remember that multiplication mod p led to a group with p-1 elements. It turns out that doing elliptic curve addition mod p leads to a group with p-1+r elements, where r can be any positive or negative integer between -2*sqrt(p) and +2*sqrt(p). If p has 30 digits, p-1+r also has 30 digits; the first 15 digits are the same as p, while the last 15 are usually different. We don't know p, our unknown factor of N, but again, we can do our group operations mod N and take a GCD at the end to hopefully find p. These group operations are a bit more complicated than simple multiplication mod N, but the method works the same way: choose bounds B1 and B2 and an arbitrary point on the curve, compute this point added to itself m times in stage 1, then take the result and compute it added to itself q times for each prime q between B1 and B2, and we will find p if the group order p-1+r happens to be (B1,B2)-smooth.

Group operations are about 6 times more expensive than straightforward multiplication, but the advantage of ECM is that, if we don't find a factor, rather than increasing the bounds, we can change the group! Remember that the Weierstrass form was y^2 = x^3 + A*x^2 + B*x + C. Different constants A, B, and C lead to different curves, which means different groups and different group orders p-1+r for a given p. So if one group order is not smooth enough, we can change r by randomly changing A, B, and C and try again.
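The chord-and-tangent law above translates directly into code. Here is a minimal sketch (helper names are mine; real ECM implementations use projective or Montgomery coordinates to avoid inversions entirely). The ECM payoff is the failing inversion: when a denominator shares a factor with a composite modulus N, the GCD inside inv_mod is exactly the factor we were after.

```python
from math import gcd

class FactorFound(Exception):
    """Raised when a modular inversion fails; .factor holds gcd(d, N) > 1."""
    def __init__(self, factor):
        self.factor = factor

def inv_mod(d, N):
    g = gcd(d % N, N)
    if g != 1:
        # In ECM this is the moment of success: g is a nontrivial
        # factor of N (or, unluckily, N itself).
        raise FactorFound(g)
    return pow(d, -1, N)           # modular inverse (Python 3.8+)

def ec_add(P, Q, A, B, N):
    """Chord-and-tangent addition on y^2 = x^3 + A*x^2 + B*x + C mod N.
    (C never appears in the formulas.)  None plays the role of O."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % N == 0:
        return None                # vertical line: the sum is O
    if P == Q:                     # tangent slope, from 2y y' = 3x^2 + 2Ax + B
        lam = (3 * x1 * x1 + 2 * A * x1 + B) * inv_mod(2 * y1, N) % N
    else:                          # chord slope
        lam = (y2 - y1) * inv_mod(x2 - x1, N) % N
    x3 = (lam * lam - A - x1 - x2) % N   # the cubic's roots sum to lam^2 - A
    y3 = (lam * (x1 - x3) - y1) % N      # the sign flip from step 3
    return (x3, y3)

def ec_mul(k, P, A, B, N):
    """P added to itself k times, by binary double-and-add."""
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P, A, B, N)
        P = ec_add(P, P, A, B, N)
        k >>= 1
    return R
```

For example, on y^2 = x^3 + 2x + 80 over the integers mod the prime 97, doubling the point (3, 4) with ec_add gives (42, 85), which again satisfies the curve equation; ECM would run ec_mul with a composite N and a stage 1 exponent, catching FactorFound.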
The one disadvantage: if p is a factor of a Mersenne number 2^n - 1, we knew in advance that n was one of the factors of p-1. Here we don't know any of the factors of p-1+r, although it is possible to choose a parametrization so that p-1+r at least has a known small factor of 8, 12, or 16. This fact probably gives the P-1 method an advantage in the 20-25 digit range, where GIMPS has found many new factors of Mersenne numbers, but as factors get larger, ECM definitely has the advantage. I would guess that P-1 is still a viable search method for the 25th and 26th Fermat numbers, F25 and F26, but smaller Fermat numbers will require ECM for discovery of new factors. So far, the largest factor discovered by P-1 has 42 digits, while the largest factor discovered by ECM has 55 digits. See:
http://www.users.globalnet.co.uk/~aads/Pminus1.html
and:
ftp://ftp.comlab.ox.ac.uk/pub/Documents/techpapers/Richard.Brent/champs.txt

The probability of finding an n-digit factor with ECM depends upon the density of smooth n-digit numbers. The table on http://www.loria.fr/~zimmerma/records/ecmnet.html gives suggested numbers of curves to run with given B1, B2 limits to find factors of a given size. Richard McIntosh has submitted a paper to Mathematics of Computation arguing that these numbers are too pessimistic and that fewer curves are needed, but the table still gives a good general idea. In the range of 20-50 digits, it appears that the amount of work needed scales approximately as the one-fifth power of the factor. This means that it takes approximately ten times as much work to find a 35-digit factor as a 30-digit factor, approximately 100 times as much work to find a 40-digit factor as a 30-digit factor, etc.
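The one-fifth-power rule amounts to a one-liner (the function name is mine):

```python
def work_ratio(d1, d2):
    """Relative ECM effort for a d2-digit factor vs. a d1-digit one,
    using the empirical work ~ f^(1/5) scaling for the 20-50 digit range."""
    return 10 ** ((d2 - d1) / 5)

print(work_ratio(30, 35))   # -> 10.0  (each extra 5 digits costs ~10x)
print(work_ratio(30, 40))   # -> 100.0
```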
The work also scales approximately according to the number of digits in the number to be factored, so that running a curve to given (B1,B2) bounds on any Fermat number should take approximately twice as long as running the same curve on the next smaller Fermat number. I'll stop here, but if you find this interesting, there is a good readable introduction to ECM in Crandall and Pomerance's new book, as well as the general introduction to Elliptic Curves in the Silverman and Tate book I mentioned earlier. |
2002-11-02, 00:00 | #2 |
∂^{2}ω=0
Sep 2002
República de California
26741_{8} Posts |
Thanks, Phil, for that nicely readable nontechnical introduction to ECM.
I'd like to make a couple of additional small points by way of clarification, mainly for folks who are new to the theory of finite groups:

- When Phil discusses the smoothness property of the p-1 method, the group order he refers to is the order of the multiplicative group in question, whereas in the case of ECM it is the additive group defined by arithmetic on the elliptic curve.

- If the p-1 stage 1 bound is small enough that the product of primes <= B1 will fit into one's computer memory, one can premultiply all these primes together to get their product m (which needs only O(m*log(m)) work, if one uses a fast multiply algorithm once the partial products get much larger than a computer word) and use a left-to-right binary exponentiation (LRBE) in stage 1. This needs just 2/3 the work of the one-prime-at-a-time approach. The one disadvantage is that if we run a stage 2, fail to find a factor, and then wish to go back and run to a larger stage 1 bound B1', we can't use LRBE for that unless we're willing to start from scratch; if not, we've got to do the continuation via the normal, more expensive powering method. (But we can apply the slower continuation to a stage 1 residue calculated using either method.)

- Stage 2 of p-1 (and ECM as well, if I recall correctly) lends itself to a distributed approach. One uses a single machine (perhaps a fast multiprocessor machine) to generate a stage 1 residue, then farms that out to any number of independent machines, each of which processes its own distinct stage 2 interval(s).

- For really large numbers (i.e. those whose computer representation takes up some significant fraction of the machine memory), memory constraints may rule out ECM, because it needs a lot more memory than p-1. Let's call M the amount of computer storage needed by N, the number to be factored.
This will typically be ~4X the number of bits in N itself, since to do a fast multiply modulo N (the operation which dominates the computer work for both p-1 and ECM), we need to break N into smaller chunks that only take up a fraction of a computer word - a typical implementation puts each ~16-bit chunk of N in a 64-bit word (floating or integer). Then, doing stage 1 of p-1 via LRBE needs memory of only M (in the case of numbers like Fermats or Mersennes, where we know how to do the modular multiply sans zero-padding of the input vectors) or 2M in the general case. Stage 2 of p-1 can use anywhere from (roughly) 4 times this much to hundreds of times as much - the latter is more efficient if there is enough RAM to prevent swapping to disk, but the efficiency gain is not overwhelming (in other words, we can still do a decently fast stage 2 even using the low-memory version). By way of contrast, even the lowest-memory version of ECM stage 1 needs about 15X the memory of p-1 stage 1, and stage 2 needs yet more.

- Phil mentions that the largest factors found by p-1 are only ~40 digits, compared to over 50 digits for ECM. But in fact one of the big reasons ECM has found larger factors (and these large factors are all of numbers which are utterly minuscule compared to the ones GIMPSers routinely find 30-40-digit factors of using p-1) is simple: a heck of a lot more machine time has been spent running ECM. This is because the way most people use ECM is to find small (say < 60 digits) factors of just one or a few numbers they're particularly interested in. Since the numbers in question (except for the larger Fermats, starting with F16 or so) are generally quite small, running p-1 to find a factor doesn't usually save much machine time - if the factor is small enough that it had a smooth p-1, it likely will also be found quite quickly by running a few ECM curves.
On the other hand, if the factor is decently large (again, this is for a single given number N), the odds of it being very p-1 smooth are quite small, and drop exponentially fast as the factor gets bigger. So if shallow p-1 or ECM has failed to turn up a factor and I (for a given amount of machine time) want to maximize my chances of finding a factor of that number, ECM is by far the better way to go, since as p gets bigger the odds of one of many ECM curves yielding an A-smooth group order are much better than those of p-1 being B-smooth, even though B is quite a bit larger than A.

However, there is a different way to compare the two methods, which is more relevant to the way GIMPS does factoring. If I have a large collection of numbers of a type known to frequently have small factors (i.e. we're not talking about the RSA challenge numbers here - those are designed to have two factors each, each factor being about as large as it can be), then, for a given amount of machine time, I will generally find significantly more factors using p-1 than ECM. That is because we can think of a p-1 run as being like a single very fast ECM run, i.e. for the same CPU time we can run it to significantly deeper bounds than we can an ECM curve. p-1 is even more effective if the numbers in question are known to have factors of the form C + 1, where C is some composite with better-than-average chances of being smooth (and by Legendre's theorem, all numbers of the form a^n +- b^n with GCD(a,b) = 1 do in fact have factors of this very form), so for numbers like Mersennes or Fermats it is far more effective at finding large numbers of small factors than ECM is. The main reason GIMPS hasn't yet found a 50-digit-plus factor using p-1 is that nearly all the p-1 work has been done on the big monster numbers, where it simply doesn't pay to set the stage bounds large enough to have a good chance of finding such a big factor.
I hope I don't sound like I'm dissing ECM - it is a wonderfully elegant method, and still the only known one which affords us a good chance of finding large factors of any given number N which is too big for the Number Field Sieve to handle - but the idea that ECM is somehow inherently better at finding large factors than p-1 is to some degree a misconception, based on the way the method is normally applied and the relatively larger amount of machine time spent using it to try to find factors of smallish numbers N.

As a concrete example, consider the Fermat numbers. In the paper "Three new factors of Fermat numbers" by Crandall et al. (available at http://www.perfsci.com/free/techpapers/freepapers.html ), they mention a new 27-digit factor of F13, p27 = 319546020820551643220672513. This factor has p27 - 1 = 2^19 * 51309697 * 11878566851267, i.e. p-1 with a stage 1 bound of 10^8 and a stage 2 bound slightly larger than 10^13 (neither unreasonable, especially if Silverman et al.'s FFT-based stage 2 method is used) would have found the factor. Sure, that's a deep stage 2, but in one sentence they mention having run ECM for 47 days on dedicated hardware (a Dubner Cruncher) to find the factor, yet in the next breath say "The factorizations of p27+-1 explain why Pollard's p+-1 methods could not find the factor p27 in a reasonable amount of time." (Note that the p+1 method is actually due to Williams.) They of course don't mention whether a similar p+-1 effort was expended, but I guarantee you that with a similarly optimized program running on similarly fast hardware, one could find the same factor using p-1 in roughly the same amount of time. (I've run similarly large Mersenne numbers to B1 = 10^9, B2 = 10^12 in far less machine time, and that program didn't use the FFT stage 2 optimization - the latter would've allowed a stage 2 ~100 times deeper in the same machine time - there's your 10^14.)
It's interesting that a month and a half of dedicated crunching is perfectly acceptable for ECM, but somehow seems "unreasonably large" for the p-1 method. In the same paper, the authors describe a new 27-digit factor of F16, p27 = 188981757975021318420037633, with p27 - 1 = 2^20 * 3^2 * 31 * 37 * 13669 * 1277254085461, found by running ECM on 6 SPARCstations for a month. Again, the same factor could have been found in a comparable amount of machine time via p-1. Lastly, for F15 they also found (in about 120 CPU-days on a P200) p33 = 168768817029516972383024127016961, with p33 - 1 = 2^17 * 5 * 7 * 53 * 97 * 181 * 199 * 1331471 * 149211834097. Again, a comparable or lesser amount of p-1 work would have found that factor. Now, of course, for factors > 25 digits or so the odds of a smooth p-1 decomposition drop rapidly, and when the factors in question have no special form, p-1 loses one of its key advantages. But for the aforementioned classes of numbers, if we are trying to maximize our odds of finding 30-40-digit factors, we should definitely give both methods at least a comparable amount of machine time.
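The LRBE variant of p-1 stage 1 described earlier in this post can be sketched as follows (a toy sketch with a base and bounds of my own choosing; the point is that the per-bit multiplies are by the small base, which is what buys the ~2/3 work factor):

```python
from math import gcd

def stage1_lrbe(N, B1, base=2):
    """p-1 stage 1 via one big premultiplied exponent m (the product of
    all maximal prime powers <= B1) and left-to-right binary
    exponentiation: square once per bit of m, multiply by the small
    base only on set bits."""
    # sieve the primes up to B1
    s = bytearray([1]) * (B1 + 1)
    s[:2] = b'\x00\x00'
    for i in range(2, int(B1 ** 0.5) + 1):
        if s[i]:
            s[i * i::i] = bytearray(len(s[i * i::i]))
    # premultiply all maximal prime powers <= B1 into m
    m = 1
    for q in range(2, B1 + 1):
        if s[q]:
            qd = q
            while qd * q <= B1:
                qd *= q
            m *= qd
    # left-to-right binary exponentiation of base^m mod N
    r = 1
    for bit in bin(m)[2:]:
        r = r * r % N          # one squaring per bit of m
        if bit == '1':
            r = r * base % N   # cheap multiply by the small base
    return r, gcd(r - 1, N)
```

The result is identical to the one-prime-at-a-time stage 1; only the order of the work changes, which is why (as noted above) a deeper B1 continuation can't reuse an LRBE run without starting over.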
2002-11-12, 01:11 | #3 |
"Phil"
Sep 2002
Tracktown, U.S.A.
3·373 Posts |
ECM and P-1 factoring
Thanks, Ernst, for your comments and analysis. Your observation explains very nicely why we do not do ECM on GIMPS candidates. Such Mersenne numbers have already been trial factored to 19 or 20 digits, so we are looking for factors most likely to be in the range of 20 to 25 digits, although sometimes we get lucky and find factors up to or beyond 30 digits. The constraint for GIMPS is that we do not want to spend more time factoring than we will end up saving by eliminating candidates by Lucas-Lehmer testing, so naturally we want to find as many factors as possible in a given amount of time. P-1 is a superior factoring method for this group of numbers for three reasons:
1) P-1 is a simpler algorithm and therefore runs to the same stage 1 and stage 2 bounds faster than ECM.

2) We hope to be lucky enough to find a smooth group order. We know "a priori" that the group order for P-1 for a Mersenne number 2^k-1 (k prime) is divisible by 2k. We only know that the group order for ECM is divisible by 12 (or 8 or 16, for other implementations).

3) We are looking for factors most likely to be < 30 digits of a large class of numbers.

If we are looking for larger factors of one particular number, or of a small class of numbers, such as Fermat numbers small enough for ECM/P-1, the advantages of P-1 soon diminish. One always wants to do <some> P-1, and of course I'm not saying you might not get lucky and find a large factor that way, but you certainly would not want to spend a comparable amount of time with P-1 looking for large factors of F12 through F20; probably even through F22, ECM now has a definite edge. Asymptotically, ECM takes an estimated exp(sqrt((2+o(1))(ln f)(ln ln f))) operations to find the factor f. It is the fact that ln f is inside the square root that gives it a big advantage over P-1. In P-1, the size of the largest factor of f-1 determines how much work it is to find f. When f-1 has no special form, we expect the natural log of this largest factor to be, on average, (1-1/e) ln f; i.e., this factor has, on average, about 63% of the digits of f. We usually must run stage 2 of P-1 up to this factor to find f. For Mersenne numbers, we get to reduce this to (1-1/e) ln(f/2k), so the amount of work is approximately proportional to (f/2k)^0.63.
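Dropping the o(1) term, the asymptotic estimate can be evaluated numerically to compare factor sizes; only the relative magnitudes mean anything, and the function name is mine:

```python
from math import exp, log, sqrt

def ecm_cost(digits):
    """Heuristic ECM operation count exp(sqrt(2 * ln f * ln ln f)) for
    a factor f with the given number of digits, o(1) term dropped."""
    ln_f = digits * log(10)
    return exp(sqrt(2 * ln_f * log(ln_f)))

# cost grows subexponentially in the size of the factor
for d in (30, 40, 50):
    print(d, f"{ecm_cost(d):.2e}")
```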
Look at the 9 factors which have been found by ECM since 1988:

p40 of F10: 2^12*3*5639*8231*433639*18840862799165386003967 + 1
p21 of F11: 2^14*3*373*67003*136752547 + 1
p22 of F11: 2^13*7*677*91722533083549 + 1
p19 of F13: 2^17*19*1069660739759 + 1
p19 of F13: 2^19*3*4363*525050549 + 1
p27 of F13: 2^19*51309697*11878566851267 + 1
p33 of F15: 2^17*5*7*53*97*181*199*1331471*149211834097 + 1
p27 of F16: 2^20*3^2*31*37*13669*1277254085461 + 1
p23 of F18: 2^23*29*293*1259*905678539 + 1

Of these nine factors, three (the p21, the larger p19, and the p23) could have been found fairly easily by P-1, requiring a stage 2 bound of less than 10^9. The p40 is clearly out of reach, and the other five would require a very deep stage 2 factoring effort.

I did an experiment on the 27-digit factor of F13: I set 5 866MHz Pentium-3s to work doing ECM on it, running curves with Prime95 with the stage 1 bound B1 set to 250,000 and the stage 2 bound B2 equal to 100 times B1. The computers stopped when they found the factor after 83, 267, 112, 70, and 205 curves respectively, for an average of 147 curves and an average time of 9 hours, 28 minutes per machine. Then I ran P-1 on this same number to B1=40,000,000 and B2=4,000,000,000. Stage 1 took 1 hour, 5 minutes, while stage 2 took 4 hours, 29 minutes. (Okay, I know Prime95 isn't optimized with SSE2 for these numbers, but I thought it still gave a good order-of-magnitude estimate.) In order to find the 27-digit factor, I would have to run stage 2 to 1.18x10^13, which would take around 550 days at that rate. Even with a 100-times improvement in stage 2 speed from an FFT polynomial continuation, that would still be over 5 days. (By the way, I would like to know who has a good implementation of the FFT polynomial stage 2. Does it take a lot of memory?) I did a little computation with Richard McIntosh's program for figuring the odds of a number being smooth.
(By the way, Richard is currently doing distributed stage 2 P-1 work on larger Fermat numbers.) Suppose we have a six-digit Mersenne exponent k, and suppose 2^k - 1 has a 35-digit factor. We would expect to have to run about 1340 ECM curves to B1=1,000,000 and B2=100*B1 to find this factor. Since most of the work in P-1 tends to be in stage 2, we might increase this ratio for P-1 to B2=1000*B1. Then, if we run P-1 to B1=1,000,000 and B2=1000*B1, the odds of a 35 digit factor being smooth enough to turn up are about 1 in 106. Increase B1 to 10,000,000 and the odds go up to 1 in 30. At B1=100,000,000, the odds go up to 1 in 13, and at B1=1,000,000,000, the odds are about 1 in 7. This last bound represents about the same amount of work as the 1340 ECM curves which should have an almost 2/3 chance of finding the factor, but the P-1 work at B1=10^9, B2=10^12 only has a 1 in 7 chance of finding this factor. Go up to a 40-digit factor and the odds for P-1 get even worse! If we are doing ECM for factors of a certain size, we certainly want to do some P-1 work, but the point at which it makes sense to stop is the point where further P-1 work has a smaller chance (per unit of time) of finding the factors than more ECM. The advantage of trying different groups in ECM is a powerful one, which is why it makes more sense to run 1800 curves at B1=1,000,000 in ECM than to run one curve at B1=1,800,000,000. You aren't betting on any one curve being fairly smooth, you are betting on one in 1800 curves being very smooth. By the way, I'm guessing that a fast Athlon could run stage 2 P-1 on F25 up to a stage 2 bound of 500,000,000 within a year. You would probably want to allocate 150MB to 200MB for it. Is anyone game? Phil |