#12
Nov 2005
2⁴·3 Posts
The second stage of ECM is described at the beginning of Section 7.4.2 in C&P. It is analogous to the second stage of Pollard p-1.
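For readers without C&P at hand, here is a minimal sketch (in Python, with a toy trial-division primality test) of what a naive Pollard p-1 second stage looks like; ECM's second stage is analogous, with the modular powering replaced by elliptic-curve operations on the stage 1 result. The function names and the one-exponentiation-per-prime loop are purely illustrative - real implementations reuse the small differences between consecutive primes instead.

Code:
from math import gcd

def is_prime(n):
    # Toy trial division; adequate for a sketch.
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def p_minus_1_stage2(b, N, B1, B2):
    """b is the stage 1 result a^M mod N (M built from prime powers <= B1).

    For each prime q in (B1, B2], accumulate (b^q - 1) mod N and take a
    single gcd at the end; a factor p is found if p-1 is B1-smooth apart
    from one prime in (B1, B2].
    """
    acc = 1
    for q in range(B1 + 1, B2 + 1):
        if is_prime(q):
            acc = acc * (pow(b, q, N) - 1) % N
    return gcd(acc, N)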
#13
Jul 2005
40₁₀ Posts
Quote:
In fact, on an FPGA *everything* runs in parallel unless you specifically arrange otherwise.

I'm going to try to attach a picture of the design hierarchy so you can see how I've implemented it. The top level module is on the left and is called "main", while the lower level modules are on the right. The "main" module is only for I/O interfacing. "ECM" is the topmost ECM module, and contains the modules GEN_CURVE and ELLIPTIC_MULTIPLY, which correspond to steps 2 and 3 of C&P Section 7.4.2 - which I assume are the stages 1 and 2 referred to above(?)

Ron
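Judging by the module names, GEN_CURVE presumably chooses a random curve and starting point, and ELLIPTIC_MULTIPLY performs the scalar multiplication of stage 1. A hedged software sketch of the curve-generation step, using the plain short-Weierstrass form for simplicity (real ECM implementations typically use a Suyama parametrization and Montgomery-form curves; the names below are illustrative only):

Code:
import random

def gen_curve(N):
    """Pick a pseudo-random curve y^2 = x^3 + a*x + b (mod N) together with
    a point (x, y) that lies on it by construction."""
    x = random.randrange(N)
    y = random.randrange(N)
    a = random.randrange(N)
    b = (y * y - x * x * x - a * x) % N
    return a, b, (x, y)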
#14
"Nancy"
Aug 2002
Alexandria
2,467 Posts
Judging by your comments, I'd say ECM does not work the way you think it works. Try to understand the algorithm well enough to write a software implementation before you attempt it in hardware. Btw, a 704 bit mul/REDC takes about 360 muls on a 64 bit machine, which on an AMD64 ideally takes about 750 clocks - i.e. roughly 250 ns at a 3 GHz clock rate.

Somehow I don't think you thought your cunning plan all the way through.

Alex

Last fiddled with by akruppa on 2006-06-05 at 05:53
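For reference, a "mul/REDC" is one Montgomery multiplication: a full product followed by a Montgomery reduction (REDC). A minimal sketch in Python, using big integers rather than the explicit 11 x 64-bit word loop the ~360-mul count refers to, so the word-level cost is hidden inside Python's arithmetic; the helper names are illustrative, and pow(n, -1, r) needs Python 3.8+:

Code:
def montgomery_setup(n, bits=704):
    # R = 2^704 covers an 11-word x 64-bit operand; n must be odd.
    r = 1 << bits
    n_prime = (-pow(n, -1, r)) % r        # n * n_prime == -1 (mod R)
    return r, n_prime

def redc(t, n, r, n_prime):
    """Return t * R^-1 mod n, assuming 0 <= t < n * R."""
    m = ((t % r) * n_prime) % r
    u = (t + m * n) // r                  # exact: t + m*n == 0 (mod R)
    return u - n if u >= n else u

def mul_redc(a, b, n, r, n_prime):
    # One "mul/REDC": multiply, then reduce; operands stay in Montgomery form.
    return redc(a * b, n, r, n_prime)

Keeping operands in Montgomery form is what lets the expensive modulo be replaced by shifts, multiplies and one conditional subtraction, on an FPGA just as on an AMD64.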
#15
Jul 2004
Potsdam, Germany
3×277 Posts
Have you already tested your implementation on easier targets?
It would be interesting to see your times for, e.g., a 100-digit composite number that consists of two 50-digit primes.
#16
Jul 2005
2³·5 Posts
Quote:

Quote:

If a mul/REDC instruction does a multiply and a modulo, then my 28 microseconds doesn't hold a candle to the AMD64's 250 ns. Some FPGAs have built-in high-speed multipliers, but I doubt they could approach that speed. My only rejoinder is to point out that I can have as many of my 704 bit multiplier/modulo circuits running in parallel on an FPGA as it will accommodate, which for a Spartan-3E is probably around a dozen or so, depending on the FPGA version. In any case, I just can't help feeling that there is a great future for reconfigurable computing hardware in numerical analysis, and thought this would be a good way to get my feet wet.

Quote:

Ron
#17
Jul 2005
40₁₀ Posts
P.S.
Quote:
Ron
#18
"Bob Silverman"
Nov 2003
North of Boston
5·17·89 Posts
Quote:
Step 1 and Step 2 can't "overlap". The output from Step 1 is an input to Step 2.

Step 2 can indeed run in parallel. See my joint paper with Peter Montgomery, "An FFT Extension to the P-1 Factoring Algorithm", to see how it might be done. Basically, one generates two sequences of numbers:

x_1, x_2, x_3, ..., x_n
y_1, y_2, ..., y_k

such that the sequences are co-prime. One then computes product(x_i - y_j), or (using Brent-Suyama polynomials) product(f(x_i) - f(y_j)), and takes a GCD with N. The generation of the sequences and the product is easily partitioned among multiple processors. However, this is NOT an effective way to run ECM in parallel.

Step 1 is very hard to do in parallel. Generally, all that can be done is to do the multi-precision modular arithmetic in parallel. But in step 1 the result of one iteration depends upon the value of the previous iteration.

I do not follow your comment that "ELLIPTIC_MULTIPLY" begins testing a large number of points. In Step 1, we let M = product(p_i ^ a_i), where p_i are successive primes and a_i are small integers. We compute MP on the curve. We do this by computing 2^a P, then 3^b (2^a P), then 5^c (3^b (2^a P)), etc. We stop periodically to take a GCD. Computing the GCD on "a large number of points" will slow the algorithm down a LOT. We compute GCDs very infrequently.

Giving separate curves to separate processors is not "running step 1 in parallel". The best way to run ECM in parallel is to give a single curve to each processor and let each one run independently.

I mean no disrespect, but you have not described what you are doing in a way that makes me feel confident that you know what you are doing. This may simply be a failure on your part to express yourself adequately.
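A sketch of the product/GCD step described above, in its naive quadratic form (the FFT extension evaluates the same product with fast polynomial arithmetic; the function and parameter names here are illustrative):

Code:
from math import gcd

def cross_product_gcd(xs, ys, N, f=lambda t: t):
    """gcd( product over all i, j of (f(x_i) - f(y_j)) mod N , N ).

    xs and ys are the two sequences; f may be a Brent-Suyama polynomial.
    """
    acc = 1
    for x in xs:
        for y in ys:
            acc = acc * (f(x) - f(y)) % N
    return gcd(acc, N)

The partitioning is visible even in this form: each processor can take a slice of xs, form its partial product mod N, and the partial products are multiplied together before the single final GCD.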
#19
Jul 2005
2³·5 Posts
Quote:
http://en.wikipedia.org/wiki/Pipelining

Quote:

I will certainly try to find a copy of your paper with Peter Montgomery, "An FFT Extension to the P-1 Factoring Algorithm", because it sounds very interesting. If anyone happens to have a web link to the paper handy, it would save me some time searching. Although your method may not be a good way to run ECM in parallel on a sequential computer, it might fit nicely into the fabric of an FPGA, which works best when things can be partitioned into tiny pieces.

An important difference between programming an FPGA and programming a conventional computer is that when you spawn a task (invoke a module) on an FPGA, that task/module gets as much computing power as its parent without suspending the parent - at least until there's no more room left on the FPGA. This comes in very handy at times, such as in my Verilog implementation of the extended Euclidean algorithm (part of ECM), in which I have three separate multipliers all running simultaneously. :-)

Quote:

Quote:

It may very well be that I have erred in my implementation of ECM, and I will spend some time looking at how others have done it (if I can find any understandable source code) to see whether what I've done is equivalent or way off the mark. I had been assuming all along that, because it always gives me correct answers and no incorrect answers, I had implemented the algorithm correctly.

In any case, to get back on topic: I would still like to see an example of the Prime95 FFT multiplication source code in order to determine whether or not it might be a good candidate for FPGA implementation.

Regards,
Ron
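As a point of comparison with the Verilog extended-Euclidean module mentioned above, this is roughly what that step looks like in software; in ECM with affine coordinates the inversion either succeeds or, when it fails, the GCD it produces is exactly the factor one is hoping for. (A minimal sketch; the function name is illustrative.)

Code:
def inverse_or_factor(a, N):
    """Extended Euclid: return (a^-1 mod N, None) if gcd(a, N) == 1,
    otherwise (None, g) where g = gcd(a, N) divides N."""
    old_r, r = a % N, N
    old_s, s = 1, 0
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    if old_r != 1:
        return None, old_r          # gcd(a, N) > 1: a divisor of N
    return old_s % N, None          # the modular inverse of a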
#20
"Bob Silverman"
Nov 2003
North of Boston
5·17·89 Posts
Quote:
ROTFLMA. (we need a smiley for this!)
#21
"Nancy"
Aug 2002
Alexandria
2,467 Posts
Quote:
What is commonly referred to as Stage 1 multiplies a point on the curve by a large, highly composite integer. This elliptic curve multiplication is a sequential process for one curve. The modular arithmetic may be parallelised, or you may run stage 1 on several different curves in parallel.

What is commonly referred to as Stage 2 takes the resulting point of stage 1 and tries to cover a single, larger prime in the group order by looking for a match (mod the prime factor you hope to discover) between two lists. This can be done by subtracting the elements of one list from the elements of the other list one pair at a time ("enhanced standard stage 2") or by reformulating the process as arithmetic in (Z/NZ)[X] so one can use efficient algorithms for polynomial arithmetic ("FFT stage 2"). Your comments so far mention nothing that resembles an actual stage 2.

Trying a hardware implementation of ECM should make a very interesting project and learning exercise. Prof. Franke of the Universität Bonn did experiments on this as well; I can try to dig out a paper he did on the subject if you want. But I really don't think you are ready for a hardware implementation yet - you seem to lack a basic understanding of the algorithm.

Crandall & Pomerance give a good description of ECM, with brief explanations of the enhanced standard stage 2 and the FFT stage 2 iirc, in chapter 7 of their book. For the FFT stage 2, Montgomery's thesis "An FFT extension of the Elliptic Curve Method of factorization" is the definitive source. Several algorithms for polynomial arithmetic have been improved since the thesis was written, but the basic concept of the FFT stage 2 is unchanged.

Alex

Last fiddled with by akruppa on 2006-06-05 at 19:37
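To make the stage 1 description concrete: the "large, highly composite integer" is typically the product of all prime powers up to the stage 1 bound B1. A minimal sketch of building that multiplier (illustrative only; the sieve and B1 are the only ingredients):

Code:
def stage1_multiplier(B1):
    """Product of p^e over primes p <= B1, with the largest e such that p^e <= B1."""
    sieve = bytearray([1]) * (B1 + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(B1 ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, B1 + 1, i)))
    k = 1
    for p in range(2, B1 + 1):
        if sieve[p]:
            pe = p
            while pe * p <= B1:
                pe *= p
            k *= pe
    return k

Stage 1 then computes [k]P on the curve mod N, a strictly sequential chain of point operations for that curve, which is why parallelism is usually applied across curves rather than within one.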
#22
Jul 2005
28₁₆ Posts
Quote:
I even precompute the value of p[i] using Maple and "hardcode" it and the number to be factored into the FPGA. The p[i] is used as the multiplier for the elliptic curve points, and for reasons that escape me at the moment, p[i] is divided by two at each iteration (after testing the LSB and taking different actions depending on the result) until it reaches zero, at which point the elliptic multiply returns either the coordinates of the next point or a successful factorization. There is no guarantee that the p's are prime as far as I know, but it would in any case be impossible for a reasonably sized FPGA to compute a lengthy sequence of large primes or to store them in local storage, so it's fortunate that this is (apparently) unnecessary.

It's very difficult to determine the correlation between the original Maple program and the pared-down Verilog version, so obviously I need to go back and review things in more detail. If I have indeed bollixed up the algorithm somehow, I'm puzzled as to why it nonetheless yields correct factorizations.

Quote:

Thanks,
Ron
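The loop described here (test the LSB, act on it, halve the multiplier, repeat until zero) is the standard binary "double-and-add" scalar multiplication: the halving simply walks through the bits of the multiplier from least to most significant. A sketch of the same control flow, with the curve arithmetic left abstract (point_add and point_double stand in for whatever the FPGA modules compute; all names are illustrative):

Code:
def scalar_multiply(k, P, point_add, point_double, identity):
    """Compute [k]P by scanning the bits of k from least to most significant."""
    result = identity                      # the group identity (point at infinity)
    addend = P
    while k > 0:
        if k & 1:                          # test the LSB ...
            result = point_add(result, addend)
        addend = point_double(addend)      # ... double the running point
        k >>= 1                            # "divided by two at each iteration"
    return result

And the multiplier does not need to be prime: for stage 1 it is deliberately a highly composite product of small prime powers, which is consistent with the observation above that primality of p[i] is apparently unnecessary.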