mersenneforum.org Would like to learn the math behind the factoring programs.

 2006-06-03, 05:30 #1
jasong

"Jason Goatcher"
Mar 2005

DB3₁₆ Posts

Would like to learn the math behind the factoring programs.

I would like to learn the math behind the factoring programs. I am totally ignorant of the way things work, so please forgive the following question if it is stupid. I'm hoping that the factoring algorithms are able to be learned by expanding one's knowledge of algebra. Does this idea have any credibility? I have had an education up to college algebra, and am hoping there are books that can teach me factoring math, assuming my guess about factoring relating to algebra is correct. If I'm totally off my rocker, just tell me and I'll shut up. (Actually, now that I think about it, it's possible I'm correct AND totally off my rocker. Now where did I put my meds?)
2006-06-03, 09:11   #2
xilman
Bamboozled!

May 2003
Down not across

2·3·13·137 Posts

Quote:
 Originally Posted by jasong I would like to learn the math behind the factoring programs.
Seems an entirely sensible question to me, and an entirely sensible plan of attack. I have a slight caveat, though: what high-school students call "algebra" is frequently rather different from what is called "algebra" at university level. Not really a problem, but frequently a cause of misunderstandings.

There certainly are books out there. I would recommend learning number theory, rather than factorization per se, at least for a start. You need a grounding in number theory to get anywhere and books specializing in factorization usually assume at least a basic knowledge of number theory.

My personal favourite introductory book is Niven, Zuckerman & Montgomery, An Introduction to the Theory of Numbers. I've the 5th edition in paperback. There are several other good introductory books; other folk here may post their recommendations, and Amazon will prove a valuable resource.

As for factoring in particular, the ones I'd recommend are Riesel's Prime Numbers and Computer Methods for Factorization and Crandall & Pomerance's Prime Numbers: A Computational Perspective.

You should probably learn some computer science, if you want to get anywhere implementing the algorithms. If the names "Knuth" and "TAOCP" don't mean anything to you, research them immediately!

Beware: you may be shocked by the price of these books if purchased new.

I'm pretty sure I've typed material like this before. Perhaps it should be in a FAQ.

Paul

2006-06-04, 15:07   #3
rdotson

Jul 2005

2³·5 Posts

Quote:
 Originally Posted by jasong I would like to learn the math behind the factoring programs.
I would like to see some of the actual Prime95 source code, with hopefully enough documentation so that I could understand what the various parts do - particularly the assembly language FFT multiplication routine.

I've recently taught myself the Verilog hardware design language and have implemented Field Programmable Gate Array (FPGA) circuits for Fermat's factorization method, Pollard-Rho, and Lenstra's Elliptic Curve factoring method (ECM). Fermat's method is the only design that's actually running on a real FPGA at present (it's factoring the RSA-704 challenge number) because the ECM design for RSA-704 is too large to fit into any FPGA that I can afford.

Meanwhile I have some time on my hands and am looking for some other project with which to entertain myself. I have a hunch that the FFT portion of the Prime95 code would be an ideal candidate for an FPGA based hardware accelerator board (if one hasn't been designed and built already). I have both Spartan3 and ProAsic3E FPGA development boards to work with, but no idea where to obtain the Prime95 source code.

By the way, does anyone have any idea how many eons I should expect to wait for the Lattice FPGA running Fermat's method at 1.16 microseconds per iteration on the RSA-704 challenge number to find the solution?

Ron

2006-06-05, 00:58   #4
jasonp
Tribal Bullet

Oct 2004

DD2₁₆ Posts

Quote:
 Originally Posted by rdotson I would like to see some of the actual Prime95 source code, with hopefully enough documentation so that I could understand what the various parts do - particularly the assembly language FFT multiplication routine.
The source is available somewhere on mersenne.org, which appears to be down at the moment. It's not the best resource for learning about FFTs though (and makes no claims to that effect).

You could get started by checking out the many papers on building an FFT into dedicated hardware, then estimate what it would take to convert one of those algorithms to use double precision floating point, and use that to estimate how much parallelism you would need at ~100MHz to beat a Pentium at 3000MHz.

The above sounds cynical, but I'd actually like to know the answers to those questions. Did the references I pointed out previously help at all?
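A back-of-envelope version of that clock-ratio estimate can be sketched in a few lines; the throughput figures here are assumptions for illustration only, not measurements of any real chip:

```python
# Rough parallelism estimate: how many 100 MHz floating-point units an
# FPGA would need just to match a 3000 MHz CPU on raw operation count.
# The flops-per-cycle figure is an assumed value, not a measurement.
cpu_hz = 3000e6               # 3000 MHz Pentium
cpu_flops_per_cycle = 2       # assumed: one packed-double SSE2 op per cycle
fpga_hz = 100e6               # ~100 MHz FPGA fabric clock
units_needed = cpu_hz * cpu_flops_per_cycle / fpga_hz
print(int(units_needed))      # 60
```

Of course this ignores memory bandwidth, pipelining, and routing overhead in both directions; it only bounds the scale of parallelism the question implies.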

jasonp

2006-06-05, 05:27   #5
dsouza123

Sep 2002

2×331 Posts

Quote:
 1.16 microseconds per iteration on the RSA-704
Wow, roughly 862,000 iterations per second!

Is that a fixed time or an average? Does the time depend on how close the difference is to the square root of RSA704?

What percent of differences (i.e. potential squares) do you evaluate (do a square root on)?

Later today I'll try to figure out the maximum amount of time it will take.
Of course the minimum could be in the next week, day, hour, minute or second.

Dan

 2006-06-05, 05:44 #6
dsouza123
Sep 2002

2·331 Posts

I figure you only need to evaluate 1/4 of the differences: ones that are 9 mod 16 (i.e. those whose low four bits, value AND 15, equal 9).
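The 1/4 figure can be checked directly: a perfect square can only fall in one of four residue classes mod 16, so a bitwise AND with 15 rejects three quarters of candidates before any square-root work. (Which single class survives, here 9, depends on RSA-704's own value mod 16.) A quick sketch:

```python
# The only possible values of a perfect square mod 16: enumerate k^2 & 15
# for every residue k. Any candidate outside this set cannot be a square.
square_residues = sorted({(k * k) & 15 for k in range(16)})
print(square_residues)   # [0, 1, 4, 9] -- 4 of the 16 classes, i.e. 1/4
```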
2006-06-05, 08:36   #7
rdotson

Jul 2005

101000₂ Posts

Quote:
 Originally Posted by jasonp The source is available somewhere on mersenne.org, which appears to be down at the moment.
I know the source code used to be available, but I looked around for it a while back and was unable to find it. Maybe I just wasn't looking in the right place(?). The version I downloaded years ago is dated 2001, so I'm sure it's hopelessly out of date by now. Here are some of the older Prime95 source files I would like to obtain recent copies of: factor64.asm, xfft1.mac, xfft2.mac, xfft3.mac, etc.

Quote:
 Originally Posted by jasonp You could get started by checking out the many papers on building an FFT into dedicated hardware, then estimate what it would take to convert one of those algorithms to use double precision floating point, and use that to estimate how much parallelism you would need at ~100MHz to beat a Pentium at 3000MHz.
Actually that's already been done in at least one application, and the VLSI implementation was about 35 times faster than the software floating point FFT; I would expect the VLSI advantage to increase even more as the operand size increases. See "An Optimum Design of FFT Multi-Digit Multiplier and Its VLSI Implementation" by Syunji Yazaki and Koki Abe at http://almond.cs.uec.ac.jp/papers/pd...yunji-kiyo.pdf Also, I wouldn't be the least bit surprised if there is an open source floating point FFT implementation skulking about somewhere, free for the taking, such as those at http://www.opencores.org

I have ample textbooks describing the FFT multiplication method step by step, but I'm more interested in finding out exactly where and how I might incorporate the technique to improve the speed of Prime95 (or other algorithms) that use a software FFT multiplier. In other words, I only want to carve out a very small specific module or subroutine(s) within the greater Prime95 program that would benefit from FPGA implementation rather than trying to implement the entire thing in an FPGA essentially from scratch. That also has the advantage of giving me immediate and accurate performance numbers to compare against the existing software implementation.

Quote:
 Originally Posted by jasonp The above sounds cynical, but I'd actually like to know the answers to those questions. Did the references I pointed out previously help at all?
I too would like to know the answers to those questions Jason, but my approach is to just build it and then compare the actual performance numbers. I can get some timing figures from the Verilog simulator, but not for any realistic data because the simulator is incredibly slow compared to real time. It's also more than just a matter of how much parallelism would be needed, because you have to keep in mind that unlike a conventional general purpose sequential computer with a predetermined bus width, etc., an FPGA is a totally blank slate that allows the designer to implement not only a particular algorithm, but to design an entire computer architecture to suit the algorithm rather than the other way around. It's almost like comparing apples and oranges. For example, the bus width on my ECM design is parameterizable at compile/synthesis time so that I can very easily set it to whatever bus width is needed. That way everything is always "single-precision" even if the single-precision numbers are 704 bits wide.

The references you supplied in a previous thread were very interesting, but I got bogged down in all the implementation details. In order for an application to be suitable for FPGA implementation, the dataset the FPGA works with at any given time must be relatively small, local, and the task should be computationally intensive. Otherwise data transfer bandwidth issues would erase any potential gains from using an FPGA. I would like to find some small portion of Prime95 that is reasonably well isolated (encapsulated) from the rest of the program and that has the qualities mentioned above.

Ron

2006-06-05, 09:33   #8
rdotson

Jul 2005

2³×5 Posts

Quote:
 Originally Posted by dsouza123 Wow, roughly 862000 iterations per second ! Is that a fixed time or an average ?
It's an average. Since I don't have a fast oscilloscope, I designed the circuit so that each time it completes a Fermat iteration it increments a 32 bit counter by one. I display the upper bits (2^31 through 2^25) of the count in the LEDs on the development board. I used a handheld stopwatch to time how long it took for one of the high order bits (2^26 IIRC) to toggle from "off" to "on," then I divided the time on the stopwatch by 2^26 to come up with the 1.16 microsecond per iteration figure.
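The arithmetic behind that measurement is simple to sketch. The stopwatch reading below is hypothetical (not stated in the post), chosen only to show how the 1.16 µs figure falls out of the bit-toggle method:

```python
# Counter bit 2^26 toggles after 2^26 iterations, so dividing the wall
# time by that count gives the average per-iteration time.
stopwatch_seconds = 77.9           # hypothetical stopwatch reading
iterations = 2 ** 26               # iterations for bit 2^26 to toggle
per_iteration_us = stopwatch_seconds / iterations * 1e6
print(round(per_iteration_us, 2))  # 1.16
```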

Quote:
 Does the time depend on how close the difference is to the square root of RSA704 ?
The time for each iteration is independent of the "closeness" of the prime factors to the square root of RSA704, but everything I've read says that the overall algorithm terminates much more quickly if the two factors are "close" together (and also perforce close to the square root of RSA704).

Quote:
 What percent of differences, (ie potential squares), do you evaluate, (do a square root on) ?
None. I use the classic Fermat algorithm that only incorporates addition and subtraction. See Algorithm C in Knuth's Vol 2 of "The Art of Computer Programming" for details.
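For readers without Knuth to hand, a minimal Python sketch of that addition/subtraction-only Fermat loop (after Algorithm C in TAOCP Vol. 2; assumes n is odd and composite) might look like this:

```python
from math import isqrt

def fermat_factor(n):
    # Maintain the invariant r = a^2 - b^2 - n, where x = 2a + 1 and
    # y = 2b + 1 are the next odd increments of a^2 and b^2. Each step
    # is a single addition or subtraction; no square roots are taken.
    a = isqrt(n)
    x, y, r = 2 * a + 1, 1, a * a - n
    while r != 0:
        if r > 0:
            r -= y   # grow b: b^2 increases by 2b + 1
            y += 2
        else:
            r += x   # grow a: a^2 increases by 2a + 1
            x += 2
    # n = (a - b)(a + b) with a = (x - 1)/2, b = (y - 1)/2
    return (x - y) // 2, (x + y - 2) // 2

print(fermat_factor(8051))   # (83, 97)
```

The step count grows with the gap between the factors and the square root of n, which is why the method is practical only when the two factors are close together.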

Quote:
 Later today I'll try to figure out the maximum amount of time it will take. Of course the minimum could be in the next week, day, hour, minute or second.
That would be great, Dan. Although I can't help taking a peek at the FPGA LEDs every once in a while (they're programmed to flash and do a little "dance" if a solution is found), I expect the development board will have disintegrated into dust in the time required to complete the factorization by Fermat's method. Nonetheless, it gives me a feeling of accomplishment to know that I went from knowing virtually nothing about FPGA programming, to having designed and tested a real working FPGA circuit.

Ron

2006-06-05, 13:07   #9
jasonp
Tribal Bullet

Oct 2004

DD2₁₆ Posts

Quote:
 Originally Posted by rdotson Actually that's already been done in at least one application and the VLSI implementation was about 35 times faster than the software floating point FFT, but I would expect to see the VLSI advantage increase even more as the operand size increases. See: "An Optimum Design of FFT Multi-Digit Multiplier and Its VLSI Implementation" by Syunji Yazaki and Koki Abe at http://almond.cs.uec.ac.jp/papers/pd...yunji-kiyo.pdf Also, I wouldn't be the least bit surprised if there isn't an open source floating point FFT implementation skulking about somewhere free for the taking such as those at http://www.opencores.org
This is an interesting paper; am I correct that it describes an ASIC implementation? Note that Pentiums are more than twice as fast now, gcc is a lot better, and it's not clear from the paper if they compiled FFTW with SSE2 optimizations. The current version of FFTW also has a better FFT generator than 3.0.1 did. I guess the point is that the state of the art is a furiously moving target; still, even 2x faster than the fastest Pentium would be very cool, now that we seem to have shifted to using more cores instead of faster cores.

The last time I looked, opencores had verilog for an IEEE floating point unit but not for an FFT. Admittedly that was years ago.
Quote:
 I have ample textbooks describing the FFT multiplication method step by step, but I'm more interested in finding out exactly where and how I might incorporate the technique to improve the speed of Prime95 (or other algorithms) that use a software FFT multiplier. In other words, I only want to carve out a very small specific module or subroutine(s) within the greater Prime95 program that would benefit from FPGA implementation rather than trying to implement the entire thing in an FPGA essentially from scratch. That also has the advantage of giving me immediate and accurate performance numbers to compare against the existing software implementation.
Google found what looks like a mersenne.org mirror:

http://lettuce.edsc.ulst.ac.uk/gimps/

Their most recent source collection is for v24.14 (last August)

From memory, the core inner loops in prime95 implement a fairly standard radix-4 real-valued FFT. All the fun seems to depend on using the macro in a way that fully exploits caches, write buffers, prefetching and other things that dedicated hardware doesn't have to worry about :)
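As a toy illustration of the FFT-multiplication idea under discussion (this is not Prime95's algorithm: Prime95 uses balanced digits, an irrational-base weighted transform, and careful error bounds), multiplying two integers via a floating-point FFT can be sketched as:

```python
import cmath

def fft(v, invert=False):
    # Recursive radix-2 FFT over Python complex numbers.
    n = len(v)
    if n == 1:
        return v[:]
    even, odd = fft(v[0::2], invert), fft(v[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def fft_mul(a, b, base=10**4):
    # Split each integer into base-10^4 digits, convolve the digit
    # vectors via FFT, then round and recombine. Coefficients must stay
    # far below 2^53 for the final rounding to be exact.
    def digits(n):
        d = []
        while n:
            d.append(n % base)
            n //= base
        return d or [0]
    da = [complex(x) for x in digits(a)]
    db = [complex(x) for x in digits(b)]
    size = 1
    while size < len(da) + len(db):
        size *= 2
    da += [0j] * (size - len(da))
    db += [0j] * (size - len(db))
    fa, fb = fft(da), fft(db)
    prod = fft([x * y for x, y in zip(fa, fb)], invert=True)
    return sum(int(round((c / size).real)) * base**i
               for i, c in enumerate(prod))

print(fft_mul(123456789, 987654321) == 123456789 * 987654321)  # True
```

The hard part of an FPGA (or assembly) version is exactly what the paragraph above describes: it isn't the butterfly arithmetic but feeding it, i.e. the memory access pattern.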

p4notes.doc is also a very interesting read.

jasonp

Last fiddled with by jasonp on 2006-06-05 at 13:09

2006-06-05, 14:20   #10
R.D. Silverman

Nov 2003

2²×5×373 Posts

Quote:
 Originally Posted by rdotson I've recently taught myself the Verilog hardware design language and have implemented Field Programmable Gate Array (FPGA) circuits for Fermat's factorization method, Pollard-Rho, and Lenstra's Elliptic Curve factoring method (ECM). Fermat's method is the only design that's actually running on a real FPGA at present (it's factoring the RSA-704 challenge number ) because the ECM design for RSA-704 is too large to fit into any FPGA that I can afford. Meanwhile I have some time on my hands and am looking for some other project with which to entertain myself. I have a hunch that the FFT portion of the Prime95 code would be an ideal candidate for an FPGA based hardware accelerator board (if one hasn't been designed and built already). I have both Spartan3 and ProAsic3E FPGA development boards to work with, but no idea where to obtain the Prime95 source code. By the way, does anyone have any idea how many eons I should expect to wait for the Lattice FPGA running Fermat's method at 1.16 microseconds per iteration on the RSA-704 challenge number to find the solution? Ron
I heartily applaud your hardware efforts. They merit strong kudos.

But trying to do RSA704 with ECM, Pollard-Rho, or Fermat's method (except to test the hardware) will get you labelled as a kook in a hurry. These methods are hopeless for this number.

Implementing the lattice sieve on an FPGA is a VERY VERY hard problem. I have strong doubts whether FPGAs can be used effectively. And what do you mean by "per-iteration" for the lattice sieve??? The time will be dependent on factor base size.

If you mean a single lattice point addition every 1.16 usec (along with all the overhead), then I am afraid that this will not be competitive with current PC based implementations. My laptop, for example (1.5GHz Pentium M), does an entire special-q value on a 14M x 7M array (98 million lattice points) using a factor base of 1.8 million primes per polynomial in just 55 seconds. This is twice as fast as your 1.16 usec result. The time includes all the overhead to do the lattice reduction, compute the start points, do the sieving, and to compute the actual factorizations of the smooth norms.

My lattice sieve code could do the sieving for RSA704 in "about" 8 months on a collection of "about" 400 Pentium-4 PCs (at about 3GHz each) [this estimate is within a factor of 2]. Dealing with the matrix would be a MAJOR problem.

 2006-06-05, 14:53 #11
dsouza123
Sep 2002

2·331 Posts

Maximum iterations, with the bigger of the two factors of RSA704 being 2^352 − 1:

17675749973855171424998162026114852226112656948184937770657347111317090767408538639881686924493222509658

Years:

649728432126397408326294393435914916923044910255993098776919748305569032188566456963227775

Last fiddled with by dsouza123 on 2006-06-05 at 14:54
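That conversion from iterations to years checks out, assuming the quoted 1.16 µs per iteration and 365.25-day years:

```python
# Worst-case iteration count quoted above, at 1.16 us each, in years.
iterations = 17675749973855171424998162026114852226112656948184937770657347111317090767408538639881686924493222509658
seconds = iterations * 1.16e-6
years = seconds / (365.25 * 24 * 3600)
print(f"{years:.3e}")   # 6.497e+89, matching the year figure above
```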

