mersenneforum.org Interested in C/C++ libraries for factoring

 2009-05-25, 18:31 #1 PythonPower   May 2009 100₂ Posts Interested in C/C++ libraries for factoring Hi everyone, I'm researching libraries for fast integer factorization and have been advised that this might be a good place to ask. On the assumption that it is, I'll continue! ;) I will be factoring general numbers, without any special properties, of size up to 10^12. I have quite a lot of them, and the efficiency of their factorization is critical. I have worked out that for my program to be feasible, I would need to factor 12-digit integers in a millionth of a second (or close to it) on average. With this in mind, does that sound possible? If it is, I am interested in C/C++ factorization libraries and which one would be ideal. The integers will be stored as long integers in either C or C++. I can afford around 512MiB of memory during factorization (possibly more). Can anyone recommend a suitable library? Thank you for your time, PythonPower PS: My original question that led me here can be found here.
 2009-05-25, 20:14 #2 sean     Aug 2004 New Zealand 2·3·37 Posts I don't think you need to use a library. Trial division by all primes up to 10^6 will be sufficient to factor your numbers. Precompute the primes and store them (e.g. using the sieve of Eratosthenes); they will easily fit in your 512MiB constraint. Whether it will be fast enough probably depends on your CPU and whether or not you can do multiple numbers in parallel.
 2009-05-25, 20:22 #3 mklasson   Feb 2004 258₁₀ Posts A millionth of a second sounds like a rather short time interval. You'd only have enough time for a thousand or so instructions. I'm not sure, but I think you may have trouble completing your 12-digit factorisations that fast. Hopefully I'm wrong. I don't know what the best method might be, but you could always try the gmp-ecm lib and see what kind of timings you get with ECM. You'll probably need to know a little about both the lib and the method to use it. Maybe the msieve lib is interesting? Someone else can no doubt give you a better answer.
 2009-05-25, 20:40 #4 akruppa     "Nancy" Aug 2002 Alexandria 100110100011₂ Posts GMP-ECM wouldn't meet the time criterion. The only thing I can think of that would satisfy your speed demand is a sieving technique, if the input numbers follow some pattern that allows sieving. Even an implementation of ECM optimized for small numbers would be about an order of magnitude slower than you want. GMP-ECM will be at least another order of magnitude slower. Alex
 2009-05-25, 20:58 #5 R. Gerbicz     "Robert Gerbicz" Oct 2005 Hungary 2·733 Posts My non-public code can factor 12000 numbers (40 bits each) per second using Bernstein's method on one core of an AMD triple-core Phenom at 2.4 GHz; see: http://cr.yp.to/factorization/smoothparts-20040510.pdf. This cost in my code is independent of the numbers; with trial division and a prime test it would be somewhat faster. Last fiddled with by R. Gerbicz on 2009-05-25 at 21:00
 2009-05-25, 21:46 #6 Yamato     Sep 2005 Berlin 2×3×11 Posts Pollard Rho could be a possibility here. To find a factor around 10^6 it will need ~ 10^3 iterations. So you could perform a tiny trial division to remove small factors and then run Pollard Rho. 10^12 fits into a 64-bit integer, so you do not need any library.
2009-05-25, 23:06 #7 R. Gerbicz     "Robert Gerbicz" Oct 2005 Hungary 2·733 Posts Quote: Originally Posted by Yamato Pollard Rho could be a possibility here. To find a factor around 10^6 it will need ~ 10^3 iterations. So you could perform a tiny trial division to remove small factors and then run Pollard Rho. 10^12 fits into a 64-bit integer, so you do not need any library.

Yes, in fact Bernstein's method and Pollard Rho (with Brent's modification) are about the same speed at 40 bits, but at 50 bits, as I remember, Bernstein is faster. That isn't very surprising, because Bernstein's method is polynomial time (if we have got about O(sqrt(N)/log(N)^O(1)) numbers). And note that at 50 bits the number at the root of the remainder tree has about 50 million bits, and GMP's division is currently not optimal in this range; it could be about 4 times faster.

Last fiddled with by R. Gerbicz on 2009-05-25 at 23:10

2009-05-26, 01:14 #8 CRGreathouse   Aug 2006 1011101100001₂ Posts Quote: Originally Posted by R. Gerbicz My non-public code can factor 12000 numbers (40 bits each) per second using Bernstein's method on one core of an AMD triple-core Phenom at 2.4 GHz; see: http://cr.yp.to/factorization/smoothparts-20040510.pdf. This cost in my code is independent of the numbers; with trial division and a prime test it would be somewhat faster.

12000 numbers per second is still 80 times too slow, or 20 times if we're allowed quad cores.

The only primality tests viable at that speed would be trial division, or a pseudoprime test plus a search of a list of the small pseudoprimes. There are 101,629 2-pseudoprimes up to 10^12, so that's 17 steps for a binary search. The list is only 800 kB, so it could fit in cache.

Trial division up to B, a single quick Pollard rho, a pseudoprime test, a binary search if needed, another few rho tests, then trial division up to sqrt(remaining cofactor)? That's a lot to do in very little time.

Of course the trial division can be optimized by combining primes into 32- or 64-bit chunks. (32-bit chunks would allow all but the first division to be 32-bit rather than 64-bit.)

PythonPower: What, if anything, is known about the numbers to be factored? Are they chosen uniformly at random (so lots of small divisors like 2, 3, 5)? Are they all 'hard' composites (semiprimes with roughly equal factors), or arbitrary composites? Knowing things like this might help us determine whether the task can be done.

2009-05-26, 02:36 #9 frmky     Jul 2003 So Cal 7·13·23 Posts Quote: Originally Posted by CRGreathouse 12000 numbers per second is still 80 times too slow, or 20 times if we're allowed quad cores.

How about using a video card? In CUDA you are hampered by the facts that 64-bit integer math is emulated using 32-bit integers and that both mod and division are quite slow, but on a newer card you have 240 "cores" to work with. You can expect > 2GB/s bandwidth to and from the card and 70GB/s on the card, so that shouldn't be a bottleneck at a million factors/s.

2009-05-26, 03:07 #10 CRGreathouse   Aug 2006 3²·5·7·19 Posts Quote: Originally Posted by frmky How about using a video card? In CUDA you are hampered by the facts that 64-bit integer math is emulated using 32-bit integers and that both mod and division are quite slow, but on a newer card you have 240 "cores" to work with. You can expect > 2GB/s bandwidth to and from the card and 70GB/s on the card, so that shouldn't be a bottleneck at a million factors/s.

That would certainly make the resource constraint easier -- you'd have something like half a million cycles instead of two thousand cycles to do the factorization. Of course the operation you'd use the most would be borked, but still...

The 32-bit math wouldn't be too much of a limitation, since most operations could be reduced fairly quickly to 32 bits. The slowness of the division would hurt, though.

 2009-05-26, 07:36 #11 R. Gerbicz     "Robert Gerbicz" Oct 2005 Hungary 2×733 Posts If you've got lots of RAM, then it is possible to precompute all factorizations by sieving up to 10^12. If you store entries only for n with gcd(n,210)=1, recording the index of the smallest prime divisor (or 0 if n is prime), then you need eulerphi(210)/210 · 10^12 · 17 / 2^33 ≈ 453 GB of RAM. (There are 78498 < 2^17 primes up to 10^6, so 17 bits suffice for each index; 2^33 bits make one GB.)


