2009-05-25, 18:31   #1
PythonPower
May 2009
416 Posts

Interested in C/C++ libraries for factoring

Hi everyone,

I'm researching libraries for fast integer factorization and have been advised that this might be a good place to ask my questions. On the assumption that it is, I'll continue! ;)

I will be factoring general numbers, without any special properties, of size up to 10^12. I have quite a lot of them, and the efficiency of their factorization is critical. I have worked out that for my program to be feasible, I would need to be able to factor 12-digit integers in a millionth of a second (or close to that) on average. With this in mind, does that sound possible?

If it is, I am interested in C/C++ factorization libraries and which one would be ideal. The integers will be stored as long integers in either C or C++. I can afford around 512 MiB of memory to be used during factorization (possibly more). Can anyone recommend a suitable library?

Thank you for your time,
PythonPower

PS: My original question that led me here can be found here.

2009-05-25, 20:14   #2
sean
Aug 2004
New Zealand
DE₁₆ Posts

I don't think you need a library. Trial division by all primes up to 10^6 will be sufficient to factor your numbers. Precompute the primes and store them (e.g. using the sieve of Eratosthenes); they will easily fit within your 512 MiB constraint. Whether it will be fast enough probably depends on your CPU and on whether or not you can process multiple numbers in parallel.
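
For illustration, a minimal sketch of this approach in C (sieve once, then trial-divide each input). The function names, array sizes and the example number are placeholders, and it assumes the inputs fit in an unsigned 64-bit integer and do not exceed 10^12:
Code:
/* Minimal sketch: sieve the primes below 10^6 once, then factor 64-bit
 * inputs by trial division.  For n <= 10^12, any cofactor that survives
 * the loop must itself be prime (a composite survivor would need two
 * prime factors > 10^6, i.e. would exceed 10^12). */
#include <stdint.h>
#include <stdio.h>

#define LIMIT 1000000

static uint32_t primes[80000];     /* 78498 primes below 10^6 */
static int nprimes;

static void build_primes(void)
{
    static unsigned char composite[LIMIT + 1];   /* zero-initialised */
    for (uint32_t p = 2; p * p <= LIMIT; p++)
        if (!composite[p])
            for (uint32_t m = p * p; m <= LIMIT; m += p)
                composite[m] = 1;
    for (uint32_t p = 2; p <= LIMIT; p++)
        if (!composite[p])
            primes[nprimes++] = p;
}

/* Writes the prime factors of n (with multiplicity) into f[]; returns count. */
static int factor_by_trial_division(uint64_t n, uint64_t f[])
{
    int k = 0;
    for (int i = 0; i < nprimes && (uint64_t)primes[i] * primes[i] <= n; i++)
        while (n % primes[i] == 0) {
            f[k++] = primes[i];
            n /= primes[i];
        }
    if (n > 1)
        f[k++] = n;                /* leftover cofactor is prime */
    return k;
}

int main(void)
{
    build_primes();
    uint64_t f[48];
    int k = factor_by_trial_division(999966000287ULL, f);  /* example input */
    for (int i = 0; i < k; i++)
        printf("%llu ", (unsigned long long)f[i]);
    printf("\n");
    return 0;
}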

2009-05-25, 20:22   #3
mklasson
Feb 2004
2·3·43 Posts

A millionth of a second sounds like a rather short time interval. You'd only have time to execute a thousand or so instructions. I'm not sure, but I think you may have trouble completing your 12-digit factorisations that fast. Hopefully I'm wrong.

I don't know what the best method might be, but you could always try the GMP-ECM library and see what kind of timings you get with ECM. You'll probably need to know a little about both the library and the method to use it. Maybe the msieve library is worth a look too? Someone else can no doubt give you a better answer.
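
If you try the GMP-ECM route, a call could look roughly like the following sketch, which assumes libecm's documented ecm_factor() entry point and default parameters; the example input and the B1 bound are arbitrary, not tuned for 12-digit numbers, and the return-value handling should be checked against ecm.h:
Code:
/* Hedged sketch of calling GMP-ECM as a library (libecm).
 * Compile with:  gcc ecm_try.c -lecm -lgmp */
#include <stdio.h>
#include <gmp.h>
#include <ecm.h>

int main(void)
{
    mpz_t n, f;
    mpz_init_set_str(n, "999985999949", 10);   /* = 999983 * 1000003 */
    mpz_init(f);

    /* NULL requests default parameters; a positive return value should
     * mean a factor was found and written to f. */
    int res = ecm_factor(f, n, 2000.0, NULL);
    if (res > 0)
        gmp_printf("factor found: %Zd\n", f);
    else
        printf("no factor found at this B1\n");

    mpz_clear(n);
    mpz_clear(f);
    return 0;
}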

2009-05-25, 20:40   #4
akruppa
"Nancy"
Aug 2002
Alexandria
9A3₁₆ Posts

GMP-ECM wouldn't meet the time criterion. The only thing I can think of that would satisfy your speed demand is a sieving technique, if the input numbers follow some pattern that allows sieving. Even an implementation of ECM optimized for small numbers would be about an order of magnitude slower than you want. GMP-ECM will be at least another order of magnitude slower.

Alex
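
To make the sieving idea concrete under one possible assumption (which the OP has not confirmed), suppose the inputs were consecutive integers in a block. A rough sketch, with placeholder names and an arbitrary example block:
Code:
/* Sketch of sieving out small prime factors for a block of *consecutive*
 * integers lo, lo+1, ..., lo+len-1.  Whether the real inputs have such a
 * pattern is exactly the open question; this only shows why a pattern
 * helps: each prime p touches only the ~len/p positions it divides,
 * instead of every number being divided by every prime. */
#include <stdint.h>
#include <stdio.h>

/* divide out all powers of p from rem[i] for every i with p | (lo + i) */
static void sieve_prime(uint64_t lo, uint64_t len, uint64_t *rem, uint32_t p)
{
    uint64_t start = (p - lo % p) % p;      /* first index divisible by p */
    for (uint64_t i = start; i < len; i += p)
        while (rem[i] % p == 0) {
            rem[i] /= p;
            /* record (lo + i, p) as a found factor here */
        }
}

int main(void)
{
    enum { LEN = 16 };
    uint64_t lo = 999999000000ULL;
    uint64_t rem[LEN];
    for (int i = 0; i < LEN; i++)
        rem[i] = lo + i;

    /* a real run would loop over all primes up to 10^6 */
    uint32_t some_primes[] = { 2, 3, 5, 7, 11, 13 };
    for (int j = 0; j < 6; j++)
        sieve_prime(lo, LEN, rem, some_primes[j]);

    for (int i = 0; i < LEN; i++)
        printf("%llu -> unfactored part %llu\n",
               (unsigned long long)(lo + i), (unsigned long long)rem[i]);
    return 0;
}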

2009-05-25, 20:58   #5
R. Gerbicz
"Robert Gerbicz"
Oct 2005
Hungary
2·733 Posts

My non-public code can factor 12,000 40-bit numbers in 1 second using Bernstein's method on one core of a triple-core AMD Phenom at 2.4 GHz; see http://cr.yp.to/factorization/smoothparts-20040510.pdf. This cost in my code is independent of the numbers; with trial division and a prime test it would be somewhat faster.
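
For readers who do not want to dig through the paper, the core batch step is computing z = (product of all primes up to the bound) modulo every input at once via a product tree and a remainder tree; gcd(z mod x, x) then exposes the small primes dividing x. The sketch below is one reading of the linked paper, not the poster's code: mpz_primorial_ui, the flat-heap tree layout and the power-of-two batch size are conveniences chosen here, and the final step of peeling off the full smooth part by repeated squaring is omitted:
Code:
/* Sketch of the batch step in Bernstein's method: compute z mod x[i] for a
 * whole batch of inputs with one product tree + one remainder tree, then
 * gcd(z mod x[i], x[i]) reveals the distinct primes <= 10^6 dividing x[i].
 * Requires GMP (link with -lgmp); k is assumed to be a power of two just
 * to keep the tree indexing short. */
#include <stdio.h>
#include <stdlib.h>
#include <gmp.h>

/* Flat heap: node 1 is the root, node j has children 2j and 2j+1,
 * and the k leaves live at indices k .. 2k-1. */
static void batch_remainders(const mpz_t z, mpz_t *x, int k, mpz_t *out)
{
    mpz_t *tree = malloc(2 * k * sizeof(mpz_t));
    for (int j = 0; j < 2 * k; j++)
        mpz_init(tree[j]);

    /* product tree: leaves are the inputs, each parent is the product */
    for (int i = 0; i < k; i++)
        mpz_set(tree[k + i], x[i]);
    for (int j = k - 1; j >= 1; j--)
        mpz_mul(tree[j], tree[2 * j], tree[2 * j + 1]);

    /* remainder tree: reduce z down the same tree */
    mpz_mod(tree[1], z, tree[1]);
    for (int j = 2; j < 2 * k; j++)
        mpz_mod(tree[j], tree[j / 2], tree[j]);

    for (int i = 0; i < k; i++)
        mpz_set(out[i], tree[k + i]);            /* = z mod x[i] */

    for (int j = 0; j < 2 * k; j++)
        mpz_clear(tree[j]);
    free(tree);
}

int main(void)
{
    mpz_t z, g, x[4], r[4];
    mpz_init(z);
    mpz_init(g);
    mpz_primorial_ui(z, 1000000);                /* product of primes <= 10^6 */

    const char *in[4] = { "999985999949", "999999999989",
                          "123456789012", "987654321000" };
    for (int i = 0; i < 4; i++) {
        mpz_init_set_str(x[i], in[i], 10);
        mpz_init(r[i]);
    }

    batch_remainders(z, x, 4, r);

    for (int i = 0; i < 4; i++) {
        mpz_gcd(g, r[i], x[i]);  /* product of the distinct small primes dividing x[i] */
        gmp_printf("%Zd: product of its distinct prime factors below 10^6 = %Zd\n",
                   x[i], g);
    }
    return 0;
}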


2009-05-25, 21:46   #6
Yamato
Sep 2005
Berlin
2×3×11 Posts

Pollard rho could be a possibility here. It finds a factor p in roughly sqrt(p) iterations, so for a factor around 10^6 it needs ~10^3 iterations. So you could perform a tiny trial division to remove small factors and then run Pollard rho.
10^12 fits into a 64-bit integer, so you do not need any library.
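
A bare-bones sketch of this suggestion: classic Pollard rho with Floyd cycle-finding on 64-bit integers (Brent's variant, mentioned below, batches the gcds and is faster). The unsigned __int128 modular multiplication is a gcc/clang assumption, the example semiprime is arbitrary, and the small trial-division pass is left out:
Code:
/* Bare-bones Pollard rho with Floyd cycle-finding, for 64-bit n.
 * Returns a nontrivial factor of a composite n, or possibly n itself on a
 * rare failure, in which case the caller retries with a different c. */
#include <stdint.h>
#include <stdio.h>

static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t n)
{
    return (uint64_t)((unsigned __int128)a * b % n);
}

static uint64_t gcd64(uint64_t a, uint64_t b)
{
    while (b) { uint64_t t = a % b; a = b; b = t; }
    return a;
}

static uint64_t pollard_rho(uint64_t n, uint64_t c)
{
    uint64_t x = 2, y = 2, d = 1;
    while (d == 1) {
        x = (mulmod(x, x, n) + c) % n;      /* tortoise: one step */
        y = (mulmod(y, y, n) + c) % n;      /* hare: two steps */
        y = (mulmod(y, y, n) + c) % n;
        d = gcd64(x > y ? x - y : y - x, n);
    }
    return d;
}

int main(void)
{
    uint64_t n = 999985999949ULL;      /* 999983 * 1000003, a 'hard' 12-digit case */
    uint64_t f = pollard_rho(n, 1);
    printf("%llu = %llu * %llu\n", (unsigned long long)n,
           (unsigned long long)f, (unsigned long long)(n / f));
    return 0;
}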

2009-05-25, 23:06   #7
R. Gerbicz
"Robert Gerbicz"
Oct 2005
Hungary
2·733 Posts

Quote:
Originally Posted by Yamato
Pollard rho could be a possibility here. It finds a factor p in roughly sqrt(p) iterations, so for a factor around 10^6 it needs ~10^3 iterations. So you could perform a tiny trial division to remove small factors and then run Pollard rho.
10^12 fits into a 64-bit integer, so you do not need any library.
Yes, in fact Bernstein's method and Pollard rho (with Brent's modification) run at about the same speed at 40 bits, but at 50 bits, as I recall, Bernstein's method is faster. That isn't very surprising, because Bernstein's method is polynomial-time (provided we have a batch of about O(sqrt(N)/log(N)^O(1)) numbers). And note that at 50 bits the number at the root of the remainder tree has about 50 million bits, and GMP's division is currently not optimal in this range; it could be about 4 times faster.


2009-05-26, 01:14   #8
CRGreathouse
Aug 2006
3²·5·7·19 Posts

Quote:
Originally Posted by R. Gerbicz
My non-public code can factor 12,000 40-bit numbers in 1 second using Bernstein's method on one core of a triple-core AMD Phenom at 2.4 GHz; see http://cr.yp.to/factorization/smoothparts-20040510.pdf. This cost in my code is independent of the numbers; with trial division and a prime test it would be somewhat faster.
12,000 numbers per second is still about 80 times too slow, or 20 times if we're allowed quad cores.

The only primality tests viable at that speed would be trial division, or a base-2 pseudoprime test plus a search through a list of the small base-2 pseudoprimes. There are 101,629 base-2 pseudoprimes up to 10^12, so that's 17 steps for a binary search. The list is only about 800 kB, so it could fit in cache.
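
A sketch of that test, with the pseudoprime table stubbed out to its first few entries (generating the real 101,629-entry list is a separate offline job); the unsigned __int128 modular multiplication is a gcc/clang assumption:
Code:
/* Sketch of a base-2 Fermat test backed by a table of the base-2
 * pseudoprimes below 10^12.  The table here is only a stub. */
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>

static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t n)
{
    return (uint64_t)((unsigned __int128)a * b % n);
}

static uint64_t powmod(uint64_t b, uint64_t e, uint64_t n)
{
    uint64_t r = 1 % n;
    while (e) {
        if (e & 1) r = mulmod(r, b, n);
        b = mulmod(b, b, n);
        e >>= 1;
    }
    return r;
}

/* stand-in for the sorted table of all base-2 pseudoprimes < 10^12 */
static const uint64_t psp2[] = { 341, 561, 645, 1105, 1387 /* ... */ };
static const size_t NPSP2 = sizeof psp2 / sizeof psp2[0];

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* 1 if n is prime, 0 if composite; valid for odd 2 < n < 10^12,
 * given the full pseudoprime table */
static int is_prime_psp2(uint64_t n)
{
    if (powmod(2, n - 1, n) != 1)
        return 0;                       /* fails Fermat: certainly composite */
    /* passes Fermat base 2: prime unless it is a listed pseudoprime */
    return bsearch(&n, psp2, NPSP2, sizeof psp2[0], cmp_u64) == NULL;
}

int main(void)
{
    printf("%d %d %d\n",
           is_prime_psp2(1000003),          /* a prime */
           is_prime_psp2(341),              /* smallest base-2 pseudoprime */
           is_prime_psp2(999985999949ULL)); /* a 12-digit composite */
    return 0;
}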

Trial division up to B, a single quick Pollard rho, a pseudoprime test, binary search if needed, another few rho tests, then trial division up to sqrt(remaining cofactor)? That's a lot to do in very little time.

Of course the trial division can be optimized by combining primes into either 32- or 64-bit chunks. (32-bit chunks would allow all but the first division to be 32-bit rather than 64-bit.)
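
To illustrate the chunking, a sketch with an ad-hoc grouping of the first few odd primes into one 32-bit product (the grouping, names and example input are arbitrary):
Code:
/* Chunked trial division: multiply a few consecutive primes into one
 * product that still fits in 32 bits, do a single 64-bit n % product,
 * then test the 32-bit remainder against each prime with cheap 32-bit
 * divisions.  Since p divides the product, p | n exactly when p | r. */
#include <stdint.h>
#include <stdio.h>

/* one chunk: product of the primes it bundles, plus the primes themselves */
struct chunk {
    uint32_t product;
    uint32_t primes[8];
    int count;
};

/* report (via printf) every prime in the chunk that divides n */
static void test_chunk(uint64_t n, const struct chunk *c)
{
    uint32_t r = (uint32_t)(n % c->product);    /* the only 64-bit division */
    for (int i = 0; i < c->count; i++)
        if (r % c->primes[i] == 0)              /* 32-bit divisions from here on */
            printf("%u divides %llu\n",
                   (unsigned)c->primes[i], (unsigned long long)n);
}

int main(void)
{
    /* 3*5*7*11*13*17*19*23 = 111546435 < 2^32; 2 is handled separately */
    struct chunk c0 = { 111546435u, { 3, 5, 7, 11, 13, 17, 19, 23 }, 8 };
    uint64_t n = 908107200000ULL;               /* arbitrary example input */

    if ((n & 1) == 0)
        printf("2 divides %llu\n", (unsigned long long)n);
    test_chunk(n, &c0);
    return 0;
}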

PythonPower: What, if anything, is known about the numbers to be factored? Are they chosen uniformly at random (so lots of small divisors like 2, 3, 5)? Are they all 'hard' composites (semiprimes with roughly equal factors), or composites of all kinds? Knowing things like this might help us determine whether the task can be done.

2009-05-26, 02:36   #9
frmky
Jul 2003
So Cal
7×13×23 Posts

Quote:
Originally Posted by CRGreathouse
12,000 numbers per second is still about 80 times too slow, or 20 times if we're allowed quad cores.
How about using a video card? In CUDA you are hampered by the facts that 64-bit integer math is emulated using 32-bit integers and that both mod and division are quite slow, but on a newer card you have 240 "cores" to work with. You can expect > 2 GB/s of bandwidth to and from the card and about 70 GB/s on the card, so bandwidth shouldn't be a bottleneck at a million factorizations per second.

2009-05-26, 03:07   #10
CRGreathouse
Aug 2006
3²×5×7×19 Posts

Quote:
Originally Posted by frmky
How about using a video card? In CUDA you are hampered by the facts that 64-bit integer math is emulated using 32-bit integers and that both mod and division are quite slow, but on a newer card you have 240 "cores" to work with. You can expect > 2 GB/s of bandwidth to and from the card and about 70 GB/s on the card, so bandwidth shouldn't be a bottleneck at a million factorizations per second.
That would certainly make the resource constraint easier: you'd have something like half a million cycles instead of two thousand cycles to do each factorization. Of course the operation you'd use the most would be borked, but still...

The 32-bit math wouldn't be too much of a limitation, since most operations could be reduced fairly quickly to 32 bits. The slowness of the division would hurt, though.

2009-05-26, 07:36   #11
R. Gerbicz
"Robert Gerbicz"
Oct 2005
Hungary
10110111010₂ Posts

If you've got lots of RAM, then it is possible to precompute all factorizations by sieving up to 10^12. If you store an entry only for those n with gcd(n,210)=1, holding the index of the smallest prime divisor (or 0 if n is prime), then you need eulerphi(210)/210 * 10^12 * 17 / 2^33 ≈ 453 GB of RAM. (There are 78498 < 2^17 primes up to 10^6, so 17 bits suffice for each index.)
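
For concreteness, the index arithmetic such a table needs might look like the following sketch; the 17-bit bit-packing and the sieve that fills the table are omitted, and the function names are placeholders:
Code:
/* Index arithmetic for the table described above: each block of 210
 * consecutive integers contributes 48 = eulerphi(210) slots, one per
 * residue coprime to 210, so n with gcd(n,210)=1 maps to slot
 * 48*(n/210) + rank(n mod 210). */
#include <stdint.h>
#include <stdio.h>

static int rank_mod_210[210];       /* residue -> 0..47, or -1 if not coprime */

static uint64_t gcd64(uint64_t a, uint64_t b)
{
    while (b) { uint64_t t = a % b; a = b; b = t; }
    return a;
}

static void build_rank(void)
{
    int r = 0;
    for (int i = 0; i < 210; i++)
        rank_mod_210[i] = (gcd64(i, 210) == 1) ? r++ : -1;
    /* r ends at 48 */
}

/* table slot of n, assuming gcd(n,210) = 1 */
static uint64_t slot(uint64_t n)
{
    return 48 * (n / 210) + (uint64_t)rank_mod_210[n % 210];
}

int main(void)
{
    build_rank();
    uint64_t n = 999999999989ULL;   /* coprime to 210 */
    printf("table slot of %llu is %llu\n",
           (unsigned long long)n, (unsigned long long)slot(n));
    printf("total slots up to 10^12: about %.0f\n", 48.0 / 210.0 * 1e12);
    return 0;
}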