mersenneforum.org Implementing Factoring Algorithms

2008-12-11, 10:09   #1
Raman
Noodles

"Mr. Tuch"
Dec 2007
Chennai, India

4E9₁₆ Posts
Implementing Factoring Algorithms

In my leisure time I read about factoring algorithms and try to implement them. So far I have implemented Trial Division and Pollard's Rho, together with an SPRP test for the cofactor, in a Java program. The source code is attached.

When the SPRP test finds that the cofactor is composite, and trial division and Pollard's Rho are unable to find a factor, the program asks the user to input a factor. Perhaps I should implement p-1 and the Quadratic Sieve as soon as possible so that this happens less often; they should not be as hard to implement as ECM or the Number Field Sieve.

The code is better organized than my older version, so that it is easy to add new factoring algorithms: only the function "Factor" has to be modified. The factors are stored in a stack, and the function "display_factors" prints them in the usual manner.

The code is organized into a function for each algorithm:
Trial Divide, Pollard's Rho, GCD, SPRP, Modular Exponentiation, Sum of factors, etc.
In the near future it may be necessary to add routines for:
the Legendre symbol, modular square roots, Euler's phi function, etc.

For learning purposes, as a first step, I am using the BigInteger class in Java. Please don't criticize me for that; later on I will switch to a faster library in C, or write my own big-integer class.
Trial Division and Pollard's Rho take far less execution time than ECM or the Quadratic Sieve will once I implement them, so BigInteger may have to be avoided in those more advanced algorithms.
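For reference, the Pollard's Rho step can be sketched with BigInteger like this (a minimal sketch using Floyd cycle-finding; the attached program's own Factor routine may differ in details):

```java
import java.math.BigInteger;

public class RhoDemo {
    // minimal Pollard rho with the iteration x -> x^2 + c mod n;
    // returns a factor of the odd composite n (possibly n itself on a
    // rare failure, in which case one retries with a different c)
    static BigInteger rho(BigInteger n, int c) {
        BigInteger cc = BigInteger.valueOf(c);
        BigInteger x = BigInteger.valueOf(2), y = x, d = BigInteger.ONE;
        while (d.equals(BigInteger.ONE)) {
            x = x.multiply(x).add(cc).mod(n);   // tortoise: one step
            y = y.multiply(y).add(cc).mod(n);   // hare: two steps
            y = y.multiply(y).add(cc).mod(n);
            d = x.subtract(y).abs().gcd(n);
        }
        return d;
    }

    public static void main(String[] args) {
        System.out.println(rho(new BigInteger("8051"), 1)); // 8051 = 83 * 97
    }
}
```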
Attached Files
 Aliquot.txt (12.9 KB, 345 views)

2008-12-11, 10:40   #2
Raman
Noodles

"Mr. Tuch"
Dec 2007
Chennai, India

4E9₁₆ Posts
p-1

Suppose that I want to implement p-1 as my next algorithm. Stage 1.

Quote:
 From Fermat's Little Theorem, a^(p-1) ≡ 1 (mod p). Further, a^(k(p-1)) ≡ 1 (mod p) for any multiplier k. If we choose a bound B such that p-1 divides LCM(1..B), then a^L - 1 ≡ 0 (mod p), where L is the LCM of all integers from 1 to B. So computing GCD(a^L - 1, N) will give the factor p.
What is the best way to calculate a^L?

1) Suppose that first I calculate the LCM of all numbers up to B and store it in a variable, and then do exponentiation by squaring, say recursively:
Code:
a^L mod p:
if L = 1 then return a
if L mod 2 = 0 then return (a^2)^(L/2) mod p
if L mod 2 = 1 then return a × (a^2)^((L-1)/2) mod p
So if B ~ 10000, L will be tens of thousands of digits long, and the number of squarings needed is O(log2 L). I doubt whether it is practical to store and compute such large values of L.

2) On the other hand, I could do exponentiation by squaring separately for every prime (and, for small primes, their largest power below B). At most log2 B squarings are needed per prime, so if there are Pi(B) primes below B, the total number of squarings is O(Pi(B) log2 B). Which way is more efficient, and are there any further optimizations? If so, please let me know about them.
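A minimal sketch of option (2) in Java, using BigInteger.modPow (the base 2, the tiny inline sieve, and the example bound B are illustrative choices, not part of the original program):

```java
import java.math.BigInteger;

public class PM1 {
    // p-1 stage 1, computed prime-by-prime (option 2 above):
    // raise a to the largest power q^e <= B of each prime q, then take a gcd
    static BigInteger stage1(BigInteger n, int B) {
        BigInteger a = BigInteger.valueOf(2);
        boolean[] composite = new boolean[B + 1];
        for (int q = 2; q <= B; q++) {
            if (composite[q]) continue;
            for (int m = 2 * q; m <= B; m += q) composite[m] = true;
            long qe = q;
            while (qe <= B / q) qe *= q;              // largest q^e <= B
            a = a.modPow(BigInteger.valueOf(qe), n);  // a <- a^(q^e) mod n
        }
        return a.subtract(BigInteger.ONE).gcd(n);     // gcd(a^L - 1, n)
    }

    public static void main(String[] args) {
        // 244133 = 241 * 1013; 241 - 1 = 2^4 * 3 * 5 is 16-smooth, so B = 16 suffices
        System.out.println(stage1(BigInteger.valueOf(244133), 16)); // prints 241
    }
}
```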

2008-12-11, 12:00   #3
fivemack
(loop (#_fork))

Feb 2006
Cambridge, England

47·137 Posts

Dan Bernstein, whose expertise in optimisation is terrifying, recommends computing L first if you're going for best-possible performance; you need an efficient multiplication algorithm, and I'm not sure whether Java BigIntegers use FFT-based multiplication. Compute the product of the primes by sticking them into a priority queue and successively taking the two smallest elements, multiplying them, and putting the product back into the queue.

There are various 'window methods' (I can't find a good article about them) that are a bit more efficient than basic exponentiation-by-squaring; you can't avoid O(log_2 L) squarings, but you can make the number of multiplications by a much smaller at the price of a bit of precomputation. The easiest one is just to work in base 4 or 8, precompute a^(2..7), and use a lookup table.

You probably want to use Montgomery representation for the modular arithmetic, which is another interesting exercise. NB if I read http://docjar.net/html/api/java/math...eger.java.html correctly, BigInteger's modPow algorithm uses both windowing and Montgomery representation.

If you're implementing things for fun, I strongly recommend FFT-based multiplication: it's one of the really magical algorithms.
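The priority-queue product trick can be sketched as follows (a toy illustration; for real p-1 sizes the queue would hold all the prime powers up to B). Keeping the two operands of each multiplication similar in size is what lets fast multiplication pay off:

```java
import java.math.BigInteger;
import java.util.PriorityQueue;

public class PrimeProduct {
    // balanced product: repeatedly multiply the two smallest entries,
    // so operands stay comparable in size throughout
    static BigInteger product(int[] primes) {
        PriorityQueue<BigInteger> pq = new PriorityQueue<>();
        for (int p : primes) pq.add(BigInteger.valueOf(p));
        while (pq.size() > 1) {
            BigInteger a = pq.poll(), b = pq.poll();
            pq.add(a.multiply(b));
        }
        return pq.poll();
    }

    public static void main(String[] args) {
        System.out.println(product(new int[]{2, 3, 5, 7, 11, 13})); // 30030
    }
}
```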
2008-12-11, 12:58   #4
Raman
Noodles

"Mr. Tuch"
Dec 2007
Chennai, India

3·419 Posts

But is it possible to understand the Fast Fourier Transform algorithm without knowing about ordinary Fourier transforms or Laplace transforms? I think the latter two are not related to number theory, right?

How much speedup does the FFT give over multiplying without it? Is it difficult to understand, or to implement?

Quote:
 Originally Posted by fivemack If you're implementing things for fun
I am implementing these algorithms to learn about them.

2008-12-11, 16:41   #5
jasonp
Tribal Bullet

Oct 2004

3543₁₀ Posts

See the Handbook of Applied Cryptography online for more on efficient modular exponentiation.

I think most of the FFT references on the web assume you are interested in signal processing and not number theory. Multiplying two numbers using an FFT involves converting each number into a "signal" and performing a convolution on the two signals using FFTs. This adds a significant amount of overhead, but the asymptotic complexity is so much better than other convolution methods that beyond a certain size of input you have no choice but to use transform arithmetic. The difference is really between 'impossible' and 'a few seconds' once the inputs become large enough.
2008-12-11, 16:53   #6
fivemack
(loop (#_fork))

Feb 2006
Cambridge, England

47·137 Posts

Quote:
 Originally Posted by Raman
Sorry to ask about the FFT without reading about it first. But is it possible to understand the Fast Fourier Transform algorithm without knowing about ordinary Fourier transforms or Laplace transforms? I think the latter two are not related to number theory, right?
Fourier and Laplace transforms come up more in the analytic solution of differential equations; the various theories about Lebesgue measure and existence and uniqueness of solutions are completely irrelevant for doing accelerated convolutions.

All you need to know for the FFT is that you're computing

out[j] = sum_{i=0}^{N-1} omega^{i*j} in[i]

where omega is an Nth root of unity, and that you can play about with this to get it as a combination of two similar things involving the even- and odd-numbered elements of 'in'; I strongly recommend working out a length-8 Fourier transform symbolically with pencil and paper to get what the expression looks like.
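Working from just that definition, the even/odd split can be sketched in Java (a minimal, unoptimized recursive radix-2 sketch; real bignum FFTs are iterative, in-place, and control rounding error carefully):

```java
public class FFT {
    // recursive radix-2 Cooley-Tukey: splits the sum into even- and
    // odd-indexed halves exactly as described above.
    // re/im hold real and imaginary parts; length must be a power of two.
    static void fft(double[] re, double[] im) {
        int n = re.length;
        if (n == 1) return;
        double[] er = new double[n/2], ei = new double[n/2];
        double[] or_ = new double[n/2], oi = new double[n/2];
        for (int i = 0; i < n/2; i++) {
            er[i]  = re[2*i];   ei[i] = im[2*i];    // even-indexed inputs
            or_[i] = re[2*i+1]; oi[i] = im[2*i+1];  // odd-indexed inputs
        }
        fft(er, ei);
        fft(or_, oi);
        for (int k = 0; k < n/2; k++) {
            double ang = -2 * Math.PI * k / n;      // omega^k, omega = e^(-2 pi i / n)
            double wr = Math.cos(ang), wi = Math.sin(ang);
            double tr = wr * or_[k] - wi * oi[k];   // twiddle * odd part
            double ti = wr * oi[k] + wi * or_[k];
            re[k]       = er[k] + tr;  im[k]       = ei[k] + ti;
            re[k + n/2] = er[k] - tr;  im[k + n/2] = ei[k] - ti;
        }
    }
}
```

Transforming the constant input [1,1,1,1] gives [4,0,0,0], which is a handy sanity check when working the length-8 case by hand.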

Quote:
 How much speedup does the FFT give over multiplying without it? Is it difficult to understand, or to implement?
Enormously many times; a single multiplication of numbers the size of those used in an LL test, done by naive methods, would take about as long as the entire LL test done with FFTs.

2008-12-11, 19:40   #7
Raman
Noodles

"Mr. Tuch"
Dec 2007
Chennai, India

2351₈ Posts

Sieve

Hmmm... It looks like the FFT is the key boost for p-1 and ECM, and until I understand it, I cannot implement those algorithms.

Getting to sieving: what is the best way to implement sieve algorithms? Take the Quadratic Sieve. Suppose that there are B primes in the factor base, such that kN is a quadratic residue mod p for every prime p in the factor base. Should I create an array storing x² - kN for (say) M consecutive values of x, starting at sqrt(kN)? In Java it may take time at run time to allocate M elements for an array or a dynamic Vector. In C it can simply be done with

int *a = (int *) malloc(sizeof(int) * m);

or in C++ and Java,

int a[] = new int[m];

Then, a prime p in the factor base divides x² - kN if and only if x ≡ sqrt(kN) (mod p). So I divide by the prime p, marking it as a divisor, for the values x = Kp + sqrt(kN), where sqrt(kN) has two values mod p if the Legendre symbol (kN/p) = 1, and no solution if it is -1.

What about the Sieve of Eratosthenes? For listing all the primes up to (say) 10000, should I create a boolean array for all values up to 10000, and flag the multiples of every prime up to 100? How much better is it than Trial Division, in which for every (odd) number you test all the primes starting with 2 until a factor is found, or until the number is completely factored? In trial division, you test each number against all the primes; in the Sieve of Eratosthenes, for every prime you flag all its multiples up to 10000. To me, one of them is a horizontal process and the other a vertical one. And the Sieve of Eratosthenes occupies more space, for the boolean array.
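The boolean-array sieve asked about above can be sketched in Java as follows; it performs O(n log log n) operations in total, which is far cheaper per prime found than trial-dividing every candidate separately:

```java
public class Eratosthenes {
    // classic sieve: one boolean flag per integer up to limit
    static boolean[] sieve(int limit) {
        boolean[] composite = new boolean[limit + 1];
        for (int p = 2; (long) p * p <= limit; p++)
            if (!composite[p])
                for (int m = p * p; m <= limit; m += p) // start at p^2: smaller
                    composite[m] = true;                // multiples already marked
        return composite;
    }

    public static void main(String[] args) {
        boolean[] c = sieve(10000);
        int count = 0;
        for (int i = 2; i <= 10000; i++) if (!c[i]) count++;
        System.out.println(count + " primes below 10000"); // 1229 primes below 10000
    }
}
```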
2008-12-11, 21:10   #8
jasonp
Tribal Bullet

Oct 2004

3·1,181 Posts

If P-1 or ECM are attempted on large numbers, then arithmetic modulo those large numbers would benefit from using FFTs; but the numbers have to be really large. The P-1 and ECM algorithms themselves don't care whether you use fast transform arithmetic, unless you are performing stage 2 using the advanced techniques in packages like GMP-ECM. Even then you cannot use floating-point FFTs, for technical reasons.
2008-12-11, 21:46   #9
bsquared

"Ben"
Feb 2007

110111111010₂ Posts

Quote:
 Originally Posted by Raman
What about the Sieve of Eratosthenes? For listing all the primes up to (say) 10000, should I create a boolean array for all values up to 10000, and flag the multiples of every prime up to 100?
There are things you can do to make the array of flags smaller. For instance, no even number other than 2 is prime, so you don't need to include space in the flag array for even numbers. This cuts the size in half and has no downside: sieving proceeds as before, and when you're done, the ith array location represents the number 2*i + 3. You can also remove array locations for positions divisible by 3, 5, etc., but each of these makes the sieve process more complicated. Think residue classes.

Sieving can also be sped up by cache blocking, bit packing (I'm not sure how booleans in Java are actually represented in hardware; I'm betting they use 8 bits), and the presieving of small residue classes described above.

There is a good explanation of some of the possible optimizations here, which has some source code to study as well.
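The odd-only layout just described can be sketched like this, assuming limit >= 3 (the index mapping i <-> 2*i + 3 is the only change from the plain sieve):

```java
import java.util.ArrayList;
import java.util.List;

public class OddSieve {
    // half-size layout: array slot i stands for the odd number 2*i + 3
    static List<Integer> primes(int limit) {
        int size = (limit - 3) / 2 + 1;           // odd numbers 3, 5, ..., <= limit
        boolean[] composite = new boolean[size];
        List<Integer> out = new ArrayList<>();
        out.add(2);                               // the lone even prime, stored nowhere
        for (int i = 0; i < size; i++) {
            if (composite[i]) continue;
            int p = 2 * i + 3;
            out.add(p);
            // mark odd multiples starting at p*p; the index of odd m is (m - 3) / 2
            for (long m = (long) p * p; m <= limit; m += 2L * p)
                composite[(int) ((m - 3) / 2)] = true;
        }
        return out;
    }
}
```

For example, primes(100) yields the 25 primes up to 100 while allocating only 49 flags.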

Quote:
 Originally Posted by Raman
How much better is it than Trial Division, in which for every (odd) number you test all the primes starting with 2 until a factor is found, or until the number is completely factored? In trial division, you test each number against all the primes; in the Sieve of Eratosthenes, for every prime you flag all its multiples up to 10000.
Are you asking whether it's better to test a number for primality using trial division or using the Sieve of Eratosthenes? The real answer is neither, for any interestingly sized number. For smaller numbers, trial division is probably much faster.

2008-12-13, 18:56   #10
Raman
Noodles

"Mr. Tuch"
Dec 2007
Chennai, India

3·419 Posts

Have a look at my brief implementation of the Quadratic Sieve.

I take 5 arguments:
N -> the number to be factored
start -> the number whose square starts the sieve
k -> the multiplier in x² - kN
it -> the number of iterations to sieve
FB -> the factor base bound

Further, value[] holds the values of x² - kN,
and residue[] holds the residues left after sieving.

Code:
For i = 0 to it, i = i+1
{ value[i] = (start+i)² - kN
residue[i] = value[i] }

Sieving; p is the prime

p = 2:
if residue[0] mod 2 = 0 then i = 0 else i = 1
while i < it
{ Divide residue[i] by 2 while it is even
i = i+2 }

Other primes:
For p = 3 to FB, p = p+1
{
If p is composite or the Legendre symbol (kN/p) = -1, then continue
i1 = sqrt(kN) mod p // Modular square root is a special function written separately
i2 = p - i1
phase = start mod p // The phase of the first sieve location, mod p

start1 = (i1 - phase) mod p
start2 = (i2 - phase) mod p // The starting indices of the two arithmetic progressions

For j = start1 to it, j = j+p
{ Divide residue[j] by p while it is divisible by p }
For j = start2 to it, j = j+p
{ Divide residue[j] by p while it is divisible by p }
}

For i = 0 to it, i = i+1
{ If residue[i] = 1, then value[i] is smooth: factor value[i] and display its factors }
So, are there any more major optimizations that can be done with this code?
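The modular square root the pseudocode calls has a closed form whenever p ≡ 3 (mod 4): x = a^((p+1)/4) mod p. A sketch of that easy case (the remaining case, p ≡ 1 (mod 4), needs the Tonelli-Shanks algorithm):

```java
import java.math.BigInteger;

public class ModSqrt {
    // square root mod a prime p with p ≡ 3 (mod 4):
    // if a is a quadratic residue mod p, then x = a^((p+1)/4) satisfies x^2 ≡ a (mod p)
    static BigInteger sqrtMod(BigInteger a, BigInteger p) {
        BigInteger e = p.add(BigInteger.ONE).shiftRight(2);  // (p+1)/4
        BigInteger x = a.modPow(e, p);
        if (!x.multiply(x).mod(p).equals(a.mod(p)))
            throw new ArithmeticException("a is not a quadratic residue mod p");
        return x;                                            // the other root is p - x
    }
}
```

For example, sqrtMod(2, 23) returns 18, and indeed 18² = 324 ≡ 2 (mod 23).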

Here is a sample output:
N: 11111111111
start: 105410
k: 1
Number of iterations: 10000
Factor base limit: 100

Output:
Code:
3319514 = 2 * 11^3 * 29 * 43
11966045 = 5 * 7^2 * 13^2 * 17^2
17872925 = 5^2 * 7 * 41 * 47 * 53
83798525 = 5^2 * 17 * 37 * 73^2
157135993 = 7^2 * 31^2 * 47 * 71
272498525 = 5^2 * 13 * 17 * 31 * 37 * 43
410549810 = 2 * 5 * 11 * 29 * 41 * 43 * 73
478703225 = 5^2 * 7 * 11^2 * 13 * 37 * 47
512316233 = 11 * 13^3 * 17 * 29 * 43
516413450 = 2 * 5^2 * 7^2 * 41 * 53 * 97
516629113 = 7 * 11 * 13^2 * 29 * 37^2
558721618 = 2 * 7^3 * 13 * 31 * 43 * 47
572553170 = 2 * 5 * 7 * 37 * 43 * 53 * 97
629911625 = 5^3 * 7 * 17^2 * 47 * 53
774031250 = 2 * 5^6 * 17 * 31 * 47
827073533 = 11 * 13 * 29 * 53^2 * 71
872350850 = 2 * 5^2 * 7 * 31 * 37 * 41 * 53
876730010 = 2 * 5 * 13 * 43 * 47^2 * 71
889215005 = 5 * 7^2 * 17 * 31 * 71 * 97
953714489 = 7^2 * 13^2 * 41 * 53^2
1017065273 = 7 * 17^2 * 71 * 73 * 97
1435353010 = 2 * 5 * 7 * 13 * 17 * 31 * 41 * 73
1904276114 = 2 * 13^4 * 17 * 37 * 53
2175831250 = 2 * 5^5 * 37 * 97^2
Attached Files
 Smooth.txt (8.6 KB, 345 views)

2008-12-16, 13:57   #11
Raman
Noodles

"Mr. Tuch"
Dec 2007
Chennai, India

3×419 Posts

I have made notes of the above algorithm, with one trivial and one non-trivial linear dependency. If anybody has spare time, please check whether all of the notes belong to Dixon's factorization method, or whether some carry over to the Quadratic Sieve. Also let me know if there are any extra concepts to learn for the Quadratic Sieve, such as the Multiple Polynomial Quadratic Sieve and the like, to optimize the algorithm.

The next step will be to implement an algorithm for the linear algebra, to get the dependencies. I can implement Gaussian Elimination with my current knowledge, but is Block Lanczos more efficient for sparse matrices? I have absolutely no idea how it works, although I have read that it is based on continued fraction analysis...

I have thought of two optimizations to the sieving code:
1) Storing the prime factors immediately upon division by a prime. This may consume a lot of additional memory, because the prime factorization takes more space to represent than the number itself.
2) If any residue reaches 1, I can immediately store that index in a hash table, making the lookup of smooth residues much easier, even constant-time, rather than scanning the whole array to check whether each residue is 1.

Perhaps I should also allow partial relations, with one or more large primes not in the factor base. That should help collect more relations in less time, although the probability of the relations merging into perfect squares goes down a little. If more than one large prime is allowed, then Pollard's Rho might indeed be a suitable algorithm for verifying whether the cofactor is prime, and then splitting it.
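For the Gaussian elimination step, here is a toy GF(2) sketch (exponent vectors packed into longs, so at most 64 relations and 64 factor-base primes; real sievers use bit matrices or Block Lanczos). Each row carries an identity tag so the combination of relations forming a dependency can be recovered:

```java
import java.util.ArrayList;
import java.util.List;

public class GF2Elim {
    // Gaussian elimination over GF(2): rows[i] is relation i's exponent
    // vector mod 2, bit c = parity of the exponent of factor-base prime c.
    // Returns tags: bit i of a tag means relation i is in that dependency.
    static List<Long> dependencies(long[] rows, int ncols) {
        int n = rows.length;
        long[] tag = new long[n];
        for (int i = 0; i < n; i++) tag[i] = 1L << i;   // row i starts as {i}
        List<Long> deps = new ArrayList<>();
        int r = 0;
        for (int c = 0; c < ncols && r < n; c++) {
            int pivot = -1;
            for (int i = r; i < n; i++)
                if ((rows[i] >>> c & 1) == 1) { pivot = i; break; }
            if (pivot < 0) continue;                    // no pivot in this column
            long tr = rows[pivot]; rows[pivot] = rows[r]; rows[r] = tr;
            long tt = tag[pivot];  tag[pivot]  = tag[r];  tag[r]  = tt;
            for (int i = r + 1; i < n; i++)             // clear column c below pivot
                if ((rows[i] >>> c & 1) == 1) { rows[i] ^= rows[r]; tag[i] ^= tag[r]; }
            r++;
        }
        for (int i = r; i < n; i++) deps.add(tag[i]);   // zeroed rows are dependencies
        return deps;
    }
}
```

For instance, the three rows 011, 110, 101 XOR to zero, and dependencies(new long[]{3, 6, 5}, 3) reports the single tag 7 = {rows 0, 1, 2}.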

