mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   Faster factoring with binary splitting? (https://www.mersenneforum.org/showthread.php?t=11266)

__HRB__ 2009-01-03 00:24

Faster factoring with binary splitting?
 
Heuristic for the speedup:

Dividing a n-digit integer m-times by a small integer has the arithmetic complexity of n*m. For sake of simplicity let's say m~n, so the complexity of this process is O(n^2).

Using binary splitting we can reduce this to n*log(n)^2

See, e.g. [URL]http://numbers.computation.free.fr/Constants/Algorithms/splitting.html[/URL]

Therefore, instead of trying whether M(n) is divisible by 2kp+1 we can do the following:

Compute the product P of n possible factors F until P~sqrt(M) using the following method:

P(1,2)=F(1)*F(2) ... P(m-1,m)=F(m-1)*F(m)
P(1,2,3,4)=P(1,2)*P(3,4)
...
P(1,...,m)

Using FFTs for the multiplications, complexity is O(n*log(n)^2).

Then reverse the process to compute the remainders (using FFTs this is also O(n*log(n)^2) ):

R(1,...,m)=M mod P(1,...,m)
...
R(1)=R(1,2) mod F(1)
R(2)=R(1,2) mod F(2)

If any R(m)=0 then F(m) is a factor.

Ballpark analysis for M(3.3E8):

3.3E8/2/64-bit factors ~ 250000

3.3E8/64*log2(3.3E8/64)^2 ~ 2E9 which is in the order of seconds on a 3Ghz processor.

Now, 250000 trial division/second seems pretty good to me. :-)

__HRB__ 2009-01-03 05:29

Maybe I should point out the following:

The file "factor64.asm" in the source code is absolutely littered with div-assembly instructions.

The div instruction with a dividend >64-bit has a latency of 78 clocks on Athlon64s (116 clocks on Core 2!), and to make matters worse, it isn't pipelined.

Therefore, any trial factoring doing x^2 (mod trial_factor) with this instruction will be sloooooooow.

A multiprecision division with Newton's method on the other hand only costs as much as 2 multiprecision multiplies, so the above scheme might not be as silly as it appears at first.

Maybe going all the way to sqrt(M) is too much of a good thing, but I think we could do a 1024-bit bignum (Karatsuba) multiply in the time it takes to complete a division with a divisor>64 bit.

I think trial division could easily see a speedup of 4-8x.

klajok 2009-01-03 15:25

Maybe this comment helps in the discussion:

According to [URL]http://v5www.mersenne.org/various/math.php[/URL] trial dividing is much more efficient than O(n) (where n is number of digits). We have to do only O(log n) operations for small factors. So total cost of trial factoring is O(n log n) [based on your assumption: m ~ n].

__HRB__ 2009-01-03 16:27

[quote=klajok;156647]Maybe this comment helps in the discussion:

According to [URL]http://v5www.mersenne.org/various/math.php[/URL] trial dividing is much more efficient than O(n) (where n is number of digits). We have to do only O(log n) operations for small factors. So total cost of trial factoring is O(n log n) [based on your assumption: m ~ n].[/quote]

That's totally true, but it also depends on the constant implicit in the big-O notation. Just like multiplying numbers with 100s of digits is faster with Karatsuba multiplication than with FFTs, even though Karatsuba is O(n^1.58).

Using the div instruction is O(1), whereas doing trial division for several factors at once with binary splitting is O(log(n)) per factor. If the constant implicit in the O(1) is 100, and the one implicit in O(log(n)) is 10, you break even only at n=22026.

So, doing trial division by products of factors would be faster, if you use a product with less than 22026 factors.


All times are UTC. The time now is 08:21.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.