![]() |
|
|
#1 |
|
Dec 2008
Boycotting the Soapbox
24·32·5 Posts |
Heuristic for the speedup:
Dividing a n-digit integer m-times by a small integer has the arithmetic complexity of n*m. For sake of simplicity let's say m~n, so the complexity of this process is O(n^2). Using binary splitting we can reduce this to n*log(n)^2 See, e.g. http://numbers.computation.free.fr/C...splitting.html Therefore, instead of trying whether M(n) is divisible by 2kp+1 we can do the following: Compute the product P of n possible factors F until P~sqrt(M) using the following method: P(1,2)=F(1)*F(2) ... P(m-1,m)=F(m-1)*F(m) P(1,2,3,4)=P(1,2)*P(3,4) ... P(1,...,m) Using FFTs for the multiplications, complexity is O(n*log(n)^2). Then reverse the process to compute the remainders (using FFTs this is also O(n*log(n)^2) ): R(1,...,m)=M mod P(1,...,m) ... R(1)=R(1,2) mod F(1) R(2)=R(1,2) mod F(2) If any R(m)=0 then F(m) is a factor. Ballpark analysis for M(3.3E8): 3.3E8/2/64-bit factors ~ 250000 3.3E8/64*log2(3.3E8/64)^2 ~ 2E9 which is in the order of seconds on a 3Ghz processor. Now, 250000 trial division/second seems pretty good to me. :-) Last fiddled with by __HRB__ on 2009-01-03 at 01:09 Reason: changed 'factors' to 'possible factors' |
|
|
|
|
|
#2 |
|
Dec 2008
Boycotting the Soapbox
24×32×5 Posts |
Maybe I should point out the following:
The file "factor64.asm" in the source code is absolutely littered with div-assembly instructions. The div instruction with a dividend >64-bit has a latency of 78 clocks on Athlon64s (116 clocks on Core 2!), and to make matters worse, it isn't pipelined. Therefore, any trial factoring doing x^2 (mod trial_factor) with this instruction will be sloooooooow. A multiprecision division with Newton's method on the other hand only costs as much as 2 multiprecision multiplies, so the above scheme might not be as silly as it appears at first. Maybe going all the way to sqrt(M) is too much of a good thing, but I think we could do a 1024-bit bignum (Karatsuba) multiply in the time it takes to complete a division with a divisor>64 bit. I think trial division could easily see a speedup of 4-8x. Last fiddled with by __HRB__ on 2009-01-03 at 05:43 |
|
|
|
|
|
#3 |
|
Dec 2008
11 Posts |
Maybe this comment helps in the discussion:
According to http://v5www.mersenne.org/various/math.php trial dividing is much more efficient than O(n) (where n is number of digits). We have to do only O(log n) operations for small factors. So total cost of trial factoring is O(n log n) [based on your assumption: m ~ n]. |
|
|
|
|
|
#4 | |
|
Dec 2008
Boycotting the Soapbox
24·32·5 Posts |
Quote:
Using the div instruction is O(1), whereas doing trial division for several factors at once with binary splitting is O(log(n)) per factor. If the constant implicit in the O(1) is 100, and the one implicit in O(log(n)) is 10, you break even only at n=22026. So, doing trial division by products of factors would be faster, if you use a product with less than 22026 factors. Last fiddled with by __HRB__ on 2009-01-03 at 16:31 |
|
|
|
|
![]() |
| Thread Tools | |
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Splitting Work over Dual Boot | justinstevens42 | Information & Answers | 3 | 2018-02-07 12:51 |
| Splitting P-1 stages | TheMawn | Information & Answers | 3 | 2013-10-13 00:07 |
| File Splitting Utility | Antonio | Software | 5 | 2013-04-18 14:22 |
| Faster Factoring Algorithm? | Citrix | Factoring | 6 | 2007-12-23 11:36 |
| faster factoring? | S80780 | Lone Mersenne Hunters | 10 | 2003-04-08 21:51 |