mersenneforum.org Some code for factoring numbers up to 2^63

2019-07-14, 19:15   #144
R. Gerbicz

"Robert Gerbicz"
Oct 2005
Hungary

5×13×23 Posts

Quote:
 Originally Posted by jasonp Robert, upon reflection I use Bernstein's batch GCD and not his batch factoring algorithm, as a preprocessing step to reject most relations before trying to use expensive methods to compute their largest few factors. The code can efficiently find the 1-2% of relations whose unfactored part has all factors less than a large bound, allowing QS variants to effectively run 100x faster. But of course there's a limit to how large that bound can be, which controls the size of the input batch.
Nice code, now I understand more of this; at least for me it was new, so:
Set S = p[0]*p[1]*...*p[m-1], where p[i] is the i-th prime, and also using a product tree get Z = r[0]*r[1]*...*r[n-1], where the r[i] are the numbers whose smooth parts you want.
Then using a remainder tree get v[i] = S mod r[i].
Trivially, the i-th number's smooth part is gcd(r[i], v[i]) (iterating the gcd to pick up repeated prime factors)!

What I don't understand is why you don't build a (balanced) remainder tree in an explicit way, instead of your recursion. With that you'd compute the product tree for Z only once, rather than computing the same subproducts in the tree multiple times while running the remainder tree algorithm.
But you need more memory for that, by a factor of log2(n).
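The three steps above can be sketched in a few lines of Python (a rough illustration; function names are mine, not from any codebase discussed in this thread). The final loop repeats the gcd so that prime powers end up in the smooth part, since gcd(r[i], S mod r[i]) alone picks up each smooth prime only once:

```python
from math import gcd, prod

def product_tree(xs):
    """Bottom-up product tree: tree[0] is the leaf level, tree[-1][0] the root."""
    tree = [list(xs)]
    while len(tree[-1]) > 1:
        level = tree[-1]
        tree.append([prod(level[i:i + 2]) for i in range(0, len(level), 2)])
    return tree

def remainder_tree(S, tree):
    """Push S mod (subtree product) down the tree; returns S mod r[i] per leaf."""
    rems = [S % tree[-1][0]]
    for level in reversed(tree[:-1]):
        rems = [rems[i // 2] % x for i, x in enumerate(level)]
    return rems

def smooth_parts(primes, rs):
    S = prod(primes)
    vs = remainder_tree(S, product_tree(rs))
    out = []
    for r, v in zip(rs, vs):
        g, s = gcd(r, v), 1     # g = gcd(r, S): each smooth prime once
        while g > 1:            # divide out repeatedly for multiplicity
            s, r = s * g, r // g
            g = gcd(r, g)
        out.append(s)
    return out
```

For example, with the primes up to 13 and rs = [12, 595, 143, 437, 125] this returns [12, 35, 143, 1, 125]: the 17 in 595 and all of 437 = 19·23 are correctly excluded.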

2019-07-15, 01:01   #145
dleclair

Mar 2003

7×11 Posts

Quote:
 Originally Posted by R.D. Silverman Do I have to install git to retrieve MPIR? I do not have git currently. I am running VS2010. I prefer not to spend the big  to buy a newer VS compiler. I have looked at Brian's github repository. I see a lot of updated files, but no obvious download of a Windows .lib file. Do I need to download the entire source and compile it? Do I need to download GMP first?
You don't need git to download MPIR's source. On Gladman's GitHub page you can use the green "Clone or Download" button to download a zip of the source. I believe you can build it using VS 2019 Community (VS 2017 Community will certainly work). The Visual Studio Community editions are completely free for the way you would use them. See https://visualstudio.microsoft.com/vs/compare/ for details.

2019-07-15, 04:10   #146
jasonp
Tribal Bullet

Oct 2004

3×1,181 Posts

Yes, I don't like the waste of recomputing those products instead of caching them, but I could not stomach the increased memory use. I only used the batch remainder tree code once for a 'live' factorization, and the total memory needed to factor IIRC 500k relations at a time, with precomputed tables of primes from 2^25 to 2^31, was about 400MB, so comparable to the sieving. Relations are stored in memory and the batch method runs when enough of them accumulate. Of course there is an upper bound on the prime limit before this becomes impractical, but the goal was to remove cofactorization as a bottleneck. There are lots of other alternatives for doing that; I just find throwing RAM at the problem to be a convenient one.

We also don't need to use the product of all primes up to the large prime bound; if the large cofactor of a relation has even one factor smaller than the remainder tree bound, then the small factorization methods will split it much more quickly, and we can use heuristics to not bother trying if the remaining cofactor is the wrong size. Without the tree method we would have to run QS or ECM on the whole cofactor to get that information.

RDS: the bonus of the remainder tree method I linked to is that it detects all the relations that have a high probability of being worth keeping, without the expense of actually trying an expensive factoring method on each. My memory is very hazy now (this was a decade ago), but on the machine I used, the batch factoring code with SQUFOF and MPQS on the survivors processed a batch at an effective rate of about 2000 relations per second, much faster than treating each relation individually with the small factoring code I had at the time. There are so many sieve reports when using 3 large primes that the cofactorization otherwise takes 50% of the total time.

Outputs from the remainder tree that are less than 62 bits get factored by SQUFOF; anything larger goes to my crappy MPQS code to split off one factor, and SQUFOF is used to split the remaining larger cofactor. We could save some time by modifying the QS code to try multiple dependencies to split the entire cofactor.

MPIR can build out of the box in Visual Studio. You can also install Msys2 (a pile of Unix utilities and a gcc toolchain linked against the Microsoft runtime libraries) and use its package manager to download precompiled GMP for Windows. Honestly, I found that much easier than building MPIR from scratch. The tradeoff is that you are working in a Unix environment, and while gdb has been ported to Windows, Visual Studio has a really wonderful debugger that you would not be using. Msieve has MSVC build projects in the build.vc{10|11|14|15} directories.

Last fiddled with by jasonp on 2019-07-15 at 04:59
2019-07-15, 13:06   #147
R. Gerbicz

"Robert Gerbicz"
Oct 2005
Hungary

5×13×23 Posts

Quote:
 Originally Posted by jasonp Yes, I don't like the waste of recomputing those products instead of caching them, but could not stomach the increased memory use. I only used the batch remainder tree code once for a 'live' factorization, and the total memory use to factor IIRC 500k relations at a time with precomputed tables of primes from 2^25 to 2^31 was about 400MB, so comparable to the sieving. Relations are stored in memory and the batch method runs when enough of them accumulate. Of course there is an upper bound on the prime limit before this becomes impractical, but the goal was to remove cofactorization as a bottleneck. There are lots of other alternatives for doing that, I just find throwing RAM at the problem to be a convenient one.
Then a "hybrid" method would work here: build up the whole subtree in memory if you can hold it (otherwise do recursion). Surely at depth 6, where the subtrees have height height-6, you should be able to do that. Writing size for the bit-size of the integer at the root of the whole tree, holding one such subtree in memory needs size/64*(height-6) < size bits (because even height < 64 is true), and that's not much; you need more memory than that to do the multiplication/division at the root of the tree.
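Spelling out the bound claimed above (my rendering of the post's argument, with size the bit-length of the root product and height the height of the whole tree): one subtree rooted at depth 6 covers about size/2^6 bits per level, so holding all of its height - 6 levels explicitly costs

```latex
(\text{height}-6)\cdot\frac{\text{size}}{2^{6}}
\;<\; 64\cdot\frac{\text{size}}{64}
\;=\; \text{size}
\qquad\text{(since height} < 64\text{)},
```

which is no more than the scratch space the multiplication/division at the root needs anyway.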

2019-07-16, 12:16   #148
R.D. Silverman

Nov 2003

22×5×373 Posts

Quote:
 Originally Posted by jasonp Yes, I don't like the waste of recomputing those products instead of caching them, but could not stomach the increased memory use. I only used the batch remainder tree code once for a 'live' factorization, and the total memory use to factor IIRC 500k relations at a time with precomputed tables of primes from 2^25 to 2^31 was about 400MB, so comparable to the sieving.
I do not have the resources to factor numbers that are large enough to make use of 3LP.
When I start the redesign/rewrite I plan to just keep using 2LP for now. I may
eventually transition the code to 3LP (assuming that I really do succeed at improving
sieve performance), but for now when using 2LP the code takes ~5% or less of the
total run time to split cofactors. Focusing an optimization effort there is something to
be done later. Implementing Bernstein's factor tree will be done much later, if at all.

In 2008 we had the following post from Chris Card: ( https://www.mersenneforum.org/showth...t=10152&page=3)

Quote:
 "I thought I'd give this a try in my own lattice siever, but it turns out that it makes no difference for me. My lattice siever maps the original rectangular (a,b) region onto a parallelogram for each (q,s) lattice and then does a mapping onto another parallelogram for each (p,r) in the factor base, this final parallelogram being where the actual sieving takes place. The upshot of this is that each (q,s) gives roughly the same number of relations, regardless of the condition number."
This is exactly how my siever works as well. I wound up deactivating the code that skips highly skewed parallelograms. I find the cost of finding the boundaries of the final parallelogram and computing the start/stop boundaries of each sieve vector for each p in the factor base to be quite large. I want to get rid of this overhead.

Quote:
 This is in contrast to Franke's siever which (as I understand it) uses a fixed size sieve region and so will give a different number of relations for each (q,s) dependent on the shape of the (q,s) parallelogram. My approach is almost certainly less cache-friendly though, so is probably slower overall.
Bingo! This is what I am hoping to fix in mine. I'm trying to understand the Franke/Kleinjung paper. It could be more clearly written.

Quote:
 RDS: there are so many sieve reports when using 3 large primes that the cofactorization takes 50% of the total time otherwise.
Yep! It makes me question the following: if using factor trees on a large factorization with 3LP and (say) a 33-bit bound on the primes, then HOW MUCH BLOODY MEMORY DOES IT REQUIRE? With the large sieve area, the factor bases, the polynomial roots, the factor tree primes, etc., we seem to be > 8 GB per sieve process. This severely limits the hardware on which we can run, and it takes a bloody lot of memory for a multi-core machine.
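For a rough sense of scale, here is a back-of-envelope estimate (an assumption from the prime number theorem, not a figure from this thread): since the sum of ln p over primes p < x is about x, the product of all primes below x has about x/ln 2 bits.

```python
import math

def primorial_gib(bound_bits):
    """Approximate size in GiB of the product of all primes below
    2**bound_bits, using the prime number theorem estimate
    sum(ln p) ~ x, i.e. about x / ln 2 bits for x = 2**bound_bits."""
    bits = 2.0 ** bound_bits / math.log(2)
    return bits / 8 / 2 ** 30

for b in (25, 31, 33):
    print(f"product of primes below 2^{b}: ~{primorial_gib(b):.3f} GiB")
```

This gives roughly 0.36 GiB at a 2^31 bound (in the same ballpark as jasonp's 400 MB total, which also includes the relation batch) and about 1.4 GiB at a 2^33 bound, before counting the tree levels built on top of it.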

2019-08-07, 14:55   #149
Till

"Tilman Neumann"
Jan 2016
Germany

11·43 Posts

I am trying to make tinyecm.c work on my oldish i7-2630QM notebook (Sandy Bridge generation). I am using Eclipse CDT as IDE and replaced the main routine of tinyecm.c with a single-argument test (keeping the choice of "B1" and "curves" depending on the bit size of the factor argument), because I do not have the test data files. The program compiles but terminates inside the mulredcx() routine (at its first invocation). Same story with mulredc63x. What is wrong: is my CPU too old, or do I need some special compiler settings?
2019-08-07, 15:51   #150
bsquared

"Ben"
Feb 2007

2·1,789 Posts

Quote:
 Originally Posted by Till I am trying to make tinyecm.c work on my oldish i7-2630QM notebook (sandy bridge generation). I am using Eclipse CDT as IDE and replaced the main routine of tinyecm.c with a single argument test (keeping the choice of "B1" and "curves" depending on the bit size of the factor argument) because I do not have the test data files. The program compiles but terminates inside the mulredcx() routine (at it's first invocation). Same story with mulredc63x. What is wrong, is my CPU too old or unfitting or do I need some special compiler settings?
Yes, too old. The MULX instruction was introduced in Haswell's BMI2 extension.

If you replace the assembly code in the mulredcx routine with this (developed for the spBrent routine earlier), it should work:

Code:
if (n & 0x8000000000000000)
{
	__asm__(
		"mulq %2 \n\t"
		"movq %%rax, %%r10 \n\t"
		"movq %%rdx, %%r11 \n\t"
		"movq $0, %%r12 \n\t"
		"mulq %3 \n\t"
		"mulq %4 \n\t"
		"addq %%r10, %%rax \n\t"
		"adcq %%r11, %%rdx \n\t"
		"cmovae %4, %%r12 \n\t"
		"xorq %%rax, %%rax \n\t"
		"subq %4, %%rdx \n\t"
		"cmovc %%r12, %%rax \n\t"
		"addq %%rdx, %%rax \n\t"
		: "=&a"(x)
		: "0"(x), "r"(y), "r"(nhat), "r"(n)
		: "rdx", "r10", "r11", "r12", "cc");
}
else
{
	__asm__(
		"mulq %2 \n\t"
		"movq %%rax, %%r10 \n\t"
		"movq %%rdx, %%r11 \n\t"
		"mulq %3 \n\t"
		"mulq %4 \n\t"
		"addq %%r10, %%rax \n\t"
		"adcq %%r11, %%rdx \n\t"
		"movq $0, %%rax \n\t"
		"subq %4, %%rdx \n\t"
		"cmovc %4, %%rax \n\t"
		"addq %%rdx, %%rax \n\t"
		: "=&a"(x)
		: "0"(x), "r"(y), "r"(nhat), "r"(n)
		: "rdx", "r10", "r11", "cc");
}
return x;
edit:
You'll probably also need this for sqrredcx:

Code:
if (n & 0x8000000000000000)
{
	__asm__(
		"mulq %1 \n\t"
		"movq %%rax, %%r10 \n\t"
		"movq %%rdx, %%r11 \n\t"
		"movq $0, %%r12 \n\t"
		"mulq %2 \n\t"
		"mulq %3 \n\t"
		"addq %%r10, %%rax \n\t"
		"adcq %%r11, %%rdx \n\t"
		"cmovae %3, %%r12 \n\t"
		"xorq %%rax, %%rax \n\t"
		"subq %3, %%rdx \n\t"
		"cmovc %%r12, %%rax \n\t"
		"addq %%rdx, %%rax \n\t"
		: "=&a"(x)
		: "0"(x), "r"(nhat), "r"(n)
		: "rdx", "r10", "r11", "r12", "cc");
}
else
{
	__asm__(
		"mulq %1 \n\t"
		"movq %%rax, %%r10 \n\t"
		"movq %%rdx, %%r11 \n\t"
		"mulq %2 \n\t"
		"mulq %3 \n\t"
		"addq %%r10, %%rax \n\t"
		"adcq %%r11, %%rdx \n\t"
		"movq $0, %%rax \n\t"
		"subq %3, %%rdx \n\t"
		"cmovc %3, %%rax \n\t"
		"addq %%rdx, %%rax \n\t"
		: "=&a"(x)
		: "0"(x), "r"(nhat), "r"(n)
		: "rdx", "r10", "r11", "cc");
}
which I haven't tested but I think I modified right.
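For reference, the operation both routines compute is the Montgomery (REDC) product x·y·2^-64 mod n. A few lines of Python model it (a sketch for odd n < 2^64, not code from tinyecm; Python's big integers absorb the carry that the n >= 2^63 branch above handles with r12/cmovae):

```python
MASK64 = (1 << 64) - 1

def nhat_of(n):
    """nhat = -n^-1 mod 2^64, for odd n (pow with exponent -1 needs Python 3.8+)."""
    return (-pow(n, -1, 1 << 64)) & MASK64

def mulredc(x, y, n, nhat):
    """Montgomery product: x * y * 2^-64 mod n, assuming x, y < n and n odd."""
    t = x * y
    m = (t * nhat) & MASK64      # multiplier that clears the low 64 bits
    r = (t + m * n) >> 64        # exact shift: low 64 bits of t + m*n are zero
    return r - n if r >= n else r
```

A quick self-check: converting a and b to Montgomery form (multiplying by 2^64 mod n), multiplying, and converting back reproduces a*b mod n.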

Last fiddled with by bsquared on 2019-08-07 at 15:55

2019-08-07, 15:59   #151
Till

"Tilman Neumann"
Jan 2016
Germany

11×43 Posts

That seems to work, thanks a lot!
2019-08-07, 16:11   #152
bsquared

"Ben"
Feb 2007

2·1,789 Posts

Quote:
 Originally Posted by Till That seems to work, thanks a lot!
Good to hear. Also, the test data files just expect a bunch of lines like this:
composite,factor1,factor2

if you want to make your own.
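Given that format, a small generator for such test files is easy to sketch (the semiprime construction and the Miller-Rabin setup here are my own choices, not from the thread):

```python
import random

def is_probable_prime(n, rounds=20):
    """Miller-Rabin with random bases; plenty for generating test data."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        x = pow(random.randrange(2, n - 1), d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def random_prime(bits):
    """Random odd prime with the top bit set, so exactly `bits` bits."""
    while True:
        p = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(p):
            return p

def test_lines(count, factor_bits=25):
    """Yield 'composite,factor1,factor2' lines of random semiprimes."""
    for _ in range(count):
        p, q = random_prime(factor_bits), random_prime(factor_bits)
        yield f"{p * q},{p},{q}"
```

Writing, say, 10000 such lines to a file gives inputs in the shape tinyecm's test driver expects, with the factor size controlled by factor_bits.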

2019-08-07, 16:58   #153
Till

"Tilman Neumann"
Jan 2016
Germany

731₈ Posts

I do not really care about the data files, because my aim is to port your tinyecm.c to Java. (You might have guessed that ;) As such, all I need is a working C version that helps me find the problems in my Java port. After fixing sqrredcx() as well, I get correct factorizations from tinyecm.c. Thanks again for your quick help.

Last fiddled with by Till on 2019-08-07 at 16:59
2019-08-07, 17:12   #154
bsquared

"Ben"
Feb 2007

2·1,789 Posts

Quote:
 Originally Posted by Till I do not really care about the data files because my aim is to port your tinyecm.c to Java. (You might have guessed that ;) As such all I need is a working C version that helps me to find the problems in my Java port. After fixing sqrredcx() as well I get correct factorizations from tinyecm.c. Thanks again for your quick help.
Sounds good. If you're willing to share, I'd be interested to see the performance difference. I'm ignorant: is there a way to do inline assembly or intrinsics in Java? It really helps for this kind of low-level stuff, where carry management and data movement matter a lot.

