20220927, 03:45  #1 
"Jeff Hurchalla"
Aug 2022
Portland OR
4_{16} Posts 
A new factoring library (EPR) for up to 128bit inputs
I wanted to let the forum know about a C++ factoring library called "EPR" that I wrote. I'll hope anyone needing/wanting factoring up to 128 bits will find the library useful.
Maybe we could also talk about <=128 bit factoring. I have a few questions myself on good methods from ~96128 bit. For EPR, here's the basic info: My goal was to make a correct, fast, and easy to use library to factor and primality check all the C and C++ native types and compiler extensions, e.g. uint32_t, uint64_t and __uint128_t. To get a little more speed, the API allows you to specify if you expect the value to factor is a semiprime with two large factors; by default it assumes arbitrary input values. If you're interested in optimizations it uses and the theory behind them, probably the most interesting optimizations for EPR are explained in a blog series on improving Montgomery arithmetic  esp. Part1, regarding a faster REDC alternative by using the positive inverse (thanks to forum member Ernst Mayer for his feedback!). You could also check out the algorithm variations that EPR uses  very often they increase instruction level parallelism to better exploit typical pipelined/superscalar CPUs. As an aside, the dense millerrabin hash tables mentioned in the link could be useful for anyone looking to implement fast millerrabin for <= 64 bit primality checking. The library has thorough testing, and with the exception of ECM I wrote proofs in comments for much of it. Performance is always fun: For less than 32 bits, EPR uses a PollardRhoBrent variant, and even though there are (depending on input) often faster methods like Lehman, it usually won't matter since CPUs have very little challenge at this range. For values > 32 bits and <= 64 bits (i.e. uint64_t), EPR uses the same PollardRhoBrent variant up to around 40 bits, and ECM above that (based off Ben Buhrow's microecm, see next paragraph). In general EPR should be one of the fastest functions available for 3264 bits, and especially standing out at 53 bits to 64 bits. From 65128 bits, EPR uses a 128bit extension of the 64bit ECM function  again this should be quite fast, but at >64bits I haven't compared to other functions. Many thanks to Ben Buhrow for his great original microecm code for ECM. It completely changed my expectations for what would work best ~4564 bits (I'd assumed that was PollardRhoBrent turf). I ported Ben's original microecm C file to C++ and optimized it for ~2x speedup (in part by using the Montgomery classes I wrote for Clockwork), and then extended to 128 bit. Ben and I merged a lot of those changes back into the original microecm, so yafu microecm has a lot of these optimizations, though not absolutely everything  a few things like optimization by using different Montgomery types can't translate easily to the original file's C language since there's no template programming support. On my system I get notably better performance with EPR microecm due to the higher optimizations, but your mileage may vary. Meanwhile Ben added AVX512 support to yafu's microecm, which if you have a large list of numbers to factor, is a big speed up when using up to 52 bits (I believe that's the max range the AVX512 mods can support  I'd guess due to double precision floating point mantissa length). In any event, this is behind the scenes details for EPR, and the normal API doesn't reference any of it. Last fiddled with by hurchalla on 20220927 at 04:30 
20220927, 05:39  #2 
"Jeff Hurchalla"
Aug 2022
Portland OR
2^{2} Posts 
I'll mention that in addition to the normal recommended API, EPR has an alternative way to interface with factoring:
you can use the functions get_single_factor_ecm() and get_single_factor_pollard_rho() from the file include/hurchalla/factoring/detail/experimental/get_single_factor.h. These functions find only a single factor, and they always assume the number they're given is composite (never testing for primality). As you would expect from the function names, you get to choose whether you want to use ECM or PollardRho, regardless of the size of the input. This can be useful for performance testing. For a helpful example in how to use the functions, you can open in the same directory the test_single_factor.* files. 
20220927, 13:24  #3 
"Ben"
Feb 2007
3·17·73 Posts 
Thank you Jeff, I am happy to see this announcement!
I just wanted to add, Jeff did much more than just add significant optimizations to the lowlevel code in microecm. He also spotted an earlyabort opportunity in stage 2 that was very helpful, vastly improved the interface and made it threadsafe, generally cleaned up and made the code easier to navigate, and put in a ton of work profiling and benchmarking. This is in addition to all the effort he put into his c++ library. It's really nice when someone can step in and make such huge progress on something I had already thought was done. I will be checking out the rest of the library for sure. Cheers! 
20220930, 12:56  #4 
Tribal Bullet
Oct 2004
6736_{8} Posts 
Older benchmark thread for context
CADONFS includes 128bit ECM code as well, perhaps some of the higherlevel optimizations it uses to organize stage 1 or 2 can be brought to bear on these implementations... 
20220930, 18:13  #5 
"Jeff Hurchalla"
Aug 2022
Portland OR
2^{2} Posts 
Thanks for the compliment Ben!
Jason, thanks for the pointer to check out 128bit ECM in CADONFS. I'll look forward to it. Are there other really good software options for 65128 bit factoring? I have in mind to at least look at GMPECM and CADO, to get some ideas and do some benchmarking. I hadn't originally planned on extending Ben's microecm to 128bit but it made sense to try, and it worked out well. I'd suspect above 80 bits or so I could be using better parameters though, and I'll hope to find good optimization ideas from what people have done. A different question I had was on using MillerRabin to primality test a 128 bit number. I've used the first 128 primes for bases/witnesses in this case, looking to get a probability that any pseudoprime still exists to around the likelihood of a cosmic ray flipping a bit and causing the primality test to be incorrect. It's been in the back of my mind to do better on the bases though  I used as my rationale Wikipedia which stated "if n is composite then running k iterations of the Miller–Rabin test will declare n probably prime with a probability at most 4^(k)". If I recall correctly this is overly pessimistic, and there exists a better estimate? It just feels like a loose end, even though the primality testing isn't a bottleneck. Last fiddled with by hurchalla on 20220930 at 18:33 
20221001, 06:35  #6 
Einyen
Dec 2003
Denmark
5·683 Posts 
It is much faster to do 1 or a few MillerRabin tests and then a Lucas test for the combined BPSW test. Lucas test takes the same time as 34 MR tests I think.

20221001, 18:11  #7 
"Jeff Hurchalla"
Aug 2022
Portland OR
2^{2} Posts 
Thanks if I get time I'll implement BPSW at some point, which should be quite a bit better. Using a ton of MillerRabin trials isn't ideal, but last night I found the paper that I probably remembered, which at the bottom of page 178 gives improved probability estimates: when using more than 32 trials on a 128bit number, the probability it estimates is similar to I'd get from the simpler probability estimate (4^(num_trials)) that I had been using, if I use 20 more trials than with the paper's formula. I'm not sure I phrased that well, but my takeaway was that I could likely get my desired probability of any psuedoprime surviving by using 108 trials instead of the 128 trials I had been using.
That said, a Lucas test and MR for BPSW is probably more sensible. Last fiddled with by hurchalla on 20221001 at 18:11 
20221001, 18:40  #8  
Sep 2002
Database er0rr
29×151 Posts 
Quote:
Last fiddled with by paulunderwood on 20221001 at 18:41 

20221012, 09:47  #9  
"Seth"
Apr 2019
11·43 Posts 
Quote:
I'd be interested if there was a more recent faster PRP library. EDIT: I found this mersenneforum thread to dig through too: https://www.mersenneforum.org/showthread.php?t=12209 Last fiddled with by SethTro on 20221012 at 09:56 

20221012, 10:41  #10 
Einyen
Dec 2003
Denmark
5×683 Posts 
Dana Jacobsen (danaj) has MR and Lucas tests for up to 64 bit (and other versions for above 64 bit using GMP):
http://www.sti15.com/nt/pseudoprimes.html https://metacpan.org/dist/MathPrimeUtil 
Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
128bit mulmod request  ATH  Programming  39  20180822 05:52 
factoring with MIRACL library  prss  Factoring  22  20110615 20:25 
LLR SUM(INPUTS) != SUM(OUTPUTS) error  jbristow  Software  4  20070814 04:07 
v. 2.13: SUM(INPUTS) != SUM(OUTPUTS)  Kaiw  Software  7  20051026 14:49 
SUM(INPUTS) != SUM(OUTPUTS)  rudi_m  Software  3  20050822 15:42 