mersenneforum.org A new factoring library (EPR) for up to 128bit inputs
 Register FAQ Search Today's Posts Mark Forums Read

 2022-09-27, 03:45 #1 hurchalla   "Jeff Hurchalla" Aug 2022 Portland OR 22 Posts A new factoring library (EPR) for up to 128bit inputs I wanted to let the forum know about a C++ factoring library called "EPR" that I wrote. I'll hope anyone needing/wanting factoring up to 128 bits will find the library useful. Maybe we could also talk about <=128 bit factoring. I have a few questions myself on good methods from ~96-128 bit. For EPR, here's the basic info: My goal was to make a correct, fast, and easy to use library to factor and primality check all the C and C++ native types and compiler extensions, e.g. uint32_t, uint64_t and __uint128_t. To get a little more speed, the API allows you to specify if you expect the value to factor is a semiprime with two large factors; by default it assumes arbitrary input values. If you're interested in optimizations it uses and the theory behind them, probably the most interesting optimizations for EPR are explained in a blog series on improving Montgomery arithmetic - esp. Part1, regarding a faster REDC alternative by using the positive inverse (thanks to forum member Ernst Mayer for his feedback!). You could also check out the algorithm variations that EPR uses - very often they increase instruction level parallelism to better exploit typical pipelined/superscalar CPUs. As an aside, the dense miller-rabin hash tables mentioned in the link could be useful for anyone looking to implement fast miller-rabin for <= 64 bit primality checking. The library has thorough testing, and with the exception of ECM I wrote proofs in comments for much of it. Performance is always fun: For less than 32 bits, EPR uses a Pollard-Rho-Brent variant, and even though there are (depending on input) often faster methods like Lehman, it usually won't matter since CPUs have very little challenge at this range. For values > 32 bits and <= 64 bits (i.e. uint64_t), EPR uses the same Pollard-Rho-Brent variant up to around 40 bits, and ECM above that (based off Ben Buhrow's microecm, see next paragraph). In general EPR should be one of the fastest functions available for 32-64 bits, and especially standing out at 53 bits to 64 bits. From 65-128 bits, EPR uses a 128bit extension of the 64bit ECM function - again this should be quite fast, but at >64bits I haven't compared to other functions. Many thanks to Ben Buhrow for his great original microecm code for ECM. It completely changed my expectations for what would work best ~45-64 bits (I'd assumed that was Pollard-Rho-Brent turf). I ported Ben's original microecm C file to C++ and optimized it for ~2x speedup (in part by using the Montgomery classes I wrote for Clockwork), and then extended to 128 bit. Ben and I merged a lot of those changes back into the original microecm, so yafu microecm has a lot of these optimizations, though not absolutely everything - a few things like optimization by using different Montgomery types can't translate easily to the original file's C language since there's no template programming support. On my system I get notably better performance with EPR microecm due to the higher optimizations, but your mileage may vary. Meanwhile Ben added AVX512 support to yafu's microecm, which if you have a large list of numbers to factor, is a big speed up when using up to 52 bits (I believe that's the max range the AVX512 mods can support - I'd guess due to double precision floating point mantissa length). In any event, this is behind the scenes details for EPR, and the normal API doesn't reference any of it. Last fiddled with by hurchalla on 2022-09-27 at 04:30
 2022-09-27, 05:39 #2 hurchalla   "Jeff Hurchalla" Aug 2022 Portland OR 22 Posts I'll mention that in addition to the normal recommended API, EPR has an alternative way to interface with factoring: you can use the functions get_single_factor_ecm() and get_single_factor_pollard_rho() from the file include/hurchalla/factoring/detail/experimental/get_single_factor.h. These functions find only a single factor, and they always assume the number they're given is composite (never testing for primality). As you would expect from the function names, you get to choose whether you want to use ECM or Pollard-Rho, regardless of the size of the input. This can be useful for performance testing. For a helpful example in how to use the functions, you can open in the same directory the test_single_factor.* files.
 2022-09-27, 13:24 #3 bsquared     "Ben" Feb 2007 3×17×73 Posts Thank you Jeff, I am happy to see this announcement! I just wanted to add, Jeff did much more than just add significant optimizations to the low-level code in microecm. He also spotted an early-abort opportunity in stage 2 that was very helpful, vastly improved the interface and made it thread-safe, generally cleaned up and made the code easier to navigate, and put in a ton of work profiling and benchmarking. This is in addition to all the effort he put into his c++ library. It's really nice when someone can step in and make such huge progress on something I had already thought was done. I will be checking out the rest of the library for sure. Cheers!
 2022-09-30, 12:56 #4 jasonp Tribal Bullet     Oct 2004 2×52×71 Posts Older benchmark thread for context CADO-NFS includes 128-bit ECM code as well, perhaps some of the higher-level optimizations it uses to organize stage 1 or 2 can be brought to bear on these implementations...
 2022-09-30, 18:13 #5 hurchalla   "Jeff Hurchalla" Aug 2022 Portland OR 48 Posts Thanks for the compliment Ben! Jason, thanks for the pointer to check out 128bit ECM in CADO-NFS. I'll look forward to it. Are there other really good software options for 65-128 bit factoring? I have in mind to at least look at GMP-ECM and CADO, to get some ideas and do some benchmarking. I hadn't originally planned on extending Ben's microecm to 128bit but it made sense to try, and it worked out well. I'd suspect above 80 bits or so I could be using better parameters though, and I'll hope to find good optimization ideas from what people have done. A different question I had was on using Miller-Rabin to primality test a 128 bit number. I've used the first 128 primes for bases/witnesses in this case, looking to get a probability that any pseudoprime still exists to around the likelihood of a cosmic ray flipping a bit and causing the primality test to be incorrect. It's been in the back of my mind to do better on the bases though - I used as my rationale Wikipedia which stated "if n is composite then running k iterations of the Miller–Rabin test will declare n probably prime with a probability at most 4^(-k)". If I recall correctly this is overly pessimistic, and there exists a better estimate? It just feels like a loose end, even though the primality testing isn't a bottleneck. Last fiddled with by hurchalla on 2022-09-30 at 18:33
 2022-10-01, 06:35 #6 ATH Einyen     Dec 2003 Denmark 3×17×67 Posts It is much faster to do 1 or a few Miller-Rabin tests and then a Lucas test for the combined BPSW test. Lucas test takes the same time as 3-4 MR tests I think.
 2022-10-01, 18:11 #7 hurchalla   "Jeff Hurchalla" Aug 2022 Portland OR 22 Posts Thanks- if I get time I'll implement BPSW at some point, which should be quite a bit better. Using a ton of Miller-Rabin trials isn't ideal, but last night I found the paper that I probably remembered, which at the bottom of page 178 gives improved probability estimates: when using more than 32 trials on a 128bit number, the probability it estimates is similar to I'd get from the simpler probability estimate (4^(-num_trials)) that I had been using, if I use 20 more trials than with the paper's formula. I'm not sure I phrased that well, but my takeaway was that I could likely get my desired probability of any psuedoprime surviving by using 108 trials instead of the 128 trials I had been using. That said, a Lucas test and MR for BPSW is probably more sensible. Last fiddled with by hurchalla on 2022-10-01 at 18:11
2022-10-01, 18:40   #8
paulunderwood

Sep 2002
Database er0rr

438210 Posts

Quote:
 Originally Posted by hurchalla Thanks- if I get time I'll implement BPSW at some point, which should be quite a bit better. Using a ton of Miller-Rabin trials isn't ideal, but last night I found the paper that I probably remembered, which at the bottom of page 178 gives improved probability estimates: when using more than 32 trials on a 128bit number, the probability it estimates is similar to I'd get from the simpler probability estimate (4^(-num_trials)) that I had been using, if I use 20 more trials than with the paper's formula. I'm not sure I phrased that well, but my takeaway was that I could likely get my desired probability of any psuedoprime surviving by using 108 trials instead of the 128 trials I had been using. That said, a Lucas test and MR for BPSW is probably more sensible.
The test of the original BPSW paper is OTT; You can get away with MR plus an efficient (strong) Lucas binary chain test. See my GMP programs: https://mersenneforum.org/showthread.php?t=27660

Last fiddled with by paulunderwood on 2022-10-01 at 18:41

2022-10-12, 09:47   #9
SethTro

"Seth"
Apr 2019

2×3×79 Posts

Quote:
 Originally Posted by hurchalla A different question I had was on using Miller-Rabin to primality test a 128 bit number. I've used the first 128 primes for bases/witnesses in this case, looking to get a probability that any pseudoprime still exists to around the likelihood of a cosmic ray flipping a bit and causing the primality test to be incorrect. It's been in the back of my mind to do better on the bases though - I used as my rationale Wikipedia which stated "if n is composite then running k iterations of the Miller–Rabin test will declare n probably prime with a probability at most 4^(-k)". If I recall correctly this is overly pessimistic, and there exists a better estimate? It just feels like a loose end, even though the primality testing isn't a bottleneck.
I needed a fast 64 bit PRP test and I ended up using the one here: https://people.ksp.sk/~misof/primes/ described in http://ceur-ws.org/Vol-1326/020-Forisek.pdf which made my program approximately 2x faster than using mpz_probab_prime_p from GMP (which is overkill for 64 bit inputs)

I'd be interested if there was a more recent faster PRP library.

Last fiddled with by SethTro on 2022-10-12 at 09:56

 2022-10-12, 10:41 #10 ATH Einyen     Dec 2003 Denmark 3×17×67 Posts Dana Jacobsen (danaj) has MR and Lucas tests for up to 64 bit (and other versions for above 64 bit using GMP): http://www.sti15.com/nt/pseudoprimes.html https://metacpan.org/dist/Math-Prime-Util

 Similar Threads Thread Thread Starter Forum Replies Last Post ATH Programming 39 2018-08-22 05:52 prss Factoring 22 2011-06-15 20:25 jbristow Software 4 2007-08-14 04:07 Kaiw Software 7 2005-10-26 14:49 rudi_m Software 3 2005-08-22 15:42

All times are UTC. The time now is 15:03.

Fri Dec 9 15:03:08 UTC 2022 up 113 days, 12:31, 0 users, load averages: 1.18, 1.23, 1.43