I realize this is an old thread, but thought it wouldn't hurt to add a little.
Quote:
Originally Posted by CRGreathouse
I know that many people have checked this, but I don't know which implementations of BPSW were used (though as you point out, it's at least two).

I've tested Feitsma's PSP/SPSP database with a number of Lucas tests, without any trial division. None had counterexamples to 2^64.
 Standard LucasSelfridge
 Strong LucasSelfridge
 ExtraStrong (Grantham) Lucas with BaillieOEIS parameters (OEIS A217719)
 "almost" extrastrong Lucas (ES test without the U=0 condition, which is what Pari uses for its "BPSW" test) with BaillieOEIS parameters
 "almost" extrastrong test with Pari's parameters (similar to the above but incrementing P by 2 instead of 1, no explanation in the code as to why they chose that)
 GMPY2's strong BPSW (WraithX's mpz_prp)
 MPIR's PSP2+Fibonacci / SPSP2+LucasSelfridge combination (seemingly based on Chen and Greene's work rather than BPSW). It's super fast, but also is nonportable outside of MPIR (much like Pari internals would be).
 GAP's weird Lucas test (an odd mishmash of Lucas and extrastrong Lucas that doesn't seem to help much over the standard Lucas test)
15 are my C and C+GMP implementations. I implemented 8 using my Lucas code to emulate the GAP source algorithm as best I could. I also tested
Jim Sinclair's 7base MR test. which (no surprise) also had no counterexamples. The performance of 4/5 is the fastest, though in GMP one can recover U from V relatively cheaply making the full extra strong test just about the same speed once over 128 or so bits.
Using my C code on an x86_64 that takes advantage of asm mulmod (ty WraithX!), doing a SPSP2 + AESLSP test on a prime takes as long as ~ 2.7 MR tests. A strong LucasSelfridge test takes longer, but since the range is fixed and none have counterexamples, we can use whichever one of the above is fastest for us.
For 32bit, the hash with single test seems like the winner. Albeit only two MR tests are needed up to 1050M now, but still if we're after the best possible performance, a hash plus 1 test is better than 13 (three tests is the most needed for 32bits).
For 64bit, at least on x86_64 and probably gcc+Power7, I think BPSW is faster than MR tests. Your actual results may vary depending on implementation details. A hash to 3 tests vs. 4 would be murkier. I like the simplicity of testing the Lucas test vs. the hash, but some may disagree.
Note that using the SPSP2 database as a test sample is misleading for benchmarks. By definition these are all composites that pass the SPSP2 test. The BPSW test will have to run its full test (SPSP2 which of course will pass, then the Lucas test which will fail). For the particular set of bases in zerothbase's code, it also will run a SPSP2 test, but then most numbers will fail the next base. Swap the two initial bases and it will run almost twice as fast on this data set since almost everything will be weeded out on the first MR test. While I suppose it is application specific, in general one would want to test on random inputs that have made it through the initial smallfactor checks (Steve checks to 7, Jim to 2, I found improvements on random inputs using up to 7 followed by up to 53).
Interestingly, with GMP on a 200bit prime I found my ESLSP test to run about the same speed as 6 MR tests, and the SLSP test about the same as 8 MR tests. I assume this is due to GMP greatly optimizing the mpz_powm internals. The two other GMP Lucas implementations I've compared run slower than mine, and it's faster than Pari's implementation on large numbers, so this isn't just poor coding of the Lucas test. This might be useful to know for someone looking at using GMP on a 32bit machine for 64bit results. Once past 64 bits, some form of BPSW is a better choice than MR tests for most purposes.
Quote:
Originally Posted by R.D. Silverman
If the objective is proved primes,[...]

(quoting from over 2 years ago). Using BPSW below 2^64 would seem to give one a proven prime based on the computerassisted proof of FeitsmaGalway (exhaustive list of PSP2's under 2^64) plus verification of the particular Lucas test used. The BLS75 proofs (even just theorem 5 for factors of N1) are awfully fast for most small (< 128bit) numbers, but you still have to verify the primality of each factor found, which means more tests.