![]() |
[QUOTE=ATH;460572]Where can I find any information on the Euler-Plumb/Plump test?[/QUOTE]
Hmm. It looks like maybe it originated here [url]http://philzimmermann.com/EN/bnlib/[/url] where the author, Colin Plumb, writes [code] /* * Now, check that bn is prime. If it passes to the base 2, * it's prime beyond all reasonable doubt, and everything else * is just gravy, but it gives people warm fuzzies to do it. * * This starts with verifying Euler's criterion for a base of 2. * This is the fastest pseudoprimality test that I know of, * saving a modular squaring over a Fermat test, as well as * being stronger. 7/8 of the time, it's as strong as a strong * pseudoprimality test, too. (The exception being when bn == * 1 mod 8 and 2 is a quartic residue, i.e. bn is of the form * a^2 + (8*b)^2.) The precise series of tricks used here is * not documented anywhere, so here's an explanation. * Euler's criterion states that if p is prime then a^((p-1)/2) * is congruent to Jacobi(a,p), modulo p. Jacobi(a,p) is * a function which is +1 if a is a square modulo p, and -1 if * it is not. For a = 2, this is particularly simple. It's * +1 if p == +/-1 (mod 8), and -1 if m == +/-3 (mod 8). * If p == 3 mod 4, then all a strong test does is compute * 2^((p-1)/2). and see if it's +1 or -1. (Euler's criterion * says *which* it should be.) If p == 5 (mod 8), then * 2^((p-1)/2) is -1, so the initial step in a strong test, * looking at 2^((p-1)/4), is wasted - you're not going to * find a +/-1 before then if it *is* prime, and it shouldn't * have either of those values if it isn't. So don't bother. * * The remaining case is p == 1 (mod 8). In this case, we * expect 2^((p-1)/2) == 1 (mod p), so we expect that the * square root of this, 2^((p-1)/4), will be +/-1 (mod p). * Evaluating this saves us a modular squaring 1/4 of the time. * If it's -1, a strong pseudoprimality test would call p * prime as well. Only if the result is +1, indicating that * 2 is not only a quadratic residue, but a quartic one as well, * does a strong pseudoprimality test verify more things than * this test does. Good enough. * * We could back that down another step, looking at 2^((p-1)/8) * if there was a cheap way to determine if 2 were expected to * be a quartic residue or not. Dirichlet proved that 2 is * a quartic residue iff p is of the form a^2 + (8*b^2). * All primes == 1 (mod 4) can be expressed as a^2 + (2*b)^2, * but I see no cheap way to evaluate this condition. */[/code] The code itself, if I've chosen the right pieces: [code] if (bnCopy(e, bn) < 0) return -1; (void)bnSubQ(e, 1); l = bnLSWord(e); j = 1; /* Where to start in prime array for strong prime tests */ if (l & 7) { bnRShift(e, 1); if (bnTwoExpMod(a, e, bn) < 0) return -1; if ((l & 7) == 6) { /* bn == 7 mod 8, expect +1 */ if (bnBits(a) != 1) return 1; /* Not prime */ k = 1; } else { /* bn == 3 or 5 mod 8, expect -1 == bn-1 */ if (bnAddQ(a, 1) < 0) return -1; if (bnCmp(a, bn) != 0) return 1; /* Not prime */ k = 1; if (l & 4) { /* bn == 5 mod 8, make odd for strong tests */ bnRShift(e, 1); k = 2; } } } else { /* bn == 1 mod 8, expect 2^((bn-1)/4) == +/-1 mod bn */ bnRShift(e, 2); if (bnTwoExpMod(a, e, bn) < 0) return -1; if (bnBits(a) == 1) { j = 0; /* Re-do strong prime test to base 2 */ } else { if (bnAddQ(a, 1) < 0) return -1; if (bnCmp(a, bn) != 0) return 1; /* Not prime */ } k = 2 + bnMakeOdd(e); }[/code] No doubt danaj could elucidate further. Edit: his version is [url=https://github.com/danaj/Math-Prime-Util/blob/4fc8396ad9886652d8d1b22899cb787d43abfcd2/primality.c#L119]here[/url]. |
[QUOTE=Antonio;460557]I am pleased to present (with the kind permission of Dana Jacobsen) a new release of the gap code.
[/QUOTE] [QUOTE=Antonio;460557] The value of delta will need to be tuned to a lower value to get best performance (it was while I was doing this that I found the bug in the original code), for my i5-3570k I needed to change from: gap3_4 -n1 %1 -n2 %2 -gap 1250 -delta 200 -sb 24 -bs 16 -mem 12 -t 4 to: gap5_d -n1 %1 -n2 %2 -gap 1250 -delta 155 -sb 24 -bs 16 -mem 12 -t 4 My throughput increased from around 20.9e9 n/s to around 26.3e9 n/s. [/QUOTE] Thanks! Good code, in my test the rate has been reached 27e9 n/sec. You can get the modinverse roughly 4-5 times faster, but in the code it will be completely in the noise level, because the original is also fast. It is a known idea, the Hensel lifting: [CODE] static inline ui64 modular_inverse64(const ui64 a){ static const char mask[128] = {255,85,51,73,199,93,59,17,15,229,195,89,215,237,203,33, 31,117,83,105,231,125,91,49,47,5,227,121,247,13,235,65,63,149,115,137,7,157,123,81,79, 37,3,153,23,45,11,97,95,181,147,169,39,189,155,113,111,69,35,185,55,77,43,129,127,213, 179,201,71,221,187,145,143,101,67,217,87,109,75,161,159,245,211,233,103,253,219,177,175, 133,99,249,119,141,107,193,191,21,243,9,135,29,251,209,207,165,131,25,151,173,139,225, 223,53,19,41,167,61,27,241,239,197,163,57,183,205,171,1}; ui64 ret=mask[(a>>1)&127]; ret*=2+a*ret; ret*=2+a*ret; ret*=2+a*ret; return ret; } [/CODE] with only 3 lifts you get the inverse mod 2^64 from mod 2^8. |
1 Attachment(s)
[QUOTE=R. Gerbicz;460577]Thanks! Good code, in my test the rate has been reached 27e9 n/sec.
You can get the modinverse roughly 4-5 times faster, but in the code it will be completely in the noise level, because the original is also fast. It is a known idea, the Hensel lifting: [CODE] static inline ui64 modular_inverse64(const ui64 a){ static const char mask[128] = {255,85,51,73,199,93,59,17,15,229,195,89,215,237,203,33, 31,117,83,105,231,125,91,49,47,5,227,121,247,13,235,65,63,149,115,137,7,157,123,81,79, 37,3,153,23,45,11,97,95,181,147,169,39,189,155,113,111,69,35,185,55,77,43,129,127,213, 179,201,71,221,187,145,143,101,67,217,87,109,75,161,159,245,211,233,103,253,219,177,175, 133,99,249,119,141,107,193,191,21,243,9,135,29,251,209,207,165,131,25,151,173,139,225, 223,53,19,41,167,61,27,241,239,197,163,57,183,205,171,1}; ui64 ret=mask[(a>>1)&127]; ret*=2+a*ret; ret*=2+a*ret; ret*=2+a*ret; return ret; } [/CODE]with only 3 lifts you get the inverse mod 2^64 from mod 2^8.[/QUOTE]I added the Hensel lifting code and yes it looks like a speed up of less than 1%. However I know Dana was going to look at another possible speed up which could yield another 5% (or there-about) increase so it could be added in if that happens. For people who want to compile their own version and try it I have attached a version 0.05.e of the source code. |
Thanks to all for this impressive speed-up! :bow:
Meanwhile I've got 4160-4172e15 completed (note the quartet!): [CODE] 4160-4161e15 78 kgaps, largest 1142 @ 4160855564170607801 4161-4162e15 84 kgaps, largest 1380 @ 4161256227746743067 4162-4163e15 111 kgaps, largest 1188 @ 4162650991648045261 4163-4164e15 81 kgaps, largest 1140 @ 4163851461415885439 4164-4165e15 90 kgaps, largest 1204 @ 4164005813345056153 4165-4166e15 86 kgaps, largest 1202 @ 4165268264997138737 4166-4167e15 82 kgaps, largest 1132 @ 4166895731083193611 4167-4168e15 80 kgaps, largest 1200 @ 4167562342954892237 4168-4169e15 79 kgaps, largest 1200 @ 4168914553292110091 4169-4170e15 95 kgaps, largest 1126 @ 4169922460669612393 4170-4171e15 98 kgaps, largest 1234 @ 4170143483650450639 4171-4172e15 83 kgaps, largest 1110 @ 4171138343914429271 & 4171422212516359189 & 4171664786200787539 & 4171829679529702439[/CODE] Reserving 4180-4200e15. If you don't mind, I would also start from 4400e15 (reserving the first 10 blocks). |
4012-4013 completed.
[code] 4012-4013e15 87 kgaps, largest 1196 @ 4012914149323655627 [/code] Reserving 4070-4071 and 4071-4072 |
Is there away someone could update the reservation table with more frequency? TIA
|
[QUOTE=ATH;460572]There are only 31,894,014 strong 2-prps under 2^64. Is that too slow to give any speed benefits?[/quote]
I did not find the Euler or Miller-Rabin test to be consistently faster than a Fermat test, while the "Euler-Plumb" test comes out a small amount faster in my testing. This may be a function of implementation. The benefit for this thread's purpose is in the speed rather than strength, so we'd only want to use M-R if it was consistently faster. Background on my using the test. I have looked for some fast compositeness (nee primality) test that is discriminating but reasonable faster than a M-R test. No real luck yet. The Fermat test is really not much faster than a M-R test (assuming equal implementations), but the Euler-Plumb test seems to be just a bit faster for me. Not enough faster to make it useful as a pretest in this scheme: 1. Simple pretests, e.g. trial division / gcd 2. [I]? pretest ?[/I] 3. M-R base 2 4 (strong or AES or ES) Lucas test 3&4 being the BPSW test. However it is useful in a few places as a "maybe prime" pretest in place of a Fermat or M-R test, and for 64-bit inputs we have a lot of room for modification as it is correct for all inputs. We can probably substitute it in #3 for a small performance gain (only for 64-bit!). [quote]Where can I find any information on the Euler-Plumb/Plump test?[/QUOTE] Charles gives a good reference, and it looks like most others (e.g. GNU Crypto) have copied straight from that. My code is completely different, but the concept is all his. Fermat base 2: [$]2^{p-1} \equiv 1 \pmod p[/$] 119M 64-bit pseudoprimes Euler (Euler-Jacobi) base 2: [$]2^{(p-1)/2} \equiv \left(\frac{2}{p}\right) \pmod p[/$] 64M 64-bit psuedoprimes, all of which are also Fermat base 2 pseudoprimes. So the Euler test is more discriminating than the Fermat test and could arguably be a small bit faster. Colin Plumb started there and did a couple tricks. The first notes that [FONT="Courier New"](2|p)[/FONT] simplifies to [FONT="Courier New"](p % 8 == 1 || p % 8 == 7) ? 1 : -1[/FONT]. The second is to examine what happens if we backed up a square and notes in the [FONT="Courier New"]p % 8 == 1[/FONT] case we can skip the last squaring and accept -1 or +1. This not only sometimes saves another modular squaring but also removes some pseudoprimes. None are added -- all "Euler-Plumb" pseudoprimes are also Euler base 2 pseudoprimes and also Fermat base 2 pseudoprimes. 48M 64-bit pseudoprimes. The Miller-Rabin (strong pseudoprime) test is stronger yet, with all pseudoprimes being a subset of the Fermat, Euler, and Euler-Plumb tests. 32M 64-bit pseudoprimes. |
1 Attachment(s)
[QUOTE=Antonio;459546]OK, so it seems there are 2 problems.
1. It appears that Perl doesn't have write permission in the folder you are using. 2. You have a later version of Term::ReadKey on your system which doesn't work well with windows. Could you try the attached code please and let me know the result.. [/QUOTE] See attachment... The problems as described before are the same (need to press any key in order to keep the calculation going). Good thing is, there's a log file now. |
Why don't you use the available windows binaries? Which processor do you have?
|
While the Perl code should be portable, Robert G's C code is ~10x faster so it is definitely preferred.
While more-or-less similar sieving could be done in the Perl library, it would be hard to get it both reasonably generic and avoid overhead in the C/Perl interface. IMO it's best to leave it all in C. It is still a mildly interesting problem (how much partial sieving would improve results from the current Perl algorithm), but I suspect not really relevant to this thread any more. It looks currently like we should get the 4e18 to 5e18 range covered in under 6 months, which is pretty incredible. Maybe not given Oliveira e Silva's project record, but previously I don't think anyone had come close to matching that performance. |
1 Attachment(s)
[QUOTE=mart_r;460679]See attachment...
The problems as described before are the same (need to press any key in order to keep the calculation going). Good thing is, there's a log file now.[/QUOTE] Sorry about the programming errors, the attached version should work ok for you now though. |
| All times are UTC. The time now is 22:04. |
Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.