mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Computer Science & Computational Number Theory (https://www.mersenneforum.org/forumdisplay.php?f=116)
-   -   Fast isPrime() for n < 2^32 (https://www.mersenneforum.org/showthread.php?t=12209)

SPWorley 2009-07-25 04:11

Fast isPrime() for n < 2^32
 
I needed an isPrime() test for generating hash table sizes. If you google for efficient prime testing code, you usually end up at [URL="http://primes.utm.edu/prove/prove2_3.html"]Chris Caldwell's great page.[/URL]

I ended up making my own test that's pretty efficient for n<2^32. Instead of using 3 or 4 SPRP tests, I use only [B]one [/B]test, but I choose the base of the test based on a hashed value of N itself. A 1KB pregenerated lookup table then gives the base needed for the [B]single [/B]SPRP test to classify the primality any number < 2^32 properly.

Simple code is below. It can definitely be tweaked based on your CPU, language, or compiler, especially how many test divides you want to prescreen with.

I could probably speed this up slightly by replacing the test divides with an Euler GCD and then Bloom filter, but that's likely only a few percentage points speed boost. Of course I haven't tried it yet.

[code]
/*
* Copyright 2009 Steve Worley < m a t h g e e k @(my last name).com >
*
* Permission to use, copy, modify, and distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL
* WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED
* WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE
* AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR
* CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS
* OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT,
* NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
* CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/

/* isPrime() by Steve Worley

Efficiently computes whether a 32 bit integer is prime or not.
Strategy: Do first screening with a few trial divisions to identify
easy composites. Next use a strong-probable-prime test. Normally
several tests would be necessary but this code hashes the number
into a bin which has a precomputed table listing the SPRP test base
which is sufficient for the numbers in that bin. Thus, you can
prove (guaranteed 100%) whether your number is prime or not with a
single SPRP test. More discussion at
www.worley.com/mathgeek/isprime.html

This algorithm is deliberately unpatented. The BSD license above also
lets you use it even in commercial code.
*/

static const unsigned char
SPRPlookup[1024]={5,6,11,15,76,15,11,26,10,35,29,13,
22,6,14,3,21,24,30,13,6,11,12,6,11,11,5,11,19,3,13,
14,3,6,17,26,6,13,13,13,2,6,5,2,11,15,7,3,10,37,43,
18,33,5,37,13,5,3,37,14,26,5,6,15,19,45,11,24,2,6,
23,2,11,10,2,15,5,5,15,15,15,55,5,92,10,5,18,10,2,
66,22,21,2,42,13,5,17,3,2,6,2,3,2,13,3,14,14,42,6,7,
39,6,15,3,24,5,3,11,14,22,11,5,26,22,14,2,28,6,19,7,
7,10,2,45,40,11,31,2,11,17,33,12,23,2,14,10,13,15,
13,6,3,3,3,38,7,23,18,7,6,10,10,5,11,5,6,2,2,11,6,
12,21,5,43,6,11,15,21,55,38,13,2,2,17,10,13,13,13,7,
10,5,13,3,29,14,11,23,18,5,2,7,2,14,10,2,2,20,10,7,
18,7,6,5,3,10,15,57,18,10,26,12,13,6,17,6,3,3,10,13,
15,44,35,30,26,13,2,13,2,31,37,5,10,31,2,15,5,7,17,
33,17,2,21,23,7,5,17,6,40,17,11,56,30,30,10,3,15,11,
2,15,3,5,2,2,17,22,2,10,2,7,12,13,5,22,5,20,17,7,6,
18,18,6,19,6,5,21,24,2,10,2,2,11,19,42,2,5,17,7,2,42,
7,41,5,5,23,15,11,2,34,10,24,11,12,14,11,2,15,30,28,
19,6,10,12,5,6,45,7,13,11,12,15,13,11,13,2,23,2,33,
15,15,11,10,24,26,13,31,13,7,11,23,5,12,17,14,7,29,
29,15,22,19,14,11,15,19,2,15,2,28,13,20,14,15,60,5,
2,12,69,24,20,7,18,2,13,5,6,10,14,6,24,3,11,23,5,13,
88,13,14,19,3,5,3,21,2,11,6,31,42,18,73,2,7,3,2,3,23,
11,6,6,5,6,14,11,2,11,7,10,7,38,39,5,15,2,19,5,7,22,
13,7,11,24,10,20,14,3,2,10,15,3,13,15,29,11,10,3,6,5,
30,13,26,6,6,17,29,10,19,15,19,21,46,6,17,3,14,3,2,6,
17,26,34,2,11,5,29,19,3,2,19,7,11,14,29,15,7,33,14,21,
12,7,17,15,5,5,7,14,35,21,15,11,15,3,18,3,11,17,7,21,
20,3,15,11,10,3,24,47,10,5,10,5,3,42,6,2,10,21,3,2,2,
5,14,7,3,2,14,15,11,5,12,13,15,15,5,11,10,6,5,7,6,19,
5,2,6,15,5,6,3,2,24,10,10,11,14,11,7,13,3,7,3,18,20,
30,33,17,30,17,3,35,5,5,17,46,6,26,6,20,40,5,10,12,7,
17,5,12,2,6,5,10,5,7,7,7,3,17,11,10,34,10,15,2,6,3,11,
23,13,6,2,18,14,10,7,6,10,7,10,77,6,17,10,5,5,18,17,5,
15,2,11,3,10,13,28,21,20,23,6,7,14,11,3,6,18,5,5,10,6,
23,3,3,2,6,2,15,6,10,5,15,21,7,6,3,7,12,5,3,13,17,20,
21,15,7,2,14,40,20,5,11,7,7,10,10,31,11,13,21,43,3,14,
22,10,22,23,2,10,10,3,7,3,20,7,5,15,7,10,2,30,52,10,
56,14,14,2,2,3,7,5,5,10,3,10,7,15,13,3,89,21,11,5,14,
13,13,11,40,23,21,6,3,20,14,24,6,2,28,6,5,5,6,5,20,7,
13,29,26,31,5,11,3,24,5,7,5,29,38,3,19,13,6,2,22,10,7,
22,5,29,7,11,6,3,2,37,7,7,6,19,12,2,91,7,2,6,56,52,7,
11,10,31,2,6,11,78,34,15,2,53,14,22,31,22,41,2,23,6,
39,6,31,15,15,10,7,30,2,19,6,43,11,38,6,6,3,26,7,10,
11,5,2,11,5,22,22,3,15,2,5,6,24,10,17,7,10,11,2,23,13,
5,11,11,6,21,18,3,21,10,21,2,11,5,7,44,43,31,10,13,3,
2,10,3,7,2,47,13,12,2,13,5,11,2,5,7,11,19,5,10,17,22,
26,2,2,7,2,2,6,6,3,6,22,15,13,57,13,12,2,20,3,5,13,3,
13,15,6,15,2,21,11,10,11,15,2,22,23,10,6,44,7,2,10,5,
31,6,2,5,22,15,5,17,11,2,6,6,13,23,6,5,11,7,28,29,22,
2,7,11,6,35,24,2,3,2,17,13,3,6,2,14,5,11,11,7,23,13,11,
2,13,13,10,12,23,10,18,5,60,10,5,26,5,29,3,20,7,21,47};


/* Computes a^n mod m. Classic binary power ladder evaluation.
It may be possible to optimize this based on your architecture, depending especially on its integer % speed, and/or support for native 64 bit ints. */
unsigned long powmod(unsigned long a, unsigned long n, unsigned long m)
{
unsigned long long result=n&1 ? a : 1;
unsigned long long aa=a;
n=n>>1;

while (n) {
// note 64 bit math used for intermediate answers before mod.
aa=(aa*aa)%m;
if (n&1) result=(aa*result)%m;
n=n>>1;
}
return (unsigned long)result;
}

static int isSPRP(unsigned long n, unsigned long a)
{
unsigned long d=n-1;
unsigned long ad;
long s=0;

// break down n-1 into d*(2^s). Linux ffs() can be better
while (0==(d&1)) {
++s; d>>=1;
}

ad=(powmod(a, d, n)); // (a^d) mod n

if (ad==1) return 1; // 1 == a^d mod n
if (s>0 && ad==(n-1)) return 1; // -1 == a^d mod n (tests (a^d)^(2^0))
for (unsigned long r=1; r<s; ++r) {
// ad is currently ad^(2^(r-1)) so we square it to get ad^(2^r));
ad=(((unsigned long long)ad)*ad)%n;
if (ad==(n-1)) return 1; //tests (a^d)^(2^(r+1)))
}
return 0; // false, composite
}

int isPrime(unsigned long v)
{
/* this initial screening test is OPTIONAL, and on many platforms will
give faster rejection of most composites without a full SPRP test. */
if (0==v%2) return v==2;
if (0==v%3) return v==3;
if (0==v%5) return v==5;
if (0==v%7) return v==7;
if (v<121) return true;

// Hash the value to find the lookup table index for a single SPRP test.
return isSPRP(v, (unsigned long)SPRPlookup[(0x3AC69A35*v)>>22]);
}

#include <stdio.h>
int main(int argc, char **argv)
{
unsigned long i;

for (i=2; i<300; ++i)
if (isPrime(i)) printf("%ld is prime.\n", i);
return 0;
}

[/code]

zerothbase 2011-04-16 20:25

1 Attachment(s)
Attached is a code to extend this principle to 1e15 (using [URL="http://www.cecm.sfu.ca/Pseudoprimes/"]William Galway's list[/URL]) with only 2 miller-rabin tests.

It uses a SPRP-2, and a hash function to get a value from a table of 2048 "magic" values. Just for fun, the hash uses a very pi-like prime and a HHGTTG-shift.

Since it uses 64-bit numbers, powmod is required to get into 128 bit math for each multiplication to use the basic [URL="http://en.wikipedia.org/wiki/Modular_exponentiation#Right-to-left_binary_method"]Right-to-left Binary Method[/URL]. Thus I have two powmod methods: one that uses [URL="http://gmplib.org/"]GMP[/URL], and one that uses a mulmod to do the multiplications. The code, as written, just makes use of the GMP method (since it far faster than using powmod2/mulmod) and the powmod2 is simply ignored. This is definitely the weakest part of the entire code. Anyone have a faster way to do 64-bit powmod in 64-bits with basic C code? I know the entire code would be sped up significantly if I just kept all the numbers in GMP/mpz_t's, but I was trying to isolate GMP as much as possible so it could be removed if I found a short/fast 64-bit powmod.

The code translates directly to Java as well (using BigInteger for powmod and long for ullint). The hash results are the same in Java as in C (even though Java uses signed 64-bit longs).

--Jim Sinclair

CRGreathouse 2011-04-16 21:01

Have you thought of extending this to 2^64 with Feitsma's data?
[url]http://gilchrist.ca/jeff/factoring/pseudoprimes.html[/url]

WraithX 2011-04-16 22:02

[QUOTE=zerothbase;258713]Attached is a code to extend this principle to 1e15 (using [URL="http://www.cecm.sfu.ca/Pseudoprimes/"]William Galway's list[/URL]) with only 2 miller-rabin tests.

It uses a SPRP-2, and a hash function to get a value from a table of 2048 "magic" values. Just for fun, the hash uses a very pi-like prime and a HHGTTG-shift.

Anyone have a faster way to do 64-bit powmod in 64-bits with basic C code? I know the entire code would be sped up significantly if I just kept all the numbers in GMP/mpz_t's, but I was trying to isolate GMP as much as possible so it could be removed if I found a short/fast 64-bit powmod.
--Jim Sinclair[/QUOTE]

I am very curious how you found the multiplier and the list of 2048 magic values. Can you share? I was working on Feitsma's database to try and create a 2-Miller-Rabin test just like you have created. But, I stuck with primes, and only the first couple of hundred at that.

Here is the code that I use to do mulmod64 and powmod64. I haven't actually tested it against a GMP implementation, but I think it should be faster since it just uses the native machine type. As long as you have a 64-bit C compiler that supports inline asm, this should work just fine for you.

[CODE]
u64_t mulmod64(u64_t a, u64_t b, u64_t c)
{
u64_t d; /* to hold the result of a*b mod c */

/* calculates a*b mod c, stores result in d */
asm ("mov %1, %%rax;" /* put a into rax */
"mul %2;" /* mul a*b -> rdx:rax */
"div %3;" /* (a*b)/c -> quot in rax remainder in rdx */
"mov %%rdx, %0;" /* store result in d */
:"=r"(d) /* output */
:"r"(a), "r"(b), "r"(c) /* input */
:"%rax", "%rdx" /* clobbered registers */
);
return d;
}/* method mulmod64 */

/*================================================================================*/

/* powmod64 from Geoff Reynolds code at */
/* http://www.geocities.com/g_w_reynolds/srsieve/ */
u64_t powmod64(u64_t b, u64_t n, u64_t p)
{
u64_t a=1;
goto tst;

mul:
a = mulmod64(a,b,p);

sqr:
b = mulmod64(b,b,p);
n >>= 1;

tst:
if (n & 1)
goto mul;
if (n > 0)
goto sqr;

return a;
}/* method powmod64 */
[/CODE]

zerothbase 2011-04-17 00:31

[QUOTE=CRGreathouse]Have you thought of extending this to 2^64 with Feitsma's data?[/QUOTE]
Ya, but I haven't seen a straight download link for it the whole thing. Am I missing it? Has [URL="http://www.janfeitsma.nl/math/psp2/statistics"]Feitsma[/URL] completed it yet? The statistics page shows red (incomplete) for the >10^17. After having looked at 1e15, I suspect that I'd need more miller-rabin tests though. Perhaps 4 or 5 bases? I'd have to play with it to know for sure.

[QUOTE=WraithX]how you found the multiplier[/QUOTE]
Just made it up. I just took a few digits of Pi and found the next prime number after it (turned out to be the next number after 3141592653589793238 lol). Basically you can use pretty much any large prime number; it's related to [URL="http://en.wikipedia.org/wiki/Linear_congruential_generator"]Pseudorandom LCG[/URL]. My choice doesn't follow all the advice on a perfect LCG, but it distributes the numbers well enough.

[QUOTE=WraithX]and the list of 2048 magic values[/QUOTE]
Basically we know that 2 is a good enough witness for every number <1e15 other than those on the list of 419489 SPRPs.

First step was to create the hash function (just using whatever) and verify that it distributes the 419489 SPRP-2's fairly evenly regardless of what i use for bucket size (it does).

Step 2, I started with 64 buckets, and ran all the numbers in bucket 1 (a random bucket) through using 3, 4, 5... up to about a million as the second witness before I killed it. For each witness I kept track of the number of SPRPs that failed. The count of SPRPs that failed varied a lot but seemed to have a bell curve of some kind. With 64 buckets there was no witness less than 1 million that "solved" bucket 1, or got anywhere close. There were usually hundreds left in bucket 1 unsolved. So I needed to break the buckets down into smaller buckets.

Step 3, double the number of buckets (128...256...) and repeat step 2 until I could reasonably expect to get a solved bucket for every bucket. That is, when the average number of liars for a given witness was something like 5 to 10. With 1024 buckets it is close, but 2048 was guaranteed. If I played more with the hashing function and such, I could probably narrow it down, but I was being lazy and did this yesterday afternoon just messing around.

Step 4, for all 2048 buckets of SPRPs, find the least number that "solved" the bucket (0 liars left). This is your "magic" value for each bucket.

Step 5 verify that the code gets every answer correct up to the highest witness in the 2048 list (which is 1,095,558). The reason for this is that there is a small chance that a random number could hash to a value that is equal to itself or a multiple of itself. For instance, 1219 hashed to a bucket with 1219 in it; this would obviously fail. There were about 5 of these values that had to be replaced out of the 2048. I just did step 4 again and used a different value for that bucket (and reverified).

After this, we are good. We know that miller-rabin always returns the answer "prime" when a number is prime (assuming the number isn't tested with itself or a multiple). So all prime numbers <1e15 will return correctly. We know the composite SPRP-2 liars, and I verified them all with the hashed 2048 magic numbers.

[QUOTE=WraithX]u64_t mulmod64(u64_t a, u64_t b, u64_t c)[/QUOTE]
I love it - works great. Probably just my environment, but I seem to get random floating point exceptions when I try to use the method generically for 33-bit to 64-bit numbers. I get them more frequently when I use numbers closer to 64-bit (on an exponential curve). So for numbers <1e15, I haven't seen an exception yet. The function is, of course blazing fast (about 30x faster than GMP in my env). I'm using Linux Mint 10 in Virtual Box 4.0.4.

I'd still like to see a C-only implementation of mulmod though for maximum portability. That is, without basically building an entire MP-library in place or using an Intel/AMD specific instruction. I guess I just suspect that there must be a number-theoretic way of doing this easily. I've tried reading some of web info on Montgomery reduction (don't know if that would even help), but it was beyond my Friday afternoon skills for number theory. Perhaps another day.

CRGreathouse 2011-04-17 02:37

[QUOTE=zerothbase;258732]Ya, but I haven't seen a straight download link for it the whole thing. Am I missing it? Has [URL="http://www.janfeitsma.nl/math/psp2/statistics"]Feitsma[/URL] completed it yet? The statistics page shows red (incomplete) for the >10^17. After having looked at 1e15, I suspect that I'd need more miller-rabin tests though. Perhaps 4 or 5 bases? I'd have to play with it to know for sure.[/QUOTE]

Yes, I think Will Galway has the full list on his site -- presumably Jan doesn't have the bandwidth. I downloaded it some months ago. (If you can't otherwise locate it, I suppose I could get it to you, but it's a *lot* of data and if you can get it from Galway so much the better.)

zerothbase 2011-04-17 02:59

Found it (with some creative Googling) and downloading now. thanks! I'll take a look and play with it a little bit.

zerothbase 2011-04-19 23:00

Fast isPrime for n < 2^64
 
1 Attachment(s)
Fast isPrime for n < 2^64 attached.

Three static SPRP/Miller-Rabin tests w/ a hash to find the 4th test.

It seemed a little too easy to find this. I think with more computation time I could find a way to reduce either the number of tests to 3 total or the size of the hash (just a hunch). But it's a good starting point to get a 100% guaranteed test for 64-bit numbers (assuming the completeness of Feitsma's data).

Note: mulmod is Intel 64-bit assembly. Substitute as needed for cross platform/architecture.

Thanks WraithX for the mulmod implementation, and SPWorley for the inspiration.

CRGreathouse 2011-04-20 01:04

I would really like to see a version that reduces the tests to 3 -- or perhaps just 2 plus some trial division? The latter would need a really big hash...

geoff 2011-04-20 05:18

Here is some SPRP code I put together for the AP26 project. You can find the full source at [url]http://sites.google.com/site/geoffreywalterreynolds/programs/ap26[/url], look in the PrimeQ_x86.h header.

The mulmod function only works up to 2^62 (or 2^52 on some machines), but it is easy to replace with another one that works up to 2^64 (There are much faster methods than the mul/div asm version). If you are doing multiple SPRP tests on the same number then it would probably be worthwhile writing a vector version of the function that tests all bases in parallel.

[code]
/* Returns r = a*b (mod p) in the range 0 <= r < p.
Assumes 0 <= a,b < p < 2^LDBL_MULMOD_LIMIT.
Assumes inv = 1.0L/p.
*/
static int64_t mulmod(int64_t a, int64_t b, int64_t p, long double inv)
{
int64_t q, r;
long double x, y;

/* ab (mod p) = ab-pq, where q = floor(ab/p) */

x = a;
y = b;
q = x*y*inv;
r = a*b - q*p;

/* Because the default FPU rounding mode is round-to-nearest the result
could be out by +/-p. One of these branches could be eliminated by
using a non-default rounding mode. (Round-to-zero is ideal I think).
*/
if (r < 0)
r += p;
else if (r >= p)
r -= p;

return r;
}


/* Returns 0 only if N is composite.
Otherwise N is a strong probable prime to base a.
Assumes N odd, a < N < 2^LDBL_MULMOD_LIMIT.
*/
static int strong_prp(int64_t a, int64_t N)
{
long double inv;
int64_t r;
uint64_t d;
unsigned int s, t;

/* If N is prime and N = d*2^t+1, where d is odd, then either
1. a^d = 1 (mod N), or
2. a^(d*2^s) = -1 (mod N) for some s in 0 <= s < t */

inv = 1.0L/N;

#ifdef __GNUC__
t = __builtin_ctzll(N-1);
d = (uint64_t)N >> t;
#else
for (d = (uint64_t)N >> 1, t = 1; !(d & 1); d >>= 1, t++)
;
#endif

/* r <-- a^d mod N, assuming d odd */
for (r = a, d >>= 1; d > 0; d >>= 1)
{
a = mulmod(a,a,N,inv);
if (d & 1)
r = mulmod(r,a,N,inv);
}

/* Clause 1. and s = 0 case for clause 2. */
if (r == 1 || r == N-1)
return 1;

/* 0 < s < t cases for clause 2. */
for (s = 1; s < t; s++)
if ((r = mulmod(r,r,N,inv)) == N-1)
return 1;

return 0;
}
[/code]

R.D. Silverman 2011-04-20 14:06

[QUOTE=CRGreathouse;259055]I would really like to see a version that reduces the tests to 3 -- or perhaps just 2 plus some trial division? The latter would need a really big hash...[/QUOTE]

You really should consider just a SINGLE MR test followed by either a
Lucas test or a Frobenius-Grantham. There is no known pseudoprime
for a single MR followed by a single Lucas test. (Preda Mihailescu
and I looked for one without success).

If the objective is proved primes, one can just trial divide N-1 and N+1
to N^(1/6), and apply the Selfridge-Lehmer-Brillhart test that allows
one to prove primality if enough factors of N-1 and N+1 are known.


All times are UTC. The time now is 21:10.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.