View Single Post
Old 2020-05-27, 04:26   #23
Citrix's Avatar
Jun 2003

1,571 Posts
Default gcwsieve enhancements

// We're trying to determine if n*b^n (mod p) = +1/-1.  This requires an two
// operations, an exponentiation (b^n) and a multiplication (n).  We can make a
// change to eliminate one of those operations and convert the second one into addition which    // should be faster. Doing this requires us to
// compute the multiplicative inverse of b.  The multiplicative inverse of
// b (which I will call B) is the number such at b*B = 1 mod p.  Here is how
// the algorithm works.
// We start with our number:
//       n*b^n (mod p) = +1/-1
// We then generate B such that b*B=1 (mod p)
// // To get then next term (which I will call N) where N>n , we need to
// check the following  
//	   N=n+r
//       N*b^N(mod p) = +1/-1
//       (n+r)*b^(n+r) (mod p) = +1/-1
// We multiply both sides by B^r
//       (n+r)*b^(n+r))B^r (mod p) = +B^r/-B^r
//       (n+r)*b^n (mod p) = +B^r/-B^r
//       n*b^n + r*b^n (mod p) = +B^r/-B^r
// We already know n*b^n and b^n are from the previous step.

// (Array 1) Now we pre-compute +-B^r for a range of r since we will be using it multiple times (1 to rmax = M*sqrt(nmax-nmin))
// (Array 2) for each n*b^n in the loop we pre-compute r*b^n for small consecutive values of r (1 to rdiffmax) – can be done by modular addition
// We can calculate each successive candidate by just adding a term from Array 2 and checking it with corresponding term from Array 1.

M for size of Array 1 can be adjusted for maximum efficiency depending on efficiency of the loops.

// once we reach rmax then we multiply last term by b^rmax and b^n by b^rmax 
   n*b^n + rmax*b^n (mod p)
   (n+rmax)* b^n (mod p)
   (n+rmax)* b^n*b^rmax(mod p)
   n'*b^n'(mod p)
   and also generate b^n'=b^n*b^rmax. (mod p)
// We can now repeat the loop again starting at n'.

Other suggestions
Your current code takes into account n values left mod 2 but not for other higher primes.
Code can be made faster if you only look at subsequences left for mod 6 or higher number. BestQ code from srsieve can be used here. 
Depending on size of subsequence being sieved we can adjust size of the loops for maximum efficiency.

On a GPU if memory storage is a concern Array 1 can be reduced in size and Array 2 might not be worth saving. This will be still faster than current code.
Citrix is offline   Reply With Quote