View Single Post
Old 2020-05-27, 15:41   #24
rogue's Avatar
Apr 2003
Between here and the

17·347 Posts

Originally Posted by Citrix View Post
// We're trying to determine if n*b^n (mod p) = +1/-1.  This requires an two
// operations, an exponentiation (b^n) and a multiplication (n).  We can make a
// change to eliminate one of those operations and convert the second one into addition which 
// should be faster. Doing this requires us to
// compute the multiplicative inverse of b.  The multiplicative inverse of
// b (which I will call B) is the number such at b*B = 1 mod p.  Here is how
// the algorithm works.
// We start with our number:
//       n*b^n (mod p) = +1/-1
// We then generate B such that b*B=1 (mod p)
// To get then next term (which I will call N) where N>n , we need to
// check the following  
//	   N=n+r
//       N*b^N(mod p) = +1/-1
//       (n+r)*b^(n+r) (mod p) = +1/-1
// We multiply both sides by B^r
//       (n+r)*b^(n+r))B^r (mod p) = +B^r/-B^r
//       (n+r)*b^n (mod p) = +B^r/-B^r
//       n*b^n + r*b^n (mod p) = +B^r/-B^r
// We already know n*b^n and b^n are from the previous step.

// (Array 1) Now we pre-compute +-B^r for a range of r since we will be using it multiple times (1 to rmax = M*sqrt(nmax-nmin))
// (Array 2) for each n*b^n in the loop we pre-compute r*b^n for small consecutive values of r (1 to rdiffmax) – can be done by modular addition
// We can calculate each successive candidate by just adding a term from Array 2 and checking it with corresponding term from Array 1.

M for size of Array 1 can be adjusted for maximum efficiency depending on efficiency of the loops.

// once we reach rmax then we multiply last term by b^rmax and b^n by b^rmax 
   n*b^n + rmax*b^n (mod p)
   (n+rmax)* b^n (mod p)
   (n+rmax)* b^n*b^rmax(mod p)
   n'*b^n'(mod p)
   and also generate b^n'=b^n*b^rmax. (mod p)
// We can now repeat the loop again starting at n'.

Other suggestions
Your current code takes into account n values left mod 2 but not for other higher primes.
Code can be made faster if you only look at subsequences left for mod 6 or higher number. BestQ code from srsieve can be used here. 
Depending on size of subsequence being sieved we can adjust size of the loops for maximum efficiency.

On a GPU if memory storage is a concern Array 1 can be reduced in size and Array 2 might not be worth saving. This will be still faster than current code.
It took me a while, but I believe I see where you are going with this. Obviously as the range of n increases, so does the benefit.

Memory is definitely a concern on the GPU, but I think that something similar could be done there as well.

I will table your other suggestions because I don't understand them. Someone else (maybe you) would be interested implementing.
rogue is offline   Reply With Quote