Quote:
Originally Posted by Citrix
Code:
// We're trying to determine if n*b^n (mod p) = +1/1. This requires an two
// operations, an exponentiation (b^n) and a multiplication (n). We can make a
// change to eliminate one of those operations and convert the second one into addition which
// should be faster. Doing this requires us to
// compute the multiplicative inverse of b. The multiplicative inverse of
// b (which I will call B) is the number such at b*B = 1 mod p. Here is how
// the algorithm works.
//
// We start with our number:
// n*b^n (mod p) = +1/1
// We then generate B such that b*B=1 (mod p)
//
// To get then next term (which I will call N) where N>n , we need to
// check the following
// N=n+r
// N*b^N(mod p) = +1/1
// (n+r)*b^(n+r) (mod p) = +1/1
// We multiply both sides by B^r
// (n+r)*b^(n+r))B^r (mod p) = +B^r/B^r
// (n+r)*b^n (mod p) = +B^r/B^r
// n*b^n + r*b^n (mod p) = +B^r/B^r
// We already know n*b^n and b^n are from the previous step.
// (Array 1) Now we precompute +B^r for a range of r since we will be using it multiple times (1 to rmax = M*sqrt(nmaxnmin))
// (Array 2) for each n*b^n in the loop we precompute r*b^n for small consecutive values of r (1 to rdiffmax) – can be done by modular addition
// We can calculate each successive candidate by just adding a term from Array 2 and checking it with corresponding term from Array 1.
M for size of Array 1 can be adjusted for maximum efficiency depending on efficiency of the loops.
// once we reach rmax then we multiply last term by b^rmax and b^n by b^rmax
n*b^n + rmax*b^n (mod p)
(n+rmax)* b^n (mod p)
(n+rmax)* b^n*b^rmax(mod p)
n'*b^n'(mod p)
and also generate b^n'=b^n*b^rmax. (mod p)
// We can now repeat the loop again starting at n'.
Other suggestions
Your current code takes into account n values left mod 2 but not for other higher primes.
Code can be made faster if you only look at subsequences left for mod 6 or higher number. BestQ code from srsieve can be used here.
Depending on size of subsequence being sieved we can adjust size of the loops for maximum efficiency.
On a GPU if memory storage is a concern Array 1 can be reduced in size and Array 2 might not be worth saving. This will be still faster than current code.

It took me a while, but I believe I see where you are going with this. Obviously as the range of n increases, so does the benefit.
Memory is definitely a concern on the GPU, but I think that something similar could be done there as well.
I will table your other suggestions because I don't understand them. Someone else (maybe you) would be interested implementing.