I see. This is how I have commented the non-AVX code:

Code:

// We're trying to determine if n*b^n (mod p) = +1/-1. This requires an two
// operations, an exponentiation (b^n) and a multiplcation (n). By precomputing
// certain values, we can actually replace the exponentiation and mulitplication
// with array lookups. Doing this requires us to compute the multiplicative inverse
// of b and to build two arrays. The multiplicative inverse of b (which I will call B)
// is the number such at b*B = 1 mod p.
//
// We start with our n where n=minN:
// n*b^n (mod p) = +1/-1
// -> n*b^n*B^n (mod p) = +1/-1 * B^n (mod p)
// -> n (mod p) = +1/-1 B^n (mod p)
//
// Find all distinct r staring with n=minN where n+r is the next term until n=maxN.
//
// Pre-compute B^r (mod p) for all possible r.
// Pre-compute n*b^n + r*b^n (mod p) for all possible r.
//
// For each successive term:
// (n+r)*b^(n+r) (mod p) = +1/-1
// -> (n+r)*b^(n+r)*B^r (mod p) = +/-B^r (mod p)
// -> (n+r)*b^n (mod p) = +/-B^r (mod p)
// -> n*b^n + r*b^n (mod p) = +/-B^r (mod p)
//
// Since both n*b^n + r*b^n (mod p) and +/-B^r (mod p) have been pre-cmoputed
// we just need to compare elements of two arrays.
//
// If n*b^n + r*b^n (mod p) = -B^r, then p is a factor of n*b^n+1.
// If n*b^n + r*b^n (mod p) = +B^r, then p is a factor of n*b^n-1

I think that your suggestion for bestQ is not necessary as I build a table of "the covering set" of r infrequently.

There are now two loops for each group of primes. One to compute B^r (mod p) and n*b^n + r*b^n (mod p) and the one that detects if we have a factor. This has not been tested yet, so I have likely made mistakes along the way. Because of the need to convert so many doubles back to uint64_t, I'm not certain that an AVX version will be faster than this. That will need some testing. On the GPU side, I can make additional changes to eliminate unused elements of the left and right arrays.

Code:

while (ii_Terms[termIndex] > 0)
{
theN = ii_Terms[termIndex];
diff = prevN - theN;
for (pIdx=0; pIdx<4; pIdx++)
{
// If n*b^n + r*b^n (mod p) = -B^r, then p is a factor of n*b^n+1.
// If n*b^n + r*b^n (mod p) = +B^r, then p is a factor of n*b^n-1
if (il_LeftTerms[DIFF_INDEX(diff)+pIdx] == il_RightTerms[DIFF_INDEX(diff)+pIdx])
if (ip_CullenWoodallApp->ReportFactor(ps[pIdx], theN, -1))
VerifyFactor(ps[pIdx], theN, -1);
if (il_LeftTerms[DIFF_INDEX(diff)+pIdx] == ps[pIdx] - il_RightTerms[DIFF_INDEX(diff)+pIdx])
if (ip_CullenWoodallApp->ReportFactor(ps[pIdx], theN, +1))
VerifyFactor(ps[pIdx], theN, +1);
}
prevN = theN;
termIndex++;
};