I am not sure if you understood the algorithm correctly.

Here is the pseudocode

Code:

Given a base b, nmin, nmax, prime p to sieve.
Choose a value C where C<=nmax-nmin && (nmax-nmin+1)%C==0; need to experimentally find the best C value so code is fastest. This will be constant for all primes. Can adjust nmax so it solves the second modular part.
C=0 or C=nmax-nmin are not the best solutions. The best solution is somewhere in between based on efficiency of the code. (see ***)
Then generate B such that b*B=1 (mod p)
Then create an Array Invs[] = B^0, B^1,..., B^C (mod p)
Then create an Array Invsm[] = p-B^0, p-B^1,..., p-B^C (mod p)
Then we create a loop starting at X=nmin with step C upto nmax (e.g. for(int X=nmin; X<nmax; X=X+C)) {
We generate b^X and X*b^X (in each iteration of the loop can calculate B^lastX*B^C=B^CurrentX and then multiply by X)
We create an array Array MaxDiff []= 0*b^X, 1*b^X, ...., m*b^X (mod p; modular addition)
m would be around 50 for base 2. Could be smaller or larger based on density of remaining candidates in file. Similar to size of MAX_POWERS in current code. Use mean and standard deviations to find best m value. The small values 1*b^X are rarely used and might not need them.
Now we find all remaining candidates in the sieve between n=X and X+C (read from a bitmap)
Let the first candidate be n=r1
Then we add X^b^x+MaxDiff[r1-X] (mod p) = y
Check if y==Invs[r1-X] (for Woodall) and y==Invsm[r1-X] (for Cullen) --> Print factors
Let the next candidate be n=r2
Then we add y+MaxDiff[r2-r1] (mod p) = y
Check if y==Invs[r2-X] (for Woodall) and y==Invsm[r2-X] (for Cullen) --> Print factors
and so on till we reach the last term under X+C
Need to check if difference between 2 consecutive remaining terms is greater than Size of MaxDiff then we would need an extra step (similar to current code)
}// end loop
Best Q can further help as some values in Array Invs and Invsm might not need to be generated. Need to choose C value that is a multiple of Best_Q and adjust the nmin value so it is also a multiple of the Best_Q.
*For odd bases MaxDiff only needs 1/2 the values - otherwise should not apply Best_Q to MaxDiff
Why are you converting from double to uint64_t? You can do modular addition in double type.
***Notes on calculating best C value
Work done = C/q mulmod in first loop + N_range/C*(2 mulmod + 50 add and 50 subtract for MaxDiff array) in second loop.
Where q is adjustment due to Best_Q improvements
50 add+ 50 subtract = t mulmod (base on CPU time)
Then we have C/q + (N_range* (2+t))/C needs to be the smallest so work done is fastest.
Then C= sqrt((N_range*(2+t)*q)) is the best solution.
Then Array Invs should be of size ~= sqrt(n_range)*sqrt((2+t)*q) for best results.

I hope this makes more sense.