Are you factoring F1 completely before testing candidates with powermod? Alex Kruppa & I used the following to try and balance time spent factoring against time spent on powermod.
[code:1]unsigned int ismersfac(unsigned int f) {
unsigned int fm1 = (f1) >> 1;
unsigned int p, d;
int earlyout = 0;
if (f == 3) return 2;
if (f == 7) return 3;
if (f == 31) return 5;
while ((fm1 & 1) == 0) { fm1 >>= 1; earlyout = 1; }
while (fm1 % 3 == 0) { fm1 /= 3; earlyout = 1; }
while (fm1 % 5 == 0) { fm1 /= 5; earlyout = 1; }
if (earlyout && pow2mod(fm1, f) != 1) return 0;
for (p = 7, d = 1; p*p <= fm1; p += mod30diff[d], d = (d+1)&7) {
if (fm1 % p == 0) {
if (pow2mod(p, f) == 1) return p;
while (fm1 % p == 0) fm1 /= p;
if (pow2mod(fm1, f) != 1) return 0;
}
}
if (pow2mod(fm1, f) == 1)
return fm1;
return 0;
}
[/code:1]
The 2,3,5 earlyout sieving could be continued until p > log2(fm1), since F can't be a factor of a smaller primeexponent mersenne.
My demo program took less than 10 seconds to generate the 3320+ primes +1 mod 8 in a 172800 wide block at 515396390401. Which is pretty good for javascript.
Of course, that is at 2982618 * 172800 using 32 bit vars, and you're at 27328509738828965 * 172800, so it *may* ;) slow down a bit more. But since the difference between Atkins and regular sieving increases with F, I don't think you can check the primality of 23000 F values quicker.
If you don't want to port my code into yours, I would strongly suggest you use something similar to my function filter() to sieve a large block of F values at once. (I filter out q = k*p^2, you want q = k*p.)
Bruce
