Quote:
Originally Posted by Kosmaj
Two questions/suggestions.
(1)While doing this are you using the fact that k must be a multiple of several first primes, for example, for DodecaProths k must be an odd multiple of 3*5*7 for all n, while for some n (like 58 above) it also must be a multiple of 11. Then you can increase step by including up to 3 (or 4) more primes. By doing this you can also reduce the values kept in internal variables, for example instead of k you can keep k/105 and thus increase the range of k while using the same int type.

Yes that's included in dodeca_2_0 and 2_5, but not dodeca_1_0 ( that was the reason that this was too slow! in some cases ). Now I'm using g==105 mod 210 see the code, note that I'm not using your multiple 11 trick ( I've observed this also before ), because this is really a zero speedup ( OK less than 0.1% ).
Quote:
Originally Posted by Kosmaj
(2) In the second step are you sure you are not oversieving? It is possible that sieving above a certain value removes only a small portion of candidates and is therefore more time consuming than prp test because modular division ("%" in C) requires many cpu cycles. Maybe in case of DodecaProths we can stop sieving at 10000 instead of going to 32000 (there are more than 2200 primes between 10k and 32k)?

No there is no oversieve! ( note there is also no oversieve in octo ). In fact there is a veryvery small oversieve is still exist ( say we examine one or two more numbers for every good g value ). When I've written this dodeca program I've just rewritten the original octo program so the two programs contains the same ideas.
OK perhaps we can choose smaller parameters. The reason why I've choosen 32000 that in this case I can use unsigned and signed short variables ( because 32000<2^15=32768 ).
See my quick computations:
my suggestion: use prime limit=2000, block length=8000, magic_constant=8000. Then we can get much more prp tests, but how many?
This is simple: (ln(32000)/ln(2000))^8=12.04 times more prp tests, this will be still not much ( note that we are testing very small n values ).
But what can we gain from sieve? It's also simple by Merten's theorem:
it would be: lnln(32000)/lnln(2000)=1.15 times faster. So we can get in theory an 15% faster sieve. But there is an additional speedup, because in this case 4*prime limit=block length, so there would be much less initialization in the sieve.
The overall Ram is also much smaller, probable all of them will fit in the very fast L1 cache?
Tomorrow I'll try these parameters, now I'm running octo program, all night.