Quote:
Originally Posted by Citrix
I was able to compile sr5sieve. 5040 seemed the fastest. (Btw 1440 has 36 factors, I think you meant 5040.)

Yes my mistake, 5040 has 60 divisors. The ideal setting for POWER_RESIDUE_LCM will be larger if there are fewer sequences in the sieve, or if the nrange is larger, or the BSGS routine is slower.
At the moment POWER_RESIDUE_LCM must be less than 2^15, this can be extended but needs other changes to be made to the source.
Quote:
My compiled exe only reached 250kp/sec, but the exe I downloaded from your website gives me 350kp/sec. Do you know why?
I used gcc to compile using the make file for pentium 4.

I compiled the 1.5.x versions with `make ARCH=x86intel CC="gcc V3.4"', for some reason I don't understand yet GCC version 3.4 is a bit faster than version 4.1 when compiling the 1.5.x code (It was the other way around with the 1.4.x code).
Quote:
2) t=p1%1440. We do SPH over the t value. It would be nice if this could be set as a switch so that the POWER_RESIDUE_DIVISORS, POWER_LCM etc can be modified, so memory needs can be tested out.
Btw, why do you have to look stuff up a table, can't you factorize the p1%1440 over small integers?

Dividing a 64bit number on a 32bit CPU is extremely slow, it is worthwhile using just about any trick you can think of to avoid it.
I said before that the algorithm used information about the factors of gcd(p1,720), but in fact it just needs to know the greatest r dividing 720 such that p=1 (mod r). For example, if p=1 (mod 120) then the algorithm doesn't need to know about the factors of 120.