View Single Post
Old 2007-05-25, 03:37   #25
geoff's Avatar
Mar 2003
New Zealand

13·89 Posts

Originally Posted by Citrix View Post
I was able to compile sr5sieve. 5040 seemed the fastest. (Btw 1440 has 36 factors, I think you meant 5040.)
Yes my mistake, 5040 has 60 divisors. The ideal setting for POWER_RESIDUE_LCM will be larger if there are fewer sequences in the sieve, or if the n-range is larger, or the BSGS routine is slower.

At the moment POWER_RESIDUE_LCM must be less than 2^15, this can be extended but needs other changes to be made to the source.

My compiled exe only reached 250kp/sec, but the exe I downloaded from your website gives me 350kp/sec. Do you know why?

I used gcc to compile using the make file for pentium 4.
I compiled the 1.5.x versions with `make ARCH=x86-intel CC="gcc -V3.4"', for some reason I don't understand yet GCC version 3.4 is a bit faster than version 4.1 when compiling the 1.5.x code (It was the other way around with the 1.4.x code).

2) t=p-1%1440. We do SPH over the t value. It would be nice if this could be set as a switch so that the POWER_RESIDUE_DIVISORS, POWER_LCM etc can be modified, so memory needs can be tested out.
Btw, why do you have to look stuff up a table, can't you factorize the p-1%1440 over small integers?
Dividing a 64-bit number on a 32-bit CPU is extremely slow, it is worthwhile using just about any trick you can think of to avoid it.

I said before that the algorithm used information about the factors of gcd(p-1,720), but in fact it just needs to know the greatest r dividing 720 such that p=1 (mod r). For example, if p=1 (mod 120) then the algorithm doesn't need to know about the factors of 120.
geoff is offline   Reply With Quote