![]() |
I have a faster version of the mulmod for PPC that does not save/restore registers. It should be about 10%-15% faster. I'll investigate doing the same for expmod, but the gain would be much smaller.
|
sr5sieve 1.4.17
This version has a SSE2 powmod function, it runs about 7% faster on my P4/Celeron.
[QUOTE=tnerual]i use one core, sieving for one k (285728), speed was 2600000 and now is 4650000 :geek: it's a 78% improve ... and factor density is the same after 1 day running so it may be true. [/QUOTE] This speed increase is correct, but it is not just because of SSE2. I made a change to the sieve so that it would take a quick exit if all subsequences were eliminated during the power-residue checking step. For the full base 5 sieve there are over 6000 subsequences and the chances of them all being eliminated is tiny, not even worth checking for, but for projects with just a few sequences in the sieve it can save a lot of unnecessary work (it gives a good speedup to SR base 4). [QUOTE=rogue] I have a faster version of the mulmod for PPC that does not save/restore registers. It should be about 10%-15% faster. I'll investigate doing the same for expmod, but the gain would be much smaller[/QUOTE] Nice one, this will be in the next version. Maybe the inline version won't be needed anymore, it was probably faster only because it didn't always have to save registers. |
sr5sieve 1.4.18
This one has rogue's improved ppc64 mulmod function, and some small improvements to the SSE2 powmod function.
For those of you working on other bases, it is probably much faster to use sr5sieve compiled for the base in question instead of using srsieve. This is because there have been many improvements in sr5sieve that I haven't got around to adding to srsieve. You will still need to use srsieve for factors smaller than the largest k value, but once that is done use the srsieve -aC options to write the output as an ABCD file and create a checkpoint file, rename the ABCD file it to sr<base>data.txt, and add a range starting at zero to sr<base>work.txt. sr<base>sieve should then continue from the checkpoint file. I can compile a Windows or Linux binary for any base you need, or if you have GCC and make then you can do it yourself by changing the definition of BASE in sr5sieve.h, setting the ARCH variable in Makefile, running make, and renaming the resulting sr5sieve executable. |
srsieve 0.6.4 bug fix
srsieve 0.6.2 fixes a longstanding bug that could cause the list of sequences to be corrupted if two or more were removed from the sieve at the same time. In at least one case this resulted in a sequence being incorrectly removed. Thanks rogue for finding this bug.
This version also includes the improved ppc64 and x86-64 mulmod code. |
sr1sieve 1.0.4
[url]http://www.geocities.com/g_w_reynolds/sr1sieve/[/url]
This sieve is based on sr5sieve. It can only sieve one sequence k*b^n+/-1 at a time, but should be somewhat faster than srsieve or sr5sieve in this case. It might be useful for some of you who sieve one sequence separate from the distributed sieving effort. It works with NewPGen format files: sr5sieve -g 331882 200000 500000 will create a NewPGen format file called 331882.txt containing the sequence 331882*5^n-1 for 200000 < n < 500000. Then: sr1sieve -i 331882.txt -o 331882.new -f factors.txt -p 10e12 -P 11e12 will sieve this one sequence from 10 to 11 trillion, recording the new factors in factors.txt, and saving the updated sieve file to 331882.new. |
Did you get my PM? Did you understand the method I described? Any progress?
:question: :question: :question: |
[QUOTE=Citrix;97159]Did you get my PM? Did you understand the method I described? Any progress?[/QUOTE]
Thanks for the description of the SPH algorithm, I can follow the basic idea, but I don't fully understand the mathematics behind it. Implementing it seems like a lot of work, it is much more complicated than the existing algorithm. I am going back to school this year, but will be writing essays rather than thinking about maths, so I doubt I will be able to start making progress until next (southern) summer unless someone else can help with writing code. |
[QUOTE=geoff;97164]Thanks for the description of the SPH algorithm, I can follow the basic idea, but I don't fully understand the mathematics behind it.
Implementing it seems like a lot of work, it is much more complicated than the existing algorithm. I am going back to school this year, but will be writing essays rather than thinking about maths, so I doubt I will be able to start making progress until next (southern) summer unless someone else can help with writing code.[/QUOTE] I would read this. It is a short read [url]http://math.scu.edu/~eschaefe/crylec.pdf[/url] If you still can't get it, I would wait for school to get over and then buy and read a good number theory book. Alternatively, I can write the code (time permitting), but you will first have to help me convert your program to Visual C++, so I can compile your program. I am not so sure about the Lib structure of gcc. Thanks! |
[QUOTE=masser]Currently, srsieve breaks up our sequences of interest into many subsequences. With one of the commandline options on srfile, the exponent sequence is displayed. Would it be possible to add a commandline option that would print the exponent forms for the subsequences AND counts for that sequence?
What I'm thinking is that if there are some extremely low-weight subsequences it might be possible to get rid of those subsequences (by testing or P-1 factoring) and thus speed up the overall project sieve. [/QUOTE] That is a good idea, I'll try to include it in the next version. [QUOTE=Citrix;97167]Alternatively, I can write the code (time permitting), but you will first have to help me convert your program to Visual C++, so I can compile your program. I am not so sure about the Lib structure of gcc. [/QUOTE] The source itself should compile with any C99 compiler that supports anonymous unions. AFAIK MS compilers support these. Nothing GCC-specific is used unless the compiler defines __GNUC__. The assembler functions are GCC specific, but generic C replacements are provided. However some of the library functions used are not in the standard C library, but are available on GNUish systems. getopt_long() is one I can think of. I don't think it will be too hard to find replacements for these, I'll have a look. Here is a description of the algorithm used by sr5sieve. In version 1.4.18, R={2,3,4,5,8} and so Q=120: [code] [Init] Let R={2,...} be a set of small prime powers Let Q=lcm(R) Input a list S of sequences k*b^n+c, where b is fixed and c is in {-1,+1} For each s = k*b^n+c in S Let T[s] be the list of subsequences (k*b^d)*(b^Q)^m+c, where 0 <= d < Q Let N0,N1 be bounds N0 <= n <= N1 on all k*b^n+c in S Let K be an upper bound k <= K on all k*b^n+c in S Input P0,P1 satisfying K < P0 < P1 [Main loop] For each prime p in P0 < p < P1 Let Z be an empty list of candidate subsequences Let r be the greatest divisor of Q satisfying p = 1 (mod r) Let X = (1/b)^((p-1)/r) (mod p) For each sequence s = k*b^n+c in S If -c*k or -c*k*b is a quadratic residue with respect to p If (r < 3) Append T[s] to Z Else [Check whether -c*k*b^d is an r-th power residue with respect to p] Let Y = (-c*k)^((p-1)/r) (mod p) For each subsequence t = (k*b^d)*(b^Q)^m+c in T[s] If X=Y^d (mod p) Append t to Z If (|Z| > 0) [Baby-steps giant-steps] Let M0 = floor(N0/Q) Let M1 = floor(N1/Q) Let G = sqrt((M1-M0)/|Z|) Let B = ceil((M1-M0)/G) Let H be an empty hashtable [Baby steps] For each j in 0 <= j < B Insert (j, (b^Q)^(j+M0) (mod p)) into H For each subsequence z = (k*b^d)*(b^Q)^m+c in Z Let A[z] = -c/(k*b^d) (mod p) If (j, A[z]) is in H Report that p is a factor of k*b^(Q*(M0+j)+d)+c [Giant steps] For each i in 0 < i < G For each subsequence z = (k*b^d)*(b^Q)^m+c in Z A[z] <-- A[z]*(1/b)^(Q*B) (mod p) If (j, A[z]) is in H Report that p is a factor of k*b^(Q*(M0+iB+j)+d)+c [/code] Note that as is the algorithm is not very efficient for sequences k*b^n+c which have both odd and even n terms. It can be modified to handle that case (which is one of the changes made in sr1sieve) by changing: [code] If (r < 3) Append T[s] to Z [/code] to: [code] If (r < 3) If -c*k is a quadratic residue with respect to p For each t = (k*b^d)*(b^Q)^m+c in T[s] such that d is even Append t to Z If -c*k*b is a quadratic residue with respect to p For each t = (k*b^d)*(b^Q)^m+c in T[s] such that d is odd Append t to Z [/code] |
Geoff when I try to compile your program using VC++, it says some of the
# include <...> files not found. I do not know what to do for them. I can post the errror messages I get, if you want. May be I am not using the correct source, could you post a link to the correct source. Thanks |
[QUOTE=Citrix;97291]Geoff when I try to compile your program using VC++, it says some of the
# include <...> files not found. I do not know what to do for them. [/QUOTE] Try compiling sr5sieve 1.4.18 to start with, source in [url]http://www.geocities.com/g_w_reynolds/sr5sieve[/url]. I expect <getopt.h> is missing. Since command line options are optional for sr5sieve, to get it to compile you can just delete the whole while() loop at the top of main(). PM me with any other errors. |
| All times are UTC. The time now is 22:20. |
Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.