mersenneforum.org Idea for faster sieve

2007-07-08, 00:25   #45
Citrix

Jun 2003

3²×5²×7 Posts

Geoff, I found a bug in primes.c. I don't think it affects the main sr2sieve version, since we do not do any bound checking there. (I think it occurs if a person is sieving beyond 2^48, i.e. range_size^2.) This is probably what was slowing things down.

Attached is primes.c, look at the comments.
(You can hold off on implementing the inverse method, since we can use the factoring of X to speed things up. Look at my code in the comments.)

Attached Files
 primes.txt (10.1 KB, 93 views)

Last fiddled with by Citrix on 2007-07-08 at 00:27

2007-07-08, 00:49   #46
Citrix

Jun 2003

3²·5²·7 Posts

Also, when your program does

composite_table[i] = last_composite + max_prime;

are we missing a possible prime? Shouldn't we look to see if last_composite is already greater than low_end_of_range or not?

Code:
for (i = save_used_in_range; i < primes_used_in_range; i++)
{
    max_prime = prime_table[i];
    last_composite = (low_end_of_range / max_prime) * max_prime;
    composite_table[i] = last_composite + max_prime;

    /* We only care about odd composites since the sieve range
       only refers to odd values. */
    if (!(composite_table[i] & 1))
        composite_table[i] += max_prime;
}
2007-07-08, 01:07   #47
Citrix

Jun 2003

3²·5²·7 Posts

Another thing slowing down the sieve: if gcd(x, max_prime) > 1, then you would do hundreds of additions just to find the factor, since there will never be a possible solution. Extra additions per factor = range_size·x/max_prime. If max_prime = 3 and x = 720, then 240·2^23 = 2013265920 extra additions per factor. So we should avoid these additions.

Last fiddled with by Citrix on 2007-07-09 at 00:51
2007-07-10, 00:36   #48
geoff

Mar 2003
New Zealand

13·89 Posts

Quote:
 Originally Posted by Citrix
 Shouldn't we look to see if last_composite is already greater than low_end_of_range or not?

Code:
last_composite = (low_end_of_range / max_prime) * max_prime;
That ensures last_composite can't be greater than low_end_of_range.

I think the original sieve works fine when x is a power of 2, say 2^y, because last_composite is odd, and so the inverse of x mod last_composite is always found after at most y additions, and so overflow doesn't have to be checked as long as you don't expect to sieve beyond 2^(64-y).

However I think the prime sieve is not the only problem you face. Sr2sieve doesn't do SPH, it doesn't use the Chinese Remainder Theorem, it just knocks out some congruence classes and sends what remains to BSGS. The fastest part of sr2sieve is the BSGS, most of the rest of the code is not particularly flash. But by sieving with smooth p-1 you end up minimising the time in BSGS, and maximising the time spent in the slow code.

So this is the problem: If you restrict sieving to very smooth p-1 then you get bogged down in the slow code. But if you allow less smooth p-1 then the gains over normal sieving are not going to be great enough to justify the effort.

The solution might seem to be to speed up the slow code, but that means a lot of work in all sorts of different areas that is only going to have a small benefit to the normal sieve.

2007-07-10, 07:08   #49
Citrix

Jun 2003

3047₈ Posts

Quote:
 Originally Posted by geoff
 That ensures last_composite can't be greater than low_end_of_range.

 I think the original sieve works fine when x is a power of 2, say 2^y, because last_composite is odd, and so the inverse of x mod last_composite is always found after at most y additions, and so overflow doesn't have to be checked as long as you don't expect to sieve beyond 2^(64-y).

 However I think the prime sieve is not the only problem you face. Sr2sieve doesn't do SPH, it doesn't use the Chinese Remainder Theorem, it just knocks out some congruence classes and sends what remains to BSGS. The fastest part of sr2sieve is the BSGS, most of the rest of the code is not particularly flash. But by sieving with smooth p-1 you end up minimising the time in BSGS, and maximising the time spent in the slow code.

 So this is the problem: If you restrict sieving to very smooth p-1 then you get bogged down in the slow code. But if you allow less smooth p-1 then the gains over normal sieving are not going to be great enough to justify the effort.

 The solution might seem to be to speed up the slow code, but that means a lot of work in all sorts of different areas that is only going to have a small benefit to the normal sieve.
I agree with you; there is no point in speeding up the slow code for now. The main goal of this exercise is to look at very smooth p-1 factors and thus reduce the time per factor. This will help save some PRP tests (first pass + double check). This might become an alternative to the P-1/P+1/ECM methods. Also, as I tested in the sr5sieve forum, using smooth p-1 values does actually reduce the time per factor, even without improving the slow code.

2007-08-01, 03:11   #50
Citrix

Jun 2003

627₁₆ Posts

Geoff, I have looked into a couple of things regarding making srsieve faster.

1) I have found a way to implement Pollard rho with only ~2·sqrt(x) multiplications for each prime. There are two problems though:
a) We will not be able to combine the steps, so each k will be done separately.
b) There is no way to limit the sieve to only values under 50M, unless you use SPH.

So basically, if the time saved from writing things to memory and reading them later exceeds the cost of doing all k's separately, then this method will be useful; otherwise there is no point implementing it. It might help on machines that have low memory, but from what I understand the BSGS loop takes little memory and the initial sieve takes the most. The other place where this could be useful is in sr1sieve, but we would have to use SPH.

2) The only other way we can speed things up is using SPH with BSGS. If I write the code for the Chinese Remainder Theorem and some of the other things the program will need, do you think we can make the program faster?

Last fiddled with by Citrix on 2007-08-01 at 03:45
2007-08-03, 03:05   #51
geoff

Mar 2003
New Zealand

13·89 Posts

Quote:
 Originally Posted by Citrix
 2) The only other way we can speed things up is using SPH with BSGS. If I write the code for the Chinese Remainder Theorem and some of the other things the program will need, do you think we can make the program faster?
I think you have a much better understanding of the algorithms than I do; wouldn't it make more sense for you to implement the main algorithm? I could then help speed up some of the lower-level routines if necessary. If I were to try to implement SPH I would probably start the programming from scratch anyway, as the srsieve code has become a bit complicated.

Probably the quickest way to get a boost in sieve productivity in the short term would be to figure out how to compile srsieve or jjsieve for 64-bit Windows machines.

2007-08-03, 07:23   #52
Citrix

Jun 2003

3047₈ Posts

Quote:
 Originally Posted by geoff
 I think you have a much better understanding of the algorithms than I do; wouldn't it make more sense for you to implement the main algorithm? I could then help speed up some of the lower level routines if necessary. If I were to try to implement SPH I would probably start the programming from scratch anyway, the srsieve code has become a bit complicated.
Sounds like a good idea to me. I would basically need to borrow the mulmod code from srsieve; the rest would be better written from scratch.

Maybe you can create an .obj, .lib, or .dll file that I can use with my program, containing pre-compiled functions for modular multiplication. Basically you would be writing a fast 64-bit library; I think several projects could benefit from this.
(This way I won't have to deal with any low-level stuff.)

edit
On second thought, there is no point re-writing the code your program uses. I will just re-write the initial Sieve of Eratosthenes and the corresponding SPH/CRT code needed. I think we can keep the rest of the code the same; there is no need to write the rest from scratch.

Last fiddled with by Citrix on 2007-08-04 at 20:49
