mersenneforum.org Idea for faster sieve
 Register FAQ Search Today's Posts Mark Forums Read

2007-06-10, 22:53   #34
Citrix

Jun 2003

62616 Posts

Quote:
 Originally Posted by geoff BTW I think the existing method I am using for sieving numbers mod 2^y will fail when p > 2^(64-y). For example when sieving GFNs with k=3^16 the limit for the sieve will be 2^59 instead of 2^62.
Why? If so then we are currently around 2^50. So we cannot have N larger than 4096. This would limit the possible speed gains..

Quote:
 No, I just haven't got around to it yet :-). RieselSieve are rapidly approaching the 2^52 limit of the current x86-64 mulmods, so I am working on extending that limit.
I will wait, but since there are not a huge amount to change, if you could make the changes soon, I will appreciate it.

Thanks.

Last fiddled with by Citrix on 2007-06-11 at 02:03

2007-06-13, 23:00   #35
geoff

Mar 2003
New Zealand

13·89 Posts

Quote:
 Originally Posted by Citrix Why? If so then we are currently around 2^50. So we cannot have N larger than 4096. This would limit the possible speed gains..
It is caused by not checking a sequence of additions for overflow. It can be fixed easily enough.

2007-06-24, 02:43   #36
Citrix

Jun 2003

2×787 Posts

Geoff,

I have further modified srsieve and simplified the Sieve of Eratosthenes for the idea above. Attached is the code. Ignore all C++ comments that I wrote. It should just take a few minutes to implement my request. ( I hope this reduces work for you and you are able to implement it soon.)

So in the code X=2*what ever you want number to multiple of . SO p=X*n+1 where X must be even and n can be odd.

Basically in setup_sieve when you generate the composite table for new primes, you have to make sure that the composite table has composites that are odd and composite==1 (mod X). I do this through a simple while loop, since this is only done once.

Then in prime_sieve,
You make sure low_end_of_range is a multiple of X, this is again done via a simple while loop as only needs to be done once.

you calculate the sieve index by
uint_fast32_t sieve_index = (composite_table[i] - low_end_of_range-1)/X;

Then add prime to the sieve index.
In the end
composite_table[i] = low_end_of_range + 2*X/2*(uint64_t)sieve_index + 1;

Then to generate the candiate later you do

candidate = low_end_of_range + 2*X/2*i + 1;

Then send candidate to fun. Note since all numbers are multiples of X, SPH can be done over X and no factors etc.. need to be stored.

Questions/ Thoughts?

edit: You can avoid the division by X for each prime, by storing composites as n where composite=n*X+1. Also there are other things that could be improved in your code, I am not sure why you haven't looked into them? Perhaps improving the code will not produce huge speed benefits.
Attached Files
 primes.txt (10.6 KB, 102 views)

Last fiddled with by Citrix on 2007-06-24 at 23:21

2007-06-29, 03:48   #37
geoff

Mar 2003
New Zealand

13×89 Posts

Quote:
 Originally Posted by Citrix edit: You can avoid the division by X for each prime, by storing composites as n where composite=n*X+1. Also there are other things that could be improved in your code, I am not sure why you haven't looked into them? Perhaps improving the code will not produce huge speed benefits.
Thanks for the code, I will have a look at it. You are right that in large projects like SoB/Riesel/SR5 the amount of time spent in the Sieve of Eratosthenes code is quite small, so work on it hasn't been a priority. I think the prime_sieve() speed would have to double just to get a 2-3% increase in the total throughput for these projects. From experimental results the sieve in NewPGen is many times faster than the one in sr2sieve.

 2007-07-02, 02:18 #38 geoff     Mar 2003 New Zealand 13×89 Posts Citrix, I have implemented your changes to prime_sieve() in version 1.5.11x. It seems to produce correct results, but it also seems a lot slower than 1.5.5x. Last fiddled with by geoff on 2007-07-02 at 02:20
2007-07-02, 04:01   #39
Citrix

Jun 2003

2×787 Posts

Quote:
 Originally Posted by geoff Citrix, I have implemented your changes to prime_sieve() in version 1.5.11x. It seems to produce correct results, but it also seems a lot slower than 1.5.5x.
What x value did you try? It will probably be slower for large X values. If the X value was small enough, then it is possibly the division step that is slowing it down. Does the new version do SPH over the factorization of X?
(I will try to play with the code and figure out what is slowing it down, thanks for implementing though)

2007-07-02, 04:20   #40
geoff

Mar 2003
New Zealand

13×89 Posts

Quote:
 Originally Posted by Citrix What x value did you try? It will probably be slower for large X values.
I tried x=720, but I didn't do much testing. I think the slow loop is in setup_sieve(), where I made a note about testing for overflow. It is hard to work out how many iterations this loop actually does.

2007-07-02, 04:27   #41
Citrix

Jun 2003

62616 Posts

Quote:
 Originally Posted by geoff I tried x=720, but I didn't do much testing. I think the slow loop is in setup_sieve(), where I made a note about testing for overflow. It is hard to work out how many iterations this loop actually does.
It will never be more than 720. On average about 360 I assume. I have been running your program for 5-10 min hasn't printed the speed yet. Do you know why?
What if x=2. Do we get the same speed?

edit: Since x will always have small factors, we can always reduce the number of steps in this to the sum of the factors. Is there any checking to see what is the max X , a user can use? Is SPH done over the factorization of X, in the new version?

Last fiddled with by Citrix on 2007-07-02 at 04:48

2007-07-02, 04:54   #42
geoff

Mar 2003
New Zealand

13·89 Posts

Quote:
 Originally Posted by Citrix It will never be more than 720. On average about 360 I assume. I have been running your program for 5-10 min hasn't printed the speed yet. Do you know why?
It only checks whether to make a progress report after each block of 65536 candidates. If is it spending a lot of time sieving for primes it may not have started testing any candidates yet.

Quote:
 What if x=2. Do we get the same speed.
No it should be a bit slower because of the use of % instead of & to test for p=1 (mod 2), and the use of * instead of << for multiplication.

2007-07-02, 05:07   #43
Citrix

Jun 2003

62616 Posts

Quote:
 Originally Posted by geoff No it should be a bit slower because of the use of % instead of & to test for p=1 (mod 2), and the use of * instead of << for multiplication.
All of which are not needed in reality. (May be we should rewrite the sieve code and get the extra 2-3%)

2007-07-04, 01:21   #44
Citrix

Jun 2003

2·787 Posts

Quote:
 Originally Posted by geoff I tried x=720, but I didn't do much testing. I think the slow loop is in setup_sieve(), where I made a note about testing for overflow. It is hard to work out how many iterations this loop actually does.
We can avoid this step by using modular inverses. I am sure you know about modular inverses, in case you don't here is a link http://www.hars.us/Papers/ModInv.pdf

so we need to find c+n*x=0 (mod y)
c+n*x=m*y
Assuming y <x and x=g (mod y)
we get

c+n*g=h*y
now g<y and y=k (mod g)

so c+f*g=k*h

and so on until one of the letter is 0. (Basically this can be set up as a recursive loop)

Let me know if you have any questions. Is there any limit on the value of -x X?

 Similar Threads Thread Thread Starter Forum Replies Last Post jasong jasong 8 2017-04-07 00:23 Dubslow Forum Feedback 7 2016-12-02 14:26 binu Factoring 3 2013-04-13 16:32 science_man_88 Homework Help 0 2010-04-23 16:33 ThomRuley Lone Mersenne Hunters 5 2003-07-15 05:51

All times are UTC. The time now is 07:29.

Wed Oct 21 07:29:05 UTC 2020 up 41 days, 4:40, 0 users, load averages: 1.59, 1.65, 1.49