mersenneforum.org  

Go Back   mersenneforum.org > Prime Search Projects > Twin Prime Search

Reply
 
Thread Tools
Old 2009-04-13, 23:49   #1
geoff
 
geoff's Avatar
 
Mar 2003
New Zealand

13·89 Posts
Default Sieve algorithm

I am trying to write a sieve to help with the TPS project at PrimeGrid. This is the algorithm I am working with:

Code:
Algorithm for finding factors p dividing either one of k*2^n+/-1,
where n is fixed, k is odd, k0 <= k < k1, and k1-k0 < p0 <= p < p1:

For each probable prime p in p0 <= p < p1
{
  k <-- 2^{-n} mod p

  If k is even
    k <-- p-k

  If k0 <= k < k1
    If k = 3 (mod 6)
      Report that p divides k*2^n+/-1
}
I don't know anything about the mathematical properties of twin primes, is there a more efficient algorithm?

I will post source and some binaries here for testing tpsieve, although it is not yet fast enough to be much use.
geoff is offline   Reply With Quote
Old 2009-04-16, 21:55   #2
Joshua2
 
Joshua2's Avatar
 
Sep 2004

13·41 Posts
Default

You should consider posting in the math thread. What does <-- mean? Is it store right side in variable on left?
Joshua2 is offline   Reply With Quote
Old 2009-04-22, 02:09   #3
geoff
 
geoff's Avatar
 
Mar 2003
New Zealand

13·89 Posts
Default

Quote:
Originally Posted by Joshua2 View Post
You should consider posting in the math thread. What does <-- mean? Is it store right side in variable on left?
Yes, `x <-- y' means assign y to x.

A correction: The algorithnm above only works when p0 > k1.

Last fiddled with by geoff on 2009-04-22 at 02:09
geoff is offline   Reply With Quote
Old 2009-08-31, 02:23   #4
geoff
 
geoff's Avatar
 
Mar 2003
New Zealand

13×89 Posts
Default

Here is an outline of the improved algorithm used in tpsieve 0.3.0:

Algorithm for finding factors p of k*2^n+/-1, where k is odd, k0 <= k <= k1, n0 <= n <= n1, k1 < p0 <= p <= p1:
Code:
Let m be the greatest integer satisfying k1*2^{m-1} < p0

For each probable prime p in p0 <= p <= p1
{
  n <-- max(n0,n1-m+1)
  x <-- 2^{-n} mod p

  While n >= n0
  {
    Assign k,d such that k is odd and k*2^d = x
    If k0 <= k <= k1 AND k = 3 (mod 6) AND d < m
      Report that p divides k*2^{n+d}-1

    Assign k,d such that k is odd and k*2^d = p-x
    If k0 <= k <= k1 AND k = 3 (mod 6) AND d < m
      Report that p divides k*2^{n+d}+1

    n <-- n-m
    x <-- x*2^m mod p
  }
}
geoff is offline   Reply With Quote
Old 2009-09-03, 22:39   #5
Ken_g6
 
Ken_g6's Avatar
 
Jan 2005
Caught in a sieve

2·197 Posts
Default

Since Geoff was nice enough to explain his new algorithm, I decided to explain my SSE2 optimization of it.

First I should explain the SSE2 completion of the floating point multiplication. SSE2 can't do a full 64-bit multiply, but it can do a 32-bit*32-bit=64-bit multiply. So the first part is just two regular school-arithmetic multiplies, where each "digit" is 32-bits long, except the result is truncated to 64 bits.
Code:
  AB
x CD
----
  BD
 AD
+BC
----
  YZ
I just forget about A*C since its entire result would be above the 64-bit range of the result. When squaring, it's a little easier:
Code:
   AB
x  AB
-----
   BB
+2AB
-----
   YZ
After that, I write the result and then check whether the result is >= P by subtraction. If K >= P, K-P >= 0; hence if the high bit is not set after the subtraction, I do K -= P. SSE2 doesn't do if's, but it lets you extract the high bits of a register (which contains 2 64-bit numbers in this case) and they can be checked with if's outside of SSE2. In cases where a result is rare, like here, it's faster to do that.

Now obviously the new algorithm, with two if's per check, poses a bit of a challenge in SSE2. My first idea was that in one of those cases, K or K-P, the number is odd and "d" is 0. The obvious use for this is to precompute K-P and let 32-bit code sort out which is which. This is faster, but not by much. A better approach involves separating the odds from the evens, saving out the evens to be processed with 32-bit code, and processing the odds in SSE2. Without if's, this is a challenge! My solution comes from an old in-place swapping trick:

Given A = A0, B = B0, and "^" is the Exclusive OR instruction:
1. A ^= B (so now A = A0^B0)
2. B ^= A (so now B = B0^(A0^B0). The B0's cancel, leaving A0.)
3. A ^= B (so now A = (A0^B0)^A0. The A0's cancel, leaving B0.)

So, I get K and P, compute K-P, K^(K-P), and (K&1)-1. This last effectively turns the low bit of K into a register with either all 1's or all 0's. I then AND this with K^(K-P) so that it's either K^(K-P) if K was even, or 0 if K was odd. I then XOR that value (K^(K-P)|0) with both K and K-P, so that the K register holds both the odd values and the other register holds both the even values. I then save off the even values to the T(emp) array, to be worked on later with 32-bit code, and check whether the odd values are <= kmax, by the same kind of logic I use at the end of the multiply code.

I would have liked to find d (the number of low 0's, from Geoff's explanation above) and shifted it out in SSE2, but SSE2 has neither a lower zero count instruction nor a shift that can shift different distances in different parts of the register. However, if I take the inequality:

k >> d <= kmax

and rearrange it:

k / 2^d <= kmax
k <= kmax * 2^d

Well, there is a method for finding 2^d without conditionals (x AND -x). And there is an SSE2 multiply instruction. However, the multiply only does 32-bit*32-bit -> 64-bit. So for this method to work, both kmax and what it multiplies by have to be < 2^32. So this sets a limit on kmax; but if d >= 32 (which it can be), 2^d in 32-bit would be 0. My solution is one more rearrangement:

k <= kmax * (2^d-1) + kmax

That's heavy on the right-side calculation, so I switched one thing over to the left, so it can be computed in parallel:

k - kmax <= kmax * (2^d-1)

Now look at the edge cases. For large d, 2^d-1 = 2^32-1. k <= kmax * 2^32 is true if P (which k is less than) is < kmax*2^32. For d=0 on the other hand, which happens half the time when K is odd, we have:

k - kmax <= kmax * 0
k - kmax <= 0
k <= kmax

Exactly what you'd expect. So this method works, uses entirely SSE2 code except for "exceptional" cases, and is 80% as fast as 64-bit, but only when P < kmax*2^32, and when kmax < 2^32. Otherwise I fall back to the above algorithm, which is "only" 70% as fast as 64-bit.
Ken_g6 is offline   Reply With Quote
Old 2010-11-07, 17:02   #6
Ken_g6
 
Ken_g6's Avatar
 
Jan 2005
Caught in a sieve

2·197 Posts
Default

Here is an outline of the improved algorithm used in tpsieve-CUDA 0.2.1 and up:

Algorithm for finding factors p of k*2^n+/-1, where k is odd, k0 <= k <= k1, n0 <= n <= n1, k1 < p0 <= p <= p1:
Code:
Let m be the greatest integer satisfying k1*2^{m-1} < p0
Subtract one from m.

For each probable prime p in p0 <= p <= p1
{
  n <-- n0
  x <-- 2^{-n} mod p

  Assign k =x
  If k is even, let k=p-k

  If k0 <= k <= k1 AND k = 3 (mod 6)
    Report that p divides k*2^n-1 if y=x, or k*2^n+1 if y != x.

  While n < n1
  {
    Assign y =x
    If y is odd, let y=p-y

    Assign k,d such that k is odd and k*2^d = y
    If k0 <= k <= k1 AND k = 3 (mod 6) AND d <= m
      Report that p divides k*2^{n+d}-1 if y=x, or k*2^{n+d}+1 if y != x.

    n <-- n+m
    x <-- x*2^-m mod p
  }
}
By decreasing m by one, any cases that otherwise would have been odd become even in the last iteration. The exception is the first iteration, hence the special case.

Note that x*2^-m mod p is accomplished quite easily with Montgomery math. I limit m to 32 in most cases, for hardware-specific performance reasons only.
Ken_g6 is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Advantage of lattice sieve over line sieve binu Factoring 3 2013-04-13 16:32
new factorization algorithm jasonp Math 2 2012-06-17 20:04
TF algorithm(s) davieddy Miscellaneous Math 61 2011-07-06 20:13
Using p=2 for sieving (Quadratic sieve algorithm) R1zZ1 Factoring 36 2007-11-02 15:59
Prime95's FFT Algorithm Angular Math 6 2002-09-26 00:13

All times are UTC. The time now is 18:07.

Sat Sep 26 18:07:44 UTC 2020 up 16 days, 15:18, 1 user, load averages: 1.94, 2.24, 2.65

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.