20090413, 23:49  #1 
Mar 2003
New Zealand
13·89 Posts 
Sieve algorithm
I am trying to write a sieve to help with the TPS project at PrimeGrid. This is the algorithm I am working with:
Code:
Algorithm for finding factors p dividing either one of k*2^n+/1, where n is fixed, k is odd, k0 <= k < k1, and k1k0 < p0 <= p < p1: For each probable prime p in p0 <= p < p1 { k < 2^{n} mod p If k is even k < pk If k0 <= k < k1 If k = 3 (mod 6) Report that p divides k*2^n+/1 } I will post source and some binaries here for testing tpsieve, although it is not yet fast enough to be much use. 
20090416, 21:55  #2 
Sep 2004
1000010101_{2} Posts 
You should consider posting in the math thread. What does < mean? Is it store right side in variable on left?

20090422, 02:09  #3  
Mar 2003
New Zealand
13×89 Posts 
Quote:
A correction: The algorithnm above only works when p0 > k1. Last fiddled with by geoff on 20090422 at 02:09 

20090831, 02:23  #4 
Mar 2003
New Zealand
13·89 Posts 
Here is an outline of the improved algorithm used in tpsieve 0.3.0:
Algorithm for finding factors p of k*2^n+/1, where k is odd, k0 <= k <= k1, n0 <= n <= n1, k1 < p0 <= p <= p1: Code:
Let m be the greatest integer satisfying k1*2^{m1} < p0 For each probable prime p in p0 <= p <= p1 { n < max(n0,n1m+1) x < 2^{n} mod p While n >= n0 { Assign k,d such that k is odd and k*2^d = x If k0 <= k <= k1 AND k = 3 (mod 6) AND d < m Report that p divides k*2^{n+d}1 Assign k,d such that k is odd and k*2^d = px If k0 <= k <= k1 AND k = 3 (mod 6) AND d < m Report that p divides k*2^{n+d}+1 n < nm x < x*2^m mod p } } 
20090903, 22:39  #5 
Jan 2005
Caught in a sieve
613_{8} Posts 
Since Geoff was nice enough to explain his new algorithm, I decided to explain my SSE2 optimization of it.
First I should explain the SSE2 completion of the floating point multiplication. SSE2 can't do a full 64bit multiply, but it can do a 32bit*32bit=64bit multiply. So the first part is just two regular schoolarithmetic multiplies, where each "digit" is 32bits long, except the result is truncated to 64 bits. Code:
AB x CD  BD AD +BC  YZ Code:
AB x AB  BB +2AB  YZ Now obviously the new algorithm, with two if's per check, poses a bit of a challenge in SSE2. My first idea was that in one of those cases, K or KP, the number is odd and "d" is 0. The obvious use for this is to precompute KP and let 32bit code sort out which is which. This is faster, but not by much. A better approach involves separating the odds from the evens, saving out the evens to be processed with 32bit code, and processing the odds in SSE2. Without if's, this is a challenge! My solution comes from an old inplace swapping trick: Given A = A0, B = B0, and "^" is the Exclusive OR instruction: 1. A ^= B (so now A = A0^B0) 2. B ^= A (so now B = B0^(A0^B0). The B0's cancel, leaving A0.) 3. A ^= B (so now A = (A0^B0)^A0. The A0's cancel, leaving B0.) So, I get K and P, compute KP, K^(KP), and (K&1)1. This last effectively turns the low bit of K into a register with either all 1's or all 0's. I then AND this with K^(KP) so that it's either K^(KP) if K was even, or 0 if K was odd. I then XOR that value (K^(KP)0) with both K and KP, so that the K register holds both the odd values and the other register holds both the even values. I then save off the even values to the T(emp) array, to be worked on later with 32bit code, and check whether the odd values are <= kmax, by the same kind of logic I use at the end of the multiply code. I would have liked to find d (the number of low 0's, from Geoff's explanation above) and shifted it out in SSE2, but SSE2 has neither a lower zero count instruction nor a shift that can shift different distances in different parts of the register. However, if I take the inequality: k >> d <= kmax and rearrange it: k / 2^d <= kmax k <= kmax * 2^d Well, there is a method for finding 2^d without conditionals (x AND x). And there is an SSE2 multiply instruction. However, the multiply only does 32bit*32bit > 64bit. So for this method to work, both kmax and what it multiplies by have to be < 2^32. So this sets a limit on kmax; but if d >= 32 (which it can be), 2^d in 32bit would be 0. My solution is one more rearrangement: k <= kmax * (2^d1) + kmax That's heavy on the rightside calculation, so I switched one thing over to the left, so it can be computed in parallel: k  kmax <= kmax * (2^d1) Now look at the edge cases. For large d, 2^d1 = 2^321. k <= kmax * 2^32 is true if P (which k is less than) is < kmax*2^32. For d=0 on the other hand, which happens half the time when K is odd, we have: k  kmax <= kmax * 0 k  kmax <= 0 k <= kmax Exactly what you'd expect. So this method works, uses entirely SSE2 code except for "exceptional" cases, and is 80% as fast as 64bit, but only when P < kmax*2^32, and when kmax < 2^32. Otherwise I fall back to the above algorithm, which is "only" 70% as fast as 64bit. 
20101107, 17:02  #6 
Jan 2005
Caught in a sieve
5·79 Posts 
Here is an outline of the improved algorithm used in tpsieveCUDA 0.2.1 and up:
Algorithm for finding factors p of k*2^n+/1, where k is odd, k0 <= k <= k1, n0 <= n <= n1, k1 < p0 <= p <= p1: Code:
Let m be the greatest integer satisfying k1*2^{m1} < p0 Subtract one from m. For each probable prime p in p0 <= p <= p1 { n < n0 x < 2^{n} mod p Assign k =x If k is even, let k=pk If k0 <= k <= k1 AND k = 3 (mod 6) Report that p divides k*2^n1 if y=x, or k*2^n+1 if y != x. While n < n1 { Assign y =x If y is odd, let y=py Assign k,d such that k is odd and k*2^d = y If k0 <= k <= k1 AND k = 3 (mod 6) AND d <= m Report that p divides k*2^{n+d}1 if y=x, or k*2^{n+d}+1 if y != x. n < n+m x < x*2^m mod p } } Note that x*2^m mod p is accomplished quite easily with Montgomery math. I limit m to 32 in most cases, for hardwarespecific performance reasons only. 
Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Advantage of lattice sieve over line sieve  binu  Factoring  3  20130413 16:32 
new factorization algorithm  jasonp  Math  2  20120617 20:04 
TF algorithm(s)  davieddy  Miscellaneous Math  61  20110706 20:13 
Using p=2 for sieving (Quadratic sieve algorithm)  R1zZ1  Factoring  36  20071102 15:59 
Prime95's FFT Algorithm  Angular  Math  6  20020926 00:13 