![]() |
![]() |
#1 |
Mar 2003
New Zealand
13·89 Posts |
![]()
I am trying to write a sieve to help with the TPS project at PrimeGrid. This is the algorithm I am working with:
Code:
Algorithm for finding factors p dividing either one of k*2^n+/-1, where n is fixed, k is odd, k0 <= k < k1, and k1-k0 < p0 <= p < p1: For each probable prime p in p0 <= p < p1 { k <-- 2^{-n} mod p If k is even k <-- p-k If k0 <= k < k1 If k = 3 (mod 6) Report that p divides k*2^n+/-1 } I will post source and some binaries here for testing tpsieve, although it is not yet fast enough to be much use. |
![]() |
![]() |
![]() |
#2 |
Sep 2004
10000101012 Posts |
![]()
You should consider posting in the math thread. What does <-- mean? Is it store right side in variable on left?
|
![]() |
![]() |
![]() |
#3 | |
Mar 2003
New Zealand
13×89 Posts |
![]() Quote:
A correction: The algorithnm above only works when p0 > k1. Last fiddled with by geoff on 2009-04-22 at 02:09 |
|
![]() |
![]() |
![]() |
#4 |
Mar 2003
New Zealand
13·89 Posts |
![]()
Here is an outline of the improved algorithm used in tpsieve 0.3.0:
Algorithm for finding factors p of k*2^n+/-1, where k is odd, k0 <= k <= k1, n0 <= n <= n1, k1 < p0 <= p <= p1: Code:
Let m be the greatest integer satisfying k1*2^{m-1} < p0 For each probable prime p in p0 <= p <= p1 { n <-- max(n0,n1-m+1) x <-- 2^{-n} mod p While n >= n0 { Assign k,d such that k is odd and k*2^d = x If k0 <= k <= k1 AND k = 3 (mod 6) AND d < m Report that p divides k*2^{n+d}-1 Assign k,d such that k is odd and k*2^d = p-x If k0 <= k <= k1 AND k = 3 (mod 6) AND d < m Report that p divides k*2^{n+d}+1 n <-- n-m x <-- x*2^m mod p } } |
![]() |
![]() |
![]() |
#5 |
Jan 2005
Caught in a sieve
6138 Posts |
![]()
Since Geoff was nice enough to explain his new algorithm, I decided to explain my SSE2 optimization of it.
First I should explain the SSE2 completion of the floating point multiplication. SSE2 can't do a full 64-bit multiply, but it can do a 32-bit*32-bit=64-bit multiply. So the first part is just two regular school-arithmetic multiplies, where each "digit" is 32-bits long, except the result is truncated to 64 bits. Code:
AB x CD ---- BD AD +BC ---- YZ Code:
AB x AB ----- BB +2AB ----- YZ Now obviously the new algorithm, with two if's per check, poses a bit of a challenge in SSE2. My first idea was that in one of those cases, K or K-P, the number is odd and "d" is 0. The obvious use for this is to precompute K-P and let 32-bit code sort out which is which. This is faster, but not by much. A better approach involves separating the odds from the evens, saving out the evens to be processed with 32-bit code, and processing the odds in SSE2. Without if's, this is a challenge! My solution comes from an old in-place swapping trick: Given A = A0, B = B0, and "^" is the Exclusive OR instruction: 1. A ^= B (so now A = A0^B0) 2. B ^= A (so now B = B0^(A0^B0). The B0's cancel, leaving A0.) 3. A ^= B (so now A = (A0^B0)^A0. The A0's cancel, leaving B0.) So, I get K and P, compute K-P, K^(K-P), and (K&1)-1. This last effectively turns the low bit of K into a register with either all 1's or all 0's. I then AND this with K^(K-P) so that it's either K^(K-P) if K was even, or 0 if K was odd. I then XOR that value (K^(K-P)|0) with both K and K-P, so that the K register holds both the odd values and the other register holds both the even values. I then save off the even values to the T(emp) array, to be worked on later with 32-bit code, and check whether the odd values are <= kmax, by the same kind of logic I use at the end of the multiply code. I would have liked to find d (the number of low 0's, from Geoff's explanation above) and shifted it out in SSE2, but SSE2 has neither a lower zero count instruction nor a shift that can shift different distances in different parts of the register. However, if I take the inequality: k >> d <= kmax and rearrange it: k / 2^d <= kmax k <= kmax * 2^d Well, there is a method for finding 2^d without conditionals (x AND -x). And there is an SSE2 multiply instruction. However, the multiply only does 32-bit*32-bit -> 64-bit. So for this method to work, both kmax and what it multiplies by have to be < 2^32. So this sets a limit on kmax; but if d >= 32 (which it can be), 2^d in 32-bit would be 0. My solution is one more rearrangement: k <= kmax * (2^d-1) + kmax That's heavy on the right-side calculation, so I switched one thing over to the left, so it can be computed in parallel: k - kmax <= kmax * (2^d-1) Now look at the edge cases. For large d, 2^d-1 = 2^32-1. k <= kmax * 2^32 is true if P (which k is less than) is < kmax*2^32. For d=0 on the other hand, which happens half the time when K is odd, we have: k - kmax <= kmax * 0 k - kmax <= 0 k <= kmax Exactly what you'd expect. So this method works, uses entirely SSE2 code except for "exceptional" cases, and is 80% as fast as 64-bit, but only when P < kmax*2^32, and when kmax < 2^32. Otherwise I fall back to the above algorithm, which is "only" 70% as fast as 64-bit. |
![]() |
![]() |
![]() |
#6 |
Jan 2005
Caught in a sieve
5·79 Posts |
![]()
Here is an outline of the improved algorithm used in tpsieve-CUDA 0.2.1 and up:
Algorithm for finding factors p of k*2^n+/-1, where k is odd, k0 <= k <= k1, n0 <= n <= n1, k1 < p0 <= p <= p1: Code:
Let m be the greatest integer satisfying k1*2^{m-1} < p0 Subtract one from m. For each probable prime p in p0 <= p <= p1 { n <-- n0 x <-- 2^{-n} mod p Assign k =x If k is even, let k=p-k If k0 <= k <= k1 AND k = 3 (mod 6) Report that p divides k*2^n-1 if y=x, or k*2^n+1 if y != x. While n < n1 { Assign y =x If y is odd, let y=p-y Assign k,d such that k is odd and k*2^d = y If k0 <= k <= k1 AND k = 3 (mod 6) AND d <= m Report that p divides k*2^{n+d}-1 if y=x, or k*2^{n+d}+1 if y != x. n <-- n+m x <-- x*2^-m mod p } } Note that x*2^-m mod p is accomplished quite easily with Montgomery math. I limit m to 32 in most cases, for hardware-specific performance reasons only. |
![]() |
![]() |
![]() |
Thread Tools | |
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Advantage of lattice sieve over line sieve | binu | Factoring | 3 | 2013-04-13 16:32 |
new factorization algorithm | jasonp | Math | 2 | 2012-06-17 20:04 |
TF algorithm(s) | davieddy | Miscellaneous Math | 61 | 2011-07-06 20:13 |
Using p=2 for sieving (Quadratic sieve algorithm) | R1zZ1 | Factoring | 36 | 2007-11-02 15:59 |
Prime95's FFT Algorithm | Angular | Math | 6 | 2002-09-26 00:13 |