![]() |
|
|
#276 |
|
Mar 2003
New Zealand
13×89 Posts |
tpsieve doesn't currently modify the bitmap after the sieve file has been loaded, so multiple factors for the same candidate will sometimes be found. This should become less of a problem as sieving depth increases.
It is a bit simpler to do it this way because the bitmap can't be updated atomically. If it was modifiable then it could only be accessed by one thread at a time, but making it read-only allows it to be accessed by many threads at once. I haven't tested how much difference it makes to performance however, maybe not much. If the bitmap is replaced by another data structure like a hash table (to save memory) then the benefit of keeping it read-only might be greater. Edit: Actually now that I think about it maybe the bitmap can be updated atomically by using assembly instead of C to do the update. But for consistency with a hashtable or other methods it might be better to keep it read-only anyway. Last fiddled with by geoff on 2009-08-20 at 02:51 |
|
|
|
|
|
#277 | |
|
Jun 2003
23×683 Posts |
Quote:
Can't even the hash table entry be removed if one finds an entry (just clear the LSB?) |
|
|
|
|
|
|
#278 | |
|
Jan 2005
Caught in a sieve
5·79 Posts |
Quote:
It would probably be easier to write something else, maybe even in C, to clear extra factors out of the results file once it's finished, if you wanted to do that. |
|
|
|
|
|
|
#279 |
|
Jan 2005
Caught in a sieve
6138 Posts |
Well, I have bad news...for 32-bit SSE2 users.
I have a new version of tpsieve out that makes 64-bit more than twice as fast as the SSE2 code! ![]() I'd like to thank Biwema for getting the idea stuck in my head (back on post 222) of finding an N to check based on even values of K. What took the longest was convincing myself that this approach is correct: First, if K is even, P-K (a.k.a. -K mod P) is odd. And vice versa. Second, it turns out Intel has a lowest-1-bit-finding instruction, "bsf". Since this has been around since the 386 or so, 32-bit non-SSE2 users get a boost too. Third, if I divide K by 2, that's equivalent to adding P to -K and dividing by 2. So if P>2*kmax, (P+(-K))/2 >= P/2, which is >kmax. So, my new algorithm is to begin by finding the odd one of K and -K. I check that for divisibility. Then I go to the even one (P-that previous value), shift it right until it's odd, jumping forward as many N's as there were 0-bits, and repeat at the divisibility check. The number of 0-bits averages to a little less than 2, since there's at least 1, a 50% chance of at least 2, a 25% chance of at least 3, etc. The last problem was that tpsieve reports whether it factored K*2^N+1 or K*2^N-1. I had to add an extra bit for each K, that I flip each loop, to record which I'm working on. It might be a little faster still without that! So, 64-bit is close to twice as fast as it was, and 32-bit non-SSE2 is more than twice as fast as it was, but I can't find a good way to implement the new algorithm for SSE2. In particular, SSE2 doesn't have a "bsf" equivalent, and doesn't allow for shifting two values by different amounts at the same time. I believe the shifting is coming with SSE5. I haven't seen a TZCNT instruction in the pipeline, though apparently a POPCNT instruction can help build an equivalent from four instructions. So for now the SSE2 code is unchanged. (Though I quietly upgraded the single-N SSE2 code earlier, this has little effect on the current TPS sieve.) Oh, and the new algorithm causes a small (5%) performance hit for 64-bit single-N sieves, so I left the old version out there for anyone doing a private N. |
|
|
|
|
|
#280 | |
|
Jun 2003
23×683 Posts |
Quote:
|
|
|
|
|
|
|
#281 | ||
|
Jan 2005
Caught in a sieve
5·79 Posts |
Quote:
Quote:
And, come to think of it, I should change the initialization code to verify the requirement that P > kmax*2. But I think my logic is solid, and my testing turned up no errors. |
||
|
|
|
|
|
#282 |
|
Jan 2005
Caught in a sieve
6138 Posts |
One more update: I managed to improve the performance of the 32-bit non-SSE2 multi-N section slightly past the performance of the SSE2 version.* So now SSE2 and non-SSE2 are almost identical in speed, the difference being the initial N calculation. This makes the 32-bit code almost half as fast as the 64-bit code. Somehow it seems like that's as it should be.
![]() * Pentium 4's may be an exception. If anyone has one, would you mind comparing my new SSE2 version with my old SSE2 version? |
|
|
|
|
|
#283 | |
|
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
3×23×89 Posts |
Quote:
|
|
|
|
|
|
|
#284 | |
|
Jan 2005
Caught in a sieve
18B16 Posts |
Quote:
.)Anyway, I have one more more update. Your post made me realize that SSE2-style MMX instructions wouldn't be much worse than the regular 64-bit math I was using, and might be faster on the P4. It took full inline assembly to get it working; the intrinsics decided to write everything to memory and swap it out through registers! And the performance was slightly slower on my Core 2. But with full inline assembly I was able to use all the MMX registers and software pipeline it like I do with the 64-bit code. So the "SSE2" code, using MMX registers, is now about 40% faster than it was. ![]() P.S. Before someone asks, no, the SSE2 changes won't work on a Pentium MMX, since Intel didn't add a 64-bit MMX add/subtract until they added SSE2. |
|
|
|
|
|
|
#285 |
|
Jan 2005
Caught in a sieve
6138 Posts |
A little update: I don't have a P4, but I do have an Athlon64. So I tried my new code on that, and the SSE2 code is slightly slower, even after a tiny code rearrangement; but only by 2-3%. I was surprised to learn that AMD made some of these instructions slow, too. On the other hand, my latest code probably isn't bad on older P4's. So it looks like the only processors that might not like my new code are Athlon64's in 32-bit mode, and P4 Prescotts.
I'm not going to make major changes just for Prescotts, or for just a 2-3% improvement (which might go away if I tried to add optional code paths anyway) on Athlon64s. Plus, the old code is still available for anyone with those processors. |
|
|
|
|
|
#286 | |
|
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
3×23×89 Posts |
Quote:
|
|
|
|
|
![]() |
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| S9 and general sieving discussion | Lennart | Conjectures 'R Us | 31 | 2014-09-14 15:14 |
| Sieving discussion thread | philmoore | Five or Bust - The Dual Sierpinski Problem | 66 | 2010-02-10 14:34 |
| Combined sieving discussion | ltd | Prime Sierpinski Project | 76 | 2008-07-25 11:44 |
| Sieving Discussion | ltd | Prime Sierpinski Project | 26 | 2005-11-01 07:45 |
| Sieving Discussion | R.D. Silverman | Factoring | 7 | 2005-09-30 12:57 |