![]() |
|
|
#23 |
|
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2·33·109 Posts |
Is there a particular reason why no one has written a sieve similar to sr2sieve or sr1sieve for gpus? Is it especially hard to write that sort of sieve(ppsieve uses a different method)? Either of these would be a huge help to some projects(crus).
|
|
|
|
|
|
#24 | |
|
"Mark"
Apr 2003
Between here and the
2·32·353 Posts |
Quote:
srsieve/sr1sieve/sr2sieve all require the use of a discrete log to solve for n in the equation k*b^n+c for fixed k, b, and c. These software programs use a baby-steps giant-steps variant of the discrete log. BSGS requires about sqrt(p) elements. If p=1e12, then you are talking about 1e6 elements of 8 bytes each or 80 MB. Your typical GPU runs up to 256 concurrent threads which would require about 2GB of memory. Few GPUs have that much memory. As p increases, your memory requirements go up so. There are ways to reduce the memory requirements of BSGS, but that would increase the time to find a solution. It takes about O(sqrt(n)) to find a solution (or determine there is no solution). An alternative is Pollard Rho. It requires little memory and can find a solution faster than BSGS, but it has its own problems. The biggest problem is that although the time to solve is about O(sqrt(p)) (where p is the largest prime factor of n), it can take up to O(n/2) to determine there is no solution. That's where the problem comes in. If the GPU is trying a discrete log on a large set of p, if just one p requires O(p/2) iterations to determine that there is no solution, then all other p in that set have to wait before working on the next set. There are ways to address that problem, but doing so could impact performance even more than BSGS. Think of it this way. For BSGS, you are guaranteed to find a solution within O(sqrt(n)) iterations (or determine there is no solution), so run times for individual p are consistent regardless of k, b, and c. For Pollard Rho, run times for individual p vary greatly depending upon k, b, and c. The downside of BSGS is memory. The downside of Pollard Rho is the wild fluctuation in iterations from one p to the next. |
|
|
|
|
|
|
#25 |
|
Apr 2010
Over the rainbow
2×1,303 Posts |
In the future, GPU will get more memory, thats a given...
So BSGS? |
|
|
|
|
|
#26 |
|
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2·33·109 Posts |
As far as I can tell from wikipedia, you can pretty much rely on having at least 1Gb of memory from both amd and nvidia. A lot of gpus these days have 2Gb. In a generation or 2 of gpus it will almost certainly be viable.
|
|
|
|
|
|
#27 |
|
"Mark"
Apr 2003
Between here and the
2·32·353 Posts |
That might be true, but understand that p1e12 is the low end for most sieving. Users want to get to 1e14 and higher. That is a 10x jump in memory requirements. In any case using BSGS will require a lot of changes in order to avoid the memory limitations of the GPU.
|
|
|
|
|
|
#28 |
|
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2×33×109 Posts |
A large amount of CRUS work doesn't even reach 1e12.
|
|
|
|
|
|
#29 |
|
"Mark"
Apr 2003
Between here and the
635410 Posts |
|
|
|
|
|
|
#30 |
|
"Mark"
Apr 2003
Between here and the
2×32×353 Posts |
I have posted v1.0.1 of fnsievecl here.
I've fixed the outstanding problem of sieving low primes. I've added the -m option to limit the factors echoed to the console. I've also added the ability read from a .pfgw file and save to a .pfgw file. This means that it can read from the file created by fnsieve and continue sieving. Based upon your settings for -t and -b you should see about a 10x improvement over fnsieve. I've run fnsieve and fnsievecl side by side and have verified the factors found and that output files match. At this time the code requires a GPU that supports OpenCL 1.2. The changes to eliminate that requirement are relatively small. Let me know if you need that change. |
|
|
|
|
|
#31 |
|
Jun 2012
6A16 Posts |
Would it be possible to combine different ABC files together to create larger input files? I know this can be done with other sieves, however I would like to know if it would be possible for the program to sieve multiple sequences at once.
BTW - I love the speed of this sieve. Thanks for such a fast piece of software! Last fiddled with by f1pokerspeed on 2012-10-30 at 09:09 |
|
|
|