mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Software

Reply
 
Thread Tools
Old 2012-10-24, 09:43   #23
henryzz
Just call me Henry
 
henryzz's Avatar
 
"David"
Sep 2007
Cambridge (GMT/BST)

588610 Posts
Default

Is there a particular reason why no one has written a sieve similar to sr2sieve or sr1sieve for gpus? Is it especially hard to write that sort of sieve(ppsieve uses a different method)? Either of these would be a huge help to some projects(crus).
henryzz is online now   Reply With Quote
Old 2012-10-24, 12:58   #24
rogue
 
rogue's Avatar
 
"Mark"
Apr 2003
Between here and the

2·32·353 Posts
Default

Quote:
Originally Posted by henryzz View Post
Is there a particular reason why no one has written a sieve similar to sr2sieve or sr1sieve for gpus? Is it especially hard to write that sort of sieve(ppsieve uses a different method)? Either of these would be a huge help to some projects(crus).
I started one, but I ran into a problem that I haven't taken the time to resolve yet.

srsieve/sr1sieve/sr2sieve all require the use of a discrete log to solve for n in the equation k*b^n+c for fixed k, b, and c. These software programs use a baby-steps giant-steps variant of the discrete log. BSGS requires about sqrt(p) elements. If p=1e12, then you are talking about 1e6 elements of 8 bytes each or 80 MB. Your typical GPU runs up to 256 concurrent threads which would require about 2GB of memory. Few GPUs have that much memory. As p increases, your memory requirements go up so. There are ways to reduce the memory requirements of BSGS, but that would increase the time to find a solution. It takes about O(sqrt(n)) to find a solution (or determine there is no solution).

An alternative is Pollard Rho. It requires little memory and can find a solution faster than BSGS, but it has its own problems. The biggest problem is that although the time to solve is about O(sqrt(p)) (where p is the largest prime factor of n), it can take up to O(n/2) to determine there is no solution. That's where the problem comes in. If the GPU is trying a discrete log on a large set of p, if just one p requires O(p/2) iterations to determine that there is no solution, then all other p in that set have to wait before working on the next set. There are ways to address that problem, but doing so could impact performance even more than BSGS.

Think of it this way. For BSGS, you are guaranteed to find a solution within O(sqrt(n)) iterations (or determine there is no solution), so run times for individual p are consistent regardless of k, b, and c. For Pollard Rho, run times for individual p vary greatly depending upon k, b, and c. The downside of BSGS is memory. The downside of Pollard Rho is the wild fluctuation in iterations from one p to the next.
rogue is offline   Reply With Quote
Old 2012-10-24, 16:33   #25
firejuggler
 
firejuggler's Avatar
 
Apr 2010
Over the rainbow

2×1,303 Posts
Default

In the future, GPU will get more memory, thats a given...
So BSGS?
firejuggler is online now   Reply With Quote
Old 2012-10-24, 21:12   #26
henryzz
Just call me Henry
 
henryzz's Avatar
 
"David"
Sep 2007
Cambridge (GMT/BST)

2×33×109 Posts
Default

As far as I can tell from wikipedia, you can pretty much rely on having at least 1Gb of memory from both amd and nvidia. A lot of gpus these days have 2Gb. In a generation or 2 of gpus it will almost certainly be viable.
henryzz is online now   Reply With Quote
Old 2012-10-24, 21:48   #27
rogue
 
rogue's Avatar
 
"Mark"
Apr 2003
Between here and the

2·32·353 Posts
Default

Quote:
Originally Posted by henryzz View Post
As far as I can tell from wikipedia, you can pretty much rely on having at least 1Gb of memory from both amd and nvidia. A lot of gpus these days have 2Gb. In a generation or 2 of gpus it will almost certainly be viable.
That might be true, but understand that p1e12 is the low end for most sieving. Users want to get to 1e14 and higher. That is a 10x jump in memory requirements. In any case using BSGS will require a lot of changes in order to avoid the memory limitations of the GPU.
rogue is offline   Reply With Quote
Old 2012-10-24, 23:09   #28
henryzz
Just call me Henry
 
henryzz's Avatar
 
"David"
Sep 2007
Cambridge (GMT/BST)

2×33×109 Posts
Default

Quote:
Originally Posted by rogue View Post
That might be true, but understand that p1e12 is the low end for most sieving. Users want to get to 1e14 and higher. That is a 10x jump in memory requirements. In any case using BSGS will require a lot of changes in order to avoid the memory limitations of the GPU.
A large amount of CRUS work doesn't even reach 1e12.
henryzz is online now   Reply With Quote
Old 2012-10-25, 02:17   #29
rogue
 
rogue's Avatar
 
"Mark"
Apr 2003
Between here and the

2×32×353 Posts
Default

Quote:
Originally Posted by henryzz View Post
A large amount of CRUS work doesn't even reach 1e12.
Those are for conjectures with many k remaining. The more k, the more memory.
rogue is offline   Reply With Quote
Old 2012-10-27, 14:30   #30
rogue
 
rogue's Avatar
 
"Mark"
Apr 2003
Between here and the

2·32·353 Posts
Default

I have posted v1.0.1 of fnsievecl here.

I've fixed the outstanding problem of sieving low primes. I've added the -m option to limit the factors echoed to the console. I've also added the ability read from a .pfgw file and save to a .pfgw file. This means that it can read from the file created by fnsieve and continue sieving. Based upon your settings for -t and -b you should see about a 10x improvement over fnsieve. I've run fnsieve and fnsievecl side by side and have verified the factors found and that output files match.

At this time the code requires a GPU that supports OpenCL 1.2. The changes to eliminate that requirement are relatively small. Let me know if you need that change.
rogue is offline   Reply With Quote
Old 2012-10-30, 09:06   #31
f1pokerspeed
 
Jun 2012

2·53 Posts
Default

Would it be possible to combine different ABC files together to create larger input files? I know this can be done with other sieves, however I would like to know if it would be possible for the program to sieve multiple sequences at once.

BTW - I love the speed of this sieve. Thanks for such a fast piece of software!

Last fiddled with by f1pokerspeed on 2012-10-30 at 09:09
f1pokerspeed is offline   Reply With Quote
Old 2012-11-19, 23:21   #32
Cruelty
 
Cruelty's Avatar
 
May 2005

162410 Posts
Default

I see there is a new version here. Are you planning to release binaries (especially Win64)?
Cruelty is offline   Reply With Quote
Old 2012-11-19, 23:47   #33
rogue
 
rogue's Avatar
 
"Mark"
Apr 2003
Between here and the

635410 Posts
Default

Quote:
Originally Posted by Cruelty View Post
I see there is a new version here. Are you planning to release binaries (especially Win64)?
I will try tomorrow.
rogue is offline   Reply With Quote
Reply

Thread Tools


All times are UTC. The time now is 17:43.


Sun Aug 1 17:43:20 UTC 2021 up 9 days, 12:12, 0 users, load averages: 2.39, 1.96, 1.66

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.