mersenneforum.org GPU Prime Gap searching
 Register FAQ Search Today's Posts Mark Forums Read

2021-06-25, 18:41   #12
MJansen

Jan 2018

22·3·7 Posts

Quote:
 Originally Posted by ATH mfaktc can also sieve on the GPU: SieveOnGPU=1 in mfaktc.ini. It think almost everyone uses that now, and I think it was only in the very beginning of mfaktc that people were using CPU for sieving, but I could be wrong about that.
Thanks ATH! The thread is 300+ pages long over 10 years, I did not read them all yet ;-) Cool to see the sieving has been moved to the GPU though! Any indication where in the thread that change is announced?

Kind regards
Michiel

Last fiddled with by MJansen on 2021-06-25 at 18:42 Reason: Changed typo's

2021-06-25, 19:39   #13
ATH
Einyen

Dec 2003
Denmark

2·3·13·41 Posts

Quote:
 Originally Posted by MJansen Any indication where in the thread that change is announced?
Post #1948. Seems it was released in version 0.20 Dec 2012, about 3 years after the thread started, and SieveOnGPU was set to be automatically on from the start, so no one probably used CPU sieving since then.

 2021-06-26, 21:59 #14 MJansen   Jan 2018 1248 Posts Hi, In the mfaktc thread, the sieving on GPU seemed to get a boost after post #1610. No specifics though as far as I could find. I remember Dana once posted that a prp test is very fast and the mfaktc thread seems to indicate that pre-sieving should be kept to a minimum (1000 primes tops). I have no data on real life performance, but my intuitive start point would be: a. pre-sieve and PRP on the GPU: b. I would try a wheel approach first as a base reference, and prp the remaining candidates, i.e. 1 (or more?) fermat test(s) c. additional tests would be to play around with the number of primes in the pre-sieve to determine the trade off between pre-sieving and prp-ing Question is how to avoid possible pseudoprimes? Is it enough to perform a second Fermat test in a different base, or do you need more tests? Kind regards Michiel
2021-06-27, 15:37   #15
ATH
Einyen

Dec 2003
Denmark

2×3×13×41 Posts

Quote:
 Originally Posted by MJansen Question is how to avoid possible pseudoprimes? Is it enough to perform a second Fermat test in a different base, or do you need more tests?
Personally in my own CPU program if the first SPRP test is positive I do a Lucas test, so effectively a BPSW test. It is a lot slower than just a Fermat test, but it is almost sure not to find any pseudoprimes, since BPSW pseudoprimes are so rare that none are known. I have no idea if a Lucas test is possible or feasible on a GPU.

But above 264 Fermat and SPRP pseudoprimes are very rare, and the risk that a pseudoprime also should be "blocking" a record prime gap must be very very low, but I'm not sure if it is low enough to be negligible.

 2021-06-29, 03:44 #16 CraigLo   Mar 2021 738 Posts For sieving I start with a small wheel. I'm currently using 2310 but I haven't had a chance to test 30030 yet. Using 2310 gives me 480 possible candidates. I then sieve for each of the possible candidates. For example the first candidate is 1 so I sieve for 1, 2311, 4621, 6931, etc. This starts with 2 more wheels with 5 primes each. The first is copied into each possible candidate to initialize the sieve. The second wheel is combined with the first using a bitwise-or. Then I do 5 rounds of sieving with 2048 primes each. These are done using shared memory. I sieve 2048 primes for 192x1024 elements at a time. I then copy these into my global sieve. I find this approach fastest up to around 100k primes. After that approaches sieving directly into global memory are faster. In a single processing loop I do 80 of the shared sieve blocks. That means a single candidate gets sieved out to 192*1024*80=15.7E6. With 2310 elements in the first wheel that means I get through 36E9 per loop. Using a minimum gap of 1300 it takes about .24 seconds per loop which gives me 150E9 per sec. I haven't tested sieving only in a while but it is probably 2-3 times faster.
 2021-06-29, 03:49 #17 CraigLo   Mar 2021 59 Posts I looked briefly at a Lucas test but haven't thought about implementing it yet. On a CPU how much slower is it than a Fermat test?
 2021-06-29, 11:15 #18 a1call     "Rashid Naimi" Oct 2015 Remote to Here/There 1000011111102 Posts Well the experts seem to have chosen to stay silent so far so I will give you a non expert reply. You will have to know all the prime factors of p-1. So there is Factoring-computation-cost. And then you will have to do modular calculations computationally as expensive as a single Fermat test per each prime factor: Mod(base,p)^((p-1)/eachPrimeFactor) != 1 Then you have to try enough "random" bases to get a !=1 for all the found prime factors. Normally it does not take more than a few random trials. BTW this is an excellent thread. Thank you for sharing your expertise without the common-attitudes that seem to dominate the members here. ETA one issue is that you might hit a Fermat-pseudiprime and have to have implementations to break the loop. While they are rare for given base for larger numbers there exist infinitude of them per given base growing exponentially in size. Last fiddled with by a1call on 2021-06-29 at 11:50
 2021-07-15, 12:18 #19 ATH Einyen     Dec 2003 Denmark 2·3·13·41 Posts I switched my program from BPSW test (Miller-Rabin + Lucas) to 2 x Miller-Rabin instead, and it is now twice as fast. I'm pretty sure BPSW test is overkill here. I'm using base 2 and then base 9375, which is one of the bases with few survivors when testing all the 2-SPRP in the Feitsma list up to 264. Craig: You are using only 1 Fermat test ? Did you consider switching to the Miller-Rabin (SPRP) test ? The complexity according to Wikipedia is O(log3n) for MR and O(log2n*loglog n) for Fermat, so it should not be that much slower, but maybe it is hard to implement on GPU? The advantage is that there are no Carmichael numbers for SPRP test (where all bases shows PRP incorrectly), and there are fewer composite SPRP numbers than Fermat PRP. I'm not sure when doing 1 Fermat or 1 MR test, how likely it is that a false PRP will block a big prime gap, it seems to be very unlikely, but the question is if it is unlikely enough. 2-SPRP numbers are pretty rare, when I tried searching above 264 a few years ago, and if you look at the Feitsma list, there are 808 2-SPRP in the last 1015 below 264 or 1 every 1.24*1012 on average. I'm not sure about the numbers for Fermat PRP. Are your program testing all PRP surviving your sieve? The way I do it is that I set a minimum gap I want to find, in my case 1320 because it is the highest I have found above 264, then after I have sieved an interval I jump 1320 ahead from the last PRP I found and then test backwards until I find a PRP, then I jump ahead 1320 again and repeat. If I happen to get from 1320 down to 0 without finding a PRP then I search upwards again from 1320 until I find a new "record" gap. That way I'm sure no gap above or equal to my minimum is missed, but I will not find any gaps below the minimum. On average my program only reach from 1320 down to 1100-1200 before it finds a PRP. I'm not sure if that would work on a GPU. If you use all the threads to PRP test the same interval it probably will not, but if each thread tests its own small interval, it could be useful, but maybe you are already using this strategy? If you do I guess your minimum gap is set to 1000. Last fiddled with by ATH on 2021-07-15 at 12:20
 2021-07-16, 05:05 #20 CraigLo   Mar 2021 59 Posts It probably wouldn't be too hard to convert my Fermat code to a MR test. The time complexity should be the same. I think the O(log2n*loglog n) you wrote for Fermat uses an FFT multiply which is only faster for very large numbers. http://www.janfeitsma.nl/math/psp2/statistics From the Fietsma list there is about 1 2-PRP every 3E11 from 1E19 to 264. This is about 4x more than the 2-SPRP. I think there will be about 500 gaps >= 1400 from 264 to 265 and each will require about 70 prime checks on average. The chance that a gap above 1400 from 264 to 265 is missed due to a PRP is about 1 in 3E11/500/70 = 1 in 8E6. I'm currently saving gaps >= 1300 using an approach similar to yours. All gaps found in the GPU above 1300 are sent back to the CPU for a more thorough check using the gmp library.
 2021-09-03, 09:49 #21 MJansen   Jan 2018 8410 Posts I bought a laptop with a GPU (Geforce 1650 mobile, 1024 cuda cores, 16 SM's of 64 cores each) for testing. I will be using windows however (not Linux) and will install the necessary programs next week. A fast search showed that I will have to use C++ (new to me ...) to code. I looked at the pseudoprime problem and the solution ATH uses, seems most interesting (and fast) but also no clue as to how to implement that yet on a GPU. I have been looking at Dana's Perl code, but not sure this is the correct 64 bit code: https://metacpan.org/dist/Math-Prime...ime/Util/PP.pm For prime gaps, the fastest I found was using his next_prime / prev_prime code. sub next_prime { my($n) = @_; _validate_positive_integer($n); return $_prime_next_small[$n] if $n <=$#_prime_next_small; # This turns out not to be faster. # return $_primes_small[1+_tiny_prime_count($n)] if $n <$_primes_small[-1]; return Math::BigInt->new(MPU_32BIT ? "4294967311" : "18446744073709551629") if ref($n) ne 'Math::BigInt' &&$n >= MPU_MAXPRIME; # n is now either 1) not bigint and < maxprime, or (2) bigint and >= uvmax if ($n > 4294967295 && Math::Prime::Util::prime_get_config()->{'gmp'}) { return Math::Prime::Util::_reftyped($_[0], Math::Prime::Util::GMP::next_prime($n)); } if (ref($n) eq 'Math::BigInt') { do { $n +=$_wheeladvance30[$n%30]; } while !Math::BigInt::bgcd($n, B_PRIM767)->is_one || !_miller_rabin_2($n) || !is_extra_strong_lucas_pseudoprime($n); } else { do { $n +=$_wheeladvance30[$n%30]; } while !($n%7) || !_is_prime7($n); }$n; } And it seems like he uses a small wheel (30) and a Miller_Rabin base 2 test followed by a full Lucas, so in effect a full BPSW test? Is this a correct assumption?
2021-09-05, 06:20   #22
LaurV
Romulan Interpreter

"name field"
Jun 2011
Thailand

265016 Posts

Please use code tabs when you post code, it takes less space on the screen (especially for long routines, and not all of us have high-resolution, super-wide monitors), and it makes the code easily to read (guess what, not all of us are perl experts either ). Moreover, Mike may add some syntax-highlighting for perl in the future, which will make our life even easier

(edit: albeit code in quotes looks crap, as I just found out from below... grrr )

Quote:
 Originally Posted by MJansen Code:  sub next_prime { my($n) = @_; _validate_positive_integer($n); return $_prime_next_small[$n] if $n <=$#_prime_next_small; # This turns out not to be faster. # return $_primes_small[1+_tiny_prime_count($n)] if $n <$_primes_small[-1]; return Math::BigInt->new(MPU_32BIT ? "4294967311" : "18446744073709551629") if ref($n) ne 'Math::BigInt' &&$n >= MPU_MAXPRIME; # n is now either 1) not bigint and < maxprime, or (2) bigint and >= uvmax if ($n > 4294967295 && Math::Prime::Util::prime_get_config()->{'gmp'}) { return Math::Prime::Util::_reftyped($_[0], Math::Prime::Util::GMP::next_prime($n)); } if (ref($n) eq 'Math::BigInt') { do { $n +=$_wheeladvance30[$n%30]; } while !Math::BigInt::bgcd($n, B_PRIM767)->is_one || !_miller_rabin_2($n) || !is_extra_strong_lucas_pseudoprime($n); } else { do { $n +=$_wheeladvance30[$n%30]; } while !($n%7) || !_is_prime7($n); }$n; }

Last fiddled with by LaurV on 2021-09-05 at 06:23

 Similar Threads Thread Thread Starter Forum Replies Last Post Dale Mahalko Software 7 2015-03-21 19:55 hhh Math 7 2014-03-18 14:37 Trilo Riesel Prime Search 33 2013-09-19 21:21 Mini-Geek Twin Prime Search 8 2011-01-30 00:18 ltd Prime Sierpinski Project 3 2006-03-23 19:27

All times are UTC. The time now is 20:59.

Sat Dec 4 20:59:33 UTC 2021 up 134 days, 15:28, 1 user, load averages: 0.94, 1.05, 1.13