mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Prime Gap Searches (https://www.mersenneforum.org/forumdisplay.php?f=131)
-   -   GPU Prime Gap searching (https://www.mersenneforum.org/showthread.php?t=26944)

MJansen 2021-06-25 18:41

[QUOTE=ATH;581898]mfaktc can also sieve on the GPU: SieveOnGPU=1 in mfaktc.ini. It think almost everyone uses that now, and I think it was only in the very beginning of mfaktc that people were using CPU for sieving, but I could be wrong about that.[/QUOTE]

Thanks ATH! The thread is 300+ pages long over 10 years, I did not read them all yet ;-) Cool to see the sieving has been moved to the GPU though! Any indication where in the thread that change is announced?

Kind regards
Michiel

ATH 2021-06-25 19:39

[QUOTE=MJansen;581906]Any indication where in the thread that change is announced?[/QUOTE]

Post #1948. Seems it was released in version 0.20 Dec 2012, about 3 years after the thread started, and SieveOnGPU was set to be automatically on from the start, so no one probably used CPU sieving since then.

MJansen 2021-06-26 21:59

Hi,

In the mfaktc thread, the sieving on GPU seemed to get a boost after post #1610. No specifics though as far as I could find. I remember Dana once posted that a prp test is very fast and the mfaktc thread seems to indicate that pre-sieving should be kept to a minimum (1000 primes tops). I have no data on real life performance, but my intuitive start point would be:

a. pre-sieve and PRP on the GPU:
b. I would try a wheel approach first as a base reference, and prp the remaining candidates, i.e. 1 (or more?) fermat test(s)
c. additional tests would be to play around with the number of primes in the pre-sieve to determine the trade off between pre-sieving and prp-ing

Question is how to avoid possible pseudoprimes? Is it enough to perform a second Fermat test in a different base, or do you need more tests?

Kind regards
Michiel

ATH 2021-06-27 15:37

[QUOTE=MJansen;582010]Question is how to avoid possible pseudoprimes? Is it enough to perform a second Fermat test in a different base, or do you need more tests?[/QUOTE]

Personally in my own CPU program if the first SPRP test is positive I do a Lucas test, so effectively a BPSW test. It is a lot slower than just a Fermat test, but it is almost sure not to find any pseudoprimes, since BPSW pseudoprimes are so rare that none are known. I have no idea if a Lucas test is possible or feasible on a GPU.

But above 2[SUP]64[/SUP] Fermat and SPRP pseudoprimes are very rare, and the risk that a pseudoprime also should be "blocking" a record prime gap must be very very low, but I'm not sure if it is low enough to be negligible.

CraigLo 2021-06-29 03:44

For sieving I start with a small wheel. I'm currently using 2310 but I haven't had a chance to test 30030 yet. Using 2310 gives me 480 possible candidates. I then sieve for each of the possible candidates. For example the first candidate is 1 so I sieve for 1, 2311, 4621, 6931, etc. This starts with 2 more wheels with 5 primes each. The first is copied into each possible candidate to initialize the sieve. The second wheel is combined with the first using a bitwise-or. Then I do 5 rounds of sieving with 2048 primes each. These are done using shared memory. I sieve 2048 primes for 192x1024 elements at a time. I then copy these into my global sieve. I find this approach fastest up to around 100k primes. After that approaches sieving directly into global memory are faster.

In a single processing loop I do 80 of the shared sieve blocks. That means a single candidate gets sieved out to 192*1024*80=15.7E6. With 2310 elements in the first wheel that means I get through 36E9 per loop. Using a minimum gap of 1300 it takes about .24 seconds per loop which gives me 150E9 per sec. I haven't tested sieving only in a while but it is probably 2-3 times faster.

CraigLo 2021-06-29 03:49

I looked briefly at a Lucas test but haven't thought about implementing it yet. On a CPU how much slower is it than a Fermat test?

a1call 2021-06-29 11:15

Well the experts seem to have chosen to stay silent so far so I will give you a non expert reply. You will have to know all the prime factors of p-1. So there is Factoring-computation-cost. And then you will have to do modular calculations computationally as expensive as a single Fermat test per each prime factor:
Mod(base,p)^((p-1)/eachPrimeFactor) != 1
Then you have to try enough "random" bases to get a !=1 for all the found prime factors.
Normally it does not take more than a few random trials.

BTW this is an excellent thread. Thank you for sharing your expertise without the common-attitudes that seem to dominate the members here.

ETA one issue is that you might hit a Fermat-pseudiprime and have to have implementations to break the loop. While they are rare for given base for larger numbers there exist infinitude of them per given base growing exponentially in size.

ATH 2021-07-15 12:18

I switched my program from BPSW test (Miller-Rabin + Lucas) to 2 x Miller-Rabin instead, and it is now twice as fast. I'm pretty sure BPSW test is overkill here. I'm using base 2 and then base 9375, which is one of the bases with few survivors when testing all the 2-SPRP in the Feitsma list up to 2[SUP]64[/SUP].

Craig: You are using only 1 [URL="https://en.wikipedia.org/wiki/Fermat_primality_test"]Fermat test[/URL] ? Did you consider switching to the [URL="https://en.wikipedia.org/wiki/Miller%E2%80%93Rabin_primality_test"]Miller-Rabin[/URL] (SPRP) test ?
The complexity according to Wikipedia is O(log[SUP]3[/SUP]n) for MR and O(log[SUP]2[/SUP]n*loglog n) for Fermat, so it should not be that much slower, but maybe it is hard to implement on GPU?
The advantage is that there are no Carmichael numbers for SPRP test (where all bases shows PRP incorrectly), and there are fewer composite SPRP numbers than Fermat PRP.

I'm not sure when doing 1 Fermat or 1 MR test, how likely it is that a false PRP will block a big prime gap, it seems to be very unlikely, but the question is if it is unlikely enough.
2-SPRP numbers are pretty rare, when I tried searching above 2[SUP]64[/SUP] a few years ago, and if you look at the Feitsma list, there are 808 2-SPRP in the last 10[SUP]15[/SUP] below 2[SUP]64[/SUP] or 1 every 1.24*10[SUP]12[/SUP] on average. I'm not sure about the numbers for Fermat PRP.


Are your program testing all PRP surviving your sieve? The way I do it is that I set a minimum gap I want to find, in my case 1320 because it is the highest I have found above 2[SUP]64[/SUP], then after I have sieved an interval I jump 1320 ahead from the last PRP I found and then test backwards until I find a PRP, then I jump ahead 1320 again and repeat.
If I happen to get from 1320 down to 0 without finding a PRP then I search upwards again from 1320 until I find a new "record" gap. That way I'm sure no gap above or equal to my minimum is missed, but I will not find any gaps below the minimum. On average my program only reach from 1320 down to 1100-1200 before it finds a PRP.

I'm not sure if that would work on a GPU. If you use all the threads to PRP test the same interval it probably will not, but if each thread tests its own small interval, it could be useful, but maybe you are already using this strategy? If you do I guess your minimum gap is set to 1000.

CraigLo 2021-07-16 05:05

It probably wouldn't be too hard to convert my Fermat code to a MR test. The time complexity should be the same. I think the O(log[SUP]2[/SUP]n*loglog n) you wrote for Fermat uses an FFT multiply which is only faster for very large numbers.

[url]http://www.janfeitsma.nl/math/psp2/statistics[/url]
From the Fietsma list there is about 1 2-PRP every 3E11 from 1E19 to 2[SUP]64[/SUP]. This is about 4x more than the 2-SPRP. I think there will be about 500 gaps >= 1400 from 2[SUP]64[/SUP] to 2[SUP]65[/SUP] and each will require about 70 prime checks on average. The chance that a gap above 1400 from 2[SUP]64[/SUP] to 2[SUP]65[/SUP] is missed due to a PRP is about 1 in 3E11/500/70 = 1 in 8E6.

I'm currently saving gaps >= 1300 using an approach similar to yours. All gaps found in the GPU above 1300 are sent back to the CPU for a more thorough check using the gmp library.

MJansen 2021-09-03 09:49

I bought a laptop with a GPU (Geforce 1650 mobile, 1024 cuda cores, 16 SM's of 64 cores each) for testing. I will be using windows however (not Linux) and will install the necessary programs next week. A fast search showed that I will have to use C++ (new to me ...) to code.

I looked at the pseudoprime problem and the solution ATH uses, seems most interesting (and fast) but also no clue as to how to implement that yet on a GPU.

I have been looking at Dana's Perl code, but not sure this is the correct 64 bit code: [url]https://metacpan.org/dist/Math-Prime-Util/source/lib/Math/Prime/Util/PP.pm[/url]

For prime gaps, the fastest I found was using his next_prime / prev_prime code.

sub next_prime {
my($n) = @_;
_validate_positive_integer($n);
return $_prime_next_small[$n] if $n <= $#_prime_next_small;
# This turns out not to be faster.
# return $_primes_small[1+_tiny_prime_count($n)] if $n < $_primes_small[-1];

return Math::BigInt->new(MPU_32BIT ? "4294967311" : "18446744073709551629")
if ref($n) ne 'Math::BigInt' && $n >= MPU_MAXPRIME;
# n is now either 1) not bigint and < maxprime, or (2) bigint and >= uvmax

if ($n > 4294967295 && Math::Prime::Util::prime_get_config()->{'gmp'}) {
return Math::Prime::Util::_reftyped($_[0], Math::Prime::Util::GMP::next_prime($n));
}

if (ref($n) eq 'Math::BigInt') {
do {
$n += $_wheeladvance30[$n%30];
} while !Math::BigInt::bgcd($n, B_PRIM767)->is_one ||
!_miller_rabin_2($n) || !is_extra_strong_lucas_pseudoprime($n);
} else {
do {
$n += $_wheeladvance30[$n%30];
} while !($n%7) || !_is_prime7($n);
}
$n;
}

And it seems like he uses a small wheel (30) and a Miller_Rabin base 2 test followed by a full Lucas, so in effect a full BPSW test? Is this a correct assumption?

LaurV 2021-09-05 06:20

Please use code tabs when you post code, it takes less space on the screen (especially for long routines, and not all of us have high-resolution, super-wide monitors), and it makes the code easily to read (guess what, not all of us are perl experts either :razz:). Moreover, Mike may add some syntax-highlighting for perl in the future, which will make our life even easier :mike: :unsure:

(edit: albeit code in quotes looks crap, as I just found out from below... grrr :yucky:)

[QUOTE=MJansen;587141][CODE] sub next_prime {

my($n) = @_;
_validate_positive_integer($n);
return $_prime_next_small[$n] if $n <= $#_prime_next_small;

# This turns out not to be faster.
# return $_primes_small[1+_tiny_prime_count($n)] if $n < $_primes_small[-1];
return Math::BigInt->new(MPU_32BIT ? "4294967311" : "18446744073709551629")

if ref($n) ne 'Math::BigInt' && $n >= MPU_MAXPRIME;
# n is now either 1) not bigint and < maxprime, or (2) bigint and >= uvmax

if ($n > 4294967295 && Math::Prime::Util::prime_get_config()->{'gmp'}) {
return Math::Prime::Util::_reftyped($_[0], Math::Prime::Util::GMP::next_prime($n));
}

if (ref($n) eq 'Math::BigInt') {
do {
$n += $_wheeladvance30[$n%30];
} while !Math::BigInt::bgcd($n, B_PRIM767)->is_one ||
!_miller_rabin_2($n) || !is_extra_strong_lucas_pseudoprime($n);
} else {
do {
$n += $_wheeladvance30[$n%30];
} while !($n%7) || !_is_prime7($n);
}
$n;
}
[/CODE][/QUOTE]


All times are UTC. The time now is 04:59.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2022, Jelsoft Enterprises Ltd.