![]() |
|
|
#12 | |
|
Jan 2018
5410 Posts |
Quote:
Kind regards Michiel Last fiddled with by MJansen on 2021-06-25 at 18:42 Reason: Changed typo's |
|
|
|
|
|
|
#13 |
|
Einyen
Dec 2003
Denmark
35·13 Posts |
Post #1948. Seems it was released in version 0.20 Dec 2012, about 3 years after the thread started, and SieveOnGPU was set to be automatically on from the start, so no one probably used CPU sieving since then.
|
|
|
|
|
|
#14 |
|
Jan 2018
2×33 Posts |
Hi,
In the mfaktc thread, the sieving on GPU seemed to get a boost after post #1610. No specifics though as far as I could find. I remember Dana once posted that a prp test is very fast and the mfaktc thread seems to indicate that pre-sieving should be kept to a minimum (1000 primes tops). I have no data on real life performance, but my intuitive start point would be: a. pre-sieve and PRP on the GPU: b. I would try a wheel approach first as a base reference, and prp the remaining candidates, i.e. 1 (or more?) fermat test(s) c. additional tests would be to play around with the number of primes in the pre-sieve to determine the trade off between pre-sieving and prp-ing Question is how to avoid possible pseudoprimes? Is it enough to perform a second Fermat test in a different base, or do you need more tests? Kind regards Michiel |
|
|
|
|
|
#15 | |
|
Einyen
Dec 2003
Denmark
315910 Posts |
Quote:
But above 264 Fermat and SPRP pseudoprimes are very rare, and the risk that a pseudoprime also should be "blocking" a record prime gap must be very very low, but I'm not sure if it is low enough to be negligible. |
|
|
|
|
|
|
#16 |
|
Mar 2021
2×17 Posts |
For sieving I start with a small wheel. I'm currently using 2310 but I haven't had a chance to test 30030 yet. Using 2310 gives me 480 possible candidates. I then sieve for each of the possible candidates. For example the first candidate is 1 so I sieve for 1, 2311, 4621, 6931, etc. This starts with 2 more wheels with 5 primes each. The first is copied into each possible candidate to initialize the sieve. The second wheel is combined with the first using a bitwise-or. Then I do 5 rounds of sieving with 2048 primes each. These are done using shared memory. I sieve 2048 primes for 192x1024 elements at a time. I then copy these into my global sieve. I find this approach fastest up to around 100k primes. After that approaches sieving directly into global memory are faster.
In a single processing loop I do 80 of the shared sieve blocks. That means a single candidate gets sieved out to 192*1024*80=15.7E6. With 2310 elements in the first wheel that means I get through 36E9 per loop. Using a minimum gap of 1300 it takes about .24 seconds per loop which gives me 150E9 per sec. I haven't tested sieving only in a while but it is probably 2-3 times faster. |
|
|
|
|
|
#17 |
|
Mar 2021
3410 Posts |
I looked briefly at a Lucas test but haven't thought about implementing it yet. On a CPU how much slower is it than a Fermat test?
|
|
|
|
|
|
#18 |
|
"Rashid Naimi"
Oct 2015
Remote to Here/There
3·5·137 Posts |
Well the experts seem to have chosen to stay silent so far so I will give you a non expert reply. You will have to know all the prime factors of p-1. So there is Factoring-computation-cost. And then you will have to do modular calculations computationally as expensive as a single Fermat test per each prime factor:
Mod(base,p)^((p-1)/eachPrimeFactor) != 1 Then you have to try enough "random" bases to get a !=1 for all the found prime factors. Normally it does not take more than a few random trials. BTW this is an excellent thread. Thank you for sharing your expertise without the common-attitudes that seem to dominate the members here. ETA one issue is that you might hit a Fermat-pseudiprime and have to have implementations to break the loop. While they are rare for given base for larger numbers there exist infinitude of them per given base growing exponentially in size. Last fiddled with by a1call on 2021-06-29 at 11:50 |
|
|
|
|
|
#19 |
|
Einyen
Dec 2003
Denmark
35×13 Posts |
I switched my program from BPSW test (Miller-Rabin + Lucas) to 2 x Miller-Rabin instead, and it is now twice as fast. I'm pretty sure BPSW test is overkill here. I'm using base 2 and then base 9375, which is one of the bases with few survivors when testing all the 2-SPRP in the Feitsma list up to 264.
Craig: You are using only 1 Fermat test ? Did you consider switching to the Miller-Rabin (SPRP) test ? The complexity according to Wikipedia is O(log3n) for MR and O(log2n*loglog n) for Fermat, so it should not be that much slower, but maybe it is hard to implement on GPU? The advantage is that there are no Carmichael numbers for SPRP test (where all bases shows PRP incorrectly), and there are fewer composite SPRP numbers than Fermat PRP. I'm not sure when doing 1 Fermat or 1 MR test, how likely it is that a false PRP will block a big prime gap, it seems to be very unlikely, but the question is if it is unlikely enough. 2-SPRP numbers are pretty rare, when I tried searching above 264 a few years ago, and if you look at the Feitsma list, there are 808 2-SPRP in the last 1015 below 264 or 1 every 1.24*1012 on average. I'm not sure about the numbers for Fermat PRP. Are your program testing all PRP surviving your sieve? The way I do it is that I set a minimum gap I want to find, in my case 1320 because it is the highest I have found above 264, then after I have sieved an interval I jump 1320 ahead from the last PRP I found and then test backwards until I find a PRP, then I jump ahead 1320 again and repeat. If I happen to get from 1320 down to 0 without finding a PRP then I search upwards again from 1320 until I find a new "record" gap. That way I'm sure no gap above or equal to my minimum is missed, but I will not find any gaps below the minimum. On average my program only reach from 1320 down to 1100-1200 before it finds a PRP. I'm not sure if that would work on a GPU. If you use all the threads to PRP test the same interval it probably will not, but if each thread tests its own small interval, it could be useful, but maybe you are already using this strategy? If you do I guess your minimum gap is set to 1000. Last fiddled with by ATH on 2021-07-15 at 12:20 |
|
|
|
|
|
#20 |
|
Mar 2021
2×17 Posts |
It probably wouldn't be too hard to convert my Fermat code to a MR test. The time complexity should be the same. I think the O(log2n*loglog n) you wrote for Fermat uses an FFT multiply which is only faster for very large numbers.
http://www.janfeitsma.nl/math/psp2/statistics From the Fietsma list there is about 1 2-PRP every 3E11 from 1E19 to 264. This is about 4x more than the 2-SPRP. I think there will be about 500 gaps >= 1400 from 264 to 265 and each will require about 70 prime checks on average. The chance that a gap above 1400 from 264 to 265 is missed due to a PRP is about 1 in 3E11/500/70 = 1 in 8E6. I'm currently saving gaps >= 1300 using an approach similar to yours. All gaps found in the GPU above 1300 are sent back to the CPU for a more thorough check using the gmp library. |
|
|
|
![]() |
| Thread Tools | |
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| HTML5 prime number searching? | Dale Mahalko | Software | 7 | 2015-03-21 19:55 |
| Elementary Literature on probabilistic aspects of prime searching | hhh | Math | 7 | 2014-03-18 14:37 |
| Question about multi-core prime searching. | Trilo | Riesel Prime Search | 33 | 2013-09-19 21:21 |
| idea for twin prime searching | Mini-Geek | Twin Prime Search | 8 | 2011-01-30 00:18 |
| Searching for user BlueSoda (no its not a prime) | ltd | Prime Sierpinski Project | 3 | 2006-03-23 19:27 |