mersenneforum.org GPU Prime Gap searching
 Register FAQ Search Today's Posts Mark Forums Read

2021-11-03, 18:28   #45
MJansen

Jan 2018

5×17 Posts

Quote:
 Originally Posted by CraigLo Treesieve looks like it is useful when you want to compute C%p for the same C with a lot of different primes. I've never tried it, but it seems like it is better for trial division than sieving. https://www.mersenneforum.org/showpo...65&postcount=8 Is there a description of tree sieve anywhere?
I do not have the code, Jens does (C code with maybe some machine language incorporated iirc), I only have a compiled exe file for windows and some idea about what it does. But to be clear: every gap search basically is an interval search and must deal with sieving the interval. Treesieve was developed for large gaps (say P(x)# with x > 250, but I think initially it was even for different formats than primorials).

In my words, how I understand it:
The idea is to set up a tree of nodes where each node (= a product of more and more primes) is computed once and used to determine the remainder of that node to the interval start ( say IntStart= M*P(x)#/D-25*P(x)).

The tree has levels as Jens described in the link you gave, where the base level is made up of nodes consisting of the product of say 10 primes each.
Node(level=0,node=1)=P(1)*P(2)*P(3)*...*P(10)=6469693230
Node(level=0,node=2)=P(11)*P(12)*P(13)*...*P(20)
Node(level=0,node=3)=P(21)*P(22)*P(23)*...*P(30)
Node(level=0,node=4)=P(21)*P(22)*P(23)*...*P(30)

Level 1 has the multiplied nodes of level 0, say:
Node(level=1,node=1)=Node(0,1)*Node(0,2)
Node(level=1,node=2)=Node(0,1)*Node(0,2)

Level 2 has the multiplied nodes of level 1, say:
Node(level=2,node=1)=Node(1,1)*Node(1,2)

And assume Node(2,1) is in the order of sqrt(IntStart) (you have to build a tree with this in mind)

Now determine IntStart%Node(2,1) = R(2,1).
Use this remainder R(2,1) to determine:
R(1,1)=R(2,1)%Node(1,1)
R(1,2)= R(2,1)%Node(1,2)

And move to the lowest level:
R(0,1)=R(1,1)%Node(0,1)
R(0,2)=R(1,1)%Node(0,2)

R(0,3)=R(1,2)%Node(0,3)
R(0,4)=R(1,2)%Node(0,4)

Now determine the remainder of the individual primes (eg R(0,1)%3), so you can use that remainder to start crossing off the candidates in the prime gap interval.

This in a nutshell, hope it is clear. From experience I can say it works really well for big numbers.

Quote:
 Originally Posted by CraigLo Here are the basics of my approach. Let take an easy example and use A*11#/6. With a divider of 6 we want A%6 to be 1 or 5. We choose one of them. I'll use 1, so A will be 1, 7, 13, ... Find offsets from 1*11#/6 that can be prime after removing multiples of 2, 3, 5, 7, 11. With 2 we eliminate ..., -5, -3, -1, 1, 3, 5, ... With 3 we eliminate ..., -4, -1, 2, 5, ... With 5 we eliminate ..., -10, -5, 0, 5, 10, ... With 7 we eliminate ..., -14, -7, 0, 7, 14, ... With 11 we eliminate ..., -22, -11, 0, 11, 22, ... This leaves possible primes of ..., -8, -6, -2, 4, 6, 12, ... Now sieve starting with 13 using each of the possible prime offsets. With 13 and an offset of -2 we want to eliminate all cases where ((1 + 6*i)*11#/6 - 2)%13 = 0 The first i where this is true is 8. We can add multiples of 13 to this and the results will still be divisible by 13. So, for 13 and an offset of -2 we eliminate 8, 21, 34, ... Repeat for all primes in our sieve and all offsets. When doing a search I pick a fixed number of bits for the prime numbers. I usually pick a multiple of 32 which is a little more efficient. I then choose a primorial so that I can run for a couple days without needing to change the number of bits in my primes. When looking for merits >25 I use offsets up to +/- 20*merit. In the GPU I use 2 sieving algorithms. The first uses small primes and works in shared memory. The second uses large primes and works in global memory. I have an idea for a third that I need to experiment with that would go in between these 2. I sieve as deeply (in multiplier) as memory will allow. I've never tried optimizing this in a CPU but I would probably take the same approach. One algorithm for small primes where the sieve interval fits in cache and one algorithm for larger primes.
Your approach looks a lot like mine in the sense that I discard all even numbers while sieving, my code looks a little complex but the sieving is more efficient (CPU only! Sieving seems like a ideal parallel process, so that should improve the speed of gap searches most definitely). My code is directly based on the Sieve of Eratosthenes: you can discard all odd values that match x^2 + 2kx.

By the by:
NP = x^2+2kx is the formula for all non-primes,
with x a whole number [3.--> and k a whole number [0,-->

Use graph.exe to show you a grafic representation of that formula. Also nice, let graph.exe show the equation -NP = x^2+2kx (in the 4th quadrant, where k is negative). This shows where Fermat got his idea for NP = X^2 - Y^2. Based on the formula NP = x^2 + 2kx, you can be more exact:
NP + k^2 = x^2 + 2kx + k^2 = (x+k)^2
NP = (x+k)^2 - k^2.

Another way of coming to this formula is:
NP = x * y (x and y whole numbers [3,-->
Say x<=y and y is odd then y = x + 2k with k a whole number [0,-->
Then NP = x (x + 2k) = x^2 + 2kx

Note:
The formula for the non-primes (NP = x^2 + 2kx) moves asymptotically to k=-1/2x and since that is the place where no Non-Prime will ever be, it seems like a good place to look for primes in the imaginary mathematical world.
My point: the Riemann critical line is the intrinsic property of the primes.
The better question is: what is the relation between the Riemann zero's and the primes themselves? Can it be used to determine the actual primes?

if anybody gets inspired and proves the Riemann Hypothesis by this, at least mention me in your proof, A nice share in the prize money in the order of USD 100.000 would do nicely as well!

Kind regards
Michiel

 2021-11-04, 06:26 #46 CraigLo   Mar 2021 59 Posts The advantage of my approach is that you can do much better than Sieve of Eratosthenes. When using a wheel with the first 250 primes, only about 1 in 13 numbers can be prime. It's even better than this when using deficient primorials. My sieving method allows you to only sieve numbers which aren't already eliminated by the primorial. The Sieve of Eratosthenes requires eliminating all elements except the even numbers so you end up having to remove about 8 times more elements. I think you are sieving for one multiplier at a time. Are you re-computing the mods for each multiplier? The mods only need to be computed once. If I was going to sieve one multiplier at a time I would use the following: Compute A*P#/Divider%p, for each prime p Compute P#%p, for each prime p and use this to adjust the starting position from one multiplier to the next. Using the example from before: A*11#/6, A = 1, 7, 13, ... with p = 13 1*11#/6%13 = 8, so for A = 1 we can remove 5, 31, 57, ... and -21, -47, ... 11#%13 = 9, so each time we move to the next multiplier we just need to add 4 to the previous numbers. For 7*11#/6 we start with 9, (7*11#/6 + 9)%13 = 0 For 13*11#/6 we start with 13
2021-11-04, 07:57   #47
MJansen

Jan 2018

8510 Posts

Quote:
 Originally Posted by CraigLo The advantage of my approach is that you can do much better than Sieve of Eratosthenes. When using a wheel with the first 250 primes, only about 1 in 13 numbers can be prime. It's even better than this when using deficient primorials. My sieving method allows you to only sieve numbers which aren't already eliminated by the primorial. The Sieve of Eratosthenes requires eliminating all elements except the even numbers so you end up having to remove about 8 times more elements. I think you are sieving for one multiplier at a time. Are you re-computing the mods for each multiplier? The mods only need to be computed once. If I was going to sieve one multiplier at a time I would use the following: Compute A*P#/Divider%p, for each prime p Compute P#%p, for each prime p and use this to adjust the starting position from one multiplier to the next. Using the example from before: A*11#/6, A = 1, 7, 13, ... with p = 13 1*11#/6%13 = 8, so for A = 1 we can remove 5, 31, 57, ... and -21, -47, ... 11#%13 = 9, so each time we move to the next multiplier we just need to add 4 to the previous numbers. For 7*11#/6 we start with 9, (7*11#/6 + 9)%13 = 0 For 13*11#/6 we start with 13
Hi Craig,
you are correct. I do make too many calculations atm, mainly because I do not keep to one divider and also because I do not search one primorial at the time, but go through a wide range of those. The optimizations you (and Seth) are using, make sense if the search is more limited in P(x)# and D. You find much more 30+ gaps than I do, so you guys must be doing something right ;-)

This is also why I like the article (and the insights) of Seth. I will incorporate those ideas into my new approach. There are some other ideas I would like to explore, one has to do with a more efficient number presentation:
there are 8 candidates that remain in a wheel of 30. You can represent them as a matrix of 8 by N elements. The 3rd element of the 11th byte represents 11 + 11*30 = 341

(Note 8 candidates represented as 0 or 1 make up one byte). Sieve each bit separately in an interval and you have a better parallel sieving method that I think can work on GPU.
Changing to a wheel of 210 gives 48 candidates, but I am under the impression that a SMT has 64 cores, so what to do with the remaining 16 cores?

Anywayz, interesting times!

And one more thing about NP = x^2 + 2kx:
if k = -1/2 x + 1/2 then NP = x^2 + 2x(-1/2x + 1/2) = x
And k = -1/2 x - 1/2 then NP = x^2 + 2x(-1/2x - 1/2) = -x
Here you will find the trivial zero's.

 2021-11-04, 14:58 #48 CraigLo   Mar 2021 59 Posts I always wondered why Seth claimed a factor of 10000 improvement. I would have expected less than 10. Re-computing the mods probably explains it. In my sieve I use 1 bit per multiplier per possible prime offset so memory use is as efficient as possible. There are a couple of ways to parallelize. You can have each threadblock work on a different set of possible prime offsets. You can have each threadblock work with a different set of primes. There might be others.
2021-11-04, 18:35   #49
MJansen

Jan 2018

5×17 Posts

Quote:
 Originally Posted by CraigLo I always wondered why Seth claimed a factor of 10000 improvement. I would have expected less than 10. Re-computing the mods probably explains it. In my sieve I use 1 bit per multiplier per possible prime offset so memory use is as efficient as possible. There are a couple of ways to parallelize. You can have each threadblock work on a different set of possible prime offsets. You can have each threadblock work with a different set of primes. There might be others.
Hi Craig,

it is not so much about the storage of the in essence boolean value for candidate or composite, but sieving on more gpu cores at the same time. I don't know how to make this clear yet. I will let this sink in and let it stew for a while. Do some tests and come back on the sieving part. Maybe a varied approach should be in order, one for fixed P(x)# and div, and another for broader searches. Thanks for thinking along and sharing! Appreciated

Kind regards
michiel

 Similar Threads Thread Thread Starter Forum Replies Last Post Dale Mahalko Software 7 2015-03-21 19:55 hhh Math 7 2014-03-18 14:37 Trilo Riesel Prime Search 33 2013-09-19 21:21 Mini-Geek Twin Prime Search 8 2011-01-30 00:18 ltd Prime Sierpinski Project 3 2006-03-23 19:27

All times are UTC. The time now is 08:39.

Thu Jan 27 08:39:46 UTC 2022 up 188 days, 3:08, 1 user, load averages: 1.06, 1.47, 1.51