mersenneforum.org  

Go Back   mersenneforum.org > Prime Search Projects > Prime Gap Searches

Reply
 
Thread Tools
Old 2021-09-10, 07:54   #34
MJansen
 
Jan 2018

1248 Posts
Default

Hi Craig,

appreciated, seems like sound advice! Goal is to get started with some coding and if CGBN is NVIDIA supported, it might develop further in the future. Will try CGBN. Another advantage is that it already has a probable prime test incorporated (Miller - Rabin according to Seth).

Regarding your response to CNBG groups, we'll have to find out ;-) Have to say the CUDA terminology is not that self explanatory (SM, Kernel, Thread, Block, Cores ....


@Robert, I am hesitant to switch to Unix just yet, I will stick to windows for now.

Regards
Michiel
MJansen is online now   Reply With Quote
Old 2021-10-14, 22:37   #35
Bobby Jacobs
 
Bobby Jacobs's Avatar
 
May 2018

3×79 Posts
Default

This is very cool. Now, when do you believe you will have the new code working to confirm the new maximal prime gaps?
Bobby Jacobs is offline   Reply With Quote
Old 2021-10-19, 17:12   #36
MJansen
 
Jan 2018

22×3×7 Posts
Default

Quote:
Originally Posted by Bobby Jacobs View Post
This is very cool. Now, when do you believe you will have the new code working to confirm the new maximal prime gaps?
Hi Bobby,
not sure who you are reacting to. If it was me let me be clear: I am not going to put energy into making a deterministic gapsearch from 2^64 onwards. I will be experimenting with coding for a GPU to see how much more efficient that is compared to the traditional CPU gap search.

Update on that:
I took a detour and have now installed Ubuntu under windows (WSL2?) and am currently learning the basics of the GMP datatype eccentricities. GMP is not very intuitive I must admit and I have not begun to look for speed optimizations yet. So slow going still. But I have a basic understanding of Ubuntu, GMP and C++. In summary: still confused but at a higher level ;-)

And coding for a GPU would be the next step, after I get to grips with GMP, C++ and Cuda (necessary to talk to the GPU). So not sure how this journey (in my spare time) will develop, but I am still finding it interesting!

Kind regards
Michiel
MJansen is online now   Reply With Quote
Old 2021-10-19, 19:30   #37
MJansen
 
Jan 2018

5416 Posts
Default

@Dana: I tried GMP_nextprime (primorial value = 10000455041*3607#/210) and it took 39 seconds under Ubuntu GMP compiled with g++. A simple gcd wheel of 9699690 took only 18 seconds to find the gap end on the plus side (+26734). And around 42 secs for the minus side (-47298).

Dana's Perl module (prime utils under Ubuntu Perl) takes only 39 seconds to find both the plus and the minus side of the gap. But also uses the gmp_nextprime code. Not sure where the speed up is coming from, maybe the datatype of n is important here. Will look into it further ...

If somebody has a cpu script that improves Dana's 39 seconds, I am interested ;-)

But your

Last fiddled with by MJansen on 2021-10-19 at 19:31
MJansen is online now   Reply With Quote
Old 2021-10-20, 10:19   #38
henryzz
Just call me Henry
 
henryzz's Avatar
 
"David"
Sep 2007
Cambridge (GMT/BST)

134618 Posts
Default

Quote:
Originally Posted by MJansen View Post
Hi Bobby,
not sure who you are reacting to. If it was me let me be clear: I am not going to put energy into making a deterministic gapsearch from 2^64 onwards. I will be experimenting with coding for a GPU to see how much more efficient that is compared to the traditional CPU gap search.

Update on that:
I took a detour and have now installed Ubuntu under windows (WSL2?) and am currently learning the basics of the GMP datatype eccentricities. GMP is not very intuitive I must admit and I have not begun to look for speed optimizations yet. So slow going still. But I have a basic understanding of Ubuntu, GMP and C++. In summary: still confused but at a higher level ;-)

And coding for a GPU would be the next step, after I get to grips with GMP, C++ and Cuda (necessary to talk to the GPU). So not sure how this journey (in my spare time) will develop, but I am still finding it interesting!

Kind regards
Michiel
You may be aware of this already but mpz_class (the c++ bindings for gmp) is quite a bit less clunky than mpz_t although some functions require mpz_t (which can be easily extracted from mpz_class).
henryzz is offline   Reply With Quote
Old 2021-10-20, 11:13   #39
MJansen
 
Jan 2018

22·3·7 Posts
Default

Hi Henryzz,

I saw that there are some samples with the mpz_class but have so far only used the mpz_t datatype after mixing up different types in the wrong places ;-) Thanks for the tip!

Kind regards
Michiel
MJansen is online now   Reply With Quote
Old 2021-10-23, 18:18   #40
Bobby Jacobs
 
Bobby Jacobs's Avatar
 
May 2018

3×79 Posts
Default

Quote:
Originally Posted by MJansen View Post
Hi Bobby,
not sure who you are reacting to.
I was reacting to you. Since this thread used to be part of the "New Maximal Gaps" thread, I assumed that you were using this code to find maximal gaps greater than 264.
Bobby Jacobs is offline   Reply With Quote
Old 2021-10-30, 21:25   #41
MJansen
 
Jan 2018

22·3·7 Posts
Default

I did an update on the conceptual gapsearch approach I want to take, and am curious if there are other thoughts (discourse is the academical basis for improvement after all):

1. pretest
Steps:
Use primorials for pretesting, or better primorial intervals like:
pri1 = P(1)*P(2)*...*P(20),
pri2 = P(21)*P(22)*...*P(40),
etc ..
Q: or P(1-100), P(101-200), ...) --> testing needed!

Determine the remainder R1: R1 = Multiplier*P(x)#/Divider % pri1

After that, go through the interval [0, +2, +4, +6, etc] and then if the interval value is not yet set to composite
determine if gcd(pri1,R1+interval value)!=1, if so set it to composite, else goto next interval value

Determine the remainder R2=m*P(x)#/D % pri2,
loop through the interval and test if an interval value is not found composite already,
then determine if gcd(pri2,R2+interval value)!=1, if so set it to composite, else goto next interval value
etc, etc, till the desired depth of presieving is reached

Note: Jens KA had a ball park figure of (d=log10(M*P(x)#/D)) and max presieve prime = d/2,5)^2,5, this was way too high according to my tests, I use 40.000*x if x > 250.
For M*P(504)#/210 this would mean in JKA's formula: d = ca 1500 and the max prime would be in the region of 2.31*10^17. In my approach it would mean stopping at the prime a little over 2*10^7. There is a trade off and the larger the x, the more interesting presieving is of course.

After presieve is done, check if remaining candidates in the interval are below a threshold count, if so the chance of finding a large gap are bigger, so discard the denser intervals (time is too short to test all possible intervals anyway, look where the chances are higher)

Q: 1 or 2 intervals to pretest?
1 interval to pretest [start = -20*x, end = -20*x] and later split into plus and minus for step 3, or
2 plus [0, +2, +4, +6, ... , 20*x] and minus interval [0, -2, -4, -6, ... , -20*x] pretested separately in two threads?

2. PRP (CPU-Pfgw or GPU-code to be written) the remaining candidates
Q I tend towards skipping the PRP step all together, but some testing needed!

3. Perform Extra strong BSWP test on the remaing candidates to avoid missing an even bigger gap
if a PRP is found below 3*P(x), stop the search end move to the next Multiplier

Q how to get the GPU involved, for step 3 only? or would it make sense to do some presieving or even PRP-ing on the GPU?
Q does the GPU-CGBN implementation of the Miller Rabin test result in strong enough PRP's?

In general: I found Multipliers with remainder 1 mod 30 to be effective (as are the multipliers remainder 7, 11, 13, 17, 19, 23, 29 mod 30) for finding bigger primes. Remainder 5 and 25 give more gaps over merit 20, but not the highest merits. I am running some extensive tests on the efficiency of the Multiplier (M) and Divider (D) that I will share when its done.

Oh and I am making a complete switch from windows to Ubuntu-C/C++ for gapsearching at the moment. It is a lot faster and after an initial steep learning curve it seems to get easier every day.

Kind regards
Michiel
MJansen is online now   Reply With Quote
Old 2021-11-01, 06:33   #42
CraigLo
 
Mar 2021

738 Posts
Default

I use the GPU for everything except for some initialization and verifying large gaps.


I skip step 3 (BPSW). For numbers around 264 there is only about a 1 in 1E12 chance of being a SPRP-2. The chances are even lower for larger numbers.


I use a different sieving approach. I think what I use is the same as Seth's hybrid approach.


I sieve for the next interval at the same time I am doing prime checks. It is efficient to do a memory bandwidth limited operation at the same time as a computation limited operation. The latencies for the memory operations can be hidden by the computations in another kernel.
CraigLo is offline   Reply With Quote
Old 2021-11-01, 21:46   #43
MJansen
 
Jan 2018

22·3·7 Posts
Default

Quote:
Originally Posted by CraigLo View Post
I use the GPU for everything except for some initialization and verifying large gaps.


I skip step 3 (BPSW). For numbers around 264 there is only about a 1 in 1E12 chance of being a SPRP-2. The chances are even lower for larger numbers.


I use a different sieving approach. I think what I use is the same as Seth's hybrid approach.


I sieve for the next interval at the same time I am doing prime checks. It is efficient to do a memory bandwidth limited operation at the same time as a computation limited operation. The latencies for the memory operations can be hidden by the computations in another kernel.
I have to say I really like the way you and Seth are thinking! Not conventional, but exploring new angles. I am trying to bend my head around the article Seth wrote regarding the hybrid approach and I probably will have to ask him for a Masterclass ;-) Especially about the specifics to make it applicable in code. I don't want to just copy, but understand what and why things are happening.

I am still not coding for the GPU yet, so a serious gap to close in practical and theoretical knowledge!

PS I will PM you to see how your sieve method compares to the treesieve Jens developed. That is the fastest sieving method I came accross yet.

Kind regards
Michiel
MJansen is online now   Reply With Quote
Old 2021-11-02, 15:43   #44
CraigLo
 
Mar 2021

738 Posts
Default

Treesieve looks like it is useful when you want to compute C%p for the same C with a lot of different primes. I've never tried it, but it seems like it is better for trial division than sieving.
https://www.mersenneforum.org/showpo...65&postcount=8
Is there a description of tree sieve anywhere?



Here are the basics of my approach. Let take an easy example and use A*11#/6.

With a divider of 6 we want A%6 to be 1 or 5. We choose one of them. I'll use 1, so A will be 1, 7, 13, ...


Find offsets from 1*11#/6 that can be prime after removing multiples of 2, 3, 5, 7, 11.
With 2 we eliminate ..., -5, -3, -1, 1, 3, 5, ...
With 3 we eliminate ..., -4, -1, 2, 5, ...
With 5 we eliminate ..., -10, -5, 0, 5, 10, ...
With 7 we eliminate ..., -14, -7, 0, 7, 14, ...
With 11 we eliminate ..., -22, -11, 0, 11, 22, ...
This leaves possible primes of ..., -8, -6, -2, 4, 6, 12, ...


Now sieve starting with 13 using each of the possible prime offsets.

With 13 and an offset of -2 we want to eliminate all cases where ((1 + 6*i)*11#/6 - 2)%13 = 0
The first i where this is true is 8.
We can add multiples of 13 to this and the results will still be divisible by 13.
So, for 13 and an offset of -2 we eliminate 8, 21, 34, ...
Repeat for all primes in our sieve and all offsets.


When doing a search I pick a fixed number of bits for the prime numbers. I usually pick a multiple of 32 which is a little more efficient. I then choose a primorial so that I can run for a couple days without needing to change the number of bits in my primes.


When looking for merits >25 I use offsets up to +/- 20*merit.


In the GPU I use 2 sieving algorithms. The first uses small primes and works in shared memory. The second uses large primes and works in global memory. I have an idea for a third that I need to experiment with that would go in between these 2. I sieve as deeply (in multiplier) as memory will allow.


I've never tried optimizing this in a CPU but I would probably take the same approach. One algorithm for small primes where the sieve interval fits in cache and one algorithm for larger primes.
CraigLo is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
HTML5 prime number searching? Dale Mahalko Software 7 2015-03-21 19:55
Elementary Literature on probabilistic aspects of prime searching hhh Math 7 2014-03-18 14:37
Question about multi-core prime searching. Trilo Riesel Prime Search 33 2013-09-19 21:21
idea for twin prime searching Mini-Geek Twin Prime Search 8 2011-01-30 00:18
Searching for user BlueSoda (no its not a prime) ltd Prime Sierpinski Project 3 2006-03-23 19:27

All times are UTC. The time now is 12:22.


Tue Nov 30 12:22:01 UTC 2021 up 130 days, 6:51, 0 users, load averages: 1.25, 1.12, 1.04

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.