Hi Craig,
appreciated, seems like sound advice! Goal is to get started with some coding and if CGBN is NVIDIA supported, it might develop further in the future. Will try CGBN. Another advantage is that it already has a probable prime test incorporated (Miller  Rabin according to Seth). Regarding your response to CNBG groups, we'll have to find out ;) Have to say the CUDA terminology is not that self explanatory (SM, Kernel, Thread, Block, Cores .... @Robert, I am hesitant to switch to Unix just yet, I will stick to windows for now. Regards Michiel 
This is very cool. Now, when do you believe you will have the new code working to confirm the new maximal prime gaps?

[QUOTE=Bobby Jacobs;590601]This is very cool. Now, when do you believe you will have the new code working to confirm the new maximal prime gaps?[/QUOTE]
Hi Bobby, not sure who you are reacting to. If it was me let me be clear: I am not going to put energy into making a deterministic gapsearch from 2^64 onwards. I will be experimenting with coding for a GPU to see how much more efficient that is compared to the traditional CPU gap search. Update on that: I took a detour and have now installed Ubuntu under windows (WSL2?) and am currently learning the basics of the GMP datatype eccentricities. GMP is not very intuitive I must admit and I have not begun to look for speed optimizations yet. So slow going still. But I have a basic understanding of Ubuntu, GMP and C++. In summary: still confused but at a higher level ;) And coding for a GPU would be the next step, after I get to grips with GMP, C++ and Cuda (necessary to talk to the GPU). So not sure how this journey (in my spare time) will develop, but I am still finding it interesting! Kind regards Michiel 
@Dana: I tried GMP_nextprime (primorial value = 10000455041*3607#/210) and it took 39 seconds under Ubuntu GMP compiled with g++. A simple gcd wheel of 9699690 took only 18 seconds to find the gap end on the plus side (+26734). And around 42 secs for the minus side (47298).
Dana's Perl module (prime utils under Ubuntu Perl) takes only 39 seconds to find both the plus and the minus side of the gap. But also uses the gmp_nextprime code. Not sure where the speed up is coming from, maybe the datatype of n is important here. Will look into it further ... If somebody has a cpu script that improves Dana's 39 seconds, I am interested ;) But your 
[QUOTE=MJansen;591074]Hi Bobby,
not sure who you are reacting to. If it was me let me be clear: I am not going to put energy into making a deterministic gapsearch from 2^64 onwards. I will be experimenting with coding for a GPU to see how much more efficient that is compared to the traditional CPU gap search. Update on that: I took a detour and have now installed Ubuntu under windows (WSL2?) and am currently learning the basics of the GMP datatype eccentricities. GMP is not very intuitive I must admit and I have not begun to look for speed optimizations yet. So slow going still. But I have a basic understanding of Ubuntu, GMP and C++. In summary: still confused but at a higher level ;) And coding for a GPU would be the next step, after I get to grips with GMP, C++ and Cuda (necessary to talk to the GPU). So not sure how this journey (in my spare time) will develop, but I am still finding it interesting! Kind regards Michiel[/QUOTE] You may be aware of this already but mpz_class (the c++ bindings for gmp) is quite a bit less clunky than mpz_t although some functions require mpz_t (which can be easily extracted from mpz_class). 
Hi Henryzz,
I saw that there are some samples with the mpz_class but have so far only used the mpz_t datatype after mixing up different types in the wrong places ;) Thanks for the tip! Kind regards Michiel 
[QUOTE=MJansen;591074]
Hi Bobby, not sure who you are reacting to. [/QUOTE] I was reacting to you. Since this thread used to be part of the "New Maximal Gaps" thread, I assumed that you were using this code to find maximal gaps greater than 2[SUP]64[/SUP]. 
I did an update on the conceptual gapsearch approach I want to take, and am curious if there are other thoughts (discourse is the academical basis for improvement after all):
1. pretest Steps: Use primorials for pretesting, or better primorial intervals like: pri1 = P(1)*P(2)*...*P(20), pri2 = P(21)*P(22)*...*P(40), etc .. Q: or P(1100), P(101200), ...) > testing needed! Determine the remainder R1: R1 = Multiplier*P(x)#/Divider % pri1 After that, go through the interval [0, +2, +4, +6, etc] and then if the interval value is not yet set to composite determine if gcd(pri1,R1+interval value)!=1, if so set it to composite, else goto next interval value Determine the remainder R2=m*P(x)#/D % pri2, loop through the interval and test if an interval value is not found composite already, then determine if gcd(pri2,R2+interval value)!=1, if so set it to composite, else goto next interval value etc, etc, till the desired depth of presieving is reached Note: Jens KA had a ball park figure of (d=log10(M*P(x)#/D)) and max presieve prime = d/2,5)^2,5, this was way too high according to my tests, I use 40.000*x if x > 250. For M*P(504)#/210 this would mean in JKA's formula: d = ca 1500 and the max prime would be in the region of 2.31*10^17. In my approach it would mean stopping at the prime a little over 2*10^7. There is a trade off and the larger the x, the more interesting presieving is of course. After presieve is done, check if remaining candidates in the interval are below a threshold count, if so the chance of finding a large gap are bigger, so discard the denser intervals (time is too short to test all possible intervals anyway, look where the chances are higher) Q: 1 or 2 intervals to pretest? 1 interval to pretest [start = 20*x, end = 20*x] and later split into plus and minus for step 3, or 2 plus [0, +2, +4, +6, ... , 20*x] and minus interval [0, 2, 4, 6, ... , 20*x] pretested separately in two threads? 2. PRP (CPUPfgw or GPUcode to be written) the remaining candidates Q I tend towards skipping the PRP step all together, but some testing needed! 3. Perform Extra strong BSWP test on the remaing candidates to avoid missing an even bigger gap if a PRP is found below 3*P(x), stop the search end move to the next Multiplier Q how to get the GPU involved, for step 3 only? or would it make sense to do some presieving or even PRPing on the GPU? Q does the GPUCGBN implementation of the Miller Rabin test result in strong enough PRP's? In general: I found Multipliers with remainder 1 mod 30 to be effective (as are the multipliers remainder 7, 11, 13, 17, 19, 23, 29 mod 30) for finding bigger primes. Remainder 5 and 25 give more gaps over merit 20, but not the highest merits. I am running some extensive tests on the efficiency of the Multiplier (M) and Divider (D) that I will share when its done. Oh and I am making a complete switch from windows to UbuntuC/C++ for gapsearching at the moment. It is a lot faster and after an initial steep learning curve it seems to get easier every day. Kind regards Michiel 
I use the GPU for everything except for some initialization and verifying large gaps.
I skip step 3 (BPSW). For numbers around 2[SUP]64[/SUP] there is only about a 1 in 1E12 chance of being a SPRP2. The chances are even lower for larger numbers. I use a different sieving approach. I think what I use is the same as Seth's hybrid approach. I sieve for the next interval at the same time I am doing prime checks. It is efficient to do a memory bandwidth limited operation at the same time as a computation limited operation. The latencies for the memory operations can be hidden by the computations in another kernel. 
[QUOTE=CraigLo;592178]I use the GPU for everything except for some initialization and verifying large gaps.
I skip step 3 (BPSW). For numbers around 2[SUP]64[/SUP] there is only about a 1 in 1E12 chance of being a SPRP2. The chances are even lower for larger numbers. I use a different sieving approach. I think what I use is the same as Seth's hybrid approach. I sieve for the next interval at the same time I am doing prime checks. It is efficient to do a memory bandwidth limited operation at the same time as a computation limited operation. The latencies for the memory operations can be hidden by the computations in another kernel.[/QUOTE] I have to say I really like the way you and Seth are thinking! Not conventional, but exploring new angles. I am trying to bend my head around the article Seth wrote regarding the hybrid approach and I probably will have to ask him for a Masterclass ;) Especially about the specifics to make it applicable in code. I don't want to just copy, but understand what and why things are happening. I am still not coding for the GPU yet, so a serious gap to close in practical and theoretical knowledge! PS I will PM you to see how your sieve method compares to the treesieve Jens developed. That is the fastest sieving method I came accross yet. Kind regards Michiel 
Treesieve looks like it is useful when you want to compute C%p for the same C with a lot of different primes. I've never tried it, but it seems like it is better for trial division than sieving.
[URL]https://www.mersenneforum.org/showpost.php?p=152965&postcount=8[/URL] Is there a description of tree sieve anywhere? Here are the basics of my approach. Let take an easy example and use A*11#/6. With a divider of 6 we want A%6 to be 1 or 5. We choose one of them. I'll use 1, so A will be 1, 7, 13, ... Find offsets from 1*11#/6 that can be prime after removing multiples of 2, 3, 5, 7, 11. With 2 we eliminate ..., 5, 3, 1, 1, 3, 5, ... With 3 we eliminate ..., 4, 1, 2, 5, ... With 5 we eliminate ..., 10, 5, 0, 5, 10, ... With 7 we eliminate ..., 14, 7, 0, 7, 14, ... With 11 we eliminate ..., 22, 11, 0, 11, 22, ... This leaves possible primes of ..., 8, 6, 2, 4, 6, 12, ... Now sieve starting with 13 using each of the possible prime offsets. With 13 and an offset of 2 we want to eliminate all cases where ((1 + 6*i)*11#/6  2)%13 = 0 The first i where this is true is 8. We can add multiples of 13 to this and the results will still be divisible by 13. So, for 13 and an offset of 2 we eliminate 8, 21, 34, ... Repeat for all primes in our sieve and all offsets. When doing a search I pick a fixed number of bits for the prime numbers. I usually pick a multiple of 32 which is a little more efficient. I then choose a primorial so that I can run for a couple days without needing to change the number of bits in my primes. When looking for merits >25 I use offsets up to +/ 20*merit. In the GPU I use 2 sieving algorithms. The first uses small primes and works in shared memory. The second uses large primes and works in global memory. I have an idea for a third that I need to experiment with that would go in between these 2. I sieve as deeply (in multiplier) as memory will allow. I've never tried optimizing this in a CPU but I would probably take the same approach. One algorithm for small primes where the sieve interval fits in cache and one algorithm for larger primes. 
All times are UTC. The time now is 06:24. 
Powered by vBulletin® Version 3.8.11
Copyright ©2000  2022, Jelsoft Enterprises Ltd.