mersenneforum.org Prime Gap Theory

 2020-10-28, 21:41 #100 SethTro     "Seth" Apr 2019 3728 Posts Do less work, find more records I've recently doubled my rate of finding records. I have a new algorithm that speeds up sieving by ~10,000x, which allows deeper sieving and hence fewer PRP tests. On top of that, two additional tricks give a large speed advantage. For each 'm' I calculate prob(record gap) given the results of deep sieving; this forms a distribution (see attachment, bottom row). I only test m from the top ~20% (why waste time on an m with a low chance of producing a record?). Once I have an m with high prob(record gap), I find prev_prime and recalculate prob(record gap | prev_prime). 90% of the time prob(record gap | prev_prime) < prob(record gap) / 2, so I skip finding next_prime, since the chance of a record is lower than moving on to the next m. This gives (.1 * prob(record) + .9 * .75 * prob(record)) / (.1 * 2 + .9 * 1) => ~70% of prob(record) covered in ~55% of the time. In practice it's even better, because the faster prev_prime searches are more likely to be skipped, so the denominator is more like (0.1 * (1.5 + 1) + .9 * 0.5). These tricks mean I'm finding ~54 records/day vs ~25 records/day (testing all m, both sides). Attached Thumbnails
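The cost/benefit arithmetic above can be sketched in a few lines. This is a toy calculation using only the numbers stated in the post; the reading of the 0.75 factor (probability retained when the second side is skipped) is an interpretation, not something the post spells out.

```python
# Toy version of the side-skip arithmetic from the post above.
# All constants (10% continue rate, 0.75 retained probability, 2 vs 1 units
# of PRP search work) are the post's estimates, not measured values.
p = 1.0  # prob(record) for one m, normalized to 1

# 10% of m: prob stays high after prev_prime, so both sides are tested
# (2 units of work, full probability kept).
# 90% of m: next_prime is skipped after prev_prime (1 unit of work,
# ~0.75 of the probability retained, per one reading of the post).
prob_covered = 0.1 * p + 0.9 * 0.75 * p   # = 0.775 * p
work = 0.1 * 2 + 0.9 * 1                  # = 1.1 units per m

rate = prob_covered / work                # probability found per unit of work
baseline = p / 2                          # testing both sides of every m
print(round(rate, 3), round(rate / baseline, 2))  # 0.705 1.41
```

So per unit of PRP-search work, the heuristic buys roughly 1.4x the record probability of exhaustively testing both sides of every m.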
 2020-10-29, 00:00 #101 VBCurtis     "Curtis" Feb 2005 Riverside, CA 33·173 Posts This search isn't my cup of tea, but those charts and the work they represent are prehhhhhhtty! Nice work, sir.
2020-10-29, 00:15   #102
SethTro

"Seth"
Apr 2019

2·53 Posts

Quote:
 Originally Posted by VBCurtis This search isn't my cup of tea, but those charts and the work they represent are prehhhhhhtty! Nice work, sir.
I shudder a little to think about it but I probably have 2000 hours into this project now :)

2020-10-29, 08:05   #103
robert44444uk

Jun 2003
Oxford, UK

22·13·37 Posts

Quote:
 Originally Posted by SethTro I've recently doubled my rate of finding records. I have a new algorithm that speeds up sieving by ~10,000x, which allows deeper sieving and hence fewer PRP tests. On top of that, two additional tricks give a large speed advantage. For each 'm' I calculate prob(record gap) given the results of deep sieving; this forms a distribution (see attachment, bottom row). I only test m from the top ~20% (why waste time on an m with a low chance of producing a record?). Once I have an m with high prob(record gap), I find prev_prime and recalculate prob(record gap | prev_prime). 90% of the time prob(record gap | prev_prime) < prob(record gap) / 2, so I skip finding next_prime, since the chance of a record is lower than moving on to the next m. This gives (.1 * prob(record) + .9 * .75 * prob(record)) / (.1 * 2 + .9 * 1) => ~70% of prob(record) covered in ~55% of the time. In practice it's even better, because the faster prev_prime searches are more likely to be skipped, so the denominator is more like (0.1 * (1.5 + 1) + .9 * 0.5). These tricks mean I'm finding ~54 records/day vs ~25 records/day (testing all m, both sides).
Hi Seth

It would be useful, in this and your other post, #98, if you could explain your variables - so what are "m", "d", "p", "X"? Generally, for deficient primorial multiples we have tended to use the convention k*p#/d, n*p#/d or A*p#/d, but actually it does not matter what you use as long as it is explained. It is also helpful to provide a small worked example where this helps understanding.

Which language are you developing the sieve in? On this group we have tended to use Perl, because of dana's work providing Perl solutions for sieves, surround_primes, prev_prime and next_prime, or C++ where C++ programs have been compiled into executables.

But we are all interested in something that speeds up a sieve 10000 times :) Your results pickup is impressive.

2020-10-29, 09:46   #104
SethTro

"Seth"
Apr 2019

FA16 Posts

Quote:
 Originally Posted by robert44444uk Hi Seth It would be useful, in this and your other post, #98 if you could explain your variables - so what is "m", "d", "p" "X" - generally, for deficient primorial multiples we have tended to use the convention k*p#/d , n*p#/d or A*p#/d but actually it does not matter what you use as long as it is explained. It is also helpful to provide a small worked example where this helps understanding. Which language are you developing the sieve in? On this group we have tended to use perl, because of dana's work providing Perl solutions for sieves, surround_primes and prev_prime and next_prime. or c++ where c++ coded programs have been made executables. But we are all interested in something that speeds up a sieve 10000 times :) Your results pickup is impressive.
Happy to elaborate.

I wrote the sieve and related tools in C++. My first plan was to try to include it in dana's library, but it's a little more special-purpose (hopefully I can share it later this week) and requires many non-generic functions (e.g. file-writing stuff, database management) that don't belong in dana's library IMO. If you (or anyone) are interested in testing it out, send me a PM.

I use the form N = m * p# / d ± a, with m instead of your k.

---

X refers to the offset from N, similar to a above.
sieve length (SL) = the range on each side of N that I sieve (for a sieve interval of 2*SL+1).

Looking at a smaller example

Code:
d optimizer for P = 101# | large prime=79 | sl=505 (5.0 merit)
Optimizing | d = 2# | 40 remaining,   322 avg gap | SL insufficient 1.871% of time
Optimizing | d = 3# | 39 remaining,   192 avg gap | SL insufficient 1.964% of time
Optimizing | d = 5# | 47 remaining,   146 avg gap | SL insufficient 0.793% of time
prob_prime_coprime = chance that a number of this magnitude is prime, assuming no factors less than 101
= 1 / log(101#) / ((1-1/2) * (1-1/3) * (1-1/5) * ...) ~= 0.094680
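That value can be reproduced in a few lines. This is a sketch of the stated formula; the exact figure depends slightly on which primes are included in the product versus in log(101#), so it lands near, not exactly on, 0.094680.

```python
import math

def primes_upto(n):
    """Simple sieve of Eratosthenes returning all primes <= n."""
    flags = [True] * (n + 1)
    flags[0] = flags[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if flags[i]:
            flags[i * i :: i] = [False] * len(flags[i * i :: i])
    return [i for i, f in enumerate(flags) if f]

ps = primes_upto(101)
log_primorial = sum(math.log(p) for p in ps)       # log(101#)
coprime_frac  = math.prod(1 - 1 / p for p in ps)   # (1-1/2)(1-1/3)(1-1/5)...
prob_prime_coprime = 1 / log_primorial / coprime_frac
print(round(prob_prime_coprime, 4))                # ~0.095
```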

for d = 2# = 2, the only coprime residue is [1], so we test N = 1*101#/2.
If we look just at the interval 1*101#/2 + [1, 505] we find 40 coprime X (this includes removing the multiples of 2); the lower side (- [1, 505]) has a similar number. Call these X_i.

Code:
1 * 101#/2 | 40 | 2 4 8 16 32 64 128 206 214 218 226 254 256 262 274 278 298 302 314 326 334 346 358 362 382 386 394 398 412 422 428 436 446 452 454 458 466 478 482 502
Code:
avg gap = sum( (1 - prob_prime)^(i-1) * prob_prime * X_i )
SL insufficient = chance that next_prime > SL
= (1 - prob_prime)^40 = (1 - 0.094680)^40 = 0.01871
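As a sanity check, both quantities can be computed directly from the 40 offsets listed above. This sketch takes prob_prime from the post's value; the expected gap here is one-sided and truncated at SL, so it is roughly half the "avg gap" in the optimizer output.

```python
# The 40 coprime offsets X_i for 1*101#/2 + [1, 505], copied from the post.
X = [2, 4, 8, 16, 32, 64, 128, 206, 214, 218, 226, 254, 256, 262, 274, 278,
     298, 302, 314, 326, 334, 346, 358, 362, 382, 386, 394, 398, 412, 422,
     428, 436, 446, 452, 454, 458, 466, 478, 482, 502]
p = 0.094680  # prob_prime_coprime from the post

# E[next_prime offset], truncated at SL: the i-th surviving candidate is the
# first prime with probability (1 - p)^(i-1) * p.
avg_gap_one_side = sum((1 - p) ** i * p * x for i, x in enumerate(X))

# SL insufficient: none of the 40 candidates is prime.
sl_insufficient = (1 - p) ** len(X)

print(round(sl_insufficient, 5))  # 0.01871, matching the optimizer output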
---

for d = 3# = 6, there are two coprime residues [1, 5]

These have fewer X (so a larger chance of SL being insufficient) but a smaller average value (so a smaller expected next_prime).

Code:
1 * 101#/6 | 39 | 2 6 8 12 18 24 32 36 48 54 72 96 108 128 144 162 192 206 216 218 254 278 288 302 314 324 326 362 384 386 398 422 428 432 446 452 458 482 486
5 * 101#/6 | 39 | 4 6 12 16 18 24 36 48 54 64 72 96 108 144 162 192 214 216 226 256 262 274 288 298 324 334 346 358 382 384 394 412 432 436 454 466 478 486 502
for d = 5# = 30, there are eight coprime residues

They have an average of 47 coprime X_i, so a smaller chance of next_prime exceeding SL.

Code:
1 * 101#/30 | 46 | 4 10 12 18 24 30 40 48 54 60 64 72 90 100 108 120 144 150 160 162 180 192 214 240 250 262 270 274 288 298 300 324 334 358 360 382 384 394 400 412 432 450 454 478 480 502
7 * 101#/30 | 48 | 4 6 10 16 18 24 30 36 40 48 54 60 64 90 96 100 108 120 144 150 160 180 214 216 226 240 250 256 270 274 288 298 300 324 334 346 358 360 384 394 400 436 450 454 466 478 480 486
11 * 101#/30 | 48 | 2 8 12 18 20 24 30 32 48 50 54 60 72 80 90 108 120 128 144 150 162 180 192 200 218 240 254 270 278 288 300 302 314 320 324 360 362 384 398 422 428 432 450 452 458 480 482 500
13 * 101#/30 | 50 | 4 6 10 12 16 24 30 36 40 54 60 64 72 90 96 100 120 144 150 160 162 180 192 214 216 226 240 250 256 262 270 274 300 324 334 346 360 382 384 394 400 412 432 436 450 454 466 480 486 502
17 * 101#/30 | 45 | 6 8 18 20 24 30 36 48 50 54 60 80 90 96 108 120 128 144 150 180 200 206 216 218 240 254 270 278 288 300 314 320 324 326 360 384 386 398 428 446 450 458 480 486 500
19 * 101#/30 | 45 | 6 10 12 16 18 30 36 40 48 60 72 90 96 100 108 120 150 160 162 180 192 216 226 240 250 256 262 270 288 298 300 346 358 360 382 400 412 432 436 450 466 478 480 486 502
23 * 101#/30 | 46 | 2 6 12 20 24 30 32 36 50 54 60 72 80 90 96 120 144 150 162 180 192 200 206 216 240 254 270 300 302 314 320 324 326 360 362 384 386 422 432 446 450 452 480 482 486 500
29 * 101#/30 | 50 | 2 6 8 12 18 20 30 32 36 48 50 60 72 80 90 96 108 120 128 150 162 180 192 200 206 216 218 240 270 278 288 300 302 320 326 360 362 386 398 422 428 432 446 450 452 458 480 482 486 500
Looking back at my original example in #98 (but with slightly newer output):
The highest chance of having a next_prime gap > 9070 is with d = 5#.

Code:
d optimizer for P = 907# | SL=9070 (10.0 merit)
Optimizing | d =     1 *  1# | 973 remaining,  2809 avg gap | SL insufficient 0.000% of time
Optimizing | d =     1 *  2# | 688 remaining,  4856 avg gap | SL insufficient 0.006% of time
Optimizing | d =     1 *  3# | 473 remaining,  4535 avg gap | SL insufficient 0.123% of time
Optimizing | d =     1 *  5# | 423 remaining,  3541 avg gap | SL insufficient 0.245% of time
Optimizing | d =     1 *  7# | 432 remaining,  2770 avg gap | SL insufficient 0.214% of time
Optimizing | d =     1 * 11# | 458 remaining,  2328 avg gap | SL insufficient 0.144% of time
with P = 2503, it's maximized at d = 7#.
Note that this optimum depends on SL too, but that relationship is harder to explain.

Code:
d optimizer for P = 2503# | SL=25030 (10.0 merit)
Optimizing | d =     1 *  1# | 2396 remaining,  7864 avg gap | SL insufficient 0.000% of time
Optimizing | d =     1 *  2# | 1662 remaining, 14483 avg gap | SL insufficient 0.007% of time
Optimizing | d =     1 *  3# | 1092 remaining, 15502 avg gap | SL insufficient 0.189% of time
Optimizing | d =     1 *  5# | 918 remaining, 13562 avg gap | SL insufficient 0.512% of time
Optimizing | d =     1 *  7# | 891 remaining, 11227 avg gap | SL insufficient 0.595% of time
Optimizing | d =     1 * 11# | 923 remaining,  9492 avg gap | SL insufficient 0.491% of time

2020-10-29, 10:03   #105
SethTro

"Seth"
Apr 2019

2·53 Posts

Quote:
 Originally Posted by SethTro I've recently doubled my rate of finding records. I have a new algorithm that speeds up sieving by ~10,000x, which allows deeper sieving and hence fewer PRP tests. On top of that, two additional tricks give a large speed advantage. For each 'm' I calculate prob(record gap) given the results of deep sieving; this forms a distribution (see attachment, bottom row). I only test m from the top ~20% (why waste time on an m with a low chance of producing a record?). Once I have an m with high prob(record gap), I find prev_prime and recalculate prob(record gap | prev_prime). 90% of the time prob(record gap | prev_prime) < prob(record gap) / 2, so I skip finding next_prime, since the chance of a record is lower than moving on to the next m. This gives (.1 * prob(record) + .9 * .75 * prob(record)) / (.1 * 2 + .9 * 1) => ~70% of prob(record) covered in ~55% of the time. In practice it's even better, because the faster prev_prime searches are more likely to be skipped, so the denominator is more like (0.1 * (1.5 + 1) + .9 * 0.5). These tricks mean I'm finding ~54 records/day vs ~25 records/day (testing all m, both sides).
To explain this post a little more: this is the debug info from a run over N = [1, 100,000] * 3001#/2190. Note that M = 100,000, but only ~27,000 values of m are coprime to d = 2190.

The 9 graphics are

Top Row:
1. probability of gap = X
2. observed gaps from 15,000 tests and average gap (16207)
3. cdf of gap <= X

Middle Row:
1. Before the run I calculate, for each m, the probability of a gap > 12 merit. The orange line is the running sum of those probabilities if I perform the tests in order, green if I perform the tests best to worst; blue is the observed count of gaps with merit > 12 (notice how nicely it matches!)
2. Histogram of the count of m at each value of prob(gap > 12 merit)
3. CDF over those probabilities (e.g. 100% - 95% = 5% of m have more than a 0.0501 chance of having a gap with merit greater than 12; 10% have greater than a 0.0474 chance).

Bottom Row (same as middle, but with prob(gap is record) instead of prob(gap > 12 merit)):
1. Note that 5 records were found (you can see them here: https://github.com/primegap-list-pro...ommit/88f092c1); this exceeds the expectation of finding ~2.2 records. I assume that's just some extra luck (my current run has found 34 and expects 40 from 65,000 intervals).
2. Ditto middle row.
3. My point in #100 was: why test the 50% of m with probability of a record < 0.0000951 when we could be testing the 5% of m with probability > 0.000148?
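That best-first argument can be illustrated with a toy simulation. The lognormal shape and width below are purely an assumption standing in for the real per-m prob(record) distribution; the point is only that testing the high-probability tail pays better per test.

```python
import random

random.seed(42)
# Hypothetical stand-in for prob(record) over ~27,000 m values; the lognormal
# parameters are illustrative assumptions, not the measured distribution.
probs = sorted((random.lognormvariate(-9, 0.5) for _ in range(27000)),
               reverse=True)

n = len(probs)
per_test_top5 = sum(probs[: n // 20]) / (n // 20)          # best 5% of m
per_test_bottom50 = sum(probs[n // 2 :]) / (n - n // 2)    # worst 50% of m

# Each test spent on a top-5% m buys several times the record probability
# of one spent on a bottom-50% m.
print(round(per_test_top5 / per_test_bottom50, 1))
```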

2020-10-29, 20:58   #106
mart_r

Dec 2008
you know...around...

2×313 Posts

Quote:
 Originally Posted by Bobby Jacobs What scene are you talking about? I have watched "A Goofy Movie" before, and I cannot recall a scene where the word "left" was used in the way I used it.
It was the scene where Goofy and Max were approaching the most important junction of their trip, and Goofy asked whether they go left or right, and Max took a long time until, at the last second, he decided they go left.

Quote:
 Originally Posted by SethTro These tricks mean that I'm finding ~54 records / day vs ~25 records / day (testing all m, both sides).
Wow! I feel obsolete - last time I searched for some gap records my laptop ran for two days without finding any... well, the laptop is already quite obsolete itself, but still...

2020-10-29, 21:52   #107
SethTro

"Seth"
Apr 2019

2·53 Posts

Quote:
 Originally Posted by mart_r Wow! I feel obsolete - last time I searched for some gap records my laptop ran for two days without finding any... well, the laptop is already quite obsolete itself, but still...
You might have also been searching a harder range (e.g. small numbers). When I'm working on multiples of 2221#/11# my computer finds ~70/day; for 4441#/11# I find ~103/day. I haven't tried to maximize records per day, but I suspect near P=6000 would yield even more.

Recent status report

Code:
        13922707  447 <- unknowns ->  498        660 <- gap ->    0
tests      77454     (1.97/sec, 7.1 secs/test)  39334, 550284 secs
unknowns   73644513  (avg: 950.82), 99.05% composite  49.99% <- % -> 50.01%
prp tests  7510657   (avg: 86.04/side, 96.97/m) (190.946 tests/sec)
side skips 67620     (87.3%) (avg prob_record 0.00054/side, 0.000337/m/2, +59.9%)
sum(prob_minmerit):  2.5e+03, 5.55e+03/day  found: 143
sum(prob_record):         47, 103/day       found: 53
fallback prev_gap 310 (0.4%), next_gap 54 (0.1%)
merit      21.251    (at m=13880539)

 2020-10-30, 10:53 #108 henryzz Just call me Henry     "David" Sep 2007 Cambridge (GMT/BST) 2·2,909 Posts Very interesting work on optimizing the choice of d#. Can't wait to try this new code. This has made me realize that I don't know the currently recommended scripts for gap searching. https://mersenneforum.org/showthread.php?t=20872 could do with updating for the post-Thomas-Nicely era, and adding some links to recommended software/scripts/guides would be useful for any new participants. Is https://mersenneforum.org/showpost.p...&postcount=103 still the latest script? Is ATH's program from that thread still recommended? I have improvements to both that I can share if they aren't obsolete. I realize that some of this will probably be replaced by Seth's work.
2020-10-30, 15:39   #109
robert44444uk

Jun 2003
Oxford, UK

22·13·37 Posts

Quote:
 Originally Posted by henryzz Very interesting work on optimizing the choice of d#. Can't wait to try this new code. This has made me realize that I don't know the recommended scripts for gap searching currently. https://mersenneforum.org/showthread.php?t=20872 could do with updating for the post Thomas Nicely era and adding some links to recommended software/scripts/guides would be useful for any new participants. Is https://mersenneforum.org/showpost.p...&postcount=103 still the latest script? Is ATH's program from that thread still recommended? I have improvements to both that I can share if they aren't obsolete. I realize that some of this will probably be replaced by Seth's work.
There are two major approaches at the moment.
• Deficient primorials of the form A*p#/d form the majority of gap records, given that the candidate sits at the midpoint of a relatively smooth range 4 merits wide; a truly efficient Perl code is shown below - thanks to danaj.
• The other approach is primorial offsets, of the form A*p# +/- offset +/- midpoint, where the offset is something that provides a range of integers that is relatively smooth.

Code:
#!/usr/bin/env perl
use warnings;
use strict;
use threads;
use threads::shared;
#use Timer::Runtime;
use Math::GMPz;
use Math::BigFloat lib => "GMP";
use Math::Prime::Util qw/:all/;
use Math::Prime::Util::GMP;

$| = 1;
my $nthreads  = enter the number of threads your machine can handle here;
my $startprim = shift || enter xth prime here;
my $endprime  = shift || enter yth prime here;
my $beg       = shift || enter A here;
my $end       = shift || enter upper end A of search here;
my $mingaplen = enter the smallest gap you want to see here as a merit;
my $divisor   = enter your divisor here;
my $delta     = we usually put 4 here;  # it will not search further if a prime comes up within merits +/- delta/2

my %merit;
my %mindig;
open(my $merits, '<', 'merits.txt') or die "Cannot open merits file\n";
while (<$merits>) {
  next unless /^\s*(\d+)\s+(\S+)/;
  $merit{$1}  = $2;
  $mindig{$1} = int( $1 / ($2 * 2.3025) + 1.01 );  # Slightly large
}
close($merits);

my @acc = (1,5,7,11,13,17,19,23,25,29);
#my @acc = (0..29);
my @results :shared;

foreach my $pp ($startprim .. $endprime) {
  my $prim = pn_primorial($pp);
  $prim = Math::BigInt->new("$prim") unless ref($prim) eq 'Math::BigInt';
  my ($sn, $rem) = $prim->copy->bdiv($divisor);
  next unless $rem == 0;
  next if ($sn * $end) < 4e18;

  my $snlog  = length($sn) * 2.3026;   # Approximately one merit
  my $mingap = int($mingaplen * $snlog);
  $mingap = 1432 if $mingap < 1432;
  my $prev_thresh = int( $delta * $snlog );

  my $nth = nth_prime($pp);
  print "- $nth $pp $divisor ($beg..$end)\n";
  my @threads;
  push @threads, threads->create('searchi', $_, $sn, $nth, $prev_thresh, $mingap) for 0 .. $nthreads-1;
  $_->join() for (@threads);

  while (@results) {
    print shift(@results);
  }
}

sub merit {
  my ($n, $gap) = @_;
  my $fgap = Math::BigFloat->new("$gap");
  my $fn   = Math::BigFloat->new("$n");
  return sprintf("%.6lf", $fgap / log($fn));
}

sub searchi {
  my ($tnum, $sn, $nth, $prev_thresh, $mingap) = @_;
  $sn = Math::GMPz->new("$sn");
  my $beg30 = int($beg/30);
  my $index = $tnum;
  while (1) {
    if ($tnum == 0 && @results) {
      lock(@results);
      while (@results) {
        print shift(@results);
      }
    }
    my $i = 30*($beg30 + int($index/10)) + $acc[$index % 10];
    # my $i = 30*($beg30 + int($index/30)) + $acc[$index % 30];
    $index += $nthreads;
    last if $i > $end;
    my $n = $sn * $i;
    my ($dprev, $dnext) = Math::Prime::Util::GMP::surround_primes($n, $prev_thresh);
    next if $dprev == 0 || $dnext == 0;    # prime found inside thresh
    my $gap = $dprev + $dnext;
    next unless $gap > $mingap;            # gap is obviously too small
    my $gbeg = $n - $dprev;
    next if defined $mindig{$gap} && length($gbeg) > $mindig{$gap};
    my $merit = merit($gbeg, $gap);
    next if defined $merit{$gap} && $merit{$gap} >= $merit;
    if (!is_extra_strong_lucas_pseudoprime($gbeg) || !is_extra_strong_lucas_pseudoprime($gbeg + $gap)) {
      print "# SPSP2 found for $i*$nth#/$divisor\n";
      ($dprev, $dnext) = Math::Prime::Util::GMP::surround_primes($n);
      $gap   = $dprev + $dnext;
      $gbeg  = $n - $dprev;
      $merit = merit($gbeg, $gap);
      next if defined $merit{$gap} && $merit{$gap} >= $merit;
    }
    {
      lock(@results);
      push @results, sprintf("%9.6f %8d PRP%d = %d*%d#/%d-%d\n", $merit, $gap, length($n), $i, $nth, $divisor, $dprev);
    }
  }
  1;
}

 2020-10-31, 20:24 #110 MJansen   Jan 2018 41 Posts Impressive results Seth! I have been looking at different approaches myself and have been testing them at small integer sizes (<10^20) to see how often they deliver maximal prime gaps (a test of the m-efficiency, if you will). I have to say that my findings are inconclusive: skipping m values will result in missing maximal prime gaps. So I am sticking to much slower search routines that include all m (m = all odd multipliers in center = m * p# / d). If you do want to limit the number of m to test, my observation is that non-squarefree m gives more maximal prime gaps than searching for m = prime or m = (squarefree and not prime). If you do want to increase merit, I would suggest the following: I find that adding an extra q to the center value (center = q * m * p# / d) delivers larger merits. Note: m and q are both prime. Just my 2 cents. Kind regards Michiel
