Thread: Combined Sieve
2020-12-08, 02:31
SethTro

I'm happy to be sharing my prime-gap code and new Combined Sieve Algorithm (arXiv).

What's the combined sieve? It's a new algorithm that sieves multiple (combined) intervals at the same time for a dramatic speedup. I believe it's ~2x faster than PGSurround.pl and has a number of other fun features, but it requires a lot more active maintenance to run.

---

The code has three parts: sieve, stats, test.
Sieve: Similar to DanaJ's surround prime script, you choose a constant primorial and divisor (which define K = P#/d); the code then sieves intervals around multiple m * K with primes up to a few billion (or trillion).
Stats: From the remaining unknown numbers, calculate which m are most likely to produce record gaps.
Test: PRP-test the remaining unknowns to find the surrounding primes.
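The core trick can be sketched in a few lines of Python (a simplified, hypothetical model; the real combined_sieve is heavily optimized C++): because (m*K) mod p follows from K mod p with small-integer arithmetic, the expensive big-integer reduction is done once per prime and shared across all m intervals.

```python
def primes_up_to(n):
    """Simple sieve of Eratosthenes."""
    is_p = bytearray([1]) * (n + 1)
    is_p[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if is_p[p]:
            is_p[p * p::p] = bytearray(len(is_p[p * p::p]))
    return [p for p in range(2, n + 1) if is_p[p]]

def combined_sieve(K, m_start, m_count, sieve_length, max_prime):
    """Mark composites in every interval m*K + [-sieve_length, sieve_length],
    computing K mod p only once per prime and reusing it for all m."""
    width = 2 * sieve_length + 1              # index i <-> offset i - sieve_length
    composite = [bytearray(width) for _ in range(m_count)]
    for p in primes_up_to(max_prime):
        base = K % p                          # the one big-int reduction per prime
        for mi in range(m_count):
            center = ((m_start + mi) * base) % p   # (m*K) mod p, small ints only
            # smallest index i such that m*K + (i - sieve_length) is divisible by p
            first = (sieve_length - center) % p
            for i in range(first, width, p):
                composite[mi][i] = 1
    return composite
```

For example, with K = 210 (= 7#), m from 1 to 2, and sieve_length 10, the entry for m=1 at offset -1 is marked (209 = 11*19) while offset +1 is left unknown (211 is prime).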

---

Setup
Also found at https://github.com/sethtroisi/prime-gap#setup

I've upstreamed mpz_prevprime, but it's not part of any GMP release yet, so GMP needs to be built from source.

Code:
sudo apt install mercurial automake make bison autoconf libtool texinfo m4

hg clone https://gmplib.org/repo/gmp/
cd gmp
./.bootstrap
mkdir build
cd build
../configure
make [-j4]
make check [-j4]
sudo make install
cd ../..
Additionally, primesieve and a few other libraries are needed:

Code:
sudo apt install libprimesieve-dev

sudo apt install libmpfr-dev libmpc-dev
sudo apt install sqlite3 libsqlite3-dev

pip install --user gmpy2==2.1.0b5 primegapverify
Now we get to the real project:

Code:
git clone https://github.com/sethtroisi/prime-gap.git
cd prime-gap

sqlite3 prime-gap-search.db < schema.sql
mkdir -p logs/ unknowns/

git clone --depth 10 https://github.com/primegap-list-project/prime-gap-list.git
sqlite3 gaps.db < prime-gap-list/allgaps.sql

make all
---

Running

sieve: sieves m * P#/d for m from a to a+b, removing composites with primes up to limit
Code:
./combined_sieve --save-unknowns -p <P> -d <d> --mstart <a> --minc <b> --sieve-length <X> --max-prime <limit>

# e.g. 907#/53970, sieving 10,000 m in both directions with primes up to 1B
# (--max-prime is in millions: 1000 => 1000M = 1B, matching the "l1000M" filename)
./combined_sieve --save-unknowns -p 907 -d 53970 --mstart 1 --minc 10000 --sieve-length 15000 --max-prime 1000
# or in short form
./combined_sieve --save-unknowns --unknown-filename 907_53970_1_10000_s15000_l1000M.txt

999999937  (primes 24491666/26355868)    (seconds: 10.10/37.4 | per m: 0.0164)
    factors        29494942     (interval: 128088 avg m/large_prime interval: 0.4)
    unknowns   1241741/2276     (avg/m: 545.58) (composite: 98.18% +0.063% +42721)
    ~ 2x 23.22 PRP/m        (~ 3658.1 skipped PRP => 362.4 PRP/seconds)
...
It took 37.4 seconds (an average of ~0.0164 seconds per interval) to sieve up to 1 billion. We're left with ~545 unknowns per m, and we'll have to test an average of 23.22 numbers per side to find a prime.
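Note the 2276 in the output: although --minc is 10,000, the sieve only processes m with gcd(m, d) = 1 (other m reduce to a different P#/d form), which is about phi(53970)/53970 ≈ 22.8% of the candidates. A quick check:

```python
from math import gcd

# Count the m actually sieved: m in [mstart, mstart + minc) coprime to d
d, m_start, m_inc = 53970, 1, 10000
count = sum(1 for m in range(m_start, m_start + m_inc) if gcd(m, d) == 1)
print(count)  # 2276, matching the "/2276" in the log above
```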

You can hand inspect unknowns/907_53970_1_10000_s15000_l1000M.txt to get a sense of the output

Code:
$ head unknowns/907_53970_1_10000_s15000_l1000M.txt | cut -c1-100
0 : -279 +284 | -30 -54 -90 -120 -128 -140 -168 -180 -210 -240 -252 -288 -320 -350 -392 -432 -450
10 : -260 +290 | -24 -42 -60 -70 -120 -240 -252 -294 -300 -400 -588 -630 -672 -750 -784 -840 -882
12 : -278 +253 | -6 -12 -14 -20 -24 -36 -56 -80 -120 -140 -192 -216 -252 -294 -336 -392 -450 -500
16 : -264 +292 | -6 -40 -90 -96 -108 -126 -216 -294 -336 -360 -378 -384 -420 -448 -480 -490 -600
18 : -283 +271 | -2 -6 -30 -32 -42 -56 -60 -72 -90 -98 -180 -192 -240 -300 -320 -350 -378 -450 -500
22 : -268 +253 | -30 -70 -120 -162 -180 -250 -270 -294 -324 -336 -360 -480 -490 -600 -640 -750 -784
28 : -273 +257 | -12 -16 -28 -60 -72 -96 -112 -126 -180 -196 -210 -240 -280 -300 -336 -378 -448 -540
These lines tell us that (1+0)*907#/53970 + [-15000, -1] has 279 remaining numbers (-30, -54, -90, ...) and (1+0)*907#/53970 + [1, 15000] has 284 remaining numbers (listed later in the line).
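The format is easy to parse. A minimal, hypothetical reader, assuming the `<m_offset> : -<low_count> +<high_count> | <offsets>` layout shown above, with the positive offsets following in a later group (the sample above is truncated at 100 characters, so the exact tail layout is an assumption):

```python
def parse_unknown_line(line, m_start=1):
    """Parse one line of a combined_sieve unknowns file (layout assumed
    from the sample output; offset groups are split on '|')."""
    head, _, rest = line.partition('|')
    m_off_str, counts = head.split(':')
    m = m_start + int(m_off_str)
    n_low, n_high = (abs(int(t)) for t in counts.split())
    offsets = [int(x) for group in rest.split('|') for x in group.split()]
    low = [o for o in offsets if o < 0]    # unknowns below m*K
    high = [o for o in offsets if o > 0]   # unknowns above m*K
    return m, n_low, n_high, low, high
```

For instance, a (synthetic) line `"0 : -3 +2 | -6 -10 -14 | 4 8"` parses to m=1 with 3 low-side and 2 high-side unknowns.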


stats: to produce stats, run gap_stats on the unknown file

Code:
./gap_stats --save-unknowns --unknown-filename 907_53970_1_10000_s15000_l1000M.txt
 
...
    RECORD : top   1% (   22) => sum(prob) = 4.30e-05 (avg: 1.95e-06)
    RECORD : top   5% (  113) => sum(prob) = 1.64e-04 (avg: 1.45e-06)
    RECORD : top  10% (  227) => sum(prob) = 2.81e-04 (avg: 1.24e-06)
    RECORD : top  20% (  455) => sum(prob) = 4.68e-04 (avg: 1.03e-06)
    RECORD : top  50% ( 1138) => sum(prob) = 8.59e-04 (avg: 7.55e-07)
    RECORD : top 100% ( 2276) => sum(prob) = 1.20e-03 (avg: 5.26e-07)
...
This prints some verbose output which says that even the best m's in this range have less than a 2e-6 chance of being a record gap (but I rigged this example so we'll find one).
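The percentile table makes the payoff of sorting concrete: from the numbers above, testing only the top 20% of m captures 4.68e-04 / 1.20e-03 ≈ 39% of the total expected records for 20% of the testing work. A quick check (values copied from the gap_stats output above):

```python
# sum(prob) values per percentile tier, copied from the gap_stats output
tiers = {1: 4.30e-05, 5: 1.64e-04, 10: 2.81e-04,
         20: 4.68e-04, 50: 8.59e-04, 100: 1.20e-03}
total = tiers[100]
for pct, prob in tiers.items():
    # fraction of all expected records captured by testing only this tier
    print(f"top {pct:3d}% of m: {prob / total:5.0%} of expected records")
```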

testing
Code:
# gap_test_simple tests all gaps and doesn't save results to db
$ ./gap_test_simple --unknown-filename 907_53970_1_10000_s15000_l1000M.txt --min-merit 15 --quiet
...
18792 21.7401  3823 * 907#/53970 -12178 to +6614
13818 15.9853  3937 * 907#/53970 -7132 to +6686
    m=4391  258 <- unknowns -> 285     5400 <- gap -> 840
        tests     1000       (23.47/sec)  43 seconds elapsed
        unknowns  545665     (avg: 545.66), 98.18% composite  49.91% <- % -> 50.09%
        prp tests 48515      (avg: 48.51) (1138.8 tests/sec)
21164 24.4724  5801 * 907#/53970 -9800 to +11364
13176 15.2339  6437 * 907#/53970 -9530 to +3646
15658 18.0947  9809 * 907#/53970 -12522 to +3136
    m=9997  257 <- unknowns -> 278       84 <- gap -> 1890
        tests     2276       (23.42/sec)  97 seconds elapsed
        unknowns  1241741    (avg: 545.58), 98.18% composite  49.94% <- % -> 50.06%
        prp tests 108729     (avg: 47.77) (1118.7 tests/sec)
...
Code:
# gap_test.py has lots of options (resuming, multithreaded, skipping)
$ ./gap_test.py --unknown-filename 907_53970_1_10000_s15000_l1000M.txt --min-merit 15 --prp-top-percent 40 --threads 4
...
13818  15.9853  3937 * 907#/53970 -6686 to +7132
21164  24.4724  5801 * 907#/53970 -11364 to +9800    RECORD!
...
    9997  257 <- unknowns ->  278    1890 <- gap ->    0
        tests      911       (69.4/sec, 40.2/sec)  13, 23 secs
        unknowns   482139    (avg: 529.24), 98.24% composite  49.91% <- % -> 50.09%
        prp tests  22327     (avg: 24.03/side, 24.51/m) (1701.154 tests/sec)
        side skips 893       (98.0%) (avg prob_record 8.03e-07/side, 4.11e-07/m/2, +95.3%)
        sum(prob_minmerit):       10, 6.69e+04/day    found: 2
        sum(prob_record):    0.00075, 4.91/day    found: 1
        merit      24.472    (at m=5801)
...
We can check whether we found any records; here it tells us the record we found was already known:
Code:
$ ./misc/record_check.py
    REDISCOVERED |  21164  24.472  5801 * 907#/53970 -11364 to +9800  (old: (24.4724, 376, '5801*907#/53970 - 11364', 'Jacobsen'))
---

Limitations

* As of 2020 I'm still rapidly developing this code so it has a number of unhandled and un-optimized edge cases.
* For smaller p (< 1000? <1500?) there's significant overhead in writing numbers to disk, calculating gap_stats, saving to db, ... I would only recommend this code for p > 1500.
* Easy to delete data or end up with duplicate data. I'd suggest running for a couple of days then reporting all records and completely deleting prime-gap-search.db (don't get attached to the partial data it saves).

Advantages
* Sieve with primes up to 2,000,000,000,000 very quickly
* Speeds up search by 30%-70% over PGSurround.pl thresholds.
* Sort and search only the most sparse intervals from thousands or millions of m
* Dynamically calculate if other side should be searched when one prime has been found
* Parallelization in gap_test stage
* Sublinear scaling in combined_sieve
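The "side skips" line in the gap_test output above comes from that dynamic other-side decision. A minimal sketch of the idea (hypothetical; the real heuristic in gap_test.py is more involved): once a prime is found at distance g on one side, a record requires the other prime at distance >= needed = record_gap - g, so the other side is only worth testing when that conditional probability clears a threshold.

```python
def prob_record_other_side(needed_gap, unknown_offsets, prp_prime_prob):
    """P(nearest prime on the other side lies at offset >= needed_gap):
    every unknown closer than needed_gap must be composite, and at least
    one unknown at needed_gap or beyond must be prime.  Treats each PRP
    test as independent with a fixed success probability (a rough model)."""
    closer = sum(1 for o in unknown_offsets if o < needed_gap)
    farther = len(unknown_offsets) - closer
    p_closer_all_composite = (1 - prp_prime_prob) ** closer
    p_farther_some_prime = 1 - (1 - prp_prime_prob) ** farther
    return p_closer_all_composite * p_farther_some_prime

def should_test_other_side(needed_gap, unknown_offsets, prp_prime_prob, threshold):
    # Skip the side entirely when the conditional record probability is too low.
    return prob_record_other_side(needed_gap, unknown_offsets, prp_prime_prob) >= threshold
```

With four unknowns at offsets [2, 4, 6, 8], a per-test prime probability of 0.1, and needed_gap = 5, the record probability is 0.9^2 * (1 - 0.9^2) ≈ 0.154, so the side is tested at a threshold of 0.1 but skipped at 0.2.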