mersenneforum.org > Prime Search Projects > Prime Gap Searches
2020-12-08, 02:31   #1
SethTro

Combined Sieve

I'm happy to be sharing my prime-gap code and new Combined Sieve Algorithm (arXiv).

What's the combined sieve? It's a new algorithm that sieves multiple (combined) intervals at the same time, for a dramatic speedup. I believe it's ~2x faster than PGSurround.pl and has a number of other fun features, but it requires much more active maintenance to run.

---

The code has three parts: sieve, stats, test.
Sieve: Similar to DanaJ's surround prime scripts, you choose a constant primorial P# and divisor d (which define K = P#/d); the code then sieves the intervals around multiple m * K with primes up to a few billion (or trillion).
Stats: From the remaining unknown numbers, calculate which m are most likely to generate records.
Test: PRP-test the remaining unknowns to find the surrounding primes.
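The core trick of the sieve stage can be sketched in a few lines of Python. This is my own toy illustration, not Seth's C++ code, with tiny stand-in parameters: the point is that the big constant K is reduced modulo each sieve prime only once, and that residue is then shared across every interval m*K ± sieve-length.

```python
# Toy sketch of the combined-sieve idea (illustration only, not the project's
# C++): mark composites in m*K + [-SL, SL] for many m at once, reducing the
# big constant K modulo each prime a single time.
from math import gcd

def primes_up_to(n):
    """Simple sieve of Eratosthenes returning all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(sieve[i * i :: i]))
    return [i for i, b in enumerate(sieve) if b]

P, d, SL, LIMIT = 97, 210, 500, 10_000   # tiny stand-ins for 907#, 53970, ...
K = 1
for p in primes_up_to(P):
    K *= p
K //= d                                   # K = 97#/210

ms = [m for m in range(1, 30) if gcd(m, d) == 1]
composite = {m: bytearray(2 * SL + 1) for m in ms}   # index i <-> offset i-SL
for p in primes_up_to(LIMIT):
    Kp = K % p                 # one big-number reduction, shared by all m
    for m in ms:
        r = (-(m * Kp)) % p    # m*K + r is divisible by p
        start = r - ((r + SL) // p) * p   # smallest offset >= -SL hit by p
        for off in range(start, SL + 1, p):
            composite[m][off + SL] = 1

unknowns = {m: [i - SL for i, c in enumerate(composite[m]) if not c]
            for m in ms}
print(len(unknowns[1]), "unknowns survive around 1*K")
```

The real combined_sieve is far more sophisticated (see the arXiv paper), but as I understand it, reusing K mod p across every m is the heart of the "combined" speedup.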

---

Setup
Also found at https://github.com/sethtroisi/prime-gap#setup

I've upstreamed mpz_prevprime, but it's not part of any GMP release yet, so GMP needs to be built from source.

Code:
sudo apt install mercurial automake make bison autoconf libtool texinfo m4

hg clone https://gmplib.org/repo/gmp/
cd gmp
./.bootstrap
mkdir build
cd build
../configure
make [-j4]
make check [-j4]
sudo make install
cd ../..
Additionally, primesieve and a few other libraries are needed:

Code:
sudo apt install libprimesieve-dev

sudo apt install libmpfr-dev libmpc-dev
sudo apt install sqlite3 libsqlite3-dev

pip install --user gmpy2==2.1.0b5 primegapverify
Now we get to the real project:

Code:
git clone https://github.com/sethtroisi/prime-gap.git
cd prime-gap

sqlite3 prime-gap-search.db < schema.sql
mkdir -p logs/ unknowns/

git clone --depth 10 https://github.com/primegap-list-project/prime-gap-list.git
sqlite3 gaps.db < prime-gap-list/allgaps.sql

make all
---

Running

sieve: to sieve m * P#/d for m from a to a+b, with primes up to limit
Code:
./combined_sieve --save-unknowns -p <P> -d <d> --mstart <a> --minc <b> --sieve-length <X> --max-prime <limit>

# e.g. 907#/53970 for m in [1, 10000], sieving +/-15,000 around each m*K with primes up to 1B
./combined_sieve --save-unknowns -p 907 -d 53970 --mstart 1 --minc 10000 --sieve-length 15000 --max-prime 1000
# or in short form
./combined_sieve --save-unknowns --unknown-filename 907_53970_1_10000_s15000_l1000M.txt

999999937  (primes 24491666/26355868)    (seconds: 10.10/37.4 | per m: 0.0164)
    factors        29494942     (interval: 128088 avg m/large_prime interval: 0.4)
    unknowns   1241741/2276     (avg/m: 545.58) (composite: 98.18% +0.063% +42721)
    ~ 2x 23.22 PRP/m        (~ 3658.1 skipped PRP => 362.4 PRP/seconds)
...
It took 37.4 seconds (an average of ~0.0164 seconds per interval) to sieve up to 1 billion. We're left with ~545 unknowns per m, and we'll have to test on average 23.22 numbers on each side to find the surrounding primes.

You can hand-inspect unknowns/907_53970_1_10000_s15000_l1000M.txt to get a sense of the output:

Code:
$ head unknowns/907_53970_1_10000_s15000_l1000M.txt | cut -c1-100
0 : -279 +284 | -30 -54 -90 -120 -128 -140 -168 -180 -210 -240 -252 -288 -320 -350 -392 -432 -450
10 : -260 +290 | -24 -42 -60 -70 -120 -240 -252 -294 -300 -400 -588 -630 -672 -750 -784 -840 -882
12 : -278 +253 | -6 -12 -14 -20 -24 -36 -56 -80 -120 -140 -192 -216 -252 -294 -336 -392 -450 -500
16 : -264 +292 | -6 -40 -90 -96 -108 -126 -216 -294 -336 -360 -378 -384 -420 -448 -480 -490 -600
18 : -283 +271 | -2 -6 -30 -32 -42 -56 -60 -72 -90 -98 -180 -192 -240 -300 -320 -350 -378 -450 -500
22 : -268 +253 | -30 -70 -120 -162 -180 -250 -270 -294 -324 -336 -360 -480 -490 -600 -640 -750 -784
28 : -273 +257 | -12 -16 -28 -60 -72 -96 -112 -126 -180 -196 -210 -240 -280 -300 -336 -378 -448 -540
These lines tell us that (1+0)*907#/53970 + [-15000, -1] has 279 remaining numbers (-30, -54, -90, ...) and + [1, 15000] has 284 remaining numbers (listed later in the line, truncated here).
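For scripting against these files, a small parser helps. Note the exact format is inferred from the truncated excerpt above (in particular, how the positive offsets are separated is my guess), so treat this as a sketch rather than the project's canonical reader:

```python
def parse_unknown_line(line):
    """Parse 'm_off : -<low count> +<high count> | <offsets...>'.

    Format inferred from the truncated excerpt above; the real files may
    delimit the positive offsets differently.
    """
    header, _, rest = line.partition("|")
    m_off_str, counts = header.split(":")
    low_count, high_count = counts.split()
    # Any further '|' separators are treated as whitespace.
    offsets = [int(tok) for tok in rest.replace("|", " ").split()]
    return int(m_off_str), abs(int(low_count)), int(high_count), offsets

m_off, low, high, offs = parse_unknown_line(
    "12 : -278 +253 | -6 -12 -14 -20 -24 -36")
print(m_off, low, high, offs[:3])   # -> 12 278 253 [-6, -12, -14]
```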


stats: to produce statistics, run gap_stats on the unknown file

Code:
./gap_stats --save-unknowns --unknown-filename 907_53970_1_10000_s15000_l1000M.txt
 
...
    RECORD : top   1% (   22) => sum(prob) = 4.30e-05 (avg: 1.95e-06)
    RECORD : top   5% (  113) => sum(prob) = 1.64e-04 (avg: 1.45e-06)
    RECORD : top  10% (  227) => sum(prob) = 2.81e-04 (avg: 1.24e-06)
    RECORD : top  20% (  455) => sum(prob) = 4.68e-04 (avg: 1.03e-06)
    RECORD : top  50% ( 1138) => sum(prob) = 8.59e-04 (avg: 7.55e-07)
    RECORD : top 100% ( 2276) => sum(prob) = 1.20e-03 (avg: 5.26e-07)
...
This prints some verbose output which says that even the best m's in this range have less than a 2e-6 chance of being a record gap (but I rigged this example so we'll find one).
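One way to read that table: the record probability is heavily front-loaded across the m's, which is what makes testing only the top fraction worthwhile later. A quick calculation from the sum(prob) numbers copied out of the output above:

```python
# sum(prob) values copied from the gap_stats output above
tiers = {1: 4.30e-05, 5: 1.64e-04, 10: 2.81e-04, 20: 4.68e-04,
         50: 8.59e-04, 100: 1.20e-03}
total = tiers[100]
for pct, s in tiers.items():
    # fraction of the total record probability captured by this tier
    print(f"top {pct:3d}% of m -> {100 * s / total:4.0f}% of record probability")
```

Testing only the top 20% of m captures roughly 39% of the total record probability for a fifth of the PRP work.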

test
Code:
# gap_test_simple tests all gaps and doesn't save results to db
$ ./gap_test_simple --unknown-filename 907_53970_1_10000_s15000_l1000M.txt --min-merit 15 --quiet
...
18792 21.7401  3823 * 907#/53970 -12178 to +6614
13818 15.9853  3937 * 907#/53970 -7132 to +6686
    m=4391  258 <- unknowns -> 285     5400 <- gap -> 840
        tests     1000       (23.47/sec)  43 seconds elapsed
        unknowns  545665     (avg: 545.66), 98.18% composite  49.91% <- % -> 50.09%
        prp tests 48515      (avg: 48.51) (1138.8 tests/sec)
21164 24.4724  5801 * 907#/53970 -9800 to +11364
13176 15.2339  6437 * 907#/53970 -9530 to +3646
15658 18.0947  9809 * 907#/53970 -12522 to +3136
    m=9997  257 <- unknowns -> 278       84 <- gap -> 1890
        tests     2276       (23.42/sec)  97 seconds elapsed
        unknowns  1241741    (avg: 545.58), 98.18% composite  49.94% <- % -> 50.06%
        prp tests 108729     (avg: 47.77) (1118.7 tests/sec)
...
Code:
# gap_test.py has lots of options (resuming, multithreaded, skipping)
$ ./gap_test.py --unknown-filename 907_53970_1_10000_s15000_l1000M.txt --min-merit 15 --prp-top-percent 40 --threads 4
...
13818  15.9853  3937 * 907#/53970 -6686 to +7132
21164  24.4724  5801 * 907#/53970 -11364 to +9800    RECORD!
...
    9997  257 <- unknowns ->  278    1890 <- gap ->    0
        tests      911       (69.4/sec, 40.2/sec)  13, 23 secs
        unknowns   482139    (avg: 529.24), 98.24% composite  49.91% <- % -> 50.09%
        prp tests  22327     (avg: 24.03/side, 24.51/m) (1701.154 tests/sec)
        side skips 893       (98.0%) (avg prob_record 8.03e-07/side, 4.11e-07/m/2, +95.3%)
        sum(prob_minmerit):       10, 6.69e+04/day    found: 2
        sum(prob_record):    0.00075, 4.91/day    found: 1
        merit      24.472    (at m=5801)
...
Finally, we can check whether we found any new records; here it tells us this record was already known:
Code:
$ ./misc/record_check.py
    REDISCOVERED |  21164  24.472  5801 * 907#/53970 -11364 to +9800  (old: (24.4724, 376, '5801*907#/53970 - 11364', 'Jacobsen'))
---

Limitations

* As of 2020 I'm still rapidly developing this code, so it has a number of unhandled and un-optimized edge cases.
* For smaller p (< 1000? < 1500?) there's significant overhead in writing numbers to disk, calculating gap_stats, saving to the db, ... I would only recommend this code for p > 1500.
* It's easy to delete data or end up with duplicate data. I'd suggest running for a couple of days, then reporting all records and completely deleting prime-gap-search.db (don't get attached to the partial data it saves).

Advantages

* Sieves with primes up to 2,000,000,000,000 very quickly
* Speeds up the search by 30%-70% over PGSurround.pl thresholds
* Sorts and searches only the sparsest intervals out of thousands or millions of m
* Dynamically calculates whether the other side should be searched once one prime has been found
* Parallelization in the gap_test stage
* Sublinear scaling in combined_sieve
2020-12-09, 15:31   #2
henryzz

I am getting an error whenever it tries to access the database:
For example at the end of combined_sieve:
Code:
Saving unknowns to 'unknowns/997_210_10000_1000_s15000_l1000M.txt'

range INSERT failed 1: near "ON": syntax error
2020-12-09, 22:20   #3
SethTro

Quote:
Originally Posted by henryzz View Post
I am getting an error whenever it tries to access the database:
For example at the end of combined_sieve:
Code:
Saving unknowns to 'unknowns/997_210_10000_1000_s15000_l1000M.txt'

range INSERT failed 1: near "ON": syntax error
Can you check your sqlite3 version?

I'm using `on conflict` which I think was introduced in 3.24.

If you have a version < 3.24 and updating isn't easy, one hacky workaround is to just delete the "ON CONFLICT" lines: https://github.com/sethtroisi/prime-....cpp#L336-L341 and https://github.com/sethtroisi/prime-....cpp#L295-L300
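For reference, the two statement styles side by side (using an illustrative table name and columns, not the project's actual schema): the `ON CONFLICT ... DO UPDATE` upsert needs SQLite >= 3.24, while `INSERT OR REPLACE` works on older versions but deletes and re-inserts the whole row.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Illustrative schema; the real prime-gap-search.db tables differ.
con.execute("CREATE TABLE sieve_range (m_start INTEGER PRIMARY KEY, num_m INTEGER)")

# Upsert syntax added in SQLite 3.24 (the style the project uses):
con.execute("INSERT INTO sieve_range VALUES (1, 10000) "
            "ON CONFLICT(m_start) DO UPDATE SET num_m = excluded.num_m")
con.execute("INSERT INTO sieve_range VALUES (1, 20000) "
            "ON CONFLICT(m_start) DO UPDATE SET num_m = excluded.num_m")

# Pre-3.24 fallback; note it replaces the entire conflicting row:
con.execute("INSERT OR REPLACE INTO sieve_range VALUES (1, 30000)")
print(con.execute("SELECT num_m FROM sieve_range").fetchone())  # -> (30000,)
```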
2020-12-09, 22:45   #4
henryzz

It looks like it is probably time for me to upgrade Ubuntu 18.04 to 20.04. I have already had to upgrade Python to 3.7 to attempt to get the Python script running.
2020-12-10, 09:47   #5
robert44444uk

Hi Seth

I don't know what software one needs to have, but searchers like me work in a Windows environment, or have the ability to emulate some other environments through a Windows shell.

Any chance of getting C-family code or Perl?

Strawberry Perl allows code to work on Windows, and msys64 allows me to create Windows executables from C.
2020-12-10, 10:22   #6
SethTro

Quote:
Originally Posted by robert44444uk View Post
[...] Any chance of getting C-family code or Perl?
This is all C++ / Python code, so technically it could work. It uses gmpy2, which is going to be ugly to install on Windows. I'll see if I can cross-compile with MinGW, but I'm not hopeful.
2020-12-10, 18:06   #7
robert44444uk

Quote:
Originally Posted by SethTro View Post
This is all C++ / Python code, so technically it could work. It uses gmpy2, which is going to be ugly to install on Windows. I'll see if I can cross-compile with MinGW, but I'm not hopeful.
Great if you can, Seth!
2020-12-10, 19:48   #8
henryzz

Quote:
Originally Posted by robert44444uk View Post
[...] Any chance of getting C-family code or Perl?
I have been using WSL
2020-12-11, 08:57   #9
robert44444uk

Quote:
Originally Posted by henryzz View Post
I have been using WSL
Well, that looks promising. If there is a Linux route to using Seth's program, then it should be possible for me to join in the fun.
2020-12-11, 12:25   #10
henryzz

Quote:
Originally Posted by robert44444uk View Post
Well, that looks promising. If there is a Linux route to using Seth's program, then it should be possible for me to join in the fun.
For sanity's sake you will want to make sure that you are on Ubuntu 20.04, not 18.04, as 18.04 has the issues mentioned above.

Any other issues would be useful to record so they can be added to the instructions.

@Seth
If I use gap_test.py to test the best 1% of a file, is there a way to test the next 1%? Is this the point of the database? Do I just run again?
Is there an easy way of extracting the results from the log for submission to the online database?
How many tests does pfgw run per call? There would be a lot less overhead if multiple are done in one call. I would imagine that this is more of an issue in WSL as there is extra overhead for disk access.
2020-12-11, 23:23   #11
SethTro

Quote:
Originally Posted by henryzz View Post
For sanity's sake you will want to make sure that you are on Ubuntu 20.04, not 18.04, as 18.04 has the issues mentioned above.

Any other issues would be useful to record so they can be added to the instructions.
Please do comment on issues; I'll write this up (sqlite3 > 3.18 + a suggestion of Ubuntu 20.04) in README.md#setup later today. Does this mean you got some/all of it running under WSL?

Quote:
Originally Posted by henryzz View Post
@Seth
If I use gap_test.py to test the best 1% of a file, is there a way to test the next 1%? Is this the point of the database? Do I just run again?
Yes, gap_test.py is smart; just run it again with whatever total percent you want tested.

If you make the first call with `--prp-top-percent=1` you can quit halfway through and all progress is recorded, so when you run again it will skip finished tests. If you run again with `--prp-top-percent=5` it will test any m in the top 5% not yet tested (so 4% more, if the `--prp-top-percent=1` run had already finished).

`gap_stats` should have told you an approximately optimal percent, which is generally 10-40%.

Quote:
Originally Posted by henryzz View Post
Is there an easy way of extracting the results from the log for submission to the online database?
Yes, `./misc/record_check.py` will print all the records that improve on your local gaps.db file (occasionally you should cd into prime-gap-list and git pull to refresh it). The output can be pasted into primegaps.cloudygo.com/

Quote:
Originally Posted by henryzz View Post
How many tests does pfgw run per call? There would be a lot less overhead if multiple are done in one call. I would imagine that this is more of an issue in WSL as there is extra overhead for disk access.
The code uses gmp for numbers with < 8000 bits and pfgw for larger numbers. Sadly it's currently making one call per PRP test (e.g. hundreds per interval). I haven't found an easy way to batch multiple tests into one call (maybe I could write a list of all the offsets to check, like Robert did in https://www.mersenneforum.org/showpo...&postcount=203, but I wasn't sure if pfgw stops after the first prime).
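To see which path the examples in this thread take, we can estimate the bit length of 907#/53970 with a quick stdlib calculation (my own sanity check; the 8000-bit cutoff is the one mentioned above):

```python
# Estimate bits in 907#/53970: sum of log2 over primes <= 907, minus log2(d).
from math import log2

def primes_up_to(n):
    """Simple sieve of Eratosthenes returning all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(sieve[i * i :: i]))
    return [i for i, b in enumerate(sieve) if b]

bits = sum(log2(p) for p in primes_up_to(907)) - log2(53970)
print(round(bits), "bits")   # well under the 8000-bit cutoff, so gmp is used
```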

If the overhead is bad, in the short term I'd recommend adjusting the threshold (in gap_utils.py) up a bit.