mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Prime Gap Searches (https://www.mersenneforum.org/forumdisplay.php?f=131)
-   -   Combined Sieve (https://www.mersenneforum.org/showthread.php?t=26286)

SethTro 2020-12-08 02:31

Combined Sieve
 
I'm happy to be sharing my [URL="https://github.com/sethtroisi/prime-gap/"]prime-gap code[/URL] and new [B]Combined Sieve Algorithm[/B] ([URL="https://arxiv.org/pdf/2012.03771.pdf"]arXiv[/URL]).

What's the combined sieve? It's a new algorithm that sieves multiple (combined) intervals at the same time, with a dramatic speedup. I believe it's ~2x faster than PGSurround.pl and has a number of other fun features, but it requires a lot more active maintenance to run.

---

The code has three parts: sieve, stats, test.
[B]Sieve[/B]: Similar to DanaJ's surround prime scripts, you choose a constant primorial and divisor (which define K=P#/d), then the code sieves multiple intervals around m * K with primes up to a few billion (or trillion).
[B]Stats[/B]: From the remaining unknown numbers, calculate the m most likely to generate records.
[B]Test[/B]: Test the remaining unknowns to find the surrounding primes.
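To give a flavor of the sieve step, here is a minimal Python sketch of the "combined" idea: for each prime p, K mod p and the starting residue are computed once and then carried across every m with a single subtraction, instead of re-sieving each interval from scratch. The names and simplifications are mine, not the real combined_sieve code (which additionally skips over most m entirely for large primes).

```python
import math

def small_primes(n):
    """Primes up to n by a simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]

def combined_sieve(K, m_start, m_count, SL, prime_limit):
    """Mark which m*K + i (|i| <= SL) have a prime factor <= prime_limit,
    for all m in [m_start, m_start + m_count), in one combined pass."""
    width = 2 * SL + 1
    composite = [bytearray(width) for _ in range(m_count)]
    for p in small_primes(prime_limit):
        k_mod = K % p
        if k_mod == 0:
            # p divides K: p | m*K + i exactly when p | i, same for every m
            first = -(SL // p) * p
            for row in composite:
                for i in range(first, SL + 1, p):
                    row[i + SL] = 1
            continue
        # p | m*K + i  <=>  i ≡ -m*K (mod p); compute the residue once,
        # then update it for each successive m with one subtraction
        r = (-m_start * k_mod) % p
        for row in composite:
            for idx in range((r + SL) % p, width, p):
                row[idx] = 1
            r = (r - k_mod) % p
    return composite
```

For a toy K like 7# = 210 you can verify against trial division that the unmarked entries are exactly the numbers with no factor below the sieve limit.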

---

[B]Setup[/B]
Also found at [URL]https://github.com/sethtroisi/prime-gap#setup[/URL]

I've upstreamed mpz_prevprime, but it's not part of any GMP release yet, so GMP needs to be built from source.

[CODE]
sudo apt install mercurial automake make bison autoconf libtool texinfo m4

hg clone https://gmplib.org/repo/gmp/
cd gmp
./.bootstrap
mkdir build
cd build
../configure
make [-j4]
make check [-j4]
sudo make install
cd ../..
[/CODE]Additionally, [URL="https://github.com/kimwalisch/primesieve/"]primesieve[/URL] and a few other libraries are needed

[CODE]
sudo apt install libprimesieve-dev

sudo apt install libmpfr-dev libmpc-dev
sudo apt install sqlite3 libsqlite3-dev

pip install --user gmpy2==2.1.0b5 primegapverify
[/CODE]Now we get to the real project

[CODE]
git clone https://github.com/sethtroisi/prime-gap.git
cd prime-gap

sqlite3 prime-gap-search.db < schema.sql
mkdir -p logs/ unknowns/

git clone --depth 10 https://github.com/primegap-list-project/prime-gap-list.git
sqlite3 gaps.db < prime-gap-list/allgaps.sql

make all
[/CODE]---

Running

[B]sieve[/B]: sieves m * P#/d, for m from a to a+b, with primes up to limit
[CODE]
./combined_sieve --save-unknowns -p <P> -d <d> --mstart <a> --minc <b> --sieve-length <X> --max-prime <limit>

# e.g. 907#/53970, 10,000 values of m, sieving 15,000 in both directions, primes up to 1B (--max-prime is in millions)
./combined_sieve --save-unknowns -p 907 -d 53970 --mstart 1 --minc 10000 --sieve-length 15000 --max-prime 1000
# or in short form
./combined_sieve --save-unknowns --unknown-filename 907_53970_1_10000_s15000_l1000M.txt

999999937 (primes 24491666/26355868) (seconds: 10.10/37.4 | per m: 0.0164)
factors 29494942 (interval: 128088 avg m/large_prime interval: 0.4)
unknowns 1241741/2276 (avg/m: 545.58) (composite: 98.18% +0.063% +42721)
~ 2x 23.22 PRP/m (~ 3658.1 skipped PRP => 362.4 PRP/seconds)
...


[/CODE]It took 37.4 seconds (an average of ~0.0164 seconds per interval) to sieve up to 1 billion. We're left with ~545 unknowns per m, and on average we'll have to test 23.22 numbers on each side to find a prime.
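That 23.22 PRP/m figure lines up with a back-of-envelope estimate from Mertens' third theorem: after sieving with primes up to L, a surviving candidate near N is prime with probability ≈ e^γ·ln(L)/ln(N), so the expected tests per side is ln(N) / (e^γ·ln L). A quick sanity check (my own arithmetic, not part of the tools):

```python
import math

P, d, max_prime = 907, 53970, 10**9

def primes_up_to(n):
    """Simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]

# ln(907#) = sum of ln p over primes p <= 907 (Chebyshev theta)
ln_K = sum(math.log(p) for p in primes_up_to(P)) - math.log(d)
ln_N = ln_K + math.log(5000)  # typical m in 1..10000
gamma = 0.5772156649015329
expected_prp_per_side = ln_N / (math.exp(gamma) * math.log(max_prime))
print(round(expected_prp_per_side, 1))  # close to the observed 23.22
```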

You can hand inspect unknowns/907_53970_1_10000_s15000_l1000M.txt to get a sense of the output

[CODE]
$ head unknowns/907_53970_1_10000_s15000_l1000M.txt | cut -c1-100
0 : -279 +284 | -30 -54 -90 -120 -128 -140 -168 -180 -210 -240 -252 -288 -320 -350 -392 -432 -450
10 : -260 +290 | -24 -42 -60 -70 -120 -240 -252 -294 -300 -400 -588 -630 -672 -750 -784 -840 -882
12 : -278 +253 | -6 -12 -14 -20 -24 -36 -56 -80 -120 -140 -192 -216 -252 -294 -336 -392 -450 -500
16 : -264 +292 | -6 -40 -90 -96 -108 -126 -216 -294 -336 -360 -378 -384 -420 -448 -480 -490 -600
18 : -283 +271 | -2 -6 -30 -32 -42 -56 -60 -72 -90 -98 -180 -192 -240 -300 -320 -350 -378 -450 -500
22 : -268 +253 | -30 -70 -120 -162 -180 -250 -270 -294 -324 -336 -360 -480 -490 -600 -640 -750 -784
28 : -273 +257 | -12 -16 -28 -60 -72 -96 -112 -126 -180 -196 -210 -240 -280 -300 -336 -378 -448 -540
[/CODE]These lines tell us that (1+0)*907#/53970 + [-15000, -1] has 279 remaining numbers (-30, -54, -90, ...) and + [1, 15000] has 284 remaining numbers (listed later in the line, truncated here)
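For scripting, a line can be parsed with a few splits. Note the layout below is my reading of the excerpt, which is cut at 100 characters; I'm assuming the positive offsets follow the negative ones after a second "|" separator:

```python
def parse_unknown_line(line):
    """Parse one unknowns-file line into (m_offset, negatives, positives).

    Assumed layout (the excerpt above is truncated, so this is a guess):
      "<m> : -<low_count> +<high_count> | <neg offsets> | <pos offsets>"
    """
    header, negs, poss = line.split("|")
    m_str, counts = header.split(":")
    low_count, high_count = counts.split()
    negatives = [int(x) for x in negs.split()]
    positives = [int(x) for x in poss.split()]
    # the header counts should match the number of listed offsets
    assert len(negatives) == -int(low_count)
    assert len(positives) == int(high_count)
    return int(m_str), negatives, positives
```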


[B]stats[/B]: to produce stats, run gap_stats on the unknown file

[CODE]
./gap_stats --save-unknowns --unknown-filename 907_53970_1_10000_s15000_l1000M.txt

...
RECORD : top 1% ( 22) => sum(prob) = 4.30e-05 (avg: 1.95e-06)
RECORD : top 5% ( 113) => sum(prob) = 1.64e-04 (avg: 1.45e-06)
RECORD : top 10% ( 227) => sum(prob) = 2.81e-04 (avg: 1.24e-06)
RECORD : top 20% ( 455) => sum(prob) = 4.68e-04 (avg: 1.03e-06)
RECORD : top 50% ( 1138) => sum(prob) = 8.59e-04 (avg: 7.55e-07)
RECORD : top 100% ( 2276) => sum(prob) = 1.20e-03 (avg: 5.26e-07)
...
[/CODE]This prints some verbose output which says that even the best m's in this range have less than a 2e-6 chance of being a record gap (but I rigged this example so we'll find one)

[B]testing[/B]
[CODE]
# gap_test_simple tests all gaps and doesn't save results to db
$ ./gap_test_simple --unknown-filename 907_53970_1_10000_s15000_l1000M.txt --min-merit 15 --quiet
...
18792 21.7401 3823 * 907#/53970 -12178 to +6614
13818 15.9853 3937 * 907#/53970 -7132 to +6686
m=4391 258 <- unknowns -> 285 5400 <- gap -> 840
tests 1000 (23.47/sec) 43 seconds elapsed
unknowns 545665 (avg: 545.66), 98.18% composite 49.91% <- % -> 50.09%
prp tests 48515 (avg: 48.51) (1138.8 tests/sec)
21164 24.4724 5801 * 907#/53970 -9800 to +11364
13176 15.2339 6437 * 907#/53970 -9530 to +3646
15658 18.0947 9809 * 907#/53970 -12522 to +3136
m=9997 257 <- unknowns -> 278 84 <- gap -> 1890
tests 2276 (23.42/sec) 97 seconds elapsed
unknowns 1241741 (avg: 545.58), 98.18% composite 49.94% <- % -> 50.06%
prp tests 108729 (avg: 47.77) (1118.7 tests/sec)
...
[/CODE][CODE]
# gap_test.py has lots of options (resuming, multithreaded, skipping)
$ ./gap_test.py --unknown-filename 907_53970_1_10000_s15000_l1000M.txt --min-merit 15 --prp-top-percent 40 --threads 4
...
13818 15.9853 3937 * 907#/53970 -6686 to +7132
21164 24.4724 5801 * 907#/53970 -11364 to +9800 RECORD!
...
9997 257 <- unknowns -> 278 1890 <- gap -> 0
tests 911 (69.4/sec, 40.2/sec) 13, 23 secs
unknowns 482139 (avg: 529.24), 98.24% composite 49.91% <- % -> 50.09%
prp tests 22327 (avg: 24.03/side, 24.51/m) (1701.154 tests/sec)
side skips 893 (98.0%) (avg prob_record 8.03e-07/side, 4.11e-07/m/2, +95.3%)
sum(prob_minmerit): 10, 6.69e+04/day found: 2
sum(prob_record): 0.00075, 4.91/day found: 1
merit 24.472 (at m=5801)
...
[/CODE]We can check whether we found any records; here it tells us this record was already known:
[CODE]
$ ./misc/record_check.py
REDISCOVERED | 21164 24.472 5801 * 907#/53970 -11364 to +9800 (old: (24.4724, 376, '5801*907#/53970 - 11364', 'Jacobsen'))
[/CODE]---

[B]limitations[/B]

* As of 2020 I'm still rapidly developing this code so it has a number of unhandled and un-optimized edge cases.
* For smaller p (< 1000? <1500?) there's significant overhead in writing numbers to disk, calculating gap_stats, saving to db, ... I would only recommend this code for p > 1500.
* It's easy to delete data or end up with duplicate data. I'd suggest running for a couple of days, then reporting all records and completely deleting prime-gap-search.db (don't get attached to the partial data it saves).

[B]advantages[/B]
* Sieve with primes up to 2,000,000,000,000 very quickly
* Speeds up search by 30%-70% over PGSurround.pl thresholds.
* Sort and search only the most sparse intervals from thousands or millions of m
* Dynamically calculate if other side should be searched when one prime has been found
* Parallelization in gap_test stage
* Sublinear scaling in combined_sieve

henryzz 2020-12-09 15:31

I am getting an error whenever it tries to access the database:
For example at the end of combined_sieve:
[CODE]
Saving unknowns to 'unknowns/997_210_10000_1000_s15000_l1000M.txt'

range INSERT failed 1: near "ON": syntax error[/CODE]

SethTro 2020-12-09 22:20

[QUOTE=henryzz;565769]I am getting an error whenever it tries to access the database:
For example at the end of combined_sieve:
[CODE]
Saving unknowns to 'unknowns/997_210_10000_1000_s15000_l1000M.txt'

range INSERT failed 1: near "ON": syntax error[/CODE][/QUOTE]

Can you check your sqlite3 version?

I'm using `on conflict` which I think was introduced in [URL="https://sqlite.org/lang_conflict.html"]3.24[/URL].

If you have version < 3.24 and updating isn't easy one hacky workaround is to just delete the "ON CONFLICT" lines [url]https://github.com/sethtroisi/prime-gap/blob/main/combined_sieve.cpp#L336-L341[/url] and [url]https://github.com/sethtroisi/prime-gap/blob/main/gap_stats.cpp#L295-L300[/url]
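A quick way to check the version is from Python (or `sqlite3 --version` on the command line). The sketch below also shows the two syntaxes side by side; the `range_stats` table here is made up for illustration, not the project's actual schema:

```python
import sqlite3

# version of the SQLite library Python is linked against
print("SQLite library version:", sqlite3.sqlite_version)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE range_stats (m INTEGER PRIMARY KEY, tested INTEGER)")

if sqlite3.sqlite_version_info >= (3, 24):
    # UPSERT ("ON CONFLICT ... DO UPDATE") needs SQLite >= 3.24
    conn.execute(
        "INSERT INTO range_stats VALUES (?, ?) "
        "ON CONFLICT(m) DO UPDATE SET tested = excluded.tested",
        (1, 10),
    )
else:
    # pre-3.24 fallback: INSERT OR REPLACE rewrites the whole row
    conn.execute("INSERT OR REPLACE INTO range_stats VALUES (?, ?)", (1, 10))
```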

henryzz 2020-12-09 22:45

It looks like it is probably time for me to upgrade Ubuntu 18.04 to 20.04. I have already had to upgrade python to 3.7 to attempt to get the python script running.

robert44444uk 2020-12-10 09:47

Hi Seth

I don't know what software one needs to have, but searchers like me work in a Windows environment, or have the ability to emulate some other environments through a Windows shell.

Any chance of getting a c family code or perl?

Strawberry Perl allows code to work in Windows and msys64 allows me to create Windows executables from c.

SethTro 2020-12-10 10:22

[QUOTE=robert44444uk;565844]Hi Seth

I don't know what software one needs to have, but searchers like me work in a Windows environment, or have the ability to emulate some other environments through a Windows shell.

Any chance of getting a c family code or perl?

Strawberry Perl allows code to work in Windows and msys64 allows me to create Windows executables from c.[/QUOTE]

This is all c++ / python code, so technically it could work. It uses gmpy2 which is going to be ugly to install in windows. I'll see if I can cross compile with mingw but I'm not hopeful.

robert44444uk 2020-12-10 18:06

[QUOTE=SethTro;565848]This is all c++ / python code, so technically it could work. It uses gmpy2 which is going to be ugly to install in windows. I'll see if I can cross compile with mingw but I'm not hopeful.[/QUOTE]

Great if you can, Seth!

henryzz 2020-12-10 19:48

[QUOTE=robert44444uk;565844]Hi Seth

I don't know what software one needs to have, but searchers like me work in a Windows environment, or have the ability to emulate some other environments through a Windows shell.

Any chance of getting a c family code or perl?

Strawberry Perl allows code to work in Windows and msys64 allows me to create Windows executables from c.[/QUOTE]

I have been using WSL

robert44444uk 2020-12-11 08:57

[QUOTE=henryzz;565892]I have been using WSL[/QUOTE]

Well, that looks promising. So if there is a Linux methodology for using Seth's program, it should be possible for me to join in the fun.

henryzz 2020-12-11 12:25

[QUOTE=robert44444uk;565930]Well, that looks promising. So I can think if there is a Linux methodology to use Seth's program, then it should be possible to me to join in the fun.[/QUOTE]

For sanity's sake you will want to make sure that you are on Ubuntu 20.04, not 18.04, as 18.04 will have the issues mentioned above.

Any other issues would be useful to record so they can be added to the instructions.

@Seth
If I use gap_test.py to test the best 1% of a file, is there a way to test the next 1%? Is this the point of the database? Do I just run again?
Is there an easy way of extracting the results from the log for submission to the online database?
How many tests does pfgw run per call? There would be a lot less overhead if multiple are done in one call. I would imagine that this is more of an issue in WSL as there is extra overhead for disk access.

SethTro 2020-12-11 23:23

[QUOTE=henryzz;565935]For sanity's sake you will want to make sure that you are on a Ubuntu 20.04 version not 18.04 as that will have issues as mentioned above.

Any other issues would be useful to record so they can be added to the instructions.
[/QUOTE]

Please do comment on issues; I'll write this up (sqlite3 >= 3.24 + the suggestion of Ubuntu 20.04) in [URL="https://github.com/sethtroisi/prime-gap/blob/main/README.md#setup"]README.md#setup[/URL] later today. Does this mean you got some/all of it running under WSL?

[QUOTE=henryzz;565935]
@Seth
If I use gap_test.py to test the best 1% of a file, is there a way to test the next 1%? Is this the point of the database? Do I just run again?
[/QUOTE]

Yes, gap_test.py is smart; just run it again with whatever total percent you want tested.

If you make the first call with `--prp-top-percent=1` you can quit halfway through; all progress is recorded, so when you run again it will skip finished tests. If you run again with `--prp-top-percent=5` it will test anything in the top 5% not yet tested (so 4% more, if the `--prp-top-percent=1` run had already finished).

`gap_stats` should have told you an approximately optimal percentage, which is generally 10-40%.

[QUOTE=henryzz;565935]
Is there an easy way of extracting the results from the log for submission to the online database?[/QUOTE]

Yes `./misc/record_check.py` will print all the records that improve over your local gaps.db file (occasionally you should cd into prime-gap-list and git pull). The output can be pasted into primegaps.cloudygo.com/

[QUOTE=henryzz;565935]
How many tests does pfgw run per call? There would be a lot less overhead if multiple are done in one call. I would imagine that this is more of an issue in WSL as there is extra overhead for disk access.
[/QUOTE]

The code uses gmp for numbers with < 8000 bits and pfgw for larger numbers. Sadly it's currently making one call per candidate (e.g. hundreds per interval). I haven't found any easy way to batch multiple tests into one call (maybe I can write a list of all the offsets to check, like Robert did in [URL]https://www.mersenneforum.org/showpost.php?p=565027&postcount=203[/URL], but I wasn't sure if pfgw stops after the first prime).

If the overhead is bad, in the short term I'd recommend adjusting the threshold (in gap_utils.py) up a bit.

