mersenneforum.org Combined Sieve

2020-12-08, 02:31   #1
SethTro
"Seth"
Apr 2019
269 Posts

Combined Sieve

I'm happy to be sharing my prime-gap code and new Combined Sieve algorithm (arXiv). What's the combined sieve? It's a new algorithm that sieves multiple (combined) intervals at the same time for a dramatic speedup. I believe it's ~2x faster than PGSurround.pl and has a number of other fun features, but it requires a lot more active maintenance to run.

---

The code has three parts: sieve, stats, test.

Sieve: Similar to DanaJ's surround prime, you choose a constant primorial and divisor (which define K = P#/d), then the code sieves multiple m * K with primes up to a few billion (or trillion).

Stats: From the remaining unknown numbers, calculate the m most likely to generate records.

Test: Test the remaining unknowns to find the surrounding primes.

---

Setup

Also found at https://github.com/sethtroisi/prime-gap#setup

I've upstreamed mpz_prevprime, but it's not part of any release yet so GMP needs to be built from source.
Code:
sudo apt install mercurial automake make bison autoconf libtool texinfo m4
hg clone https://gmplib.org/repo/gmp/
cd gmp
./.bootstrap
mkdir build
cd build
../configure
make [-j4]
make check [-j4]
sudo make install
cd ../..

Additionally, primesieve and a few other libraries are needed
Code:
sudo apt install libprimesieve-dev
sudo apt install libmpfr-dev libmpc-dev
sudo apt install sqlite3 libsqlite3-dev
pip install --user gmpy2==2.1.0b5 primegapverify

Now we get to the real project
Code:
git clone https://github.com/sethtroisi/prime-gap.git
cd prime-gap
sqlite3 prime-gap-search.db < schema.sql
mkdir -p logs/ unknowns/
git clone --depth 10 https://github.com/primegap-list-project/prime-gap-list.git
sqlite3 gaps.db < prime-gap-list/allgaps.sql
make all

---

Running

sieve

To sieve m * P#/d for m from a to a+b up to limit:
Code:
./combined_sieve --save-unknowns -p <P> -d <d> --mstart <a> --minc <b> --sieve-length <SL> --max-prime <limit>

# e.g. 907#/53970, sieving 10,000 m in both directions up to 1B
./combined_sieve --save-unknowns -p 907 -d 53970 --mstart 1 --minc 10000 --sieve-length 15000 --max-prime 1000
# or in short form
./combined_sieve --save-unknowns --unknown-filename 907_53970_1_10000_s15000_l1000M.txt

999999937 (primes 24491666/26355868) (seconds: 10.10/37.4 | per m: 0.0164)
	factors  29494942 (interval: 128088 avg m/large_prime interval: 0.4)
	unknowns 1241741/2276 (avg/m: 545.58) (composite: 98.18% +0.063% +42721)
	~ 2x 23.22 PRP/m (~ 3658.1 skipped PRP => 362.4 PRP/seconds)
...

It took 37.4 seconds (an average of ~0.0164 seconds per interval) to sieve up to 1 billion. We're left with ~545 unknowns per m, and will have to test on average 23.22 numbers to find a prime.

You can hand-inspect unknowns/907_53970_1_10000_s15000_l1000M.txt to get a sense of the output
Code:
$ head unknowns/907_53970_1_10000_s15000_l1000M.txt | cut -c1-100
0 : -279 +284 | -30 -54 -90 -120 -128 -140 -168 -180 -210 -240 -252 -288 -320 -350 -392 -432 -450
10 : -260 +290 | -24 -42 -60 -70 -120 -240 -252 -294 -300 -400 -588 -630 -672 -750 -784 -840 -882
12 : -278 +253 | -6 -12 -14 -20 -24 -36 -56 -80 -120 -140 -192 -216 -252 -294 -336 -392 -450 -500
16 : -264 +292 | -6 -40 -90 -96 -108 -126 -216 -294 -336 -360 -378 -384 -420 -448 -480 -490 -600
18 : -283 +271 | -2 -6 -30 -32 -42 -56 -60 -72 -90 -98 -180 -192 -240 -300 -320 -350 -378 -450 -500
22 : -268 +253 | -30 -70 -120 -162 -180 -250 -270 -294 -324 -336 -360 -480 -490 -600 -640 -750 -784
28 : -273 +257 | -12 -16 -28 -60 -72 -96 -112 -126 -180 -196 -210 -240 -280 -300 -336 -378 -448 -540

These lines tell us that (1+0)*907#/53970 + [-15000, -1] has 279 remaining numbers (-30, -54, -90, ...) and + [1, 15000] has 284 remaining numbers (listed later in the line).
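To give a feel for why sieving many intervals together is cheap, here's a toy Python sketch of the idea (this is not the project's C++ code; every name and number below is invented for illustration): for each sieve prime p, K mod p is reduced once and then reused to mark composites in every m's interval.

```python
from math import prod

def combined_sieve(K, m_start, m_count, sieve_length, sieve_primes):
    """Toy combined sieve: for each m in [m_start, m_start + m_count),
    mark offsets o in [-sieve_length, sieve_length] where some sieve
    prime divides m*K + o.  composite[mi][o + sieve_length] is the flag."""
    width = 2 * sieve_length + 1
    composite = [[False] * width for _ in range(m_count)]
    for p in sieve_primes:
        base = K % p                    # reduced ONCE, shared by every m
        for mi in range(m_count):
            m = m_start + mi
            r = (-m * base) % p         # p | m*K + o  <=>  o ≡ r (mod p)
            first = (r + sieve_length) % p
            for i in range(first, width, p):
                composite[mi][i] = True
    return composite

# Tiny stand-in for P#/d; real runs use e.g. 907#/53970 and primes past 1B.
# Primes dividing K are skipped here: for those, p | m*K + o iff p | o.
K = prod([2, 3, 5, 7, 11, 13, 17, 19, 23])
sieve_primes = [29, 31, 37, 41, 43, 47]
comp = combined_sieve(K, 1, 4, 50, sieve_primes)
unknowns = [row.count(False) for row in comp]
print(unknowns)   # surviving candidates per m, before larger primes / PRP tests
```

In the real code the inner marking loop is the hot path, and the per-prime work is shared across tens of thousands of m, which is presumably where the sublinear scaling claimed below comes from.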
stats

To produce stats, run gap_stats on the unknown-file
Code:
./gap_stats --save-unknowns --unknown-filename 907_53970_1_10000_s15000_l1000M.txt
...
RECORD : top   1% (   22) => sum(prob) = 4.30e-05 (avg: 1.95e-06)
RECORD : top   5% (  113) => sum(prob) = 1.64e-04 (avg: 1.45e-06)
RECORD : top  10% (  227) => sum(prob) = 2.81e-04 (avg: 1.24e-06)
RECORD : top  20% (  455) => sum(prob) = 4.68e-04 (avg: 1.03e-06)
RECORD : top  50% ( 1138) => sum(prob) = 8.59e-04 (avg: 7.55e-07)
RECORD : top 100% ( 2276) => sum(prob) = 1.20e-03 (avg: 5.26e-07)
...

This prints some verbose output which says that even the best m's in this range have less than a 2e-6 chance of producing a record gap (but I rigged this so we'll find one).

testing

Code:
# gap_test_simple tests all gaps and doesn't save results to db
$ ./gap_test_simple --unknown-filename 907_53970_1_10000_s15000_l1000M.txt --min-merit 15 --quiet
...
18792  21.7401  3823 * 907#/53970 -12178 to +6614
13818  15.9853  3937 * 907#/53970 -7132 to +6686
	m=4391  258 <- unknowns -> 285    5400 <- gap -> 840
	    tests     1000     (23.47/sec)  43 seconds elapsed
	    unknowns  545665   (avg: 545.66), 98.18% composite  49.91% <- % -> 50.09%
	    prp tests 48515    (avg: 48.51) (1138.8 tests/sec)
21164  24.4724  5801 * 907#/53970 -9800 to +11364
13176  15.2339  6437 * 907#/53970 -9530 to +3646
15658  18.0947  9809 * 907#/53970 -12522 to +3136
	m=9997  257 <- unknowns -> 278    84 <- gap -> 1890
	    tests     2276     (23.42/sec)  97 seconds elapsed
	    unknowns  1241741  (avg: 545.58), 98.18% composite  49.94% <- % -> 50.06%
	    prp tests 108729   (avg: 47.77) (1118.7 tests/sec)
...

Code:
# gap_test.py has lots of options (resuming, multithreaded, skipping)
$ ./gap_test.py --unknown-filename 907_53970_1_10000_s15000_l1000M.txt --min-merit 15 --prp-top-percent 40 --threads 4
...
13818  15.9853  3937 * 907#/53970 -6686 to +7132
21164  24.4724  5801 * 907#/53970 -11364 to +9800    RECORD!
...
	m=9997  257 <- unknowns -> 278    1890 <- gap -> 0
	    tests      911     (69.4/sec, 40.2/sec)  13, 23 secs
	    unknowns   482139  (avg: 529.24), 98.24% composite  49.91% <- % -> 50.09%
	    prp tests  22327   (avg: 24.03/side, 24.51/m) (1701.154 tests/sec)
	    side skips 893 (98.0%) (avg prob_record 8.03e-07/side, 4.11e-07/m/2, +95.3%)
	    sum(prob_minmerit):  10, 6.69e+04/day    found: 2
	    sum(prob_record): 0.00075, 4.91/day    found: 1
	    merit 24.472 (at m=5801)
...

We can check if we found any records, which tells us this record was already known
Code:
$ ./misc/record_check.py
REDISCOVERED | 21164  24.472  5801 * 907#/53970 -11364 to +9800 (old: (24.4724, 376, '5801*907#/53970 - 11364', 'Jacobsen'))

---

limitations
* As of 2020 I'm still rapidly developing this code, so it has a number of unhandled and un-optimized edge cases.
* For smaller p (< 1000? < 1500?) there's significant overhead in writing numbers to disk, calculating gap_stats, saving to the db, ... I would only recommend this code for p > 1500.
* It's easy to delete data or end up with duplicate data. I'd suggest running for a couple of days, then reporting all records and completely deleting prime-gap-search.db (don't get attached to the partial data it saves).

advantages
* Sieve with primes up to 2,000,000,000,000 very quickly
* Speeds up search by 30%-70% over PGSurround.pl
* Sort and search only the most sparse intervals from thousands or millions of m
* Dynamically calculate whether the other side should be searched once one prime has been found
* Parallelization in the gap_test stage
* Sublinear scaling in combined_sieve
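For anyone curious what the test stage does per m, here's a minimal stdlib-only sketch (toy numbers and invented names; the real code uses GMP/pfgw for the PRP tests): walk the surviving offsets outward from m*K until a prime turns up on each side, and the gap is the distance between them.

```python
from math import prod

def is_prime(n):
    # Deterministic Miller-Rabin for n < 3.3e24 (first-12-primes base set).
    if n < 2:
        return False
    small = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in small:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in small:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def surrounding_gap(center, low_offsets, high_offsets):
    # low_offsets: negative offsets nearest-first; high_offsets: positive
    # ascending -- the order the unknown-file lines above list them in.
    prev_p = next(o for o in low_offsets if is_prime(center + o))
    next_p = next(o for o in high_offsets if is_prime(center + o))
    return next_p - prev_p

K = prod([2, 3, 5, 7, 11, 13, 17, 19, 23])   # toy stand-in for 907#/53970
center = 5 * K
# Without a sieve the "unknowns" are just every odd offset.
gap = surrounding_gap(center, range(-1, -2001, -2), range(1, 2001, 2))
print("gap around 5 * 23# :", gap)
```

The sieve's whole value is shrinking those candidate lists from ~15000 offsets per side to a few hundred before this (expensive) step runs.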

2020-12-09, 15:31   #2
henryzz
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
5858 Posts

I am getting an error whenever it tries to access the database. For example, at the end of combined_sieve:
Code:
Saving unknowns to 'unknowns/997_210_10000_1000_s15000_l1000M.txt'
range INSERT failed 1: near "ON": syntax error
2020-12-09, 22:20   #3
SethTro

"Seth"
Apr 2019

269 Posts

Quote:
Originally Posted by henryzz
I am getting an error whenever it tries to access the database. For example, at the end of combined_sieve:
Code:
Saving unknowns to 'unknowns/997_210_10000_1000_s15000_l1000M.txt'
range INSERT failed 1: near "ON": syntax error
Can you check your sqlite3 version?

I'm using ON CONFLICT, which I think was introduced in 3.24.

If you have a version < 3.24 and updating isn't easy, one hacky workaround is to just delete the "ON CONFLICT" lines: https://github.com/sethtroisi/prime-....cpp#L336-L341 and https://github.com/sethtroisi/prime-....cpp#L295-L300
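For reference, here's a minimal demonstration of the two forms (the table and column names are invented, not the project's actual schema): the ON CONFLICT upsert needs SQLite >= 3.24, while INSERT OR REPLACE has been around much longer but replaces the entire row.

```python
import sqlite3

print("SQLite library version:", sqlite3.sqlite_version)   # want >= 3.24.0

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE range (m_start INTEGER PRIMARY KEY, num_m INTEGER)")

# 3.24+ upsert -- the "ON" in the error message above comes from this form.
upsert = ("INSERT INTO range VALUES (?, ?) "
          "ON CONFLICT(m_start) DO UPDATE SET num_m = excluded.num_m")
con.execute(upsert, (1, 10000))
con.execute(upsert, (1, 20000))   # updates the existing row instead of erroring

# Pre-3.24 alternative: INSERT OR REPLACE (replaces the whole row).
con.execute("INSERT OR REPLACE INTO range VALUES (?, ?)", (1, 30000))

print(con.execute("SELECT m_start, num_m FROM range").fetchall())
```

Deleting the ON CONFLICT lines (as suggested above) means a re-run will hit the primary-key conflict instead of silently updating, which is why upgrading SQLite is the cleaner fix.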

2020-12-09, 22:45   #4
henryzz
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
5858 Posts

It looks like it is probably time for me to upgrade Ubuntu 18.04 to 20.04. I have already had to upgrade Python to 3.7 to attempt to get the Python script running.
2020-12-10, 09:47   #5
robert44444uk
Jun 2003
Oxford, UK
1932 Posts

Hi Seth,

I don't know what software one needs to have, but searchers like me work in a Windows environment, or have the ability to emulate some other environments through a Windows shell. Any chance of getting C-family code or Perl? Strawberry Perl allows code to work on Windows, and msys64 allows me to create Windows executables from C.
2020-12-10, 10:22   #6
SethTro

"Seth"
Apr 2019

269 Posts

Quote:
 Originally Posted by robert44444uk
Hi Seth, I don't know what software one needs to have, but searchers like me work in a Windows environment, or have the ability to emulate some other environments through a Windows shell. Any chance of getting C-family code or Perl? Strawberry Perl allows code to work on Windows, and msys64 allows me to create Windows executables from C.
This is all C++ / Python code, so technically it could work. It uses gmpy2, which is going to be ugly to install on Windows. I'll see if I can cross-compile with MinGW, but I'm not hopeful.

2020-12-10, 18:06   #7
robert44444uk

Jun 2003
Oxford, UK

1932 Posts

Quote:
 Originally Posted by SethTro
This is all C++ / Python code, so technically it could work. It uses gmpy2, which is going to be ugly to install on Windows. I'll see if I can cross-compile with MinGW, but I'm not hopeful.
Great if you can, Seth!

2020-12-10, 19:48   #8
henryzz
Just call me Henry

"David"
Sep 2007
Cambridge (GMT/BST)

5858 Posts

Quote:
 Originally Posted by robert44444uk
Hi Seth, I don't know what software one needs to have, but searchers like me work in a Windows environment, or have the ability to emulate some other environments through a Windows shell. Any chance of getting C-family code or Perl? Strawberry Perl allows code to work on Windows, and msys64 allows me to create Windows executables from C.
I have been using WSL

2020-12-11, 08:57   #9
robert44444uk

Jun 2003
Oxford, UK

1932 Posts

Quote:
 Originally Posted by henryzz I have been using WSL
Well, that looks promising. So I think that if there is a Linux methodology for using Seth's program, then it should be possible for me to join in the fun.

2020-12-11, 12:25   #10
henryzz
Just call me Henry

"David"
Sep 2007
Cambridge (GMT/BST)

5858 Posts

Quote:
 Originally Posted by robert44444uk
Well, that looks promising. So I think that if there is a Linux methodology for using Seth's program, then it should be possible for me to join in the fun.
For sanity's sake you will want to make sure that you are on Ubuntu 20.04, not 18.04, as 18.04 will have issues as mentioned above.

Any other issues would be useful to record so they can be added to the instructions.

@Seth
If I use gap_test.py to test the best 1% of a file, is there a way to test the next 1%? Is this the point of the database? Do I just run again?
Is there an easy way of extracting the results from the log for submission to the online database?
How many tests does pfgw run per call? There would be a lot less overhead if multiple are done in one call. I would imagine that this is more of an issue in WSL as there is extra overhead for disk access.

2020-12-11, 23:23   #11
SethTro

"Seth"
Apr 2019

269 Posts

Quote:
 Originally Posted by henryzz
For sanity's sake you will want to make sure that you are on Ubuntu 20.04, not 18.04, as 18.04 will have issues as mentioned above. Any other issues would be useful to record so they can be added to the instructions.
Please do comment on issues; I'll write this (sqlite3 > 3.18 + suggestion of Ubuntu 20.04) up in README.md#setup later today. Does this mean you got some/all of it running? Under WSL?

Quote:
 Originally Posted by henryzz @Seth If I use gap_test.py to test the best 1% of a file, is there a way to test the next 1%? Is this the point of the database? Do I just run again?
Yes, gap_test.py is smart; just run it again with whatever total percent you want tested.

If you make the first call with --prp-top-percent=1 you can quit halfway through; all progress is recorded, so when you run again it will skip finished tests. If you run again with --prp-top-percent=5 it will test anything in the top 5% not yet tested (so 4% more if the --prp-top-percent=1 run had already finished).

gap_stats should have told you an approximately optimal percentage, which is generally 10-40%.
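To see why a cutoff in that range pays off, you can compute the marginal record probability per extra m tested from the RECORD table gap_stats printed in post #1 (the numbers below are copied from that output):

```python
# (count of m, cumulative record probability) for the top 1/5/10/20/50/100%,
# taken from the gap_stats RECORD table in post #1.
tiers = [(22, 4.30e-05), (113, 1.64e-04), (227, 2.81e-04),
         (455, 4.68e-04), (1138, 8.59e-04), (2276, 1.20e-03)]

prev_m, prev_p = 0, 0.0
for m, p in tiers:
    marginal = (p - prev_p) / (m - prev_m)   # record prob per additional m tested
    print(f"m {prev_m:>4}..{m:<4}  prob/m: {marginal:.2e}")
    prev_m, prev_p = m, p
```

The first tier is worth ~2e-06 per m while the last is ~3e-07, roughly a 6x difference, so each PRP test spent past the optimal percentage buys much less record probability.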

Quote:
 Originally Posted by henryzz Is there an easy way of extracting the results from the log for submission to the online database?
Yes, ./misc/record_check.py will print all the records that improve over your local gaps.db file (occasionally you should cd into prime-gap-list and git pull). The output can be pasted into primegaps.cloudygo.com/

Quote:
 Originally Posted by henryzz How many tests does pfgw run per call? There would be a lot less overhead if multiple are done in one call. I would imagine that this is more of an issue in WSL as there is extra overhead for disk access.
The code uses GMP for numbers with < 8000 bits and pfgw for larger numbers. Sadly it's currently making one call per PRP test (e.g. hundreds per interval). I haven't found an easy way to batch multiple tests into one call (maybe I could write a list of all the offsets to check, like Robert did in https://www.mersenneforum.org/showpo...&postcount=203, but I wasn't sure if pfgw stops after the first prime).

If the overhead is bad, in the short term I'd recommend adjusting the threshold (in gap_utils.py) up a bit.

