#1
"Seth"
Apr 2019
3×89 Posts
I'm happy to be sharing my prime-gap code and new Combined Sieve Algorithm (arXiv).
What's the combined sieve? It's a new algorithm that sieves multiple (combined) intervals at the same time, for a dramatic speedup. I believe it's ~2x faster than PGSurround.pl and has a number of other fun features, but it requires a lot more active maintenance to run.

The code has three parts: sieve, stats, test.

* Sieve: similar to DanaJ's surround-prime setup, you choose a constant primorial and divisor (which define K = P#/d), then the code sieves multiple m * K with primes up to a few billion (or trillion).
* Stats: from the remaining unknown numbers, calculate the m most likely to generate records.
* Test: test the remaining unknowns to find the surrounding primes.

Setup

Also found at https://github.com/sethtroisi/prime-gap#setup

I've upstreamed mpz_prevprime, but it's not part of any release yet, so GMP needs to be built from source:
```
sudo apt install mercurial automake make bison autoconf libtool texinfo m4
hg clone https://gmplib.org/repo/gmp/
cd gmp
./.bootstrap
mkdir build
cd build
../configure
make [-j4]
make check [-j4]
sudo make install
cd ../..
```
Then the remaining dependencies:

```
sudo apt install libprimesieve-dev
sudo apt install libmpfr-dev libmpc-dev
sudo apt install sqlite3 libsqlite3-dev
pip install --user gmpy2==2.1.0b5 primegapverify
```
And the prime-gap code itself:

```
git clone https://github.com/sethtroisi/prime-gap.git
cd prime-gap
sqlite3 prime-gap-search.db < schema.sql
mkdir -p logs/ unknowns/
git clone --depth 10 https://github.com/primegap-list-project/prime-gap-list.git
sqlite3 gaps.db < prime-gap-list/allgaps.sql
make all
```
Running sieve

To sieve m * P#/d for m from a to a+b, with primes up to limit:

```
./combined_sieve --save-unknowns -p <P> -d <d> --mstart <a> --minc <b> --sieve-length <X> --max-prime <limit>

# e.g. 907#/53970 sieving 10,000 in both directions up to 1B
./combined_sieve --save-unknowns -p 907 -d 53970 --mstart 1 --minc 10000 --sieve-length 15000 --max-prime 1000
# or in short form
./combined_sieve --save-unknowns --unknown-filename 907_53970_1_10000_s15000_l1000M.txt
```

which finishes with output like

```
999999937 (primes 24491666/26355868) (seconds: 10.10/37.4 | per m: 0.0164)
factors  29494942 (interval: 128088 avg m/large_prime interval: 0.4)
unknowns 1241741/2276 (avg/m: 545.58) (composite: 98.18% +0.063% +42721)
~ 2x 23.22 PRP/m (~ 3658.1 skipped PRP => 362.4 PRP/seconds)
...
```

You can hand-inspect unknowns/907_53970_1_10000_s15000_l1000M.txt to get a sense of the output.
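The "combined" part can be sketched in a few lines: for each sieving prime p, the offset divisible by p shifts by a fixed amount (-K mod p) from one m to the next, so each prime is processed once for the whole batch of intervals instead of once per m. Below is a toy stdlib-only Python sketch of that idea with made-up small parameters; it is my illustration, not the project's actual C++ code, which adds many optimizations on top.

```python
def small_primes(limit):
    """Sieve of Eratosthenes for the sieving primes."""
    is_composite = bytearray(limit + 1)
    primes = []
    for n in range(2, limit + 1):
        if not is_composite[n]:
            primes.append(n)
            for q in range(n * n, limit + 1, n):
                is_composite[q] = 1
    return primes

def combined_sieve(K, mstart, minc, sieve_length, max_prime):
    """Mark offsets o in [-SL, SL] where m*K + o has a factor <= max_prime,
    for every m in [mstart, mstart + minc), touching each prime only once."""
    SL = sieve_length
    unknowns = {m: set(range(-SL, SL + 1)) for m in range(mstart, mstart + minc)}
    for p in small_primes(max_prime):
        if K % p == 0:
            # p divides K, so p | m*K + o exactly when p | o.
            for m in unknowns:
                for o in range(-(SL // p) * p, SL + 1, p):
                    unknowns[m].discard(o)
            continue
        root = (-mstart * K) % p   # o ≡ root (mod p)  =>  p | mstart*K + o
        step = (-K) % p            # the root shifts by this much per m
        for m in range(mstart, mstart + minc):
            first = -SL + (root + SL) % p   # smallest o >= -SL in the class
            for o in range(first, SL + 1, p):
                unknowns[m].discard(o)
            root = (root + step) % p
    return unknowns
```

The point is only that each prime sweeps all m intervals in one pass; the real code replaces the inner loops with much faster data structures for the billions of large primes.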
```
$ head unknowns/907_53970_1_10000_s15000_l1000M.txt | cut -c1-100
0 : -279 +284 | -30 -54 -90 -120 -128 -140 -168 -180 -210 -240 -252 -288 -320 -350 -392 -432 -450
10 : -260 +290 | -24 -42 -60 -70 -120 -240 -252 -294 -300 -400 -588 -630 -672 -750 -784 -840 -882
12 : -278 +253 | -6 -12 -14 -20 -24 -36 -56 -80 -120 -140 -192 -216 -252 -294 -336 -392 -450 -500
16 : -264 +292 | -6 -40 -90 -96 -108 -126 -216 -294 -336 -360 -378 -384 -420 -448 -480 -490 -600
18 : -283 +271 | -2 -6 -30 -32 -42 -56 -60 -72 -90 -98 -180 -192 -240 -300 -320 -350 -378 -450 -500
22 : -268 +253 | -30 -70 -120 -162 -180 -250 -270 -294 -324 -336 -360 -480 -490 -600 -640 -750 -784
28 : -273 +257 | -12 -16 -28 -60 -72 -96 -112 -126 -180 -196 -210 -240 -280 -300 -336 -378 -448 -540
```
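Each line appears to follow the pattern `<m> : -<low count> +<high count> | <offsets...>`: the m value, the number of surviving unknowns on each side of m*K, then the offsets themselves (negative side first; the positive offsets are past the 100-character cut above). A small parser sketch under that assumed layout, inferred from the sample rather than from any spec:

```python
def parse_unknown_line(line):
    """Parse one line of a combined_sieve unknowns file.

    Assumed layout (inferred from the sample output, not a spec):
        <m> : -<low_count> +<high_count> | <offset> <offset> ...
    Returns (m, low_count, high_count, offsets).
    """
    head, _, tail = line.partition('|')
    m_part, _, counts = head.partition(':')
    m = int(m_part)
    low_count, high_count = (abs(int(tok)) for tok in counts.split())
    offsets = [int(tok) for tok in tail.split()]
    return m, low_count, high_count, offsets
```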
stats

To produce stats, run gap_stats on the unknown-file:

```
./gap_stats --save-unknowns --unknown-filename 907_53970_1_10000_s15000_l1000M.txt
...
RECORD : top   1% (   22) => sum(prob) = 4.30e-05 (avg: 1.95e-06)
RECORD : top   5% (  113) => sum(prob) = 1.64e-04 (avg: 1.45e-06)
RECORD : top  10% (  227) => sum(prob) = 2.81e-04 (avg: 1.24e-06)
RECORD : top  20% (  455) => sum(prob) = 4.68e-04 (avg: 1.03e-06)
RECORD : top  50% ( 1138) => sum(prob) = 8.59e-04 (avg: 7.55e-07)
RECORD : top 100% ( 2276) => sum(prob) = 1.20e-03 (avg: 5.26e-07)
...
```
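The RECORD table is essentially a rank-and-prefix-sum over the per-m record probabilities: sort descending, take the top X%, and report count, sum, and average. A toy illustration of that bookkeeping (the probabilities themselves come from gap_stats's model, which is not reproduced here):

```python
def record_table(probs, percents=(1, 5, 10, 20, 50, 100)):
    """Summarize the top X% of per-m record probabilities.

    probs: iterable of per-m probabilities of producing a record gap.
    Returns rows of (percent, count, sum_of_probs, average_prob).
    """
    ranked = sorted(probs, reverse=True)   # best m first
    rows = []
    for pct in percents:
        n = max(1, len(ranked) * pct // 100)   # how many m are in the top pct%
        top = ranked[:n]
        rows.append((pct, n, sum(top), sum(top) / n))
    return rows
```

The table is what makes `--prp-top-percent` worthwhile: most of the total record probability is concentrated in a small fraction of the m.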
testing

```
# gap_test_simple tests all gaps and doesn't save results to the db
$ ./gap_test_simple --unknown-filename 907_53970_1_10000_s15000_l1000M.txt --min-merit 15 --quiet
...
18792 21.7401  3823 * 907#/53970 -12178 to +6614
13818 15.9853  3937 * 907#/53970 -7132 to +6686
    m=4391  258 <- unknowns -> 285  5400 <- gap -> 840
    tests      1000     (23.47/sec)  43 seconds elapsed
    unknowns   545665   (avg: 545.66), 98.18% composite  49.91% <- % -> 50.09%
    prp tests  48515    (avg: 48.51) (1138.8 tests/sec)
21164 24.4724  5801 * 907#/53970 -9800 to +11364
13176 15.2339  6437 * 907#/53970 -9530 to +3646
15658 18.0947  9809 * 907#/53970 -12522 to +3136
    m=9997  257 <- unknowns -> 278  84 <- gap -> 1890
    tests      2276     (23.42/sec)  97 seconds elapsed
    unknowns   1241741  (avg: 545.58), 98.18% composite  49.94% <- % -> 50.06%
    prp tests  108729   (avg: 47.77) (1118.7 tests/sec)
...
```
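For reference, the second column above is the merit: the gap divided by the natural log of the number where the gap starts, which is ln(m * P#/d) to very good approximation. A quick stdlib-only sketch of that formula (my illustration, not the project's code):

```python
import math

def merit(gap, m, p, d):
    """merit = gap / ln(m * p#/d).

    ln(p#) is the sum of ln(q) over primes q <= p; naive trial division
    is plenty fast for p around 1000.
    """
    ln_primorial = sum(math.log(q) for q in range(2, p + 1)
                       if all(q % r for r in range(2, int(q ** 0.5) + 1)))
    return gap / (ln_primorial + math.log(m) - math.log(d))
```

For example, merit(18792, 3823, 907, 53970) lands very close to the 21.7401 reported above.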
```
# gap_test.py has lots of options (resuming, multithreaded, skipping)
$ ./gap_test.py --unknown-filename 907_53970_1_10000_s15000_l1000M.txt --min-merit 15 --prp-top-percent 40 --threads 4
...
13818 15.9853  3937 * 907#/53970 -6686 to +7132
21164 24.4724  5801 * 907#/53970 -11364 to +9800    RECORD!
...
    9997  257 <- unknowns -> 278  1890 <- gap -> 0
    tests      911      (69.4/sec, 40.2/sec)  13, 23 secs
    unknowns   482139   (avg: 529.24), 98.24% composite  49.91% <- % -> 50.09%
    prp tests  22327    (avg: 24.03/side, 24.51/m) (1701.154 tests/sec)
    side skips 893 (98.0%) (avg prob_record 8.03e-07/side, 4.11e-07/m/2, +95.3%)
    sum(prob_minmerit): 10, 6.69e+04/day    found: 2
    sum(prob_record): 0.00075, 4.91/day     found: 1
    merit 24.472 (at m=5801)
...
```
```
$ ./misc/record_check.py
REDISCOVERED | 21164 24.472  5801 * 907#/53970 -11364 to +9800 (old: (24.4724, 376, '5801*907#/53970 - 11364', 'Jacobsen'))
```

limitations

* As of 2020 I'm still rapidly developing this code, so it has a number of unhandled and un-optimized edge cases.
* For smaller p (< 1000? < 1500?) there's significant overhead in writing numbers to disk, calculating gap_stats, saving to the db, etc. I would only recommend this code for p > 1500.
* It's easy to delete data or end up with duplicate data. I'd suggest running for a couple of days, then reporting all records and completely deleting prime-gap-search.db (don't get attached to the partial data it saves).

advantages

* Sieves with primes up to 2,000,000,000,000 very quickly
* Speeds up the search by 30%-70% over PGSurround.pl
* Sorts and searches only the most sparse intervals from thousands or millions of m
* Dynamically calculates whether the other side should be searched once one prime has been found
* Parallelization in the gap_test stage
* Sublinear scaling in combined_sieve
#2
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2·2,927 Posts
I am getting an error whenever it tries to access the database:
For example, at the end of combined_sieve:

```
Saving unknowns to 'unknowns/997_210_10000_1000_s15000_l1000M.txt'
range INSERT failed 1: near "ON": syntax error
```
#3
"Seth"
Apr 2019
100001011₂ Posts
Quote:
I'm using `ON CONFLICT`, which I think was introduced in SQLite 3.24. If you have a version < 3.24 and updating isn't easy, one hacky workaround is to just delete the "ON CONFLICT" lines: https://github.com/sethtroisi/prime-....cpp#L336-L341 and https://github.com/sethtroisi/prime-....cpp#L295-L300
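If you're not sure which SQLite library your Python/system is linked against, a quick check is below, along with the much older `INSERT OR REPLACE` syntax, which predates 3.24 and has a similar (though not identical) effect to an upsert. The `demo` table and column here are placeholders of mine, not the project's actual schema.

```python
import sqlite3

# ON CONFLICT (upsert) needs SQLite >= 3.24 (released 2018-06-04).
print(sqlite3.sqlite_version)
has_upsert = sqlite3.sqlite_version_info >= (3, 24, 0)

# Pre-3.24 fallback: INSERT OR REPLACE deletes and re-inserts the
# conflicting row, rather than updating only the listed columns in place.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, time_sieve REAL)")
conn.execute("INSERT OR REPLACE INTO demo VALUES (1, 10.0)")
conn.execute("INSERT OR REPLACE INTO demo VALUES (1, 99.0)")  # replaces row 1
```

Note the delete-and-reinsert semantics: OR REPLACE fires delete triggers and resets unlisted columns to their defaults, so it is a workaround, not a drop-in equivalent.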
#4
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2×2,927 Posts
It looks like it is probably time for me to upgrade from Ubuntu 18.04 to 20.04. I have already had to upgrade Python to 3.7 to attempt to get the Python script running.
#5
Jun 2003
Oxford, UK
1932₁₀ Posts
Hi Seth
I don't know what software one needs, but searchers like me work in a Windows environment, or can emulate some other environments through a Windows shell. Any chance of getting a C-family or Perl version? Strawberry Perl allows code to work on Windows, and msys64 allows me to create Windows executables from C.
#6
"Seth"
Apr 2019
100001011₂ Posts
Quote:
|
#7
Jun 2003
Oxford, UK
78C₁₆ Posts
#8
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2·2,927 Posts
Quote:
|
#9
Jun 2003
Oxford, UK
11110001100₂ Posts
#10
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
2·2,927 Posts
Quote:
Any other issues would be useful to record so they can be added to the instructions.

@Seth:
* If I use gap_test.py to test the best 1% of a file, is there a way to test the next 1%? Is that the point of the database? Do I just run again?
* Is there an easy way of extracting the results from the log for submission to the online database?
* How many tests does pfgw run per call? There would be a lot less overhead if multiple tests were done in one call. I imagine this is more of an issue in WSL, as there is extra overhead for disk access.
#11
"Seth"
Apr 2019
267₁₀ Posts
Quote:
Quote:
If you make the first call with `--prp-top-percent=1`, you can quit halfway through; all progress is recorded, so when you run again it will skip finished tests. If you run again with `--prp-top-percent=5`, it will test any record in the top 5% not yet tested (so 4% more, if the `--prp-top-percent=1` run had already finished). gap_stats should have told you an approximately optimal percentage, which is generally 10-40%. Quote:
Quote:
If the overhead is bad in the short term, I'd recommend adjusting the threshold (in gap_utils.py) up a bit.
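The resume behavior described above is essentially a set difference: take the top X% of m (already sorted by record probability) and skip any m whose result is already in the database. A toy sketch, with function and argument names that are mine rather than gap_test.py's:

```python
def remaining_tests(m_by_prob, finished, prp_top_percent):
    """m_by_prob: m values sorted by descending record probability.
    finished: set of m already tested (what the database remembers).
    Returns the m still needing a test within the top prp_top_percent%."""
    n = len(m_by_prob) * prp_top_percent // 100
    return [m for m in m_by_prob[:n] if m not in finished]
```

Raising the percentage on a later run only adds the not-yet-tested tail, which is why finished work is never repeated.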