mersenneforum.org  

Go Back   mersenneforum.org > Math Stuff > Computer Science & Computational Number Theory

Reply
 
Thread Tools
Old 2018-09-17, 20:23   #1
danaj
 
"Dana Jacobsen"
Feb 2011
Bangkok, TH

22×227 Posts
Default Semiprime counting

Because of Robert's interest in some OEIS sequences, I started fiddling with semiprime code in my library. Let me preface by saying there is no particular deep mathematical reason for this work (at least for me), it is just an interesting coding exercise.

Functions:

Code:
  semiprime_count(n)                  count of semiprimes <= n
  semiprime_count(start, end)         count of semiprimes in range
  nth_semiprime(n)                    the nth semiprime
  forsemiprimes {...} [start,] end    loop over semiprimes in range
  random_unrestricted_semiprime(n)    random n-bit semiprime
  random_semiprime(n)                 as above with equal size factors
  is_semiprime(n)                     does n have exactly 2 prime factors
plus some vaguely related functions such as forfactored and forsquarefree.

I started a thread last year about some of the issues with efficiently selecting uniform n-bit random semiprimes. The is_semiprime function is pretty efficient and was mentioned in this post last year. Pari 2.12 is about 2x faster now, which leaves it only 50 times slower. Charles' more efficient Pari code is on github.

My issue is more around the semiprime counts and nth semiprime. I've discovered two sequences in OEIS with differering results. I'd like to know if anyone has a way to reproduce some results in a reasonable time period.

Pari/GP semiprime count:
Code:
a(n)=my(s); forprime(p=2, sqrt(10^n), s+=primepi((10^n-1)\p)); s-binomial(primepi(sqrt(10^n)), 2)
This is nice, but is 3 orders of magnitude too slow for the sizes I'm looking at.

A066265 has the number of semiprimes < 10^n, and Henri Lifchitz filled in values up to 10^21 (!) in 2015. He must have efficient code for this.

The method I'm using is basically identical to the Pari code, based on Meissel's prime counting method. The differences vs. Pari are (1) low overhead for smallish values, and (2) pretty fast LMO prime counting for larger values. In theory we could go one step further and use Lehmer's solution which would speed things up, though quite a bit more complicated.

For the nth semiprime, I am doing an interpolated search on the semiprime count to get close, then iterate over semiprimes to get to the exact value. I imagine there could be improvements in the interpolation to speed this up some. Of course any speedups in the semiprime count will pay off multiple-fold here.

Issues I see:

A125527: number of semiprimes <= 2^n. a(48) was corrected after finding a confirming value elsewhere.

A114125: The 10^n-th semiprime. a(14) doesn't look right to me. We can check the result in a reasonable time by doing a semiprime count on the value, assuming a fast enough count routine. I believe a(14) = 896113850117314. It looks like a(15) = 9413000361625346 which is a new value.

There are a few other sequences I'll be using this for, including A146168 which needs a b-file, and checking A292785. That sequence is calculated using the Lifchitz results so I can just check my results.
danaj is offline   Reply With Quote
Old 2018-09-17, 22:06   #2
science_man_88
 
science_man_88's Avatar
 
"Forget I exist"
Jul 2009
Dumbassville

26×131 Posts
Default

Quote:
Originally Posted by danaj View Post
Pari/GP semiprime count:
Code:
a(n)=my(s); forprime(p=2, sqrt(10^n), s+=primepi((10^n-1)\p)); s-binomial(primepi(sqrt(10^n)), 2)
This is nice, but is 3 orders of magnitude too slow for the sizes I'm looking at.
I have a few suggestions, but nothing miraculous:

parforprime instead of forprime ;

making sqrt(10^n) a variable ( outside the loop), as gets calculated every time through the loop even though it can be calculated outside the loop. maybe precalulate the binomial coefficient all these things are independent of the loop variable.
science_man_88 is offline   Reply With Quote
Old 2018-09-17, 22:28   #3
danaj
 
"Dana Jacobsen"
Feb 2011
Bangkok, TH

16148 Posts
Default

Quote:
Originally Posted by science_man_88 View Post
parforprime instead of forprime ;
Will need some locking for the accumulator. Just naively using parforprime yields incorrect results.

Quote:
making sqrt(10^n) a variable ( outside the loop), as gets calculated every time through the loop even though it can be calculated outside the loop. maybe precalulate the binomial coefficient all these things are independent of the loop variable.
Really makes no difference.

The issue is the thousands to millions of calls to primepi.
danaj is offline   Reply With Quote
Old 2018-09-17, 23:02   #4
science_man_88
 
science_man_88's Avatar
 
"Forget I exist"
Jul 2009
Dumbassville

20C016 Posts
Default

Quote:
Originally Posted by danaj View Post
Will need some locking for the accumulator. Just naively using parforprime yields incorrect results.


Really makes no difference.

The issue is the thousands to millions of calls to primepi.
made about 20 ms difference by a(9) on my system (admittedly <1/3%) . Theres also the option of forprimestep using equivalence classes. Yes primepi is a conundrum for me to get around. primepi(sqrt(10^n))^2 only deals with semiprimes with both primes less than sqrt(10^n). So the efficiency comes down to how to get a count of all semiprimes with 1 factor over sqrt(10^n) .

Last fiddled with by science_man_88 on 2018-09-17 at 23:03
science_man_88 is offline   Reply With Quote
Old 2018-09-18, 13:29   #5
CRGreathouse
 
CRGreathouse's Avatar
 
Aug 2006

22·1,493 Posts
Default

The real issue is that gp has a trivial implementation of primepi, which makes the algorithm noncompetitive. A proper implementation would make a big difference.

As for parallelizing it, probably the right way would be to do the biggest primepi in one thread and all the others in another thread, or something like that.

For what it's worth I agree about hoisting sqrt(10^n) out of the loop -- it's not a big deal, but it just rubs me the wrong way. :)

Last fiddled with by CRGreathouse on 2018-09-18 at 13:36
CRGreathouse is online now   Reply With Quote
Old 2018-09-18, 16:38   #6
CRGreathouse
 
CRGreathouse's Avatar
 
Aug 2006

22·1,493 Posts
Default

Oh, and as I forgot to say it before: these are nice, danaj, thanks for writing them! they're definitely handy to have in the toolbox.
CRGreathouse is online now   Reply With Quote
Old 2018-09-18, 16:39   #7
danaj
 
"Dana Jacobsen"
Feb 2011
Bangkok, TH

16148 Posts
Default

Quote:
Originally Posted by CRGreathouse View Post
The real issue is that gp has a trivial implementation of primepi, which makes the algorithm noncompetitive. A proper implementation would make a big difference.

As for parallelizing it, probably the right way would be to do the biggest primepi in one thread and all the others in another thread, or something like that.
Perhaps making some external calls to primecount for the first few large counts, which would both solve the parallelism and give a huge speedup for the primecount.

Then it would be a matter of making some low overhead methods for the smaller ones.

Quote:
For what it's worth I agree about hoisting sqrt(10^n) out of the loop -- it's not a big deal, but it just rubs me the wrong way. :)
True.

I may write a little C program using the primecount C API and see how that works.
danaj is offline   Reply With Quote
Old 2018-09-18, 16:41   #8
danaj
 
"Dana Jacobsen"
Feb 2011
Bangkok, TH

22·227 Posts
Default

Quote:
Originally Posted by CRGreathouse View Post
Oh, and as I forgot to say it before: these are nice, danaj, thanks for writing them! they're definitely handy to have in the toolbox.
Thanks! If only I'd just started writing for Pari at the start...
danaj is offline   Reply With Quote
Old 2018-09-22, 07:34   #9
robert44444uk
 
robert44444uk's Avatar
 
Jun 2003
Oxford, UK

192610 Posts
Default

I think that pi(semiprime) at n simply the summation of pi(prime) at n/2, n/3....n/p where p is the prime nearest and less than the square root of n.

Just out of interest, if pi(prime) is relatively easy to determine at a given integer n, then would this summation be relatively quicker than the method you are looking at?
robert44444uk is online now   Reply With Quote
Old 2018-09-22, 17:32   #10
CRGreathouse
 
CRGreathouse's Avatar
 
Aug 2006

22×1,493 Posts
Default

Quote:
Originally Posted by robert44444uk View Post
I think that pi(semiprime) at n simply the summation of pi(prime) at n/2, n/3....n/p where p is the prime nearest and less than the square root of n.

Just out of interest, if pi(prime) is relatively easy to determine at a given integer n, then would this summation be relatively quicker than the method you are looking at?
This is exactly the method we're talking about. pi(x) isn't particularly fast -- it takes time O(x^(0.5 + o(1))) with a big constant out front, or a bit worse for the combinatorial methods -- but it's the best we have.
CRGreathouse is online now   Reply With Quote
Old 2018-09-23, 20:40   #11
danaj
 
"Dana Jacobsen"
Feb 2011
Bangkok, TH

22×227 Posts
Default

That is the method, as shown on MathWorld, Stack Exchange, and a few other places. I haven't seen any mention of improvements to that.

For small ranges there are some slightly better ways, but asymptotically they are far worse.

The issue is then (1) have a fast prime count method for large inputs, and (2) optimize the many small-ish counts. My prime count routine is pretty decent though single threaded. Kim's primecount is a little faster at the medium size (e.g. 10^10-10^16) now, and much faster for very large values (not to mention it is multithreaded and works past 64-bit). I did some work on #2, using a segmented sieve and binary search for counting for counts from n^{1/2} to n^{2/3}. That helps quite a bit and removes well over 90% of the calls.



For nth prime, the method typically used is to calculate an estimate (using adjusted inverse Li or inverse R). Compute a single prime count to get an exact answer at the estimate. This usually ends up very close and then we just sieve the remainder. There are two reasons my nth semiprime is a little different. (1) inverse R or adjusted inverse Li gets us extremely close -- much closer than I have for an estimated semiprime count. (2) my counting to get the remainder is slower, so I really want to be closer before starting that.

Part of #2 is just that sieving a range for primes has been well studied, there are lots of known optimizations, and I have spent many, many hours on my code there. I just started on the semiprimes.

For #1, we can use the simple: n * log(n) / log(log(n)). While perhaps asymptotically correct, it's got a lot of error. Step one was to multiply that by 0.966 which is much closer. What I have now is multiplying by 0.968 - 0.0019278 * log(log(log(n))), which gets it even closer. Typically it seems I only need one more count to get close enough.

For the 2^45th semiprime:

3.6% nlogn/loglogn
0.05% .966 * nlogn/loglogn
0.0096% (0.968 - 0.0019278 * log(log(log(n)))) * nlogn/logn


This is not rigorously minimized. It also may be only a decent fit in the area I've tested.

With my current code on my Macbook, computing the 2^45th semiprime takes about 2.5 minutes, 2^48th in 11 minutes, 2^50th in 42 minutes. The last is about 20 times faster than when I first made the post.
danaj is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Semiprime and n-almost prime candidate for the k's with algebra for the Sierpinski/Riesel problem sweety439 sweety439 11 2020-09-23 01:42
A counting problem jasonp Puzzles 1 2017-12-24 19:38
Generating a uniform random n-bit semiprime danaj Programming 17 2017-09-15 16:41
Factored my first SemiPrime -Some Questions tal Msieve 31 2011-05-22 18:17
counting bricks michael Puzzles 8 2004-01-14 16:27

All times are UTC. The time now is 18:54.

Wed Mar 3 18:54:23 UTC 2021 up 90 days, 15:05, 2 users, load averages: 2.55, 1.93, 1.75

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.