![]() |
![]() |
#1 |
"Dana Jacobsen"
Feb 2011
Bangkok, TH
90910 Posts |
![]()
I have a routine that generates a random n-bit semiprime and would like to improve its efficiency. This is the "unrestricted" semiprime in that I am not just looking for an even bit split, so different than RSA key generation or examples for hard factoring. There are many papers about generation methods for those.
The input is a number of bits n, and the output is an n-bit semiprime. The probability of a given result should be equal to that of all other n-bit semiprimes. Hence with an input of 6, the values 33,34,35,38,39,46,49,51,55,57,58,62 should be returned with equal likelihood. See CRGreathouse's 2010 post for one possible implementation. For very small bits, e.g. 7 and under, we can just uniformly select from a list. Perfect distribution, super fast. For larger inputs, we can randomly select an n-bit number, do an is_semiprime() test, repeating until that is true. This is reasonably fast at small sizes, e.g. average 1 microsecond for 20-bit, 3uS for 32-bit, 10uS for 48-bit, 40uS for 60-bit. Your implementation will differ of course. But at some point this really starts slowing down, and basically for >64-bit this is not a good method. Charles' idea is clever, weighting the distributions. At a high level this gives a good approximation to the distribution. However it skews heavily toward primes with a large following gap. So much so that 3 occurs as often as 2, and 7 more than 5. I did basically a hack on this method, generating a (mostly) proper curve using a table for small primes then use Charles' method if we selected a larger one. This seems to work reasonably well and gives results much closer to the rejection sampling method. Any ideas on better ways? |
![]() |
![]() |
![]() |
#2 |
"Rashid Naimi"
Oct 2015
Remote to Here/There
2,333 Posts |
![]()
A bit over my head.
With that disclaimer and for what it's worth. Not sure how is_semiprime() works. But with the right trial-by-division algorithm up to the cube-root of n (which is significantly less taxing than trial-by-division up to the square-root of) n you will be left by a set of primes and semi-primes. |
![]() |
![]() |
![]() |
#3 | |
"Dana Jacobsen"
Feb 2011
Bangkok, TH
32·101 Posts |
![]()
There are various ways to implement it. With mine, 95% of all inputs require little more than a primality test, which is often trivially decided. Then partial factoring quickly finds most solutions. Charles has an example in Pari: http://www.mersenneforum.org/showpos...postcount=2074 that is similar. We both found these tests work well up to a certain point after which it is faster to do more sophisticated factoring. Yes we stop if the input is tiny and we pass the cube root of n.
Quote:
|
|
![]() |
![]() |
![]() |
#4 |
"Rashid Naimi"
Oct 2015
Remote to Here/There
2,333 Posts |
![]()
Again keeping my disclaimer in mind, i will offer some more simpleton comments which is all I can do.
The following example can be stretched in different ranges and is just a concept. Suppose you are aiming to produce the set of all the primes and semiprimes between 100 to 200. You could do trial by division up to n^(1/4). This would filter out all the factors of 2 and 3 for this example range. Then you could perform gcd operations between the remaining items which will distinguish all the primes from 5 to 37. The remaining primes would be to large to form a semi prime in the range (unless multiplied by 2 or 3). Last fiddled with by a1call on 2017-09-13 at 04:38 |
![]() |
![]() |
![]() |
#5 | |
"Dana Jacobsen"
Feb 2011
Bangkok, TH
32×101 Posts |
![]() Quote:
My problem is, for example, generating 1000 random 128-bit semiprimes, each with approximately uniform probability. I don't want to generate all the semiprimes in the range, even if that were at all feasible. There are something like 8 * 10^36 of them. Even with more tame sizes, the point isn't to generate all of them. It is to quickly generate a single random semiprime that has the right probability. Generating them with a somewhat incorrect probability is pretty easy (e.g. generate a uniform random prime p between 2 and 2^(n-1), then generate another uniform random prime q in the range such that p*q is an n-bit number). |
|
![]() |
![]() |
![]() |
#6 | |
Feb 2017
Nowhere
141048 Posts |
![]() Quote:
[2^(n-1), 2^n - 1]. A uniform distribution in an interval usually means, the probability of landing in a sub-interval is proportional to its length. But the numbers from the set you're interested in aren't uniformly distributed in the interval. So a process which does well at selecting numbers which are uniformly distributed in the interval, may do less well at selecting members of the subset of semiprimes with equal probability. As to randomness in your subset, theoretical perfection would seem to require knowing in advance all the numbers in the set (here, the n-bit semiprimes), then having a process to select one of them randomly, say by enumerating them 1 to K, and randomly selecting an integer from 1 to K. Obviously, knowing the complete set of semiprimes in the interval is impracticable when the bit size is large (as is, therefore, being able to know how "random" your choice of semiprime might be.) So it seems to me that some sort of "weighted" approach -- as already mentioned -- might be a practical compromise, to get reasonably close to what you want with a reasonable amout of computing power. There is an asymptotic estimate due to Landau for the number of semiprimes less than x, x*(log(log(x))/log(x) but this estimate is not all that great. I'm sure there are estimates of the distribution in terms of the size of the smaller of the two prime factors. I'm equally sure that, if I tried to reproduce such an estimate "on the fly" I would screw it up royally. But such an estimate seems to me to be a good way to go. Use the distribution in terms of the size of the smaller factor to create a "stratified random sample," and then cross your fingers... |
|
![]() |
![]() |
![]() |
#7 | |
"Dana Jacobsen"
Feb 2011
Bangkok, TH
32×101 Posts |
![]()
I think you've covered most of what I've got.
Two ways that generate the right distribution are:
Selecting uniform random primes in a range can be done similarly, and I have done each in that context. For primes the second is somewhat reasonable given extremely optimized prime count and nth prime routines, and for semiprimes Charles and I use this in a hardcoded fashion for tiny inputs. With primes, the former has some optimizations that cut out large groups of composites from the selection. I don't think those work nearly as well with semiprimes. The rejection sampling method is practical to 64-bits, and feasible probably up into 128 or so bits. Depends on ones available factoring algorithms and performance tolerance. It'd be nice to have something better, but this works. Quote:
|
|
![]() |
![]() |
![]() |
#8 |
Aug 2006
5,987 Posts |
![]()
If you use 'my' method, be sure to replace the hacky nextprime() bit with the proper random prime generation which should make the overall method nearly uniform. (PARI/GP now has this bit built in as randomprime.)
|
![]() |
![]() |
![]() |
#9 | |
"Dana Jacobsen"
Feb 2011
Bangkok, TH
32·101 Posts |
![]() Quote:
For 100k 26-bit random semiprimes, using the first code (rejection sampling) and looking at the histogram of smallest factor: Code:
17026 2 11633 3 7168 5 5300 7 3447 11 2956 13 2296 17 2098 19 1810 23 1368 29 Code:
15553 3 14520 2 8509 7 7661 5 4050 13 2964 23 2651 19 2649 11 2100 31 1540 17 |
|
![]() |
![]() |
![]() |
#10 |
Jun 2003
543610 Posts |
![]()
Here's an engineer's solution. Build a table of frequency of b-bit semiprimes where the smaller factor is p. But rather than do it for each prime < sqrt(2^b), we do it in batches. Since the smaller primes have more contribution, we can do it at invidividual prime levels for small primes (say < 128). After that we do it at at batches of, say, 0.5 bits, like 2^7-2^7.5, 2^7.5-2^8, etc... upto 2^(b/2).
For individual primes, we can estimate the # of semiprimes as Pi( 2^b/p ) - Pi( 2^(b-1)/p ). For a batch of primes between p&q, we can calculate it as if there are n primes (n=Pi(q)-Pi(p)) each of size Mean(p,q) [i dont know which would give better result, AM,GM or HM]. Once we have the frequency counts, we can generate a random number in [0,1) and use it to select prime/batch (in case of batch, select a random prime within the batch). There will be lot of approximations & estimations, but just might work. For a given b, there will be some initialization which can be amortized over multiple calls. Last fiddled with by axn on 2017-09-14 at 08:35 |
![]() |
![]() |
![]() |
#11 |
"Antonio Key"
Sep 2011
UK
32·59 Posts |
![]()
Don't know if it helps, but I have used the attached perl code which seems to work ok. It uses your random_prime.
Prime distribution, routine executed until one of the counted prime factors = 100k (26bit semi-primes): Code:
Prime Count 2 98900 3 98719 5 99101 7 99327 11 99254 13 98703 17 98651 19 99279 23 99474 29 99749 31 98857 37 99465 41 99004 43 99085 47 98477 53 100000 59 98947 61 98380 67 98631 71 98491 73 99117 79 99165 83 98799 89 98832 97 99597 101 98376 103 99035 107 98870 109 98463 113 98726 127 98853 131 98661 137 98324 139 98539 149 98855 151 99396 157 99167 163 98829 167 98532 173 99200 179 99195 181 99165 191 98809 193 99163 197 99418 199 99166 |
![]() |
![]() |
![]() |
Thread Tools | |
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Semiprime and n-almost prime candidate for the k's with algebra for the Sierpinski/Riesel problem | sweety439 | sweety439 | 11 | 2020-09-23 01:42 |
random comments, random questions and thread titles made for Google | jasong | Lounge | 46 | 2017-05-09 12:32 |
Factored my first SemiPrime -Some Questions | tal | Msieve | 31 | 2011-05-22 18:17 |
Generating Random Primes | davar55 | Math | 14 | 2011-02-20 16:06 |
About random number (random seed) in Msieve | Greenk12 | Factoring | 1 | 2008-11-15 13:56 |