mersenneforum.org  

Go Back   mersenneforum.org > Extra Stuff > Programming

Reply
 
Thread Tools
Old 2017-09-12, 22:50   #1
danaj
 
"Dana Jacobsen"
Feb 2011
Bangkok, TH

16148 Posts
Default Generating a uniform random n-bit semiprime

I have a routine that generates a random n-bit semiprime and would like to improve its efficiency. This is the "unrestricted" semiprime in that I am not just looking for an even bit split, so different than RSA key generation or examples for hard factoring. There are many papers about generation methods for those.

The input is a number of bits n, and the output is an n-bit semiprime. The probability of a given result should be equal to that of all other n-bit semiprimes. Hence with an input of 6, the values 33,34,35,38,39,46,49,51,55,57,58,62 should be returned with equal likelihood.

See CRGreathouse's 2010 post for one possible implementation.

For very small bits, e.g. 7 and under, we can just uniformly select from a list. Perfect distribution, super fast.

For larger inputs, we can randomly select an n-bit number, do an is_semiprime() test, repeating until that is true. This is reasonably fast at small sizes, e.g. average 1 microsecond for 20-bit, 3uS for 32-bit, 10uS for 48-bit, 40uS for 60-bit. Your implementation will differ of course. But at some point this really starts slowing down, and basically for >64-bit this is not a good method.

Charles' idea is clever, weighting the distributions. At a high level this gives a good approximation to the distribution. However it skews heavily toward primes with a large following gap. So much so that 3 occurs as often as 2, and 7 more than 5. I did basically a hack on this method, generating a (mostly) proper curve using a table for small primes then use Charles' method if we selected a larger one. This seems to work reasonably well and gives results much closer to the rejection sampling method.

Any ideas on better ways?
danaj is offline   Reply With Quote
Old 2017-09-13, 01:13   #2
a1call
 
a1call's Avatar
 
"Rashid Naimi"
Oct 2015
Remote to Here/There

111111010002 Posts
Default

A bit over my head.
With that disclaimer and for what it's worth.
Not sure how is_semiprime() works. But with the right trial-by-division algorithm up to the cube-root of n (which is significantly less taxing than trial-by-division up to the square-root of) n you will be left by a set of primes and semi-primes.
a1call is offline   Reply With Quote
Old 2017-09-13, 02:53   #3
danaj
 
"Dana Jacobsen"
Feb 2011
Bangkok, TH

38C16 Posts
Default

Quote:
Originally Posted by a1call View Post
Not sure how is_semiprime() works.
There are various ways to implement it. With mine, 95% of all inputs require little more than a primality test, which is often trivially decided. Then partial factoring quickly finds most solutions. Charles has an example in Pari: http://www.mersenneforum.org/showpos...postcount=2074 that is similar. We both found these tests work well up to a certain point after which it is faster to do more sophisticated factoring. Yes we stop if the input is tiny and we pass the cube root of n.

Quote:
But with the right trial-by-division algorithm up to the cube-root of n (which is significantly less taxing than trial-by-division up to the square-root of) n you will be left by a set of primes and semi-primes.
This sounds like, to generate a single 64-bit semiprime, we'd create a list of all 64-bit primes and semiprimes (or on average half of them since we're randomly selecting one). That's a stupendously large list. 128-bit semiprimes would be far worse. If n is 1024, then we'd need to work up to the cube root of 2^1024 -- not going to happen.
danaj is offline   Reply With Quote
Old 2017-09-13, 04:16   #4
a1call
 
a1call's Avatar
 
"Rashid Naimi"
Oct 2015
Remote to Here/There

23×11×23 Posts
Default

Again keeping my disclaimer in mind, i will offer some more simpleton comments which is all I can do.

The following example can be stretched in different ranges and is just a concept.
Suppose you are aiming to produce the set of all the primes and semiprimes between 100 to 200.

You could do trial by division up to n^(1/4).
This would filter out all the factors of 2 and 3 for this example range.
Then you could perform gcd operations between the remaining items which will distinguish all the primes from 5 to 37.
The remaining primes would be to large to form a semi prime in the range (unless multiplied by 2 or 3).

Last fiddled with by a1call on 2017-09-13 at 04:38
a1call is offline   Reply With Quote
Old 2017-09-13, 05:12   #5
danaj
 
"Dana Jacobsen"
Feb 2011
Bangkok, TH

22×227 Posts
Default

Quote:
Originally Posted by a1call View Post
Suppose you are aiming to produce the set of all the primes and semiprimes between 100 to 200.
That's not what I'm asking for.

My problem is, for example, generating 1000 random 128-bit semiprimes, each with approximately uniform probability. I don't want to generate all the semiprimes in the range, even if that were at all feasible. There are something like 8 * 10^36 of them.

Even with more tame sizes, the point isn't to generate all of them. It is to quickly generate a single random semiprime that has the right probability. Generating them with a somewhat incorrect probability is pretty easy (e.g. generate a uniform random prime p between 2 and 2^(n-1), then generate another uniform random prime q in the range such that p*q is an n-bit number).
danaj is offline   Reply With Quote
Old 2017-09-13, 13:07   #6
Dr Sardonicus
 
Dr Sardonicus's Avatar
 
Feb 2017
Nowhere

2·52·7·13 Posts
Default

Quote:
Originally Posted by danaj View Post
The input is a number of bits n, and the output is an n-bit semiprime. The probability of a given result should be equal to that of all other n-bit semiprimes. Hence with an input of 6, the values 33,34,35,38,39,46,49,51,55,57,58,62 should be returned with equal likelihood.
Hmm. I'm not sure about the usage of either "uniform" or "random" here. You're looking at the interval

[2^(n-1), 2^n - 1].

A uniform distribution in an interval usually means, the probability of landing in a sub-interval is proportional to its length. But the numbers from the set you're interested in aren't uniformly distributed in the interval. So a process which does well at selecting numbers which are uniformly distributed in the interval, may do less well at selecting members of the subset of semiprimes with equal probability.

As to randomness in your subset, theoretical perfection would seem to require knowing in advance all the numbers in the set (here, the n-bit semiprimes), then having a process to select one of them randomly, say by enumerating them 1 to K, and randomly selecting an integer from 1 to K.

Obviously, knowing the complete set of semiprimes in the interval is impracticable when the bit size is large (as is, therefore, being able to know how "random" your choice of semiprime might be.)

So it seems to me that some sort of "weighted" approach -- as already mentioned -- might be a practical compromise, to get reasonably close to what you want with a reasonable amout of computing power.

There is an asymptotic estimate due to Landau for the number of semiprimes less than x,

x*(log(log(x))/log(x)

but this estimate is not all that great.

I'm sure there are estimates of the distribution in terms of the size of the smaller of the two prime factors. I'm equally sure that, if I tried to reproduce such an estimate "on the fly" I would screw it up royally. But such an estimate seems to me to be a good way to go. Use the distribution in terms of the size of the smaller factor to create a "stratified random sample," and then cross your fingers...
Dr Sardonicus is offline   Reply With Quote
Old 2017-09-13, 15:28   #7
danaj
 
"Dana Jacobsen"
Feb 2011
Bangkok, TH

22×227 Posts
Default

I think you've covered most of what I've got.

Two ways that generate the right distribution are:
  • Rejection sampling. Generate a uniform value in the range 2^(n-1) to 2^n-1. If not semiprime, try again. The chances of any given semiprime being chosen are identical to any other. The main downside is that is_semiprime gets expensive as the size goes up (e.g. n > 64)
  • Determine the number of semiprimes S_n in the range 2^(n-1) to 2^n-1. Generate a uniform integer 0 < i <= S_n. Select the i'th semiprime in the range. The downside is having to have efficient versions of semiprime_count and nth_semiprime.

Selecting uniform random primes in a range can be done similarly, and I have done each in that context. For primes the second is somewhat reasonable given extremely optimized prime count and nth prime routines, and for semiprimes Charles and I use this in a hardcoded fashion for tiny inputs. With primes, the former has some optimizations that cut out large groups of composites from the selection. I don't think those work nearly as well with semiprimes.

The rejection sampling method is practical to 64-bits, and feasible probably up into 128 or so bits. Depends on ones available factoring algorithms and performance tolerance. It'd be nice to have something better, but this works.

Quote:
Use the distribution in terms of the size of the smaller factor to create a "stratified random sample," and then cross your fingers...
I believe technically this is related to inverse transform sampling. Charles does this selecting the first prime. I found it didn't work as well as I'd like, so applied some hacky corrections, which makes it better but still dubious. At least at this size almost nobody will notice if we're not perfect.
danaj is offline   Reply With Quote
Old 2017-09-13, 20:34   #8
CRGreathouse
 
CRGreathouse's Avatar
 
Aug 2006

32×5×7×19 Posts
Default

If you use 'my' method, be sure to replace the hacky nextprime() bit with the proper random prime generation which should make the overall method nearly uniform. (PARI/GP now has this bit built in as randomprime.)
CRGreathouse is offline   Reply With Quote
Old 2017-09-13, 22:49   #9
danaj
 
"Dana Jacobsen"
Feb 2011
Bangkok, TH

22×227 Posts
Default

Quote:
Originally Posted by CRGreathouse View Post
If you use 'my' method, be sure to replace the hacky nextprime() bit with the proper random prime generation which should make the overall method nearly uniform. (PARI/GP now has this bit built in as randomprime.)
The issue is with the selection of p rather than q. The difference in probability of (2,3,5,7) is completely different with inputs of 25 vs. 26, where the former uses rejection sampling and the latter selects two primes using the expected probability.

For 100k 26-bit random semiprimes, using the first code (rejection sampling) and looking at the histogram of smallest factor:

Code:
17026 2
11633 3
7168 5
5300 7
3447 11
2956 13
2296 17
2098 19
1810 23
1368 29
Then the result with the second code (probability):

Code:
15553 3
14520 2
8509 7
7661 5
4050 13
2964 23
2651 19
2649 11
2100 31
1540 17
danaj is offline   Reply With Quote
Old 2017-09-14, 08:32   #10
axn
 
axn's Avatar
 
Jun 2003

4,969 Posts
Default

Here's an engineer's solution. Build a table of frequency of b-bit semiprimes where the smaller factor is p. But rather than do it for each prime < sqrt(2^b), we do it in batches. Since the smaller primes have more contribution, we can do it at invidividual prime levels for small primes (say < 128). After that we do it at at batches of, say, 0.5 bits, like 2^7-2^7.5, 2^7.5-2^8, etc... upto 2^(b/2).

For individual primes, we can estimate the # of semiprimes as Pi( 2^b/p ) - Pi( 2^(b-1)/p ).
For a batch of primes between p&q, we can calculate it as if there are n primes (n=Pi(q)-Pi(p)) each of size Mean(p,q) [i dont know which would give better result, AM,GM or HM].
Once we have the frequency counts, we can generate a random number in [0,1) and use it to select prime/batch (in case of batch, select a random prime within the batch).

There will be lot of approximations & estimations, but just might work. For a given b, there will be some initialization which can be amortized over multiple calls.

Last fiddled with by axn on 2017-09-14 at 08:35
axn is offline   Reply With Quote
Old 2017-09-14, 15:48   #11
Antonio
 
Antonio's Avatar
 
"Antonio Key"
Sep 2011
UK

32×59 Posts
Default

Don't know if it helps, but I have used the attached perl code which seems to work ok. It uses your random_prime.

Prime distribution, routine executed until one of the counted prime factors = 100k (26bit semi-primes):
Code:
Prime   Count
   2    98900
   3    98719
   5    99101
   7    99327
  11    99254
  13    98703
  17    98651
  19    99279
  23    99474
  29    99749
  31    98857
  37    99465
  41    99004
  43    99085
  47    98477
  53   100000
  59    98947
  61    98380
  67    98631
  71    98491
  73    99117
  79    99165
  83    98799
  89    98832
  97    99597
 101    98376
 103    99035
 107    98870
 109    98463
 113    98726
 127    98853
 131    98661
 137    98324
 139    98539
 149    98855
 151    99396
 157    99167
 163    98829
 167    98532
 173    99200
 179    99195
 181    99165
 191    98809
 193    99163
 197    99418
 199    99166
Attached Files
File Type: 7z random_semi.7z (605 Bytes, 78 views)
Antonio is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Semiprime and n-almost prime candidate for the k's with algebra for the Sierpinski/Riesel problem sweety439 sweety439 11 2020-09-23 01:42
random comments, random questions and thread titles made for Google jasong Lounge 46 2017-05-09 12:32
Factored my first SemiPrime -Some Questions tal Msieve 31 2011-05-22 18:17
Generating Random Primes davar55 Math 14 2011-02-20 16:06
About random number (random seed) in Msieve Greenk12 Factoring 1 2008-11-15 13:56

All times are UTC. The time now is 22:41.

Mon May 17 22:41:00 UTC 2021 up 39 days, 17:21, 0 users, load averages: 2.62, 2.49, 2.66

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.