![]() |
|
|
#1 |
|
(loop (#_fork))
Feb 2006
Cambridge, England
72×131 Posts |
What's an efficient way to choose samples uniformly at random from the set of, say, 'sets of N different integers less than M' ? I'd prefer not to use N^2 time (pick an element, check that it's different from all the ones picked to date) or M memory (flag each integer as not-picked-yet).
This seems to be a fairly key component of all the puzzles |
|
|
|
|
|
#2 | |
|
Undefined
"The unspeakable one"
Jun 2006
My evil lair
2·19·163 Posts |
Quote:
1. Select a card at random from the pack. 2. Remove that card from the pack. 3. Repeat 1 & 2 until you have the required number of cards. If you use something like a linked list you can remove each element in constant time, but finding the selected the element is linear time. If you don't mind breaking your "flag each integer as not-picked-yet" criteria, you can use a bitmap for the flags if memory is a concern, but may require multiple tries of random numbers until you 'hit' an unselected element. |
|
|
|
|
|
|
#3 | |
|
"Ben"
Feb 2007
351310 Posts |
Quote:
How about: pick all the elements (O(N)) sort (N log N) remove X duplicates (O(N)) N = X repeat until X = 0 X is probably much less than N every iteration, so I think the overall complexity is closer to N log N than N^2 |
|
|
|
|
|
|
#4 | |
|
Nov 2003
164448 Posts |
Quote:
at random' because your event space is not constant. What you seem to be looking for is a shuffling algorithm. Knuth Vol I has examples. |
|
|
|
|
|
|
#5 |
|
Mar 2008
25 Posts |
|
|
|
|
|
|
#6 | |
|
Jun 2005
38210 Posts |
Quote:
I think the 'pick and check' approach you described is appropriate, but the 'check' step doesn't need to have N complexity...it can be constant complexity. A hash table would be my choice for N much smaller than M. You can create a hash function ( mod N, for instance) that is roughly N-sized, your collisions would be fairly infrequent. Your worst case performance would still be N^2, but it's extremely unlikely. Typically, it will have order 'N' complexity. By choosing random integers, you've implicitly optimized against any clustering problems. A common coding technique is to use a bit-mask as an efficient modulus operator if your table length is a power of 2. For instance, if your hash table has 64 elements, and your selection is 'n', then your key = n & 0x3f. For your table length, pick the next power of 2 greater than 2N, then you can use open addressing in an array to effectively handle collisions. Drew Last fiddled with by drew on 2008-04-06 at 08:17 |
|
|
|
|
|
|
#7 |
|
Feb 2007
24·33 Posts |
I think the appropriate algorithm depends on the relative size of M/N.
"sparse" case: If M is very large (i.e. N/M small), it is best to use random values add them to a set (implemented as sorted list ignoring duplicates) until the set has desired cardinality. "dense" case: If N is very large (comparable to M) then eventually you will need O(M) memory (not bytes or worse int's, but at least bits) anyway. Here I'd suggest to chose random values in 1..M to set (or delete, if N>M/2) bits in a bitmap, if {1..M } is too large to shuffle and take the first N values. I wonder whether there coud be an approach using a random generator directly creating M-bit numbers (and doing something efficient to get such a random number having N bits), maybe "or"-ing if there are not enough and some intelligent thing to clear a few bits if there are too many bits set) Last fiddled with by m_f_h on 2008-04-08 at 14:01 |
|
|
|
![]() |
| Thread Tools | |
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Question regarding basic routines being used in Yafu. | storflyt32 | YAFU | 2 | 2015-06-29 23:25 |
| basic question for assignment | wong8888 | Information & Answers | 5 | 2015-03-22 12:15 |
| A basic math question | iconized | Prime Sierpinski Project | 2 | 2012-02-03 00:01 |
| Very basic question about Wiedemann methods | fivemack | Math | 0 | 2008-06-16 10:57 |
| Basic Question about ECM factoring? | drake2 | Math | 1 | 2006-01-12 07:40 |