2008-03-19, 13:51  #1 
(loop (#_fork))
Feb 2006
Cambridge, England
2·7·461 Posts 
Basic optimisation question
What's an efficient way to choose samples uniformly at random from the set of, say, 'sets of N different integers less than M'? I'd prefer not to use N^2 time (pick an element, then check it against all the ones picked to date) or M memory (flag each integer as not-picked-yet).
This seems to be a fairly key component of all the puzzles.
2008-03-19, 14:18  #2  
Undefined
"The unspeakable one"
Jun 2006
My evil lair
2×3×1,093 Posts 
Quote:
1. Select a card at random from the pack.
2. Remove that card from the pack.
3. Repeat 1 & 2 until you have the required number of cards.

If you use something like a linked list you can remove each element in constant time, but finding the selected element is linear time. If you don't mind breaking your "flag each integer as not-picked-yet" criterion, you can use a bitmap for the flags if memory is a concern, though this may require multiple tries of random numbers until you 'hit' an unselected element. 
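A minimal sketch of this pick-and-retry idea in Python, using a hash set in place of the bitmap of flags (the function name and structure are my own illustration, not code from the thread):

```python
import random

def sample_distinct(n, m):
    """Draw n distinct integers from range(m) by retrying on collisions.

    The set plays the role of the bitmap of flags: membership tests are
    expected O(1), and for n much smaller than m few retries are needed.
    """
    if n > m:
        raise ValueError("cannot pick n distinct values from fewer than n")
    chosen = set()
    while len(chosen) < n:
        chosen.add(random.randrange(m))  # a collision is a no-op, so we retry
    return sorted(chosen)
```

Expected running time degrades as n approaches m, since late picks mostly 'hit' already-selected values; the complement trick discussed later in the thread addresses that.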

2008-03-19, 16:13  #3  
"Ben"
Feb 2007
5·727 Posts 
Quote:
How about:
- pick all N elements (O(N))
- sort (O(N log N))
- remove the X duplicates (O(N))
- set N = X and repeat until X = 0

X is probably much less than N on every iteration, so I think the overall complexity is closer to N log N than N^2. 
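The loop above can be sketched as follows (a hedged illustration of the scheme, not the poster's code; I keep the list sorted throughout, so the deduplication pass is a single linear scan):

```python
import random

def sample_by_sorting(n, m):
    """Top up with random picks, sort, drop duplicates, and repeat
    until n distinct values remain."""
    assert n <= m, "need at least n candidate values"
    values = []
    missing = n
    while missing > 0:
        values.extend(random.randrange(m) for _ in range(missing))
        values.sort()                      # O(k log k)
        # single O(k) pass removing duplicates from the sorted list
        values = [v for i, v in enumerate(values)
                  if i == 0 or v != values[i - 1]]
        missing = n - len(values)
    return values
```

When collisions are rare (N much smaller than M), the first iteration already yields almost all n values, so the total work is dominated by one O(N log N) sort.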

2008-03-19, 16:22  #4  
"Bob Silverman"
Nov 2003
North of Boston
2^{3}·3·311 Posts 
Quote:
at random' because your event space is not constant. What you seem to be looking for is a shuffling algorithm. Knuth Vol I has examples. 
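The shuffling approach in its plainest form is the Fisher-Yates shuffle, stopped after N swaps (a sketch; it does use the O(M) memory the original question hoped to avoid, but it shows why a shuffle gives a uniform subset):

```python
import random

def first_n_of_shuffle(n, m):
    """Run a Fisher-Yates shuffle for only n steps: the first n slots
    then hold a uniform random n-subset of range(m)."""
    a = list(range(m))
    for i in range(n):
        j = random.randrange(i, m)         # pick from the unshuffled tail
        a[i], a[j] = a[j], a[i]
    return a[:n]
```

Stopping early means only O(N) random numbers and swaps are spent, even though the array itself is O(M).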

2008-04-06, 07:24  #5 
Mar 2008
2^{5} Posts 

2008-04-06, 07:51  #6  
Jun 2005
2·191 Posts 
Quote:
I think the 'pick and check' approach you described is appropriate, but the 'check' step doesn't need to have O(N) complexity... it can be constant. A hash table would be my choice for N much smaller than M. You can create a hash function (mod N, for instance) over a roughly N-sized table, so collisions would be fairly infrequent. Your worst-case performance would still be N^2, but that's extremely unlikely; typically it will have O(N) complexity. By choosing random integers, you've implicitly guarded against any clustering problems.

A common coding technique is to use a bitmask as an efficient modulus operator if your table length is a power of 2. For instance, if your hash table has 64 elements and your selection is 'n', then your key = n & 0x3f. For your table length, pick the next power of 2 greater than 2N; then you can use open addressing in an array to handle collisions effectively.

Drew

Last fiddled with by drew on 2008-04-06 at 08:17 
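A sketch of that scheme (an illustration under the stated assumptions, not Drew's actual code): table length is the next power of two above 2N, keys are reduced with a bitmask, and linear probing resolves collisions.

```python
import random

def sample_open_addressing(n, m):
    """Pick n distinct values from range(m) using an open-addressed
    hash table that is never more than half full."""
    size = 1
    while size < 2 * n:                    # next power of 2 above 2n
        size <<= 1
    mask = size - 1                        # works like '& 0x3f' for size 64
    table = [None] * size
    found = 0
    while found < n:
        v = random.randrange(m)
        i = v & mask                       # cheap 'mod size'
        while table[i] is not None and table[i] != v:
            i = (i + 1) & mask             # linear probing, wraps around
        if table[i] is None:               # v not picked before
            table[i] = v
            found += 1
    return sorted(v for v in table if v is not None)
```

Keeping the table at most half full is what makes the expected probe length constant, so each pick-and-check costs O(1) on average.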

2008-04-08, 13:50  #7 
Feb 2007
2^{4}×3^{3} Posts 
I think the appropriate algorithm depends on the relative size of M/N.
"Sparse" case: if M is very large (i.e. N/M is small), it is best to draw random values and add them to a set (implemented as a sorted list, ignoring duplicates) until the set has the desired cardinality.

"Dense" case: if N is very large (comparable to M), then eventually you will need O(M) memory anyway (not bytes, or worse ints, but at least bits). If {1..M} is too large to shuffle and take the first N values from, I'd suggest choosing random values in 1..M to set (or clear, if N > M/2) bits in a bitmap.

I wonder whether there could be an approach using a random generator that directly creates M-bit numbers, with something efficient to obtain such a random number having exactly N bits set: maybe 'or'-ing in more bits if there are not enough, and some intelligent way to clear a few bits if too many are set.

Last fiddled with by m_f_h on 2008-04-08 at 14:01 
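A sketch of the dense case, with the N > M/2 complement trick applied (a Python int stands in for the M-bit bitmap; names are my own):

```python
import random

def sample_bitmap(n, m):
    """Dense-case sampling: mark chosen values in an m-bit bitmap.
    When n > m/2 we mark the complement instead, so at most m/2 bits
    are ever set and collisions stay manageable."""
    assert 0 <= n <= m
    invert = n > m // 2
    k = m - n if invert else n             # number of bits to mark
    bitmap = 0
    marked = 0
    while marked < k:
        v = random.randrange(m)
        bit = 1 << v
        if not bitmap & bit:               # not yet marked: set the flag
            bitmap |= bit
            marked += 1
    want = 0 if invert else 1
    return [v for v in range(m) if (bitmap >> v) & 1 == want]
```

Because the marked fraction never exceeds 1/2, the expected number of retries per mark is bounded by 2, giving O(N) expected picks on O(M) bits of memory.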