2015-02-02, 04:46  #1
Aug 2010
Kansas
1000100011_{2} Posts 
Theoretical Experiment Design
Okay, I know there's probably a (relatively) easy way to figure this out, but I'm having trouble finding material to figure out how large of a sample I would need to fulfill the following criteria:
Background: Bread wheat (Triticum aestivum) is a diploid (2n=42) plant that self-fertilizes. If parents A and B are crossed, and the progeny are allowed to self-fertilize until all lines are true-breeding (recombinant inbred lines), how many end plants would be required to be 99% sure that at least one of every chromosome combination is represented? There are 2^21=2097152 possibilities, and for the sake of simplicity we'll assume that enough F1 (hybrid) seed is produced to cover the full spectrum. Each combination, then, has a 1/2097152 chance of occurring. How many extra/total seeds would be needed to cover the possibility of the same genotype occurring multiple times? Any material references and/or quick, dirty hacks are appreciated! Thanks!
2015-02-02, 06:19  #2
Jun 2003
2×3^{2}×293 Posts 
The probability of a particular combination not appearing in x seeds is (1-1/2097152)^x. So we need to find x such that this expression < 0.01 (i.e. 99% certainty that a particular combination would appear). Taking logs,
x * log(1-1/2097152) < log(0.01), which gives x = 9,657,740 (or about 4.6x the number of combinations)
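As a quick check of the arithmetic above, here is a minimal Python sketch. The names N, target_miss and seeds_needed are made up for illustration; the only inputs are the 2^21 combinations and the 1% miss probability from the post.

```python
import math

N = 2 ** 21          # number of chromosome combinations (2097152)
target_miss = 0.01   # allowed chance that one given combination is absent

# Solve (1 - 1/N)^x < target_miss for x.
# log1p(-1/N) computes log(1 - 1/N) without losing precision for tiny 1/N.
x = math.log(target_miss) / math.log1p(-1.0 / N)
seeds_needed = math.ceil(x)

print(seeds_needed)                  # smallest whole number of seeds
print(round(seeds_needed / N, 2))    # multiple of the number of combinations
```

Dividing by the negative logarithm flips the inequality, so the result is a lower bound on x, which is why it is rounded up.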
2015-02-02, 17:04  #3
Sep 2009
2^{4}·139 Posts 
I think that gives a 99% probability of any given combination of chromosomes being in the set. To ensure you have *every* combination you need a lot more. I think making the expression less than 0.01*1/2097152 would work, but that's probably not a practical amount of seed.
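One way to read this suggestion is as a union bound: the chance that any combination is missing is at most 2097152 times the chance that one given combination is missing. A rough Python sketch under that reading follows (the variable names are invented for illustration, and this is an upper bound on the miss probability, not an exact figure):

```python
import math

N = 2 ** 21            # number of chromosome combinations
overall_miss = 0.01    # allowed chance that at least one combination is absent

# Union bound: P(any missing) <= N * (1 - 1/N)^x.
# Requiring (1 - 1/N)^x < overall_miss / N keeps that bound below 1%.
per_combo_miss = overall_miss / N
x_union = math.ceil(math.log(per_combo_miss) / math.log1p(-1.0 / N))

print(x_union)         # roughly 40 million seeds
print(x_union / N)     # about 19 times the number of combinations
```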
Chris 
2015-02-02, 20:36  #4
"William"
May 2003
New Haven
23×103 Posts 
I have problems with the problem statement.
The first problem is chromosome crossover. The problem statement appears to assume that chromosomes will remain intact through the exercise. But real chromosomes swap off, so that genes that start on the same chromosome can end up on different chromosomes.

My next problem is that I don't understand the design setup. When I cross A and B, how many seeds do I keep? When I cross a child with itself, how many seeds do I keep? The proposals appear to be addressing the mathematical question "suppose I select genotypes at random. How many do I need to select to have a 99% probability of having selected all genotypes?" It is not clear the breeding program corresponds to this mathematical question.

The exact solution to that perhaps inappropriate mathematical question can be found by creating the transition matrix for the random walk on the state space of "number of distinct genotypes selected" and iterating until the desired probability is reached.

For an approximate solution to the perhaps inappropriate mathematical question, Chris2be8 is on the right track. The probability that a particular genotype has been missed in a sample of size x is

px = (1-1/2097152)^x

This event of "missing the particular genotype" will either occur, or it won't occur, so the expected number of times this event occurs is also px. But we are not interested in just this particular genotype; we are interested in all of them, each with the same expected number of occurrences. So we are interested in many events with total expected number of events = a = 2097152*px. When dealing with many events that have small and independent probability of occurring, the Poisson approximation is appropriate. (The events here are not strictly independent, but this is close enough.) The Poisson approximation for the probability of exactly zero events occurring is exp(-a).

So if you want to be 99% certain of having all genotypes, you need a small enough that

exp(-a) = 0.99
a = -ln(0.99) = 0.01005
px = a / 2097152 = 4.79e-9
x = ln(px) / ln(1-1/2097152) = 40.2e6

Remember, this may be the correct answer to the wrong mathematical question.
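To see how close the Poisson approximation is to the exact random-walk answer, here is a hedged Python sketch. The function names are invented for illustration. The exact chain is only run on a toy case of 16 genotypes (a hypothetical plant with 4 chromosome pairs), because iterating it over all 2097152 states for tens of millions of steps is impractical in plain Python.

```python
import math

def prob_all_seen_exact(n_types, n_draws):
    """Exact P(every type seen) after n_draws uniform random draws, found by
    iterating the chain on 'number of distinct types seen so far'."""
    state = [0.0] * (n_types + 1)
    state[0] = 1.0
    for _ in range(n_draws):
        nxt = [0.0] * (n_types + 1)
        for k, p in enumerate(state):
            if p == 0.0:
                continue
            nxt[k] += p * k / n_types                      # repeat of a seen type
            if k < n_types:
                nxt[k + 1] += p * (n_types - k) / n_types  # a new type appears
        state = nxt
    return state[n_types]

def prob_all_seen_poisson(n_types, n_draws):
    """Poisson approximation: exp(-a) with a = n_types * (1 - 1/n_types)^x."""
    a = n_types * (1.0 - 1.0 / n_types) ** n_draws
    return math.exp(-a)

def draws_needed_poisson(n_types, confidence):
    """Smallest x for which the Poisson approximation reaches `confidence`."""
    a = -math.log(confidence)
    px = a / n_types
    return math.ceil(math.log(px) / math.log1p(-1.0 / n_types))

small_n = 16                                   # toy case, not wheat
small_x = draws_needed_poisson(small_n, 0.99)
exact = prob_all_seen_exact(small_n, small_x)
approx = prob_all_seen_poisson(small_n, small_x)
print(small_x, exact, approx)                  # the two probabilities nearly agree

full_x = draws_needed_poisson(2 ** 21, 0.99)   # the wheat-sized case
print(full_x)                                  # roughly 40 million
```

On the toy case the exact and approximate probabilities differ only in the fifth decimal place, which suggests the approximation is reasonable when the target miss probability is small.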
2015-02-03, 01:48  #5
Romulan Interpreter
"name field"
Jun 2011
Thailand
3^{2}×1,097 Posts 
Quote:
Last fiddled with by LaurV on 2015-02-03 at 01:49

2015-02-03, 03:55  #6
Aug 2010
Kansas
547 Posts 
Wblipp: The question was simplified to exclude crossing over and logistical concerns because I was more interested in figuring out how to come up with a rough estimate than in actually getting an answer. So, is the 9.7m estimate or the 40.2m estimate correct? Using the rule of "eh, that sounds about right" I would've expected it to be closer to 10m than 40m...
Background: My degree program deals with genetics, and work/research has me doing QTL research with wheat, but I haven't had advanced stats classes yet (they don't start until grad school). I've had practical exposure to the trials themselves, but not (yet) to design methods. I was just trying to get a grasp of how many different RILs/DH lines (recombinant inbred lines/doubled haploids) would be required to give a full characterization of polygenic action. In theory, 44 lines should be enough to cover single genes (2 parents, plus 2 genotypes that differ from each other by only 1 chromosome pair for each of the 21 chromosome pairs).
2015-02-03, 05:09  #7
"William"
May 2003
New Haven
23·103 Posts 
Quote:
I went to the trouble of explaining each step of the 40 million estimate. If you think it's the right mathematical question, then you should examine where you disagree with the logic rather than dismissing it because of preconceived notions. 

2015-02-03, 08:54  #8
Jun 2003
1010010011010_{2} Posts 
Quote:
FWIW, you (and chris2be8) are correct. @OP: My formulation works for _any_ given combination, but for _all_ combinations together, we will have a combined probability of 0.99^2097152 = 2.2x10^-9154. We actually need the individual probability, p, to be much closer to 1 such that p^2097152 = 0.99, which gives p = 0.9999999952. Plugging this number back into my original formulation gives x=40.17e6, which is basically wblipp's answer (except he derived it much more elegantly).
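The numbers in this post can be reproduced with a short Python sketch. The names are invented for illustration, and the calculation treats the 2097152 "combination is present" events as independent, as the post does. Logarithms are used throughout because 0.99^2097152 underflows to zero in ordinary floating point.

```python
import math

N = 2 ** 21        # number of chromosome combinations
overall = 0.99     # desired chance that every combination is present

# Naive combined probability if each combination is only 99% likely to appear:
# 0.99^N is far too small for a float, so report its base-10 logarithm.
log10_naive = N * math.log10(0.99)
print(log10_naive)                       # about -9153.7, i.e. ~2.2x10^-9154

# Per-combination probability p with p^N = overall.
log_p = math.log(overall) / N
p = math.exp(log_p)
miss = -math.expm1(log_p)                # 1 - p, computed without cancellation
print(p, miss)

# Plug the per-combination miss probability into (1 - 1/N)^x < miss.
x = math.ceil(math.log(miss) / math.log1p(-1.0 / N))
print(x)                                 # roughly 40.17 million
```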
