 mersenneforum.org Theoretical Experiment Design
2015-02-02, 04:46 #1 c10ck3r   Aug 2010 Kansas 1000100011₂ Posts
Okay, I know there's probably a (relatively) easy way to figure this out, but I'm having trouble finding material on how large a sample I would need to fulfill the following criteria:

Background: Bread wheat (Triticum aestivum) is an allohexaploid (2n = 6x = 42) that self-fertilizes and behaves like a diploid at meiosis, forming 21 chromosome pairs. If parents A and B are crossed, and the progeny are allowed to self-fertilize until all lines are true-breeding (recombinant inbred lines), how many end plants would be required to be 99% sure that at least one of every chromosome combination is represented? There are 2^21 = 2097152 possibilities, and for the sake of simplicity we'll assume that enough F1 (hybrid) seed is produced to cover the full spectrum. Each combination, then, has a 1/2097152 chance of occurring. How many extra/total seeds would be needed to cover the possibility of the same genotype occurring multiple times? Any material references and/or quick, dirty hacks are appreciated! Thanks!

2015-02-02, 06:19 #2 axn   Jun 2003 2×3²×293 Posts
The probability of a particular combination not appearing in x seeds is (1-1/2097152)^x. So we need to find x such that this expression is < 0.01 (i.e. 99% certainty that a particular combination would appear). Taking logs, x * log(1-1/2097152) < log(0.01). Because log(1-1/2097152) is negative, dividing by it flips the inequality, which gives x ≥ 9,657,740 (about 4.6 times the number of combinations, since ln(100) ≈ 4.6).

2015-02-02, 17:04 #3 chris2be8   Sep 2009 2⁴·139 Posts
I think that gives a 99% probability of any given combination of chromosomes being in the set. To ensure you have *every* combination you need a lot more. I think making the expression less than 0.01/2097152 would work, but that's probably not a practical amount of seed.
Chris

2015-02-02, 20:36 #4 wblipp   "William" May 2003 New Haven 2³×103 Posts
I have problems with the problem statement. The first problem is chromosome crossover.
The problem statement appears to assume that chromosomes will remain intact through the exercise. But real chromosomes exchange segments (crossing over), so genes that start on the same chromosome can end up on different chromosomes.

My next problem is that I don't understand the design setup. When I cross A and B, how many seeds do I keep? When I cross a child with itself, how many seeds do I keep?

The proposals appear to be addressing the mathematical question "suppose I select genotypes at random. How many do I need to select to have a 99% probability of having selected all genotypes?" It is not clear the breeding program corresponds to this mathematical question.

The exact solution to that perhaps inappropriate mathematical question can be found by creating the transition matrix for the random walk on the state space "number of distinct genotypes selected" and iterating until the desired probability is reached.

For an approximate solution to the perhaps inappropriate mathematical question, Chris2be8 is on the right track. The probability that a particular genotype has been missed in a sample of size x is

$p_x = (1 - 1/2097152)^x$

This event of "missing the particular genotype" will either occur or not occur, so the expected number of times it occurs is also $p_x$. But we are not interested in just this particular genotype; we are interested in all of them, each with the same expected number of occurrences. So we are interested in many events with total expected number $a = 2097152 \, p_x$. When dealing with many events that each have a small probability of occurring and are independent, the Poisson approximation is appropriate; these events are not exactly independent, but they are close enough. The Poisson approximation for the probability of exactly zero events occurring is $e^{-a}$.
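The exact transition-matrix method mentioned above is easy to try on a toy problem, which also shows how close the Poisson approximation exp(-a) gets. The sketch below is illustrative only: the toy size (16 genotypes instead of 2^21) and all function names are assumptions, not part of the thread. State k is the number of distinct genotypes seen so far; each random pick either repeats a seen genotype (probability k/n) or adds a new one (probability (n-k)/n). The closed-form inclusion-exclusion sum is included as an independent cross-check of the iterated chain.

```python
import math


def transition_matrix(n: int) -> list[list[float]]:
    """T[k][j] = P(go from k to j distinct genotypes seen) in one random pick."""
    t = [[0.0] * (n + 1) for _ in range(n + 1)]
    for k in range(n + 1):
        t[k][k] = k / n                # picked a genotype already seen
        if k < n:
            t[k][k + 1] = (n - k) / n  # picked a new genotype
    return t


def exact_all_collected(n: int, draws: int) -> float:
    """Exact P(all n genotypes seen) after `draws` picks, by iterating the chain."""
    t = transition_matrix(n)
    dist = [1.0] + [0.0] * n  # start with 0 distinct genotypes seen
    for _ in range(draws):
        # one step of the random walk: new distribution = old distribution times T
        dist = [sum(dist[i] * t[i][j] for i in range(n + 1)) for j in range(n + 1)]
    return dist[n]


def inclusion_exclusion(n: int, draws: int) -> float:
    """Same probability from the closed-form inclusion-exclusion sum (cross-check)."""
    return sum((-1) ** j * math.comb(n, j) * (1 - j / n) ** draws for j in range(n + 1))


def poisson_all_collected(n: int, draws: int) -> float:
    """Poisson approximation: exp(-a) with a = n * (1 - 1/n)**draws."""
    a = n * (1 - 1 / n) ** draws
    return math.exp(-a)


N_TOY = 16  # hypothetical toy size; the real problem has 2**21 genotypes
for draws in (16, 50, 100, 200):
    print(draws, exact_all_collected(N_TOY, draws), poisson_all_collected(N_TOY, draws))
```

The chain is only practical for small n (the matrix has (n+1)^2 entries), which is why the Poisson shortcut is useful at the real size of 2^21.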
So if you want to be 99% certain of having all genotypes, you need a small enough that

$e^{-a} = 0.99$
$a = -\ln(0.99) = 0.01005$
$p_x = a / 2097152 = 4.79 \times 10^{-9}$
$x = \ln(p_x) / \ln(1 - 1/2097152) = 40.2 \times 10^6$

Remember this may be the correct answer to the wrong mathematical question.

2015-02-03, 01:48   #5
LaurV
Romulan Interpreter

"name field"
Jun 2011
Thailand

3²×1,097 Posts Quote:
Originally Posted by wblipp
The proposals appear to be addressing the mathematical question "suppose I select genotypes at random. How many do I need to select to have a 99% probability of having selected all genotypes?" It is not clear the breeding program corresponds to this mathematical question. [...] Remember this may be the correct answer to the wrong mathematical question.

Me has advanced discussions with the little LaurV in the house about genetics. The little one is studying it at school and she constantly succeeds in asking difficult questions that we can't answer, and we need to go back to the books. Me thinks she likes it (both the genetics and putting us in the corner).

Last fiddled with by LaurV on 2015-02-03 at 01:49

2015-02-03, 03:55 #6 c10ck3r   Aug 2010 Kansas 547 Posts
Wblipp: The question was simplified to exclude crossing over and logistical concerns because I was more interested in figuring out how to come up with a rough estimate than in actually getting an answer. So, is the 9.7m estimate or the 40.2m estimate correct? Using the rule of "eh, that sounds about right" I would've expected it to be closer to 10m than 40m...

Background: My degree program deals with genetics, and work/research has me doing QTL research with wheat, but I haven't had advanced stats classes yet (they don't start until grad school). I've had practical exposure to the trials themselves, but not (yet) to design methods. I was just trying to get a grasp of how many different RILs/DH lines (recombinant inbred lines/doubled haploid lines) would be required to give a full characterization of polygenic action. In theory, 44 lines should be enough to cover single genes: the 2 parents, plus 2 genotypes that differ from each other by only 1 chromosome pair, for each of the 21 chromosome pairs (2 + 2×21 = 44).

2015-02-03, 05:09   #7
wblipp

"William"
May 2003
New Haven

2³·103 Posts Quote:
Originally Posted by c10ck3r
I was just trying to get a grasp of how many different RILs/DH lines (recombinant inbred lines/doubled haploid lines) would be required to give a full characterization of polygenic action.
I don't understand the genetics jargon well enough to translate this into a mathematical question. This sounds closer to "how many random genotypes are needed to have a 99% chance of covering every chromosome in the original A and B plants?" Previously we were working on having every possible combination of chromosomes. But I would expect this question to have an answer in the hundreds or thousands, so perhaps this isn't your mathematical question.
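As a rough sketch of that alternative question, here is one way to compute it under the thread's simplified model. The model is an assumption stated here, not something the posters specified: no crossing over, and each line independently carries parent A's or parent B's copy of each of the 21 chromosome pairs with probability 1/2. Under it, a given pair is fully covered among n lines with probability 1 - 2·(1/2)^n (for n ≥ 1), and the 21 pairs are independent. The function names are hypothetical.

```python
def prob_all_parental_chromosomes(n_lines: int, n_pairs: int = 21) -> float:
    """P(both parental copies of every chromosome pair appear among n_lines lines).

    Assumes no crossing over and that each line independently carries parent A's
    or parent B's copy of each pair with probability 1/2.
    """
    if n_lines < 1:
        return 0.0
    # A's copy is absent from all n lines with prob (1/2)**n, and likewise B's copy;
    # for n >= 1 both cannot be absent at once, so a pair is covered w.p. 1 - 2*(1/2)**n.
    p_pair = 1.0 - 2.0 * 0.5 ** n_lines
    return p_pair ** n_pairs  # pairs assort independently under the assumed model


def min_lines(target: float = 0.99, n_pairs: int = 21) -> int:
    """Smallest number of lines whose coverage probability reaches `target`."""
    n = 1
    while prob_all_parental_chromosomes(n, n_pairs) < target:
        n += 1
    return n


print(min_lines())
```

Under these assumptions `min_lines()` returns 13; crossing over and linkage, which this model ignores, are not captured.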

I went to the trouble of explaining each step of the 40 million estimate. If you think it's the right mathematical question, then you should examine where you disagree with the logic rather than dismissing it because of preconceived notions.

2015-02-03, 08:54   #8
axn

Jun 2003

1010010011010₂ Posts Quote:
Originally Posted by wblipp
I went to the trouble of explaining each step of the 40 million estimate. If you think it's the right mathematical question, then you should examine where you disagree with the logic rather than dismissing it because of preconceived notions.
Perhaps OP is unable to follow the mathematical argumentation and is thus unable to decide by himself which of the two solutions posted is correct?

FWIW, you (and chris2be8) are correct.

@OP: My formulation works for _any_ given combination, but for _all_ combinations together (treating them as independent), we would have a combined probability of 0.99^2097152 ≈ 2.1x10^-9154. We actually need the individual probability, p, to be much closer to 1, such that p^2097152 = 0.99, which gives p = 0.9999999952. Plugging this number back into my original formulation (requiring (1-1/2097152)^x < 1 - p) gives x = 40.17e6, which is basically wblipp's answer (except he derived it much more elegantly).
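To compare the figures from this thread side by side, here is a small sketch that evaluates each post's formula for N = 2^21 combinations under the thread's no-crossover simplification, plus a truncated inclusion-exclusion sum as an exact cross-check of the Poisson answer. The variable and function names are illustrative. It uses `math.log1p` and `math.expm1` because the quantities involved sit extremely close to 1, where the naive `log(1 - q)` or `1 - p` loses precision.

```python
import math

N = 2 ** 21                         # chromosome combinations (21 pairs, no crossover)
TARGET = 0.99                       # desired confidence
LOG_NOT_HIT = math.log1p(-1.0 / N)  # ln(1 - 1/N): one draw is not a given combination


def draws_for_miss_prob(p_miss: float) -> float:
    """Real-valued x such that (1 - 1/N)**x == p_miss."""
    return math.log(p_miss) / LOG_NOT_HIT


# Post #2 (axn): 99% sure that ONE given combination appears.
single = draws_for_miss_prob(1.0 - TARGET)

# Post #3 (chris2be8): union bound, N * (1 - 1/N)**x <= 0.01.
union_bound = draws_for_miss_prob((1.0 - TARGET) / N)

# Post #4 (wblipp): Poisson, exp(-a) = 0.99 with a = N * p_x.
a = -math.log(TARGET)
poisson = draws_for_miss_prob(a / N)

# Post #8 (axn): per-combination probability p with p**N = 0.99;
# -expm1(...) computes 1 - p without catastrophic cancellation.
one_minus_p = -math.expm1(math.log(TARGET) / N)
per_combination = draws_for_miss_prob(one_minus_p)


def prob_all_seen(x: int, terms: int = 30) -> float:
    """P(all N combinations appear in x draws) by inclusion-exclusion.

    The series is truncated after `terms` terms, which is only accurate when
    the expected number of missed combinations, N * (1 - 1/N)**x, is well below 1.
    """
    total = 1.0
    log_comb = 0.0  # running value of ln C(N, j)
    for j in range(1, terms + 1):
        log_comb += math.log(N - j + 1) - math.log(j)
        term = math.exp(log_comb + x * math.log1p(-j / N))
        total += -term if j % 2 else term
    return total


print(f"one given combination : {math.ceil(single):,}")
print(f"union bound           : {math.ceil(union_bound):,}")
print(f"Poisson (wblipp)      : {math.ceil(poisson):,}")
print(f"p**N = 0.99 (axn)     : {math.ceil(per_combination):,}")
print(f"exact P at Poisson x  : {prob_all_seen(math.ceil(poisson)):.6f}")
```

Running it should reproduce the thread's numbers: 9,657,740 seeds for one given combination and roughly 40.2 million for all of them, with the union bound slightly above the Poisson figure and the exact probability at the Poisson sample size essentially 0.99.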
