mersenneforum.org  

Old 2015-02-02, 04:46   #1
c10ck3r
 
Aug 2010
Kansas

547 Posts
Theoretical Experiment Design

Okay, I know there's probably a (relatively) easy way to figure this out, but I'm having trouble finding material to figure out how large of a sample I would need to fulfill the following criteria:
Background: Bread wheat (Triticum aestivum) is an allohexaploid that behaves like a diploid at meiosis (2n=6x=42) and self-fertilizes. If parents A and B are crossed, and the progeny are allowed to self-fertilize until all lines are true-breeding (recombinant inbred lines), how many end plants would be required to be 99% sure that one of every chromosome combination is represented?

There are 2^21=2097152 possibilities, and for the sake of simplicity we'll assume that enough F1 (hybrid) seed is produced to cover the full spectrum. Each combination, then, has a 1/2097152 chance of occurring. How many extra/total seed would be needed to cover the possibility of the same genotype occurring multiple times?

Any material references and/or quick, dirty hacks are appreciated!
Thanks!
Old 2015-02-02, 06:19   #2
axn
 
Jun 2003

2^2×5^2×47 Posts

The probability of a particular combination not appearing in x seeds is (1-1/2097152)^x. So we need to find x such that this expression < 0.01 (i.e. 99% certainty that a particular combination would appear). Taking logs,

x * log(1-1/2097152) < log(0.01), i.e. x > log(0.01) / log(1-1/2097152) (the inequality flips because log(1-1/2097152) is negative), which gives x = 9,657,740 (or about 4.6x the number of combinations)
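
The arithmetic above can be reproduced with a short Python sketch. The function name `seeds_for_one_genotype` is made up for illustration, and the model is the simplified one from the thread (each seed is an independent, uniform draw from the 2^21 combinations).

```python
import math

N = 2 ** 21  # number of chromosome combinations (2097152)

def seeds_for_one_genotype(confidence: float, n: int = N) -> int:
    """Smallest x with (1 - 1/n)**x <= 1 - confidence."""
    miss = 1.0 - confidence
    # log1p(-1/n) is negative, so dividing flips the inequality to x >= ...
    return math.ceil(math.log(miss) / math.log1p(-1.0 / n))

x = seeds_for_one_genotype(0.99)
print(x, round(x / N, 2))  # seed count and its ratio to N
```

`math.log1p` is used instead of `math.log(1 - 1/n)` only to avoid losing precision when 1/n is tiny.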
Old 2015-02-02, 17:04   #3
chris2be8
 
Sep 2009

7·271 Posts

I think that gives a 99% probability of any given combination of chromosomes being in the set. To ensure you have *every* combination you need a lot more. I think making the expression less than 0.01*1/2097152 would work, but that's probably not a practical amount of seed.
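
The threshold suggested here is a union bound: if each genotype is missed with probability below 0.01/2097152, the chance of missing any of them is below 0.01. A rough Python sketch of that calculation follows, under the same uniform-random-draw assumption as the rest of the thread; the variable names are illustrative.

```python
import math

N = 2 ** 21  # number of chromosome combinations

# Union bound: P(some genotype missing) <= N * (1 - 1/N)**x.
# Requiring each genotype's miss probability to be below 0.01 / N
# therefore guarantees P(all genotypes present) >= 0.99.
target = 0.01 / N
x_union = math.ceil(math.log(target) / math.log1p(-1.0 / N))

print(x_union, round(x_union / N, 2))  # seed count and its ratio to N
```

Under these assumptions the result is roughly 19 times the number of combinations, i.e. on the order of 40 million seeds.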

Chris
Old 2015-02-02, 20:36   #4
wblipp
 
"William"
May 2003
New Haven

2·3^2·131 Posts

I have problems with the problem statement.

The first problem is chromosome crossover. The problem statement appears to assume that chromosomes will remain intact through the exercise. But real chromosomes swap off, so that genes that start on the same chromosome can end up on different chromosomes.

My next problem is that I don't understand the design setup. When I cross A and B, how many seeds do I keep? When I cross a child with itself, how many seeds do I keep?

The proposals appear to be addressing the mathematical question "suppose I select genotypes at random. How many do I need to select to have a 99% probability of having selected all genotypes." It is not clear the breeding program corresponds to this mathematical question.

The exact solution to that perhaps inappropriate mathematical question can be found by creating the transition matrix for the random walk on the state space of "number of distinct genotypes selected" and iterating until the desired probability is reached.
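
A minimal sketch of that random walk, for illustration only. It tracks the probability of having seen exactly k distinct genotypes after each draw, which is equivalent to repeatedly applying the transition matrix. The function name and the toy sizes are assumptions; at N = 2^21 this pure-Python loop would be far too slow to run as written.

```python
def draws_for_full_coverage(n_types: int, confidence: float) -> int:
    """Smallest number of uniform random draws that shows all n_types
    with probability >= confidence (coupon-collector Markov chain)."""
    # dist[k] = probability that exactly k distinct types have been seen.
    dist = [0.0] * (n_types + 1)
    dist[0] = 1.0
    draws = 0
    while dist[n_types] < confidence:
        new = [0.0] * (n_types + 1)
        for k, p in enumerate(dist):
            if p == 0.0:
                continue
            # The next draw repeats one of the k types already seen.
            new[k] += p * k / n_types
            # The next draw is one of the n_types - k unseen types.
            if k < n_types:
                new[k + 1] += p * (n_types - k) / n_types
        dist = new
        draws += 1
    return draws

# Toy case: 3 chromosome pairs give 2**3 = 8 combinations.
print(draws_for_full_coverage(8, 0.99))
```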

For an approximate solution to the perhaps inappropriate mathematical question, chris2be8 is on the right track. The probability that a particular genotype has been missed in a sample of size x is

px=(1-1/2097152)^x

This event of "missing the particular genotype" will either occur, or it won't occur, so the expected number of times this event occurs is also px. But we are not interested in just this particular genotype - we are interested in all of them, each with the same expected number of occurrences. So we are interested in many events with

total expected number of events = a = 2097152*px.

When dealing with many events that each have a small and independent probability of occurring, the Poisson approximation is appropriate - this is close enough. The Poisson approximation for the probability of exactly zero events occurring is exp(-a). So if you want to be 99% certain of having all genotypes, you need a to be small enough that

exp(-a) = 0.99

a = -ln(0.99) = 0.01005

px = a / 2097152 = 4.79e-9

x = ln(px) / ln(1-1/2097152) = 40.2e6

Remember this may be the correct answer to the wrong mathematical question.
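
The Poisson steps above can be checked numerically with a few lines of Python. This is only a sketch of the same approximation; the variable names mirror the symbols a, px and x used in the post.

```python
import math

N = 2 ** 21          # number of genotypes
confidence = 0.99    # desired probability of having every genotype

a = -math.log(confidence)                  # expected number of missing genotypes
px = a / N                                 # allowed miss probability per genotype
x = math.log(px) / math.log1p(-1.0 / N)    # sample size that achieves px

print(f"a  = {a:.5f}")
print(f"px = {px:.3e}")
print(f"x  = {x / 1e6:.1f}e6")
```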
Old 2015-02-03, 01:48   #5
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

19×461 Posts

Quote:
Originally Posted by wblipp
The proposals appear to be addressing the mathematical question "suppose I select genotypes at random. How many do I need to select to have a 99% probability of having selected all genotypes." It is not clear the breeding program corresponds to this mathematical question.

<snip>

Remember this may be the correct answer to wrong mathematical question
Me has advanced discussions with the little LaurV in the house about genetics. The little one is studying it at school and she constantly succeeds in asking difficult questions to which we can't answer and need to go back to the books. Me thinks she likes it (both the genetics and putting us in the corner).

Last fiddled with by LaurV on 2015-02-03 at 01:49
Old 2015-02-03, 03:55   #6
c10ck3r
 
Aug 2010
Kansas

547 Posts

wblipp: The question was simplified to exclude crossing over and logistical concerns because I was more interested in figuring out how to come up with a rough estimate than in actually getting an answer. So, is the 9.7m estimate or the 40.2m estimate correct? Using the rule of "eh, that sounds about right" I would've expected it to be closer to 10m than 40...

Background: My degree program deals with genetics, and work/research has me doing QTL research with wheat, but I haven't had advanced stats classes yet (they don't start until grad school). I've had practical exposure to the trials themselves, but not (yet) to design methods. I was just trying to get a grasp of how many different RILs/DH lines (recombinant inbred lines/doubled haploid) would be required to give a full characterization of polygenic action. In theory, 44 lines should be enough to cover single genes (2 parents, plus 2 genotypes that differ from each other in only 1 chromosome pair for each of the 21 chromosome pairs: 2 + 2×21 = 44).
Old 2015-02-03, 05:09   #7
wblipp
 
wblipp's Avatar
 
"William"
May 2003
New Haven

2×3^2×131 Posts

Quote:
Originally Posted by c10ck3r
I was just trying to get a grasp of how many different RIL's/ DH lines (recombinant inbred lines/doubled haploid) would be required to give a full characterization of polygenic action.
I don't understand the genetics jargon well enough to translate this into a mathematical question. This sounds closer to "how many random genotypes to have a 99% chance of covering every chromosome in the original A and B plants?" Previously we were working on having every possible combination of chromosomes. But I would expect this question to have an answer in the hundreds or thousands - so perhaps this isn't your mathematical question.
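
One possible reading of the "covering every chromosome" question can be sketched in Python. Everything here is an assumption made for illustration, not something stated in the thread: each line independently carries the A or B version of each of the 21 chromosome pairs with probability 1/2, and "covered" means both versions of every pair show up at least once. The function name is hypothetical.

```python
def lines_to_cover_each_chromosome(pairs: int = 21, confidence: float = 0.99) -> int:
    """Smallest number of random lines such that, with probability >= confidence,
    every chromosome pair shows both the A and the B version at least once."""
    x = 1
    while True:
        # A given pair is uncovered if all x lines carry A, or all x carry B.
        uncovered = 2 * 0.5 ** x
        # Pairs are assumed independent, so the coverage probabilities multiply.
        if (1 - uncovered) ** pairs >= confidence:
            return x
        x += 1

print(lines_to_cover_each_chromosome())
```

Under this particular reading the answer is far smaller than the all-combinations figure, which suggests it may not be the question that was intended either.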

I went to the trouble of explaining each step of the 40 million estimate. If you think it's the right mathematical question, then you should examine where you disagree with the logic rather than dismissing it because of preconceived notions.
Old 2015-02-03, 08:54   #8
axn
 
Jun 2003

11134_8 Posts

Quote:
Originally Posted by wblipp
I went to the trouble of explaining each step of the 40 million estimate. If you think it's the right mathematical question, then you should examine where you disagree with the logic rather than dismissing it because of preconceived notions.
Perhaps OP is unable to follow the mathematical argumentation and is thus unable to decide by himself which of the two solutions posted is correct?

FWIW, you (and chris2be8) are correct.

@OP: My formulation works for _any_ given combination, but for _all_ combinations together, we will have a combined probability of 0.99^2097152 = 2.2x10^-9154. We actually need the individual probability, p, to be much closer to 1 such that p^2097152 = 0.99, which gives p = 0.9999999952. Plugging 1-p = 4.8e-9 back into my original formulation in place of 0.01 gives x=40.17e6, which is basically wblipp's answer (except he derived it much more elegantly).
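
The numbers in this post can be re-derived with a short Python sketch. It works in logarithms because 0.99^2097152 underflows ordinary floating point; the variable names are illustrative, and q stands for 1-p, the allowed miss probability per genotype.

```python
import math

N = 2 ** 21  # number of chromosome combinations

# If each genotype is present with probability only 0.99, the chance that
# all N are present is 0.99**N, computed here via base-10 logs.
log10_all = N * math.log10(0.99)
exponent = math.floor(log10_all)
mantissa = 10 ** (log10_all - exponent)
print(f"0.99^N ~ {mantissa:.1f}e{exponent}")

# Per-genotype miss probability q such that (1 - q)**N = 0.99.
q = -math.expm1(math.log(0.99) / N)
x = math.log(q) / math.log1p(-1.0 / N)
print(f"p = {1 - q:.10f}, x = {x / 1e6:.2f}e6")
```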
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.