Segment 13 Sanmit Narvekar

From Computational Statistics Course Wiki

Segment 13

To Calculate

1. With $p=0.3$, and various values of $n$, how big is the largest discrepancy between the Binomial probability pdf and the approximating Normal pdf? At what value of $n$ does this value become smaller than $10^{-15}$?
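One way to explore this question numerically is sketched below. It is only an illustration under stated assumptions: the Normal approximation is taken to have mean $np$ and variance $np(1-p)$, the discrepancy is measured at the integer points $k = 0, \dots, n$, and the function names (`binom_pmf`, `normal_pdf`, `max_discrepancy`) are hypothetical helpers, not part of the course material.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials, computed in log space."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def normal_pdf(x, mean, var):
    """Normal density with the given mean and variance, evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def max_discrepancy(n, p=0.3):
    """Largest absolute difference between the two curves over k = 0..n.

    Assumption: the approximating Normal has mean n*p and variance n*p*(1-p).
    """
    mean, var = n * p, n * p * (1 - p)
    return max(abs(binom_pmf(k, n, p) - normal_pdf(k, mean, var))
               for k in range(n + 1))

# Tabulate the discrepancy for a few values of n to see the trend.
for n in (10, 100, 1000, 10000):
    print(n, max_discrepancy(n))
```

The sketch only tabulates the discrepancy for moderate $n$; it does not try to locate the $10^{-15}$ threshold. Double-precision log-gamma arithmetic loses accuracy for very large $n$, so extrapolating the observed trend is likely more reliable than brute force.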


2. Show that if four random variables are (together) multinomially distributed, each separately is binomially distributed.

Consider 4 random variables with number of occurrences $A, B, C, D$ and probabilities of occurrence $P_A, P_B, P_C, P_D$ respectively, such that $A + B + C + D = n$ and $P_A + P_B + P_C + P_D = 1$.

The multinomial distribution over these variables is given below, with some abuse of notation. Think of it as first choosing the $A$ spots for the first random variable, each with probability $P_A$, then choosing the $B$ spots for the second out of the ones remaining, and so on:

$$P(A, B, C, D) = \binom{A+B+C+D}{A} (P_A)^A \binom{B+C+D}{B} (P_B)^B \binom{C+D}{C} (P_C)^C \binom{D}{D} (P_D)^D$$

To get the distribution of $A$ alone, sum over the other counts, applying the binomial theorem at each step. Summing over every way of splitting the $C+D$ remaining spots between $C$ and $D$ (with $C+D = n-A-B$ held fixed):

$$P(A, B) = \binom{A+B+C+D}{A} (P_A)^A \binom{B+(C+D)}{B} (P_B)^B (P_C + P_D)^{C+D}$$

Summing next over $B$ (with $B+C+D = n-A$ held fixed):

$$P(A) = \binom{A+(B+C+D)}{A} (P_A)^A (P_B + P_C + P_D)^{B+C+D}$$

And using our notation from above, with $A+B+C+D = n$ and $P_B + P_C + P_D = 1 - P_A$,

$$P(A) = \binom{n}{A} (P_A)^A (1-P_A)^{n-A}$$

gives the familiar form of the binomial distribution, where one outcome is A and the other is not A (that is, B, C, or D). The same calculation applies to each of the other variables by changing the order in which spots are selected.
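The result can also be checked numerically. The sketch below sums the four-variable multinomial probability over all values of B, C, and D for a fixed A and compares the total with the binomial formula. The values of `n` and `probs` are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math

def multinomial_pmf(counts, probs):
    """Multinomial probability n!/(A! B! C! D!) * P_A^A * P_B^B * P_C^C * P_D^D."""
    coef = math.factorial(sum(counts))
    for c in counts:
        coef //= math.factorial(c)  # each division is exact
    prob = float(coef)
    for c, p in zip(counts, probs):
        prob *= p ** c
    return prob

def marginal_of_first(a, n, probs):
    """Sum the joint probability over all (B, C, D) with B + C + D = n - a."""
    total = 0.0
    for b in range(n - a + 1):
        for c in range(n - a - b + 1):
            d = n - a - b - c
            total += multinomial_pmf((a, b, c, d), probs)
    return total

def binomial_pmf(a, n, p):
    """Binomial probability of a occurrences in n trials."""
    return math.comb(n, a) * p ** a * (1 - p) ** (n - a)

# Illustrative values only; any probabilities summing to 1 should work.
n = 12
probs = (0.1, 0.2, 0.3, 0.4)
for a in range(n + 1):
    assert math.isclose(marginal_of_first(a, n, probs),
                        binomial_pmf(a, n, probs[0]), rel_tol=1e-9)
```

Reordering `probs` so that a different entry comes first checks the same statement for B, C, or D.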

To Think About

1. The segment suggests that $A\ne T$ and $C\ne G$ comes about because genes are randomly distributed on one strand or the other. Could you use the observed discrepancies to estimate, even roughly, the number of genes in the yeast genome? If so, how? If not, why not?


2. Suppose that a Bayesian thinks that the prior probability of the hypothesis that "$P_A=P_T$" is 0.9, and that the set of all hypotheses that "$P_A\ne P_T$" have a total prior of 0.1. How might he calculate the odds ratio $\text{Prob}(P_A=P_T)/\text{Prob}(P_A\ne P_T)$? Hint: Are there nuisance variables to be marginalized over?