# Eleisha's Segment 6: The Towne Family Tree

To Calculate:

1. Write down an explicit expression for what the slides denote as bin(n,N,r).

$\displaystyle bin(n, N, r) = \binom{N}{n} r^n(1-r)^{N-n}$
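
As a quick sanity check, the expression above can be written out directly in Python. This is only a minimal sketch: the function name `bin_pmf` is my own (hypothetical, not from the slides), and the example value r = 0.0038 is borrowed from the "To Think About" questions further down.

```python
from math import comb

def bin_pmf(n, N, r):
    """Probability of exactly n events in N independent trials, each with probability r."""
    return comb(N, n) * r**n * (1 - r)**(N - n)

# Example: probability of zero events when N = 1 * 37 and r = 0.0038
p_zero = bin_pmf(0, 1 * 37, 0.0038)

# The probabilities over all possible n (0 through N) should sum to 1
total = sum(bin_pmf(n, 37, 0.0038) for n in range(38))

print(p_zero, total)
```

For n = 0 the binomial coefficient and the r^n factor are both 1, so the expression reduces to (1-r)^N, which is a handy check on the implementation.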

2. There is a small error on slide 7 that carries through to the first equation on slide 8 and the graph on slide 9. Find the error, fix it, and redo the graph of slide 9. Does it make a big difference? Why or why not?

There are two errors on slide seven, only one of which carries over to slide eight. The first is on the T = 5 line: $\displaystyle bin(1, 10 \times 37, r)$ should be $\displaystyle bin(3, 10 \times 37, r)$. This typo is already corrected on slide eight.

The other error is a missing term: there should also be a $\displaystyle \Delta 0 = bin(0, 1 \times 37, r)$.

This error does carry over onto slide eight. Therefore $\displaystyle P(data|r)$ is wrong as written. It should be:

$\displaystyle P(data|r) = bin(0, 1 \times 37, r) \, bin(0, 2 \times 37, r) \, bin(0, 3 \times 37, r) \, bin(1, 5 \times 37, r) \, bin(0, 5 \times 37, r) \, bin(0, 6 \times 37, r)$ $\displaystyle \times \, bin(1, 10 \times 37, r) \, bin(3, 10 \times 37, r)$

Here is a graph of the two distributions. The blue is the original distribution and the red is the updated distribution.

It does not make much of a difference: the overall shape and position of the distribution are the same. Any conclusions drawn about the probability of r given the data will be similar.

Here is some Python code that can be used to plot the original distribution versus the new one:


from scipy.stats import binom
import numpy as np
import matplotlib.pyplot as plt

x_points = np.arange(0.0001, 0.02, 0.0005) #Evenly spaced r values from 0.0001 up to (but not including) 0.02
old_r_values = []
for r in x_points:
    prior = float(1/r) #Log-uniform prior
    r_old_prob = (binom.pmf(0, 3*37, r)*binom.pmf(0, 3*37, r)*binom.pmf(1, 5*37, r)*binom.pmf(0, 5*37, r)
                  *binom.pmf(0, 6*37, r)*binom.pmf(1, 11*37, r)*binom.pmf(3, 10*37, r)*prior)
    old_r_values.append(r_old_prob) #Original formula for P(data|r)*(1/r)

new_r_values = []
for r in x_points:
    prior = float(1/r) #Log-uniform prior
    r_new_prob = (binom.pmf(0, 1*37, r)*binom.pmf(0, 2*37, r)*binom.pmf(0, 3*37, r)*binom.pmf(1, 5*37, r)
                  *binom.pmf(0, 5*37, r)*binom.pmf(0, 6*37, r)*binom.pmf(1, 10*37, r)*binom.pmf(3, 10*37, r)*prior)
    new_r_values.append(r_new_prob) #Updated formula for P(data|r)*(1/r)

#Lines that produce the plot
p1 = plt.plot(x_points, old_r_values, color = 'blue') #Original distribution
p2 = plt.plot(x_points, new_r_values, color = 'red') #Updated distribution
plt.xticks([0.0, 0.005, 0.010, 0.015, 0.02], ["0.0", "0.005", "0.010", "0.015", "0.02"])
plt.yticks([0.0, 0.1, 0.2, 0.3, 0.4, 0.5])
plt.xlabel("r")
plt.ylabel("Unnormalized P(r|data)") #The curves are P(data|r)*(1/r); they are not divided by a normalizing constant
plt.savefig("HW_6.png", format = None)
plt.show()
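
The claim that the correction changes little can also be checked numerically rather than by eye. The sketch below is one possible way to do that, not part of the original assignment. It normalizes both curves on a grid and compares their modes and means. The (n, generations) pairs are copied from the plotting code on this page; the helper names (`bin_pmf`, `unnormalized_posterior`, `summarize`) and the finer grid spacing are my own assumptions.

```python
from math import comb

def bin_pmf(n, N, r):
    """Binomial probability of n events in N trials with per-trial probability r."""
    return comb(N, n) * r**n * (1 - r)**(N - n)

# (n, generations) pairs copied from the plotting code on this page
OLD_TERMS = [(0, 3), (0, 3), (1, 5), (0, 5), (0, 6), (1, 11), (3, 10)]
NEW_TERMS = [(0, 1), (0, 2), (0, 3), (1, 5), (0, 5), (0, 6), (1, 10), (3, 10)]

def unnormalized_posterior(r, terms):
    """P(data|r) times the log-uniform prior 1/r."""
    value = 1.0 / r
    for n, generations in terms:
        value *= bin_pmf(n, generations * 37, r)
    return value

def summarize(terms, grid):
    """Normalize over the grid and return (mode, mean) of r."""
    weights = [unnormalized_posterior(r, terms) for r in grid]
    total = sum(weights)
    probs = [w / total for w in weights]
    mode = grid[probs.index(max(probs))]
    mean = sum(r * p for r, p in zip(grid, probs))
    return mode, mean

# Assumed grid: r from 0.0001 to just under 0.02, finer than the plotting grid
grid = [0.0001 + 0.00005 * i for i in range(400)]

old_mode, old_mean = summarize(OLD_TERMS, grid)
new_mode, new_mean = summarize(NEW_TERMS, grid)
print("original:", old_mode, old_mean)
print("updated: ", new_mode, new_mean)
```

If the two modes and the two means agree to within a small fraction of the width of either curve, that supports the conclusion that inferences about r are essentially unchanged.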


To Think About:

1. Suppose you knew the value of r (say, r = 0.0038). How would you simulate many instances of the Towne family data (e.g., the tables on slides 4 and 5)?

2. How would you use your simulation to decide if the assumption of ignoring back mutations (the red note on slide 7) is justified?

3. How would you use your simulation to decide if our decision to trim T2, T11, and T13 from the estimation of r was justified? (This question anticipates several later discussions in the course, but thinking about it now will be a good start.)
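
As a starting point for question 1, here is a minimal sketch of one way such a simulation might look. It is not an official answer, and it makes several assumptions that should be stated loudly: each comparison is treated as an independent binomial draw with N = generations × 37 trials (the same N values as in the corrected P(data|r) above), back mutations are ignored, and the shared structure of the family tree is not modeled. The names `GENERATIONS`, `simulate_once`, and `simulate_many` are hypothetical.

```python
import random

# Generation counts taken from the corrected P(data|r) on this page;
# each is multiplied by 37 to get the number of trials (an assumption of this sketch).
GENERATIONS = [1, 2, 3, 5, 5, 6, 10, 10]
TRIALS_PER_GENERATION = 37

def simulate_once(r, rng):
    """Draw one simulated data set: a mutation count for each comparison."""
    counts = []
    for g in GENERATIONS:
        n_trials = g * TRIALS_PER_GENERATION
        # Each trial independently produces a mutation with probability r
        counts.append(sum(1 for _ in range(n_trials) if rng.random() < r))
    return counts

def simulate_many(r, n_sims, seed=0):
    """Draw n_sims independent simulated data sets with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    return [simulate_once(r, rng) for _ in range(n_sims)]

sims = simulate_many(r=0.0038, n_sims=1000)
mean_total = sum(sum(s) for s in sims) / len(sims)
print(sims[0], mean_total)
```

For questions 2 and 3, a sketch like this would need to be extended, for example by tracking the state of each individual marker down the tree so that back mutations and the trimmed individuals can be represented explicitly.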

Class Activity

I was in a group with Ellen and Rene. Our in-class solution can be viewed on Segment 6...The Towne Family Tree.

Back to: Eleisha Jackson