Eleisha's Segment 6: The Towne Family Tree

From Computational Statistics Course Wiki

Latest revision as of 11:43, 18 February 2014

To Calculate:

1. Write down an explicit expression for what the slides denote as bin(n,N,r).

$$bin(n, N, r) = \binom{N}{n} r^n (1-r)^{N-n}$$
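As a quick numerical check of the expression above, the following sketch evaluates it directly and confirms that the probabilities over all n sum to 1. The helper name `bin_explicit` and the example values of N and r are illustrative choices, not taken from the slides; the same quantity is what `scipy.stats.binom.pmf(n, N, r)` computes.

```python
from math import comb


def bin_explicit(n, N, r):
    """Probability of exactly n events in N independent trials, each with probability r."""
    return comb(N, n) * r**n * (1 - r)**(N - n)


# Illustrative values (assumed): 1 event in 5 * 37 trials at r = 0.0038
N, r = 5 * 37, 0.0038
print(bin_explicit(1, N, r))

# Sanity check: the probabilities over n = 0..N should sum to 1
total = sum(bin_explicit(n, N, r) for n in range(N + 1))
print(total)
```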

2. There is a small error on slide 7 that carries through to the first equation on slide 8 and the graph on slide 9. Find the error, fix it, and redo the graph of slide 9. Does it make a big difference? Why or why not?

There are two errors on slide seven, only one of which carries over to slide eight. The first is at T = 5: $bin(1, 10 \times 37, r)$ should be $bin(3, 10 \times 37, r)$. This typo is already corrected on slide eight.

The other error is a missing term: there should be a $\Delta 0 = bin(0, 1 \times 37, r)$ term.

This error does carry over onto slide eight. Therefore $P(data|r)$ is wrong as written. It should be:

$$P(data|r) = bin(0, 1 \times 37, r) \, bin(0, 2 \times 37, r) \, bin(0, 3 \times 37, r) \, bin(1, 5 \times 37, r) \, bin(0, 5 \times 37, r) \, bin(0, 6 \times 37, r) \times bin(1, 10 \times 37, r) \, bin(3, 10 \times 37, r)$$
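One way to keep the corrected product easy to audit is to store each factor as a pair (n, k), meaning bin(n, k × 37, r), and multiply in a loop. The sketch below uses only the numbers in the equation above; the names `TERMS` and `likelihood` are illustrative. As a cross-check it relies on the fact that a product of binomials sharing one r is maximized where r equals the total count divided by the total number of trials.

```python
from math import comb

# (n, k) for each factor bin(n, k * 37, r) of the corrected P(data|r)
TERMS = [(0, 1), (0, 2), (0, 3), (1, 5), (0, 5), (0, 6), (1, 10), (3, 10)]


def likelihood(r):
    """Corrected P(data|r): product of bin(n, k * 37, r) over all terms."""
    p = 1.0
    for n, k in TERMS:
        N = k * 37
        p *= comb(N, n) * r**n * (1 - r)**(N - n)
    return p


total_n = sum(n for n, _ in TERMS)        # 5
total_N = sum(k * 37 for _, k in TERMS)   # 1554
r_mle = total_n / total_N                 # value of r that maximizes the product

print(likelihood(0.0038))
print(r_mle, likelihood(r_mle))
```

Note that this maximizes the likelihood alone; multiplying by a prior such as 1/r shifts the peak.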

Here is a graph of the two distributions. The blue is the original distribution and the red is the updated distribution.

[Figure: HW 6.png]

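It does not make much of a difference: the overall shape and position of the distribution are nearly the same, so any conclusions drawn about the probability of r given the data will be similar. The missing factor $bin(0, 1 \times 37, r) = (1-r)^{37}$ accounts for only 37 of the $42 \times 37 = 1554$ trials in the corrected likelihood, so it changes the product only slightly.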
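To put a rough number on "not much of a difference", the sketch below compares the posterior mean of r with and without the bin(0, 1 × 37, r) factor. It isolates only that one missing factor, so it is not a reproduction of the slide's original formula. The 1/r prior, the grid step, and the upper limit of 0.02 are assumptions for this illustration, and the function names are hypothetical.

```python
from math import comb

# (n, k) for each factor bin(n, k * 37, r) of the corrected P(data|r)
TERMS = [(0, 1), (0, 2), (0, 3), (1, 5), (0, 5), (0, 6), (1, 10), (3, 10)]


def unnormalized_posterior(r, terms):
    """Product of bin(n, k * 37, r) over the given terms, times a 1/r prior."""
    p = 1.0 / r
    for n, k in terms:
        N = k * 37
        p *= comb(N, n) * r**n * (1 - r)**(N - n)
    return p


def posterior_mean(terms, step=1e-5, r_max=0.02):
    """Mean of r under the posterior normalized on an evenly spaced grid over (0, r_max]."""
    grid = [step * i for i in range(1, int(round(r_max / step)) + 1)]
    weights = [unnormalized_posterior(r, terms) for r in grid]
    return sum(r * w for r, w in zip(grid, weights)) / sum(weights)


mean_with = posterior_mean(TERMS)         # all eight factors
mean_without = posterior_mean(TERMS[1:])  # drop bin(0, 1 * 37, r)
print(mean_with, mean_without)
print((mean_without - mean_with) / mean_with)  # relative shift in the mean
```

Under these assumptions the two means differ by only a few percent, which is small compared with the width of either curve.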


Here is some code that can be used to plot the original distribution versus the new one:

 
from scipy.stats import binom
import numpy as np
import matplotlib.pyplot as plt

# Evenly spaced r values from 0.0001 up to (but not including) 0.02, in steps of 0.0005.
# The grid starts above zero because the 1/r prior diverges at r = 0.
x_points = np.arange(0.0001, 0.02, 0.0005)

# Original formula for P(data|r)*(1/r)
old_r_values = []
for r in x_points:
    prior = 1.0 / r  # Log-uniform prior
    r_old_prob = (binom.pmf(0, 3*37, r) * binom.pmf(0, 3*37, r) * binom.pmf(1, 5*37, r)
                  * binom.pmf(0, 5*37, r) * binom.pmf(0, 6*37, r) * binom.pmf(1, 11*37, r)
                  * binom.pmf(3, 10*37, r) * prior)
    old_r_values.append(r_old_prob)

# Updated formula for P(data|r)*(1/r)
new_r_values = []
for r in x_points:
    prior = 1.0 / r  # Log-uniform prior
    r_new_prob = (binom.pmf(0, 1*37, r) * binom.pmf(0, 2*37, r) * binom.pmf(0, 3*37, r)
                  * binom.pmf(1, 5*37, r) * binom.pmf(0, 5*37, r) * binom.pmf(0, 6*37, r)
                  * binom.pmf(1, 10*37, r) * binom.pmf(3, 10*37, r) * prior)
    new_r_values.append(r_new_prob)

# Lines that produce the plot (blue = original, red = updated)
p1 = plt.plot(x_points, old_r_values, color='blue')
p2 = plt.plot(x_points, new_r_values, color='red')
plt.xticks([0.0, 0.005, 0.010, 0.015, 0.02], ["0.0", "0.005", "0.010", "0.015", "0.02"])
plt.yticks([0.0, 0.1, 0.2, 0.3, 0.4, 0.5])
plt.xlabel("r")
# The plotted values are P(data|r)*(1/r); they are not rescaled to integrate to 1
plt.ylabel("Unnormalized P(r|data)")
plt.savefig("HW_6.png", format=None)
plt.show()

To Think About:

1. Suppose you knew the value of r (say, r = 0.0038). How would you simulate many instances of the Towne family data (e.g., the tables on slides 4 and 5)?

2. How would you use your simulation to decide if the assumption of ignoring back mutations (the red note on slide 7) is justified?

3. How would you use your simulation to decide if our decision to trim T2, T11, and T13 from the estimation of r was justified? (This question anticipates several later discussions in the course, but thinking about it now will be a good start.)
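As a starting point for question 1, here is a minimal sketch of one possible simulation, not the course's answer. It draws only the counts that enter P(data|r), one binomial count per factor bin(n, k × 37, r), rather than the full tables on slides 4 and 5, and like the model it ignores back mutations. The multipliers k are taken from the corrected formula above; the names `MULTIPLIERS`, `R_TRUE`, and `simulate_counts`, the seed, and the number of simulations are illustrative.

```python
import random

MULTIPLIERS = [1, 2, 3, 5, 5, 6, 10, 10]  # k in each bin(n, k * 37, r) factor
R_TRUE = 0.0038                           # the example value from question 1


def simulate_counts(r, rng):
    """Draw one simulated data set: a binomial count for each factor."""
    counts = []
    for k in MULTIPLIERS:
        trials = k * 37
        # Each trial is an independent event with probability r
        counts.append(sum(1 for _ in range(trials) if rng.random() < r))
    return counts


rng = random.Random(0)  # fixed seed so the run is repeatable
sims = [simulate_counts(R_TRUE, rng) for _ in range(1000)]

# Average total count per simulated data set, next to its expected value r * 42 * 37
mean_total = sum(sum(c) for c in sims) / len(sims)
print(mean_total, R_TRUE * sum(MULTIPLIERS) * 37)
```

Comparing the observed counts (0, 0, 0, 1, 0, 0, 1, 3) with the spread of the simulated counts is one way to begin on questions 2 and 3, although checking the back-mutation assumption would need a per-locus simulation that this sketch does not attempt.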


Class Activity

I was in a group with Ellen and Rene. Our in-class solution can be viewed on Segment 6...The Towne Family Tree

Back to: Eleisha Jackson