# (Rene) Segment 6: The Town Family Tree

## Contents

### Problems

#### To Compute

1. Write down an explicit expression for what the slides denote as bin(k,n,r).

$\displaystyle bin(k,n,r) = \binom{n}{k} r^{k} \cdot (1-r)^{n-k} \quad \text{where} \quad \binom{n}{k} = \frac{n !}{k ! (n-k) !}$

Here n is the total number of trials, k is the number of successes, and r is the probability of success on each trial.
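For concreteness, this translates directly into Python (`math.comb` is the binomial coefficient; the example rate r = 0.0038 is the value quoted later in the problem set):

```python
from math import comb

def bin_pmf(k, n, r):
    """Binomial probability of k successes in n trials, each with success rate r."""
    return comb(n, k) * r**k * (1 - r)**(n - k)

# Example: probability of exactly 1 mutation over 10 generations x 37 markers
p = bin_pmf(1, 10 * 37, 0.0038)
```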

2. There is a small error on slide 7 that carries through to the first equation on slide 8 and the graph on slide 9. Find the error, fix it, and redo the graph of slide 9. Does it make a big difference? Why or why not?

Since Samuel Towne (3rd generation) has zero mutations and we disregard back-mutations we can conclude that Jacob Towne (first generation) had zero mutations. Consequently, we have to correct the following probabilities,

$\displaystyle \text{Jacob Towne (JT):} \quad P(JT) = bin(0,1 \times 37,r)$

$\displaystyle \text{Samuel Towne (ST):} \quad P(ST) = bin(0,2 \times 37,r)$

$\displaystyle \text{John Doe (T4):} \quad P(T4) = bin(1,10 \times 37,r)$

Hence, we have

$\displaystyle P(Data | r) = bin(0,1 \times 37,r) \cdot bin(0,2 \times 37,r) \cdot bin(0,3 \times 37,r) \cdot bin(1,5 \times 37,r) \cdot bin(0,5 \times 37,r) \cdot bin(0,6 \times 37,r) \cdot bin(1,10 \times 37,r) \cdot bin(3,10 \times 37,r)$

Using Bayes' rule, we have

$\displaystyle P(r|Data) \propto P(Data|r) \cdot P(r)$

Using the log-uniform prior P(r) = 1/r, we can compute the above expression for a range of values of r and plot the resulting posterior.

The original and corrected posteriors are very close together, so the correction does not make a big difference: the handful of corrected factors carry little weight in the overall product.
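A minimal sketch of this computation, with the (k, n) pairs read off the corrected likelihood product above; the grid range for r is a choice, not given in the source:

```python
import numpy as np
from math import comb

def bin_pmf(k, n, r):
    """Binomial pmf; works elementwise when r is a NumPy array."""
    return comb(n, k) * r**k * (1 - r)**(n - k)

# (k, n) pairs for each factor in the corrected product above
data = [(0, 1*37), (0, 2*37), (0, 3*37), (1, 5*37),
        (0, 5*37), (0, 6*37), (1, 10*37), (3, 10*37)]

r = np.linspace(1e-5, 0.03, 2000)   # grid of candidate mutation rates (assumed range)
like = np.ones_like(r)
for k, n in data:
    like = like * bin_pmf(k, n, r)

post = like / r                     # log-uniform prior P(r) = 1/r
dr = r[1] - r[0]
post = post / (post.sum() * dr)     # normalize numerically on the grid
r_map = r[np.argmax(post)]          # posterior mode
```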

#### To Think About

1. Suppose you knew the value of r (say, r = 0.0038). How would you simulate many instances of the Towne family data (e.g., the tables on slides 4 and 5)?

2. How would you use your simulation to decide if the assumption of ignoring back-mutations (the red note on slide 7) is justified?

3. How would you use your simulation to decide if our decision to trim T2, T11, and T13 from the estimation of r was justified? (This question anticipates several later discussions in the course, but thinking about it now will be a good start.)
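One way to sketch the simulation in question 1. The generation depths below are read off the likelihood product above (the slide tables themselves are not reproduced here), and back-mutations are ignored, matching the model being tested:

```python
import numpy as np

rng = np.random.default_rng(0)
r = 0.0038          # assumed known per-marker, per-generation mutation rate
markers = 37
# generations separating each tested descendant from the founder,
# taken from the factors of the likelihood product above
generations = [1, 2, 3, 5, 5, 6, 10, 10]

def simulate_family(rng):
    """One synthetic data set: a mutation count per descendant, no back-mutations."""
    return [rng.binomial(g * markers, r) for g in generations]

samples = [simulate_family(rng) for _ in range(10000)]
```

Comparing summary statistics of such simulated tables against the observed data (with and without the trimmed lines, or with a back-mutation process added) is one route into questions 2 and 3.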

#### Class activity: Multinomial parameter estimation

teammates: Ellen and Eleisha

The past two segments have been about Bayesian parameter estimation.

In Segment 5, Bernoulli Trials, we did Bayesian parameter estimation of the rate parameter of a binomial distribution. The setup was: we saw the outcomes of a series of independent trials. There were two possible outcomes to each trial: the jailer says B, or the jailer says C. There was one parameter of interest: x, the probability with which the jailer says B. (What about the probability that the jailer says C?) The goal was to compute the posterior distribution of x, given data in the form of counts of observed outcomes.

In this exercise, we will generalize this to a multinomial setting. Each trial is now a chess game, to which there are three possible outcomes: white wins, black wins, or the players draw. We have data on the outcomes of 10,000 real chess games. We want to use this data to learn how likely each outcome is. In other words, we assume that due to the structure of the game of chess, there is some inherent probability of each outcome occurring, and we want to figure out what these probabilities are. The parameters of interest are w, the probability that white wins, and b, the probability that black wins. (What about the probability that they draw?) To do this, we will take counts of the outcomes observed and compute the joint posterior distribution of w and b given this data.

Notational conventions

- w = probability that white wins
- b = probability that black wins
- d = probability that the players draw
- N = total number of games observed
- W = number of white wins in these games
- B = number of black wins in these games
- D = number of draws in these games

Activity checkpoints

1. What does a joint uniform prior on w and b look like?
2. Suppose we know that w=0.4, b = 0.3, and d = 0.3. If we watch N = 10 games, what is the probability that W = 3, B = 5, and D = 2?
3. For general w, b, d, W, B, D, what is P(W, B, D | w, b, d)?
4. Applying Bayes, what is P(w, b, d | W, B, D)? (The Bayes denominator is tricky - if you present us with the integral to evaluate, we will provide the answer.)
5. Here is the real data - chess_outcomes.txt. Each line represents the outcome of one game. Count the outcomes of the first N games and produce a visualization of the joint posterior of the win rates for N = 0, 3, 10, 100, 1000, and 10000.

If you do this in Python, the data is already on the class server - check out the "Jeff Hussmann 01-31-14 reading a file" notebook to see how to access it.

Some snippets demonstrating library functions for evaluating and visualizing a function on a 2D grid of points can be found here.
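A sketch of the counting step in checkpoint 5. The format of chess_outcomes.txt is an assumption here: each line is taken to be a single token naming the outcome ('W', 'B', or 'D'); adjust the mapping to the file's actual encoding.

```python
from collections import Counter

def count_outcomes(path, n):
    """Counts (white wins, black wins, draws) among the first n games.

    Assumes each line of the file is one of the tokens 'W', 'B', 'D'.
    """
    with open(path) as f:
        lines = [line.strip() for line in f][:n]
    c = Counter(lines)
    return c['W'], c['B'], c['D']
```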

Joint uniform prior

We have the following constraint:

$\displaystyle 0 \leq w+b \leq 1$

Consequently, the 2D sample space is the triangle in the (w, b) plane with vertices (0,0), (1,0), and (0,1).

Here, d = 1 - w - b. Since $\displaystyle \int_0^1 \int_0^{1-w} p(w,b) \, db \, dw = 1$ and the triangle has area 1/2, the joint uniform prior is $\displaystyle p(w,b) = 2$ .

Probability P(W, B, D | w, b, d)

$\displaystyle P(W, B, D | w, b, d) = \frac{(W+B+D) !}{W ! \, B ! \, D !} \cdot w^W \cdot b^B \cdot d^D$

hence we have that for W = 3, B = 5, D = 2

$\displaystyle P(W = 3, B = 5, D = 2 | w = 0.4, b = 0.3, d = 0.3) = \frac{10 !}{3 ! \, 5 ! \, 2 !} \cdot 0.4^3 \cdot 0.3^5 \cdot 0.3^2 \approx 0.0352719$
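This arithmetic is easy to check in Python with a small multinomial pmf:

```python
from math import factorial

def multinomial_pmf(counts, probs):
    """P(counts | probs) for a multinomial distribution."""
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)   # multinomial coefficient n!/(c1! c2! ...)
    p = float(coef)
    for c, q in zip(counts, probs):
        p *= q**c
    return p

p = multinomial_pmf([3, 5, 2], [0.4, 0.3, 0.3])  # ≈ 0.0352719
```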

Applying Bayes

Next we apply Bayes rule to compute the probability of the parameters given the data:

$\displaystyle P(w, b, d | W, B, D) = \frac{P(W, B, D | w, b, d) \cdot P(w,b,d)}{\int P(W, B, D | w, b, d) \cdot P(w,b,d) dw db}$

$\displaystyle =\frac{w^W \cdot b^B \cdot d^D}{ \int_{0}^{1} \int_{0}^{1-w} w^W \cdot b^B \cdot (1-w-b)^D db dw}$

$\displaystyle = \frac{(W+B+D+2) !}{W ! \, B ! \, D !} \, w^W \cdot b^B \cdot d^D$

The figure below shows the results for a total of N = [3, 10, 50, 100, 150] games of chess. The posterior mass concentrates in the region w > b, so playing with white clearly has an advantage. Analyzing N = 150 games, the maximum is at w = 0.3970 and b = 0.2613.
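A sketch of evaluating this posterior on a 2D grid. The counts W = 60, B = 39, D = 51 are hypothetical, chosen only so that the mode falls near the quoted maximum (60/150 ≈ 0.40, 39/150 = 0.26); the real counts come from chess_outcomes.txt. The `(W+B+D+2)!/(W! B! D!)` normalization is computed in log space via `gammaln` to avoid overflow.

```python
import numpy as np
from scipy.special import gammaln

def log_posterior(w, b, W, B, D):
    """Log of the normalized posterior (W+B+D+2)!/(W! B! D!) w^W b^B d^D."""
    d = 1.0 - w - b
    n = W + B + D
    logZ = gammaln(n + 3) - gammaln(W + 1) - gammaln(B + 1) - gammaln(D + 1)
    with np.errstate(divide='ignore', invalid='ignore'):
        lp = logZ + W * np.log(w) + B * np.log(b) + D * np.log(d)
    # zero density outside the triangle 0 < w, 0 < b, w + b < 1
    return np.where((w + b < 1) & (w > 0) & (b > 0), lp, -np.inf)

# evaluate on a 2D grid covering the triangle
grid = np.linspace(1e-3, 1 - 1e-3, 400)
wg, bg = np.meshgrid(grid, grid)
post = np.exp(log_posterior(wg, bg, W=60, B=39, D=51))  # hypothetical counts, N = 150

# import matplotlib.pyplot as plt
# plt.contourf(wg, bg, post); plt.xlabel('w'); plt.ylabel('b'); plt.show()
```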