# Segment 6. The Towne Family Tree

**Problems** from Segment 6. The Towne Family Tree

**To Compute**

*1. Write down an explicit expression for what the slides denote as bin(n,N,r).*

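The probability of getting $n$ events in $N$ draws, each with i.i.d. probability $r$, is

$$\mathrm{bin}(n,N,r) = \binom{N}{n} r^n (1-r)^{N-n} = \frac{N!}{n!\,(N-n)!}\, r^n (1-r)^{N-n}.$$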
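As a hedged sketch (not the course's code), bin(n, N, r) can be evaluated in MATLAB through `gammaln`, which keeps the factorials from overflowing when N is large. The handle name `bin` and the example values N = 370 (37 loci over an assumed 10 generations) and r = 0.0038 are illustrative assumptions. The expression also assumes 0 < r < 1.

```matlab
% bin(n, N, r): probability of exactly n events in N i.i.d. draws,
% each with probability r (assumes 0 < r < 1).
% gammaln(k + 1) = log(k!), so the binomial coefficient is built in log space.
bin = @(n, N, r) exp(gammaln(N + 1) - gammaln(n + 1) - gammaln(N - n + 1) ...
                     + n .* log(r) + (N - n) .* log(1 - r));

% Illustrative values: 37 loci over 10 generations gives N = 370 draws
N = 370;
r = 0.0038;
p0 = bin(0, N, r);              % probability of no mutations at all
total = sum(bin(0:N, N, r));    % probabilities over n = 0..N should sum to 1
fprintf('bin(0, %d, %.4f) = %.4f, sum = %.6f\n', N, r, p0, total);
```

Working in log space matters here because terms like 370! overflow double precision, while their logarithms are harmless.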

Some expressions that will be useful for question 2 are:

*2. There is a small error on slide 7 that carries through to the first equation on slide 8 and the graph on slide 9. Find the error, fix it, and redo the graph of slide 9. Does it make a big difference? Why or why not?*

The error is that the first Jacob is counted twice in the calculation. In tree terms, he is skipped as the parent node of Sam and T4.

Here is the fixed slide (well, not completely fixed; do you see the error that *doesn't* carry over to the next slides?):

Re-doing slide 9 by adding a term and changing two terms in the original function, we get the new graph in red (original in blue):

The difference is not large. For each probability just to the right of 0.4%, the new graph lies slightly above the old one, which means that for each of these probabilities we get slightly more mutations.

For example, before the fix there was a 0.5% chance of having 93 total mutations in the Towne tree. Now the Towne tree has a 0.5% chance of having about 100 mutations. This makes sense because we are no longer double counting the 0-mutation node, which had pulled the distribution to the left.

Here's the Matlab code:

**To Think About**

*1. Suppose you knew the value of r (say, r = 0.0038). How would you simulate many instances of the Towne family data (e.g., the tables on slides 4 and 5)?*

Here is an example of how I would simulate T3 and T4. The other T's would be similar.

```matlab
r = 0.0038;       % assumed mutation rate per locus per generation
nLoci = 37;       % number of Y-STR markers

a = rand(1, nLoci) < r;    % Jacob Towne I's DNA
b = rand(2, nLoci) < r;    % 2 generations to Samuel Towne I's DNA
c = rand(6, nLoci) < r;    % 6 generations to T3's DNA

% add up the mutations along the path to get T3's differences in DNA
t3 = sum([a; b; c], 1)

d = rand(10, nLoci) < r;   % 10 generations to T4's DNA

% add up the mutations along the path to get T4's differences in DNA
% (a is reused because T3 and T4 share Jacob Towne I as an ancestor)
t4 = sum([a; d], 1)
```
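The question asks for many instances. A minimal sketch that extends the code above wraps it in a loop and records a few summary statistics for each simulated tree. It assumes the same generation counts and r = 0.0038. The variable names (`nSims`, `mutT3`, `mutT4`, `nDiff`) are illustrative.

```matlab
% Repeat the T3/T4 simulation many times and keep summaries of each tree.
r = 0.0038;        % assumed mutation rate per locus per generation
nLoci = 37;        % number of Y-STR markers
nSims = 10000;     % number of simulated family trees

mutT3 = zeros(nSims, 1);   % total mutations along Jacob I -> T3 (9 generations)
mutT4 = zeros(nSims, 1);   % total mutations along Jacob I -> T4 (11 generations)
nDiff = zeros(nSims, 1);   % loci where the T3 and T4 mutation counts differ
for k = 1:nSims
    a = rand(1, nLoci) < r;    % shared: Jacob Towne I
    b = rand(2, nLoci) < r;    % 2 generations to Samuel Towne I
    c = rand(6, nLoci) < r;    % 6 generations to T3
    d = rand(10, nLoci) < r;   % 10 generations to T4
    t3 = sum([a; b; c], 1);
    t4 = sum([a; d], 1);
    mutT3(k) = sum(t3);
    mutT4(k) = sum(t4);
    nDiff(k) = nnz(t3 ~= t4);
end

% The simulated means should be close to 37 * r * (number of generations)
fprintf('mean mutations: T3 %.3f (expected %.3f), T4 %.3f (expected %.3f)\n', ...
    mean(mutT3), nLoci * r * 9, mean(mutT4), nLoci * r * 11);
```

Histograms of `mutT3`, `mutT4` or `nDiff` then show how much a table like the ones on slides 4 and 5 can vary from one simulated family to the next.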

*2. How would you use your simulation to decide if the assumption of ignoring backmutations (the red note on slide 7) is justified?*

If there are any backmutations, they would show up in t3 or t4 in the code above as a 2. Otherwise, each entry of the 1-by-37 array is either 0 or 1, depending on whether there has been a change somewhere along the way. In the handful of times I ran this code I never got a 2. To see how often 2's show up, I would need to run it perhaps a hundred times. Then, assuming a back mutation is as likely as a further forward change, I would divide that frequency by two to estimate the likelihood of backmutations. This estimate should be small if ignoring backmutations is justified.
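A minimal sketch of that check, assuming r = 0.0038 and the 11-generation path from Jacob Towne I to T4 used above, follows. It simulates many lines of descent at once and records how often any locus collects two or more mutations. It then compares that rate with the exact per-locus value 1 − (1−r)^G − G r (1−r)^(G−1). Halving the simulated rate follows the reasoning above, so it is only as good as the assumption that a second step is equally likely to go back or forward.

```matlab
% Estimate how often a locus collects >= 2 mutations along one line of descent.
r = 0.0038;        % assumed mutation rate per locus per generation
nLoci = 37;        % number of Y-STR markers
G = 11;            % generations on the path Jacob Towne I -> T4 (1 + 10)
nTrials = 20000;   % number of simulated lines of descent

% counts(k, j) = number of mutations at locus j in trial k
counts = squeeze(sum(rand(G, nLoci, nTrials) < r, 1)).';
fracSim = mean(any(counts >= 2, 2));   % trials with at least one multiple hit

% Exact value for comparison: per locus, P(>= 2) = 1 - P(0) - P(1)
pLocus = 1 - (1 - r)^G - G * r * (1 - r)^(G - 1);
fracExact = 1 - (1 - pLocus)^nLoci;

fprintf('P(some locus has >= 2 mutations): simulated %.4f, exact %.4f\n', ...
    fracSim, fracExact);
% Following the argument above, roughly half of these would be backmutations
fprintf('estimated backmutation probability per line: %.4f\n', fracSim / 2);
```

Thousands of trials are needed because a multiple hit is rare per tree. This fits with a 2 never appearing in a handful of runs.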

*3. How would you use your simulation to decide if our decision to trim T2, T11, and T13 from the estimation of r was justified? (This question anticipates several later discussions in the course, but thinking about it now will be a good start.)*

I would simulate T2, T11, and T13 many times and measure the spread (variance) in the number of changes at each DNA locus. If the actual T2, T11, and T13 fall well outside that spread, I would trim them.
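One way to make "fall outside" concrete is an empirical tail probability. Simulate the total number of mutations a descendant carries under the fitted r, then ask how often the simulation is at least as extreme as the observed count. In the sketch below, the generation count `G` and the observed count `nObserved` are hypothetical placeholders, not the actual values for T2, T11, or T13. The rate r = 0.0038 is assumed.

```matlab
% Empirical tail probability for one descendant (hypothetical inputs).
r = 0.0038;          % assumed mutation rate per locus per generation
nLoci = 37;          % number of Y-STR markers
G = 10;              % HYPOTHETICAL number of generations to the descendant
nObserved = 5;       % HYPOTHETICAL observed number of mutations
nTrials = 20000;     % number of simulated descendants

% Each column is one simulated descendant with G * nLoci chances to mutate
simTotals = sum(rand(G * nLoci, nTrials) < r, 1);
pSim = mean(simTotals >= nObserved);   % fraction at least as extreme

% Exact binomial tail for comparison: 1 - P(X <= nObserved - 1)
Ntot = G * nLoci;
k = 0:nObserved - 1;
logTerms = gammaln(Ntot + 1) - gammaln(k + 1) - gammaln(Ntot - k + 1) ...
    + k * log(r) + (Ntot - k) * log(1 - r);
pExact = 1 - sum(exp(logTerms));

fprintf('P(total >= %d): simulated %.4f, exact %.4f\n', nObserved, pSim, pExact);
% A very small tail probability would flag the descendant as a trimming candidate.
```

Because three descendants are being checked, a cutoff on this tail probability should be stricter than it would be for a single test.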

**Class Activity**

Group 1 with Rene and Eleisha

**Activity checkpoints**

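*1. What does a joint uniform prior on w and b look like?*

$p(w,b) \equiv 2$ on the triangle $w \ge 0$, $b \ge 0$, $w + b \le 1$ (with $d = 1 - w - b$), and $0$ elsewhere. The constraint $w + b + d = 1$ confines $(w, b)$ to this half of the unit square. That region has area $1/2$, so a constant density of 2 integrates to 1.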
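The normalization can be checked directly by integrating over the triangle $0 \le b \le 1 - w$, $0 \le w \le 1$:

$$\int_0^1 \int_0^{1-w} 2 \, db \, dw = \int_0^1 2(1-w)\, dw = 2\left[w - \tfrac{w^2}{2}\right]_0^1 = 1.$$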

*2. Suppose we know that w=0.4, b = 0.3, and d = 0.3. If we watch N = 10 games, what is the probability that W = 3, B = 5, and D = 2?*
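With the numbers given, and with $N = W + B + D = 10$, the multinomial probability works out as follows:

$$P(W{=}3, B{=}5, D{=}2) = \frac{10!}{3!\,5!\,2!}\,(0.4)^3 (0.3)^5 (0.3)^2 = 2520 \times 0.064 \times 0.0002187 \approx 0.0353.$$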

*3. For general w, b, d, W, B, D, what is P(W, B, D | w, b, d)?*
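The general answer is the multinomial distribution, $P(W,B,D \mid w,b,d) = \frac{N!}{W!\,B!\,D!}\, w^W b^B d^D$ with $N = W + B + D$. Below is a hedged MATLAB sketch that evaluates it in log space and reproduces the checkpoint 2 number. The handle name `likelihood` is illustrative, and the sketch assumes w, b, d are all strictly positive.

```matlab
% Sketch: multinomial likelihood P(W, B, D | w, b, d) with N = W + B + D,
% computed in log space with gammaln so large counts do not overflow.
likelihood = @(W, B, D, w, b, d) exp( ...
    gammaln(W + B + D + 1) - gammaln(W + 1) - gammaln(B + 1) - gammaln(D + 1) ...
    + W * log(w) + B * log(b) + D * log(d));

% Checkpoint 2: w = 0.4, b = 0.3, d = 0.3 and W = 3, B = 5, D = 2
p = likelihood(3, 5, 2, 0.4, 0.3, 0.3);
fprintf('P(W=3, B=5, D=2) = %.4f\n', p);   % prints P(W=3, B=5, D=2) = 0.0353
```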

*4. Applying Bayes, what is P(w, b, d | W, B, D)? (The Bayes denominator is tricky - if you present us with the integral to evaluate, we will provide the answer.)*

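$$P(w,b,d \mid W,B,D) = \frac{P(W,B,D \mid w,b,d)\,p(w,b)}{\iint_{w+b\le 1} P(W,B,D \mid w,b,d)\,p(w,b)\,dw\,db},$$

where we use the formulas in 1 and 3.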

The denominator is

$$\iint_{w+b\le 1} 2\,\frac{N!}{W!\,B!\,D!}\, w^W b^B (1-w-b)^D \, dw\, db.$$
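The standard Dirichlet integral over the triangle $w + b \le 1$ is

$$\iint_{w+b\le 1} w^W b^B (1-w-b)^D \, dw\, db = \frac{W!\,B!\,D!}{(N+2)!}.$$

Using it, the denominator evaluates to $2\,\frac{N!}{W!\,B!\,D!}\cdot\frac{W!\,B!\,D!}{(N+2)!} = \frac{2}{(N+1)(N+2)}$. The posterior is therefore the Dirichlet$(W+1, B+1, D+1)$ density

$$P(w,b,d \mid W,B,D) = \frac{(N+2)!}{W!\,B!\,D!}\, w^W b^B d^D, \qquad d = 1 - w - b.$$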

*5. Here is the real data - chess_outcomes.txt. Each line represents the outcome of one game. Count the outcomes of the first N games and produce a visualization of the joint posterior of the win rates for N = 0, 3, 10, 100, 1000, and 10000.*

We would have taken the natural log of the formula in 4. This turns the large factorials (products) into sums and the exponents into coefficients. We would then exponentiate only at the end.
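Here is a minimal sketch of that log-space evaluation on a grid. It assumes the posterior from question 4 reduces to the Dirichlet$(W+1, B+1, D+1)$ density $\frac{(N+2)!}{W!\,B!\,D!} w^W b^B d^D$ (uniform prior, multinomial likelihood). The counts below are hypothetical example values. Tallying them from the first N lines of chess_outcomes.txt is not shown, since the file's exact format is not given here. The grid spacing `h` is also an assumption.

```matlab
% Sketch: joint posterior of (w, b) on a grid, computed in log space.
W = 3; B = 5; D = 2;               % HYPOTHETICAL counts from the first N games
N = W + B + D;

h = 0.01;                          % grid spacing (assumed)
wc = h/2 : h : 1 - h/2;            % cell centers for w
bc = wc;                           % cell centers for b
[Wg, Bg] = meshgrid(wc, bc);       % Wg varies along columns, Bg along rows
Dg = 1 - Wg - Bg;                  % d is fixed by w + b + d = 1
inside = Dg > h/4;                 % keep only cells inside the triangle

% log posterior: factorials become gammaln sums, exponents become coefficients
logPost = -Inf(size(Wg));
logPost(inside) = gammaln(N + 3) - gammaln(W + 1) - gammaln(B + 1) - gammaln(D + 1) ...
    + W * log(Wg(inside)) + B * log(Bg(inside)) + D * log(Dg(inside));

post = exp(logPost);               % exponentiate only at the end
mass = sum(post(:)) * h^2;         % Riemann sum, should be close to 1
fprintf('posterior mass on grid: %.4f\n', mass);
```

For the visualization, something like `imagesc(wc, bc, post); axis xy; colorbar; xlabel('w'); ylabel('b')` can be repeated for each N = 0, 3, 10, 100, 1000, 10000. For the largest N the posterior narrows to a width comparable to or smaller than `h`, so a finer grid around the peak would be needed there.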

Back to Ellen Le or Segment 6. The Towne Family Tree.