User:Trettels:Session 6 - 30JAN

From Computational Statistics (CSE383M and CS395T)
Jump to navigation Jump to search

Pre-Class exercises and thought experiments

To Compute


The function bin(n,N,r) is just the binomial distribution for observing <math>n</math> successes out of <math>N</math> trials, given a probability of success <math>r</math>. Therefore it can be re-written as <math>\frac{n!}{k!(n-k)!}p^k(1-p)^{n-k}</math>

Comment:  This is the correct expression, but it should be written in terms of n, N, and r's. User:Tsanders
Reply:    <math>\frac{N!}{N!(N-n)!}r^n(1-r)^{N-n}</math>


Graph of corrected distribution for session 6 pre-class activity

In the video, T3 was calculated as if he had one mutation, when he had three.

Making the correction does have a noticeable impact, as that one individual had more mutations than the rest of the individuals considered combined. It brings the distribution more in line with what we would expect if we didn't pull out the outliers with high mutation rates, though not as high as if we had included them.

comment: Sean I think you might want to take another look at this. Either you're talking about a different Towne or you are counting incorrectly. The issue myself and others found was the number of generations between T4 and the last delta = 0 family member. -Dan

In Class

Done with Travis, Jonathan

Matlab code that, when called, returns a cell array of the condensed Towne family tree, with mutation arrays (family{n}.genome) and mutation counts (family{n}.mutationCount) for each condensed branch. It also returns an array of the mutation counts for each branch of the tree.

function [family] = act6(mutationRate, nGenes)

    p1=newNode('P1',1,'p0',p0.genome, mutationRate);
    p2=newNode('P2',1,'p0',p0.genome, mutationRate);
    p3=newNode('P3',1,'p0',p0.genome, mutationRate);
    p4=newNode('P4',2,'p2',p2.genome, mutationRate);
    p5=newNode('P5',3,'p4',p4.genome, mutationRate);
    T11=newNode('T11',7,'p1' ,p1.genome,mutationRate);
    T8 = newNode('T8',5,'p5' ,p5.genome,mutationRate);
    T6 = newNode('T6',5,'p5' ,p5.genome,mutationRate);
    T3=newNode('T3',6,'p4' ,p4.genome,mutationRate);
    T4=newNode('T4',10,'p2' ,p2.genome,mutationRate);
    T5 = newNode('T5',8,'p3' ,p3.genome,mutationRate);
    T13 = newNode('T13',8,'p3' ,p3.genome,mutationRate);
    family={p0 p1 p2 p3 p4 p5 T2 T11 T8 T6 T3 T4 T5 T13};
    for aa=2:length(family)
    hist(nMut, linspace(0,max(nMut)+1,length(family)/2));
    axis tight;
    ylim([0, length(family)]);

function [node] = newNode(id, nMembers, linkedNodes, genome, mutationRate)
    node = struct('id',id, 'nMembers',nMembers,'linkedNodes',linkedNodes, 'genome',[], 'mutationCount',[]);
    genome = genome+ binornd(nMembers, mutationRate,size(genome));

The results of multiple runs give us a histogram of mutation count frequencies, with a minimum frequency of 0 mutations and a maximum of 11 mutations after 100,000 runs.

for aa=1:1e5
[fam, mut]=act6(0.0038, 37);
muter=[muter mut];

Trettel act6 inclass.png

Note: small update to the code and figure. I stopped counting the original ancestor (P0) in the mutation counts, as it is fixed at 0.

Back to TrettelS Index