# Feb5 team2

#### Class Activity for Segment 8 - Feb. 5

Here are the questions and our answers for the in-class activity. The data and exercise description is linked from Segment_8._Some_Standard_Distributions.

 Read 1000 events (values) from the file events20130204.txt –what does the distribution of events look like? –what is its sample $\displaystyle \mu$ and $\displaystyle \sigma$ ?

We loaded the file into MATLAB into a vector called A. To visualize the distribution, we created a histogram of the data and plotted it:

EDU>> hist(A, 100)


EDU>> mu = mean(A)

mu =

2.2368

EDU>> sigma=sqrt(var(A))

sigma =

2.2364


 What is the probability (density) of the data, if it is Normal  with the observed sample $\displaystyle \mu$ and $\displaystyle \sigma$ ? –you might want to compute log probabilities!

EDU>> np = normpdf(A, mu, sigma);
EDU>> n_density=sum(log(np))

n_density =

-2.2233e+03


 Now suppose the events are from a Student t with unknown $\displaystyle \mu, \sigma, \nu$ –what is the probability (density) of the data for $\displaystyle \mu=2, \sigma=1.2, \nu=5$ ? –how much more likely is this than the Normal hypothesis

We created a function called student that takes as input the data and the three parameters. Evaluating the function with our data and parameters of interest gives:

EDU>> student = @(x, m, s, v)  ((gamma(0.5 .* (v+1)) ./((gamma(0.5 .* v) .* sqrt(v .* pi) .* s))) .* ((1 + (1 ./ v) .* ((x - m)./s).^2).^(-.5 .* (v+1))));
EDU>> sp = student(A, 2, 1.2, 5);
EDU>> s_density=sum(log(sp))

s_density =

-2.2082e+03


We can figure out how much more likely it is that the data came from the Student t distribution versus the Normal distribution by looking at their ratio. Since we are in log space, we take the difference of the densities instead of dividing, and we see that it is much more likely the data fits a Student t distribution with $\displaystyle \mu=2, \sigma=1.2, \nu=5$ .

EDU>> exp(s_density - n_density)

ans =

3.7694e+06

 Find the MAP (maximum a posteriori) estimates for $\displaystyle \mu, \sigma, \nu$

We modified our Student t distribution function so that it can take a vector of the three parameters and load our data set from a file. It is also the negative of the Student t function, so now we can use MATLAB's optimization toolbox to find the minimum of this function (which corresponds to the maximum of the Student t function).

function prob = student3(vec)
m = vec(1);
s = vec(2);
v = vec(3);

x = A;
prob =((gamma(0.5 .* (v+1)) ./((gamma(0.5 .* v) .* sqrt(v .* pi) .* s))) .* ((1 + (1 ./ v) .* ((x - m)./s).^2).^(-.5 .* (v+1))));
prob = -sum(log(prob));


By providing some initial values for the parameters, we can call fminsearch to find the parameters that minimize the function (which is actually maximizing the Student t function).


EDU>> fminsearch(@student3, [2, 1.2, 5])

ans =

2.2209    1.5235    3.5947