Feb5 team2

From Computational Statistics Course Wiki
Jump to navigation Jump to search

Class Activity for Segment 8 - Feb. 5

Team 2: Aaron Myers, Rene, Sanmit Narvekar, Todd Swinson


Here are the questions and our answers for the in-class activity. The data and exercise description is linked from Segment_8._Some_Standard_Distributions.

Read 1000 events (values) from the file events20130204.txt

–what does the distribution of events look like?

–what is its sample Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mu} and Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \sigma} ?


We loaded the file into MATLAB into a vector called A. To visualize the distribution, we created a histogram of the data and plotted it:

EDU>> hist(A, 100)

Feb5-team2-normal.png

EDU>> mu = mean(A)

mu =

    2.2368

EDU>> sigma=sqrt(var(A))

sigma =

    2.2364


What is the probability (density) of the data, if it is Normal  with the observed sample Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mu} and Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \sigma} ?

–you might want to compute log probabilities!


EDU>> np = normpdf(A, mu, sigma);
EDU>> n_density=sum(log(np))

n_density =

  -2.2233e+03


Now suppose the events are from a Student t with unknown Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mu, \sigma, \nu}

–what is the probability (density) of the data for Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mu=2, \sigma=1.2, \nu=5} ?

–how much more likely is this than the Normal hypothesis


We created a function called student that takes as input the data and the three parameters. Evaluating the function with our data and parameters of interest gives:

EDU>> student = @(x, m, s, v)  ((gamma(0.5 .* (v+1)) ./((gamma(0.5 .* v) .* sqrt(v .* pi) .* s))) .* ((1 + (1 ./ v) .* ((x - m)./s).^2).^(-.5 .* (v+1))));
EDU>> sp = student(A, 2, 1.2, 5);
EDU>> s_density=sum(log(sp))

s_density =

  -2.2082e+03

We can figure out how much more likely it is that the data came from the Student t distribution versus the Normal distribution by looking at their ratio. Since we are in log space, we take the difference of the densities instead of dividing, and we see that it is much more likely the data fits a Student t distribution with Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mu=2, \sigma=1.2, \nu=5} .

EDU>> exp(s_density - n_density)

ans =

   3.7694e+06
Find the MAP (maximum a posteriori) estimates for Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mu, \sigma, \nu}

We modified our Student t distribution function so that it can take a vector of the three parameters and load our data set from a file. It is also the negative of the Student t function, so now we can use MATLAB's optimization toolbox to find the minimum of this function (which corresponds to the maximum of the Student t function).

function prob = student3(vec)
  m = vec(1);
  s = vec(2);
  v = vec(3);
    
  load A.mat
  x = A;
  prob =((gamma(0.5 .* (v+1)) ./((gamma(0.5 .* v) .* sqrt(v .* pi) .* s))) .* ((1 + (1 ./ v) .* ((x - m)./s).^2).^(-.5 .* (v+1))));
  prob = -sum(log(prob));

By providing some initial values for the parameters, we can call fminsearch to find the parameters that minimize the function (which is actually maximizing the Student t function).


EDU>> fminsearch(@student3, [2, 1.2, 5])

ans =

    2.2209    1.5235    3.5947