Segment 27 Sanmit Narvekar

From Computational Statistics Course Wiki
Jump to navigation Jump to search

Segment 27

To Calculate

The file Media:Mixturevals.txt contains 1000 values, each drawn either with probability Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle c} from the distribution Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \text{Exponential}(\beta)} (for some constant Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \beta} ), or otherwise (with probability Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle 1-c} ) from the distribution Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle p(x) = (2/\pi)/(1+x^2),\; x>0} .


1. Write down an expression for the probability of the file's data given some values for the parameters Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \beta} and Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle c} .

For each data point, the probability of seeing it is the probability that it came from one of the two distributions. Since it comes from the exponential with probability c, and the other mixture p(x) with probability (1-c), the probability is a weighted mixture of the two. Then to find the probability for the whole dataset, we take a product over the dataset, as follows:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle P(data | \beta, c) = \prod_{i=1}^{1000} c * \text{Exponential}(x_i, \beta) + (1-c) * p(x_i)}

Where Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle p(x_i)} is the probability of a point under the distribution given above (the exponential is likewise evaluated at x_i).


2. Calculate numerically the maximum likelihood values of Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \beta} and Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle c} .

To avoid underflow, we instead calculate the log maximum likelihood estimates:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \log P(data | \beta, c) = \sum_{i=1}^{1000} \log(c * \text{Exponential}(x_i, \beta) + (1-c) * p(x_i))}

Here is the code in Matlab. Instead of maximizing the log-likelihood, we will be minimizing the negative log-likelihood:


% Load data
data = load('Mixturevals.txt');

% P distribution
px = @(x) ((2 / pi) ./ (1 + x.^2));

% Mixture distribution
nlmixtureFunction = @(p) (-sum(log( (p(2) .* exppdf(data, 1 ./ p(1))) + ((1-p(2)) .* px(data)))));

% MLE estimates
initGuess = [1 0.5];
params = fminsearch(nlmixtureFunction, initGuess);

beta = params(1)
c = params(2)

This gives:


beta =

    0.6652


c =

    0.3595


3. Estimate numerically the Bayes posterior distribution of Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \beta} , marginalizing over Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle c} as a nuisance parameter. (You'll of course have to make some assumption about priors.)

We assume a uniform prior, which gives the following posterior distribution over beta:

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle P(\beta | data) = \int_0^1 P(\beta, c | data) }

To evaluate this numerically, we densely sample the probabilities for different values of beta and c, and marginalize (i.e. sum) over the c's for a particular beta. Here is the MATLAB code:

mixtureFunction = @(p) (sum(log( (p(2) .* exppdf(data, 1 ./ p(1))) + ((1-p(2)) .* px(data)))));

gridB = (0.01:0.01:1);
gridC = 0:0.01:1;

[meshB, meshC] = meshgrid(gridB, gridC);
probs = zeros(size(meshB));

% Populate all the values -- there is probably a better way to write
% this...
for r=1:size(meshB, 1) 
    for c=1:size(meshB, 2)
        probs(r, c) = mixtureFunction([meshB(r, c), meshC(r, c)]);
    end
end

% Rescale so that the values don't underflow and we get a "nice looking" distribution
probs = exp(probs - max(max(probs)));

% Marginalize out c
betaPosterior = sum(probs);
plot(gridB, betaPosterior)

And here is the resulting posterior distribution:

SanmitSeg27.png

The maximum value occurs at 0.66, which is only different from the MLE estimate due to discretization.

To Think About

1. In problem 3, above, you assumed some definite prior for Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle c} . What if Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle c} is itself drawn (just once for the whole data set) from a distribution Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \text{Beta}(\mu,\nu)} , with unknown hyperparameters Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mu,\nu} . How would you now estimate the Bayes posterior distribution of Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \beta} , marginalizing over everything else?