# Segment 27 Sanmit Narvekar



#### To Calculate

The file Media:Mixturevals.txt contains 1000 values, each drawn either with probability $\displaystyle c$ from the distribution $\displaystyle \text{Exponential}(\beta)$ (for some constant $\displaystyle \beta$ ), or otherwise (with probability $\displaystyle 1-c$ ) from the distribution $\displaystyle p(x) = (2/\pi)/(1+x^2),\; x>0$ .
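To get a feel for what such data look like, here is a sketch that draws a synthetic sample from this mixture (in Python, purely for illustration; the values c = 0.36 and beta = 0.67 are placeholders, not the file's true parameters). The second component is a standard half-Cauchy, so inverse-CDF sampling applies.

```python
import numpy as np

rng = np.random.default_rng(0)
n, c, beta = 1000, 0.36, 0.67  # placeholder parameters, not the file's true values

# Each point comes from the exponential component with probability c
from_exp = rng.random(n) < c

# Exponential(beta) with rate beta has mean 1/beta
exp_draws = rng.exponential(scale=1.0 / beta, size=n)

# p(x) = (2/pi)/(1+x^2) on x > 0 is a half-Cauchy; its CDF is
# (2/pi)*arctan(x), so inverse-CDF sampling gives x = tan(pi*u/2)
half_cauchy_draws = np.tan(0.5 * np.pi * rng.random(n))

samples = np.where(from_exp, exp_draws, half_cauchy_draws)
```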

1. Write down an expression for the probability of the file's data given some values for the parameters $\displaystyle \beta$ and $\displaystyle c$ .

For each data point, the probability of seeing it is a weighted mixture of the two component densities: with probability c it comes from the exponential, and with probability (1-c) from the other component p(x). The probability of the whole dataset is then the product over the data points:

$\displaystyle P(data | \beta, c) = \prod_{i=1}^{1000} \left[ c \cdot \text{Exponential}(x_i; \beta) + (1-c) \cdot p(x_i) \right]$

where $\displaystyle p(x_i)$ is the density of the second distribution evaluated at $x_i$ (and the exponential density is likewise evaluated at $x_i$).

2. Calculate numerically the maximum likelihood values of $\displaystyle \beta$ and $\displaystyle c$ .

To avoid underflow, we work with the log-likelihood instead:

$\displaystyle \log P(data | \beta, c) = \sum_{i=1}^{1000} \log\left[ c \cdot \text{Exponential}(x_i; \beta) + (1-c) \cdot p(x_i) \right]$

Here is the code in MATLAB. Instead of maximizing the log-likelihood, we minimize the negative log-likelihood:


```matlab
% Load data
data = load('Mixturevals.txt');

% Density of the second component, p(x) = (2/pi)/(1+x^2)
px = @(x) ((2 / pi) ./ (1 + x.^2));

% Negative log-likelihood of the mixture; p(1) = beta, p(2) = c.
% (exppdf takes the mean as its second argument, so the rate beta
% enters as 1 ./ p(1).)
nlmixtureFunction = @(p) (-sum(log( (p(2) .* exppdf(data, 1 ./ p(1))) + ((1-p(2)) .* px(data)) )));

% MLE estimates via simplex minimization
initGuess = [1 0.5];
params = fminsearch(nlmixtureFunction, initGuess);

beta = params(1)
c = params(2)
```



This gives:


```
beta =

    0.6652


c =

    0.3595
```


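As a cross-check of the MLE step, here is an equivalent sketch in Python using scipy's Nelder-Mead simplex (the same algorithm fminsearch uses). Since Mixturevals.txt is not reproduced here, the sketch fits synthetic data generated with placeholder parameters near the values found above:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
true_beta, true_c, n = 0.67, 0.36, 1000  # placeholders near the fitted values

# Synthetic stand-in for Mixturevals.txt (the actual file is not used here)
from_exp = rng.random(n) < true_c
data = np.where(from_exp,
                rng.exponential(scale=1.0 / true_beta, size=n),
                np.tan(0.5 * np.pi * rng.random(n)))

def neg_log_like(params):
    beta, c = params
    # Keep the simplex away from invalid parameter values
    if beta <= 0.0 or not 0.0 <= c <= 1.0:
        return np.inf
    exp_pdf = beta * np.exp(-beta * data)        # Exponential(beta) density
    px = (2.0 / np.pi) / (1.0 + data ** 2)       # second mixture component
    return -np.sum(np.log(c * exp_pdf + (1.0 - c) * px))

# Nelder-Mead is the simplex method behind MATLAB's fminsearch
res = minimize(neg_log_like, x0=[1.0, 0.5], method='Nelder-Mead')
beta_hat, c_hat = res.x
```

On a synthetic sample this size, the recovered estimates should land close to the generating values, which is a useful sanity check on the likelihood code.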

3. Estimate numerically the Bayes posterior distribution of $\displaystyle \beta$ , marginalizing over $\displaystyle c$ as a nuisance parameter. (You'll of course have to make some assumption about priors.)

We assume uniform priors on both $\displaystyle \beta$ and $\displaystyle c$ , which gives the following posterior distribution over beta:

$\displaystyle P(\beta | data) = \int_0^1 P(\beta, c | data) \, dc$

To evaluate this numerically, we densely sample the probabilities for different values of beta and c, and marginalize (i.e. sum) over the c's for a particular beta. Here is the MATLAB code:

```matlab
% Log-likelihood of the mixture (as before, but without the minus sign)
mixtureFunction = @(p) (sum(log( (p(2) .* exppdf(data, 1 ./ p(1))) + ((1-p(2)) .* px(data)) )));

gridB = 0.01:0.01:1;
gridC = 0:0.01:1;

[meshB, meshC] = meshgrid(gridB, gridC);
probs = zeros(size(meshB));

% Evaluate the log-likelihood at every grid point -- there is probably
% a better (vectorized) way to write this... (The inner loop index is k
% rather than c so it doesn't clobber the MLE value c computed earlier.)
for r = 1:size(meshB, 1)
    for k = 1:size(meshB, 2)
        probs(r, k) = mixtureFunction([meshB(r, k), meshC(r, k)]);
    end
end

% Rescale so that the values don't underflow and we get a "nice looking"
% distribution
probs = exp(probs - max(max(probs)));

% Marginalize out c: each column of probs is one beta, so summing down
% the columns sums over the c grid
betaPosterior = sum(probs);
plot(gridB, betaPosterior)
```



And here is the resulting posterior distribution:

The maximum value occurs at 0.66, which differs from the MLE estimate only because of the 0.01 grid discretization.

#### To Think About

1. In problem 3, above, you assumed some definite prior for $\displaystyle c$ . What if $\displaystyle c$ is itself drawn (just once for the whole data set) from a distribution $\displaystyle \text{Beta}(\mu,\nu)$ , with unknown hyperparameters $\displaystyle \mu,\nu$ ? How would you now estimate the Bayes posterior distribution of $\displaystyle \beta$ , marginalizing over everything else?
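One possible numerical approach, sketched here in Python on synthetic stand-in data (the data file, grids, and flat hyperprior below are all illustrative assumptions, not prescribed choices): the likelihood still depends only on $(\beta, c)$, so with a flat hyperprior on $(\mu, \nu)$, marginalizing over the hyperparameters amounts to averaging the $\text{Beta}(\mu,\nu)$ prior density for $c$ over a hyperparameter grid, then summing over $c$ as in problem 3.

```python
import numpy as np
from scipy.stats import beta as beta_dist

rng = np.random.default_rng(2)
n = 1000
# Synthetic stand-in data, as before (placeholder parameters)
from_exp = rng.random(n) < 0.36
data = np.where(from_exp,
                rng.exponential(scale=1.0 / 0.67, size=n),
                np.tan(0.5 * np.pi * rng.random(n)))

def log_like(b, c):
    # Same mixture log-likelihood as in the MATLAB code
    return np.sum(np.log(c * b * np.exp(-b * data)
                         + (1.0 - c) * (2.0 / np.pi) / (1.0 + data ** 2)))

grid_b = np.arange(0.05, 2.0, 0.05)
grid_c = np.arange(0.01, 1.0, 0.02)
# Coarse hyperparameter grids with a flat hyperprior -- an arbitrary choice
grid_mu = np.array([0.5, 1.0, 2.0, 4.0])
grid_nu = np.array([0.5, 1.0, 2.0, 4.0])

# The likelihood depends only on (beta, c), so compute it once
ll = np.array([[log_like(b, c) for c in grid_c] for b in grid_b])
lik = np.exp(ll - ll.max())          # rescale to avoid underflow

# Marginalizing over (mu, nu) with a flat hyperprior amounts to
# averaging the Beta(mu, nu) density for c over the hyperparameter grid
prior_c = np.mean([beta_dist.pdf(grid_c, m, v)
                   for m in grid_mu for v in grid_nu], axis=0)

# Finally marginalize over c, weighted by its hyperprior-averaged prior
beta_posterior = lik @ prior_c
beta_posterior /= beta_posterior.sum()
```

With 1000 data points the likelihood dominates, so the resulting posterior for beta should peak near the same place as before; the effect of the hyperprior is mainly to smooth the implied prior on c.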