# Segment 27 Sanmit Narvekar

## Segment 27

#### To Calculate

The file Media:Mixturevals.txt contains 1000 values, each drawn either with probability **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle c}**
from the distribution **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \text{Exponential}(\beta)}**
(for some constant **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \beta}**
), or otherwise (with probability **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle 1-c}**
) from the distribution **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle p(x) = (2/\pi)/(1+x^2),\; x>0}**
.

1. Write down an expression for the probability of the file's data given some values for the parameters **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \beta}**
and **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle c}**
.

For each data point, the probability of seeing it is the probability that it came from one of the two distributions. Since it comes from the exponential with probability c, and the other mixture p(x) with probability (1-c), the probability is a weighted mixture of the two. Then to find the probability for the whole dataset, we take a product over the dataset, as follows:

**Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle P(data | \beta, c) = \prod_{i=1}^{1000} c * \text{Exponential}(x_i, \beta) + (1-c) * p(x_i)}**

Where **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle p(x_i)}**
is the probability of a point under the distribution given above (the exponential is likewise evaluated at x_i).

2. Calculate numerically the maximum likelihood values of **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \beta}**
and **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle c}**
.

To avoid underflow, we instead calculate the log maximum likelihood estimates:

**Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \log P(data | \beta, c) = \sum_{i=1}^{1000} \log(c * \text{Exponential}(x_i, \beta) + (1-c) * p(x_i))}**

Here is the code in Matlab. Instead of maximizing the log-likelihood, we will be minimizing the negative log-likelihood:

% Load data data = load('Mixturevals.txt'); % P distribution px = @(x) ((2 / pi) ./ (1 + x.^2)); % Mixture distribution nlmixtureFunction = @(p) (-sum(log( (p(2) .* exppdf(data, 1 ./ p(1))) + ((1-p(2)) .* px(data))))); % MLE estimates initGuess = [1 0.5]; params = fminsearch(nlmixtureFunction, initGuess); beta = params(1) c = params(2)

This gives:

beta = 0.6652 c = 0.3595

3. Estimate numerically the Bayes posterior distribution of

We assume a uniform prior, which gives the following posterior distribution over beta:

**Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle P(\beta | data) = \int_0^1 P(\beta, c | data) }**

To evaluate this numerically, we densely sample the probabilities for different values of beta and c, and marginalize (i.e. sum) over the c's for a particular beta. Here is the MATLAB code:

mixtureFunction = @(p) (sum(log( (p(2) .* exppdf(data, 1 ./ p(1))) + ((1-p(2)) .* px(data))))); gridB = (0.01:0.01:1); gridC = 0:0.01:1; [meshB, meshC] = meshgrid(gridB, gridC); probs = zeros(size(meshB)); % Populate all the values -- there is probably a better way to write % this... for r=1:size(meshB, 1) for c=1:size(meshB, 2) probs(r, c) = mixtureFunction([meshB(r, c), meshC(r, c)]); end end % Rescale so that the values don't underflow and we get a "nice looking" distribution probs = exp(probs - max(max(probs))); % Marginalize out c betaPosterior = sum(probs); plot(gridB, betaPosterior)

And here is the resulting posterior distribution:

The maximum value occurs at 0.66, which is only different from the MLE estimate due to discretization.

#### To Think About

1. In problem 3, above, you assumed some definite prior for **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \text{Beta}(\mu,\nu)}**
, with unknown hyperparameters **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mu,\nu}**
. How would you now estimate the Bayes posterior distribution of