(Rene) Segment 25: Mixture models

From Computational Statistics Course Wiki

To Calculate

The file Media:Mixturevals.txt contains 1000 values, each drawn either with probability <math>c</math> from the distribution <math>p_1(x) = \beta e^{-\beta x}</math> (for some constant <math>\beta</math>), or otherwise (with probability <math>1-c</math>) from the distribution <math>p_0(x) = \frac{2}{\pi(1+x^2)},\ x > 0</math>.

1. Write down an expression for the probability of the file's data given some values for the parameters <math>\beta</math> and <math>c</math>.

Let and and let and . Then given , e.g. and , a model for the probability,


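As a cross-check, the likelihood above can be sketched in Python; the function name and NumPy usage here are illustrative, not part of the course's code:

```python
import numpy as np

def neg_log_likelihood(params, x):
    """Negative log-likelihood of the two-component mixture.

    params = (beta, c): exponential rate and mixture weight.
    x: 1-D array of positive data values.
    """
    beta, c = params
    p1 = beta * np.exp(-beta * x)       # exponential component
    p0 = 2.0 / (np.pi * (1.0 + x**2))   # half-Cauchy component, x > 0
    return -np.sum(np.log(c * p1 + (1.0 - c) * p0))
```

Working with the negative log turns the product over data points into a sum, which is what the MATLAB code below minimizes.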
2. Calculate numerically the maximum likelihood values of <math>\beta</math> and <math>c</math>.


We have computed the maximum likelihood values of the parameters <math>\beta</math> and <math>c</math>, assuming uniform priors. This is the MATLAB implementation; instead of maximizing the likelihood directly, we minimize the negative log likelihood, which is equivalent and numerically better behaved:

% load the 1000 data values into the vector `data`
load Data;

% component pdfs of the mixture
p1 = @(x,beta) beta*exp(-beta*x);      % exponential
p0 = @(x) (x>0) .* 2./(pi*(1+x.^2));   % half-Cauchy, x > 0

% mixture pdf: weight c on p1, weight (1-c) on p0
pmix = @(x,beta,c) p1(x,beta)*c + p0(x)*(1-c);

% negative log likelihood (uniform priors), b = [beta, c]
model = @(b) -sum(log(pmix(data,b(1),b(2))));

% find the maximum likelihood parameters
bguess = [0.5 0.5];
bfitt  = fminsearch(model,bguess);

The results are the fitted values bfitt(1) for <math>\beta</math> and bfitt(2) for <math>c</math>.
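The same fit can be sketched in Python with SciPy's Nelder-Mead minimizer, the analogue of MATLAB's fminsearch. Since the contents of Mixturevals.txt are not reproduced here, this sketch draws its own synthetic sample, with hypothetical true values beta_true and c_true:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic stand-in for Mixturevals.txt: beta_true and c_true are
# illustrative choices, not the values behind the actual file.
beta_true, c_true = 3.0, 0.7
n = 1000
from_exp = rng.random(n) < c_true
x = np.where(from_exp,
             rng.exponential(1.0 / beta_true, n),
             np.abs(rng.standard_cauchy(n)))   # half-Cauchy via |Cauchy|

def nll(params):
    beta, c = params
    if beta <= 0 or not (0.0 <= c <= 1.0):
        return np.inf                 # keep the search in the valid region
    p1 = beta * np.exp(-beta * x)
    p0 = 2.0 / (np.pi * (1.0 + x**2))
    return -np.sum(np.log(c * p1 + (1.0 - c) * p0))

# Nelder-Mead plays the role of fminsearch
fit = minimize(nll, x0=[0.5, 0.5], method="Nelder-Mead")
beta_hat, c_hat = fit.x
```

With 1000 samples the estimates should land close to the generating values, which is a useful sanity check of the likelihood before applying it to the real data.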

3. Estimate numerically the Bayes posterior distribution of <math>\beta</math>, marginalizing over <math>c</math> as a nuisance parameter. (You'll of course have to make some assumption about priors.)

Here is the code for computing the pdf over <math>\beta</math> and <math>c</math>. To avoid under/overflow problems, we work in log space: at each grid point we subtract the negative log likelihood from its minimum value (found in part 2), so that the largest value we exponentiate is <math>e^0 = 1</math>. After exponentiating, we normalize to obtain the 2D pdf over <math>\beta</math> and <math>c</math>, and then sum over each parameter in turn to obtain the univariate marginal distributions in <math>\beta</math> and <math>c</math>. The first figure illustrates how well the two pdfs fit the data; the Cauchy-like (rational) pdf obviously fits much better.

% set up the grid over c and beta
c = linspace(0,1,50); beta = linspace(0,bfitt(1)+2,100);
n = length(c); m = length(beta);
[B,C] = meshgrid(beta,c);

% compute the relative log probability for each combination of beta and c;
% subtracting the minimum negative log likelihood keeps exp() from
% under/overflowing
maxfitt = model(bfitt);
logprob = zeros(n,m);
for i=1:n
    for j=1:m
        b = [B(i,j) C(i,j)];
        logprob(i,j) = maxfitt - model(b);
    end
end
prob = exp(logprob);
prob = (1/sum(sum(prob)))*prob;   % normalize to a 2D pdf

% marginalizing
pdf_beta = sum(prob,1);   % sum over c    -> pdf in beta
pdf_c    = sum(prob,2);   % sum over beta -> pdf in c

Figures: Hist.jpg (histogram of the data with the fitted pdfs overlaid), Surf.jpg (surface plot of the 2D pdf over <math>\beta</math> and <math>c</math>).

Figures: Pdf c.jpg, Pdf beta.jpg (marginal pdfs of <math>c</math> and <math>\beta</math>).

To Think About

1. In problem 3, above, you assumed some definite prior for <math>c</math>. What if <math>c</math> is itself drawn (just once for the whole data set) from a distribution <math>P(c \mid \alpha)</math>, with unknown hyperparameter <math>\alpha</math>? How would you now estimate the Bayes posterior distribution of <math>\beta</math>, marginalizing over everything else?
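One way to make the question concrete (a sketch; the hyperprior <math>P(\alpha)</math> is a choice we would have to make): the posterior of <math>\beta</math> follows from integrating the joint probability over both the nuisance parameter and the hyperparameter,

```latex
P(\beta \mid \mathrm{data})
  \;\propto\; P(\beta)
  \int \! dc \int \! d\alpha \;
  P(\mathrm{data} \mid \beta, c)\,
  P(c \mid \alpha)\, P(\alpha)
```

Numerically this just adds one more dimension to the grid (or motivates switching to MCMC if the grid becomes too large).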