Segment 28. Gaussian Mixture Models in 1-D

From Computational Statistics (CSE383M and CS395T)

Watch this segment




Links to the slides: PDF file or PowerPoint file


To Calculate

1. Draw a sample of 100 points from the uniform distribution <math>U(0,1)</math>. This is your data set. Fit Gaussian mixture models to your sample (now considered as being on the interval <math>-\infty < x < \infty</math>) with an increasing number of components <math>K</math>, at least <math>K=1,\ldots,5</math>. Plot your models. Do they get better as <math>K</math> increases? Did you try multiple starting values to find the best (hopefully globally best) solution for each <math>K</math>?
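A minimal sketch of one way to do this exercise: a hand-rolled 1-D EM fit with several random restarts, keeping the best log-likelihood for each <math>K</math>. All function and variable names here are illustrative, not from the segment; the E-step uses the log-sum-exp trick to avoid the underflow issue raised in problem 2.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.uniform(0.0, 1.0, 100)   # the 100-point U(0,1) sample

def log_gaussian(x, mu, sig):
    # elementwise log of the normal density N(x; mu, sig^2)
    return -0.5 * np.log(2.0 * np.pi * sig**2) - (x - mu)**2 / (2.0 * sig**2)

def em_gmm_1d(x, K, n_iter=200, seed=0):
    """One EM run for a K-component 1-D GMM; returns (loglike, mu, sig, w)."""
    r = np.random.default_rng(seed)
    mu = r.choice(x, K)                  # random centers drawn from the data
    sig = np.full(K, x.std() + 1e-3)
    w = np.full(K, 1.0 / K)
    loglike = -np.inf
    for _ in range(n_iter):
        # E-step: log responsibilities, stabilized by log-sum-exp
        lp = np.log(w)[:, None] + log_gaussian(x[None, :], mu[:, None], sig[:, None])
        m = lp.max(axis=0)
        loglike = (m + np.log(np.exp(lp - m).sum(axis=0))).sum()
        resp = np.exp(lp - m)
        resp /= resp.sum(axis=0)
        # M-step: responsibility-weighted means, widths, and weights
        Nk = resp.sum(axis=1) + 1e-12    # tiny floor guards empty components
        mu = (resp * x).sum(axis=1) / Nk
        sig = np.sqrt((resp * (x - mu[:, None])**2).sum(axis=1) / Nk) + 1e-6
        w = Nk / len(x)
    return loglike, mu, sig, w

for K in range(1, 6):
    # ten restarts per K; keep the run with the best log-likelihood
    best = max((em_gmm_1d(data, K, seed=s) for s in range(10)),
               key=lambda t: t[0])
    print(K, round(best[0], 2))
```

Plotting the fitted mixtures (e.g. with matplotlib) against a histogram of the data shows what "better" means here: the training log-likelihood keeps creeping up with <math>K</math>, even though the underlying distribution has no Gaussian structure at all.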

2. Multiplying a lot of individual likelihoods will often underflow. (a) On average, how many values drawn from <math>U(0,1)</math> can you multiply before the product underflows to zero? (b) What, analytically, is the distribution of the sum of <math>N</math> independent values <math>\log(U)</math>, where <math>U\sim U(0,1)</math>? (c) Is your answer to (a) consistent with your answer to (b)?
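For part (a), a short simulation (a sketch; the trial count is arbitrary) just multiplies draws until the running product hits exactly 0.0. For part (b), note that <math>-\log U \sim \text{Exp}(1)</math>, so the sum of <math>N</math> values of <math>\log U</math> is minus a Gamma(<math>N</math>,1) (Erlang) variable, with mean <math>-N</math>; since IEEE doubles underflow to zero near <math>e^{-745}</math>, the two answers should agree.

```python
import numpy as np

rng = np.random.default_rng(0)

def count_until_underflow(rng):
    # multiply U(0,1) draws until the product underflows to exactly 0.0
    prod, n = 1.0, 0
    while prod > 0.0:
        prod *= rng.uniform()
        n += 1
    return n

counts = [count_until_underflow(rng) for _ in range(2000)]
# mean of log-product after N steps is -N; doubles hit zero near exp(-745),
# so the average count should come out in that neighborhood
print(np.mean(counts))
```

This is exactly why likelihood computations work with sums of log-likelihoods rather than products of likelihoods.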

To Think About

1. Suppose you want to approximate some analytically known function <math>f(x)</math> (whose integral is finite), as a sum of <math>K</math> Gaussians with different centers and widths. You could pretend that <math>f(x)</math> (or some scaling of it) was a probability distribution, draw <math>N</math> points from it and do the GMM thing to find the approximating Gaussians. Now take the limit <math>N\rightarrow \infty</math>, figure out how sums become integrals, and write down an iterative method for fitting Gaussians to a given <math>f(x)</math>. Does it work? (You can assume that well-defined definite integrals can be done numerically.)
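One way to start setting up the limit (a sketch only; working out whether the iteration converges is the point of the exercise): the E-step responsibilities become functions of <math>x</math>,

<math>r_k(x) = \frac{w_k\, N(x;\mu_k,\sigma_k)}{\sum_j w_j\, N(x;\mu_j,\sigma_j)}</math>

and the M-step sums over data points become integrals weighted by <math>f(x)</math>:

<math>\hat N_k = \int f(x)\, r_k(x)\, dx,\qquad \mu_k \leftarrow \frac{1}{\hat N_k}\int x\, f(x)\, r_k(x)\, dx,</math>

<math>\sigma_k^2 \leftarrow \frac{1}{\hat N_k}\int (x-\mu_k)^2\, f(x)\, r_k(x)\, dx,\qquad w_k \leftarrow \frac{\hat N_k}{\int f(x)\, dx}.</math>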