Travis: Segment 28

From Computational Statistics (CSE383M and CS395T)

To Calculate

1. Draw a sample of 100 points from the uniform distribution <math>U(0,1)</math>. This is your data set. Fit GMM models to your sample (now considered as being on the interval <math>-\infty < x < \infty</math>) with increasing numbers of components <math>K</math>, at least <math>K=1,\ldots,5</math>. Plot your models. Do they get better as <math>K</math> increases? Did you try multiple starting values to find the best (hopefully globally best) solutions for each <math>K</math>?

2. Multiplying a lot of individual likelihoods will often underflow. (a) On average, how many values drawn from <math>U(0,1)</math> can you multiply before the product underflows to zero? (b) What, analytically, is the distribution of the sum of <math>N</math> independent values <math>\log(U)</math>, where <math>U\sim U(0,1)</math>? (c) Is your answer to (a) consistent with your answer to (b)?
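
One way to approach problem 2(a) numerically is sketched below. It assumes IEEE 754 double precision, where the smallest positive (subnormal) value is <math>2^{-1074}</math>. The function names are illustrative only. For part (b), since <math>-\log(U)</math> is exponentially distributed with unit mean, the sum of <math>N</math> such terms is the negative of a Gamma-distributed variate with shape <math>N</math> and unit scale, so the simulated average can be compared against the point where that sum crosses the log of the smallest representable double.

```python
import math
import random

def multiplications_until_underflow(rng):
    """Count how many U(0,1) draws are multiplied before the product is exactly 0.0."""
    product = 1.0
    count = 0
    while product > 0.0:
        # 1 - random() lies in (0, 1], so a single draw is never exactly zero
        product *= 1.0 - rng.random()
        count += 1
    return count

def mean_count(trials=2000, seed=0):
    """Average the underflow count over many independent trials."""
    rng = random.Random(seed)
    total = sum(multiplications_until_underflow(rng) for _ in range(trials))
    return total / trials

# Log of the smallest positive double, 2**-1074 (roughly -744.4).
# Assumption: doubles follow IEEE 754 with gradual underflow.
LOG_TINY = -1074 * math.log(2.0)

if __name__ == "__main__":
    print("simulated mean count:", mean_count())
    print("log of smallest positive double:", LOG_TINY)
```

Because each factor contributes <math>-1</math> to the log of the product on average, the simulated mean should land close to <math>|\log 2^{-1074}|</math>, which is the consistency check asked for in part (c).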

To Think About

1. Suppose you want to approximate some analytically known function <math>f(x)</math> (whose integral is finite), as a sum of <math>K</math> Gaussians with different centers and widths. You could pretend that <math>f(x)</math> (or some scaling of it) was a probability distribution, draw <math>N</math> points from it and do the GMM thing to find the approximating Gaussians. Now take the limit <math>N\rightarrow \infty</math>, figure out how sums become integrals, and write down an iterative method for fitting Gaussians to a given <math>f(x)</math>. Does it work? (You can assume that well-defined definite integrals can be done numerically.)
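
A minimal sketch of one such iterative method follows, under stated assumptions: <math>f(x)</math> is nonnegative, the sample sums of the usual EM updates are replaced by integrals weighted by <math>f(x)</math>, and those integrals are approximated by a plain Riemann sum on a uniform grid over a finite interval. The names `fit_gaussians`, `lo`, `hi`, `n_grid`, and `n_iter` are hypothetical, and this is an illustration rather than a definitive answer to whether the method works in general.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def fit_gaussians(f, lo, hi, mus, sigmas, n_grid=1001, n_iter=200):
    """EM-style iteration with sample sums replaced by grid integrals of f.

    Assumes f >= 0 on [lo, hi] and that every component keeps nonzero mass.
    Returns (weights, mus, sigmas); the fit to f is total_mass * mixture.
    """
    K = len(mus)
    dx = (hi - lo) / (n_grid - 1)
    xs = [lo + i * dx for i in range(n_grid)]
    fs = [f(x) * dx for x in xs]      # f(x) dx at each grid point
    total = sum(fs)                    # approximates the integral of f
    weights = [1.0 / K] * K
    mus, sigmas = list(mus), list(sigmas)
    for _ in range(n_iter):
        s0 = [0.0] * K                 # integral of f * r_k
        s1 = [0.0] * K                 # integral of x * f * r_k
        s2 = [0.0] * K                 # integral of x^2 * f * r_k
        for x, fx in zip(xs, fs):
            p = [weights[k] * normal_pdf(x, mus[k], sigmas[k]) for k in range(K)]
            norm = sum(p)
            if norm == 0.0 or fx == 0.0:
                continue               # no mass or no responsibility here
            for k in range(K):
                r = fx * p[k] / norm   # responsibility times f(x) dx
                s0[k] += r
                s1[k] += r * x
                s2[k] += r * x * x
        for k in range(K):
            weights[k] = s0[k] / total
            mus[k] = s1[k] / s0[k]
            var = s2[k] / s0[k] - mus[k] ** 2
            sigmas[k] = math.sqrt(max(var, 1e-12))  # floor guards against collapse
    return weights, mus, sigmas

if __name__ == "__main__":
    # Illustrative target: a function that is itself a sum of two Gaussians.
    target = lambda x: 0.3 * normal_pdf(x, -2.0, 0.5) + 0.7 * normal_pdf(x, 1.5, 1.0)
    print(fit_gaussians(target, -10.0, 10.0, mus=[-1.0, 1.0], sigmas=[1.0, 1.0]))
```

Under these assumptions each pass does not increase the Kullback-Leibler divergence from the normalized <math>f(x)</math> to the mixture, so the result is a good fit in that sense rather than in, say, a least-squares sense. As with ordinary GMM fitting, it can depend on the starting centers and widths.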