GMM activity

1. Draw a sample of 100 points from the uniform distribution $\displaystyle U(0,1)$. This is your data set. Fit GMMs to your sample (now treated as lying on the interval $\displaystyle -\infty < x < \infty$) with increasing numbers of components $\displaystyle K$, at least $\displaystyle K=1,\ldots,5$. Plot your models. Do they get better as $\displaystyle K$ increases? Did you try multiple starting values to find the best (hopefully globally best) solution for each $\displaystyle K$?
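
One possible way to set up question 1 is sketched below. It assumes scikit-learn's `GaussianMixture` (the activity does not prescribe a package for this question), and the seed, the 20 restarts per $\displaystyle K$, and the plotting grid are arbitrary illustrative choices. The `n_init` argument reruns EM from several starting values and keeps the best fit, which is one way to address the multiple-starts question.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)            # fixed seed: an arbitrary choice
x = rng.uniform(0.0, 1.0, size=100)       # the U(0,1) sample
X = x.reshape(-1, 1)                      # scikit-learn expects (n_samples, n_features)

grid = np.linspace(-0.5, 1.5, 400).reshape(-1, 1)   # points at which to evaluate each model
results = {}                              # K -> (total log-likelihood, BIC)
densities = {}                            # K -> fitted density on the grid

for K in range(1, 6):
    # n_init=20 runs EM from 20 initializations and keeps the best one
    gm = GaussianMixture(n_components=K, n_init=20, random_state=0).fit(X)
    total_loglik = gm.score(X) * len(X)   # score() is the mean log-likelihood per point
    results[K] = (total_loglik, gm.bic(X))
    densities[K] = np.exp(gm.score_samples(grid))   # score_samples() returns log-density
    print(f"K={K}  logL={total_loglik:8.2f}  BIC={results[K][1]:8.2f}")
```

Each entry of `densities` can be plotted against `grid.ravel()` on top of a histogram of `x`. The log-likelihood generally keeps rising with $\displaystyle K$, so a penalized score such as the BIC is one way to ask whether the models are actually getting "better".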

Competition: Who can create the best visualization of the convergence of the iterative fitting process in question 1?


2. Using a pre-existing package (gmdistribution for Matlab, or scikit-learn, which is installed on the class server, for Python), construct mixture models like those shown in Segment slide 8 (for 3 components) and slide 9 (for 8 components). You should plot 2-sigma error ellipses for the individual components, as shown in those slides.

The data is at Twoexondata.txt or on the IPython server.
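
A minimal sketch of the error-ellipse step in question 2, assuming scikit-learn's `GaussianMixture` with full covariances. The array `X` below is synthetic stand-in data, not the contents of Twoexondata.txt, and `ellipse_points` is a hypothetical helper name. The ellipse is the contour at Mahalanobis distance 2 from the component mean, built from the eigen-decomposition of the component covariance.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def ellipse_points(mean, cov, nsigma=2.0, n=100):
    """Return n points (shape (n, 2)) on the nsigma error ellipse of a 2-D Gaussian."""
    vals, vecs = np.linalg.eigh(cov)                     # principal variances and axes
    theta = np.linspace(0.0, 2.0 * np.pi, n)
    circle = np.stack([np.cos(theta), np.sin(theta)])    # unit circle, shape (2, n)
    # scale the circle by the standard deviations, rotate, then shift to the mean
    return (mean[:, None] + nsigma * vecs @ (np.sqrt(vals)[:, None] * circle)).T

# Synthetic stand-in data (assumption): three 2-D blobs, NOT the class data set
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(200, 2)),
    rng.normal(loc=(3.0, 1.0), scale=0.7, size=(200, 2)),
    rng.normal(loc=(1.0, 4.0), scale=0.4, size=(200, 2)),
])

gm = GaussianMixture(n_components=3, covariance_type="full",
                     n_init=5, random_state=0).fit(X)

# One ellipse per component; each is an (n, 2) array ready to plot as a closed curve
ellipses = [ellipse_points(m, c) for m, c in zip(gm.means_, gm.covariances_)]
```

Each array in `ellipses` can be drawn with a line plot of its two columns over a scatter plot of the data. Note that in two dimensions a 2-sigma ellipse encloses about 86% of a component's probability mass ($\displaystyle 1-e^{-2}$), not the 95% familiar from one dimension.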

3. In your favorite computer language, write code for K-means clustering, and cluster the same data using (a) 3 components and (b) 8 components. Don't use anybody's K-means clustering package for this part: code it yourself. Hint: Don't try to do it as a limiting case of GMMs; just code it from the definition of K-means clustering, using an E-M iteration. Plot your results by coloring the data points according to which cluster they are in. How sensitive is your answer to the starting guesses?
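
For orientation, the E-M iteration in question 3 might look roughly like the sketch below, written with NumPy only. It is one possible outline, not a reference solution: the function name `kmeans`, the random-point initialization, the iteration cap, and the synthetic stand-in data are all assumptions made for illustration.

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Plain K-means: alternate nearest-center assignment and center update."""
    rng = np.random.default_rng(seed)
    # starting guess (assumption): K distinct data points chosen at random
    centers = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    labels = None
    for _ in range(n_iter):
        # E-step: assign every point to its nearest center (squared Euclidean distance)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                   # assignments stopped changing
        labels = new_labels
        # M-step: move each center to the mean of its assigned points
        for k in range(K):
            members = X[labels == k]
            if len(members) > 0:                    # leave an empty cluster's center in place
                centers[k] = members.mean(axis=0)
    return centers, labels

# Synthetic stand-in data (assumption), NOT the class data set
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
               for c in [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]])

# Rerun from several starting guesses and compare the within-cluster sum of squares
runs = []
for seed in range(5):
    centers, labels = kmeans(X, K=3, seed=seed)
    inertia = ((X - centers[labels]) ** 2).sum()
    runs.append((centers, labels, inertia))
    print(f"seed={seed}  within-cluster sum of squares={inertia:.3f}")
```

Comparing the sum of squares across seeds gives a quick read on sensitivity to the starting guesses: runs that end at different values have converged to different local optima. The `labels` array can be passed directly as the color argument of a scatter plot.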

Competition: Who can create the best visualization of the convergence of the iterative fitting processes in questions 2 and 3?


Award winners

Best use of Python: Nick

Scroll down to see animations.

{{#widget:Iframe |url=http://nbviewer.ipython.org/github/CS395T/2014/blob/master/Nick%20Wilson%2004-02-14%20Segment%2029%20In%20Class%20Beauty%20Contest.ipynb |width=1000 |height=625 |border=1 }}