GMM activity

From Computational Statistics Course Wiki
Revision as of 11:45, 2 April 2014 by Jeff Hussmann

1. Draw a sample of 100 points from the uniform distribution $U(0,1)$. This is your data set. Fit GMM models to your sample (now considered as being on the interval $-\infty < x < \infty$) with increasing numbers of components $K$, at least $K=1,\ldots,5$. Plot your models. Do they get better as $K$ increases? Did you try multiple starting values to find the best (hopefully globally best) solutions for each $K$?
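One possible starting point for question 1 is sketched below: a plain NumPy E-M fit of a 1-D Gaussian mixture, restarted from several random initializations, keeping the fit with the highest log-likelihood. The function names (`fit_gmm_1d`, `best_fit`), the fixed iteration count, the number of restarts, and the small variance floor (added so that a component cannot collapse onto a single point) are all assumptions made for illustration, not part of the assignment.

```python
import numpy as np

def log_joint(x, w, mu, var):
    """log(w_j * N(x_i | mu_j, var_j)) for every point i and component j."""
    return (np.log(w) - 0.5 * np.log(2.0 * np.pi * var)
            - 0.5 * (x[:, None] - mu) ** 2 / var)

def log_likelihood(x, w, mu, var):
    """Total log-likelihood of the sample under the mixture."""
    return np.logaddexp.reduce(log_joint(x, w, mu, var), axis=1).sum()

def fit_gmm_1d(x, k, n_iter=200, seed=0, var_floor=1e-6):
    """Fit a k-component 1-D GMM by E-M from one random start."""
    rng = np.random.default_rng(seed)
    # Initial guess: means at random data points, common variance, equal weights.
    mu = rng.choice(x, size=k, replace=False)
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] = P(component j | x_i).
        lj = log_joint(x, w, mu, var)
        r = np.exp(lj - np.logaddexp.reduce(lj, axis=1)[:, None])
        # M-step: re-estimate weights, means, and variances.
        nk = r.sum(axis=0) + 1e-12          # guard against empty components
        w = nk / nk.sum()
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + var_floor  # assumed floor
    return w, mu, var, log_likelihood(x, w, mu, var)

def best_fit(x, k, n_starts=10):
    """Run E-M from several seeds and keep the highest log-likelihood fit."""
    fits = [fit_gmm_1d(x, k, seed=s) for s in range(n_starts)]
    return max(fits, key=lambda f: f[3])

rng = np.random.default_rng(42)
x = rng.uniform(0.0, 1.0, size=100)          # the data set: 100 draws from U(0,1)
results = {k: best_fit(x, k) for k in range(1, 6)}
for k, (w, mu, var, ll) in results.items():
    print(k, round(float(ll), 2))
```

Comparing the printed log-likelihoods across $K$ is only part of the answer to "do they get better": the log-likelihood of a well-converged fit can only go up as components are added, so it is worth also plotting each fitted density against the flat $U(0,1)$ density. Recording the log-likelihood at every iteration inside the loop would give the raw material for the convergence visualization.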

Competition: Who can create the best visualization of the convergence of the iterative fitting process in question 1?

2. Using a pre-existing package (gmdistribution for Matlab, or scikit-learn, which is installed on the class server, for Python), construct mixture models like those shown in Segment slide 8 (for 3 components) and slide 9 (for 8 components). You should plot 2-sigma error ellipses for the individual components, as shown in those slides.

The data is at Twoexondata.txt or on the IPython server.
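For the 2-sigma error ellipses, one common approach is to take the eigen-decomposition of each component's covariance matrix and map the unit circle through it. The sketch below assumes a 2-D mean and a full 2x2 covariance per component; the helper name `sigma_ellipse` and the example mean and covariance are made up for illustration and are not taken from the course data.

```python
import numpy as np

def sigma_ellipse(mean, cov, n_sigma=2.0, n_points=200):
    """Return (n_points, 2) points on the n_sigma contour of a 2-D Gaussian."""
    # Eigenvalues of the covariance are the variances along the principal axes;
    # the eigenvectors give the directions of those axes.
    vals, vecs = np.linalg.eigh(cov)
    t = np.linspace(0.0, 2.0 * np.pi, n_points)
    circle = np.stack([np.cos(t), np.sin(t)])            # unit circle, shape (2, n_points)
    # Stretch by n_sigma * sqrt(eigenvalue), rotate into the eigenvector basis,
    # then shift to the component mean.
    pts = vecs @ (n_sigma * np.sqrt(vals)[:, None] * circle)
    return pts.T + np.asarray(mean, dtype=float)

# Illustrative (made-up) component parameters.
mean = np.array([1.0, -2.0])
cov = np.array([[2.0, 0.8],
                [0.8, 1.0]])
pts = sigma_ellipse(mean, cov)
```

The returned points can be drawn with `matplotlib.pyplot.plot(pts[:, 0], pts[:, 1])` on top of a scatter plot of the data. In recent scikit-learn versions, a model fitted with `sklearn.mixture.GaussianMixture(n_components=3, covariance_type="full")` exposes per-component `means_` and `covariances_` that can be passed to this helper one component at a time. In two dimensions the 2-sigma contour encloses about 86% of a component's probability mass, not 95%.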

3. In your favorite computer language, write code for K-means clustering, and cluster the same data using (a) 3 components and (b) 8 components. Don't use anybody's K-means clustering package for this part: code it yourself. Hint: Don't try to do it as a limiting case of GMMs; just code it from the definition of K-means clustering, using an E-M iteration. Plot your results by coloring the data points according to which cluster they are in. How sensitive is your answer to the starting guesses?
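As a rough outline of what "from the definition" might look like, the sketch below alternates an assignment step (each point goes to its nearest center) with an update step (each center moves to the mean of its points). The function name `kmeans`, the initialization from randomly chosen data points, the stopping rule, and the synthetic three-blob test data are all assumptions for illustration; the course data set is not used here.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """K-means from the definition: alternate assignment and mean-update steps."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Starting guess: k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # E-step: assign every point to its nearest center (squared Euclidean distance).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                            # assignments stopped changing
        labels = new_labels
        # M-step: move each center to the mean of the points assigned to it.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:             # leave an empty cluster's center in place
                centers[j] = members.mean(axis=0)
    # Within-cluster sum of squares, useful for comparing different starts.
    inertia = ((X - centers[labels]) ** 2).sum()
    return centers, labels, inertia

# Synthetic stand-in data: three 2-D blobs (not the course data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
               for c in [(0, 0), (3, 0), (0, 3)]])
runs = [kmeans(X, 3, seed=s) for s in range(5)]
print([round(float(r[2]), 2) for r in runs])
```

Running from several seeds and comparing the within-cluster sum of squares is one way to probe sensitivity to the starting guesses. The `labels` array can be passed as the `c` argument of `matplotlib.pyplot.scatter` to color points by cluster.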

Competition: Who can create the best visualization of the convergence of the iterative fitting processes in questions 2 and 3?