# Difference between revisions of "GMM activity"

## Revision as of 21:03, 2 April 2014

**1.** Draw a sample of 100 points from the uniform distribution $U(0,1)$. This is your data set. Fit GMM models to your sample (now considered as being on the interval $-\infty < x < \infty$) with increasing numbers of components $K$, at least $K=1,\ldots,5$. Plot your models. Do they get better as $K$ increases? Did you try multiple starting values to find the best (hopefully globally best) solutions for each $K$?
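One possible way to set up question 1 is sketched below, assuming plain Python with no packages. The function names (`component_densities`, `log_likelihood`, `fit_gmm_1d`, `best_of_restarts`), the variance floor, the iteration count, and the seeds are all illustrative choices, not part of the assignment. The sketch runs EM from several random starting means for each $K$ and keeps the fit with the highest log-likelihood.

```python
import math
import random

def component_densities(x, weights, means, variances):
    """Weighted Gaussian densities w_j * N(x | m_j, v_j), one per component."""
    return [w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(weights, means, variances)]

def log_likelihood(data, weights, means, variances):
    """Log-likelihood of the data under the mixture (guarded against log(0))."""
    return sum(math.log(max(sum(component_densities(x, weights, means, variances)), 1e-300))
               for x in data)

def fit_gmm_1d(data, k, n_iter=100, seed=0, var_floor=1e-6):
    """Fit a 1-D GMM by EM. var_floor is an assumed guard against collapsing components."""
    rng = random.Random(seed)
    n = len(data)
    grand_mean = sum(data) / n
    means = rng.sample(data, k)  # starting means: k randomly chosen data points
    variances = [sum((x - grand_mean) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            p = component_densities(x, weights, means, variances)
            total = max(sum(p), 1e-300)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and variances from the responsibilities
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, var_floor)
    return weights, means, variances, log_likelihood(data, weights, means, variances)

def best_of_restarts(data, k, n_starts=5):
    """Run EM from several seeds and keep the fit with the highest log-likelihood."""
    fits = [fit_gmm_1d(data, k, seed=s) for s in range(n_starts)]
    return max(fits, key=lambda fit: fit[3])

sample_rng = random.Random(42)
sample = [sample_rng.random() for _ in range(100)]  # 100 draws from U(0,1)
results = {k: best_of_restarts(sample, k) for k in range(1, 6)}
for k, (w, m, v, ll) in results.items():
    print(k, round(ll, 2))
```

Keep in mind that the log-likelihood on the fitted sample tends to rise as components are added, so a higher value alone does not show that a larger $K$ is a better description of a uniform distribution.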

Competition: Who can create the best visualization of the convergence of the iterative fitting process in question 1?

**2.** Using a pre-existing package (gmdistribution for Matlab, or scikit-learn, which is installed on the class server, for Python), construct mixture models like those shown in Segment slide 8 (for 3 components) and slide 9 (for 8 components). You should plot 2-sigma error ellipses for the individual components, as shown in those slides.

The data is at Twoexondata.txt or on the IPython server.
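For the 2-sigma error ellipses, the geometry follows from the eigen-decomposition of each component's 2x2 covariance matrix: the semi-axes are 2 times the square roots of the eigenvalues, oriented along the eigenvectors. The sketch below works this out in closed form for a symmetric 2x2 matrix. The helper names `sigma_ellipse` and `ellipse_points` are hypothetical, and the means and covariances are assumed to come from whichever package you used for the fit.

```python
import math

def sigma_ellipse(cov, n_sigma=2.0):
    """Return (semi_major, semi_minor, angle_rad) for a symmetric 2x2 covariance.

    cov is [[a, b], [b, c]]; angle_rad is the direction of the major axis,
    measured counterclockwise from the x-axis.
    """
    a, b = cov[0]
    c = cov[1][1]
    half_trace = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    lam_big = half_trace + radius                # larger eigenvalue
    lam_small = max(half_trace - radius, 0.0)    # smaller eigenvalue, clipped at 0
    angle = 0.5 * math.atan2(2.0 * b, a - c)
    return n_sigma * math.sqrt(lam_big), n_sigma * math.sqrt(lam_small), angle

def ellipse_points(mean, cov, n_sigma=2.0, n_points=100):
    """Points on the n-sigma contour, ready to pass to any line-plotting routine."""
    major, minor, angle = sigma_ellipse(cov, n_sigma)
    ca, sa = math.cos(angle), math.sin(angle)
    pts = []
    for i in range(n_points + 1):
        t = 2.0 * math.pi * i / n_points
        u, v = major * math.cos(t), minor * math.sin(t)
        # rotate by the major-axis angle, then shift to the component mean
        pts.append((mean[0] + ca * u - sa * v, mean[1] + sa * u + ca * v))
    return pts

# Made-up covariance for illustration only, not fitted to the class data.
outline = ellipse_points((1.0, -1.0), [[2.0, 1.0], [1.0, 2.0]])
```

If you draw with matplotlib's `Ellipse` patch instead of plotting the outline, note that it expects the full width and height (twice the semi-axes) and the angle in degrees.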

**3.** In your favorite computer language, write code for K-means clustering, and cluster the same data using (a) 3 components and (b) 8 components. Don't use anybody's K-means clustering package for this part: code it yourself. Hint: Don't try to do it as a limiting case of GMMs; just code it from the definition of K-means clustering, using an E-M iteration. Plot your results by coloring the data points according to which cluster they are in. How sensitive is your answer to the starting guesses?
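As a rough illustration of the iteration the hint describes, here is a minimal sketch in plain Python, assuming points are stored as tuples. The E-like step assigns each point to its nearest center and the M-like step moves each center to the mean of its members. The names `kmeans`, `sq_dist`, and `inertia` are illustrative, and the three synthetic blobs are made-up stand-ins, not the Twoexondata.txt data. Running from several seeds and comparing the within-cluster sum of squares is one way to probe sensitivity to the starting guesses.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, n_iter=100, seed=0):
    """K-means from the definition; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # starting guesses: k random data points
    labels = [-1] * len(points)
    for _ in range(n_iter):
        # E-like step: assign every point to its nearest center
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the iteration has converged
        labels = new_labels
        # M-like step: move each center to the mean of its assigned points
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels

def inertia(points, centers, labels):
    """Within-cluster sum of squared distances (lower is tighter)."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))

# Synthetic stand-in data: three Gaussian blobs (assumed for illustration only).
data_rng = random.Random(1)
blobs = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points = [(data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
          for cx, cy in blobs for _ in range(50)]

runs = [kmeans(points, 3, seed=s) for s in range(10)]
inertias = [inertia(points, centers, labels) for centers, labels in runs]
best_inertia = min(inertias)
print(sorted(round(v, 1) for v in inertias))
```

If the sorted values are not all equal, some starting guesses ended in a worse local optimum than others.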

Competition: Who can create the best visualization of the convergence of the iterative fitting processes in questions 2 and 3?

## Honorable mention

## Award winners

### Best in show, Best use of animation

### Best use of Matlab: Daniel

### Best use of Python: Nick

Scroll down to see animations.

{{#widget:Iframe |url=http://nbviewer.ipython.org/github/CS395T/2014/blob/master/Nick%20Wilson%2004-02-14%20Segment%2029%20In%20Class%20Beauty%20Contest.ipynb |width=1000 |height=625 |border=1 }}