# GMM activity


## Revision as of 15:58, 2 April 2014

**1.** Draw a sample of 100 points from the uniform distribution $U(0,1)$. This is your data set. Fit GMM models to your sample (now considered as being on the interval $-\infty < x < \infty$) with increasing numbers of components $K$, at least $K=1,\ldots,5$. Plot your models. Do they get better as $K$ increases? Did you try multiple starting values to find the best (hopefully globally best) solutions for each $K$?
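As a starting point for question 1, here is a minimal sketch of a one-dimensional EM fit with random restarts, written with the standard library only. The function names (`fit_gmm_1d`, `best_of_restarts`), the variance floor, the iteration count, and the number of restarts are all illustrative assumptions, not part of the assignment, and the sketch does no plotting.

```python
import math
import random

TINY = 1e-300  # guards against division by zero and log(0) after underflow


def normal_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k, rng, n_iter=100, var_floor=1e-6):
    """Run EM from one random start; return (loglik, weights, means, variances)."""
    n = len(data)
    means = rng.sample(data, k)        # start the means at k distinct data points
    variances = [1.0 / 12.0] * k       # variance of U(0,1), used as an initial width
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) + TINY
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, variances from the responsibilities
        for j in range(k):
            nj = sum(r[j] for r in resp) + TINY
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, var_floor)  # floor keeps a component from collapsing
    loglik = sum(
        math.log(sum(w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)) + TINY)
        for x in data
    )
    return loglik, weights, means, variances


def best_of_restarts(data, k, n_starts=5, seed=0):
    """Keep the restart with the highest log-likelihood."""
    rng = random.Random(seed)
    return max((fit_gmm_1d(data, k, rng) for _ in range(n_starts)),
               key=lambda fit: fit[0])


sample_rng = random.Random(42)
sample = [sample_rng.random() for _ in range(100)]   # 100 draws from U(0,1)
results = {k: best_of_restarts(sample, k) for k in range(1, 6)}
for k, (loglik, weights, means, variances) in results.items():
    print(f"K={k}  log-likelihood={loglik:.2f}")
```

Comparing the restarts for a given $K$, not just the best one, is a quick way to see how often EM lands in a local optimum. Note that the log-likelihood on the training sample tends to rise with $K$ simply because the model has more parameters, so it is not by itself evidence that the model is "better".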

Competition: Who can create the best visualization of the convergence of the iterative fitting process in question 1?

**2.** Using a pre-existing package (gmdistribution for Matlab, or scikit-learn, which is installed on the class server, for Python), construct mixture models like those shown in Segment slide 8 (for 3 components) and slide 9 (for 8 components). You should plot 2-sigma error ellipses for the individual components, as shown in those slides.

The data is at Twoexondata.txt or on the IPython server.
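For the 2-sigma error ellipses in question 2, the following sketch turns one component's mean and 2x2 covariance into a closed curve that can be passed to any plotting routine. The helper name `error_ellipse` and the example mean and covariance are made up for illustration. In practice the inputs would come from the fitted model; in recent scikit-learn releases, for instance, a `GaussianMixture` fitted with `covariance_type="full"` exposes them as `means_` and `covariances_` (older releases used different names).

```python
import math


def error_ellipse(mean, cov, n_sigma=2.0, n_points=100):
    """Return (xs, ys) tracing the n_sigma ellipse of a 2-D Gaussian.

    mean is (mx, my); cov is ((sxx, sxy), (sxy, syy)).
    """
    (sxx, sxy), (_, syy) = cov
    # Closed-form eigenvalues of a symmetric 2x2 matrix
    half_trace = 0.5 * (sxx + syy)
    root = math.hypot(0.5 * (sxx - syy), sxy)
    lam_major, lam_minor = half_trace + root, half_trace - root
    # Angle of the major axis measured from the x axis
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    a = n_sigma * math.sqrt(lam_major)             # semi-major axis length
    b = n_sigma * math.sqrt(max(lam_minor, 0.0))   # semi-minor axis length
    xs, ys = [], []
    for i in range(n_points + 1):                  # +1 so the curve closes on itself
        t = 2.0 * math.pi * i / n_points
        u, v = a * math.cos(t), b * math.sin(t)
        xs.append(mean[0] + u * math.cos(theta) - v * math.sin(theta))
        ys.append(mean[1] + u * math.sin(theta) + v * math.cos(theta))
    return xs, ys


# Hypothetical component parameters, for illustration only
example_mean = (1.0, -1.0)
example_cov = ((2.0, 0.6), (0.6, 1.0))
xs, ys = error_ellipse(example_mean, example_cov)
```

With matplotlib, `plt.plot(xs, ys)` would draw the curve over a scatter plot of the data. Keep in mind that in two dimensions the 2-sigma ellipse encloses about 86% of a component's probability mass, not the roughly 95% familiar from the one-dimensional case.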

**3.** In your favorite computer language, write code for K-means clustering, and cluster the same data using (a) 3 components and (b) 8 components. Don't use anybody's K-means clustering package for this part: code it yourself. Hint: Don't try to do it as a limiting case of GMMs; just code it from the definition of K-means clustering, using an E-M iteration. Plot your results by coloring the data points according to which cluster they are in. How sensitive is your answer to the starting guesses?
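One way the E-M iteration for K-means might look is sketched below, in plain Python. The function names, the choice of random data points as starting centers, and the synthetic three-blob data set are all assumptions made for illustration; the sketch does not read Twoexondata.txt and does no plotting.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed=0, max_iter=100):
    """Return (centers, labels) for a list of coordinate tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)     # starting guess: k distinct data points
    labels = None
    for _ in range(max_iter):
        # Assignment ("E") step: each point joins its nearest center
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:        # no point changed cluster: converged
            break
        labels = new_labels
        # Update ("M") step: each center moves to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                 # an empty cluster keeps its old center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


# Synthetic stand-in data: three Gaussian blobs of 50 points each
data_rng = random.Random(1)
blob_centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points = [
    (cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
    for cx, cy in blob_centers
    for _ in range(50)
]

# Different seeds give different starting guesses; compare the final objective
for seed in range(3):
    centers, labels = kmeans(points, k=3, seed=seed)
    objective = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    print(f"seed={seed}  within-cluster sum of squares={objective:.2f}")
```

Running with several seeds and comparing the within-cluster sum of squares gives a direct, if crude, answer to the sensitivity question: runs that end with noticeably different values have converged to different local optima.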

Competition: Who can create the best visualization of the convergence of the iterative fitting processes in questions 2 and 3?

## Award winners

### Best in show, Best use of animation

### Best use of Matlab

### Best use of Python

Nick's animation will go here