Difference between revisions of "GMM activity"

From Computational Statistics Course Wiki

Revision as of 15:58, 2 April 2014

1. Draw a sample of 100 points from the uniform distribution $U(0,1)$. This is your data set. Fit GMM models to your sample (now considered as being on the interval $-\infty < x < \infty$) with increasing numbers of components $K$, at least $K=1,\ldots,5$. Plot your models. Do they get better as $K$ increases? Did you try multiple starting values to find the best (hopefully globally best) solutions for each $K$?
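One possible way to approach question 1 is sketched below: a hand-written 1-D EM fit, restarted from several random starting values, keeping the start with the highest log-likelihood. The function names (`mixture_density`, `fit_gmm_1d`), the variance floor, the number of restarts and the seeds are all illustrative assumptions, not part of the assignment; a packaged GMM fitter would serve equally well.

```python
import numpy as np


def mixture_density(x, w, mu, var):
    """Weighted Gaussian density of each component at each point, shape (n, k)."""
    diff = x[:, None] - mu[None, :]
    return w * np.exp(-0.5 * diff ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def fit_gmm_1d(x, k, n_starts=20, n_iter=300, var_floor=1e-4, seed=0):
    """Fit a k-component 1-D GMM by EM from several random starts.

    Returns (log_likelihood, weights, means, variances) of the best start.
    All defaults are arbitrary choices made for this sketch.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        # Random start: means picked from the data, equal weights, pooled variance.
        mu = rng.choice(x, size=k, replace=False)
        w = np.full(k, 1.0 / k)
        var = np.full(k, max(x.var(), var_floor))
        for _ in range(n_iter):
            # E-step: responsibility of each component for each point.
            dens = mixture_density(x, w, mu, var)
            resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
            # M-step: re-estimate weights, means and variances.
            nk = resp.sum(axis=0) + 1e-12
            w = nk / nk.sum()
            mu = (resp * x[:, None]).sum(axis=0) / nk
            var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
            var = np.maximum(var, var_floor)  # guard against collapsing components
        loglik = np.log(mixture_density(x, w, mu, var).sum(axis=1) + 1e-300).sum()
        if best is None or loglik > best[0]:
            best = (loglik, w, mu, var)
    return best


if __name__ == "__main__":
    sample = np.random.default_rng(1).uniform(0.0, 1.0, size=100)
    for k in range(1, 6):
        loglik, w, mu, var = fit_gmm_1d(sample, k)
        print(k, round(loglik, 2), np.round(np.sort(mu), 3))
```

Because a model with fewer components is a special case of one with more, the best attainable log-likelihood cannot decrease as $K$ grows, so the log-likelihood alone does not settle whether the models "get better"; comparing a penalized criterion such as BIC, or simply looking at the plotted densities against the flat $U(0,1)$ density, is one way to judge.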

Competition: Who can create the best visualization of the convergence of the iterative fitting process in question 1?

2. Using a pre-existing package (gmdistribution for Matlab, or scikit-learn, which is installed on the class server, for Python), construct mixture models like those shown in Segment slide 8 (for 3 components) and slide 9 (for 8 components). You should plot 2-sigma error ellipses for the individual components, as shown in those slides.

The data is at Twoexondata.txt or on the IPython server.
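For the 2-sigma error ellipses in question 2, one option is to compute the ellipse outline directly from each fitted component's mean and covariance matrix (in scikit-learn's `GaussianMixture` these are the `means_` and `covariances_` attributes). The helper below is a minimal sketch; the name `sigma_ellipse` and its parameters are assumptions made for illustration, and the returned points still need to be passed to whatever plotting routine you use.

```python
import numpy as np


def sigma_ellipse(mean, cov, n_sigma=2.0, n_points=200):
    """Return an (n_points, 2) array of points on the n_sigma ellipse
    of a 2-D Gaussian with the given mean and covariance."""
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    # Eigen-decomposition of the symmetric covariance: cov = V diag(vals) V^T.
    vals, vecs = np.linalg.eigh(cov)
    theta = np.linspace(0.0, 2.0 * np.pi, n_points)
    circle = np.stack([np.cos(theta), np.sin(theta)])  # unit circle, shape (2, n_points)
    # Stretch the circle by n_sigma * sqrt(eigenvalue) along each principal
    # axis, rotate it into data coordinates, then shift it to the mean.
    scaled = n_sigma * np.sqrt(vals)[:, None] * circle
    return (vecs @ scaled).T + mean


if __name__ == "__main__":
    outline = sigma_ellipse([1.0, 2.0], [[2.0, 0.6], [0.6, 1.0]])
    print(outline.shape)
```

Every point on the returned outline sits at Mahalanobis distance `n_sigma` from the mean, which is what "2-sigma ellipse" is taken to mean here.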

3. In your favorite computer language, write code for K-means clustering, and cluster the same data using (a) 3 components and (b) 8 components. Don't use anybody's K-means clustering package for this part: code it yourself. Hint: Don't try to do it as a limiting case of GMMs; just code it from the definition of K-means clustering, using an E-M iteration. Plot your results by coloring the data points according to which cluster they are in. How sensitive is your answer to the starting guesses?
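The E-M iteration mentioned in the hint might look roughly like the sketch below: the "E" step assigns each point to its nearest center, and the "M" step moves each center to the mean of its assigned points. The function name `kmeans`, the optional `init_centers` argument, and the stopping rule are assumptions chosen for illustration; treat this as one possible shape for the code rather than a reference solution.

```python
import numpy as np


def nearest_center(points, centers):
    """Index of the closest center (squared Euclidean distance) for each point."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(points, k, init_centers=None, n_iter=100, seed=0):
    """Cluster points into k groups; returns (centers, labels).

    If init_centers is None, k distinct data points are used as the
    starting guesses (an arbitrary choice made for this sketch).
    """
    points = np.asarray(points, dtype=float)
    if init_centers is None:
        rng = np.random.default_rng(seed)
        centers = points[rng.choice(len(points), size=k, replace=False)]
    else:
        centers = np.array(init_centers, dtype=float)
    for _ in range(n_iter):
        # E-step: assign every point to its nearest center.
        labels = nearest_center(points, centers)
        # M-step: move each center to the mean of its members.
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # leave an empty cluster's center where it is
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment, so the labels match the returned centers.
    return centers, nearest_center(points, centers)


if __name__ == "__main__":
    toy = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
    centers, labels = kmeans(toy, 2, init_centers=[[0, 0], [10, 10]])
    print(centers, labels)
```

Rerunning with different `seed` values (or different `init_centers`) and comparing the resulting colorings is a direct way to probe the sensitivity to starting guesses.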

Competition: Who can create the best visualization of the convergence of the iterative fitting processes in questions 2 and 3?

Award winners

Best in show, Best use of animation

Animation2 ts.gif

Best use of Matlab

Daniel resized.gif

Best use of Python

Nick's animation will go here

Best use of still image

Convergence transparency.png