# Nick Wilson

My favorite segments and class activities are marked with one to three stars (***) at the end of their titles; the more stars, the better.

Many of the plots from my experiments are in the right margin of this page. Click on the images for a larger version or go to the segment for full details and code.

## Segments

Segment 8: The improper prior 1/x is just a limiting case of a (completely proper) Lognormal prior as sigma goes to infinity.
Segment 10: The product of several probability-like values rapidly falls to 0.
Segment 13: Accuracy of the Normal approximation of the Binomial distribution.
Segment 13: If four random variables are (together) multinomially distributed, each separately is binomially distributed.
Segment 19: For a multivariate normal distribution, the quantity $\displaystyle ({\mathbf x-\mathbf\mu})^T{\mathbf\Sigma}^{-1}({\mathbf x-\mathbf\mu})$ , where $\displaystyle \mathbf x$ is a random draw from the multivariate normal, is $\displaystyle \chi^2$ distributed.
Segment 23: Distribution of a statistic estimated by bootstrap (top) and by drawing samples from the real population (bottom).
Segment 24: The generated t-values-squared are distributed as Chisquare(100).
Segment 27: Posterior distribution of mixture model parameters estimated from data.
Segment 28: 5-component GMM fit to 100 points drawn from U(0,1).
Segment 28: Number of U(0,1) numbers that can be multiplied before underflowing.
Segment 29: K-Means on two exon data.
Segment 29: GMM on two exon data.
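
The Segment 10 and Segment 28 captions above both concern products of many values in (0, 1) collapsing to zero in floating point. The following is a minimal sketch of that effect, not the code behind the plots; the function names and the seed are assumptions made for illustration. It counts how many U(0,1) draws can be multiplied before a double-precision product becomes exactly 0.0, and shows the usual workaround of summing logarithms.

```python
import math
import random


def count_until_underflow(rng):
    """Multiply U(0,1) draws together until the running product is exactly 0.0."""
    product, n = 1.0, 0
    while product > 0.0:
        product *= rng.random()
        n += 1
    return n


def log_product(values):
    """Log of the product, computed as a sum of logs so it stays representable."""
    return sum(math.log(v) for v in values)


rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
n = count_until_underflow(rng)

# 1 - random() lies in (0, 1], so log() is always defined.
draws = [1.0 - rng.random() for _ in range(n)]
print(n, log_product(draws))
```

Since the expected value of -ln(U) is 1 and the smallest positive double is roughly 5e-324, the count should land somewhere around several hundred draws, while the log-sum remains an ordinary finite number.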

### Segment 33 - Contingency Table Protocols and Exact Fisher Test (*)

Segment 33: Wald statistic vs. chi-square for every distinct 2 by 2 contingency table containing exactly 14 elements.

### Segment 37 - A Few Bits of Information Theory

No problems given.

### Segment 38 - Mutual Information

No problems given.

### Segment 39 - MCMC and Gibbs Sampling (**)

Segment 39: Likelihood at each step of the MCMC chain for the "to think about" urn problem.

### Segment 40 - Markov Chain Monte Carlo, Example 1 (***)

Turns out I like playing with MCMC. I ran a variety of experiments and made a bunch of colorful plots.

Segment 40: Best-fitting two-student-t model found with MCMC for the 2nd exon data.
Segment 40: Posterior distribution of parameters from MCMC for a 2-student-t model of the 2nd exon lengths.
Segment 40: Scatter plot matrix showing the posterior distribution for all pairs of parameters for a 2-student-t model of the 2nd exon lengths. Each point is colored based on the index of the point along the MCMC chain.
Segment 40: (Left) Scatter plot matrix showing the posterior distribution of nu and height_ratio for the 2-student-t model of the 2nd exon lengths. Each point is colored based on the index of the point along the MCMC chain. (Right) Log likelihood at each point along the chain.

### Segment 41 - Markov Chain Monte Carlo, Example 2

Segment 41: PDFs of the waiting times between every other Poisson event (left) and between every Poisson event at half the rate (right).
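
A small simulation can illustrate the comparison in this caption. This is a sketch under assumed settings (unit rate, arbitrary seed, hypothetical helper names), not the code used for the figure. Skipping every other event of a rate-λ Poisson process and halving the rate both give a mean waiting time of 2/λ, but the variances differ (2/λ² versus 4/λ²), so the two densities are not the same.

```python
import random
import statistics


def waiting_times(rate, n, rng):
    """n independent exponential gaps between consecutive Poisson events."""
    return [rng.expovariate(rate) for _ in range(n)]


def every_other(gaps):
    """Waiting time between every other event: the sum of two consecutive gaps."""
    return [gaps[i] + gaps[i + 1] for i in range(0, len(gaps) - 1, 2)]


rng = random.Random(1)  # arbitrary seed
rate = 1.0              # assumed rate, for illustration only

skip_one = every_other(waiting_times(rate, 200_000, rng))
half_rate = waiting_times(rate / 2, 100_000, rng)

for name, sample in [("every other event", skip_one), ("half the rate", half_rate)]:
    print(name, statistics.fmean(sample), statistics.variance(sample))
```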

### Segment 47 - Low-Rank Approximation of Data

No problems given.

## In-Class Activities

I wrote many of these activities from scratch after class to make sure I had a good understanding of the material.

### Segment 7 - Central Tendency and Moments

Plotting PDFs of distributions with given mean/variance/skewness/kurtosis.

### Segment 8 - Some Standard Distributions

Given 1000 values, estimate parameters assuming the data come from each of a few different distributions.

### Segment 10 - The Central Limit Theorem

Visualizing how the sum of 12 U(0, 1) variables minus 6 is approximately normally distributed with mean 0 and variance 1.

Class Activity: Segment 10: Difference between the standard normal distribution and the distribution of the sum of 12 U(0, 1) variables minus 6.
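
The approximation can be checked numerically with a short sketch like the one below. The function name, seed, and sample size are assumptions for illustration, not taken from the activity. Each U(0, 1) variable has mean 1/2 and variance 1/12, so the sum of twelve minus 6 has mean 0 and variance 1.

```python
import random
import statistics


def twelve_uniform_deviate(rng):
    """Sum of 12 independent U(0,1) draws, shifted to have mean 0."""
    return sum(rng.random() for _ in range(12)) - 6.0


rng = random.Random(2)  # arbitrary seed
sample = [twelve_uniform_deviate(rng) for _ in range(100_000)]

mean = statistics.fmean(sample)
var = statistics.variance(sample)
print(mean, var, min(sample), max(sample))
```

The match is only approximate: the sum is bounded to [-6, 6], so its tails are lighter than those of a true normal distribution.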

### Segment 11 - Random Deviates

Class Activity -- 02-12-14

Building our own U(0,1) random number generator.
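
One common way to build such a generator is a linear congruential generator (LCG). The sketch below is an assumption about what a minimal version might look like, not the generator written in class; the class name `LCG` is hypothetical, and the multiplier and increment are a widely published 32-bit parameter set rather than anything from the activity.

```python
class LCG:
    """Linear congruential generator: state -> (a * state + c) mod m."""

    def __init__(self, seed=12345):
        # Widely published 32-bit constants; an assumption, not the class's choice.
        self.a, self.c, self.m = 1664525, 1013904223, 2**32
        self.state = seed % self.m

    def random(self):
        """Advance the state and return a float in [0, 1)."""
        self.state = (self.a * self.state + self.c) % self.m
        return self.state / self.m


gen = LCG(seed=2014)
sample = [gen.random() for _ in range(10_000)]
print(sum(sample) / len(sample), min(sample), max(sample))
```

A generator this simple is fine for illustrating the idea, but its low-order bits are known to be weak, so it should not stand in for a library generator in real work.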

### Segment 13 - The Yeast Genome

Finding regions of chromosome 4 that code for proteins.

Class Activity: Segment 13: Histogram of p-values assigned to all of the ORFs.

### Segment 16 - Multiple Hypotheses

P-value practice.

Multiple hypothesis testing on ORFs.

### Segment 15 - The Towne Family - Again (*)

Group quiz. We won!

### Segment 20 - Nonlinear Least Squares Fitting

Least squares fitting: one dataset, several models.

### Segment 21 - Marginalize or Condition Uninteresting Fitted Parameters (*)

Find the volcano!

Class Activity: Segment 21: Temperature measurements around the volcano region.
Class Activity: Segment 21: Location of the volcano.

### Segment 23 - Bootstrap Estimation of Uncertainty

Class Activity -- 03-24-14

Estimate the uncertainty in a statistic.
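
As a rough illustration of the idea, the sketch below resamples a data set with replacement and uses the spread of the recomputed statistic as its uncertainty. The data are synthetic and the helper name `bootstrap_se` is hypothetical; neither comes from the activity. The statistic here is the mean, chosen because its bootstrap standard error can be compared against the familiar s/sqrt(n).

```python
import random
import statistics


def bootstrap_se(data, statistic, n_resamples, rng):
    """Standard deviation of `statistic` over resamples drawn with replacement."""
    n = len(data)
    estimates = [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]
    return statistics.stdev(estimates)


rng = random.Random(3)  # arbitrary seed
data = [rng.gauss(10.0, 2.0) for _ in range(200)]  # synthetic stand-in data

se_boot = bootstrap_se(data, statistics.fmean, 2000, rng)
se_formula = statistics.stdev(data) / len(data) ** 0.5
print(se_boot, se_formula)
```

The same function works for statistics without a simple formula, for example by passing `statistics.median` instead of `statistics.fmean`.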

### Segment 27 - Mixture Models

Class Activity -- 03-28-14

Estimate parameters of a mixture model.

### Segment 28 - Gaussian Mixture Models in 1-D

Netflix'ish data exploration.

Class Activity: Segment 28: Correlation was calculated between all pairs of movies in the Netflix'ish data and then the movies were clustered and sorted based on the clustering. The correlation plot clearly shows 4 different types of movies.

### Segment 29 - GMMs in N-Dimensions (***)

Beauty contest: visualize GMM convergence. Won the "Best use of Python" award.

Class Activity: Segment 29: Screenshot of GMM-fitting animation.

### Segment 30 - Expectation Maximization (EM) Methods

EM coin flip activity from the Nature article.
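
A minimal sketch of EM for a two-coin problem of this kind follows. It assumes each set of flips comes from one of two coins chosen with equal probability and that only the head counts are observed; the head counts, starting guesses, and function name are hypothetical illustration values, not taken from the activity.

```python
def em_two_coins(heads, flips, theta_a, theta_b, n_iter=50):
    """EM for an equal-weight mixture of two coins with unknown head probabilities."""
    for _ in range(n_iter):
        heads_a = tails_a = heads_b = tails_b = 0.0
        for h in heads:
            t = flips - h
            # E-step: responsibility of coin A for this set of flips.
            # The binomial coefficient is common to both terms and cancels.
            like_a = theta_a**h * (1 - theta_a) ** t
            like_b = theta_b**h * (1 - theta_b) ** t
            w_a = like_a / (like_a + like_b)
            heads_a += w_a * h
            tails_a += w_a * t
            heads_b += (1 - w_a) * h
            tails_b += (1 - w_a) * t
        # M-step: weighted fraction of heads attributed to each coin.
        theta_a = heads_a / (heads_a + tails_a)
        theta_b = heads_b / (heads_b + tails_b)
    return theta_a, theta_b


heads = [5, 9, 8, 4, 7]  # hypothetical head counts, 10 flips per set
theta_a, theta_b = em_two_coins(heads, flips=10, theta_a=0.6, theta_b=0.5)
print(theta_a, theta_b)
```

Because the two coins are interchangeable, which estimate ends up labeled A or B depends only on the starting guesses.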

### Segment 32 - Contingency Tables: A First Look

Analyzing contingency tables.

### Segment 33 - Contingency Table Protocols and Exact Fisher Test

Analyzing chess data with contingency tables.

### Segment 34 - Permutation Tests (***)

Speeding up the permutation test. I implemented two approaches: the one from the lecture (expanding the whole table) and one that samples from the hypergeometric distribution to permute the table.

Example of generating a permutation by sampling from the hypergeometric distribution (see activity for more details):

    Original table:
    [[ 4  1 14]
     [10  6 14]
     [51 35 60]]
    Original state:
    [[  0   0   0  19]
     [  0   0   0  30]
     [  0   0   0 146]
     [ 65  42  88   0]]
    Filling in (0,0)
    hyper(65, 130, 19)
    [[  4   0   0  15]
     [  0   0   0  30]
     [  0   0   0 146]
     [ 61  42  88   0]]
    Filling in (0,1)
    hyper(42, 88, 15)
    [[  4   8   0   7]
     [  0   0   0  30]
     [  0   0   0 146]
     [ 61  34  88   0]]
    Filling in (1,0)
    hyper(61, 122, 30)
    [[  4   8   0   7]
     [  6   0   0  24]
     [  0   0   0 146]
     [ 55  34  88   0]]
    Filling in (1,1)
    hyper(34, 88, 24)
    [[  4   8   0   7]
     [  6   9   0  15]
     [  0   0   0 146]
     [ 55  25  88   0]]
    Filling in the first 2 values in the last row.
    [[ 4  8  0  7]
     [ 6  9  0 15]
     [55 25  0 66]
     [ 0  0 88  0]]
    Filling in the first 2 values in the last column.
    [[ 4  8  7  0]
     [ 6  9 15  0]
     [55 25  0 66]
     [ 0  0 66  0]]
    Filling in the final cell at the bottom right corner
    [[ 4  8  7  0]
     [ 6  9 15  0]
     [55 25 66  0]
     [ 0  0  0  0]]
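
The trace above can be approximated with a short sketch. This is not the activity's code: the function names are hypothetical, and the `hypergeometric` helper is a plain-Python stand-in that assumes the (ngood, nbad, nsample) argument order the `hyper(...)` calls in the trace appear to follow. Each cell is drawn from the items still unassigned in its column versus those in later columns, with the sample size equal to what remains of the row total, so the last row and last column are forced by the margins.

```python
import random


def hypergeometric(ngood, nbad, nsample, rng):
    """Number of 'good' items among nsample draws without replacement."""
    good = 0
    for _ in range(nsample):
        if rng.randrange(ngood + nbad) < ngood:
            good += 1
            ngood -= 1
        else:
            nbad -= 1
    return good


def permuted_table(row_sums, col_sums, rng):
    """Draw one random table with the given margins, filled cell by cell."""
    col_left = list(col_sums)  # items still unassigned in each column
    table = []
    for row_total in row_sums:
        row, row_left = [], row_total
        for j in range(len(col_left)):
            later_cols = sum(col_left[j + 1:])
            x = hypergeometric(col_left[j], later_cols, row_left, rng)
            row.append(x)
            col_left[j] -= x
            row_left -= x
        table.append(row)
    return table


original = [[4, 1, 14], [10, 6, 14], [51, 35, 60]]
row_sums = [sum(r) for r in original]
col_sums = [sum(c) for c in zip(*original)]
table = permuted_table(row_sums, col_sums, random.Random(4))
print(table)
```

Every table produced this way has the same row and column totals as the original, which is the property the permutation test relies on.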


### Segment 41 - Markov Chain Monte Carlo, Example 2 (**)

Urns with weighted balls with MCMC.

Class Activity: Segment 41: Posterior distribution of colored ball weights found with MCMC.