Nick Wilson

From Computational Statistics Course Wiki

My favorite segments and class activities have 1-3 stars (***) -- the more the better -- at the end of their titles.

Many of the plots from my experiments are in the right margin of this page. Click on the images for a larger version or go to the segment for full details and code.




Segments

Segment 8: The improper prior 1/x is just a limiting case of a (completely proper) Lognormal prior as sigma goes to infinity.
Segment 10: The product of several probability-like values rapidly falls to 0.
Segment 13: Accuracy of the Normal approximation of the Binomial distribution.
Segment 13: If four random variables are (together) multinomially distributed, each separately is binomially distributed.
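Segment 19: For an N-dimensional multivariate normal distribution $\mathcal{N}(\boldsymbol{\mu}, \Sigma)$, the quantity $(\mathbf{x}-\boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})$, where $\mathbf{x}$ is a random draw from the multivariate normal, is $\chi^2_N$ distributed.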
Segment 23: Distribution of a statistic estimated by bootstrap (top) and by drawing samples from the real population (bottom).
Segment 24: The generated t-values-squared are distributed as Chisquare(100).
Segment 27: Posterior distribution of mixture model parameters estimated from data.
Segment 28: 5-component GMM fit to 100 points drawn from U(0,1).
Segment 28: Number of U(0,1) numbers that can be multiplied before underflowing.
Segment 29: K-Means on two exon data.
Segment 29: GMM on two exon data.
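The Segment 10 and Segment 28 captions above are both about underflow. Below is a minimal standard-library sketch. The function names are illustrative and not taken from the original code. It counts how many U(0,1) draws can be multiplied before the double-precision product becomes exactly 0.0, and it shows the usual fix of summing logs instead. For U ~ U(0,1), E[-ln U] = 1, so each factor adds 1 to -ln(product) on average. The smallest positive double is 2^-1074 (about 4.9e-324, or e^-744). The count should therefore land around 745.

```python
import math
import random


def multiplications_until_underflow(rng: random.Random) -> int:
    """Multiply U(0,1) draws into a running product and return how many
    factors it took for the product to underflow to exactly 0.0."""
    product = 1.0
    count = 0
    while product > 0.0:
        product *= rng.random()
        count += 1
    return count


def log_product(values) -> float:
    """Log of a product of probability-like values, computed as a sum of logs
    so it stays finite long after the direct product has underflowed."""
    return sum(math.log(v) for v in values)


rng = random.Random(0)
counts = [multiplications_until_underflow(rng) for _ in range(200)]
print("mean factors before underflow:", sum(counts) / len(counts))
print("direct product of 2000 halves:", math.prod([0.5] * 2000))
print("log of that product:", log_product([0.5] * 2000))
```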


Segment 1 - Let's Talk about Probability

Segment 1 - Nick Wilson


Segment 2 - Bayes

Segment 2 - Nick Wilson


Segment 3 - Monty Hall

Segment 3 - Nick Wilson


Segment 4 - The Jailer's Tip

Segment 4 - Nick Wilson


Segment 5 - Bernoulli Trials

Segment 5 - Nick Wilson


Segment 6 - The Towne Family Tree

Segment 6 - Nick Wilson


Segment 7 - Central Tendency and Moments

Segment 7 - Nick Wilson


Segment 8 - Some Standard Distributions (*)

Segment 8 - Nick Wilson
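The Segment 8 caption says the improper prior 1/x is the limit of a Lognormal prior as sigma grows. One way to check this numerically, sketched here with mu = 0 chosen arbitrarily, is to look at a density ratio. The normalizing constant cancels, and for the Lognormal p(x1)/p(x2) = (x2/x1) exp(-[(ln x1 - mu)^2 - (ln x2 - mu)^2] / (2 sigma^2)), which tends to x2/x1 as sigma goes to infinity. That is exactly the ratio under 1/x.

```python
import math


def lognormal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of Lognormal(mu, sigma) at x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def shape_ratio(x1: float, x2: float, mu: float, sigma: float) -> float:
    """p(x1) / p(x2); the normalizing constant cancels, leaving only the shape."""
    return lognormal_pdf(x1, mu, sigma) / lognormal_pdf(x2, mu, sigma)


# Under the improper prior 1/x, p(2) / p(10) = 10 / 2 = 5.
for sigma in (1.0, 10.0, 100.0, 1000.0):
    print(f"sigma={sigma:>7}: p(2)/p(10) = {shape_ratio(2.0, 10.0, 0.0, sigma):.6f}")
```

The ratio falls from about 56 at sigma = 1 toward 5 as sigma grows, while the normalized density itself flattens toward 0.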


Segment 9 - Characteristic Functions

Segment 9 - Nick Wilson


Segment 10 - The Central Limit Theorem (*)

Segment 10 - Nick Wilson


Segment 11 - Random Deviates

Segment 11 - Nick Wilson


Segment 12 - P-Value Tests

Segment 12 - Nick Wilson


Segment 13 - The Yeast Genome

Segment 13 - Nick Wilson
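The two Segment 13 captions make claims that can be checked directly. First, each category of a multinomial is marginally binomial. Second, the normal approximation to the binomial improves as n grows. The sketch below tests both with the standard library. The category probabilities, the sample sizes and p = 0.3 are arbitrary choices for illustration.

```python
import math
import random
from collections import Counter


def multinomial_draw(rng, n, probs):
    """One multinomial draw: n independent categorical trials, tallied per category."""
    tally = Counter(rng.choices(range(len(probs)), weights=probs, k=n))
    return [tally[i] for i in range(len(probs))]


def binomial_pmf(k, n, p):
    """Binomial(n, p) probability of k, computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def normal_approx_pmf(k, n, p):
    """Normal density with the binomial's mean np and variance np(1-p), evaluated at k."""
    mean, var = n * p, n * p * (1 - p)
    return math.exp(-(k - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def max_normal_error(n, p):
    """Largest absolute gap between the binomial pmf and its normal approximation."""
    return max(abs(binomial_pmf(k, n, p) - normal_approx_pmf(k, n, p)) for k in range(n + 1))


# The first of four multinomial categories should be Binomial(50, 0.1):
# mean 50 * 0.1 = 5 and variance 50 * 0.1 * 0.9 = 4.5.
rng = random.Random(1)
probs = [0.1, 0.2, 0.3, 0.4]
first = [multinomial_draw(rng, 50, probs)[0] for _ in range(10000)]
mean = sum(first) / len(first)
var = sum((x - mean) ** 2 for x in first) / (len(first) - 1)
print(f"first-category mean {mean:.3f} (expect 5), variance {var:.3f} (expect 4.5)")
print("max normal-approx error, n=20:", max_normal_error(20, 0.3))
print("max normal-approx error, n=1000:", max_normal_error(1000, 0.3))
```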


Segment 14 - Bayesian Criticism of P-Values

Segment 14 - Nick Wilson


Segment 16 - Multiple Hypotheses

Segment 16 - Nick Wilson


Segment 15 - The Towne Family - Again

Segment 15 - Nick Wilson


Segment 17 - The Multivariate Normal Distribution

Segment 17 - Nick Wilson


Segment 18 - The Correlation Matrix

Segment 18 - Nick Wilson


Segment 19 - The Chi Square Statistic

Segment 19 - Nick Wilson
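For $\mathbf{x}$ drawn from an N-dimensional normal $\mathcal{N}(\boldsymbol{\mu}, \Sigma)$, the quadratic form $(\mathbf{x}-\boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})$ follows $\chi^2_N$. The sketch below checks the 2-D case using only the standard library. It draws through a hand-written 2x2 Cholesky factor and inverts the covariance explicitly. The mean and covariance values are arbitrary. For $\chi^2_2$ the mean is 2 and the CDF is $1 - e^{-t/2}$, so half the draws should fall below $2 \ln 2$.

```python
import math
import random


def draw_bivariate_normal(rng, mu, cov):
    """Draw x = mu + L z, where L is the Cholesky factor of the 2x2 covariance."""
    (a, b), (_, c) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2)


def quadratic_form(x, mu, cov):
    """(x - mu)^T cov^{-1} (x - mu) for a 2x2 covariance, using the explicit inverse."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    d1, d2 = x[0] - mu[0], x[1] - mu[1]
    return (c * d1 * d1 - 2.0 * b * d1 * d2 + a * d2 * d2) / det


rng = random.Random(2)
mu = (1.0, -2.0)
cov = ((2.0, 0.8), (0.8, 1.0))
q = [quadratic_form(draw_bivariate_normal(rng, mu, cov), mu, cov) for _ in range(20000)]

# Chi-square with 2 degrees of freedom: mean 2, median 2 ln 2.
print("mean of q:", sum(q) / len(q))
print("fraction below 2 ln 2:", sum(v < 2 * math.log(2) for v in q) / len(q))
```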


Segment 20 - Nonlinear Least Squares Fitting

Segment 20 - Nick Wilson


Segment 21 - Marginalize or Condition Uninteresting Fitted Parameters

Segment 21 - Nick Wilson


Segment 22 - Uncertainty of Derived Parameters

Segment 22 - Nick Wilson


Segment 23 - Bootstrap Estimation of Uncertainty

Segment 23 - Nick Wilson


Segment 24 - Goodness of Fit

Segment 24 - Nick Wilson


Segment 27 - Mixture Models

Segment 27 - Nick Wilson


Segment 28 - Gaussian Mixture Models in 1-D (**)

Segment 28 - Nick Wilson
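Below is a compact EM sketch for a 1-D Gaussian mixture, a generic textbook version rather than the code behind the Segment 28 plots. Two details are assumptions of this sketch: means are initialized at evenly spaced quantiles, and there is a small floor on component variances. Responsibilities are normalized with the log-sum-exp trick. This is the same underflow concern as in the Segment 28 caption about multiplying U(0,1) numbers.

```python
import math
import random


def fit_gmm_1d(data, k, n_iter=100):
    """EM for a 1-D Gaussian mixture with k components.

    Responsibilities are normalized with the log-sum-exp trick, so a point far
    from every component does not underflow to 0 / 0.
    """
    n = len(data)
    ordered = sorted(data)
    # Initialization: means at evenly spaced quantiles, overall variance, equal weights.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    overall = sum(data) / n
    variances = [sum((x - overall) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: resp[i][j] = P(component j | x_i) under the current parameters.
        resp = []
        for x in data:
            logs = [math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            scaled = [math.exp(lg - top) for lg in logs]
            total = sum(scaled)
            resp.append([s / total for s in scaled])
        # M-step: re-estimate each component from its responsibility-weighted points.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)  # guard against an empty component
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(spread, 1e-6)  # floor keeps a collapsing component finite
    return weights, means, variances


rng = random.Random(6)
data = [rng.gauss(-3.0, 1.0) for _ in range(300)] + [rng.gauss(3.0, 1.0) for _ in range(300)]
weights, means, variances = fit_gmm_1d(data, k=2)
print("weights", weights, "means", means, "variances", variances)
```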


Segment 29 - GMMs in N-Dimensions

Segment 29 - Nick Wilson


Segment 30 - Expectation Maximization (EM) Methods

Segment 30 - Nick Wilson


Segment 31 - A Tale of Model Selection

Segment 31 - Nick Wilson


Segment 32 - Contingency Tables: A First Look

Segment 32 - Nick Wilson


Segment 33 - Contingency Table Protocols and Exact Fisher Test (*)

Segment 33 - Nick Wilson

Segment 33: Wald statistic vs. chi-square statistic for every distinct 2x2 contingency table with a total count of 14.
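Below is a standard-library sketch of the enumeration behind the Segment 33 plot. There are C(17, 3) = 680 ways to split a total of 14 across four cells. Pearson's chi-square uses the standard 2x2 shortcut N(ad - bc)^2 / (r1 r2 c1 c2). The page does not say which Wald statistic was plotted. The difference-in-proportions form with unpooled variance used below is therefore an assumption.

```python
def distinct_tables(total):
    """Yield every 2x2 table (a, b, c, d) = [[a, b], [c, d]] of non-negative counts summing to total."""
    for a in range(total + 1):
        for b in range(total + 1 - a):
            for c in range(total + 1 - a - b):
                yield a, b, c, total - a - b - c


def pearson_chi_square(a, b, c, d):
    """Pearson chi-square for a 2x2 table; None if a margin is zero (an expected count would be 0)."""
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    if 0 in (r1, r2, c1, c2):
        return None
    return n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)


def wald_difference_in_proportions(a, b, c, d):
    """ASSUMED Wald statistic: squared difference of the two row proportions divided
    by its unpooled variance estimate; None when that variance is zero."""
    n1, n2 = a + b, c + d
    if n1 == 0 or n2 == 0:
        return None
    p1, p2 = a / n1, c / n2
    var = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
    if var == 0:
        return None
    return (p1 - p2) ** 2 / var


tables = list(distinct_tables(14))
pairs = [(wald_difference_in_proportions(*t), pearson_chi_square(*t)) for t in tables]
usable = [(w, x) for w, x in pairs if w is not None and x is not None]
print(len(tables), "tables,", len(usable), "with both statistics defined")
```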


Segment 34 - Permutation Tests

Segment 34 - Nick Wilson


Segment 37 - A Few Bits of Information Theory

No problems given.


Segment 38 - Mutual Information

No problems given.


Segment 39 - MCMC and Gibbs Sampling (**)

Segment 39 - Nick Wilson

Segment 39: Likelihood at each step of the MCMC chain for the "to think about" urn problem.
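The "to think about" urn problem is not reproduced on this page, so the Gibbs sketch below uses a generic stand-in: a standard bivariate normal with correlation rho. Both full conditionals are normal, x1 | x2 ~ N(rho x2, 1 - rho^2) and symmetrically for x2. The sampler therefore just alternates exact conditional draws. The burn-in length and rho = 0.8 are arbitrary choices.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_samples, burn_in=1000, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Alternates x1 | x2 ~ N(rho * x2, 1 - rho^2) and x2 | x1 ~ N(rho * x1, 1 - rho^2),
    discarding the first burn_in sweeps.
    """
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)
    x1, x2 = 0.0, 0.0
    samples = []
    for step in range(burn_in + n_samples):
        x1 = rng.gauss(rho * x2, sd)
        x2 = rng.gauss(rho * x1, sd)
        if step >= burn_in:
            samples.append((x1, x2))
    return samples


def sample_correlation(pairs):
    n = len(pairs)
    m1 = sum(p[0] for p in pairs) / n
    m2 = sum(p[1] for p in pairs) / n
    cov = sum((p[0] - m1) * (p[1] - m2) for p in pairs)
    v1 = sum((p[0] - m1) ** 2 for p in pairs)
    v2 = sum((p[1] - m2) ** 2 for p in pairs)
    return cov / math.sqrt(v1 * v2)


chain = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
print("sample correlation:", sample_correlation(chain))  # should be close to 0.8
```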


Segment 40 - Markov Chain Monte Carlo, Example 1 (***)

Segment 40 - Nick Wilson

Turns out I like playing with MCMC. I ran a variety of experiments and made a bunch of colorful plots.

Segment 40: Best-fitting two-student-t model found with MCMC for the 2nd exon data.
Segment 40: Posterior distribution of parameters from MCMC for a 2-student-t model of the 2nd exon lengths.
Segment 40: Scatter plot matrix showing the posterior distribution for all pairs of parameters for a 2-student-t model of the 2nd exon lengths. Each point is colored based on the index of the point along the MCMC chain.
Segment 40: (Left) Scatter plot matrix showing the posterior distribution of nu and height_ratio for the 2-student-t model of the 2nd exon lengths. Each point is colored based on the index of the point along the MCMC chain. (Right) Log likelihood at each point along the chain.


Segment 41 - Markov Chain Monte Carlo, Example 2

Segment 41 - Nick Wilson

Segment 41: PDFs of the waiting times between every other Poisson event (left), and between consecutive Poisson events at half the rate (right).
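The two waiting-time distributions in the Segment 41 caption have the same mean but different shapes. The gap between every other event of a rate-lambda Poisson process is the sum of two Exponential(lambda) gaps, which is Gamma(2, lambda) with variance 2/lambda^2. The gap between consecutive events at half the rate is Exponential(lambda/2) with variance 4/lambda^2. Both have mean 2/lambda. Below is a standard-library simulation sketch with lambda = 1 chosen for illustration.

```python
import random


def waiting_times_every_other_event(rng, rate, n):
    """Gaps between every 2nd event of a rate-`rate` Poisson process:
    each is the sum of two Exponential(rate) inter-arrival times, i.e. Gamma(2, rate)."""
    return [rng.expovariate(rate) + rng.expovariate(rate) for _ in range(n)]


def waiting_times_half_rate(rng, rate, n):
    """Gaps between consecutive events of a Poisson process at rate / 2: Exponential(rate / 2)."""
    return [rng.expovariate(rate / 2) for _ in range(n)]


def mean_and_variance(xs):
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


rng = random.Random(3)
rate = 1.0
every_other = waiting_times_every_other_event(rng, rate, 50000)
half_rate = waiting_times_half_rate(rng, rate, 50000)
# Same mean 2 / rate, but variance 2 / rate^2 (Gamma) vs 4 / rate^2 (Exponential).
print("every other event:", mean_and_variance(every_other))
print("half the rate:    ", mean_and_variance(half_rate))
```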


Segment 47 - Low-Rank Approximation of Data

No problems given.


Segment 48 - Principal Component Analysis (PCA)

Segment 48 - Nick Wilson


In-Class Activities

I rewrote many of these activities from scratch after class to make sure I understood the material.


Segment 2 - Bayes

01-22-14 -- Group 1 -- Class Activities

Knight/Troll/Gnome simulation. Gift box calculations.


Segment 7 - Central Tendency and Moments

02-03-14 -- Group 4 -- Class Activities

Plotting PDFs of distributions with given mean/variance/skewness/kurtosis.


Segment 8 - Some Standard Distributions

02-05-14 -- Group 4 -- Class Activities

Given 1000 values, estimate the parameters of each of several candidate distributions, assuming in turn that the data came from each one.


Segment 10 - The Central Limit Theorem

02-10-14 -- Nick Wilson -- Class Activities

Visualizing how the sum of 12 U(0, 1) variables minus 6 approximately follows the normal distribution with mean 0 and variance 1 (the sum has mean 12/2 = 6 and variance 12/12 = 1).

Class Activity: Segment 10: Difference between the standard normal distribution and 12 U(0, 1) variables minus 6.
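Below is a standard-library sketch of the Segment 10 activity. Each U(0,1) draw has mean 1/2 and variance 1/12, so the sum of 12 draws minus 6 has mean 0 and variance 1. The sketch measures the largest gap between its empirical CDF and the standard normal CDF, a Kolmogorov-Smirnov-style distance. One visible difference is that the approximation can never exceed 6 in absolute value, while a true normal can.

```python
import math
import random


def twelve_uniforms_minus_six(rng):
    """Sum of 12 U(0,1) draws minus 6: mean 12 * 1/2 - 6 = 0, variance 12 * 1/12 = 1."""
    return sum(rng.random() for _ in range(12)) - 6.0


def standard_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


rng = random.Random(4)
draws = sorted(twelve_uniforms_minus_six(rng) for _ in range(50000))
n = len(draws)
mean = sum(draws) / n
var = sum((x - mean) ** 2 for x in draws) / (n - 1)

# Largest gap between the empirical CDF (a step function) and the standard normal CDF.
ks = max(max(abs((i + 1) / n - standard_normal_cdf(x)), abs(i / n - standard_normal_cdf(x)))
         for i, x in enumerate(draws))
print(f"mean {mean:.4f}, variance {var:.4f}, max CDF gap {ks:.4f}")
print("largest |draw|:", max(abs(draws[0]), abs(draws[-1])), "(can never exceed 6)")
```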


Segment 11 - Random Deviates

Class Activities -- 02-12-14

Building our own U(0,1) random number generator.
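The page does not say which generator was built in class. As one possibility, here is a sketch of a 64-bit xorshift generator using Marsaglia's shift triple (13, 7, 17). The class name and seed are illustrative. To map the state to the open interval (0, 1), it keeps the top 52 bits and adds 0.5. Every result is then exactly representable, so the output can never be exactly 0 or 1.

```python
class XorShift64:
    """64-bit xorshift generator with shift triple (13, 7, 17); the state must be non-zero."""

    MASK = (1 << 64) - 1

    def __init__(self, seed):
        self.state = seed & self.MASK
        if self.state == 0:
            raise ValueError("xorshift state must be non-zero")

    def next_u64(self):
        x = self.state
        x ^= (x << 13) & self.MASK  # mask emulates 64-bit overflow
        x ^= x >> 7
        x ^= (x << 17) & self.MASK
        self.state = x
        return x

    def uniform(self):
        """Map the top 52 bits to (0, 1); the +0.5 keeps both endpoints out."""
        return ((self.next_u64() >> 12) + 0.5) / 2.0 ** 52


gen = XorShift64(seed=2014)
us = [gen.uniform() for _ in range(100000)]
print(min(us), max(us), sum(us) / len(us))
```

A uniform generator like this is only the first step. Normal or exponential deviates would then be built on top of it, for example by inverse-CDF transforms.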


Segment 13 - The Yeast Genome

02-17-14 -- Group 1 -- Class Activities

Finding regions of chromosome 4 that code for proteins.

Class Activity: Segment 13: Histogram of p-values assigned to all of the ORFs.


Segment 16 - Multiple Hypotheses

Team 1 - Feb 21 Activity

02-21-14 -- Nick Wilson -- Class Activities

P-value practice.

Multiple hypothesis testing on ORFs.
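The activity does not say which multiple-testing correction was applied to the ORFs. Below is a sketch of two standard choices for comparison. Bonferroni controls the family-wise error rate. Benjamini-Hochberg controls the false discovery rate: it sorts the p-values, finds the largest rank k with p_(k) <= k q / m, and rejects those k hypotheses. The p-values below are made up for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Indices of hypotheses rejected at family-wise level alpha: p <= alpha / m."""
    m = len(p_values)
    return sorted(i for i, p in enumerate(p_values) if p <= alpha / m)


def benjamini_hochberg(p_values, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k = rank  # step-up: keep the largest rank that passes
    return sorted(order[:k])


p = [0.039, 0.001, 0.212, 0.015, 0.041]
print("Bonferroni rejects:", bonferroni(p))          # [1]
print("BH rejects:        ", benjamini_hochberg(p))  # [1, 3]
```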


Segment 15 - The Towne Family - Again (*)

02-24-14 -- Group 1 -- Group Quiz

Group quiz. We won!


Segment 20 - Nonlinear Least Squares Fitting

03-17-14 -- Nick Wilson -- Class Activities

Least squares fitting: one dataset, several models.


Segment 21 - Marginalize or Condition Uninteresting Fitted Parameters (*)

03-19-14 -- Nick Wilson -- Class Activities

Find the volcano!

Class Activity: Segment 21: Temperature measurements around the volcano region.
Class Activity: Segment 21: Location of the volcano.


Segment 23 - Bootstrap Estimation of Uncertainty

Class Activity -- 03-24-14

Estimate the uncertainty in a statistic.
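The activity does not name the statistic, so the bootstrap sketch below takes any function of a sample. It resamples with replacement, recomputes the statistic on each resample, and reports the standard deviation of those replicates. The synthetic data and the choice of the median are illustrative. For the mean, the bootstrap answer can be checked against the textbook s / sqrt(n).

```python
import random
import statistics


def bootstrap_standard_error(data, statistic, n_boot=2000, seed=0):
    """Bootstrap standard error of statistic(data): resample with replacement n_boot
    times, recompute the statistic on each resample, return the replicates' stdev."""
    rng = random.Random(seed)
    n = len(data)
    replicates = [statistic(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.stdev(replicates)


rng = random.Random(5)
data = [rng.gauss(10.0, 2.0) for _ in range(100)]
print("bootstrap SE of the median:", bootstrap_standard_error(data, statistics.median))
print("bootstrap SE of the mean:  ", bootstrap_standard_error(data, statistics.fmean))
print("textbook SE of the mean:   ", statistics.stdev(data) / len(data) ** 0.5)
```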


Segment 27 - Mixture Models

Class Activity -- 03-28-14

Estimate parameters of a mixture model.


Segment 28 - Gaussian Mixture Models in 1-D

03-31-14 -- Nick Wilson -- Class Activities

Netflix'ish data exploration.

Class Activity: Segment 28: Correlations between all pairs of movies in the Netflix'ish data, with the movies clustered and then sorted by cluster. The correlation plot clearly shows 4 distinct groups of movies.


Segment 29 - GMMs in N-Dimensions (***)

04-02-14 -- Nick Wilson -- Class Activities

Beauty contest: visualize GMM convergence. Won the "Best use of Python" award.

Class Activity: Segment 29: Screenshot of GMM-fitting animation.


Segment 30 - Expectation Maximization (EM) Methods

04-04-14 -- Nick Wilson -- Class Activities

EM coin flip activity from the Nature article.
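The coin-flip EM setup works as follows. Two coins A and B have unknown biases, each set of flips comes from one coin, and which coin was used is hidden. A sketch of EM for that setup is below. The head counts are assumed for illustration, and each set is taken to be equally likely a priori to come from either coin. Under that assumption the binomial coefficients cancel in the E-step.

```python
def coin_responsibility(heads, flips, theta_a, theta_b):
    """Posterior probability that a set of flips came from coin A, assuming each set
    is equally likely a priori to come from A or B (binomial coefficients cancel)."""
    like_a = theta_a ** heads * (1 - theta_a) ** (flips - heads)
    like_b = theta_b ** heads * (1 - theta_b) ** (flips - heads)
    return like_a / (like_a + like_b)


def em_two_coins(head_counts, flips, theta_a, theta_b, n_iter=50):
    for _ in range(n_iter):
        # E-step: split each set's heads and tails between the coins by responsibility.
        heads_a = tails_a = heads_b = tails_b = 0.0
        for h in head_counts:
            w = coin_responsibility(h, flips, theta_a, theta_b)
            heads_a += w * h
            tails_a += w * (flips - h)
            heads_b += (1 - w) * h
            tails_b += (1 - w) * (flips - h)
        # M-step: each coin's bias is its share of the expected heads.
        theta_a = heads_a / (heads_a + tails_a)
        theta_b = heads_b / (heads_b + tails_b)
    return theta_a, theta_b


# Assumed example data: five sets of 10 flips with these head counts.
head_counts = [5, 9, 8, 4, 7]
theta_a, theta_b = em_two_coins(head_counts, flips=10, theta_a=0.6, theta_b=0.5)
print(f"theta_A = {theta_a:.2f}, theta_B = {theta_b:.2f}")
```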


Segment 32 - Contingency Tables: A First Look

04-09-14 -- Nick Wilson -- Class Activities

Analyzing contingency tables.


Segment 33 - Contingency Table Protocols and Exact Fisher Test

04-11-14 -- Nick Wilson -- Class Activities

Analyzing chess data with contingency tables.


Segment 34 - Permutation Tests (***)

04-14-14 -- Nick Wilson -- Class Activities

Speeding up the permutation test. I implemented two approaches: the one from lecture, which expands the whole table, and one that permutes the table by sampling from the hypergeometric distribution.

Example of generating a permutation by sampling from the hypergeometric distribution (see activity for more details):

Original table:
[[ 4  1 14]
 [10  6 14]
 [51 35 60]]
Original state:
[[  0   0   0  19]
 [  0   0   0  30]
 [  0   0   0 146]
 [ 65  42  88   0]]
Filling in (0,0)
hyper(65, 130, 19)
[[  4   0   0  15]
 [  0   0   0  30]
 [  0   0   0 146]
 [ 61  42  88   0]]
Filling in (0,1)
hyper(42, 88, 15)
[[  4   8   0   7]
 [  0   0   0  30]
 [  0   0   0 146]
 [ 61  34  88   0]]
Filling in (1,0)
hyper(61, 122, 30)
[[  4   8   0   7]
 [  6   0   0  24]
 [  0   0   0 146]
 [ 55  34  88   0]]
Filling in (1,1)
hyper(34, 88, 24)
[[  4   8   0   7]
 [  6   9   0  15]
 [  0   0   0 146]
 [ 55  25  88   0]]
Filling in the first 2 values in the last row.
[[ 4  8  0  7]
 [ 6  9  0 15]
 [55 25  0 66]
 [ 0  0 88  0]]
Filling in the first 2 values in the last column.
[[ 4  8  7  0]
 [ 6  9 15  0]
 [55 25  0 66]
 [ 0  0 66  0]]
Filling in the final cell at the bottom right corner
[[ 4  8  7  0]
 [ 6  9 15  0]
 [55 25 66  0]
 [ 0  0  0  0]]
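Below is a standard-library sketch of the hypergeometric approach. It is not the activity's code, and the helper names are illustrative. The table is filled row by row. For each cell, the "bad" count is the total still unassigned in all later columns, including the last column. Each row is therefore an exact draw without replacement from what remains, which matches shuffling the expanded observations. `numpy.random.Generator.hypergeometric(ngood, nbad, nsample)` takes arguments in the same order as the `hyper(...)` calls in the log. A plain sequential sampler is used here so the block runs without NumPy. Pearson's chi-square as the test statistic is also an assumption.

```python
import random


def hypergeometric(rng, ngood, nbad, nsample):
    """Number of good items in nsample draws without replacement (simple sequential sampler)."""
    good_left, bad_left, drawn_good = ngood, nbad, 0
    for _ in range(nsample):
        if rng.randrange(good_left + bad_left) < good_left:
            good_left -= 1
            drawn_good += 1
        else:
            bad_left -= 1
    return drawn_good


def permute_table(rng, table):
    """Random table with the same row and column sums as `table`, distributed as if the
    column labels of the expanded observations had been shuffled."""
    n_rows, n_cols = len(table), len(table[0])
    row_left = [sum(row) for row in table]
    col_left = [sum(col) for col in zip(*table)]
    result = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows - 1):
        for j in range(n_cols - 1):
            # Row i's remaining items are drawn from what is still unassigned in columns j..end.
            cell = hypergeometric(rng, col_left[j], sum(col_left[j + 1:]), row_left[i])
            result[i][j] = cell
            row_left[i] -= cell
            col_left[j] -= cell
        result[i][-1] = row_left[i]  # last column takes the rest of the row
        col_left[-1] -= row_left[i]
        row_left[i] = 0
    result[-1] = col_left[:]  # last row takes whatever each column still has
    return result


def chi_square_statistic(table):
    """Pearson chi-square of an r x c table (assumes no zero margins)."""
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(len(rows)) for j in range(len(cols)))


def permutation_p_value(table, n_perm=2000, seed=0):
    """Fraction of permuted tables whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = chi_square_statistic(table)
    hits = sum(chi_square_statistic(permute_table(rng, table)) >= observed for _ in range(n_perm))
    return hits / n_perm


table = [[4, 1, 14], [10, 6, 14], [51, 35, 60]]
print(permute_table(random.Random(7), table))
print("permutation p-value:", permutation_p_value(table))
```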


Segment 41 - Markov Chain Monte Carlo, Example 2 (**)

04-25-14 -- Group -- Class Activities

Urns with weighted balls with MCMC.

Class Activity: Segment 41: Posterior distribution of colored ball weights found with MCMC.