CS395T Computational Statistics: Study Guide for Oral Exams (2011)
The oral exam will randomly select from the following lines, one at a
time, and the question will always be the same: "Tell me about...". A
good response can be just a few sentences. You can use the whiteboard
if you want to write an equation or sketch a graph (quickly!). You can
say "next question" if you don't want to answer. This is better than
trying to fake it if you really don't know. The exam grade is based
both on the quality of your responses and on the number of questions
you get through in 20 minutes.
If we don't get through Unit 20 in class, then you are not
responsible for it.
It is not as bad as it sounds. Good luck!
Unit 1: Probability and Inference
(Lectures 1, 2)
- what is computational statistics?
- probability
- calculus of inference
- probability axioms
- Law of Or-ing, Law of And-ing, Law of Exhaustion
- Law of De-Anding (Law of Total Probability)
- Bayes Theorem
- EME (exhaustive, mutually exclusive) hypotheses
- contrast Bayesians and Frequentists
- probabilities modified by data
- prior probability
- posterior probability
- evidence factor
- Bayes denominator
- background information
- commutativity and associativity of evidence
- the Monty Hall problem
- Hempel's paradox
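
If Monty Hall still bothers you, a quick simulation settles it; here is a minimal Python sketch (mine, not course code; the trial count is arbitrary):

    import random

    def monty_trial(switch):
        doors = [0, 1, 2]
        car = random.choice(doors)
        pick = random.choice(doors)
        # Monty opens a door that is neither your pick nor the car.
        opened = random.choice([d for d in doors if d != pick and d != car])
        if switch:
            pick = next(d for d in doors if d != pick and d != opened)
        return pick == car

    n = 100_000
    print(sum(monty_trial(False) for _ in range(n)) / n)  # stay:   ~1/3
    print(sum(monty_trial(True) for _ in range(n)) / n)   # switch: ~2/3
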
Unit 2: Bayesian Estimation of Parameters
(Lectures 2, 3, 4)
- marginalization
- uninteresting parameters in a model
- probability density function
- Dirac delta function
- massed prior
- uniform prior
- uninformative prior
- i.i.d.
- Bernoulli trials
- sufficient statistic
- conjugate prior
- beta distribution
- variable length short tandem repeat (VLSTR)
- binomial distribution
- conditional independence
- naive Bayes models
- improper prior
- log-uniform prior
- paradigm for Bayesian parameter estimation
- statistical model
- data trimming
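
Worth having at your fingertips: the conjugate beta-binomial update in a few lines of Python (a sketch with made-up counts, not course code):

    from scipy.stats import beta

    # With a Beta(a, b) prior on the Bernoulli success probability p, observing
    # k successes in n i.i.d. trials gives the posterior Beta(a + k, b + n - k).
    a, b = 1.0, 1.0              # uniform prior
    k, n = 7, 10                 # hypothetical data
    post = beta(a + k, b + n - k)
    print(post.mean())           # posterior mean (a + k) / (a + b + n) = 2/3
    print(post.interval(0.95))   # central 95% credible interval
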
Unit 3: Common Distributions
(Lecture 4)
- measures of central tendency
- mean minimizes mean square deviation
- median minimizes mean absolute deviation
- centered moments
- skewness and kurtosis
- standard deviation
- additivity of mean and variance
- semi-invariants
- semi-invariants of Gaussian and Poisson
- normal (Gaussian) distribution
- Student distribution
- Cauchy distribution
- heavy-tailed distributions
- William Sealy Gosset
- exponential distribution
- lognormal distribution
- gamma distribution
- chi-square distribution
- probability density function (PDF)
- cumulative distribution function (CDF)
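
The two minimization facts above are easy to verify numerically; a sketch (my example, with arbitrary sample and grid sizes):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.lognormal(size=500)                    # a skewed sample
    c = np.linspace(x.min(), x.max(), 2001)        # candidate centers
    msd = ((x[:, None] - c) ** 2).mean(axis=0)     # mean square deviation
    mad = np.abs(x[:, None] - c).mean(axis=0)      # mean absolute deviation
    print(c[msd.argmin()], x.mean())               # minimizer ~ sample mean
    print(c[mad.argmin()], np.median(x))           # minimizer ~ sample median
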
Unit 4: CLT, Gaussians, MLE
(Lecture 5)
- central limit theorem (CLT)
- characteristic function of a distribution
- Fourier convolution theorem
- characteristic function of a Gaussian
- characteristic function of Cauchy distribution
- maximum a posteriori (MAP)
- maximum likelihood (MLE)
- sample mean and variance
- estimate parameters of a Gaussian
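
A minimal CLT demonstration in Python (my sketch; 12 uniforms is a classical choice because their variances sum to exactly 1):

    import numpy as np

    rng = np.random.default_rng(2)
    # Sum of 12 U(0,1) deviates, minus 6: mean 0, variance 12 * (1/12) = 1,
    # and by the CLT already close to N(0, 1).
    z = rng.uniform(size=(100_000, 12)).sum(axis=1) - 6.0
    print(z.mean(), z.var())           # ~0, ~1
    print(np.mean(np.abs(z) < 1.0))    # ~0.683, the Gaussian 1-sigma mass

The Cauchy distribution is the standard counterexample: it has no finite variance, so the CLT does not apply to it.
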
Unit 5: Random Deviates
(Lecture 6)
- random deviate
- U(0,1)
- transformation method (random deviates)
- rejection method (random deviates)
- ratio of uniforms method (random deviates)
- squeeze (random deviates)
- Leva's algorithm for normal deviates
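
A minimal sketch of the transformation method (mine; exponential deviates, where the inverse CDF is available in closed form):

    import numpy as np

    rng = np.random.default_rng(3)
    u = rng.uniform(size=100_000)     # U(0,1) deviates

    # Transformation method: x = F^{-1}(u) has CDF F. For Exponential(lam),
    # F(x) = 1 - exp(-lam * x), so F^{-1}(u) = -ln(1 - u) / lam.
    lam = 2.0
    x = -np.log(1.0 - u) / lam
    print(x.mean(), x.var())          # ~1/lam, ~1/lam^2
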
Unit 6: p-value (tail) tests
(Lectures 6, 7, 8)
- p-value test
- null hypothesis
- advantage of tail tests over Bayesian methods
- distribution of p-values under the null hypothesis
- t-values
- Saccharomyces cerevisiae
- A, C, G, T
- multinomial distribution
- p-test critical region
- one-sided vs. two-sided p-value tests
- stopping rule paradox
- likelihood ratio test
- Bayes odds ratio
- Normal approximation to binomial distribution
- Ronald Aylmer Fisher
- posterior predictive p-value
- empirical Bayes
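
The fact that p-values are U(0,1) under the null hypothesis is worth verifying once; a sketch (my simulation, arbitrary sizes):

    import numpy as np
    from scipy.stats import ttest_1samp

    rng = np.random.default_rng(4)
    # With the null hypothesis true, P(p <= alpha) = alpha for any alpha.
    p = np.array([ttest_1samp(rng.normal(size=20), 0.0).pvalue
                  for _ in range(5000)])
    print(np.mean(p < 0.05))    # ~0.05
    print(np.mean(p < 0.50))    # ~0.50
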
Unit 7: Multiple Hypotheses
(Lecture 8)
- multiple hypothesis correction
- Bonferroni correction
- false discovery rate (FDR)
- Bayesian approach to multiple hypotheses
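
A sketch of the Benjamini-Hochberg step-up procedure (my code; the function name and level q are arbitrary):

    import numpy as np

    def benjamini_hochberg(pvals, q=0.05):
        # Reject all p-values at or below p_(k), where k is the largest i
        # with p_(i) <= q * i / m; this controls the FDR at level q.
        p = np.sort(np.asarray(pvals))
        m = len(p)
        ok = np.nonzero(p <= q * np.arange(1, m + 1) / m)[0]
        return p[ok[-1]] if ok.size else 0.0    # rejection threshold

    # Bonferroni would instead use the much stricter per-test level q / m.
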
Unit 8: Multivariate Normal Distributions and Chi-Square
(Lectures 9, 10)
- multivariate normal distribution
- covariance matrix
- estimate mean, covariance from multivariate data
- fitting data by a multivariate normal distribution
- slice or projection of a multivariate normal r.v.
- Cholesky decomposition
- how to generate multivariate normal deviates
- how to compute and draw error ellipses
- linear correlation matrix
- test for correlation
- chi-square statistic
- chi-square distribution
- generalization of chi-square to non-independent data
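
"How to generate multivariate normal deviates" fits in a few lines (a sketch; the mean and covariance are made up):

    import numpy as np

    rng = np.random.default_rng(5)
    mu = np.array([1.0, -2.0])
    sigma = np.array([[2.0, 0.6],
                      [0.6, 1.0]])     # covariance matrix (must be SPD)

    # If sigma = L L^T (Cholesky, L lower triangular) and z ~ N(0, I),
    # then mu + L z ~ N(mu, sigma).
    L = np.linalg.cholesky(sigma)
    z = rng.normal(size=(2, 10_000))
    x = mu[:, None] + L @ z
    print(np.cov(x))                   # ~ sigma
    print(x.mean(axis=1))              # ~ mu
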
Unit 9: Weighted Nonlinear Least Squares Fitting
(Lectures 10, 11, 12)
- Normal error model
- correlated Normal error model
- maximum likelihood estimation of parameters
- relation of chi-square to posterior probability
- nonlinear least squares fitting
- chi-square fitting
- accuracy of fitted parameters
- basin of convergence
- Hessian matrix and relation to covariance matrix
- posterior distribution of fitted parameters
- calculation of Hessian matrix
- how to marginalize over uninteresting parameters
- how to condition on known parameter values
- covariance matrix of fitted parameters vs. of data
- consistency (property of MLE)
- asymptotic efficiency (property of MLE)
- Fisher Information Matrix
- asymptotic normality (property of MLE)
- linearized propagation of errors
- sampling the posterior distribution (in least squares fitting)
- bootstrap resampling
- population distribution vs. sample distribution
- drawing with and without replacement
- bootstrap theorem
- honoring (or not) the stated measurement errors
- ratio of two normals as an example of a heavy-tailed (Cauchy-like) statistic
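
A minimal bootstrap sketch in Python (my example; the sample, statistic, and resample count are arbitrary):

    import numpy as np

    rng = np.random.default_rng(6)
    data = rng.normal(5.0, 2.0, size=100)      # hypothetical sample

    # Bootstrap: resample the data with replacement, recompute the statistic,
    # and read off the spread of the resampled values.
    meds = np.array([np.median(rng.choice(data, size=data.size, replace=True))
                     for _ in range(5000)])
    print(np.median(data), meds.std())         # estimate and its std. error
    print(np.percentile(meds, [2.5, 97.5]))    # ~95% confidence interval
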
Unit 10: Confidence Intervals, Goodness of Fit
(Lectures 13, 14)
- what chi-square value indicates a good fit?
- how to get confidence intervals from chi-square values
- precision improves as square root of data quantity
- degrees of freedom in chi-square fit
- goodness-of-fit p-value (in least squares fitting)
- number of degrees of freedom
- linear constraints (chi-square)
- nonlinear constraints (chi-square)
- pseudo-count
- mean-square error (relation to chi-square)
- what makes a statistic accurately chi-square
- normal approximation to chi-square distribution
- Poisson as approximation to Binomial
- Pearson vs. modified Neyman chi-square
- corrected chi-square statistic for Poisson data
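
The goodness-of-fit p-value is one call once you know the degrees of freedom; a sketch with made-up numbers:

    from scipy.stats import chi2

    chisq, ndata, nparams = 112.3, 100, 3      # hypothetical fit results
    nu = ndata - nparams                       # degrees of freedom
    print(chi2.sf(chisq, nu))                  # P(chi-square >= observed)
    # Rule of thumb: a good fit has chisq ~ nu +/- sqrt(2 * nu).
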
Unit 11: Mixture Models and Gaussian Mixture Models
(Lectures 15, 16)
- forward statistical model
- mixture model
- assignment vector (mixture model)
- marginalization in mixture models
- hierarchical Bayesian models
- Gaussian mixture model
- expectation-maximization (EM) methods
- probabilistic assignment to components (GMMs)
- Expectation or E-step
- Maximization or M-step
- overall likelihood of a GMM
- log-sum-exp formula
- starting values for GMM iteration
- number of components in a GMM (pros and cons)
- K-means clustering
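
The log-sum-exp formula in Python (a sketch of the standard trick, not course code):

    import numpy as np

    def log_sum_exp(a):
        # log(sum(exp(a))) computed stably by factoring out the max, e.g. for
        # accumulating a GMM's log likelihood from component log densities.
        m = np.max(a)
        return m + np.log(np.sum(np.exp(a - m)))

    print(log_sum_exp(np.array([-1000.0, -1001.0])))   # ~ -999.69, finite
    # The naive np.log(np.sum(np.exp(a))) underflows here to log(0) = -inf.
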
Unit 12: Theory of EM Methods
(Lectures 16, 17)
- Jensen's inequality
- concave function (EM methods)
- EM theorem (e.g., geometrical interpretation)
- missing data (EM methods)
- GMM as an EM: what is the missing data, what are the parameters?
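
Jensen's inequality is the one item here you can sanity-check numerically; a short sketch (my example):

    import numpy as np

    rng = np.random.default_rng(7)
    x = rng.lognormal(size=100_000)
    # For the concave function log: E[log X] <= log E[X],
    # with equality only for degenerate X.
    print(np.mean(np.log(x)))    # ~0.0 for this lognormal
    print(np.log(np.mean(x)))    # ~0.5, strictly larger
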
Unit 13: Maximum Likelihood Estimation
(Lecture 17)
- use of Student distributions vs. normal distribution
- heavy-tailed models in MLE
- model selection
- Akaike information criterion (AIC)
- Bayes information criterion (BIC)
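
AIC and BIC are short enough to memorize as code; a sketch with hypothetical log likelihoods:

    import numpy as np

    def aic(loglike, k):
        return 2 * k - 2 * loglike             # smaller is better

    def bic(loglike, k, n):
        return k * np.log(n) - 2 * loglike     # harsher on k for large n

    # Hypothetical comparison: 2- vs. 3-parameter model on n = 500 points.
    print(aic(-1234.0, 2), aic(-1230.5, 3))
    print(bic(-1234.0, 2, 500), bic(-1230.5, 3, 500))
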
Unit 14: Contingency Tables
(Lectures 18, 19, 20, 21)
- contingency table
- cross-tabulation
- row or column marginals
- chi-square or Pearson statistic for contingency table
- conditions vs. factors
- retrospective analysis or case/control study
- hypergeometric distribution
- prospective experiment or longitudinal study
- nuisance parameter
- cross-sectional or snapshot study
- multinomial distribution
- Fisher's Exact Test
- sufficient statistic (re contingency tables)
- Wald statistic (re contingency tables)
- fragility of 2-tailed Fisher Exact Test
- Permutation Test (re contingency tables)
- Monte Carlo calculation
- ordinal vs. nominal data
- advantages of ordinal data (re contingency tables)
- false pos vs. false neg in contingency table permutation test
- Dirichlet distribution as conjugate to multinomial
- how to generate Dirichlet deviates
- p or q as nuisance parameters in experimental protocols (contingency tables)
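
"How to generate Dirichlet deviates" has a standard recipe via gamma deviates; a sketch with made-up parameters:

    import numpy as np

    rng = np.random.default_rng(8)
    alpha = np.array([2.0, 3.0, 5.0])          # hypothetical parameters

    # Draw independent Gamma(alpha_i) deviates and normalize; the result
    # is Dirichlet(alpha) on the probability simplex.
    g = rng.gamma(alpha, size=(10_000, 3))
    p = g / g.sum(axis=1, keepdims=True)
    print(p.mean(axis=0))      # ~ alpha / alpha.sum() = [0.2, 0.3, 0.5]
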
Unit 15: Information Theory
(Lectures 21, 22)
- probable vs. improbable sequences (re entropy)
- Shannon's definition of entropy
- bits vs. nats
- maximally compressed message (re entropy)
- monographic vs. digraphic entropy
- conditional entropy
- mutual information
- side information
- Kelly's formula for proportional betting
- Kullback-Leibler distance
- KL-distance as competitive edge in betting
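
Entropy and KL distance in a few lines (my sketch; the distributions are made up, and kl assumes p is strictly positive):

    import numpy as np

    def entropy(p):
        # Shannon entropy H = -sum p_i ln p_i, in nats (divide by ln 2 for bits).
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    def kl(p, q):
        # Kullback-Leibler distance D(p||q) = sum p_i ln(p_i / q_i) >= 0.
        return np.sum(p * np.log(p / q))

    p = np.array([0.5, 0.25, 0.25])
    print(entropy(p) / np.log(2))              # 1.5 bits
    print(kl(p, np.array([1/3, 1/3, 1/3])))    # > 0 since p != q
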
Unit 16: Markov Chain Monte Carlo
(Lectures 23, 24)
- Bayes denominator (re MCMC)
- sampling the posterior distribution (re MCMC)
- Markov chain
- detailed balance
- ergodic sequence
- Metropolis-Hastings algorithm
- proposal distribution (re MCMC)
- Gibbs sampler
- waiting time in a Poisson process
- good vs. bad proposal generators in MCMC
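
A minimal Metropolis sketch (symmetric Gaussian proposal; the target is a hypothetical standard normal log posterior, so the answer is checkable):

    import numpy as np

    rng = np.random.default_rng(9)

    def log_post(x):
        return -0.5 * x * x       # unnormalized: the Bayes denominator cancels

    x, lp, chain = 0.0, log_post(0.0), []
    for _ in range(50_000):
        prop = x + rng.normal(0.0, 1.0)            # symmetric proposal
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept w.p. min(1, ratio)
            x, lp = prop, lp_prop
        chain.append(x)                            # repeat x if rejected
    print(np.mean(chain), np.var(chain))           # ~0, ~1
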
Unit 17: Wiener Filtering
(Lectures 25, 26)
- bases in function space (re Wiener filtering)
- signal vs. noise model (re Wiener filtering)
- Wiener or optimal filter
- spatial or pixel basis
- wavelet basis
- DAUB wavelets
- quadrature mirror filter
- pyramidal algorithm
- wavelet plaid (or, continuity of basis wavelets)
- IRE test chart lady Jane (just kidding)
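
A minimal 1-D Wiener filter sketch (mine, in the Fourier basis rather than wavelets, and cheating by using the true signal and noise power spectra, which in practice you must estimate from the data):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1024
    t = np.arange(n)
    signal = np.sin(2 * np.pi * t / 64.0)     # smooth low-frequency signal
    y = signal + rng.normal(0.0, 0.5, n)      # signal plus white noise

    # Optimal (Wiener) filter: Phi(f) = S(f) / (S(f) + N(f)), applied in the
    # Fourier domain; S and N are the signal and noise power spectra.
    S = np.abs(np.fft.fft(signal)) ** 2
    N = np.full(n, 0.5 ** 2 * n)              # flat spectrum of white noise
    estimate = np.real(np.fft.ifft(S / (S + N) * np.fft.fft(y)))
    print(np.std(y - signal), np.std(estimate - signal))   # noise is reduced
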
Unit 18: Laplace Interpolation
(Lecture 26)
- Laplace's equation
- mean value theorem for Laplace's equation
- internal boundary condition
- bi-conjugate gradient method
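
A sketch of Laplace interpolation by brute-force relaxation (mine; a Jacobi iteration with periodic edges for brevity, where the bi-conjugate gradient method converges much faster):

    import numpy as np

    def laplace_interpolate(img, known, iters=5000):
        # Mean-value property of Laplace's equation: each unknown pixel becomes
        # the average of its four neighbors; known pixels are re-imposed each
        # sweep as internal boundary conditions.
        z = np.where(known, img, img[known].mean())
        for _ in range(iters):
            avg = 0.25 * (np.roll(z, 1, 0) + np.roll(z, -1, 0) +
                          np.roll(z, 1, 1) + np.roll(z, -1, 1))
            z = np.where(known, img, avg)
        return z

    # Demo: a smooth field with 80% of the pixels deleted.
    y, x = np.mgrid[0:32, 0:32]
    truth = np.sin(x / 5.0) + np.cos(y / 7.0)
    known = np.random.default_rng(12).uniform(size=truth.shape) < 0.2
    recon = laplace_interpolate(np.where(known, truth, 0.0), known)
    print(np.abs(recon - truth)[~known].mean())    # small reconstruction error
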
Unit 19: SVD, PCA, and All That
(Lectures 27, 28)
- data matrix or design matrix
- singular value decomposition (SVD)
- orthogonal matrix
- optimal decomposition into rank 1 matrices
- singular values
- principal component analysis (PCA)
- diagonalizing the covariance matrix
- how much total variance is explained by principal components?
- dimensional reduction
- main effects (re PCA)
- eigengenes and eigenarrays
- non-negative matrix factorization
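
PCA via SVD of the centered data matrix, in a few lines (a sketch on fabricated data):

    import numpy as np

    rng = np.random.default_rng(10)
    X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # fabricated data
    Xc = X - X.mean(axis=0)                  # center each column

    # SVD: Xc = U diag(s) V^T. The rows of Vt are the principal components;
    # s_i^2 / (n - 1) are the variances along them.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2 / (len(X) - 1)
    print(np.cumsum(var) / var.sum())        # fraction of variance explained
    scores = Xc @ Vt.T                       # data in the principal-axis basis
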
Unit 20 (probably will not get to): Binary Classifiers
- binary classifier
- Type I vs. II error
- TP, FP, FN, TN
- confusion matrix
- one classifier dominates another
- true pos rate, sensitivity, recall
- positive predictive value, precision
- false discovery rate (re classifiers)
- false positive rate
- specificity (re classifiers)
- negative predictive value (re classifiers)
- ROC curve
- convex hull of a ROC curve
- precision-recall curve
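
Finally, a sketch of building a ROC curve from raw scores (my code; the synthetic classifier is arbitrary):

    import numpy as np

    def roc_curve(scores, labels):
        # Sweep the threshold from high to low, accumulating true-positive
        # and false-positive rates.
        order = np.argsort(-scores)
        l = labels[order]
        tpr = np.concatenate(([0.0], np.cumsum(l) / l.sum()))
        fpr = np.concatenate(([0.0], np.cumsum(1 - l) / (len(l) - l.sum())))
        return fpr, tpr

    rng = np.random.default_rng(11)
    labels = rng.integers(0, 2, 500)
    scores = labels + rng.normal(0.0, 1.0, 500)   # hypothetical noisy classifier
    fpr, tpr = roc_curve(scores, labels)
    auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0)  # trapezoid rule
    print(auc)     # well above 0.5; a random classifier gives ~0.5
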