# 2014 Concepts Study Page

### "Tell me about..."

#### Segment 1: Let's talk about probability

what is computational statistics?

probability

calculus of inference

probability axioms

Law of Or-ing, Law of And-ing, Law of Exhaustion

Law of De-Anding (Law of Total Probability)

#### Segment 2: Bayes

Bayes Theorem

EME (exhaustive and mutually exclusive) hypotheses

contrast Bayesians and Frequentists

probabilities modified by data

prior probability

posterior probability

evidence factor

Bayes denominator

background information

commutativity and associativity of evidence

Hempel's paradox

#### Segment 3: Monty Hall

the Monty Hall problem
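
A quick Monte Carlo check of the standard result — switching wins with probability 2/3 — can make the argument concrete. This is a minimal sketch, not material from the course itself:

```python
import random

def monty_hall(trials=100_000, switch=True, seed=0):
    """Simulate the Monty Hall game; return the empirical win rate."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)       # door hiding the car
        pick = rng.randrange(3)      # contestant's first pick
        # Host opens a door that is neither the pick nor the car.
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            # Switch to the one remaining unopened door.
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials
```

Running with `switch=True` gives a win rate near 2/3; with `switch=False`, near 1/3.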

#### Segment 4: The Jailer's Tip

marginalization

uninteresting parameters in a model

probability density function

Dirac delta function

massed prior

uniform prior

uninformative prior

#### Segment 5: Bernoulli Trials

i.i.d.

Bernoulli trials

sufficient statistic

conjugate prior

beta distribution
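
For Bernoulli trials the beta distribution is the conjugate prior: a Beta(α, β) prior combined with k successes in n trials gives a Beta(α+k, β+n−k) posterior, so updating is just arithmetic on the two hyperparameters. A minimal sketch (α = β = 1 is the uniform prior):

```python
def beta_update(alpha, beta, k, n):
    """Posterior hyperparameters after k successes in n Bernoulli trials."""
    return alpha + k, beta + (n - k)

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)
```

Starting from the uniform prior Beta(1, 1), observing 7 heads in 10 tosses gives a Beta(8, 4) posterior with mean 8/12 ≈ 0.667.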

#### Segment 6: The Towne Family Tree

variable length short tandem repeat (VLSTR)

binomial distribution

conditional independence

naive Bayes models

improper prior

log-uniform prior

paradigm for Bayesian parameter estimation

statistical model

data trimming

#### Segment 7: Central Tendency and Moments

measures of central tendency

mean minimizes mean square deviation

median minimizes mean absolute deviation

centered moments

skewness and kurtosis

standard deviation

additivity of mean and variance

semi-invariants

semi-invariants of Gaussian and Poisson
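
The two minimization facts in this segment can be verified numerically: over candidate centers c, the mean minimizes the sum of squared deviations and the median minimizes the sum of absolute deviations. A small sketch with made-up data:

```python
from statistics import mean, median

data = [1.0, 2.0, 3.0, 4.0, 100.0]   # skewed sample: mean 22.0, median 3.0

def sum_sq(c):
    return sum((x - c) ** 2 for x in data)

def sum_abs(c):
    return sum(abs(x - c) for x in data)

# Scan a grid of candidate centers and record the minimizers.
grid = [i / 10 for i in range(0, 1001)]
best_sq = min(grid, key=sum_sq)    # lands on the mean
best_abs = min(grid, key=sum_abs)  # lands on the median
```

Note how the outlier at 100 drags the mean (and hence the least-squares center) far from the bulk of the data, while the median stays put.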

#### Segment 8: Some Standard Distributions

normal (Gaussian) distribution

Student distribution

Cauchy distribution

heavy-tailed distributions

William Sealy Gosset

exponential distribution

lognormal distribution

gamma distribution

chi-square distribution

probability density function (PDF)

cumulative distribution function (CDF)

#### Segment 9: Characteristic Functions

characteristic function of a distribution

Fourier convolution theorem

characteristic function of a Gaussian

characteristic function of Cauchy distribution

#### Segment 10: The Central Limit Theorem

central limit theorem (CLT)

Taylor series around zero can fail

maximum a posteriori (MAP)

maximum likelihood estimation (MLE)

sample mean and variance

estimate parameters of a Gaussian
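
For a Gaussian, the MLE of μ is the sample mean and the MLE of σ² is the 1/N (biased) sample variance. A quick sketch with simulated deviates:

```python
import random

rng = random.Random(42)
N = 200_000
xs = [rng.gauss(5.0, 2.0) for _ in range(N)]      # true mu = 5, sigma = 2

mu_hat = sum(xs) / N                               # MLE of the mean
var_hat = sum((x - mu_hat) ** 2 for x in xs) / N   # MLE (1/N) of the variance
```

Both estimates converge on the true values (5 and 4) as N grows; the 1/(N−1) correction giving the unbiased variance estimator matters only for small N.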

#### Segment 11: Random Deviates

random deviate

U(0,1)

transformation method (random deviates)

rejection method (random deviates)

ratio of uniforms method (random deviates)

squeeze (random deviates)

Leva's algorithm for normal deviates
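
The transformation method in one line: if U ~ U(0,1) and F is an invertible CDF, then F⁻¹(U) is distributed according to F. For the exponential distribution with rate λ, F⁻¹(u) = −ln(1−u)/λ. A sketch:

```python
import math
import random

def exp_deviate(lam, rng):
    """Exponential(lam) deviate via inverse-CDF transformation of U(0,1)."""
    u = rng.random()
    return -math.log(1.0 - u) / lam

rng = random.Random(1)
sample = [exp_deviate(2.0, rng) for _ in range(100_000)]
sample_mean = sum(sample) / len(sample)   # should approach 1/lam = 0.5
```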

#### Segment 12: P-Value Tests

p-value test

null hypothesis

test statistic

advantage of tail tests over Bayesian methods

distribution of p-values under the null hypothesis

t-values

p-test critical region

one-sided vs. two-sided p-value tests
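
For a normally distributed test statistic, the one-sided p-value is the upper-tail area beyond the observed z, and the two-sided p-value doubles it. Both follow from the normal CDF, which the standard library expresses via `erfc`. A sketch:

```python
import math

def p_one_sided(z):
    """P(Z >= z) for standard normal Z (upper-tail p-value)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def p_two_sided(z):
    """P(|Z| >= |z|): twice the one-sided tail area."""
    return 2.0 * p_one_sided(abs(z))
```

The familiar critical values fall out: z ≈ 1.645 gives a one-sided p of 0.05, while z ≈ 1.96 gives a two-sided p of 0.05.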

#### Segment 13: The Yeast Genome

Saccharomyces cerevisiae

multinomial distribution

#### Segment 14: Bayesian Criticism of P-Values

stopping rule paradox

Bayes odds ratio

Normal approximation to binomial distribution

Ronald Aylmer Fisher

p=0.05 pros and cons

#### Segment 15: The Towne Family -- Again

posterior predictive p-value

empirical Bayes

#### Segment 16: Multiple Hypotheses

multiple hypothesis correction

Bonferroni correction

false discovery rate (FDR)

Bayesian approach to multiple hypotheses
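
The two frequentist corrections above can be sketched in a few lines: Bonferroni controls the family-wise error rate by testing each of m p-values at α/m, while Benjamini–Hochberg controls the FDR by comparing the k-th smallest p-value to kα/m and rejecting up to the largest k that passes:

```python
def bonferroni_reject(pvals, alpha=0.05):
    """Reject H0_i iff p_i <= alpha / m (family-wise error control)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def benjamini_hochberg_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling FDR at level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    kmax = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            kmax = rank           # largest passing rank wins (step-up)
    reject = [False] * m
    for i in order[:kmax]:
        reject[i] = True
    return reject
```

On the p-values [0.001, 0.009, 0.02, 0.04, 0.2] at α = 0.05, Bonferroni rejects 2 hypotheses while BH rejects 4 — the usual trade of stricter error control for lower power.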

#### Segment 17: The Multivariate Normal Distribution

multivariate normal distribution

covariance matrix

estimate mean, covariance from multivariate data

fitting data by a multivariate normal distribution

slice or projection of a multivariate normal r.v.

Cholesky decomposition

how to generate multivariate normal deviates

how to compute and draw error ellipses
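
The standard recipe for multivariate normal deviates: factor the covariance Σ = LLᵀ (Cholesky), then x = μ + Lz where z is a vector of independent N(0,1) deviates. A 2-D sketch with the Cholesky factor written out by hand:

```python
import math
import random

def cholesky_2x2(cov):
    """Lower-triangular L with L @ L.T == cov, for a 2x2 SPD matrix."""
    (a, b), (_, c) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    return [[l11, 0.0], [l21, l22]]

def mvn_deviate(mu, L, rng):
    """One draw x = mu + L z with z ~ iid N(0,1)."""
    z = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    return [mu[0] + L[0][0] * z[0],
            mu[1] + L[1][0] * z[0] + L[1][1] * z[1]]

rng = random.Random(3)
cov = [[4.0, 1.2], [1.2, 1.0]]
L = cholesky_2x2(cov)
draws = [mvn_deviate([0.0, 0.0], L, rng) for _ in range(100_000)]
# Empirical cross-covariance should approach cov[0][1] = 1.2.
cxy = sum(x * y for x, y in draws) / len(draws)
```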

#### Segment 18: The Correlation Matrix

covariance matrix

linear correlation matrix

test for correlation

#### Segment 19: The Chi-Square Statistic

chi-square statistic

chi-square distribution

transformation law of probabilities

characteristic function of chi-square distribution

generalization of chi-square to non-independent data

#### Segment 20: Nonlinear Least Squares Fitting

Normal error model

correlated Normal error model

maximum likelihood estimation of parameters

relation of chi-square to posterior probability

nonlinear least squares fitting

chi-square fitting

accuracy of fitted parameters

Hessian matrix and relation to covariance matrix

posterior distribution of fitted parameters

calculation of Hessian matrix

#### Segment 21: Marginalize vs. Condition Uninteresting Fitted Parameters

how to marginalize over uninteresting parameters

how to condition on known parameter values

covariance matrix of fitted parameters vs. of data

consistency (property of MLE)

asymptotic efficiency (property of MLE)

Fisher Information Matrix

asymptotic normality (property of MLE)

how to get confidence intervals from chi-square values

#### Segment 22: Uncertainty of Derived Parameters

linearized propagation of errors

sampling the posterior distribution (in least squares fitting)

ratio of two normals as an example where linearized propagation of errors fails

#### Segment 23: Bootstrap Estimation of Uncertainty

bootstrap resampling

population distribution vs. sample distribution

drawing with and without replacement

bootstrap theorem

compare bootstrap with sampling the posterior
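
Bootstrap resampling in brief: draw many same-size samples *with replacement* from the data and look at the spread of the recomputed statistic. A sketch estimating the standard error of the mean (data values invented for illustration):

```python
import random
from statistics import mean, stdev

def bootstrap_se(data, stat=mean, B=2000, seed=0):
    """Bootstrap standard error of statistic `stat` on `data`."""
    rng = random.Random(seed)
    n = len(data)
    reps = [stat(rng.choices(data, k=n)) for _ in range(B)]  # resample with replacement
    return stdev(reps)

data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.9, 3.1, 2.2]
se = bootstrap_se(data)
```

For the mean, the bootstrap estimate should land near the analytic value s/√n ≈ 0.35 for this sample; the payoff is that the same machinery works for statistics with no analytic standard error.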

#### Segment 24: Goodness of Fit

precision improves as square root of data quantity

what chi-square value indicates a good fit?

degrees of freedom in chi-square fit

goodness-of-fit p-value (in least squares fitting)

number of degrees of freedom

linear constraints (chi-square)

nonlinear constraints (chi-square)

#### Segment 27: Mixture Models

forward statistical model

mixture model

assignment vector (mixture model)

marginalization in mixture models

hierarchical Bayesian models

#### Segment 28: Gaussian Mixture Models in 1-D

Gaussian mixture model

expectation-maximization (EM) methods

probabilistic assignment to components (GMMs)

Expectation or E-step

Maximization or M-step

overall likelihood of a GMM

log-sum-exp formula
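
The log-sum-exp formula avoids underflow when combining component log-likelihoods: log Σᵢ exp(aᵢ) = a_max + log Σᵢ exp(aᵢ − a_max). A sketch:

```python
import math

def log_sum_exp(log_vals):
    """Numerically stable log(sum(exp(a_i))), as used for GMM log-likelihoods."""
    m = max(log_vals)
    return m + math.log(sum(math.exp(a - m) for a in log_vals))
```

A naive `log(sum(exp(a)))` with a ≈ −1000 underflows to log(0) = −inf; `log_sum_exp([-1000, -1000])` returns the correct −1000 + log 2.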

#### Segment 29: GMMs in N-Dimensions

starting values for GMM iteration

number of components in a GMM (pros and cons)

K-means clustering

#### Segment 30: Expectation Maximization (EM) Methods

Jensen's inequality

concave function (EM methods)

EM theorem (e.g., geometrical interpretation)

missing data (EM methods)

GMM as an EM: what is the missing data, what are the parameters?

#### Segment 31: A Tale of Model Selection

use of Student distributions vs. normal distribution

heavy-tailed models in MLE

model selection

Akaike information criterion (AIC)

Bayes information criterion (BIC)

#### Segment 32: Contingency Tables, A First Look

contingency table

cross-tabulation

row or column marginals

chi-square or Pearson statistic for contingency table

conditions vs. factors

hypergeometric distribution

multinomial distribution

#### Segment 33: Contingency Table Protocols and Fisher Exact Test

retrospective analysis or case/control study

prospective experiment or longitudinal study

nuisance parameter

cross-sectional or snapshot study

example of protocol with all marginals fixed

Fisher's Exact Test

sufficient statistic (re contingency tables)

Wald statistic (re contingency tables)

#### Segment 34: Permutation Tests

Permutation Test (re contingency tables)

Monte Carlo calculation

#### Segment 37: A Few Bits of Information Theory

probable vs. improbable sequences (re entropy)

Shannon's definition of entropy

bits vs. nats

maximally compressed message (re entropy)
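
Shannon entropy is H = −Σᵢ pᵢ log pᵢ, with the base of the logarithm setting the unit: base 2 gives bits, base e gives nats (1 nat = 1/ln 2 ≈ 1.4427 bits). A sketch:

```python
import math

def entropy(p, base=2.0):
    """Shannon entropy of a discrete distribution; base 2 -> bits, base e -> nats."""
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0.0)
```

A fair coin carries 1 bit per flip; a fair 8-sided die carries 3 bits per roll.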

#### Segment 38: Mutual Information

monographic vs. digraphic entropy

conditional entropy

mutual information

side information

Kelly's formula for proportional betting

Kullback-Leibler distance

KL-distance as competitive edge in betting

#### Segment 39: MCMC and Gibbs Sampling

Bayes denominator (re MCMC)

sampling the posterior distribution (re MCMC)

Markov chain

detailed balance

ergodic sequence

Metropolis-Hastings algorithm

proposal distribution (re MCMC)

Gibbs sampler

burn-in
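
A minimal Metropolis sketch tying these terms together: with a symmetric Gaussian proposal the Hastings ratio reduces to π(x′)/π(x), and the accept/reject rule enforces detailed balance. Here the target is a standard normal and a burn-in prefix is discarded:

```python
import math
import random

def metropolis(log_target, x0, steps, step_size, seed=0):
    """Metropolis sampler with a symmetric Gaussian proposal distribution."""
    rng = random.Random(seed)
    x, chain = x0, []
    lp = log_target(x)
    for _ in range(steps):
        xp = x + rng.gauss(0.0, step_size)       # propose
        lpp = log_target(xp)
        if math.log(rng.random()) < lpp - lp:    # accept with prob min(1, ratio)
            x, lp = xp, lpp
        chain.append(x)                          # rejected moves repeat x
    return chain

# Target: standard normal, log pi(x) = -x^2/2 up to a constant.
chain = metropolis(lambda x: -0.5 * x * x, x0=5.0, steps=60_000, step_size=1.0)
kept = chain[5_000:]                             # drop burn-in
m = sum(kept) / len(kept)                        # should approach 0
v = sum(x * x for x in kept) / len(kept)         # should approach 1
```

Note that the starting value x0 = 5 is far out in the tail; the burn-in exists precisely to let the chain forget it.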

#### Segments 40 and 41: MCMC Examples

waiting time in a Poisson process

good vs. bad proposal generators in MCMC

#### Segment 47: Low Rank Approximation of Data

data matrix or design matrix

singular value decomposition (SVD)

orthogonal matrix

optimal decomposition into rank 1 matrices

singular values

#### Segment 48: Principal Component Analysis

principal component analysis (PCA)

diagonalizing the covariance matrix

how much total variance is explained by principal components?

dimensional reduction