http://wpressutexas.net/coursewiki/api.php?action=feedcontributions&user=Jeff+Hussmann&feedformat=atomComputational Statistics Course Wiki - User contributions [en]2020-12-05T15:13:48ZUser contributionsMediaWiki 1.32.0http://wpressutexas.net/coursewiki/index.php?title=Urns_with_MCMC&diff=3039Urns with MCMC2014-04-25T18:30:22Z<p>Jeff Hussmann: </p>
<hr />
<div>An urn contains five different types of balls that are drawn without replacement. Each time a ball is drawn, the probability of drawing a specific type of ball is proportional to the number of that type remaining and to a type-specific weight.<br />
<br />
The experiment conducted was: draw from the urn until all balls have been drawn and record the order of colors produced.<br />
This experiment was carried out 50 independent times, producing [[media:Urn_data.txt|urn_data.txt]]. <br />
<br />
The initial counts in the urn are:<br />
<br />
* Red - 2<br />
* Blue - 4<br />
* Orange - 7<br />
* Purple - 3<br />
* Green - 5 <br />
<br />
The data consists of 50 lines that look like:<br />
<nowiki>1 2 1 2 3 4 2 3 3 2 4 4 1 2 1 2 2 4 4 0 0</nowiki><br />
which says "For this trial, the draws went blue, orange, blue, orange, purple, ...." (each digit indexes a color in the order listed above, so 0 = red, 1 = blue, 2 = orange, 3 = purple, 4 = green).<br />
<br />
Your job is to infer the values of the type-specific weights from the data by MCMC.<br />
<br />
Each team must divide into three sub-teams, each of which has a specific task.<br />
<br />
'''Sub-team 1:'''<br />
Produce a function that takes a set of weights and <br />
computes the likelihood of the entire data set given the weights.<br />
<br />
'''Sub-team 2:'''<br />
Produce a function that takes the current set of weights and proposes<br />
a new set, returning the proposal and the corresponding ratio of q's.<br />
<br />
'''Sub-team 3:'''<br />
Produce the logic that calls the two functions that will be supplied by the other sub-teams to carry out the MCMC. This will look like:<br />
For some number of steps,<br />
* propose a new set of proportions given the current ones<br />
* evaluate the likelihood of the data given the new proposal, combine it with the q-ratio to produce the acceptance probability alpha (capping alpha at 1), and accept the proposal with probability alpha.<br />
* record the new values of the proportions used and the new likelihood. <br />
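Assembled, the three sub-teams' pieces might look like the following Python sketch. Since the sketch must run standalone, it simulates stand-in data from made-up "true" weights instead of parsing urn_data.txt, fixes the first weight at 1 (the weights are only identified up to an overall scale), and uses a one-weight-at-a-time log-normal proposal; all of these are illustrative choices, not required ones.

```python
import numpy as np

rng = np.random.default_rng(0)
counts0 = np.array([2, 4, 7, 3, 5])  # red, blue, orange, purple, green

# Stand-in for urn_data.txt: simulate 20 trials from made-up "true" weights
# (first weight fixed at 1, since weights are only identified up to scale).
true_w = np.array([1.0, 2.0, 0.5, 1.5, 1.0])

def simulate_trial(weights):
    """Draw until the urn is empty; P(type) is proportional to remaining count x weight."""
    remaining = counts0.copy()
    order = []
    while remaining.sum() > 0:
        p = remaining * weights
        i = rng.choice(5, p=p / p.sum())
        remaining[i] -= 1
        order.append(i)
    return order

data = [simulate_trial(true_w) for _ in range(20)]

# Sub-team 1: log-likelihood of the entire data set given the weights.
def log_like(weights):
    total = 0.0
    for order in data:
        remaining = counts0.copy()
        for i in order:
            p = remaining * weights
            total += np.log(p[i] / p.sum())
            remaining[i] -= 1
    return total

# Sub-team 2: multiply one weight by a log-normal factor.  For this move,
# the q-ratio q(old|new)/q(new|old) equals new[i]/old[i].
def propose(weights):
    new = weights.copy()
    i = rng.integers(1, 5)           # keep weights[0] fixed at 1
    new[i] *= np.exp(rng.normal(0, 0.2))
    return new, np.log(new[i] / weights[i])

# Sub-team 3: the Metropolis-Hastings loop.
w, ll = np.ones(5), log_like(np.ones(5))
trajectory = []
for step in range(500):
    w_new, log_q_ratio = propose(w)
    ll_new = log_like(w_new)
    if np.log(rng.random()) < ll_new - ll + log_q_ratio:  # accept w.p. alpha
        w, ll = w_new, ll_new
    trajectory.append((w.copy(), ll))  # record for the trajectory plots
```

With the real data, replace the simulation block by reading each line of urn_data.txt into a list of integer draws; the `trajectory` list holds what the final plots need.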
Plot the trajectories of the proportions and the likelihood over the course of the steps.</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=File:Urn_data.txt&diff=3037File:Urn data.txt2014-04-25T18:00:38Z<p>Jeff Hussmann: </p>
<hr />
<div></div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Segment_41._Markov_Chain_Monte_Carlo,_Example_2&diff=3036Segment 41. Markov Chain Monte Carlo, Example 22014-04-25T17:50:38Z<p>Jeff Hussmann: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/FnNckBLWJ24&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/FnNckBLWJ24 http://youtu.be/FnNckBLWJ24]<br />
<br />
Links to the slides: [http://slate.ices.utexas.edu/coursefiles/41.MCMCexample2.pdf PDF file] or [http://slate.ices.utexas.edu/coursefiles/41.MCMCexample2.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. Show that the waiting times (times between events) in a Poisson process are Exponentially distributed. (I think we've done this before.)<br />
<br />
2. Plot the pdf's of the waiting times between (a) every other Poisson event, and (b) every Poisson event at half the rate.<br />
<br />
3. Show, using characteristic functions, that the waiting times between every Nth event in a Poisson process are Gamma distributed. (I think we've also done one like this before, but it is newly relevant in this segment.)<br />
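These distributional facts can be checked numerically before doing the math. A small simulation sketch (the rate is fixed at 1 for concreteness; print statements stand in for the pdf plots of problem 2):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Gaps in a rate-1 Poisson process are Exponential(mean 1); the waiting
# time between every OTHER event is the sum of two such gaps: Gamma(2, 1).
every_other = rng.exponential(1.0, size=(n, 2)).sum(axis=1)

# The waiting time between consecutive events at HALF the rate: Exponential(mean 2).
half_rate = rng.exponential(2.0, size=n)

# Both have mean 2, but the shapes differ: the Gamma(2,1) pdf vanishes at t=0,
# while the Exponential pdf peaks there; the variances are 2 vs. 4.
print(every_other.mean(), half_rate.mean())
print(every_other.var(), half_rate.var())
```

Histogramming `every_other` and `half_rate` gives the two pdfs problem 2 asks you to plot.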
<br />
====To Think About====<br />
1. In slide 5, showing the results of the MCMC, how can we be sure (or, how can we gather quantitative evidence) that there won't be another discrete change in <math>k_1</math> or <math>k_2</math> if we keep running the model longer? That is, how can we measure convergence of the model?<br />
<br />
2. Suppose you have two hypotheses: H1 is that a set of times <math>t_i</math> are being generated as every 26th event from a Poisson process with rate 26. H2 is that they are every 27th event from a Poisson process with rate 27. (The mean rate is thus the same in both cases.) How would you estimate the number <math>N</math> of data points <math>t_i</math> that you need to clearly distinguish between these hypotheses?<br />
<br />
====Class Activity====<br />
[[Urns with MCMC]]</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=2014_Concepts_Study_Page&diff=29902014 Concepts Study Page2014-04-23T19:05:56Z<p>Jeff Hussmann: </p>
<hr />
<div>__NOTOC__<br />
==="Tell me about..."===<br />
<br />
====Segment 1: Let's talk about probability====<br />
<br />
what is computational statistics?<br><br />
probability<br><br />
calculus of inference<br><br />
probability axioms<br><br />
Law of Or-ing, Law of And-ing, Law of Exhaustion<br><br />
Law of De-Anding (Law of Total Probability)<br />
<br />
====Segment 2: Bayes====<br />
<br />
Bayes Theorem<br><br />
EME hypotheses<br><br />
contrast Bayesians and Frequentists<br><br />
probabilities modified by data<br><br />
prior probability<br><br />
posterior probability<br><br />
evidence factor<br><br />
Bayes denominator<br><br />
background information<br><br />
commutativity and associativity of evidence<br><br />
Hempel's paradox<br />
<br />
====Segment 3: Monty Hall====<br />
<br />
the Monty Hall problem<br />
<br />
====Segment 4: The Jailer's Tip====<br />
<br />
marginalization<br><br />
uninteresting parameters in a model<br><br />
probability density function<br><br />
Dirac delta function<br><br />
massed prior<br><br />
uniform prior<br><br />
uninformative prior<br />
<br />
====Segment 5: Bernoulli Trials====<br />
<br />
i.i.d.<br><br />
Bernoulli trials<br><br />
sufficient statistic<br><br />
conjugate prior<br><br />
beta distribution<br />
<br />
====Segment 6: The Towne Family Tree====<br />
<br />
variable length short tandem repeat (VLSTR)<br><br />
binomial distribution<br><br />
conditional independence<br><br />
naive Bayes models<br><br />
improper prior<br><br />
log-uniform prior<br><br />
paradigm for Bayesian parameter estimation<br><br />
statistical model<br><br />
data trimming<br />
<br />
====Segment 7: Central Tendency and Moments====<br />
<br />
measures of central tendency<br><br />
mean minimizes mean square deviation<br><br />
median minimizes mean absolute deviation<br><br />
centered moments<br><br />
skewness and kurtosis<br><br />
standard deviation<br><br />
additivity of mean and variance<br><br />
semi-invariants<br><br />
semi-invariants of Gaussian and Poisson<br />
<br />
====Segment 8: Some Standard Distributions====<br />
<br />
normal (Gaussian) distribution<br><br />
Student distribution<br><br />
Cauchy distribution<br><br />
heavy-tailed distributions<br><br />
William Sealy Gosset<br><br />
exponential distribution<br><br />
lognormal distribution<br><br />
gamma distribution<br><br />
chi-square distribution<br><br />
probability density function (PDF)<br><br />
cumulative distribution function (CDF)<br />
<br />
====Segment 9: Characteristic Functions====<br />
<br />
characteristic function of a distribution<br><br />
Fourier convolution theorem<br><br />
characteristic function of a Gaussian<br><br />
characteristic function of Cauchy distribution<br />
<br />
====Segment 10: The Central Limit Theorem====<br />
<br />
central limit theorem (CLT)<br><br />
Taylor series around zero can fail<br><br />
maximum a posteriori (MAP)<br><br />
maximum likelihood (MLE)<br><br />
sample mean and variance<br><br />
estimate parameters of a Gaussian<br />
<br />
====Segment 11: Random Deviates====<br />
<br />
random deviate<br><br />
U(0,1)<br><br />
transformation method (random deviates)<br><br />
rejection method (random deviates)<br><br />
ratio of uniforms method (random deviates)<br><br />
squeeze (random deviates)<br><br />
Leva's algorithm for normal deviates<br />
<br />
====Segment 12: P-Value Tests====<br />
<br />
p-value test<br><br />
null hypothesis<br><br />
test statistic<br><br />
advantage of tail tests over Bayesian methods<br><br />
distribution of p-values under the null hypothesis<br><br />
t-values<br><br />
p-test critical region<br><br />
one-sided vs. two-sided p-value tests<br />
<br />
====Segment 13: The Yeast Genome====<br />
<br />
Saccharomyces cerevisiae<br><br />
multinomial distribution<br />
<br />
====Segment 14: Bayesian Criticism of P-Values====<br />
<br />
stopping rule paradox<br><br />
Bayes odds ratio<br><br />
Normal approximation to binomial distribution<br><br />
Ronald Aylmer Fisher<br><br />
p=0.05 pros and cons<br />
<br />
====Segment 15: The Towne Family -- Again====<br />
<br />
posterior predictive p-value<br><br />
empirical Bayes<br />
<br />
====Segment 16: Multiple Hypotheses====<br />
<br />
multiple hypothesis correction<br><br />
Bonferroni correction<br><br />
false discovery rate (FDR)<br><br />
Bayesian approach to multiple hypotheses<br />
<br />
====Segment 17: The Multivariate Normal Distribution====<br />
<br />
multivariate normal distribution<br><br />
covariance matrix<br><br />
estimate mean, covariance from multivariate data<br><br />
fitting data by a multivariate normal distribution<br><br />
slice or projection of a multivariate normal r.v.<br><br />
Cholesky decomposition<br><br />
how to generate multivariate normal deviates<br><br />
how to compute and draw error ellipses<br />
<br />
<br><br />
====Segment 18: The Correlation Matrix====<br />
<br />
covariance matrix<br><br />
linear correlation matrix<br><br />
test for correlation<br />
<br />
====Segment 19: The Chi-Square Statistic====<br />
<br />
chi-square statistic<br><br />
chi-square distribution<br><br />
transformation law of probabilities<br><br />
characteristic function of chi-square distribution<br><br />
generalization of chi-square to non-independent data<br />
<br />
====Segment 20: Nonlinear Least Squares Fitting====<br />
<br />
Normal error model<br><br />
correlated Normal error model<br><br />
maximum likelihood estimation of parameters<br><br />
relation of chi-square to posterior probability<br><br />
nonlinear least squares fitting<br><br />
chi-square fitting<br><br />
accuracy of fitted parameters<br><br />
Hessian matrix and relation to covariance matrix<br><br />
posterior distribution of fitted parameters<br><br />
calculation of Hessian matrix<br />
<br />
====Segment 21: Marginalize vs. Condition Uninteresting Fitted Parameters====<br />
<br />
how to marginalize over uninteresting parameters<br><br />
how to condition on known parameter values<br><br />
covariance matrix of fitted parameters vs. of data<br><br />
consistency (property of MLE)<br><br />
asymptotic efficiency (property of MLE)<br><br />
Fisher Information Matrix<br><br />
asymptotic normality (property of MLE)<br><br />
how to get confidence intervals from chi-square values<br />
<br />
====Segment 22: Uncertainty of Derived Parameters====<br />
<br />
linearized propagation of errors<br><br />
sampling the posterior distribution (in least squares fitting)<br><br />
ratio of two normals as an example where linearized error propagation fails<br />
<br />
====Segment 23: Bootstrap Estimation of Uncertainty====<br />
<br />
bootstrap resampling<br><br />
population distribution vs. sample distribution<br><br />
drawing with and without replacement<br><br />
bootstrap theorem<br><br />
compare bootstrap with sampling the posterior<br />
<br />
====Segment 24: Goodness of Fit====<br />
<br />
precision improves as square root of data quantity<br><br />
what chi-square value indicates a good fit?<br><br />
degrees of freedom in chi-square fit<br><br />
goodness-of-fit p-value (in least squares fitting)<br><br />
number of degrees of freedom<br><br />
linear constraints (chi-square)<br><br />
nonlinear constraints (chi-square)<br />
<br />
====Segment 27: Mixture Models====<br />
<br />
forward statistical model<br><br />
mixture model<br><br />
assignment vector (mixture model)<br><br />
marginalization in mixture models<br><br />
hierarchical Bayesian models<br />
<br />
====Segment 28: Gaussian Mixture Models in 1-D====<br />
<br />
Gaussian mixture model<br><br />
expectation-maximization (EM) methods<br><br />
probabilistic assignment to components (GMMs)<br><br />
Expectation or E-step<br><br />
Maximization or M-step<br><br />
overall likelihood of a GMM<br><br />
log-sum-exp formula<br />
<br />
====Segment 29: GMMs in N-Dimensions====<br />
<br />
starting values for GMM iteration<br><br />
number of components in a GMM (pros and cons)<br><br />
K-means clustering<br />
<br />
====Segment 30: Expectation Maximization (EM) Methods====<br />
<br />
Jensen's inequality<br><br />
concave function (EM methods)<br><br />
EM theorem (e.g., geometrical interpretation)<br><br />
missing data (EM methods)<br><br />
GMM as an EM: what is the missing data, what are the parameters?<br />
<br />
====Segment 31: A Tale of Model Selection====<br />
<br />
use of Student distributions vs. normal distribution<br><br />
heavy-tailed models in MLE<br><br />
model selection<br><br />
Akaike information criterion (AIC)<br><br />
Bayes information criterion (BIC)<br />
<br />
====Segment 32: Contingency Tables, A First Look====<br />
<br />
contingency table<br><br />
cross-tabulation<br><br />
row or column marginals<br><br />
chi-square or Pearson statistic for contingency table<br><br />
conditions vs. factors<br><br />
hypergeometric distribution<br><br />
multinomial distribution<br />
<br />
====Segment 33: Contingency Table Protocols and Fisher Exact Test====<br />
<br />
retrospective analysis or case/control study<br><br />
prospective experiment or longitudinal study<br><br />
nuisance parameter<br><br />
cross-sectional or snapshot study<br><br />
example of protocol with all marginals fixed<br><br />
Fisher's Exact Test<br><br />
sufficient statistic (re contingency tables)<br><br />
Wald statistic (re contingency tables)<br />
<br />
====Segment 34: Permutation Tests====<br />
<br />
Permutation Test (re contingency tables)<br><br />
Monte Carlo calculation<br />
<br />
====Segment 37: A Few Bits of Information Theory====<br />
<br />
probable vs. improbable sequences (re entropy)<br><br />
Shannon's definition of entropy<br><br />
bits vs. nats<br><br />
maximally compressed message (re entropy)<br />
<br />
====Segment 38: Mutual Information====<br />
<br />
monographic vs. digraphic entropy<br><br />
conditional entropy<br><br />
mutual information<br><br />
side information<br><br />
Kelly's formula for proportional betting<br><br />
Kullback-Leibler distance<br><br />
KL-distance as competitive edge in betting <br />
<br />
====Segment 39: MCMC and Gibbs Sampling====<br />
<br />
Bayes denominator (re MCMC)<br><br />
sampling the posterior distribution (re MCMC)<br><br />
Markov chain<br><br />
detailed balance<br><br />
ergodic sequence<br><br />
Metropolis-Hastings algorithm<br><br />
proposal distribution (re MCMC)<br><br />
Gibbs sampler<br><br />
burn-in<br />
<br />
====Segments 40 and 41: MCMC Examples====<br />
<br />
waiting time in a Poisson process<br><br />
good vs. bad proposal generators in MCMC<br />
<br />
====Segment 47: Low Rank Approximation of Data====<br />
<br />
data matrix or design matrix<br><br />
singular value decomposition (SVD)<br><br />
orthogonal matrix<br><br />
optimal decomposition into rank 1 matrices<br><br />
singular values<br />
<br />
====Segment 48: Principal Component Analysis====<br />
<br />
principal component analysis (PCA)<br><br />
diagonalizing the covariance matrix<br><br />
how much total variance is explained by principal components?<br><br />
dimensional reduction</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Segment_39._MCMC_and_Gibbs_Sampling&diff=2930Segment 39. MCMC and Gibbs Sampling2014-04-22T05:53:31Z<p>Jeff Hussmann: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/4gNpgSPal_8&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/4gNpgSPal_8 http://youtu.be/4gNpgSPal_8]<br />
<br />
Links to the slides: [http://slate.ices.utexas.edu/coursefiles/39.MCMCandGibbsSampling.pdf PDF file] or [http://slate.ices.utexas.edu/coursefiles/39.MCMCandGibbsSampling.ppt PowerPoint file]<br />
<br />
===Problems===<br />
===To Calculate===<br />
1. Suppose the domain of a model is the five integers <math>x = \{1,2,3,4,5\}</math>, and that your proposal distribution is: "When <math>x_1 = 2,3,4</math>, choose with equal probability <math>x_2 = x_1 \pm 1</math>. For <math>x_1=1</math> always choose <math>x_2 =2</math>. For <math>x_1=5</math> always choose <math>x_2 =4</math>." What is the ratio of <math>q</math>'s that goes into the acceptance probability <math>\alpha(x_1,x_2)</math> for all the possible values of <math>x_1</math> and <math>x_2</math>?<br />
<br />
2. Suppose the domain of a model is <math>-\infty < x < \infty</math> and your proposal distribution is (perversely),<br />
<br />
<math>q(x_2|x_1) = \begin{cases}\tfrac{7}{2}\exp[-7(x_2-x_1)],\quad & x_2 \ge x_1 \\ \tfrac{5}{2}\exp[-5(x_1-x_2)],\quad & x_2 < x_1 \end{cases}</math><br />
<br />
Sketch this distribution as a function of <math>x_2-x_1</math>. Then, write down an expression for the ratio of <math>q</math>'s that goes into the acceptance probability <math>\alpha(x_1,x_2)</math>.<br />
<br />
===To Think About===<br />
1. Suppose an urn contains 7 large orange balls, 3 medium purple balls, and 5 small green balls. When balls are drawn randomly, the larger ones are more likely to be drawn, in the proportions large:medium:small = 6:4:3. You want to draw exactly 6 balls, one at a time without replacement. How would you use Gibbs sampling to learn: (a) How often do you get 4 orange plus 2 of the same (non-orange) color? (b) What is the expectation (mean) of the product of the number of purple and number of green balls drawn?<br />
<br />
2. How would you do the same problem computationally but without Gibbs sampling?<br />
<br />
3. How would you do the same problem non-stochastically (e.g., obtain answers to 12 significant figures)? (Hint: This is known as the Wallenius non-central hypergeometric distribution.)<br />
<br />
[Answers: 0.155342 and 1.34699]<br />
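One way to attack question 2 (the "without Gibbs sampling" route) is brute-force simulation of the stated drawing model: repeatedly draw six balls one at a time, with probability proportional to remaining count times the 6:4:3 size weight, and tally both quantities. A sketch (the counts and weights come from the problem statement; everything else is an illustrative choice):

```python
import random

rng = random.Random(1)
weights = [6.0, 4.0, 3.0]  # orange (large), purple (medium), green (small)

def draw_six():
    """Draw 6 balls without replacement, P(color) proportional to count x weight."""
    remaining = [7, 3, 5]  # orange, purple, green counts
    drawn = [0, 0, 0]
    for _ in range(6):
        p = [r * w for r, w in zip(remaining, weights)]
        i = rng.choices([0, 1, 2], weights=p)[0]
        remaining[i] -= 1
        drawn[i] += 1
    return drawn

n_trials = 100_000
hits = 0
prod_sum = 0
for _ in range(n_trials):
    o, p, g = draw_six()
    if o == 4 and (p == 2 or g == 2):  # (a) 4 orange + 2 of one other color
        hits += 1
    prod_sum += p * g                  # (b) product of purple and green counts

print(hits / n_trials)      # should approach the quoted 0.155342
print(prod_sum / n_trials)  # should approach the quoted 1.34699
```

The same loop with the labels resampled conditionally, rather than drawn fresh each trial, is the Gibbs-sampling version asked for in question 1.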
<br />
===Class Activity===<br />
There's a story here, about diagnosing rats by which branches they pick in a maze. Bill will explain in class. Unless he thinks up a better story.<br />
<br />
Mathematically, it's another one of these amazing Gibbs sampling examples. Suppose 2 unknown distributions over the digits 0..9, that is <math>p_0,p_1,\ldots,p_9</math> and <math>q_0,q_1,\ldots,q_9</math>, of course with <math>\sum_i p_i = 1</math> and <math>\sum_i q_i = 1</math>. [[media:Gibbs_data.txt|This data file]] has 1000 lines, each with 10 i.i.d. draws of digits, either from the <math>p</math>'s or the <math>q</math>'s -- but, for each line, you don't know which.<br />
<br />
1. Estimate <math>p_0,p_1,\ldots,p_9</math> and <math>q_0,q_1,\ldots,q_9</math> from the data. If you are ambitious, do this by two different methods: First, by Gibbs sampling. Second, by an E-M method. (Although these are conceptually different, my code for them differs by only a few lines.)<br />
<br />
2. Estimate a probability for each line in the data file as to whether it is drawn from the <math>p_i</math>'s (as opposed to the <math>q_i</math>'s).<br />
<br />
3. Plot histograms that show the uncertainties of your Gibbs estimate for the <math>p_i</math>'s. Do your E-M estimates appear to be at the modes of your Gibbs histograms? Should they be?<br />
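A sketch of the Gibbs sampler for task 1. Since the data file isn't reproduced here, the sketch generates synthetic data of the same shape (1000 lines of 10 digits) from two made-up distributions; with the real file you would instead load the 1000 lines of integers. The flat Dirichlet prior on the distributions and the 50/50 prior on each line's origin are my assumptions, not something the activity specifies:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for gibbs_data.txt: 1000 lines of 10 i.i.d. digit draws,
# each line secretly from one of two multinomials over 0..9.
true_p = rng.dirichlet(np.ones(10))
true_q = rng.dirichlet(np.ones(10))
from_p = rng.random(1000) < 0.5
data = np.where(from_p[:, None],
                rng.choice(10, size=(1000, 10), p=true_p),
                rng.choice(10, size=(1000, 10), p=true_q))

# Sufficient statistics: digit counts per line.
counts = np.array([np.bincount(row, minlength=10) for row in data])

# Gibbs sampling: alternate (i) sample each line's label given p and q, and
# (ii) sample p and q given the labels, under a flat Dirichlet(1,...,1) prior.
p = np.full(10, 0.1)
q = rng.dirichlet(np.ones(10))  # asymmetric start breaks the p/q symmetry
p_draws = []
for step in range(400):
    log_wp = counts @ np.log(p)  # log P(line | p), up to a shared constant
    log_wq = counts @ np.log(q)
    prob_p = 1.0 / (1.0 + np.exp(np.clip(log_wq - log_wp, -500, 500)))
    z = rng.random(len(data)) < prob_p                # step (i)
    p = rng.dirichlet(1 + counts[z].sum(axis=0))      # step (ii)
    q = rng.dirichlet(1 + counts[~z].sum(axis=0))
    if step >= 100:                                   # discard burn-in
        p_draws.append(p)

p_hat = np.mean(p_draws, axis=0)  # posterior-mean estimate of the p_i's
```

For the E-M variant, replace the two random draws in each iteration by their deterministic counterparts: use `prob_p` itself as a soft assignment and set `p` and `q` to the normalized expected counts, which is why the two codes differ by only a few lines.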
<br />
[[Media:gibbs_data.txt]]<br />
<br />
[http://nbviewer.ipython.org/github/CS395T/2014/blob/master/Jeff%20Hussmann%2004-21-14%20Gibbs%20sampling_1398145904.ipynb Jeff's solution]</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Information_theory_blitz&diff=2839Information theory blitz2014-04-18T20:44:25Z<p>Jeff Hussmann: Created page with "[http://granite.ices.utexas.edu/coursewiki/images/f/fa/Information_theory_blitz.pdf Link to pdf]"</p>
<hr />
<div>[http://granite.ices.utexas.edu/coursewiki/images/f/fa/Information_theory_blitz.pdf Link to pdf]</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=File:Information_theory_blitz.pdf&diff=2838File:Information theory blitz.pdf2014-04-18T20:43:58Z<p>Jeff Hussmann: </p>
<hr />
<div></div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Segment_38._Mutual_Information&diff=2837Segment 38. Mutual Information2014-04-18T20:43:10Z<p>Jeff Hussmann: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/huNPh1mkJHM&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/huNPh1mkJHM http://youtu.be/huNPh1mkJHM]<br />
<br />
Links to the slides: [http://slate.ices.utexas.edu/coursefiles/38.MutualInformation.pdf PDF file] or [http://slate.ices.utexas.edu/coursefiles/38.MutualInformation.ppt PowerPoint file]<br />
<br />
====Class activity====<br />
[[Information theory blitz]]</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Chess_contingency_tables&diff=2670Chess contingency tables2014-04-11T18:10:50Z<p>Jeff Hussmann: Created page with "The file [http://granite.ices.utexas.edu/coursefiles/game_data.txt here] consists of three pieces of information about each of 100,000 (real) chess games: the first move taken..."</p>
<hr />
<div>The file [http://granite.ices.utexas.edu/coursefiles/game_data.txt here] contains three pieces of information about each of 100,000 (real) chess games: the first move taken by white, the first move taken by black, and the eventual winner of the game ((W)hite, (B)lack, or (D)raw). Each game is on its own line, with the three fields separated by tabs. For example, the first 10 lines of the file look like:<br />
<br />
<code>d4 Nf6 B<br><br />
d4 Nf6 D<br><br />
c4 Nf6 W<br><br />
e4 g6 B<br><br />
e4 Nf6 W<br><br />
e4 g6 D<br><br />
e4 c6 D<br><br />
e4 c6 B<br><br />
e4 c6 D<br><br />
e4 e6 B<br><br />
</code><br />
<br />
For an explanation of the syntax for describing moves, see [http://en.wikipedia.org/wiki/Algebraic_chess_notation wikipedia].<br />
<br />
Using this data, answer as many of the following questions as you can:<br />
<br />
1. Is there a significant association between the type of piece white chooses to move first and the outcome of the game?<br />
<br />
2. Is there a significant association between the pair of opening moves taken together and the outcome of the game?<br />
<br />
3. If white's first move is (pawn to e4), is there a significant association between black's response and the outcome of the game?<br />
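A sketch of the contingency-table machinery these questions call for, demonstrated on the ten sample lines above rather than the full file (on the real data you would read the 100,000 lines of game_data.txt; for question 1 you would key on the piece moved, e.g. a leading letter in "NBRQK" vs. a pawn move):

```python
from collections import Counter
import numpy as np

SAMPLE = """d4 Nf6 B
d4 Nf6 D
c4 Nf6 W
e4 g6 B
e4 Nf6 W
e4 g6 D
e4 c6 D
e4 c6 B
e4 c6 D
e4 e6 B""".splitlines()

def pearson_table(rows, key):
    """Contingency table of key(game) vs. outcome, plus the chi-square statistic."""
    games = [line.split('\t') if '\t' in line else line.split() for line in rows]
    counts = Counter((key(g), g[2]) for g in games)
    r_labels = sorted({r for r, _ in counts})
    c_labels = sorted({c for _, c in counts})
    obs = np.array([[counts[(r, c)] for c in c_labels] for r in r_labels], float)
    # Pearson statistic: expected counts from the row and column marginals.
    expected = obs.sum(1, keepdims=True) * obs.sum(0, keepdims=True) / obs.sum()
    chi2 = ((obs - expected) ** 2 / expected).sum()
    dof = (len(r_labels) - 1) * (len(c_labels) - 1)
    return obs, chi2, dof

# White's first move vs. outcome; swap in other key functions for the
# other questions (e.g. the pair (g[0], g[1]) for question 2).
obs, chi2, dof = pearson_table(SAMPLE, key=lambda g: g[0])
print(obs, chi2, dof)
```

To turn the statistic into a p-value, compare it against a chi-square distribution with `dof` degrees of freedom (e.g. `scipy.stats.chi2.sf(chi2, dof)` if scipy is available).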
<br />
If time permits, come up with a way to visualize these associations.</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=EM_activity&diff=2446EM activity2014-04-04T17:22:33Z<p>Jeff Hussmann: </p>
<hr />
<div>Today's exercise is about [http://www.nature.com/nbt/journal/v26/n8/pdf/nbt1406.pdf this paper]. <br />
<br />
# Read the paper, up to the end of the '''Mathematical foundations''' section.<br />
# Explain how every term in the formulation of the EM algorithm on the third line of slide 5 of segment 30 appears in the algorithm outlined in Figure 1(b) of the paper. (In other words, do for the algorithm in Figure 1(b) what slide 6 in the segment does for GMMs.) Whenever an equation can be made specific to the exact model in the paper, try to do so. Specifically:<br />
## What are <math>\textbf{z}</math>, <math>\textbf{x}</math> and <math>\theta</math> ?<br />
## What is <math>P(\textbf{x} | \textbf{z} \theta)</math> ?<br />
## What is <math>P(\textbf{z} | \textbf{x} \theta')</math> ?<br />
## What is <math>P(\textbf{z} | \theta)</math> ?<br />
## What is a full statement of the function to be maximized in the M-step?<br />
## The paper never explicitly states the closed-form solution to this M-step maximization, but it can be reverse-engineered from the figure. What is this formula?<br />
# Implement both the algorithm outlined in Figure 1(b) of the paper and the naive algorithm given in the paragraph beginning “One iterative scheme for obtaining completions could work as follows:” in the third column of the first page of the paper. Compare the performance of these two algorithms on the data set in the paper.<br />
# Extra bonus challenge: show that your answer to 2.6 really does maximize your answer to 2.5. (This is actually pretty tricky!)</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=GMM_activity&diff=2377GMM activity2014-04-03T02:04:51Z<p>Jeff Hussmann: </p>
<hr />
<div>'''1.''' Draw a sample of 100 points from the uniform distribution <math>U(0,1)</math>. This is your data set. Fit GMM models to your sample (now considered as being on the interval <math>-\infty < x < \infty</math>) with increasing numbers of components <math>K</math>, at least <math>K=1,\ldots,5</math>. Plot your models. Do they get better as <math>K</math> increases? Did you try multiple starting values to find the best (hopefully globally best) solutions for each <math>K</math>?<br />
<br />
'''Competition''': Who can create the best visualization of the convergence of the iterative fitting process in question 1?<br />
<br />
'''2.''' Using a pre-existing package ([http://www.mathworks.com/help/stats/gmdistributionclass.html gmdistribution] for Matlab, or [http://scikit-learn.org/stable/modules/mixture.html scikit-learn], which is installed on the class server, for Python), construct mixture models like those shown in Segment slide 8 (for 3 components) and slide 9 (for 8 components). You should plot 2-sigma error ellipses for the individual components, as shown in those slides.<br />
<br />
The data is at [http://granite.ices.utexas.edu/coursefiles/Twoexondata.txt Twoexondata.txt] or on the IPython server.<br />
<br />
'''3.''' In your favorite computer language, write a code for K-means clustering, and cluster the same data using (a) 3 components and (b) 8 components. Don't use anybody's K-means clustering package for this part: code it yourself. Hint: Don't try to do it as limiting case of GMMs, just code it from the definition of K-means clustering, using an E-M iteration. Plot your results by coloring the data points according to which cluster they are in. How sensitive is your answer to the starting guesses?<br />
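For calibration (question 3 asks you to code this yourself), here is one minimal shape a from-scratch K-means can take, written as the E-M iteration the hint describes; a synthetic two-blob demo stands in for the class data set, so treat this as a sketch to check against rather than a drop-in answer:

```python
import numpy as np

def kmeans(X, k, rng, n_iter=100):
    """Plain K-means as an E-M iteration:
    E-step: hard-assign each point to its nearest center.
    M-step: move each center to the mean of its assigned points."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):  # converged: assignments stopped changing
            break
        centers = new
    return centers, labels

# Demo on two well-separated synthetic blobs (stand-in for Twoexondata.txt).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, size=(100, 2)),
               rng.normal(8, 0.5, size=(100, 2))])
centers, labels = kmeans(X, 2, rng)
print(np.sort(centers[:, 0]))  # one center near each blob
```

Rerunning with a different `rng` seed changes the starting centers, which is exactly the sensitivity the question asks you to probe.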
<br />
'''Competition''': Who can create the best visualization of the convergence of the iterative fitting processes in questions 2 and 3?<br />
<br />
==Honorable mention==<br />
[[File:Histfin.png]]<br />
<br />
==Award winners==<br />
<br />
===Best in show, Best use of animation: Todd===<br />
[[File:animation2_ts.gif]]<br />
<br />
===Best use of Matlab: Daniel===<br />
[[File:Daniel_resized.gif]]<br />
<br />
===Best use of Python: Nick===<br />
Scroll down to see animations.<br />
<br />
{{#widget:Iframe<br />
|url=http://nbviewer.ipython.org/github/CS395T/2014/blob/master/Nick%20Wilson%2004-02-14%20Segment%2029%20In%20Class%20Beauty%20Contest.ipynb<br />
|width=1000<br />
|height=625<br />
|border=1<br />
}}<br />
<br />
===Best use of still image: Anonymous===<br />
[[File:convergence_transparency.png|700px]]</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=GMM_activity&diff=2376GMM activity2014-04-03T02:03:52Z<p>Jeff Hussmann: /* Best use of Matlab */</p>
<hr />
<div>'''1.''' Draw a sample of 100 points from the uniform distribution <math>U(0,1)</math>. This is your data set. Fit GMM models to your sample (now considered as being on the interval <math>-\infty < x < \infty</math>) with increasing numbers of components <math>K</math>, at least <math>K=1,\ldots,5</math>. Plot your models. Do they get better as <math>K</math> increases? Did you try multiple starting values to find the best (hopefully globally best) solutions for each <math>K</math>?<br />
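For problem 1 you can use a package, but the 1-D EM update is short enough to hand-code; here is one possible sketch (NumPy only; the initialization, iteration count, and variance floor are our assumptions):<br />

```python
import numpy as np

def fit_gmm_1d(x, k, n_iter=200, rng=None):
    """Fit a K-component 1-D Gaussian mixture by EM; returns (weights, means, sigmas)."""
    rng = np.random.default_rng(rng)
    w = np.full(k, 1.0 / k)                          # equal starting weights
    mu = rng.choice(x, size=k, replace=False).astype(float)  # random data points as means
    sig = np.full(k, x.std())                        # broad starting widths
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] = P(component j | x_i)
        dens = w * np.exp(-0.5 * ((x[:, None] - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: reweighted moment estimates
        n_j = r.sum(axis=0)
        w = n_j / len(x)
        mu = (r * x[:, None]).sum(axis=0) / n_j
        # floor sigma to avoid a component collapsing onto a single point
        sig = np.maximum(np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / n_j), 1e-6)
    return w, mu, sig
```

For the convergence competition, record the log-likelihood <code>np.log(dens.sum(axis=1)).sum()</code> at each iteration, and rerun from several starting values as the problem suggests.<br />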
<br />
'''Competition''': Who can create the best visualization of the convergence of the iterative fitting process in question 1?<br />
<br />
'''2.''' Using a pre-existing package ([http://www.mathworks.com/help/stats/gmdistributionclass.html gmdistribution] for Matlab, or [http://scikit-learn.org/stable/modules/mixture.html scikit-learn], which is installed on the class server, for Python), construct mixture models like those shown in Segment slide 8 (for 3 components) and slide 9 (for 8 components). You should plot 2-sigma error ellipses for the individual components, as shown in those slides.<br />
<br />
The data is at [http://granite.ices.utexas.edu/coursefiles/Twoexondata.txt Twoexondata.txt] or on the IPython server.<br />
<br />
'''3.''' In your favorite computer language, write a code for K-means clustering, and cluster the same data using (a) 3 components and (b) 8 components. Don't use anybody's K-means clustering package for this part: code it yourself. Hint: Don't try to do it as a limiting case of GMMs; just code it from the definition of K-means clustering, using an E-M iteration. Plot your results by coloring the data points according to which cluster they are in. How sensitive is your answer to the starting guesses?<br />
<br />
'''Competition''': Who can create the best visualization of the convergence of the iterative fitting processes in questions 2 and 3?<br />
<br />
==Honorable mention==<br />
[[File:Histfin.png]]<br />
<br />
==Award winners==<br />
<br />
===Best in show, Best use of animation===<br />
[[File:animation2_ts.gif]]<br />
<br />
===Best use of Matlab: Daniel===<br />
[[File:Daniel_resized.gif]]<br />
<br />
===Best use of Python: Nick===<br />
Scroll down to see animations.<br />
<br />
{{#widget:Iframe<br />
|url=http://nbviewer.ipython.org/github/CS395T/2014/blob/master/Nick%20Wilson%2004-02-14%20Segment%2029%20In%20Class%20Beauty%20Contest.ipynb<br />
|width=1000<br />
|height=625<br />
|border=1<br />
}}<br />
<br />
===Best use of still image===<br />
[[File:convergence_transparency.png|700px]]</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=GMM_activity&diff=2375GMM activity2014-04-03T02:03:37Z<p>Jeff Hussmann: /* Best use of Python */</p>
<hr />
<div>'''1.''' Draw a sample of 100 points from the uniform distribution <math>U(0,1)</math>. This is your data set. Fit GMM models to your sample (now considered as being on the interval <math>-\infty < x < \infty</math>) with increasing numbers of components <math>K</math>, at least <math>K=1,\ldots,5</math>. Plot your models. Do they get better as <math>K</math> increases? Did you try multiple starting values to find the best (hopefully globally best) solutions for each <math>K</math>?<br />
<br />
'''Competition''': Who can create the best visualization of the convergence of the iterative fitting process in question 1?<br />
<br />
'''2.''' Using a pre-existing package ([http://www.mathworks.com/help/stats/gmdistributionclass.html gmdistribution] for Matlab, or [http://scikit-learn.org/stable/modules/mixture.html scikit-learn], which is installed on the class server, for Python), construct mixture models like those shown in Segment slide 8 (for 3 components) and slide 9 (for 8 components). You should plot 2-sigma error ellipses for the individual components, as shown in those slides.<br />
<br />
The data is at [http://granite.ices.utexas.edu/coursefiles/Twoexondata.txt Twoexondata.txt] or on the IPython server.<br />
<br />
'''3.''' In your favorite computer language, write a code for K-means clustering, and cluster the same data using (a) 3 components and (b) 8 components. Don't use anybody's K-means clustering package for this part: code it yourself. Hint: Don't try to do it as a limiting case of GMMs; just code it from the definition of K-means clustering, using an E-M iteration. Plot your results by coloring the data points according to which cluster they are in. How sensitive is your answer to the starting guesses?<br />
<br />
'''Competition''': Who can create the best visualization of the convergence of the iterative fitting processes in questions 2 and 3?<br />
<br />
==Honorable mention==<br />
[[File:Histfin.png]]<br />
<br />
==Award winners==<br />
<br />
===Best in show, Best use of animation===<br />
[[File:animation2_ts.gif]]<br />
<br />
===Best use of Matlab===<br />
[[File:Daniel_resized.gif]]<br />
<br />
===Best use of Python: Nick===<br />
Scroll down to see animations.<br />
<br />
{{#widget:Iframe<br />
|url=http://nbviewer.ipython.org/github/CS395T/2014/blob/master/Nick%20Wilson%2004-02-14%20Segment%2029%20In%20Class%20Beauty%20Contest.ipynb<br />
|width=1000<br />
|height=625<br />
|border=1<br />
}}<br />
<br />
===Best use of still image===<br />
[[File:convergence_transparency.png|700px]]</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=GMM_activity&diff=2363GMM activity2014-04-02T21:01:45Z<p>Jeff Hussmann: </p>
<hr />
<div>'''1.''' Draw a sample of 100 points from the uniform distribution <math>U(0,1)</math>. This is your data set. Fit GMM models to your sample (now considered as being on the interval <math>-\infty < x < \infty</math>) with increasing numbers of components <math>K</math>, at least <math>K=1,\ldots,5</math>. Plot your models. Do they get better as <math>K</math> increases? Did you try multiple starting values to find the best (hopefully globally best) solutions for each <math>K</math>?<br />
<br />
'''Competition''': Who can create the best visualization of the convergence of the iterative fitting process in question 1?<br />
<br />
'''2.''' Using a pre-existing package ([http://www.mathworks.com/help/stats/gmdistributionclass.html gmdistribution] for Matlab, or [http://scikit-learn.org/stable/modules/mixture.html scikit-learn], which is installed on the class server, for Python), construct mixture models like those shown in Segment slide 8 (for 3 components) and slide 9 (for 8 components). You should plot 2-sigma error ellipses for the individual components, as shown in those slides.<br />
<br />
The data is at [http://granite.ices.utexas.edu/coursefiles/Twoexondata.txt Twoexondata.txt] or on the IPython server.<br />
<br />
'''3.''' In your favorite computer language, write a code for K-means clustering, and cluster the same data using (a) 3 components and (b) 8 components. Don't use anybody's K-means clustering package for this part: code it yourself. Hint: Don't try to do it as a limiting case of GMMs; just code it from the definition of K-means clustering, using an E-M iteration. Plot your results by coloring the data points according to which cluster they are in. How sensitive is your answer to the starting guesses?<br />
<br />
'''Competition''': Who can create the best visualization of the convergence of the iterative fitting processes in questions 2 and 3?<br />
<br />
==Honorable mention==<br />
[[File:Histfin.png]]<br />
<br />
==Award winners==<br />
<br />
===Best in show, Best use of animation===<br />
[[File:animation2_ts.gif]]<br />
<br />
===Best use of Matlab===<br />
[[File:Daniel_resized.gif]]<br />
<br />
===Best use of Python===<br />
Nick's animation will go here<br />
<br />
===Best use of still image===<br />
[[File:convergence_transparency.png|700px]]</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=File:Histfin.png&diff=2362File:Histfin.png2014-04-02T21:01:22Z<p>Jeff Hussmann: </p>
<hr />
<div></div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=GMM_activity&diff=2359GMM activity2014-04-02T20:58:04Z<p>Jeff Hussmann: </p>
<hr />
<div>'''1.''' Draw a sample of 100 points from the uniform distribution <math>U(0,1)</math>. This is your data set. Fit GMM models to your sample (now considered as being on the interval <math>-\infty < x < \infty</math>) with increasing numbers of components <math>K</math>, at least <math>K=1,\ldots,5</math>. Plot your models. Do they get better as <math>K</math> increases? Did you try multiple starting values to find the best (hopefully globally best) solutions for each <math>K</math>?<br />
<br />
'''Competition''': Who can create the best visualization of the convergence of the iterative fitting process in question 1?<br />
<br />
'''2.''' Using a pre-existing package ([http://www.mathworks.com/help/stats/gmdistributionclass.html gmdistribution] for Matlab, or [http://scikit-learn.org/stable/modules/mixture.html scikit-learn], which is installed on the class server, for Python), construct mixture models like those shown in Segment slide 8 (for 3 components) and slide 9 (for 8 components). You should plot 2-sigma error ellipses for the individual components, as shown in those slides.<br />
<br />
The data is at [http://granite.ices.utexas.edu/coursefiles/Twoexondata.txt Twoexondata.txt] or on the IPython server.<br />
<br />
'''3.''' In your favorite computer language, write a code for K-means clustering, and cluster the same data using (a) 3 components and (b) 8 components. Don't use anybody's K-means clustering package for this part: code it yourself. Hint: Don't try to do it as a limiting case of GMMs; just code it from the definition of K-means clustering, using an E-M iteration. Plot your results by coloring the data points according to which cluster they are in. How sensitive is your answer to the starting guesses?<br />
<br />
'''Competition''': Who can create the best visualization of the convergence of the iterative fitting processes in questions 2 and 3?<br />
<br />
==Award winners==<br />
<br />
===Best in show, Best use of animation===<br />
[[File:animation2_ts.gif]]<br />
<br />
===Best use of Matlab===<br />
[[File:Daniel_resized.gif]]<br />
<br />
===Best use of Python===<br />
Nick's animation will go here<br />
<br />
===Best use of still image===<br />
[[File:convergence_transparency.png|700px]]</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=File:Convergence_transparency.png&diff=2358File:Convergence transparency.png2014-04-02T20:53:47Z<p>Jeff Hussmann: </p>
<hr />
<div></div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=File:Daniel_resized.gif&diff=2355File:Daniel resized.gif2014-04-02T20:47:17Z<p>Jeff Hussmann: </p>
<hr />
<div></div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=File:Daniel_8_components_loop.gif&diff=2353File:Daniel 8 components loop.gif2014-04-02T20:38:55Z<p>Jeff Hussmann: </p>
<hr />
<div></div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=GMM_activity&diff=2336GMM activity2014-04-02T18:54:55Z<p>Jeff Hussmann: </p>
<hr />
<div>'''1.''' Draw a sample of 100 points from the uniform distribution <math>U(0,1)</math>. This is your data set. Fit GMM models to your sample (now considered as being on the interval <math>-\infty < x < \infty</math>) with increasing numbers of components <math>K</math>, at least <math>K=1,\ldots,5</math>. Plot your models. Do they get better as <math>K</math> increases? Did you try multiple starting values to find the best (hopefully globally best) solutions for each <math>K</math>?<br />
<br />
'''Competition''': Who can create the best visualization of the convergence of the iterative fitting process in question 1?<br />
<br />
'''2.''' Using a pre-existing package ([http://www.mathworks.com/help/stats/gmdistributionclass.html gmdistribution] for Matlab, or [http://scikit-learn.org/stable/modules/mixture.html scikit-learn], which is installed on the class server, for Python), construct mixture models like those shown in Segment slide 8 (for 3 components) and slide 9 (for 8 components). You should plot 2-sigma error ellipses for the individual components, as shown in those slides.<br />
<br />
The data is at [http://granite.ices.utexas.edu/coursefiles/Twoexondata.txt Twoexondata.txt] or on the IPython server.<br />
<br />
'''3.''' In your favorite computer language, write a code for K-means clustering, and cluster the same data using (a) 3 components and (b) 8 components. Don't use anybody's K-means clustering package for this part: code it yourself. Hint: Don't try to do it as a limiting case of GMMs; just code it from the definition of K-means clustering, using an E-M iteration. Plot your results by coloring the data points according to which cluster they are in. How sensitive is your answer to the starting guesses?<br />
<br />
'''Competition''': Who can create the best visualization of the convergence of the iterative fitting processes in questions 2 and 3?<br />
<br />
<br />
====Awards====<br />
<br />
Best in show<br />
<br />
Best use of still image<br />
<br />
Best use of animation<br />
<br />
Best use of Matlab<br />
<br />
Best use of Python</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Segment_29._GMMs_in_N-Dimensions&diff=2335Segment 29. GMMs in N-Dimensions2014-04-02T18:27:00Z<p>Jeff Hussmann: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/PH8_qqDTCYY&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/PH8_qqDTCYY http://youtu.be/PH8_qqDTCYY]<br />
<br />
Links to the slides: [http://slate.ices.utexas.edu/coursefiles/29.GMMsInNDimensions.pdf PDF file] or [http://slate.ices.utexas.edu/coursefiles/29.GMMsInNDimensions.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Compute====<br />
The file [[Media:Twoexondata.txt|twoexondata.txt]] contains data similar to that shown in slide 6, as 3000 (x,y) pairs.<br />
<br />
1. In your favorite computer language, write a code for K-means clustering, and cluster the given data using (a) 3 components and (b) 8 components. Don't use anybody's K-means clustering package for this part: code it yourself. Hint: Don't try to do it as a limiting case of GMMs; just code it from the definition of K-means clustering, using an E-M iteration. Plot your results by coloring the data points according to which cluster they are in. How sensitive is your answer to the starting guesses?<br />
<br />
2. In your favorite computer language, and either writing your own GMM program or using any code you can find elsewhere (e.g., Numerical Recipes for C++, or [http://scikit-learn.org/stable/modules/mixture.html scikit-learn], which is installed on the class server, for Python), construct mixture models like those shown in slide 8 (for 3 components) and slide 9 (for 8 components). You should plot 2-sigma error ellipses for the individual components, as shown in those slides.<br />
<br />
====To Think About====<br />
1. The segment (or the previous one) mentioned that the log-likelihood can sometimes get stuck on plateaus, barely increasing, for long periods of time, and then can suddenly increase by a lot. What do you think is happening from iteration to iteration during these times on a plateau?<br />
<br />
<br />
====Class Activity====<br />
[[GMM activity]]</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=GMM_activity&diff=2333GMM activity2014-04-02T16:45:01Z<p>Jeff Hussmann: Created page with "'''1.''' Draw a sample of 100 points from the uniform distribution <math>U(0,1)</math>. This is your data set. Fit GMM models to your sample (now considered as being on the ..."</p>
<hr />
<div>'''1.''' Draw a sample of 100 points from the uniform distribution <math>U(0,1)</math>. This is your data set. Fit GMM models to your sample (now considered as being on the interval <math>-\infty < x < \infty</math>) with increasing numbers of components <math>K</math>, at least <math>K=1,\ldots,5</math>. Plot your models. Do they get better as <math>K</math> increases? Did you try multiple starting values to find the best (hopefully globally best) solutions for each <math>K</math>?<br />
<br />
'''Competition''': Who can create the best visualization of the convergence of the iterative fitting process in question 1?<br />
<br />
'''2.''' Using a pre-existing package ([http://www.mathworks.com/help/stats/gmdistributionclass.html gmdistribution] for Matlab, or [http://scikit-learn.org/stable/modules/mixture.html scikit-learn], which is installed on the class server, for Python), construct mixture models like those shown in Segment slide 8 (for 3 components) and slide 9 (for 8 components). You should plot 2-sigma error ellipses for the individual components, as shown in those slides.<br />
<br />
The data is at [http://granite.ices.utexas.edu/coursefiles/Twoexondata.txt Twoexondata.txt] or on the IPython server.<br />
<br />
'''3.''' In your favorite computer language, write a code for K-means clustering, and cluster the same data using (a) 3 components and (b) 8 components. Don't use anybody's K-means clustering package for this part: code it yourself. Hint: Don't try to do it as a limiting case of GMMs; just code it from the definition of K-means clustering, using an E-M iteration. Plot your results by coloring the data points according to which cluster they are in. How sensitive is your answer to the starting guesses?<br />
<br />
'''Competition''': Who can create the best visualization of the convergence of the iterative fitting processes in questions 2 and 3?</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Segment_20._Nonlinear_Least_Squares_Fitting&diff=1872Segment 20. Nonlinear Least Squares Fitting2014-03-09T21:03:03Z<p>Jeff Hussmann: Removed outdated link to code</p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/xtBCGPHRcb0&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/xtBCGPHRcb0 http://youtu.be/xtBCGPHRcb0]<br />
<br />
Links to the slides: [http://slate.ices.utexas.edu/coursefiles/20.NonlinearLeastSquares.pdf PDF file] or [http://slate.ices.utexas.edu/coursefiles/20.NonlinearLeastSquares.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. (See lecture slide 3.) For one-dimensional <math>x</math>, the model <math>y(x | \mathbf b)</math> is called "linear" if <math>y(x | \mathbf b) = \sum_k b_k X_k(x)</math>, where <math>X_k(x)</math> are arbitrary known functions of <math>x</math>. Show that minimizing <math>\chi^2</math> produces a set of linear equations (called the "normal equations") for the parameters <math>b_k</math>.<br />
<br />
2. A simple example of a linear model is <math>y(x | \mathbf b) = b_0 + b_1 x</math>, which corresponds to fitting a straight line to data. What are the MLE estimates of <math>b_0</math> and <math>b_1</math> in terms of the data: <math>x_i</math>'s, <math>y_i</math>'s, and <math>\sigma_i</math>'s?<br />
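As a hint for these two problems (a sketch of the reasoning, not a full solution): differentiate <math>\chi^2</math> with respect to each parameter and set the result to zero,<br />
<math>\chi^2(\mathbf b) = \sum_i \frac{1}{\sigma_i^2}\left(y_i - \sum_k b_k X_k(x_i)\right)^2 ,</math><br />
<math>0 = \frac{\partial \chi^2}{\partial b_j} \;\Longrightarrow\; \sum_k \left(\sum_i \frac{X_j(x_i)\,X_k(x_i)}{\sigma_i^2}\right) b_k = \sum_i \frac{y_i\,X_j(x_i)}{\sigma_i^2} ,</math><br />
which is linear in the <math>b_k</math> (the normal equations). For problem 2, take <math>X_0(x) = 1</math> and <math>X_1(x) = x</math> and solve the resulting <math>2 \times 2</math> system.<br />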
<br />
====To Think About====<br />
1. We often rather casually assume a uniform prior <math>P(\mathbf b)= \text{constant}</math> on the parameters <math>\mathbf b</math>. If the prior is not uniform, then is minimizing <math>\chi^2</math> the right thing to do? If not, then what should you do instead? Can you think of a situation where the difference would be important?<br />
<br />
2. What if, in lecture slide 2, the measurement errors were <math>e_i \sim \text{Cauchy}(0,\sigma_i)</math> instead of <math>e_i \sim N(0,\sigma_i)</math>? How would you find MLE estimates for the parameters <math>\mathbf b</math>?<br />
<br />
===Class Activity===<br />
Here is some data: [[Media:Chisqfitdata.txt]]<br />
<br />
In class we will work on fitting this to some models as explained [[ClassActivity20130325|here]].<br />
<br />
Here are Bill's numerical answers, so that you can see whether you are on the right track (or whether Bill got it wrong!): [[Media:Chisqfitanswers.txt]]</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Review_Session_for_Mid-Term_Exam&diff=1732Review Session for Mid-Term Exam2014-03-02T23:40:55Z<p>Jeff Hussmann: </p>
<hr />
<div>[http://granite.ices.utexas.edu/coursewiki/images/e/ed/Generalized_monty.pdf Generalized Monty Hall]<br />
<br />
[http://granite.ices.utexas.edu/coursefiles/blitz_2014.pdf Probability blitz]<br />
<br />
[http://granite.ices.utexas.edu/coursewiki/images/f/f4/Multivar_normal.pdf MVN Exercise]<br />
<br />
<br />
<br />
[http://granite.ices.utexas.edu/coursefiles/assorted_solutions.pdf Solutions to generalized Monty Hall and blitz]<br />
<br />
[http://granite.ices.utexas.edu/coursewiki/images/f/f4/Multivar_normal.pdf Solutions to MVN Exercise]</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=File:Multivar_normal.pdf&diff=1731File:Multivar normal.pdf2014-03-02T23:39:11Z<p>Jeff Hussmann: Jeff Hussmann uploaded a new version of &quot;File:Multivar normal.pdf&quot;</p>
<hr />
<div>MVN Exercise</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Review_Session_for_Mid-Term_Exam&diff=1730Review Session for Mid-Term Exam2014-03-02T22:45:01Z<p>Jeff Hussmann: </p>
<hr />
<div>[http://granite.ices.utexas.edu/coursewiki/images/e/ed/Generalized_monty.pdf Generalized Monty Hall]<br />
<br />
[http://granite.ices.utexas.edu/coursefiles/blitz_2014.pdf Probability blitz]<br />
<br />
[http://granite.ices.utexas.edu/coursewiki/images/f/f4/Multivar_normal.pdf MVN Exercise]<br />
<br />
<br />
<br />
[http://granite.ices.utexas.edu/coursefiles/assorted_solutions.pdf Solutions to generalized Monty Hall and blitz]</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Main_Page&diff=1701Main Page2014-02-28T19:31:40Z<p>Jeff Hussmann: </p>
<hr />
<div>__NOTOC__<br />
=Statistical and Discrete Methods for Scientific Computing=<br />
<br />
='''NOTE: Friday classes will be held in a different room - CBA 4.348'''=<br />
<br />
===CSE383M (65280) and CS395T (53715), Spring 2014===<br />
Welcome to the course! The instructor is Professor William Press (Bill), and the TA is Jeff Hussmann (Jeff). We meet Mondays and Wednesdays, 1:30 - 3:00 p.m. in CBA 4.344 with Bill, and Fridays, 1:30 - 3:00 p.m. in CBA 4.348 with Jeff. The course is aimed at first or second year graduate students, especially in the CSEM, CS, and ECE programs, but others are welcome. You'll need math at the level of <i>at least</i> 2nd year calculus, plus linear algebra, plus either more continuous math (e.g., CSEM students) or more discrete math (e.g., CS and ECE students). You'll also need to be able to program in some known computer language.<br />
<br />
===Mechanics of the Course===<br />
The last two years, we have tried the experiment of a "flipped" course. This has worked so well that we are doing this again this year. "Flipped" means that the lectures are all on the web as recorded webcasts. You <b>must</b> watch the assigned webcasts <b>before</b> the class for which they are scheduled; maybe watch them more than once if there are parts that you don't easily understand. Then, you will be ready for the active learning that we do in class. The class activities will <b>not</b> "cover the material". Rather, class is supposed to be for "aha moments" and for "fixing" the material in your learning memory. We'll thus do various kinds of "active learning" activities that will test and improve your understanding of the material in the lecture. Such in-class activities, often done in <i>randomized</i> groups of two or three, may include<br />
* group computer programming exercises<br />
* group working of problems<br />
* group writing assignments<br />
* discussing concepts (and communicating ideas back to the whole class)<br />
* "quiz show" style activities<br />
* short surprise quizzes (generally at the beginning of class -- no makeups allowed)<br />
* whatever else we all think of<br />
<br />
===Problems for Each Segment===<br />
Every lecture segment home page has one or two relatively easy "skill" problems. You should work these after watching the segment, before class. (You might be asked to discuss your solution with your small group in class.) Also on the segment's page are one or two concept thought problems. One or another of these will sometimes be the basis of in-class activities, so you might want to think about them before class.<br />
<br />
===Student Wiki Pages===<br />
Every student will have a wiki page (and as many linked pages as you want). You can post your solutions to as many problems as you wish to your wiki page. You can do this either before the relevant class or afterwards. You can also make up, and solve, additional problems. Problems won't be individually graded. However, at the end of the course, the completeness and quality of your wiki page(s) will be a part of your course grade. Your wiki page can include discussion of the thought problems, as well as the skill problems. <br />
<br />
You can also post signed comments on any other student's wiki pages. To the extent that these are generally helpful, they will add credit to your reputation and for your grade.<br />
<br />
[[Student Pages]]<br />
<br />
===Laptops or Tablets===<br />
You <b>must</b> bring your laptop computer or full-sized tablet to every class, so that you can (i) look things up during group discussions or problem sessions and (ii) do in-class programming exercises. You can program in any language you want. For Python, which we recommend as the best choice for this course, you can either install it on your machine, or else use the IPython notebook server described in class. The course will include several lectures of Python workshop by Jeff. <br />
<br />
If you instead want to use MATLAB or Mathematica, that is fine, but please be sure that it is installed on your computer before the first class. (The MATLAB Student Edition is a real bargain.) For C, C++, Java, etc., please be sure that you have a fully working environment for compiling and running small pieces of code.<br />
<br />
===Course Requirements and Grading===<br />
Grades will be based on these factors<br />
* in-class attendance and participation<br />
* an in-class midterm exam<br />
* completeness and quality of your individual wiki page(s)<br />
* relevance and usefulness of your comments on other people's wiki pages (or on the main wiki)<br />
* an individual 30-minute final oral exam<br />
<br />
In previous years there was a term project, but not this year. Your working the problems and posting solutions on your wiki page is this year's substitute.<br />
<br />
[[File:learning_cone.gif|200px|thumb|right|Click image to see a legible version.]]<br />
<br />
===What is Active Learning?===<br />
Much research shows that lecture courses, where students listen passively as the instructor talks, are inefficient ways to learn. What works is so-called [http://en.wikipedia.org/wiki/Active_learning active learning], a broad term that, for us, basically means that class time is too valuable to waste on lectures. (See image at right.)<br />
<br />
The lectures are all recorded as webcasts, but webcasts are not active learning. However, they are a starting point as a "linear" introduction to the material. <br />
<br />
===Feedback===<br />
<br />
What has worked well in class so far? What hasn't worked? How could things be improved? Please leave [[Feedback 2014|feedback]].<br />
<br />
===Resources and Links===<br />
<br />
There is no textbook for the course. A list of recommended supplementary books is [[Recommended books|here]].<br />
<br />
Some resources for learning Python can be found [[Python resources|here]].<br />
<br />
Some MATLAB resources can be found [[MATLAB resources|here]].<br />
<br />
===Webcast Lecture Segments <i>(Opinionated Lessons in Statistics)</i>===<br />
All of the lectures are in the form of webcasts, divided into segments of about 15-30 minutes each (occasionally a bit longer). Each segment has a wiki page, linked below. You can view the lecture on its wiki page, which also has additional stuff about the segment (including the <b>skill and thought problems</b>), or by clicking directly to YouTube, where they are all on Bill's [http://www.youtube.com/user/opinionatedlessons/videos?view=0&flow=list&sort=da "Opinionated Lessons" channel].<br />
<br />
<center><br />
{| class="wikitable"<br />
|+Watch segments BEFORE class on the indicated dates:<br />
|-<br />
|Mon Jan 13<br />
|<b>First Day of Class</b> (no segment due)<br />
|-<br />
|Wed Jan 15<br />
|[[Segment 1. Let's Talk about Probability]] (or [http://www.youtube.com/watch?v=H5WjVgL6Nh4 YouTube])<br />
|-<br />
|Fri Jan 17<br />
|[[Python Set-up Tutorial and Workshop]] (no segment due)<br />
|-<br />
|Mon Jan 20<br />
|<b>Martin Luther King Day HOLIDAY</b> (no segment due)<br />
|-<br />
|Wed Jan 22<br />
|[[Segment 2. Bayes]] (or [http://www.youtube.com/watch?v=FROAk4AFKHk YouTube])<br />
|-<br />
|Fri Jan 24<br />
|[[Segment 3. Monty Hall]] (or [http://www.youtube.com/watch?v=Rxb8JG8nUFA YouTube])<br />
|-<br />
|Mon Jan 27<br />
|[[Segment 4. The Jailer's Tip]] (or [http://www.youtube.com/watch?v=425D0CjLLLs YouTube])<br />
|-<br />
|Wed Jan 29<br />
|[[Segment 5. Bernoulli Trials]] (or [http://www.youtube.com/watch?v=2T3KP2LleFg YouTube])<br />
|-<br />
|Fri Jan 31<br />
|[[Segment 6. The Towne Family Tree]] (or [http://www.youtube.com/watch?v=y_L2THpv5Jg YouTube])<br />
|-<br />
|Mon Feb 3<br />
|[[Segment 7. Central Tendency and Moments]] (or [http://www.youtube.com/watch?v=ZWOmsKWQ7Fw YouTube])<br />
|-<br />
|Wed Feb 5<br />
|[[Segment 8. Some Standard Distributions]] (or [http://www.youtube.com/watch?v=EDYDC7iNGTg YouTube])<br />
|-<br />
|Fri Feb 7<br />
|[[Segment 9. Characteristic Functions]] (or [http://www.youtube.com/watch?v=NJL-BX6HuxY YouTube])<br />
|-<br />
|Mon Feb 10<br />
|[[Segment 10. The Central Limit Theorem]] (or [http://www.youtube.com/watch?v=IpuYGsKplSw YouTube])<br />
|-<br />
|Wed Feb 12<br />
|[[Segment 11. Random Deviates]] (or [http://www.youtube.com/watch?v=4r1GlyisB8E YouTube])<br />
|-<br />
|Fri Feb 14<br />
|[[Segment 12. P-Value Tests]] (or [http://www.youtube.com/watch?v=2Ul7TI0B5ek YouTube])<br />
|-<br />
|Mon Feb 17<br />
|[[Segment 13. The Yeast Genome]] (or [http://www.youtube.com/watch?v=QSgUX-Do8Tc YouTube])<br />
|-<br />
|Wed Feb 19<br />
|[[Segment 14. Bayesian Criticism of P-Values]] (or [http://www.youtube.com/watch?v=IKV6Pn18C7o YouTube])<br />
|-<br />
|Fri Feb 21<br />
|[[Segment 16. Multiple Hypotheses]] (or [http://www.youtube.com/watch?v=w6AjduOEN2k YouTube]) [note order!]<br />
|-<br />
|Mon Feb 24<br />
|[[Segment 15. The Towne Family - Again]] (or [http://www.youtube.com/watch?v=Y-i0CN15X-M YouTube]) [note order!]<br />
|-<br />
|Wed Feb 26<br />
|[[Segment 17. The Multivariate Normal Distribution]] (or [http://www.youtube.com/watch?v=t7Z1a_BOkN4 YouTube])<br />
|-<br />
|Fri Feb 28<br />
|[[Review Session for Mid-Term Exam]] (no new segment due)<br />
|-<br />
|}<br />
<br />
<b>Monday, March 3. MIDTERM EXAM</b><br />
<br />
{| class="wikitable"<br />
|Wed Mar 5<br />
|[[Segment 18. The Correlation Matrix]] (or [http://www.youtube.com/watch?v=aW5q_P0it9E YouTube])<br />
|-<br />
|Fri Mar 7<br />
|[[Segment 19. The Chi Square Statistic]] (or [http://www.youtube.com/watch?v=87EMhmPkOhk YouTube])<br />
|-<br />
|}<br />
<br />
<b>Monday, March 10 through Friday, March 14: SPRING BREAK</b><br />
<br />
{| class="wikitable"<br />
|+Watch segments BEFORE class on the indicated dates:<br />
|Mon Mar 17<br />
|[[Segment 20. Nonlinear Least Squares Fitting]] (or [http://www.youtube.com/watch?v=xtBCGPHRcb0 YouTube])<br />
|-<br />
|Wed Mar 19<br />
|[[Segment 21. Marginalize or Condition Uninteresting Fitted Parameters]] (or [http://www.youtube.com/watch?v=yxZUS_BpEZk YouTube])<br />
|-<br />
|Fri Mar 21<br />
|[[Segment 22. Uncertainty of Derived Parameters]] (or [http://www.youtube.com/watch?v=ZoD3_rov--w YouTube])<br />
|-<br />
|Mon Mar 24<br />
|[[Segment 23. Bootstrap Estimation of Uncertainty]] (or [http://www.youtube.com/watch?v=1OC9ul-1PVg YouTube])<br />
|-<br />
|Wed Mar 26<br />
|[[Segment 24. Goodness of Fit]] (or [http://www.youtube.com/watch?v=EJleSVf0Z-U YouTube])<br />
|-<br />
|Fri Mar 28<br />
|[[Segment 27. Mixture Models]] (or [http://www.youtube.com/watch?v=9pWnZcpYh44 YouTube])<br />
|-<br />
|Mon Mar 31<br />
|[[Segment 28. Gaussian Mixture Models in 1-D]] (or [http://www.youtube.com/watch?v=n7u_tq0I6jM YouTube])<br />
|-<br />
|Wed Apr 2<br />
|[[Segment 29. GMMs in N-Dimensions]] (or [http://www.youtube.com/watch?v=PH8_qqDTCYY YouTube])<br />
|-<br />
|Fri Apr 4<br />
|[[Segment 30. Expectation Maximization (EM) Methods]] (or [http://www.youtube.com/watch?v=StQOzRqTNsw YouTube])<br />
|-<br />
|Mon Apr 7<br />
|[[Segment 31. A Tale of Model Selection]] (or [http://www.youtube.com/watch?v=_G1gzqQzbuM YouTube])<br />
|-<br />
|Wed Apr 9<br />
|[[Segment 32. Contingency Tables: A First Look]] (or [http://www.youtube.com/watch?v=NvCdN2RFufY YouTube])<br />
|-<br />
|Fri Apr 11<br />
|[[Segment 33. Contingency Table Protocols and Exact Fisher Test]] (or [http://www.youtube.com/watch?v=9Qrkw5UfAmQ YouTube])<br />
|-<br />
|Mon Apr 14<br />
|[[Segment 34. Permutation Tests]] (or [http://www.youtube.com/watch?v=_4BUS1NGNHA YouTube])<br />
|-<br />
|Wed Apr 16<br />
|[[Segment 37. A Few Bits of Information Theory]] (or [http://www.youtube.com/watch?v=ktzYOLDN3u4 YouTube])<br />
|-<br />
|Fri Apr 18<br />
|[[Segment 38. Mutual Information]] (or [http://www.youtube.com/watch?v=huNPh1mkJHM YouTube])<br />
|-<br />
|Mon Apr 21<br />
|[[Segment 39. MCMC and Gibbs Sampling ]] (or [http://www.youtube.com/watch?v=4gNpgSPal_8 YouTube])<br />
|-<br />
|Wed Apr 23<br />
|[[Segment 40. Markov Chain Monte Carlo, Example 1 ]] (or [http://www.youtube.com/watch?v=nSKZ02ZWzsY YouTube])<br />
|-<br />
|Fri Apr 25<br />
|[[Segment 41. Markov Chain Monte Carlo, Example 2 ]] (or [http://www.youtube.com/watch?v=FnNckBLWJ24 YouTube])<br />
|-<br />
|Mon Apr 28<br />
|[[Segment 47. Low-Rank Approximation of Data ]] (or [http://www.youtube.com/watch?v=M0gsHNS_5FE YouTube])<br />
|-<br />
|Wed Apr 30<br />
|[[Segment 48. Principal Component Analysis (PCA)]] (or [http://www.youtube.com/watch?v=frWqIUpIxLg YouTube])<br />
|-<br />
|Fri May 2<br />
| <b>Review Session for Oral Exams</b><br />
|}<br />
<br />
<b>Monday, May 5 and Tuesday, May 6: ORAL FINAL EXAMS</b><br />
<br />
</center><br />
<br />
===Extra Credit Segments (segment number indicates intended sequence)===<br />
[[Segment 25. Fitting Models to Counts]] (or [http://www.youtube.com/watch?v=YXaq2PVCGZQ YouTube])<br><br />
[[Segment 26. The Poisson Count Pitfall]] (or [http://www.youtube.com/watch?v=rPO3N5GI-3I YouTube])<br><br />
[[Segment 35. Ordinal vs. Nominal Contingency Tables]] (or [http://www.youtube.com/watch?v=fYUbj78aguk YouTube])<br><br />
[[Segment 36. Contingency Tables Have Nuisance Parameters]] (or [http://www.youtube.com/watch?v=bHK79WKOX-Y YouTube])<br><br />
[[Segment 49. Eigenthingies and Main Effects]] (or [http://www.youtube.com/watch?v=LpGQnvvGLMQ YouTube])<br><br />
<br />
===Segments with Slides But Not Yet Recorded===<br />
(links are to PowerPoint files)<br />
<br />
[http://granite.ices.utexas.edu/coursefiles/15.5.PoissonProcessesOrderStatistics.ppt Segment 15.5. Poisson Processes and Order Statistics]<br><br />
[http://granite.ices.utexas.edu/coursefiles/42.WienerFiltering.ppt Segment 42. Wiener Filtering]<br><br />
[http://granite.ices.utexas.edu/coursefiles/43.TheIRELady.ppt Segment 43. The IRE Lady]<br><br />
[http://granite.ices.utexas.edu/coursefiles/44.Wavelets.ppt Segment 44. Wavelets]<br><br />
[http://granite.ices.utexas.edu/coursefiles/45.LaplaceInterpolation.ppt Segment 45. Laplace Interpolation]<br><br />
[http://granite.ices.utexas.edu/coursefiles/46.InterpolationOnScatteredData.ppt Segment 46. Interpolation On Scattered Data]<br><br />
[http://granite.ices.utexas.edu/coursefiles/50.BinaryClassifiers.ppt Segment 50. Binary Classifiers]<br><br />
[http://granite.ices.utexas.edu/coursefiles/51.HierarchicalClassification.ppt Segment 51. Hierarchical Classification]<br><br />
[http://granite.ices.utexas.edu/coursefiles/52.DynamicProgramming.ppt Segment 52. Dynamic Programming]<br><br />
<br />
===Team Randomizer===<br />
Link to [http://granite.ices.utexas.edu/coursefiles/teamrandomizer.php the team randomizer]</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Review_Session_for_Mid-Term_Exam&diff=1700Review Session for Mid-Term Exam2014-02-28T19:26:38Z<p>Jeff Hussmann: Created page with "[http://granite.ices.utexas.edu/coursewiki/images/e/ed/Generalized_monty.pdf Generalized Monty Hall] [http://granite.ices.utexas.edu/coursefiles/blitz_2014.pdf Probability bl..."</p>
<hr />
<div>[http://granite.ices.utexas.edu/coursewiki/images/e/ed/Generalized_monty.pdf Generalized Monty Hall]<br />
<br />
[http://granite.ices.utexas.edu/coursefiles/blitz_2014.pdf Probability blitz]<br />
<br />
[http://granite.ices.utexas.edu/coursewiki/images/f/f4/Multivar_normal.pdf MVN Exercise]</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Segment_16._Multiple_Hypotheses&diff=1424Segment 16. Multiple Hypotheses2014-02-21T19:16:38Z<p>Jeff Hussmann: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/w6AjduOEN2k&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/w6AjduOEN2k http://youtu.be/w6AjduOEN2k]<br />
<br />
Links to the slides: [http://slate.ices.utexas.edu/coursefiles/16.MultipleHypotheses.pdf PDF file] or [http://slate.ices.utexas.edu/coursefiles/16.MultipleHypotheses.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. Simulate the following: You have M=50 p-values, none actually causal, so that they are drawn from a uniform distribution. Not knowing this sad fact, you apply the Benjamini-Hochberg prescription with <math>\alpha=0.05</math> and possibly call some discoveries as true. By repeated simulation, estimate the probability of thus getting N wrongly-called discoveries, for N=0, 1, 2, and 3.<br />
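A minimal way to set up this simulation (our own sketch, not from the segment; the seed, trial count, and variable names are arbitrary choices):<br />

```python
import numpy as np

rng = np.random.default_rng(0)
M, alpha, trials = 50, 0.05, 20000
tally = np.zeros(4)  # how often we see N = 0, 1, 2, 3 wrongly-called discoveries

for _ in range(trials):
    p = np.sort(rng.uniform(size=M))
    # Benjamini-Hochberg step-up: find the largest k with
    # p_(k) <= alpha * k / M, then call the k smallest p-values discoveries.
    hits = np.nonzero(p <= alpha * np.arange(1, M + 1) / M)[0]
    N = hits[-1] + 1 if hits.size else 0
    if N < 4:
        tally[N] += 1

prob = tally / trials  # estimated P(N = 0), ..., P(N = 3)
```

Under the global null with independent p-values, BH gives P(N &ge; 1) exactly equal to <math>\alpha</math>, so prob[0] should come out near 0.95.<br />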
<br />
2. Does the distribution that you found in problem 1 depend on M? On <math>\alpha</math>? Derive its form analytically<br />
for the usual case of <math>\alpha \ll 1</math>.<br />
<br />
====To Think About====<br />
1. Suppose you have M independent trials of an experiment, each of which yields an independent p-value. Fisher proposed combining them by forming the statistic<br />
<br />
<math>S = -2\sum_{i=1}^{M}\log(p_i)</math><br />
<br />
Show that, under the null hypothesis, S is distributed as <math>\text{Chisquare}(2M)</math> and describe how you would obtain a combined p-value for this statistic.<br />
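A quick numerical sanity check of this claim (our sketch; M, the seed, and the trial count are arbitrary):<br />

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
M, trials = 10, 50000
p = rng.uniform(size=(trials, M))   # null hypothesis: each p-value is Uniform(0, 1)
S = -2 * np.log(p).sum(axis=1)      # Fisher's combined statistic

# Chisquare(2M) has mean 2M and variance 4M; compare with the sample:
print(S.mean(), S.var())            # near 20 and 40 for M = 10

# A combined p-value for one observed S is the chi-square tail area:
combined_p = chi2.sf(S[0], df=2 * M)
```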
<br />
2. Fisher is sometimes credited, on the basis of problem 1, with having invented "meta-analysis", whereby results from multiple investigations can be combined to get an overall more significant result. Can you see any pitfalls in this?<br />
<br />
===Class Activity===<br />
[http://granite.ices.utexas.edu/coursefiles/p_value_follow_ups.pdf P-value follow-ups]</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Segment_13._The_Yeast_Genome&diff=1303Segment 13. The Yeast Genome2014-02-17T19:48:28Z<p>Jeff Hussmann: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/QSgUX-Do8Tc&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/QSgUX-Do8Tc http://youtu.be/QSgUX-Do8Tc]<br />
<br />
Links to the slides: [http://slate.ices.utexas.edu/coursefiles/13.TheYeastGenome.pdf PDF file] or [http://slate.ices.utexas.edu/coursefiles/13.TheYeastGenome.ppt PowerPoint file]<br />
<br />
Link to the file mentioned in the segment: [http://slate.ices.utexas.edu/coursefiles/SacCerChr4.txt.zip SacCerChr4.txt]<br />
<br />
Link to all yeast chromosomes: [http://hgdownload.cse.ucsc.edu/goldenPath/sacCer3/chromosomes/ UCSC]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. With p=0.3, and various values of n, how big is the largest discrepancy between the Binomial probability pdf and the approximating Normal pdf? At what value of n does this value become smaller than <math>10^{-15}</math>?<br />
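One way to explore this numerically (a sketch; the values of n below are just examples):<br />

```python
import numpy as np
from scipy.stats import binom, norm

p = 0.3
disc = {}
for n in (10, 100, 1000, 10000):
    k = np.arange(n + 1)
    exact = binom.pmf(k, n, p)
    # Normal with the matching mean n*p and variance n*p*(1-p):
    approx = norm.pdf(k, loc=n * p, scale=np.sqrt(n * p * (1 - p)))
    disc[n] = np.max(np.abs(exact - approx))
    print(n, disc[n])
```

The discrepancy shrinks as n grows, but notice how slowly; extrapolate the trend to answer the second part of the question.<br />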
<br />
2. Show that if four random variables are (together) multinomially distributed, each separately is binomially distributed.<br />
<br />
====To Think About====<br />
1. The segment suggests that <math>A\ne T</math> and <math>C\ne G</math> comes about because genes are randomly distributed on one strand or the other. Could you use the observed discrepancies to estimate, even roughly, the number of genes in the yeast genome? If so, how? If not, why not?<br />
<br />
2. Suppose that a Bayesian thinks that the prior probability of the hypothesis that "<math>P_A=P_T</math>" is 0.9,<br />
and that the set of all hypotheses that "<math>P_A\ne P_T</math>" have a total prior of 0.1. How might he calculate the odds ratio <math>\text{Prob}(P_A=P_T)/\text{Prob}(P_A\ne P_T)</math>? Hint: Are there nuisance variables to be marginalized over?<br />
<br />
===Class Activity===<br />
<br />
[http://granite.ices.utexas.edu/coursefiles/chrIV.txt Yeast chromosome 4]<br />
<br />
[http://granite.ices.utexas.edu/coursefiles/yeast_ORFs Activity slides]</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Segment_13._The_Yeast_Genome&diff=1295Segment 13. The Yeast Genome2014-02-17T16:56:57Z<p>Jeff Hussmann: changed data file URL</p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/QSgUX-Do8Tc&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/QSgUX-Do8Tc http://youtu.be/QSgUX-Do8Tc]<br />
<br />
Links to the slides: [http://slate.ices.utexas.edu/coursefiles/13.TheYeastGenome.pdf PDF file] or [http://slate.ices.utexas.edu/coursefiles/13.TheYeastGenome.ppt PowerPoint file]<br />
<br />
Link to the file mentioned in the segment: [http://slate.ices.utexas.edu/coursefiles/SacCerChr4.txt.zip SacCerChr4.txt]<br />
<br />
Link to all yeast chromosomes: [http://hgdownload.cse.ucsc.edu/goldenPath/sacCer3/chromosomes/ UCSC]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. With p=0.3, and various values of n, how big is the largest discrepancy between the Binomial probability pdf and the approximating Normal pdf? At what value of n does this value become smaller than <math>10^{-15}</math>?<br />
<br />
2. Show that if four random variables are (together) multinomially distributed, each separately is binomially distributed.<br />
<br />
====To Think About====<br />
1. The segment suggests that <math>A\ne T</math> and <math>C\ne G</math> comes about because genes are randomly distributed on one strand or the other. Could you use the observed discrepancies to estimate, even roughly, the number of genes in the yeast genome? If so, how? If not, why not?<br />
<br />
2. Suppose that a Bayesian thinks that the prior probability of the hypothesis that "<math>P_A=P_T</math>" is 0.9,<br />
and that the set of all hypotheses that "<math>P_A\ne P_T</math>" have a total prior of 0.1. How might he calculate the odds ratio <math>\text{Prob}(P_A=P_T)/\text{Prob}(P_A\ne P_T)</math>? Hint: Are there nuisance variables to be marginalized over?<br />
<br />
===Class Activity===<br />
<br />
[http://granite.ices.utexas.edu/coursefiles/chrIV.txt Yeast chromosome 4]</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Segment_13._The_Yeast_Genome&diff=1293Segment 13. The Yeast Genome2014-02-17T06:18:55Z<p>Jeff Hussmann: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/QSgUX-Do8Tc&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/QSgUX-Do8Tc http://youtu.be/QSgUX-Do8Tc]<br />
<br />
Links to the slides: [http://slate.ices.utexas.edu/coursefiles/13.TheYeastGenome.pdf PDF file] or [http://slate.ices.utexas.edu/coursefiles/13.TheYeastGenome.ppt PowerPoint file]<br />
<br />
Link to the file mentioned in the segment: [http://slate.ices.utexas.edu/coursefiles/SacCerChr4.txt.zip SacCerChr4.txt]<br />
<br />
Link to all yeast chromosomes: [http://hgdownload.cse.ucsc.edu/goldenPath/sacCer3/chromosomes/ UCSC]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. With p=0.3, and various values of n, how big is the largest discrepancy between the Binomial probability pdf and the approximating Normal pdf? At what value of n does this value become smaller than <math>10^{-15}</math>?<br />
<br />
2. Show that if four random variables are (together) multinomially distributed, each separately is binomially distributed.<br />
<br />
====To Think About====<br />
1. The segment suggests that <math>A\ne T</math> and <math>C\ne G</math> comes about because genes are randomly distributed on one strand or the other. Could you use the observed discrepancies to estimate, even roughly, the number of genes in the yeast genome? If so, how? If not, why not?<br />
<br />
2. Suppose that a Bayesian thinks that the prior probability of the hypothesis that "<math>P_A=P_T</math>" is 0.9,<br />
and that the set of all hypotheses that "<math>P_A\ne P_T</math>" have a total prior of 0.1. How might he calculate the odds ratio <math>\text{Prob}(P_A=P_T)/\text{Prob}(P_A\ne P_T)</math>? Hint: Are there nuisance variables to be marginalized over?<br />
<br />
===Class Activity===<br />
<br />
[http://granite.ices.utexas.edu/coursefiles/chrIV.fa Yeast chromosome 4]</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Segment_12._P-Value_Tests&diff=1292Segment 12. P-Value Tests2014-02-17T06:18:17Z<p>Jeff Hussmann: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/2Ul7TI0B5ek&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/2Ul7TI0B5ek http://youtu.be/2Ul7TI0B5ek]<br />
<br />
Links to the slides: [http://slate.ices.utexas.edu/coursefiles/12.P-ValueTests.pdf PDF file] or [http://slate.ices.utexas.edu/coursefiles/12.P-ValueTests.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. What is the critical region for a 5% two-sided test if, under the null hypothesis, the test statistic is distributed as <math>\text{Student}(0,\sigma,4)</math>? That is, what values of the test statistic disprove the null hypothesis with p < 0.05? (OK to use Python, MATLAB, or Mathematica.)<br />
<br />
2. For an exponentially distributed test statistic with mean <math>\mu</math> (under the null hypothesis), when is the null hypothesis disproved with p < 0.01 for a one-sided test? for a two-sided test? <br />
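The quantile (inverse-CDF) functions in scipy make short work of these calculations; a sketch (parametrizations as noted in the comments, with the two-sided exponential case left for you):<br />

```python
import numpy as np
from scipy.stats import t, expon

# Problem 1: two-sided 5% test for Student(0, sigma, 4).  Reject when
# |statistic| exceeds sigma times the 97.5% point of Student(0, 1, 4):
t_crit = t.ppf(0.975, df=4)
print(t_crit)

# Problem 2, one-sided case: for an Exponential with mean mu,
# P(T > x) = exp(-x / mu), so p < 0.01 requires x > mu * ln(100):
thresh_over_mu = expon.ppf(0.99)   # equals ln(100)
print(thresh_over_mu, np.log(100))
```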
<br />
====To Think About====<br />
1. P-value tests require an initial choice of a test statistic. What goes wrong if you choose a poor test statistic? What would make it poor?<br />
<br />
2. If the null hypothesis is that a coin is fair, and you record the results of N flips, what is a good test statistic? Are there any other possible test statistics?<br />
<br />
3. Why is it so hard for a Bayesian to do something as simple as, given some data, disproving a null hypothesis? Can't she just compute a Bayes odds ratio, P(null hypothesis is true)/P(null hypothesis is false) and derive a probability that the null hypothesis is true?<br />
<br />
===Class Activity===</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Segment_12._P-Value_Tests&diff=1220Segment 12. P-Value Tests2014-02-14T04:46:00Z<p>Jeff Hussmann: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/2Ul7TI0B5ek&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/2Ul7TI0B5ek http://youtu.be/2Ul7TI0B5ek]<br />
<br />
Links to the slides: [http://slate.ices.utexas.edu/coursefiles/12.P-ValueTests.pdf PDF file] or [http://slate.ices.utexas.edu/coursefiles/12.P-ValueTests.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. What is the critical region for a 5% two-sided test if, under the null hypothesis, the test statistic is distributed as <math>\text{Student}(0,\sigma,4)</math>? That is, what values of the test statistic disprove the null hypothesis with p < 0.05? (OK to use Python, MATLAB, or Mathematica.)<br />
<br />
2. For an exponentially distributed test statistic with mean <math>\mu</math> (under the null hypothesis), when is the null hypothesis disproved with p < 0.01 for a one-sided test? for a two-sided test? <br />
<br />
====To Think About====<br />
1. P-value tests require an initial choice of a test statistic. What goes wrong if you choose a poor test statistic? What would make it poor?<br />
<br />
2. If the null hypothesis is that a coin is fair, and you record the results of N flips, what is a good test statistic? Are there any other possible test statistics?<br />
<br />
3. Why is it so hard for a Bayesian to do something as simple as, given some data, disproving a null hypothesis? Can't she just compute a Bayes odds ratio, P(null hypothesis is true)/P(null hypothesis is false) and derive a probability that the null hypothesis is true?<br />
<br />
===Class Activity===<br />
[[Class Activity 2/22/13]]<br />
<br />
[http://granite.ices.utexas.edu/coursefiles/chrIV.fa Yeast chromosome 4]</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Segment_11._Random_Deviates&diff=1147Segment 11. Random Deviates2014-02-12T19:10:25Z<p>Jeff Hussmann: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/4r1GlyisB8E&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/4r1GlyisB8E http://youtu.be/4r1GlyisB8E]<br />
<br />
Links to the slides: [http://slate.ices.utexas.edu/coursefiles/11.RandomDeviates.pdf PDF file] or [http://slate.ices.utexas.edu/coursefiles/11.RandomDeviates.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. For the Cauchy distribution (Segment 8, Slide 3), find the inverse function of the CDF.<br />
<br />
2. In your favorite programming language, write a function that returns independent Cauchy deviates.<br />
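For reference, the standard Cauchy CDF is <math>P(x) = 1/2 + \arctan(x)/\pi</math>, so inverse-transform sampling gives a one-line generator (a sketch; the quartile check at the end is our own sanity test):<br />

```python
import numpy as np

def cauchy_deviates(size, rng=np.random.default_rng(2)):
    # Inverse CDF of the standard Cauchy: x(P) = tan(pi * (P - 1/2)).
    u = rng.uniform(size=size)
    return np.tan(np.pi * (u - 0.5))

x = cauchy_deviates(200000)
# The quartiles of a standard Cauchy are -1 and +1:
q1, q3 = np.percentile(x, [25, 75])
print(q1, q3)
```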
<br />
====To Think About====<br />
1. Suppose you want a function that returns deviates for Student<math>(\nu)</math>. Could you use the Cauchy pdf (or some scaling of it) as a bounding function in a rejection method? How efficient is this (i.e., what fraction of the time does it reject)?<br />
<br />
2. Explain the three inequality tests in the "while" statement in Leva's algorithm (slide 7) and why they are hooked together with logical operators in the way shown.<br />
<br />
===Class Activity===<br />
[[Build your own random number generator]]</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Build_your_own_random_number_generator&diff=1146Build your own random number generator2014-02-12T19:10:08Z<p>Jeff Hussmann: Created page with "The class activity is to make up your own random number generator. The goal is to learn how hard this actually is! If you Google and copy an existing generator from the web,..."</p>
<hr />
<div>The class activity is to make up your own random number generator. The goal is to learn how hard this actually is! If you Google and copy an existing generator from the web, you'll get zero credit unless you can give a detailed explanation of the mathematics behind it. So, I think you're better off to actually try to invent your own. Even though it's unlikely to be very good, you can still win if it is even tolerable (or better than the others that people invent).<br />
<br />
Your generator should return i.i.d. uniform random deviates X in the range 0 < X < 1. You should test it by generating a vector of one million deviates and running it through one of the test routines below. <br />
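To make the difficulty concrete, here is a deliberately simple linear congruential generator (the constants a = 1664525, c = 1013904223, m = 2^32 are a classic textbook choice; whether it survives the test routines below is for you to find out):<br />

```python
import numpy as np

def lcg(n, seed=12345):
    # x_{k+1} = (a * x_k + c) mod m, mapped into the open interval (0, 1).
    a, c, m = 1664525, 1013904223, 2**32
    out = np.empty(n)
    state = seed
    for i in range(n):
        state = (a * state + c) % m
        out[i] = (state + 0.5) / m
    return out

x = lcg(1000000)
```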
<br />
===Here are the test routines that you should use to test your generators:===<br />
<br />
PYTHON:<br />
import numpy as np<br />
import scipy.stats<br />
<br />
def test_generator(x):<br />
<nowiki>'''</nowiki> Input: x - an array of purported independent Uniform[0, 1]<br />
deviates<br />
Returns: a mysterious p-value.<br />
<nowiki>'''</nowiki><br />
ntable = 100<br />
counts, _, _ = np.histogram2d(x[:-1], x[1:],<br />
bins=ntable,<br />
range=[(0, 1), (0, 1)],<br />
)<br />
chisq, pval = scipy.stats.chisquare(counts.flatten())<br />
return pval<br />
<br />
(If you are using the IPython class server, you can just say 'from our_random import test_generator' to have access to the function.)<br />
<br />
<br />
MATLAB:<br />
<br />
function pval = testran(x)<br />
ntable = 100;<br />
n = numel(x);<br />
sub1 = 1+fix(ntable*x(1:n-1));<br />
sub2 = 1+fix(ntable*x(2:n));<br />
subs = [sub1 sub2];<br />
counts = accumarray(subs,1,[ntable ntable]);<br />
ave = (n-1)/(ntable^2);<br />
chisq = sum( (counts(:)-ave).^2 / ave);<br />
pval = 1 - chi2cdf(chisq,ntable^2-1);<br />
end</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Segment_10._The_Central_Limit_Theorem&diff=1104Segment 10. The Central Limit Theorem2014-02-11T06:42:33Z<p>Jeff Hussmann: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/IpuYGsKplSw&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/IpuYGsKplSw http://youtu.be/IpuYGsKplSw]<br />
<br />
Links to the slides: [http://slate.ices.utexas.edu/coursefiles/10.CentralLimitTheorem.pdf PDF file] or [http://slate.ices.utexas.edu/coursefiles/10.CentralLimitTheorem.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. Take 12 random values, each uniform between 0 and 1. Add them up and subtract 6. Prove that the result is close to a random value drawn from the Normal distribution with mean zero and standard deviation 1.<br />
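An empirical version of this check (a sketch; "close to" can be quantified however you like):<br />

```python
import numpy as np

rng = np.random.default_rng(3)
# Sum of 12 Uniform(0, 1) draws, minus 6: mean 0, variance 12 * (1/12) = 1.
z = rng.uniform(size=(100000, 12)).sum(axis=1) - 6
print(z.mean(), z.std())
# Fraction within one standard deviation; a true Normal gives 0.6827:
print(np.mean(np.abs(z) < 1))
```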
<br />
2. Invent a family of functions, each different, that look like those in Slide 3: they all have value 1 at x = 0; they all have zero derivative at x = 0; and they generally (not necessarily monotonically) decrease to zero at large x. Now multiply 10 of them together and graph the result near the origin (i.e., reproduce what Slide 3 was sketching).<br />
<br />
3. For what value(s) of <math>\nu</math> does the Student distribution (Segment 8, Slide 4) have a convergent 1st and 2nd moment, but divergent 3rd and higher moments?<br />
<br />
====To Think About====<br />
1. A distribution with moments as in problem 3 above has a well-defined mean and variance. Does the CLT hold for the sum of RVs from such a distribution? If not, what goes wrong in the proof? Is the mean of the sum equal to the sum of the individual means? What about the variance of the sum? What, qualitatively, does the distribution of the sum of a bunch of them look like?<br />
<br />
2. Give an explanation of Bessel's correction in the last expression on slide 5. If, as we see, the MAP calculation gives the factor 1/N, why would one ever want to use 1/(N-1) instead? (There are various wiki and stackoverflow pages on this. See if they make sense to you!)<br />
<br />
====Just for fun====<br />
A fun problem that ties in to 'To Calculate' 1 above and problem 6 from the Probability Blitz:<br />
<br />
1. What is the expected number of Uniform[0,1] draws you need to add up before the sum exceeds 1? Prove your answer analytically and confirm it by simulation.<br />
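A simulation to check your analytic answer against (our sketch; seed and trial count are arbitrary):<br />

```python
import numpy as np

rng = np.random.default_rng(4)

def draws_to_exceed_one():
    # Count Uniform(0, 1) draws until the running sum exceeds 1.
    total, n = 0.0, 0
    while total <= 1.0:
        total += rng.uniform()
        n += 1
    return n

counts = [draws_to_exceed_one() for _ in range(100000)]
print(np.mean(counts))   # compare with your analytic answer
```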
<br />
[http://nbviewer.ipython.org/github/CS395T/2014/blob/master/Jeff%20Hussmann%2002-08-14%20sum%20of%20uniforms.ipynb Jeff's solution]</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Segment_9._Characteristic_Functions&diff=1052Segment 9. Characteristic Functions2014-02-10T14:45:38Z<p>Jeff Hussmann: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/NJL-BX6HuxY&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/NJL-BX6HuxY http://youtu.be/NJL-BX6HuxY]<br />
<br />
Links to the slides: [http://slate.ices.utexas.edu/coursefiles/9.CharacteristicFunctions.pdf PDF file] or [http://slate.ices.utexas.edu/coursefiles/9.CharacteristicFunctions.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. Use characteristic functions to show that the sum of two independent Gaussian random variables is itself a Gaussian random variable. What is its mean and variance?<br />
<br />
2. Calculate (don't just look up) the characteristic function of the Exponential distribution.<br />
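Once you have a closed form, it can be checked by Monte Carlo, since the characteristic function is just <math>E[e^{itX}]</math>. A sketch for the rate-<math>\beta</math> Exponential (beta, t, and the sample size are arbitrary choices; the closed form in the last line is what your calculation should produce):<br />

```python
import numpy as np

rng = np.random.default_rng(5)
beta, tval = 1.0, 1.0
x = rng.exponential(scale=1 / beta, size=1000000)

empirical = np.exp(1j * tval * x).mean()   # Monte Carlo estimate of E[e^{itX}]
candidate = beta / (beta - 1j * tval)      # closed form for the rate-beta pdf
print(empirical, candidate)
```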
<br />
====To Think About====<br />
1. Learn enough about contour integration to be able to make sense of Saul's explanation at the bottom of slide 7. Then draw a picture of the contours, label the pole(s), and show how you calculate their residues.<br />
<br />
2. Do you think that characteristic functions are ever useful computationally (that is, not just analytically to prove theorems)?<br />
<br />
==Class Activity==<br />
[http://granite.ices.utexas.edu/coursefiles/blitz_2014.pdf Probability blitz]<br />
<br />
Alternate to problem 6(a): Show that a process with constant-rate exponentially distributed waiting times between events is a Poisson process (that has a Poisson distribution for the number of events in any fixed time interval).</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Segment_8._Some_Standard_Distributions&diff=956Segment 8. Some Standard Distributions2014-02-05T23:14:02Z<p>Jeff Hussmann: added Jeff's solution</p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/EDYDC7iNGTg&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/EDYDC7iNGTg http://youtu.be/EDYDC7iNGTg]<br />
<br />
Links to the slides: [http://slate.ices.utexas.edu/coursefiles/8.SomeStandardDistributions.pdf PDF file] or [http://slate.ices.utexas.edu/coursefiles/8.SomeStandardDistributions.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. In Segment 6 (slide 8) we used the improper prior <math>1/r</math>. Show that this is just a limiting case of a (completely proper) Lognormal prior.<br />
<br />
2. Prove that <math>{\rm Gamma}(\alpha,\beta)</math> has a single mode at <math>(\alpha-1)/\beta</math> when <math>\alpha \ge 1</math>.<br />
<br />
3. Show that the limiting case of the Student distribution as <math>\nu\rightarrow\infty</math> is the Normal distribution.<br />
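Claims like these can be sanity-checked numerically before (or after) proving them; e.g., for the Gamma mode (a sketch; note that scipy parametrizes Gamma by shape alpha and scale 1/beta):<br />

```python
import numpy as np
from scipy.stats import gamma

alpha, beta = 3.0, 2.0
x = np.linspace(1e-6, 10, 200001)
pdf = gamma.pdf(x, a=alpha, scale=1 / beta)
mode = x[np.argmax(pdf)]
print(mode, (alpha - 1) / beta)   # these should agree closely
```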
<br />
====To Think About====<br />
1. Suppose you have an algorithm that can compute a CDF, <math>P(x)</math>. How would you design an algorithm to compute its inverse (see slide 9) <math>x(P)</math>?<br />
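One standard design: since a CDF is monotone nondecreasing, bisection on a bracketing interval always works (a sketch, assuming only that P(x) can be evaluated):<br />

```python
from scipy.stats import norm

def invert_cdf(cdf, target, lo=-50.0, hi=50.0, tol=1e-10):
    # Shrink [lo, hi] around the point where cdf crosses target.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x = invert_cdf(norm.cdf, 0.975)
print(x)   # the familiar Normal 97.5% point, about 1.96
```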
<br />
2. The lifetime t of a radioactive nucleus (say Uranium 238) is distributed as the Exponential distribution. Do you know why? (Hint: What is the distribution of an Exponential<math>(\beta)</math> random variable <i>conditioned on</i> its being greater than some given value?)<br />
<br />
===Class Activity===<br />
problem statement: [[media:ClassActivity20130204.pdf|ClassActivity20130204.pdf]]<br />
<br />
data file for class activity: [[media:events20130204.txt|events20130204.txt]]<br />
<br />
[http://nbviewer.ipython.org/github/CS395T/2014/blob/master/Jeff%20Hussmann%2002-05-13%20student%20t_1391641891.ipynb Jeff's solution]</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Segment_6._The_Towne_Family_Tree&diff=737Segment 6. The Towne Family Tree2014-01-31T21:52:13Z<p>Jeff Hussmann: added Jeff's solution</p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see out-of-focus below is not the beginning of the segment. Press the play button to start at the beginning and in-focus.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/y_L2THpv5Jg&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/y_L2THpv5Jg http://youtu.be/y_L2THpv5Jg]<br />
<br />
Links to the slides: [http://slate.ices.utexas.edu/coursefiles/6.TheTowneFamilyTree.pdf PDF file] or [http://slate.ices.utexas.edu/coursefiles/6.TheTowneFamilyTree.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Compute====<br />
1. Write down an explicit expression for what the slides denote as bin(n,N,r).<br />
<br />
2. There is a small error on slide 7 that carries through to the first equation on slide 8 and the graph on slide 9. Find the error, fix it, and redo the graph of slide 9. Does it make a big difference? Why or why not?<br />
<br />
====To Think About====<br />
1. Suppose you knew the value of r (say, r = 0.0038). How would you simulate many instances of the Towne family data (e.g., the tables on slides 4 and 5)?<br />
<br />
2. How would you use your simulation to decide if the assumption of ignoring backmutations (the red note on slide 7) is justified?<br />
<br />
3. How would you use your simulation to decide if our decision to trim T2, T11, and T13 from the estimation of r was justified? (This question anticipates several later discussions in the course, but thinking about it now will be a good start.)<br />
<br />
===Class Activity===<br />
[[Multinomial parameter estimation]]<br />
<br />
[http://nbviewer.ipython.org/github/CS395T/2014/blob/master/Jeff%20Hussmann%2001-31-14%20multinomial%20parameter%20estimation.ipynb Jeff's solution]</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Segment_6._The_Towne_Family_Tree&diff=735Segment 6. The Towne Family Tree2014-01-31T19:31:37Z<p>Jeff Hussmann: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see out-of-focus below is not the beginning of the segment. Press the play button to start at the beginning and in-focus.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/y_L2THpv5Jg&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/y_L2THpv5Jg http://youtu.be/y_L2THpv5Jg]<br />
<br />
Links to the slides: [http://slate.ices.utexas.edu/coursefiles/6.TheTowneFamilyTree.pdf PDF file] or [http://slate.ices.utexas.edu/coursefiles/6.TheTowneFamilyTree.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Compute====<br />
1. Write down an explicit expression for what the slides denote as bin(n,N,r).<br />
<br />
2. There is a small error on slide 7 that carries through to the first equation on slide 8 and the graph on slide 9. Find the error, fix it, and redo the graph of slide 9. Does it make a big difference? Why or why not?<br />
<br />
====To Think About====<br />
1. Suppose you knew the value of r (say, r = 0.0038). How would you simulate many instances of the Towne family data (e.g., the tables on slides 4 and 5)?<br />
<br />
2. How would you use your simulation to decide if the assumption of ignoring backmutations (the red note on slide 7) is justified?<br />
<br />
3. How would you use your simulation to decide if our decision to trim T2, T11, and T13 from the estimation of r was justified? (This question anticipates several later discussions in the course, but thinking about it now will be a good start.)<br />
<br />
===Class Activity===<br />
[[Multinomial parameter estimation]]</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Segment_6._The_Towne_Family_Tree&diff=734Segment 6. The Towne Family Tree2014-01-31T19:31:02Z<p>Jeff Hussmann: Reverted edits by Jeff Hussmann (talk) to last revision by Bill Press</p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see out-of-focus below is not the beginning of the segment. Press the play button to start at the beginning and in-focus.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/y_L2THpv5Jg&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/y_L2THpv5Jg http://youtu.be/y_L2THpv5Jg]<br />
<br />
Links to the slides: [http://slate.ices.utexas.edu/coursefiles/6.TheTowneFamilyTree.pdf PDF file] or [http://slate.ices.utexas.edu/coursefiles/6.TheTowneFamilyTree.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Compute====<br />
1. Write down an explicit expression for what the slides denote as bin(n,N,r).<br />
<br />
2. There is a small error on slide 7 that carries through to the first equation on slide 8 and the graph on slide 9. Find the error, fix it, and redo the graph of slide 9. Does it make a big difference? Why or why not?<br />
<br />
====To Think About====<br />
1. Suppose you knew the value of r (say, r = 0.0038). How would you simulate many instances of the Towne family data (e.g., the tables on slides 4 and 5)?<br />
<br />
2. How would you use your simulation to decide if the assumption of ignoring backmutations (the red note on slide 7) is justified?<br />
<br />
3. How would you use your simulation to decide if our decision to trim T2, T11, and T13 from the estimation of r was justified? (This question anticipates several later discussions in the course, but thinking about it now will be a good start.)<br />
<br />
===Class Activity===<br />
[[media:ClassActivities20130130.pdf|ClassActivities20130130.pdf]]<br />
<br />
Kai won the $100 contest. Congratulations to him! I'll have to be more careful before I offer prizes this big in the future!</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Segment_6._The_Towne_Family_Tree&diff=733Segment 6. The Towne Family Tree2014-01-31T19:30:40Z<p>Jeff Hussmann: /* Class Activity */</p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see out-of-focus below is not the beginning of the segment. Press the play button to start at the beginning and in-focus.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/y_L2THpv5Jg&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/y_L2THpv5Jg http://youtu.be/y_L2THpv5Jg]<br />
<br />
Links to the slides: [http://slate.ices.utexas.edu/coursefiles/6.TheTowneFamilyTree.pdf PDF file] or [http://slate.ices.utexas.edu/coursefiles/6.TheTowneFamilyTree.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Compute====<br />
1. Write down an explicit expression for what the slides denote as bin(n,N,r).<br />
<br />
2. There is a small error on slide 7 that carries through to the first equation on slide 8 and the graph on slide 9. Find the error, fix it, and redo the graph of slide 9. Does it make a big difference? Why or why not?<br />
<br />
====To Think About====<br />
1. Suppose you knew the value of r (say, r = 0.0038). How would you simulate many instances of the Towne family data (e.g., the tables on slides 4 and 5)?<br />
<br />
2. How would you use your simulation to decide if the assumption of ignoring backmutations (the red note on slide 7) is justified?<br />
<br />
3. How would you use your simulation to decide if our decision to trim T2, T11, and T13 from the estimation of r was justified? (This question anticipates several later discussions in the course, but thinking about it now will be a good start.)<br />
<br />
[[Multinomial parameter estimation]]</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Multinomial_parameter_estimation&diff=732Multinomial parameter estimation2014-01-31T19:29:09Z<p>Jeff Hussmann: Created page with "The past two segments have been about Bayesian parameter estimation. In Segment 5. Bernoulli Trials, we did Bayesian parameter estimation of the rate parameter of a binomial..."</p>
<hr />
<div>The past two segments have been about Bayesian parameter estimation. <br />
<br />
In Segment 5. Bernoulli Trials, we did Bayesian parameter estimation of the rate parameter of a binomial distribution.<br />
The setup was: we saw the outcomes of a series of independent trials.<br />
There were two possible outcomes to each trial: the jailer says B, or the jailer says C.<br />
There was one parameter of interest: x, the probability with which the jailer says B. (What about the probability that the jailer says C?)<br />
The goal was to compute the posterior distribution of x, given data in the form of counts of observed outcomes.<br />
<br />
In this exercise, we will generalize this to a multinomial setting.<br />
Each trial is now a chess game, to which there are three possible outcomes: white wins, black wins, or the players draw.<br />
We have data on the outcomes of 10,000 real chess games.<br />
We want to use this data to learn how likely each outcome is.<br />
In other words, we assume that due to the structure of the game of chess, there is some inherent probability of each outcome occurring, and we want to figure out what these probabilities are.<br />
The parameters of interest are w, the probability that white wins, and b, the probability that black wins. (What about the probability that they draw?)<br />
To do this, we will take counts of the outcomes observed and compute the joint posterior distribution of w and b given this data.<br />
<br />
;Notational conventions<br />
:w = probability that white wins<br />
:b = probability that black wins<br />
:d = probability that the players draw<br />
:N = total number of games observed<br />
:W = number of white wins in these games<br />
:B = number of black wins in these games<br />
:D = number of draws in these games<br />
<br />
;Activity checkpoints<br />
<br />
# What does a joint uniform prior on w and b look like?<br />
# Suppose we know that w=0.4, b = 0.3, and d = 0.3. If we watch N = 10 games, what is the probability that W = 3, B = 5, and D = 2?<br />
# For general w, b, d, W, B, D, what is P(W, B, D | w, b, d)?<br />
# Applying Bayes, what is P(w, b, d | W, B, D)? (The Bayes denominator is tricky - if you present us with the integral to evaluate, we will provide the answer.)<br />
# Here is the real data - [http://granite.ices.utexas.edu/coursefiles/chess_outcomes.txt chess_outcomes.txt]. Each line represents the outcome of one game. Count the outcomes of the first N games and produce a visualization of the joint posterior of the win rates for N = 0, 3, 10, 100, 1000, and 10000.<br />
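The multinomial likelihood in checkpoints 2–3 and the gridded posterior in checkpoint 5 can be sketched in Python (this assumes scipy's <code>multinomial</code> is available; the counts W, B, D below are made-up stand-ins, not the real chess data):<br />
<br />
```python
import numpy as np
from scipy.stats import multinomial

# Checkpoint 2: P(W=3, B=5, D=2 | w=0.4, b=0.3, d=0.3) for N = 10 games
p_check = multinomial.pmf([3, 5, 2], n=10, p=[0.4, 0.3, 0.3])
print(p_check)  # ≈ 0.0353

# Checkpoint 5, in sketch form: posterior of (w, b) on a grid, under a
# uniform prior, using hypothetical counts in place of the real data
W, B, D = 4, 3, 3
w = np.linspace(0, 1, 201)
b = np.linspace(0, 1, 201)
wg, bg = np.meshgrid(w, b)
d = 1 - wg - bg                 # the draw probability is determined by w and b
like = np.where(d >= 0, wg**W * bg**B * np.clip(d, 0, None)**D, 0.0)
post = like / (like.sum() * (w[1] - w[0]) * (b[1] - b[0]))  # normalize numerically
```
A heat map of <code>post</code> (e.g. with <code>matplotlib.pyplot.pcolormesh</code>) gives the visualization asked for in checkpoint 5.<br />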
<br />
If you do this in Python, the data is already on the class server - check out the "Jeff Hussmann 01-31-14 reading a file" notebook to see how to access it.<br />
<br />
Some snippets demonstrating library functions for evaluating and visualizing a function on a 2D grid of points can be found [http://nbviewer.ipython.org/4692768/ here].</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Segment_5._Bernoulli_Trials&diff=649Segment 5. Bernoulli Trials2014-01-30T05:48:51Z<p>Jeff Hussmann: updated Jeff's solution yet again to include animation</p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see out-of-focus below is not the beginning of the segment. Press the play button to start at the beginning and in-focus.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/2T3KP2LleFg&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/2T3KP2LleFg http://youtu.be/2T3KP2LleFg]<br />
<br />
Links to the slides: [http://slate.ices.utexas.edu/coursefiles/5.BernoulliTrials.pdf PDF file] or [http://slate.ices.utexas.edu/coursefiles/5.BernoulliTrials.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Compute====<br />
1. You throw a pair of fair dice 10 times and, each time, you record the total number of spots. When you are done, what is the probability that exactly 5 of the 10 recorded totals are prime?<br />
<br />
2. If you flip a fair coin one billion times, what is the probability that the number of heads is between 500010000 and 500020000, inclusive? (Give answer to 4 significant figures.)<br />
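A quick sanity check of problem 2 can be done with the continuity-corrected normal approximation (a sketch, not a substitute for working the problem yourself; scipy is assumed to be available):<br />
<br />
```python
from math import sqrt
from scipy.stats import norm

n = 10**9
mu, sigma = n * 0.5, sqrt(n * 0.25)   # mean and s.d. of the number of heads
# continuity-corrected normal approximation to the binomial probability
prob = norm.cdf(500_020_000 + 0.5, mu, sigma) - norm.cdf(500_010_000 - 0.5, mu, sigma)
print(f'{prob:.4g}')
```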
<br />
====To Think About====<br />
1. Suppose that the assumption of independence (the first "i" in "i.i.d.") were violated. Specifically suppose that, after the first Bernoulli trial, every trial has a probability Q of simply reproducing the immediately previous outcome, and a probability (1-Q) of being an independent trial. How would you compute the probability of getting n events in N trials if the probability of each event (when it is independent) is p?<br />
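One numerical way into question 1 (a sketch only; the question asks you to derive the approach yourself) is to condition on the previous outcome and recurse:<br />
<br />
```python
from functools import lru_cache

def prob_n_events(N, p, Q):
    """P(n events in N trials) when each trial after the first copies the
    previous outcome with probability Q and is independent otherwise."""
    @lru_cache(maxsize=None)
    def rest(trials_left, last):
        # distribution of event counts over the remaining trials,
        # given that the previous outcome was `last` (1 = event, 0 = not)
        if trials_left == 0:
            return {0: 1.0}
        out = {}
        for nxt, pr in ((1, Q * last + (1 - Q) * p),
                        (0, Q * (1 - last) + (1 - Q) * (1 - p))):
            for k, q in rest(trials_left - 1, nxt).items():
                out[k + nxt] = out.get(k + nxt, 0.0) + pr * q
        return out

    dist = {}
    for first, pr in ((1, p), (0, 1 - p)):  # the first trial is always independent
        for k, q in rest(N - 1, first).items():
            dist[k + first] = dist.get(k + first, 0.0) + pr * q
    return dist

print(prob_n_events(5, 0.5, 0.0))  # Q = 0 recovers the plain binomial distribution
```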
<br />
2. Try the Mathematica calculation on slide 5 without the magical "GenerateConditions -> False". Why is the output different?<br />
<br />
===Class Activity===<br />
http://projecteuler.net/problem=267<br />
<br />
'''Part 2:''' The problem as stated multiplies your wager by 2 on a win. What is the smallest this factor can be while still leaving the probability of ending up above one billion greater than 0.5, assuming that you play with an optimal f given the value of the factor?<br />
<br />
[http://nbviewer.ipython.org/github/CS395T/2014/blob/master/Jeff%20Hussmann%2001-29-14%20billionaire_1391060765.ipynb Jeff's solution - updated in response to group 2's excellent work]</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Segment_5,_Group_2&diff=648Segment 5, Group 22014-01-30T04:06:55Z<p>Jeff Hussmann: </p>
<hr />
<div>'''Part 1:''' You are given a unique investment opportunity. Starting with £1 of capital, you can choose a fixed proportion, f, of your capital to bet on a fair coin toss repeatedly for 1000 tosses. Your return is double your bet for heads and you lose your bet for tails. For example, if f = 1/4, for the first toss you bet £0.25, and if heads comes up you win £0.5 and so then have £1.5. You then bet £0.375 and if the second toss is tails, you have £1.125. Choosing f to maximize your chances of having at least £1,000,000,000 after 1,000 flips, what is the chance that you become a billionaire? All computations are assumed to be exact (no rounding), but give your answer rounded to 12 digits behind the decimal point in the form 0.abcdefghijkl.<br />
<br />
For this problem, the total capital you have is updated by a multiplicative factor after each coin flip. If the flip is heads, then the total capital is multiplied by (1+2f). If the flip is tails, then the total capital is multiplied by (1-f). This means that the order of the heads and tails doesn't matter, only the number of heads and tails. Also, the probability of flipping <math>N_H</math> heads is independent of the choice of f. Therefore, we only need to determine the number of heads required to reach £1,000,000,000 after 1,000 flips for a given f and find the f that minimizes this. Based on this minimum number of heads required to reach £1,000,000,000 after 1,000 flips, the probability can be determined by evaluating the CDF of the binomial distribution.<br />
<br />
For a given f, the required number of heads to reach £1,000,000,000 after 1,000 flips can be determined through the following relation<br />
<br />
<math>(1+2f)^{N_H}(1-f)^{1000-N_H} = \left(\frac{1+2f}{1-f}\right)^{N_H}(1-f)^{1000} \geq 10^9</math><br />
<math>\left(\frac{1+2f}{1-f}\right)^{N_H} \geq \frac{10^9}{(1-f)^{1000}}</math><br />
<math>N_H = \left\lceil\frac{\log\left(\frac{10^9}{(1-f)^{1000}}\right)}{\log\left(\frac{1+2f}{1-f}\right)}\right\rceil = \left\lceil\frac{9 - 1000\log_{10}\left(1-f\right)}{\log_{10}\left(1+2f\right) - \log_{10}\left(1-f\right)}\right\rceil</math><br />
<br />
A MATLAB script that determines the maximum, over all choices of f, of the probability of reaching £1,000,000,000 after 1,000 flips is shown below. The script divides the range of possible f into 1,000,000 evenly spaced points, computes the required <math>N_H</math> for each value of f, finds the minimum number of heads required over the grid, and evaluates the probability of seeing at least that many heads in 1,000 fair flips.<br />
<br />
% heads needed to reach 1e9 from 1 pound in 1,000 flips at betting fraction f<br />
NH_f = @(f) ceil((9 - 1000*log10(1-f))./log10((1+2*f)./(1-f)));<br />
minf = 0.0001;<br />
maxf = 0.9999;<br />
f = linspace(minf,maxf,10^6);<br />
NH = NH_f(f);<br />
[NHmin ind] = min(NH);   % fewest heads needed over all choices of f<br />
<br />
P = 1 - cdf('bino',NHmin-1,1000,.5)   % P(at least NHmin heads in 1000 fair flips)<br />
<br />
Running this script results in the output<br />
<br />
P =<br />
<br />
0.999992836186714<br />
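The same grid search is easy to reproduce in Python; a sketch, assuming numpy and scipy are available:<br />
<br />
```python
import numpy as np
from scipy.stats import binom

# heads needed to turn 1 pound into 1e9 over 1000 flips at betting fraction f
def heads_needed(f):
    return np.ceil((9 - 1000 * np.log10(1 - f)) /
                   (np.log10(1 + 2 * f) - np.log10(1 - f)))

f = np.linspace(1e-4, 1 - 1e-4, 10**6)
nh_min = int(heads_needed(f).min())       # fewest heads over all f
p = 1 - binom.cdf(nh_min - 1, 1000, 0.5)
print(f'{p:.15f}')                        # matches the MATLAB result above
```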
<br />
<br />
<br />
'''Part 2:''' The problem as stated multiplies your wager by 2 on a win. What is the smallest this factor can be while still leaving the probability of ending up above one billion greater than 0.5, assuming that you play with an optimal f given the value of the factor?<br />
<br />
<br />
<br />
This script brackets the smallest win factor digit by digit: for each candidate factor Mi it finds, over all f, the fewest heads needed to reach £1,000,000,000, and checks whether 500 heads suffice (500 or more heads in 1,000 fair flips occurs with probability greater than 0.5).<br />
<br />
digits = 15;<br />
Mlow = 1;<br />
Mhigh = 2;<br />
minf = 0.0001;<br />
maxf = 0.9999;<br />
f = linspace(minf,maxf,10^6);<br />
for(d = 1:digits)<br />
    M = linspace(Mlow,Mhigh,11);<br />
    for(i = 1:length(M))<br />
        Mi = M(i);<br />
        % heads needed when a win multiplies the wager by Mi<br />
        NH_f = @(f) ceil((9 - 1000*log10(1-f))./log10((1+Mi*f)./(1-f)));<br />
        NH = NH_f(f);<br />
        [NHmin ind] = min(NH);<br />
        if(NHmin <= 500)<br />
            break<br />
        end<br />
    end<br />
    Mhigh = Mi;             % keep the bracket around the smallest workable factor<br />
    Mlow = Mi - 10^(-d);<br />
end<br />
Mi<br />
P = 1 - cdf('bino',NHmin-1,1000,.5)<br />
<br />
Running this script results in the output<br />
<br />
Mi =<br />
<br />
1.504673877089423<br />
<br />
<br />
P =<br />
<br />
0.512612509089180<br />
<br />
Note that this factor is lower than the one reported in Jeff's solution because the range of f was divided into 1,000,000 points instead of 1,000, giving a more accurate (and therefore smaller) answer.<br />
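The digit-by-digit search can also be phrased as a plain bisection on the win factor M; a Python sketch (numpy assumed), which should land near the value reported above:<br />
<br />
```python
import numpy as np

f = np.linspace(1e-4, 1 - 1e-4, 10**6)

def min_heads(M):
    # fewest heads (over all f) needed to reach 1e9 when a win pays M times the bet
    nh = np.ceil((9 - 1000 * np.log10(1 - f)) /
                 (np.log10(1 + M * f) - np.log10(1 - f)))
    return nh.min()

lo, hi = 1.0, 2.0               # M = 2 is achievable, M = 1 is not
while hi - lo > 1e-12:
    mid = 0.5 * (lo + hi)
    if min_heads(mid) <= 500:   # 500 heads in 1000 fair flips has probability > 0.5
        hi = mid
    else:
        lo = mid
print(hi)  # ≈ 1.5047
```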
<br />
:This is great stuff! You are absolutely right that this would have won the prize over my answer and about the mechanism why. But I have responded to your challenge with an even smaller factor - check my (updated) solution. I don't think you can beat this one, but I would be happy to be wrong! [[User:Jeff Hussmann|Jeff Hussmann]] ([[User talk:Jeff Hussmann|talk]]) 22:06, 29 January 2014 (CST)</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Segment_5._Bernoulli_Trials&diff=647Segment 5. Bernoulli Trials2014-01-30T04:06:29Z<p>Jeff Hussmann: updated Jeff's solution again</p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see out-of-focus below is not the beginning of the segment. Press the play button to start at the beginning and in-focus.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/2T3KP2LleFg&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/2T3KP2LleFg http://youtu.be/2T3KP2LleFg]<br />
<br />
Links to the slides: [http://slate.ices.utexas.edu/coursefiles/5.BernoulliTrials.pdf PDF file] or [http://slate.ices.utexas.edu/coursefiles/5.BernoulliTrials.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Compute====<br />
1. You throw a pair of fair dice 10 times and, each time, you record the total number of spots. When you are done, what is the probability that exactly 5 of the 10 recorded totals are prime?<br />
<br />
2. If you flip a fair coin one billion times, what is the probability that the number of heads is between 500010000 and 500020000, inclusive? (Give answer to 4 significant figures.)<br />
<br />
====To Think About====<br />
1. Suppose that the assumption of independence (the first "i" in "i.i.d.") were violated. Specifically suppose that, after the first Bernoulli trial, every trial has a probability Q of simply reproducing the immediately previous outcome, and a probability (1-Q) of being an independent trial. How would you compute the probability of getting n events in N trials if the probability of each event (when it is independent) is p?<br />
<br />
2. Try the Mathematica calculation on slide 5 without the magical "GenerateConditions -> False". Why is the output different?<br />
<br />
===Class Activity===<br />
http://projecteuler.net/problem=267<br />
<br />
'''Part 2:''' The problem as stated multiplies your wager by 2 on a win. What is the smallest this factor can be while still leaving the probability of ending up above one billion greater than 0.5, assuming that you play with an optimal f given the value of the factor?<br />
<br />
[http://nbviewer.ipython.org/github/CS395T/2014/blob/master/Jeff%20Hussmann%2001-29-14%20billionaire_1391054743.ipynb Jeff's solution]</div>Jeff Hussmannhttp://wpressutexas.net/coursewiki/index.php?title=Segment_5._Bernoulli_Trials&diff=635Segment 5. Bernoulli Trials2014-01-29T21:32:33Z<p>Jeff Hussmann: updated nbviewer link</p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see out-of-focus below is not the beginning of the segment. Press the play button to start at the beginning and in-focus.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/2T3KP2LleFg&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/2T3KP2LleFg http://youtu.be/2T3KP2LleFg]<br />
<br />
Links to the slides: [http://slate.ices.utexas.edu/coursefiles/5.BernoulliTrials.pdf PDF file] or [http://slate.ices.utexas.edu/coursefiles/5.BernoulliTrials.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Compute====<br />
1. You throw a pair of fair dice 10 times and, each time, you record the total number of spots. When you are done, what is the probability that exactly 5 of the 10 recorded totals are prime?<br />
<br />
2. If you flip a fair coin one billion times, what is the probability that the number of heads is between 500010000 and 500020000, inclusive? (Give answer to 4 significant figures.)<br />
<br />
====To Think About====<br />
1. Suppose that the assumption of independence (the first "i" in "i.i.d.") were violated. Specifically suppose that, after the first Bernoulli trial, every trial has a probability Q of simply reproducing the immediately previous outcome, and a probability (1-Q) of being an independent trial. How would you compute the probability of getting n events in N trials if the probability of each event (when it is independent) is p?<br />
<br />
2. Try the Mathematica calculation on slide 5 without the magical "GenerateConditions -> False". Why is the output different?<br />
<br />
===Class Activity===<br />
http://projecteuler.net/problem=267<br />
<br />
'''Part 2:''' The problem as stated multiplies your wager by 2 on a win. What is the smallest this factor can be while still leaving the probability of ending up above one billion greater than 0.5, assuming that you play with an optimal f given the value of the factor?<br />
<br />
[http://nbviewer.ipython.org/github/CS395T/2014/blob/master/Jeff%20Hussmann%2001-29-14%20billionaire_1391031094.ipynb Jeff's solution]</div>Jeff Hussmann