Computational Statistics Course Wiki - User contributions [en]
2020-07-06T09:58:47Z - MediaWiki 1.32.0
http://wpressutexas.net/coursewiki/api.php?action=feedcontributions&user=Bill+Press&feedformat=atom

http://wpressutexas.net/coursewiki/index.php?title=Segment_5._Bernoulli_Trials&diff=3454
Segment 5. Bernoulli Trials (2019-01-27T03:43:30Z) <p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
<br />
The direct YouTube link is [http://youtu.be/2T3KP2LleFg http://youtu.be/2T3KP2LleFg]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/5.BernoulliTrials.pdf PDF file] or [http://wpressutexas.net/coursefiles/5.BernoulliTrials.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Compute====<br />
1. You throw a pair of fair dice 10 times and, each time, you record the total number of spots. When you are done, what is the probability that exactly 5 of the 10 recorded totals are prime?<br />
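The per-throw probability can be found by enumeration: of the 36 equally likely outcomes, 15 give a prime total (2, 3, 5, 7, or 11), so p = 15/36 = 5/12, and the answer is then a binomial probability. A quick sketch in Python (variable names are just for illustration):

```python
from fractions import Fraction
from math import comb

# Probability that one throw of two fair dice gives a prime total
primes = {2, 3, 5, 7, 11}
p = Fraction(sum(1 for a in range(1, 7) for b in range(1, 7) if a + b in primes), 36)
# p == 5/12

# Exactly 5 prime totals in 10 independent throws: binomial with n=10, k=5
prob = comb(10, 5) * p**5 * (1 - p)**5
print(float(prob))  # ≈ 0.2138
```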
<br />
2. If you flip a fair coin one billion times, what is the probability that the number of heads is between 500010000 and 500020000, inclusive? (Give answer to 4 significant figures.)<br />
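One standard approach (not necessarily the intended one) is the normal approximation to the binomial with a continuity correction: with n = 10^9 and p = 1/2 the mean is 5×10^8 and the standard deviation is √(n/4) ≈ 15811.4. A sketch:

```python
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

n, p = 10**9, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))

lo, hi = 500_010_000, 500_020_000
# Continuity correction: an integer count k occupies [k - 1/2, k + 1/2]
prob = Phi((hi + 0.5 - mu) / sigma) - Phi((lo - 0.5 - mu) / sigma)
print(round(prob, 4))  # ≈ 0.1606
```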
<br />
====To Think About====<br />
1. Suppose that the assumption of independence (the first "i" in "i.i.d.") were violated. Specifically suppose that, after the first Bernoulli trial, every trial has a probability Q of simply reproducing the immediately previous outcome, and a probability (1-Q) of being an independent trial. How would you compute the probability of getting n events in N trials if the probability of each event (when it is independent) is p?<br />
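A direct way to explore this question is simulation; the sketch below (parameter names are illustrative, not from the segment) generates the correlated trial sequence and tallies the number of events. Note that the marginal probability of an event on each trial is still p, so the mean count is unchanged by Q; only the spread changes.

```python
import random
from collections import Counter

def correlated_trials(N, p, Q, rng):
    """One run: after the first trial, each trial copies the previous
    outcome with probability Q, else is an independent Bernoulli(p)."""
    outcomes = [rng.random() < p]
    for _ in range(N - 1):
        if rng.random() < Q:
            outcomes.append(outcomes[-1])       # copy previous outcome
        else:
            outcomes.append(rng.random() < p)   # fresh independent trial
    return sum(outcomes)

rng = random.Random(42)
N, p, Q, reps = 10, 0.5, 0.6, 20000
counts = Counter(correlated_trials(N, p, Q, rng) for _ in range(reps))
mean = sum(n * c for n, c in counts.items()) / reps
print(mean)  # close to N*p = 5, despite the correlation
```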
<br />
2. Try the Mathematica calculation on slide 5 without the magical "GenerateConditions -> False". Why is the output different?<br />
<br />
===Class Activity===<br />
http://projecteuler.net/problem=267<br />
<br />
'''Part 2:''' The problem as stated multiplies your wager by 2 on a win. What is the smallest this factor can be while still leaving the probability of ending up above one billion greater than 0.5, assuming that you play with an optimal f given the value of the factor?<br />
<br />
[http://nbviewer.ipython.org/github/CS395T/2014/blob/master/Jeff%20Hussmann%2001-29-14%20billionaire_1391060765.ipynb Jeff's solution - updated in response to group 2's excellent work]</div>
Bill Press

http://wpressutexas.net/coursewiki/index.php?title=Segment_4._The_Jailer%27s_Tip&diff=3453
Segment 4. The Jailer's Tip (2019-01-27T03:42:53Z) <p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
<br />
The direct YouTube link is [http://youtu.be/425D0CjLLLs http://youtu.be/425D0CjLLLs]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/4.TheJailersTip.pdf PDF file] or [http://wpressutexas.net/coursefiles/4.TheJailersTip.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. Evaluate <math>\int_0^1 \delta(3x-2) dx</math><br />
<br />
2. Prove that <math>\delta(a x) = \frac{1}{|a|}\delta(x)</math>.<br />
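For <math>a>0</math>, the substitution <math>u = ax</math> gives the scaling directly (the absolute value in the general statement covers <math>a<0</math>, where the substitution also reverses the limits of integration):<br />
<br />
:<math>\int_{-\infty}^{\infty} f(x)\,\delta(a x)\,dx = \int_{-\infty}^{\infty} f\!\left(\frac{u}{a}\right)\delta(u)\,\frac{du}{a} = \frac{1}{a}\,f(0) = \int_{-\infty}^{\infty} f(x)\,\frac{1}{a}\,\delta(x)\,dx .</math><br />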
<br />
3. What is the numerical value of <math>P(A|S_BI)</math> if the prior for <math>p(x)</math> is a massed prior with half the mass at <math>x = 1/3</math> and half the mass at <math>x = 2/3</math>?<br />
<br />
====To Think About====<br />
1. With respect to problem 3, above, since x is a probability, how can choosing x=1/3 half the time, and x=2/3 the other half of the time be different from choosing x=1/2 all the time?<br />
<br />
2. Suppose A is some event that we view as stochastic with probability P(A), such as "will it rain today?". But the laws of physics (or meteorology) say that A actually depends on other weather variables X, Y, Z, etc., with conditional probabilities P(A|XYZ...). If we repeatedly sample just A, to naively measure P(A), are we correctly marginalizing over the other variables?<br />
<br />
===Class Activities===<br />
[[File:quiz1_scatterplot.png|300px|thumb|Comparison of peer scores to TA grades on quiz]]<br />
<br />
[[Media:Quiz20140127solution.pdf|Surprise Quiz (with Bill's solutions here)]] (Notice in the figure the<br />
almost perfect correlation between the peer ranks that the teams assigned and the TA's separate grading.)<br />
<br />
We also discussed Mr. and Mrs. Smith and their daughter(s) -- see Think About Question 3 in Segment 3.<br />
<br />
We also did some variants of [[Expected values and continuous distributions]]</div>
Bill Press

http://wpressutexas.net/coursewiki/index.php?title=Segment_3._Monty_Hall&diff=3452
Segment 3. Monty Hall (2019-01-27T03:36:14Z) <p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
<br />
The direct YouTube link is [http://youtu.be/Rxb8JG8nUFA http://youtu.be/Rxb8JG8nUFA]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/3.MontyHall.pdf PDF file] or [http://wpressutexas.net/coursefiles/3.MontyHall.ppt PowerPoint file]<br />
<br />
====Bill's Comments====<br />
You might enjoy reading some of the correspondence that Marilyn vos Savant received, on [http://marilynvossavant.com/game-show-problem/ her web site].<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. The slides used a symmetry argument ("relabeling") to simplify the calculation. Redo the calculation without any such relabeling. Assume that the doors have big numbers "1", "2", and "3" nailed onto them, and consider all possibilities. Do you still have to make an assumption about Monty's preferences (where the slide assumed 1/2)?<br />
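A simulation with explicit door numbers makes the "no relabeling" bookkeeping concrete; the parameter q below (Monty's probability of opening the lower-numbered door when he has a choice) is exactly the assumption the slide set to 1/2:

```python
import random

def play(q, rng):
    """One game with doors 1, 2, 3. Returns (win_if_stay, win_if_switch)."""
    car = rng.choice([1, 2, 3])
    pick = rng.choice([1, 2, 3])
    # Monty opens a door that is neither the contestant's pick nor the car
    options = [d for d in (1, 2, 3) if d != pick and d != car]
    if len(options) == 2:  # Monty has a choice: lower-numbered door w.p. q
        opened = min(options) if rng.random() < q else max(options)
    else:
        opened = options[0]
    switched = next(d for d in (1, 2, 3) if d != pick and d != opened)
    return pick == car, switched == car

rng = random.Random(0)
games = [play(0.5, rng) for _ in range(100_000)]
print(sum(s for s, _ in games) / len(games),   # ≈ 1/3 if you stay
      sum(w for _, w in games) / len(games))   # ≈ 2/3 if you switch
```

Varying q changes the conditional probabilities door by door, which is a good check on the pencil-and-paper calculation.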
<br />
====To Think About====<br />
1. Lawyers are supposed to be able to argue either side of a case. What is the best argument that you can make that switching doors can't possibly make any difference? In other words, how cleverly can you hide some wrong assumption?<br />
<br />
2. We stated the problem as <i>requiring</i> the host to offer the contestant a chance to switch. But what if the host can offer that chance, or not, as he sees fit? Then, when offered the chance, should you still switch? (Spoiler alert: see [http://www.nytimes.com/1991/07/21/us/behind-monty-hall-s-doors-puzzle-debate-and-answer.html?pagewanted=all&src=pm this New York Times interview] with Monty Hall.)<br />
<br />
[[File:SmithFamilyDog_Credit_SusanBonners.png|300px|right|thumb]]<br />
3. Mr. and Mrs. Smith tell you that they have two children, one of whom is a girl.<br><br />
(a) What is the probability that the other child is a girl?<br><br />
Mr. Smith then shows you a photo of his children on his iPhone. One is clearly a girl, but the other one's face is hidden behind the family dog, and you can't tell their gender.<br><br />
(b) What is the probability that the hidden child is a girl?<br><br />
(c) If your answers to (a) and (b) are different, explain why there is a difference.<br />
<br />
====Class Activity====<br />
Class got cancelled due to a snow day (very unusual in Austin!). We would have done:<br />
<br />
[http://wpressutexas.net/coursewiki/images/e/ed/Generalized_monty.pdf Generalized Monty Hall]</div>
Bill Press

http://wpressutexas.net/coursewiki/index.php?title=Segment_2._Bayes&diff=3451
Segment 2. Bayes (2019-01-27T03:35:39Z) <p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
<br />
The direct YouTube link is [http://youtu.be/FROAk4AFKHk http://youtu.be/FROAk4AFKHk]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/2.Bayes.pdf PDF file] or [http://wpressutexas.net/coursefiles/2.Bayes.ppt PowerPoint file]<br />
<br />
====Bill's Comments====<br />
Here is a link to the [http://wpressutexas.net/course2011/EfronWhyEveryone1986.pdf Efron paper] mentioned.<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. If the knight had captured a Gnome instead of a Troll, what would his chances be of crossing safely?<br />
<br />
2. Suppose that we have two identical boxes, A and B. A contains 5 red balls and 3 blue balls. B contains 2 red balls and 4 blue balls. A box is selected at random and exactly one ball is drawn from the box. What is the probability that it is blue? If it <i>is</i> blue, what is the probability that it came from box B?<br />
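By total probability, P(blue) = ½·(3/8) + ½·(4/6) = 25/48, and Bayes' theorem then gives P(B | blue) = (½·4/6)/(25/48) = 16/25. A quick check with exact fractions:

```python
from fractions import Fraction as F

p_blue_given_A = F(3, 8)   # box A: 5 red, 3 blue
p_blue_given_B = F(4, 6)   # box B: 2 red, 4 blue
p_box = F(1, 2)            # each box equally likely to be selected

# Total probability, then Bayes' theorem
p_blue = p_box * p_blue_given_A + p_box * p_blue_given_B
p_B_given_blue = p_box * p_blue_given_B / p_blue

print(p_blue, p_B_given_blue)  # 25/48 16/25
```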
<br />
====To Think About====<br />
1. Do you think that the human brain's intuitive "inference engine" obeys the commutativity and associativity of evidence? For example, are we more likely to be swayed by recent, rather than older, evidence? How can evolution get this wrong if the mathematical formulation is correct?<br />
<br />
2. How would you simulate the Knight/Troll/Gnome problem on a computer, so that you could run it 100,000 times and see if the Knight's probability of crossing safely converges to 1/3?<br />
<br />
3. Since different observers have different background information, isn't Bayesian inference useless for making social decisions (like what to do about climate change, for example)? How can there ever be any consensus on probabilities that are fundamentally subjective?<br />
<br />
====Class Activity====<br />
[[Media:ActivityWedJan22.pdf]]<br />
<br />
[[Jeff's gnomes and trolls simulation]]</div>
Bill Press

http://wpressutexas.net/coursewiki/index.php?title=Segment_1._Let%27s_Talk_about_Probability&diff=3450
Segment 1. Let's Talk about Probability (2019-01-27T03:34:56Z) <p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
<br />
The direct YouTube link is [http://youtu.be/H5WjVgL6Nh4 http://youtu.be/H5WjVgL6Nh4]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/1.LetsTalkAboutProbability.pdf PDF file] or [http://wpressutexas.net/coursefiles/1.LetsTalkAboutProbability.ppt PowerPoint file]<br />
<br />
====Bill's comments on this segment====<br />
Well, I do sound nervous! This was one of my first webcasts. The production values get a little better with later segments. However, the material here is important, so be sure you understand it before going on.<br />
<br />
Here is a [http://wpressutexas.net/coursefiles/CoxPaper1946.pdf link to the paper by R.T. Cox], discussed on slide 2. It's surprisingly readable for something so fundamental.<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. Prove that <math>P(ABC) = P(B)P(C|B)P(A|BC)</math>.<br />
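One two-step route is to apply the definition of conditional probability twice, first grouping <math>BC</math> together:<br />
<br />
:<math>P(ABC) = P(BC)\,P(A|BC) = P(B)\,P(C|B)\,P(A|BC).</math><br />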
<br />
2. What is the probability that the sum of two dice is odd with neither being a 4?<br />
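Enumeration over the 36 outcomes is the quickest check (the sum is odd only when one die is odd and the other even, and excluding 4 removes one even value from each die):

```python
from fractions import Fraction

# All 36 equally likely outcomes for two fair dice
outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]
# Keep outcomes with an odd sum where neither die shows a 4
good = [(a, b) for a, b in outcomes if (a + b) % 2 == 1 and 4 not in (a, b)]
prob = Fraction(len(good), len(outcomes))
print(prob)  # 1/3
```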
<br />
====To Think About====<br />
1. [http://en.wikipedia.org/wiki/First-order_logic First-order logic] extends the propositional calculus, with propositions <math>a,b,c</math> and the quantifier symbols <math>\forall</math> and <math>\exists</math>. This allows statements like "Socrates is a philosopher", "Socrates is a man", "There exists a philosopher who is not a man", etc. Can you use first-order logic as a calculus of inference? Is it the same as using the probability axioms? If not, then which of Cox's suppositions is violated?<br />
<br />
2. You are an oracle that, when asked, says "yes" with probability <math>P</math> and "no" with probability <math>1-P</math>. How do you do this using only a fair, two-sided coin?<br />
(Hint, as we did in class: Represent <math>P</math> as a binary number and flip the coin once for each binary digit. Whenever a flip first disagrees with the corresponding digit of <math>P</math>, you can stop and answer: "yes" if that digit of <math>P</math> is 1, "no" if it is 0.)<br />
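The class hint turns into a few lines of code; this sketch assumes <math>P</math> is given as a float and uses a random bit in place of the coin. The expected number of flips per query is 2, independent of <math>P</math>.

```python
import random

def oracle(P, rng):
    """Answer 'yes' with probability P using only fair coin flips:
    compare each flip to the corresponding binary digit of P;
    the first disagreement decides (flips so far are < P iff the
    digit of P is 1 there)."""
    while True:
        P *= 2
        digit, P = (1, P - 1) if P >= 1 else (0, P)
        flip = rng.randrange(2)          # one fair coin flip
        if flip != digit:
            return "yes" if digit == 1 else "no"

rng = random.Random(1)
trials = 100_000
freq = sum(oracle(0.25, rng) == "yes" for _ in range(trials)) / trials
print(freq)  # ≈ 0.25
```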
<br />
<br />
3. For the trout/minnow problem, what if you want to know the probability that the Nth fish caught is a trout, for N=1,2,3,...? What is an efficient way to set up this calculation? (Hint: If you ever learned the word "Markov", this might be a good time to remember it!)<br />
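The pond's details are in the segment, but the Markov idea can be sketched in general: let the state be the number of trout and minnows remaining, and propagate the probability distribution over states one catch at a time. For this illustration I assume each remaining fish is equally likely to be caught and fish are not returned; under that assumption the answer is the same for every N, which is itself worth thinking about.

```python
from collections import defaultdict

def p_trout_on_catch(N, trout, minnows):
    """P(Nth fish caught is a trout), propagating a distribution over
    (trout_left, minnows_left) states -- the Markov chain of the hint."""
    states = {(trout, minnows): 1.0}
    for _ in range(N - 1):
        nxt = defaultdict(float)
        for (t, m), pr in states.items():
            total = t + m
            if t: nxt[(t - 1, m)] += pr * t / total  # caught a trout
            if m: nxt[(t, m - 1)] += pr * m / total  # caught a minnow
        states = dict(nxt)
    return sum(pr * t / (t + m) for (t, m), pr in states.items() if t + m)

print(p_trout_on_catch(4, 3, 7))  # ≈ 0.3, same as the 1st catch, by symmetry
```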
<br />
====Class Activity====<br />
[[Media:ActivityWedJan15.pdf]]</div>
Bill Press

http://wpressutexas.net/coursewiki/index.php?title=Main_Page&diff=3449
Main Page (2019-01-24T21:48:44Z) <p>Bill Press: </p>
<hr />
<div>__NOTOC__<br />
=Statistical and Discrete Methods for Scientific Computing=<br />
<br />
The [[2014 Concepts Study Page]] lists all the questions that may be asked in the oral exam.<br />
<br />
You can generate your own practice exam at the [http://wpressutexas.net/examgenerator.php 2014 Practice Exam Machine].<br />
<br />
==Note on math rendering:==<br />
<br />
This site now uses only MathJax for math rendering. This is the most foolproof method and should<br />
work on all browsers. However, page loads are sometimes slow. Be patient. Your reward on a slow page<br />
will be a page with lots of math! <br />
<br />
===CSE383M (65280) and CS395T (53715), Spring 2014===<br />
Welcome to the course! The instructor is Professor William Press (Bill), and the TA is Jeff Hussmann (Jeff). We meet Mondays and Wednesdays, 1:30 - 3:00 p.m. in CBA 4.344 with Bill, and Fridays, 1:30 - 3:00 p.m. in CBA 4.348 with Jeff. The course is aimed at first or second year graduate students, especially in the CSEM, CS, and ECE programs, but others are welcome. You'll need math at the level of <i>at least</i> 2nd year calculus, plus linear algebra, plus either more continuous math (e.g., CSEM students) or more discrete math (e.g., CS and ECE students). You'll also need to be able to program in some known computer language.<br />
<br />
===Mechanics of the Course===<br />
The last two years, we have tried the experiment of a "flipped" course. This has worked so well that we are doing this again this year. "Flipped" means that the lectures are all on the web as recorded webcasts. You <b>must</b> watch the assigned webcasts <b>before</b> the class for which they are scheduled; maybe watch them more than once if there are parts that you don't easily understand. Then, you will be ready for the active learning that we do in class. The class activities will <b>not</b> "cover the material". Rather, class is supposed to be for "aha moments" and for "fixing" the material in your learning memory. We'll thus do various kinds of "active learning" activities that will test and improve your understanding of the material in the lecture. Such in-class activities, often done in <i>randomized</i> groups of two or three, may include<br />
* group computer programming exercises<br />
* group working of problems<br />
* group writing assignments<br />
* discussing concepts (and communicating ideas back to the whole class)<br />
* "quiz show" style activities<br />
* short surprise quizzes (generally at the beginning of class -- no makeups allowed)<br />
* whatever else we all think of<br />
<br />
===Problems for Each Segment===<br />
Every lecture segment home page has one or two relatively easy "skill" problems. You should work these after watching the segment, before class. (You might be asked to discuss your solution with your small group in class.) Also on the segment's page are one or two concept thought problems. One or another of these will sometimes be the basis of in-class activities, so you might want to think about them before class.<br />
<br />
===Student Wiki Pages===<br />
Every student will have a wiki page (and as many linked pages as you want). You can post your solutions to as many problems as you wish to your wiki page. You can do this either before the relevant class or afterwards. You can also make up, and solve, additional problems. Problems won't be individually graded. However, at the end of the course, the completeness and quality of your wiki page(s) will be a part of your course grade. Your wiki page can include discussion of the thought problems, as well as the skill problems. <br />
<br />
You can also post signed comments on any other student's wiki pages. To the extent that these are generally helpful, they will add credit to your reputation and to your grade.<br />
<br />
[[Student Pages]]<br />
<br />
===Laptops or Tablets===<br />
You <b>must</b> bring your laptop computer or full-sized tablet to every class, so that you can (i) look things up during group discussions or problem sessions and (ii) do in-class programming exercises. You can program in any language you want. For Python, which we recommend as the best choice for this course, you can either install it on your machine, or else use the IPython notebook server described in class. The course will include several lectures of Python workshop by Jeff. <br />
<br />
If you instead want to use MATLAB or Mathematica, that is fine, but please be sure that it is installed on your computer before the first class. (The MATLAB Student Edition is a real bargain.) For C, C++, Java, etc., please be sure that you have a fully working environment for compiling and running small pieces of code.<br />
<br />
===Course Requirements and Grading===<br />
Grades will be based on these factors<br />
* in-class attendance and participation<br />
* an in-class midterm exam<br />
* completeness and quality of your individual wiki page(s)<br />
* relevance and usefulness of your comments on other people's wiki pages (or on the main wiki)<br />
* an individual 30-minute final oral exam<br />
<br />
In previous years there was a term project, but not this year. Your working the problems and posting solutions on your wiki page is this year's substitute.<br />
<br />
[[File:learning_cone.gif|200px|thumb|right|Click image to see a legible version.]]<br />
<br />
===What is Active Learning?===<br />
Much research shows that lecture courses, where students listen passively as the instructor talks, are inefficient ways to learn. What works is so-called [http://en.wikipedia.org/wiki/Active_learning active learning], a broad term that, for us, basically means that class time is too valuable to waste on lectures. (See image at right.)<br />
<br />
The lectures are all recorded as webcasts, but webcasts are not active learning. However, they are a starting point as a "linear" introduction to the material. <br />
<br />
===Feedback===<br />
<br />
What has worked well in class so far? What hasn't worked? How could things be improved? Please leave [[Feedback 2014|feedback]].<br />
<br />
===Resources and Links===<br />
<br />
There is no textbook for the course. A list of recommended supplementary books is [[Recommended books|here]].<br />
<br />
Some resources for learning Python can be found [[Python resources|here]].<br />
<br />
Some MATLAB resources can be found [[MATLAB resources|here]].<br />
<br />
===Webcast Lecture Segments <i>(Opinionated Lessons in Statistics)</i>===<br />
All of the lectures are in the form of webcasts, divided into segments of about 15-30 minutes each (occasionally a bit longer). Each segment has a wiki page, with page links below. You can view the lecture on its wiki page, which also has additional material about the segment (including the <b>skill and thought problems</b>), or by clicking directly to YouTube, where they are all on Bill's [http://www.youtube.com/user/opinionatedlessons/videos?view=0&flow=list&sort=da "Opinionated Lessons" channel].<br />
<br />
<center><br />
{| class="wikitable"<br />
|+Watch segments BEFORE class on the indicated dates:<br />
|-<br />
|Mon Jan 13<br />
|<b>First Day of Class</b> (no segment due)<br />
|-<br />
|Wed Jan 15<br />
|[[Segment 1. Let's Talk about Probability]] (or [http://www.youtube.com/watch?v=H5WjVgL6Nh4 YouTube])<br />
|-<br />
|Fri Jan 17<br />
|[[Python Set-up Tutorial and Workshop]] (no segment due)<br />
|-<br />
|Mon Jan 20<br />
|<b>Martin Luther King Day HOLIDAY</b> (no segment due)<br />
|-<br />
|Wed Jan 22<br />
|[[Segment 2. Bayes]] (or [http://www.youtube.com/watch?v=FROAk4AFKHk YouTube])<br />
|-<br />
|Fri Jan 24<br />
|[[Segment 3. Monty Hall]] (or [http://www.youtube.com/watch?v=Rxb8JG8nUFA YouTube])<br />
|-<br />
|Mon Jan 27<br />
|[[Segment 4. The Jailer's Tip]] (or [http://www.youtube.com/watch?v=425D0CjLLLs YouTube])<br />
|-<br />
|Wed Jan 29<br />
|[[Segment 5. Bernoulli Trials]] (or [http://www.youtube.com/watch?v=2T3KP2LleFg YouTube])<br />
|-<br />
|Fri Jan 31<br />
|[[Segment 6. The Towne Family Tree]] (or [http://www.youtube.com/watch?v=y_L2THpv5Jg YouTube])<br />
|-<br />
|Mon Feb 3<br />
|[[Segment 7. Central Tendency and Moments]] (or [http://www.youtube.com/watch?v=ZWOmsKWQ7Fw YouTube])<br />
|-<br />
|Wed Feb 5<br />
|[[Segment 8. Some Standard Distributions]] (or [http://www.youtube.com/watch?v=EDYDC7iNGTg YouTube])<br />
|-<br />
|Fri Feb 7<br />
|[[Segment 9. Characteristic Functions]] (or [http://www.youtube.com/watch?v=NJL-BX6HuxY YouTube])<br />
|-<br />
|Mon Feb 10<br />
|[[Segment 10. The Central Limit Theorem]] (or [http://www.youtube.com/watch?v=IpuYGsKplSw YouTube])<br />
|-<br />
|Wed Feb 12<br />
|[[Segment 11. Random Deviates]] (or [http://www.youtube.com/watch?v=4r1GlyisB8E YouTube])<br />
|-<br />
|Fri Feb 14<br />
|[[Segment 12. P-Value Tests]] (or [http://www.youtube.com/watch?v=2Ul7TI0B5ek YouTube])<br />
|-<br />
|Mon Feb 17<br />
|[[Segment 13. The Yeast Genome]] (or [http://www.youtube.com/watch?v=QSgUX-Do8Tc YouTube])<br />
|-<br />
|Wed Feb 19<br />
|[[Segment 14. Bayesian Criticism of P-Values]] (or [http://www.youtube.com/watch?v=IKV6Pn18C7o YouTube])<br />
|-<br />
|Fri Feb 21<br />
|[[Segment 16. Multiple Hypotheses]] (or [http://www.youtube.com/watch?v=w6AjduOEN2k YouTube]) [note order!]<br />
|-<br />
|Mon Feb 24<br />
|[[Segment 15. The Towne Family - Again]] (or [http://www.youtube.com/watch?v=Y-i0CN15X-M YouTube]) [note order!]<br />
|-<br />
|Wed Feb 26<br />
|[[Segment 17. The Multivariate Normal Distribution]] (or [http://www.youtube.com/watch?v=t7Z1a_BOkN4 YouTube])<br />
|-<br />
|Fri Feb 28<br />
|[[Review Session for Mid-Term Exam]] (no new segment due)<br />
|-<br />
|}<br />
<br />
<b>Monday, March 3. MIDTERM EXAM</b><br />
<br />
[[Media:Midterm.pdf|(Exam)]]&nbsp;&nbsp;&nbsp;[[Media:MidtermSolutions2014.pdf|(Bill's solutions)]] &nbsp;&nbsp;&nbsp;[[Media:MidtermHistogram.pdf|(Histogram of grades)]]<br />
<br />
{| class="wikitable"<br />
|Wed Mar 5<br />
|[[Segment 18. The Correlation Matrix]] (or [http://www.youtube.com/watch?v=aW5q_P0it9E YouTube])<br />
|-<br />
|Fri Mar 7<br />
|[[Segment 19. The Chi Square Statistic]] (or [http://www.youtube.com/watch?v=87EMhmPkOhk YouTube])<br />
|-<br />
|}<br />
<br />
<b>Monday, March 10 through Friday, March 14: SPRING BREAK</b><br />
<br />
{| class="wikitable"<br />
|+Watch segments BEFORE class on the indicated dates:<br />
|Mon Mar 17<br />
|[[Segment 20. Nonlinear Least Squares Fitting]] (or [http://www.youtube.com/watch?v=xtBCGPHRcb0 YouTube])<br />
|-<br />
|Wed Mar 19<br />
|[[Segment 21. Marginalize or Condition Uninteresting Fitted Parameters]] (or [http://www.youtube.com/watch?v=yxZUS_BpEZk YouTube])<br />
|-<br />
|Fri Mar 21<br />
|[[Segment 22. Uncertainty of Derived Parameters]] (or [http://www.youtube.com/watch?v=ZoD3_rov--w YouTube])<br />
|-<br />
|Mon Mar 24<br />
|[[Segment 23. Bootstrap Estimation of Uncertainty]] (or [http://www.youtube.com/watch?v=1OC9ul-1PVg YouTube])<br />
|-<br />
|Wed Mar 26<br />
|[[Segment 24. Goodness of Fit]] (or [http://www.youtube.com/watch?v=EJleSVf0Z-U YouTube])<br />
|-<br />
|Fri Mar 28<br />
|[[Segment 27. Mixture Models]] (or [http://www.youtube.com/watch?v=9pWnZcpYh44 YouTube])<br />
|-<br />
|Mon Mar 31<br />
|[[Segment 28. Gaussian Mixture Models in 1-D]] (or [http://www.youtube.com/watch?v=n7u_tq0I6jM YouTube])<br />
|-<br />
|Wed Apr 2<br />
|[[Segment 29. GMMs in N-Dimensions]] (or [http://www.youtube.com/watch?v=PH8_qqDTCYY YouTube])<br />
|-<br />
|Fri Apr 4<br />
|[[Segment 30. Expectation Maximization (EM) Methods]] (or [http://www.youtube.com/watch?v=StQOzRqTNsw YouTube])<br />
|-<br />
|Mon Apr 7<br />
|[[Segment 31. A Tale of Model Selection]] (or [http://www.youtube.com/watch?v=_G1gzqQzbuM YouTube])<br />
|-<br />
|Wed Apr 9<br />
|[[Segment 32. Contingency Tables: A First Look]] (or [http://www.youtube.com/watch?v=NvCdN2RFufY YouTube])<br />
|-<br />
|Fri Apr 11<br />
|[[Segment 33. Contingency Table Protocols and Exact Fisher Test]] (or [http://www.youtube.com/watch?v=9Qrkw5UfAmQ YouTube])<br />
|-<br />
|Mon Apr 14<br />
|[[Segment 34. Permutation Tests]] (or [http://www.youtube.com/watch?v=_4BUS1NGNHA YouTube])<br />
|-<br />
|Wed Apr 16<br />
|[[Segment 37. A Few Bits of Information Theory]] (or [http://www.youtube.com/watch?v=ktzYOLDN3u4 YouTube])<br />
|-<br />
|Fri Apr 18<br />
|[[Segment 38. Mutual Information]] (or [http://www.youtube.com/watch?v=huNPh1mkJHM YouTube])<br />
|-<br />
|Mon Apr 21<br />
|[[Segment 39. MCMC and Gibbs Sampling ]] (or [http://www.youtube.com/watch?v=4gNpgSPal_8 YouTube])<br />
|-<br />
|Wed Apr 23<br />
|[[Segment 40. Markov Chain Monte Carlo, Example 1 ]] (or [http://www.youtube.com/watch?v=nSKZ02ZWzsY YouTube])<br />
|-<br />
|Fri Apr 25<br />
|[[Segment 41. Markov Chain Monte Carlo, Example 2 ]] (or [http://www.youtube.com/watch?v=FnNckBLWJ24 YouTube])<br />
|-<br />
|Mon Apr 28<br />
|[[Segment 47. Low-Rank Approximation of Data ]] (or [http://www.youtube.com/watch?v=M0gsHNS_5FE YouTube])<br><br />
|-<br />
|Wed Apr 30<br />
|[[Segment 48. Principal Component Analysis (PCA)]] (or [http://www.youtube.com/watch?v=frWqIUpIxLg YouTube])<br />
|-<br />
|Fri May 2<br />
| <b>Review Session for Oral Exams</b><br />
|}<br />
<br />
<b>Monday, May 5 and Tuesday, May 6: ORAL FINAL EXAMS</b><br />
<br />
</center><br />
<br />
===Extra Credit Segments (segment number indicates intended sequence)===<br />
[[Segment 25. Fitting Models to Counts]] (or [http://www.youtube.com/watch?v=YXaq2PVCGZQ YouTube])<br><br />
[[Segment 26. The Poisson Count Pitfall]] (or [http://www.youtube.com/watch?v=rPO3N5GI-3I YouTube])<br><br />
[[Segment 35. Ordinal vs. Nominal Contingency Tables]] (or [http://www.youtube.com/watch?v=fYUbj78aguk YouTube])<br><br />
[[Segment 36. Contingency Tables Have Nuisance Parameters]] (or [http://www.youtube.com/watch?v=bHK79WKOX-Y YouTube])<br><br />
[[Segment 49. Eigenthingies and Main Effects]] (or [http://www.youtube.com/watch?v=LpGQnvvGLMQ YouTube])<br><br />
<br />
===Segments with Slides But Not Yet Recorded===<br />
(links are to PDF files)<br />
<br />
[http://wpressutexas.net/coursefiles/15.5.PoissonProcessesOrderStatistics.pdf Segment 15.5. Poisson Processes and Order Statistics]<br><br />
[http://wpressutexas.net/coursefiles/42.WienerFiltering.pdf Segment 42. Wiener Filtering]<br><br />
[http://wpressutexas.net/coursefiles/43.TheIRELady.pdf Segment 43. The IRE Lady]<br><br />
[http://wpressutexas.net/coursefiles/44.Wavelets.pdf Segment 44. Wavelets]<br><br />
[http://wpressutexas.net/coursefiles/45.LaplaceInterpolation.pdf Segment 45. Laplace Interpolation]<br><br />
[http://wpressutexas.net/coursefiles/46.InterpolationOnScatteredData.pdf Segment 46. Interpolation On Scattered Data]<br><br />
[http://wpressutexas.net/coursefiles/50.BinaryClassifiers.pdf Segment 50. Binary Classifiers]<br><br />
[http://wpressutexas.net/coursefiles/51.HierarchicalClassification.pdf Segment 51. Hierarchical Classification]<br><br />
[http://wpressutexas.net/coursefiles/52.DynamicProgramming.pdf Segment 52. Dynamic Programming]<br><br />
<br />
===Team Randomizer===<br />
Link to [http://wpressutexas.net/coursefiles/teamrandomizer.php the team randomizer]</div>
Bill Press
|Mon Jan 27<br />
|[[Segment 4. The Jailer's Tip]] (or [http://www.youtube.com/watch?v=425D0CjLLLs YouTube])<br />
|-<br />
|Wed Jan 29<br />
|[[Segment 5. Bernoulli Trials]] (or [http://www.youtube.com/watch?v=2T3KP2LleFg YouTube])<br />
|-<br />
|Fri Jan 31<br />
|[[Segment 6. The Towne Family Tree]] (or [http://www.youtube.com/watch?v=y_L2THpv5Jg YouTube])<br />
|-<br />
|Mon Feb 3<br />
|[[Segment 7. Central Tendency and Moments]] (or [http://www.youtube.com/watch?v=ZWOmsKWQ7Fw YouTube])<br />
|-<br />
|Wed Feb 5<br />
|[[Segment 8. Some Standard Distributions]] (or [http://www.youtube.com/watch?v=EDYDC7iNGTg YouTube])<br />
|-<br />
|Fri Feb 7<br />
|[[Segment 9. Characteristic Functions]] (or [http://www.youtube.com/watch?v=NJL-BX6HuxY YouTube])<br />
|-<br />
|Mon Feb 10<br />
|[[Segment 10. The Central Limit Theorem]] (or [http://www.youtube.com/watch?v=IpuYGsKplSw YouTube])<br />
|-<br />
|Wed Feb 12<br />
|[[Segment 11. Random Deviates]] (or [http://www.youtube.com/watch?v=4r1GlyisB8E YouTube])<br />
|-<br />
|Fri Feb 14<br />
|[[Segment 12. P-Value Tests]] (or [http://www.youtube.com/watch?v=2Ul7TI0B5ek YouTube])<br />
|-<br />
|Mon Feb 17<br />
|[[Segment 13. The Yeast Genome]] (or [http://www.youtube.com/watch?v=QSgUX-Do8Tc YouTube])<br />
|-<br />
|Wed Feb 19<br />
|[[Segment 14. Bayesian Criticism of P-Values]] (or [http://www.youtube.com/watch?v=IKV6Pn18C7o YouTube])<br />
|-<br />
|Fri Feb 21<br />
|[[Segment 16. Multiple Hypotheses]] (or [http://www.youtube.com/watch?v=w6AjduOEN2k YouTube]) [note order!]<br />
|-<br />
|Mon Feb 24<br />
|[[Segment 15. The Towne Family - Again]] (or [http://www.youtube.com/watch?v=Y-i0CN15X-M YouTube]) [note order!]<br />
|-<br />
|Wed Feb 26<br />
|[[Segment 17. The Multivariate Normal Distribution]] (or [http://www.youtube.com/watch?v=t7Z1a_BOkN4 YouTube])<br />
|-<br />
|Fri Feb 28<br />
|[[Review Session for Mid-Term Exam]] (no new segment due)<br />
|-<br />
|}<br />
<br />
<b>Monday, March 3. MIDTERM EXAM</b><br />
<br />
[[Media:Midterm.pdf|(Exam)]]&nbsp;&nbsp;&nbsp;[[Media:MidtermSolutions2014.pdf|(Bill's solutions)]] &nbsp;&nbsp;&nbsp;[[Media:MidtermHistogram.pdf|(Histogram of grades)]]<br />
<br />
{| class="wikitable"<br />
|Wed Mar 5<br />
|[[Segment 18. The Correlation Matrix]] (or [http://www.youtube.com/watch?v=aW5q_P0it9E YouTube])<br />
|-<br />
|Fri Mar 7<br />
|[[Segment 19. The Chi Square Statistic]] (or [http://www.youtube.com/watch?v=87EMhmPkOhk YouTube])<br />
|-<br />
|}<br />
<br />
<b>Monday, March 10 through Friday, March 14: SPRING BREAK</b><br />
<br />
{| class="wikitable"<br />
|+Watch segments BEFORE class on the indicated dates:<br />
|Mon Mar 17<br />
|[[Segment 20. Nonlinear Least Squares Fitting]] (or [http://www.youtube.com/watch?v=xtBCGPHRcb0 YouTube])<br />
|-<br />
|Wed Mar 19<br />
|[[Segment 21. Marginalize or Condition Uninteresting Fitted Parameters]] (or [http://www.youtube.com/watch?v=yxZUS_BpEZk YouTube])<br />
|-<br />
|Fri Mar 21<br />
|[[Segment 22. Uncertainty of Derived Parameters]] (or [http://www.youtube.com/watch?v=ZoD3_rov--w YouTube])<br />
|-<br />
|Mon Mar 24<br />
|[[Segment 23. Bootstrap Estimation of Uncertainty]] (or [http://www.youtube.com/watch?v=1OC9ul-1PVg YouTube])<br />
|-<br />
|Wed Mar 26<br />
|[[Segment 24. Goodness of Fit]] (or [http://www.youtube.com/watch?v=EJleSVf0Z-U YouTube])<br />
|-<br />
|Fri Mar 28<br />
|[[Segment 27. Mixture Models]] (or [http://www.youtube.com/watch?v=9pWnZcpYh44 YouTube])<br />
|-<br />
|Mon Mar 31<br />
|[[Segment 28. Gaussian Mixture Models in 1-D]] (or [http://www.youtube.com/watch?v=n7u_tq0I6jM YouTube])<br />
|-<br />
|Wed Apr 2<br />
|[[Segment 29. GMMs in N-Dimensions]] (or [http://www.youtube.com/watch?v=PH8_qqDTCYY YouTube])<br />
|-<br />
|Fri Apr 4<br />
|[[Segment 30. Expectation Maximization (EM) Methods]] (or [http://www.youtube.com/watch?v=StQOzRqTNsw YouTube])<br />
|-<br />
|Mon Apr 7<br />
|[[Segment 31. A Tale of Model Selection]] (or [http://www.youtube.com/watch?v=_G1gzqQzbuM YouTube])<br />
|-<br />
|Wed Apr 9<br />
|[[Segment 32. Contingency Tables: A First Look]] (or [http://www.youtube.com/watch?v=NvCdN2RFufY YouTube])<br />
|-<br />
|Fri Apr 11<br />
|[[Segment 33. Contingency Table Protocols and Exact Fisher Test]] (or [http://www.youtube.com/watch?v=9Qrkw5UfAmQ You Tube])<br />
|-<br />
|Mon Apr 14<br />
|[[Segment 34. Permutation Tests]] (or [http://www.youtube.com/watch?v=_4BUS1NGNHA YouTube])<br />
|-<br />
|Wed Apr 16<br />
|[[Segment 37. A Few Bits of Information Theory]] (or [http://www.youtube.com/watch?v=ktzYOLDN3u4 YouTube])<br />
|-<br />
|Fri Apr 18<br />
|[[Segment 38. Mutual Information]] (or [http://www.youtube.com/watch?v=huNPh1mkJHM YouTube])<br />
|-<br />
|Mon Apr 21<br />
|[[Segment 39. MCMC and Gibbs Sampling ]] (or [http://www.youtube.com/watch?v=4gNpgSPal_8 YouTube])<br />
|-<br />
|Wed Apr 23<br />
|[[Segment 40. Markov Chain Monte Carlo, Example 1 ]] (or [http://www.youtube.com/watch?v=nSKZ02ZWzsY YouTube])<br />
|-<br />
|Fri Apr 25<br />
|[[Segment 41. Markov Chain Monte Carlo, Example 2 ]] (or [http://www.youtube.com/watch?v=FnNckBLWJ24 YouTube])<br />
|-<br />
|Mon Apr 28<br />
|[[Segment 47. Low-Rank Approximation of Data ]] (or [http://www.youtube.com/watch?v=M0gsHNS_5FE YouTube])<br><br />
|-<br />
|Wed Apr 30<br />
|[[Segment 48. Principal Component Analysis (PCA)]] (or [http://www.youtube.com/watch?v=frWqIUpIxLg YouTube])<br />
|-<br />
|Fri May 2<br />
| <b>Review Session for Oral Exams</b><br />
|}<br />
<br />
<b>Monday, May 5 and Tuesday, May 6: ORAL FINAL EXAMS</b><br />
<br />
</center><br />
<br />
===Extra Credit Segments (segment number indicates intended sequence)===<br />
[[Segment 25. Fitting Models to Counts]] (or [http://www.youtube.com/watch?v=YXaq2PVCGZQ YouTube])<br><br />
[[Segment 26. The Poisson Count Pitfall]] (or [http://www.youtube.com/watch?v=rPO3N5GI-3I YouTube])<br><br />
[[Segment 35. Ordinal vs. Nominal Contingency Tables]] (or [http://www.youtube.com/watch?v=fYUbj78aguk YouTube])<br><br />
[[Segment 36. Contingency Tables Have Nuisance Parameters]] (or [http://www.youtube.com/watch?v=bHK79WKOX-Y YouTube])<br><br />
[[Segment 49. Eigenthingies and Main Effects]] (or [http://www.youtube.com/watch?v=LpGQnvvGLMQ YouTube])<br><br />
<br />
===Segments with Slides But Not Yet Recorded===<br />
(links are to PDF files)<br />
<br />
[http://wpressutexas.net/coursefiles/15.5.PoissonProcessesOrderStatistics.pdf Segment 15.5. Poisson Processes and Order Statistics]<br><br />
[http://wpressutexas.net/coursefiles/42.WienerFiltering.pdf Segment 42. Wiener Filtering]<br><br />
[http://wpressutexas.net/coursefiles/43.TheIRELady.pdf Segment 43. The IRE Lady]<br><br />
[http://wpressutexas.net/coursefiles/44.Wavelets.pdf Segment 44. Wavelets]<br><br />
[http://wpressutexas.net/coursefiles/45.LaplaceInterpolation.pdf Segment 45. Laplace Interpolation]<br><br />
[http://wpressutexas.net/coursefiles/46.InterpolationOnScatteredData.pdf Segment 46. Interpolation On Scattered Data]<br><br />
[http://wpressutexas.net/coursefiles/50.BinaryClassifiers.pdf Segment 50. Binary Classifiers]<br><br />
[http://wpressutexas.net/coursefiles/51.HierarchicalClassification.pdf Segment 51. Hierarchical Classification]<br><br />
[http://wpressutexas.net/coursefiles/52.DynamicProgramming.pdf Segment 52. Dynamic Programming]<br><br />
<br />
===Team Randomizer===<br />
Link to [http://wpressutexas.net/coursefiles/teamrandomizer.php the team randomizer]</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Main_Page&diff=3447Main Page2019-01-23T22:38:53Z<p>Bill Press: </p>
<hr />
<div>__NOTOC__<br />
=Statistical and Discrete Methods for Scientific Computing=<br />
<br />
The [[2014 Concepts Study Page]] lists all the questions that may be asked in the oral exam.<br />
<br />
NEW: Post questions [[Exam Study Session Questions (2014)|here]] before the Friday review session.<br />
<br />
You can generate your own practice exam at the [http://wpressutexas.net/examgenerator.php 2014 Practice Exam Machine].<br />
<br />
===CSE383M (65280) and CS395T (53715), Spring 2014===<br />
Welcome to the course! The instructor is Professor William Press (Bill), and the TA is Jeff Hussmann (Jeff). We meet Mondays and Wednesdays, 1:30 - 3:00 p.m. in CBA 4.344 with Bill, and Fridays, 1:30 - 3:00 p.m. in CBA 4.348 with Jeff. The course is aimed at first or second year graduate students, especially in the CSEM, CS, and ECE programs, but others are welcome. You'll need math at the level of <i>at least</i> 2nd year calculus, plus linear algebra, plus either more continuous math (e.g., CSEM students) or more discrete math (e.g., CS and ECE students). You'll also need to be able to program in some known computer language.<br />
<br />
===Mechanics of the Course===<br />
For the last two years, we have tried the experiment of a "flipped" course. This has worked so well that we are doing it again this year. "Flipped" means that the lectures are all on the web as recorded webcasts. You <b>must</b> watch the assigned webcasts <b>before</b> the class for which they are scheduled; watch them more than once if there are parts that you don't easily understand. Then you will be ready for the active learning that we do in class. The class activities will <b>not</b> "cover the material". Rather, class is for "aha moments" and for fixing the material in your learning memory. We'll thus do various kinds of "active learning" activities that test and improve your understanding of the material in the lecture. Such in-class activities, often done in <i>randomized</i> groups of two or three, may include:<br />
* group computer programming exercises<br />
* group working of problems<br />
* group writing assignments<br />
* discussing concepts (and communicating ideas back to the whole class)<br />
* "quiz show" style activities<br />
* short surprise quizzes (generally at the beginning of class -- no makeups allowed)<br />
* whatever else we all think of<br />
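For the curious, the kind of randomized grouping used in class can be sketched in a few lines of Python. (This is a hypothetical illustration; the actual team randomizer linked at the bottom of this page may work differently.)

```python
import random

def random_teams(students, size=3, seed=None):
    """Shuffle the roster and chop it into teams of `size`
    (the last team may be smaller if the roster doesn't divide evenly)."""
    rng = random.Random(seed)  # seedable, so a draw can be reproduced
    roster = list(students)
    rng.shuffle(roster)
    return [roster[i:i + size] for i in range(0, len(roster), size)]

print(random_teams(["Ann", "Bo", "Cy", "Dee", "Ed"], size=2, seed=1))
```

Shuffling once and slicing guarantees that every student lands in exactly one team, which per-team random sampling would not.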
<br />
===Problems for Each Segment===<br />
Every lecture segment home page has one or two relatively easy "skill" problems. You should work these after watching the segment, before class. (You might be asked to discuss your solution with your small group in class.) Also on the segment's page are one or two concept thought problems. One or another of these will sometimes be the basis of in-class activities, so you might want to think about them before class.<br />
<br />
===Student Wiki Pages===<br />
Every student will have a wiki page (and as many linked pages as you want). You can post your solutions to as many problems as you wish to your wiki page. You can do this either before the relevant class or afterwards. You can also make up, and solve, additional problems. Problems won't be individually graded. However, at the end of the course, the completeness and quality of your wiki page(s) will be a part of your course grade. Your wiki page can include discussion of the thought problems, as well as the skill problems.<br />
<br />
You can also post signed comments on any other student's wiki pages. To the extent that these are generally helpful, they will add credit to your reputation and for your grade.<br />
<br />
[[Student Pages]]<br />
<br />
===Laptops or Tablets===<br />
You <b>must</b> bring your laptop computer or full-sized tablet to every class, so that you can (i) look things up during group discussions or problem sessions and (ii) do in-class programming exercises. You can program in any language you want. For Python, which we recommend as the best choice for this course, you can either install it on your machine, or else use the IPython notebook server described in class. The course will include several lectures of Python workshop by Jeff. <br />
<br />
If you instead want to use MATLAB or Mathematica, that is fine, but please be sure that it is installed on your computer before the first class. (The MATLAB Student Edition is a real bargain.) For C, C++, Java, etc., please be sure that you have a fully working environment for compiling and running small pieces of code.<br />
<br />
===Course Requirements and Grading===<br />
Grades will be based on the following factors:<br />
* in-class attendance and participation<br />
* an in-class midterm exam<br />
* completeness and quality of your individual wiki page(s)<br />
* relevance and usefulness of your comments on other people's wiki pages (or on the main wiki)<br />
* an individual 30-minute final oral exam<br />
<br />
In previous years there was a term project, but not this year. Your working the problems and posting solutions on your wiki page is this year's substitute.<br />
<br />
[[File:learning_cone.gif|200px|thumb|right|Click image to see a legible version.]]<br />
<br />
===What is Active Learning?===<br />
Much research shows that lecture courses, where students listen passively as the instructor talks, are inefficient ways to learn. What works is so-called [http://en.wikipedia.org/wiki/Active_learning active learning], a broad term that, for us, basically means that class time is too valuable to waste on lectures. (See image at right.)<br />
<br />
The lectures are all recorded as webcasts, but webcasts are not active learning. However, they are a starting point as a "linear" introduction to the material. <br />
<br />
===Feedback===<br />
<br />
What has worked well in class so far? What hasn't worked? How could things be improved? Please leave [[Feedback 2014|feedback]].<br />
<br />
===Resources and Links===<br />
<br />
There is no textbook for the course. A list of recommended supplementary books is [[Recommended books|here]].<br />
<br />
Some resources for learning Python can be found [[Python resources|here]].<br />
<br />
Some MATLAB resources can be found [[MATLAB resources|here]].<br />
<br />
===Webcast Lecture Segments <i>(Opinionated Lessons in Statistics)</i>===<br />
All of the lectures are in the form of webcasts, divided into segments of about 15-30 minutes each (occasionally a bit longer). Each segment has its own wiki page, linked below. You can view the lecture on its wiki page, which also has additional material about the segment (including the <b>skill and thought problems</b>), or you can click through directly to YouTube, where the segments are all on Bill's [http://www.youtube.com/user/opinionatedlessons/videos?view=0&flow=list&sort=da "Opinionated Lessons" channel].<br />
<br />
<center><br />
{| class="wikitable"<br />
|+Watch segments BEFORE class on the indicated dates:<br />
|-<br />
|Mon Jan 13<br />
|<b>First Day of Class</b> (no segment due)<br />
|-<br />
|Wed Jan 15<br />
|[[Segment 1. Let's Talk about Probability]] (or [http://www.youtube.com/watch?v=H5WjVgL6Nh4 YouTube])<br />
|-<br />
|Fri Jan 17<br />
|[[Python Set-up Tutorial and Workshop]] (no segment due)<br />
|-<br />
|Mon Jan 20<br />
|<b>Martin Luther King Day HOLIDAY</b> (no segment due)<br />
|-<br />
|Wed Jan 22<br />
|[[Segment 2. Bayes]] (or [http://www.youtube.com/watch?v=FROAk4AFKHk YouTube])<br />
|-<br />
|Fri Jan 24<br />
|[[Segment 3. Monty Hall]] (or [http://www.youtube.com/watch?v=Rxb8JG8nUFA YouTube])<br />
|-<br />
|Mon Jan 27<br />
|[[Segment 4. The Jailer's Tip]] (or [http://www.youtube.com/watch?v=425D0CjLLLs YouTube])<br />
|-<br />
|Wed Jan 29<br />
|[[Segment 5. Bernoulli Trials]] (or [http://www.youtube.com/watch?v=2T3KP2LleFg YouTube])<br />
|-<br />
|Fri Jan 31<br />
|[[Segment 6. The Towne Family Tree]] (or [http://www.youtube.com/watch?v=y_L2THpv5Jg YouTube])<br />
|-<br />
|Mon Feb 3<br />
|[[Segment 7. Central Tendency and Moments]] (or [http://www.youtube.com/watch?v=ZWOmsKWQ7Fw YouTube])<br />
|-<br />
|Wed Feb 5<br />
|[[Segment 8. Some Standard Distributions]] (or [http://www.youtube.com/watch?v=EDYDC7iNGTg YouTube])<br />
|-<br />
|Fri Feb 7<br />
|[[Segment 9. Characteristic Functions]] (or [http://www.youtube.com/watch?v=NJL-BX6HuxY YouTube])<br />
|-<br />
|Mon Feb 10<br />
|[[Segment 10. The Central Limit Theorem]] (or [http://www.youtube.com/watch?v=IpuYGsKplSw YouTube])<br />
|-<br />
|Wed Feb 12<br />
|[[Segment 11. Random Deviates]] (or [http://www.youtube.com/watch?v=4r1GlyisB8E YouTube])<br />
|-<br />
|Fri Feb 14<br />
|[[Segment 12. P-Value Tests]] (or [http://www.youtube.com/watch?v=2Ul7TI0B5ek YouTube])<br />
|-<br />
|Mon Feb 17<br />
|[[Segment 13. The Yeast Genome]] (or [http://www.youtube.com/watch?v=QSgUX-Do8Tc YouTube])<br />
|-<br />
|Wed Feb 19<br />
|[[Segment 14. Bayesian Criticism of P-Values]] (or [http://www.youtube.com/watch?v=IKV6Pn18C7o YouTube])<br />
|-<br />
|Fri Feb 21<br />
|[[Segment 16. Multiple Hypotheses]] (or [http://www.youtube.com/watch?v=w6AjduOEN2k YouTube]) [note order!]<br />
|-<br />
|Mon Feb 24<br />
|[[Segment 15. The Towne Family - Again]] (or [http://www.youtube.com/watch?v=Y-i0CN15X-M YouTube]) [note order!]<br />
|-<br />
|Wed Feb 26<br />
|[[Segment 17. The Multivariate Normal Distribution]] (or [http://www.youtube.com/watch?v=t7Z1a_BOkN4 YouTube])<br />
|-<br />
|Fri Feb 28<br />
|[[Review Session for Mid-Term Exam]] (no new segment due)<br />
|-<br />
|}<br />
<br />
<b>Monday, March 3. MIDTERM EXAM</b><br />
<br />
[[Media:Midterm.pdf|(Exam)]]&nbsp;&nbsp;&nbsp;[[Media:MidtermSolutions2014.pdf|(Bill's solutions)]] &nbsp;&nbsp;&nbsp;[[Media:MidtermHistogram.pdf|(Histogram of grades)]]<br />
<br />
{| class="wikitable"<br />
|Wed Mar 5<br />
|[[Segment 18. The Correlation Matrix]] (or [http://www.youtube.com/watch?v=aW5q_P0it9E YouTube])<br />
|-<br />
|Fri Mar 7<br />
|[[Segment 19. The Chi Square Statistic]] (or [http://www.youtube.com/watch?v=87EMhmPkOhk YouTube])<br />
|-<br />
|}<br />
<br />
<b>Monday, March 10 through Friday, March 14: SPRING BREAK</b><br />
<br />
{| class="wikitable"<br />
|+Watch segments BEFORE class on the indicated dates:<br />
|Mon Mar 17<br />
|[[Segment 20. Nonlinear Least Squares Fitting]] (or [http://www.youtube.com/watch?v=xtBCGPHRcb0 YouTube])<br />
|-<br />
|Wed Mar 19<br />
|[[Segment 21. Marginalize or Condition Uninteresting Fitted Parameters]] (or [http://www.youtube.com/watch?v=yxZUS_BpEZk YouTube])<br />
|-<br />
|Fri Mar 21<br />
|[[Segment 22. Uncertainty of Derived Parameters]] (or [http://www.youtube.com/watch?v=ZoD3_rov--w YouTube])<br />
|-<br />
|Mon Mar 24<br />
|[[Segment 23. Bootstrap Estimation of Uncertainty]] (or [http://www.youtube.com/watch?v=1OC9ul-1PVg YouTube])<br />
|-<br />
|Wed Mar 26<br />
|[[Segment 24. Goodness of Fit]] (or [http://www.youtube.com/watch?v=EJleSVf0Z-U YouTube])<br />
|-<br />
|Fri Mar 28<br />
|[[Segment 27. Mixture Models]] (or [http://www.youtube.com/watch?v=9pWnZcpYh44 YouTube])<br />
|-<br />
|Mon Mar 31<br />
|[[Segment 28. Gaussian Mixture Models in 1-D]] (or [http://www.youtube.com/watch?v=n7u_tq0I6jM YouTube])<br />
|-<br />
|Wed Apr 2<br />
|[[Segment 29. GMMs in N-Dimensions]] (or [http://www.youtube.com/watch?v=PH8_qqDTCYY YouTube])<br />
|-<br />
|Fri Apr 4<br />
|[[Segment 30. Expectation Maximization (EM) Methods]] (or [http://www.youtube.com/watch?v=StQOzRqTNsw YouTube])<br />
|-<br />
|Mon Apr 7<br />
|[[Segment 31. A Tale of Model Selection]] (or [http://www.youtube.com/watch?v=_G1gzqQzbuM YouTube])<br />
|-<br />
|Wed Apr 9<br />
|[[Segment 32. Contingency Tables: A First Look]] (or [http://www.youtube.com/watch?v=NvCdN2RFufY YouTube])<br />
|-<br />
|Fri Apr 11<br />
|[[Segment 33. Contingency Table Protocols and Exact Fisher Test]] (or [http://www.youtube.com/watch?v=9Qrkw5UfAmQ YouTube])<br />
|-<br />
|Mon Apr 14<br />
|[[Segment 34. Permutation Tests]] (or [http://www.youtube.com/watch?v=_4BUS1NGNHA YouTube])<br />
|-<br />
|Wed Apr 16<br />
|[[Segment 37. A Few Bits of Information Theory]] (or [http://www.youtube.com/watch?v=ktzYOLDN3u4 YouTube])<br />
|-<br />
|Fri Apr 18<br />
|[[Segment 38. Mutual Information]] (or [http://www.youtube.com/watch?v=huNPh1mkJHM YouTube])<br />
|-<br />
|Mon Apr 21<br />
|[[Segment 39. MCMC and Gibbs Sampling ]] (or [http://www.youtube.com/watch?v=4gNpgSPal_8 YouTube])<br />
|-<br />
|Wed Apr 23<br />
|[[Segment 40. Markov Chain Monte Carlo, Example 1 ]] (or [http://www.youtube.com/watch?v=nSKZ02ZWzsY YouTube])<br />
|-<br />
|Fri Apr 25<br />
|[[Segment 41. Markov Chain Monte Carlo, Example 2 ]] (or [http://www.youtube.com/watch?v=FnNckBLWJ24 YouTube])<br />
|-<br />
|Mon Apr 28<br />
|[[Segment 47. Low-Rank Approximation of Data ]] (or [http://www.youtube.com/watch?v=M0gsHNS_5FE YouTube])<br />
|-<br />
|Wed Apr 30<br />
|[[Segment 48. Principal Component Analysis (PCA)]] (or [http://www.youtube.com/watch?v=frWqIUpIxLg YouTube])<br />
|-<br />
|Fri May 2<br />
| <b>Review Session for Oral Exams</b><br />
|}<br />
<br />
<b>Monday, May 5 and Tuesday, May 6: ORAL FINAL EXAMS</b><br />
<br />
</center><br />
<br />
===Extra Credit Segments (segment number indicates intended sequence)===<br />
[[Segment 25. Fitting Models to Counts]] (or [http://www.youtube.com/watch?v=YXaq2PVCGZQ YouTube])<br><br />
[[Segment 26. The Poisson Count Pitfall]] (or [http://www.youtube.com/watch?v=rPO3N5GI-3I YouTube])<br><br />
[[Segment 35. Ordinal vs. Nominal Contingency Tables]] (or [http://www.youtube.com/watch?v=fYUbj78aguk YouTube])<br><br />
[[Segment 36. Contingency Tables Have Nuisance Parameters]] (or [http://www.youtube.com/watch?v=bHK79WKOX-Y YouTube])<br><br />
[[Segment 49. Eigenthingies and Main Effects]] (or [http://www.youtube.com/watch?v=LpGQnvvGLMQ YouTube])<br><br />
<br />
===Segments with Slides But Not Yet Recorded===<br />
(links are to PDF files)<br />
<br />
[http://wpressutexas.net/coursefiles/15.5.PoissonProcessesOrderStatistics.pdf Segment 15.5. Poisson Processes and Order Statistics]<br><br />
[http://wpressutexas.net/coursefiles/42.WienerFiltering.pdf Segment 42. Wiener Filtering]<br><br />
[http://wpressutexas.net/coursefiles/43.TheIRELady.pdf Segment 43. The IRE Lady]<br><br />
[http://wpressutexas.net/coursefiles/44.Wavelets.pdf Segment 44. Wavelets]<br><br />
[http://wpressutexas.net/coursefiles/45.LaplaceInterpolation.pdf Segment 45. Laplace Interpolation]<br><br />
[http://wpressutexas.net/coursefiles/46.InterpolationOnScatteredData.pdf Segment 46. Interpolation On Scattered Data]<br><br />
[http://wpressutexas.net/coursefiles/50.BinaryClassifiers.pdf Segment 50. Binary Classifiers]<br><br />
[http://wpressutexas.net/coursefiles/51.HierarchicalClassification.pdf Segment 51. Hierarchical Classification]<br><br />
[http://wpressutexas.net/coursefiles/52.DynamicProgramming.pdf Segment 52. Dynamic Programming]<br><br />
<br />
===Team Randomizer===<br />
Link to [http://wpressutexas.net/coursefiles/teamrandomizer.php the team randomizer]</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=MATLAB_resources&diff=3444MATLAB resources2016-04-22T19:06:16Z<p>Bill Press: URL fix</p>
<hr />
<div>Here are some resources to help you come up to speed in MATLAB.<br />
<br />
===Tutorials===<br />
<br />
A [http://numrec.com/CS395T/MatlabPrimerForCS395T.pdf primer written a few years ago for this course] is a good place to start. This begins with simple concepts, then moves on to some more advanced stuff that I'll use in lectures. However, it is by no means complete.<br />
<br />
MathWorks' own [http://www.mathworks.com/help/matlab/index.html?/access/helpdesk/help/techdoc/learn_matlab/bqr_2pl.html= getting started documentation] is also not a bad place to start. It has some videos, if you like that kind of thing.<br />
<br />
The University of Cambridge Engineering Department has a good [http://www-h.eng.cam.ac.uk/help/tpl/programs/matlab.html list of tutorial links]. Especially nice, once you understand the basics, is their own [http://www-h.eng.cam.ac.uk/help/tpl/programs/Matlab/tricks.html tutorial on vectorization tricks].<br />
<br />
The acknowledged world expert on MATLAB vectorization tricks is Peter J. Acklam of the University of Oslo. Links are on his [http://home.online.no/~pjacklam/matlab/doc/mtt/index.html MATLAB page]. His 30-page [http://home.online.no/~pjacklam/matlab/doc/mtt/doc/archive/2000-05-05/mtt.pdf.gz 2000 tutorial] is quick to read, while his 63-page [http://numrec.com/CS395T/AcklamMatlabTricks.pdf 2003 version] is more complete.<br />
<br />
If you are using MATLAB on a Windows machine, then "live" notebooks within Microsoft Word ("M-books") are pretty cool. The above-listed "primer written for this course" was done as an M-book, then printed to a PDF file.<br />
<br />
Finally, you might want to interface MATLAB to Numerical Recipes (or any other C++ programs), both for versatility and (sometimes) hugely increased speed. A [http://www.numrec.com/nr3_matlab.html complete tutorial on this] is on the NR web site.<br />
<br />
===Cheat Sheets===<br />
<br />
Here are a couple of MATLAB cheat sheets, useful for someone new to the language who wants to look up basic commands:<br />
<br />
[http://www.karenkopecky.net/Teaching/eco613614/Matlab%20Resources/MatlabCheatSheet.pdf Basic commands cheat sheet]<br />
<br />
For people familiar with either MATLAB or Python and interested in learning the other, here are some lists of equivalent commands between the two: <br />
<br />
[http://mathesaurus.sourceforge.net/matlab-numpy.html Matlab/Octave and Python]<br />
<br />
[http://wiki.scipy.org/NumPy_for_Matlab_Users NumPy for Matlab Users]</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_13._The_Yeast_Genome&diff=3442Segment 13. The Yeast Genome2016-04-22T19:01:49Z<p>Bill Press: URL fix</p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/QSgUX-Do8Tc&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/QSgUX-Do8Tc http://youtu.be/QSgUX-Do8Tc]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/13.TheYeastGenome.pdf PDF file] or [http://wpressutexas.net/coursefiles/13.TheYeastGenome.ppt PowerPoint file]<br />
<br />
Link to the file mentioned in the segment: [http://wpressutexas.net/coursefiles/SacCerChr4.txt.zip SacCerChr4.txt]<br />
<br />
Link to all yeast chromosomes: [http://hgdownload.cse.ucsc.edu/goldenPath/sacCer3/chromosomes/ UCSC]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. With <math>p=0.3</math> and various values of <math>n</math>, how big is the largest discrepancy between the Binomial probability pdf and the approximating Normal pdf? At what value of <math>n</math> does this discrepancy first become smaller than <math>10^{-15}</math>?<br />
<br />
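A numerical experiment for problem 1 can be set up in a few lines of Python (a sketch using only the standard library; it explores the discrepancy for a few values of <math>n</math> but is not a complete solution):

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def max_discrepancy(n, p=0.3):
    """Largest |Binomial pmf - approximating Normal pdf| over k = 0..n."""
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    return max(abs(binom_pmf(k, n, p) - normal_pdf(k, mu, sigma))
               for k in range(n + 1))

for n in (10, 100, 1000):
    print(n, max_discrepancy(n))
```

Watching how fast the printed discrepancy shrinks as <math>n</math> grows suggests what kind of <math>n</math> the second part of the problem is really asking about.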
2. Show that if four random variables are (together) multinomially distributed, each separately is binomially distributed.<br />
<br />
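Problem 2 asks for a proof, but a small exact computation can build confidence in the claim first. Below is a sketch that marginalizes a four-category multinomial by brute-force summation and compares against the binomial pmf; the particular probabilities and <math>n</math> are arbitrary choices for illustration.

```python
import math
from itertools import product

def multinomial_pmf(counts, probs):
    """Multinomial probability of the given category counts."""
    n = sum(counts)
    coef = math.factorial(n)
    for c in counts:
        coef //= math.factorial(c)  # multinomial coefficient, exact integer
    return coef * math.prod(p**c for p, c in zip(probs, counts))

def marginal_of_first(k, n, probs):
    """P(first count = k): sum over all ways the other three
    categories can share the remaining n - k counts."""
    total = 0.0
    for c2, c3 in product(range(n - k + 1), repeat=2):
        c4 = n - k - c2 - c3
        if c4 >= 0:
            total += multinomial_pmf((k, c2, c3, c4), probs)
    return total

probs = (0.1, 0.2, 0.3, 0.4)  # arbitrary probabilities summing to 1
n = 6
for k in range(n + 1):
    binom = math.comb(n, k) * probs[0]**k * (1 - probs[0])**(n - k)
    print(k, marginal_of_first(k, n, probs), binom)
```

The two printed columns agree to machine precision, which is exactly what the requested proof should explain.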
====To Think About====<br />
1. The segment suggests that <math>A\ne T</math> and <math>C\ne G</math> come about because genes are randomly distributed on one strand or the other. Could you use the observed discrepancies to estimate, even roughly, the number of genes in the yeast genome? If so, how? If not, why not?<br />
<br />
2. Suppose that a Bayesian thinks that the prior probability of the hypothesis that "<math>P_A=P_T</math>" is 0.9,<br />
and that the set of all hypotheses that "<math>P_A\ne P_T</math>" have a total prior of 0.1. How might he calculate the odds ratio <math>\text{Prob}(P_A=P_T)/\text{Prob}(P_A\ne P_T)</math>? Hint: Are there nuisance variables to be marginalized over?<br />
<br />
===Class Activity===<br />
<br />
[http://wpressutexas.net/coursefiles/chrIV.txt Yeast chromosome 4]<br />
<br />
[http://wpressutexas.net/coursefiles/yeast_ORFs Activity slides]</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_1._Let%27s_Talk_about_Probability&diff=3441Segment 1. Let's Talk about Probability2016-04-22T19:00:50Z<p>Bill Press: URL fix</p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see out-of-focus below is not the beginning of the segment. Press the play button to start at the beginning and in-focus.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/H5WjVgL6Nh4&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/H5WjVgL6Nh4 http://youtu.be/H5WjVgL6Nh4]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/1.LetsTalkAboutProbability.pdf PDF file] or [http://wpressutexas.net/coursefiles/1.LetsTalkAboutProbability.ppt PowerPoint file]<br />
<br />
====Bill's comments on this segment====<br />
Well, I do sound nervous! This was one of my first webcasts. The production values get a little better with later segments. However, the material here is important, so be sure you understand it before going on.<br />
<br />
Here is a [http://wpressutexas.net/coursefiles/CoxPaper1946.pdf link to the paper by R.T. Cox], discussed on slide 2. It's surprisingly readable for something so fundamental.<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. Prove that <math>P(ABC) = P(B)P(C|B)P(A|BC)</math>.<br />
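<br />
A sketch of one route to problem 1, using only the definition of conditional probability <math>P(X|Y) = P(XY)/P(Y)</math> (with all conditioning events of nonzero probability):<br />
<math>P(B)\,P(C|B)\,P(A|BC) = P(B)\cdot\frac{P(BC)}{P(B)}\cdot\frac{P(ABC)}{P(BC)} = P(ABC)</math><br />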
<br />
2. What is the probability that the sum of two dice is odd with neither being a 4?<br />
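<br />
Problem 2 is small enough to check by brute-force enumeration (a sketch; the variable names are ours):<br />

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely ordered outcomes of two fair dice.
favorable = sum(1 for a, b in product(range(1, 7), repeat=2)
                if (a + b) % 2 == 1 and a != 4 and b != 4)
prob = Fraction(favorable, 36)
print(prob)  # 1/3
```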
<br />
====To Think About====<br />
1. [http://en.wikipedia.org/wiki/First-order_logic First-order logic] is a type of propositional calculus with propositions <math>a,b,c</math> and quantifier symbols <math>\forall</math> and <math>\exists</math>. This allows statements like "Socrates is a philosopher", "Socrates is a man", "There exists a philosopher who is not a man", etc. Can you use first-order logic as a calculus of inference? Is it the same as using the probability axioms? If not, then which of Cox's suppositions is violated?<br />
<br />
2. You are an oracle that, when asked, says "yes" with probability <math>P</math> and "no" with probability <math>1-P</math>. How do you do this using only a fair, two-sided coin?<br />
As we did in class: represent P as a binary number. Flip the coin to generate one random binary digit at a time, comparing each flip to the corresponding digit of P. At the first flip that differs, stop: answer "yes" if the flipped digit is smaller than P's digit, "no" if it is larger.<br />
<br />
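The binary-digit idea sketched above can be coded directly (a sketch; <math>p\_bits</math> is our name for the digits of P after the binary point):<br />

```python
import random

def oracle(p_bits):
    """Say "yes" with probability P = 0.b1 b2 b3 ... (binary), using fair coin flips.
    p_bits is the sequence of binary digits of P after the binary point."""
    for bit in p_bits:
        flip = random.randint(0, 1)          # one fair coin flip
        if flip != bit:
            return "yes" if flip < bit else "no"
    return "no"                              # every flip matched P's (finite) expansion
```

Only the digits up to the first mismatch are ever examined, so on average just two flips are needed, whatever the value of P.<br />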
<br />
3. For the trout/minnow problem, what if you want to know the probability that the Nth fish caught is a trout, for N=1,2,3,... What is an efficient way to set up this calculation? (Hint: If you ever learned the word "Markov", this might be a good time to remember it!)<br />
<br />
====Class Activity====<br />
[[Media:ActivityWedJan15.pdf]]</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_40._Markov_Chain_Monte_Carlo,_Example_1&diff=3440Segment 40. Markov Chain Monte Carlo, Example 12016-04-22T19:00:26Z<p>Bill Press: URL fix</p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/nSKZ02ZWzsY&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/nSKZ02ZWzsY http://youtu.be/nSKZ02ZWzsY]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/40.MCMCexample1.pdf PDF file] or [http://wpressutexas.net/coursefiles/40.MCMCexample1.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. The file [http://wpressutexas.net/coursefiles/Twoexondata.txt Twoexondata.txt] has 3000 pairs of (first, second) exon lengths. Choose 600 of the first exon lengths at random. Then, in your favorite programming language, repeat the calculation shown in the segment to model the chosen first exon lengths as a mixture of two Student distributions. That is (see slide 2): "6 parameters: two centers, two widths, ratio of peak heights, and Student t index." After running your Markov chain, plot the posterior distribution of the ratio of areas of the two Student components, as in slide 6.<br />
<br />
2. Make a histogram of the 2nd exon lengths. Do they seem to require two separate components? If so, repeat the calculations of problem 1. If not, use MCMC to explore the posterior of a model with a single Student component. Plot the posterior distribution of the Student parameter <math>\nu</math>.<br />
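<br />
The chains in problems 1 and 2 can be built on a generic random-walk Metropolis skeleton like the following (a sketch, not the segment's code; the six-parameter Student-t mixture log-posterior would be passed in as logpost):<br />

```python
import math
import random

def metropolis(logpost, x0, step, n_steps, seed=0):
    """Random-walk Metropolis: Gaussian proposals, Metropolis acceptance rule."""
    rng = random.Random(seed)
    x, lp = list(x0), logpost(x0)
    chain = []
    for _ in range(n_steps):
        prop = [xi + rng.gauss(0.0, s) for xi, s in zip(x, step)]
        lp_prop = logpost(prop)
        # accept with probability min(1, exp(lp_prop - lp))
        if math.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        chain.append(list(x))
    return chain
```

Histogramming one coordinate of the chain (after discarding burn-in) gives the marginal posterior of that parameter, as in slide 6.<br />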
<br />
====To Think About====<br />
1. As a Bayesian, how would you decide whether, in problem 2 above, you need one vs. two components? What about 7 components? What about 200? Can you think of a way to enforce model simplicity?<br />
<br />
2. After you have given a good "textbook" answer to the preceding problem, think harder about whether this can really work for large data sets. The problem is that even tiny differences in log-likelihood <i>per data point</i> become huge log-odds differences when the number of data points is large. So, given the opportunity, models are almost always driven to high complexity. What do you think that practical Bayesians actually do about this?<br />
<br />
====Activity====<br />
[[Urns with MCMC]]</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Main_Page&diff=3439Main Page2016-04-22T18:54:40Z<p>Bill Press: </p>
<hr />
<div>__NOTOC__<br />
=Statistical and Discrete Methods for Scientific Computing=<br />
<br />
==='''Oral Exam schedule for next Monday and Tuesday is [[Exam Sign-Up (2014)|here]]. All exams are in Bill's office, ACE (or POB) 3.258.'''===<br />
<br />
The [[2014 Concepts Study Page]] lists all the questions that may be asked in the oral exam.<br />
<br />
NEW: Post questions [[Exam Study Session Questions (2014)|here]] before the Friday review session.<br />
<br />
You can generate your own practice exam at the [http://wpressutexas.net/examgenerator.php 2014 Practice Exam Machine].<br />
<br />
===CSE383M (65280) and CS395T (53715), Spring 2014===<br />
Welcome to the course! The instructor is Professor William Press (Bill), and the TA is Jeff Hussmann (Jeff). We meet Mondays and Wednesdays, 1:30 - 3:00 p.m. in CBA 4.344 with Bill, and Fridays, 1:30 - 3:00 p.m. in CBA 4.348 with Jeff. The course is aimed at first or second year graduate students, especially in the CSEM, CS, and ECE programs, but others are welcome. You'll need math at the level of <i>at least</i> 2nd year calculus, plus linear algebra, plus either more continuous math (e.g., CSEM students) or more discrete math (e.g., CS and ECE students). You'll also need to be able to program in some known computer language.<br />
<br />
===Mechanics of the Course===<br />
The last two years, we have tried the experiment of a "flipped" course. This has worked so well that we are doing this again this year. "Flipped" means that the lectures are all on the web as recorded webcasts. You <b>must</b> watch the assigned webcasts <b>before</b> the class for which they are scheduled; maybe watch them more than once if there are parts that you don't easily understand. Then, you will be ready for the active learning that we do in class. The class activities will <b>not</b> "cover the material". Rather, class is supposed to be for "aha moments" and for "fixing" the material in your learning memory. We'll thus do various kinds of "active learning" activities that will test and improve your understanding of the material in the lecture. Such in-class activities, often done in <i>randomized</i> groups of two or three, may include<br />
* group computer programming exercises<br />
* group working of problems<br />
* group writing assignments<br />
* discussing concepts (and communicating ideas back to the whole class)<br />
* "quiz show" style activities<br />
* short surprise quizzes (generally at the beginning of class -- no makeups allowed)<br />
* whatever else we all think of<br />
<br />
===Problems for Each Segment===<br />
Every lecture segment home page has one or two relatively easy "skill" problems. You should work these after watching the segment, before class. (You might be asked to discuss your solution with your small group in class.) Also on the segment's page are one or two concept thought problems. One or another of these will sometimes be the basis of in-class activities, so you might want to think about them before class.<br />
<br />
===Student Wiki Pages===<br />
Every student will have a wiki page (and as many linked pages as you want). You can post your solutions to as many problems as you wish to your wiki page. You can do this either before the relevant class or afterwards. You can also make up, and solve, additional problems. Problems won't be individually graded. However, at the end of the course, the completeness and quality of your wiki page(s) will be a part of your course grade. Your wiki page can include discussion of the thought problems, as well as the skill problems. <br />
<br />
You can also post signed comments on any other student's wiki pages. To the extent that these are generally helpful, they will add credit to your reputation and for your grade.<br />
<br />
[[Student Pages]]<br />
<br />
===Laptops or Tablets===<br />
You <b>must</b> bring your laptop computer or full-sized tablet to every class, so that you can (i) look things up during group discussions or problem sessions and (ii) do in-class programming exercises. You can program in any language you want. For Python, which we recommend as the best choice for this course, you can either install it on your machine, or else use the IPython notebook server described in class. The course will include several lectures of Python workshop by Jeff. <br />
<br />
If you instead want to use MATLAB or Mathematica, that is fine, but please be sure that it is installed on your computer before the first class. (The MATLAB Student Edition is a real bargain.) For C, C++, Java, etc., please be sure that you have a fully working environment for compiling and running small pieces of code.<br />
<br />
===Course Requirements and Grading===<br />
Grades will be based on these factors<br />
* in-class attendance and participation<br />
* an in-class midterm exam<br />
* completeness and quality of your individual wiki page(s)<br />
* relevance and usefulness of your comments on other people's wiki pages (or on the main wiki)<br />
* an individual 30-minute final oral exam<br />
<br />
In previous years there was a term project, but not this year. Your working the problems and posting solutions on your wiki page is this year's substitute.<br />
<br />
[[File:learning_cone.gif|200px|thumb|right|Click image to see a legible version.]]<br />
<br />
===What is Active Learning?===<br />
Much research shows that lecture courses, where students listen passively as the instructor talks, are inefficient ways to learn. What works is so-called [http://en.wikipedia.org/wiki/Active_learning active learning], a broad term that, for us, basically means that class time is too valuable to waste on lectures. (See image at right.)<br />
<br />
The lectures are all recorded as webcasts, but webcasts are not active learning. However, they are a starting point as a "linear" introduction to the material. <br />
<br />
===Feedback===<br />
<br />
What has worked well in class so far? What hasn't worked? How could things be improved? Please leave [[Feedback 2014|feedback]].<br />
<br />
===Resources and Links===<br />
<br />
There is no textbook for the course. A list of recommended supplementary books is [[Recommended books|here]].<br />
<br />
Some resources for learning Python can be found [[Python resources|here]].<br />
<br />
Some MATLAB resources can be found [[MATLAB resources|here]].<br />
<br />
===Webcast Lecture Segments <i>(Opinionated Lessons in Statistics)</i>===<br />
All of the lectures are in the form of webcasts, divided into segments of about 15-30 minutes each (occasionally a bit longer). Each segment has a wiki page; links are below. You can view the lecture on its wiki page, which also has additional stuff about the segment (including the <b>skill and thought problems</b>), or by clicking directly to YouTube, where they are all on Bill's [http://www.youtube.com/user/opinionatedlessons/videos?view=0&flow=list&sort=da "Opinionated Lessons" channel].<br />
<br />
<center><br />
{| class="wikitable"<br />
|+Watch segments BEFORE class on the indicated dates:<br />
|-<br />
|Mon Jan 13<br />
|<b>First Day of Class</b> (no segment due)<br />
|-<br />
|Wed Jan 15<br />
|[[Segment 1. Let's Talk about Probability]] (or [http://www.youtube.com/watch?v=H5WjVgL6Nh4 YouTube])<br />
|-<br />
|Fri Jan 17<br />
|[[Python Set-up Tutorial and Workshop]] (no segment due)<br />
|-<br />
|Mon Jan 20<br />
|<b>Martin Luther King Day HOLIDAY</b> (no segment due)<br />
|-<br />
|Wed Jan 22<br />
|[[Segment 2. Bayes]] (or [http://www.youtube.com/watch?v=FROAk4AFKHk YouTube])<br />
|-<br />
|Fri Jan 24<br />
|[[Segment 3. Monty Hall]] (or [http://www.youtube.com/watch?v=Rxb8JG8nUFA YouTube])<br />
|-<br />
|Mon Jan 27<br />
|[[Segment 4. The Jailer's Tip]] (or [http://www.youtube.com/watch?v=425D0CjLLLs YouTube])<br />
|-<br />
|Wed Jan 29<br />
|[[Segment 5. Bernoulli Trials]] (or [http://www.youtube.com/watch?v=2T3KP2LleFg YouTube])<br />
|-<br />
|Fri Jan 31<br />
|[[Segment 6. The Towne Family Tree]] (or [http://www.youtube.com/watch?v=y_L2THpv5Jg YouTube])<br />
|-<br />
|Mon Feb 3<br />
|[[Segment 7. Central Tendency and Moments]] (or [http://www.youtube.com/watch?v=ZWOmsKWQ7Fw YouTube])<br />
|-<br />
|Wed Feb 5<br />
|[[Segment 8. Some Standard Distributions]] (or [http://www.youtube.com/watch?v=EDYDC7iNGTg YouTube])<br />
|-<br />
|Fri Feb 7<br />
|[[Segment 9. Characteristic Functions]] (or [http://www.youtube.com/watch?v=NJL-BX6HuxY YouTube])<br />
|-<br />
|Mon Feb 10<br />
|[[Segment 10. The Central Limit Theorem]] (or [http://www.youtube.com/watch?v=IpuYGsKplSw YouTube])<br />
|-<br />
|Wed Feb 12<br />
|[[Segment 11. Random Deviates]] (or [http://www.youtube.com/watch?v=4r1GlyisB8E YouTube])<br />
|-<br />
|Fri Feb 14<br />
|[[Segment 12. P-Value Tests]] (or [http://www.youtube.com/watch?v=2Ul7TI0B5ek YouTube])<br />
|-<br />
|Mon Feb 17<br />
|[[Segment 13. The Yeast Genome]] (or [http://www.youtube.com/watch?v=QSgUX-Do8Tc YouTube])<br />
|-<br />
|Wed Feb 19<br />
|[[Segment 14. Bayesian Criticism of P-Values]] (or [http://www.youtube.com/watch?v=IKV6Pn18C7o YouTube])<br />
|-<br />
|Fri Feb 21<br />
|[[Segment 16. Multiple Hypotheses]] (or [http://www.youtube.com/watch?v=w6AjduOEN2k YouTube]) [note order!]<br />
|-<br />
|Mon Feb 24<br />
|[[Segment 15. The Towne Family - Again]] (or [http://www.youtube.com/watch?v=Y-i0CN15X-M YouTube]) [note order!]<br />
|-<br />
|Wed Feb 26<br />
|[[Segment 17. The Multivariate Normal Distribution]] (or [http://www.youtube.com/watch?v=t7Z1a_BOkN4 YouTube])<br />
|-<br />
|Fri Feb 28<br />
|[[Review Session for Mid-Term Exam]] (no new segment due)<br />
|-<br />
|}<br />
<br />
<b>Monday, March 3. MIDTERM EXAM</b><br />
<br />
[[Media:Midterm.pdf|(Exam)]]&nbsp;&nbsp;&nbsp;[[Media:MidtermSolutions2014.pdf|(Bill's solutions)]] &nbsp;&nbsp;&nbsp;[[Media:MidtermHistogram.pdf|(Histogram of grades)]]<br />
<br />
{| class="wikitable"<br />
|Wed Mar 5<br />
|[[Segment 18. The Correlation Matrix]] (or [http://www.youtube.com/watch?v=aW5q_P0it9E YouTube])<br />
|-<br />
|Fri Mar 7<br />
|[[Segment 19. The Chi Square Statistic]] (or [http://www.youtube.com/watch?v=87EMhmPkOhk YouTube])<br />
|-<br />
|}<br />
<br />
<b>Monday, March 10 through Friday, March 14: SPRING BREAK</b><br />
<br />
{| class="wikitable"<br />
|+Watch segments BEFORE class on the indicated dates:<br />
|Mon Mar 17<br />
|[[Segment 20. Nonlinear Least Squares Fitting]] (or [http://www.youtube.com/watch?v=xtBCGPHRcb0 YouTube])<br />
|-<br />
|Wed Mar 19<br />
|[[Segment 21. Marginalize or Condition Uninteresting Fitted Parameters]] (or [http://www.youtube.com/watch?v=yxZUS_BpEZk YouTube])<br />
|-<br />
|Fri Mar 21<br />
|[[Segment 22. Uncertainty of Derived Parameters]] (or [http://www.youtube.com/watch?v=ZoD3_rov--w YouTube])<br />
|-<br />
|Mon Mar 24<br />
|[[Segment 23. Bootstrap Estimation of Uncertainty]] (or [http://www.youtube.com/watch?v=1OC9ul-1PVg YouTube])<br />
|-<br />
|Wed Mar 26<br />
|[[Segment 24. Goodness of Fit]] (or [http://www.youtube.com/watch?v=EJleSVf0Z-U YouTube])<br />
|-<br />
|Fri Mar 28<br />
|[[Segment 27. Mixture Models]] (or [http://www.youtube.com/watch?v=9pWnZcpYh44 YouTube])<br />
|-<br />
|Mon Mar 31<br />
|[[Segment 28. Gaussian Mixture Models in 1-D]] (or [http://www.youtube.com/watch?v=n7u_tq0I6jM YouTube])<br />
|-<br />
|Wed Apr 2<br />
|[[Segment 29. GMMs in N-Dimensions]] (or [http://www.youtube.com/watch?v=PH8_qqDTCYY YouTube])<br />
|-<br />
|Fri Apr 4<br />
|[[Segment 30. Expectation Maximization (EM) Methods]] (or [http://www.youtube.com/watch?v=StQOzRqTNsw YouTube])<br />
|-<br />
|Mon Apr 7<br />
|[[Segment 31. A Tale of Model Selection]] (or [http://www.youtube.com/watch?v=_G1gzqQzbuM YouTube])<br />
|-<br />
|Wed Apr 9<br />
|[[Segment 32. Contingency Tables: A First Look]] (or [http://www.youtube.com/watch?v=NvCdN2RFufY YouTube])<br />
|-<br />
|Fri Apr 11<br />
|[[Segment 33. Contingency Table Protocols and Exact Fisher Test]] (or [http://www.youtube.com/watch?v=9Qrkw5UfAmQ YouTube])<br />
|-<br />
|Mon Apr 14<br />
|[[Segment 34. Permutation Tests]] (or [http://www.youtube.com/watch?v=_4BUS1NGNHA YouTube])<br />
|-<br />
|Wed Apr 16<br />
|[[Segment 37. A Few Bits of Information Theory]] (or [http://www.youtube.com/watch?v=ktzYOLDN3u4 YouTube])<br />
|-<br />
|Fri Apr 18<br />
|[[Segment 38. Mutual Information]] (or [http://www.youtube.com/watch?v=huNPh1mkJHM YouTube])<br />
|-<br />
|Mon Apr 21<br />
|[[Segment 39. MCMC and Gibbs Sampling ]] (or [http://www.youtube.com/watch?v=4gNpgSPal_8 YouTube])<br />
|-<br />
|Wed Apr 23<br />
|[[Segment 40. Markov Chain Monte Carlo, Example 1 ]] (or [http://www.youtube.com/watch?v=nSKZ02ZWzsY YouTube])<br />
|-<br />
|Fri Apr 25<br />
|[[Segment 41. Markov Chain Monte Carlo, Example 2 ]] (or [http://www.youtube.com/watch?v=FnNckBLWJ24 YouTube])<br />
|-<br />
|Mon Apr 28<br />
|[[Segment 47. Low-Rank Approximation of Data ]] (or [http://www.youtube.com/watch?v=M0gsHNS_5FE YouTube])<br />
|-<br />
|Wed Apr 30<br />
|[[Segment 48. Principal Component Analysis (PCA)]] (or [http://www.youtube.com/watch?v=frWqIUpIxLg YouTube])<br />
|-<br />
|Fri May 2<br />
| <b>Review Session for Oral Exams</b><br />
|}<br />
<br />
<b>Monday, May 5 and Tuesday, May 6: ORAL FINAL EXAMS</b><br />
<br />
</center><br />
<br />
===Extra Credit Segments (segment number indicates intended sequence)===<br />
[[Segment 25. Fitting Models to Counts]] (or [http://www.youtube.com/watch?v=YXaq2PVCGZQ YouTube])<br><br />
[[Segment 26. The Poisson Count Pitfall]] (or [http://www.youtube.com/watch?v=rPO3N5GI-3I YouTube])<br><br />
[[Segment 35. Ordinal vs. Nominal Contingency Tables]] (or [http://www.youtube.com/watch?v=fYUbj78aguk YouTube])<br><br />
[[Segment 36. Contingency Tables Have Nuisance Parameters]] (or [http://www.youtube.com/watch?v=bHK79WKOX-Y YouTube])<br><br />
[[Segment 49. Eigenthingies and Main Effects]] (or [http://www.youtube.com/watch?v=LpGQnvvGLMQ YouTube])<br><br />
<br />
===Segments with Slides But Not Yet Recorded===<br />
(links are to PDF files)<br />
<br />
[http://wpressutexas.net/coursefiles/15.5.PoissonProcessesOrderStatistics.pdf Segment 15.5. Poisson Processes and Order Statistics]<br><br />
[http://wpressutexas.net/coursefiles/42.WienerFiltering.pdf Segment 42. Wiener Filtering]<br><br />
[http://wpressutexas.net/coursefiles/43.TheIRELady.pdf Segment 43. The IRE Lady]<br><br />
[http://wpressutexas.net/coursefiles/44.Wavelets.pdf Segment 44. Wavelets]<br><br />
[http://wpressutexas.net/coursefiles/45.LaplaceInterpolation.pdf Segment 45. Laplace Interpolation]<br><br />
[http://wpressutexas.net/coursefiles/46.InterpolationOnScatteredData.pdf Segment 46. Interpolation On Scattered Data]<br><br />
[http://wpressutexas.net/coursefiles/50.BinaryClassifiers.pdf Segment 50. Binary Classifiers]<br><br />
[http://wpressutexas.net/coursefiles/51.HierarchicalClassification.pdf Segment 51. Hierarchical Classification]<br><br />
[http://wpressutexas.net/coursefiles/52.DynamicProgramming.pdf Segment 52. Dynamic Programming]<br><br />
<br />
===Team Randomizer===<br />
Link to [http://wpressutexas.net/coursefiles/teamrandomizer.php the team randomizer]</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_49._Eigenthingies_and_Main_Effects&diff=3438Segment 49. Eigenthingies and Main Effects2016-04-22T18:53:13Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/LpGQnvvGLMQ&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/LpGQnvvGLMQ http://youtu.be/LpGQnvvGLMQ]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/49.EigenthingiesAndMainEffects.pdf PDF file] or [http://wpressutexas.net/coursefiles/49.EigenthingiesAndMainEffects.ppt PowerPoint file]</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_36._Contingency_Tables_Have_Nuisance_Parameters&diff=3437Segment 36. Contingency Tables Have Nuisance Parameters2016-04-22T18:52:48Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/bHK79WKOX-Y&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/bHK79WKOX-Y http://youtu.be/bHK79WKOX-Y]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/36.ContingencyTablesNuisanceParameters.pdf PDF file] or [http://wpressutexas.net/coursefiles/36.ContingencyTablesNuisanceParameters.ppt PowerPoint file]</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_35._Ordinal_vs._Nominal_Contingency_Tables&diff=3436Segment 35. Ordinal vs. Nominal Contingency Tables2016-04-22T18:52:24Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/fYUbj78aguk&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/fYUbj78aguk http://youtu.be/fYUbj78aguk]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/35.OrdinalVsNominalTables.pdf PDF file] or [http://wpressutexas.net/coursefiles/35.OrdinalVsNominalTables.ppt PowerPoint file]</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_26._The_Poisson_Count_Pitfall&diff=3435Segment 26. The Poisson Count Pitfall2016-04-22T18:51:58Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/rPO3N5GI-3I&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/rPO3N5GI-3I http://youtu.be/rPO3N5GI-3I]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/26.ThePoissonCountPitfall.pdf PDF file] or [http://wpressutexas.net/coursefiles/26.ThePoissonCountPitfall.ppt PowerPoint file]</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_25._Fitting_Models_to_Counts&diff=3434Segment 25. Fitting Models to Counts2016-04-22T18:51:27Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/YXaq2PVCGZQ&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/YXaq2PVCGZQ http://youtu.be/YXaq2PVCGZQ]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/25.FittingModelsToCounts.pdf PDF file] or [http://wpressutexas.net/coursefiles/25.FittingModelsToCounts.ppt PowerPoint file]</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_48._Principal_Component_Analysis_(PCA)&diff=3433Segment 48. Principal Component Analysis (PCA)2016-04-22T18:50:47Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/frWqIUpIxLg&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/frWqIUpIxLg http://youtu.be/frWqIUpIxLg]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/48.PrincipalComponentAnalysis.pdf PDF file] or [http://wpressutexas.net/coursefiles/48.PrincipalComponentAnalysis.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Compute====<br />
1. Suppose that only one principal component is large (that is, there is a single dominant value <math>s_i</math>). In terms of the matrix <math>\mathbf V</math> (and anything else relevant), what are the constants <math>a_j</math> and <math>b_j</math> that make a one-dimensional model of the data? This would be a model where<br />
<math>x_{ij} \approx a_j \lambda_i + b_j</math><br />
with each of the data points (rows) having its own value of an independent variable <math>\lambda_i</math> and each of the responses (columns) having its own constants <math>a_j,b_j</math>.<br />
<br />
2. The file [[Media:Dataforpca.txt|dataforpca.txt]] has 1000 data points (rows) each with 3 responses (columns). Make three scatter plots, each showing a pair of responses (in all 3 possible ways). Do the responses seem to be correlated?<br />
<br />
3. Find the principal components of the data and make three new scatter plots, each showing a pair of principal coordinates of the data. What is the distribution (histogram) of the data along the largest principal component? What is a one-dimensional model of the data (as in problem 1 above)?<br />
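<br />
For problem 3, if you'd rather not pull in a linear-algebra library, the largest principal component can be found by power iteration on the sample covariance matrix (a pure-Python sketch with our own function name; in practice you would use an SVD routine as in the segment):<br />

```python
import math
import random

def top_principal_component(data, iters=200, seed=0):
    """Power iteration on the sample covariance matrix of `data` (rows = points).
    Returns (largest eigenvalue, unit eigenvector)."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    X = [[row[j] - means[j] for j in range(d)] for row in data]
    C = [[sum(X[i][a] * X[i][b] for i in range(n)) / (n - 1)
          for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]       # random nonzero start
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[a] * sum(C[a][b] * v[b] for b in range(d)) for a in range(d))
    return lam, v
```

The remaining components follow by deflating C (subtracting <math>\lambda v v^T</math>) and iterating again.<br />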
<br />
====To Think About====<br />
1. Although PCA doesn't require that the data be multivariate normal, it is most meaningful in that case, because the data is then completely defined by its principal components (i.e., covariance matrix) and means. Can you design a test statistic that measures "quality of approximation of a data set by a multivariate normal" in some quantitative way? Try to make your statistic approximately independent of <math>N</math>, the number of data points.</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_47._Low-Rank_Approximation_of_Data&diff=3432Segment 47. Low-Rank Approximation of Data2016-04-22T18:49:53Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/M0gsHNS_5FE&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/M0gsHNS_5FE http://youtu.be/M0gsHNS_5FE]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/47.LowRankApproximationOfData.pdf PDF file] or [http://wpressutexas.net/coursefiles/47.LowRankApproximationOfData.ppt PowerPoint file]<br />
<br />
<br />
<br />
== Class Activity ==<br />
<br />
Class activity description: [[File:svd_exercise.pdf]]<br />
<br />
Required files:<br />
[[File:Svd_exercise.m.txt]], [[File:GetCongress.txt]]<br />
<br />
Last resort file:<br />
[[File:CongressSVD.m.txt]]<br />
<br />
Pictures:<br />
<br />
[[File:Cacti.jpg]]<br />
<br />
[[File:Ryb.jpg]]<br />
<br />
[[File:Ryb_twist.jpg]]</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_41._Markov_Chain_Monte_Carlo,_Example_2&diff=3431Segment 41. Markov Chain Monte Carlo, Example 22016-04-22T18:49:03Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/FnNckBLWJ24&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/FnNckBLWJ24 http://youtu.be/FnNckBLWJ24]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/41.MCMCexample2.pdf PDF file] or [http://wpressutexas.net/coursefiles/41.MCMCexample2.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. Show that the waiting times (times between events) in a Poisson process are Exponentially distributed. (I think we've done this before.)<br />
<br />
2. Plot the pdf's of the waiting times between (a) every other Poisson event, and (b) every Poisson event at half the rate.<br />
<br />
3. Show, using characteristic functions, that the waiting times between every Nth event in a Poisson process are Gamma distributed. (I think we've also done this before, but it is newly relevant in this segment.)<br />
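<br />
All three calculations can be sanity-checked by simulation (a sketch; the function names are ours). Consecutive gaps are Exponential with rate <math>\lambda</math>, and the gap to every Nth event, being a sum of N independent Exponentials, should have the Gamma mean <math>N/\lambda</math> and variance <math>N/\lambda^2</math>:<br />

```python
import random

def poisson_gaps(rate, n, seed=1):
    """n independent inter-event times of a Poisson process with the given rate."""
    rng = random.Random(seed)
    return [rng.expovariate(rate) for _ in range(n)]

def every_nth_gaps(gaps, N):
    """Waiting times between every Nth event: non-overlapping sums of N gaps."""
    return [sum(gaps[i:i + N]) for i in range(0, len(gaps) - N + 1, N)]
```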
<br />
====To Think About====<br />
1. In slide 5, showing the results of the MCMC, how can we be sure (or, how can we gather quantitative evidence) that there won't be another discrete change in <math>k_1</math> or <math>k_2</math> if we keep running the model longer? That is, how can we measure convergence of the model?<br />
<br />
2. Suppose you have two hypotheses: H1 is that a set of times <math>t_i</math> are being generated as every 26th event from a Poisson process with rate 26. H2 is that they are every 27th event from a Poisson process with rate 27. (The mean rate is thus the same in both cases.) How would you estimate the number <math>N</math> of data points <math>t_i</math> that you need to clearly distinguish between these hypotheses?<br />
<br />
====Class Activity====<br />
[[Urns with MCMC]]</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_40._Markov_Chain_Monte_Carlo,_Example_1&diff=3430Segment 40. Markov Chain Monte Carlo, Example 12016-04-22T18:48:25Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/nSKZ02ZWzsY&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/nSKZ02ZWzsY http://youtu.be/nSKZ02ZWzsY]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/40.MCMCexample1.pdf PDF file] or [http://wpressutexas.net/coursefiles/40.MCMCexample1.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. The file [http://granite.ices.utexas.edu/coursefiles/Twoexondata.txt Twoexondata.txt] has 3000 pairs of (first, second) exon lengths. Choose 600 of the first exon lengths at random. Then, in your favorite programming language, repeat the calculation shown in the segment to model the chosen first exon lengths as a mixture of two Student distributions. That is (see slide 2): "6 parameters: two centers, two widths, ratio of peak heights, and Student t index." After running your Markov chain, plot the posterior distribution of the ratio of areas of the two Student components, as in slide 6.<br />
<br />
2. Make a histogram of the 2nd exon lengths. Do they seem to require two separate components? If so, repeat the calculations of problem 1. If not, use MCMC to explore the posterior of a model with a single Student component. Plot the posterior distribution of the Student parameter <math>\nu</math>.<br />
<br />
====To Think About====<br />
1. As a Bayesian, how would you decide whether, in problem 2 above, you need one vs. two components? What about 7 components? What about 200? Can you think of a way to enforce model simplicity?<br />
<br />
2. After you have given a good "textbook" answer to the preceding problem, think harder about whether this can really work for large data sets. The problem is that even tiny differences in log-likelihood <i>per data point</i> become huge log-odds differences when the number of data points is large. So, given the opportunity, models are almost always driven to high complexity. What do you think that practical Bayesians actually do about this?<br />
<br />
====Activity====<br />
[[Urns with MCMC]]</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_39._MCMC_and_Gibbs_Sampling&diff=3429Segment 39. MCMC and Gibbs Sampling2016-04-22T18:47:39Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/4gNpgSPal_8&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/4gNpgSPal_8 http://youtu.be/4gNpgSPal_8]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/39.MCMCandGibbsSampling.pdf PDF file] or [http://wpressutexas.net/coursefiles/39.MCMCandGibbsSampling.ppt PowerPoint file]<br />
<br />
===Problems===<br />
===To Calculate===<br />
1. Suppose the domain of a model is the five integers <math>x = \{1,2,3,4,5\}</math>, and that your proposal distribution is: "When <math>x_1 = 2,3,4</math>, choose with equal probability <math>x_2 = x_1 \pm 1</math>. For <math>x_1=1</math> always choose <math>x_2 =2</math>. For <math>x_1=5</math> always choose <math>x_2 =4</math>." What is the ratio of <math>q</math>'s that goes into the acceptance probability <math>\alpha(x_1,x_2)</math> for all the possible values of <math>x_1</math> and <math>x_2</math>?<br />
<br />
2. Suppose the domain of a model is <math>-\infty < x < \infty</math> and your proposal distribution is (perversely),<br />
<br />
<math>q(x_2|x_1) = \begin{cases}\tfrac{7}{2}\exp[-7(x_2-x_1)],\quad & x_2 \ge x_1 \\ \tfrac{5}{2}\exp[-5(x_1-x_2)],\quad & x_2 < x_1 \end{cases}</math><br />
<br />
Sketch this distribution as a function of <math>x_2-x_1</math>. Then, write down an expression for the ratio of <math>q</math>'s that goes into the acceptance probability <math>\alpha(x_1,x_2)</math>.<br />
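For problem 2, the quantity that enters the Metropolis-Hastings acceptance probability is the ratio <math>q(x_1|x_2)/q(x_2|x_1)</math>. A minimal sketch (Python, with the rates 7 and 5 hard-coded from the problem) that checks the closed-form ratio against the definition:<br />

```python
import math

def q(x2, x1):
    """Proposal density q(x2|x1) exactly as given in the problem."""
    if x2 >= x1:
        return 3.5 * math.exp(-7.0 * (x2 - x1))
    return 2.5 * math.exp(-5.0 * (x1 - x2))

def q_ratio(x1, x2):
    """Closed form for q(x1|x2)/q(x2|x1), as a function of d = x2 - x1."""
    d = x2 - x1
    if d > 0:
        # forward step uses the steep rate-7 tail, reverse step the rate-5 tail
        return (5.0 / 7.0) * math.exp(2.0 * d)
    if d < 0:
        return (7.0 / 5.0) * math.exp(2.0 * d)
    return 1.0  # d = 0: both directions use the same branch

print(q_ratio(0.0, 1.3), q(0.0, 1.3) / q(1.3, 0.0))
```

Note the ratio is not 1, because the proposal is not symmetric; that is exactly why the <math>q</math> factors must be kept in <math>\alpha(x_1,x_2)</math>.<br />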
<br />
===To Think About===<br />
1. Suppose an urn contains 7 large orange balls, 3 medium purple balls, and 5 small green balls. When balls are drawn randomly, the larger ones are more likely to be drawn, in the proportions large:medium:small = 6:4:3. You want to draw exactly 6 balls, one at a time without replacement. How would you use Gibbs sampling to learn: (a) How often do you get 4 orange plus 2 of the same (non-orange) color? (b) What is the expectation (mean) of the product of the number of purple and number of green balls drawn?<br />
<br />
2. How would you do the same problem computationally but without Gibbs sampling?<br />
<br />
3. How would you do the same problem non-stochastically (e.g., obtain answers to 12 significant figures)? (Hint: This is known as the Wallenius non-central hypergeometric distribution.)<br />
<br />
[Answers: 0.155342 and 1.34699]<br />
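One route to problem 2 (no Gibbs sampling) is direct simulation of the weighted draws without replacement; the number of trials and the seed below are arbitrary choices, and the results should land near the quoted answers:<br />

```python
import random

def draw_six(rng):
    """Draw 6 balls sequentially without replacement; each remaining ball is
    drawn with probability proportional to its weight (6:4:3 by size)."""
    counts = [7, 3, 5]           # orange, purple, green
    weights = [6.0, 4.0, 3.0]
    got = [0, 0, 0]
    for _ in range(6):
        totals = [c * w for c, w in zip(counts, weights)]
        r = rng.random() * sum(totals)
        for i in range(3):
            r -= totals[i]
            if r < 0.0:
                break
        counts[i] -= 1
        got[i] += 1
    return got

rng = random.Random(42)
n_trials = 100_000
hits = 0          # event (a): 4 orange plus 2 of one other color
prod_sum = 0.0    # for expectation (b): (# purple) * (# green)
for _ in range(n_trials):
    o, p, g = draw_six(rng)
    if o == 4 and (p == 2 or g == 2):
        hits += 1
    prod_sum += p * g
p_a = hits / n_trials
mean_b = prod_sum / n_trials
print(p_a, mean_b)
```

For the 12-significant-figure version (problem 3), the probabilities of entire draw sequences can instead be summed exactly, which is the Wallenius calculation.<br />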
<br />
===Class Activity===<br />
There's a story here, about diagnosing rats by which branches they pick in a maze. Bill will explain in class. Unless he thinks up a better story.<br />
<br />
Mathematically, it's another one of these amazing Gibbs sampling examples. Suppose there are 2 unknown distributions over the digits 0..9, that is <math>p_0,p_1,\ldots,p_9</math> and <math>q_0,q_1,\ldots,q_9</math>, of course with <math>\sum_i p_i = 1</math> and <math>\sum_i q_i = 1</math>. [[media:Gibbs_data.txt|This data file]] has 1000 lines, each with 10 i.i.d. draws of digits, either from the <math>p</math>'s or the <math>q</math>'s -- but, for each line, you don't know which.<br />
<br />
1. Estimate <math>p_0,p_1,\ldots,p_9</math> and <math>q_0,q_1,\ldots,q_9</math> from the data. If you are ambitious, do this by two different methods: First, by Gibbs sampling. Second, by an E-M method. (Although these are conceptually different, my code for them differs by only a few lines.)<br />
<br />
2. Estimate a probability for each line in the data file as to whether it is drawn from the <math>p_i</math>'s (as opposed to the <math>q_i</math>'s).<br />
<br />
3. Plot histograms that show the uncertainties of your Gibbs estimate for the <math>p_i</math>'s. Do your E-M estimates appear to be at the modes of your Gibbs histograms? Should they be?<br />
<br />
[[Media:gibbs_data.txt]]<br />
<br />
[http://nbviewer.ipython.org/github/CS395T/2014/blob/master/Jeff%20Hussmann%2004-21-14%20Gibbs%20sampling_1398145904.ipynb Jeff's solution]</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Information_theory_blitz&diff=3428Information theory blitz2016-04-22T18:46:52Z<p>Bill Press: </p>
<hr />
<div>[http://wpressutexas.net/coursewiki/images/f/fa/Information_theory_blitz.pdf Link to pdf]</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_38._Mutual_Information&diff=3427Segment 38. Mutual Information2016-04-22T18:46:32Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/huNPh1mkJHM&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/huNPh1mkJHM http://youtu.be/huNPh1mkJHM]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/38.MutualInformation.pdf PDF file] or [http://wpressutexas.net/coursefiles/38.MutualInformation.ppt PowerPoint file]<br />
<br />
====Class activity====<br />
[[Information theory blitz]]</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_37._A_Few_Bits_of_Information_Theory&diff=3426Segment 37. A Few Bits of Information Theory2016-04-22T18:45:54Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/ktzYOLDN3u4&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/ktzYOLDN3u4 http://youtu.be/ktzYOLDN3u4]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/37.AFewBitsOfInformationTheory.pdf PDF file] or [http://wpressutexas.net/coursefiles/37.AFewBitsOfInformationTheory.ppt PowerPoint file]<br />
<br />
===Class Activity===<br />
<br />
There is no general way to estimate the entropy of a (non i.i.d.) process from the data it generates,<br />
because you may or may not be able to recognize its entropy-lowering internal structure. So,<br />
in general, even an accurate "estimate" is only an upper bound on the entropy.<br />
<br />
Let's see how well we can do at estimating the true entropy of five different strings<br />
in the alphabet A, C, G, T. (Bill knows the answer, because he knows how they were<br />
generated. But he's not telling!)<br />
<br />
The more you study the data, the better you'll do! (If you know how to use Hidden<br />
Markov Models, which we didn't have room for in this course, you might do even better.)<br />
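One concrete way to get such an upper bound is to fit an order-<math>k</math> Markov model and compute the plug-in conditional entropy of the next symbol given the previous <math>k</math>. The sketch below demos this on a synthetic i.i.d. uniform A/C/G/T string (true rate: 2 bits/symbol), since the real test is the five files; the string length and seed are arbitrary:<br />

```python
import math
import random
from collections import Counter

def cond_entropy(s, k):
    """Plug-in estimate of H(next symbol | previous k symbols), in bits.
    An upper bound on the entropy rate only insofar as an order-k model
    captures the string's structure."""
    ctx, pair = Counter(), Counter()
    for i in range(k, len(s)):
        ctx[s[i - k:i]] += 1        # context of length k
        pair[s[i - k:i + 1]] += 1   # context + next symbol
    n = len(s) - k
    h = 0.0
    for w, c in pair.items():
        h -= (c / n) * math.log2(c / ctx[w[:k]])
    return h

rng = random.Random(1)
s = "".join(rng.choice("ACGT") for _ in range(100_000))
h0 = cond_entropy(s, 0)
h1 = cond_entropy(s, 1)
h2 = cond_entropy(s, 2)
print(h0, h1, h2)
```

For a structureless string all three come out near 2 bits; for a string with hidden structure, increasing <math>k</math> (data permitting) can push the estimate, and hence the bound, down.<br />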
<br />
[[Media:entropystring1.txt]]<br />
<br />
[[Media:entropystring2.txt]]<br />
<br />
[[Media:entropystring3.txt]]<br />
<br />
[[Media:entropystring4.txt]]<br />
<br />
[[Media:entropystring5.txt]]</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_34._Permutation_Tests&diff=3425Segment 34. Permutation Tests2016-04-22T18:45:22Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/_4BUS1NGNHA&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/_4BUS1NGNHA http://youtu.be/_4BUS1NGNHA]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/34.PermutationTests.pdf PDF file] or [http://wpressutexas.net/coursefiles/34.PermutationTests.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. Use the permutation test to decide whether the contingency table <math><br />
\begin{bmatrix}<br />
5 & 3 & 2\\<br />
2 & 3 & 6 \\<br />
0 & 2 & 3<br />
\end{bmatrix}</math><br />
shows a significant association. What is the p-value?<br />
<br />
2. Repeat the calculation using the Pearson chi-square statistic instead of the Wald statistic, or vice versa.<br />
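A minimal sketch of problem 1, using the Pearson chi-square statistic as the measure of association (problem 2 swaps in the Wald statistic); the permutation count and seed are arbitrary:<br />

```python
import random

table = [[5, 3, 2], [2, 3, 6], [0, 2, 3]]

def chi2_stat(rows, cols, n_rows=3, n_cols=3):
    """Pearson chi-square computed from paired (row, col) event labels."""
    n = len(rows)
    obs = [[0] * n_cols for _ in range(n_rows)]
    for r, c in zip(rows, cols):
        obs[r][c] += 1
    rsum = [sum(obs[r]) for r in range(n_rows)]
    csum = [sum(obs[r][c] for r in range(n_rows)) for c in range(n_cols)]
    chi2 = 0.0
    for r in range(n_rows):
        for c in range(n_cols):
            e = rsum[r] * csum[c] / n
            chi2 += (obs[r][c] - e) ** 2 / e
    return chi2

# "Expand all the data": one (row, col) pair per event
rows, cols = [], []
for r in range(3):
    for c in range(3):
        rows += [r] * table[r][c]
        cols += [c] * table[r][c]

observed = chi2_stat(rows, cols)
rng = random.Random(0)
n_perm = 20_000
count = 0
for _ in range(n_perm):
    rng.shuffle(cols)              # break any row/column association
    if chi2_stat(rows, cols) >= observed:
        count += 1
pval = count / n_perm
print(observed, pval)
```

Note the permutation keeps both sets of marginals fixed, so the expected table never changes; that observation is one entry point to the "To Think About" question below.<br />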
<br />
====To Think About====<br />
1. Is slide 7's suggestion, that you figure out how to implement the permutation test without "expanding all the data", actually possible? If so, what is your method?<br />
<br />
===Class Activity===<br />
[[Media:somecontingencies3.txt]]</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_33._Contingency_Table_Protocols_and_Exact_Fisher_Test&diff=3424Segment 33. Contingency Table Protocols and Exact Fisher Test2016-04-22T18:44:52Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/9Qrkw5UfAmQ&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/9Qrkw5UfAmQ http://youtu.be/9Qrkw5UfAmQ]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/33.ProtocolsAndFisherExactTest.pdf PDF file] or [http://wpressutexas.net/coursefiles/33.ProtocolsAndFisherExactTest.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. How many distinct m by n contingency tables are there that have exactly N total events?<br />
<br />
2. For every distinct 2 by 2 contingency table containing exactly 14 elements, compute its chi-square statistic,<br />
and also its Wald statistic. Display your results as a scatter plot of one statistic versus the other.<br />
<br />
====To Think About====<br />
1. Suppose you want to find out if living under power lines causes cancer. Describe in detail how you would do this (1) as a case/control study, (2) as a longitudinal study, (3) as a snapshot study. Can you think of a way to do it as a study with all the marginals fixed (protocol 4)?<br />
<br />
2. For an m by n contingency table, can you think of a systematic way to code "the loop over all possible contingency tables with the same marginals" in slide 8?<br />
<br />
====Activity====<br />
[[Chess contingency tables]]</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_32._Contingency_Tables:_A_First_Look&diff=3423Segment 32. Contingency Tables: A First Look2016-04-22T18:44:26Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/NvCdN2RFufY&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/NvCdN2RFufY http://youtu.be/NvCdN2RFufY]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/32.ContingencyTablesFirstLook.pdf PDF file] or [http://wpressutexas.net/coursefiles/32.ContingencyTablesFirstLook.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. 20 out of 100 U.S. Senators are women, yet when the Senate formed an intramural baseball team of 9 people, only 1 woman was chosen for the team. What is the probability of this occurring by chance? What is the p-value with which the null hypothesis "there is no discrimination against women Senators" can be rejected?<br />
<br />
2. A large jelly bean jar has 20% red jelly beans, 30% blue, and 50% yellow. If 6 jelly beans are chosen at random, what is the chance of getting exactly 2 of each color? What is the name of this distribution?<br />
<br />
3. A small jelly bean jar has 2 red jelly beans, 3 blue, and 5 yellow. If 6 jelly beans are chosen at random, what is the chance of getting exactly 2 of each color? What is the name of this distribution?<br />
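Problems 2 and 3 differ only in whether the draws are effectively with replacement (large jar) or without (small jar), so both answers can be computed exactly; a quick check (`math.comb` needs Python 3.8+):<br />

```python
from math import comb, factorial

# Problem 2: large jar ~ draws with replacement -> multinomial probability
p_multinomial = (factorial(6) // factorial(2) ** 3) * (0.2**2 * 0.3**2 * 0.5**2)

# Problem 3: small jar, no replacement -> multivariate hypergeometric probability
p_hypergeom = comb(2, 2) * comb(3, 2) * comb(5, 2) / comb(10, 6)

print(p_multinomial, p_hypergeom)  # 0.081 and 1/7
```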
<br />
====To Think About====<br />
1. Suppose that, in the population, 82% of people are right-handed, 18% left-handed; 49% are male, 51% female; and that handedness and sex are independent. Repeatedly draw samples of N=15 individuals, form the contingency table, and apply the chi-square test for significance to get a p-value, exactly as described in the lecture segment. How often is your p-value less than 0.05? If you get an answer that is different from 0.05, why? Try larger values of N until the answer converges to 0.05.<br />
(How are you handling zero draws when they occur?)<br />
<br />
===Class Activity===<br />
<br />
There was a surprise quiz. Bill's solutions are [[Media:Quiz20140409.pdf|here]].<br />
<br />
We will analyze these contingency tables, asking (i) What is <math>\chi^2</math>? (ii) What is the p-value? (iii) Is there a significant association? (iv) If so, can you describe the main effect(s) seen?<br />
<br />
{|<br />
! align="left"| <br />
! Vanilla<br />
! Strawberry<br />
! Chocolate<br />
|-<br />
!Texas Tech<br />
|1<br />
|1<br />
|13<br />
|-<br />
!A&M<br />
|16<br />
|4<br />
|15<br />
|-<br />
!UT<br />
|45<br />
|32<br />
|80<br />
|}<br />
<br />
{|<br />
! align="left"| Grades<br />
! A<br />
! B<br />
! C<br />
! D<br />
|-<br />
!A&M<br />
|5<br />
|24<br />
|32<br />
|1<br />
|-<br />
!UT<br />
|17<br />
|80<br />
|50<br />
|18<br />
|}</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_31._A_Tale_of_Model_Selection&diff=3422Segment 31. A Tale of Model Selection2016-04-22T18:43:59Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/_G1gzqQzbuM&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/_G1gzqQzbuM http://youtu.be/_G1gzqQzbuM]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/31.ATaleOfModelSelection.pdf PDF file] or [http://wpressutexas.net/coursefiles/31.ATaleOfModelSelection.ppt PowerPoint file]<br />
<br />
===Problems===<br />
[[image:Modelselection.png|right|500px]]<br />
====To Calculate====<br />
(These problems will be the class activity on Monday, but you can get a head start on them if you want.)<br />
<br />
I measured the temperature of my framitron manifold every minute for 1000 minutes, with the same accuracy for each measurement. The data is plotted on the right (with data points connected by straight lines), and is in the file [[media:Modelselection.txt|Modelselection.txt]].<br />
<br />
1. From the data, estimate the measurement error <math>\sigma</math>. (You can make any reasonable assumptions that follow from looking at the data.)<br />
<br />
2. Write down a few guesses for functional forms, with different (or adjustable) numbers of parameters that might be good models for the data. Order these by their model complexity (number of parameters) from least to most.<br />
<br />
3. Fit each of your models to the data, obtaining the parameters and <math>\chi^2_{min}</math> for each. (Hint: write your code generally enough that you can change from model to model by changing only one or two lines.)<br />
<br />
4. Which of your models "wins" the model selection contest if you use AIC? Which for BIC?<br />
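For problem 4: with Gaussian errors of known <math>\sigma</math>, <math>-2\ln L_{max}</math> equals <math>\chi^2_{min}</math> up to a model-independent constant, so AIC <math>= \chi^2_{min} + 2k</math> and BIC <math>= \chi^2_{min} + k\ln N</math> can be compared directly. A sketch with hypothetical <math>\chi^2_{min}</math> values (not fits to this data set), chosen to show how BIC's stiffer penalty can pick a smaller model than AIC does:<br />

```python
import math

def aic(chi2_min, k):
    # Valid up to a model-independent constant when errors are Gaussian
    # with known sigma, so -2 ln L_max = chi2_min + const.
    return chi2_min + 2 * k

def bic(chi2_min, k, n):
    return chi2_min + k * math.log(n)

# Hypothetical results: {number of parameters k: chi2_min}
fits = {2: 1200.0, 4: 1005.0, 6: 1000.0}
n_data = 1000
best_aic = min(fits, key=lambda k: aic(fits[k], k))
best_bic = min(fits, key=lambda k: bic(fits[k], k, n_data))
print(best_aic, best_bic)
```

Here the 6-parameter model buys a <math>\Delta\chi^2</math> of 5 over the 4-parameter one, which is worth paying under AIC (penalty 4) but not under BIC (penalty <math>2\ln 1000 \approx 13.8</math>).<br />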
<br />
====To Think About====<br />
1. Both AIC and BIC decide whether to allow a new parameter based on a <math>\Delta\chi^2</math>. So it is possible to think about each as a p-value test for whether a null hypothesis ("no new parameter") is ruled out at some significance level. Viewed in this way, what are the critical p-values being used by each test?<br />
<br />
2. Can you give a reasonable rationale, that might be used by a proponent of BIC, for why its <math>\Delta\chi^2</math> should be larger in magnitude as <math>N</math> (the number of data points) increases?</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_30._Expectation_Maximization_(EM)_Methods&diff=3421Segment 30. Expectation Maximization (EM) Methods2016-04-22T18:43:34Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/StQOzRqTNsw&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/StQOzRqTNsw http://youtu.be/StQOzRqTNsw]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/30.EMmethodsTheory.pdf PDF file] or [http://wpressutexas.net/coursefiles/30.EMmethodsTheory.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. For a set of positive values <math>\{x_i\}</math>, use Jensen's inequality to show (a) the mean of their square is never less than the square of their mean, and (b) their (arithmetic) mean is never less than their harmonic mean.<br />
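A quick numerical sanity check of both inequalities in problem 1 (a check, not the proof); `statistics.harmonic_mean` needs Python 3.6+, and the sample is an arbitrary positive one:<br />

```python
import random
from statistics import mean, harmonic_mean

rng = random.Random(3)
x = [rng.uniform(0.1, 10.0) for _ in range(1000)]

m = mean(x)
assert mean(v * v for v in x) >= m * m   # (a) mean of squares >= square of mean
assert m >= harmonic_mean(x)             # (b) arithmetic mean >= harmonic mean
print(mean(v * v for v in x), m * m, m, harmonic_mean(x))
```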
<br />
2. Sharpen the argument about termination of E-M methods that was given in slide 4, as follows: Suppose that <math>g(x) \ge f(x)</math> for all <math>x</math>, for some two functions <math>f</math> and <math>g</math>. Prove that, at any local maximum <math>x_m</math> of <math>f</math>, one of these two conditions must hold: (1) <math>g(x_m) > f(x_m)</math> [in which case the E-M algorithm has not yet terminated], or (2) <math>g(x_m)</math> is a local maximum of <math>g</math> [in which case the E-M algorithm terminates at a maximum of <math>g</math>, as advertised]. You can make any reasonable assumption about continuity of the functions.<br />
<br />
====To Think About====<br />
1. Jensen's inequality says something like "any concave function of a mixture of things is greater than the same mixture of the individual concave functions". What "mixture of things" is this idea being applied to in the proof of the E-M theorem (slide 4)?<br />
<br />
2. So slide 4 proves that some function is less than the actual function of interest, namely <math>L(\theta)</math>. What makes this such a powerful idea?<br />
<br />
===Activity===<br />
The class activity for Friday can be found at [[EM activity]].</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_29._GMMs_in_N-Dimensions&diff=3420Segment 29. GMMs in N-Dimensions2016-04-22T18:43:07Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/PH8_qqDTCYY&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/PH8_qqDTCYY http://youtu.be/PH8_qqDTCYY]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/29.GMMsInNDimensions.pdf PDF file] or [http://wpressutexas.net/coursefiles/29.GMMsInNDimensions.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Compute====<br />
The file [[Media:Twoexondata.txt|twoexondata.txt]] contains data similar to that shown in slide 6, as 3000 (x,y) pairs.<br />
<br />
1. In your favorite computer language, write a code for K-means clustering, and cluster the given data using (a) 3 components and (b) 8 components. Don't use anybody's K-means clustering package for this part: Code it yourself. Hint: Don't try to do it as a limiting case of GMMs, just code it from the definition of K-means clustering, using an E-M iteration. Plot your results by coloring the data points according to which cluster they are in. How sensitive is your answer to the starting guesses?<br />
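The E-M iteration referred to in the hint is Lloyd's algorithm. A minimal stdlib sketch on synthetic 2-D data (two invented, well-separated blobs, not the exon data), just to show the shape of the iteration; the real exercise still needs the data file, k=3 and 8, and multiple starting guesses:<br />

```python
import random

def kmeans(points, init_centers, n_iter=50):
    """Lloyd's algorithm: alternate an E-like step (assign each point to its
    nearest center) with an M-like step (recompute each center as the mean
    of its assigned points)."""
    centers = list(init_centers)
    k = len(centers)
    clusters = [[] for _ in range(k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:                   # E-step: hard assignment
            j = min(range(k),
                    key=lambda j: (p[0] - centers[j][0]) ** 2
                                  + (p[1] - centers[j][1]) ** 2)
            clusters[j].append(p)
        for j, cl in enumerate(clusters):  # M-step: move centers to the means
            if cl:
                centers[j] = (sum(p[0] for p in cl) / len(cl),
                              sum(p[1] for p in cl) / len(cl))
    return centers, clusters

rng = random.Random(7)
pts = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(300)]
       + [(rng.gauss(8, 1), rng.gauss(8, 1)) for _ in range(300)])
centers, clusters = kmeans(pts, [pts[0], pts[-1]])  # one start from each blob
lo, hi = sorted(centers)
print(lo, hi)
```

The hand-picked starting centers above sidestep the sensitivity question; try random starts to see problem 1's point about starting guesses.<br />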
<br />
2. In your favorite computer language, and either writing your own GMM program or using any code you can find elsewhere (e.g., Numerical Recipes for C++, or [http://scikit-learn.org/stable/modules/mixture.html scikit-learn], which is installed on the class server, for Python), construct mixture models like those shown in slide 8 (for 3 components) and slide 9 (for 8 components). You should plot 2-sigma error ellipses for the individual components, as shown in those slides.<br />
<br />
====To Think About====<br />
1. The segment (or the previous one) mentioned that the log-likelihood can sometimes get stuck on plateaus, barely increasing, for long periods of time, and then can suddenly increase by a lot. What do you think is happening from iteration to iteration during these times on a plateau?<br />
<br />
<br />
====Class Activity====<br />
[[GMM activity]]</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_28._Gaussian_Mixture_Models_in_1-D&diff=3419Segment 28. Gaussian Mixture Models in 1-D2016-04-22T18:42:20Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/n7u_tq0I6jM&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/n7u_tq0I6jM http://youtu.be/n7u_tq0I6jM]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/28.GaussianMixtureModels1D.pdf PDF file] or [http://wpressutexas.net/coursefiles/28.GaussianMixtureModels1D.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. Draw a sample of 100 points from the uniform distribution <math>U(0,1)</math>. This is your data set. Fit GMM models to your sample (now considered as being on the interval <math>-\infty < x < \infty</math>) with increasing numbers of components <math>K</math>, at least <math>K=1,\ldots,5</math>. Plot your models. Do they get better as <math>K</math> increases? Did you try multiple starting values to find the best (hopefully globally best) solutions for each <math>K</math>?<br />
<br />
2. Multiplying a lot of individual likelihoods will often underflow. (a) On average, how many values drawn from <math>U(0,1)</math> can you multiply before the product underflows to zero? (b) What, analytically, is the distribution of the sum of <math>N</math> independent values <math>\log(U)</math>, where <math>U\sim U(0,1)</math>? (c) Is your answer to (a) consistent with your answer to (b)?<br />
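For problem 2: each <math>\log(U)</math> is minus an Exponential(1) deviate, so the running log drifts down by 1 per factor and the product underflows once it drops below the smallest positive (subnormal) double, about <math>5\times 10^{-324}</math>, i.e. after roughly 745 factors. A quick empirical check:<br />

```python
import random

def count_until_underflow(rng):
    """Multiply U(0,1) draws until the running product underflows to 0.0."""
    prod, n = 1.0, 0
    while prod > 0.0:
        prod *= rng.random()
        n += 1
    return n

rng = random.Random(0)
counts = [count_until_underflow(rng) for _ in range(200)]
avg = sum(counts) / len(counts)
print(avg)  # should be in the neighborhood of 745
```

This is exactly why one sums log-likelihoods instead of multiplying likelihoods.<br />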
<br />
====To Think About====<br />
1. Suppose you want to approximate some analytically known function <math>f(x)</math> (whose integral is finite), as a sum of <math>K</math> Gaussians with different centers and widths. You could pretend that <math>f(x)</math> (or some scaling of it) was a probability distribution, draw <math>N</math> points from it and do the GMM thing to find the approximating Gaussians. Now take the limit <math>N\rightarrow \infty</math>, figure out how sums become integrals, and write down an iterative method for fitting Gaussians to a given <math>f(x)</math>. Does it work? (You can assume that well-defined definite integrals can be done numerically.)<br />
<br />
===Class Activity===<br />
<br />
Let's explore a data set and try to make sensible statements about it.<br />
<br />
[[Media:netflixishdata.txt|netflixishdata.txt]]<br />
<br />
Rows are 200 movie watchers, columns are 100 movies, entries are their ratings on a scale<br />
of 1 (I hated it!) to 5 (I loved it!). This is not real data, of course, so it is only Netflixish, not Netflix.<br />
<br />
Questions to explore<br />
<br />
How much are people alike? <br/><br />
How much are movies alike? <br/><br />
Distribution of the data in various ways? <br/><br />
<br />
By summing over each column and dividing by the number of entries, we got the average rating for each movie. Something surprising: the max of all the mean ratings was 3.46 (so no movie averaged more than 3.46 stars!), the min was 2.3650, the mean of the mean ratings was 2.9998, and the median was 2.9925. <br/><br />
<br />
Insight #1:<br/><br />
Looking at the actual data set, we see that there are a lot of "haters", i.e., people who gave many 1 ratings.<br />
<br />
[[File:28ca1.png|thumb|center|700px|link=|alt=]]<br />
<br />
Insight #2:<br/><br />
Overall there are exactly 4000 (+ or - 1) of each rating.<br />
<br />
[[File:28ca2.png|thumb|center|700px|link=|alt=]]<br />
<br />
Insight #3: There seem to be exactly 4 kinds of movies.<br/><br />
<br />
Why? Movie ratings were generated from the sides of a regular tetrahedron<br />
<br />
[[File:28ca3.png|thumb|center|700px|link=|alt=]]</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_27._Mixture_Models&diff=3418Segment 27. Mixture Models2016-04-22T18:41:43Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/9pWnZcpYh44&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/9pWnZcpYh44 http://youtu.be/9pWnZcpYh44]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/27.MixtureModels.pdf PDF file] or [http://wpressutexas.net/coursefiles/27.MixtureModels.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Calculate====<br />
The file [[Media:Mixturevals.txt]] contains 1000 values, each drawn either with probability <math>c</math> from the distribution <math>\text{Exponential}(\beta)</math> (for some constant <math>\beta</math>), or otherwise (with probability <math>1-c</math>) from the distribution <math>p(x) = (2/\pi)/(1+x^2),\; x>0</math>.<br />
<br />
1. Write down an expression for the probability of the file's data given some values for the parameters <math>\beta</math> and <math>c</math>.<br />
<br />
2. Calculate numerically the maximum likelihood values of <math>\beta</math> and <math>c</math>.<br />
<br />
3. Estimate numerically the Bayes posterior distribution of <math>\beta</math>, marginalizing over <math>c</math> as a nuisance parameter. (You'll of course have to make some assumption about priors.)<br />
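A sketch of the maximum-likelihood step (problem 2), run on synthetic data with known parameters in place of the actual file; the truth values, seed, sample size, grid resolution, and the convention that <math>\text{Exponential}(\beta)</math> means density <math>\beta e^{-\beta x}</math> are all assumptions of the sketch:<br />

```python
import math
import random

# Synthetic stand-in for Mixturevals.txt, with known truth to check against
rng = random.Random(11)
true_c, true_beta = 0.7, 2.0
data = [rng.expovariate(true_beta) if rng.random() < true_c
        else math.tan(math.pi * rng.random() / 2.0)  # density (2/pi)/(1+x^2), x>0
        for _ in range(2000)]

def neg_log_like(beta, c, xs):
    """-log L for the mixture  c*beta*exp(-beta*x) + (1-c)*(2/pi)/(1+x^2)."""
    return -sum(math.log(c * beta * math.exp(-beta * x)
                         + (1.0 - c) * (2.0 / math.pi) / (1.0 + x * x))
                for x in xs)

# Brute-force grid search for the maximum-likelihood point
best = min((neg_log_like(b / 10.0, cc / 20.0, data), b / 10.0, cc / 20.0)
           for b in range(5, 41)     # beta from 0.5 to 4.0
           for cc in range(1, 20))   # c from 0.05 to 0.95
beta_hat, c_hat = best[1], best[2]
print(beta_hat, c_hat)
```

The same grid of negative log-likelihoods, exponentiated and multiplied by a prior, is one crude route to the marginalized posterior of problem 3: sum over the <math>c</math> axis and normalize.<br />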
<br />
====To Think About====<br />
1. In problem 3, above, you assumed some definite prior for <math>c</math>. What if <math>c</math> is itself drawn (just once for the whole data set) from a distribution <math>\text{Beta}(\mu,\nu)</math>, with unknown hyperparameters <math>\mu,\nu</math>. How would you now estimate the Bayes posterior distribution of <math>\beta</math>, marginalizing over everything else?</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_24._Goodness_of_Fit&diff=3417Segment 24. Goodness of Fit2016-04-22T18:41:09Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/EJleSVf0Z-U&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/EJleSVf0Z-U http://youtu.be/EJleSVf0Z-U]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/24.GoodnessOfFit.pdf PDF file] or [http://wpressutexas.net/coursefiles/24.GoodnessOfFit.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. Let <math>X</math> be an R.V. that is a linear combination (with known, fixed coefficients <math>\alpha_k</math>) of twenty <math>N(0,1)</math> deviates. That is, <math>X = \sum_{k=1}^{20} \alpha_k T_k</math> where <math>T_k \sim N(0,1)</math>. How can you most simply form a t-value-squared (that is, something distributed as <math>\text{Chisquare}(1)</math>) from <math>X</math>? For some particular choice of <math>\alpha_k</math>'s (random is ok), generate a sample of <math>x</math>'s, plot their histogram, and show that it agrees with <math>\text{Chisquare}(1)</math>.<br />
<br />
2. From some matrix of known coefficients <math>\alpha_{ik}</math> with <math>k=1,\ldots,20</math> and <math>i = 1,\ldots,100</math>, generate 100 R.V.s <math>X_i = \sum_{k=1}^{20} \alpha_{ik} T_k</math> where <math>T_k \sim N(0,1)</math>. In other words, you are expanding 20 i.i.d. <math>T_k</math>'s into 100 R.V.'s. Form a sum of 100 t-values-squared obtained from these variables and demonstrate numerically by repeated sampling that it is distributed as <math>\text{Chisquare}(\nu)</math>. What is the value of <math>\nu</math>? Use enough samples so that you could distinguish between <math>\nu</math> and <math>\nu-1</math>.<br />
<br />
3. Reproduce the table of critical <math>\Delta\chi^2</math> values shown in slide 7. Hint: Go back to segment 21 and listen to the exposition of slide 7. (My solution is 3 lines in Mathematica.)<br />
<br />
====To Think About====<br />
1. Design a numerical experiment to exemplify the assertions on slide 8, namely that <math>\chi^2_{min}</math> varies by <math>\pm\sqrt{2\nu}</math> from data set to data set, but varies only by <math>\pm O(1)</math> as the fitted parameters <math>\mathbf b</math> vary within their statistical uncertainty.<br />
<br />
2. Suppose you want to estimate the central value <math>\mu</math> of a sample of <math>N</math> values drawn from <math>\text{Cauchy}(\mu,\sigma)</math>. If your estimate is the mean of your sample, does the "universal rule of thumb" (slide 2) hold? That is, does the accuracy get better as <math>N^{-1/2}</math>? Why or why not? What if you use the median of your sample as the estimate? Verify your answers by numerical experiments.<br />
<br />
===Class Activity===<br />
<br />
I measured the temperature of my framitron manifold every minute for 1000 minutes, with the same accuracy, <math>\sigma = 5</math>, for each measurement. The data is plotted below (with data points connected by straight lines), and is in the file [[media:Modelselection1.txt|Modelselection1.txt]].<br />
<br />
[[File:modelselection1.png|400px|1st set]]<br />
<br />
It's a contest! Which group can write down a model <math>T(t|\mathbf{b})</math>, where <math>\mathbf{b}</math><br />
is a vector of parameters, that gives the best fit to the data in a least-squares sense?<br />
<br />
Part 1. Any number of parameters in <math>\mathbf{b}</math> are allowed.<br />
<br />
Part 2. At most 20 parameters are allowed.<br />
<br />
Part 3. At most 10 parameters are allowed.<br />
<br />
Part 4. At most 4 parameters are allowed.<br />
<br />
And, oh by the way, we'll actually test your model on a different realization of the same<br />
process, possibly one of the ones shown below.<br />
<br />
[[File:modelselection2.png|400px|2nd set]]<br />
<br />
[[File:modelselection3.png|400px|3rd set]]<br />
<br />
[[media:Modelselection2.txt|Modelselection2.txt]]<br />
<br />
[[media:Modelselection3.txt|Modelselection3.txt]]<br />
<br />
[[media:Modelselection4.txt|Modelselection4.txt]]<br />
<br />
[[media:Modelselection5.txt|Modelselection5.txt]]<br />
<br />
[[media:Modelselection6.txt|Modelselection6.txt]]</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_23._Bootstrap_Estimation_of_Uncertainty&diff=3416Segment 23. Bootstrap Estimation of Uncertainty2016-04-22T18:40:15Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/1OC9ul-1PVg&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/1OC9ul-1PVg http://youtu.be/1OC9ul-1PVg]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/23.Bootstrap.pdf PDF file] or [http://wpressutexas.net/coursefiles/23.Bootstrap.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Compute====<br />
1. Generate 100 i.i.d. random draws from the beta distribution <math>\text{Beta}(2.5,5.)</math>, for example using MATLAB's betarnd or Python's random.betavariate. Use these to estimate this statistic of the underlying distribution: "value of the 75th percentile point minus value of the 25th percentile point". Now use statistical bootstrap to estimate the distribution of uncertainty of your estimate, for example as a histogram.<br />
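A minimal sketch of one way to carry this out (Python with NumPy assumed; the resample count and seed are arbitrary choices, not part of the problem):<br />
<br />
```python
import numpy as np

rng = np.random.default_rng(42)

# The original sample: 100 draws from Beta(2.5, 5)
sample = rng.beta(2.5, 5.0, size=100)

def iqr_stat(x):
    """The statistic: 75th percentile minus 25th percentile."""
    return np.percentile(x, 75) - np.percentile(x, 25)

point_estimate = iqr_stat(sample)

# Bootstrap: resample the same 100 values with replacement, many times
boot = np.array([
    iqr_stat(rng.choice(sample, size=len(sample), replace=True))
    for _ in range(10000)
])

print(point_estimate, boot.std())  # point estimate and its bootstrap standard error
```
<br />
A histogram of <code>boot</code> then visualizes the uncertainty distribution.<br />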
<br />
2. Suppose instead that you can draw any number of desired samples (each 100 draws) from the distribution. How does the histogram of the desired statistic from these samples compare with the bootstrap histogram from problem 1?<br />
<br />
3. What is the actual value of the desired statistic for this beta distribution, computed numerically (that is, not by random sampling)? (Hint: I did this in Mathematica in three lines.)<br />
<br />
====To Think About====<br />
1. Suppose your desired statistic (for a sample of N i.i.d. data values) was "minimum of the N values". What would the bootstrap estimate of the uncertainty look like in this case? Does this violate the bootstrap theorem? Why or why not?<br />
<br />
2. If you knew the distribution, how would you compute the actual distribution for the statistic "minimum of N sampled values", not using random sampling in your computation?<br />
<br />
3. For N data points, can you design a statistic so perverse (and different from one suggested in the segment) that the statistical bootstrap fails, even asymptotically as N becomes large?<br />
<br />
====Class Activity====<br />
<br />
Download the data set given below. It contains 100 draws from a 4-dimensional distribution, i.e., each draw returns a 4-dimensional vector, <math>[x_1, x_2, x_3, x_4]\;.</math> The statistic we are interested in is<br />
<math> t = [\langle x_1 x_2 \rangle,\;\langle x_3 x_4 \rangle]\;.</math><br />
Carry out the following tasks:<br />
*Give a point estimate of the statistic.<br />
* Carry out bootstrapping and visualize the uncertainty in the statistic using a scatter plot.<br />
<br />
[[Data Set]]<br />
<br />
[http://wpressutexas.net/coursewiki/images/9/96/Dataset.txt Dataset_txtfile]</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_22._Uncertainty_of_Derived_Parameters&diff=3415Segment 22. Uncertainty of Derived Parameters2016-04-22T18:39:22Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/ZoD3_rov--w&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/ZoD3_rov--w http://youtu.be/ZoD3_rov--w]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/22.UncertaintyOfDerivedParms.pdf PDF file] or [http://wpressutexas.net/coursefiles/22.UncertaintyOfDerivedParms.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Compute====<br />
1. In lecture slide 3, suppose (for some perverse reason) we were interested in a quantity <math>f = b_3/b_5</math> instead of <math>f = b_3b_5</math>. Calculate a numerical estimate of this new <math>f</math> and its standard error.<br />
<br />
2. Same set up, but plot a histogram of the distribution of <math>f</math> by sampling from its posterior distribution (using Python, MATLAB, or any other platform).<br />
<br />
====To Think About====<br />
1. Lecture slide 2 asserts that a function of normally distributed RVs is not, in general, normal. Consider the product of two independent normals. Is it normal? No! But isn't the product of two normal distribution functions (Gaussians) itself Gaussian? So, what is going on?<br />
<br />
2. Can you invent a function of a single normal N(0,1) random variable whose distribution has two separate peaks (maxima)? How about three? How about ten?</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_21._Marginalize_or_Condition_Uninteresting_Fitted_Parameters&diff=3414Segment 21. Marginalize or Condition Uninteresting Fitted Parameters2016-04-22T18:38:55Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/yxZUS_BpEZk&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/yxZUS_BpEZk http://youtu.be/yxZUS_BpEZk]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/21.MarginalizeVsCondition.pdf PDF file] or [http://wpressutexas.net/coursefiles/21.MarginalizeVsCondition.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. Consider a 2-dimensional multivariate normal distribution of the random variable <math>(b_1,b_2)</math> with 2-vector mean <math>(\mu_1,\mu_2)</math> and 2x2 matrix covariance <math>\Sigma</math>. What is the distribution of <math>b_1</math> given that <math>b_2</math> has the particular value <math>b_c</math>? In particular, what is the mean and standard deviation of the conditional distribution of <math>b_1</math>? (Hint, either see Wikipedia "[https://en.wikipedia.org/wiki/Multivariate_normal_distribution Multivariate normal distribution]" for the general case, or else just work out this special case.)<br />
<br />
2. Same, but marginalize over <math>b_2</math> instead of conditioning on it.<br />
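For checking your work, the bivariate specializations of the general formulas (stated, e.g., in the Wikipedia article the hint points to) are as follows; verify them rather than taking them on faith:<br />
<br />
```latex
b_1 \mid (b_2 = b_c) \;\sim\; N\!\left(\mu_1 + \frac{\Sigma_{12}}{\Sigma_{22}}\,(b_c-\mu_2),\;\; \Sigma_{11} - \frac{\Sigma_{12}^2}{\Sigma_{22}}\right),
\qquad
b_1 \;\sim\; N(\mu_1,\,\Sigma_{11}) \;\;\text{(marginalized over } b_2\text{)}.
```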
<br />
====To Think About====<br />
1. Why should it be called the Fisher <i>Information</i> Matrix? What does it have to do with "information"?<br />
<br />
2. Go read (e.g., in Wikipedia or elsewhere) about the "Cramer-Rao bound" and be prepared to explain what it is, and what it has to do with the Fisher Information Matrix.<br />
<br />
===Class Activity===<br />
Today we'll do [[Find the Volcano]].</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_20._Nonlinear_Least_Squares_Fitting&diff=3413Segment 20. Nonlinear Least Squares Fitting2016-04-22T18:38:27Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/xtBCGPHRcb0&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/xtBCGPHRcb0 http://youtu.be/xtBCGPHRcb0]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/20.NonlinearLeastSquares.pdf PDF file] or [http://wpressutexas.net/coursefiles/20.NonlinearLeastSquares.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. (See lecture slide 3.) For one-dimensional <math>x</math>, the model <math>y(x | \mathbf b)</math> is called "linear" if <math>y(x | \mathbf b) = \sum_k b_k X_k(x)</math>, where <math>X_k(x)</math> are arbitrary known functions of <math>x</math>. Show that minimizing <math>\chi^2</math> produces a set of linear equations (called the "normal equations") for the parameters <math>b_k</math>.<br />
<br />
2. A simple example of a linear model is <math>y(x | \mathbf b) = b_0 + b_1 x</math>, which corresponds to fitting a straight line to data. What are the MLE estimates of <math>b_0</math> and <math>b_1</math> in terms of the data: <math>x_i</math>'s, <math>y_i</math>'s, and <math>\sigma_i</math>'s?<br />
<br />
====To Think About====<br />
1. We often rather casually assume a uniform prior <math>P(\mathbf b)= \text{constant}</math> on the parameters <math>\mathbf b</math>. If the prior is not uniform, then is minimizing <math>\chi^2</math> the right thing to do? If not, then what should you do instead? Can you think of a situation where the difference would be important?<br />
<br />
2. What if, in lecture slide 2, the measurement errors were <math>e_i \sim \text{Cauchy}(0,\sigma_i)</math> instead of <math>e_i \sim N(0,\sigma_i)</math>? How would you find MLE estimates for the parameters <math>\mathbf b</math>?<br />
<br />
===Class Activity===<br />
Here is some data: [[Media:Chisqfitdata.txt]]<br />
<br />
In class we will work on fitting this to some models as explained [[ClassActivity20130325|here]].<br />
<br />
Here are Bill's numerical answers, so that you can see whether you are on the right track (or whether Bill got it wrong!): [[Media:Chisqfitanswers.txt]]</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_19._The_Chi_Square_Statistic&diff=3412Segment 19. The Chi Square Statistic2016-04-22T18:37:41Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/87EMhmPkOhk&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/87EMhmPkOhk http://youtu.be/87EMhmPkOhk]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/19.ChiSquareStatistic.pdf PDF file] or [http://wpressutexas.net/coursefiles/19.ChiSquareStatistic.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. Prove the assertion on lecture slide 5, namely that, for a multivariate normal distribution, the quantity <math>({\mathbf x-\mathbf\mu})^T{\mathbf\Sigma}^{-1}({\mathbf x-\mathbf\mu})</math>, where <math>\mathbf x</math> is a random draw from the multivariate normal, is <math>\chi^2</math> distributed.<br />
<br />
====To Think About====<br />
1. Why are we so interested in t-values? Why do we square them?<br />
<br />
2. Suppose you measure a bunch of quantities <math>x_i</math>, each of which is measured with a measurement accuracy <math>\sigma_i</math> and has a theoretically expected value <math>\mu_i</math>. Describe in detail how you might use a chi-square test statistic as a p-value test to see if your theory is viable. Should your test be one- or two-tailed?<br />
<br />
<br />
==== Class Exercise ====<br />
[http://wpressutexas.net/coursewiki/images/3/39/Chi_square.pdf Class Exercise]<br />
<br />
Data file: [[Media:mv_chi.txt]]</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_18._The_Correlation_Matrix&diff=3411Segment 18. The Correlation Matrix2016-04-22T18:36:50Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/aW5q_P0it9E&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/aW5q_P0it9E http://youtu.be/aW5q_P0it9E]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/18.CorrelationMatrix.pdf PDF file] or [http://wpressutexas.net/coursefiles/18.CorrelationMatrix.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. Random points i are chosen uniformly on a circle of radius 1, and their <math>(x_i,y_i)</math> coordinates in the plane are recorded. What is the 2x2 covariance matrix of the random variables <math>X</math> and <math>Y</math>? (Hint: Transform probabilities from <math>\theta</math> to <math>x</math>. Second hint: Is there a symmetry argument that some components must be zero, or must be equal?)<br />
<br />
2. Points are generated in 3 dimensions by this prescription: Choose <math>\lambda</math> uniformly random in <math>(0,1)</math>. Then a point's <math>(x,y,z)</math> coordinates are <math>(\alpha\lambda,\beta\lambda,\gamma\lambda)</math>. What is the covariance matrix of the random variables <math>(X,Y,Z)</math> in terms of <math>\alpha,\beta,\text{ and }\gamma</math>? What is the linear correlation matrix of the same random variables?<br />
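A numerical sanity check of problem 2 is easy to set up (a sketch only; the values of <math>\alpha,\beta,\gamma</math> below are arbitrary illustrative choices, and the check verifies the form of your covariance matrix, not the derivation):<br />
<br />
```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, gamma = 1.0, 2.0, -0.5   # arbitrary illustrative values
v = np.array([alpha, beta, gamma])

lam = rng.random(1_000_000)           # lambda ~ Uniform(0,1)
pts = lam[:, None] * v                # each row is (alpha*lam, beta*lam, gamma*lam)

emp_cov = np.cov(pts, rowvar=False)   # empirical 3x3 covariance matrix
expected = np.outer(v, v) / 12.0      # Var of Uniform(0,1) is 1/12

print(np.max(np.abs(emp_cov - expected)))  # should be small
```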
<br />
====To Think About====<br />
1. Suppose you want to get a feel for what a linear correlation <math>r=0.3</math> (say) looks like. How would you generate a bunch of points in the plane with this value of <math>r</math>? Try it. Then try for different values of <math>r</math>. As <math>r</math> increases from zero, what is the smallest value where you would subjectively say "if I know one of the variables, I pretty much know the value of the other"?<br />
<br />
2. Suppose that points in the <math>(x,y)</math> plane fall roughly on a 45-degree line between the points (0,0) and (10,10), but in a band of about width w (in these same units). What, roughly, is the linear correlation coefficient <math>r</math>?<br />
<br />
====Class Activity====<br />
[[Class Activity 3/5/14]]</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Review_Session_for_Mid-Term_Exam&diff=3410Review Session for Mid-Term Exam2016-04-22T18:35:45Z<p>Bill Press: </p>
<hr />
<div>[http://wpressutexas.net/coursewiki/images/e/ed/Generalized_monty.pdf Generalized Monty Hall]<br />
<br />
[http://wpressutexas.net/coursefiles/blitz_2014.pdf Probability blitz]<br />
<br />
[http://wpressutexas.net/coursewiki/images/f/f4/Multivar_normal.pdf MVN Exercise]<br />
<br />
<br />
<br />
[http://wpressutexas.net/coursefiles/assorted_solutions.pdf Solutions to generalized Monty Hall and blitz]<br />
<br />
[http://wpressutexas.net/coursewiki/images/f/f4/Multivar_normal.pdf Solutions to MVN Exercise]</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_17._The_Multivariate_Normal_Distribution&diff=3409Segment 17. The Multivariate Normal Distribution2016-04-22T18:34:28Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/t7Z1a_BOkN4&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/t7Z1a_BOkN4 http://youtu.be/t7Z1a_BOkN4]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/17.MultivariateNormal.pdf PDF file] or [http://wpressutexas.net/coursefiles/17.MultivariateNormal.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. Calculate the Jacobian determinant of the transformation of variables defined by<br />
<br />
<math>y_1 = x_1/x_2, \qquad y_2 = x_2^2</math><br />
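If you want to spot-check your analytic determinant, a finite-difference sketch like the following (Python assumed; the evaluation points are arbitrary) compares it numerically at a point:<br />
<br />
```python
import numpy as np

def transform(x):
    x1, x2 = x
    return np.array([x1 / x2, x2**2])

def jacobian_det(f, x, h=1e-6):
    """Numerical Jacobian determinant of f at x via central differences."""
    n = len(x)
    J = np.zeros((n, n))
    for j in range(n):
        dx = np.zeros(n)
        dx[j] = h
        J[:, j] = (f(x + dx) - f(x - dx)) / (2 * h)
    return np.linalg.det(J)

print(jacobian_det(transform, np.array([1.3, 0.7])))  # compare with your analytic answer
```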
<br />
2. Consider the 3-dimensional multivariate normal over <math>(x_1,x_2,x_3)</math> with <math>\mu = (-1,-1,-1)</math> and<br />
<br />
<math>\Sigma^{-1} = \left(<br />
\begin{array}{ccc}<br />
5 & -1 & 2 \\<br />
-1 & 8 & 1 \\<br />
2 & 1 & 4<br />
\end{array}<br />
\right)</math>. (Note the matrix inverse notation.)<br />
<br />
What are 2-dimensional <math>\mu</math> and <math>\Sigma^{-1}</math> for<br />
<br />
(a) the distribution on the slice <math>x_3=0</math>?<br />
<br />
(b) the marginalization over <math>x_3</math>?<br />
<br />
Hint: The answers are all simple rationals, but I had to use Mathematica to work them out.<br />
<br />
====To Think About====<br />
1. Prove the assertions in slide 5. That is, implement the ideas in the blue text.<br />
<br />
2. How would you plot an error ellipsoid in 3 dimensions? That is, what would be the 3-dimensional version of the code in slide 8? (You can assume the plotting capabilities of your favorite programming language.)<br />
<br />
====Class Activity====<br />
<br />
[[Media:somematrices.txt|Some 3x3 Matrices]]<br />
<br />
[http://wpressutexas.net/coursewiki/images/f/f4/Multivar_normal.pdf MVN Exercise]<br />
<br />
[[Media:MultivarGaussExample.nb.txt|Bill's Mathematica notebook for problem 2 (above)]]. (Download file, rename as MultivarGaussExample.nb, then open in Mathematica.)</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_15._The_Towne_Family_-_Again&diff=3408Segment 15. The Towne Family - Again2016-04-22T18:33:40Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/Y-i0CN15X-M&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/Y-i0CN15X-M http://youtu.be/Y-i0CN15X-M]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/15.TheTowneFamilyAgain.pdf PDF file] or [http://wpressutexas.net/coursefiles/15.TheTowneFamilyAgain.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. In slide 4, we used "posterior predictive p-value" to get the respective p-values 1.0e-13, .01, .12, and .0013. What if we had mistakenly just used the maximum likelihood estimate r=0.003, instead of integrating over r? What p-values would we have obtained?<br />
====To Think About====<br />
1. Can you think of a unified way to handle the Towne family problem (estimating r and deciding which family members are likely "non-paternal") without trimming the data? We'll show one such method in a later segment, but there is likely more than one possible good answer.<br />
<br />
===Class Activity===<br />
<br />
We divided into three teams. Each team prepared a single solution set for last year's [[Media:SurpriseQuiz20130304.pdf|surprise quiz]] of around this date.<br />
<br />
Here are the three solutions:<br />
<br />
[[02-24-14 -- Group 1 -- Group Quiz|Team 1]]<br />
<br />
[[Group Two: The Towne Family - Again, Class Activity]]<br />
<br />
[[Media:Team3teamquiz.pdf|Team 3 scanned PDF]]<br />
<br />
Every class member gets to vote for TWO of these for which is best, your own team and one other. You must vote for two, not just 1. Please edit this page to add your (screen) name to two of the following lists:<br />
<br />
Team 1 votes: Vsub, Jonathan, Daniel, Aaron, Deepesh, Sanmit, Todd, Nick, Eleisha, Andrea, Rene<br />
<br />
Team 2 votes: Todd, Eleisha, Elad<br />
<br />
Team 3 votes: Vsub, Jonathan, Daniel, Aaron, Deepesh, Sanmit, Nick, Andrea, Elad, Rene<br />
<br />
Here is Bill's solution set from last year. (I wasn't trying to be as complete or neat as I expect this year's teams to be.) [[Media:MidtermSolutions20130317.pdf|Solutions]]<br />
<br />
===Voting Comments===<br />
Vsub: Team-1: 8/9 (lacking in explanations). Team-2: 6.5/9 (lacks explanation, nice plots, too concise on 6,7). Team-3: 7/9 (incomplete ans3; incorrect answers 6,7; good detailed derivations). Vote preference order: Team1 > Team3 > Team2</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_16._Multiple_Hypotheses&diff=3407Segment 16. Multiple Hypotheses2016-04-22T18:32:18Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/w6AjduOEN2k&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/w6AjduOEN2k http://youtu.be/w6AjduOEN2k]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/16.MultipleHypotheses.pdf PDF file] or [http://wpressutexas.net/coursefiles/16.MultipleHypotheses.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. Simulate the following: You have M=50 p-values, none actually causal, so that they are drawn from a uniform distribution. Not knowing this sad fact, you apply the Benjamini-Hochberg prescription with <math>\alpha=0.05</math> and possibly declare some discoveries as true. By repeated simulation, estimate the probability of thus getting N wrongly-called discoveries, for N=0, 1, 2, and 3.<br />
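One possible shape for the simulation (a sketch, assuming Python/NumPy; the trial count is an arbitrary choice):<br />
<br />
```python
import numpy as np

rng = np.random.default_rng(1)
M, alpha, trials = 50, 0.05, 20000

def bh_discoveries(p, alpha):
    """Number of discoveries made by the Benjamini-Hochberg procedure."""
    p_sorted = np.sort(p)
    k = np.arange(1, len(p) + 1)
    passing = np.nonzero(p_sorted <= k * alpha / len(p))[0]
    return 0 if passing.size == 0 else passing[-1] + 1

# All nulls true: the p-values are i.i.d. Uniform(0,1)
counts = np.zeros(4)
for _ in range(trials):
    n = bh_discoveries(rng.random(M), alpha)
    if n < 4:
        counts[n] += 1

print(counts / trials)  # estimates of P(N=0), P(N=1), P(N=2), P(N=3)
```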
<br />
2. Does the distribution that you found in problem 1 depend on M? On <math>\alpha</math>? Derive its form analytically<br />
for the usual case of <math>\alpha \ll 1</math>.<br />
<br />
====To Think About====<br />
1. Suppose you have M independent trials of an experiment, each of which yields an independent p-value. Fisher proposed combining them by forming the statistic<br />
<br />
<math>S = -2\sum_{i=1}^{M}\log(p_i)</math><br />
<br />
Show that, under the null hypothesis, S is distributed as <math>\text{Chisquare}(2M)</math> and describe how you would obtain a combined p-value for this statistic.<br />
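Once the <math>\text{Chisquare}(2M)</math> result is established, the combined p-value is just a chi-square tail probability. A sketch (SciPy assumed; the example p-values are made up):<br />
<br />
```python
import numpy as np
from scipy.stats import chi2

def fisher_combined_pvalue(pvals):
    """Combine independent p-values via Fisher's method.

    Under the null, S = -2*sum(log p_i) ~ Chisquare(2M), so the combined
    p-value is the upper tail probability of S under that distribution.
    """
    pvals = np.asarray(pvals)
    S = -2.0 * np.sum(np.log(pvals))
    return chi2.sf(S, df=2 * len(pvals))

print(fisher_combined_pvalue([0.03, 0.2, 0.07]))
```
<br />
A useful consistency check: for M=1 the combined p-value equals the original p-value.<br />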
<br />
2. Fisher is sometimes credited, on the basis of problem 1, with having invented "meta-analysis", whereby results from multiple investigations can be combined to get an overall more significant result. Can you see any pitfalls in this?<br />
<br />
===Class Activity===<br />
[http://wpressutexas.net/coursefiles/p_value_follow_ups.pdf P-value follow-ups]<br />
<br />
*[[Team 1 - Feb 21 Activity]]<br />
*[http://wpressutexas.net/coursewiki/index.php/Segment_16...Multiple_Hypotheses Team Girls + Sanmit - Feb 21 Activity]<br />
*[[Team3-021714part2]]<br />
*[[Feb20-Team4-P-value follow up]]<br />
<br />
Here is John's written up solution: [http://wpressutexas.net/coursewiki/images/f/fd/Pvalue_examples.pdf Pvalue Examples Solutions].</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_14._Bayesian_Criticism_of_P-Values&diff=3406Segment 14. Bayesian Criticism of P-Values2016-04-22T18:31:19Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/IKV6Pn18C7o&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/IKV6Pn18C7o http://youtu.be/IKV6Pn18C7o]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/14.BayesianCriticismOfPValues.pdf PDF file] or [http://wpressutexas.net/coursefiles/14.BayesianCriticismOfPValues.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. Suppose the stopping rule is "flip exactly 10 times" and the data is that 8 out of 10 flips are heads. With what p-value can you rule out the hypothesis that the coin is fair? Is this statistically significant?<br />
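For problem 1, the binomial tail sum can be checked in a few lines (pure Python; whether and how to make the test two-sided is part of what the problem asks you to decide):<br />
<br />
```python
from math import comb

n, k = 10, 8
# Probability, under a fair coin, of k or more heads in n flips
upper_tail = sum(comb(n, j) for j in range(k, n + 1)) / 2**n
print(upper_tail, 2 * upper_tail)  # one-tail probability, and doubled for a two-sided test
```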
<br />
2. Suppose that, as a Bayesian, you see 10 flips of which 8 are heads. Also suppose that your prior for the coin being fair is 0.75. What is the posterior probability that the coin is fair? (Make any other reasonable assumptions about your prior as necessary.)<br />
<br />
3. For the experiment in the segment, what if the stopping rule was (perversely) "flip until I see five consecutive heads followed immediately by a tail, then count the total number of heads"? What would be the p-value?<br />
====To Think About====<br />
1. If biology journals require p<0.05 for results to be published, does this mean that one in twenty biology results are wrong (in the sense that the uninteresting null hypothesis is actually true rather than disproved)? Why might it be worse, or better, than this? (See also the provocative [http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0020124 paper by Ioannidis], and [http://www.technologyreview.com/view/510126/the-statistical-puzzle-over-how-much-biomedical-research-is-wrong/ this blog] in Technology Review (whose main source is [http://arxiv.org/ftp/arxiv/papers/1301/1301.3718.pdf this article]). Also [http://www.sciencemag.org/content/331/6015/272.full this news story] about ESP research. You can Google for other interesting references.)</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_13._The_Yeast_Genome&diff=3405Segment 13. The Yeast Genome2016-04-22T18:30:21Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/QSgUX-Do8Tc&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/QSgUX-Do8Tc http://youtu.be/QSgUX-Do8Tc]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/13.TheYeastGenome.pdf PDF file] or [http://wpressutexas.net/coursefiles/13.TheYeastGenome.ppt PowerPoint file]<br />
<br />
Link to the file mentioned in the segment: [http://slate.ices.utexas.edu/coursefiles/SacCerChr4.txt.zip SacCerChr4.txt]<br />
<br />
Link to all yeast chromosomes: [http://hgdownload.cse.ucsc.edu/goldenPath/sacCer3/chromosomes/ UCSC]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. With p=0.3, and various values of n, how big is the largest discrepancy between the Binomial probability pdf and the approximating Normal pdf? At what value of n does this value become smaller than <math>10^{-15}</math>?<br />
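One way to explore problem 1 numerically (SciPy assumed; this compares the pmf at the integers against the density of the moment-matched normal, which is one reasonable reading of "discrepancy"):<br />
<br />
```python
import numpy as np
from scipy.stats import binom, norm

def max_discrepancy(n, p=0.3):
    """Largest |Binomial(n,p) pmf - Normal pdf| over k = 0..n, matching mean and variance."""
    k = np.arange(n + 1)
    mu, sigma = n * p, np.sqrt(n * p * (1 - p))
    return np.max(np.abs(binom.pmf(k, n, p) - norm.pdf(k, mu, sigma)))

for n in (10, 100, 1000):
    print(n, max_discrepancy(n))
```
<br />
Increasing n until the printed value drops below <math>10^{-15}</math> answers the second part; beware of floating-point limits at that scale.<br />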
<br />
2. Show that if four random variables are (together) multinomially distributed, each separately is binomially distributed.<br />
<br />
====To Think About====<br />
1. The segment suggests that <math>A\ne T</math> and <math>C\ne G</math> come about because genes are randomly distributed on one strand or the other. Could you use the observed discrepancies to estimate, even roughly, the number of genes in the yeast genome? If so, how? If not, why not?<br />
<br />
2. Suppose that a Bayesian thinks that the prior probability of the hypothesis that "<math>P_A=P_T</math>" is 0.9,<br />
and that the set of all hypotheses that "<math>P_A\ne P_T</math>" have a total prior of 0.1. How might he calculate the odds ratio <math>\text{Prob}(P_A=P_T)/\text{Prob}(P_A\ne P_T)</math>? Hint: Are there nuisance variables to be marginalized over?<br />
<br />
===Class Activity===<br />
<br />
[http://wpressutexas.net/coursefiles/chrIV.txt Yeast chromosome 4]<br />
<br />
[http://wpressutexas.net/coursefiles/yeast_ORFs Activity slides]</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_12._P-Value_Tests&diff=3404Segment 12. P-Value Tests2016-04-22T18:29:19Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/2Ul7TI0B5ek&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/2Ul7TI0B5ek http://youtu.be/2Ul7TI0B5ek]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/12.P-ValueTests.pdf PDF file] or [http://wpressutexas.net/coursefiles/12.P-ValueTests.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. What is the critical region for a 5% two-sided test if, under the null hypothesis, the test statistic is distributed as <math>\text{Student}(0,\sigma,4)</math>? That is, what values of the test statistic disprove the null hypothesis with p < 0.05? (OK to use Python, MATLAB, or Mathematica.)<br />
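A quick way to get the problem-1 quantile (SciPy assumed; t.ppf inverts the CDF of the standard Student distribution, and the result scales linearly with <math>\sigma</math>):<br />
<br />
```python
from scipy.stats import t

sigma = 1.0                         # scale parameter; critical values scale with it
crit = t.ppf(0.975, df=4) * sigma   # two-sided 5% test puts 2.5% in each tail
print(crit)                         # reject the null when |statistic| exceeds this
```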
<br />
2. For an exponentially distributed test statistic with mean <math>\mu</math> (under the null hypothesis), when is the null hypothesis disproved with p < 0.01 for a one-sided test? for a two-sided test?<br />
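For the exponential case the thresholds follow from inverting the CDF; a sketch (the equal-tail split is one convention for the two-sided test, not the only one):<br />
<br />
```python
from math import log

mu = 1.0  # mean of the exponential under the null (illustrative value)

# One-sided: reject when P(X > x) = exp(-x/mu) < 0.01
one_sided = mu * log(100)

# Two-sided with equal 0.005 tails: the exponential also has a lower tail near zero
lower = -mu * log(1 - 0.005)   # P(X < x) = 1 - exp(-x/mu) < 0.005
upper = mu * log(200)          # P(X > x) = exp(-x/mu) < 0.005
print(one_sided, lower, upper)
```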
<br />
====To Think About====<br />
1. P-value tests require an initial choice of a test statistic. What goes wrong if you choose a poor test statistic? What would make it poor?<br />
<br />
2. If the null hypothesis is that a coin is fair, and you record the results of N flips, what is a good test statistic? Are there any other possible test statistics?<br />
<br />
3. Why is it so hard for a Bayesian to do something as simple as, given some data, disproving a null hypothesis? Can't she just compute a Bayes odds ratio, P(null hypothesis is true)/P(null hypothesis is false) and derive a probability that the null hypothesis is true?<br />
<br />
===Class Activity===</div>Bill Presshttp://wpressutexas.net/coursewiki/index.php?title=Segment_11._Random_Deviates&diff=3403Segment 11. Random Deviates2016-04-22T18:28:54Z<p>Bill Press: </p>
<hr />
<div>====Watch this segment====<br />
(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)<br />
<br />
{{#widget:Iframe<br />
|url=http://www.youtube.com/v/4r1GlyisB8E&hd=1<br />
|width=800<br />
|height=625<br />
|border=0<br />
}}<br />
<br />
The direct YouTube link is [http://youtu.be/4r1GlyisB8E http://youtu.be/4r1GlyisB8E]<br />
<br />
Links to the slides: [http://wpressutexas.net/coursefiles/11.RandomDeviates.pdf PDF file] or [http://wpressutexas.net/coursefiles/11.RandomDeviates.ppt PowerPoint file]<br />
<br />
===Problems===<br />
====To Calculate====<br />
1. For the Cauchy distribution (Segment 8, Slide 3), find the inverse function of the CDF.<br />
<br />
2. In your favorite programming language, write a function that returns independent Cauchy deviates.<br />
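If you want to check your problem-1 answer empirically, here is one common sketch (inverse-CDF sampling; it assumes you derived the CDF <math>F(x) = \tfrac{1}{2} + \arctan((x-\mu)/\sigma)/\pi</math>):<br />
<br />
```python
import math
import random

def cauchy_deviate(mu=0.0, sigma=1.0, rng=random):
    """One Cauchy(mu, sigma) deviate: apply the inverse CDF to a Uniform(0,1) draw."""
    u = rng.random()
    return mu + sigma * math.tan(math.pi * (u - 0.5))

random.seed(7)
draws = [cauchy_deviate() for _ in range(100000)]
draws.sort()
print(draws[50000])  # the sample median should sit near mu (the mean does not exist!)
```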
<br />
====To Think About====<br />
1. Suppose you want a function that returns deviates for Student<math>(\nu)</math>. Could you use the Cauchy pdf (or some scaling of it) as a bounding function in a rejection method? How efficient is this (i.e., what fraction of the time does it reject)?<br />
<br />
2. Explain the three inequality tests in the "while" statement in Leva's algorithm (slide 7) and why they are hooked together with logical operators in the way shown.<br />
<br />
===Class Activity===<br />
[[Build your own random number generator]]</div>Bill Press