Segment 39. MCMC and Gibbs Sampling

From Computational Statistics Course Wiki



Links to the slides: PDF file or PowerPoint file


To Calculate

1. Suppose the domain of a model is the five integers , and that your proposal distribution is: "When , choose with equal probability . For always choose . For always choose ." What is the ratio of 's that goes into the acceptance probability for all the possible values of and ?

2. Suppose the domain of a model is and your proposal distribution is (perversely),

Sketch this distribution as a function of . Then, write down an expression for the ratio of 's that goes into the acceptance probability .
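
For the first exercise, the quantity in question is the Metropolis-Hastings factor q(x1|x2)/q(x2|x1), which multiplies π(x2)/π(x1) inside the acceptance probability. The sketch below tabulates that factor under stated assumptions: the domain is taken to be the integers 1 through 5, interior points propose either neighbor with probability 1/2, and each endpoint always proposes its single neighbor. The names `STATES`, `q`, and `proposal_ratio` are illustrative and do not come from the course materials.

```python
from fractions import Fraction

# ASSUMPTION: the five-integer domain is 1..5.
STATES = [1, 2, 3, 4, 5]


def q(x2, x1):
    """Assumed proposal probability q(x2 | x1) of proposing x2 from x1."""
    lo, hi = STATES[0], STATES[-1]
    if x1 == lo:
        # Lower endpoint: always propose the neighbor above.
        return Fraction(1) if x2 == lo + 1 else Fraction(0)
    if x1 == hi:
        # Upper endpoint: always propose the neighbor below.
        return Fraction(1) if x2 == hi - 1 else Fraction(0)
    # Interior point: either neighbor with equal probability.
    return Fraction(1, 2) if abs(x2 - x1) == 1 else Fraction(0)


def proposal_ratio(x1, x2):
    """q(x1 | x2) / q(x2 | x1) for a proposed move x1 -> x2."""
    return q(x1, x2) / q(x2, x1)


# Tabulate the ratio for every move the proposal can actually make.
for x1 in STATES:
    for x2 in STATES:
        if q(x2, x1) > 0:
            print(f"{x1} -> {x2}: ratio = {proposal_ratio(x1, x2)}")
```

Only moves with nonzero proposal probability are listed, since no other move is ever proposed. Under these assumptions the ratio differs from 1 only for moves into or out of an endpoint.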

To Think About

1. Suppose an urn contains 7 large orange balls, 3 medium purple balls, and 5 small green balls. When balls are drawn randomly, the larger ones are more likely to be drawn, in the proportions large:medium:small = 6:4:3. You want to draw exactly 6 balls, one at a time without replacement. How would you use Gibbs sampling to learn: (a) How often do you get 4 orange plus 2 of the same (non-orange) color? (b) What is the expectation (mean) of the product of the number of purple and number of green balls drawn?

2. How would you do the same problem computationally but without Gibbs sampling?

3. How would you do the same problem non-stochastically (e.g., obtain answers to 12 significant figures)? (Hint: This is known as the Wallenius non-central hypergeometric distribution.)

[Answers: 0.155342 and 1.34699]
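
For questions 2 and 3, one possible approach (a sketch, not necessarily the intended solution) is to simulate the draws directly and, separately, to sum over all draw sequences with a recursion on the counts drawn so far. Both assume that each remaining ball is drawn with probability proportional to its size weight, which is the sequential scheme behind the Wallenius distribution. The function names `state_prob`, `exact_answers`, and `simulate` are hypothetical.

```python
import random
from functools import lru_cache

COUNTS = (7, 3, 5)    # orange, purple, green balls in the urn
WEIGHTS = (6, 4, 3)   # relative chance per ball: large, medium, small


@lru_cache(maxsize=None)
def state_prob(drawn):
    """Probability that the first sum(drawn) draws give these color counts."""
    if any(d < 0 or d > n for d, n in zip(drawn, COUNTS)):
        return 0.0
    if sum(drawn) == 0:
        return 1.0
    total = 0.0
    for i in range(3):
        # Condition on the last ball drawn having color i.
        prev = list(drawn)
        prev[i] -= 1
        if prev[i] < 0:
            continue
        remaining = sum(w * (n - d) for w, n, d in zip(WEIGHTS, COUNTS, prev))
        step = WEIGHTS[i] * (COUNTS[i] - prev[i]) / remaining
        total += state_prob(tuple(prev)) * step
    return total


def exact_answers(n_draws=6):
    """Return (P[4 orange + 2 of one other color], E[purple * green])."""
    finals = [(o, p, n_draws - o - p)
              for o in range(n_draws + 1)
              for p in range(n_draws + 1 - o)]
    prob_a = state_prob((4, 2, 0)) + state_prob((4, 0, 2))
    mean_pg = sum(state_prob(s) * s[1] * s[2] for s in finals)
    return prob_a, mean_pg


def simulate(n_trials=20_000, seed=0):
    """Direct Monte Carlo: draw 6 balls one at a time, no replacement."""
    rng = random.Random(seed)
    hits = 0
    prod_sum = 0
    for _trial in range(n_trials):
        drawn = [0, 0, 0]
        for _draw in range(6):
            w = [WEIGHTS[i] * (COUNTS[i] - drawn[i]) for i in range(3)]
            i = rng.choices(range(3), weights=w)[0]
            drawn[i] += 1
        hits += drawn[0] == 4 and 2 in (drawn[1], drawn[2])
        prod_sum += drawn[1] * drawn[2]
    return hits / n_trials, prod_sum / n_trials


print("recursion :", exact_answers())
print("simulation:", simulate())
```

The recursion uses ordinary floating point, so it is accurate to roughly machine precision; swapping the floats for `fractions.Fraction` would give exact rationals. The simulation carries Monte Carlo error that shrinks like one over the square root of `n_trials`.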

Class Activity

There's a story here, about diagnosing rats by which branches they pick in a maze. Bill will explain in class. Unless he thinks up a better story.

Mathematically, it's another one of these amazing Gibbs sampling examples. Suppose there are two unknown distributions over the digits 0..9, that is and , of course with and . This data file has 1000 lines, each with 10 i.i.d. draws of digits, either from the 's or the 's -- but, for each line, you don't know which.

1. Estimate and from the data. If you are ambitious, do this by two different methods: First, by Gibbs sampling. Second, by an E-M method. (Although these are conceptually different, my code for them differs by only a few lines.)

2. Estimate a probability for each line in the data file as to whether it is drawn from the 's (as opposed to the 's).

3. Plot histograms that show the uncertainties of your Gibbs estimate for the 's. Do your E-M estimates appear to be at the modes of your Gibbs histograms? Should they be?
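
One way to approach the three tasks above is sketched here on synthetic data, since the class data file is not reproduced on this page. Everything specific is an assumption made for illustration: the two distributions are called `p` and `q`, the "true" distributions are invented, the priors are flat Dirichlet and Beta, and `make_synthetic` and `fit` are hypothetical names. The `gibbs` flag switches between sampling assignments and parameters (Gibbs) and using soft assignments with maximum-likelihood updates (E-M), so the two methods differ by only a few lines.

```python
import numpy as np


def make_synthetic(n_lines=1000, n_draws=10, seed=1):
    """Synthetic stand-in for the data file (ASSUMED true distributions)."""
    rng = np.random.default_rng(seed)
    p_true = np.arange(1, 11) / 55.0   # favors high digits
    q_true = p_true[::-1].copy()       # favors low digits
    z_true = rng.random(n_lines) < 0.5  # True: line drawn from p
    counts = np.array([rng.multinomial(n_draws, p_true if z else q_true)
                       for z in z_true])
    return counts, p_true, q_true, z_true


def fit(counts, n_iter=400, burn_in=100, gibbs=True, seed=0):
    """Gibbs sampler (gibbs=True) or E-M (gibbs=False) for two digit sources.

    counts: (n_lines, 10) array of per-line digit counts.
    Returns post-burn-in p and q values and, for each line, the averaged
    probability that it came from p.
    """
    rng = np.random.default_rng(seed)
    n_lines, n_digits = counts.shape
    p = rng.dirichlet(np.ones(n_digits))  # random start breaks the symmetry
    q = rng.dirichlet(np.ones(n_digits))
    frac = 0.5                            # fraction of lines drawn from p
    p_keep, q_keep, resp_sum = [], [], np.zeros(n_lines)
    for it in range(n_iter):
        # Per-line posterior probability of "from p", given current p, q.
        log_odds = (counts @ (np.log(p) - np.log(q))
                    + np.log(frac) - np.log1p(-frac))
        resp = 1.0 / (1.0 + np.exp(-np.clip(log_odds, -700, 700)))
        if gibbs:
            # Sample hard assignments, then sample parameters given them.
            z = (rng.random(n_lines) < resp).astype(float)
            p = rng.dirichlet(1.0 + z @ counts)
            q = rng.dirichlet(1.0 + (1.0 - z) @ counts)
            frac = rng.beta(1.0 + z.sum(), 1.0 + n_lines - z.sum())
        else:
            # E-M: keep soft assignments, set parameters to weighted MLEs.
            p = resp @ counts + 1e-9
            p /= p.sum()
            q = (1.0 - resp) @ counts + 1e-9
            q /= q.sum()
            frac = float(np.clip(resp.mean(), 1e-6, 1 - 1e-6))
        if it >= burn_in:
            p_keep.append(p)
            q_keep.append(q)
            resp_sum += resp
    return np.array(p_keep), np.array(q_keep), resp_sum / (n_iter - burn_in)


counts, p_true, q_true, z_true = make_synthetic()
p_gibbs, q_gibbs, line_prob = fit(counts, gibbs=True)
p_em, q_em, _ = fit(counts, gibbs=False)
print("Gibbs mean p:", np.round(p_gibbs.mean(axis=0), 3))
print("E-M p       :", np.round(p_em[-1], 3))
```

Each column of `p_gibbs` holds the samples for one digit's probability and can be passed to a histogram routine to display its uncertainty; `line_prob` addresses the second task. The labels are interchangeable, so a run may recover the two distributions with the names `p` and `q` swapped.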


Jeff's solution