Segment 39. MCMC and Gibbs Sampling

Watch this segment

(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)

{{#widget:Iframe |url=http://www.youtube.com/v/4gNpgSPal_8&hd=1 |width=800 |height=625 |border=0 }}

Links to the slides: PDF file or PowerPoint file

To Calculate

1. Suppose the domain of a model are the five integers $\displaystyle x = \{1,2,3,4,5\}$ , and that your proposal distribution is: "When $\displaystyle x_1 = 2,3,4$ , choose with equal probability $\displaystyle x_2 = x_1 \pm 1$ . For $\displaystyle x_1=1$ always choose $\displaystyle x_2 =2$ . For $\displaystyle x_1=5$ always choose $\displaystyle x_2 =4$ . What is the ratio of $\displaystyle q$ 's that goes into the acceptance probability $\displaystyle \alpha(x_1,x_2)$ for all the possible values of $\displaystyle x_1$ and $\displaystyle x_2$ ?

2. Suppose the domain of a model is $\displaystyle -\infty < x < \infty$ and your proposal distribution is (perversely),

$\displaystyle q(x_2|x_1) = \begin{cases}\tfrac{7}{2}\exp[-7(x_2-x_1)],\quad & x_2 \ge x_1 \\ \tfrac{5}{2}\exp[-5(x_1-x_2)],\quad & x_2 < x_1 \end{cases}$

Sketch this distribution as a function of $\displaystyle x_2-x_1$ . Then, write down an expression for the ratio of $\displaystyle q$ 's that goes into the acceptance probability $\displaystyle \alpha(x_1,x_2)$ .

1. Suppose an urn contains 7 large orange balls, 3 medium purple balls, and 5 small green balls. When balls are drawn randomly, the larger ones are more likely to be drawn, in the proportions large:medium:small = 6:4:3. You want to draw exactly 6 balls, one at a time without replacement. How would you use Gibbs sampling to learn: (a) How often do you get 4 orange plus 2 of the same (non-orange) color? (b) What is the expectation (mean) of the product of the number of purple and number of green balls drawn?

2. How would you do the same problem computationally but without Gibbs sampling?

3. How would you do the same problem non-stochastically (e.g., obtain answers to 12 significant figures)? (Hint: This is known as the Wallenius non-central hypergeometric distribution.)

Mathematically, it's another one of these amazing Gibbs sampling examples. Suppose 2 unknown distributions over the digits 0..9, that is $\displaystyle p_0,p_1,\ldots,p_9$ and $\displaystyle q_0,q_1,\ldots,q_9$ , of course with $\displaystyle \sum_i p_i = 1$ and $\displaystyle \sum_i q_i = 1$ . This data file has 1000 lines, each with 10 i.i.d. draws of digits, either from the $\displaystyle p$ 's or the $\displaystyle q$ 's -- but, for each line, you don't know which.
1. Estimate $\displaystyle p_0,p_1,\ldots,p_9$ and $\displaystyle q_0,q_1,\ldots,q_9$ from the data. If you are ambitious, do this by two different methods: First, by Gibbs sampling. Second, by an E-M method. (Although these are conceptually different, my code for them differs by only a few lines.)
2. Estimate a probability for each line in the data file as to whether it is drawn from the $\displaystyle p_i$ 's (as opposed to the $\displaystyle q_i$ 's.
3. Plot histograms that show the uncertainties of your Gibbs estimate for the $\displaystyle p_i$ 's. Do your E-M estimates appear to be at the modes of your Gibbs histograms? Should they be?