# Segment 39. MCMC and Gibbs Sampling

#### Watch this segment

(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)

The direct YouTube link is http://youtu.be/4gNpgSPal_8

Links to the slides: PDF file or PowerPoint file

### Problems

### To Calculate

1. Suppose the domain of a model are the five integers **Failed to parse (unknown error): x = \{1,2,3,4,5\}**
, and that your proposal distribution is: "When **Failed to parse (unknown error): x_1 = 2,3,4**
, choose with equal probability **Failed to parse (unknown error): x_2 = x_1 \pm 1**
. For **Failed to parse (unknown error): x_1=1**
always choose **Failed to parse (unknown error): x_2 =2**
. For **Failed to parse (unknown error): x_1=5**
always choose **Failed to parse (unknown error): x_2 =4**
. What is the ratio of **Failed to parse (unknown error): q**
's that goes into the acceptance probability **Failed to parse (unknown error): \alpha(x_1,x_2)**
for all the possible values of **Failed to parse (unknown error): x_1**
and **Failed to parse (unknown error): x_2**
?

2. Suppose the domain of a model is **Failed to parse (unknown error): -\infty < x < \infty**
and your proposal distribution is (perversely),

**Failed to parse (unknown error): q(x_2|x_1) = \begin{cases}\tfrac{7}{2}\exp[-7(x_2-x_1)],\quad & x_2 \ge x_1 \\ \tfrac{5}{2}\exp[-5(x_1-x_2)],\quad & x_2 < x_1 \end{cases}**

Sketch this distribution as a function of **Failed to parse (unknown error): x_2-x_1**
. Then, write down an expression for the ratio of **Failed to parse (unknown error): q**
's that goes into the acceptance probability **Failed to parse (unknown error): \alpha(x_1,x_2)**
.

### To Think About

1. Suppose an urn contains 7 large orange balls, 3 medium purple balls, and 5 small green balls. When balls are drawn randomly, the larger ones are more likely to be drawn, in the proportions large:medium:small = 6:4:3. You want to draw exactly 6 balls, one at a time without replacement. How would you use Gibbs sampling to learn: (a) How often do you get 4 orange plus 2 of the same (non-orange) color? (b) What is the expectation (mean) of the product of the number of purple and number of green balls drawn?

2. How would you do the same problem computationally but without Gibbs sampling?

3. How would you do the same problem non-stochastically (e.g., obtain answers to 12 significant figures)? (Hint: This is known as the Wallenius non-central hypergeometric distribution.)

[Answers: 0.155342 and 1.34699]

### Class Activity

There's a story here, about diagnosing rats by which branches they pick in a maze. Bill will explain in class. Unless he thinks up a better story.

Mathematically, it's another one of these amazing Gibbs sampling examples. Suppose 2 unknown distributions over the digits 0..9, that is **Failed to parse (unknown error): p_0,p_1,\ldots,p_9**
and **Failed to parse (unknown error): q_0,q_1,\ldots,q_9**
, of course with **Failed to parse (unknown error): \sum_i p_i = 1**
and **Failed to parse (unknown error): \sum_i q_i = 1**
. This data file has 1000 lines, each with 10 i.i.d. draws of digits, either from the **Failed to parse (unknown error): p**
's or the **Failed to parse (unknown error): q**
's -- but, for each line, you don't know which.

1. Estimate **Failed to parse (unknown error): p_0,p_1,\ldots,p_9**
and **Failed to parse (unknown error): q_0,q_1,\ldots,q_9**
from the data. If you are ambitious, do this by two different methods: First, by Gibbs sampling. Second, by an E-M method. (Although these are conceptually different, my code for them differs by only a few lines.)

2. Estimate a probability for each line in the data file as to whether it is drawn from the **Failed to parse (unknown error): p_i**
's (as opposed to the **Failed to parse (unknown error): q_i**
's.

3. Plot histograms that show the uncertainties of your Gibbs estimate for the **Failed to parse (unknown error): p_i**
's. Do your E-M estimates appear to be at the modes of your Gibbs histograms? Should they be?