# Segment 39. MCMC and Gibbs Sampling

#### Watch this segment

(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)

{{#widget:Iframe |url=http://www.youtube.com/v/4gNpgSPal_8&hd=1 |width=800 |height=625 |border=0 }}

The direct YouTube link is http://youtu.be/4gNpgSPal_8

Links to the slides: PDF file or PowerPoint file

### Problems

### To Calculate

1. Suppose the domain of a model are the five integers **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle x = \{1,2,3,4,5\}}**
, and that your proposal distribution is: "When **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle x_1 = 2,3,4}**
, choose with equal probability **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle x_2 = x_1 \pm 1}**
. For **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle x_1=1}**
always choose **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle x_2 =2}**
. For **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle x_1=5}**
always choose **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle x_2 =4}**
. What is the ratio of **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle q}**
's that goes into the acceptance probability **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \alpha(x_1,x_2)}**
for all the possible values of **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle x_1}**
and **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle x_2}**
?

2. Suppose the domain of a model is **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle -\infty < x < \infty}**
and your proposal distribution is (perversely),

**Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle q(x_2|x_1) = \begin{cases}\tfrac{7}{2}\exp[-7(x_2-x_1)],\quad & x_2 \ge x_1 \\ \tfrac{5}{2}\exp[-5(x_1-x_2)],\quad & x_2 < x_1 \end{cases}}**

Sketch this distribution as a function of **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle x_2-x_1}**
. Then, write down an expression for the ratio of **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle q}**
's that goes into the acceptance probability **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \alpha(x_1,x_2)}**
.

### To Think About

1. Suppose an urn contains 7 large orange balls, 3 medium purple balls, and 5 small green balls. When balls are drawn randomly, the larger ones are more likely to be drawn, in the proportions large:medium:small = 6:4:3. You want to draw exactly 6 balls, one at a time without replacement. How would you use Gibbs sampling to learn: (a) How often do you get 4 orange plus 2 of the same (non-orange) color? (b) What is the expectation (mean) of the product of the number of purple and number of green balls drawn?

2. How would you do the same problem computationally but without Gibbs sampling?

3. How would you do the same problem non-stochastically (e.g., obtain answers to 12 significant figures)? (Hint: This is known as the Wallenius non-central hypergeometric distribution.)

[Answers: 0.155342 and 1.34699]

### Class Activity

There's a story here, about diagnosing rats by which branches they pick in a maze. Bill will explain in class. Unless he thinks up a better story.

Mathematically, it's another one of these amazing Gibbs sampling examples. Suppose 2 unknown distributions over the digits 0..9, that is **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle p_0,p_1,\ldots,p_9}**
and **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle q_0,q_1,\ldots,q_9}**
, of course with **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \sum_i p_i = 1}**
and **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \sum_i q_i = 1}**
. This data file has 1000 lines, each with 10 i.i.d. draws of digits, either from the **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle p}**
's or the **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle q}**
's -- but, for each line, you don't know which.

1. Estimate **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle p_0,p_1,\ldots,p_9}**
and **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle q_0,q_1,\ldots,q_9}**
from the data. If you are ambitious, do this by two different methods: First, by Gibbs sampling. Second, by an E-M method. (Although these are conceptually different, my code for them differs by only a few lines.)

2. Estimate a probability for each line in the data file as to whether it is drawn from the **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle p_i}**
's (as opposed to the **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle q_i}**
's.

3. Plot histograms that show the uncertainties of your Gibbs estimate for the **Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle p_i}**
's. Do your E-M estimates appear to be at the modes of your Gibbs histograms? Should they be?