Segment 41. Markov Chain Monte Carlo, Example 2

From Computational Statistics (CSE383M and CS395T)

Watch this segment


The direct YouTube link is

Links to the slides: PDF file or PowerPoint file


To Calculate

1. Show that the waiting times (times between events) in a Poisson process are Exponentially distributed. (I think we've done this before.)

2. Plot the pdf's of the waiting times between (a) every other Poisson event, and (b) every Poisson event at half the rate.

3. Show, using characteristic functions, that the waiting times between every Nth event in a Poisson process are Gamma distributed. (I think we've also done this one before, but it is newly relevant in this segment.)
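
As a starting point for item 2, here is a minimal sketch that tabulates the two pdfs and checks them against simulation. It assumes the result stated in item 3: the wait between every other event is a sum of two Exponential waits, hence Gamma with shape 2. The rate of 1, the grid, and the helper names `gamma_pdf` and `waits_every_nth` are illustrative choices, not taken from the segment.

```python
import math
import random
import statistics

def gamma_pdf(t, shape, rate):
    """Gamma(shape, rate) density at t > 0; shape = 1 gives the Exponential."""
    log_pdf = (shape * math.log(rate) + (shape - 1.0) * math.log(t)
               - rate * t - math.lgamma(shape))
    return math.exp(log_pdf)

def waits_every_nth(rng, rate, n, n_samples):
    """Simulated waits between every n-th event: sums of n Exponential gaps."""
    return [sum(rng.expovariate(rate) for _ in range(n))
            for _ in range(n_samples)]

rate = 1.0                                # arbitrary illustrative rate
grid = [0.25 * i for i in range(1, 41)]   # t = 0.25, 0.50, ..., 10.0

# (a) every other event at rate `rate`: Gamma(shape=2, rate)
pdf_every_other = [gamma_pdf(t, 2, rate) for t in grid]
# (b) every event at half the rate: Exponential(rate / 2)
pdf_half_rate = [gamma_pdf(t, 1, rate / 2.0) for t in grid]

rng = random.Random(1)
sim_a = waits_every_nth(rng, rate, 2, 20000)
sim_b = waits_every_nth(rng, rate / 2.0, 1, 20000)

print("t      (a) every other   (b) half rate")
for t, a, b in zip(grid[3::4], pdf_every_other[3::4], pdf_half_rate[3::4]):
    print(f"{t:5.2f}  {a:15.4f}  {b:14.4f}")

print("simulated means:    ", statistics.fmean(sim_a), statistics.fmean(sim_b))
print("simulated variances:", statistics.pvariance(sim_a), statistics.pvariance(sim_b))
```

The lists `grid`, `pdf_every_other`, and `pdf_half_rate` can be handed to any plotting tool. Both cases have the same mean wait, but (a) vanishes at <math>t=0</math> while (b) is largest there.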

To Think About

1. In slide 5, showing the results of the MCMC, how can we be sure (or, how can we gather quantitative evidence) that there won't be another discrete change in <math>k_1</math> or <math>k_2</math> if we keep running the model longer? That is, how can we measure convergence of the model?

2. Suppose you have two hypotheses: H1 is that a set of times <math>t_i</math> are being generated as every 26th event from a Poisson process with rate 26. H2 is that they are every 27th event from a Poisson process with rate 27. (The mean rate is thus the same in both cases.) How would you estimate the number <math>N</math> of data points <math>t_i</math> that you need to clearly distinguish between these hypotheses?
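
One possible way to approach item 2, offered as a rough sketch rather than the intended answer: compute the expected log-likelihood ratio per data point (the Kullback-Leibler divergence between the two Gamma waiting-time distributions) and ask how many points it takes for that to accumulate to a chosen odds threshold. The threshold of 100:1, the helper names, and the numerical digamma (a central difference of `math.lgamma`) are all assumptions made here for illustration.

```python
import math

def digamma(x, h=1e-4):
    """Approximate digamma via a central difference of math.lgamma."""
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2.0 * h)

def expected_log_ratio(shape1, rate1, shape2, rate2):
    """E[ln p1(t) - ln p2(t)] for t ~ Gamma(shape1, rate1), in nats.

    This is the KL divergence of hypothesis 2 from hypothesis 1. It uses
    E[t] = shape/rate and E[ln t] = digamma(shape) - ln(rate).
    """
    mean_t = shape1 / rate1
    mean_log_t = digamma(shape1) - math.log(rate1)

    def mean_log_pdf(shape, rate):
        # Expected log density of Gamma(shape, rate) under hypothesis 1.
        return (shape * math.log(rate) + (shape - 1.0) * mean_log_t
                - rate * mean_t - math.lgamma(shape))

    return mean_log_pdf(shape1, rate1) - mean_log_pdf(shape2, rate2)

def points_needed(odds, shape1, rate1, shape2, rate2):
    """Data points for the mean log odds to reach ln(odds), if H1 is true."""
    return math.log(odds) / expected_log_ratio(shape1, rate1, shape2, rate2)

kl_h1 = expected_log_ratio(26, 26.0, 27, 27.0)   # data truly from H1
kl_h2 = expected_log_ratio(27, 27.0, 26, 26.0)   # data truly from H2
print("nats per point if H1 is true:", kl_h1)
print("nats per point if H2 is true:", kl_h2)
print("N for 100:1 odds if H1 is true:", points_needed(100.0, 26, 26.0, 27, 27.0))
```

This only tracks the mean of the accumulated log odds. The actual log odds fluctuate from sample to sample, so a careful estimate would also compare that mean against its standard deviation.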

Class Activity

Here's another one of these amazing Gibbs sampling examples. Suppose there are two unknown distributions over the digits 0..9, that is <math>p_0,p_1,\ldots,p_9</math> and <math>q_0,q_1,\ldots,q_9</math>, of course with <math>\sum_i p_i = 1</math> and <math>\sum_i q_i = 1</math>. This data file has 1000 lines, each with 10 i.i.d. draws of digits, either from the <math>p</math>'s or the <math>q</math>'s -- but, for each line, you don't know which.

1. Estimate <math>p_0,p_1,\ldots,p_9</math> and <math>q_0,q_1,\ldots,q_9</math> from the data. If you are ambitious, do this by two different methods: First, by Gibbs sampling. Second, by an E-M method. (Although these are conceptually different, my code for them differs by only a few lines.)

2. Estimate a probability for each line in the data file as to whether it is drawn from the <math>p_i</math>'s (as opposed to the <math>q_i</math>'s).

3. Plot histograms that show the uncertainties of your Gibbs estimate for the <math>p_i</math>'s. Do your E-M estimates appear to be at the modes of your Gibbs histograms? Should they be?
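
For orientation, here is a minimal sketch of the E-M variant on synthetic data. It is not the instructor's code, and the course data file is not used: `make_synthetic_counts` is a hypothetical stand-in that generates lines from made-up `true_p` and `true_q`. The sketch also assumes an unknown fraction `w` of lines drawn from the <math>p</math>'s, which the activity does not specify. Because the two distributions are interchangeable, the estimates may come out with the labels swapped.

```python
import math
import random

DIGITS = range(10)

def make_synthetic_counts(rng, p, q, n_lines=1000, n_draws=10, frac_p=0.5):
    """Hypothetical stand-in for the data file: digit counts for each line."""
    rows, from_p = [], []
    for _ in range(n_lines):
        is_p = rng.random() < frac_p
        draws = rng.choices(DIGITS, weights=p if is_p else q, k=n_draws)
        rows.append([draws.count(d) for d in DIGITS])
        from_p.append(is_p)
    return rows, from_p

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def normalized(weights, floor=1e-12):
    """Scale weights to sum to 1, flooring them so that logs stay finite."""
    floored = [max(w, floor) for w in weights]
    total = sum(floored)
    return [w / total for w in floored]

def em_two_distributions(rows, rng, n_iter=100):
    """E-M for a two-component mixture of categorical distributions."""
    p = normalized([rng.random() + 0.5 for _ in DIGITS])  # random start
    q = normalized([rng.random() + 0.5 for _ in DIGITS])
    w = 0.5                                # assumed fraction of lines from p
    resp = []
    for _ in range(n_iter):
        # E-step: probability that each line came from p, given p, q, w.
        log_ratio = [math.log(p[d]) - math.log(q[d]) for d in DIGITS]
        prior = math.log(w) - math.log(1.0 - w)
        resp = [sigmoid(prior + sum(n * log_ratio[d] for d, n in enumerate(row)))
                for row in rows]
        # M-step: re-estimate p, q, w from responsibility-weighted counts.
        p = normalized([sum(r * row[d] for r, row in zip(resp, rows))
                        for d in DIGITS])
        q = normalized([sum((1.0 - r) * row[d] for r, row in zip(resp, rows))
                        for d in DIGITS])
        w = min(max(sum(resp) / len(resp), 1e-6), 1.0 - 1e-6)
    return p, q, w, resp   # resp comes from the final E-step

rng = random.Random(0)
true_p = [0.19, 0.17, 0.15, 0.13, 0.11, 0.09, 0.07, 0.05, 0.03, 0.01]
true_q = true_p[::-1]                      # made-up test distributions
rows, from_p = make_synthetic_counts(rng, true_p, true_q)
p_hat, q_hat, w_hat, resp = em_two_distributions(rows, rng)
print("p_hat:", [round(x, 3) for x in p_hat])
print("q_hat:", [round(x, 3) for x in q_hat])
print("w_hat:", round(w_hat, 3))
```

A Gibbs version could reuse the same structure: instead of keeping the responsibilities as weights, draw a hard label for each line with those probabilities, then draw the <math>p</math>'s and <math>q</math>'s from their Dirichlet posteriors given the labeled counts (assuming Dirichlet priors). The returned `resp` list is one possible answer to item 2 under the E-M point estimates.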