Segment 41 Sanmit Narvekar

From Computational Statistics Course Wiki

Segment 41

To Calculate

1. Show that the waiting times (times between events) in a Poisson process are Exponentially distributed. (I think we've done this before.)


The waiting time $\tau$ until the $k$-th event of a Poisson process with rate $\lambda$ has the density below; the time between consecutive events corresponds to setting $k = 1$:

$$p(\tau|k, \lambda) = \frac{\lambda^k}{(k-1)!} \tau^{k-1} e^{-\lambda \tau}$$

Doing this yields:

$$p(\tau|\lambda) = \lambda e^{-\lambda \tau}$$

This is exactly the Exponential distribution.
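A more direct route, sketched here under the standard assumption that the number of events in an interval of length $\tau$ is Poisson distributed with mean $\lambda \tau$, starts from the probability of seeing no events in that interval. The symbol $T$ for the waiting time is introduced only for this sketch:

```latex
\begin{align*}
P(T > \tau) &= P(\text{no events in } [0, \tau])
             = \frac{(\lambda \tau)^0}{0!} e^{-\lambda \tau}
             = e^{-\lambda \tau}, \\
p(\tau|\lambda) &= -\frac{d}{d\tau} P(T > \tau) = \lambda e^{-\lambda \tau}.
\end{align*}
```

Because the process is memoryless, the same argument applies starting from any event, so every gap between consecutive events has this density.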


2. Plot the pdf's of the waiting times between (a) every other Poisson event, and (b) every Poisson event at half the rate.

The Matlab code is below, followed by the figure of the pdfs for a base rate of 2 (part a is in blue, part b is in red):


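% Pdf of the waiting time to the k-th event of a Poisson process with rate lambda
p = @(tau, k, lambda) ((lambda.^k) ./ (factorial(k-1))) .* tau.^(k-1) .* exp(-lambda .* tau);
tau = 0:0.01:5;
lambda = 2;  % base rate

figure
hold on

% (a) Every other Poisson event: k = 2 at the base rate
pa = p(tau, 2, lambda);
plot(tau, pa, 'b')

% (b) Every Poisson event at half the rate: k = 1 at rate lambda/2
pb = p(tau, 1, lambda/2);
plot(tau, pb, 'r')

xlabel('\tau')
ylabel('p(\tau)')
legend('Every other', 'Half rate')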

Figure: SanmitSeg41.png
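As a rough numerical check on the two curves, the waiting times can also be simulated directly. The sketch below assumes a base rate of 2 (matching the plot), an arbitrary seed and sample size, and draws exponential gaps by inverse-CDF sampling so that no toolbox is needed; the variable names `gaps`, `wa`, and `wb` are illustrative only. Both cases should have the same mean, $2/\lambda$, but case (a) should have half the variance of case (b).

```matlab
rng(0);        % fixed seed (arbitrary choice, for repeatability)
lambda = 2;    % base rate, matching the plot
n = 1e5;       % number of simulated waiting times (assumed)

% Exponential(lambda) gaps via inverse-CDF sampling: -log(U)/lambda
gaps = -log(rand(2, n)) ./ lambda;

% (a) every other event: sum of two consecutive gaps at rate lambda
wa = sum(gaps, 1);

% (b) every event at half the rate: a single gap at rate lambda/2
wb = -log(rand(1, n)) ./ (lambda/2);

% Compare sample moments with the Gamma values k/lambda and k/lambda^2
fprintf('(a) mean %.3f, var %.3f (theory %.3f, %.3f)\n', ...
        mean(wa), var(wa), 2/lambda, 2/lambda^2);
fprintf('(b) mean %.3f, var %.3f (theory %.3f, %.3f)\n', ...
        mean(wb), var(wb), 2/lambda, 4/lambda^2);
```

The equal means and unequal variances are what the figure shows qualitatively: the "every other" pdf is peaked away from zero, while the "half rate" pdf is a plain exponential.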

3. Show, using characteristic functions, that the waiting times between every Nth event in a Poisson process are Gamma distributed. (I think we've also done this one before, but it is newly relevant in this segment.)


We know that the waiting time between each event is Exponentially distributed. The characteristic function of the Exponential distribution is:


$$\frac{\lambda}{\lambda - it}$$


The waiting time between every Nth event is therefore the sum of N independent, exponentially distributed waiting times. The pdf of a sum of independent random variables is the convolution of their pdfs, so by the Fourier convolution theorem the characteristic function of the sum is simply the product of the N individual characteristic functions. Thus, the characteristic function of the waiting times between every Nth event is:


$$\left( \frac{\lambda}{\lambda - it} \right)^N$$


Compare this with the characteristic function for the Gamma distribution (from Wikipedia):


$$\left( \frac{\beta - it}{\beta} \right)^{-\alpha}$$


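The two are exactly the same if you replace $\lambda$ with $\beta$ and $N$ with $\alpha$, since $\left( \frac{\lambda}{\lambda - it} \right)^N = \left( \frac{\lambda - it}{\lambda} \right)^{-N}$. Because a characteristic function uniquely determines its distribution, the waiting time between every Nth event is Gamma distributed.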
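For completeness, the exponential characteristic function used in this argument can be obtained by a direct integral, and the product step written out explicitly. This is a sketch; the notation $\phi$ for a characteristic function and $\tau_1, \dots, \tau_N$ for the N independent gaps is introduced here and does not appear in the original:

```latex
\begin{align*}
\phi_{\tau}(t) &= E\left[ e^{it\tau} \right]
  = \int_0^\infty \lambda e^{-\lambda \tau} e^{it\tau} \, d\tau
  = \lambda \int_0^\infty e^{-(\lambda - it)\tau} \, d\tau
  = \frac{\lambda}{\lambda - it}, \\
\phi_{\tau_1 + \cdots + \tau_N}(t) &= \prod_{j=1}^{N} \phi_{\tau_j}(t)
  = \left( \frac{\lambda}{\lambda - it} \right)^N
  = \left( \frac{\lambda - it}{\lambda} \right)^{-N}.
\end{align*}
```

The integral converges because $\lambda > 0$, so the integrand decays as $e^{-\lambda \tau}$ in magnitude for every real $t$.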


To Think About

1. In slide 5, showing the results of the MCMC, how can we be sure (or, how can we gather quantitative evidence) that there won't be another discrete change in $k_1$ or $k_2$ if we keep running the model longer? That is, how can we measure convergence of the model?


2. Suppose you have two hypotheses: H1 is that a set of times $t_i$ are being generated as every 26th event from a Poisson process with rate 26. H2 is that they are every 27th event from a Poisson process with rate 27. (The mean rate is thus the same in both cases.) How would you estimate the number $N$ of data points $t_i$ that you need to clearly distinguish between these hypotheses?