From Computational Statistics (CSE383M and CS395T)
Jump to navigation Jump to search

To Calculate

Teamed up with Kkumar and Dan.

The file Media:Mixturevals.txt contains 1000 values, each drawn either with probability <math>c</math> from the distribution <math>\text{Exponential}(\beta)</math> (for some constant <math>\beta</math>), or otherwise (with probability <math>1-c</math>) from the distribution <math>p(x) = (2/\pi)/(1+x^2),\; x>0</math>.

1. Write down an expression for the probability of the file's data given some values for the parameters <math>\beta</math> and <math>c</math>. <math> P(data| c, \beta) = \prod_{vi=1} p(x_i|1) * \prod_{vi=0}p(xi|0) = \prod_{vi=1}\beta \exp({-\beta x_i}) * \prod_{vi=0} \frac 2{\pi(1+{x_i}^2)} </math>

<math> = \prod_{i=1}^{1000} \beta \exp({-\beta x_i})*c + \frac {2*(1-c)}{\pi(1+{x_i}^2)} </math>

2. Calculate numerically the maximum likelihood values of <math>\beta</math> and <math>c</math>.

First we eye-approximated beta and c with the following distribution of mixed data:


we noticed the heavy tail of this distribution, which should belong to the power-law part of the data. We assumed from x=10 onward, the data points belongs to power law, whose tail decreases slower than exponential distribution.

<math> \int_{10}^{\infty} \frac 2{\pi *(1+x^2) } =0.0635 </math>

0.0635*(1-c)*1000 should be the number of values that are above10.

From there, we estimated that c should be close to 0.3.

Then we plotted exponential distribution with different beta values and superimposed the curves above our histogram from 0-10. We found beta=0.6 fits our data the best (red curve in the figure above).

def func(b,x):
    return -sum(np.log((b[1]*b[0]*np.exp(-b[0]*x)) + (2*(1-b[1])/(math.pi*(1+x**2)))))

scipy.optimize.fmin(func, [0.6,0.3], args=(xs,), full_output = True)

3. Estimate numerically the Bayes posterior distribution of <math>\beta</math>, marginalizing over <math>c</math> as a nuisance parameter. (You'll of course have to make some assumption about priors.)

<math> P(c, \beta|data) \propto \prod_{vi=1}[ p(x_i|1) c]* \prod_{vi=0}[p(xi|0) (1-c)] *P(c) = \prod_{vi=1}[\beta \exp({-\beta x_i})* c] * \prod_{vi=0} [\frac 2{\pi(1+{x_i}^2)}* (1-c) ] P(c) </math>

<math> = \prod_{i=1}^{1000} [\beta \exp({-\beta x_i})*c + \frac {2*(1-c)}{\pi(1+{x_i}^2)}]*P(c) </math>

<math> = \int_0^\infty \prod_{i=1}^{1000}[\beta \exp({-\beta x_i})*c + \frac {2*(1-c)}{\pi(1+{x_i}^2)}]*P(c) dc </math>

To Think About

1. In problem 3, above, you assumed some definite prior for <math>c</math>. What if <math>c</math> is itself drawn (just once for the whole data set) from a distribution <math>\text{Beta}(\mu,\nu)</math>, with unknown hyperparameters <math>\mu,\nu</math>. How would you now estimate the Bayes posterior distribution of <math>\beta</math>, marginalizing over everything else?