# /Segment27

## To Calculate

Teamed up with Kkumar and Dan.

The file Media:Mixturevals.txt contains 1000 values, each drawn either with probability $c$ from the distribution $\text{Exponential}(\beta)$ (for some constant $\beta$), or otherwise (with probability $1-c$) from the distribution $p(x) = (2/\pi)/(1+x^2),\; x>0$.
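Both component densities are normalized on $x>0$; as a quick sanity check, the second one (a half-Cauchy) can be integrated numerically with `scipy.integrate.quad`:

```python
import math
from scipy.integrate import quad

# Half-Cauchy density p(x) = (2/pi) / (1 + x^2) on x > 0
half_cauchy = lambda x: (2.0 / math.pi) / (1.0 + x**2)

total, err = quad(half_cauchy, 0.0, math.inf)
print(total)  # ≈ 1.0
```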

1. Write down an expression for the probability of the file's data given some values for the parameters $\beta$ and $c$.

If the (unobserved) component labels $v_i$ were known, the likelihood would factor as

$P(\text{data}\mid c,\beta) = \prod_{v_i=1} c\, p(x_i\mid 1) \prod_{v_i=0} (1-c)\, p(x_i\mid 0) = \prod_{v_i=1} c\,\beta e^{-\beta x_i} \prod_{v_i=0} \frac{2(1-c)}{\pi(1+x_i^2)}$

Since the labels are unknown, each point contributes a mixture of the two densities:

$P(\text{data}\mid c,\beta) = \prod_{i=1}^{1000} \left[ c\,\beta e^{-\beta x_i} + \frac{2(1-c)}{\pi(1+x_i^2)} \right]$

2. Calculate numerically the maximum likelihood values of $\beta$ and $c$.

First we eye-estimated $\beta$ and $c$ from the following histogram of the mixed data:

We noticed the heavy tail of this distribution, which should come from the power-law part of the data. We assumed that from $x=10$ onward the data points belong to the power-law component, whose tail falls off more slowly than the exponential's.

$\int_{10}^{\infty} \frac{2}{\pi(1+x^2)}\, dx = \frac{2}{\pi}\arctan\frac{1}{10} \approx 0.0635$

So $0.0635 \cdot (1-c) \cdot 1000$ should be approximately the number of values above 10.

From there, we estimated that c should be close to 0.3.
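The tail-count argument can be checked numerically. With a count `n_tail` of values above 10 (the count below is hypothetical, for illustration; read the real one from the data), the implied estimate is $c \approx 1 - n_{\text{tail}}/(0.0635 \cdot 1000)$:

```python
import math
from scipy.integrate import quad

# Tail mass of the half-Cauchy component beyond x = 10
tail, _ = quad(lambda x: (2.0 / math.pi) / (1.0 + x**2), 10.0, math.inf)
print(round(tail, 4))  # ≈ 0.0635

# Hypothetical tail count, for illustration only
n_tail = 44
c_est = 1.0 - n_tail / (tail * 1000.0)
print(c_est)  # ≈ 0.31 for this illustrative count
```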

Then we plotted the exponential density for several $\beta$ values and superimposed the curves on our histogram over $0 \le x \le 10$. We found $\beta = 0.6$ fits the data best (red curve in the figure above).


```python
import math
import numpy as np
import scipy.optimize

def neg_log_like(params, x):
    """Negative log-likelihood of the mixture; params = (beta, c)."""
    b, c = params
    return -np.sum(np.log(c * b * np.exp(-b * x)
                          + 2.0 * (1.0 - c) / (math.pi * (1.0 + x**2))))

# xs: array of the 1000 values read from Mixturevals.txt
scipy.optimize.fmin(neg_log_like, [0.6, 0.3], args=(xs,), full_output=True)
```
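Since `fmin` is unconstrained, it can wander outside $0 \le c \le 1$ or $\beta > 0$. A bounded alternative is a sketch using `scipy.optimize.minimize` with L-BFGS-B, demonstrated here on synthetic data drawn with known $\beta=0.6$, $c=0.3$ (the data-generation step is illustrative; with the real file you would load `xs` instead):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic data from the mixture with known parameters (illustration only)
beta_true, c_true, n = 0.6, 0.3, 1000
from_exp = rng.random(n) < c_true
x = np.where(from_exp,
             rng.exponential(1.0 / beta_true, n),
             np.abs(rng.standard_cauchy(n)))  # half-Cauchy = |standard Cauchy|

def neg_log_like(params, x):
    b, c = params
    return -np.sum(np.log(c * b * np.exp(-b * x)
                          + 2.0 * (1.0 - c) / (np.pi * (1.0 + x**2))))

res = minimize(neg_log_like, x0=[1.0, 0.5], args=(x,),
               method="L-BFGS-B", bounds=[(1e-6, None), (1e-6, 1 - 1e-6)])
beta_hat, c_hat = res.x
print(beta_hat, c_hat)  # should land near 0.6 and 0.3
```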



3. Estimate numerically the Bayes posterior distribution of $\beta$, marginalizing over $c$ as a nuisance parameter. (You'll of course have to make some assumption about priors.)

With the labels written explicitly, the joint posterior is the likelihood from problem 1 times the prior $P(c)$ (we take a flat prior on $\beta$):

$P(c, \beta \mid \text{data}) \propto \prod_{v_i=1}[c\, p(x_i\mid 1)] \prod_{v_i=0}[(1-c)\, p(x_i\mid 0)]\, P(c) = \prod_{v_i=1}[c\,\beta e^{-\beta x_i}] \prod_{v_i=0}\left[\frac{2(1-c)}{\pi(1+x_i^2)}\right] P(c)$

$\propto \prod_{i=1}^{1000}\left[c\,\beta e^{-\beta x_i} + \frac{2(1-c)}{\pi(1+x_i^2)}\right] P(c)$

Marginalizing over the nuisance parameter $c$, which lives on $[0,1]$ (we take $P(c)$ uniform on that interval):

$P(\beta \mid \text{data}) \propto \int_0^1 \prod_{i=1}^{1000}\left[c\,\beta e^{-\beta x_i} + \frac{2(1-c)}{\pi(1+x_i^2)}\right] P(c)\, dc$

4. In problem 3, above, you assumed some definite prior for $c$. What if $c$ is itself drawn (just once for the whole data set) from a distribution $\text{Beta}(\mu,\nu)$ with unknown hyperparameters $\mu,\nu$? How would you now estimate the Bayes posterior distribution of $\beta$, marginalizing over everything else?
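One way to set this up (a sketch, assuming some hyperprior $P(\mu,\nu)$, e.g. flat over a reasonable range, and a flat prior on $\beta$): treat $\mu$ and $\nu$ as additional nuisance parameters and marginalize over them along with $c$:

$P(\beta \mid \text{data}) \propto \int\!\!\int P(\mu,\nu) \int_0^1 \text{Beta}(c;\mu,\nu) \prod_{i=1}^{1000}\left[c\,\beta e^{-\beta x_i} + \frac{2(1-c)}{\pi(1+x_i^2)}\right] dc\, d\mu\, d\nu$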