Seg18. The Correlation Matrix

From Computational Statistics (CSE383M and CS395T)
Revision as of 17:49, 26 March 2013 by Jzhang (Class activity)

Skilled problem

problem 1

Random points i are chosen uniformly on a circle of radius 1, and their <math>(x_i,y_i)</math> coordinates in the plane are recorded. What is the 2x2 covariance matrix of the random variables <math>X</math> and <math>Y</math>? (Hint: Transform probabilities from <math>\theta</math> to <math>x</math>. Second hint: Is there a symmetry argument that some components must be zero, or must be equal?)

The matrix would be <math> \begin{bmatrix}

                          Cov(X,X) & Cov(X,Y) \\
                          Cov(Y,X) & Cov(Y,Y) \\
                          \end{bmatrix}</math>

Since the sample space is symmetric and the sampling is uniform, the means of X and Y are both 0. With zero means, each covariance reduces to a second moment, and the covariance of X with itself is its variance. So the matrix is actually:

<math> \begin{bmatrix}

                          <x^2> & <xy> \\
                          <yx> & <y^2> \\
                          \end{bmatrix}</math>

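It is the angle <math>\theta</math>, not x or y, that is distributed uniformly, and after normalization we get <math>P(\theta) = \frac1{2\pi} </math> on <math>[0, 2\pi)</math>

Since (x,y) lies on the unit circle, <math>x = \cos\theta, y = \sin\theta</math>, and the second moments are

<math> <x^2> = \frac1{2\pi}\int_0^{2\pi} \cos^2\theta \, d\theta = \frac12, \quad <y^2> = \frac1{2\pi}\int_0^{2\pi} \sin^2\theta \, d\theta = \frac12, \quad <xy> = \frac1{2\pi}\int_0^{2\pi} \cos\theta \sin\theta \, d\theta = 0 </math>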

Thus, the matrix would be <math> \begin{bmatrix}

                                \frac12 & 0 \\
                                0 & \frac12 \\
                          \end{bmatrix}</math>
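
As a rough numerical check of this result, the sketch below samples angles uniformly and estimates the covariance matrix with NumPy. The seed and the sample size are arbitrary choices, not part of the problem.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 200_000                     # sample size is an arbitrary choice

theta = rng.uniform(0.0, 2.0 * np.pi, size=n)  # uniform angle on the circle
x, y = np.cos(theta), np.sin(theta)            # points on the unit circle

# np.cov treats each row as one variable, so stack x and y as rows.
cov_xy = np.cov(np.vstack([x, y]))
print(cov_xy)
```

The printed matrix should be close to one half times the identity, up to sampling noise.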

problem 2

Points are generated in 3 dimensions by this prescription: Choose λ uniformly random in (0,1). Then a point's (x,y,z) coordinates are (αλ,βλ,γλ). What is the covariance matrix of the random variables (X,Y,Z) in terms of α,β, and γ? What is the linear correlation matrix of the same random variables?

Let <math> X_1 = X, X_2 = Y, X_3 = Z</math>, and let C be the covariance matrix.

Since <math> <\lambda> = \frac12 </math>, the means of the three variables are (<math>\frac{\alpha}2, \frac{\beta}2, \frac{\gamma}2 </math>)

The diagonal entries are the variances of each variable, for example

<math> Var(X) = \int_0^1 (\alpha \lambda - \frac{\alpha}2 )^2 \cdot 1 d \lambda = \frac{\alpha^2}{12}</math>

For the off-diagonal entries, for example

<math> Cov(X,Y) = <(X - \bar{X}) (Y - \bar{Y})> = \alpha\beta \int_0^1 (\lambda - \frac12)^2 d\lambda = \frac{\alpha\beta}{12}</math>

So the covariance matrix is

<math> \begin{bmatrix} \frac{\alpha^2}{12} & \frac{\alpha\beta}{12} & \frac{\alpha\gamma}{12} \\ \frac{\alpha\beta}{12} & \frac{\beta^2}{12} & \frac{\beta\gamma}{12} \\ \frac{\alpha\gamma}{12} & \frac{\beta\gamma}{12} & \frac{\gamma^2}{12} \\ \end{bmatrix} </math>

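The linear correlation coefficients are <math> r_{ij} = \frac{C_{ij}}{\sqrt{C_{ii} \cdot C_{jj}}} </math>. For example, <math> r_{12} = \frac{\alpha\beta}{|\alpha||\beta|} </math>, which is +1 if α and β have the same sign and -1 if they have opposite signs.

So, when α, β, and γ are all nonzero and have the same sign, the linear correlation matrix will be <math> \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \\ \end{bmatrix} </math>. If the signs differ, the corresponding off-diagonal entries become -1.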
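
A minimal simulation sketch of this problem, assuming NumPy and some hypothetical positive values for α, β, and γ (they are not given in the problem). It compares the sample covariance with the values α_i α_j / 12 derived above and prints the sample correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, gamma = 2.0, 3.0, 5.0   # hypothetical positive coefficients
coeffs = np.array([alpha, beta, gamma])

lam = rng.uniform(0.0, 1.0, size=200_000)   # lambda ~ U(0, 1)
points = np.outer(coeffs, lam)              # rows are X, Y, Z

cov_est = np.cov(points)                    # sample covariance matrix
cov_theory = np.outer(coeffs, coeffs) / 12  # alpha_i * alpha_j / 12
corr_est = np.corrcoef(points)              # sample correlation matrix

print(cov_est)
print(cov_theory)
print(corr_est)
```

Because every coordinate is an exact multiple of the same λ, the sample correlations here are 1 up to floating-point rounding, not merely close to 1.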

Thought problem

problem 1

Suppose you want to get a feel for what a linear correlation r = 0.3 (say) looks like. How would you generate a bunch of points in the plane with this value of r? Try it. Then try for different values of r. As r increases from zero, what is the smallest value where you would subjectively say "if I know one of the variables, I pretty much know the value of the other"?

We can construct the points (X,Y) as follows:

<math>X \sim Norm(0,1)</math>

<math>y' \sim Norm(0,1) </math>, independent of X

<math>Y = \alpha y' + \beta X</math>

We can easily calculate the expected values of X and Y:

<math> <X> = 0 </math>

<math> <Y> = \alpha<y'> + \beta<X> = 0 </math>

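For the second moments we get <math> <X^2> = 1 </math> and, since <math> <Y> = 0 </math>, <math> <Y^2> = Var(Y) = \alpha^2 Var(y') + \beta^2 Var(X) = \alpha^2 + \beta^2 </math>, where the variances add because X and y' are independent.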

<math> <XY> = <X(\alpha y' + \beta X)> = \alpha<Xy'> + \beta<X^2> </math>

Since X and y' are independent with zero means, <math> <Xy'> = <X><y'> = 0 </math>, so

<math> <XY> = \alpha * 0 + \beta * 1 = \beta </math>

<math> r = \frac{C_{XY}}{\sqrt{C_{XX}C_{YY}}} = \frac{<XY> - <X><Y>}{\sqrt{(<X^2> - <X>^2)(<Y^2>-<Y>^2)}} = \frac{<XY>}{\sqrt{<X^2><Y^2>}} = \frac{\beta}{\sqrt{\alpha^2 + \beta^2}}</math>

Solving for the ratio (taking α, β > 0, so that r > 0), we get

<math> \frac{\alpha}{\beta} = \sqrt {\frac{1-r^2}{r^2}} </math>

Plugging in the value r = 0.3, we get

<math> \frac{\alpha}{\beta} = \sqrt {\frac{91}9} \approx 3.18 </math>
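
One way to try this out is sketched below, assuming NumPy. The function name `correlated_pairs` is made up for illustration. It uses the particular choice β = r and α = sqrt(1 - r^2), which satisfies the ratio above and also gives Y unit variance; any other pair with the same ratio would give the same r.

```python
import numpy as np

def correlated_pairs(r, n, rng):
    """Return n points (X, Y) with population correlation r, for -1 <= r <= 1."""
    x = rng.standard_normal(n)
    y_prime = rng.standard_normal(n)       # drawn independently of x
    alpha, beta = np.sqrt(1.0 - r**2), r   # alpha^2 + beta^2 = 1, so Var(Y) = 1
    y = alpha * y_prime + beta * x
    return x, y

rng = np.random.default_rng(2)
for r in (0.3, 0.6, 0.9):
    x, y = correlated_pairs(r, 100_000, rng)
    r_hat = np.corrcoef(x, y)[0, 1]        # sample correlation coefficient
    print(f"target r = {r:.1f}, sample r = {r_hat:.3f}")
```

Feeding the returned x and y into a scatter plot for several values of r is one way to form the subjective judgment the problem asks about.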

problem 2

Suppose that points in the (x,y) plane fall roughly on a 45-degree line between the points (0,0) and (10,10), but in a band of about width w (in these same units). What, roughly, is the linear correlation coefficient r?

The points (x,y) satisfy:

<math> \begin{cases}

| x - y | \le \frac{\sqrt{2}w}2 \\ x, y \isin [0,10]

\end{cases} </math>

We can express y in terms of x:

<math> y = x + \frac{\sqrt{2}w}2 m, \quad m \isin [-1,1] </math>. Here m is independent of x, and we assume it is uniformly distributed, so its mean is 0 and its variance is <math>\frac{(1-(-1))^2}{12} = \frac13</math>

Let the mean and variance of X be <math> \mu_x, \sigma_x^2 </math>

Thus, the mean and variance of Y can be expressed as <math> \mu_x </math> and <math> \sigma_x^2 + \frac{2w^2}4 \cdot \frac13 = \sigma_x^2 + \frac{w^2}6 </math>

The correlation coefficient is

<math>r = \frac{<(X-\bar{X})(Y-\bar{Y})>}{\sqrt{<(X-\bar{X})^2>}\sqrt{<(Y-\bar{Y})^2>}} = \frac{<(X-\mu_x)(X+\frac{\sqrt{2}wm}2-\mu_x)>}{\sqrt{\sigma_x^2} \cdot \sqrt{\sigma_x^2 + \frac{w^2}6}} = \frac{<(X-\mu_x)^2> + <\frac{\sqrt{2}wm}2><X-\mu_x>}{\sqrt{\sigma_x^2} \cdot \sqrt{\sigma_x^2 + \frac{w^2}6}} = \sqrt{\frac{\sigma_x^2}{\sigma_x^2 + \frac{w^2}6}}</math>

If we further assume that X is roughly uniform on [0,10] (the problem only says the points lie between (0,0) and (10,10)), then <math> \sigma_x^2 = \frac{100}{12} </math> and <math> r \approx \sqrt{\frac{50}{50 + w^2}} \approx 1 - \frac{w^2}{100} </math> for w much smaller than 10.
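
The band model can be checked numerically with the sketch below, assuming NumPy. Two assumptions go beyond the problem statement: X is drawn uniformly on [0,10], and edge effects near the two ends of the band are ignored. The helper names `band_correlation` and `predicted` are made up for illustration. The prediction is computed directly from the variances of the two uniform variables, using (b - a)^2 / 12 for a uniform variable on [a, b].

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000  # sample size is an arbitrary choice

def band_correlation(w):
    """Sample r for points in a band of perpendicular width w around y = x."""
    x = rng.uniform(0.0, 10.0, size=n)      # assumption: X ~ U(0, 10)
    m = rng.uniform(-1.0, 1.0, size=n)      # offset variable, independent of x
    y = x + (np.sqrt(2.0) * w / 2.0) * m    # edge effects at the ends ignored
    return np.corrcoef(x, y)[0, 1]

def predicted(w):
    """r = sqrt(Var(X) / Var(Y)) for independent uniform x and m."""
    var_x = 10.0**2 / 12.0                  # variance of U(0, 10)
    var_m = 2.0**2 / 12.0                   # variance of U(-1, 1)
    var_offset = (np.sqrt(2.0) * w / 2.0) ** 2 * var_m
    return np.sqrt(var_x / (var_x + var_offset))

for w in (0.5, 1.0, 2.0, 4.0):
    print(f"w = {w}: sample r = {band_correlation(w):.4f}, "
          f"predicted r = {predicted(w):.4f}")
```

Even a fairly wide band keeps r close to 1, because the spread along the line dominates the spread across it.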

Class activity

In a group with Kai and Sean Trettel

For solutions see skilled problem 1 and thought problem 1.