Segment 18: The Correlation Matrix - 3/18/2012

From Computational Statistics (CSE383M and CS395T)

Problem 1

Random points i are chosen uniformly on a circle of radius 1, and their (xi,yi) coordinates in the plane are recorded. What is the 2x2 covariance matrix of the random variables X and Y? (Hint: Transform probabilities from θ to x. Second hint: Is there a symmetry argument that some components must be zero, or must be equal?)
<math> \theta </math> ~ <math>U(0,2\pi) </math>
<math> p(\theta) = \frac{1}{2\pi}</math> for <math>0\leq \theta \leq 2\pi</math>
<math> X=\cos(\theta) </math>
<math> Y=\sin(\theta) </math>
<math>E[X] = \int_0^{2\pi}\frac{\cos(\theta)}{2\pi}\,d\theta = 0 </math>
<math>E[Y] = \int_0^{2\pi}\frac{\sin(\theta)}{2\pi}\,d\theta = 0 </math>
<math>E[XY] = \int_0^{2\pi}\frac{\sin(\theta)\cos(\theta)}{2\pi}\,d\theta = \int_0^{2\pi}\frac{\sin(2\theta)}{4\pi}\,d\theta = 0 </math>
Cov(X,Y) = E[XY] - E[X]E[Y] = 0
<math>Var[X] = E[X^2] - E[X]^2 = E[X^2] = \int_0^{2\pi}\frac{\cos^2(\theta)}{2\pi}\,d\theta = 0.5 </math>
<math>Var[Y] = E[Y^2] - E[Y]^2 = E[Y^2] = \int_0^{2\pi}\frac{\sin^2(\theta)}{2\pi}\,d\theta = 0.5 </math>
Thus the covariance matrix of (X, Y) is <math>\Sigma = \begin{bmatrix} 0.5 & 0 \\ 0 & 0.5 \end{bmatrix}</math>
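
The result above can be sanity-checked numerically. The following is a minimal Monte Carlo sketch, not part of the original solution: the function name <code>circle_covariance</code>, the sample size, and the seed are illustrative choices. It draws θ uniformly on (0, 2π), records (cos θ, sin θ), and estimates the 2x2 covariance matrix, which should come out close to 0.5 on the diagonal and 0 off the diagonal.

```python
import math
import random


def circle_covariance(n, seed=0):
    """Estimate the 2x2 covariance matrix of (cos(theta), sin(theta))."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    xs, ys = [], []
    for _ in range(n):
        theta = rng.uniform(0.0, 2.0 * math.pi)  # theta ~ U(0, 2*pi)
        xs.append(math.cos(theta))
        ys.append(math.sin(theta))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population-style (divide by n) second moments about the sample means.
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    return [[var_x, cov_xy], [cov_xy, var_y]]


cov = circle_covariance(200_000)
for row in cov:
    print([round(v, 3) for v in row])
```

The estimates fluctuate from seed to seed, so they should be compared with the analytic values only up to sampling noise.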

Problem 2

Points are generated in 3 dimensions by this prescription: Choose λ uniformly random in (0,1). Then a point's (x,y,z) coordinates are (αλ,βλ,γλ). What is the covariance matrix of the random variables (X,Y,Z) in terms of α,β, and γ? What is the linear correlation matrix of the same random variables?
Since X, Y, and Z are all calculated the same way except for the coefficient, I will work out one variable by hand and then apply the result to the other two.
E[X] = E[<math>\alpha \lambda</math>] = <math>\alpha</math>E[<math>\lambda</math>] = <math> \alpha\int_0^1 \lambda d\lambda = \frac{\alpha}{2} </math>
E[Y] = <math>\frac{\beta}{2} </math> and E[Z] = <math>\frac{\gamma}{2} </math>

Var[X] = Var[<math>\alpha \lambda </math>] = <math>E[(\alpha \lambda)^2] - E[\alpha \lambda]^2 = {\alpha}^2 \int_0^1{\lambda}^2 d \lambda - \left(\frac{\alpha}{2}\right)^2 = \frac{\alpha^2}{3} - \frac{\alpha^2}{4} = \frac{\alpha^2}{12}</math>
Var[Y] = <math>\frac{\beta^2}{12} </math> and Var[Z] = <math>\frac{\gamma^2}{12} </math>

Cov(X,Y) = Cov(Y,X) = E[XY] - E[X]E[Y] = E[<math>\alpha \beta \lambda^2</math>] - <math>\frac{\alpha}{2}\cdot\frac{\beta}{2}</math> = <math>\alpha \beta </math> E[<math>\lambda^2</math>] - <math>\frac{\alpha \beta}{4}</math> = <math>\frac{\alpha \beta}{3} - \frac{\alpha \beta}{4} = \frac{\alpha \beta}{12} </math>
Cov(X,Z) = Cov(Z,X) = <math>\frac{\alpha\gamma}{12}</math>
Cov(Y,Z) = Cov(Z,Y) = <math>\frac{\beta\gamma}{12}</math>

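Thus our Covariance Matrix = <math>\begin{bmatrix} \frac{\alpha^2}{12} & \frac{\alpha \beta}{12} & \frac{\alpha\gamma}{12} \\ \frac{\alpha \beta}{12} & \frac{\beta^2}{12} &\frac{\beta\gamma}{12} \\ \frac{\alpha\gamma}{12} & \frac{\beta\gamma}{12} & \frac{\gamma^2}{12} \end{bmatrix}</math>
Since Corr(X,Y) = <math>\frac{Cov(X,Y)}{\sqrt{Var(X) Var(Y)}}</math>, each off-diagonal entry has the form <math>\frac{\alpha\beta/12}{\sqrt{(\alpha^2/12)(\beta^2/12)}} = \frac{\alpha\beta}{|\alpha\beta|}</math>, which is +1 or -1. Assuming α, β, and γ are nonzero and all have the same sign, our Correlation Matrix = <math>\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 &1 \\ 1& 1& 1 \end{bmatrix}</math>. If two coefficients have opposite signs, the corresponding entry is -1 instead.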
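
As a numerical check of the matrices above, here is a small sketch (not from the original solution; the function name <code>sample_cov_corr</code> and the example coefficients 2, -3, 0.5 are assumptions chosen for illustration). It samples λ uniformly on (0,1), forms (αλ, βλ, γλ), and estimates both matrices. A negative β is used deliberately to show that the correlation entries are ±1 according to the sign of the product of the coefficients.

```python
import math
import random


def sample_cov_corr(alpha, beta, gamma, n=100_000, seed=0):
    """Estimate covariance and correlation matrices of (a*lam, b*lam, g*lam)."""
    rng = random.Random(seed)
    lams = [rng.random() for _ in range(n)]  # lambda ~ U(0, 1)
    cols = [[c * lam for lam in lams] for c in (alpha, beta, gamma)]
    means = [sum(col) / n for col in cols]
    # Sample covariance matrix (dividing by n).
    cov = [
        [
            sum((a - means[i]) * (b - means[j])
                for a, b in zip(cols[i], cols[j])) / n
            for j in range(3)
        ]
        for i in range(3)
    ]
    # Correlation: divide each covariance by the product of standard deviations.
    corr = [
        [cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(3)]
        for i in range(3)
    ]
    return cov, corr


cov, corr = sample_cov_corr(2.0, -3.0, 0.5)
for row in corr:
    print([round(v, 6) for v in row])
```

With these coefficients the diagonal of <code>cov</code> should be near 4/12, 9/12, and 0.25/12, and the entries of <code>corr</code> involving Y should be -1 off the diagonal.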

To Think About 1

Suppose you want to get a feel for what a linear correlation r = 0.3 (say) looks like. How would you generate a bunch of points in the plane with this value of r? Try it. Then try for different values of r. As r increases from zero, what is the smallest value where you would subjectively say "if I know one of the variables, I pretty much know the value of the other"?
(The Python document could not be linked at the moment.) The approach in Python is to use a function that draws points with a specified covariance structure: with the variances and the number of points held fixed, we change the correlation coefficient and plot the points to see what the scatterplot looks like. If r = 1 or r = -1, then knowing one variable tells me the other exactly. To the human eye, an r of around 0.93 or -0.93 is roughly where knowing one variable lets me predict the other reasonably well.

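One way to generate points with a chosen r, offered here as a hedged sketch and not as the Python document mentioned above: take two independent standard normal draws z1 and z2, and set x = z1, y = r·z1 + sqrt(1 - r²)·z2. Then Var[x] = Var[y] = 1 and Cov(x, y) = r, so the population correlation is r. The function names <code>correlated_points</code> and <code>sample_correlation</code> are illustrative.

```python
import math
import random


def correlated_points(r, n, seed=0):
    """Return n (x, y) pairs whose population correlation is r."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Mix the independent draws so that Cov(x, y) = r, Var(y) = 1.
        points.append((z1, r * z1 + math.sqrt(1.0 - r * r) * z2))
    return points


def sample_correlation(points):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / math.sqrt(sxx * syy)


for r in (0.0, 0.3, 0.6, 0.93):
    pts = correlated_points(r, 50_000)
    print(r, round(sample_correlation(pts), 3))
```

The returned pairs can be passed to any plotting tool to judge by eye how tight the cloud looks at each value of r.
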
To Think About 2

Suppose that points in the (x,y) plane fall roughly on a 45-degree line between the points (0,0) and (10,10), but in a band of about width w (in these same units). What, roughly, is the linear correlation coefficient r?
If w is the width of the band around the 45-degree line between the points (0,0) and (10,10), then w measures how far a point P(<math>x_0,y_0</math>) can lie from the line. That distance has a form similar to <math>\sqrt{(x - x_0)^2 + (y- y_0)^2}</math>, which resembles <math>\sqrt{Var[X] + Var[Y]}</math>. I would therefore say r is roughly <math>\frac{Cov(X,Y)}{w^2}</math>, where the covariance is some function of w, because <math>w^2</math> plays a role like Var[X] + Var[Y]. The bigger w is, the smaller r is. The smaller w is, meaning the points do not deviate much from the line, the closer r is to 1.
If we let Cov(X,Y) = w + 1 and let Var[X] + Var[Y] = <math>w^2</math> + 1, whose square root is approximately w when w is large, r has the right qualitative behavior: when the width is 0, r = 1, and as the width gets bigger, r gets closer to 0. My function would look similar to this: r = <math>\frac{w+1}{w^2 + 1}</math>, where w represents the width. This is only a qualitative sketch, since the expression exceeds 1 for 0 < w < 1, which a correlation coefficient cannot do. To distinguish positive from negative correlation, look at the line the band follows and add a negative sign if that line is decreasing.
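
As a cross-check on the rough estimate above, one simple model can be worked out and simulated. The model is an assumption, not something the problem specifies: take the position u along the line to be uniform on (0, <math>10\sqrt{2}</math>) and the perpendicular offset v to be uniform on (-w/2, w/2), independent of u. Then Var[u] = 200/12 and Var[v] = <math>w^2/12</math>. Rotating by 45 degrees gives X = (u - v)/<math>\sqrt{2}</math> and Y = (u + v)/<math>\sqrt{2}</math>, so Var[X] = Var[Y] = (Var[u] + Var[v])/2 and Cov(X,Y) = (Var[u] - Var[v])/2. Under these assumptions <math>r = \frac{200 - w^2}{200 + w^2} \approx 1 - \frac{w^2}{100}</math> for small w. The function names below are illustrative.

```python
import math
import random


def band_correlation(w, n=100_000, seed=0):
    """Simulate r for points in a band of full width w around y = x."""
    rng = random.Random(seed)
    length = 10.0 * math.sqrt(2.0)  # distance from (0,0) to (10,10)
    xs, ys = [], []
    for _ in range(n):
        u = rng.uniform(0.0, length)        # position along the line
        v = rng.uniform(-w / 2.0, w / 2.0)  # offset across the band
        # Rotate (u, v) by 45 degrees into (x, y) coordinates.
        xs.append((u - v) / math.sqrt(2.0))
        ys.append((u + v) / math.sqrt(2.0))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def band_correlation_model(w):
    """Closed form under the uniform-band assumption stated above."""
    return (200.0 - w * w) / (200.0 + w * w)


for w in (0.5, 1.0, 3.0, 6.0):
    print(w, round(band_correlation(w), 4), round(band_correlation_model(w), 4))
```

A different assumed shape for the band (for example Gaussian scatter across it) would change the constant in front of <math>w^2</math> but not the overall trend that r falls from 1 as w grows.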