# Segment 48. Principal Component Analysis (PCA)

Links to the slides: PDF file or PowerPoint file

### Problems

#### To Compute

1. Suppose that only one principal component is large (that is, there is a single dominant value $s_i$). In terms of the matrix $\mathbf V$ (and anything else relevant), what are the constants $a_j$ and $b_j$ that make a one-dimensional model of the data? This would be a model where $x_{ij} \approx a_j \lambda_i + b_j$, with each of the data points (rows) having its own value of an independent variable $\lambda_i$ and each of the responses (columns) having its own constants $a_j, b_j$.

2. The file dataforpca.txt has 1000 data points (rows) each with 3 responses (columns). Make three scatter plots, each showing a pair of responses (in all 3 possible ways). Do the responses seem to be correlated?

3. Find the principal components of the data and make three new scatter plots, each showing a pair of principal coordinates of the data. What is the distribution (histogram) of the data along the largest principal component? What is a one-dimensional model of the data (as in problem 1 above)?
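One possible way to approach problems 2 and 3 numerically is sketched below. It is a minimal sketch under stated assumptions, not a reference solution: it computes principal components from the SVD of the column-centered data matrix, and it reads the one-dimensional model of problem 1 as linear in $\lambda_i$, that is $x_{ij} \approx a_j \lambda_i + b_j$ with $a_j$ taken from the first row of $\mathbf V^T$ and $b_j$ from the column means. The data here are synthetic stand-ins with made-up parameters, since the contents of dataforpca.txt are not reproduced on this page; the helper name `pca_svd` is hypothetical.

```python
import numpy as np

def pca_svd(X):
    """Center the columns of X and return (mean, U, s, Vt) from the thin SVD."""
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, U, s, Vt

# Synthetic stand-in for the course data (hypothetical numbers, NOT the real file):
# 1000 rows, 3 columns, dominated by a single direction plus small noise.
rng = np.random.default_rng(0)
n = 1000
lam_true = rng.normal(size=n)
direction = np.array([2.0, -1.0, 0.5])
offset = np.array([1.0, 3.0, -2.0])
X = offset + np.outer(lam_true, direction) + 0.1 * rng.normal(size=(n, 3))
# With the real file, X = np.loadtxt("dataforpca.txt") could replace the lines
# above, assuming the file is plain whitespace-delimited numbers.

mean, U, s, Vt = pca_svd(X)

# Principal coordinates of each data point: U * s equals (X - mean) @ Vt.T.
coords = U * s

# One-dimensional model, assuming the form x_ij ≈ a_j * lambda_i + b_j:
lam = coords[:, 0]   # lambda_i: coordinate along the largest component
a = Vt[0]            # a_j: first right singular vector (first row of V^T)
b = mean             # b_j: column means removed before the SVD
X_model = np.outer(lam, a) + b

# Distribution of the data along the largest principal component.
counts, edges = np.histogram(lam, bins=30)

# Fraction of the centered data's norm left unexplained by the rank-1 model.
rel_err = np.linalg.norm(X - X_model) / np.linalg.norm(X - mean)
print("singular values:", s)
print("relative residual of 1-D model:", rel_err)
```

The signs of the singular vectors are arbitrary, so `lam` and `a` may both come out negated relative to the generating values; their product is unaffected. Pairs of columns of `X` or of `coords` can be passed to any plotting routine to produce the scatter plots the problems ask for.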