# Segment 48. Principal Component Analysis (PCA)

#### Watch this segment

(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)

{{#widget:Iframe |url=http://www.youtube.com/v/frWqIUpIxLg&hd=1 |width=800 |height=625 |border=0 }}

Links to the slides: PDF file or PowerPoint file

### Problems

#### To Compute

1. Suppose that only one principal component is large (that is, there is a single dominant value $s_i$). In terms of the matrix $\mathbf{V}$ (and anything else relevant), what are the constants $a_j$ and $b_j$ that make a one-dimensional model of the data? This would be a model where $x_{ij} \approx a_j \lambda_i + b_j$, with each of the data points (rows) having its own value $\lambda_i$ of an independent variable and each of the responses (columns) having its own constants $a_j, b_j$.

2. The file dataforpca.txt has 1000 data points (rows) each with 3 responses (columns). Make three scatter plots, each showing a pair of responses (in all 3 possible ways). Do the responses seem to be correlated?

3. Find the principal components of the data and make three new scatter plots, each showing a pair of principal coordinates of the data. What is the distribution (histogram) of the data along the largest principal component? What is a one-dimensional model of the data (as in problem 1 above)?
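The workflow in the problems above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not a reference solution. The data here are synthetic stand-ins, not the contents of dataforpca.txt. The names `lam`, `a`, `b`, and `scatter_pairs` are illustrative choices. Taking the column means as the offsets and the first right singular vector as the slopes is only one possible convention, because the scale can be moved freely between `lam` and `a`.

```python
import itertools
import numpy as np

# Synthetic stand-in (an assumption, NOT the course data): 1000 rows, 3 columns,
# driven by one hidden variable t plus small noise.
rng = np.random.default_rng(0)
n = 1000
t = rng.normal(size=n)
X = (np.outer(t, [1.0, 2.0, -1.0])
     + np.array([0.5, -1.0, 2.0])
     + 0.1 * rng.normal(size=(n, 3)))
# With the real file, X = np.loadtxt("dataforpca.txt") may work,
# assuming whitespace-delimited numeric columns.

# All column pairs to plot, and the correlation matrix as a numeric check.
pairs = list(itertools.combinations(range(X.shape[1]), 2))
corr = np.corrcoef(X, rowvar=False)

# PCA via SVD of the mean-centered data: Xc = U @ diag(s) @ Vt.
mean = X.mean(axis=0)
Xc = X - mean
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
coords = U * s                      # principal coordinates, equal to Xc @ Vt.T
explained = s**2 / np.sum(s**2)     # fraction of variance per component

# Histogram of the data along the largest principal component.
counts, edges = np.histogram(coords[:, 0], bins=30)

# One-dimensional model: X[i, j] is approximated by a[j] * lam[i] + b[j].
lam = coords[:, 0]                  # one value per data point (row)
a = Vt[0]                           # one slope per response (column)
b = mean                            # one offset per response (column)
X_model = np.outer(lam, a) + b
rel_err = np.linalg.norm(X - X_model) / np.linalg.norm(Xc)


def scatter_pairs(data, name):
    """Scatter-plot every pair of columns; needs matplotlib (imported lazily)."""
    import matplotlib.pyplot as plt
    col_pairs = list(itertools.combinations(range(data.shape[1]), 2))
    fig, axes = plt.subplots(1, len(col_pairs), figsize=(4 * len(col_pairs), 4))
    for ax, (i, j) in zip(np.atleast_1d(axes), col_pairs):
        ax.scatter(data[:, i], data[:, j], s=4)
        ax.set_xlabel(f"{name} {i + 1}")
        ax.set_ylabel(f"{name} {j + 1}")
    fig.tight_layout()
    return fig
```

Calling `scatter_pairs(X, "response")` and `scatter_pairs(coords, "principal coordinate")` would produce the two sets of three plots. When one singular value dominates, `explained[0]` is close to 1 and `rel_err` is small, which is the situation in which a one-dimensional model is a reasonable summary.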