1. Suppose that only one principal component is large (that is, there is a single dominant value s_i). In terms of the matrix (and anything else relevant), what are the constants a_j and b_j that make a one-dimensional model of the data? This would be a model in which each of the data points (rows) has its own value of an independent variable λ_i and each of the responses (columns) has its own constants a_j and b_j.
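One way to read this question, assuming the data matrix is centered column by column and then decomposed with a singular value decomposition, is that the dominant singular triplet gives the model directly. The sketch below uses synthetic data. The names `rank_one_model`, `lam`, `a`, and `b` are illustrative and do not come from the problem statement, and absorbing the singular value into λ_i rather than a_j is a convention, not the only valid choice.

```python
import numpy as np

def rank_one_model(X):
    """Fit x[i, j] ~= b[j] + lam[i] * a[j] from the dominant singular triplet."""
    X = np.asarray(X, dtype=float)
    b = X.mean(axis=0)                       # b_j: mean of response j
    U, s, Vt = np.linalg.svd(X - b, full_matrices=False)
    a = Vt[0]                                # a_j: first right singular vector
    lam = s[0] * U[:, 0]                     # lambda_i: score of data point i
    return lam, a, b

# Synthetic example (made-up numbers): one hidden variable drives 3 responses.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = 5.0 + np.outer(t, [2.0, -1.0, 0.5]) + 0.01 * rng.normal(size=(200, 3))

lam, a, b = rank_one_model(X)
X_model = b + np.outer(lam, a)               # one-dimensional reconstruction
max_err = np.abs(X - X_model).max()          # small when one component dominates
```

With this convention a_j has unit length and λ_i carries the scale of the data. Multiplying a by a constant and dividing λ by the same constant gives an equivalent model.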
2. The file dataforpca.txt has 1000 data points (rows) each with 3 responses (columns). Make three scatter plots, each showing a pair of responses (in all 3 possible ways). Do the responses seem to be correlated?
Yes. In all three scatter plots the responses appear correlated.
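A minimal sketch of how the three pairwise plots and a numerical check of the correlation might be produced. It assumes dataforpca.txt is whitespace-delimited with no header row, which the problem does not state. The helper names `pairwise_correlations` and `scatter_pairs` are hypothetical.

```python
from itertools import combinations
import numpy as np

def pairwise_correlations(X):
    """Return {(j, k): Pearson correlation} for every pair of columns."""
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)
    return {(j, k): R[j, k] for j, k in combinations(range(X.shape[1]), 2)}

def scatter_pairs(X):
    """Draw one scatter plot per pair of columns (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting
    X = np.asarray(X, dtype=float)
    pairs = list(combinations(range(X.shape[1]), 2))
    fig, axes = plt.subplots(1, len(pairs), figsize=(4 * len(pairs), 4))
    for ax, (j, k) in zip(np.atleast_1d(axes), pairs):
        ax.scatter(X[:, j], X[:, k], s=4)
        ax.set_xlabel(f"Response {j + 1}")
        ax.set_ylabel(f"Response {k + 1}")
    fig.tight_layout()
    return fig

# Assumed usage (file format is an assumption, see above):
# X = np.loadtxt("dataforpca.txt")
# print(pairwise_correlations(X))
# scatter_pairs(X).savefig("pairs.png")
```

A correlation coefficient close to +1 or -1 for a pair backs up what the scatter plot shows by eye.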
3. Find the principal components of the data and make three new scatter plots, each showing a pair of principal coordinates of the data. What is the distribution (histogram) of the data along the largest principal component? What is a one-dimensional model of the data (as in problem 1 above)?
                         Comp.1       Comp.2       Comp.3
Standard deviation       13.0381587   1.98926277   0.992297226
Proportion of Variance   0.9717506    0.02262073   0.005628671
Cumulative Proportion    0.9717506    0.99437133   1.000000000
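The three rows of this summary are related: each proportion of variance is the squared standard deviation divided by the sum of the squared standard deviations. The sketch below checks that using only the reported standard deviations, and includes a hypothetical helper `pca_summary` that computes the same quantities from a data matrix. The variance divisor (n or n - 1) depends on the software, so `ddof` is left as a parameter. It does not affect the proportions.

```python
import numpy as np

def pca_summary(X, ddof=0):
    """Standard deviations and variance proportions of the principal components.

    ddof=0 divides by n and ddof=1 by n - 1; which one matches a given
    software package is an assumption to verify.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                     # center each response
    s = np.linalg.svd(Xc, compute_uv=False)     # singular values, descending
    var = s**2 / (X.shape[0] - ddof)            # variance along each component
    prop = var / var.sum()
    return np.sqrt(var), prop, np.cumsum(prop)

# Consistency check using only the standard deviations reported above.
sd = np.array([13.0381587, 1.98926277, 0.992297226])
prop = sd**2 / np.sum(sd**2)
cum = np.cumsum(prop)
```

The first component carries about 97% of the variance, which is why a one-dimensional model is a reasonable description of this data set.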
The distribution of PC1 is:
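The loadings of each response on each principal component are:
            PC1         PC2          PC3
Response1   0.4712553   0.5373623   -0.69939990
Response2   0.3532848   0.6115767    0.70792919
Response3   0.8081512  -0.5807027    0.09836685
The one-dimensional model uses only the PC1 column: response j of data point i is approximately b_j + a_j λ_i, where a_j is the PC1 loading of response j (0.4712553, 0.3532848, 0.8081512), b_j is the mean of response j (assuming the data were centered before the decomposition), and λ_i is the score of data point i along PC1.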
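One way to turn the PC1 loadings into the one-dimensional model asked for in problem 3, and to get the histogram of the data along PC1, is sketched below. The data here are synthetic stand-ins, not the contents of dataforpca.txt. The function name `one_dim_model` is illustrative, and taking b_j as the column mean assumes the decomposition was done on centered data.

```python
import numpy as np

# PC1 loadings copied from the table above (one entry per response).
a = np.array([0.4712553, 0.3532848, 0.8081512])

def one_dim_model(X, a):
    """Return (lam, b, X_model) for the model x[i, j] ~= b[j] + lam[i] * a[j]."""
    X = np.asarray(X, dtype=float)
    b = X.mean(axis=0)                  # b_j: mean of response j
    lam = (X - b) @ a                   # lambda_i: PC1 score of data point i
    return lam, b, b + np.outer(lam, a)

# Stand-in data (synthetic, NOT the contents of dataforpca.txt).
rng = np.random.default_rng(1)
X = 10.0 + np.outer(13.0 * rng.normal(size=1000), a) + rng.normal(size=(1000, 3))

lam, b, X_model = one_dim_model(X, a)
counts, edges = np.histogram(lam, bins=30)            # distribution along PC1
rms_residual = np.sqrt(np.mean((X - X_model) ** 2))   # size of what the model misses
```

With the real data, replacing the synthetic `X` by the loaded file would give the PC1 scores whose histogram the problem asks about.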