From Computational Statistics (CSE383M and CS395T)

To calc

1. Suppose that only one principal component is large (that is, there is a single dominant value s_i). In terms of the matrix (and anything else relevant), what are the constants a_j and b_j that make a one-dimensional model of the data? This would be a model where x_ij ≈ a_j λ_i + b_j, with each of the data points (rows) having its own value of an independent variable λ_i and each of the responses (columns) having its own constants a_j, b_j.
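One way to read this problem is through the SVD of the centered data, X - mean = U S V^T, assuming that "the matrix" refers to V. Under that assumption, a_j would be the first column of V, b_j the mean of column j, and λ_i the coordinate of row i along the first principal component. The sketch below illustrates this on synthetic stand-in data (not dataforpca.txt); the function name `one_dim_model` and all variable names are hypothetical.

```python
import numpy as np

def one_dim_model(X):
    """Rank-one model x_ij ~ b_j + lam_i * a_j from the leading singular triplet."""
    b = X.mean(axis=0)                       # b_j: mean of each response (column)
    U, s, Vt = np.linalg.svd(X - b, full_matrices=False)
    a = Vt[0]                                # a_j: first column of V (unit length)
    lam = U[:, 0] * s[0]                     # lam_i: coordinate of row i along PC1
    return a, b, lam

# Synthetic stand-in data: one dominant direction plus small isotropic noise.
rng = np.random.default_rng(0)
true_a = np.array([0.6, 0.0, 0.8])
X = 5.0 + np.outer(rng.normal(0, 10, 200), true_a) + rng.normal(0, 0.1, (200, 3))

a, b, lam = one_dim_model(X)
X_model = b + np.outer(lam, a)               # data rebuilt from the 1-D model
rel_err = np.linalg.norm(X - X_model) / np.linalg.norm(X - b)
print("a_j:", a, " relative residual:", rel_err)
```

The sign of a (and correspondingly of every λ_i) is arbitrary, since flipping both leaves the model unchanged.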

2. The file dataforpca.txt has 1000 data points (rows) each with 3 responses (columns). Make three scatter plots, each showing a pair of responses (in all 3 possible ways). Do the responses seem to be correlated?

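Yes. All three pairs of responses appear correlated, which is consistent with the first principal component in problem 3 carrying about 97% of the variance.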
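The visual impression from the scatter plots can be backed up with pairwise correlation coefficients. The following is a minimal sketch on synthetic stand-in data; the helper name `pairwise_correlations` is hypothetical, and the commented `np.loadtxt` call assumes the file is whitespace-delimited with three columns, which the text does not state.

```python
import numpy as np
from itertools import combinations

def pairwise_correlations(X):
    """Return {(j, k): Pearson r} for every pair of columns of X."""
    R = np.corrcoef(X, rowvar=False)         # columns are the variables
    return {(j, k): R[j, k] for j, k in combinations(range(X.shape[1]), 2)}

# For the real data one might use (assumed file layout):
# X = np.loadtxt("dataforpca.txt")
# Synthetic stand-in: three responses driven by one shared variable plus noise.
rng = np.random.default_rng(1)
t = rng.normal(size=1000)
X = np.column_stack([t, 0.5 * t, 2.0 * t]) + rng.normal(scale=0.3, size=(1000, 3))

corr = pairwise_correlations(X)
for (j, k), r in corr.items():
    print(f"responses {j + 1} and {k + 1}: r = {r:.3f}")
```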


3. Find the principal components of the data and make three new scatter plots, each showing a pair of principal coordinates of the data. What is the distribution (histogram) of the data along the largest principal component? What is a one-dimensional model of the data (as in problem 1 above)?


                           Comp.1     Comp.2      Comp.3
Standard deviation     13.0381587 1.98926277 0.992297226
Proportion of Variance  0.9717506 0.02262073 0.005628671
Cumulative Proportion   0.9717506 0.99437133 1.000000000
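The table has the layout of R's `princomp` summary. As a cross-check, the sketch below computes the same three rows from an SVD in NumPy, using divisor n for the variance as `princomp` does. The function name `pca_summary` is hypothetical. The last two lines only recompute the proportions of variance from the standard deviations reported above.

```python
import numpy as np

def pca_summary(X):
    """Standard deviations and variance fractions of the principal components."""
    Xc = X - X.mean(axis=0)                      # center each response
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values, largest first
    sdev = s / np.sqrt(X.shape[0])               # divisor n (princomp convention)
    var = sdev**2
    prop = var / var.sum()                       # proportion of variance
    return sdev, prop, np.cumsum(prop)           # cumulative proportion last

# Consistency check using only the numbers reported in the table above.
reported_sdev = np.array([13.0381587, 1.98926277, 0.992297226])
reported_prop = reported_sdev**2 / np.sum(reported_sdev**2)
print(reported_prop)
```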


The distribution of the data along PC1 is shown in the histogram:

[Figure: Pc1 distribution.png]
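A histogram like this one can be obtained by projecting the centered data onto the first principal direction and binning the resulting coordinates. The sketch below does so on synthetic stand-in data; the function name `pc1_histogram` and the bin count are illustrative assumptions, not the settings used for the figure.

```python
import numpy as np

def pc1_histogram(X, bins=20):
    """Histogram counts and bin edges of the data projected onto PC1."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[0]                      # coordinate of each row along PC1
    counts, edges = np.histogram(scores, bins=bins)
    return scores, counts, edges

# Synthetic stand-in data with one dominant direction (not dataforpca.txt).
rng = np.random.default_rng(2)
X = np.outer(rng.normal(0, 13, 1000), [0.5, 0.3, 0.8]) + rng.normal(0, 1, (1000, 3))

scores, counts, edges = pc1_histogram(X)
print(counts)
```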

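The loadings of each response on the principal components are shown below. The PC1 column gives the constants a_j of the one-dimensional model; the constants b_j are the column means of the data.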

                PC1        PC2         PC3
Response1 0.4712553  0.5373623 -0.69939990
Response2 0.3532848  0.6115767  0.70792919
Response3 0.8081512 -0.5807027  0.09836685
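As a sanity check on the loadings table, its columns should be orthonormal, and the PC1 column can be plugged directly into the one-dimensional model of problem 1. The sketch below uses only the numbers reported above. The column means b_j are not reported in the text, so the hypothetical helper `model_row` takes them as an argument rather than assuming values.

```python
import numpy as np

# Loadings as reported in the table (rows: responses, columns: PC1..PC3).
V = np.array([
    [0.4712553,  0.5373623, -0.69939990],
    [0.3532848,  0.6115767,  0.70792919],
    [0.8081512, -0.5807027,  0.09836685],
])

# Orthonormal columns mean V^T V is close to the 3x3 identity.
orthonormal = np.allclose(V.T @ V, np.eye(3), atol=1e-5)
print("columns orthonormal:", orthonormal)

a = V[:, 0]                                  # a_j: the PC1 loadings

def model_row(lam_i, b):
    """Predicted responses for one data point: b_j + lam_i * a_j.

    b holds the column means, which the text does not report,
    so the caller has to supply them.
    """
    return np.asarray(b, dtype=float) + lam_i * a
```

Because the largest loading on PC1 belongs to Response3, a unit change in λ_i moves Response3 the most under this model.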