0:01

So let's talk about another instance of an orthonormal basis that comes up quite frequently.

So imagine our x is n by p, and let's still assume that p is less than or equal to n, but imagine we have a large number of subjects, or records, so n is large, and p is also large.

So we want some way to reduce the dimension of x to make it a little bit more manageable.

0:39

So we take the singular value decomposition of x: x equals u d v transpose, where u is an n by p matrix, d is a p by p diagonal matrix of singular values, and v is a p by p matrix. And these are such that u transpose u equals v transpose v equals I.

Okay, so one thing I want to note is, imagine for the time being that x has been centered, in the sense that all of its column means are zero.
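As a quick sketch of this setup (numpy, on made-up data; the variable names here are mine, not from the lecture), the centered matrix, its SVD, and the orthonormality property look like:

```python
import numpy as np

# Illustrative n x p matrix with n >= p, centered so column means are zero.
rng = np.random.default_rng(0)
n, p = 100, 4
x = rng.normal(size=(n, p))
x = x - x.mean(axis=0)

# "Thin" SVD: x = u @ np.diag(d) @ vt, with u n x p, d length p, vt p x p.
u, d, vt = np.linalg.svd(x, full_matrices=False)

# u'u = v'v = I: the columns of u and v are orthonormal.
assert np.allclose(u.T @ u, np.eye(p))
assert np.allclose(vt @ vt.T, np.eye(p))
assert np.allclose(u @ np.diag(d) @ vt, x)
```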

Â 1:08

And consider x transpose x, which is effectively the variance-covariance matrix of the x matrix, disregarding the n minus 1.

Well, using these results, that's equal to v d u transpose u d v transpose, which is equal to v d squared v transpose.

So the eigenvalue decomposition of x transpose x is related to the singular value decomposition of x itself.

1:46

The squared singular values, the eigenvalues from the eigenvalue decomposition, help summarize the variance in the x transpose x matrix, which is itself a variance-covariance matrix.

And these are usually ordered so that the larger d squared values come earlier. So they're usually in decreasing order: the first one is the largest, the second one is the second-largest, and so on.

And so, consider the fact that the trace of x transpose x is equal to the trace of v d squared v transpose. And then I can move that v over, since trace of ab is trace of ba, and v transpose v is I. So the trace of x transpose x is equal to the trace

2:39

of d squared, which is the sum of my squared singular values, my eigenvalues.

And so what this means is that the eigenvalues summarize the variability, in the sense that the trace is the total variability in my x transpose x matrix: the sum of all the diagonals, the sum of all the variances.
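The identities just derived are easy to check numerically. A small sketch (numpy, made-up data): the squared singular values of x match the eigenvalues of x transpose x, and their sum matches the trace.

```python
import numpy as np

# trace(x'x) = trace(v d^2 v') = trace(d^2 v'v) = sum of squared singular values.
rng = np.random.default_rng(1)
x = rng.normal(size=(50, 3))
x = x - x.mean(axis=0)

d = np.linalg.svd(x, compute_uv=False)        # singular values, decreasing
eigvals = np.linalg.eigvalsh(x.T @ x)[::-1]   # eigenvalues of x'x, decreasing

assert np.allclose(d**2, eigvals)             # squared singular values = eigenvalues
assert np.allclose(np.trace(x.T @ x), np.sum(d**2))
```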

Â Okay, so what we could do is take the first three components of our

3:45

x times v times d inverse is equal to u. So a way to think about how we get at the scores (these are the so-called scores, the columns of u) is by multiplying x by v.

So what v does is combine the columns of x in such a way that it gives us these scores. And then the d inverse is sort of a normalization term: the d squared values you can think of as variances, so multiplication by d inverse is sort of like normalizing, in the sense of dividing by a standard deviation.
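This score relation can be verified directly. A minimal numpy sketch (illustrative data, names mine): multiplying x by v and dividing by the singular values recovers u.

```python
import numpy as np

# x = u d v'  implies  x @ v = u d, so (x @ v) / d = u: the scores are
# linear combinations of the columns of x, normalized by the d's.
rng = np.random.default_rng(2)
x = rng.normal(size=(30, 3))
x = x - x.mean(axis=0)

u, d, vt = np.linalg.svd(x, full_matrices=False)
v = vt.T

scores = (x @ v) / d          # same as x @ v @ np.diag(1 / d)
assert np.allclose(scores, u)
```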

Â Okay, so, what we could do if, to every column of v, which

Â is an eigen vector is associated an eigen value which is an element of d squared.

Â We might say the top three and

Â then only say take the first three elements of this u matrix okay.

Â So by only taking the first three elements of v,

Â we'd only be taking the first three elements of d.

Â Or we could just of course do the singular value composition which will give us U,

Â D, and V and just take the first three vectors of U.

Â So we could then try to minimize y

Â minus u gamma let's say squared, okay.

Â Where I'm going to put a little three under my u because I

Â just happened to grab the first three columns of u and

Â what this would mean is I'm trying to regress y

Â with the design matrix u but my u was selected in a way to

Â capture as much variation as possible as I could in my x.

Â Of course I'm just using three as an example,

Â you could use any number of the, number of columns of U to do this with.

Â But you want to explore what percentage of the variation they explained, and

Â how tolerable that percentage is to your goals.
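That percentage follows directly from the trace identity above: the share of total variation captured by the first k components is the cumulative sum of the squared singular values over their total. A small sketch (numpy, made-up correlated data):

```python
import numpy as np

# Fraction of total variation in x'x explained by the first k components.
rng = np.random.default_rng(3)
x = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))  # correlated columns
x = x - x.mean(axis=0)

d = np.linalg.svd(x, compute_uv=False)
pct = np.cumsum(d**2) / np.sum(d**2)   # pct[k-1] = share explained by first k

assert np.isclose(pct[-1], 1.0)        # all components explain everything
assert np.all(np.diff(pct) >= 0)       # the share is nondecreasing in k
```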

5:52

But at any rate, our discussion of orthonormal bases notes that, because u is orthonormal, grabbing any three columns of u, in particular the first three columns of u, is also going to give an orthonormal matrix.

And so our estimate of gamma, our gamma hat, is just going to be u3 transpose times y.
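Putting the pieces together, a principal component regression sketch (numpy, illustrative data and names): because the first three columns of u are orthonormal, the least-squares coefficients are just u3 transpose times y, which we can check against a general least-squares solver.

```python
import numpy as np

# Regress y on the first three principal component scores u3.
rng = np.random.default_rng(4)
n, p = 80, 5
x = rng.normal(size=(n, p))
x = x - x.mean(axis=0)
y = rng.normal(size=n)

u, d, vt = np.linalg.svd(x, full_matrices=False)
u3 = u[:, :3]                   # first three columns: the top scores

gamma_hat = u3.T @ y            # orthonormal design: (u3'u3)^{-1} u3'y = u3'y
gamma_ls, *_ = np.linalg.lstsq(u3, y, rcond=None)
assert np.allclose(gamma_hat, gamma_ls)
```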

Â Okay so what we find is that the way in which we get sort of

Â principal component regressors is simply by taking

Â the singular value decomposition of our centered X matrix,

Â taking the relevant columns from our score.

Â Our vector last singular values which if we think

Â about it in terms of principal components, as our scores and

Â then if we simply multiply them by y multiply the transpose of them times y.

Â We actually get the associated coefficients.

Â So this just goes to show how we can use these nice operations that

Â we get out of squares in this particular case.

Â So using the singular value decomposition to come up with the orthonormal

Â basis I think represents of the three most important bases,

Â concepts and statistics, certainly I would describe wavelengths,

Â transforms, and principal components basis as the three.

Â And I think you can see that in this case

Â it fits very nicely into the topic of regression.

Â And it also fits very nicely if we have a large

Â X matrix with a lot of columns that we want to summarize.

Â One caveat, I would suggest to be careful of.

Â Again, we get U.

Â We can think of U as these linear combinations of our columns of X.

Â If the units of x don't make sense to combine

Â then this procedure may not make a lot of sense to do.

Â So if the first column of x is a particular kind of units and

Â the second column of x is a different kind of units then the interpretability

Â of your scores may really suffer as a result.

Â So again, there's a lot of intricacies to do this.

Â And I think if you wanted to learn more about this,

Â a class on multivariate statistics would be the way to go.

Â But I just wanted to reinforce this point that when we have a design matrix that's

Â orthonormal, we work out with a really simple solution for the coefficients.

Â Okay, and next we'll go through a coding example where we go through some

Â of these sorts of examples.
