0:01

I'd like to talk about a different way to think about least squares

that's very useful for conceptualizing what it's doing.

So, imagine our y is a 3 x 1 vector,

and our X is a 3 x 2 matrix, so we only

have three data points and we're going to try to explain them with two predictors.

I agree, not a terribly useful scenario, but

because I can only draw this in three dimensions, that's the best we can do.

But hopefully you can use this to reorganize how you think about least squares.

And we want to minimize ||y - X beta||^2.

So let me define gamma to be the space

of X beta such that beta is in R2.

So what we want to do is minimize ||y - y-hat||^2

over the collection of y-hats that live in gamma.

So let's draw a picture.

Here are our three dimensions for our three data points,

and our point y is a vector in this space.

Â 1:44

The space gamma is indexed by a two-dimensional parameter,

so it is a two-dimensional space. We're drawing it in three-dimensional space, and

it's linear, so it's going to be a plane.

So this looks like a plane.

Â 2:00

And that's gamma.

So, what is y-hat?

Actually, let me rewrite this earlier statement

a little bit by calling it ||y - z||^2 for z in gamma,

and let me let y-hat be the actual minimizer of that equation.

Okay, so what is y-hat?

Well, it's the point that minimizes the distance between y and

points in this plane.

So it's the point in this plane that's minimally distant to y,

which means it's the orthogonal projection of y onto the plane.

So let's draw it like that.

Here's our orthogonal projection, and there's y-hat.

So y-hat is the point in gamma that is minimum distance to y.
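As a quick numerical sketch of this picture (the particular numbers here are my own illustration, not from the lecture): with three data points and two predictors, the least-squares fit is exactly the orthogonal projection of y onto the plane spanned by the columns of X, so the residual is perpendicular to that plane.

```python
import numpy as np

# A hypothetical 3 x 2 design matrix and a 3 x 1 response.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 2.0])

# Least-squares coefficients: the beta minimizing ||y - X beta||^2.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# y-hat is the orthogonal projection of y onto the plane gamma = {X beta}.
y_hat = X @ beta_hat

# The residual y - y_hat is perpendicular to the plane, hence
# perpendicular to both columns of X.
print(X.T @ (y - y_hat))  # numerically ~ [0, 0]
```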

Â 2:54

Thinking about it this way really

helps us think about different aspects of how we apply multivariable regression.

For one thing, notice what happens if we include a redundant column in X,

an extra column in X.

Say X had the columns x1, x2, and

then the completely redundant column x1 + x2, okay?

We'd see that the rank of that matrix is of course still 2.

But notice the space, let's call this, I don't know,

we'll call that x prime, oh no, x prime is going to look like a transpose.

Let's call it x tilde, okay.

So consider the space gamma tilde, which is equal to x

tilde beta such that beta is in R3.

Â 3:51

That space, you can prove to yourself, though I think it's pretty easy to see, is

identically equal to gamma.

Because if we take the collection of all possible

linear combinations of x1 and x2,

that's the same thing as the collection of all possible linear combinations of x1,

x2, and x1 + x2.

So what we see is that we don't actually need an invertible matrix

to do regression; we just need the space defined by

a linearly independent subset of the collection of columns of X.

That's an important point.
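A small sketch of this point (my own toy numbers again): appending the redundant column x1 + x2 leaves the rank at 2, and NumPy's `lstsq` still fits the rank-deficient matrix via the pseudoinverse. The coefficients are no longer unique, but the fitted values are, because the column space is unchanged.

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 2.0])

# Append the redundant column x1 + x2; the rank is of course still 2.
X_tilde = np.column_stack([X, X[:, 0] + X[:, 1]])
print(np.linalg.matrix_rank(X_tilde))  # 2

# lstsq returns a (minimum-norm) solution even though X_tilde'X_tilde
# is not invertible; the projection y-hat is the same either way.
b1, *_ = np.linalg.lstsq(X, y, rcond=None)
b2, *_ = np.linalg.lstsq(X_tilde, y, rcond=None)
print(np.allclose(X @ b1, X_tilde @ b2))  # True: identical y-hat
```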

Â 4:32

So, at any rate,

it also tells you that if you have, let's say, let me give you an example.

Suppose you have an X that includes an intercept and

a vector that's ones for a while and then zeros for a while.

Then consider another X whose two columns are that ones-then-zeros vector

and a vector that's zeros for a while and then ones for a while.

Â 5:01

And then consider another X whose columns are the vector of ones and

the zeros-then-ones vector.

Okay?

Notice, the ones-then-zeros and zeros-then-ones vectors add up to the vector of ones,

so all three of these cases are identical; they all have the same column space.

Â 5:27

Okay?

So, in every one of these cases, the gamma, the space defined by

linear combinations of the columns of these design matrices, is identical, so

the y-hats defined by these design matrices will all be the same.

Okay, so that's an important point.
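Here's one way to check that numerically (a sketch with invented data; `c1` and `c2` are my names for the ones-then-zeros and zeros-then-ones indicators). All three design matrices span the same plane, so they produce the same projection.

```python
import numpy as np

n = 6
J = np.ones(n)                         # intercept: the vector of ones
c1 = np.r_[np.ones(3), np.zeros(3)]    # ones for a while, then zeros
c2 = J - c1                            # zeros then ones; note c1 + c2 = J
y = np.array([2.0, 1.0, 3.0, 5.0, 4.0, 6.0])

def fitted(X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

# Three parameterizations of the same column space.
designs = [np.column_stack([J, c1]),
           np.column_stack([J, c2]),
           np.column_stack([c1, c2])]
yh = [fitted(X) for X in designs]
print(np.allclose(yh[0], yh[1]) and np.allclose(yh[0], yh[2]))  # True
```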

Â 5:55

That also means, because we know what the solution is, beta-hat,

which is (x transpose x) inverse x transpose y,

that beta-hat is the particular vector in R2 that converts x into

the correct linear combination of its columns to form the projection.

So our y-hat, our projection, is x (x transpose x) inverse x transpose y.

Â 6:28

So, it is interesting to note that the linear operator that takes a

vector z and maps it to Hx z is the operation that projects a vector

in Rn onto the two-dimensional space spanned by the columns of x.

So Hx is exactly a projection operator.

So it's often called the projection matrix.

The reason I give it the letter H is that it's often called the hat matrix as well.
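A brief sketch of the hat matrix (same toy numbers as before, my own illustration): Hx = X (X'X)^{-1} X' is symmetric and idempotent, the two algebraic hallmarks of an orthogonal projection, and applying it to y "puts the hat on y".

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 2.0])

# The hat matrix H = X (X'X)^{-1} X'.
H = X @ np.linalg.inv(X.T @ X) @ X.T

# An orthogonal projection matrix is symmetric and idempotent: H H = H.
print(np.allclose(H, H.T), np.allclose(H @ H, H))

# Applying H to y gives the projection: H y = y-hat = X beta-hat.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(H @ y, X @ beta_hat))  # True
```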

Â 7:01

The final thing I'd like to point out is, if we consider our residuals,

that's e = y - y-hat.

So that's y - x (x transpose x) inverse x transpose y,

which we can also write as (I - Hx) y, I minus the hat matrix, times y.

So notice our residual is exactly the difference between y and

y-hat, so that's this vector right here.

That's e.

So if we write it over there, that's e.

But notice that e is going to be orthogonal to any point in gamma, right?

And what does that mean?

That means e transpose times z equals 0 for any z in gamma.

Â 8:08

And in particular our residuals are orthogonal to any column of x,

Â and we'll elaborate on that point here in a minute.

Â But let's actually prove this mathematically.

Â We can see it geometrically pretty easily but let's prove it mathematically.

Â Well, we've mentioned at some previous lectures that if I take I- Hx and

Â multiply it times x, I get 0.

Â So certainly if I multiply it times x times gamma, that is also going to get 0.

Â So you can go through, if you didn't see this in a previous lecture,

Â just actually go through it, it's a very easy thing to prove.

Â And so that's showing again,

Â that the residuals are orthogonal to every point in the space gamma.

Â So this means that if our x contains a vector Jn intercept,

Â and then other columns, this means that e transposed times Jn has to equal 0,

Â which means that the sum of our residuals has to be zero.

Â But it's not just that the sum of the residuals have to be zero,

Â it's e transposed times xk for any xk column of x also has to be zero.

Â Or any linear combination of the columns of x also has to be zero.
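Checking this orthogonality numerically (a sketch with the same invented design, whose first column is the intercept Jn): the residuals sum to zero and are orthogonal to every column of X.

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])   # first column is the intercept Jn
y = np.array([1.0, 3.0, 2.0])

H = X @ np.linalg.inv(X.T @ X) @ X.T
e = (np.eye(3) - H) @ y      # residuals: e = (I - H) y

# e'Jn = 0: the residuals sum to zero when X contains an intercept.
print(np.isclose(e.sum(), 0.0))
# e'xk = 0 for every column xk, hence for any X beta in gamma.
print(np.allclose(X.T @ e, 0.0))
```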

Okay, so the point I wanted to get across in this lecture is that it's

quite useful to think about the geometry,

about what's occurring geometrically, when you think about least squares.

This will often help you logic your way through actual

applied statistical problems if you can think about them geometrically.
