0:00

In this lecture we're going to talk about partitioning variability.

So, assume my X matrix includes an intercept.

So, it has a vector of ones; let's just call it J.

0:17

So, I'm going to define, well first of all, let me define H_J

as J (J'J)^(-1) J', and H_X as

0:30

X (X'X)^(-1) X'.

Now, since my X contains an intercept,

J can be written as a linear combination of the columns of X.

If I just take the vector that grabs the first column of X, so

a vector that's 1 and then a bunch of zeroes, and multiply X times it,

that will grab the first column, which is the intercept column J.

So I know from the previous lecture that if I take (I - H_X) and

I multiply it by any vector that's a linear combination of the columns of X,

Let's say x times gamma.

In this particular case I've set up gamma so that it grabs J.

Then, this has to be equal to 0.

Well, that's going to imply that J minus H_X times J equals 0.

In other words J is equal to H of x times J.

So squirrel that little pearl of wisdom right there away, and

we're going to use that later.
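As a quick numerical sketch (toy data of my own, not from the lecture), we can check the identity J = H_X J whenever the design matrix contains the intercept column:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
J = np.ones((n, 1))                                 # column of ones
X = np.column_stack([J, rng.normal(size=(n, 2))])   # intercept plus 2 regressors

# Hat (projection) matrix for X: H_X = X (X'X)^(-1) X'
H_X = X @ np.linalg.solve(X.T @ X, X.T)

# Because J is already a column of X, projecting it onto col(X)
# returns J itself: (I - H_X) J = 0, i.e. J = H_X J.
print(np.allclose(H_X @ J, J))  # True
```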

1:34

Now let me define, as the total variation, the sum of the squares

total, as the norm of y minus y-bar times J, squared.

So, y minus y-bar times J, okay.

2:23

Then let me write out the residual sums of squares as the norm of the residuals.

The norm of e squared, which is norm y minus y-hat squared,

where y-hat is the fitted values from the full model with X in it, okay?

So that, we know, is y transpose, times I minus H_X, times y.

So this is the numerator of the variability estimate that we would get

if we only included an intercept.

And this is the numerator of the variability estimate that we would get if

we had included an intercept plus all the other regressors.
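Both numerators can be checked numerically. The sketch below (illustrative toy data, not the lecturer's) verifies that the norm definitions match the quadratic forms y'(I - H_J)y and y'(I - H_X)y:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
J = np.ones((n, 1))
X = np.column_stack([J, rng.normal(size=(n, 3))])
y = rng.normal(size=n)

I_n = np.eye(n)
H_J = J @ np.linalg.solve(J.T @ J, J.T)   # projection onto the ones vector
H_X = X @ np.linalg.solve(X.T @ X, X.T)   # projection onto col(X)

sst = np.sum((y - y.mean()) ** 2)         # ||y - ybar J||^2
sse = np.sum((y - H_X @ y) ** 2)          # ||y - yhat||^2

print(np.allclose(sst, y @ (I_n - H_J) @ y))  # True
print(np.allclose(sse, y @ (I_n - H_X) @ y))  # True
```

Note that H_J y is just the vector with y-bar in every entry, which is why the mean shows up in the intercept-only model.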

3:20

So, let me make a third definition, which is the sum of the squares for

regression, which is the distance between the fitted value

if I only include an intercept and the fitted value if I include

the intercept plus all these other regressors squared,

which I can write as norm H_J times y minus H_X times y, squared.

So let me work with this term a little bit,

the regression sum of squares, and so

I'm going to write this as y transpose, times H_J minus H_X, times H_J

minus H_X, times y, and I don't have to worry about the transposes on

those terms, because both H_J and H_X are symmetric.

Okay?

And then let me write this out, now, as y transpose times,

well, H_J is idempotent, so H_J times H_J,

H_J squared, is just going to be H_J.

And then minus H_X times H_J,

that's the cross term that comes from this one,

and then minus H_J times H_X,

that comes from that one.

And then H_X squared.

Remember, H_X is idempotent as well.

So that's plus H_X, all times y.
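The idempotency being used here is easy to check numerically. A small sketch (my own toy design matrix, assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(15), rng.normal(size=(15, 2))])
H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix H = X (X'X)^(-1) X'

print(np.allclose(H @ H, H))   # idempotent: H^2 = H  -> True
print(np.allclose(H, H.T))     # symmetric:  H' = H   -> True
```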

5:07

Now, remember, J is equal to H_X times J, but I could multiply

on the right here by (J'J)^(-1) J' and

multiply on the right here by (J'J)^(-1) J', so

I haven't done anything.

And then what that implies

is that H_J is equal to H_X times H_J, and

then by taking the transpose, because they're symmetric,

I also see that H_J is also equal to H_J times H_X.

Okay, so this cross term right here is H_J.

This cross term right here is also H_J.

So we get H_J, minus one H_J, minus another H_J, plus H_X.

So we get that this is y transpose, times H_X minus H_J, times y.
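A numerical sketch of this step (toy data assumed for illustration): the cross products H_X H_J and H_J H_X both collapse to H_J, so the squared norm reduces to the quadratic form y'(H_X - H_J)y:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 25
J = np.ones((n, 1))
X = np.column_stack([J, rng.normal(size=(n, 2))])
y = rng.normal(size=n)

H_J = J @ np.linalg.solve(J.T @ J, J.T)
H_X = X @ np.linalg.solve(X.T @ X, X.T)

print(np.allclose(H_X @ H_J, H_J))   # H_X H_J = H_J  -> True
print(np.allclose(H_J @ H_X, H_J))   # H_J H_X = H_J  -> True

ssr_norm = np.sum((H_J @ y - H_X @ y) ** 2)   # ||H_J y - H_X y||^2
ssr_quad = y @ (H_X - H_J) @ y
print(np.allclose(ssr_norm, ssr_quad))        # True
```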

Now, I'll get to my point.

Let me take my sum of the squares for total,

which is the norm of y minus y-bar times J, squared, which

we can write out as y transpose, times I minus H_J, times y.

6:41

Now let me subtract and then add H_X, and then let me organize it this way.

Hopefully, you'll see what I'm doing here.

y transpose, times I minus H_X, plus H_X minus H_J, times y.

6:57

Okay, so I haven't done anything going from this line to this line other than

adding or subtracting H of x.

Then I get y transpose, times I minus H_X,

times y, plus y transpose,

times H_X minus H_J, times y.

And this term right here is the sum of the squares for residual.

And this, right here, is the sum of the squares for regression.

Okay, so the small distinction is that I have H_X minus H_J down here, and

H_J minus H_X inside the norm up there.

But I would like you to prove for

homework that the order of subtraction doesn't matter in that squared form.

That the two are equal.

8:06

So my total variability in my response gets decomposed into

the variability explained by my regression model, and

the remaining variability left unexplained by my regression model.
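The full decomposition can be verified on simulated data. This sketch (assumed toy model, names are illustrative) checks that SS total equals SS residual plus SS regression:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 40
J = np.ones((n, 1))
X = np.column_stack([J, rng.normal(size=(n, 3))])
y = 1.0 + X[:, 1] - 0.5 * X[:, 2] + rng.normal(size=n)   # toy response

H_J = J @ np.linalg.solve(J.T @ J, J.T)
H_X = X @ np.linalg.solve(X.T @ X, X.T)

sst = np.sum((y - y.mean()) ** 2)        # total variation
sse = np.sum((y - H_X @ y) ** 2)         # residual variation
ssr = np.sum((H_X @ y - H_J @ y) ** 2)   # regression variation

print(np.allclose(sst, sse + ssr))  # True
```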

All of these are positive because they're all sums of squares, and so

a very common thing to do is take SS regression, the amount of variability

explained by my regression model, and divide it by the total variation.

And then, what is that going to give us?

That is going to give us the percentage

of the total variability explained by the regression model,

and that quantity is what we call R squared.

9:34

So R squared is interpreted as the percentage of your total variability

explained by the linear association with your added regressors.
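Here is a small sketch of that interpretation (toy simulated data, assumed for illustration). For a single regressor, R squared computed from the decomposition also equals the squared sample correlation:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)   # toy response

J = np.ones((n, 1))
X = np.column_stack([J, x])
H_J = J @ np.linalg.solve(J.T @ J, J.T)
H_X = X @ np.linalg.solve(X.T @ X, X.T)

sst = y @ (np.eye(n) - H_J) @ y
sse = y @ (np.eye(n) - H_X) @ y
r_squared = 1.0 - sse / sst   # same as SSR / SST by the decomposition

# With one regressor, R^2 is the squared sample correlation of x and y.
print(np.allclose(r_squared, np.corrcoef(x, y)[0, 1] ** 2))  # True
```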

And we see that it was a pretty easy proof to show that the total

variability decomposes into the residual variability

And the regression variability and it all involved

this little trick up here that said that j was equal to h of x times j.

Okay, so in case you were wondering how these things worked out and why everything

added up when you were looking at your regression output, this is why.

Okay, so thank you for listening, and we'll talk a lot more about partitioning

variability and how that relates to things like F tests later on in the course.