0:01

Let's consider a different kind of residual.

So consider the model $y = \tilde{X}\tilde{\beta} + \epsilon$,

where $\tilde{X} = [X \;\; \delta_i]$ is my traditional design matrix, $X$, augmented with one extra column, the vector $\delta_i$.

0:30

Okay, and then my $\tilde{\beta}$ vector is my normal vector $\beta$ that I almost always have, and then the parameter $\Delta$. And let me put a little $i$ there just to indicate that $\Delta_i$ is the parameter devoted just to the $i$th data point.
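As a concrete sketch of this setup (simulated data; all variable names here are mine), the augmented design matrix is just the original design with an indicator column for the $i$th observation appended:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
# Original design matrix X (intercept plus two covariates) and outcome y
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

i = 4                     # the data point we single out
delta_i = np.zeros(n)     # the vector delta_i: all zeros...
delta_i[i] = 1.0          # ...except a 1 in position i
X_tilde = np.column_stack([X, delta_i])   # augmented design [X, delta_i]
print(X_tilde.shape)      # (20, 4): one extra column beyond the original p
```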

Okay, now, with my least squares criterion, let's figure out what the MLE is when we just add this extra term. I could write the criterion as $\|y - \tilde{X}\tilde{\beta}\|^2$. But I could equivalently write that as a sum over the data points. Let me use $i'$ as the summation index, just so we're not mixing it up with $i$, the index that we fixed earlier:

$$\sum_{i'} \Big( y_{i'} - \sum_k x_{i'k}\,\beta_k - \delta_i(i')\,\Delta_i \Big)^2,$$

where $\delta_i(i')$ is the vector $\delta_i$ evaluated at the index $i'$.

1:53

So consider the elements of this sum. The elements of this vector $\delta_i$ are going to be 0, except in the instance when $i' = i$. So the terms with $i' \neq i$ are just going to be

$$\sum_{i' \neq i} \Big( y_{i'} - \sum_k x_{i'k}\,\beta_k \Big)^2,$$

2:47

and then, plus, the one instance where $i'$ equals $i$:

$$\Big( y_i - \sum_k x_{ik}\,\beta_k - \Delta_i \Big)^2.$$

Now, the criterion is going to be greater than or equal to what I'd get if this last term were equal to 0. And one way I can make that term equal to 0 is to set

$$\Delta_i = y_i - \sum_k x_{ik}\,\beta_k.$$

3:19

So then that term is 0, and we're left with the term

$$\sum_{i' \neq i} \Big( y_{i'} - \sum_k x_{i'k}\,\beta_k \Big)^2.$$

Well, we know when this is minimized: it's going to be greater than or equal to what we get if we just plug in the least squares estimate,

Â 3:43

where in this case, notice, the sum is over everything except the $i$th data point. So I plug in the least squares estimate for my vector $\beta$ where I've deleted the $i$th data point. Let me just call that $\hat{\beta}^{(-i)}$, meaning the least squares estimate where I've deleted the $i$th data point; then that sum would be minimized. And then my $\hat{\Delta}_i$ is going to be

$$\hat{\Delta}_i = y_i - \sum_{k=1}^{p} x_{ik}\,\hat{\beta}_k^{(-i)}.$$

So let's look at this term $\hat{\Delta}_i$ now.

4:26

So notice the format of this. This is $y_i - \hat{y}_i^{(-i)}$. So what do I mean by that? $\hat{y}_i^{(-i)}$ is the fitted value for the $i$th data point, where the $i$th data point wasn't used in the fitting.
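Here's a quick numerical check of that claim, as a sketch with simulated data (variable names are mine): the coefficient $\hat{\Delta}_i$ from the augmented fit equals $y_i - \hat{y}_i^{(-i)}$, the leave-one-out residual computed by actually refitting without point $i$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

i = 7
delta_i = np.zeros(n)
delta_i[i] = 1.0
X_tilde = np.column_stack([X, delta_i])

# Fit the augmented model: the last coefficient is Delta_i-hat
beta_tilde = np.linalg.lstsq(X_tilde, y, rcond=None)[0]
Delta_hat = beta_tilde[-1]

# Fit with the ith data point deleted, then predict the held-out point
keep = np.arange(n) != i
beta_loo = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
loo_resid = y[i] - X[i] @ beta_loo

print(np.isclose(Delta_hat, loo_resid))  # True: the same number
```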

So there are a couple of interesting points to make out of this. One, adding a regressor,

6:07

And so this residual right here, the difference between the $i$th data point's outcome and what you would predict for the $i$th data point if the $i$th data point wasn't allowed to actually influence the model, is called a PRESS residual, or a leave-one-out cross-validation residual.

The other interesting facts about this are, first of all, that you can obtain the leave-one-out cross-validation residual by fitting a model with this extra term in it. Another interesting fact is that, in the coefficient table, the t-test for $\Delta_i$ is a test, sort of an outlier test, for the $i$th data point: if you need a coefficient that's devoted just to that data point, then that data point is an outlier. And the t-test for this is actually a valid t-test. So that gives us a form of standardized residual for the $i$th data point that is t-distributed.
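To illustrate that t-test claim numerically, here is a sketch on simulated data (names mine); the externally studentized residual is computed from its standard formula $e_i / (s_{(i)}\sqrt{1 - h_{ii}})$, which this passage doesn't derive:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

i = 7
delta_i = np.zeros(n)
delta_i[i] = 1.0
X_tilde = np.column_stack([X, delta_i])

# t-statistic for Delta_i from the augmented fit
beta_tilde = np.linalg.lstsq(X_tilde, y, rcond=None)[0]
resid = y - X_tilde @ beta_tilde
sigma2 = resid @ resid / (n - (p + 1))            # df loses one for the extra column
cov = sigma2 * np.linalg.inv(X_tilde.T @ X_tilde)
t_delta = beta_tilde[-1] / np.sqrt(cov[-1, -1])

# Externally studentized residual, computed directly
H = X @ np.linalg.inv(X.T @ X) @ X.T              # hat matrix of the original fit
e = y - H @ y                                     # ordinary residuals
keep = np.arange(n) != i
beta_loo = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
rss_loo = np.sum((y[keep] - X[keep] @ beta_loo) ** 2)
s_loo = np.sqrt(rss_loo / (n - 1 - p))
t_ext = e[i] / (s_loo * np.sqrt(1 - H[i, i]))

print(np.isclose(t_delta, t_ext))  # True: the same t-statistic
```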

So there are a lot of interesting facts about this. And one fact that we'll cover later on is that you don't even actually have to fit this model, or in any way delete the $i$th data point, in order to obtain these residuals. To get leave-one-out cross-validated residuals, you don't actually have to leave one out.
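The closed form the lecture is pointing ahead to is the standard linear-model identity $\hat{\Delta}_i = e_i / (1 - h_{ii})$: the ordinary residual divided by one minus the leverage, with no refitting. A sketch verifying it on simulated data (names mine):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 25, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
e = y - H @ y                          # ordinary residuals from the full fit
h = np.diag(H)                         # leverages h_ii
press = e / (1 - h)                    # all n leave-one-out residuals, no refits

# Brute-force check: actually leave each point out and refit
for i in range(n):
    keep = np.arange(n) != i
    b = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    assert np.isclose(press[i], y[i] - X[i] @ b)
print("all leave-one-out residuals match")
```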

It's a surprising thing about linear models. But these residuals are quite useful for a variety of reasons. One is that they have this motivation as an observation-specific mean shift.

7:51

Two is that cross-validated residuals like this have an obvious, intuitive interpretation: how different is my outcome from what my model would predict when that data point hasn't been allowed to impact the model fitting? That's a very powerful idea for assessing things like model fit.

8:13

And then finally, the t-test that you get from fitting this model with the extra term for the $i$th data point yields a useful form of standardized residual that is exactly t-distributed. So we can establish cutoffs for thresholding them, and so on.
