0:00

Okay, so let's go through some simulation experiments to understand how some of these diagnostic measures work.

In this first case, I'm looking at an instance where there's a big cloud of uncorrelated data; I did this by just generating a bunch of pairs of independent random normals. Then I added the point (10, 10), which clearly does not fit the rest of the trend of the data. What we can see in this case is that a strong correlation is estimated from the data merely because of the existence of this point; otherwise the correlation would be estimated to be near zero.
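The lecture does this in R; as a rough sketch of the same experiment in Python (the seed and sample size here are my own choices), a single point at (10, 10) appended to an uncorrelated cloud produces a substantial estimated correlation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100

# a cloud of independent standard normals: truly uncorrelated
x = rng.normal(size=n)
y = rng.normal(size=n)
r_cloud = np.corrcoef(x, y)[0, 1]

# append the single outlying point (10, 10)
x_out = np.append(x, 10.0)
y_out = np.append(y, 10.0)
r_out = np.corrcoef(x_out, y_out)[0, 1]

print(round(r_cloud, 3), round(r_out, 3))
```

The cloud's correlation is near zero; with the outlier it jumps to around a half, driven entirely by that one point.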

1:02

The first thing we look at is the dfbetas. The round statement here just rounds to the third decimal place when I put the three there, and notice that this first point, which is the (10, 10) point, has a dfbeta orders of magnitude larger than those of the remaining points.

1:26

Let's look at the hat values. The hat value for this point is much larger than the hat values for the remaining points. Hat values have to be between zero and one, and this one is of course much larger than for the other points. So if we were looking at these, we would obviously single this point out.
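R's `hatvalues(fit)` gives these directly; here's a from-scratch Python sketch (my own seed and names) that computes the hat diagonal h_i = x_i'(X'X)^{-1}x_i and confirms the zero-to-one bound:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.append(rng.normal(size=100), 10.0)   # cloud plus the x = 10 outlier
X = np.column_stack([np.ones_like(x), x])   # design matrix with intercept

# hat values: diagonal of X (X'X)^{-1} X'
h = np.sum(X * np.linalg.solve(X.T @ X, X.T).T, axis=1)

print(round(h[-1], 3), round(h[:-1].max(), 3))
assert np.all((h > 0) & (h < 1))            # hat values lie between 0 and 1
assert abs(h.sum() - X.shape[1]) < 1e-8     # and sum to the number of coefficients
```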

1:49

Now let's look at another instance, one where there's a clear regression relationship. Here I just generated the data along a line, and then I generated another outlier that's similarly distant from the cloud of data. However, it adheres very nicely to the regression line. So let's see how our diagnostic values look in this specific case.

2:18

Okay, so here are my dfbetas, and if you look at this first point, which was the outlying point, it's still large, but nowhere near as distinctively large as in the other case. So it does still appear to have some influence on the fit, but nothing like before. However, if you look at the hat values, this point has a much larger hat value than all of the other points, by a factor of ten for most of them. Why? Well, if we go back, what we see is that this point is outside the range of the X values.

3:04

But it does adhere to the regression relationship, so it's going to have a large leverage value but not a large dfbeta, dffit, or one of these other measures.

Let's look at this example by Stefanski that, to me, shows why we do residual plots. Basically, the reason we look at residual plots is that they zoom in on potential problems with our model. In this case you can download the data from their website.

3:45

Here I show my linear model fit, and if you look, every single P-value is highly significant for every single coefficient. I fit all of the data points, but for this to work you have to handle the intercept in the way that he generated the data. Okay, so should we be done? Is this fine? Well, here's the problem.

4:10

In multivariable examples you can't plot the residuals versus the single x as you can in simple linear regression; you would have to pick one of the Xs. So the most common residual plot is the residuals versus the fitted values, that is, e versus y-hat. So what happens in this particular instance when we plot the residuals versus y-hat? Now you can see this very clever little picture that comes out. So what's happened? Without looking at the residuals we wouldn't have seen anything, but looking at the residuals has zoomed in on this very clear aspect of poor model fit; there is a very clear systematic pattern in our residuals that we would otherwise have missed.

And remember, in most cases in regression we want to model the systematic things; whatever we can't explain we want to leave as noise, but the systematic things, and this picture is obviously systematic, we want to actually be able to model. So in this case, without looking at the residual plot, we would have missed this pattern. This example was created by Stefanski and his co-authors to illustrate for us why it is that we do residual plots, and that is that they zoom in, quite precisely, on aspects of poor model fit.

5:45

So let's go back to the swiss data and look at the different diagnostic plots that R spits out by default. Remember, at the beginning of the lecture we started out with these, so let's see if we can now interpret them. Well, the plot of the residuals versus the fitted values is the same kind of plot as we just saw a slide ago. What you're looking for in that plot is anything systematic. For example, if you saw the data look something like a fan, that would suggest heteroscedasticity, that the variance is either increasing or decreasing in a way that you wouldn't like, and so on. In this plot it doesn't look so bad; there don't seem to be too many troubling aspects.

6:43

The Q-Q plot is specifically designed to evaluate, not formally test, normality of the error terms, okay? The scale-location plot shows the standardized residuals. Remember, we talked about standardized residuals: they're the ordinary residuals, but standardized so that they have a more comparable scale across experiments, scaled to try to make them behave like a t statistic. So again, this is a lot like the residual plot, you're plotting them against the fitted values, but now you've changed the scale. That's potentially useful for comparing across different experiments.

The final plot is the residuals versus leverage: the standardized residuals on one axis and the leverage on the other. In this plot, again, you're trying to spot any sort of systematic pattern, any reason why points with higher leverage are having particularly large or particularly small residual values. You might have an instance where one very high-leverage point has a very small residual simply because it has very large leverage and has pulled the line toward itself, or an instance where, even though a point has really impacted the regression line, it has high leverage and also a high residual.

So at any rate, these are just a couple of examples of the kinds of plots you might want to look at. In this data set, none of them seems too inherently bad. When you go through these things, ideally you would have something where you could click on the individual points, and it would describe the aspects of the point when you hover over it with your mouse. Some of the other software systems can do that, and R can do that now; we'll talk in the Data Products class about how you can actually create those kinds of plots.

8:57

So that's the end of the lecture, and I look forward to seeing you next time. I hope that now that you know a little bit about diagnostic measures, influence diagnostics, and leverage diagnostics, you can incorporate them into your analyses in the future. All right, see you next time.
