The next condition is nearly normal residuals with mean zero.

Remember that some residuals will be positive

and some are going to be negative.

On a residuals plot we look for a random scatter of residuals around zero.

This translates to a nearly normal

distribution of residuals centered at zero.

And we can check this using a histogram or a normal probability plot.

So, once again, using R, we can make a histogram of

our residuals that are stored in the object for the regression model.

And we can also make a normal probability plot

using the functions qqnorm for the plot, and qqline for

the guidance line that we're going to use to

see if the points actually align on a straight line.
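
As a minimal sketch of the two checks just described: the lecture's own data set and model object are not shown here, so the code below fits a stand-in model on R's built-in mtcars data. The name fit and the chosen variables are assumptions made purely for illustration.

```r
# Stand-in model (assumption): the lecture's actual model object is not shown.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Residuals are stored in the model object; residuals(fit) returns fit$residuals.
res <- residuals(fit)

# Histogram: look for a roughly symmetric, unimodal shape centered at zero.
hist(res, main = "Histogram of residuals", xlab = "Residuals")

# Normal probability plot: qqnorm draws the points, qqline adds the guidance line.
qqnorm(res)
qqline(res)
```

If the residuals are nearly normal, the points should follow the guidance line closely, with at most mild departures in the tails.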

This is what our plots look like.

We are seeing a little bit of a skew in the residuals.

However, the skew doesn't look too bad.

And looking at the normal probability plot as well, except for

at the tail areas, we're not seeing huge deviations from the line.

So I think we can say that this condition seems to be reasonably well satisfied.

The next condition is constant variability of residuals.

We want our residuals to be equally variable for

low and high values of the predicted response variable.

So we check the residuals plot of residuals versus

the predicted values, that's e versus y hat.

And note that we're using residuals versus predicted, instead of residuals versus x,

because it allows for considering the entire

model with all explanatory variables at once.

We want our residuals to be randomly scattered

in a band with a constant width around zero.

So in other words, we want to see nothing that resembles a fan shape.

It is also worthwhile to view the absolute value of residuals versus

the predicted values to identify any unusual observations easily.

As usual, we can easily create both of these plots in R.

Here for example, we have our residuals on our y axis, and

on the x axis we have what R calls the fitted values.

What this basically means is our predicted values, or in other words our y hats.

And we can also calculate the absolute values of these

residuals and plot that against the fitted values as well.
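
Here is a rough sketch of how these two plots could be produced. Again, the model below is a stand-in fitted on the built-in mtcars data, and the names fit2, res2 and fitted2 are hypothetical, not the objects used in the lecture.

```r
# Stand-in model (assumption): substitute your own regression model object.
fit2 <- lm(mpg ~ wt + hp, data = mtcars)

res2    <- residuals(fit2)  # e, the residuals
fitted2 <- fitted(fit2)     # y hat, what R calls the fitted values

# Residuals versus fitted values: look for a band of constant width around zero.
plot(res2 ~ fitted2, xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. fitted")
abline(h = 0, lty = 2)

# Absolute value of residuals versus fitted values: unusual observations stand out.
plot(abs(res2) ~ fitted2, xlab = "Fitted values", ylab = "|Residuals|",
     main = "Absolute residuals vs. fitted")
```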

So here's what our plots look like.

The first plot is a residuals versus fitted plot.

We don't see a fan shape here.

It appears that the variability of the

residuals stays constant as the

fitted, or predicted, values change, so

the constant variability condition appears to be met.

The absolute value of residuals plot can be

thought of simply as the first plot folded in half.

So if we were to see a fan shape in the first plot,

we would see a triangle in the absolute value of residuals versus fitted plot.

We don't see a triangle here, so it seems like this condition is met as well.
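
To get a feel for what a violation would look like, here is a small simulated example. It is not from the lecture's data: the data are made up so that the noise grows with x, and all names (x_sim, y_sim, sim_fit) are hypothetical.

```r
# Simulated data (assumption): the spread of the noise increases with x,
# which deliberately violates the constant variability condition.
set.seed(1)
x_sim <- runif(200, min = 1, max = 10)
y_sim <- 2 + 3 * x_sim + rnorm(200, mean = 0, sd = x_sim)

sim_fit <- lm(y_sim ~ x_sim)

# Residuals vs. fitted: the points should fan out as the fitted values increase.
plot(residuals(sim_fit) ~ fitted(sim_fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Absolute residuals vs. fitted: the fan, folded in half, looks like a triangle.
plot(abs(residuals(sim_fit)) ~ fitted(sim_fit),
     xlab = "Fitted values", ylab = "|Residuals|")
```

Comparing these simulated plots with the plots for your own model can make it easier to judge whether a fan or triangle shape is present.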

Lastly, independent residuals, and note that

independent residuals basically means independent observations.

If we have any time series structure, or if

we're suspecting that there may be any time series structure

in our data set, we can check for independent residuals

using the residuals versus the order of data collection plot.

If, on the other hand, that is not a consideration, we don't

really have another diagnostic graph that we can use

to check whether the residuals are independent.

Instead, we want to go back to first principles

and think about how the data are sampled.

We've talked numerous times in this course

about what independence of observations means and what

we need in terms of the sampling of the data to obtain independent observations.

So let's quickly take a look to see if this

order of data collection plot looks wonky in any way.

For that, we simply plot our residuals, and

we don't even have to specify anything for our

x-axis, because R will basically plot them in

the order that they appear in our data set.
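
A minimal sketch of that plot, once more using a stand-in model on the built-in mtcars data (the name fit3 is hypothetical). Note that row order only stands for order of data collection if the data set was actually recorded in that order, which is an assumption you would need to verify for your own data.

```r
# Stand-in model (assumption): substitute your own regression model object.
fit3 <- lm(mpg ~ wt + hp, data = mtcars)
res3 <- residuals(fit3)

# With a single numeric vector, plot() puts the index (1, 2, 3, ...) on the x-axis,
# so the residuals appear in the order of the rows in the data set.
plot(res3, xlab = "Order in data set", ylab = "Residuals",
     main = "Residuals vs. order of data collection")
abline(h = 0, lty = 2)
```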

And the order of data collection plot, where we have the residuals on the y-axis

and the order of data collection on the x-axis, does not show any patterns.

If there were some non-independent structure, we would see

these residuals increasing or decreasing, but we don't see any

such pattern, so it appears that any sort of

time series structure is not a consideration for this dataset.