The following Python code can be used to generate a few more plots

to help us determine how specific explanatory variables contribute to

the fit of our model.

In this example we're examining the Internet use explanatory variable but

we can also look at these plots for other explanatory variables.

Here we are using the plt.figure function to create an object called

fig3 that contains an empty figure with figsize equal to 12,8.

These numbers specify the size of the plot image.

Each unit is one inch, which at a resolution of 80 dots per inch is equal to 80 pixels.

12 times 80 equals 960 pixels and 8 times 80 equals 640 pixels.

So what we're doing with the figsize argument here

is generating a 960 by 640 pixel plot image.

You can change the size of this image by changing the values in the parentheses.
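
The size arithmetic above can be checked with a short sketch. It assumes the resolution is set explicitly with dpi=80 so that the numbers match the ones just described; the default resolution depends on your Matplotlib version and settings, so the pixel size on your machine may differ.

```python
import matplotlib.pyplot as plt

# figsize is given in inches; pixel size = inches * dots per inch (dpi).
# dpi=80 is an assumption made here so the arithmetic matches the text.
fig3 = plt.figure(figsize=(12, 8), dpi=80)

# Convert the figure size from inches to pixels.
width_px, height_px = (fig3.get_size_inches() * fig3.dpi).astype(int)
print(width_px, height_px)
```

Changing either figsize or dpi changes the pixel dimensions of the saved or displayed image.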

The second line of code uses the sm.graphics.plot_regress_exog

function from the statsmodels library to generate multiple diagnostic plots.

In the parentheses, we specify the name of the object that has our model results,

which is reg3 followed by a comma.

Then in quotes we specify the explanatory variable that we want to plot

followed by a comma.

Finally, fig=fig3 tells Python to put the information from

our sm.graphics.plot_regress_exog function into the fig3 plot object.

Then we ask Python to print the plots.

The primary plots of interest are the plot of the residuals for

each observation at different values of Internet use rate, in the upper right

hand corner, and the partial regression plot, which is in the lower left hand corner.

The plot in the upper right hand corner shows the residuals for

each observation at different values of Internet use rate.

There's clearly a funnel shaped pattern to the residuals,

where we can see that the absolute values of the residuals

are considerably larger at lower values of Internet use rate,

get smaller, closer to zero, as Internet use rate increases,

and then start to get larger again at higher levels.

This is consistent with the other regression diagnostic plots

that indicate that this model does not predict female employment rate as well for

countries that have either high or low levels of Internet use rate.

The model is particularly poor at predicting female employment rates for

countries with low Internet usage.

Similar to our urban rate variable, there also appears to be a sort of

curvilinear pattern to these observations,

where the residuals seem to get larger again for countries for

which Internet use rate exceeds about 80 per 100 residents.

This suggests that the association between Internet use rate and

female employment rate may also be curvilinear.

So maybe we also want to add a second order polynomial, or quadratic term, for

Internet use rate as an explanatory variable to the model as well.
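
As a rough sketch of what adding that quadratic term could look like, the statsmodels formula interface lets us wrap the squared term in I(). The data below are synthetic, and the column names and the name reg_quad are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data with some built-in curvature;
# column names are hypothetical.
rng = np.random.default_rng(1)
n = 150
data = pd.DataFrame({
    "urbanrate_c": rng.normal(0, 20, n),
    "internetuserate_c": rng.uniform(-35, 60, n),
})
data["femaleemployrate"] = (
    45 - 0.1 * data["urbanrate_c"]
    + 0.05 * data["internetuserate_c"]
    + 0.004 * data["internetuserate_c"] ** 2
    + rng.normal(0, 8, n)
)

# I(...) tells the formula parser to treat ** as arithmetic,
# which adds a second order polynomial term to the model.
reg_quad = smf.ols(
    "femaleemployrate ~ urbanrate_c + internetuserate_c"
    " + I(internetuserate_c**2)",
    data=data,
).fit()
print(reg_quad.params)
```

Whether the extra term is worth keeping is then a question of its estimated coefficient, its p-value, and how the residual plots change.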

Finally, because we have multiple explanatory variables, we might want to

take a look at the contribution of each individual explanatory variable

to model fit, controlling for the other explanatory variables.

One type of plot that does this is the partial regression residual plot.

The third plot,

in the lower left hand corner, is a partial regression residual plot.

It attempts to show the effect of adding Internet use rate

as an additional explanatory variable to the model.

Given that one or more explanatory variables are already in the model.

For the Internet use rate variable,

the values in the scatter plot are two sets of residuals.

The residuals from a model predicting the female employment rate response from

the other explanatory variables, excluding Internet use, are plotted on the vertical

axis, and the residuals from the model predicting Internet use rate from all

the other explanatory variables are plotted on the horizontal axis.

What this means is that the partial regression plot shows the relationship

between the response variable and a specific explanatory variable,

after controlling for the other explanatory variables.
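
To make the two sets of residuals concrete, here is a minimal sketch that builds them by hand on synthetic data. The column names are hypothetical. It also checks a useful property: the slope of the line through the partial regression plot matches the coefficient for that variable in the full model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data; column names are hypothetical.
rng = np.random.default_rng(2)
n = 150
data = pd.DataFrame({"urbanrate_c": rng.normal(0, 20, n)})
data["internetuserate_c"] = 0.6 * data["urbanrate_c"] + rng.normal(0, 15, n)
data["femaleemployrate"] = (
    48 - 0.1 * data["urbanrate_c"]
    + 0.05 * data["internetuserate_c"]
    + rng.normal(0, 10, n)
)

# Full model with both explanatory variables.
full = smf.ols(
    "femaleemployrate ~ urbanrate_c + internetuserate_c", data=data
).fit()

# Vertical axis: response residuals after removing the other variable.
resid_y = smf.ols("femaleemployrate ~ urbanrate_c", data=data).fit().resid
# Horizontal axis: Internet use residuals after removing the other variable.
resid_x = smf.ols("internetuserate_c ~ urbanrate_c", data=data).fit().resid

# Slope of the straight line through the partial regression scatter.
slope, intercept = np.polyfit(resid_x, resid_y, 1)
print(slope, full.params["internetuserate_c"])
```

Plotting resid_x against resid_y gives the same scatter that the partial regression panel shows.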

We can examine the plot to see if the Internet use rate residuals show a linear,

or non-linear pattern.

If the Internet use variable shows a linear relationship to the dependent

variable after adjusting for the variables already in the model,

it meets the linearity assumption in the multiple regression.

If there is an obvious non-linear pattern, this would be additional support for

adding a polynomial term for Internet use rate to the model.

When we take a look at the plot for Internet use rate in the lower left-hand

corner, we see that, in contrast to the plot

of the residuals at different values of Internet use rate without adjusting for

the urban rate variable, which is shown above, the partial regression plot for

Internet use does not clearly indicate a non-linear association.

Rather, the residuals are spread out in a random pattern

around the partial regression line.

In addition, many of the residuals are pretty far from this line,

indicating a great deal of female employment rate prediction error.

This suggests that although Internet use rate

shows a statistically significant association with female employment rate,

this association is pretty weak after controlling for urbanization rate.

We can look at the partial regression residual plots for

each of the other explanatory variables as well.

Finally, we can examine a leverage plot to identify observations that have

an unusually large influence on the estimation of the predicted value of

the response variable, female employment rate, or that are outliers, or both.

The leverage of an observation can be thought of in terms

of how much the predicted scores for the other observations would differ

if the observation in question were not included in the analysis.

The leverage always takes on values between zero and one.

A point with zero leverage has no effect on the regression model.

Outliers are observations with standardized residuals greater than 2 or less than -2.
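
These two quantities can also be pulled out of a fitted model as numbers. The sketch below uses synthetic data with hypothetical column names, and the cutoff for "high" leverage (twice the average) is an illustrative assumption, not a rule taken from this lecture.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data; column names are hypothetical.
rng = np.random.default_rng(3)
n = 150
data = pd.DataFrame({
    "urbanrate_c": rng.normal(0, 20, n),
    "internetuserate_c": rng.normal(0, 25, n),
})
data["femaleemployrate"] = (
    48 - 0.1 * data["urbanrate_c"]
    + 0.05 * data["internetuserate_c"]
    + rng.normal(0, 10, n)
)
reg3 = smf.ols(
    "femaleemployrate ~ urbanrate_c + internetuserate_c", data=data
).fit()

influence = reg3.get_influence()
leverage = influence.hat_matrix_diag                # one value per observation
std_resid = influence.resid_studentized_internal    # standardized residuals

# Outliers: standardized residual beyond +/- 2.
outliers = np.abs(std_resid) > 2
# Assumed cutoff for illustration: more than twice the average leverage.
high_leverage = leverage > 2 * leverage.mean()

print("outliers:", outliers.sum())
print("high leverage:", high_leverage.sum())
print("both:", (outliers & high_leverage).sum())
```

Observations flagged in both arrays are the ones most worth a closer look.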

We use the following Python code to generate a leverage plot.

We use the statsmodels graphics functions again,

but this time we use the influence_plot function.

In parentheses, we include the name of the object that has the result of our

regression analysis, reg3, followed by a comma.

size=8 is an option to make the points on the plot smaller than the default size so

that they're easier to distinguish.

One of the first things we see in the leverage plot is that we have a few

outliers, observations that have residuals greater than 2 or less than -2.

We've already identified some of these outliers in some of the other plots we've

looked at, but this plot also tells us that these outliers have small or close to

zero leverage values, meaning that although they are outlying observations,

they do not have an undue influence on the estimation of the regression model.

On the other hand,

we see that there are a few cases with higher than average leverage.

But one in particular is more obvious in terms of having an influence on

the estimation of the predicted value of female employment rate.

This observation has a high leverage but is not an outlier.

We don't have any observations that are both high leverage and outliers.