In this video, we'll look at Model Evaluation using Visualization.

Regression plots give a good estimate of the relationship between two variables,

the strength of the correlation,

and the direction of the relationship (positive or negative).

The horizontal axis is the independent variable.

The vertical axis is the dependent variable.

Each point represents a different target point.

The fitted line represents the predicted value.

There are several ways to plot a regression plot.

One simple way is to use regplot from the seaborn library.

First, "import seaborn."

Then use the "regplot" function.

The parameter x is the name of the column that

contains the independent variable or feature.

The parameter y is the name of the column that

contains the dependent variable or target.

The parameter data is the name of the dataframe.

The result is given by the plot.
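
As a rough sketch of the steps just described, the snippet below builds a small made-up dataframe and passes it to regplot. The column names highway_mpg and price, and all of the values, are assumptions chosen only for illustration; they are not taken from the video.

```python
import pandas as pd
import seaborn as sns

# Hypothetical data: the column names and values are invented for this sketch.
df = pd.DataFrame({
    "highway_mpg": [20, 24, 27, 30, 33, 36, 40, 43],
    "price": [34000, 29000, 24500, 20000, 17500, 14000, 11000, 9500],
})

# x names the feature column, y names the target column,
# and data is the dataframe that holds both columns.
ax = sns.regplot(x="highway_mpg", y="price", data=df)

# In a notebook the figure renders automatically; in a script,
# import matplotlib.pyplot as plt and call plt.show() to display it.
```

With this invented data the fitted line slopes downward, which is what a negative relationship looks like on a regression plot.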

The residual plot represents the error between the actual value and the predicted value.

Examining the predicted value and actual value we see a difference.

We obtain that value by subtracting the predicted value

from the actual target value.

We then plot that value on

the vertical axis with the independent variable as the horizontal axis.

Similarly, for the second sample,

we repeat the process,

subtracting the predicted value from the target value,

then plotting the value accordingly.
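
To make the arithmetic concrete, here is a minimal sketch that fits a straight line to a few made-up points and computes one residual per sample. The data values are invented, and the sign convention used here (actual minus predicted) is an assumption; it is the common one, but either order gives the same picture flipped about the horizontal axis.

```python
# Made-up sample values, used only to illustrate the arithmetic.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Fit a straight line by ordinary least squares (closed form for one feature).
n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n
slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sum(
    (xi - x_mean) ** 2 for xi in x
)
intercept = y_mean - slope * x_mean

# Predicted value for each sample, then residual = actual - predicted.
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# Each (x, residual) pair is one point on the residual plot.
for xi, ri in zip(x, residuals):
    print(f"x={xi:.0f}  residual={ri:+.2f}")
```

For a least-squares line with an intercept, the residuals always sum to zero up to rounding, which is why a well-behaved residual plot is centered on the horizontal axis.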

Looking at the plot gives us some insight into our data.

We expect the residuals to have zero mean,

distributed evenly around the x axis with similar variance.

There is no curvature.

This type of residual plot suggests a linear function is appropriate.

In this residual plot, there is a curvature.

The values of the error change with x.

For example, in this region,

all the residual errors are positive.

In this area, the residuals are negative.

In the final location,

the error is large.

The residuals are not randomly spread out.

This suggests the linear assumption is incorrect.

This plot suggests a nonlinear function.

We will deal with this in the next section.
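
The curved pattern is easy to reproduce. The sketch below fits a straight line to made-up data that actually follows a curve (y equal to x squared) and prints the residuals in order of x. The data and the use of numpy.polyfit are assumptions for illustration, not something shown in the video.

```python
import numpy as np

# Made-up data that follows a curve (y = x squared), not a straight line.
x = np.arange(7, dtype=float)
y = x ** 2

# Fit a straight line anyway and compute residual = actual - predicted.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# Print the residuals in order of x so the sign pattern is visible.
for xi, ri in zip(x, residuals):
    print(f"x={xi:.0f}  residual={ri:+.1f}")
```

The residuals are positive at both ends of the x range and negative in the middle, so they are not randomly spread around zero. That systematic pattern is the sign that a straight line is the wrong shape for this data.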

In this plot, we see the variance of the residuals increases with x.

Therefore, our model is incorrect.

We can use seaborn to create a residual plot.

First, "import seaborn."

We use the "residplot" function.

The first parameter is a series containing the independent variable or feature.

The second parameter is a series containing the dependent variable or target.

We see in this case, the residuals have a curvature.
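
A minimal sketch of the residplot call might look like the following. The dataframe, its column names (highway_mpg and price), and the values are all made up for illustration. The two series are passed as the keyword arguments x and y, which is what recent seaborn releases expect.

```python
import pandas as pd
import seaborn as sns

# Hypothetical data: the column names and values are invented for this sketch.
df = pd.DataFrame({
    "highway_mpg": [20, 24, 27, 30, 33, 36, 40, 43],
    "price": [34000, 29000, 24500, 20000, 17500, 14000, 11000, 9500],
})

# First series: the feature. Second series: the target.
# residplot fits a line of y on x and plots the residuals against x.
ax = sns.residplot(x=df["highway_mpg"], y=df["price"])

# In a script, import matplotlib.pyplot as plt and call plt.show() to display it.
```

seaborn also draws a dotted horizontal line at zero, which makes it easier to judge whether the points are spread evenly above and below it.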

A distribution plot counts the predicted value versus the actual value.

These plots are extremely useful for visualizing

models with more than one independent variable or feature.

Let's look at a simplified example.

We examine the vertical axis.

We then count and plot the number of

predicted points that are approximately equal to one.

We then count and plot the number of

predicted points that are approximately equal to two.

We repeat the process

for predicted points that are approximately equal to three.

Then we repeat the process for the target values.

In this case, all the target values are approximately equal to two.
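
The counting step in this simplified example can be sketched in a few lines. The predicted and target values below are invented, and rounding to the nearest integer is just one assumed way of deciding that a value is "approximately equal to" one, two, or three.

```python
from collections import Counter

# Made-up values for the simplified example.
predicted = [0.9, 1.1, 2.0, 2.1, 1.9, 3.05]
target = [2.0, 1.95, 2.1, 2.0, 2.05, 1.9]

# Treat "approximately equal to k" as "rounds to k", then count per k.
predicted_counts = Counter(round(v) for v in predicted)
target_counts = Counter(round(v) for v in target)

print("predicted:", dict(sorted(predicted_counts.items())))
print("target:   ", dict(sorted(target_counts.items())))
```

Here the predictions are spread over one, two, and three, while every target value lands on two, so the two sets of counts have clearly different shapes.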

The values of the targets and predicted values are continuous.

A histogram is for discrete values.

Therefore, seaborn will convert them to a distribution.

The vertical axis is scaled to make the area under the distribution equal to one.

This is an example of using a distribution plot.

The dependent variable or target is price.

The fitted values that result from the model are in blue.

The actual values are in red.

We see the predicted values for prices in the range from 40,000 to 50,000 are inaccurate.

The prices in the region from 10,000 to 20,000 are much closer to the target value.

In this example, we use multiple features or independent variables.

Comparing it to the plot on the last slide,

we see predicted values are much closer to the target values.

Here's the code to create a distribution plot.

The actual values are used as a parameter.

We want a distribution instead of a histogram,

so we set the hist parameter to False.

The color is red. The label is also included.

The predicted values are included for the second plot.

The rest of the parameters are set accordingly.
