The residual analysis is a vital part of any statistical method.

In the previous two videos, you learned when and how to perform an ANOVA analysis.

The p-value we use in the main analysis is only

valid if the assumptions are satisfied.

In this video,

you will learn how to validate these assumptions using a residual analysis.

Remember that we were wondering whether the moisture content in coffee beans

differs between the four machines they can be produced on.

Moisture was our numerical y variable, and machine was our categorical x variable.

We performed an ANOVA analysis and these were the results we obtained.

Our ANOVA analysis gave us a p-value, which shows a statistically significant

difference between the average moisture percentages of the machines,

because the p-value is below 0.05.

This difference in the means can also be seen in the individual value plot,

as the line connecting the means is not horizontal.

Let's take a look at the R squared.

This shows that the influence factor machine explains 26% of

the variation in the moisture percentage.
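
As a rough sketch of where a number like this comes from: R squared is the share of the total variation in y that is captured by the group means. The Python below is only an illustration. The moisture values are invented and are not the course data, and the function name r_squared is our own choice.

```python
from statistics import mean

def r_squared(groups):
    """Share of the total variation explained by group membership.

    groups maps a group label to the list of measurements in that group.
    """
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = mean(all_values)
    # Total variation: squared distance of every point from the grand mean.
    ss_total = sum((y - grand_mean) ** 2 for y in all_values)
    # Leftover variation: squared distance of every point from its own group mean.
    ss_error = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    return 1 - ss_error / ss_total

# Invented values, for illustration only (not the course data set).
example = {
    "Machine 1": [10.0, 11.0, 12.0],
    "Machine 2": [12.0, 13.0, 14.0],
}
print(round(r_squared(example), 3))
```

The closer the result is to 1, the more of the variation is explained by the factor; the closer to 0, the more is left over in the residuals.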

However, before we can completely trust these conclusions,

we have to validate the assumptions underlying the ANOVA.

These checks are called the residual analysis, and

this is the final step of your ANOVA.

As you probably remember, ANOVA consists of three steps in total.

To validate the assumptions, we will check if the residuals are normally

distributed and if there are any outliers or other irregularities present.

But what is a residual?

Let's take a look at the data to answer this question.

Every dot in the graph is one measurement.

We also know the value that we would expect from a measurement for Machine 1.

That is the estimated mean.

So there is a difference between the measurement and our expectation.

This difference is not explained by our influence factor machine.

It is leftover variation, and this difference is called the residual.

The residuals are calculated by subtracting the expected value from

each observation.

In the case of ANOVA,

this expected value is the mean output of the relevant machine.

This is our data in time order.

Our categorical variable has four different groups and

the red lines are the group means.

Then the residuals will look like this,

with the mean of the residuals equal to zero by construction.
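
The calculation just described can be sketched in a few lines of Python. This is a minimal illustration, not Minitab's implementation. The data values are invented, and the function name residuals is our own.

```python
from statistics import mean

def residuals(values, labels):
    """Return each observation minus the mean of its own group."""
    group_values = {}
    for y, g in zip(values, labels):
        group_values.setdefault(g, []).append(y)
    # The expected value for an observation is the mean of its group.
    group_means = {g: mean(ys) for g, ys in group_values.items()}
    return [y - group_means[g] for y, g in zip(values, labels)]

# Invented stacked data, for illustration only (not the course data set).
moisture = [10.0, 11.0, 12.0, 12.0, 13.0, 14.0]
machine = [1, 1, 1, 2, 2, 2]
res = residuals(moisture, machine)
print(res)
print(sum(res))  # the residuals sum to zero within every group, so also overall
```

Because each group mean is subtracted from the observations of that same group, the residuals add up to zero within every group, which is why their overall mean is zero by construction.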

Okay, let's go back to our moisture example and

let's perform a residual analysis with Minitab.

Now, pause the video and load your data into Minitab before continuing.

Once you have loaded your data into Minitab,

this is what your data file will look like.

You have Machine 1 in the first column, Machine 2, Machine 3, and Machine 4.

Note that I already stacked my data into two columns, Moisture and Machine.
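
If you want to picture what stacking does, here is a small Python sketch. It assumes the unstacked data is a dictionary with one list per machine; the values are invented and the function name stack is our own.

```python
def stack(wide):
    """Turn {column name: values} into two parallel lists: values and labels."""
    values, labels = [], []
    for name, column in wide.items():
        for y in column:
            values.append(y)     # goes into the Moisture column
            labels.append(name)  # goes into the Machine column
    return values, labels

# Invented values, for illustration only (not the course data set).
wide = {"Machine 1": [10.0, 11.0], "Machine 2": [12.0, 13.0]}
moisture, machine = stack(wide)
for y, g in zip(moisture, machine):
    print(y, g)
```

Every measurement ends up on its own row, with the machine it came from next to it.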

Okay, let's look at our residual analysis.

We can find this in our ANOVA menu,

which was under Stat > ANOVA > One-Way.

Well, maybe you still have it there, but otherwise, fill in your response,

which is moisture, and your factor, which is machine.

Your residual analysis can be found under the Graphs option,

and then halfway down, it asks you for residual plots.

If you click on Four in one, you get all plots at once.

Furthermore, you can also uncheck the interval plot, because we don't need it.

Well, that's it.

OK > OK, and then this is your four-in-one plot.

Let's study the four-in-one plot.

Remember that we needed to check two things in the residual analysis.

Let's start with the normality assumption.

This can be checked in the probability plot.

Are your residuals normally distributed?

Yes, they are.
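
To get a feeling for what a probability plot shows, here is a rough Python sketch. It pairs each sorted residual with the value a normal distribution would predict at that rank; if the residuals are roughly normal, those pairs lie close to a straight line. This is an assumption-laden illustration, not what Minitab computes: the plotting positions are one common convention, the residual values are invented, and the function names are our own.

```python
from statistics import NormalDist, mean

def probability_plot_points(residuals):
    """Pair each sorted residual with the normal quantile expected at its rank."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()
    # Plotting positions (i - 0.375) / (n + 0.25) are one common convention.
    theoretical = [
        std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)
    ]
    return list(zip(theoretical, ordered))

def straightness(points):
    """Correlation between the two axes; close to 1 means a nearly straight line."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Invented residuals, for illustration only (not the course data set).
res = [-1.2, -0.7, -0.3, -0.1, 0.0, 0.2, 0.4, 0.6, 1.1]
points = probability_plot_points(res)
print(round(straightness(points), 3))
```

A correlation close to 1 agrees with the visual impression of points following the line. It is a rough aid, not a replacement for looking at the plot.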

Now, let's have a look at the second assumption: that there are no outliers or

irregularities in the residuals.

To check this assumption, we take a look at the four-in-one plot again.

But now, we look at the line graph.

We see that there are no outliers or strange patterns present.

This means that this assumption is also satisfied, and

that the original analysis is valid.
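
If you want a numerical aid next to the line graph, one simple option is to scale each residual by the pooled standard deviation and flag the large ones. The Python below is a sketch under assumptions: the threshold of 2 is a rule of thumb we chose for the example, the residual values are invented, and the function name flag_outliers is our own. It only looks for outliers, not for patterns over time.

```python
def flag_outliers(residuals, n_groups, threshold=2.0):
    """Return (index, scaled residual) for residuals that are unusually large.

    Residuals are scaled by the pooled standard deviation sqrt(SSE / (N - k)),
    where N is the number of observations and k the number of groups.
    The threshold is an assumed rule of thumb.
    """
    n = len(residuals)
    sse = sum(r ** 2 for r in residuals)
    pooled_sd = (sse / (n - n_groups)) ** 0.5
    return [
        (i, r / pooled_sd)
        for i, r in enumerate(residuals)
        if abs(r / pooled_sd) > threshold
    ]

# Invented residuals, for illustration only; the last one is deliberately extreme.
res = [0.1, -0.2, 0.15, -0.1, 0.05, -0.05, 0.2, -0.15, 0.1, -0.1, 0.05, 2.0]
print(flag_outliers(res, n_groups=4))
```

Note that a large outlier also inflates the pooled standard deviation, so a flag like this can miss points that stand out clearly in the graph. Always look at the plot as well.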

Let's have a look at another example, and assume that these are our residuals.

We see in the probability plot that the residuals are not normally distributed.

And in the line plot, we see outliers in the residuals.

This means that if these were your residuals,

the assumptions of the ANOVA would be violated.

This implies that the conclusions in step two would not have been valid,

or at least not very precise.

If this is the case, you can perform a Kruskal-Wallis analysis.
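
The Kruskal-Wallis analysis compares the groups using the ranks of the observations instead of the raw values, so it does not rely on normally distributed residuals. As a hedged sketch, the Python below computes the H statistic from the textbook formula, without the correction factor for ties. The data values are invented, and the function name kruskal_wallis_h is our own.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (without the correction factor for ties).

    groups is a list of lists, one list of measurements per group.
    """
    pooled = sorted(y for ys in groups for y in ys)
    n = len(pooled)
    # Give each distinct value the average of the ranks it occupies,
    # so tied values share the same rank.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    # Sum over groups of (rank sum squared) / (group size).
    total = sum(sum(rank_of[y] for y in ys) ** 2 / len(ys) for ys in groups)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

# Invented values, for illustration only (not the course data set).
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 3))
```

A large H means the rank sums differ more between the groups than chance alone would suggest. To turn H into a p-value, it is usually compared with a chi-square distribution with the number of groups minus one degrees of freedom, which statistical software does for you.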

In summary, in this series of videos I have explained that the ANOVA is

a technique to test whether a categorical influence factor X has

a significant effect on a numerical Y.

After organizing your data in the first step, you run the analysis in

the second step and interpret the p-value for significance,

and the R squared for importance.

In the third step, you will validate your conclusions by checking whether

the residuals are normally distributed

and whether they contain no outliers or other strange patterns.
