1:30

made discrete in the sense that we're talking about 300 degrees, 350 degrees, 400 degrees, 450 degrees. You can't have a continuous measurement there; you're going to have it in discrete buckets, in discrete numbers. Because at the end of the day, for analysis of variance, you're treating those as four different groups, four different levels of a factor.

So, in terms of when do we use this: we use it to test differences between the mean values of Y across multiple levels of a factor X.

The conditions, more formally stated, are that the X should be in discrete groups and the Y should be continuous. So the dependent variable has to be continuous, and the independent variables have to be discrete.

In addition to the example of four levels of temperature, you could also be looking at, for example, three suppliers of pizza dough: three suppliers of raw material, and what kind of output difference do you get based on which of the three suppliers you use? Does the pizza take more time to bake based on whose materials you are using? That could be a question you address using ANOVA.

Now, some assumptions underlying the use of the analysis of variance model. The first is that the samples are randomly and independently drawn from their populations. So when we're talking about a one-way ANOVA with four levels of a factor, four different groups, you are picking samples from four different populations, to use statistics terms. Those samples should be independently and randomly picked from each of those groups. There should be no relationship in how you pick the samples from one group versus another; they should be randomly picked, or randomly assigned.

If you're thinking about random assignment into a particular group, say you're doing an experiment where you use a certain temperature to measure its effect on the defect rate. Then you randomly assign units, or raw material, or whatever you're testing, to each of the four levels of temperature at which you'll be testing things. So the random assignment becomes important.

Now, why does it become important? It's important because you're looking at

4:10

one particular cause for the effect, and you want to be able to rule out any other causes. The way you rule out other causes for that effect is random assignment. What you are hoping for, what you are aiming for, with random assignment is that all those other factors get equally, or at least similarly, represented at each of the four levels of the one factor you're measuring.

So if I'm looking at three different temperature levels, but I'm also using materials from different suppliers, different employees to do the actual process, and different lighting conditions, all of those get randomly, and hopefully equally, distributed across the three levels of temperature that I'm actually measuring, that I'm actually focused on. That's why random assignment, or random selection from each group, becomes important.

The second assumption is more of a technical assumption, because ANOVA is robust to its violation. We are assuming that when you're measuring three different samples from three different populations, all of those populations are normally distributed. Now, as it says on the slide, ANOVA is robust to violations of this assumption, so if the populations are not normally distributed, it's okay for you to carry on.

The third assumption is that the population variances are equal. You may not have equal variance in each of those populations, but to keep that from affecting your results, what you can do is make your sample sizes as equal as possible in each of the groups. Sometimes that's difficult to do. It's hard to get equal sample sizes in four different groups, especially if you're dependent on something you're measuring from customers, from the market; then it's hard for you to assign things. You're simply measuring from each group and you say, this is what I ended up getting. I didn't end up getting enough at each level of income: I got 300 people from the lowest level, 250 from the second one, and 400 from the third one. Those are not equal sample sizes. But if you can get equal sample sizes, then you are okay with violating the assumption of equal population variances.

All right, so these are the technical assumptions for us to use analysis of variance.

When would you use analysis of variance? We talked about the technicalities: the Y variable should be continuous and the X variable should be categorical. The effect should be continuous; the causes should be something you can put in categories.

So, some examples. Gas mileage: a continuous variable, affected by four formulations of motor oil. Formulation one, two, three, and four, right? You have nominal categories there. You're not even saying one is higher than another, anything like that. You're simply saying a, b, c, and d, or 1, 2, 3, and 4, or p, q, r, and s, or whatever you want to call them. You can put them in any order you want; you're simply looking at the effect on gas mileage of four formulations of motor oil.

9:22

effect is coming from these causes is sample size. If you can get a larger sample size, a large random sample for each of the areas, that would be the most beneficial.

All right, so let's get into doing this kind of analysis based on an example. The example, which you also have available on your website, is about training for animation. Julia Sabatini wonders whether the time required to complete a common animation task differs by four types of employee training.

There are four types of employee training, and the data, also available on this slide, is right here. We're simply calling the four types A, B, C, and D; which one got called A, B, C, or D doesn't matter. People were assigned to each of these four types of training, and what you have as data are the number of hours they take for a common task after they were trained using method A, B, C, or D. These are different people, six people assigned to each of these types.

And again, the idea of random assignment is that you should not be able to figure out any kind of pattern when you look at the people assigned to B versus the people assigned to C versus the people assigned to A and D. The assignment should be completely random. That's what should be happening in terms of assigning people to these groups.

That's the data we have, and our task is to figure out if the type of training has an effect on the amount of time people take to complete a common animation task after they get the training.

How would you test this with ANOVA? Let's take a look at some basic, and actually very intuitive, things that are at work when you're thinking about ANOVA and what it's trying to do.

What is the null hypothesis when you're doing any kind of hypothesis test using a one-way ANOVA? The null hypothesis is that when you have multiple groups, each of those groups has the same population mean. The null hypothesis, H0, is that mu1 equals mu2 equals mu3, and you can have as many groups as you wish in an analysis of variance. There's no treatment effect if these means are actually equal: it means that whatever cause you're studying has no effect on this Y. Whatever X you are studying has no effect on the dependent variable Y. There's no treatment effect if all these means turn out to be equal.

Your alternative hypothesis is that not all of them are equal. Here's a thing to note about the alternative hypothesis for analysis of variance: it is simply saying that at least one of these means is different from the others. It's not saying that all of them are different from each other; it's enough for one of them to be different from the others for the null hypothesis to be rejected. In fact, you should make a note of the fact that it's not even saying which one is different. It's simply saying that one of them is going to be different. That's the standard alternative hypothesis for a one-way analysis of variance.

Now let's translate this hypothesis to our particular example. Remember, the example is animation times and how they are affected by training. Our null hypothesis is that training has no effect, and the animation times are going to be equal for the four types of training that we have: A, B, C, and D. The alternative hypothesis is that at least one of them is different.

Now, how does an ANOVA work, in terms of figuring out whether there's a difference or not? What it does is take the total variation in the data. What you notice is that when you look at the data for training type A, the time taken by the six people was not exactly the same: there was variation within category A. There was also variation within categories B, C, and D. So the way ANOVA works is that it takes the total variation and divides it up between variation that is because of the treatment and variation that is because of random error.

What do we mean by variation that's because of random error? It's saying that the variation that exists within a group is random error: there might be different things explaining it, but it's not because of being in that group. It's not because somebody is getting trained using method A, or B, or C, or D; it's whatever remains after being assigned to A, B, C, or D. On the other hand, on the left side, you have the variation that's happening because of the treatment effect. Here we're talking about the difference between the groups.

These go by different names: the variation due to the treatment effect is called the sum of squares treatment, or sum of squares between groups. The variation due to random error is the sum of squares error, sum of squares within, or within-groups variation. The concept for measuring these variations is that we're going to take the data and figure out what the variation is within each group and then between the groups.
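This split of total variation into a between-groups piece and a within-groups piece can be sketched in a few lines of Python. The four groups of numbers below are made up for illustration; they are not the actual animation data from the example:

```python
# Hypothetical data: four levels of one factor, six observations each.
groups = [
    [8, 9, 7, 10, 7, 9],
    [6, 7, 6, 5, 7, 7],
    [9, 8, 10, 9, 8, 10],
    [11, 10, 9, 12, 10, 11],
]

all_values = [x for g in groups for x in g]
grand_mean = sum(all_values) / len(all_values)

# Total variation: squared deviations of every value from the grand mean.
sst = sum((x - grand_mean) ** 2 for x in all_values)

# Between-groups variation: squared deviations of each group mean
# from the grand mean, weighted by group size.
ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

# Within-groups variation: squared deviations of each value
# from its own group's mean.
ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

print(sst, ssb, ssw)
```

The identity ANOVA relies on, SST = SSB + SSW, holds exactly for any data set, which you can check by comparing the printed numbers.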

We're not going to get into the actual calculations, but how does this end up looking when we see ANOVA results? When we look at ANOVA results, you basically see this One-Way ANOVA Summary Table. What is this table telling us? It's telling us the two sources of variation, in the first and second rows of the table.

The first row is the between-treatments variation, and you see something called degrees of freedom there. The degrees of freedom here are k minus 1, where k stands for the number of levels. What was the number of levels we had? We had four levels of training, so k minus 1 is four minus one. Degrees of freedom are roughly saying how many values are free to vary once the other values are known. Without getting into the technicalities of how that's used in statistics, we simply need to know that it's going to be k minus 1, the number of levels minus 1, so in our case three. The sum of squares in this row is the sum of squares between treatments: the differences between A, B, C, and D.

Before we move on to the mean square column and the F column, let's go down to the second row, the within source of variation. There you see the degrees of freedom being n minus k, n being the sample size. We had a sample size of 24, so 24 minus 4 gives us degrees of freedom of 20.

Why are degrees of freedom important from an interpretation perspective? You'll see that we have a decision rule for deciding whether we retain or reject the null hypothesis. (Technically, we shouldn't say "accept the null hypothesis"; we should say reject or retain the null hypothesis.) When we have to make that decision, these degrees of freedom, the k minus 1 and the n minus k, will matter. That's what we'll be using in order to make that decision.

Anyway, in the second row, within error, we have n minus k as the degrees of freedom, and the sum of squares is the sum of squares error, or sum of squares within. That is the variation within each of the A, B, C, and D groups.

Now, let's move on to the second-to-last column, which is the mean square. The mean square takes the sum of squares and adjusts it by the degrees of freedom. The SSB, the sum of squares between treatments, divided by k minus 1 gives you the mean square between, and similarly you get the mean square within by taking the SS within and dividing it by n minus k.

Finally, you get to what is going to be important as our decision rule for deciding whether we will reject or retain the null hypothesis, and that is the F statistic. The F statistic is a statistic that you calculate, and it follows the F distribution. We'll see what the F distribution is in a minute, but the F statistic is calculated as MSB divided by MSW. And the intuition behind using this F statistic is that if there is a treatment effect, this F ratio should have a high value.

18:45

If you think about what we're saying there, we're saying the mean square between should be higher than the mean square within by a substantial amount. That's how we would get a high ratio of mean square between to mean square within. That ratio would tell us that the variation between treatments is much higher than the variation we find within each of the groups. That's what it would be telling us, intuitively speaking. So higher F ratios would lead us to say, yes, there is a difference between these groups: there is a difference in Y based on the levels of X at which we have these four different groups, in our case, four different types of training.
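Putting the pieces of the summary table together, here is a minimal sketch of how the degrees of freedom, mean squares, and F ratio come out of the sums of squares. The numbers are again made up for illustration, not the course data (though, like the example, they are four groups of six):

```python
# Hypothetical data: four levels of one factor, six observations each.
groups = [
    [8, 9, 7, 10, 7, 9],
    [6, 7, 6, 5, 7, 7],
    [9, 8, 10, 9, 8, 10],
    [11, 10, 9, 12, 10, 11],
]

k = len(groups)                  # number of levels of the factor
n = sum(len(g) for g in groups)  # total sample size

all_values = [x for g in groups for x in g]
grand_mean = sum(all_values) / len(all_values)

ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

df_between = k - 1   # numerator degrees of freedom (3 here)
df_within = n - k    # denominator degrees of freedom (20 here)

msb = ssb / df_between  # mean square between
msw = ssw / df_within   # mean square within

f_observed = msb / msw  # large values suggest a treatment effect

print(df_between, df_within, round(f_observed, 2))
```

The two degrees-of-freedom values printed here, 3 and 20, match the example in the lecture because this sketch also uses four groups of six.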

Now, even before we collect the data and do the statistical test, we can set up a decision rule for our analysis of variance test. We know we're going to use the F distribution for analysis of variance, and the F distribution has only an upper tail: unlike the normal distribution, it doesn't go negative. That should make sense, because it's a ratio, and it doesn't make sense for this ratio to be negative. Going back, we have the sum of squares between, which is a squared value, and the sum of squares within, which is a squared value. Both are positive, so the ratio is going to be positive. That's why it's an upper-tail test.

But coming back to the decision rule: we have to set an alpha value even before we collect data and go out and do the test. Once we've set the alpha value, we can set up a rejection region. The rejection region is based on the alpha value: if it's 0.05, or 5%, that becomes our rejection region, and based on that rejection region we can come up with an F critical value. This is where the degrees of freedom we looked at earlier come into play. The F critical value comes from knowing the alpha value and knowing the degrees of freedom. The numerator degrees of freedom are k minus 1, the number of levels of the factor minus one. The denominator degrees of freedom are n minus k, the sample size minus the number of levels of that factor. That's what we're going to use.

Now, before we had Excel and other software to do this job for us, we had to go to the F distribution and use a table for it. The tables differed based on the numerator and denominator degrees of freedom. So based on the numerator and denominator degrees of freedom, you looked at a table and said: given my alpha value of 0.05, or 0.10, or whatever I have decided, I get an F value. What does that F value give me? It gives me an anchor, a number.

In our case, we had numerator degrees of freedom of 3, denominator degrees of freedom of 20, and an alpha value of 0.05. That gives a critical value of 3.10, which is saying: if I get an F ratio based on my data that is greater than 3.10, I will reject the null hypothesis. So, as you see, before collecting the data I am setting up the rejection rule: if I get an F observed, which is what we'll call the value we get from our data, that is greater than 3.10, I will reject the null hypothesis.

What you can see on the screen here is that there is an Excel formula you can use to get the exact F critical value. We'll use Excel to do these calculations, so you need not use the formula to come up with the F critical, but if you were interested, you would be able to come up with it this way.
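If you don't have Excel handy, the same critical value can be looked up from the F distribution in other software. As a sketch, assuming SciPy is installed, its F distribution object gives the value for our alpha and degrees of freedom:

```python
from scipy.stats import f  # assumes SciPy is available

alpha = 0.05
df_num = 3   # k - 1: four training types minus one
df_den = 20  # n - k: 24 observations minus four groups

# The critical value cuts off probability alpha in the upper tail.
f_critical = f.ppf(1 - alpha, df_num, df_den)
print(f"{f_critical:.2f}")
```

The printed value, about 3.10, matches the table value quoted in the lecture.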

So now, let's go ahead and use Excel and see what we can do in terms of the analysis for the data that we have over here.

23:28

So here we have the data for the training-for-animation problem, and you can see that there are four types of training: A, B, C, and D. What you have in each of those columns are the animation times for each of the four types of training. These are random samples assigned to each type of training, and then their times were measured: how much time they took for the animation task.

First, let's check that we have the Excel add-in for solving this problem. We go to the options in Excel, and we are looking for the Analysis ToolPak add-in. We make sure that's added in; if it isn't, you need to add it by clicking the Go button and then hitting OK. That should add it in.

Then we can move on to do the analysis of variance for this problem. Let's click on Data, and then you see Data Analysis: this is the add-in that was activated when you turned on that option. We hit Data Analysis, and the first option on this menu should be, and here we see it, ANOVA: Single Factor, which is what we're doing. It's ANOVA single factor because there's one independent variable with multiple levels. So we hit OK there.

That brings us to the Input Range menu. Here you have to pay attention to two things: you can have the data grouped in columns or in rows. We have it in four different columns, so we're going to leave the default, which is columns. We have labels in the first row, and you see a checkbox for that, which we're going to click when we get to it. But let's set the input range, simply highlighting the data for A, B, C, and D.

26:21

Now that you've seen how we used Excel to do this one-way analysis of variance, you can see the results here, replicated from what you saw in Excel. Pointed out here is that the P value is .0005, call it .001, and that's less than .05. What you can also see in this output are the different averages for each of the groups.

But coming back to the first thing we want to do, testing the null hypothesis: our observed P value of .001 is less than the alpha value of .05 that we had set, so we reject the null hypothesis that the animation times are equal for the four types of training. We say that at least one of these times is going to be different from another. So the task times are not the same for the different training regimes.

What you can also see is the F observed value: in the analysis of variance table, the column that says F has a value of 9.00335, call it 9.0, and that F observed is what we compare with the F critical to make our decision. The F critical, which you see in the last column, is 3.09.

So there are two ways we can come to the decision to reject the null hypothesis, and they will always give us the same result. One is to say that the F critical is 3.09 and the F observed happens to be much greater than that, so we can reject the null hypothesis. The second way of saying it is that the P value of .001 is less than the alpha value of .05, and therefore we reject the null hypothesis.
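Outside Excel, the whole one-way test can also be run in a single call; for example, SciPy's f_oneway, assuming SciPy is installed. The four lists below are illustrative stand-ins, not the actual animation times from the course data:

```python
from scipy.stats import f_oneway  # assumes SciPy is available

# Hypothetical animation times (hours) for four training types,
# six people in each group.
a = [8, 9, 7, 10, 7, 9]
b = [6, 7, 6, 5, 7, 7]
c = [9, 8, 10, 9, 8, 10]
d = [11, 10, 9, 12, 10, 11]

f_observed, p_value = f_oneway(a, b, c, d)

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: at least one mean differs.")
else:
    print("Retain the null hypothesis.")
```

Note that both decision routes from the lecture are available here: compare `p_value` to alpha, or compare `f_observed` to the critical value for the same degrees of freedom.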

Now, from a practical point of view, you'd be saying: well, this told us that one of them is different, but which one is better? Which one gives us lower animation times, completing the task in less time? We wouldn't be able to say, from a statistical perspective, which one is actually better, but you can at least go and look at the means of each group. For A, B, C, and D, the means are 8.5, 6.5, 8.83, and 10.33. So what can you say from there? That B has the lowest average time taken by people to complete the animation task: people trained using method B were the quickest. And we can also say that people trained using method D took the longest to complete the animation task. That's something we can also say.

29:24

So, once more, let's put the decision rule into picture form and see how the same rule we saw from the table applies. What we're saying here is that the F critical is 3.10 and the F observed is 9.0. If you remember, we had set up the rejection region to be to the right of 3.10, and here we can see that 9.0 is way out there, so we will reject the null hypothesis. And we say the training method has a significant effect.

Now again, you'll be saying: well, what's the practical value without pointing out which one is different? There is something we can do once we get this result. There's something called a post hoc analysis that you can do after you get a significant result in ANOVA. It only makes sense to do a post hoc analysis if you got a significant result in ANOVA. ANOVA is telling you at least one of them is different; the post hoc analysis will tell you which pair is different: A versus B, A versus C, A versus D, B versus C, B versus D, or C versus D. You have these pairs, and you want to find out which of them differ.

Here is one example of a post hoc test. There are multiple post hoc tests, and without getting into the specifics of each one, some have advantages for certain types of data and some for others. So there are advantages and disadvantages to the various tests, and depending on which software you use, say MINITAB, SPSS, SAS, or R, you might get different post hoc tests and different results from them. But the idea is the same: you're going to test the specific hypothesis of each mean being equal to another, mu1 equal to mu2, or mu1 equal to mu3. That would be the null hypothesis, and if you reject that null hypothesis, you would say that one mean is indeed different from the other. That's what you'd be able to say, but only based on a post hoc test.
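As a concrete illustration of the idea (this is not the specific post hoc test on the slide), the simplest flavor of pairwise comparison runs a t-test on every pair of groups and tightens the alpha level with a Bonferroni adjustment to account for doing six tests at once. The data are again made up:

```python
from itertools import combinations
from scipy.stats import ttest_ind  # assumes SciPy is available

# Hypothetical groups, labeled like the lecture example.
samples = {
    "A": [8, 9, 7, 10, 7, 9],
    "B": [6, 7, 6, 5, 7, 7],
    "C": [9, 8, 10, 9, 8, 10],
    "D": [11, 10, 9, 12, 10, 11],
}

pairs = list(combinations(samples, 2))  # A-B, A-C, A-D, B-C, B-D, C-D
alpha = 0.05 / len(pairs)               # Bonferroni-adjusted threshold

for g1, g2 in pairs:
    t, p = ttest_ind(samples[g1], samples[g2])
    verdict = "different" if p < alpha else "not distinguishable"
    print(f"{g1} vs {g2}: p = {p:.4f} -> {verdict}")
```

Dedicated post hoc procedures such as Tukey's HSD control the overall error rate less conservatively than Bonferroni; this sketch just shows the pairwise-comparison idea.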

31:29

All right, so we've looked at a one-way ANOVA. We looked at a very simple example of using one particular X and looking at its effect on Y. But of course, you can have a situation with more than one X affecting Y; that's called a two-way ANOVA, and there you can look at X1 and X2 as having an effect on Y. The same rules apply in terms of the types of data: Y has to be continuous; X1 and X2 have to be discrete.

So, for example, you could be looking at two different types of pizza dough, and two different types of training for people, and their joint effect on the time it takes for the pizza to get ready. Something like that is possible using a two-way ANOVA. And there are advantages: because you're putting the two X's together, you can also look at the interaction between the two effects. Of course, you could look at each of them separately using one-way ANOVAs, but the advantage of a two-way ANOVA is that you can answer the questions: does the Y value depend on X1, does the Y value depend on X2, does the effect of X1 on Y depend on the value of X2, and does the effect of X2 on Y depend on the value of X1? You can answer those kinds of questions using a two-way ANOVA.

The other point I'd like to make, in closing, about this topic of analysis of variance is that depending on which software you use, you might have to structure the data differently. You might have to give all of the Y values in one column, with the column next to it repeatedly stating the levels of X. So X equals 1, many times over, alongside the Y values for X equals 1, and then X equals 2 many times, and so on, all stacked in a single column. Or you could have it in four columns, like we had when we used Excel: we had it in four different columns, and we did the calculation with the data structured that way. So that's something you have to keep in mind.
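The two layouts described here, four side-by-side columns versus one stacked Y column with a label column next to it, can be converted with a few lines of Python. The column names and values below are illustrative:

```python
# Wide layout: one column per level of the factor, as in the Excel example.
wide = {
    "A": [8, 9, 7],
    "B": [6, 7, 6],
    "C": [9, 8, 10],
    "D": [11, 10, 9],
}

# Long layout: one row per observation, with the factor level repeated
# alongside each Y value.
long_rows = [(level, y) for level, ys in wide.items() for y in ys]

for level, y in long_rows:
    print(level, y)
```

Software like R or MINITAB typically expects the long layout, while Excel's ANOVA: Single Factor works on the wide layout, which is why this reshaping step comes up in practice.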

The final thing I'd like to say about analysis of variance is: remember that you do not have to have an equal sample size. If you have four columns of data and the four columns are unequal, if there are four different levels of the factor and you don't have equal sample sizes, that's fine. You can carry on with using analysis of variance, and most software, including Excel, will be able to handle that for you in terms of giving you the analysis results.

So, that's ANOVA for you. Next, we'll look at regression.
