0:00

[SOUND]

[MUSIC]

So next, let's move on to what we start doing with data. What you have in front of you is an example; this is data that I just made up. It simply looks at time for service at a restaurant. If I were to collect data on this, what am I going to get? I'm going to get different times for different people. Without getting into the reasons why the times differ, you can see that in this case they range from 10 minutes to above 20 minutes; the last category on this chart is "above 20 minutes." What I've done here is take roughly 105 observations, the kind I imagine I would have collected if I were standing at the restaurant recording the time each customer took, and look at the frequencies.

So what is it saying? The first bar tells me that on 3 of those 105 occasions, the time taken was between 10 and 10.9 minutes. We can keep going from there all the way to the end, where 3 of the parties, 3 of the customers, took more than 20 minutes. So the chart gives you the range of values that occurred in this particular instance, here from 10 to above 20, and it also gives me frequencies. What do we mean by frequencies? They simply say how often each value occurred in my data.

You can see that this creates what we call in statistics a data distribution. A data distribution is nothing more than taking data, getting the frequencies, and then drawing a picture of them, a bar chart, and asking how it looks. Then you start looking at the shape of this distribution, and you can say something about that shape. You may recognize this one as looking somewhat like a bell curve, which you may already be familiar with and which we are going to look at next.
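As a minimal sketch of how a frequency chart like this could be built, the Python below bins service times into one-minute categories with a final "above 20" bucket. The times are generated at random purely for illustration (the lecture's own made-up data is not reproduced here), and the names `service_times` and `bin_label`, as well as the assumed mean of 15 minutes and spread of 2 minutes, are hypothetical.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the made-up data is reproducible

# Hypothetical data: 105 service times in minutes (assumed mean 15, sd 2),
# clipped at 10 so every value falls into one of the chart's categories.
service_times = [max(10.0, random.gauss(15, 2)) for _ in range(105)]

def bin_label(minutes):
    """Map a time to a one-minute category; 20 and up share one bucket."""
    if minutes >= 20:
        return "above 20"
    low = int(minutes)              # e.g. 10.7 -> 10
    return f"{low}-{low}.9"

# Frequency = how often each category occurs in the data.
frequencies = Counter(bin_label(t) for t in service_times)

# Print a text bar chart, one row per category in ascending order.
labels = [f"{m}-{m}.9" for m in range(10, 20)] + ["above 20"]
for label in labels:
    count = frequencies[label]
    print(f"{label:>8} | {'#' * count} ({count})")
```

Reading the rows top to bottom gives the same information as the bars on the slide: the range of values and how often each one occurred.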

2:28

So here's the normal distribution.

This is a distribution that is very commonly used. We hear about it all the time, because we like to convert things to the normal distribution as much as possible. The reason is that it gives us the power of being able to use z-scores, and we'll talk about z-scores and what they mean in a minute. Basically, it has properties that we can use to make inferences about populations based on the samples we've collected. That's why it's so popular.

So what is the normal distribution? It is a distribution where, if you were to actually collect data, the frequencies would take this shape: they would be bell-shaped. There are two parameters to it, meaning two things in this distribution that matter. One is the mean and one is the standard deviation. The mean is indicated by the Greek letter mu (μ), and the standard deviation is indicated by sigma (σ). This is where the idea of Six Sigma comes from: sigma stands for the standard deviation of the population. When we have population parameters, we write them with Greek letters. The corresponding statistics that we get from samples are written as x-bar for the mean and sd, or s, for the standard deviation. Those are the two main things that we look at when we look at a normal distribution.
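To make the distinction between population parameters and sample statistics concrete, here is a small sketch using Python's standard `statistics` module. The five values are invented for illustration; `x_bar` and `s` are the sample statistics, which serve as estimates of the population's mu and sigma.

```python
import statistics

# Hypothetical sample of service times in minutes (made-up values).
sample = [12.0, 14.0, 15.0, 16.0, 18.0]

x_bar = statistics.mean(sample)   # sample mean, an estimate of mu
s = statistics.stdev(sample)      # sample standard deviation (n - 1 divisor), an estimate of sigma

print(f"x-bar = {x_bar:.2f}, s = {s:.2f}")
```

Note that `statistics.stdev` divides by n - 1, which is the usual convention for a sample; `statistics.pstdev` would treat the values as the whole population instead.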

Now, a couple of cool things about this normal distribution. One is that it's symmetric: 50% of the data is to the left of the mean and 50% is to the right. That's what we mean when we say that the median and the mean are exactly the same, so those two measures of central tendency coincide. The mean is simply the calculated average. The median is the value where 50% of the values lie below it and 50% lie above it. The third measure of central tendency is the mode, which is the value that has the highest frequency. In the case of the normal distribution, all three are identical: the central value is the mean, it is also the median, and it is also the mode.
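The three measures of central tendency can be checked on a small symmetric data set. This is only a sketch: the values below are made up to be symmetric and bell-like, not drawn from a true normal distribution.

```python
import statistics

# Hypothetical symmetric, bell-like data set (made-up values).
values = [13, 14, 14, 15, 15, 15, 16, 16, 17]

mean = statistics.mean(values)      # calculated average
median = statistics.median(values)  # half the values below, half above
mode = statistics.mode(values)      # value with the highest frequency

print(mean, median, mode)  # 15 15 15
```

Because the data is symmetric with a single peak, all three measures land on the same central value, which is the property the normal distribution has exactly.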

4:57

As for the probabilities of values within this normal distribution, we know that within plus or minus 1 standard deviation of the mean we have about 68% of the values. If you go from the mean 1 standard deviation to the right, that's 34%; if you go from the mean 1 standard deviation to the left, that's another 34%; together they encompass 68%. Similarly, within 2 standard deviations it's about 95%, and within 3 standard deviations it's about 99.7%.
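These percentages can be reproduced from the normal distribution itself. As a minimal sketch, the probability of falling within k standard deviations of the mean equals erf(k / sqrt(2)), which Python's `math` module can evaluate directly; the helper name `prob_within` is hypothetical.

```python
import math

def prob_within(k):
    """P(|X - mu| < k * sigma) for a normal distribution, via the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
```

The results are roughly 0.6827, 0.9545, and 0.9973, which round to the 68%, 95%, and 99.7% quoted above.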

And theoretically speaking, although this is never true in reality, this distribution has an infinite range. If you notice the way the normal distribution curve has been drawn in this picture, it does not touch the x-axis. It gets closer and closer to the x-axis without ever reaching it. The point is that it keeps going out to plus or minus infinity on either side, so theoretically speaking it's an infinite distribution. So those are the characteristics of the normal distribution.

The reason we use this normal distribution a lot is the central limit theorem. So what is the central limit theorem? It's basically saying that if we take random samples from any population, the probability distribution of the sample means becomes approximately normal as the sample size becomes large. And what do we mean by the sample size becoming large? We like to think of the number 30: a sample of size 30 is considered large enough for the central limit theorem to apply. There may be debate about whether that's a good number, and there are other characteristics of the distribution that you need to think about, but generally speaking that's the rule of thumb we use for applying the central limit theorem. We start talking about z values and things like that when we have a sample size of 30 or greater.
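A quick simulation can illustrate the theorem. The sketch below assumes an exponential population (strongly skewed, with mean 1 and standard deviation 1), draws many samples of size 30, and looks at how the sample means behave. The population choice, the number of samples, and all names are assumptions made for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

SAMPLE_SIZE = 30      # the rule-of-thumb sample size from the lecture
NUM_SAMPLES = 2000    # how many samples to draw (an arbitrary choice)

def sample_mean(n):
    """Mean of n draws from an exponential population (mean 1, sd 1)."""
    return statistics.mean([random.expovariate(1.0) for _ in range(n)])

sample_means = [sample_mean(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

center = statistics.mean(sample_means)   # should sit near the population mean, 1
spread = statistics.stdev(sample_means)  # should sit near 1 / sqrt(30)

# Share of sample means within 1 sd of their center; close to 0.68
# if the sample means are roughly normal.
within_one_sd = sum(abs(m - center) < spread for m in sample_means) / NUM_SAMPLES

print(f"center = {center:.3f}, spread = {spread:.3f}, within 1 sd = {within_one_sd:.3f}")
```

Even though the individual observations are far from bell-shaped, the sample means cluster symmetrically around the population mean, with roughly the 68% share within one standard deviation that a normal distribution would give.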

So what do we mean by z values? Here are two characteristics of the standard normal distribution. We talked about the normal distribution earlier, so what's a standard normal distribution? It is one where the mean is 0 and the standard deviation is 1, so it has a fixed mean and a fixed standard deviation. And what do we mean by the z value? We're basically taking any normal distribution and converting it into the standard normal distribution by doing some calculations on it, and we'll see those calculations on the next slide.
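The next slide is not part of this segment, so the following is only a sketch of the standard conversion, z = (x - mu) / sigma, rather than the lecture's own worked calculation. The population mean of 15 minutes and standard deviation of 2 minutes are hypothetical values chosen for illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) mu."""
    return (x - mu) / sigma

mu, sigma = 15.0, 2.0          # hypothetical population mean and sd, in minutes
z = z_score(18.0, mu, sigma)   # an 18-minute service time

print(z)  # 1.5
```

A z value of 1.5 says the observation sits one and a half standard deviations above the mean, whatever the original units were, which is what lets any normal distribution be read against the standard normal one.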
