0:02

There's an important result that arises out of these facts, and the important result is that the variance of the sample mean of a collection of independent and identically distributed random variables is sigma squared over n.

So let's assume that we have a collection X_i, i equals one to n, that are independent and identically distributed, IID, and that the variance of the distribution they're drawn from is sigma squared. Okay?

So let's calculate the variance of X bar. Well, that's just the variance of one over n times the sum of the Xs, right? That's just the sample mean formula: the sum of everything divided by the number of things you added up. The one over n is a constant, so we can pull it out as one over n squared, and we get the variance of the sum. The variance of the sum is the sum of the variances, because the Xs are independent, hence uncorrelated. And then, because they're IID, the variance of each X_i is the same: it's sigma squared. We've added up n of those, so we get n sigma squared, and it works out to be sigma squared over n on the final line here. So, what does this mean?
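Written out, the derivation just described is:

```latex
\begin{align*}
\mathrm{Var}(\bar X)
  &= \mathrm{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
   = \frac{1}{n^2}\,\mathrm{Var}\!\left(\sum_{i=1}^{n} X_i\right) \\
  &= \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(X_i)
     \quad\text{(independent, hence uncorrelated)} \\
  &= \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}.
\end{align*}
```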

It's really quite an interesting fact. What this says is: if I want to know the variance of the distribution of averages of, say, ten random variables from a distribution, I don't actually have to know what that distribution of averages is. All I have to know is the variance of the original distribution that the individual observations are drawn from, and that gives it to me. I just have to divide by n. Right? So if I want the variance, I divide by n. If I want the standard deviation, I take the original standard deviation and divide by square root n.

And why is this important? Because, remember, eventually we'd like to connect all of these ideas, these population-model ideas, to data. If we have a bunch of things that we're willing to model as if they were IID, well, we get multiple draws from the distribution of individual observations. All the X_i's are separate draws from the original distribution, so we can estimate things like sigma squared. But we only get one sample mean. Let's say we have a sample of 100 observations; we only get one sample of 100. So if we calculate the sample mean of all those 100 observations, we have nothing, empirically, to estimate the variance of sample means of 100 variables with; we don't have repeated samples of 100 variables. We only have the one. What this result says is: you don't need that, right? Because all you need is the variance of the original population, divided by n. The variance of the original population is something we can estimate.

And so it's a very nifty result. Let me give you an example of this property that you could try at home to test that this result is true. Recall, in the last lecture, we said the variance of a die roll, which takes the values one to six with equal likelihood, one-sixth for each number, was 2.92. Okay. So what that says is, if you roll a die over and over and over and look at the distribution, you'll get about one-sixth of each number, and the variance of that distribution, so if you were to roll it thousands and thousands of times and take the variance of the thousands of measurements, would be around 2.92. So do that: roll a die a lot of times, take the sample variance of the thousand die rolls, for example, and you'll get about 2.92.

Why is that? Because the sample variance of lots of die rolls estimates the variance of the population of die rolls, which is this uniform distribution on one to six, and its variance is 2.92, so that's what you'll get.

Now here's the question that the calculation on the slide is answering. Suppose, instead of rolling a die over and over again, you rolled ten dice and took their average, and repeated that process over and over again. Right? So now this would no longer be uniform on the numbers one to six. Still, the minimum would be one, right? If you got all ten 1s, the average of ten 1s is one. And the maximum would still be six; the average of ten 6s is still six. So the bounds are one and six, but it would not look like a uniform distribution on the numbers between one and six, because you can get all sorts of different numbers, right? You can get numbers between one and two, two and three, and so on. So it has kind of a funny distribution, the distribution of averages of ten die rolls.

So imagine if you were to do that: roll your ten dice, take the average, and do that over and over again so that you got, say, 10,000 averages of ten die rolls. Right? And you wanted to know the variance of that distribution. Well, it seems kind of like a hard calculation. First you'd have to figure out the distribution of the average of ten die rolls, which seems like a hard distribution. (We'll actually discuss later that it's maybe a little easier to work out than you might've thought.) But this calculation says you don't even have to worry about that. We know that the variance of the distribution of individual die rolls is 2.92, so the variance of the distribution of averages of ten die rolls is 2.92 over ten, so it will be 0.292.
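The lecture suggests running this experiment in R; here is a minimal sketch of the same experiment in Python (using numpy; the seed and sample sizes are my own choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Variance of individual die rolls: should be near 35/12, about 2.92.
rolls = rng.integers(1, 7, size=100_000)
print(rolls.var())  # ≈ 2.92

# Variance of averages of ten die rolls: should be near 2.92 / 10 = 0.292.
averages = rng.integers(1, 7, size=(10_000, 10)).mean(axis=1)
print(averages.var())  # ≈ 0.292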

And so we could run this experiment in R, for example, where we rolled a digital die a thousand times, took the variance of those 1,000 die rolls, and found it's about 2.92. And then we could also run the experiment where we rolled ten dice, took the average, repeated that process over and over again to get 10,000 averages of ten die rolls, and found that the variance of those averages was about 0.292. Very interesting, and it's a very simple formula.

So let's belabor this point on the next slide. When the Xs are independent with a common variance, the variance of X bar is sigma squared over n.

The quantity sigma squared over n, or rather its square root, sigma over square root n, is so important that we give it a name: the standard error of the sample mean. Basically, a standard error is nothing other than the standard deviation of a statistic. In this case the statistic is the sample mean, but you might have a standard error for another statistic; the median, for example, itself has a standard error. It may be a little hard to calculate, but nonetheless it has one.

So, what is the standard error? The standard error of the sample mean is the standard deviation of the distribution of the sample mean. So sigma, the population standard deviation, talks about how variable the population is. Sigma over square root n talks about how variable averages of size n from that population are. Two different statements, and they estimate different things. So, for example, if the Xs are IQ measurements, sigma talks about how variable IQs are; sigma over square root ten, say, talks about how variable averages of ten IQs are. Okay, so they're obviously related, but they're different concepts, and it's easy to confuse the two. An easy way to remember this, by the way, is that the sample mean has to be less variable than a single observation, therefore its standard deviation is divided by square root n. That also gives you a sense of the rate at which standard deviations decline as you collect more data.

So, since we've talked about the sample variance a lot, why don't we actually define it?

The sample variance is the entity that we use data to estimate the population variance with. So recall the population variance was the expected squared deviation of a random variable around its population mean. Right? So what is the sample variance? Well, it's the average squared deviation of the sample values around the sample mean. So it's quite convenient. Now notice it's not exactly the average: we divide by n minus one instead of n, which is a little annoying, but we do it. Imagine for the time being that there's an n in the denominator, not an n minus one; then the sample variance is nothing other than the average squared deviation of the observations around the sample mean. So the sample variance is an estimator of the population variance sigma squared.

And just like the population variance has a shortcut formula, the sample variance also has a shortcut formula: summation Xi minus X bar, squared, the numerator of the variance calculation, equals summation Xi squared minus n X bar squared. So if someone gives you the sum of the squared observations and the sample mean, you can calculate the sample variance really quickly.
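The shortcut can be checked numerically; here's a small sketch in Python (numpy assumed, the data are made up for illustration):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = len(x)
xbar = x.mean()

# Direct numerator: sum of squared deviations around the sample mean.
direct = ((x - xbar) ** 2).sum()

# Shortcut numerator: sum of squares minus n times the squared sample mean.
shortcut = (x ** 2).sum() - n * xbar ** 2

s2 = direct / (n - 1)  # the sample variance, dividing by n - 1
print(direct, shortcut, s2)
```

Both numerators agree, and `s2` matches what numpy's `np.var(x, ddof=1)` reports.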

So, why do we divide by n minus one instead of n? For large samples it's irrelevant, right? The factor n minus one over n is close to one, so you're going to get about the same answer either way. But for small samples it can make a difference. So why do we choose to divide by n minus one?

Recall we have this property, unbiasedness. The property of unbiasedness meant that the statistic's expected value equals the quantity that it's estimating. So, just to remind you, the sample variance is a function of our observed data. It's a function of our random variables, right? So it itself is a statistic, so it is a random variable itself. So it has a distribution, and that distribution has a variance, and that distribution has a mean. Okay?

That's what we're going to talk about right now: the mean of that distribution turns out to be sigma squared, provided you do the calculation where you divide by n minus one. I'm going to show it by showing that the expected value of the numerator of the statistic is equal to n minus one times sigma squared. That's the same thing as showing that the sample variance is unbiased, because then you just divide both sides of the equation by n minus one and you get the result. So let's do that.

Just to say it again, because it's important: what are we doing? Remember, the sample variance is itself a random variable; that random variable has a distribution; that distribution has a population mean; and we want to show that that population mean is in fact sigma squared. Okay.

So take the expected value of the numerator of the sample variance calculation, the sum of the squared deviations around the sample mean. If we use the shortcut formula, that's the sum of the expected values of the Xi squared, minus n times the expected value of X bar squared. Okay. Now let's use a really kind of nifty fact. Recall the shortcut variance formula: the variance is the expected value of a random variable squared, minus the expected value of the random variable, quantity squared. Well, we can shift that formula around to say that the expected value of a random variable squared is the variance plus the mean squared. And that's what we do right here: the expected value of Xi squared is variance of Xi plus mu squared. Okay. And the same thing is true, of course, for the sample mean, because the sample mean is itself another random variable. So the expected value of X bar squared is variance of X bar plus mu squared, and then we have this n in front. Okay. And the variance of each Xi is sigma squared, so we wind up with sigma squared plus mu squared, which is a constant, so we wind up with n of those. And the variance of X bar, we derived a little bit ago, is sigma squared over n, so we subtract n times sigma squared over n plus mu squared. Collect terms now and you get n minus one times sigma squared.
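Written out, that derivation reads:

```latex
\begin{align*}
E\!\left[\sum_{i=1}^{n}(X_i-\bar X)^2\right]
  &= \sum_{i=1}^{n} E[X_i^2] - nE[\bar X^2] \\
  &= \sum_{i=1}^{n}\left(\sigma^2 + \mu^2\right)
     - n\left(\frac{\sigma^2}{n} + \mu^2\right)
     \quad\text{using } E[Y^2] = \mathrm{Var}(Y) + E[Y]^2 \\
  &= n\sigma^2 + n\mu^2 - \sigma^2 - n\mu^2
   = (n-1)\sigma^2 .
\end{align*}
```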

So this is a really interesting fact. It says that the expected value of the sample variance is in fact the quantity it's trying to estimate, provided you divide by n minus one instead of n, and that's why we divide by n minus one.

Another way to think about this is that we don't know the population mean, mu. If we knew it, instead of plugging X bar into the sample variance formula, we would plug in mu; we would calculate the deviations of the observations around the population mean rather than around the sample mean. So the idea is that we sort of lose a degree of freedom by plugging in X bar, the sample analog, instead of mu. That's the heuristic behind why you divide by n minus one.
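One way to see that lost degree of freedom numerically (a Python sketch with numpy; the data and "known" mu are made up): the sum of squared deviations around X bar is always at least as small as the sum around any other point, including mu, so plugging in X bar systematically understates the spread unless we compensate.

```python
import numpy as np

rng = np.random.default_rng(2)
mu = 10.0  # pretend we know the population mean

x = rng.normal(loc=mu, scale=2.0, size=20)
xbar = x.mean()

# X bar minimizes the sum of squared deviations, so this always holds.
around_xbar = ((x - xbar) ** 2).sum()
around_mu = ((x - mu) ** 2).sum()
print(around_xbar <= around_mu)  # True
```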

It's an interesting fact, though. It's not 100 percent clear that you do want to divide by n minus one. Every introductory statistics textbook divides by n minus one, but there's this interesting phenomenon called the bias-variance tradeoff. In this case we've obtained an unbiased estimator by dividing by n minus one instead of n, but what if we'd divided by n? As an exercise, I could ask you to calculate the expected value of the sample variance if it were calculated with n in the denominator instead of n minus one. Okay, so basically, what is the expected value of n minus one over n times S squared? You can calculate that very easily; it is not sigma squared, but it is quite close to it. So it's a biased estimator.

But the other thing I would ask is: which of the two estimators, S squared calculated with n minus one in the denominator, or the variance calculated with n in the denominator, has a lower variance, and what do I mean by that? Remember, the sample variance is a random variable. It has a distribution, and that distribution has a variance. And the question is which of the two calculations, dividing by n or dividing by n minus one, results in a smaller variance of that distribution. What would that mean? It would mean how precise your estimate of the variance is. I'll give you the punch line: the sample variance divided by n has a slightly lower variance than the sample variance divided by n minus one. So it's another kind of classic bias-variance trade-off. In this case, we divide by n minus one because we want unbiasedness, but then we wind up with slightly greater variance. If we divide by n, we wind up with a slightly lower variance of our sample variance, but it's slightly biased.
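A quick simulation shows both effects at once (a sketch in Python with numpy; the normal population, seed, and sample sizes are my own choices):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 4.0  # true population variance
n = 5         # small samples, where the choice of denominator matters

# Draw many small samples and compute both variance estimators for each.
samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(200_000, n))
s2_unbiased = samples.var(axis=1, ddof=1)  # divide by n - 1
s2_biased = samples.var(axis=1, ddof=0)    # divide by n

print(s2_unbiased.mean())  # ≈ 4.0: unbiased
print(s2_biased.mean())    # ≈ 4.0 * (n - 1) / n = 3.2: biased low
print(s2_unbiased.var() > s2_biased.var())  # True: divide-by-n is less variable
```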

I know extremely well-established statisticians who say they would prefer to have the lower variance, but pretty much every introductory statistics textbook divides by n minus one. It's kind of an interesting discussion.

You know, one of the confusions that always comes up seems quite simple. We divide by an n minus one when we calculate the sample variance, and people have a tendency to confuse that with the n that we divide by when we talk about the standard error of the mean. So let's just try to avoid some of this confusion. Suppose you have a bunch of observations that you're willing to model as IID with population mean mu and population variance sigma squared. Then the sample variance, S squared, estimates the population variance, sigma squared. The calculation of S squared involves dividing by n minus one, and we just spent forever talking about the difference between dividing by n and dividing by n minus one. Then, the standard error of the mean is sigma over square root n. So S over square root n will estimate the standard error of the mean. So we've already divided S squared by n minus one, then we square-rooted, and then we divide by an additional square root of n if we want the standard error of the mean. Okay, and I am just pointing this out because people seem to get confused by it.
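That order of operations can be sketched in Python (numpy assumed; the data here are made up for illustration):

```python
import numpy as np

# Hypothetical IID sample.
x = np.array([10.2, 11.5, 9.8, 12.1, 10.9, 11.3, 10.4, 11.8])
n = len(x)

s2 = np.var(x, ddof=1)  # sample variance: divide by n - 1
s = np.sqrt(s2)         # square root gives the sample standard deviation
se = s / np.sqrt(n)     # then divide by sqrt(n): estimated standard error
print(s, se)
```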

So, I guess if you wanted to attach a label to the quantity S over square root n, it's the sample standard error of the mean. What does it estimate? It estimates the population standard error of the mean, sigma over square root n.

Let's tie this down with some actual numbers. I was involved in a study of a lot of organolead workers; in this case, I took a subset of 495 of them. The researchers were interested in studying how the workers' exposure to lead on the job changed their brain volume. TBV stands for total brain volume, in this case a measure of the brain volume on the inside of the skull, and all of the measures are in cubic centimeters. So the mean, in this case, is 1151. If we're willing to assume these organolead workers are, say, an IID draw from a population of organolead workers that we're interested in, then the sample mean, 1151, would be an estimate of that population mean. The sum of the squared observations works out to be this number. So the sample standard deviation works out to be: that number, minus 495 times the sample mean squared, all divided by 494 (that's the n minus one in the denominator); square root the whole thing and you end up with 112.
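The sum of squares itself isn't reproduced here, but starting from the reported standard deviation of 112.6 we can check the standard-error number that comes next:

```python
import math

n = 495     # number of organolead workers in the subset
s = 112.6   # sample standard deviation of total brain volume, in cc

se = s / math.sqrt(n)  # estimated standard error of the mean
print(se)              # ≈ 5.06, the "five" quoted in the lecture
```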

So what does 112 describe? It describes the standard deviation of the population of brain volumes of organolead workers. Okay, so it's a direct measure of my sample variation, right, and then, if you view my data as a sample from a population of organolead workers, it attempts to estimate the population standard deviation of that distribution. So we can, for example, use Chebyshev's rule to interpret what the combination of the mean and the standard deviation say about brain volumes of lead workers in the population.

Now, if I take this 112.6 and divide it by square root 495, what does that give me? It gives me five as the numerical result, but what does that five actually estimate or do for us?

Well, the five is no longer talking about the variation in total brain volumes in the population. It's talking about the variation in averages of 495 organolead workers. So the idea is: if we're willing to model our 495 organolead workers as a draw from a population of organolead workers, then five estimates the standard deviation of the distribution of averages of 495 draws from that population. It talks about how variable averages of 495 brain volumes are. The 112 talks about how variable individual brain volumes are, okay? So let me just repeat that, because it's very important. The 112 talks directly about how variable brain volumes are in the sample, and it's an estimate of how variable brain volumes of organolead workers are in the population. The five is an estimate of the population standard deviation of averages of 495 organolead workers. So I hope you're getting a sense of what these numbers are trying to capture.

So there are several concepts being used here. First, we have our observed data, right? And these quantities, the sample mean, sample standard deviation, and standard error, tell us things about our observed data. Right? Then there are the assumptions, for example that the observations are IID, that help connect the data to a population, so that we can maybe generalize the results from this data to a population of organolead workers, say, for example, if you wanted to use this data to inform policy. And then these numbers would be estimates of those population quantities. And dividing by the square root of 495 tells us how variable this mean is relative to the variability in the population. Okay? So those are the concepts that we're trying to use, and we'll formalize them much more when we actually do things like generate confidence intervals and perform hypothesis tests. [music]
