0:03

Hi, welcome back to this lecture on asymptotics. We're going to talk about the central limit theorem, one of the most important and celebrated theorems in statistics.

The idea behind the central limit theorem is very neat. It gives you a way to perform inference with random variables when we don't actually know what distribution they come from. Basically, what the CLT states is that the distribution of averages of iid variables, properly normalized, becomes that of a standard normal as the sample size increases. It's basically saying that if you want to evaluate error rates associated with averages, it's often enough to compare them relative to a normal distribution.

And the CLT applies in an endless variety of settings; with a collection of various asymptotic tools, people have figured out ways to apply the central limit theorem in a great many cases. If you've used statistical software and calculated a p-value or a confidence interval, the underlying motivation of what you're doing is probably asymptotics in most cases (definitely not all cases, but in most), and you're probably appealing to some use of the central limit theorem, especially if you have larger sample sizes.

Let's actually state the central limit theorem, at least as far as we're going to use it. Let X1 to Xn be a collection of iid random variables that come from some population with mean mu and variance sigma squared, and let Xn bar be their sample average. Now consider the distribution function of the normalized mean: take Xn bar, subtract off its expected value, mu, and divide by its standard deviation, sigma over square root n. This whole quantity has mean zero and variance one, and the probability that this Z random variable is less than or equal to a specific point z limits to the standard normal distribution function evaluated at that point z.

Okay, so what does this say? It basically says that probabilities associated with sample means look like probabilities associated with normals. And if you standardize the sample mean so that it has mean zero and variance one, then the probabilities look like standard normal probabilities.

I want to reiterate the form of this normalized quantity. It's Xn bar, an estimate, minus the population mean of the estimate, divided by the standard error, okay? And this is something you can practically bank on: if you take any statistical estimate based on iid data, subtract off its population mean, and divide by its standard error, that quantity is most likely going to wind up limiting to a standard normal distribution.
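That recipe can be sketched in a small simulation. To be clear, this is my own illustration, not from the lecture: the exponential population, sample size, and seed are all assumed values. An exponential(1) population has mean 1 and variance 1, so we know mu and sigma exactly and can standardize.

```python
import random
import statistics

# Hypothetical illustration (population and parameters are assumptions,
# not from the lecture). An exponential(1) population has mu = 1 and
# sigma = 1, so we can form the normalized mean exactly.
random.seed(42)
n, reps = 40, 10_000
mu, sigma = 1.0, 1.0

zs = []
for _ in range(reps):
    xbar = statistics.fmean(random.expovariate(1.0) for _ in range(n))
    # (Xn bar - mu) / (sigma / sqrt(n)): the normalized mean
    zs.append((xbar - mu) / (sigma / n ** 0.5))

# The normalized means should be centered near 0 with standard deviation
# near 1, as the CLT promises.
print(round(statistics.fmean(zs), 2), round(statistics.stdev(zs), 2))
```

Even though the exponential distribution is quite skewed, the standardized means of 40 draws already behave roughly like a standard normal.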

Let's just go through an example, and this is kind of a neat example; I'll explain why in a minute. Imagine if you were stuck on a desert island.

Â 3:45

So, remember that for a die, the expected value of a die roll is 3.5. If you don't remember, we went through this calculation a couple of lectures ago when we covered the variance. The variance of a die roll is 2.92, and the standard error is the square root of 2.92 divided by the number of die rolls going into the average, so that's 1.71 divided by square root n. The standardized mean is then the average of the die rolls, minus 3.5, divided by 1.71 over square root n.

On the next slide, basically what I've done is roll the die one time, and I did that over and over and over again. So that's an average of one roll, and I standardized it: for an average of one, it's a die roll minus 3.5, divided by 1.71. So now we have a distribution that's centered at zero and has variance one. I plotted the standard normal density in gray in the back, and I plotted the histogram of my die rolls. Of course, a die can only take six possible values, and you see those six spikes at one through six; it's not perfectly discrete because the software I'm using to plot the histogram assumes the data is continuous. So you see the six spikes from the fact that a histogram of a bunch of single die rolls is going to look like a bunch of spikes at one through six, and in this case, because they were normalized, they look like the numbers one through six with 3.5 subtracted off and divided by 1.71.

Okay, now imagine I just took two die rolls. I took a die, rolled it once, rolled it a second time, and got my average. I subtracted off 3.5 and divided by 1.71 over square root 2, and I repeated that process over, and over, and over again, and plotted a histogram of the results of a lot of averages of two die rolls. That histogram gives me a good sense of what the distribution of the standardized average of two die rolls is. In the background we have the normal distribution, and on top of it we have the distribution of the average of two die rolls. And I think you'll agree that, with just two die rolls, it's already looking pretty good. Amazingly good.

And now imagine if you had six die rolls. So I rolled the die six times, took the average, subtracted 3.5, and then divided by 1.71 over square root 6, right? Then I did that process over and over and over again, got lots of normalized averages of six die rolls, and plotted a histogram, and you can't even see the standard normal distribution in the background, because the distribution of the average of six die rolls looks so similar. So on my desert island, if I needed a standard normal, I think I could probably get away with six die rolls: take the average, subtract 3.5, and divide by 1.71 over square root 6.
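Here's a sketch of that desert-island recipe. The mean 3.5 and standard deviation 1.71 come from the lecture; the seed and the number of repetitions are my own arbitrary choices.

```python
import random
import statistics

# Standardized averages of n = 6 die rolls, repeated many times.
# mu = 3.5 and sigma = 1.71 are the lecture's values; the seed and
# repetition count are assumptions.
random.seed(1)
n, reps = 6, 10_000
mu, sigma = 3.5, 1.71

zs = [
    (statistics.fmean(random.randint(1, 6) for _ in range(n)) - mu)
    / (sigma / n ** 0.5)
    for _ in range(reps)
]

# If the normal approximation is good, roughly 95% of these standardized
# averages should land between -1.96 and 1.96, like a standard normal.
inside = sum(-1.96 <= z <= 1.96 for z in zs) / reps
print(round(inside, 2))
```

With only six rolls per average, the fraction landing inside the standard normal's central 95% band is already very close to 0.95.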

And the reason I bring this up is an interesting fact about the famous statistician Francis Galton, who was quite a character; you should look up Francis Galton if you get a chance. He was Charles Darwin's cousin, and he was a very brilliant guy. He needed standard normals, but he was trying to simulate standard normals prior to having computers. So how did he do it? Well, he basically rolled dice and applied the central limit theorem to get standard normals, which is really quite clever. And because it was a pain in the butt, and he wasn't on a desert island, so he had time constraints, he actually invented dice that made it a little easier for him. I think he took standard dice and wrote on the corners and such, so there were more values than one to six.

But the basic idea is that the distribution of averages looks like that of a normal distribution, in just about any setting, regardless of the underlying distribution of the data. There are some assumptions, such as that the variance is finite, and some other things like that. But for the purposes of this class, it covers basically any distribution we can think of.

So let's take another instance of the central limit theorem, from flipping coins. Now, instead of a die, I have a coin, and I want to evaluate the average of a bunch of coin flips. Let Xi be the zero-or-one result of the ith flip of a possibly unfair coin, where p is the true success probability of the coin. The sample proportion, say p hat, is just the average of the coin flips, right? In this case p hat is the percentage of ones, and the average of the Xi is the same thing, of course. So remember that the expected value of the Xi's is p.

Â 9:06

And things look worse than the die roll in this case. Let's take a fair coin and generate some plots. If you take a fair coin, flip it, record either zero or one depending on whether it's heads or tails, subtract off 0.5, divide by the square root of 0.5 times 1 minus 0.5, and do that over and over again, you get a histogram with only two possible values, right? They're not 0 and 1 because you've normalized them, and in this first plot you see it doesn't look normal at all, of course.

After 10 coin flips, so now we're flipping the coin 10 times, we take the sample proportion of heads, subtract off 0.5, and divide by the square root of 0.5 times 1 minus 0.5 over 10. We repeat that process over, and over, and over again and get a histogram of the results. We see that the distribution of the normalized average of ten coin flips looks pretty normally distributed, but is still very discrete compared to the normal distribution. Once you get to 20 coin flips, it's looking pretty good; it's overlaying the standard normal distribution pretty well. And it turns out that in the coin flipping example, it converges to normality quickly if the coin is fair.

If the coin is unfair, look at the bottom row of plots. Here we have an unfair coin where it's more likely to come up tails than heads; I believe I used 0.7 versus 0.3 for the simulation. You see that at the start it's much more likely to get the zero value than the one value (these aren't zero and one because they've been normalized). If you look at the distribution of averages of ten coin flips, it doesn't look very normal at all, and at twenty coin flips it still doesn't look very normal. So it takes a lot longer.
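That difference can be sketched numerically. This is my own illustration: the flip counts, seed, and the use of skewness as the normality yardstick are assumptions; the 0.3 success probability matches the unfair coin described above. A standard normal has skewness zero, so lingering skewness is one sign the approximation hasn't kicked in yet.

```python
import random
import statistics

# Compare standardized sample proportions of n = 20 flips for a fair coin
# (p = 0.5) and the unfair coin (p = 0.3). Seed, repetition count, and
# skewness as the check are assumptions, not from the lecture.
random.seed(7)

def standardized_props(p, n=20, reps=20_000):
    se = (p * (1 - p) / n) ** 0.5
    return [
        (sum(random.random() < p for _ in range(n)) / n - p) / se
        for _ in range(reps)
    ]

def skewness(zs):
    m, s = statistics.fmean(zs), statistics.stdev(zs)
    return statistics.fmean(((z - m) / s) ** 3 for z in zs)

# The fair coin's standardized averages are symmetric (skewness near 0);
# the unfair coin's are still noticeably skewed at n = 20, so its
# convergence to normality is slower.
print(round(skewness(standardized_props(0.5)), 2))
print(round(skewness(standardized_props(0.3)), 2))
```

The fair coin's skewness is essentially zero while the unfair coin's remains visibly positive, matching what the bottom row of plots shows.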

Â 10:52

So this is a problem with the central limit theorem that generally doesn't get a lot of play. The central limit theorem says, basically, that if you have iid random variables and you take an average, the distribution of the normalized average converges to that of a standard normal distribution. But it doesn't tell you how fast it does that, right? It just tells you that it does eventually. So for some distributions it might take thousands of observations going into the sample mean for the distribution of sample means to behave like that of a Gaussian, while for others, like the die roll, we saw that it only took six, and then it was nearly overlaying the standard normal distribution. So it's an unfortunate fact that the central limit theorem can't give you any guarantees on how quickly things converge to normality.

Â 11:46

Oh, and I wanted to point out one last thing on this coin flipping example. If you've ever been to a science museum, you've seen these machines where, say, a ping pong ball is dropped and goes through a Pachinko-like collection of left-right decisions, more or less at random. That's exactly a binomial experiment. Every time the ping pong ball hits a nail, it can go left or right, each with 50% probability. So by the end of the process it's had a bunch of coin flips, and its position at the bottom is, in principle, exactly the sum of a bunch of Bernoulli trials. (You have to actually build it so that it approximates coin flipping well; for example, if it were tilted to the side, it wouldn't be a fair coin anymore.) But as the ping pong balls collect toward the bottom, each position is the sum of a bunch of Bernoulli random variables. Well, the distribution of averages is approximately Gaussian, so of course the distribution of sums is then approximately Gaussian too, because sums are just averages multiplied by n. So what you'll see in the science museum is that at the bottom they'll have traced out a Gaussian distribution, and if they run this ping pong ball machine long enough, the balls tend to fall in a bell-shaped curve. That's just an application of the central limit theorem saying that, for this particular value of n, the normal approximation to the sum of a bunch of coin flips is pretty good.

And this ping pong device is actually called a quincunx, and it was invented by Francis Galton, who we talked about earlier, the cousin of Charles Darwin. So it's a very interesting little tidbit; the next time you're at the science museum, you can explain this to whoever you're with.
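The quincunx logic can be sketched with simulated coin flips. The number of rows of nails and the ball count here are my own choices, not the machine's actual dimensions.

```python
import random
import statistics

# Each ball makes `rows` independent left/right (0/1) decisions, so its
# final bin is a sum of Bernoulli trials: Binomial(rows, 0.5), which has
# mean rows/2 and variance rows/4. Row and ball counts are assumptions.
random.seed(5)
rows, balls = 12, 5_000
bins = [sum(random.randint(0, 1) for _ in range(rows)) for _ in range(balls)]

# Binomial(12, 0.5) has mean 6 and variance 3, and by the CLT the bin
# counts already pile up in an approximately Gaussian bell shape.
print(round(statistics.fmean(bins), 1), round(statistics.pvariance(bins), 1))
```

Tallying how many balls land in each bin and printing the counts as a bar chart would trace out the same bell shape the museum machine draws.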

So the reason we use the central limit theorem: in practice it's useful as an approximation, basically saying that the normalized mean has a distribution that's approximately standard normal. Let me give you an example. Remember that 1.96 is a good approximation to the 0.975th quantile of a standard normal, and so negative 1.96 is a pretty good approximation to the 2.5th percentile.

Â 14:13

So what the central limit theorem then says is that 95% is about the probability that a standardized mean lies between minus 1.96 and plus 1.96. Let me just repeat that: the probability that a standardized mean lies between roughly minus 2 and plus 2 is about 95%. Now let's take the interior of this probability statement and rearrange terms a little bit, making sure that if we multiply everything by a minus sign we flip the inequalities. We get that Xn bar plus 1.96 sigma over square root n is bigger than or equal to mu, which is greater than or equal to Xn bar minus 1.96 sigma over square root n, and that probability is about 0.95. What that's saying is that the random interval, Xn bar plus or minus about 2 standard errors, contains mu, the non-random quantity, with about 95% probability. In this case we wanted a 95% interval, so we took 5%, divided it by 2, got the 2.5th and 97.5th quantiles, and basically added and subtracted: Xn bar, plus or minus, that quantile times the standard error.

So that's 95%, but if we wanted something other than 95%, why don't we just use the standard normal quantile z at 1 minus alpha over 2? In this case alpha is 0.05, so 1 minus alpha over 2 is 1 minus 0.025, which is 0.975. So that would be z 0.975, which is 1.96; that's where we got the 1.96 from. But we could do it for another value. Imagine you wanted a 90% interval. Then alpha would be 0.1, alpha over 2 is 0.05, and we would need the 95th percentile to plug in there.
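This quantile bookkeeping can be sketched in code. The Gaussian data and the values of mu, sigma, and n are made up; `NormalDist` is Python's standard-library normal distribution.

```python
import random
import statistics
from statistics import NormalDist

# Xn bar +/- z_{1 - alpha/2} * sigma / sqrt(n), for 95% and 90% intervals.
# The data, mu = 10, sigma = 2, and n = 100 are made-up values.
z95 = NormalDist().inv_cdf(0.975)  # roughly 1.96, the 97.5th percentile
z90 = NormalDist().inv_cdf(0.95)   # roughly 1.64, the 95th percentile

random.seed(3)
sigma, n = 2.0, 100
data = [random.gauss(10.0, sigma) for _ in range(n)]
xbar = statistics.fmean(data)
se = sigma / n ** 0.5              # sigma / sqrt(n), sigma treated as known

ci95 = (xbar - z95 * se, xbar + z95 * se)
ci90 = (xbar - z90 * se, xbar + z90 * se)
# The 90% interval uses a smaller quantile, so it is narrower.
print(round(z95, 2), round(z90, 2))
```

Swapping the 0.975 for any other 1 minus alpha over 2 gives the quantile for that confidence level, exactly as described above.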

Â 16:18

Okay, so this is the idea that we can create so-called confidence intervals: random intervals that contain the quantity they're trying to estimate with 100 times (1 minus alpha) percent probability, 95% in this case. We create such intervals by taking the estimate plus or minus a standard normal quantile times the standard error. In this case the estimate is the sample mean, but we'll find that we can play games with the central limit theorem and the law of large numbers and get this kind of interval to work in a lot of cases. So tons of the intervals you're going to look at in your statistics classes are going to be exactly of the form: estimate, plus or minus, standard normal quantile, times standard error.

Anyway, this is called a 95% confidence interval. It's a mainstay of statistics, especially so-called frequentist statistics, and it's basically an estimate that carries some acknowledgement of the uncertainty from the fact that we have data we're treating as random. So that's what a confidence interval is. If we just give Xn bar, the sample average, as an estimate, it has no acknowledgement that there's random variation we're not accounting for. So in this case we're saying: if we take Xn bar, and we're willing to assume the data are independent and identically distributed with a finite variance, then we can apply the central limit theorem. And the central limit theorem says that, if the distribution is cooperating and n is large enough, then the interval contains mu with probability 95%.

Now, unfortunately, I'm going to pontificate a little bit. It's an unfortunate fact that confidence intervals are really quite hard to interpret.

Â 18:08

And this is just a by-product of so-called frequentist inference. If we actually get data and calculate a confidence interval, then we just have two numbers. Those two numbers either contain mu or they don't. So standard frequentist logic says that the probability that the interval contains mu is either zero or one: it either contains it or it doesn't. So the real interpretation of a confidence interval is that this procedure, when applied over and over again, creates confidence intervals that will contain mu 95% of the time. That's the actual interpretation of a confidence interval if you are a hard-line frequentist, and it's unfortunate that that interpretation is so hard.

So let me just repeat it, because it is kind of tricky. It's saying that the confidence interval procedure, given that the central limit theorem applies and all of our assumptions hold, creates intervals such that, if we were to repeat the procedure over and over on repeated experiments, then, say for 95% intervals, about 95% of the time the intervals will contain the value they're trying to estimate. That's a confusing statement, and it's one of the main criticisms of confidence intervals: the strict interpretation, which not everyone is strict about because it's so hard, is actually quite difficult.
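The repeated-procedure interpretation can be sketched directly. The population, sample size, and repetition count here are all assumptions of mine, chosen just to make the simulation fast.

```python
import random
import statistics
from statistics import NormalDist

# Repeat the experiment many times and count how often the interval
# Xn bar +/- 1.96 sigma / sqrt(n) contains the true mean mu.
# mu, sigma, n, and the number of repetitions are made-up values.
random.seed(11)
mu, sigma, n, reps = 5.0, 3.0, 50, 2_000
z = NormalDist().inv_cdf(0.975)
se = sigma / n ** 0.5

covered = 0
for _ in range(reps):
    xbar = statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    if xbar - z * se <= mu <= xbar + z * se:
        covered += 1

# "95%" describes this long-run coverage rate of the procedure,
# not the probability for any single computed interval.
print(round(covered / reps, 2))
```

Any one of those 2,000 intervals either contains mu or it doesn't; the 95% is a property of the procedure across repetitions.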

Maybe I'll try and dig up some examples where a confidence interval makes its way into some very important problem, and show you how the press is basically incapable of interpreting intervals this way. And to their credit, it's because it's hard, and it's kind of a crazy interpretation. There is, in fact, a different way of creating intervals that gives you a better interpretation.

Â 20:40

Let me give you another formula that I find to be a super useful instance of the CLT, because it's a quick back-of-the-envelope calculation. By the way, we're going to spend much more time calculating specific confidence intervals later; right now I just want to give you the theory behind why confidence intervals work. But there's a specific instance of confidence intervals that's really quite useful. For sample proportions, remember that the variance is p(1 - p). So, just plugging in, the confidence interval takes the form: p hat, the sample proportion, plus or minus the standard normal quantile times the square root of p(1 - p) over n. Now, p is what we want to estimate, so we obviously can't plug it into this formula; we need to replace p with an estimate. If we replace p with the sample proportion, so that the standard error uses p hat times 1 minus p hat, you get the so-called Wald interval. And as I stated on a previous slide but didn't really talk about, replacing the unknown parameters in the standard error calculation with their estimates generally works, by something called Slutsky's theorem, and it creates an interval that's asymptotically valid. But instead of doing that right now (we'll talk a lot more about Wald intervals and that sort of thing later), let's talk about quick back-of-the-envelope bounds.

So remember that the biggest p(1 - p) can be is when p is a half, so this quantity is less than or equal to a quarter as long as p is between 0 and 1, which is our restriction because we're talking about a proportion or probability. In this case, if we let alpha be 0.05, the standard normal quantile is 1.96, which is close enough to 2, among friends, so let's just call it 2. Then the margin of error part of the confidence interval, 2 times the square root of p(1 - p) over n, works out to be at most 2 times the square root of a quarter over n, and you wind up with 1 over square root n. So, if you want a quick back-of-the-envelope confidence interval estimate for a sample proportion, just take the sample proportion and add and subtract 1 over square root n. And that's a really handy little formula; it tells you, when you calculate a proportion, about how accurate it is. For example, if I have a proportion from 100 coin flips, we know the accuracy is going to be about 1 over square root 100, or 0.1. So it's a very useful back-of-the-envelope calculation. Just remember: p hat plus or minus 1 over square root n.
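As a final sketch, here's the back-of-the-envelope interval in code. The 56-out-of-100 poll result is a made-up example, not from the lecture.

```python
# Back-of-the-envelope interval from the lecture: p hat +/- 1 / sqrt(n).
# Hypothetical data: 56 successes in n = 100 trials.
n, successes = 100, 56
p_hat = successes / n
margin = 1 / n ** 0.5  # upper bound on 2 * sqrt(p * (1 - p) / n)

lo, hi = p_hat - margin, p_hat + margin
print(round(lo, 2), round(hi, 2))
```

With n = 100 the margin is exactly 0.1, so the quick interval runs from 0.46 to 0.66; the exact Wald interval would be slightly narrower, since the quarter bound is conservative.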
