
Hi, my name is Brian Caffo. I'm in the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health, and this is Mathematical Biostatistics Boot Camp, lecture eight, on asymptotics.

In this lecture, we're going to take a trip to Asymptopia. Asymptopia is a land where we have an infinite amount of data, so it should be fun. We're going to talk about limits, but limits of random variables, and there are intricacies that you have to account for when you consider random variables instead of standard mathematical limits. It's a quite difficult subject, but we're going to show that we have basically two tools, the Law of Large Numbers and the Central Limit Theorem, that are going to be our primary methods for looking at random variables.

So let me first review numerical limits. I'm not going to go into too much detail; I'm going to treat it heuristically, just to illustrate, in case your asymptotics are a little bit rusty.

Suppose I had a sequence where a_1 was .9, a_2 was .99, and a_3 was .999, and I hope you can see the pattern at this point. Clearly, this sequence in some sense of the word converges to one: it converges to .9999999..., getting closer and closer to one as the index of the sequence gets larger and larger.

Well, we can formalize this. The formal definition of a limit is: for any fixed distance, we can find a point in the sequence such that the sequence is closer to the limit than that distance from that point on. Take, for example, this sequence. The distance between a_n and one is 10^-n, where n is the position in the sequence. And so, if we pick any distance, say epsilon, we can find an n so that 10^-n is smaller than epsilon. And then, because 10^-n just keeps getting smaller as n gets larger, it stays smaller than epsilon from that point onward. So that satisfies our definition of a limit; clearly, this sequence converges to one.
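As a quick numerical sanity check (my sketch, not part of the lecture), you can verify the definition on a computer: pick an epsilon, find the first point in the sequence closer to one than epsilon, and confirm the distance never grows again.

```python
def a(n):
    return 1 - 10 ** -n   # the sequence .9, .99, .999, ...

def dist(n):
    return 10 ** -n       # the distance |a_n - 1|, written without floating-point cancellation

epsilon = 10 ** -6        # any fixed distance we care to pick
# the first point in the sequence that gets closer than epsilon...
n_star = next(n for n in range(1, 100) if dist(n) < epsilon)
print(n_star)  # 7: 10**-7 is the first power of ten below 10**-6
# ...and the sequence stays closer than epsilon from that point onward
assert all(dist(n) < epsilon for n in range(n_star, 100))
```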

And it's kind of an interesting fact that an infinite sequence of .9s and one are the same number; if you ever take a class on real analysis, they'll discuss that sort of thing at length. But anyway, I hope you get this basic sense of a limit: as n goes to infinity, the sequence converges, it looks more and more like its limit, so that the distance just gets closer and closer to zero and never gets bigger again.

The problem is that that definition only works for a sequence of numbers, right? Now we want to talk about, say, limits of averages of coin flips, and then it gets much harder.

So take, for example, an average comprised of n observations. Let's say X_n bar: instead of writing X bar like we typically do for our average, we're going to annotate it with a subscript n to show that it's the average of the first n of a collection of IID observations. So, for example, X_n bar could be the average of the results of n coin flips, which is the sample proportion of heads.

Well, there is a limit theorem for averages, and we say that X_n bar converges in probability to a limit. We relate this back to the ordinary definition of a limit by saying that the probability that X_n bar is closer to the limit than any specific distance converges to one. So in this case, the probability that X_n bar minus the limit is less than any quantity epsilon that you fix, that probability is a number, p_n, right? And the definition of convergence in probability is that the sequence of numbers p_n converges to one. So we've converted the problem of what it means for a random variable to converge back to the definition of convergence of a sequence of numbers: convergence in probability means convergence of the collection of probability numbers p_n in the standard sense of the definition of convergence. So now we have a way of talking about how random variables converge.
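To make the definition concrete, here is a small Monte Carlo sketch (my illustration, not from the lecture; the choice of epsilon = 0.05 is arbitrary) estimating p_n = P(|X_n bar - 0.5| < epsilon) for fair-coin flips. The estimated probabilities climb toward one as n grows.

```python
import random

random.seed(1)

def p_hat(n, eps=0.05, reps=2000):
    """Monte Carlo estimate of P(|sample proportion - 0.5| < eps)
    for n fair-coin flips."""
    hits = 0
    for _ in range(reps):
        xbar = sum(random.random() < 0.5 for _ in range(n)) / n
        if abs(xbar - 0.5) < eps:
            hits += 1
    return hits / reps

# p_n increases toward 1 as the number of flips grows
for n in (10, 100, 1000):
    print(n, p_hat(n))
```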

So, establishing that a random sequence of variables actually converges is hard, as you can imagine; if you look back at the previous definition, it's not the easiest thing in the world to work with. So we have something that makes it a lot easier for us, and that's called the Law of Large Numbers. If you've heard people talk about the "law of averages," typically they don't know what they're talking about, but probably they're referring to the Law of Large Numbers.

The Law of Large Numbers basically says that if X_1, ..., X_n are IID from a population with mean mu and variance sigma squared (so in this case we're going to assume that the random variables have a variance; it's interesting to note that there are distributions with no variance, where you try to calculate the variance and you get infinity or something like that, but here we're going to assume that the variance exists and is finite), then the sample average of the IID observations always converges in probability to mu.

And that's called the Law of Large Numbers.

Probabilists make a lot of distinctions among the various kinds of laws of large numbers, and they've worked very hard to get minimal assumptions for the Law of Large Numbers to hold. In fact, we're using a very lazy version of the Law of Large Numbers; they would probably be upset with us for teaching this one, but that's okay, we don't care. The basic idea I want you to get is that averages of IID observations converge to mu, the population mean from which the observations were drawn. This is a good thing, right? It basically says that if we go to the trouble of collecting an infinite amount of data, then we get the number that we want to estimate, mu, exactly. Which is good, because collecting an infinite amount of data takes a lot of time. Actually, an infinite amount of time.

If you're willing to make this much of an assumption, that the Xs all have a finite variance, it's pretty easy to prove the Law of Large Numbers using Chebyshev's inequality. Chebyshev's inequality, if you remember, had a pretty simple little proof, so it's amazing that this fairly complicated idea has such a simple little proof. Recall that Chebyshev's inequality states that the probability that a random variable is more than k standard deviations from its mean is less than one over k squared.

So, therefore, the probability that X_n bar minus mu, in absolute value, is bigger than or equal to k standard deviations of X_n bar is less than or equal to one over k squared. Now, pick any distance epsilon; remember, to establish the convergence of a limit of numbers we have to pick an epsilon. And to establish convergence in probability, we have to show that the probability that X_n bar minus mu is bigger than epsilon goes to zero, or that the probability it's less than epsilon goes to one; those two statements are equivalent.

So let's let k, from our previous definition, be epsilon divided by the standard deviation of X_n bar. Note that k is not a random variable: epsilon is a number that we pick, and while X_n bar is a random variable, its standard deviation is just sigma over the square root of n. So k is just a number; there's nothing random in our definition of k. If you plug that back in for k, you get that the probability that X_n bar minus mu is bigger than epsilon is less than or equal to the standard deviation of X_n bar squared, which is the variance of X_n bar, divided by epsilon squared. And we already calculated, in a previous lecture, that the variance of X_n bar is sigma squared over n. So this probability is less than or equal to sigma squared over n epsilon squared. Now, as n goes to infinity, sigma squared isn't changing and epsilon squared isn't changing, and the n is in the denominator, so this whole thing goes to zero. So the probability that the random average X_n bar is more than epsilon away from the mean goes to zero as n goes to infinity, or, stated the other way, the probability that X_n bar is less than epsilon from the mean goes to one as n goes to infinity.
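Collecting the steps just described in symbols, with k = epsilon / SD(X_n bar), the whole argument is:

```latex
P\left(\left|\bar X_n - \mu\right| \ge \epsilon\right)
  = P\left(\left|\bar X_n - \mu\right| \ge k\,\mathrm{SD}(\bar X_n)\right)
  \le \frac{1}{k^2}
  = \frac{\mathrm{Var}(\bar X_n)}{\epsilon^2}
  = \frac{\sigma^2}{n\,\epsilon^2}
  \;\longrightarrow\; 0
  \quad\text{as } n \to \infty .
```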

So either of those statements equivalently says that X_n bar converges in probability to mu. I think it's kind of staggering that basically two lines are all you need to establish this fairly complicated result.

Now, on the next page, I have a simple example where I simulated random normals with a mean of zero, and I show the cumulative sample mean. So I took one random normal; then I generated a second random normal and averaged the two; then I generated a third random normal and averaged it in with the rest, and so on. The iteration number at the bottom is the number of observations that goes into that mean, and the vertical axis shows the value of the average. You can see that, at first, there's quite a bit of variability. Remember, the variance of the average is sigma squared over n, so the variability is going to zero, and the dashed line is the asymptote. As we include more observations in the average, it converges; you can see it already converging a little bit by 100 iterations to the dashed line. That's simply the Law of Large Numbers.
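The simulation is easy to reproduce. Here's a sketch (the lecture's own code isn't shown, so this is my reconstruction) that draws standard normals and computes the cumulative mean after each new observation; plotting cum_means against the iteration number gives the figure described above.

```python
import random

random.seed(42)

n = 100
draws = [random.gauss(0, 1) for _ in range(n)]  # IID N(0, 1), so mu = 0

# cumulative sample mean after each new observation
running_sum = 0.0
cum_means = []
for i, x in enumerate(draws, start=1):
    running_sum += x
    cum_means.append(running_sum / i)

# early entries are noisy; later entries settle toward the true mean of 0
print(cum_means[0], cum_means[9], cum_means[-1])
```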

So let's cover some useful facts about the Law of Large Numbers. One interesting fact is that functions of convergent random sequences converge to the function evaluated at the limit; this includes sums, products, and differences. So, for example, X_n bar squared converges to mu squared: X_n bar converges to mu, and X_n bar squared is just a function of X_n bar.

Something different is that the average of the squared observations converges to a different quantity. Let's go through this a little carefully, because it's kind of an odd little point. X_n bar squared converges to mu squared, but if we sum up the X_i squared, the individual observations squared, and divide by n, that no longer converges to mu squared. Why not? Well, it's the difference between the square of the average and the average of the squares, and in this case it's the average of the squares. Each X_i squared is a random variable, so we could just call it Y_i instead of X_i squared, and then the average of these Y_i is going to converge to the population mean of the Y_i. Well, we can calculate that. We know what the expected value of X_i squared is, because we can use the shortcut formula for the variance, which, recall, was the expected value of X_i squared minus the expected value of X_i, quantity squared. We can just rearrange that formula to solve for the expected value of X_i squared and show that it's equal to sigma squared plus mu squared. So the average of the squared observations converges to sigma squared plus mu squared, whereas the square of the average converges to mu squared.
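Here's a quick numerical illustration of the two different limits (my sketch, not from the lecture), using uniform(0, 1) draws, for which mu = 1/2 and sigma squared = 1/12:

```python
import random

random.seed(0)

n = 200_000
xs = [random.random() for _ in range(n)]  # Uniform(0, 1): mu = 0.5, sigma^2 = 1/12

xbar = sum(xs) / n
square_of_avg = xbar ** 2                    # converges to mu^2 = 0.25
avg_of_squares = sum(x * x for x in xs) / n  # converges to sigma^2 + mu^2 = 1/12 + 1/4 = 1/3

print(square_of_avg, avg_of_squares)  # two genuinely different limits
```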

So it's kind of an interesting little point, but just remember that those two things are different. And by the way, we can use this little fact that we just showed to prove that the sample variance converges to sigma squared, and we'll do that on the next slide.

So let's actually go through this proof that the sample variance converges to sigma squared. I think you'll see in the process of the proof that it doesn't matter whether we divide by n or n minus one; it's going to converge to sigma squared either way. Here we have the definition of the sample variance, the summation of X_i minus X_n bar, quantity squared, all divided by n minus one; we're using the unbiased version of the sample variance. Well, recall there was a shortcut formula for the sample variance, where the numerator worked out to be the summation of X_i squared minus n times X_n bar squared. So we're going to use that formula, and we get the summation of X_i squared over n minus one, minus n times X_n bar squared over n minus one. Let's rearrange terms and multiply and divide by some n's, because that n minus one is a little annoying. So we have n over n minus one, times the summation of X_i squared over n, minus n over n minus one, times X_n bar squared.

Let's look at each of these pieces in turn, and remember the fact from the previous slide that I asked you to take as true: if you multiply convergent sequences, they converge to the product of the limits, and if you add or subtract convergent sequences, they converge to the sum or difference of the limits, and so on. So, one term at a time. n over n minus one clearly converges to one; if you don't believe me, plug in n over n minus one for a very big value of n on your computer and you'll see that it gets closer and closer to one. Probably the easiest way to see it is that it equals one over, one minus one over n, and that one over n clearly goes to zero. On the previous slide we talked about how the summation of X_i squared over n converges to sigma squared plus mu squared, so the second factor converges to sigma squared plus mu squared. Then we have minus n over n minus one, which again converges to one, times X_n bar squared, which converges to mu squared; we talked about that on the previous slide too. So we're left with sigma squared plus mu squared minus mu squared, which is just sigma squared.

So that proves that the sample variance converges to sigma squared, and then, of course, the biased sample variance, where we happen to divide by n instead of n minus one, also converges to sigma squared. And then we can square root the sample variance to get the sample standard deviation and see that it converges to sigma as well, just by the rule from the previous page where we said that functions, in this case the square root function, of convergent random variables converge to the function of the limit.
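The chain of limits in the proof can be written compactly as:

```latex
S_n^2
  = \frac{\sum_{i=1}^n \left(X_i - \bar X_n\right)^2}{n-1}
  = \frac{n}{n-1} \cdot \frac{\sum_{i=1}^n X_i^2}{n}
    \;-\; \frac{n}{n-1}\,\bar X_n^2
  \;\xrightarrow{\;P\;}\;
  1 \cdot \left(\sigma^2 + \mu^2\right) - 1 \cdot \mu^2
  = \sigma^2 .
```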

So what we've found is that with our Law of Large Numbers and a couple of rules that we just stipulated, the sample mean of IID random variables converges to the population mean it's trying to estimate; the sample variance converges to the population variance it's trying to estimate; and the sample standard deviation converges to the population standard deviation it's trying to estimate. In all these cases you see the pattern that the sample quantity converges to the population quantity it's trying to estimate. This basically says that if you go to the trouble of collecting an infinite amount of data, then you actually get the value you want to estimate. You don't get it with noise; you get the actual value.

We give this property a name: we say that an estimator is consistent if it converges to what you want to estimate. The Law of Large Numbers is basically saying that the sample mean is consistent, and now we know that the sample variance and the sample standard deviation are consistent too, and it doesn't matter whether you divide by n or n minus one; they're all consistent. Also remember that the sample mean and the sample variance are unbiased as well, while the sample standard deviation, by the way, is biased.

Consistency, by the way, is a very weak property.

Saying that an estimator is consistent is not even really a necessary property. It seems like it should be necessary, but if something converges to mu plus epsilon, where epsilon is a minuscule number of no practical importance, then that estimator is not consistent and yet could still be perfectly useful. So it's fair to say that consistency is sort of a weakly necessary, but definitely not sufficient, property for an estimator to be useful. We have also seen that being unbiased is neither necessary nor sufficient for an estimator to be useful: for example, we've talked about the bias-variance tradeoff, where estimators can be slightly biased and you can want that, because you improve on the variance.

So what we're winding up with is a collection of properties that describe estimators, and you really need to think about the collection of properties as a whole to evaluate an estimator. These various mathematical concepts are useful, but they never, in isolation, tell the full story on the utility of an estimator. They might be useful for eliminating really dumb things: if something is so badly inconsistent that it doesn't converge anywhere near the quantity you're estimating, probably that's not something you want to use. But apart from those kinds of stark circumstances, you need to take these properties as a collection to try and decide which estimators are the right ones to use.

So let me give an example of an estimator that's consistent but not very good. Take the data and use only the first half of the collected observations, so instead of X_n bar we have X_{n/2} bar. That estimator is, of course, consistent as n goes to infinity; it just has fewer observations than if you took all of them, but the number of observations, n over two, is still going to infinity. So that estimator is consistent, but there's an obviously better estimator right in front of you: the estimator using all of the data. So there's a particular example of an estimator that is consistent, but where a better estimator immediately comes to mind. Here you have to actually account for the fact that the estimate with all of the data has a lower variance than the estimate with half of the data.
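To see that difference in variance, here's a small sketch (my illustration, not from the lecture) comparing the spread of the half-data mean with the full-data mean over repeated samples; since Var(X_n bar) = sigma^2 / n, the half-data mean should have about twice the variance.

```python
import random

random.seed(7)

def sample_mean(xs):
    return sum(xs) / len(xs)

n, reps = 200, 4000
full_means, half_means = [], []
for _ in range(reps):
    xs = [random.gauss(0, 1) for _ in range(n)]
    full_means.append(sample_mean(xs))            # uses all n observations
    half_means.append(sample_mean(xs[: n // 2]))  # uses only the first n/2

def empirical_var(ms):
    m = sample_mean(ms)
    return sum((x - m) ** 2 for x in ms) / (len(ms) - 1)

# ratio should be near 2: sigma^2/(n/2) versus sigma^2/n
print(empirical_var(half_means) / empirical_var(full_means))
```

Both estimators are consistent, but over repeated samples the half-data mean scatters about twice as widely around mu, which is exactly the sense in which the full-data estimator is better.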

So that's enough discussion of limits and the Law of Large Numbers. Next we're going to go on to the Central Limit Theorem, a very important theorem in statistics.

Â