0:07

Lecture nine, on confidence intervals. In this lecture, we're going to talk about confidence intervals, mostly in the setting where we assume that our data come from a Gaussian distribution. We'll talk about confidence intervals for the variance, we'll talk about Gosset's t distribution, we'll use the t distribution to create confidence intervals for means, and we'll touch on the subject of profile likelihoods.

In the last lecture, we talked a little bit about the Central Limit Theorem, and we used the Central Limit Theorem to create a confidence interval; in that example, we created a confidence interval for a binomial proportion. Now we'll discuss the creation of better confidence intervals for small samples using Gosset's t distribution, for small samples where we're willing to treat the data as if they're continuous. Gosset's t distribution is often called Student's t distribution, and we'll explain why in a little bit. To discuss the t distribution, we first have to go through what the Chi-squared distribution is, so we'll develop that first.

At any rate, what you'll hopefully have noticed is that whenever we create confidence intervals, there seems to be some prevailing logic that we use: we try to create a probability statement, and then we, in a sense, manipulate the probability statement to generate an interval. Well, this strategy is codified here. Basically, we create a pivot: a statistic whose distribution doesn't depend on the parameter of interest.

So, for example, if you use the Central Limit Theorem: if you take a sample mean, subtract off the population mean that you're interested in, and divide by the standard error, well, that statistic clearly depends on the parameter of interest. But the distribution of that statistic, at least in the limit, doesn't depend on the parameter that you're interested in. Then, after we've created that pivot, we solve the probability statement that the pivot lies between bounds for the parameter. That's the kind of general strategy we'll go through. You don't have to really know or understand the strategy at a very general level, but just in case you're wondering why it always seems like we're generating confidence intervals using basically exactly the same technique, it's because we're employing a strategy kind of like this.
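As a quick sketch of this pivot strategy (an illustration in Python with scipy, not code from the lecture, which uses R): the CLT pivot, the sample mean minus mu over the standard error, has a limiting standard normal distribution, so solving the probability statement for mu gives the familiar interval.

```python
import math
from scipy.stats import norm  # standard normal quantile function


def mean_ci(xbar, s, n, alpha=0.05):
    """Interval for mu from the pivot (xbar - mu) / (s / sqrt(n)),
    whose limiting distribution (standard normal) is parameter-free."""
    z = norm.ppf(1 - alpha / 2)  # roughly 1.96 when alpha = 0.05
    se = s / math.sqrt(n)
    # Solving P(-z <= (xbar - mu) / se <= z) = 1 - alpha for mu:
    return xbar - z * se, xbar + z * se
```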

So let's talk about the Chi-squared distribution. Remember that S^2 is the notation we have been using for the sample variance. Let's further assume that the data that comprise the sample variance are all IID normal with mean mu and variance sigma squared. Well then, (n - 1) times the sample variance divided by sigma squared, (n - 1)S^2 / sigma^2, is a random variable that follows a Chi-squared distribution. The Chi-squared distribution has an index, something that differentiates between the different Chi-squared distributions, and we call that index the degrees of freedom. So this statement right here will be read: the normalized sample variance follows a Chi-squared distribution with n - 1 degrees of freedom. The Chi-squared distribution is a skewed distribution and, of course, since the sample variance has to be positive, it has support between zero and infinity. And the mean of the Chi-squared distribution is its degrees of freedom.

We can see that very directly, because we recall that the sample variance is an unbiased estimator of sigma squared; that's why we divide by n - 1 instead of n. So if you look at this equation and take the expected value, since the expected value of S^2 is sigma^2, you can see that in the expected value of (n - 1)S^2 / sigma^2 the sigma^2 cancels out and you get the degrees of freedom, n - 1, as the expected value. The variance of the Chi-squared, by the way, is twice the degrees of freedom. As an aside, and we're not actually going to spend a lot of time doing this, you can use this idea to create a confidence interval for the variance. So imagine I were to draw a Chi-squared density, where chi-squared with subscripts n - 1 and alpha denotes the alpha quantile of that distribution. Then imagine taking, say, the alpha over two quantiles; let's take alpha to be 0.05, for example, so the 2.5th and the 97.5th percentiles of the Chi-squared distribution, and looking at the probability that this Chi-squared random variable, (n - 1)S^2 / sigma^2, is between those two quantiles. Well, that has to be 1 - alpha, just by the definition of those being the 2.5th and 97.5th percentiles of the Chi-squared distribution. So the equality holds that 1 - alpha equals this probability. This statistic, (n - 1)S^2 / sigma^2, is our pivot. Let's solve for the parameter that we're interested in, sigma squared. When you do that, keep track of your inequalities, being sure to flip them when you invert everything, and you wind up with the following probability statement.

There's a (1 - alpha) probability that the random interval, from (n - 1)S^2 divided by the upper quantile to (n - 1)S^2 divided by the lower quantile, contains sigma squared.

5:38

So we call this interval, (n - 1)S^2 divided by the two quantiles, a confidence interval for sigma squared. And because the probability that the random interval contains the parameter it's estimating is 1 - alpha, we call it a 100 times (1 - alpha) percent confidence interval. So, as an example, alpha might be 0.05, and you would then wind up with a 95% confidence interval for the parameter sigma squared.
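Sketched in code (a Python illustration with scipy.stats; the lecture's own examples use R), the solved-out interval is:

```python
from scipy.stats import chi2  # chi-squared quantile function


def var_ci(s2, n, alpha=0.05):
    """Interval for sigma^2 from the pivot (n - 1) * S^2 / sigma^2,
    which is chi-squared with n - 1 degrees of freedom.
    Note the quantiles swap roles: dividing by the *upper* quantile
    gives the *lower* endpoint."""
    lo_q = chi2.ppf(alpha / 2, n - 1)      # e.g. the 2.5th percentile
    hi_q = chi2.ppf(1 - alpha / 2, n - 1)  # e.g. the 97.5th percentile
    return (n - 1) * s2 / hi_q, (n - 1) * s2 / lo_q
```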

Now, we should talk a little bit about what this confidence interval means. In the paradigm that we're thinking about here, it's the interval that's random; the parameter sigma squared is fixed. So when you actually collect data and form this confidence interval, it either contains sigma squared, which you don't know, or it doesn't. There's no probability attached to that statement anymore; it's either one or zero. So what's the actual interpretation of a confidence interval? Well, if you take an intro stat class, they make a lot of hay out of this point. They basically say: the confidence interval is a procedure such that, if you were to repeatedly do the experiment and form confidence intervals, 95% of the confidence intervals (if you're creating 95% confidence intervals) would contain the parameter that you're interested in.

And you could, as an example, do this in R. You could generate normal data, say from a normal distribution with mu equal to zero and variance sigma squared. You could formulate this confidence interval from the sample variance, and you could check whether or not that interval contained the sigma squared that you used for simulation. And you can repeat that process over and over and over again. You will find that about 95% of the intervals that you get, if you construct 95% confidence intervals, will contain the sigma squared that you used for simulation. That's the logic behind confidence intervals. They're notoriously hard to interpret, if you go for this sort of hardball interpretation.
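That repeated-experiment interpretation can be checked by simulation. The lecture describes doing this in R; here is an equivalent sketch in Python, where the sample size and sigma are arbitrary choices for illustration:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
n, sigma, alpha, reps = 20, 2.0, 0.05, 10_000
lo_q = chi2.ppf(alpha / 2, n - 1)      # 2.5th percentile, n - 1 df
hi_q = chi2.ppf(1 - alpha / 2, n - 1)  # 97.5th percentile, n - 1 df

covered = 0
for _ in range(reps):
    x = rng.normal(0.0, sigma, size=n)  # simulate normal(0, sigma^2) data
    s2 = x.var(ddof=1)                  # sample variance (divides by n - 1)
    lo, hi = (n - 1) * s2 / hi_q, (n - 1) * s2 / lo_q
    covered += lo <= sigma**2 <= hi

print(covered / reps)  # close to 0.95
```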

A much weaker interpretation of the confidence interval, one that's a little less specific, is that you get two numbers out, and those two numbers are an interval estimate of the parameter you want to estimate, an interval estimate that incorporates uncertainty.

So let's go through a couple of comments about this interval. One thing is that this interval is not terribly robust to departures from normality; if your data are not normal, then this confidence interval tends to not be that great. Also, if you want a confidence interval for the standard deviation instead of the variance, you can just take the square root of the endpoints of the interval. The probability statement, that 1 - alpha equals the probability that the random interval contains sigma squared, still holds when you say that 1 - alpha equals the probability that the square roots of the endpoints bracket sigma; you haven't mathematically changed anything. So if you want an interval for sigma, you just square root the endpoints. Now, you might be wondering: given that this interval so heavily requires normality, do we have any other solutions if we want a confidence interval for the variance?

And it turns out the answer is yes, in several ways, but bootstrapping is kind of the way that I prefer. We're not going to talk about bootstrapping in today's lecture, though. So today we're only going to talk about this confidence interval when you happen to be willing to stomach the assumption that your data are exactly Gaussian, and you're willing to live with the consequence that the interval you obtain is not going to be terribly robust to departures from that assumption. The other thing I wanted to mention, kind of a nifty little point, is this: suppose you wanted to create a likelihood for sigma, where the underlying data are Gaussian with mean mu and variance sigma squared. That's hard, because you have two parameters: the likelihood is a bivariate function, with mu on one axis, sigma on the other axis, and the likelihood on the vertical axis. So there's a little trick you can use to create what I would call a marginal likelihood for sigma squared. It turns out, and we're not going to cover the mathematics behind this.

But consider what happens if you take (n - 1)S^2 and don't divide by sigma squared. Well, first of all, that can't be Chi-squared; let me just logic through that real quick. It can't be Chi-squared because the Chi-squared density doesn't have any units, right? S^2 has the square of whatever units the original data have; say the data are in inches, then S^2 has inches-squared units. If you haven't divided by anything with inches-squared units, then (n - 1)S^2 has inches-squared units, and so it can't follow a distribution that's unit-less like the Chi-squared distribution. That's one of the reasons you have to remember to divide by sigma squared: it gets rid of the units and gives you the Chi-squared distribution. So, again, suppose we don't divide by sigma squared.

Then you end up with a so-called gamma distribution. The gamma is indexed by two parameters, its shape parameter and its scale parameter; in this case, the shape parameter is (n - 1)/2 and the scale parameter is 2 sigma squared. Either way, what you have is data: a single number, (n - 1)S^2. If you're willing to assume the data points that comprise that number are Gaussian, then you can take the gamma density, plug in the data, view it as a function of the parameters, and plot a likelihood function. So I'll go through an example of doing this. In our organolead manufacturing workers example that we've looked at before, there was an average total brain volume of 1,150 cubic centimeters with a standard deviation of 105.977.

Let's assume normality of the underlying measurements, which is not really the case, but let's do it, and let's calculate a confidence interval for the population variation in total brain volume. I give the R code here. I gave the standard deviation, so our variance is 105.977^2. Our n in this case is 513. We want a 95% confidence interval, so our alpha is 0.05. The quantiles that we want we can just grab with the qchisq function; this function right here grabs the two quantiles. And then our interval is just (n - 1)S^2 divided by the quantiles. That puts it out from bigger to smaller, and I want it from smaller to bigger, so I use the rev function to reverse it; if I had just input my quantiles in the reverse direction, I would have been okay too. And then, here, just take the square root of that interval to get an interval for the standard deviation. And we get that the interval is about 100 to 113.
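The R computation just described can be sketched in Python as follows (a reconstruction for illustration; the lecture shows the actual R code on the slide):

```python
import numpy as np
from scipy.stats import chi2

s, n, alpha = 105.977, 513, 0.05
# Grab the two quantiles, like qchisq in R:
quantiles = chi2.ppf([alpha / 2, 1 - alpha / 2], n - 1)
# Divide (n - 1) S^2 by the quantiles, reversed so the interval
# comes out from smaller to bigger (like rev in R):
var_interval = (n - 1) * s**2 / quantiles[::-1]
sd_interval = np.sqrt(var_interval)  # square root for sigma itself
# sd_interval is roughly (100, 113)
```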

So this interval, 100 to 113, is created in such a way that, if the assumptions of the interval are correct, namely that the underlying data are IID normal with a fixed variance sigma squared and a fixed mean mu, then, if the procedure were repeated over and over again, 95% of the intervals we obtain would contain the true standard deviation that we're trying to estimate. Let's actually plot the likelihood as well, using the likelihood trick that I gave.
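The trick about to be walked through in R can be sketched in Python like this (an assumed translation; the variable names are mine, not the lecture's):

```python
import numpy as np
from scipy.stats import gamma

s, n = 105.977, 513
data = (n - 1) * s**2                    # the single observed number, (n - 1) S^2
sigma_vals = np.linspace(90, 120, 1000)  # grid around the 100-to-113 interval

# Gamma likelihood for sigma: the shape, (n - 1) / 2, is fixed,
# while the scale, 2 * sigma^2, varies along the grid.
lik = gamma.pdf(data, a=(n - 1) / 2, scale=2 * sigma_vals**2)
lik = lik / lik.max()                    # normalize so the maximum is one
```

Plotting sigma_vals against lik gives the marginal likelihood curve shown on the lecture's next slide; it peaks at sigma equal to the sample standard deviation.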

So sigmaVals is the sequence of sigma values over which I want to plot. And actually, I don't have to guess this range, because I just created the confidence interval on the previous page that went from 100 to 113; so, for good measure, let's go from 90 to 120, and I want to plot 1,000 points. In R, you have to be pretty specific about the range you want to plot and how many points go into your plot. Then I just give you the code here for evaluating the gamma likelihood. It basically says: plug in the data, (n - 1)S^2, and remember the likelihood views that as fixed. The shape doesn't involve anything other than things we know, (n - 1)/2, and the scale is the part that varies, 2 sigma^2. We evaluate it over all the sigma values that I assigned in the previous line, so this will evaluate that likelihood over 1,000 points and return a vector of length 1,000.

I want to normalize my likelihood, and I'll do that, at least approximately, by taking this vector and dividing by its maximum value. Then I'll plot it; type = "l" means plot it as a line instead of as a bunch of points, and then these two lines commands add the one-eighth and one-sixteenth reference lines. On the next page you actually see the marginal likelihood for sigma. So that's a whirlwind tour of confidence intervals and likelihoods for variances when you're willing to assume your data are exactly Gaussian. I hesitate to say this, but those slides aren't exactly terribly useful material.

You won't find a lot of people plotting marginal likelihoods for sigma; I just gave it to you because it's kind of a nifty little result. And, to be honest, you don't see the Gaussian confidence intervals for variances as much either; people would tend to do bootstrapping these days instead, or use some other more robust technique. So this material is neat, but the primary thing it did was actually introduce the Chi-squared distribution. Next, we're going to talk about something that's incredibly useful, probably one of the single most used distributions and techniques in all of
