0:04

So, welcome back. The final thing we're going to do in today's lecture is to talk about the profile likelihood, which is a method for creating univariate likelihoods from multivariate likelihoods. In this case, we're going to look at the normal distribution, which has two parameters, mu and sigma, so its likelihood is bivariate. And we're going to figure out how to get likelihoods for mu alone; you could equivalently do it to get likelihoods for sigma alone.

And here's the idea. The likelihood is a bivariate surface: it has mu on one axis, sigma on another axis, and the surface above that plane. To obtain a likelihood for mu, profiling basically imagines we took a lamp, shone it along the sigma direction, and looked at the shadow that the likelihood cast on the plane defined by the mu direction. That's exactly what profiling gives, so its name is exactly indicative of the technique. Now we'll go through how you actually execute the mathematics to do that.

So, in other words, we want to shine the light on this bivariate likelihood and get the function that the shadow traces onto, say, the wall. Let's pick a particular value mu 0, and ask: what's the height of the curve in the shadow at mu 0? Well, the light passes everywhere above the likelihood and gets stopped everywhere on and below it, up to its maximum height. So what we basically do is maximize the joint likelihood over sigma with mu fixed at mu 0. Then this process is repeated for lots of values of mu 0.
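That recipe, fix mu at mu 0, maximize the joint likelihood over sigma, repeat over a grid of mu 0 values, can be sketched numerically. This is a hedged illustration in Python with simulated data (the lecture itself works in R), using a crude grid search over sigma rather than the closed-form maximizer derived below:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.5, scale=1.0, size=50)  # simulated Gaussian data (illustrative)
n = len(x)

def log_lik(mu, sigma):
    # Gaussian log-likelihood for iid data, up to an additive constant
    return -n * np.log(sigma) - np.sum((x - mu) ** 2) / (2 * sigma ** 2)

mu_grid = np.linspace(0.0, 3.0, 301)
sigma_grid = np.linspace(0.3, 3.0, 400)

# Profile out sigma: for each fixed mu0, keep the maximum over sigma
profile = np.array([max(log_lik(mu0, s) for s in sigma_grid) for mu0 in mu_grid])

mu_hat = mu_grid[np.argmax(profile)]  # maximizer of the profile likelihood
```

The maximizer of this profiled surface sits at (approximately, up to grid resolution) the sample mean, foreshadowing the closed-form result worked out next.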

So, let's actually go through it. The joint likelihood with mu fixed at mu 0 is just the Gaussian density, and the data are independent, so we take a product out front. Each factor is (sigma squared) to the minus one-half times e to the minus (xi minus mu 0) squared divided by 2 sigma squared. Collect all the terms and you get the next line.

With mu 0 fixed, you can derive the maximum likelihood estimator for sigma squared: log the likelihood, take derivatives, and solve for sigma squared. Maybe, just so you don't accidentally take the derivative with respect to sigma, replace sigma squared by, say, theta, so you remember that you're treating sigma squared as the parameter, not sigma. If you accidentally take derivatives with respect to sigma, you'll get the square root of this answer, of course. You then wind up with the sum from i equals 1 to n of (xi minus mu 0) squared, divided by n.

That's actually a nice result, right? If you fix mu at a particular value, then your MLE for sigma squared looks like the sample variance, except that instead of plugging in the sample mean and taking deviations around it, you take deviations around that specific fixed value of the mean.
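As a quick sanity check on that closed form, here's a hedged Python sketch with simulated data: maximize the log-likelihood over theta = sigma squared numerically with mu held at an arbitrary mu 0, and compare against the mean of the squared deviations around mu 0:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(2.0, 1.0, size=40)  # simulated data (illustrative)
n = len(x)
mu0 = 1.7                          # an arbitrary fixed value for mu

def log_lik_theta(theta):
    # log-likelihood in theta = sigma^2 with mu fixed at mu0 (constants dropped)
    return -(n / 2) * np.log(theta) - np.sum((x - mu0) ** 2) / (2 * theta)

theta_grid = np.linspace(0.2, 5.0, 20001)
theta_hat_numeric = theta_grid[np.argmax(log_lik_theta(theta_grid))]

# The closed form from the derivation: deviations taken around mu0, not x bar
theta_hat_closed = np.mean((x - mu0) ** 2)
```

The two estimates agree to within the grid resolution.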

Â 3:12

So, it's a nice little result. Anyway, with mu 0 fixed, the maximum likelihood estimator for sigma squared is this generalization of the variance right there. That's the peak of our likelihood: the point where the light switches from being blocked by the likelihood to passing just over it. And that's the point that gets shadowed onto the wall at mu 0.

So we want to plug this back into the likelihood, and we get this function right here: the sum of (xi minus mu 0) squared over n, raised to the minus n over 2 power, times e to the minus n over 2. And the e to the minus n over 2 is irrelevant because it doesn't involve mu 0. That's for one mu 0; if we did it for every mu 0, we would get a function. So our profile likelihood is this function: the sum of (xi minus mu) squared, raised to the minus n over 2 power.

And again, this function is clearly maximized at mu equals x bar; you can, of course, solve for that. But in general, one nice property of the profile likelihood is that its maximizer, the maximum profile likelihood estimate, is also your MLE for the parameter. So in this case, the maximum of the profile likelihood for mu is going to be at x bar, the same as the MLE from the full bivariate likelihood. So if we wanted to divide this by its peak value, we would simply divide by the same expression with x bar plugged in for mu. And that would normalize the function so it tops out at 1.
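Those two facts, the normalized profile likelihood equals 1 at x bar and nowhere exceeds 1, can be sketched in a few lines of Python. This is an illustration with simulated data, not the lecture's R code:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(1.0, 1.0, size=30)  # simulated data (illustrative)
n = len(x)
xbar = x.mean()

def profile_lik(mu):
    # normalized profile likelihood: ratio of sums of squares, to the -n/2 power
    ratio = np.sum((x - mu) ** 2) / np.sum((x - xbar) ** 2)
    return ratio ** (-n / 2)

mu_grid = np.linspace(xbar - 1, xbar + 1, 1001)
vals = np.array([profile_lik(m) for m in mu_grid])
```

Because x bar minimizes the sum of squared deviations, the ratio inside is always at least 1, so the curve peaks at exactly 1 at mu equals x bar.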

So, let's actually go through the R code to generate this function for mu for the sleep data. Our muVals go from, say, zero to three, a thousand of them, and we compute the likelihood at each of those thousand mu 0 values. Each likelihood value is just the sum of (xi minus mu) squared, raised to the minus n over 2 power, that's this term right here. But I want it maxed out at one, so normally I would create the likelihood and then divide by its maximum value. In this case, though, I know exactly where the maximum occurs: when you replace mu by the sample mean. So instead I divide by the value at the mean right here, and this sapply is just a loop; it says loop over the mu values and apply this function. Then I plot them, connecting the points with type equals "l", draw reference lines at likelihood values of 1 eighth and 1 sixteenth, and I get this plot.
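That R snippet can be mirrored in Python. The numbers below are the paired differences (group 2 minus group 1) from R's built-in sleep data, written out so the sketch is self-contained; the plotting step is omitted:

```python
import numpy as np

# Paired differences from R's built-in sleep data (group 2 minus group 1),
# reproduced here so the sketch is self-contained
diffs = np.array([1.2, 2.4, 1.3, 1.3, 0.0, 1.0, 1.8, 0.8, 4.6, 1.4])
n = len(diffs)

mu_vals = np.linspace(0, 3, 1000)  # mirrors muVals <- seq(0, 3, length = 1000)
lik = np.array([
    (np.sum((diffs - m) ** 2) / np.sum((diffs - diffs.mean()) ** 2)) ** (-n / 2)
    for m in mu_vals
])                                  # normalized so that the peak is 1

# mu values whose profile likelihood sits above the 1/8 reference line
in_interval = mu_vals[lik > 1 / 8]
lo, hi = in_interval.min(), in_interval.max()
```

The endpoints lo and hi are the likelihood-based interval you would read off the plot by drawing a horizontal line at 1/8.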

So that is my profile likelihood for mu. That is the function I get if I take the bivariate likelihood for mu and sigma, place a light along the direction of the sigma axis, and look at the shadow on the wall; this is the outline of that shadow. And that's called the profile likelihood. There are many theoretical properties of the profile likelihood, but most importantly, you can treat it roughly as if it were a standard univariate likelihood. So you would treat this just like a regular likelihood for mu: higher values are better supported, the peak is where the maximum likelihood estimate occurs, and you can draw horizontal lines to get likelihood-based intervals for mu.

Â 6:39

Well, that's the end of today's lecture. We gave you many ways to create confidence intervals. We gave you methods for creating T confidence intervals, and a method for creating a confidence interval for a variance, maybe not the most useful one, but we did it. We also showed you lots of really neat techniques, well known in statistics circles but not generally well known, for generating likelihoods when you have Gaussian data. All of these techniques you could use in practice: if you have data and you're willing to assume they're Gaussian, all of the techniques apply. The T confidence interval in particular is a very robust interval; as long as your data look roughly mound-shaped, you're going to be okay.

The last thing I always mention is a question I always get about the T confidence interval: the T confidence interval and the standard normal confidence interval look the same, except with the standard normal quantile replaced by a T quantile. People always ask me at what sample size they should switch from the T confidence interval to the standard normal confidence interval. But the point is that the T confidence interval limits to the standard normal confidence interval. So my answer is: just always do a T confidence interval, and never do a standard normal confidence interval. Then you don't even have to worry about it, because if your sample size is big enough, the T quantile looks like a normal quantile anyway. So hopefully that answers that question.
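To make the "just always use T" advice concrete, here's a small hedged check (assuming scipy.stats is available): as the degrees of freedom grow, the 97.5th-percentile T quantile settles onto the standard normal quantile of about 1.96:

```python
from scipy.stats import norm, t

z = norm.ppf(0.975)                      # standard normal quantile, about 1.96
t_quantiles = {df: t.ppf(0.975, df) for df in (5, 30, 1000)}

# The t quantile shrinks toward z as the degrees of freedom grow
gap = t_quantiles[1000] - z              # nearly zero for large samples
```

So for large samples the two intervals coincide for all practical purposes, which is exactly why defaulting to T costs you nothing.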

And I look forward to seeing you next time, when we'll expand confidence intervals to more general settings where we have multiple groups.
