Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.

From the course by Johns Hopkins University

Mathematical Biostatistics Boot Camp 2

From the lesson

Techniques

This module is a bit of a hodge podge of important techniques. It includes methods for discrete matched pairs data as well as some classical non-parametric methods.

- Brian Caffo, PhD, Professor, Biostatistics

Bloomberg School of Public Health

Okay, so let's just discuss maximum likelihood a little bit more.

The value of theta where the curve reaches its maximum is the so-called

maximum likelihood estimator.

So, it is the value of the parameter that is most well supported by the data

given the likelihood.

So, it's called the maximum likelihood estimator or MLE.

So we could just define the MLE as the argument maximum of the likelihood

over theta.
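Written out, that definition looks as follows. The symbols are an assumption, since the lecture states it only in words: $\mathcal{L}(\theta; x)$ stands for the likelihood of the parameter $\theta$ given the observed data $x$.

```latex
\hat{\theta}_{\mathrm{MLE}} = \operatorname*{arg\,max}_{\theta} \; \mathcal{L}(\theta; x)
```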

And it has this nice interpretation that the MLE is the value of the parameter

that makes the data that we observed most probable, right.

So the likelihood is kind of thinking of the joint probability

of the data as a function of the parameter.

So it's sort of like tuning that parameter to where it makes

the data that we observed most probable, which seems to make sense because

we did observe the data that we observed, so it must be somewhat probable.

So here's some results.

If we have some normal data, iid normal data, the MLE of mu is X bar, and

the MLE of sigma squared is the biased sample variance.

We divide by N instead of N minus 1.

If X1 to Xn are Bernoulli, then the MLE of p is X bar, the sample proportion of 1s.

If the Xi are binomial(n_i, p), then the MLE of

p is the total proportion of 1s, okay?

If the Xs are Poisson lambda t, if an X is Poisson lambda t,

then the MLE of lambda is X/t, the rate.

If you have a bunch of iid Poisson random variables then the MLE of

lambda is the total number of events divided by the total monitoring time, okay?
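As a quick check on these closed-form results, here is a minimal Python sketch. All of the data values are hypothetical, invented only for illustration, and the Poisson part assumes each count was observed over its own known monitoring time.

```python
import statistics

# Hypothetical normal sample (invented for illustration).
normal_sample = [4.1, 5.3, 4.8, 6.0, 5.2]
n = len(normal_sample)

# Normal model: the MLE of mu is the sample mean.
mu_hat = sum(normal_sample) / n
# Normal model: the MLE of sigma^2 divides by n, not n - 1.
sigma2_hat = sum((x - mu_hat) ** 2 for x in normal_sample) / n
# The usual unbiased sample variance divides by n - 1 and is larger.
unbiased_variance = statistics.variance(normal_sample)

# Hypothetical Bernoulli sample: the MLE of p is the proportion of 1s.
bernoulli_sample = [1, 0, 0, 1, 1, 0, 1, 1]
p_hat = sum(bernoulli_sample) / len(bernoulli_sample)

# Hypothetical Poisson counts, each seen over its own monitoring time:
# the MLE of lambda is total events divided by total time.
counts = [5, 2, 7]
times = [94.0, 40.0, 120.0]
lambda_hat = sum(counts) / sum(times)

print(mu_hat, sigma2_hat, unbiased_variance, p_hat, lambda_hat)
```

In Python's standard library, `statistics.pvariance` is the divide-by-n version and `statistics.variance` is the divide-by-(n - 1) version.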

So let's go through this example right here,

where you saw 5 failure events for

94 days of monitoring a nuclear pump.

Assuming a Poisson model, plot the likelihood.

And by the way, right, we know what the MLE is right.

The MLE is 5 over 94.

Okay, let's see, so my lambda is

the parameter I'm interested in.

So I'm just going to create a grid of lambda values to create my function.

My likelihood is just the Poisson density,

now viewed as a function of all these lambda parameters.

But remembering that in each case I sampled for

94 days with the data fixed at 5.

And then the MLE for lambda,

lambda hat is equal to 5/94,

so 94 times lambda hat is just 5.

So if I plug in 5 for the Poisson mean, 94 times lambda,

I will get the likelihood at the MLE, okay?
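The grid computation described here can be sketched in Python using only the standard library. The grid range and resolution are arbitrary assumptions, and the helper names `poisson_pmf` and `normalized_likelihood` are made up for this sketch. It is not the code shown in the lecture.

```python
import math

x, t = 5, 94  # 5 failure events in 94 days, from the lecture

def poisson_pmf(k, mean):
    """Poisson probability of k events when the expected count is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def normalized_likelihood(lam):
    """Likelihood of rate `lam`, divided by the likelihood at the MLE.

    At the MLE the Poisson mean t * lambda_hat equals x, so the
    denominator is the Poisson probability of x with mean x.
    """
    return poisson_pmf(x, lam * t) / poisson_pmf(x, x)

# Grid of candidate lambda values from 0 to 0.2 (range chosen arbitrarily).
grid = [i * 0.2 / 1000 for i in range(1001)]
like = [normalized_likelihood(lam) for lam in grid]

lambda_hat = x / t
best = max(range(len(grid)), key=lambda i: like[i])
print("MLE:", lambda_hat, "grid maximizer:", grid[best])
```

Dividing by the likelihood at the MLE is what makes the curve peak at 1.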

So I plot my likelihood against lambda, and I drop the frame around the plot because I don't like it.

Line width is 3, type = "l" connects it in a line.

So if you type expression,

it'll actually put the symbol lambda rather than the word lambda.

So you can be fancy like that if you'd like.

And then, here's this red line, which is just showing where the MLE is.

And then of course the likelihood achieves 1 at that point.

And then I draw these two kind of reference lines,

I'll talk about them in a second.

Okay, so this is the likelihood.

So this is a plot of the estimated rate by the evidential support for

that estimated rate.

So if we wanted to compare this point and this point,

it would be that ratio of those two guys, okay?

Now, I like to draw these reference lines for the following reasons.

So this reference line is at one-eighth, and

this reference line is at one-sixteenth, okay?

And the reason I like to draw these reference lines is as follows.

This point, so I know that this line is one-eighth right here.

So, take this value of lambda right there, and

compare it to the MLE value of lambda, right.

This ratio, because the top is 1, and the bottom is one-eighth,

that ratio is 8 if you put the MLE in the numerator, and

one-eighth if you put this value in the numerator.

The same thing would go for this value, right here at that corner.

So, basically, the MLE is eight times better supported than that point.

But then if I take any other point in between the two, right,

its height is above one-eighth, even though it's a little bit less than the height at the MLE,

so we know that it's going to be slightly less well-supported than the MLE, and

the MLE will be less than eight times better supported than that point.

Okay, but, what's interesting right,

if you were to take, now remember

this is one-eighth right here.

Take this point right here, right?

Okay, that value of lambda, right,

its value is less than one-eighth.

So when we compare it to the MLE,

the ratio would be worse than one-eighth.

The MLE would be more than eight times better supported than that point.

Okay, so for any point in this range, right, you cannot find

a point that's more than eight times better supported, right?

For every point outside of that range, you can

find a point that is more than eight times better supported.

So this collection of points right here, this collection of lambda values

right here are exactly the points such that there is no other point that

is more than eight times better supported if we draw that line at one-eighth.

Of course this is all predicated on having normalized the likelihood

so that its value at the MLE is 1.

And this line right here, all of the points of lambda that fall within this

interval, right, because it's one-sixteenth, are the points

such that there is no point that is more than 16 times better supported.
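The endpoints of these one-eighth and one-sixteenth intervals can be found numerically. Below is a minimal Python sketch for the pump example that uses bisection on each side of the MLE. The function names, the search brackets, and the tolerance are assumptions made for illustration, not something taken from the lecture.

```python
import math

x, t = 5, 94          # 5 failure events in 94 days, from the lecture
lambda_hat = x / t    # MLE of the rate

def normalized_likelihood(lam):
    """Poisson likelihood of `lam` divided by the likelihood at the MLE."""
    return (lam / lambda_hat) ** x * math.exp(-(lam - lambda_hat) * t)

def crossing(lo, hi, level, tol=1e-10):
    """Bisection for the lambda in [lo, hi] where the curve equals `level`.

    Assumes the curve is monotone on the bracket, which holds on
    either side of the MLE for this single-peaked likelihood.
    """
    lo_above = normalized_likelihood(lo) > level
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (normalized_likelihood(mid) > level) == lo_above:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def likelihood_interval(level, upper_bound=1.0):
    """All lambda whose normalized likelihood is at least `level`."""
    left = crossing(1e-12, lambda_hat, level)
    right = crossing(lambda_hat, upper_bound, level)
    return left, right

print("1/8 interval: ", likelihood_interval(1 / 8))
print("1/16 interval:", likelihood_interval(1 / 16))
```

The one-sixteenth interval always contains the one-eighth interval, since a lower reference line cuts the curve further out on both sides.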

And that's kind of useful, right, because, and

you can do these experiments, so where do I get 8 and 16?

So 8 is 2 cubed and 16 is 2 to the 4th.

I get those because if you do these coin flipping experiments where people are

comparing the likelihood of whether a coin is two-headed versus the likelihood

of the coin being fair, you find that kind of intuitively, people start switching.

If they get three consecutive heads, they start saying, oh,

that coin appears to be unfair, to the tune of being two-headed.

And after about four consecutive heads,

they're quite certain that the coin is unfair.
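The arithmetic behind the 8 and 16 can be sketched in a few lines of Python. The function name is made up for this illustration. A two-headed coin shows heads with probability 1 and a fair coin with probability one-half, so n consecutive heads give a likelihood ratio of 2 to the n.

```python
def heads_likelihood_ratio(n_heads):
    """Likelihood ratio of 'two-headed' versus 'fair' after n straight heads."""
    p_two_headed = 1.0 ** n_heads  # a two-headed coin always shows heads
    p_fair = 0.5 ** n_heads        # a fair coin shows heads half the time
    return p_two_headed / p_fair

for n in (3, 4):
    print(n, heads_likelihood_ratio(n))  # prints "3 8.0" then "4 16.0"
```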

So, I consider these kind of like moderate and

sort of strong evidence in favor of the points within these lines.

So just to reiterate, if you take this interval, it's

basically the likelihood equivalent of a confidence interval.

And so if you take all of these points, there is no such point

that is more than eight times better supported given the data and the model.

If you take a point outside of it, say this point, we know at least one point,

namely the MLE, that's more than eight times better supported.

So those are the so-called likelihood intervals.

And you might say, well wait,

the one-eighth is kind of arbitrary, but it's no more or

less arbitrary than say constructing a 95% confidence interval.

What you really want to give people is the full plot,

because it conveys all of the relevant information.

The only problem with likelihood is if you have

multiple parameters like in a regression setting, then you have to figure out

how to display the likelihood for just the parameter you're interested in.

But I wanted everyone to at least hear about the likelihood.

The remainder of the class, we'll be focusing more on frequency style inference

that does not show likelihood based plots.

But I wanted people to be aware of this style of inference.
