Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.


From the course by Johns Hopkins University

Mathematical Biostatistics Boot Camp 2



From the lesson

Two Binomials

In this module we'll be covering some methods for looking at two binomials. This includes the odds ratio, relative risk, and risk difference. We'll be discussing mostly confidence intervals in this module, and we'll develop the delta method, the tool used to create these confidence intervals. After you've watched the videos and tried the homework, take a crack at the quiz!

- Brian Caffo, PhD, Professor, Biostatistics

Bloomberg School of Public Health

Okay, so last time we considered absolute changes in proportions.

But what about relative changes?

And as I said, relative changes are often of interest.

For example, when the proportions are small, you might have, as in the example we were discussing earlier, a ten-fold change in the exposed group contracting the disease relative to the control group, the non-exposed group, but the absolute difference might be very small, a fraction of 1%.

And so in that case it's very natural

to consider relative amounts rather than absolute amounts.

The most obvious thing to do in the case of proportions is to define

the relative risk, which is simply one of the proportions divided by the other.

And then of course it doesn't matter which one you put in the numerator and which one you put in the denominator, as long as you remember what you did.

So, just to remind you, we have p1 and p2, population parameters, defined here. Because remember, on the previous slide we talked about a model where our data are iid draws from a population.

There's other ways to think about this problem

but that's how we're going to think about it.

Okay, so the obvious estimator of the relative risk is the

estimated relative risk which is p1 hat divided by p2 hat.

Which of course is just x over n1 divided by y over n2.

And so with the relative risk there's obviously an issue if y has zero counts, right? You're dividing by zero. So that's potentially a problem.

But also, if we want to create an asymptotic confidence interval where we treat our estimator as if it's normal, there's a problem: the relative risk is of course bounded from below by zero.

And it turns out that the log relative risk, the log estimated relative risk, has much better asymptotic behavior, and the confidence interval looks a little bit better.

So what people do is take logs; when you're working with ratios, we know already that it's very convenient to look at logs. So the log relative risk is just the log of this quantity here, x over n1 divided by y over n2.

And that quantity has an asymptotic standard error that you can calculate; in fact, in one of the next two lectures or so, we show you exactly how to get this standard error.

That standard error works out to be (1 - p1)/(p1 n1) plus (1 - p2)/(p2 n2), all raised to the one half power.

And that's the standard error for the log relative risk.
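To make this concrete, here's a minimal sketch in Python. All the counts are made up, and 1.96 is the standard normal quantile for a 95% interval; the interval is built on the log scale and then exponentiated back to the relative risk scale.

```python
import math

# Hypothetical data: x events out of n1 in group 1, y out of n2 in group 2.
x, n1 = 30, 1000
y, n2 = 10, 1000

p1_hat, p2_hat = x / n1, y / n2
log_rr = math.log(p1_hat / p2_hat)  # log estimated relative risk

# Plug-in estimate of the asymptotic standard error of the log relative risk:
# sqrt((1 - p1)/(p1*n1) + (1 - p2)/(p2*n2))
se_log_rr = math.sqrt((1 - p1_hat) / (p1_hat * n1) +
                      (1 - p2_hat) / (p2_hat * n2))

z = 1.96  # standard normal quantile for a 95% interval
lo, hi = log_rr - z * se_log_rr, log_rr + z * se_log_rr

# Exponentiate the endpoints to get an interval for the relative risk itself.
rr_interval = (math.exp(lo), math.exp(hi))
```

With these made-up counts the estimated relative risk is 3, and the interval on the relative risk scale is asymmetric around it, which is what logging and exponentiating buys you.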

So what you do to create a confidence interval for

the relative risk is you log the estimated relative risk.

You add and subtract a standard normal quantile multiplied by this standard error of the log relative risk.

That will give you a confidence interval for the Log relative risk.

And if you then want a confidence interval for the relative risk, you exponentiate the end points.

Okay, so we could equivalently

take the ratio of odds, instead of the ratio of probabilities, if you prefer.

There are a variety of reasons to do that, but one is simply that you may prefer to interpret things in terms of odds rather than probabilities.

Anyway, some people like to think about things in terms of odds.

Saying odds of, say, 10 to 1 is more interpretable than saying a probability of ten elevenths.

So the population odds ratio would just be p1 over 1 minus p1, divided by p2 over 1 minus p2.

The odds associated with probability 1 divided

by the odds associated with probability 2.

So this would be the ratio of the odds

of contracting side effects comparing drug A to drug B.

And of course, just like the relative risk, you'll be comparing the odds ratio to 1.

If it's bigger than 1, it's going to suggest a greater propensity of side effects for drug A.

And if it's less than 1, it's going to suggest

a smaller propensity for side effects of drug A.

And then, of course, the obvious estimate of the odds ratio is p1 hat divided by 1 minus p1 hat, all over p2 hat divided by 1 minus p2 hat. And you can plug the x and n1 and y and n2 into that formula.

What you'll find, if we go to our notation where we talk about the cell counts, is that it works out to be n11 times n22, divided by n21 times n12. And the easy way to remember that is it's the so-called cross product ratio.
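You can check that the two forms agree with a quick sketch; all the counts here are made up.

```python
# Hypothetical 2x2 table: rows are drug A / drug B, columns are
# side effects / no side effects.
n11, n12 = 44, 56   # drug A: side effects, none
n21, n22 = 19, 81   # drug B: side effects, none

p1_hat = n11 / (n11 + n12)
p2_hat = n21 / (n21 + n22)

# Odds ratio from the definition: ratio of the two odds...
or_from_odds = (p1_hat / (1 - p1_hat)) / (p2_hat / (1 - p2_hat))

# ...and as the cross product ratio: one diagonal product over the other.
or_cross_product = (n11 * n22) / (n12 * n21)
```

The two expressions are algebraically identical, because the row totals cancel out of the odds.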

So it's the product along one diagonal divided by the product along the other diagonal. And that way you can always remember how to do it.

If you label the elements of the two by two table a, b, c, d, some people tend to remember it in terms of the specific letters. But I think it's probably better to remember it as the product along one diagonal divided by the product along the other, off diagonal.

So just like for the relative risk, the estimated odds ratio has better asymptotic Gaussian behavior if you log it.

So we can calculate a standard error for the log odds ratio that works out to be an incredibly convenient formula.

It's the square root of the sum of 1 over each of the cell counts: 1 over n11, plus 1 over n12, plus 1 over n21, plus 1 over n22, square root of the whole thing.

So you shouldn't forget the standard error of

the log odds ratio, because it's super easy.

So again, to create a confidence interval for

the population odds ratio, you calculate the odds ratio.


Log it.

Calculate the standard error of the log odds ratio.

Add and subtract a standard normal quantile, say 1.96 if you want a 95% confidence interval, to your log odds ratio.

And that will give you an interval estimate for the log odds ratio.

Exponentiate the end points and that gives

you an interval estimate for the odds ratio.

Now, again, we have a little bit of a problem if n12 or n21 is 0. There are some fixes for that; adding a small amount to every cell is a common fix.
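Here's a minimal sketch of that recipe, with a hypothetical table; the `add` argument is an optional amount (commonly 0.5) added to every cell for the zero-count case.

```python
import math

def log_or_interval(n11, n12, n21, n22, z=1.96, add=0.0):
    """Wald interval for the log odds ratio from a 2x2 table.

    `add` is an optional amount added to every cell, a common fix
    when one of the counts is zero.
    """
    a, b, c, d = (n + add for n in (n11, n12, n21, n22))
    log_or = math.log((a * d) / (b * c))  # cross product ratio, logged
    # Standard error: square root of the sum of reciprocal cell counts.
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)
    return log_or - z * se, log_or + z * se

# Hypothetical table; exponentiate the endpoints to get back
# to the odds ratio scale.
lo, hi = log_or_interval(44, 56, 19, 81)
or_lo, or_hi = math.exp(lo), math.exp(hi)
```

With `add=0.5` the same function handles a table containing a zero cell without dividing by zero or logging zero.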

Okay so some comments about the odds ratio.

So one interesting fact: notice what happens if you transpose the little two by two table as a matrix. In this case we had side effects versus none labeled along the columns, and I believe drug A versus drug B labeled along the rows.

But what if we instead entered the data with side effects along the rows and drug A versus drug B along the columns, and then calculated an odds ratio? You would get identically the same number.

So the odds ratio doesn't care what is the sort of predictor and

what is the response. You get the same exact answer.
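This symmetry is easy to check numerically; again the counts are made up.

```python
# Hypothetical 2x2 table and its transpose.
table = [[44, 56],
         [19, 81]]
transposed = [[44, 19],
              [56, 81]]

def odds_ratio(t):
    # Cross product ratio: diagonal product over off-diagonal product.
    return (t[0][0] * t[1][1]) / (t[0][1] * t[1][0])
```

Transposing swaps the two off-diagonal cells, and since multiplication is commutative, both the numerator and the denominator of the cross product ratio are unchanged.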

And for both the odds ratio and the relative risk, taking logs helps with adherence to the error rate.

So again, because it's asymptotic, if you want a 95% confidence interval you don't exactly get a 95% confidence interval, because it's all based on the central limit theorem, and that's only true in the limit.

So taking logs helps adherence to that error rate.

And then the interval is just the log estimate, right, the log odds ratio or log relative risk, plus or minus the appropriate normal quantile times the standard error of the estimate. And then exponentiating the end points puts the interval on the natural scale.

And I would also add that, though the logging helps, these intervals still do have some issues. They're better than the risk difference interval, I think, but still, there are ways you can improve on these confidence intervals. They involve somewhat lengthier calculations, though, so we won't go over them. These are the quick versions.

Okay, so let's go over our example next.

