Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.

From the course by Johns Hopkins University

Mathematical Biostatistics Boot Camp 2

From the lesson

Techniques

This module is a bit of a hodge podge of important techniques. It includes methods for discrete matched pairs data as well as some classical non-parametric methods.

- Brian Caffo, PhD, Professor, Biostatistics

Bloomberg School of Public Health

Hi, my name is Brian Caffo, and this is Mathematical Biostatistics Boot Camp 2, lecture 10, on case-control data. In this lecture, we're going to briefly talk about case-control methods. We'll talk about an instance where, using retrospective case-control data and a so-called rare disease assumption, we can estimate prospective odds ratios. And then, because this lecture is focused so much on the odds ratio, I thought I'd talk a little bit about exact inference for the odds ratio.

Okay, so let's talk about retrospective, or case-referent, sampling. Again, this is a deep subject, and we're only going to scratch the surface of it. Imagine we wanted to study lung cancer, and we had some cases and controls, and we ascertained whether or not each of them was a smoker.

Now, conceptually, there are two ways we could collect this data. One is that we could follow a bunch of people over time. Some of them would smoke and some of them wouldn't, and then we could see who developed lung cancer. That's very hard. I think you can all see that, conceptually, that experiment is basically impossible.

A much easier experiment would be to go to hospital records and find a bunch of people who were cases, who had lung cancer. In this case, we found 709 of them. Then we also found 709 controls who were, at some level, comparable. And then we retrospectively determined whether or not they were smokers.

Now, in this case, 709 is fixed, and it's whether or not they were a smoker that has the ability to vary. I should also say that the most common way to do a case-control study would be, for every case, to try to very closely match a control, so that for every case there's a specific matched control. But in this case, we're not doing that.

Let's say we had a group of case hospital records and a group of control patients, and we figured out a reasonable strategy for getting control patients. Now the two 709s are fixed, so what we want to ascertain is who is a smoker and who is not, and whether or not the cases had a greater proportion of smokers. Then we want to make prospective conclusions from this retrospective data.

So, in terms of probability: we cannot estimate the probability of being a case given that you're a smoker directly from the data, but we can estimate the probability of being a smoker given that you were a case. So we want to work within that kind of probability rubric.

What is interesting is that we can estimate an odds ratio. The odds ratio that we want to estimate is the odds of being a case given that you're a smoker, relative to the odds of being a case given that you're a non-smoker. In other words, we want the odds of developing lung cancer given that you smoked, compared to the odds of developing lung cancer given that you didn't smoke. Well, it turns out that that odds ratio is exactly equal to the odds of being a smoker given that you're a case, relative to the odds of being a smoker given that you're a control. So the second one we can estimate directly from the data, while the first one we cannot.
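The equality of the two odds ratios can be checked numerically before looking at the algebra. The sketch below is only an illustration: the four joint probabilities are hypothetical values chosen to sum to 1 (they are not from the lecture), and the function name `both_odds_ratios` is made up for this example.

```python
import math

def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)

def both_odds_ratios(p_case_smoker, p_case_nonsmoker,
                     p_noncase_smoker, p_noncase_nonsmoker):
    """Return (prospective OR, retrospective OR) from four joint probabilities."""
    # Prospective conditioning: P(case | smoking status).
    p_case_given_smoker = p_case_smoker / (p_case_smoker + p_noncase_smoker)
    p_case_given_nonsmoker = p_case_nonsmoker / (p_case_nonsmoker + p_noncase_nonsmoker)
    prospective = odds(p_case_given_smoker) / odds(p_case_given_nonsmoker)

    # Retrospective conditioning: P(smoker | case status).
    p_smoker_given_case = p_case_smoker / (p_case_smoker + p_case_nonsmoker)
    p_smoker_given_noncase = p_noncase_smoker / (p_noncase_smoker + p_noncase_nonsmoker)
    retrospective = odds(p_smoker_given_case) / odds(p_smoker_given_noncase)
    return prospective, retrospective

# Hypothetical joint probabilities (they sum to 1); not taken from the lecture.
prospective_or, retrospective_or = both_odds_ratios(0.004, 0.001, 0.295, 0.700)
print(prospective_or, retrospective_or)
print(math.isclose(prospective_or, retrospective_or))
```

Any other valid set of joint probabilities should give the same agreement, since both quantities reduce to the same cross-product ratio.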

So here I just go directly through the calculations. We start with the odds of being a case given that you're a smoker, divided by the odds of being a case given that you're a non-smoker, which is the odds ratio of interest. Let me just replace case and non-case with C and C bar, and smoker and non-smoker with S and S bar. And here I just churn through the calculations. You can go through these three steps to make sure that you agree.

And look, this works out to be the probability of being a case and a smoker times the probability of being a non-case and a non-smoker, divided by the probability of being a case and a non-smoker times the probability of being a non-case and a smoker. So it's a sort of probability cross-product ratio: the probability of being a case and a smoker times the probability of being a non-case and a non-smoker, divided by the off-diagonal probabilities.

Now, I say this actually proves the result, and I think it does, because you can see that if you were to exchange the words case and smoker up at the top, nothing changes when we get down to the bottom line here, because the probability of C and S is the same as the probability of S and C. So to me it's clear that the two are exactly equal. If you want to be very particular, you can keep working and get to the other odds ratio, but to me this proves the result from the previous page.
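Written out, the calculation described above might look like the following. The notation follows the lecture ($C$ and $\bar C$ for case and non-case, $S$ and $\bar S$ for smoker and non-smoker). The slide itself is not part of the transcript, so the intermediate steps are a reconstruction, not necessarily the ones shown.

```latex
\begin{align*}
\frac{\mathrm{Odds}(C \mid S)}{\mathrm{Odds}(C \mid \bar S)}
  &= \frac{P(C \mid S) \,/\, P(\bar C \mid S)}
          {P(C \mid \bar S) \,/\, P(\bar C \mid \bar S)} \\[4pt]
  &= \frac{P(C, S) \,/\, P(\bar C, S)}
          {P(C, \bar S) \,/\, P(\bar C, \bar S)} \\[4pt]
  &= \frac{P(C, S)\, P(\bar C, \bar S)}
          {P(C, \bar S)\, P(\bar C, S)} \\[4pt]
  &= \frac{P(S \mid C) \,/\, P(\bar S \mid C)}
          {P(S \mid \bar C) \,/\, P(\bar S \mid \bar C)}
   = \frac{\mathrm{Odds}(S \mid C)}{\mathrm{Odds}(S \mid \bar C)}
\end{align*}
```

The second line uses $P(C \mid S) = P(C, S)/P(S)$, so the $P(S)$ and $P(\bar S)$ terms cancel. The third line is symmetric in the roles of $C$ and $S$, which is the point of the argument.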

It also reminds you that these are probability statements. We estimate those probabilities and odds ratios from data, and of course the sample odds ratio is the cross-product ratio, n11 times n22 divided by n12 times n21. The sample odds ratio is invariant to transposing the rows and the columns, so our estimator has this kind of invariance property, which is what we would hope. It would be weird if we said that the two population odds ratios were equal, but the sample estimates were not equal, depending on which variable you were treating as the outcome and which one you were treating as the predictor. So that's nice. By the way, the sample odds ratio is also unchanged if a row or a column is multiplied by a constant.
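A short sketch can make these invariance properties concrete. The 2x2 table below is hypothetical (it is not the lecture's data), and the helper names are made up for this illustration.

```python
def sample_odds_ratio(table):
    """Cross-product ratio n11*n22 / (n12*n21) for a 2x2 table of counts."""
    (n11, n12), (n21, n22) = table
    return (n11 * n22) / (n12 * n21)

def transpose(table):
    """Swap the roles of the rows and the columns."""
    (n11, n12), (n21, n22) = table
    return [[n11, n21], [n12, n22]]

def scale_row(table, row, k):
    """Multiply every count in one row by the constant k."""
    return [[k * x for x in r] if i == row else list(r)
            for i, r in enumerate(table)]

def scale_col(table, col, k):
    """Multiply every count in one column by the constant k."""
    return [[k * x if j == col else x for j, x in enumerate(r)]
            for r in table]

# Hypothetical counts, chosen only for illustration.
table = [[30, 10],
         [20, 40]]

print(sample_odds_ratio(table))                   # original table
print(sample_odds_ratio(transpose(table)))        # rows and columns swapped
print(sample_odds_ratio(scale_row(table, 0, 5)))  # first row multiplied by 5
print(sample_odds_ratio(scale_col(table, 1, 3)))  # second column multiplied by 3
```

All four printed values should agree. The row and column scaling property is one way to see why sampling cases and controls at different rates leaves the odds ratio unchanged.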

And then the last thing, and this is what we'll talk about next: the odds ratio turns out to be related to the relative risk. We've just demonstrated that the odds ratio works out really well, in that you can reverse the conditioning when talking about the odds ratio. But we'll talk specifically about the relative risk, which is what people often want to estimate, and how it relates to the odds ratio.

Okay, so the odds ratio is here: the probability of being a smoker given that you're a case, divided by the probability of being a non-smoker given that you're a case, and so on; you can read this top line. Then we can reverse the odds ratio, using the argument from the other page. So now we have the probability of a case given smoker divided by the probability of a non-case given smoker, all divided by the probability of a case given non-smoker divided by the probability of a non-case given non-smoker. Then, in the next line, everything is just multiplied out; denominators are raised up to numerators, and so on.

And then look at this first term here: the probability of a case given smoker divided by the probability of a case given non-smoker. That's the relative risk. If you wanted to know who develops lung cancer, comparing those who smoked to those who didn't smoke, that's the relative risk, the ratio of the two probabilities. That's then multiplied by these other terms, but I wanted to refer to them with respect to case status. So I just 1 minus [INAUDIBLE] to the probabilities. And what you can see is that if this ratio that we're multiplying the relative risk by is about 1, then the odds ratio approximates the relative risk.

And it's often the case that these two numbers, 1 minus this number and 1 minus that number, are similar enough if, in fact, the disease is very rare; in other words, if regardless of whether or not you smoke, the probability that you'd get this disease, let's say lung cancer, is quite small. If that's true, then this ratio will be about 1, and the odds ratio will approximate the relative risk. That's what people mean when they talk about the rare disease assumption: they use the retrospectively collected data, along with the odds ratio, to approximate the relative risk.

It's so common that often people don't even really talk about what they're doing. They just do it. It's so common in the epi literature that it's generally not described in, say, an American Journal of Epidemiology article or something like that.
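The relationship between the odds ratio and the relative risk described above can be written compactly as follows. This is a reconstruction in the lecture's notation ($C$ for case, $S$ and $\bar S$ for smoker and non-smoker), not a copy of the slide.

```latex
\begin{align*}
\mathrm{OR}
  &= \frac{P(C \mid S) \,/\, \{1 - P(C \mid S)\}}
          {P(C \mid \bar S) \,/\, \{1 - P(C \mid \bar S)\}} \\[4pt]
  &= \underbrace{\frac{P(C \mid S)}{P(C \mid \bar S)}}_{\text{relative risk}}
     \times
     \frac{1 - P(C \mid \bar S)}{1 - P(C \mid S)}
\end{align*}
```

When both $P(C \mid S)$ and $P(C \mid \bar S)$ are small, the second factor is close to 1, so $\mathrm{OR} \approx \mathrm{RR}$.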

So now let me just make the small point that the disease has to be rare among both the exposed and the non-exposed, not just rare overall. Here's a simple example; Chuck Rodi reminded me of this at one point. So here we have exposure, yes or no, and disease, yes or no. The counts are 9, 1, 1, and 999: among the exposed, 9 have the disease and 1 does not; among the non-exposed, 1 has the disease and 999 do not. Let's just assume that this is cross-sectional data, so all the margins and everything else are estimable. The estimated probability of disease is about 1%, the odds ratio works out to be almost 9,000, and the relative risk works out to be about 900, so clearly the odds ratio is not estimating the relative risk. And in this case, like I said, because of the sampling I'm assuming, the two are directly estimable from the data.

So in this case what happens is that the disease is rare overall, what is it, 10 out of 1,010, but it's not rare among the exposed. Among the exposed, you actually had 9 times as many people having the disease as not.

At any rate, if you look at the equation, it's clear that both P of C given S bar and P of C given S have to be small in order for the rare disease assumption to apply. That's the real criterion. This is just a numerical illustration, in a hypothetical circumstance where we can estimate all the probabilities, and we can show that the two aren't approximately equal to each other.
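The numbers in this illustration can be reproduced with a few lines of code. The sketch below reads the lecture's counts as 9 diseased and 1 non-diseased among the exposed, and 1 diseased and 999 non-diseased among the non-exposed, which is the layout implied by "10 out of 1,010" and "9 times". The function name `risk_measures` is made up for this example.

```python
def risk_measures(a, b, c, d):
    """Summaries for a 2x2 cross-sectional table.

    a: exposed and diseased      b: exposed and not diseased
    c: unexposed and diseased    d: unexposed and not diseased
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return {
        "prevalence": (a + c) / (a + b + c + d),
        "risk_exposed": risk_exposed,
        "risk_unexposed": risk_unexposed,
        "odds_ratio": (a * d) / (b * c),
        "relative_risk": risk_exposed / risk_unexposed,
        # Factor that multiplies the relative risk to give the odds ratio.
        "or_over_rr": (1 - risk_unexposed) / (1 - risk_exposed),
    }

# Counts as read from the lecture's example (see the assumption above).
result = risk_measures(9, 1, 1, 999)
for name, value in result.items():
    print(f"{name}: {value:.4g}")
```

The disease is rare overall and rare among the non-exposed, but the risk among the exposed is 0.9, so the factor `or_over_rr` is far from 1 and the odds ratio is roughly ten times the relative risk.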

So let's just recap about the odds ratio. An odds ratio of 1 implies no association, an odds ratio greater than 1 is a positive association, and an odds ratio less than 1 is a negative association. For retrospective case-control studies, odds ratios can be interpreted prospectively. For diseases that are rare among both the exposed and the non-exposed, the odds ratio approximates the relative risk. And the delta method standard error is the square root of the sum of 1 over each of the cell counts. Oh, and just to remind you, that's the standard error for the log odds ratio, not the standard error for the odds ratio.
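In symbols, the standard error just described, and the interval it leads to, can be written as below. The cell-count notation $n_{11}, n_{12}, n_{21}, n_{22}$ follows the lecture. The interval form with a normal quantile $z$ (about 2 for a 95% interval) is the usual Wald construction and is stated here as an assumption about what the slide shows.

```latex
\begin{align*}
\widehat{\mathrm{SE}}\bigl(\log \widehat{\mathrm{OR}}\bigr)
  &= \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}} \\[4pt]
\text{interval for } \mathrm{OR}
  &= \exp\!\Bigl\{ \log \widehat{\mathrm{OR}}
       \pm z \, \widehat{\mathrm{SE}}\bigl(\log \widehat{\mathrm{OR}}\bigr) \Bigr\}
\end{align*}
```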

So let's just go through our example. Here we have our lung cancer cases and controls, and smokers, yes or no. Our odds ratio works out to be 3. The standard error for the log odds ratio works out to be 0.26. If we want a confidence interval, it's the log of 3 plus or minus about 2 standard errors, and we get 0.59 to 1.61. We would check whether or not 0 is in that interval. If we exponentiate it, then we would check whether or not 1 is in the interval. In this case, if we exponentiate it we get 1.8 to 5.0, so 1 is not in the interval, and our estimated odds of lung cancer for smokers is 3 times the odds for non-smokers.
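The calculation in this example can be sketched in code. The transcript gives only the totals (709 cases and 709 controls) and the summaries (odds ratio about 3, standard error about 0.26), not the individual cells. The cell counts below are therefore an assumption: 688 smokers and 21 non-smokers among the cases, 650 smokers and 59 non-smokers among the controls, chosen because they match those totals and reproduce the quoted summaries. The function name and the use of 1.96 as the multiplier are also choices made for this illustration.

```python
import math

def odds_ratio_interval(n11, n12, n21, n22, z=1.96):
    """Wald interval for the odds ratio, computed on the log scale.

    Rows are cases/controls, columns are smokers/non-smokers.
    Returns (odds ratio, SE of log OR, log-scale interval, OR-scale interval).
    """
    odds_ratio = (n11 * n22) / (n12 * n21)
    # Delta-method standard error of the LOG odds ratio.
    se_log_or = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    log_or = math.log(odds_ratio)
    log_interval = (log_or - z * se_log_or, log_or + z * se_log_or)
    or_interval = (math.exp(log_interval[0]), math.exp(log_interval[1]))
    return odds_ratio, se_log_or, log_interval, or_interval

# ASSUMED cell counts (not in the transcript); margins are 709 and 709.
odds_ratio, se_log_or, log_interval, or_interval = odds_ratio_interval(688, 21, 650, 59)

print(f"odds ratio: {odds_ratio:.2f}")
print(f"SE of log odds ratio: {se_log_or:.2f}")
print(f"log-scale interval: {log_interval[0]:.2f} to {log_interval[1]:.2f}")
print(f"odds-ratio interval: {or_interval[0]:.2f} to {or_interval[1]:.2f}")
```

Because the lecture rounds the odds ratio to 3 and the standard error to 0.26 before forming the interval, the endpoints computed from the unrounded values may differ slightly in the second decimal place from the 0.59 to 1.61 and 1.8 to 5.0 quoted above. The conclusion is the same: 0 is outside the log-scale interval and 1 is outside the exponentiated one.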
