Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.

From the course by Johns Hopkins University

Mathematical Biostatistics Boot Camp 2

From the lesson

Techniques

This module is a bit of a hodge podge of important techniques. It includes methods for discrete matched pairs data as well as some classical non-parametric methods.

- Brian Caffo, PhD, Professor, Biostatistics

Bloomberg School of Public Health

So there's a great example of this. Imagine just going back to our death penalty example, where we have two 2 by 2 tables. We have the defendant's race and whether or not they got the death penalty, and then we stratified that by a third variable, victim's race.

So let n_ijk be the ij entry of table k. In this case k runs from 1 to 2: k equal 1 is the first victim's race, k equal 2 is the second victim's race, and i and j index defendant's race and whether or not the person got the death penalty. So in our first example, we had two 2 by 2 tables stacked right on top of each other. k equal 1 indexes the first one, k equal 2 indexes the second one, and i and j index the individual elements of that 2 by 2 table.

So then the kth sample odds ratio. Remember, the odds ratio is the cross-product ratio: the product of the main diagonal divided by the product of the off diagonal. So n_11 times n_22, divided by n_12 times n_21. In this case, everything is indexed by k, referencing the kth table. So the kth sample odds ratio is theta hat sub k.
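
As a minimal sketch of the cross-product ratio just described, the snippet below computes theta hat sub k for each stratum. It is written in Python as an illustration (the lecture itself works in R), and the function name and the counts are hypothetical: they are made up for this example and are not the death penalty data.

```python
def sample_odds_ratio(table):
    """Cross-product ratio of a 2 by 2 table given as [[n11, n12], [n21, n22]]."""
    (n11, n12), (n21, n22) = table
    # Main diagonal product divided by off-diagonal product.
    return (n11 * n22) / (n12 * n21)

# Hypothetical counts, for illustration only (not the lecture's data).
tables = [
    [[10, 20], [5, 40]],   # stratum k = 1
    [[8, 12], [6, 18]],    # stratum k = 2
]

theta_hat = [sample_odds_ratio(t) for t in tables]
print(theta_hat)  # [4.0, 2.0]
```

Note that this sketch divides by zero if either off-diagonal cell is empty, so it only applies to tables without zero cells there.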

Then the Mantel-Haenszel estimator is exactly a weighted average of these stratum-specific estimates. It's the sum of the weights r_k times theta hat sub k, divided by the sum of the r_k's. And by the way, when we had two, it was r_1 times x_1 plus r_2 times x_2, divided by r_1 plus r_2. If we had three, it would be r_1 times x_1 plus r_2 times x_2 plus r_3 times x_3, divided by the sum of the r's. If we had four, same thing. Anyway, this is the Mantel-Haenszel estimate: the sum of the weights times the stratum-specific odds ratios, theta hat sub k, divided by the sum of the weights. So we just get a weighted average, a convex combination, of the stratum-specific odds ratios.

And then what are the weights? Well, the weights in this case are this little formula right here, and I'll describe where they come from. The motivation for the weights is that they're inverse variances. That's 100% the motivation for the weights: they're inverse variances from a hypergeometric distribution. So you can think of it as doing exactly the same thing we did with the scales, only now in terms of the odds ratio. At any rate, this simplifies to the so-called Mantel-Haenszel estimator. Here's the formula right here.
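
The slide's formula is not reproduced in the transcript, so the following is only a sketch under an assumption: it uses the standard Mantel-Haenszel weights r_k = n_12k times n_21k divided by n_k, where n_k is the total count in table k. With those weights, the weighted average of the stratum-specific odds ratios simplifies to the sum of n_11k n_22k / n_k over the sum of n_12k n_21k / n_k. The function names and the counts are hypothetical.

```python
def mh_weighted_average(tables):
    """Weighted average of stratum odds ratios with weights r_k = n12*n21/n."""
    weights, thetas = [], []
    for (n11, n12), (n21, n22) in tables:
        n = n11 + n12 + n21 + n22
        weights.append(n12 * n21 / n)
        thetas.append((n11 * n22) / (n12 * n21))
    return sum(r * t for r, t in zip(weights, thetas)) / sum(weights)

def mh_simplified(tables):
    """Same estimator after the weights cancel the odds ratio denominators."""
    num = sum(n11 * n22 / (n11 + n12 + n21 + n22)
              for (n11, n12), (n21, n22) in tables)
    den = sum(n12 * n21 / (n11 + n12 + n21 + n22)
              for (n11, n12), (n21, n22) in tables)
    return num / den

# Hypothetical counts, for illustration only (not the lecture's data).
tables = [
    [[10, 20], [5, 40]],   # stratum odds ratio 4.0
    [[8, 12], [6, 18]],    # stratum odds ratio 2.0
]

est_avg = mh_weighted_average(tables)
est_simple = mh_simplified(tables)
print(est_avg, est_simple)  # the two forms agree, and lie between 2.0 and 4.0
```

The simplified form is the more practical one, since it still works when a stratum has an empty off-diagonal cell, where the stratum-specific odds ratio itself is undefined.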

For the standard error, I would suggest that you look in Agresti's book, page 235 in the version I was looking at (the version's probably changed), or in Rosner's book, which is very comprehensive, page 656. They give the standard error. It's a long formula, and I'm sure you can find it on the internet as well. This is the so-called Mantel-Haenszel estimator, named after the two great epidemiologists Mantel and Haenszel.

Okay, so here's an example, a great example of this Mantel-Haenszel estimator. Here we had an active drug, T, with C being a placebo, or control, I guess, and then we had success versus failure. I'm going to abstract away the specifics of the experiment. What people were concerned about was whatever policies and practices existed at the various centers at which the data were collected, so they stratified by center. One, two, three, four, five, six, seven, eight centers. So they got eight odds ratios, right?

And they were worried that the center was a confounder. There are several reasons, actually, why you might want to do the Mantel-Haenszel estimator in this case, but let's talk about it in terms of confounding first. So imagine you thought that the center was associated with the treatment application, so some centers tended to apply the treatment more than others, and that the center was also associated with the success of the treatment, because of different policies at the center, or how the treatment was delivered, or something like that. Then you would want to adjust for the center as a potential confounder. And here we're going to adjust for this confounder by stratifying by center, getting an odds ratio specific to every center, and then averaging over the odds ratios, but factoring in the inverse variance of each odds ratio.

Some centers have more patients, right? This one had 73, this one had 14, so we want to weight the one with, say, 73 more than the one with 14, because it has a more precise odds ratio. And that's what the Mantel-Haenszel estimate does. So you get an odds ratio of 2.13. The log odds ratio is 0.758, and the standard error of the log odds ratio, which is calculated using a delta method type argument, works out to be 0.303. You would take 0.758, add and subtract, say, two standard errors, and exponentiate the endpoints to get the confidence interval for the odds ratio.
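
The interval arithmetic just described can be worked through directly. This short sketch uses only the two numbers quoted in the lecture (0.758 and 0.303) and the "say two standard errors" multiplier; the variable names are hypothetical, and using 1.96 instead of 2 would give a slightly narrower nominal 95% interval.

```python
import math

log_or = 0.758   # log odds ratio quoted in the lecture
se = 0.303       # standard error of the log odds ratio quoted in the lecture
z = 2.0          # "say two standard errors"

# Build the interval on the log scale, then exponentiate the endpoints.
lower = math.exp(log_or - z * se)
upper = math.exp(log_or + z * se)

print(f"OR = {math.exp(log_or):.2f}, CI = ({lower:.2f}, {upper:.2f})")
# OR = 2.13, CI = (1.16, 3.91)
```

The interval is built on the log scale because that is where the normal approximation is applied, which is why it is not symmetric around 2.13 after exponentiating.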

Let me talk a little bit about another rationale for why we might want to do some sort of stratified estimate. It's also often the case that we think of center as, at some level, a random-effect-type modifier of the treatment efficacy, and we're willing to think of these centers as a random draw from the population of centers. In that case, it still actually makes a lot of sense to combine them, because we're not so much interested in center-specific effects. We don't care if the treatment works at center number 1. What we care about is whether it works overall.

And it turns out that even if the center isn't just a confounder but really modifies the effect of the treatment, so some places are just better at delivering the treatment than others, if you're willing to make the assumption that the centers are a random draw from a population of centers, then this CMH estimate makes a lot of sense. It says: okay, averaging across centers, here's the effect of the treatment. And that's another instance where you would consider doing something like CMH. Or, I'm sorry, a Mantel-Haenszel common odds ratio estimate. This is what is often called the common odds ratio, the common odds ratio across centers.

So then there's a famous test for whether or not the common odds ratio is equal to 1. The test is usually stated with the null hypothesis that all of the stratum-specific odds ratios are equal and they happen to be equal to 1, versus the alternative. There's some amount of dispute over the alternative, but I'm going to teach it this way: the alternative is that they're all equal, but they're not equal to 1. Okay? Notice this is different from the alternative that they're not necessarily even equal. So here we're assuming that we have a common odds ratio under both the null and the alternative, and we just want to test whether that common, across-strata odds ratio is 1 versus not.

It turns out that the CMH test applies to other alternatives, but it's more powerful for the particular alternative given above. I'll also mention that this test is exactly the same as testing for conditional independence of response and exposure given the stratifying variable.

And so this Cochran-Mantel-Haenszel test, the way it's executed is that it conditions on the row and column totals for each of the contingency tables, exactly like Fisher's exact test, resulting in hypergeometric distributions, and leaving only the upper left-hand cell of each table free. It's just exactly what we did in Fisher's exact test, only this time doing it for each specific table. So I'll go through the mechanics now.

Okay. So under this conditioning, and under the null hypothesis, the expected value of the upper left-hand cell of each table is this value, and the variance of the upper left-hand cell is this value. So the Cochran-Mantel-Haenszel test statistic works out to be this guy right here: roughly, the sum of the deviations of the upper left-hand cells from their expected values, summed up and then squared. That's unlike the chi-squared test, where they're squared and then summed up. So, summed up and then squared. And then, regardless of how many tables you have, under the null hypothesis this is a chi-squared with one degree of freedom.
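
The slide formulas are not in the transcript, so this sketch assumes the standard hypergeometric moments: with row totals and column totals fixed, the upper left-hand cell has expected value (row 1 total times column 1 total) / n and variance (row 1 total times row 2 total times column 1 total times column 2 total) / (n squared times (n minus 1)). The statistic is then the squared sum of deviations divided by the sum of the variances, with no continuity correction. The function name and the counts are hypothetical.

```python
def cmh_statistic(tables):
    """CMH chi-squared statistic (no continuity correction) for 2 by 2 tables."""
    dev = 0.0   # running sum of n11 minus its expected value
    var = 0.0   # running sum of the hypergeometric variances of n11
    for (n11, n12), (n21, n22) in tables:
        r1, r2 = n11 + n12, n21 + n22   # row totals
        c1, c2 = n11 + n21, n12 + n22   # column totals
        n = r1 + r2
        dev += n11 - r1 * c1 / n
        var += r1 * r2 * c1 * c2 / (n ** 2 * (n - 1))
    # Deviations are summed across strata first and squared afterwards.
    return dev ** 2 / var

# Hypothetical counts, for illustration only (not the lecture's data).
tables = [
    [[10, 20], [5, 40]],
    [[8, 12], [6, 18]],
]

stat = cmh_statistic(tables)
print(round(stat, 2))
```

Because the deviations are summed before squaring, strata whose deviations point in opposite directions cancel each other out, which is one way to see why the test is aimed at a common odds ratio.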

So remember, this is a different test from the chi-squared test we talked about in the previous lecture, and that's why it's a different test statistic. The idea of testing conditional independence, or testing for this confounder relationship, is a different idea, and that's why you get a different test statistic. I think it's a little bit beyond the scope of this class to derive the CMH test statistic. But the idea is that here you want to test whether or not the odds ratio is 1, given that it's common across strata, versus the odds ratio, given that it's common across strata, is not 1.

It's a bit of a pain to actually execute the CMH test by hand, so here I'm putting the data into an array, in this case eight 2 by 2 tables, and then you can call mantelhaen.test with correct equals FALSE. Again, I put correct equals FALSE here so that if you do the calculations by hand you'll get a result that agrees with the R output. You generally want to leave this as correct equals TRUE. The result is that the test statistic is 6.38. Compare that to a chi-squared 1 value. Again, you only reject for larger values of the test statistic, even though it's a two-sided test, because it's a squared statistic. So in this case the p-value is 0.012, so the test presents evidence to suggest that treatment and response are not conditionally independent given center.
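
To see where a p-value of about 0.012 comes from, here is a small sketch that converts the quoted statistic of 6.38 into an upper-tail chi-squared(1) probability. It relies on the fact that a chi-squared variable with one degree of freedom is a squared standard normal, so the tail probability can be written with the complementary error function. The function name is hypothetical.

```python
import math

def chi2_1df_pvalue(x):
    """Upper-tail probability of a chi-squared(1) variable at x."""
    # P(Z^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z.
    return math.erfc(math.sqrt(x / 2.0))

p_value = chi2_1df_pvalue(6.38)   # 6.38 is the statistic quoted in the lecture
print(f"{p_value:.4f}")
```

The result is roughly 0.012, in line with the p-value quoted above. Only the upper tail is used, which matches the remark that you reject only for large values of the statistic.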

Some final notes on this testing. It's possible to perform an analogous test using what I consider to be a little more modern approach, a random effects logit model. But again, we haven't covered regression, we can't cover mixed models, and we can't cover generalized linear mixed models and all the machinery that you would need for this. It's possible anyway, and I think if you're going to work as a statistician or use a lot of statistics in your life, you should take the time to build your way up to where you're studying mixed effect models and generalized linear mixed effect models. There you can do exactly the kinds of things the CMH test is doing, only you can do it in a very general way that allows for other variables to be specified in the model, and so on.

The reason for presenting it this way in this class is to give you a sense of the idea of confounding, and of what you can do in the particular case of 2 by 2 tables. And then later on, I'm hoping that you'll take some more statistics classes and learn about mixed models in general, including linear mixed models.

Also, here we assumed all the odds ratios are equal under the null, versus the alternative that they were all equal but not equal to 1. It's also possible to test whether or not all the odds ratios are equal. There's a test for that, and it's called Woolf's test, and that's a very good test. I don't have time to cover it.

The final thing I would mention is that we have these K hypergeometric distributions that we used in the CMH test statistic. So you could probably guess that you can do some sort of exact test in this case. And you can: in R, you can just pass exact equals TRUE as an argument to the Mantel-Haenszel test. But you can probably envision how it's done. Imagine that within each center you were to do the permutation process that we talked about for Fisher's exact test. Imagine you were to do that now within each stratum, recreate the chi-squared statistic each time, and do that over and over again. That simulation would yield an exact p-value, up to Monte Carlo error. So I think you could probably come pretty close to doing exactly what the exact equals TRUE argument is doing in R by that permutation process. Of course, in this case, they can do the calculations exactly, without Monte Carlo, so it's faster, but conceptually, that's exactly what they're doing.
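
The within-stratum permutation idea can be sketched as a small Monte Carlo routine. This is an illustration of the process described in the lecture, not a reimplementation of R's exact equals TRUE option, and it should not be expected to match that output digit for digit. It assumes the standard CMH statistic without continuity correction; the function names, the number of simulations, the seed, and the counts are all hypothetical.

```python
import random

def cmh_statistic(tables):
    """CMH chi-squared statistic (no continuity correction)."""
    dev = var = 0.0
    for (n11, n12), (n21, n22) in tables:
        r1, r2 = n11 + n12, n21 + n22
        c1, c2 = n11 + n21, n12 + n22
        n = r1 + r2
        dev += n11 - r1 * c1 / n
        var += r1 * r2 * c1 * c2 / (n ** 2 * (n - 1))
    return dev ** 2 / var

def permutation_pvalue(tables, n_sim=2000, seed=0):
    """Monte Carlo p-value from shuffling outcomes within each stratum."""
    rng = random.Random(seed)
    observed = cmh_statistic(tables)
    hits = 0
    for _ in range(n_sim):
        simulated = []
        for (n11, n12), (n21, n22) in tables:
            r1, r2 = n11 + n12, n21 + n22
            c1, c2 = n11 + n21, n12 + n22
            # One outcome label per subject; shuffling keeps all margins fixed.
            outcomes = [1] * c1 + [0] * c2
            rng.shuffle(outcomes)
            s11 = sum(outcomes[:r1])   # column-1 outcomes landing in row 1
            simulated.append([[s11, r1 - s11], [c1 - s11, r2 - (c1 - s11)]])
        # Count simulated statistics at least as large as the observed one.
        if cmh_statistic(simulated) >= observed - 1e-12:
            hits += 1
    return hits / n_sim

# Hypothetical counts, for illustration only (not the lecture's data).
tables = [
    [[10, 20], [5, 40]],
    [[8, 12], [6, 18]],
]
p_mc = permutation_pvalue(tables)
print(p_mc)
```

Shuffling is done separately inside each stratum, so subjects never move between centers, and the row and column totals of every table stay fixed, which is the same conditioning the test itself uses.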

Okay, well, that's the end of the lecture. This was a teaser on the idea of confounding. Confounding is probably the biggest obstacle to generating knowledge from observational data, data where you don't have a heavy amount of control over the design of the experiment, which is most data, right? The easy data to collect is observational data. So this gives you a teaser on how that data is analyzed, and there's a tremendous art to analyzing that data. I hope that as you learn more statistics, you'll get more refined at that art. And just like any other art, you can spend a lifetime perfecting your craft, and you'll never really hit a limit. It's a very hard topic. And then again, I highly recommend learning something about causal inference, which is the most modern attempt at addressing this problem in a mathematically formal way.

Â Coursera provides universal access to the worldâ€™s best education, partnering with top universities and organizations to offer courses online.