Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.


From the course by Johns Hopkins University

Mathematical Biostatistics Boot Camp 2

34 ratings


From the lesson

Techniques

This module is a bit of a hodge podge of important techniques. It includes methods for discrete matched pairs data as well as some classical non-parametric methods.

- Brian Caffo, PhD, Professor, Biostatistics

Bloomberg School of Public Health

So there's a great example of this. Imagine going back to our death penalty example: there we had two 2 by 2 tables. We had the defendant's race and whether or not they got the death penalty, and then we stratified that by a third variable, victim's race. So let nijk be the ij entry of table k. In this case k equals 1 to 2: k equal 1 is the first victim's race, k equal 2 is the second victim's race, and i and j index defendant's race and whether or not the person got the death penalty. So in our first example, we had two 2 by 2 tables stacked right on top of each other. k equal 1 indexes the first one, k equal 2 indexes the second one, and then nij indexes the individual elements of that 2 by 2 table.

So then the kth sample odds ratio, theta hat sub k, is the cross-product ratio: the main diagonal divided by the off diagonal. So n11 times n22, divided by n12 times n21, with everything indexed by k, referencing the kth table.
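As a sketch of that cross-product calculation (the counts here are made up for illustration, not the lecture's data):

```python
# Stratum-specific sample odds ratio for a 2x2 table stored as
# [[n11, n12], [n21, n22]]: theta_hat_k = (n11 * n22) / (n12 * n21),
# i.e. the main diagonal product over the off-diagonal product.

def odds_ratio(table):
    (n11, n12), (n21, n22) = table
    return (n11 * n22) / (n12 * n21)

tables = [
    [[10, 5], [4, 12]],   # stratum k = 1 (hypothetical counts)
    [[8, 7], [6, 9]],     # stratum k = 2 (hypothetical counts)
]

theta_hat = [odds_ratio(t) for t in tables]  # one odds ratio per stratum
```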

Then the Mantel Haenszel estimator is exactly a weighted average of these strata-specific estimates. It's the summation of weight rk times theta hat sub k, divided by the sum of the rks. And by the way, when we had two, it was r1 times x1 plus r2 times x2, divided by r1 plus r2. If we had three, it would be r1 times x1 plus r2 times x2 plus r3 times x3, divided by the sum of the r's. If we had four, same thing. Anyway, this is the Mantel Haenszel estimate: the sum of the weights times the strata-specific odds ratios, the theta hat sub k's, divided by the sum of the weights. So we just get a weighted average of the strata-specific estimates.
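A minimal sketch of that weighted average, using the standard Mantel Haenszel weights w_k = n12 n21 / n++ (made-up counts, not the lecture's data):

```python
# Mantel-Haenszel common odds ratio estimate:
#   theta_MH = sum_k (n11k * n22k / nk) / sum_k (n12k * n21k / nk)
# where nk is the total of table k. This is exactly the weighted average
# sum(w_k * theta_hat_k) / sum(w_k) with weights w_k = n12k * n21k / nk.

def mantel_haenszel(tables):
    num = sum(n11 * n22 / (n11 + n12 + n21 + n22)
              for (n11, n12), (n21, n22) in tables)
    den = sum(n12 * n21 / (n11 + n12 + n21 + n22)
              for (n11, n12), (n21, n22) in tables)
    return num / den

tables = [
    [[10, 5], [4, 12]],   # stratum-specific odds ratio 6.0
    [[8, 7], [6, 9]],     # stratum-specific odds ratio about 1.71
]
theta_mh = mantel_haenszel(tables)  # lands between the two stratum estimates
```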

And then what are the weights? Well, the weights in this case are given by a little formula, and I'll describe where they come from. The motivation for the weights is that they're inverse variances; that's 100% the motivation. They're inverse variances from a hypergeometric distribution. So you can think of it as exactly the same thing we did before, only now in terms of the odds ratio. At any rate, this simplifies to the so-called Mantel Haenszel estimator. I would suggest that you look in Agresti's book, page 235 in the version I was looking at (the version's probably changed), or Rosner's book, which is very comprehensive, page 656. They give the standard error; it's a long formula, and I'm sure you can find it on the internet as well. This is the so-called Mantel Haenszel estimator, named after the two great epidemiologists Mantel and Haenszel.

And here's a great example of this Mantel Haenszel estimator. Here we had an active drug, T, and C being a placebo control, and then we had success versus failure; I'm going to abstract away the specifics of the experiment. What people were concerned about was whatever policies and practices existed at the various centers at which the data were collected. So they stratified by center: one, two, three, four, five, six, seven, eight centers. So they got eight odds ratios, right? And they were worried that the center was a confounder. There are several reasons, actually, why you might want to do the Mantel Haenszel estimator in this case.

But let's talk about it in terms of confounding first. Imagine if you thought that the center was associated with the treatment application, that some centers tended to apply the treatment more than others, and that the center was associated with the success of the treatment because of different policies about how the treatment was delivered, or something like that. Then you would want to adjust for the center as a potential confounder. And here we're going to adjust for this confounder by stratifying by center, getting an odds ratio specific to every center, and then averaging over the odds ratios, but factoring in the inverse variance of each odds ratio. Some centers have more patients, right? This one had 73, this one had 14, so we want to weight the one with 73 more than the one with 14, because it has a better, more precise odds ratio.

And so you get an odds ratio of 2.13. The log odds ratio is 0.758, and the standard error, whose calculation uses a delta-method-type argument, works out to be 0.303. You would take 0.758, add and subtract say two standard errors, and exponentiate the endpoints to get the confidence interval for the odds ratio.
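Using the numbers quoted above (log odds ratio 0.758, standard error 0.303), the interval calculation looks like:

```python
import math

log_or = 0.758   # log of the Mantel-Haenszel odds ratio from the lecture
se = 0.303       # its standard error, also from the lecture

# Build the interval on the log scale, then exponentiate the endpoints.
# (The lecture says "say two standard errors"; 1.96 is the usual 95% multiplier.)
lower = math.exp(log_or - 2 * se)
upper = math.exp(log_or + 2 * se)
point = math.exp(log_or)  # recovers roughly the 2.13 odds ratio quoted above
```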

That's one reason to do some sort of stratified estimate. It's also often the case that we think of the center as, at some level, a random effect.

If we're willing to think of these centers as sort of a random draw from a population of centers, then it still makes a lot of sense to combine them, because we're not so much interested in center-specific effects. We don't care if the treatment works at center number 1; what we care about is whether it works overall. And it turns out that even if the center isn't just a confounder, but really modifies the effect of the treatment (some places are just better at doing the treatment than others), if you're willing to make the assumption that the centers are a random draw from a population of centers, then this estimate makes a lot of sense: averaging across centers, here's the effect of the treatment. That's another instance where you would consider doing something like a Mantel Haenszel common odds ratio estimate. This is what is often called the common odds ratio, the common odds ratio across centers.

Now we want to test whether the common odds ratio is equal to 1. The test is usually stated with the null hypothesis that all of the strata-specific odds ratios are equal, and equal to 1, versus the alternative. There's some amount of dispute over the alternative, but I'm going to teach it this way: the alternative is that they're all equal, but not equal to one. Okay? So notice this is different from the alternative that they're not even necessarily equal.

So here we're assuming that we have a common odds ratio under both the null and the alternative, and we just want to test whether that common-across-strata odds ratio is 1 or not.

It turns out that the CMH test applies to other alternatives, but it's more powerful for the particular alternative given above.

I'll also mention that this test is exactly the same as testing for conditional independence of response and exposure given the stratifying variable. The way this Cochran Mantel Haenszel test is executed is that it conditions on the rows and columns for each of the contingency tables, exactly like Fisher's exact test, resulting in hypergeometric distributions and leaving only the upper left-hand cell of each table free, just exactly like we did in Fisher's exact test, only this time doing it for each stratum-specific table. So let me go through the mechanics now.

So under this conditioning, and under the null hypothesis, the expected value of the upper left-hand cell of each table is this value, and the variance of the upper left-hand cell is this value. So the Cochran Mantel Haenszel test statistic works out to be this guy right here: the sum of the deviations of the upper left-hand cells from their expected values, summed up and then squared, unlike the chi-squared test, where they're squared and then summed up. So: summed up and then squared. And then, regardless of how many tables you have, under the null hypothesis this is a chi-squared with one degree of freedom.
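A sketch of those mechanics in code, using the standard conditional moments E(n11k) = n1+k n+1k / nk and Var(n11k) = n1+k n2+k n+1k n+2k / (nk^2 (nk - 1)) from the hypergeometric distribution (the tables here are made up for illustration):

```python
# Cochran-Mantel-Haenszel statistic (no continuity correction):
# sum the deviations of the upper-left cells from their expected values,
# square the SUM, and divide by the summed variances. Note the order:
# summed up and then squared, unlike the chi-squared test.

def cmh_statistic(tables):
    dev_sum = 0.0
    var_sum = 0.0
    for (n11, n12), (n21, n22) in tables:
        n = n11 + n12 + n21 + n22
        row1, row2 = n11 + n12, n21 + n22   # row margins
        col1, col2 = n11 + n21, n12 + n22   # column margins
        expected = row1 * col1 / n
        variance = row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
        dev_sum += n11 - expected
        var_sum += variance
    return dev_sum ** 2 / var_sum  # chi-squared, 1 df, under the null

tables = [
    [[10, 5], [4, 12]],   # hypothetical stratum 1
    [[8, 7], [6, 9]],     # hypothetical stratum 2
]
stat = cmh_statistic(tables)
```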

So remember, this is a different test from the chi-squared test we talked about in the previous lecture; that's why it's a different test statistic. And the idea of testing conditional independence, of testing for this confounder relationship, is a different idea, and that's why you get a different test statistic. I think it's a little bit beyond the scope of this class to derive the CMH test statistic. But the idea is that you want to test whether the odds ratio is one, given that it's common across strata, versus the common-across-strata odds ratio not being one.

In this case we have eight 2 by 2 tables, and then you can run mantelhaen.test with correct = FALSE. I put correct = FALSE here so that if you do the calculations by hand, you'll get a result agreeing with the R output; you generally want to leave correct = TRUE. The result is a test statistic of 6.38. Compare that to a chi-squared 1 value; again, you only reject for larger values of the test statistic, it being a two-sided test because it's a squared statistic. In this case the p-value is 0.012, so the test presents evidence to suggest that treatment and response are not conditionally independent given center.
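As a sanity check on the numbers quoted above (statistic 6.38 on one degree of freedom), the chi-squared upper-tail probability with 1 df can be written with the complementary error function, p = erfc(sqrt(x / 2)):

```python
import math

# Upper-tail p-value of a chi-squared random variable with 1 degree
# of freedom: P(X > x) = erfc(sqrt(x / 2)).
stat = 6.38
p_value = math.erfc(math.sqrt(stat / 2))
# about 0.0115, which rounds to the 0.012 quoted from the R output
```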

You know, it's possible to perform an analogous test using what I consider to be a little more modern an approach: a random effects logit model. But again, we haven't covered regression, so we can't cover mixed models, and then we can't cover generalized linear mixed models and all the machinery that you would need. It's possible, anyway. I think if you're going to work as a statistician, or use a lot of statistics in your life, you should take the time to build your way up to where you're studying mixed effects models and generalized linear mixed effects models. There you can do exactly the things the CMH test is doing, only in a very general way that allows for other variables to be specified in the model, and so on. The reason for presenting it this way in this class is just to give you a sense of the idea of confounding, and of what you can do in the particular case of 2 by 2 tables. Later on, I'm hoping you'll take some more statistics classes and learn about mixed models in general, and linear mixed models.

It's also worth noting: here we assumed all the odds ratios were equal under the null, versus the alternative that they were all equal but not equal to 1. It's also possible to test whether or not all the odds ratios are equal. There's a test for that, called Woolf's test, and it's a very good test; I don't have time to cover it. The final thing I would mention is that we have these K hypergeometric distributions that we used in the CMH test statistic, so you could probably guess that you can do some sort of exact test in this case. And you can: in R, you can just give exact = TRUE as an argument to mantelhaen.test. You can probably envision how it's done. Imagine that within each stratum you were to do the permutation process that we talked about for Fisher's exact test, recreate the chi-squared statistic each time, and do that over and over again. That simulation would yield an exact p-value. So I think you could come pretty close to doing exactly what the exact = TRUE argument does in R by that permutation process. Of course, in this case they can do the calculations exactly, without Monte Carlo, so it's faster; but conceptually, that's exactly what they're doing.
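A rough sketch of that within-stratum permutation idea (made-up tables and a modest number of resamples; since the CMH statistic depends on the tables only through the upper-left cells, the sketch tracks the sum of n11 across strata):

```python
import random

def sum_upper_left(tables):
    # The CMH statistic is a function of the upper-left cells only,
    # so summing n11 across strata is enough for this sketch.
    return sum(t[0][0] for t in tables)

def permute_stratum(table, rng):
    # Shuffle outcome labels within the stratum while keeping both
    # margins fixed, mimicking the Fisher-exact-style conditioning.
    (n11, n12), (n21, n22) = table
    outcomes = [1] * (n11 + n21) + [0] * (n12 + n22)  # 1 = success
    rng.shuffle(outcomes)
    row1 = n11 + n12                    # treated-group size stays fixed
    new_n11 = sum(outcomes[:row1])
    new_n12 = row1 - new_n11
    new_n21 = (n11 + n21) - new_n11
    new_n22 = (n12 + n22) - new_n12
    return [[new_n11, new_n12], [new_n21, new_n22]]

tables = [[[10, 5], [4, 12]], [[8, 7], [6, 9]]]  # hypothetical strata
rng = random.Random(0)
observed = sum_upper_left(tables)

# Conditional expectation of the summed upper-left cells under the null
expected = sum((t[0][0] + t[0][1]) * (t[0][0] + t[1][0]) /
               (t[0][0] + t[0][1] + t[1][0] + t[1][1]) for t in tables)
obs_dev = (observed - expected) ** 2   # two-sided via squared deviation

n_sims = 2000
count = 0
for _ in range(n_sims):
    permuted = [permute_stratum(t, rng) for t in tables]
    if (sum_upper_left(permuted) - expected) ** 2 >= obs_dev:
        count += 1
p_value = count / n_sims  # Monte Carlo approximation to the exact p-value
```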

Okay, well, that's the end of the lecture. This was a teaser on the idea of confounding, and confounding is probably the biggest obstacle to generating knowledge from observational data, data where you don't have a heavy amount of control over the design of the experiment.

So this gives you a teaser of how that data is analyzed, and there's a tremendous art to analyzing it that I hope, as you learn more statistics, you'll get more refined at. Just like any other art, you can spend a lifetime perfecting your craft and never really finish.
