Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.


From the course by Johns Hopkins University

Mathematical Biostatistics Boot Camp 2


From the lesson

Techniques

This module is a bit of a hodgepodge of important techniques. It includes methods for discrete matched-pairs data as well as some classical nonparametric methods.

- Brian Caffo, PhD, Professor, Biostatistics

Bloomberg School of Public Health

Okay, so let's go through our example. Our estimate d of the difference in the marginal proportions works out to be 0.04. You can plug into the formula for sigma hat d squared; taking the square root, sigma hat d (not sigma hat d squared) works out to be about 0.0095. Then the confidence interval is the estimated difference plus or minus two standard errors, which gives about 0.02 to 0.06. And notice what happens if you ignore the dependence and just chop off the covariance term: you wind up with a substantially inflated standard error. Sigma hat d then works out to be about 0.0175, nearly twice as large.

So there is an interesting relationship between the Cochran–Mantel–Haenszel test and the matched 2 by 2 table. Imagine if you took each pair, represented their two times, first and second, and recorded their responses, yes or no.
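The difference d, its standard error, and the interval quoted at the start of this example can be sketched as follows. This is a minimal sketch assuming the usual Wald form of the variance, sigma hat d squared = [p1+(1 − p1+) + p+1(1 − p+1) − 2(p11 p22 − p12 p21)] / n, and assuming the full table is 794, 150, 86, 570. Only the off-diagonal counts 150 and 86 are quoted in this lecture excerpt. The diagonal counts 794 and 570 are an assumption taken from the classic presidential approval example, and they reproduce the figures quoted above. The helper name is hypothetical.

```python
import math


def marginal_difference_ci(n11, n12, n21, n22, z=2.0):
    """Wald CI for d = p1+ - p+1 from a matched 2x2 table (hypothetical helper).

    Rows are the first survey, columns the second; index 1 = approve, 2 = disapprove.
    """
    n = n11 + n12 + n21 + n22
    p11, p12, p21, p22 = (count / n for count in (n11, n12, n21, n22))
    p1_plus = p11 + p12  # approval at time 1
    p_plus1 = p11 + p21  # approval at time 2
    d = p1_plus - p_plus1
    binomial_part = p1_plus * (1 - p1_plus) + p_plus1 * (1 - p_plus1)
    # The same people answer twice, so subtract twice the covariance of the two margins
    covariance_part = 2 * (p11 * p22 - p12 * p21)
    se = math.sqrt((binomial_part - covariance_part) / n)
    # Ignoring the dependence: drop the covariance term entirely
    se_ignoring_dependence = math.sqrt(binomial_part / n)
    return d, se, (d - z * se, d + z * se), se_ignoring_dependence


# Assumed counts: 794 and 570 are not quoted in this lecture excerpt
d, se, ci, se_indep = marginal_difference_ci(794, 150, 86, 570)
print(f"d = {d:.3f}, SE = {se:.4f}, CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"SE ignoring dependence = {se_indep:.4f}")
```

With these counts the sketch gives d = 0.04, a standard error of about 0.0095, and about 0.0175 once the covariance term is dropped. The default z = 2 follows the "plus or minus two standard errors" rule used in the lecture.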

And so we can really think of this as an extremely stratified setting, right? Every stratum just has the two measurements, first and second.

If you do that, then there are only four possible tables, with time (first, second) and response (yes, no): yes at both times, yes then no, no then yes, and no at both times. So imagine you represented all the pairs with tables like these. I hope you can agree with me that if you knew how many subjects had each of these four tables, you would exactly reproduce the matched 2 by 2 table.

So here's a famous old result: McNemar's test is equivalent to the Cochran–Mantel–Haenszel test where the subject is the stratifying variable and each 2 by 2 table is the per-subject table from the previous slide. I note here that this representation is only interesting for conceptual purposes.
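The equivalence is easy to check numerically. The sketch below uses hypothetical helper names and the same assumed counts 794, 150, 86, 570, of which only 150 and 86 are quoted in this lecture. It expands the matched table into one 2 by 2 table per subject, tallies the four table types back into the matched table, and computes the CMH statistic without a continuity correction. It then compares that statistic with McNemar's statistic (n12 − n21)²/(n12 + n21).

```python
from collections import Counter


def expand_to_subject_tables(n11, n12, n21, n22):
    """One 2x2 table per subject: rows = (first, second), columns = (yes, no)."""
    yes, no = (1, 0), (0, 1)
    cells = {(yes, yes): n11, (yes, no): n12, (no, yes): n21, (no, no): n22}
    tables = []
    for (first, second), count in cells.items():
        tables.extend([(first, second)] * count)
    return tables


def tally(tables):
    """Count the four possible subject tables to rebuild the matched 2x2 table."""
    label = {(1, 0): 1, (0, 1): 2}  # 1 = yes, 2 = no
    return Counter((label[first], label[second]) for first, second in tables)


def cmh_statistic(tables):
    """Cochran-Mantel-Haenszel statistic (no continuity correction) over the strata."""
    numerator, variance = 0.0, 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d                  # two observations per subject
        row1, row2 = a + b, c + d          # one observation at each time
        col1, col2 = a + c, b + d          # total yes / total no for this subject
        numerator += a - row1 * col1 / n   # observed minus expected "yes at first"
        variance += row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
    return numerator ** 2 / variance


def mcnemar_statistic(n12, n21):
    """McNemar's statistic without a continuity correction."""
    return (n12 - n21) ** 2 / (n12 + n21)


# Assumed counts: the diagonal values 794 and 570 are not quoted in this lecture
tables = expand_to_subject_tables(794, 150, 86, 570)
print(dict(tally(tables)))
print(cmh_statistic(tables), mcnemar_statistic(150, 86))
```

Concordant subjects have a yes/no column total of 0 or 2, so they contribute nothing to either sum. Each discordant subject contributes ±1/2 to the numerator and 1/4 to the variance. The CMH statistic therefore reduces algebraically to (n12 − n21)²/(n12 + n21).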

But I think it is interesting to note that with matched pairs you can view each subject as its own stratum, an incredibly stratified circumstance with only two counts per table. If you analyze the data that way with the Cochran–Mantel–Haenszel test, you wind up with exactly McNemar's test. It's just a conceptually neat, fun little fact. Another fun little fact is that McNemar's test has an exact version.

So consider only the off-diagonal cells, the discordant cells. Under the null hypothesis, π12/(π12 + π21) = 0.5, right? Just look back at the null hypothesis: if the two off-diagonal probabilities are equal, then either one divided by their sum is 0.5. It then turns out that, also under H0, n21 given the sum n12 + n21 (or equivalently n12 given the sum) is binomial with success probability 0.5 and n12 + n21 trials.
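One way to see the conditional binomial result is sketched below. It assumes the n pairs are multinomial over the four cells with probabilities π11, π12, π21, π22. Given that m pairs are discordant, each discordant pair independently lands in cell (2, 1) with probability π21/(π12 + π21):

```latex
P(n_{21} = k \mid n_{12} + n_{21} = m)
  = \binom{m}{k}
    \left(\frac{\pi_{21}}{\pi_{12} + \pi_{21}}\right)^{k}
    \left(\frac{\pi_{12}}{\pi_{12} + \pi_{21}}\right)^{m - k}
  \;\overset{H_0:\ \pi_{12} = \pi_{21}}{=}\;
  \binom{m}{k} \left(\frac{1}{2}\right)^{m}
```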

So you can use this to get an exact p-value for matched pairs data. Basically, we're saying that under the null hypothesis the two off-diagonal probabilities are identical. So whether a discordant pair lands in the upper right-hand cell or the lower left-hand cell is a coin flip. We would have evidence against the null if many more pairs wind up in one of those two cells than in the other. This is actually highly related to the so-called nonparametric sign test, which we'll cover as well. What you're saying is that, under the null hypothesis, the two kinds of disagreement should be exchangeable: approving then disapproving is just as likely as disapproving then approving.

Consider H0: π21 = π12 versus Ha: π21 < π12. I put in parentheses that this is equivalent to π+1 < π1+, since π1+ = π11 + π12 and π+1 = π11 + π21. Here π+1 is the approval at time 2 and π1+ is the approval at time 1, each disregarding the other time. So this tests whether the approval at time 2 is lower than the approval at time 1; that's the direction the marginal comparison is looking at. We saw 86 people who disapproved on the first survey and approved on the second survey, the n21 cell, and we want to test whether that's smaller than what we would expect by chance. The p-value is the probability of getting data as or more extreme in favor of the alternative, so the probability that X is less than or equal to 86. Because we're doing the exact version, we condition on the number of off-diagonal counts, 86 + 150 = 236, and use a binomial with success probability 0.5. That probability is essentially 0, so we reject the null hypothesis. For a two-sided test, just double the smaller of the two one-sided p-values. That's fine for the purposes of this class; if you do it in R, it may use a slightly better procedure.
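The exact calculation just described can be sketched in a few lines. The helper name is hypothetical. The two-sided value follows the "double the smaller one-sided p-value" rule from the lecture, which is not necessarily the rule any particular software package uses.

```python
from math import comb


def exact_mcnemar(n12, n21):
    """Exact conditional McNemar test (hypothetical helper).

    Under H0 (pi12 == pi21), n21 given m = n12 + n21 is Binomial(m, 0.5).
    Returns the one-sided p-value for Ha: pi21 < pi12 and a two-sided p-value
    obtained by doubling the smaller one-sided p-value.
    """
    m = n12 + n21
    total = 2 ** m  # each of the 2**m coin-flip sequences is equally likely
    p_lower = sum(comb(m, k) for k in range(n21 + 1)) / total      # P(X <= n21)
    p_upper = sum(comb(m, k) for k in range(n21, m + 1)) / total   # P(X >= n21)
    p_two_sided = min(1.0, 2 * min(p_lower, p_upper))
    return p_lower, p_two_sided


p_one, p_two = exact_mcnemar(n12=150, n21=86)
print(f"one-sided p = {p_one:.2e}, two-sided p = {p_two:.2e}")
```

The tail sums are computed with exact integer arithmetic from `math.comb`, so only the final division is rounded. The one-sided value is far below any usual significance level, which matches the "essentially 0" above.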

The marginal odds ratio compares the odds of approval at time 1 to the odds of approval at time 2. Here I put time 1 in the numerator of the odds ratio and time 2 in the denominator: the odds of approval at time 1 divided by the odds of approval at time 2. It's a marginal odds ratio because these are all marginal probabilities, and it is of interest in exactly the same way the difference in the marginal probabilities is of interest.

But it's a different setting, right? It's different from sampling some people at time 1 and a different set of people at time 2, where we could assume the two samples are independent. These are exactly the same people sampled twice, so we need to account for that. At any rate, just as with the ordinary odds ratio, we first directly calculate the marginal log odds ratio, given by theta hat here. The variance of that estimate, whose square root is the standard error, is given by this expression right here. You put hats over everything and estimate the probabilities with the relevant sample proportions to get the estimated standard error. You can then use that to create a confidence interval for the marginal log odds ratio with matched pairs data, that is, matched 2 by 2 data.

Okay. So in the approval rating example, the marginal odds ratio compares the odds of approval at time 1 to the odds of approval at time 2. The log odds ratio works out to be 0.16 and the standard error works out to be 0.039. The confidence interval for the log odds ratio is then 0.16 plus or minus about two (more precisely 1.96) standard errors, which gives about 0.084 to 0.236. You compare these limits to 0, because everything is on the log scale. Then exponentiate the limits if you want the confidence interval for the marginal odds ratio rather than the marginal log odds ratio.
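A sketch of this interval follows. The variance used here is the delta-method formula for the difference of two marginal logits, including the covariance term for the pairing. It is assumed to match the slide's expression, and it reproduces the quoted standard error of 0.039. The counts are the same assumed table as before (794, 150, 86, 570), and the helper name is hypothetical.

```python
import math


def marginal_log_odds_ratio_ci(n11, n12, n21, n22, z=1.96):
    """Marginal log odds ratio, odds of approval at time 1 over time 2 (hypothetical helper)."""
    n = n11 + n12 + n21 + n22
    p11, p12, p21, p22 = (count / n for count in (n11, n12, n21, n22))
    p1_plus, p2_plus = p11 + p12, p21 + p22   # time-1 margins: approve, disapprove
    p_plus1, p_plus2 = p11 + p21, p12 + p22   # time-2 margins: approve, disapprove
    theta = math.log((p1_plus / p2_plus) / (p_plus1 / p_plus2))
    # Delta-method variance; the last term accounts for the pairing
    var = (1 / (p1_plus * p2_plus) + 1 / (p_plus1 * p_plus2)
           - 2 * (p11 * p22 - p12 * p21) / (p1_plus * p2_plus * p_plus1 * p_plus2)) / n
    se = math.sqrt(var)
    lower, upper = theta - z * se, theta + z * se
    # Exponentiate the limits to get an interval for the odds ratio itself
    return theta, se, (lower, upper), (math.exp(lower), math.exp(upper))


# Assumed counts: 794 and 570 are not quoted in this lecture excerpt
theta, se, log_ci, or_ci = marginal_log_odds_ratio_ci(794, 150, 86, 570)
print(f"log OR = {theta:.3f}, SE = {se:.3f}, log-scale CI = ({log_ci[0]:.3f}, {log_ci[1]:.3f})")
print(f"odds ratio CI = ({or_ci[0]:.3f}, {or_ci[1]:.3f})")
```

The unrounded estimate is about 0.163, so the sketch's interval (roughly 0.087 to 0.240) sits slightly to the right of the 0.084 to 0.236 quoted above, which was built from the rounded value 0.16.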

I bring this up because several of you will have seen a different formula for the odds ratio for matched 2 by 2 tables, and I want to cover the one you've seen. There's a difference: one of them is a conditional odds ratio, and the other is a marginal odds ratio. So imagine we created a logit model for our approval rating data. In it, the logit of the probability that person i says yes at time 1 is alpha + U_i, and the logit of the probability that person i says yes at time 2 is alpha + gamma + U_i. U_i is a person-specific effect, alpha is common across both times, and gamma is the log odds ratio comparing the approval for a given person at time 2 to time 1.

Notice you have to compare the same person, because otherwise the U_i's would not cancel out when you take the difference of the two logits. Each U_i is a person-specific effect. A person with a large U_i is likely to answer yes on both occasions, and a person with a small or negative U_i is likely to answer no on both occasions. So gamma is the log odds ratio comparing the odds of a yes at time 2 to the odds of a yes at time 1 for the same person. In this sense gamma is a subject-specific effect: you can only interpret gamma within a person, where the U_i's cancel out. That's where the so-called conditional formula for the odds ratio comes from.

One way to eliminate U_i is to use a so-called conditional estimator, which conditions on the total number of yes responses for each person.

What you wind up with is, again, only the discordant cells. The conditional ML estimator of this log odds ratio turns out to be the log of the ratio of the off-diagonal counts, log(n21/n12). Its standard error turns out to be the square root of the sum of the reciprocals of the off-diagonal counts, sqrt(1/n21 + 1/n12). I think people prefer this because it's a simpler formula, but notice it has a very different interpretation. In one case we compared the marginal probabilities, averaging across people. In the other case we had person-specific random effects that had to cancel out, so we conditioned on people. So one is called the marginal odds ratio, with its confidence interval, and this one is called the subject-specific odds ratio, with its confidence interval. The difference in interpretation is extremely subtle, but it still exists, and that's why you get different answers.
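A minimal sketch of the conditional (subject-specific) estimate, using only the discordant counts quoted in the lecture, n12 = 150 and n21 = 86. The helper name is hypothetical, and the 1.96 multiplier is an assumption for a 95% Wald interval.

```python
import math


def conditional_log_odds_ratio(n12, n21, z=1.96):
    """Conditional ML estimate of gamma, time 2 vs time 1 (hypothetical helper).

    Only the discordant counts enter: n21 = no then yes, n12 = yes then no.
    """
    gamma_hat = math.log(n21 / n12)
    se = math.sqrt(1 / n21 + 1 / n12)
    return gamma_hat, se, (gamma_hat - z * se, gamma_hat + z * se)


gamma_hat, se, ci = conditional_log_odds_ratio(n12=150, n21=86)
print(f"gamma_hat = {gamma_hat:.3f}, SE = {se:.3f}, CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

This gives gamma hat of about −0.556 with a standard error of about 0.135. Flipping the sign to compare time 1 with time 2 gives about 0.556. That is well above the marginal log odds ratio of 0.16, which illustrates that the two estimands really are different.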

So let me just summarize. The marginal ML estimate has a marginal interpretation: the effect is averaged over all the U_i values, if you want to relate it back to the same model. The conditional ML estimate has a subject-specific interpretation.

So if you ask me when you would want to use one versus the other, I think that for policy-type questions you would want marginal statements. For clinical-type questions, you probably want subject-specific statements. But it's not perfectly clear-cut.

Nonetheless, that's where the difference comes from: the logit is not a linear function. Averaging probabilities over people and then creating odds ratios is not the same as creating odds ratios and then averaging them, so you just get different answers. That's the difference between the two. It's a very subtle thing, and for the purposes of this class you can ignore it. I just wanted to present it in case you are among the people who have seen the formula log(n21/n12), with its standard error. The reason it's different is that we're taking a different approach. I use the marginal approach, especially in this class, because everything we've discussed about matched 2 by 2 tables is marginal. McNemar's test, the exact version of McNemar's test, and the marginal odds ratio all concern the marginal probabilities. So if you're okay with that, just leave it. But if you need to know why this differs from the formula you saw before, perhaps in an epi class or another biostat class, that's the reason: it's a different formulation.
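To see why the nonlinearity of the logit matters, here is a small numerical illustration under the logit model described earlier. The values alpha = 0, gamma = 1, and the five person effects U_i are hypothetical and not taken from the approval data. Within each person the log odds ratio is exactly gamma. Averaging the probabilities over people first and then forming the odds ratio gives a different, smaller marginal value.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1 / (1 + math.exp(-x))


def logit(p):
    return math.log(p / (1 - p))


# Hypothetical parameters, not estimated from any data
alpha, gamma = 0.0, 1.0
u = [-3.0, -1.0, 0.0, 1.0, 3.0]  # person-specific effects U_i

# Subject-specific: compare the same person at time 2 vs time 1, so U_i cancels
subject_log_ors = [logit(expit(alpha + gamma + ui)) - logit(expit(alpha + ui)) for ui in u]

# Marginal: average the probabilities over people first, then form the odds ratio
p_time1 = sum(expit(alpha + ui) for ui in u) / len(u)
p_time2 = sum(expit(alpha + gamma + ui) for ui in u) / len(u)
marginal_log_or = logit(p_time2) - logit(p_time1)

print("subject-specific log ORs:", [round(x, 6) for x in subject_log_ors])
print("marginal log OR:", round(marginal_log_or, 3))
```

With these values the marginal log odds ratio comes out at roughly 0.59, while every subject-specific log odds ratio is exactly 1. This is the same direction of difference seen in the approval data, where the conditional estimate was larger in magnitude than the marginal one.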
