Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.

From the course by Johns Hopkins University

Mathematical Biostatistics Boot Camp 2

From the lesson

Techniques

This module is a bit of a hodgepodge of important techniques. It includes methods for discrete matched-pairs data as well as some classical non-parametric methods.

- Brian Caffo, PhD, Professor, Biostatistics

Bloomberg School of Public Health

So it's not enough just to have a testing procedure; we'd also like to have some sort of confidence interval. Let's let pi ij hat be the sample cell proportions, and imagine we want to estimate d, the difference in the marginal proportions. In this case, that would be the difference in the marginal probability of an approve vote. The estimate, pi 1+ hat minus pi +1 hat, is equal to (n12 minus n21) over n, so that estimates the difference in the marginal proportions.

We talked on the previous slide about the variance of this estimator under the null hypothesis. Let's talk about the variance of the estimator in general, and it works out to have this form: pi 1+ (1 minus pi 1+) plus pi +1 (1 minus pi +1), all divided by n, which is the difference-in-binomials type variance you would expect to see. But because the samples are correlated, we also have a correlation term: minus twice (pi 11 pi 22 minus pi 12 pi 21), again divided by n. So that's subtracting out the correlation.
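To spell out where that correlation term comes from, here is a sketch of the covariance calculation; this derivation is my reconstruction from standard multinomial moments (Cov of two distinct cell proportions is minus pi ij pi kl over n), not something shown in the lecture:

```latex
\begin{align*}
\hat{d} &= \hat{\pi}_{1+} - \hat{\pi}_{+1},\qquad
\hat{\pi}_{1+} = \hat{\pi}_{11} + \hat{\pi}_{12},\qquad
\hat{\pi}_{+1} = \hat{\pi}_{11} + \hat{\pi}_{21},\\
n\,\mathrm{Cov}(\hat{\pi}_{1+}, \hat{\pi}_{+1})
  &= \pi_{11}(1-\pi_{11}) - \pi_{11}\pi_{21} - \pi_{12}\pi_{11} - \pi_{12}\pi_{21}\\
  &= \pi_{11}(1 - \pi_{11} - \pi_{12} - \pi_{21}) - \pi_{12}\pi_{21}
   = \pi_{11}\pi_{22} - \pi_{12}\pi_{21},\\
\mathrm{Var}(\hat{d})
  &= \frac{\pi_{1+}(1-\pi_{1+}) + \pi_{+1}(1-\pi_{+1})
     - 2\,(\pi_{11}\pi_{22} - \pi_{12}\pi_{21})}{n}.
\end{align*}
```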

What would happen if there were a lot of counts in the off-diagonal cells, pi 12 and pi 21? Then pi 12 times pi 21 would be a big number, the covariance term (pi 11 pi 22 minus pi 12 pi 21) would be negative, and subtracting twice it would result in a larger variance. If instead the off-diagonal cells are really small and most of the data lie on the main diagonal, then pi 11 times pi 22 would be very large, we'd have minus twice that number, and we'd wind up with a much smaller variance than the standard difference-in-binomials variance.

So we could take d hat minus the true difference in proportions, divided by the standard error estimate, and that follows an asymptotic normal distribution; we can use that again to create confidence intervals. I hope everyone at this point in the class could do something like that.
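As a concrete sketch of that interval, here is the calculation in Python. The function name and the example table (a hypothetical survey asking the same 1600 people "approve?" at two time points) are my own illustrations, not from the lecture:

```python
import math

def marginal_diff_ci(n11, n12, n21, n22):
    """95% Wald CI for d = pi_{1+} - pi_{+1} in a matched 2x2 table.
    Rows index occasion 1, columns occasion 2; name is illustrative."""
    n = n11 + n12 + n21 + n22
    p11, p12, p21, p22 = n11 / n, n12 / n, n21 / n, n22 / n
    p1_plus = p11 + p12          # row margin, pi_hat_{1+}
    p_plus1 = p11 + p21          # column margin, pi_hat_{+1}
    d = p1_plus - p_plus1        # equals (n12 - n21) / n

    # Variance with the matched-pairs covariance correction:
    # [pi1+(1-pi1+) + pi+1(1-pi+1) - 2(pi11*pi22 - pi12*pi21)] / n
    var = (p1_plus * (1 - p1_plus) + p_plus1 * (1 - p_plus1)
           - 2 * (p11 * p22 - p12 * p21)) / n
    se = math.sqrt(var)
    z = 1.96                     # standard normal 97.5% quantile
    return d, d - z * se, d + z * se

# Hypothetical matched data: same people asked on two occasions.
d, lo, hi = marginal_diff_ci(794, 150, 86, 570)
print(d, lo, hi)
```

The interval excluding zero would suggest a real shift in the marginal approval probability between the two occasions.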

The last bullet point here says to compare sigma d to what we would use if the proportions were independent. That is, compare the result to what we'd get if, instead of asking the same people on two occasions whether or not they approve, we asked a different set of people each time. Then this minus-twice part would go away.

But what do we think would happen? We think that people who approve on the first occasion would be more likely to approve on the second occasion. If you're in the U.S. and you're a Democrat, and you approve of, say, President Obama on the first question, you'd be more likely to approve at the second time point. The same goes for people who disapprove: if you're a Republican and you disapproved at the first time point, you'd be more likely to disapprove at the second time point. That's a very frequent form of correlation, where the measurements tend to be concordant; they tend to agree. And that is exactly the case where pi 11 times pi 22 will be much larger than pi 12 times pi 21.

In other words, things will tend to lie on the main diagonal of that matched 2 by 2 table, in that people will tend to agree. If that's the case, this covariance term will be positive, so we'll have minus twice a positive number, and you'll get a dramatic reduction in the variance.

So failing to account for the fact that the same people were asked twice would, in this case, be a really unwise thing to do: you'd have a reduction in precision, and you'd get a much wider confidence interval if you failed to do that.
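To see that precision point numerically, we can compare the matched-pairs standard error against the one we would wrongly use if we pretended the two margins came from independent samples. The counts below are a hypothetical illustration, not data from the lecture:

```python
import math

# Hypothetical matched 2x2 table: same 1600 people asked twice.
n11, n12, n21, n22 = 794, 150, 86, 570
n = n11 + n12 + n21 + n22
p11, p12, p21, p22 = n11 / n, n12 / n, n21 / n, n22 / n
p1_plus, p_plus1 = p11 + p12, p11 + p21

# Matched-pairs variance: includes minus twice the covariance term.
var_matched = (p1_plus * (1 - p1_plus) + p_plus1 * (1 - p_plus1)
               - 2 * (p11 * p22 - p12 * p21)) / n

# Naive variance, treating the two margins as independent samples:
# the covariance term simply goes away.
var_naive = (p1_plus * (1 - p1_plus) + p_plus1 * (1 - p_plus1)) / n

print(math.sqrt(var_matched), math.sqrt(var_naive))
# With concordant data (most mass on the main diagonal), the matched
# SE is much smaller, so ignoring the pairing widens the interval.
```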

It's interesting in general, though: even if accounting for the dependency resulted in a wider interval, you'd still want to do it, because that will give you the correct interval rather than one that's based on completely incorrect assumptions.
