Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.

From the course by Johns Hopkins University

Mathematical Biostatistics Boot Camp 2

From the lesson

Techniques

This module is a bit of a hodgepodge of important techniques. It includes methods for discrete matched-pairs data as well as some classical nonparametric methods.

- Brian Caffo, PhD, Professor, Biostatistics

Bloomberg School of Public Health

Hi, my name is Brian Caffo. This is Mathematical Biostatistics Boot Camp 2, Lecture 9, on Simpson's paradox and confounding. In this lecture, we're going to talk about a phenomenon called Simpson's paradox. I don't find it to be a paradox, but it's called Simpson's paradox. We'll talk about some examples of Simpson's paradox, like the Berkeley data. Then we'll relate this to the treatment of confounding, and I want to cover a particular way to handle confounding through weighted estimators, and then talk a little bit about the Cochran-Mantel-Haenszel estimator.

Okay, so consider this data right here, which is taken from Agresti's Categorical Data Analysis book. I think I've mentioned this book in the past. In this instance, there was a cross-classification of defendants from criminal trials; these are all murder trials. They cross-classify by the race of the victim versus the race of the defendant, and then by whether or not the person got the death penalty. Here I present all of the possible cells plus all the possible margins. So, for example, here you see the eight cells that classify victim's race, defendant's race, and death penalty, where we are only considering two race categories, white and black. Then here I have the margin for the defendant, white versus black, summed over victim's race. And here I have the margin for the victim, white versus black, summed over the defendant's race.

Okay, so let's actually investigate this. We're looking at the percentage of people that got the death penalty. If you look, white defendants receive the death penalty a smaller percentage of the time than black defendants, for both white victims and black victims. For white victims, it's 11% versus 22%. For black victims, zero of the white defendants received the death penalty versus 2.8% of the black defendants. Okay? But then something kind of paradoxical occurs. If you disregard the race of the victim, it actually comes out that white defendants receive the death penalty a greater percentage of the time, 11% to 8%, okay?

And then if you look at the race of the victim, disregarding the race of the defendant, in the instance where the victim was white, the defendant received the death penalty a higher percentage of the time, 12% to 2.5%. But let's forget this last margin, the race of the victim, and just compare the table itself with the defendant margin. In the table, in both cases, the white defendants got the death penalty a smaller percentage of the time than the black defendants, regardless of the race of the victim. In the margin, here, 11% to 8%, the white defendants got the death penalty a greater percentage of the time than black defendants.
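To make the comparison concrete, here is a minimal Python sketch that recomputes the conditional and marginal percentages. The cell counts are an assumption: they are not given in this transcript and are recalled from the death penalty table in Agresti's book, chosen because they reproduce the percentages quoted above. The names `counts` and `death_penalty_pct` are illustrative only.

```python
# ASSUMPTION: these counts are not in the lecture transcript. They are
# recalled from the death-penalty table in Agresti's Categorical Data
# Analysis and should be checked against the book before being relied on.
# Keys: (victim_race, defendant_race) -> (death_penalty_yes, death_penalty_no)
counts = {
    ("white", "white"): (53, 414),
    ("white", "black"): (11, 37),
    ("black", "white"): (0, 16),
    ("black", "black"): (4, 139),
}


def death_penalty_pct(cells):
    """Percentage receiving the death penalty, pooled over the given cells."""
    yes = sum(y for y, _ in cells)
    total = sum(y + n for y, n in cells)
    return 100.0 * yes / total


# Conditional percentages: one per (victim race, defendant race) cell pair.
conditional = {key: death_penalty_pct([cell]) for key, cell in counts.items()}

# Marginal percentages by defendant's race, summed over victim's race.
by_defendant = {
    d: death_penalty_pct([c for (_, dd), c in counts.items() if dd == d])
    for d in ("white", "black")
}

# Marginal percentages by victim's race, summed over defendant's race.
by_victim = {
    v: death_penalty_pct([c for (vv, _), c in counts.items() if vv == v])
    for v in ("white", "black")
}

for (victim, defendant), pct in conditional.items():
    print(f"victim={victim:5s} defendant={defendant:5s} {pct:5.1f}%")
for defendant, pct in by_defendant.items():
    print(f"defendant={defendant:5s} (all victims)    {pct:5.1f}%")
for victim, pct in by_victim.items():
    print(f"victim={victim:5s} (all defendants)    {pct:5.1f}%")
```

Under these assumed counts, the conditional percentages come out at roughly 11.3% versus 22.9% for white victims and 0% versus 2.8% for black victims, while the defendant margin is roughly 11.0% versus 7.9%, so the direction of the comparison flips once victim's race is summed out.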

So what's happening? This was related to a court case about whether or not the death penalty was being equally applied, so what would you conclude? If you condition on the race of the victim, you get the opposite answer than if you look at the race of the defendant disregarding the race of the victim. So what is the right answer?

So let's just discuss a little bit about what's going on. Marginally, white defendants received the death penalty a greater percentage of the time than black defendants. Within both white and black victims, black defendants received the death penalty a greater percentage of the time than white defendants. Simpson's paradox refers to the fact that marginal and conditional associations can be opposing. In this case, if you take the margin across victim's race, you get a different answer than if you condition on victim's race. Here, the death penalty was imposed more often for the murder of a white victim than of a black victim. And whites tend to kill whites, just demographically, hence the larger marginal association.
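One way to see why the demographics matter is to write each marginal rate as a weighted average of the conditional rates. This is a sketch in notation introduced here, not taken from the lecture: D is receiving the death penalty, "def" is the defendant's race, and "vic" is the victim's race. The numerical check uses cell counts recalled from Agresti's table, which are an assumption and do not appear in this transcript.

```latex
% Law of total probability: the marginal rate for defendants of race d
% is a weighted average of the rates within each victim's race v.
\begin{align*}
P(D \mid \text{def} = d)
  &= \sum_{v \in \{\text{white},\, \text{black}\}}
     P(D \mid \text{def} = d,\ \text{vic} = v)\,
     P(\text{vic} = v \mid \text{def} = d) \\[6pt]
% Numerical check under the ASSUMED counts (not given in the lecture):
P(D \mid \text{def} = \text{white})
  &= \frac{53}{467}\cdot\frac{467}{483} + \frac{0}{16}\cdot\frac{16}{483}
   = \frac{53}{483} \approx 0.110 \\[6pt]
P(D \mid \text{def} = \text{black})
  &= \frac{11}{48}\cdot\frac{48}{191} + \frac{4}{143}\cdot\frac{143}{191}
   = \frac{15}{191} \approx 0.079
\end{align*}
```

Under these assumed counts, about 97% of the weight for white defendants (467 of 483) sits on the white-victim rate, where the death penalty was more common, while only about 25% of the weight for black defendants (48 of 191) does. The weights differ by defendant's race, which is how the marginal comparison can point the opposite way from both conditional comparisons.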

But I want to do a little bit of commentary before I go through more examples; I'm going to cover several examples. First of all, when you state Simpson's paradox in the following way, it doesn't seem that paradoxical at all: the apparent relationship between two variables can change when factoring in a third variable. That seems obvious. Of course that's true. It just seems difficult when you start to get mired in the specifics. Later on I'll go through the math to show that there's nothing paradoxical about the mathematics of Simpson's paradox.

Larry Wasserman, on his blog, The Normal Deviate, has the most wonderful discussion of why Simpson's paradox is difficult and what mistake people are making. The mistake people are making is that they're equating the causal statements with the probabilistic statements. The probabilistic statements can be misleading and paradoxical, and you're trying to draw difficult conclusions in the light of noisy evidence. In addition, even if you knew the exact probabilities, the probabilities themselves can exhibit the paradox. However, the causal statements cannot exhibit the paradox. Okay?

So the problem, and I think this is his point, is that the real confusion is equating the causal statement with the association. In this case, the causal statement would be, for example, that juries tend to causally convict black defendants more often than they convict white defendants. If you were to make that causal statement, then it is impossible for marginal and conditional causal statements to disagree. For the real details of this, I put up the link to his blog post, which is wonderful. But the real details of investigating the causal statements are beyond the scope of this class. We're not going to cover causal inference in this class, but I think it was a great discussion on his part to demonstrate that it is this conflation of cause with describing probabilities and associations that is the apparent paradox. Mathematically there is no paradox, and I think when we go through it you will see.

Â Coursera provides universal access to the worldâ€™s best education, partnering with top universities and organizations to offer courses online.