1:36

If we hadn't factored in x, this would be the mean for the group that, say, received the treatment, and this would be the mean for the group that got the control. Now there's a pretty clear linear relationship between the outcome and the regressor. So what we could do is fit a model that looks like y = beta0 + beta1 * T + beta2 * x + epsilon, where T is our 0/1 treatment indicator. This would fit two parallel lines: beta1 would represent the change in intercepts between the groups, whereas beta2 would be the common slope across the two groups.
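As a quick numeric illustration of that parallel-lines model, here is a Python sketch (the course itself uses R; the data below are fabricated to lie exactly on two parallel lines, so every number is purely illustrative):

```python
# Least squares via the normal equations X'X b = X'y, solved with
# Gaussian elimination -- a tiny helper, fine for toy design matrices.
def ols(X, y):
    k = len(X[0])
    A = [[sum(r[p] * r[q] for r in X) for q in range(k)] for p in range(k)]
    c = [sum(X[i][p] * y[i] for i in range(len(X))) for p in range(k)]
    for col in range(k):                      # forward elimination with pivoting
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * b for a, b in zip(A[r], A[col])]
            c[r] -= f * c[col]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):            # back substitution
        beta[r] = (c[r] - sum(A[r][j] * beta[j] for j in range(r + 1, k))) / A[r][r]
    return beta

# Fabricated data: control group on y = 1 + 0.5x, treated group on y = 3 + 0.5x.
xs = list(range(5)) * 2
ts = [0] * 5 + [1] * 5
ys = [1 + 0.5 * x for x in range(5)] + [3 + 0.5 * x for x in range(5)]

# Fit y = beta0 + beta1*T + beta2*x, with T the 0/1 treatment indicator.
beta = ols([[1, t, x] for t, x in zip(ts, xs)], ys)
print(beta)  # beta1 = 2.0 is the intercept shift; beta2 = 0.5 is the common slope
```

Because the fake data sit exactly on two parallel lines, the fit recovers beta1 = 2, the vertical gap between the groups at any fixed x, and beta2 = 0.5, the slope they share.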

Â 5:06

clearly didn't randomize, and which model here is the right one to consider is not really up for discussion today. What we're talking about today is how the inclusion of x can change the estimate. Now, which is the right model to consider? That is a different question. For example, an instance where this might occur: let's say treatment is whether or not you're taking some blood pressure medication, okay? And y, your outcome, is your blood pressure. But suppose that the x variable was

Â 5:47

cholesterol, or something highly related to whether or not you would've gotten prescribed this medication to begin with, okay? Then you can see that adjusting for x is really just adjusting for the same sort of thing that would lead you to receive treatment. So again, this is what makes observational data analysis very hard, as opposed to instances where what you're interested in has been randomized. Okay, so just to reiterate this point: this is an example where we had a strong marginal effect when we disregarded x, and a very subtle or non-existent effect when we accounted for x. Let's try some different scenarios.

Â 7:09

There is some direct evidence right there comparing the two groups. But it also is kind of a hard case, because if you look, here's the marginal mean for the red group, and here's the marginal mean for the blue group. There's probably a significant effect here that says the red is higher than the blue. However, if we fit our model and look at the change in the intercepts, what we see is that the blue is higher than the red. So our adjusted estimate is significant and the exact opposite of our unadjusted estimate, okay? And again, during this lecture we're not gonna talk about which one's the right one; we're just gonna talk about how this can occur. Here is a picture where you can see exactly what's happening. This phenomenon is often called Simpson's Paradox: you look at a variable as it relates to an outcome, and that effect reverses itself with the inclusion of another variable. Simpson's Paradox basically just says things can change to the exact opposite when you perform adjustment, which, looking at this picture, is actually not that surprising. It's not a paradox whatsoever.
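The reversal is easy to reproduce numerically. Here is a Python sketch with fabricated numbers chosen only to mimic the picture: the "red" group lies on a lower line but occupies higher x values, so its raw mean is higher, while the x-adjusted comparison flips sign:

```python
# Least squares via the normal equations X'X b = X'y (Gaussian elimination);
# a tiny helper, adequate for this 3-parameter toy fit.
def ols(X, y):
    k = len(X[0])
    A = [[sum(r[p] * r[q] for r in X) for q in range(k)] for p in range(k)]
    c = [sum(X[i][p] * y[i] for i in range(len(X))) for p in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * b for a, b in zip(A[r], A[col])]
            c[r] -= f * c[col]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        beta[r] = (c[r] - sum(A[r][j] * beta[j] for j in range(r + 1, k))) / A[r][r]
    return beta

# Fabricated Simpson's-paradox data:
# blue (t=0) on y = 1 + x at x = 0..4; red (t=1) on y = x at x = 5..9.
xs = list(range(5)) + list(range(5, 10))
ts = [0] * 5 + [1] * 5
ys = [1 + x for x in range(5)] + [x for x in range(5, 10)]

marginal = sum(ys[5:]) / 5 - sum(ys[:5]) / 5   # red mean minus blue mean
beta = ols([[1, t, x] for t, x in zip(ts, xs)], ys)
print(marginal, beta[1])  # marginal says red is 4 higher; adjusted beta1 = -1
```

The unadjusted comparison (+4, red above blue) and the adjusted coefficient (-1, blue above red at any fixed x) point in opposite directions, exactly the reversal in the plot.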

Â 8:28

All right, let's try some other examples. Again, the next slide's just gonna reiterate these points. In this example, there's basically no marginal effect; however, there's a huge effect when we adjust for x. Before, we saw a case where we went from a significant effect to a non-significant effect when we adjusted for x. Well, here's an instance where we had a non-significant effect if we ignore x, and then a significant effect when we include x, okay?
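This direction can be sketched the same way (Python, fabricated numbers): the two groups have identical raw means, yet a common-slope fit reveals a large intercept gap once x is accounted for:

```python
# Least squares via the normal equations X'X b = X'y (Gaussian elimination);
# same tiny helper pattern, adequate for this 3-parameter toy fit.
def ols(X, y):
    k = len(X[0])
    A = [[sum(r[p] * r[q] for r in X) for q in range(k)] for p in range(k)]
    c = [sum(X[i][p] * y[i] for i in range(len(X))) for p in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * b for a, b in zip(A[r], A[col])]
            c[r] -= f * c[col]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        beta[r] = (c[r] - sum(A[r][j] * beta[j] for j in range(r + 1, k))) / A[r][r]
    return beta

# Fabricated data: control (t=0) on y = x - 4 at x = 4..8,
# treated (t=1) on y = x at x = 0..4; both groups have mean y = 2.
xs = list(range(4, 9)) + list(range(5))
ts = [0] * 5 + [1] * 5
ys = [x - 4 for x in range(4, 9)] + [x for x in range(5)]

marginal = sum(ys[5:]) / 5 - sum(ys[:5]) / 5   # treated mean minus control mean
beta = ols([[1, t, x] for t, x in zip(ts, xs)], ys)
print(marginal, beta[1])  # marginal = 0, yet adjusted beta1 = 4
```

No difference at all in the raw group means, but a gap of 4 between the fitted parallel lines, the mirror image of the earlier scenario.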

Â So there's no simple rule that says this is always what will happen with

Â adjustment.

Â Pretty much any permutation of going from significant to non-significant, staying

Â both significant, staying non-significant, flipping signs, all of them can occur.

Â 9:24

Here's the final example like this I'd like to show, and it considers an instance where we would surely get this wrong if we were to assume the slopes were common across the two groups. Obviously, the slopes are different. And we know how to fit a model like this: y = beta0 + beta1 * T + beta2 * x + beta3 * T * x + epsilon, where T is the treatment indicator. That would fit two lines with different intercepts and different slopes, and we could get a fit like that to this data set.
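Here is a Python sketch of that interaction fit (fabricated data on two crossing lines; all values are illustrative). Once beta3 is in the model, the group difference at a given x is beta1 + beta3 * x, which changes sign across the plot:

```python
# Least squares via the normal equations X'X b = X'y (Gaussian elimination);
# now with a 4-column design matrix for the interaction model.
def ols(X, y):
    k = len(X[0])
    A = [[sum(r[p] * r[q] for r in X) for q in range(k)] for p in range(k)]
    c = [sum(X[i][p] * y[i] for i in range(len(X))) for p in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * b for a, b in zip(A[r], A[col])]
            c[r] -= f * c[col]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        beta[r] = (c[r] - sum(A[r][j] * beta[j] for j in range(r + 1, k))) / A[r][r]
    return beta

# Fabricated crossing lines: control (t=0) on y = 1 + x,
# treated (t=1) on y = 4 - 0.5x, both at x = 0..4.
xs = list(range(5)) * 2
ts = [0] * 5 + [1] * 5
ys = [1 + x for x in range(5)] + [4 - 0.5 * x for x in range(5)]

# Fit y = beta0 + beta1*T + beta2*x + beta3*T*x.
beta = ols([[1, t, x, t * x] for t, x in zip(ts, xs)], ys)
effects = [beta[1] + beta[3] * x for x in (0, 2, 4)]
print(beta)     # [1, 3, 1, -1.5] up to rounding
print(effects)  # +3 at x=0, 0 at x=2, -3 at x=4
```

The treated-minus-control difference is +3 at x = 0, zero where the lines cross at x = 2, and -3 at x = 4, which is why beta1 alone cannot be read as "the" treatment effect here.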

Another important thing to ascertain for this data set is that there is no such thing as a single treatment effect. If you look right here at the red and the blue, there's no evidence of a treatment effect. If you look right here, there's strong evidence that blue has a higher outcome than red, and if you look over here, there's a lot of evidence that red has a higher outcome than blue. The interaction is the reason that the main treatment effect doesn't have a lot of meaning. The end result is that this coefficient, the coefficient in front of the treatment indicator, which the software spits out of course, is not interpreted as the treatment effect by itself. You can't interpret that number as a treatment effect. As we can see from this picture, there is no such thing as a single treatment effect for this data; the treatment effect depends on what level of x you're at. So you can't just read the term from the regression output associated with the treatment and act as if it's a treatment effect if, in fact, you have an interaction term in your model. So that's an important point, but this also goes to show how adjustment can really change things in a setting where you have not just adjustment but so-called effect modification.

Okay, so again, that was a crazy simulation. This just summarizes some of the points. You often see interactions, but you rarely see interactions that stark; still, nonetheless, they can occur. And then I want to reiterate that nothing we've talked about is specific to having a binary treatment and a continuous x. In this case here, we have our same outcome, y, but our x1 variable is continuous and our x2 is continuous. Because it's kinda hard to show three variables at the same time, x2 is color coded: lighter values mean higher, and darker, redder values mean lower, okay? So in this case, if you look at this plot, you would say there isn't much of a relationship between y and x1. However, let's look at this in three dimensions. For that I need a different setting, something where I can rotate the plot around, so I'm going to use rgl, and it's pretty easy. This doesn't show how I generated x1, x2, and y; that's in the markdown document. But here, I'll show you how to get it. So there's the plot, or a data set that's equivalent, cuz I reran the simulation. And here's our plot. So here's exactly that plot recreated,

Â 16:12

Okay, and then automated model selection, which is another process that you'll talk more about in your machine learning class, that's a different thing. I don't think it's particularly well suited for obtaining interpretable coefficients; it's aimed at obtaining good predictions with respect to a loss function. So that's a different process. If you're doing model building with a regular data set where you want interpretable coefficients, I think there's no substitute for getting your hands dirty: getting the team of people that you're working with on it together, some with the right scientific expertise, some with the statistical expertise, some with the computing expertise, and so on, all together to fit the models. And if there's a big change in coefficients under different adjustment strategies, well, then those need to be discussed and vetted, and the benefits, side effects, and downsides of each of the models should be discussed, okay.