A practical and example filled tour of simple and multiple regression techniques (linear, logistic, and Cox PH) for estimation, adjustment and prediction.

From the course by Johns Hopkins University

Statistical Reasoning for Public Health 2: Regression Methods

From the lesson

Module 2A: Confounding and Effect Modification (Interaction)

This module, along with module 2B, introduces two key concepts in statistics/epidemiology: confounding and effect modification. A relation between an outcome and an exposure of interest can be confounded if another variable (or variables) is associated with both the outcome and the exposure. In such cases the crude outcome/exposure association may over- or under-estimate the association of interest. Confounding is an ever-present threat in non-randomized studies, but results of interest can be adjusted for potential confounders.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

All right. In this section,

we'll talk about something we mentioned in section A,

the idea of adjusted estimates, adjusted for potential confounders.

And we'll talk about the presentation, the interpretation and

the utility of using these for assessing potential confounding.

So hopefully by the end of this lecture section, you'll understand how to

interpret estimates of association that have been adjusted to control for

a potential confounder or confounders, and compare and contrast

the comparisons being made by unadjusted and adjusted association estimates.

So adjustment is a method for making comparable comparisons between groups

in the presence of a confounder or confounding variables.

We will discuss the basics of the mechanics behind adjustment in

the next lecture section.

But in this section, we'll look at the results of adjustment and then try to

interpret them in substantive context.

So let's go back to our fictitious study just to get started.

You'll recall that this fictitious study was done to

investigate the association between smoking and

a certain disease in a population of male and female adults.

And what they did was they took random samples from smokers and

nonsmokers in the population and

then looked at the results of the disease outcome across these two groups.

And what they initially found with this study is that the proportion of

smokers with disease in the sample was similar to, but

slightly lower than, the proportion among non-smokers.

But upon further investigation in the last section,

we saw that this was distorted by the fact that the majority of

smokers were males who were less likely to have this disease.

So this relative risk is being influenced by the

different sex distributions among the smokers and non-smokers.

This relative risk we were just looking at compares all smokers to

all non-smokers in the sample without taking any other factors into account.

This is called the unadjusted or

crude estimated association between disease and smoking.

Adjustment provides a mechanism for estimating

an outcome/exposure relationship after removing the potential distortion or

negation of the relationship that comes from a confounder or multiple confounders.

So in this fictional example, the relationship between disease and

smoking can be adjusted for sex.

And frequently when you read papers where

the authors have performed adjustment, they'll compare and contrast the unadjusted and

adjusted measures of associations side by side in the results table.

So if we were doing that for this situation, we might present a table that

looks like this: the unadjusted and adjusted relative risks of disease.

The only factor we have listed in the table is smoking.

We've set it up so there are two rows,

one for smokers and one for non-smokers, and the "ref" indicates that

the non-smokers are the comparison group that the other group levels (and

there's only one other, smokers) are being compared to.

And in the unadjusted column, we present this overall crude association.

Here's the confidence interval for

the association; it includes the null value of 1, so it's not statistically significant.

This next column presents the results of what I call the adjusted association and

there's a footnote here to indicate what it's been adjusted for,

in this case just for sex.

And notice that this association is numerically larger than the unadjusted.

It's 1.57 and the confidence interval does not include the null value of 1.

So what comparison is being made by this adjusted association?

Well, let's go back and review what the unadjusted relative risk is comparing.

This is comparing the risk of disease in all smokers

to the risk of disease in all non-smokers.

The adjusted relative risk is a more specific comparison.

It's going to bring sex into the story.

So what this compares is the risk of disease among smokers of a given sex

to the risk of disease among non-smokers of the same sex.

So in other words, this relative risk compares the risk of disease for

smokers to non-smokers where both groups are female or where both groups are male.

Put another way, the comparison has removed variability in sex

from the smoking, non-smoking groups.

This comparison compares those that are of the same sex.

This unadjusted comparison includes everyone who smokes compared to

everyone who doesn't smoke and

as we saw before, there's differing sex distributions between those groups.

So frequently, these unadjusted and

adjusted associations can be used to assess whether another variable or

variables confound the original associations.

So just looking at this, we saw that the original association between smoking and

disease was less than one on the relative risk scale and

not statistically significant.

After adjusting for sex, the relationship showed a positive,

somewhat large association between smoking and disease:

a 57% increase in the risk of disease for smokers compared to

non-smokers after adjusting for sex and the result was statistically significant.

So this discordance between what we got,

when we look at the situation unadjusted compared to the situation adjusted for

sex is indicative that sex was having some impact on the unadjusted association.

In other words, it was confounding the relationship between disease and smoking.
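To see how a crude relative risk can sit below 1 while the sex-specific comparisons sit above it, here is a minimal sketch. The counts are entirely hypothetical (the lecture does not give the underlying numbers), so the results will not reproduce the 1.57 in the table. The pooling step uses the Mantel-Haenszel estimator, which is one common way to combine sex-specific relative risks; it is an assumption here, not necessarily the method behind the lecture's adjusted estimate.

```python
# Hypothetical counts (NOT from the lecture): (cases, total) by sex and smoking status.
# Most smokers are male, and males have lower disease risk overall.
strata = {
    "male":   {"smoker": (80, 800),  "nonsmoker": (12, 200)},
    "female": {"smoker": (60, 200),  "nonsmoker": (160, 800)},
}

def risk(cases, total):
    """Proportion with disease in a group."""
    return cases / total

# Crude (unadjusted) relative risk: all smokers vs. all non-smokers, ignoring sex.
smoker_cases = sum(s["smoker"][0] for s in strata.values())
smoker_total = sum(s["smoker"][1] for s in strata.values())
nonsmoker_cases = sum(s["nonsmoker"][0] for s in strata.values())
nonsmoker_total = sum(s["nonsmoker"][1] for s in strata.values())
crude_rr = risk(smoker_cases, smoker_total) / risk(nonsmoker_cases, nonsmoker_total)

# Sex-specific relative risks: smokers vs. non-smokers of the same sex.
stratum_rr = {
    sex: risk(*s["smoker"]) / risk(*s["nonsmoker"]) for sex, s in strata.items()
}

# Mantel-Haenszel pooled relative risk across the sex strata.
numerator = 0.0
denominator = 0.0
for s in strata.values():
    a, n1 = s["smoker"]      # exposed cases, exposed total
    c, n0 = s["nonsmoker"]   # unexposed cases, unexposed total
    n = n1 + n0
    numerator += a * n0 / n
    denominator += c * n1 / n
mh_rr = numerator / denominator

print(f"crude RR: {crude_rr:.2f}")
for sex, rr in stratum_rr.items():
    print(f"RR among {sex}s: {rr:.2f}")
print(f"sex-adjusted (Mantel-Haenszel) RR: {mh_rr:.2f}")
```

With these made-up counts the crude relative risk is below 1, both sex-specific relative risks are above 1, and the pooled estimate falls between the two sex-specific values, which is the same qualitative pattern described for the fictitious study.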

Let's look at our arm circumference and height study through the same lens.

This was the observational study to estimate the relationship between arm

circumference and height in Nepali children less than a year old.

And this is what our summary statistics, the ranges, looked like for

the three measures we were looking at.

So again, I'm just going to cut to the chase here and

actually look at a table of linear regression slopes focusing on height.

But I also put it in for weight.

And we'll look at the results for the relationship between arm

circumference and each of these things, ignoring the other.

And then when we look at the relationship between arm circumference and

height adjusted for weight.

And arm circumference and weight, adjusted for height.

So let's just focus on height for the moment.

You may recall the unadjusted association we saw in the past lecture section was positive,

suggesting an increase in average arm circumference per

centimeter increase in height.

And the result was statistically significant.

But after adjusting for the differences in weight

distribution across different height groups, this relationship became negative.

And in fact, and it's just a coincidence,

it becomes exactly the same magnitude as the unadjusted estimate, but negative,

and it is statistically significant after adjustment for weight.

So, in the language of statistics, we see some clear evidence here

that the original positive association we saw between arm circumference and

height was impacted by the behind-the-scenes

relationship between weight, height, and arm circumference.

And this association becomes negative.

If we look at the weight association, it was positive and statistically significant

with regards to arm circumference when we didn't take into account height.

But when we actually adjusted for height and fine-tuned the comparison,

it was still positive and statistically significant, but larger in magnitude

than when we didn't adjust for height.

So let's go and

talk about how to interpret these different associations for height.

The unadjusted linear regression slope estimate for

height compares two groups of children: it's

the difference in mean arm circumference for

two groups of children who differ by one centimeter in height.

No other variables or information considered.

So all children of one height versus all children of the other height,

where the difference is one unit.

And of course, we could expand this to consider differences across more than 1

centimeter in height.

But no other information was considered in this comparison.

So if we were comparing children 30 centimeters tall to children 29 centimeters tall,

it was all children 30 centimeters tall versus all children 29 centimeters tall.

This estimate here, this adjusted estimate, is a much more specific comparison.

This one starts the same way as the previous one, "the difference", and

I'll just put dot, dot, dot.

The difference in mean arm circumference for

two groups who differ by 1 centimeter in height and are the same weight.

So it's a much more restricted comparison.

It doesn't matter what the weight is,

as long as it's the same in the two height groups.

It's comparing children who differ by

one unit in height but are all the same weight. And think about this.

Now the association between average arm circumference and

height is negative when we're adjusting for weight.

So does that make some sense?

Well, if children are all the same weight, then taller children are, in some sense,

colloquially speaking, more stretched out or

elongated than shorter children of the same weight.

And that would lead to a stretched-out arm,

if you will, that would be narrower in arm circumference.

That's how I think of it.

The important thing to note though, is the comparison we're making here is

much more specific than the unadjusted comparison.

If we went back and contrasted the estimates for

the relationship between arm circumference and weight in the same context,

the first comparison would compare two groups differing

by 1 kilogram in weight, regardless of their height.

And the second estimate, that 1.40, compares two groups of

children who differ by 1 kilogram in weight, but are of the same height.

So if we wanted to compare the unadjusted and adjusted associations to assess confounding,

a table like this would be presented, perhaps, in a journal article on

anthropometric associations in Nepali children less than a year old, and

there would be a discussion of this in the results and discussion sections.

But we as the readers can see here very clearly, that the association between arm

circumference and height changed not only in magnitude, but

in this case, direction when we adjusted for weight.

And the association between arm circumference and

weight changed in terms of magnitude, and the resulting

confidence intervals before adjustment and after adjustment do not overlap.

So we can see pretty clear evidence here that the association between arm

circumference and height was affected by weight and vice versa.
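The sign flip for height can be reproduced on simulated data. The sketch below is only an illustration under stated assumptions: the heights, weights, and every coefficient are made up, not taken from the Nepali study. It fits least-squares slopes with plain Python, first with one predictor at a time (unadjusted) and then with both height and weight in the model (adjusted).

```python
import random

def simple_slope(x, y):
    """Least-squares slope from regressing y on a single predictor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def two_predictor_slopes(x1, x2, y):
    """Least-squares slopes for x1 and x2 when y is regressed on both."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # determinant of the 2x2 normal equations
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Simulated children; every number below is hypothetical.
rng = random.Random(0)
height, weight, arm = [], [], []
for _ in range(2000):
    h = rng.uniform(50, 75)                              # height in cm
    w = 3 + 0.25 * (h - 50) + rng.gauss(0, 0.6)          # taller children weigh more
    a = 10 + 1.4 * w - 0.16 * h + rng.gauss(0, 0.5)      # arm circumference in cm
    height.append(h)
    weight.append(w)
    arm.append(a)

unadj_height = simple_slope(height, arm)   # all children, weight ignored
unadj_weight = simple_slope(weight, arm)   # all children, height ignored
adj_height, adj_weight = two_predictor_slopes(height, weight, arm)

print(f"height slope: unadjusted {unadj_height:+.2f}, adjusted for weight {adj_height:+.2f}")
print(f"weight slope: unadjusted {unadj_weight:+.2f}, adjusted for height {adj_weight:+.2f}")
```

In this simulation the unadjusted height slope is positive because taller children also tend to be heavier, while the height slope among children of the same weight is negative. The weight slope stays positive and gets larger after adjusting for height, mirroring the pattern described in the table.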

Let's look at our example on Academic Physician Salaries from 2012.

This was the salary survey of academic physicians intended to

estimate the difference in salaries between male and female academic physicians.

So what they initially report in their results was the mean salary

within the cohort overall was $167,669 per year for

females and $200,433 per year for males.

But this comparison here,

these two means are not taking any other factors into account.

So this difference here, the unadjusted difference between males and

females, is on the order of nearly $33,000 per year.

That's $200,433 minus $167,669.

But what the authors go on to say is that male sex was

still associated with higher salary, on the order of $13,399 per year more

than females, after adjustment for specialty, academic rank,

leadership positions, publications, and research time.

So what they're saying here is that there are a lot of other factors that may

potentially be related to both being female

and salary, and they wanted to remove the impact of those and see if there was

still a clear difference in salaries between females and males.

So what we see here is that if we look at the unadjusted difference,

the one that compares all males to all females, this difference is

$32,764, and there's a confidence interval for that reported in the article.

But after adjustment for all these other things,

the adjusted one, it was $13,399 per year.

And so it appears that there was some confounding, a lot perhaps,

by these other factors.

But it didn't change the fact that males made a lot more on average than females.

It's just not as much as initially estimated once the comparison

adjusts for these other things.
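The arithmetic behind this comparison can be checked directly from the figures quoted above. The sketch below uses only those figures; the "percent attenuated" summary is an added descriptive convenience, not something reported in the lecture or attributed to the article.

```python
# Mean annual salaries and adjusted difference as quoted in the lecture.
mean_salary_male = 200_433
mean_salary_female = 167_669
adjusted_diff = 13_399  # male minus female, after adjustment

# Unadjusted difference: all males vs. all females.
unadjusted_diff = mean_salary_male - mean_salary_female

# How much of the crude gap goes away after adjustment, and how much remains.
attenuation = unadjusted_diff - adjusted_diff
pct_attenuated = 100 * attenuation / unadjusted_diff
pct_remaining = 100 * adjusted_diff / unadjusted_diff

print(f"unadjusted difference: ${unadjusted_diff:,} per year")
print(f"adjusted difference:   ${adjusted_diff:,} per year")
print(f"attenuated by adjustment: {pct_attenuated:.1f}%")
print(f"remaining after adjustment: {pct_remaining:.1f}%")
```

The unadjusted difference works out to $32,764, so a bit under three-fifths of the crude gap is accounted for by the adjustment variables, and roughly two-fifths remains among males and females with the same values of those variables.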

So again, this mean difference compares the salaries for

all males to all females in the sample.

This estimates the difference between males and

females of the same specialty, academic rank,

leadership position, number of publications, and amount of research time.

So the only difference in those measurements between these two

groups is their sex.

Everything else has been equalized.

So if we were to regress, based on the results here,

salaries on sex, the unadjusted slope for sex would be $32,764.

This compares all males in the sample to all females,

regardless of those other things.

This next comparison, which is an adjusted linear regression slope comparing

males to females, is $13,399 per year for males compared to females.

This is a much more specific comparison that compares males to females of the same

level of the factors for adjustment, the same specialty, the same academic rank.

the same number of leadership positions,

the same number of publications, and the same amount of research time.

And here's what was found.

Obviously, the difference attenuated by a fair amount.

So there's evidence of confounding.

But it didn't disappear and I didn't put confidence limits on this, but

it's statistically significant.

It did not disappear after adjusting for these other things and

there was still a sizable difference in annual salaries, even after

removing the differing distributions of these things between males and females.

So in summary, adjustment is a method for making comparable comparisons between

groups in the presence of a confounder or confounding variables.

The group comparisons made by adjusted associations are more specific than

those made by the unadjusted or crude associations.

And comparing and contrasting crude (unadjusted) and adjusted association estimates is

useful for identifying confounding and

potential confounders based on the results after adjustment.

In the next section, we'll talk briefly about the basics of the mechanics and

set ourselves up for further explanation when we get into multiple regression.
