A practical, example-filled tour of simple and multiple regression techniques (linear, logistic, and Cox PH) for estimation, adjustment, and prediction.

From the course by Johns Hopkins University

Statistical Reasoning for Public Health 2: Regression Methods

From the lesson

Module 3B: More Multiple Regression Methods

This set of lectures extends the techniques debuted in lecture set 3 to allow for multiple predictors of a time-to-event outcome using a single, multivariable regression model.

- John McGready, PhD, MS, Associate Scientist, Biostatistics, Bloomberg School of Public Health

Hello and welcome to Lecture 8.

In this lecture set, we are going to discuss Cox proportional hazards regression for estimation, adjustment, and basic prediction. This will parallel what we've done with logistic and linear regression in the previous two lecture sets.

So first, let's look at multiple Cox proportional hazards regression and give some examples of it. Hopefully, by the end of this section, you'll be able to interpret the estimated hazard ratios from multiple Cox proportional hazards regression models in a scientific context, and compare the results from simple and multiple Cox regressions to assess confounding.

So let's go back to our PBC trial data, the randomized clinical trial on 312 patients with primary biliary cirrhosis studied at the Mayo Clinic. Patients were randomized, as we know very well by now, to either the drug DPCA or placebo. Patients were followed from enrollment until death or censoring, and the follow-up period was up to 12 years.

So the question we had tried to answer before is: what is the association between treatment and patient survival? We had seen this before in the unadjusted, or crude, analysis. When we did the Cox regression, the resulting hazard ratio of mortality for those on the drug compared to those on the placebo was 1.06. Slightly elevated: a 6% higher risk in the sample for those who got the drug compared to the placebo. But the result was not statistically significant, with the 95% confidence interval for the slope crossing 0 and the 95% confidence interval for the exponentiated slope, or the hazard ratio, crossing 1.
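
The link between the two confidence intervals mentioned here can be sketched in a few lines of Python. The hazard ratio of 1.06 is from the lecture; the standard error of 0.18 is a made-up value for illustration only, since the transcript does not give one, and the function name is hypothetical.

```python
import math

def hazard_ratio_with_ci(slope, se, z=1.96):
    """Exponentiate a Cox slope and its 95% CI endpoints to the hazard ratio scale."""
    lower, upper = slope - z * se, slope + z * se
    return math.exp(slope), (math.exp(lower), math.exp(upper))

slope = math.log(1.06)  # slope implied by the hazard ratio quoted in the lecture
se = 0.18               # HYPOTHETICAL standard error, not from the PBC analysis

hr, (hr_lo, hr_hi) = hazard_ratio_with_ci(slope, se)

# Because exp() is increasing and exp(0) = 1, these two checks always agree.
slope_ci_contains_0 = (slope - 1.96 * se) < 0 < (slope + 1.96 * se)
hr_ci_contains_1 = hr_lo < 1 < hr_hi

print(round(hr, 2), slope_ci_contains_0, hr_ci_contains_1)
```

Whatever standard error is plugged in, the interval for the slope contains 0 exactly when the interval for the hazard ratio contains 1, which is why the two statements in the lecture go together.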

So, it is not expected that this crude, unadjusted hazard ratio will be confounded by other patient characteristics such as age, sex, and bilirubin at the time of randomization. Why not? Well, think about it: this is a randomized study, so the distribution of factors such as age, sex, and bilirubin level at the time of randomization should be similar in the treatment and control groups, and hence should not distort, magnify, or minimize the true association.

However, we may still want to look at a multiple regression that allows for other predictors, because other patient characteristics may add information about mortality above and beyond treatment. And these other patient characteristics could be related to each other and also related to mortality. So they may potentially confound each other's relationships with mortality, even if that's not likely to affect the association with treatment.

So let me just show you the results from simple and multiple Cox regressions. The first set of results here is unadjusted: the crude association between mortality and each predictor on its own. So this is the crude association we had between mortality and the drug, the hazard ratio we just quoted of 1.06, and its confidence interval.

Age was dealt with by putting it into quartiles, to deal with the case where the association may not be linear. The reference group is the first quartile, less than 42 years. And you can see, at least in terms of the estimates, that with older age, not taking into account any other factors, there's an increased hazard. The confidence intervals for the estimated hazard ratios for the three older age groups all overlap somewhat, and only the latter two are statistically significantly different from the reference. But on the whole it shows evidence of increasing mortality with increasing age. And it's no surprise, because some of the differences were significant, that the overall p-value for testing whether there's any difference between the age quartiles in terms of mortality is statistically significant.

Just one thing to think about: it is theoretically possible, and it does happen sometimes, that none of the differences between the other groups and the reference is statistically significant, but the predictor as a whole is still statistically significant. That's because each of these associations only compares a particular group with the reference, and it is possible that some of the differences among the other groups are statistically significant. So it's interesting: in statistics, things can be not statistically significantly different from the same reference, but statistically significantly different from each other. That's why it's important with these categorical variables to do the overall test as well, because you may catch things that you wouldn't catch by simply looking at the results for some of the comparisons.
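
The lecture does not say which overall test was used. One common choice is a likelihood ratio test comparing the model with and without the three age quartile indicators, which has 3 degrees of freedom. The sketch below assumes that choice; the log-likelihood values are hypothetical and the function names are made up for illustration.

```python
import math

def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square distribution with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

def overall_lr_test(loglik_without, loglik_with):
    """Likelihood ratio statistic and p-value for adding three quartile indicators."""
    stat = 2.0 * (loglik_with - loglik_without)
    return stat, chi2_sf_3df(stat)

# HYPOTHETICAL partial log-likelihoods, not taken from the PBC analysis
stat, p_value = overall_lr_test(loglik_without=-640.0, loglik_with=-632.5)
print(f"LR statistic = {stat:.1f}, overall p-value = {p_value:.4f}")
```

The single p-value tests all three age indicators at once, so it can pick up differences among the non-reference groups that the individual comparisons to the reference would miss.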

Bilirubin: this shows the association we estimated in lecture three, a 16% increase in the hazard of mortality per one milligram per deciliter increase in the baseline level measured at the time of randomization. And there's a statistically significant sex association that shows that females have a lower risk by an estimated 39%, but there's a lot of uncertainty. This reduction could be anywhere from 61% down to just 2%.
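
As a small sketch of the arithmetic behind these percentages: a hazard ratio converts to a percent change as (HR - 1) x 100, and a per-unit hazard ratio is raised to a power for larger differences. The hazard ratios below are the ones implied by the percentages quoted above (0.61 with an interval of 0.39 to 0.98 for sex, 1.16 per mg/dL for bilirubin); the function names are illustrative only.

```python
def percent_change(hazard_ratio):
    """Percent change in hazard implied by a hazard ratio (negative means a reduction)."""
    return (hazard_ratio - 1.0) * 100.0

def hr_for_difference(hr_per_unit, units):
    """Hazard ratio for a difference of `units` in a continuous predictor."""
    return hr_per_unit ** units

# Sex (female vs. male): values implied by the 39% estimate and the 61% to 2% range
sex_hr, sex_lo, sex_hi = 0.61, 0.39, 0.98
print(round(percent_change(sex_hr)), round(percent_change(sex_lo)), round(percent_change(sex_hi)))

# Bilirubin: 16% higher hazard per 1 mg/dL, so compare patients 2 mg/dL apart
print(round(hr_for_difference(1.16, 2), 4))
```

So two patients whose baseline bilirubin differs by 2 mg/dL have an estimated hazard ratio of 1.16 squared, about 1.35, not 1.32.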

What happens when we actually put these all in the same model and compute each estimate adjusted for the other factors? Well, as expected, there's a slight numerical change in the estimated association with the treatment, but statistically speaking the result is still not significant, and the magnitude of the association is similar after adjustment for age, bilirubin, and sex. And we would expect not to see much difference there because of the randomization.

If you look at the age association, the individual comparisons between each age group and the reference, the lowest quartile, attenuate: they get slightly smaller than they were when we didn't adjust for the other things. But the overall result is similar, in that increasing age is associated with increasing risk of mortality, and the highest age group is still statistically significantly different from the lowest, or reference, group. So, just to remind us of who's being compared to whom here, this hazard ratio estimate of 0.99, for example, compares the hazard of mortality for those persons 42 to 50 years old to those less than 42 years old who were in the same treatment group, had the same bilirubin measurement at baseline, and were of the same sex.

The relationship between bilirubin and mortality is unaltered after adjustment for treatment, age, and sex. And there's very little difference in the adjusted relationship between mortality and sex as compared to the unadjusted.

So, just to summarize the main findings in this analysis: the relationship between mortality and treatment, or rather the lack of a relationship once statistical significance is taken into account, given the small relative increase for those in the drug group, was not confounded by age, sex, or bilirubin level at the time of randomization. And we would expect that to be the case because, well, it was a randomized trial, so the distribution of these factors should be equivalent or nearly equivalent between the treatment and control groups, and should not influence the overall crude association.

Age, sex, and bilirubin were statistically significant predictors of mortality in the unadjusted comparisons. After adjustment for each other and treatment, all three remained statistically significant predictors, with associations very similar in magnitude to the unadjusted associations. So the take-home message here is that we could do a better job of explaining mortality by using these three factors together, but none of them seems to confound the relationship between mortality and the others.

Let's look at our predictors of infant mortality in that sample of 10,300 Nepali newborns. We're going to look at this as a function of gestational age, and what we're looking at here is mortality in the six months following birth.

So you may recall we looked at this association in lecture three, and we had categorized gestational age into five categories, such that the reference for the comparisons was the pre-term group, those with less than 36 weeks gestational age. One of the reasons we kept it as categorical is that we saw there was a pretty large reduction in mortality when jumping from premature to full term, and then a slight further reduction with a little bit longer gestational ages. But on the whole, the big story was about that reduction that came from being full term. So, if we had put in gestational age as a linear term, it would underestimate that jump from pre-term to full term and then overestimate the remaining impact of additional weeks on mortality.

So we decided that categorical was the way to go, and these are the results. The relative hazard of mortality in the six months following birth for those children who had gestational ages of 36 to 38 weeks, relative to the reference group, was 0.41, a 59% reduction that was statistically significant. And similarly, we got closer to a 70% reduction, 67%, for the subsequent gestational age categories, although these confidence intervals all overlap with each other. So the big story of this is that there was a large reduction in mortality for being full term.

What are some other potential predictors of this? Well, this was embedded in that randomized trial of maternal vitamin supplementation and, similar to the previous trial, there wasn't much of an impact of the treatment unadjusted, and we wouldn't expect that to change with adjustment because it was randomized. But this just shows the breakdown of some other potential predictors.

So we've got the treatment groups. There's a little bit of fluctuation, but roughly a third of the children, on the order of 33%, were born to mothers in each of the three vitamin treatment groups. And here's the distribution of the gestational age categories. You see a little under a quarter of the sample were pre-term: 22.5% were at less than 36 weeks. And then this shows the remaining numbers and percentages.

The sample was slightly majority female, at 51.1%. And then another of the potential predictors is maternal parity. For just under a quarter of the mothers, this was the first child they had; they had no previous children prior to the one in this study. Another 20% had one previous child. Another 43% had two to four prior children, and less than 2% of the sample had more than eight previous children.

So we're going to look at these predictors. We've already looked at gestational age, but we'll look at these other ones as well, unadjusted, and then when all are adjusted for each other in one large multiple Cox regression model. And we'll do it with two different levels of adjustment.

So here are the unadjusted associations. We see what we saw before with gestational age. Here is the treatment comparison, and you may remember from when we analyzed this in Statistical Reasoning 1 that there were no significant differences in the mortality of children between those born to mothers who got the vitamins, either vitamin A or beta-carotene, and those born to mothers who got the placebo. And the overall p-value for this, the test for any difference, was greater than 0.05.

For the sex comparison, there was no difference in mortality by sex: males had 2% higher mortality in the sample, but it was not statistically significant.

And then there was an interesting association with maternal parity at the unadjusted level. It was a statistically significant predictor, and the reference group was children born to mothers who had no previous children. You can see in the unadjusted comparison of mortality for children born to mothers who had one previous child compared to the group with no previous children that there was a reduction in mortality: the hazard ratio is 0.58, and it was statistically significant.

And then when we go to two to four previous children compared to the same reference, there is still a reduction, but by less than in the previous comparison. Similarly, with five to eight previous children, there's still a slight reduction compared to the reference group, but it's a smaller reduction than the previous two. And then when we get to the group that has more than eight prior children, there's an increased risk over the reference, although it's not statistically significant. So it seems to suggest that having had some previous children is associated with a lower risk of mortality, but there's a threshold at which the risk becomes either equivalent to or slightly higher than the risk with no previous children.
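
Each of these parity comparisons contrasts one group with the same reference group of no previous children. In a regression model that is done with 0/1 indicator variables, one per non-reference category. A minimal sketch of that coding follows; the category boundaries are the ones described in the lecture, while the function and key names are made up for illustration.

```python
def parity_indicators(previous_children):
    """Return four 0/1 indicators for maternal parity.

    The reference group (no previous children) has all four indicators equal to 0.
    """
    if previous_children < 0:
        raise ValueError("previous_children must be non-negative")
    return {
        "one": int(previous_children == 1),
        "two_to_four": int(2 <= previous_children <= 4),
        "five_to_eight": int(5 <= previous_children <= 8),
        "more_than_eight": int(previous_children > 8),
    }

for n in (0, 1, 3, 6, 10):
    print(n, parity_indicators(n))
```

Each indicator gets its own slope, and exponentiating that slope gives the hazard ratio for that group relative to children of mothers with no previous children.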

In the second model here, I only included the predictors of gestational age, which has four categories besides the reference, and maternal parity, which has another four, to see if those two were related to each other and whether their relationship [INAUDIBLE] caused some sort of confounding. If you look at these estimates side by side, you can see that the estimates and confidence intervals for gestational age were nearly identical to the unadjusted version. And similarly, there were slight numerical shifts in the estimates, but pretty much the same story with maternal parity, even after adjustment for gestational age. So it doesn't appear that gestational age and maternal parity were related, even though both were related to mortality.

And then let's look at this third model, which includes all the predictors. If we wrote it out, it's a long model: the log hazard of mortality at a given time is equal to an intercept at that time, plus four x's for gestational age, then another two x's for treatment, an x for sex, and then four more x's for maternal parity. So this would have a lot of x's in it.

This would be our gestational age part. And then we'd have treatment, so we'd have an indicator for vitamin A, an indicator for placebo, and so on and so forth. We then have the sex component, the indicator of male or female, and then four more x's, and I'll spare you my handwriting here, for the maternal parity categories.
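
Written out, the model just described might look like the following. The symbols and the numbering of the x's are notation chosen here, not taken from the slide: h_0(t) is the baseline hazard, so its log plays the role of the "intercept at that time", and each x is a 0/1 indicator.

```latex
\ln\big(h(t \mid x_1, \dots, x_{11})\big)
  = \ln\big(h_0(t)\big)
  + \underbrace{\beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4}_{\text{gestational age}}
  + \underbrace{\beta_5 x_5 + \beta_6 x_6}_{\text{treatment}}
  + \underbrace{\beta_7 x_7}_{\text{sex}}
  + \underbrace{\beta_8 x_8 + \beta_9 x_9 + \beta_{10} x_{10} + \beta_{11} x_{11}}_{\text{maternal parity}}
```

That is eleven slopes in total, and exponentiating any one of them gives the adjusted hazard ratio for that group relative to its reference.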

And if we wanted to actually get the values of these slopes, we could take the logs of the hazard ratios and of their respective confidence interval endpoints. But the point here is that there's an underlying regression model, and the results on the exponentiated scale give us these hazard ratios.
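
Going back from a reported hazard ratio to the slope on the log hazard scale is just a natural log. A minimal sketch, using the two unadjusted hazard ratios quoted earlier in the lecture (the dictionary keys are names made up for illustration):

```python
import math

# Unadjusted hazard ratios quoted in the lecture
hazard_ratios = {
    "gestational_age_36_to_38_weeks": 0.41,
    "parity_one_previous_child": 0.58,
}

# The slope is the natural log of the hazard ratio; a hazard ratio below 1
# corresponds to a negative slope.
slopes = {name: math.log(hr) for name, hr in hazard_ratios.items()}

for name, slope in slopes.items():
    print(f"{name}: slope = {slope:.3f}")
```

The endpoints of a confidence interval for a hazard ratio convert to the slope scale in exactly the same way.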

And on the whole, if you look across this model, where for example we look at the relationship between mortality and gestational age adjusted for treatment, sex, and maternal parity, the results are pretty much the same as they were when we adjusted only for maternal parity and in the unadjusted case. So it appears there's no confounding of the relationship between gestational age and mortality by these other factors.

Similarly with treatment, the results are almost identical after adjustment, which we would expect because of the randomization. And similarly with sex and maternal parity, there's not much change in the associations beyond what we saw with the other adjustments. So gestational age and maternal parity taken together add more information about mortality, as both are statistically significant, but they don't appear to confound each other's associations.

So in summary, gestational age and parity were both statistically significant predictors, unadjusted and adjusted, so they each had something to contribute above and beyond the information from the other, whereas sex was not significant, nor was treatment. And there was no real evidence of any confounding of these relationships by the other factors.

So, in summary, multiple Cox regression can be used both to estimate adjusted hazard ratios and to assess the associations between time-to-event outcomes and multiple predictors in one model, very analogous to what we did for binary outcomes with logistic regression and continuous outcomes with linear regression.

In the next section, we'll look at making comparisons between groups who differ by more than one predictor, using the results from multiple Cox regression. We'll talk a little bit about how to translate the estimated regression models into survival curves for different groups defined by different values of x. And then in the last section, we'll look at several examples of the use of Cox regression in the public health and medical literature.
