A practical and example filled tour of simple and multiple regression techniques (linear, logistic, and Cox PH) for estimation, adjustment and prediction.

From the course by Johns Hopkins University

Statistical Reasoning for Public Health 2: Regression Methods

From the lesson

Module 3B: More Multiple Regression Methods

This set of lectures extends the techniques debuted in lecture set 3 to allow for multiple predictors of a time-to-event outcome using a single, multivariable regression model.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

So in this section we'll give some examples of the use of interaction terms, and the presentation of the results stemming from them, from published research. And so this will give you exposure to several examples of the use of interaction terms in research and, hence, in published analyses.

So the first one we're going to look at is from JAMA Pediatrics, one of the Journal of the American Medical Association's journals, and it's looking at maternal depression and post-traumatic stress disorder as predictors of child maltreatment.

So they lay this out in the abstract. They give the importance here, saying maternal post-traumatic stress disorder, or PTSD, may be associated with increased risk for child maltreatment and child exposure to traumatic events. Exposure to multiple traumatic events is associated with a wide range of adverse health and social outcomes in children. So the objective of their study, as stated, is to examine the association of probable maternal depression, PTSD, and comorbid PTSD and depression with the risk for child maltreatment and parenting stress, and with the number of traumatic events to which preschool children are exposed.

And they used a cross-sectional observational study design. And they talk about using analysis of variance to compare means on certain scales measuring child maltreatment, parenting stress, and children's exposure to traumatic events between groups defined by the presence or absence of PTSD, depression, both, or neither. And then they go on to talk about what we'll focus on. They say hierarchical regression analyses, which they're using as a synonym for multiple regression, were used to examine the unique and interactive effects of depression and PTSD severity scores on these outcomes.

So what they're measuring here, both their outcomes and their predictors, is based on scales. So let's talk about how they measure PTSD and depression to start, to get some context for what these predictors look like.

So mothers' exposure to traumatic events, PTSD symptoms, and impairment were assessed using the 49-item Posttraumatic Stress Diagnostic Scale, the PDS. This PDS assesses exposure to traumatic events, including serious accidents, natural disasters, non-sexual or sexual assault by a stranger or family member, military combat or exposure to war, imprisonment, torture, life-threatening events, and a category of other. And this PDS scale parallels the Diagnostic and Statistical Manual of Mental Disorders diagnostic criteria for PTSD. So this is a big, 49-item scale that the mothers answered questions on, and the responses scored for each of the items are summed. Higher scores on this PDS scale indicate higher levels of PTSD.

Then they talk about how they measured maternal depression. Maternal depressive symptoms were assessed using the Edinburgh Postnatal Depression Scale. The Edinburgh Postnatal Depression Scale is a 10-item scale assessing depressive symptoms in the past week. Items are rated on a scale from zero to three. So each of the ten items is rated from zero to three. A total score is computed by summing the items, so theoretically the total score for any one person could range from zero to thirty. And then they talk about using a cutoff score of thirteen or more. The Edinburgh Postnatal Depression Scale has demonstrated good psychometric properties among non-postnatal women relative to clinical diagnosis. So this scale is measured on a continuum, but it can be dichotomized with a cutoff of 13 or more, meaning depression versus not.
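The scoring just described can be sketched in a few lines. This is a minimal illustration, not the study's code: the function names `epds_total` and `probable_depression` and the example responses are made up, and only the 10 items, the zero-to-three item range, and the cutoff of 13 come from the description above.

```python
def epds_total(item_scores):
    """Sum ten item ratings (each 0 to 3) into a total score from 0 to 30."""
    if len(item_scores) != 10:
        raise ValueError("expected 10 item ratings")
    if any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("each item must be rated 0, 1, 2, or 3")
    return sum(item_scores)


def probable_depression(total, cutoff=13):
    """Dichotomize the continuous total: True means a score at or above the cutoff."""
    return total >= cutoff


# Hypothetical responses for one mother (not data from the study).
responses = [2, 1, 3, 0, 2, 1, 1, 2, 0, 1]
total = epds_total(responses)
print(total, probable_depression(total))  # prints: 13 True
```

The regression discussed next uses the continuous total, not the dichotomized version.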

So let's go on to talk about how they perform the analyses of the regressions they referenced in the abstract, and we'll focus on one such set of results. So I'm pulling excerpts from their methods and statistical analyses section to put together how they're handling the regressions here.

They say, subsequently, we conducted hierarchical multiple regression, or just multiple regression, to examine the relationship of continuous scores reflecting the severity of PTSD and depressive symptoms, and their potential interaction, with child maltreatment, child exposure to traumatic events, and parental stress. In the first step of each model, for each of these outcomes (child maltreatment, exposure to traumatic events, and parental stress), they entered PTSD and depression severity scores on the continuous scale as predictors. In the second step, they added the interaction between PTSD and depression severity as a predictor.

And they talk about how they got the interaction terms. The interaction terms were created by multiplying centered predictors, to address potential problems with multicollinearity, and I'll talk about that briefly. These were both measured on a continuum, and instead of multiplying the actual values, they took each person's value and subtracted off the mean for everyone before multiplying. That really has no impact on how we interpret the results. One thing that makes this analysis different than what we saw in the previous sections is that both variables being multiplied here to create an interaction term are continuous.
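A short derivation, added here as a sketch, shows what centering does and does not change. The symbols are assumptions for illustration: m_1 and m_2 stand for the sample means of the depression and PTSD scores, and the betas are the coefficients of the model fit with centered predictors.

```latex
\begin{align*}
\hat{y} &= \beta_0 + \beta_1 (x_1 - m_1) + \beta_2 (x_2 - m_2)
           + \beta_3 (x_1 - m_1)(x_2 - m_2) \\
        &= (\beta_0 - \beta_1 m_1 - \beta_2 m_2 + \beta_3 m_1 m_2)
           + (\beta_1 - \beta_3 m_2)\, x_1
           + (\beta_2 - \beta_3 m_1)\, x_2
           + \beta_3\, x_1 x_2
\end{align*}
```

The interaction slope, beta_3, is the same in both forms, and the fitted values are identical. What shifts is the main-effect slopes: in the centered model, beta_1 is the slope for depression at the mean PTSD score rather than at a PTSD score of zero.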

So let's look at the results. They give results, for example, for the outcome of psychological abuse, which is another scale on which higher values are associated with higher levels of the child being on the receiving end of psychological abuse from the mother. Their first model related the average value of psychological abuse to the predictors, the depression scale and the PTSD scale. And this gives the slopes and their standard errors, and we can see that they're both statistically significant if you created confidence intervals. And taken together, each on its own with no interaction, the R squared value says these maternal depression and PTSD scales explained roughly 20% of the variability in the children's psychological abuse scores. This model over here, which we're going to look at in more detail, extends that previous model to include an interaction term between the depression and post-traumatic stress disorder scales. You see when we include the interaction term the R squared increases to 0.28. And you can also see this interaction term is statistically significant. If you create a confidence interval for it, it does not include zero. If you take the estimate and divide it by its standard error, it's less than negative two and would result in a P value of less than 0.05.
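The estimate-over-standard-error check can be sketched as follows. This is only an illustration under stated assumptions: the slope of -0.02 is the interaction slope used in the lecture's worked arithmetic, but the standard error of 0.008 is a made-up value (the transcript does not give one), and the p-value uses a normal approximation.

```python
import math


def wald_summary(estimate, std_error, z_crit=1.96):
    """Return the z statistic, two-sided normal p-value, and 95% CI for a slope."""
    z = estimate / std_error
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    ci = (estimate - z_crit * std_error, estimate + z_crit * std_error)
    return z, p, ci


# -0.02 is the lecture's interaction slope; 0.008 is a hypothetical standard error.
z, p, ci = wald_summary(-0.02, 0.008)
print(z, p, ci)
```

With these inputs the ratio is below negative two, the p-value is below 0.05, and the interval excludes zero, which are three ways of stating the same result.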

The other thing they did is common in some fields, though it's not so common in the standard public health and medical literature, where we're measuring things whose individual values are interpretable, like blood pressures, cholesterol levels, et cetera. But these scales are on a continuum, and their individual values are hard to interpret, though we do know that, with regards to the scale, higher values mean higher levels of the construct they're measuring, either depression or PTSD. So sometimes researchers, especially in psychology and the social sciences, will also convert the slopes that measure the change in the outcome for a one-unit difference in, for example, the depression scale adjusted for PTSD. They'll also present a version of the slope where the unit is a standard deviation in depression scores. And these are called standardized coefficients. So the depression score may range from zero to 30, for example, and what they are measuring here, with the slope, is the change in the psychological abuse outcome score per one-unit increase in depression score, adjusted for PTSD. What they're quantifying here, with the standardized coefficient, is the change per one standard deviation of the depression scores, whatever that standard deviation is. So it's just presenting it in different units. But we're going to focus on the standard units presented here.
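The rescaling behind standardized coefficients can be sketched like this. The data are hypothetical and the function names are made up for illustration. The first function rescales to "per standard deviation of the predictor", as described above; the second also divides by the standard deviation of the outcome, which is another common convention and an assumption here about what a given paper reports.

```python
import statistics


def slope_per_sd(slope, x_values):
    """Rescale a per-unit slope to a slope per one standard deviation of x."""
    return slope * statistics.stdev(x_values)


def fully_standardized(slope, x_values, y_values):
    """Express the slope as standard deviations of y per standard deviation of x."""
    return slope * statistics.stdev(x_values) / statistics.stdev(y_values)


# Hypothetical depression scores and outcome scores (not data from the study).
depression = [2, 5, 8, 11, 14, 17, 20]
abuse = [0.5, 1.0, 2.5, 2.0, 3.5, 4.0, 4.5]

b = 0.15  # per-unit slope for depression, as quoted in the lecture
print(slope_per_sd(b, depression))
print(fully_standardized(b, depression, abuse))
```

Either way, the sign and the statistical significance of the slope are unchanged; only the units differ.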

So let's look at what the model looks like in my representation. I'm just writing this out. They didn't give the intercept, so I'm just going to keep that as a generic beta-naught. Then the slope for the main term for the depression scale, which I'll call x1, was 0.15, and for the PTSD scale, x2, it was 0.26. And then x3 here is the interaction term, which is simply formed by multiplying x1, the depression scale score, times x2, the PTSD scale score.
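Written out, the model described above looks as follows. The interaction slope of -0.02 is the value used in the lecture's worked arithmetic, and the intercept is left generic because it was not reported.

```latex
\begin{align*}
\hat{y} &= \hat{\beta}_0 + 0.15\, x_1 + 0.26\, x_2 - 0.02\, x_3,
           \qquad \text{where } x_3 = x_1 x_2 \\
\hat{y} &= \hat{\beta}_0 + (0.15 - 0.02\, x_2)\, x_1 + 0.26\, x_2
\end{align*}
```

The second line collects the two places where x_1 appears, so the slope for depression is 0.15 - 0.02 x_2 and depends on the PTSD score.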

Let's look at what this means. We haven't seen an example where both things that were interacting are continuous, and it's not commonly done. In fact, we'll see they pull back and change things up in how they ultimately present the message from this analysis. But to see what this means, let's, for example, isolate the relationship between psychological abuse score and depression, after adjusting for PTSD, for various levels of PTSD. So this model, with the interaction term, suggests that the relationship between the outcome and the depression score is going to depend on what the PTSD score is for a group.

So for example, if we're looking at mothers with a low PTSD score, relatively speaking, of three, then the estimated association between the outcome and depression, which is x1, looks like this. We have the 0.15 times x1 here, and then we get another occurrence of x1. Remember x3, the interaction term, is equal to x1 times x2. x2 is PTSD, so that's equal to three, and we get another occurrence of x1. So if we factored this out in terms of the pieces that come with x1, we'd get 0.15 plus negative 0.02 times 3, all taken together, as our slope for depression score. This would be 0.09 for the group of mothers with PTSD scores of 3.

If we look at mothers whose PTSD score was 4, the results would be similar but slightly different. We'd have this part for depression score plus its appearance here again. But we'd be multiplying it by 4 because x2, the PTSD score, is 4. So when you do the parsing here, the end result for the slope for x1, when all the dust settles, is 0.15, the original part, plus the piece because of the interaction, negative 0.02 times 4, all times x1. And with a little math that is equal to 0.07 x1.

And then if we jumped up to the group with a PTSD score of 10, well, when all the dust settles, and I'll let you verify, we get 0.15 plus negative 0.20, all times x1, and the slope now is negative for depression, negative 0.05. So what we are showing here is that the estimated association between psychological abuse scores and depression is different for every different PTSD score. Well, what's the big picture there? As PTSD scores increase, the association between psychological abuse scores and depression decreases.
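The arithmetic for the three PTSD scores can be checked with a short sketch. The function name is made up for illustration; the coefficients 0.15 and -0.02 are the ones used in the worked examples above.

```python
def depression_slope(ptsd_score, b_depression=0.15, b_interaction=-0.02):
    """Slope for the depression score at a given PTSD score."""
    return b_depression + b_interaction * ptsd_score


for ptsd in (3, 4, 10):
    print(ptsd, round(depression_slope(ptsd), 2))
# prints:
# 3 0.09
# 4 0.07
# 10 -0.05
```

With these coefficients the slope crosses zero at a PTSD score of 0.15 / 0.02 = 7.5, which is why it is positive at 3 and 4 and negative at 10.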

So ultimately, and thankfully, they re-present this. I find the results of that regression a little bit difficult to interpret, both because the scales only have relative meaning, in terms of higher values meaning higher depression or PTSD, and because having an association, for example, between the outcome and depression that changes for every level of PTSD is kind of hard to make sense of. So they re-present this, where they ultimately dichotomize both depression and PTSD at some cutoff for high or low. And what they show here, which I think better illustrates what's going on, is the relationship between the mean psychological abuse scores in children of mothers with low versus high depression, separately for the low-PTSD group and the high-PTSD group.

And so what we see here, if we look at the low-PTSD group, is that the average psychological abuse score for children of mothers with low PTSD and low depression is negative 0.3, and for children of mothers with low PTSD but high depression it is 3.57. So a notable increase in the mean on that scale. In the units of the original scales, we would report the slope as the difference, 3.57 minus negative 0.3, which, I'll call it beta, all together would equal 3.87 on the original psychological abuse score scale, whereas the slope they're reporting here is in different units. What that means is that children of mothers with high depression have average psychological abuse scores 3.87 units greater than children of mothers with low depression, in the group of mothers with low PTSD.

What they're showing here, this beta, is this thing standardized in terms of the number of standard deviations in the psychological abuse scale. So in their representation, high depression is associated with a half standard deviation increase, on average, in the psychological abuse scores amongst mothers with low PTSD. And they star this, I actually covered it with my ink here, and they give the star a super descriptive name: this slope is significant. Higher maternal depression is statistically significantly associated with higher average psychological abuse in children for mothers with low PTSD.

For the group with high PTSD, there's an actual downward shift in the average psychological abuse score when looking at mothers with high depression compared to low depression. And this result is not significant. So this is just another way of them saying that in the low-PTSD group, depression is positively and statistically significantly associated with the psychological abuse score for the child, whereas it's negatively and not statistically significantly associated with the psychological abuse score for children born to mothers with high PTSD.

So they actually write this up here, and what they say is just repackaging what I said. Figure one shows that when severity of maternal PTSD symptoms was high, the risk for child psychological abuse was consistently high regardless of depression severity. However, when the severity of maternal PTSD symptoms was low, greater depression severity was associated with an increased risk for child psychological abuse, or an increased score on that scale.

So, anyway, this is just one example of the use of interaction terms. And they actually presented the results with the interaction term intact. That doesn't always happen, but I do like the graphics they presented ultimately to summarize this. I think that made it clearer than the original results reported from the regression model.

Let's look at another example, from the Journal of Nutrition. This is secondhand smoke and fiber consumption. And they do a nice job of laying this all out in the abstract, so I'll just read that to you. They say secondhand smoke, SHS, exposure increases the risk for coronary heart disease, CHD, by an estimated 25% to 30% via oxidative stress and inflammatory mechanisms that may be ameliorated by dietary components. The aim of this study was to evaluate the hypothesized modifying role of nutrients with known antioxidant and/or anti-inflammatory properties on the relationship between secondhand smoke exposure and CHD mortality.

So they had detailed secondhand smoke exposure and dietary information from nearly 30,000 non-smokers in the Singapore Chinese Health Study, a prospective population-based cohort. The evaluation of whether or not dietary factors, and they list some here, modify the relationship between SHS exposure and CHD mortality was conducted within multivariable Cox proportional hazards models by creating an interaction term between the potential dietary effect modifier and the SHS exposure variable. The dietary factors were broken into quartiles and then dichotomized as being in the lowest quartile of intake versus the second through fourth quartiles of intake, and the SHS exposure variable was none versus living with at least one smoker.

And the results, and this says it all. Evidence for a main effects association between SHS exposure and risk for CHD mortality was not observed. In a stratified analysis by levels of selected dietary nutrient intake, fiber modified the effects of SHS exposure on risk for CHD mortality. And then they report the P value for their interaction term from that Cox model, 0.02, meaning that there was a statistically significant difference in the association between CHD mortality and secondhand smoke exposure by fiber intake level. The adjusted hazard ratio for SHS exposure and CHD mortality was 1.62, with a 95% confidence interval that just squeaks in as significant, going from 1 to 2.63, for those with low fiber intake. In contrast, among those with high fiber intake there was no association with SHS exposure.

So the confidence interval starts at 1 and goes up to 2.63, and certainly the estimated hazard ratio showed a substantial increase, at least among those in the sample, so they're saying there might be some evidence of association here, but there was no indication of an association in the group with high fiber intake. So what they're saying in their conclusion is, we provide evidence that a diet high in fiber may ameliorate the harmful effects of SHS exposure on risk for CHD mortality. And so they did that via the use of an interaction term in their Cox model.
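To see why the interval for the low-fiber group "just squeaks in", one can back out an approximate standard error from the reported interval, working on the log hazard ratio scale. This is a rough sketch, not the authors' calculation: the function name is made up, it assumes a normal approximation on the log scale, and the lower bound quoted as 1 is a rounded figure, so the result is only approximate.

```python
import math


def se_from_ci(lower, upper, z_crit=1.96):
    """Approximate SE of a log hazard ratio from a 95% CI given on the ratio scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)


# Values quoted in the lecture for the low fiber intake group.
hr, lower, upper = 1.62, 1.00, 2.63

se = se_from_ci(lower, upper)
z = math.log(hr) / se
# Two-sided p-value from the standard normal distribution.
p = math.erfc(abs(z) / math.sqrt(2))
print(round(se, 3), round(z, 2), round(p, 3))
```

The z statistic comes out just under 2 and the p-value sits very close to 0.05, which matches a confidence interval whose lower limit lands right at 1.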

So here's another study we've looked at before. Now we'll just bring it into the context of interaction terms. This is the Association of Race and Age With Survival Among Patients Undergoing Dialysis. And just to remind you, the context says many studies have reported that black individuals undergoing dialysis survive longer than those who are white. This observation is paradoxical given racial disparities in access to and quality of care, and it is inconsistent with observed lower survival among black patients with chronic kidney disease. We hypothesize that age and the competing risk of transplantation, but we're going to focus on age, modify survival differences by race.

So these were results from the United States Renal Data System, an observational cohort study of over a million incident end-stage renal disease patients between 1995 and 2009. And they say multivariate age-stratified Cox proportional hazards models were constructed to examine death in those who received dialysis. And we'll see that what they mean by that is they include an interaction term between age and race.

And then they go on to report, and you've read this before, the results for mortality and race separately for different age groups, because what they found was that age modified the association. And their conclusions say, overall, among dialysis patients in the US, there was a lower risk of death for black patients compared with their white counterparts. That's when they looked at everyone together and came up with one overall association for black versus white, adjusted for a bunch of other factors. However, when they actually investigated effect modification by age, they found that the commonly cited survival advantage for black dialysis patients applies only to older adults, and those younger than 50 years have a higher risk of death than white patients.

So let me just show you, and we've looked at this before, what they did. To start, they talk about using Cox proportional hazards regression, and they started by getting one overall adjusted estimate of the association between race and mortality, adjusting for a bunch of factors: age, sex, insurance type, et cetera. But then they ultimately go on to talk about how they looked at age as an effect modifier, and they talk about using a Cox proportional hazards regression model adjusted for that previous long list of factors. And then they said, to confirm whether differences between the age groups were statistically significant, and what they're implying here with the rest of the text is, to confirm whether the relationship between mortality and race was different by age, an additional model was built on top of the one they were previously referring to that included interaction terms for each age category and black race.

I'm just going to show you, this is quite an endeavor on their part. Actually, if you had the time to parse this, you would be able to do it, but it looks a little daunting at first. They had seven different age groups that they categorized their sample into, and they're all listed here. And here is what they had to do, since they had seven different age groups, in order to include race and age to start, even if they weren't including an interaction. They had what I'll call our race variable, which is a one if black and a zero if white, and they had six indicators for the age groups. They had seven age groups, the first one they used as the reference, and then they had an indicator for each of the remaining six. And then they had to create an interaction term between the race variable and each of these six indicators, so there were six interaction terms with race and age in this model, and this ultimately would allow them to estimate seven different age-specific associations between mortality and race. But this model is even more complicated, because they had a bunch of other x's and betas that I'm not showing here for all the other things they adjusted for.
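The coding just described, one race variable, six age indicators, and six race-by-age products, can be sketched as below. The age group labels and the function name are hypothetical placeholders, and the other adjustment variables in the authors' model are left out.

```python
# Hypothetical labels for the seven age groups; the first is the reference.
AGE_GROUPS = ["age1", "age2", "age3", "age4", "age5", "age6", "age7"]


def design_row(is_black, age_group, age_groups=AGE_GROUPS):
    """Build the race, age-indicator, and race-by-age columns for one patient."""
    if age_group not in age_groups:
        raise ValueError(f"unknown age group: {age_group}")
    race = 1 if is_black else 0
    # Six indicators, one per non-reference age group.
    age_indicators = [1 if age_group == g else 0 for g in age_groups[1:]]
    # Six interaction terms: the race variable times each age indicator.
    interactions = [race * a for a in age_indicators]
    return [race] + age_indicators + interactions


# A black patient in the third age group: the race variable, one age
# indicator, and the matching interaction term are all equal to one.
print(design_row(True, "age3"))
# A white patient in the same age group: every interaction term is zero.
print(design_row(False, "age3"))
```

Each row has 13 columns, and a patient in the reference age group has zeros in all twelve age and interaction columns.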

But I like what they did. They ultimately presented the results graphically. And we're going to look at the Cox model here. There's another model they used which gives similar results. But what they report here are the estimated hazard ratios of mortality for black versus white patients in each of the age groups. And these were constructed using combinations of the log hazard ratio for black to white plus the appropriate interaction piece for each of the age groups, and then the results being exponentiated. And they show that these are all adjusted for a bunch of other factors that may differ between the race groups as well.
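The "add on the log scale, then exponentiate" step can be sketched as follows. The coefficients are hypothetical, chosen only so the pattern resembles a hazard ratio that starts above one and falls with age; they are not the study's estimates, and the function name is made up.

```python
import math


def age_specific_hazard_ratios(b_race, b_interactions):
    """Combine coefficients on the log hazard scale, then exponentiate."""
    # Reference age group: only the race coefficient applies.
    log_hrs = [b_race]
    # Every other age group adds its own race-by-age interaction coefficient.
    log_hrs += [b_race + b_int for b_int in b_interactions]
    return [math.exp(v) for v in log_hrs]


# Hypothetical log hazard ratio for race in the reference age group,
# and six hypothetical interaction coefficients.
b_race = math.log(1.4)
b_interactions = [-0.10, -0.40, -0.45, -0.50, -0.55, -0.60]

hazard_ratios = age_specific_hazard_ratios(b_race, b_interactions)
for k, hr in enumerate(hazard_ratios, start=1):
    print(f"age group {k}: HR = {hr:.2f}")
```

Confidence intervals for the combined hazard ratios would also need the covariance between the race and interaction coefficients, which this sketch leaves out.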

But they show that in the younger ages, for example, 18 to 30, the relative hazard of mortality for blacks to whites is greater than one, on the order of close to 1.4. And it's statistically significant because the confidence interval doesn't cross one. It attenuates a bit, but it's still statistically significantly higher for blacks compared to whites in the 31 to 40 age group. And then with subsequent age groups, the risk is statistically significantly lower for blacks compared to whites. And that's a nice way of presenting the results from that interaction model, as opposed to showing the entire model with the slopes and such, leaving the reader not only to have to put things together and add, but also exponentiate.

Many articles will mention that the researchers investigated the interaction, even if no interactions were found or reported. So just for one example of this we'll go back to our gender differences in the salary of physician researchers. Researchers ultimately reported one overall mean difference between males and females adjusted for a bunch of other things. They said as long as these other things were equivalent, academic rank, specialty, et cetera, between the males and females being compared, the average estimated salary difference was the same, regardless of the values of these things. So they did not report anything about interaction or effect modification of the relationship between salary and gender by other factors in their abstract, or in their final results.

But they did mention in their methods section that they explored pairwise interactions between gender and the other characteristics. So ostensibly they looked at whether the relationship between wages and gender was modified by specialty type. And then they looked at whether it was modified by marital status. And then they looked at whether it was modified by years and rank, et cetera. And they likely did this through fitting a series of regression models, in this case linear, with interaction terms for gender and each of those things one at a time.

All right, well hopefully this was a nice little introduction to the use of interaction terms in the literature, and hopefully it helps solidify some of the ideas we've laid out in this lecture unit.
