A practical and example-filled tour of simple and multiple regression techniques (linear, logistic, and Cox PH) for estimation, adjustment and prediction.

From the course by Johns Hopkins University

Statistical Reasoning for Public Health 2: Regression Methods

From the lesson

Module 2B: Effect Modification (Interaction)

Effect modification (Interaction), unlike confounding, is a phenomenon of "nature" and cannot be controlled by study design choice. However, it can be investigated in a manner similar to that of confounding. This set of lectures will define and give examples of effect modification, and compare and contrast it with confounding.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

So in this section we'll continue our discussion of effect modification, and we'll look at several examples of studies where one of the research questions involved investigating effect modification.

So this lecture section will give more examples of effect modification and/or the processes used where necessary to investigate effect modification.

So let's look at this first article, from the American Journal of Epidemiology. The title of the article gives some hint as to where it was going in terms of effect modification. It says, Similar Relation of Age and Height to Lung Function Among Whites, African Americans, and Hispanics. So notice what they talk about in this case. If you read the article more thoroughly, you'll see that their outcomes of interest have to do with lung function, and their predictors of interest include age and height. What they conclude in this article is that the relationship between the lung function variables and age and height is statistically equivalent among Whites, African Americans, and Hispanics. So they can estimate one overall association that applies to all three ethnic groups, as opposed to there being effect modification, which would necessitate separate measures of the associations between lung function, age, and height for each of the three ethnicities.

So let's look at what they say in the abstract. They say current guidelines recommend separate spirometry reference equations for whites, African Americans, and Mexican Americans, but the justification for this recommendation is controversial. So what the authors were curious to see is whether the data they collected supported this practice of having separate reference equations relating lung function to age, height, etc., for the separate ethnic groups. In other words, they are investigating whether there is indeed effect modification. So what they say is that the authors examined the statistical justification for race- and ethnic-specific reference equations in adults in both the Third National Health and Nutrition Examination Survey and the Multi-Ethnic Study of Atherosclerosis Lung Study.

Spirometry was measured following American Thoracic Society guidelines. And then they go on to describe what they mean by statistical justification. Statistical justification for estimating separate associations was defined as the presence of effect modification by race or ethnicity among never-smoking participants without respiratory disease or symptoms. They go on to say in the abstract that there was actually no evidence of effect modification by race or ethnicity for forced expiratory volume in one second, forced vital capacity, or the ratio of forced expiratory volume in one second to forced vital capacity in the three different ethnic groups.

So they went on to do an analysis and use statistical techniques to test whether or not the relationships between these respiratory outcomes and predictors such as age and height were statistically different among the three ethnic groups, and they found no evidence of a difference. We'll see how to do such tests shortly in the upcoming sections on multiple regression. But what they're concluding here is that, based on this updated, more modern data, there was no evidence to support the previously held notion that the reference equations relating lung function to characteristics such as age and height needed to be different, with separate associations for white, African American, and Mexican American men or women.

They did go on to say, though, that the mean lung function for a given age, gender, and height was the same for whites and Mexican Americans but was lower for African Americans. So they did conclude that there were some overall differences between the ethnicities after adjusting for age, gender, and height, but ultimately that the relationship between lung function and age, gender, and height did not depend on what ethnicity the person is.

And in this second article, from the New England Journal of Medicine, they're looking at statins to prevent vascular events in men and women with elevated C-reactive protein. So this is a randomized study where the researchers randomized 17,800 healthy men and women, in other words those without a history of cardiovascular disease, with non-elevated LDL cholesterol levels to receive either 20 milligrams of statins daily or a placebo.

At the end of the follow-up period, the study results include the following. Of the 8,900 subjects randomized to the statins group, 142 developed cardiovascular disease. And of the 8,900 subjects who were randomized to the placebo group, 251 developed cardiovascular disease.
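As a rough check on these counts, the crude comparison of proportions can be worked out directly. The sketch below is an illustration, not the authors' analysis: it ignores differing follow-up times, the function name `risk_ratio_with_ci` is made up for this example, and the interval uses the standard large-sample (Wald) formula on the log scale.

```python
import math

def risk_ratio_with_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Crude risk ratio (group A vs. group B) with a Wald interval on the log scale.

    Hypothetical helper for illustration; it treats follow-up as equal for
    everyone, which an incidence rate ratio does not.
    """
    risk_a = events_a / n_a
    risk_b = events_b / n_b
    rr = risk_a / risk_b
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Counts quoted in the lecture: 142 of 8,900 on statins, 251 of 8,900 on placebo.
rr, lower, upper = risk_ratio_with_ci(142, 8900, 251, 8900)
print(f"risk ratio = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Because both groups have the same size, the risk ratio here reduces to 142/251, a little under 0.57.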

So the unadjusted incidence rate ratio here, which is very similar to the straight comparison of proportions but accounts for potentially differing follow-up periods for each individual, is 0.56.

And this is unadjusted. This compares the incidence rate of cardiovascular disease development for those who were randomized to receive statins to those who were randomized to receive a placebo. And this estimates a 44% reduced incidence, or risk, of developing cardiovascular disease in the follow-up period in the statins group relative to the placebo group. This result is statistically significant, and the 95% confidence interval goes from 0.46 to 0.69. Now, the authors did not go ahead and report an adjusted relative risk or incidence rate ratio, despite other characteristics that may be associated with cardiovascular disease development: sex, age, and smoking. But why do you think they didn't have to go ahead and report an adjusted incidence rate ratio?

Well, the study was large and randomized. So ostensibly, if they were to report the adjusted incidence rate ratio, it should be similar, if not identical, to this unadjusted incidence rate ratio, 0.56, because with randomization there was very little potential for any confounding.

However, the authors did investigate interactions, or effect modification, between some of these characteristics and statins. They said, well, the overall result that we just presented is not confounded by distributional differences between the statin and the placebo groups in these other measures. However, it is possible that the association between cardiovascular disease and statin use differs depending on the level of some of these other characteristics.

So this is a very common type of table shown in the results from randomized clinical trials, especially where they look at the association of interest separately for different levels of other variables. So, for example, they go ahead and show the estimated incidence rate ratio of cardiovascular disease for those on statins compared to those on placebo among males only, and they give the estimated hazard ratio here and its confidence interval, and among females only, and they give the estimated hazard ratio here and its confidence interval. This dotted vertical line here is the 0.56 that they estimated for the overall association. And this solid line here is one, which would be the null value. So we can see very quickly that the association is statistically significant for both males and females, as neither confidence interval includes one, but you can see the estimates are relatively close to one another, and the confidence intervals overlap. So this suggests strongly that the relationship between cardiovascular disease development and statins does not differ between males and females. In other words, the association is not modified by sex.

They do report something here that we haven't explored yet, but we will when we get into multiple regression techniques. There is a way to formally test whether the population-level associations between cardiovascular disease and statins are statistically different for males and females. The null is that they are not different, and this p-value is quite high, indicating that we would fail to reject the null, which is consistent with the fact that the estimates were similar and the confidence intervals overlap.

They went on to do this type of analysis stratifying by age. They wanted to see if there was a difference in the association for younger people, defined as those less than or equal to 65 years, and those greater than 65 years. And the estimates are different, as you can see, but the confidence intervals overlap and the interaction was not statistically significant. And they do this for several other characteristics. So, as they ultimately report, they did not find any evidence of effect modification, even though they investigated it.
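One common way to formalize a comparison of two subgroup estimates is a z-test on the difference of the log ratios, with standard errors recovered from the reported 95% confidence intervals. The sketch below assumes independent subgroups and approximately normal log ratios. The function name and the input numbers are hypothetical, chosen only to show the mechanics, and this is not necessarily the test the authors used.

```python
import math

def interaction_z_test(ratio_1, ci_1, ratio_2, ci_2, z_crit=1.96):
    """Test whether two independent subgroup ratios differ (hypothetical helper).

    Each ci_* is a (lower, upper) 95% interval for the corresponding ratio.
    Returns the z statistic and a two-sided p-value.
    """
    def se_from_ci(ci):
        lower, upper = ci
        # On the log scale the interval is estimate +/- z_crit * SE.
        return (math.log(upper) - math.log(lower)) / (2 * z_crit)

    diff = math.log(ratio_1) - math.log(ratio_2)
    se_diff = math.sqrt(se_from_ci(ci_1) ** 2 + se_from_ci(ci_2) ** 2)
    z = diff / se_diff
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p_value

# Made-up subgroup results, NOT the trial's numbers.
z, p = interaction_z_test(0.58, (0.45, 0.75), 0.54, (0.37, 0.79))
print(f"z = {z:.2f}, p = {p:.2f}")
```

With similar estimates and overlapping intervals like these, the p-value is large, which mirrors the "fail to reject the null" conclusion described above.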

So they go on to actually report the results like this. They say the rates of the primary end point were 0.77 and 1.36 per 100 person-years of follow-up in the statins and placebo groups, respectively, with a hazard ratio for statins of 0.56, a 95% confidence interval of 0.46 to 0.69, and a very small p-value. This is what we showed when we started talking about this study. However, they go on to say consistent effects were observed in all subgroups evaluated.
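The two rates quoted above can be turned into a ratio by simple division, which is a quick way to see where an estimate near 0.56 comes from. This is only a back-of-the-envelope sketch: the reported hazard ratio comes from the authors' model, so a ratio of rounded rates will be close to it but need not match exactly. The variable names are illustrative.

```python
# Rates quoted in the lecture, per 100 person-years of follow-up.
rate_statin = 0.77
rate_placebo = 1.36

# Crude incidence rate ratio: statin rate divided by placebo rate.
rate_ratio = rate_statin / rate_placebo

# Relative reduction in the event rate, as a percentage.
percent_reduction = (1 - rate_ratio) * 100

# Absolute difference in rates, per 100 person-years.
rate_difference = rate_placebo - rate_statin

print(f"rate ratio        = {rate_ratio:.2f}")
print(f"percent reduction = {percent_reduction:.0f}%")
print(f"rate difference   = {rate_difference:.2f} per 100 person-years")
```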

So what they're saying, in other words, is that there was an overall association and it did not, in their investigation, appear to differ for different subgroups of the population. So the message that they are giving is pretty clear: there's an overall association of reduced cardiovascular risk with statin use, and this relationship does not vary by sex, by age, or by any other characteristic they did a subgroup analysis on to look for effect modification. So they found no evidence of effect modification by any of the factors that they examined in their study.

Here's another study: Plasma Enterolignan Concentrations and Colorectal Cancer Risk in a Nested Case-Control Study. Okay, so this is a nested case-control study. We haven't looked at many case-control studies, but we can still appreciate the associations.

So enterolignans are biphenolic compounds that possess several biologic activities whereby they may influence carcinogenesis.

The authors investigated the association between plasma enterolactone and enterodiol and colorectal cancer risk in a Dutch prospective study. Among more than 35,000 participants aged 20-59 years, 160 colorectal cancer cases were diagnosed after seven point years of follow-up. So they used this as their starting point, these 160 cases. And they frequency-matched members of the cohort to the cases on age, sex, and study center. So they selected more than double, about two and a half times, as many controls as cases. This is frequency matching, not one-to-one matching: they took a control group that had a similar age, sex, and study center distribution to the cases.

Okay, so they actually show that plasma enterodiol and enterolactone were not associated with the risk of colorectal cancer after adjustment for known colorectal cancer risk factors. They estimated an odds ratio comparing the highest quartile to the lowest quartile, so they categorized the enterodiol levels into quartiles. The odds ratio is 1.11, and the results are not statistically significant.

And similarly they did the same thing for the enterolactone quartiles. And while they showed in the sample an elevated odds of colorectal cancer in the highest quartile compared to the lowest, the results were not statistically significant, as the confidence interval includes one. However, they go on to say that sex and body mass index modified the relationship between plasma enterolactone and colorectal cancer risk. Increased risks were observed among women and subjects with high body mass index. So what they're saying is that, on the whole, there was no association after accounting for the sampling variability. But in certain subgroups they found that there was an association of increased risk with increased levels of plasma enterolactone.

And this was found among women, but not necessarily among men, and among those with a high body mass index, but the association didn't hold up for other body mass index levels. So what they're saying is that the effect of enterolactone on the risk of colorectal cancer, as measured by the odds ratio, was modified by these characteristics. A different association existed for women than for men, for example.

And let's look at one more example: the association of race and age with survival among patients undergoing dialysis. We looked at this several times in Statistical Reasoning 1, but we'll come back to it now, and I'll just give you the context. It says in the abstract here that many studies have reported that black individuals undergoing dialysis survive longer than those who are white. This observation is paradoxical given racial disparities in access to and quality of care, and is inconsistent with observed lower survival among black patients with chronic kidney disease. And one of the things they hypothesized was that age modified survival differences by race.

So this goes on to talk about the study design. This is just to say they pulled a large number of medical records from Centers for Medicare & Medicaid Services forms.

And what they did in order to replicate previous studies was use a Cox proportional hazards model to estimate the association between mortality and race, and they actually went on to adjust for a bunch of other characteristics that may differ between the black and white subjects receiving dialysis, including age, sex, and insurance type. And we'll see how to adjust with multiple Cox regression in our section on multiple regression.

But they did this first for everyone. And then they went on to actually look at the association between mortality and race, adjusted for these other characteristics, but separately by age. And so they said that, to confirm whether the differences between age groups were statistically significant, an additional model was built including interaction terms for each age category and black race. And again, we'll get to how to do this in the multiple regression section. But essentially what they said is that they used an approach that allowed them to estimate the separate relationship between mortality and race, adjusted for other characteristics, for separate age groups, to see if the association between mortality and race differed by the age of the subjects.

And so we've looked at this picture before. But this is a close-up of table 2, as close as I can get it. And these are the results from the resulting analysis. What they are presenting here is the adjusted relative hazard of mortality for black patients compared to white patients in each of these age groups. And so in each analysis they adjust for a bunch of other characteristics that may differ between black and white patients who are going on dialysis and may affect mortality. What you see here, in the early ages, 18 to 30, is that this dot is the estimated hazard ratio and this is the confidence interval. And this is actually all scaled, you may remember, on the log scale, so these intervals are symmetric and comparable in terms of the risk. What we see here is that at the younger ages the incidence rate ratio of mortality for black to white patients is above one and statistically significant. So black patients on dialysis have a higher risk of mortality compared to white patients in the 18-to-30-year-old age group. And this decreases, but remains higher and statistically significant for black compared to white patients who are 31 to 40 years old when receiving dialysis. But after that age group, the relationship between mortality and race goes in the other direction. After age 41, black patients consistently have a lower risk of mortality compared to white patients after adjustment for other differences. And so what the authors are showing here, effectively, is that there's effect modification by age: the relationship between mortality and race, as defined by black or white, in dialysis patients is modified by age.

So in summary, we looked at a few more examples here. Effect modification occurs when the relationship between two quantities, Y and X, depends on the level of a third quantity, Z.

And effect modification cannot be ascertained by comparing unadjusted, or crude, associations with estimates adjusted for Z. We actually need to see separate estimates of the Y/X relationship for separate levels of Z in order to ascertain whether that association is different or not. We will show very shortly how to set this up in a regression context and how to formally test whether at least some of the associations are different for some of the levels of Z. But the fundamental idea holds that, in order to investigate effect modification, the researcher has to consider doing so in advance.
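The point about needing separate estimates for each level of Z can be illustrated with a small made-up dataset. In the sketch below, all numbers and names are hypothetical: Y rises with X when Z = 0 and falls with X when Z = 1, so a single pooled slope hides both relationships, and only the stratum-specific slopes reveal the effect modification.

```python
def slope(xs, ys):
    """Least-squares slope of y on x for a simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data: (x, y, z) triples, with z a two-level third variable.
data = [
    (1, 3, 0), (2, 5, 0), (3, 7, 0), (4, 9, 0),   # z = 0: y = 1 + 2x
    (1, 9, 1), (2, 7, 1), (3, 5, 1), (4, 3, 1),   # z = 1: y = 11 - 2x
]

def stratum_slope(level):
    """Slope of y on x using only the rows where z equals the given level."""
    xs = [x for x, _, z in data if z == level]
    ys = [y for _, y, z in data if z == level]
    return slope(xs, ys)

pooled = slope([x for x, _, _ in data], [y for _, y, _ in data])
slope_z0 = stratum_slope(0)
slope_z1 = stratum_slope(1)

print(f"pooled slope: {pooled:.1f}")      # both strata mixed together
print(f"slope, z = 0: {slope_z0:.1f}")
print(f"slope, z = 1: {slope_z1:.1f}")
# In a regression with an x*z interaction term, the interaction
# coefficient corresponds to the difference between the two slopes.
print(f"difference in slopes: {slope_z1 - slope_z0:.1f}")
```

Because X is distributed identically in both strata of this toy dataset, adjusting for Z as a confounder would leave the pooled slope unchanged, so comparing crude and adjusted estimates would reveal nothing here.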
