A practical and example filled tour of simple and multiple regression techniques (linear, logistic, and Cox PH) for estimation, adjustment and prediction.


From the course by Johns Hopkins University

Statistical Reasoning for Public Health 2: Regression Methods

43 ratings


From the lesson

Module 2A: Confounding and Effect Modification (Interaction)

This module, along with Module 2B, introduces two key concepts in statistics/epidemiology: confounding and effect modification. The relation between an outcome and an exposure of interest can be confounded if another variable (or variables) is associated with both the outcome and the exposure. In such cases the crude outcome/exposure association may over- or under-estimate the association of interest. Confounding is an ever-present threat in non-randomized studies, but results of interest can be adjusted for potential confounders.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

Welcome back. In this very short section, we're going to give a little insight into how adjusted estimates come about, that is, the general idea behind the computations. What we'll see very shortly is that multiple regression methods provide a nice framework for doing adjustments quickly and easily.

So, hopefully, upon completion of this short section, you'll gain some conceptual insight into how adjusted estimates are computed.

So, let's first look at our fictitious study. You'll recall that this was a fictitious study on a random sample from a population of male and female adults, with 210 smokers and 240 non-smokers in the study sample. The crude association between smoking and this not-so-rare disease outcome (again, this is fictitious) was such that the relative risk was close to 1, but in the sample smokers had a slightly lower risk of the disease than non-smokers.

Then, when we actually broke things out by sex and looked behind the scenes, we found that sex was related to both the probability of having the disease and the probability of smoking: females were more likely to have the disease than males, but males were more likely to smoke than females. So, sex was related to both the outcome of disease and the predictor of smoking. When we removed the variation in sex between the smoking and non-smoking groups, i.e., we looked at the sex groups separately, the relative risk of disease for smokers compared to non-smokers was 1.8 among males, and 1.5 among females. Both estimates are greater than one. And, again, we're not considering statistical significance at this moment, just using the estimates to illustrate the point.
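The lecture's slides aren't reproduced here, but we can sketch how such a reversal can arise in code. The individual cell counts below are made up for illustration; they are chosen only to be consistent with the summary numbers in the lecture (210 smokers, 240 non-smokers, stratum-specific relative risks of 1.8 and 1.5, and a crude relative risk just under 1):

```python
# Hypothetical cell counts, consistent with the lecture's summary numbers.
# The individual counts are invented for illustration, not from the course.
strata = {
    "male":   {"smoker": (27, 150), "non_smoker": (5, 50)},    # (cases, total)
    "female": {"smoker": (27, 60),  "non_smoker": (57, 190)},
}

def risk(cases, total):
    """Estimated risk: proportion with the disease."""
    return cases / total

# Stratum-specific relative risks: both above 1.
stratum_rr = {
    sex: risk(*g["smoker"]) / risk(*g["non_smoker"]) for sex, g in strata.items()
}

# Crude relative risk, ignoring sex: just under 1, because females
# (higher disease risk) are concentrated among the non-smokers.
smoker_cases = sum(g["smoker"][0] for g in strata.values())
smoker_n = sum(g["smoker"][1] for g in strata.values())
nonsmoker_cases = sum(g["non_smoker"][0] for g in strata.values())
nonsmoker_n = sum(g["non_smoker"][1] for g in strata.values())
crude_rr = risk(smoker_cases, smoker_n) / risk(nonsmoker_cases, nonsmoker_n)
```

With these counts, the male relative risk is 1.8, the female relative risk is 1.5, yet the crude relative risk is slightly below 1, exactly the pattern described above.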

So, how would we adjust for confounding? What we did here, when our potential confounder Z of sex was categorical, was to look at the association between our outcome of disease and our predictor of smoking separately by levels of that potential categorical confounder, sex. So, our example of separate tables for males and females is an example of stratifying by a potential confounder. We saw that the estimated relative risks were both greater than one, by differing degrees, but the difference in the estimates could be because of sampling variability. Again, we're not considering statistical significance in this particular section, just talking about the overall concept.

What could we do to take those sex-specific estimates and aggregate them into one overall association between disease and smoking that has been adjusted for sex? One way would be to take a weighted average of these stratum-specific, sex-specific estimates. For example, to get a sex-adjusted relative risk for the smoking/disease relationship, we could weight the sex-specific relative risks by the number of males and females: take the number of males times the relative risk estimate for males, plus the number of females times the relative risk estimate for females, and divide by the total sample size. In this example there are 200 males, with a relative risk of disease for smokers versus non-smokers of 1.8, and 250 females, with a relative risk of 1.5; the weighted average using that weighting scheme is 1.6. This is what we might call our sex-adjusted relative risk of disease and smoking.
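The arithmetic of that sample-size-weighted average, using the numbers given in the lecture, is just:

```python
# Sample-size-weighted average of the stratum-specific relative risks,
# using the lecture's numbers: 200 males (RR 1.8), 250 females (RR 1.5).
n_male, n_female = 200, 250
rr_male, rr_female = 1.8, 1.5

sex_adjusted_rr = (n_male * rr_male + n_female * rr_female) / (n_male + n_female)
# (200 * 1.8 + 250 * 1.5) / 450 = 735 / 450, roughly 1.63, the lecture's "1.6"
```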

There are better ways to take such a weighted average. Instead of weighting by the sample size, we might weight by the standard error of the relative risk estimates, or of the log relative risk estimates, do the weighted average on the log scale, and then exponentiate the result. But this just illustrates the idea: stratify by the potential confounder, compute stratum-specific estimates of the outcome/exposure association, and then take a weighted average across the strata. We could also compute confidence intervals for these adjusted measures, but we're going to save that until we get, very shortly, to multiple regression. In this case, our outcome of disease was binary, so we could do a multiple logistic regression to relate the binary outcome to smoking, and we'll see that this multiple logistic regression can be used to adjust that association for other predictors. This will be a very useful tool for performing adjustments, so that we don't have to do this stratifying and averaging approach by hand.
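A minimal sketch of the "better" weighting mentioned above, inverse-variance weighting on the log scale. The standard errors here are made up purely for illustration (the lecture doesn't give them); the point is the mechanics: weight each log relative risk by one over its squared standard error, average, then exponentiate:

```python
import math

# Stratum-specific relative risks from the example, with ILLUSTRATIVE
# standard errors for the log relative risks (not from the lecture).
strata = [
    {"rr": 1.8, "se_log_rr": 0.25},  # males
    {"rr": 1.5, "se_log_rr": 0.20},  # females
]

# Inverse-variance weights: more precise estimates get more weight.
weights = [1 / s["se_log_rr"] ** 2 for s in strata]
log_rrs = [math.log(s["rr"]) for s in strata]

# Weighted average on the log scale, then exponentiate back.
pooled_log_rr = sum(w * l for w, l in zip(weights, log_rrs)) / sum(weights)
adjusted_rr = math.exp(pooled_log_rr)  # lands between 1.5 and 1.8
```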

We've looked at the relationship between arm circumference and height in the sample of Nepalese children less than a year old, and we found, behind the scenes and not surprisingly, that weight was related to both the outcome of arm circumference and the predictor of height. So, how could we go about adjusting for this? Well, weight is a little trickier as a potential confounder, because it's measured on a continuum. The adjusted results we presented in a previous lecture set were adjusted for weight as a continuous variable. But here's the idea.

We could, behind the scenes, look at the relationship between arm circumference and height for very tight weight ranges. So this is, say, weight between 10 and 11 kilograms. And we could do the same thing, just to explain it conceptually, for the next weight group, between 11 and 12 kilograms, et cetera. We could keep doing this for small ranges of weight across the entire range of the sample. What we'd get is separate regression lines of arm circumference on height for each of these small weight strata.

Then the algorithm for producing an overall weight-adjusted association between arm circumference and height would involve taking the estimated regression slopes from the regressions of arm circumference on height in each of these weight groups, call them beta-one for weight group one, beta-one for weight group two, and so on across the multiple weight groups. We could take a weighted average of these regression slopes to get an overall adjusted regression slope of arm circumference on height, after adjusting for weight. This is just the idea behind the process; it would not be feasible to do by hand. This is where multiple regression will again be our saving grace, because the computer will do this effortlessly and easily.
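The stratify-and-average idea for a continuous confounder can be sketched on simulated data. Everything below is invented for illustration: we generate data where arm circumference depends on both height and weight and the two predictors are correlated, then fit a slope within each 1 kg weight band and take a sample-size-weighted average of the stratum-specific slopes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data in the spirit of the example (all values are made up):
# height increases with weight, and arm circumference depends on both.
weight = rng.uniform(5, 12, 300)                                # kg
height = 50 + 2.5 * weight + rng.normal(0, 2, 300)              # cm
arm_circ = 8 + 0.05 * height + 0.3 * weight + rng.normal(0, 0.5, 300)

# Crude slope of arm circumference on height, ignoring weight.
crude_slope = np.polyfit(height, arm_circ, 1)[0]

# Stratify into 1 kg weight bands, fit a slope within each band,
# then take a sample-size-weighted average of the stratum slopes.
edges = np.arange(5, 13)
slopes, sizes = [], []
for lo, hi in zip(edges[:-1], edges[1:]):
    mask = (weight >= lo) & (weight < hi)
    if mask.sum() >= 2:
        slopes.append(np.polyfit(height[mask], arm_circ[mask], 1)[0])
        sizes.append(mask.sum())

weight_adjusted_slope = np.average(slopes, weights=sizes)
```

Because weight barely varies within a 1 kg band, the stratum-specific slopes mostly isolate the height effect, so the weight-adjusted slope comes out smaller than the crude slope, which absorbs part of the weight effect. A multiple regression of arm circumference on height and weight does this in one step.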

So, in summary, the adjusted association between an outcome Y and a predictor X, adjusted for a single potential confounder Z, can be estimated by stratifying on Z, which is actually hard to operationalize if Z is continuous.

When Z is binary, like sex, or multi-categorical, the strata are well defined. But if Z is continuous, we couldn't do this easily by hand unless we designated small ranges to enumerate the strata. Then, within each stratum of Z, we would estimate the Y/X relationship in whatever metric we were using, whether a relative risk, a linear regression slope, etc., and then we could take some sort of weighted average of all the Z-stratum-specific Y/X associations. So, across all strata, we take our measure of association and average those estimates, weighting by either the sample size of each stratum or the standard error of the estimate in each stratum, etc.: some weighting process that gives more weight to those estimates informed by more, or more precise, information. This idea can be generalized to estimating the adjusted association between Y and X adjusted for multiple potential confounders, Z1, Z2, and so on, up to however many potential confounders we have, but obviously that would be nearly impossible to do by hand.

It would mean breaking our data up into groups stratified on multiple potential confounders, by all possible combinations of these multiple confounder values. So this is where multiple regression methods are going to make the adjustment process easy and straightforward. But at their core, this is essentially what they're doing behind the scenes, with some assumptions built in: they're separating the data out into different strata based on the adjustment variables, estimating the outcome/exposure association, and then averaging across all those levels. And multiple regression can estimate multiple adjusted associations in the context of one model. As we'll see shortly, when we expand what we did in the first three lectures, one of the natural ways to interpret the results will be in terms of adjusted estimates.
