So, let's just talk briefly about

the general idea behind the computations for adjusted estimates.

So, hopefully this will give you some insights

conceptually as to how adjusted estimates are computed.

Let's go back to our fictitious study results, where we looked at

a sample of 210 smokers and 240 non-smokers, a mix of males and females.

When we combined everyone together and just compared all smokers to all non-smokers,

the relative risk of disease for smokers to non-smokers was 0.93.

We saw that this overall crude, or unadjusted, relative risk

between smoking and disease was nearly one, equal to 0.93, and additionally,

the risk difference here was negative 0.02.

If we looked at the sex-specific results,

they showed similar positive associations between smoking and disease in

males with a relative risk of 1.8 and a risk difference of 0.08,

and in females with a relative risk of 1.5 and a risk difference of 0.16.

So, we might ask, how can we combine

these relative risk estimates for example in the males and females?

They're similar in terms of their direction and

magnitude, and both have removed the overall distortion that was caused when

mixing males and females together: that imbalance of

a disproportionate number of males in

the smoking group, which distorted the overall crude association.

So, how could we combine this 1.8 and 1.5 into

a more precise adjusted estimate of

the association between smoking and disease adjusted for sex?

So, here's how it works, essentially.

What you could do conceptually is stratify when the confounder Z is categorical.

So, once you've stratified,

you compute the association between the outcome and exposure

separately for each level, or stratum, of the confounder Z.

So, in this fictitious example,

we computed separate estimates of

the disease-smoking relationship for males and females, and then what you could

do to combine those stratum-specific estimates of

the outcome-exposure relationship is take some weighted average of them.

So, for example, to get

a sex adjusted relative risk for the smoking disease relationship,

we could compute a weighted relative risk:

weight the sex-specific relative risks by the numbers of males and females in our study.

So, something like this,

we would take the number of males times

the relative risk estimate of disease for male smokers to

male non-smokers plus the number of females times

the relative risk estimate of disease for female smokers to female non-smokers,

sum those two products up and divide by

the sum of the number of males and number females.

So, if we did that for this example:

there are 200 males, and the relative risk of disease for

male smokers to non-smokers was 1.8, so we'd take 200 times

1.8, plus the 250 females times

their relative risk of disease for

female smokers to non-smokers, 1.5. That would be our numerator;

the denominator would be 200 plus 250.

So, this weighted average turns out to be a sex adjusted relative risk estimate of 1.6.
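To make the arithmetic concrete, here is a minimal sketch of that weighted average in Python, using the counts and sex-specific relative risks from this fictitious example:

```python
# Counts and sex-specific relative risks from the fictitious example.
n_males, n_females = 200, 250
rr_males, rr_females = 1.8, 1.5

# Weighted average of the stratum-specific relative risks,
# weighting by the number of people in each stratum.
adjusted_rr = (n_males * rr_males + n_females * rr_females) / (n_males + n_females)
print(round(adjusted_rr, 1))  # 1.6
```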

There are actually better ways to do this weighted average.

First, we should actually do the computations on

the natural log scale and weight by the standard errors of the log relative risks.

But this just illustrates the concept;

we don't want anybody to do this by hand.
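For the curious, here is a hedged sketch of what that better approach looks like: inverse-variance weighting of the log relative risks. The standard errors below are made up purely for illustration; only the relative risks (1.8 and 1.5) come from the example.

```python
import math

# Sex-specific relative risks from the example; the standard errors of
# the log relative risks are hypothetical, chosen only for illustration.
log_rrs = [math.log(1.8), math.log(1.5)]
ses = [0.30, 0.25]

# Inverse-variance weights: more precise estimates get more weight.
weights = [1 / se ** 2 for se in ses]
pooled_log_rr = sum(w * lr for w, lr in zip(weights, log_rrs)) / sum(weights)

# Exponentiate to get back to the relative risk scale.
pooled_rr = math.exp(pooled_log_rr)
print(round(pooled_rr, 2))
```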

We can also compute confidence intervals for these adjusted measures of

association and they're a function of the uncertainty in each of the inputs.

So, the confidence interval for

the sex-adjusted relative risk of disease for smoking would be a function of

the uncertainty of the relative risk for smokers to

non-smokers in males and the relative risk of smokers to non-smokers in females.

But instead of doing this by hand,

we'll see very shortly that multiple regression, in this case

logistic regression, since our outcome, disease, is binary,

will be a very useful tool for performing the adjustment.

I'll just give you a hint as to where we're going.

What we've done traditionally with

simple logistic regression is something like this:

the log odds of disease equals some

intercept plus some slope times an indicator of smoking status,

a one for smokers and a zero for non-smokers.

This would give us the log odds ratio,

the unadjusted log odds ratio of disease for smokers compared to non-smokers.

If we exponentiate that,

we get not the relative risk but the crude odds ratio.
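To see why exponentiating the slope gives the crude odds ratio, here is a small sketch with a hypothetical 2x2 table (the counts are not from the lecture); for a binary exposure, the fitted slope in simple logistic regression equals the log odds ratio you can compute directly from the table:

```python
import math

# Hypothetical 2x2 table of disease by smoking status (for illustration only).
diseased_smokers, healthy_smokers = 30, 70
diseased_nonsmokers, healthy_nonsmokers = 20, 80

odds_smokers = diseased_smokers / healthy_smokers
odds_nonsmokers = diseased_nonsmokers / healthy_nonsmokers

# For a single binary predictor, the logistic regression slope equals
# this log odds ratio, and exponentiating it recovers the crude odds ratio.
slope = math.log(odds_smokers / odds_nonsmokers)
crude_or = math.exp(slope)
print(round(crude_or, 2))  # 1.71
```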

If we want to adjust for sex differences between the smoking and non-smoking groups,

all we need to do is fit another regression model for the

log odds of disease,

where we include smoking again but also include a predictor for sex,

which might be a one for males and a zero for females.

So, without going into full detail on the interpretation,

here's a heads-up: the comparison being made by the slope for smoking now

is the relationship between disease and smoking adjusted for sex.

So, the algorithm will do that seamlessly and

we won't have to take weighted averages by hand, et cetera.

Additionally, if we want to adjust for other factors

that may be related to smoking and disease,

we can add them in sequentially to the model.

Similarly, let's look at the arm circumference,

height and weight associations in the 150 Nepalese children sample.

Here's the unadjusted scatter plot with the regression line relating arm circumference to

height, and the slope for height in this unadjusted comparison was 0.16.

But what if we wanted to adjust this association and get

a slope for height that was adjusted for weight differences between the height groups?

Well, what we could do ultimately is generate another regression line where the slope

shows the adjusted relationship between arm circumference and height adjusted for weight.

We saw that the slope for height after adjusting for weight was negative: negative 0.16.

So, how does the algorithm, or how would one,

adjust for a continuous measure, in this case weight?

What does the linear regression algorithm do in this situation?

It doesn't quite do this, but you can think of it doing this conceptually.

It breaks the data into individual weight groups,

so you can think of it as splitting

the range of weights in the data into small intervals,

very small intervals or narrow ranges.

In each of these specific weight strata,

a simple linear regression is fit to

the data points in that stratum to relate arm circumference to height,

using only the values in that particular weight stratum.

Then the overall weight-adjusted association between arm circumference and height

is a weighted average of the arm circumference-height slopes

across each of the individual narrow weight strata.

So, linear regression does this more smoothly and

elegantly than I've explained it here, but that is conceptually what it's doing.
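Here is a conceptual sketch in Python of that stratify-and-average idea, on simulated data (not the Nepalese sample): in the simulation, the outcome depends negatively on height once weight is held fixed, but weight is strongly tied to height, so the crude slope comes out positive while the weight-stratified weighted average comes out negative.

```python
import random
import statistics

# Simulated data for illustration only (not the Nepalese children data):
# the outcome truly depends on height with slope -0.1 once weight is held
# fixed, but weight is strongly tied to height, confounding the crude slope.
random.seed(1)

def ols_slope(xs, ys):
    """Slope of a simple linear regression of y on x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)

heights = [random.uniform(50, 100) for _ in range(150)]
weights = [0.2 * h + random.gauss(0, 1) for h in heights]
arms = [0.9 * w - 0.1 * h + random.gauss(0, 0.3) for h, w in zip(heights, weights)]

# Crude (unadjusted) slope of the outcome on height: positive, because
# taller subjects tend to be heavier.
crude = ols_slope(heights, arms)

# Conceptual adjustment: split weight into narrow strata (here, by
# rounding to the nearest unit), fit a simple regression in each stratum...
strata = {}
for h, w, a in zip(heights, weights, arms):
    strata.setdefault(round(w), []).append((h, a))

slopes, counts = [], []
for points in strata.values():
    if len(points) >= 3:  # need a few points to estimate a slope
        xs, ys = zip(*points)
        slopes.append(ols_slope(xs, ys))
        counts.append(len(points))

# ...then take a weighted average of the stratum-specific slopes.
adjusted = sum(s * n for s, n in zip(slopes, counts)) / sum(counts)
print(crude > 0, adjusted < 0)
```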

So, in summary, the adjusted association between an outcome Y

and a predictor X, adjusted for a single potential confounder,

we'll call it Z, can be estimated by stratifying on Z.

That's hard if Z is continuous, although we could still do it.

If we were doing this by hand and we didn't have the regression algorithm to help us,

if we wanted to adjust for weight and it was measured on a continuum,

we could do something like break weight into quartiles, and then we'd have

four strata of weight, where groups within the quartiles had at least similar weights.

We could estimate the Y/X outcome-exposure relationship

for each stratum of Z and then take

a weighted average of all the Z stratum-specific Y/X associations.
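That by-hand quartile recipe can be sketched the same way, again on simulated data for illustration (X, Z, and Y here are generic stand-ins, not the real study variables). Stratifying into only four quartiles removes part, though not all, of the confounding, so the quartile-adjusted slope lands below the crude slope:

```python
import random
import statistics

# Simulated X, Z, Y for illustration: Y depends on X with slope -0.1
# once the confounder Z is held fixed, but Z is strongly tied to X.
random.seed(2)

def ols_slope(xs, ys):
    """Slope of a simple linear regression of y on x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return num / sum((a - mx) ** 2 for a in xs)

x = [random.uniform(50, 100) for _ in range(150)]   # exposure X
z = [0.2 * xi + random.gauss(0, 1) for xi in x]     # confounder Z
y = [0.9 * zi - 0.1 * xi + random.gauss(0, 0.3) for xi, zi in zip(x, z)]

crude = ols_slope(x, y)  # unadjusted Y/X slope, inflated by confounding

# Break Z into quartiles using its three quartile cut points.
cuts = statistics.quantiles(z, n=4)
strata = [[], [], [], []]
for xi, zi, yi in zip(x, z, y):
    strata[sum(zi > c for c in cuts)].append((xi, yi))

# Slope within each quartile stratum, then a weighted average of them.
stratum_slopes = [ols_slope(*zip(*s)) for s in strata]
stratum_sizes = [len(s) for s in strata]
adjusted = sum(n * b for n, b in zip(stratum_sizes, stratum_slopes)) / sum(stratum_sizes)
print(adjusted < crude)
```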