A practical, example-filled tour of simple and multiple regression techniques (linear, logistic, and Cox PH) for estimation, adjustment, and prediction.

From the course by Johns Hopkins University

Statistical Reasoning for Public Health 2: Regression Methods

From the lesson

Introduction and Module 1A: Simple Regression Methods

In this module, a unified structure for simple regression models will be presented, followed by detailed treatises and examples of both simple linear and logistic models.

- John McGready, PhD, MS, Associate Scientist, Biostatistics

Bloomberg School of Public Health

So just allow me to jump in quickly and

say, welcome back to Statistical Reasoning.

This is part two, and in this first lecture set of three lectures we're

going to discuss the basis for something that will permeate the entire course,

something called Regression Methods.

So onward and upward, I look forward to working with you this term and

let's have some fun with regression.

Greetings, and welcome to the first lecture in Statistical Reasoning Two.

In this first lecture set, we're going to give an overview of an umbrella method

called Simple Regression, just a generalized method for

relating an outcome of any type to a single predictor via linear equation.

And then for the remaining lecture sections,

we'll talk about a specific type of simple regression called Linear Regression.

So in this set of lectures, we will develop a framework in the first section,

section A, for simple linear, simple logistic, and

simple Cox Proportional Hazards Regression.

The remaining sections will focus on simple linear regression, which is

a general framework for estimating the mean of a continuous outcome

using a single predictor, which may be binary, categorical, or continuous.

So, let's first give an overview of simple regression regardless of

the type of regression.

So hopefully in this section, if you're rusty,

you'll be able to re-familiarize yourself with the properties of a linear equation.

And then identify the group comparisons being made by

a simple regression coefficient regardless of the outcome variable type,

whether it be continuous, binary, or time-to-event.

And by simple regression coefficient here,

I'm referring to a specific quantity called the Regression Slope.

So, what we're going to be doing to start will provide an extension of

the framework we set up for estimation and testing from the first term.

All methods we covered in term one can be done as simple regression models.

But the beauty of regression is that it'll cover situations we were not able to handle

with the methods we looked at.

And regression models can be extended to allow for

analyses beyond the scope of what we did in the first term,

which was comparing outcomes across two or more levels of a single predictor.

We'll be able to extend regression, and we'll get into this in subsequent lectures

to include multiple predictors which will allow us not only to estimate adjusted

relationships, but also do better prediction of outcomes using more inputs.

So just to link what we did in the past term to what we'll be doing here with

simple regression: take the comparisons of means we did between two or

more groups using the t-test or ANOVA approaches, and the corresponding

ways to estimate confidence intervals for the mean differences.

This can all be done via a simple linear regression model.

Comparing proportions between two or more groups, which we did with

the Chi-square approach, can be done via a simple logistic regression model.

And comparing incidence rates between two or more groups, which we did with

a log rank test, can be done by a simple Cox Proportional Hazards regression model.

So the basic structure of these regression models will be a linear equation.

We'll have something that we're trying to estimate, some function of our outcome,

depending on what type it is.

And we're going to model that as a linear function of our predictor.

So the generic equation of a line with a single predictor

is an intercept value plus a slope, times the predictor value, the x value.

You may have fond memories from secondary school where you saw this written in

the form of mx plus b, for example, where b was what was called the intercept

and m was the slope for the variable x.

We're going to change the notation slightly.

In statistics we like to use Greek letters specifically betas, and we'll rewrite this

as beta naught, or beta zero, plus beta one times our predictor, x1.

So beta naught here plays the role of the intercept.

And beta one plays the role of m in the previous formula.

It's the slope.

And x1 represents our predictor of interest.

The left hand side, the thing I left purposely empty as a box,

depends on what type of variable the outcome is.

So suppose our outcome is continuous;

we're trying to model the outcome of a continuous variable.

What we can do with linear regression is estimate the mean of

that continuous variable as a function, a linear function, of our predictor x.

So we can get different mean estimates for different values of x.

For binary outcomes, we start with zero one variables.

And you may recall we summarize them as proportions, but

we could also write them as odds.

And what we actually need to do to get an appropriate linear

function is take an interesting transformation of the binary outcome and

turn it into the log odds.

We'll detail this in lecture two, but this is just a heads up:

we'll write the log odds of our outcome as a linear function of our

predictor X.

And then for time-to-event outcomes, this blank,

this empty box on the left-hand side, will be the log of

the hazard rate of our event, modeled as a linear function of our predictor of interest.
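
Written out, the three versions of the left-hand side just described look like the following. This is only a sketch in the lecture's generic notation: the symbols \mu_y (mean of the outcome), p (proportion with the outcome), and h (hazard rate of the event) are labels assumed here for illustration, and the specifics of each model come in later sections.

```latex
\begin{align*}
\text{simple linear:}   \quad & \mu_y = \beta_0 + \beta_1 x_1 \\
\text{simple logistic:} \quad & \log\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 \\
\text{simple Cox PH:}   \quad & \log(h) = \beta_0 + \beta_1 x_1
\end{align*}
```

In all three, the right-hand side has the same intercept-plus-slope form; only the quantity in the box on the left changes.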

Now the right-hand side, for simple regression.

Simple just means that we have one predictor of interest.

The right-hand side includes that predictor of interest, x1.

And we'll see in a moment, that sometimes we have one predictor of interest, but

may need more than one x to represent it.

And this predictor, x1, or some form of it, can be binary,

categorical or it can be allowed to be continuous.

That's something we haven't experienced thus far in the course:

allowing the grouping factor to be continuous.

So let's get a sense of what the resulting equation gives us

under a couple different scenarios.

We are not going to specify what our left-hand side is yet.

We'll get into the specifics for each type of regression.

But suppose I have a binary predictor such as sex.

So it only takes on two values, so this equation is only predicting two

outcomes: one for those who are coded one and one for those who are coded zero.

So, if I'm looking at a group of females from my sample, where x1 equals one,

this equation is going to predict that whatever function of my outcome we're using,

depending on the outcome type, is going to be equal to the intercept plus beta one,

the slope, times 1.

So the full prediction for females will be

the sum of these two quantities. For males, where x1 equals 0,

the predicted value equals the intercept plus the slope times zero, so

the slope disappears.

So this intercept functionally gives us the estimated value of the outcome for

the males in this sample. In order to get the same value

for females, we take the value for males, the intercept, and

add this slope of beta one.

So what beta one really quantifies is the difference in the outcome

for females, those with x1 equals one, compared to those with x1 equals 0.

So this slope compares those two groups.
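
To see the binary-predictor case with numbers, here is a minimal sketch. It assumes the continuous-outcome (linear regression) version of the left-hand side, fits the line by ordinary least squares with NumPy, and uses made-up toy values that are not from the lecture.

```python
import numpy as np

# Hypothetical toy data (not from the lecture): x1 = 1 for females, 0 for males.
x1 = np.array([0, 0, 0, 1, 1, 1], dtype=float)
y = np.array([10.0, 12.0, 14.0, 15.0, 17.0, 19.0])

# Design matrix: a column of ones for the intercept (beta0) and x1 for the slope (beta1).
X = np.column_stack([np.ones_like(x1), x1])
beta0, beta1 = np.linalg.lstsq(X, y, rcond=None)[0]

# Group means computed directly, for comparison with the fitted coefficients.
mean_males = y[x1 == 0].mean()
mean_females = y[x1 == 1].mean()

# The intercept should match the reference-group (male) mean, and the slope
# should match the female-minus-male difference, up to floating-point rounding.
print("intercept:", round(beta0, 6), "male mean:", mean_males)
print("slope:", round(beta1, 6), "difference in means:", mean_females - mean_males)
```

With a single zero-one predictor, the fitted line passes through the two group means, which is why the slope is exactly the comparison between the two groups.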

Suppose we had a predictor that was nominal categorical:

it took on more than two values, and

they were not ordinal in nature, so it was more than binary.

So suppose we were getting data from three different clinics in

the United States: from Johns Hopkins, from the University of Maryland, and

from the University of Michigan.

And we wanted to see how our outcome differed between these

three different clinic sites.

How can we handle this in a regression framework and

represent uniquely these three groups?

Well, we only have one predictor which is clinic.

But we're going to need more than one x to do this.

And the approach to do this generically (and again, we'll do this in detail with

specific examples,

but I just want to give you a sort of starting point)

is to designate one of the three groups as our reference category,

the thing we'll compare the other groups to.

And then we create binary xs for each of the other groups.

And this is pretty much what we did with the binary indicator in the previous example.

We had females and males.

We only needed one x, but we designated males to be the reference group.

They were the group that the other group, females, would be compared to.

So, what we're going to do here is, if we designate Hopkins as the reference group,

then we're going to make one indicator, x1, which is a one

if the subject is from the University of Maryland.

And x1 will be a zero if the subject

is not from the University of Maryland.

Similarly, we'll make another predictor, x2

(I won't write this out fully, but you can),

which will be a one if the subject is from the University of Michigan,

and a zero if they are not from the University of Michigan.

So how is this going to play out?

The resulting equation will look like this.

And think about this: there are only three groups we're estimating an outcome for,

but we get this linear equation that uniquely identifies each of

the three groups in terms of what we estimate.

So let's start with the University of Michigan.

Suppose we have a subject from the University of Michigan.

Well, their value of x2 is equal to 1.

Their value of x1 is equal to 0, since they are not from the University of Maryland.

Their predicted outcome is the intercept plus beta1 times 0,

because x1 is 0,

plus beta2 times x2, which is 1.

So, they pick up this beta2.

If we are looking at subjects from the University of Maryland

(it is up here with Michigan, sorry for the abbreviation),

we have got x2 equal to 0, because they are not from Michigan, and

x1 equal to 1, and so this group's predicted value of the outcome

is the intercept plus the slope for the indicator of University of Maryland.

If they're from Hopkins, the reference group, then both of the x's

are equal to zero, and the predicted outcome is simply the intercept.

So this intercept estimates the outcome for the reference group.

And then the slope for x1 compares those who are coded one for

x1 relative to the reference group. If you take the difference between Maryland

and Hopkins, you get this slope for Maryland,

those whose x1 is one.

Similarly, the slope for x2 compares those subjects from Michigan

to those subjects from Hopkins; it's the difference between the two groups.

So to get the estimate for the subjects from Michigan, starting with

the Hopkins estimate, the intercept, we'd have to add this slope for being a 1 on

that variable, which corresponds to being from the University of Michigan.
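
The three-clinic coding can be sketched the same way. Again this assumes the continuous-outcome (linear regression) case and ordinary least squares via NumPy; the outcome values are invented for illustration, and only the clinic names and the choice of Hopkins as the reference group come from the lecture.

```python
import numpy as np

# Hypothetical toy data (not from the lecture): two subjects per clinic.
clinic = np.array(["Hopkins", "Hopkins", "Maryland", "Maryland", "Michigan", "Michigan"])
y = np.array([4.0, 6.0, 7.0, 9.0, 1.0, 3.0])

# One predictor (clinic), two indicator x's; Hopkins is the reference group.
x1 = (clinic == "Maryland").astype(float)  # 1 if Maryland, 0 otherwise
x2 = (clinic == "Michigan").astype(float)  # 1 if Michigan, 0 otherwise

# Columns: intercept (beta0), Maryland indicator (beta1), Michigan indicator (beta2).
X = np.column_stack([np.ones(len(y)), x1, x2])
beta0, beta1, beta2 = np.linalg.lstsq(X, y, rcond=None)[0]

# Group means computed directly, for comparison with the fitted coefficients.
means = {name: y[clinic == name].mean() for name in ("Hopkins", "Maryland", "Michigan")}

print("intercept:", round(beta0, 6), "Hopkins mean:", means["Hopkins"])
print("beta1:", round(beta1, 6), "Maryland minus Hopkins:", means["Maryland"] - means["Hopkins"])
print("beta2:", round(beta2, 6), "Michigan minus Hopkins:", means["Michigan"] - means["Hopkins"])
```

Each slope is a comparison with the reference group only. To compare Maryland with Michigan, you would take the difference between the two slopes.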

Here's the beauty of regression, and

this is where we start getting into new territory, is that it allows for

continuous predictors, unlike the methods we learned in Statistical Reasoning 1.

All the comparisons we did between groups in

Statistical Reasoning 1 were for two or more categories.

But, sometimes, it's efficient to handle measurements that are made continuously,

age, height, et cetera, without having to arbitrarily categorize them to

become a predictor, and we'll see that this works under a certain assumption:

that the outcome-predictor association, when the predictor is

continuous, is well characterized by a line.

So we'll look at that assumption for the different methods we'll explore.

But suppose for the moment that our x1 variable is age in years.

Well, we might want to relate some outcome to age in

years, treating age in years as continuous.

So let's think about what we get here.

Well, what we're describing in space is this: our outcome, called

the box because we haven't specified what it is, is scaled here on the vertical axis.

And x, our predictor, age in years, is on the horizontal axis.

And this is the line formed by the equation where we say the outcome

equals beta naught, the intercept, plus beta one times x1, where x1 is age in years.

So, what does the intercept represent?

Well, this is the value of our outcome,

the left hand side, when x is 0, at 0 years.

It is the point on the graph where this line crosses the vertical axis,

at the coordinate x equals 0,

and y the vertical component is equal to the intercept.

The slope, what the slope measures is the change in

the left-hand side corresponding to a unit increase in our predictor x1.

So for every 1 unit increase in x1, our y value changes by beta one.

In this particular instance, beta one is positive, so y increases by beta one. But

were beta one negative, it would indicate that the vertical value, the outcome value,

which I'm generically calling y, decreases for each one unit increase in x1.

So the slope beta one is the change on

the left-hand side corresponding to a unit increase in X1.

Another way to think about this is beta one quantifies the difference in whatever

we're predicting on the left-hand side for X1 plus one compared to x1.

In other words, the difference between two groups who differ by one unit in x.

And this change or difference is the same across the entire line.

This one number pretty much summarizes any comparison of the outcome we make for

our range of ages for whatever generic x we have.

So this change in the left-hand side

applies anywhere on this line.

Now we'll see that, you know, this line theoretically goes on forever into space,

but we'll see in practice we'll have to limit our

interpretations of it to the x range we have in our data.

But anywhere in that x range, this slope quantifies the difference in the vertical,

or predicted value for a one unit difference in our predictor X

regardless of which two values of x we're comparing that differ by one unit.

Alright.

All the information about the difference in our outcome, or the left-hand side, for

two differing values of x1 is contained in that slope.

So, for example, if I was comparing the outcome for two values of x1, for example

age, which were three units apart, two groups who differed by three years of age,

their difference in the outcome would be three times beta one,

because beta one represents the difference in the outcome for a

one unit difference in x1, or age,

and the two groups we're comparing differ by three units.

So this is what that would look like on our linear scale.
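
The same arithmetic can be checked in a few lines. This is a sketch only: the intercept and slope values below are hypothetical, chosen for illustration, and `predict` is just a helper name for the generic line, not something from the lecture.

```python
def predict(beta0: float, beta1: float, x1: float) -> float:
    """Generic left-hand side for a given x1: intercept plus slope times x1."""
    return beta0 + beta1 * x1

# Hypothetical coefficients (not from the lecture); x1 is age in years.
beta0, beta1 = 2.0, 0.5

# A one-year difference gives the same change no matter where on the line we look.
diff_30_to_31 = predict(beta0, beta1, 31) - predict(beta0, beta1, 30)
diff_60_to_61 = predict(beta0, beta1, 61) - predict(beta0, beta1, 60)

# A three-year difference gives three times the slope.
diff_40_to_43 = predict(beta0, beta1, 43) - predict(beta0, beta1, 40)

print(diff_30_to_31, diff_60_to_61)  # both equal beta1
print(diff_40_to_43, 3 * beta1)      # both equal 3 * beta1
```

The intercept cancels in every difference, which is why the slope alone carries all the information about comparisons between groups.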

So again, this slope contains all the information we need,

when x is continuous, to make any comparisons between

any groups who differ by a specified amount in x.

So we'll see some very specific examples and put some context to this,

interpreting it in a scientific context, starting in the next section.

But, to close out, regression is a general set of methods for

relating a function of an outcome variable to a predictor by a linear equation.

And regardless of what our outcome looks like, what

the intercept and slope represent, in terms of which groups they estimate for

and the group differences, will be consistent,

whether we're doing linear, logistic, or Cox Proportional Hazards regression.

So in the next section,

we'll jump right into some concrete examples using simple linear regression.
