Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.

From the course by Johns Hopkins University

Mathematical Biostatistics Boot Camp 2

From the lesson

Techniques

This module is a bit of a hodge podge of important techniques. It includes methods for discrete matched pairs data as well as some classical non-parametric methods.

- Brian Caffo, PhD, Professor, Biostatistics

Bloomberg School of Public Health

So, I've been using the words confounder and confounding a lot, so let me define them. Variables that are correlated with both the explanatory and response variables are confounders, and they can distort an estimated effect. In this case, the victim's race was correlated with both the defendant's race and whether or not the death penalty was imposed.

And this is, I think, kind of an old-school definition of confounding. There's a modern definition given by causal inference, covered in causal inference classes, that I think will be worth learning if you are interested in the field of statistics.

I think, ultimately, with a confounder here, we haven't really distinguished between something that's causally related to race and the death penalty versus something that merely has a statistical association with race and the death penalty. In this class, we're mostly going to be talking about things that have a statistical association with the explanatory and response variables, where there is a plausible causal connection between them.

Okay. So, again, putting aside the rather difficult and lengthy discussion of what a confounder is and how we select our confounders, let's assume we have a single confounder. How do we adjust for it?

Well, there are several ways. Regression is probably the biggest and most common way to adjust for confounders. But the old-school way in categorical data analysis is to stratify by the confounder and then combine the stratum-specific estimates. This requires appropriate weighting of the stratum-specific estimates, and we'll talk in a minute about how you do the appropriate weighting.

And unnecessary stratification has its own set of problems. So, again, just bringing back this discussion: the solution to the confounding problem is not just to stratify or adjust for everything in sight. Right? That's not the solution, because that has its own host of consequences.

For example, in any of these Simpson's paradox examples, imagine a giant database. You're interested in, say, the death penalty, and you have a giant database with lots of other variables. For sure you could find one variable that reverses the association but has no bearing on whether or not a person received the death penalty, right? So adjusting for that variable will reverse the association, but it has no real business being adjusted for.
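To make the reversal concrete, here is a minimal sketch in Python. The counts are hypothetical, invented purely for illustration (they are not the death penalty data from the lecture), and the group and stratum labels are placeholders. Within each stratum, group "a" has the higher event rate, yet pooling over the strata reverses the comparison.

```python
# Hypothetical counts (events, total) for two groups within two strata.
# These numbers are made up for illustration only.
counts = {
    "stratum_1": {"a": (9, 10), "b": (80, 100)},
    "stratum_2": {"a": (30, 100), "b": (2, 10)},
}


def rate(events, total):
    """Proportion of events among total observations."""
    return events / total


def stratum_rates(table):
    """Event rate for each group within each stratum."""
    return {
        stratum: {group: rate(*cell) for group, cell in groups.items()}
        for stratum, groups in table.items()
    }


def pooled_rates(table):
    """Event rate for each group after collapsing over the strata."""
    pooled = {}
    for groups in table.values():
        for group, (events, total) in groups.items():
            e, t = pooled.get(group, (0, 0))
            pooled[group] = (e + events, t + total)
    return {group: rate(e, t) for group, (e, t) in pooled.items()}


by_stratum = stratum_rates(counts)
overall = pooled_rates(counts)

for stratum, rates in by_stratum.items():
    print(stratum, {g: round(r, 3) for g, r in rates.items()})
print("pooled", {g: round(r, 3) for g, r in overall.items()})
```

The code only shows that a stratifying variable can flip the comparison. Whether that variable deserves to be adjusted for is a subject-matter question the arithmetic cannot answer.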

And so, admittedly, it's a hard topic: selecting confounders and achieving a balance between the right amount of confounder adjustment and over-adjustment.

So let's stipulate, for the time being, that we have a confounder and we want to adjust for it. What I really want to talk about is the method for stratifying and then combining stratum-specific estimates, because then we will be able to teach you some nice methodology. And then, as you take more statistics courses, you'll learn more about the delicate surgery of dealing with statistical confounding.

Okay, so here I have an aside, but it's an important aside. Suppose you have two scales, and by scales I mean things for weighing objects. Let's assume both scales are so-called unbiased. They both have some variance associated with them: if you weigh the same thing over and over again, you get different answers. But one has a variance of one pound and the other has a variance of nine pounds. They're both unbiased. So, confronted with weights from both scales, would you give both measurements equal credence? Let's suppose we weigh an object.

Our first weight is this variable X1, and we're going to assume that it's normal with mean mu and the variance of the first scale, sigma 1 squared. And then X2: because both scales are unbiased, we're going to assume that it's normal with the same population mean, mu, and a different variance, sigma 2 squared. Let's assume both sigma 1 and sigma 2 are known. We want to estimate mu, the unknown weight of the object.

Okay, so we measured it with one scale with one precision and another scale with another precision. We're assuming both scales are unbiased, so that if we measured the same object over and over and over again, the average would be about right. So, if we characterize it in this way, I'm hoping what everyone in the class can do is set up the likelihood: multiply the two likelihoods and take the log, or equivalently add the two log likelihoods, and disregard any terms that don't involve mu. And I'm hoping everyone could come up with the fact that the log likelihood for mu looks like this bottom equation right here.

Okay.
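The equation referred to is on the slide and is not part of this transcript. As a reconstruction under the stated assumptions (two normal measurements with common mean mu and known variances, plus the added assumption that the measurements are independent so the likelihoods multiply), the log likelihood with terms not involving mu dropped should look like this:

```latex
\ell(\mu) = -\frac{(x_1 - \mu)^2}{2\sigma_1^2} - \frac{(x_2 - \mu)^2}{2\sigma_2^2}
```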

And let's solve for the maximum likelihood estimate. The easiest way to do that right now would be to take the derivative, set the derivative equal to zero, and you get this answer: x1 times r1 plus x2 times r2, divided by r1 plus r2. Or, in other words, x1 times p plus x2 times 1 minus p, where p is r1 over r1 plus r2 and 1 minus p is r2 over r1 plus r2. In this case, ri is 1 over sigma squared sub i, and then p is, of course, 1 over sigma 1 squared divided by r1 plus r2.

So, why does this make sense? This makes a lot of sense to me now, but the first time you see it, you might say this makes no sense. Let me describe why this makes a ton of sense.
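For reference, the estimate quoted above can be recovered in a few lines. This is a sketch assuming the two measurements are independent and normal with known variances, with the symbols r_i and p used as in the lecture:

```latex
\frac{d\ell}{d\mu}
  = \frac{x_1 - \mu}{\sigma_1^2} + \frac{x_2 - \mu}{\sigma_2^2}
  = r_1 (x_1 - \mu) + r_2 (x_2 - \mu) = 0,
  \qquad r_i = \frac{1}{\sigma_i^2}

\hat{\mu} = \frac{r_1 x_1 + r_2 x_2}{r_1 + r_2}
          = p\, x_1 + (1 - p)\, x_2,
  \qquad p = \frac{r_1}{r_1 + r_2}
```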

Okay, so notice what each ri is: it's 1 over the variance. So if, let's say, sigma 1 is huge, in other words that scale stinks, it has this huge variance, then r1, the weight given to the measurement from that scale, is very low. And then conversely, if sigma 2 is very small, then r2, which is 1 over sigma 2 squared, will be a huge number, and x2 is given a gigantic weight. And then we divide by r1 plus r2, so that when we weight these two things, x1 and x2, we get a convex combination: p times x1 plus 1 minus p times x2. So it's an average. It's just a weighted average. Okay?

And by the way, you can always do this if you want to take a generalized form of average, right? Say r1 is the weight for x1 and r2 is the weight for x2; they have to be positive, of course. If you want to turn it into an average, divide the whole thing by r1 plus r2, and you'll turn it into p times one plus 1 minus p times the other. If r1 equals r2, it'll be the strict arithmetic average of the two numbers. If r1 is different from r2, it will weight one of the observations more than the other.
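The generalized average described here can be sketched in a few lines of Python. The function name `weighted_average` and the example numbers are hypothetical, chosen only to show that equal weights give the arithmetic mean and unequal weights pull the result toward the more heavily weighted value.

```python
def weighted_average(values, weights):
    """Weighted average with positive weights, normalized to sum to one."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and the same length")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    total = sum(weights)
    # Dividing each weight by the total turns the weights into p and 1 - p.
    return sum(w / total * v for v, w in zip(values, weights))


# Equal weights reduce to the arithmetic mean.
equal = weighted_average([2.0, 4.0], [1.0, 1.0])

# Unequal weights pull the result toward the more heavily weighted value.
unequal = weighted_average([2.0, 4.0], [3.0, 1.0])

print(equal, unequal)
```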

Well, the answer in this case is that we want to weight by the inverse of the variance, giving high-variance measurements low weight and low-variance measurements high weight, which to me makes a ton of sense.

Okay. So, anyway, the general principle: instead of taking a plain average over several unbiased estimates, take an average weighted according to the inverse of the variances. And this is so ingrained in statistical practice now that people do it without thinking. They don't go through this exercise of deriving maximum likelihood equations.

So, in our case, sigma 1 squared was 1 and sigma 2 squared was 9. In this case, you can work it out: p works out to be 0.9. So the estimate works out to be 0.9 times the first measurement plus 0.1 times the second measurement, the first measurement getting a lot more weight because that scale has one ninth the variance of the other one.
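The arithmetic for the two scales can be checked with a short Python sketch. The variances 1 and 9 come from the lecture; the function names and the two readings are hypothetical and serve only to illustrate the calculation.

```python
def inverse_variance_weights(variances):
    """Normalized weights proportional to 1 / variance."""
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    return [r / total for r in precisions]


def combine(estimates, variances):
    """Inverse-variance weighted average of unbiased estimates."""
    weights = inverse_variance_weights(variances)
    return sum(w * x for w, x in zip(weights, estimates))


# Variances from the lecture: 1 for the first scale, 9 for the second.
weights = inverse_variance_weights([1.0, 9.0])

# Hypothetical readings in pounds, chosen only to illustrate the arithmetic.
x1, x2 = 10.2, 9.1
mu_hat = combine([x1, x2], [1.0, 9.0])

print([round(w, 3) for w in weights], round(mu_hat, 3))
```

The weights come out as 0.9 and 0.1, matching the value of p above, and the same functions accept more than two estimates if each comes with its own variance.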
