Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.

From the course by Johns Hopkins University

Mathematical Biostatistics Boot Camp 2

From the lesson

Two Binomials

In this module we'll be covering some methods for looking at two binomials. This includes the odds ratio, relative risk and risk difference. We'll be discussing mostly confidence intervals in this module and will develop the delta method, the tool used to create these confidence intervals. After you've watched the videos and tried the homework, take a crack at the quiz!

- Brian Caffo, PhD, Professor, Biostatistics

Bloomberg School of Public Health

Okay.

So, for the relative risk, our PA hat worked out to be 11 over 20, which is 0.55.

PB hat worked out to be 5 over 20, which is 0.25, so the relative risk is 0.55 over 0.25.

I always think it's a good habit to write that we're comparing A over B in the relative risk, just to remind ourselves what order we divide in, and it works out to be 2.2.

That's quite a large indication of a difference, but is it actually statistically significant?

Is it something that would be of interest, in the sense that it could be more than just a chance association?

Okay.

So, let's calculate the standard error of the log relative risk.

Here, I plug into the formula and I get 0.44.

The interval for the log relative risk is then log 2.2, the log of our relative risk, plus or minus 1.96, the standard normal 97.5th quantile, times 0.44. That gives us negative 0.07 to 1.65.

On the log scale, we're interested in whether or not this interval contains zero. If we exponentiate it back, the interval for the relative risk is 0.93 to 5.21, which again shows an indication that drug A has a greater propensity for side effects than drug B.

But it isn't exactly significant, because this interval contains one, and on the log scale the interval contains zero.

Of course, because log is a monotonic function, if the interval contains 0 on the log scale, it will contain 1 on the natural scale, and vice versa.

So whether you check for 0 on the log scale or 1 on the unlogged scale, you will always get the identical answer.
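As a rough check on the relative risk arithmetic above, here is a minimal Python sketch. It assumes the formula on the slide is the standard delta-method standard error for the log relative risk, sqrt((1 - pA)/(pA nA) + (1 - pB)/(pB nB)); the transcript does not show the formula, so treat that as an assumption. The variable names are made up for illustration.

```python
import math

# Counts from the lecture: 11 of 20 on drug A and 5 of 20 on drug B had side effects.
x_a, n_a = 11, 20
x_b, n_b = 5, 20

p_a = x_a / n_a  # PA hat = 0.55
p_b = x_b / n_b  # PB hat = 0.25
rr = p_a / p_b   # relative risk, A over B

# Assumed formula: delta-method standard error of log(RR).
se_log_rr = math.sqrt((1 - p_a) / (p_a * n_a) + (1 - p_b) / (p_b * n_b))

z = 1.96  # approximate 97.5th percentile of the standard normal
log_lo = math.log(rr) - z * se_log_rr
log_hi = math.log(rr) + z * se_log_rr

# Exponentiate the endpoints to return to the relative risk scale.
rr_lo, rr_hi = math.exp(log_lo), math.exp(log_hi)

print(f"RR = {rr:.2f}, SE(log RR) = {se_log_rr:.2f}")
print(f"log-scale interval: ({log_lo:.2f}, {log_hi:.2f})")
print(f"RR interval: ({rr_lo:.2f}, {rr_hi:.2f})")
```

Because the sketch carries unrounded values through the calculation, the endpoints can differ slightly in the second decimal from the figures read out in the lecture, which appear to round at intermediate steps. The conclusion is the same either way: the log-scale interval contains 0 and the relative risk interval contains 1.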

Okay, let's go over the odds ratio.

This is the odds ratio for A divided by B.

Well, let's just do this cross product formula: 11 times 15 divided by 9 times 5. That gives us 3.67.

The standard error of the log odds ratio is then the square root of the sum of one over each of the cell counts.

That works out to be 0.68, so the interval for the log odds ratio is log 3.67 plus or minus 1.96 times 0.68.

That works out to be negative 0.04 to 2.64.

The interval for the odds ratio is 0.96 to 14.01, so this is on the natural scale. Okay.
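The odds ratio steps can be sketched the same way. This is only an illustration of the cross product formula and the square root of the summed reciprocal cell counts described above; the 2x2 layout and variable names are chosen here for convenience.

```python
import math

# 2x2 table from the lecture: (side effect, no side effect) for each drug.
a, b = 11, 9   # drug A
c, d = 5, 15   # drug B

# Cross product formula for the odds ratio, A over B.
odds_ratio = (a * d) / (b * c)

# Standard error of the log odds ratio: square root of the sum of 1 / cell count.
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

z = 1.96  # approximate 97.5th percentile of the standard normal
log_lo = math.log(odds_ratio) - z * se_log_or
log_hi = math.log(odds_ratio) + z * se_log_or

# Exponentiate to get the interval on the odds ratio (natural) scale.
or_lo, or_hi = math.exp(log_lo), math.exp(log_hi)

print(f"OR = {odds_ratio:.2f}, SE(log OR) = {se_log_or:.2f}")
print(f"log-scale interval: ({log_lo:.2f}, {log_hi:.2f})")
print(f"OR interval: ({or_lo:.2f}, {or_hi:.2f})")
```

As with the relative risk, the interval straddles the null value (0 on the log scale, 1 on the natural scale), and small differences from the spoken figures come down to rounding.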

And then, just to finish off our thinking about this problem, consider the risk difference as well.

Here, again, I think it's a good idea to write down that you're subtracting A minus B: PA hat minus PB hat.

That works out to be 0.30.

The standard error of the risk difference is given in this formula here.

It works out to be 0.15, and the interval is again given here.
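The slide's formulas are not reproduced in the transcript, so the following sketch assumes the usual Wald form: the standard error is sqrt(pA(1 - pA)/nA + pB(1 - pB)/nB) and the interval is the estimate plus or minus 1.96 standard errors. Under that assumption it reproduces the 0.30 and 0.15 quoted above.

```python
import math

# Counts from the lecture: 11 of 20 on drug A and 5 of 20 on drug B had side effects.
x_a, n_a = 11, 20
x_b, n_b = 5, 20

p_a = x_a / n_a
p_b = x_b / n_b

# Risk difference, A minus B.
rd = p_a - p_b

# Assumed formula: Wald standard error for a difference of two independent proportions.
se_rd = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)

z = 1.96  # approximate 97.5th percentile of the standard normal
rd_lo = rd - z * se_rd
rd_hi = rd + z * se_rd

print(f"RD = {rd:.2f}, SE = {se_rd:.2f}")
print(f"RD interval: ({rd_lo:.2f}, {rd_hi:.2f})")
```

No log transform is involved here, so the null value to check the interval against is 0 rather than 1.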

There are issues with the risk difference formula as well, and we covered some of that before. We showed that you can maybe improve on its performance a little bit by adding one to every cell, for example. That was covered in the last lecture, I believe.

The final thing I wanted to show were two plots, just to finish some thoughts from the last lecture, where we talked about Bayesian analysis.

So, if you recall from the last lecture, what we did was postulate a prior for P1 and P2 that were independent beta priors.

We found that if we did that, then we got independent beta posteriors for P1 and P2. We saw that an inefficient way to explore the posterior was to do a simulation from it.

And that would allow us to calculate things like the posterior mean, the posterior variance and so on.

So, if you go back to that lecture, you'll hopefully be reminded of exactly what we did.

When you conduct these posterior simulations, you get a PA and a PB, that is, a P1 and a P2.

You get lots and lots of pairs of those things that represent draws from the posterior distribution.

And it's convenient to do that because it's just a convenient numerical way to investigate the posterior.

If you take the arithmetic mean of those posterior draws, you would get the posterior mean for PA and the posterior mean for PB.

You could then get the posterior mean for the risk difference, or the posterior mean for the odds ratio, by taking every pair, PA and PB, calculating the odds ratio, and so on.

Well, here, what I did is I calculated the quantity of interest for every PA, PB pair simulated from the posterior.

I then simply plotted a histogram of them.

And this is just an approximation of the posterior, where the accuracy of the approximation depends only on how many Monte Carlo samples I elected for the computer to generate.

So, if I let the computer run for a really long time, I get a near-perfect representation of the posterior.
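To make the simulation idea concrete, here is a small sketch using only the Python standard library. The prior parameters are not stated in this part of the lecture, so uniform Beta(1, 1) priors are assumed purely for illustration. With a Beta(alpha, beta) prior and x events in n trials, the posterior is Beta(alpha + x, beta + n - x). The number of draws and the seed are also arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Assumption: independent Beta(1, 1) priors; the lecture's actual prior is not given here.
alpha, beta = 1, 1
x_a, n_a = 11, 20  # drug A: side effects out of total
x_b, n_b = 5, 20   # drug B: side effects out of total

n_sim = 100_000  # more Monte Carlo draws give a better approximation
draws = []
for _ in range(n_sim):
    # Independent beta posteriors for PA and PB.
    p_a = random.betavariate(alpha + x_a, beta + n_a - x_a)
    p_b = random.betavariate(alpha + x_b, beta + n_b - x_b)
    draws.append((p_a, p_b))

# Posterior means are approximated by arithmetic means over the draws.
post_mean_p_a = statistics.fmean(p_a for p_a, _ in draws)
post_mean_p_b = statistics.fmean(p_b for _, p_b in draws)
post_mean_rd = statistics.fmean(p_a - p_b for p_a, p_b in draws)
post_mean_rr = statistics.fmean(p_a / p_b for p_a, p_b in draws)
post_mean_or = statistics.fmean(
    (p_a / (1 - p_a)) / (p_b / (1 - p_b)) for p_a, p_b in draws
)

print(f"posterior mean PA: {post_mean_p_a:.3f}")
print(f"posterior mean PB: {post_mean_p_b:.3f}")
print(f"posterior mean risk difference: {post_mean_rd:.3f}")
print(f"posterior mean relative risk: {post_mean_rr:.3f}")
print(f"posterior mean odds ratio: {post_mean_or:.3f}")
```

Each derived quantity is computed pair by pair and then averaged, which is the point made above: once you have the draws, any function of PA and PB comes almost for free.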

So, this, in this case, is the posterior for the risk ratio.

So again, I took PA and divided it by PB, and then for all of those pairs I plotted a density estimate.

And there's the density. This gives you a lot of information about where the evidence concerning the risk ratio, the relative risk, lies.

Here I drew the blue lines for the 95% credible interval, where there's 2.5% below the lower blue line and 2.5% above the upper blue line, and I put a reference line at 1 in this case.

And as we saw when we calculated the confidence interval, the lower endpoint is just below one, so if you were interested in something like significance, you wouldn't get significance.
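The blue lines on the plot can be approximated from posterior draws by taking empirical quantiles. The sketch below again assumes uniform Beta(1, 1) priors, which may not match the prior used for the slide, so its endpoints need not agree with the plotted ones. It also reports the posterior probability that the relative risk exceeds the reference value of 1.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Assumption: independent Beta(1, 1) priors; the lecture's actual prior is not given here.
alpha, beta = 1, 1
x_a, n_a = 11, 20
x_b, n_b = 5, 20

n_sim = 100_000
rr_draws = []
for _ in range(n_sim):
    p_a = random.betavariate(alpha + x_a, beta + n_a - x_a)
    p_b = random.betavariate(alpha + x_b, beta + n_b - x_b)
    rr_draws.append(p_a / p_b)  # relative risk, A over B, for this pair

rr_draws.sort()

# Equal-tail 95% credible interval: 2.5% of draws below, 2.5% above.
cred_lo = rr_draws[int(0.025 * n_sim)]
cred_hi = rr_draws[int(0.975 * n_sim)]

# Share of posterior draws above the reference line at 1.
prob_rr_above_1 = sum(rr > 1 for rr in rr_draws) / n_sim

print(f"95% credible interval for RR: ({cred_lo:.2f}, {cred_hi:.2f})")
print(f"posterior probability RR > 1: {prob_rr_above_1:.3f}")
```

A histogram or density estimate of `rr_draws` would give a picture like the one described above; plotting is left out to keep the sketch free of third-party libraries.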

But I think the posterior displays a lot of information and gives you a lot more than just a confidence interval or the result of a hypothesis test.

Or, even worse, just the asterisks on how significant the P value is, which some software gives.

Okay.

And then on the next slide, what I'm showing is the same calculation done for the odds ratio.

So, this is just to give you a flavor of what the desired output from a simple Bayesian analysis would be. The posterior is the quantity that you would use to investigate the relative proportions, either through the odds ratio or through the relative risk.
