Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.

From the course by Johns Hopkins University

Mathematical Biostatistics Boot Camp 2

From the lesson

Hypothesis Testing

In this module, you'll get an introduction to hypothesis testing, a core concept in statistics. We'll cover hypothesis testing for basic one and two group settings as well as power. After you've watched the videos and tried the homework, take a stab at the quiz.

- Brian Caffo, PhD, Professor, Biostatistics

Bloomberg School of Public Health

Okay, so now is a good time to talk about confidence intervals and their connection to hypothesis testing, since we've talked about hypothesis testing and two-sided hypothesis tests.

So now, consider testing H naught, mu equal to some value, versus H a, mu different from that value, like we just discussed in the previous slide with respect to the Respiratory Disturbance Index.

And then take the set of all possible values for which you fail to reject H naught. Now think about this set. These are, at some level, the values of mu that are supportable as null hypotheses. So they're reasonable values of mu. So it isn't that big of a stretch to guess that this set will form a confidence interval for mu.

What's interesting is that it forms exactly a 100 times 1 minus alpha percent confidence interval for mu. So, if you have a 5% type one error rate for a set of tests, then you have a 95% confidence interval, which is nice.

And then the same works in reverse, which is probably even the more useful direction for us. If a 95% interval contains mu naught, then we fail to reject H naught at the 5% level, right? Which makes sense, right? The value of mu naught was supported as a potential value when we created the confidence interval. So it would make sense that we'd elect to fail to reject H naught rather than conclude that mu was different from mu naught.

And, in the next slide, we'll go through the argument.

Okay, let's just briefly go through this argument.

So consider that we do not reject H naught for a two-sided test of mu equal to mu naught versus mu different from mu naught. That happens if our test statistic, the absolute value of x bar minus mu naught divided by the standard error, s over square root n, is less than or equal to the t quantile with n minus 1 degrees of freedom, the 1 minus alpha over 2 quantile. So, you remember, we reject if it's bigger than that particular t quantile.

Okay, so the s over square root of n is positive, so we can just move it over to the right-hand side without changing the direction of the inequality. And this inequality here says the absolute value of x bar minus mu naught is less than or equal to this t quantile times s over square root of n. And that's exactly equivalent to the statement below, that mu naught lies in between x bar minus the t quantile times the standard error and x bar plus the t quantile times the standard error. And so that is exactly the same as saying mu naught lies inside the confidence interval.

So this is equivalent to saying, if mu naught lies inside the confidence interval, then we would have failed to reject H naught. And then you can obviously reverse this argument to get the other direction. So that proves the statements we made on the previous slide. And it shows this sort of inherent duality between confidence intervals and two-sided hypothesis tests.
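
Written out in symbols, the argument is the following chain of equivalences. The notation $t_{n-1,\,1-\alpha/2}$ for the t quantile is shorthand introduced here for illustration; it is not taken from the slides.

```latex
\begin{align*}
\text{fail to reject } H_0
&\iff \frac{|\bar{x} - \mu_0|}{s/\sqrt{n}} \le t_{n-1,\,1-\alpha/2} \\
&\iff |\bar{x} - \mu_0| \le t_{n-1,\,1-\alpha/2}\,\frac{s}{\sqrt{n}} \\
&\iff \bar{x} - t_{n-1,\,1-\alpha/2}\,\frac{s}{\sqrt{n}}
      \le \mu_0 \le
      \bar{x} + t_{n-1,\,1-\alpha/2}\,\frac{s}{\sqrt{n}}
\end{align*}
```

The last line says exactly that $\mu_0$ lies inside the $100(1-\alpha)\%$ t confidence interval, and each step can be read in either direction.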

This has several uses.

First of all, it tells you that if you create, say, a 95% confidence interval, it conveys a little bit more information than the result of a hypothesis test. Because A, you can do the hypothesis test, but B, it also gives you a sense of what values of mu are well supported.

And this helps to combat things like the difference between scientific significance and statistical significance. If we have a giant sample size and our x bar, from our Respiratory Disturbance Index example, is 30.01, the result may be statistically significant. But that isn't very different from 30, and it may not be scientifically meaningful. The confidence interval would show us what range of values of mu is estimated, and that, in fact, our interval is quite close to 30, even if 30 isn't in it.

Plus we can actually mechanically perform the hypothesis test. So that's why, I think, in general, people prefer that, when you can, you report a confidence interval rather than simply the result of a hypothesis test.
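
To make the 30.01 example concrete, here is a small Python sketch. The sample size of one million and the standard deviation of 1 are made-up numbers chosen purely for illustration, and the normal quantile is used in place of the t quantile because the sample is so large.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, s, n, alpha=0.05):
    """Two-sided 100(1 - alpha)% interval for mu using the normal quantile."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * s / sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical values: only xbar = 30.01 and mu0 = 30 come from the lecture.
xbar, mu0, s, n = 30.01, 30.0, 1.0, 1_000_000

lower, upper = z_interval(xbar, s, n)
rejects = not (lower <= mu0 <= upper)  # duality: reject iff mu0 is outside

print(f"95% interval: ({lower:.4f}, {upper:.4f})")
print("reject H0: mu = 30?", rejects)
```

Under these assumed numbers the test rejects, yet every value in the interval is within about 0.012 of 30, which is the information a bare reject or fail-to-reject result hides.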

Okay, let's introduce the concept of P-values.

So, when we had a sample size of 100 and we were doing the z test, we rejected the one-sided hypothesis test when alpha was 0.05. Would we have rejected if alpha was 1%, or how about 0.001 percent?

Okay, now of course at some point we're going to get to the point where the z quantile is larger than our observed test statistic. And that will correspond to a specific alpha level. And that value, the smallest alpha for which you still reject the null hypothesis, is called the attained significance level. And this is equivalent to, but philosophically I guess a little bit different from, an entity called the P-value, which again is the same number but conceptually is a different thing, in my opinion.

The P-value is the probability, under the null hypothesis, of obtaining evidence as extreme or more extreme than what was actually observed, by chance alone, where here chance is governed by the null probability distribution.
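
The claim that the attained significance level and the P-value are the same number can be checked numerically. The sketch below uses a one-sided z test with a hypothetical test statistic of 2 (the value is not restated at this point in the lecture) and Python's standard-library normal distribution; the function names are made up for this example.

```python
from statistics import NormalDist

def upper_tail_p_value(z_stat):
    """P(Z >= z_stat) under the null, for a one-sided 'greater than' z test."""
    return 1 - NormalDist().cdf(z_stat)

def rejects(z_stat, alpha):
    """Reject when the statistic is at least the upper alpha quantile."""
    return z_stat >= NormalDist().inv_cdf(1 - alpha)

z_stat = 2.0  # hypothetical value for illustration
p = upper_tail_p_value(z_stat)
print(f"P-value: {p:.4f}")

# The decision flips exactly where alpha crosses the P-value.
for alpha in (0.05, 0.01, 0.00001):
    print(alpha, rejects(z_stat, alpha), p <= alpha)
```

For each alpha, the quantile-based decision and the comparison of the P-value with alpha agree, which is what it means for the P-value to be the smallest alpha at which you still reject.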

P-values were invented by the great statistician Fisher. The attained significance level has kind of an easy logic to it, right? It would just say, why don't we just report the smallest significance level for which you reject. Then if I give someone that number, they'll know, whatever their alpha level is, whether or not they reject. If their alpha happens to be bigger than that smallest significance level, then they would reject. If it's smaller than the smallest significance level, then they would fail to reject.

So, thinking about the P-value as the attained significance level, at that level it's merely a convenient thing to report. Because regardless of a person's alpha level, they can compare it to the P-value and tell you whether or not they reject.

The P-value, on the other hand, has a more interesting interpretation. Because the idea is that, at some level, people claim that it's a measure of evidence. So here's the logic. If the P-value is small, then either the null hypothesis is true and we've observed something that's very unlikely given that the null hypothesis is true, or the null hypothesis is false.

And that's why Fisher introduced the P-value. The P-value, he thought, was a convenient calibrated entity, because it was a probability. It would tell you, in a sense, whether or not getting a test statistic as or more extreme than you observed was rare under the null hypothesis. And if it was rare, then that cast some doubt on the veracity of the null hypothesis.

And, you know, I think this use of the P-value as a measure of evidence is a little bit more controversial. The attained significance level, which is again the exact same number, just with a different interpretation, is maybe a less controversial entity. It's merely a mathematical answer to the question: what's the smallest alpha level for which I would have rejected the null hypothesis?

Okay, so let's calculate the P-value from our example. Let's do it for our t statistic. So if the sample size is 16, our test statistic was 0.8. What's the probability of getting a t statistic as large as or larger than 0.8? Well, this is pt(0.8, 15, lower.tail = FALSE), where pt stands for t probability, 0.8 is the test statistic, and 15 is the degrees of freedom. And lower.tail = FALSE just means that we want the probability above 0.8, not below 0.8. So this works out to be 22%. Of course, it's larger than, say, 5%, which we knew.

Because we failed to reject the null hypothesis. If the P-value's larger than alpha, you're going to fail to reject; if it's smaller than alpha, you will reject. Okay, so the probability of seeing evidence as or more extreme than actually obtained, that probability calculated under the null hypothesis, is 22%.
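
The lecture computes this number in R with pt. It can also be checked without any statistics package: the Python sketch below integrates the t density numerically with Simpson's rule. The function names and the number of integration steps are choices made here for illustration, not part of the lecture.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    const = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return const * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t_stat, df, steps=2000):
    """P(T >= t_stat) for t_stat >= 0: 0.5 minus the area from 0 to t_stat."""
    h = t_stat / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t_stat, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    return 0.5 - total * h / 3

p_value = t_upper_tail(0.8, 15)
print(f"P(T >= 0.8) with 15 df: {p_value:.3f}")
```

The result should come out at roughly 0.22, in line with the 22% quoted in the lecture. The sketch relies on the symmetry of the t distribution, so it only handles non-negative test statistics.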

Okay, let's just show the computing of the P-value in this case. So our test statistic, x bar minus 30 over s over square root 16, worked out to be 0.8. So the probability of being 0.8 or larger from the t-distribution with 15 degrees of freedom works out to be 22%. So here, let's draw a picture. So there's our t-distribution. And right here is where our test statistic is, 0.8. The probability of lying above it, from the t-distribution, is 22%. And so you can see this area right here.

Obviously 0.8 is then below the upper 5% quantile, which would be up here somewhere. And so we would know that we would fail to reject, but we also know, just given our P-value, that we would fail to reject, because 22% is larger than 5%.

Okay, let's do some notes. So, by reporting a P-value, the reader can perform the hypothesis test at whatever alpha level he or she chooses. That's because the P-value is mathematically equivalent to the attained significance level. So, if the P-value is less than alpha, you reject the null hypothesis. If the P-value is bigger than alpha, you fail to reject.

So for a two-sided hypothesis test, my recommended P-value calculation is just to double the smaller of the two one-sided hypothesis test P-values. That's an easy procedure. It's generally right.
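
As a sketch of that recipe, the Python snippet below doubles the smaller of the two one-sided P-values for a z statistic. The test statistic of 2 is a hypothetical value, the helper name is made up for this example, and the cap at 1 is added here so the result is always a valid probability.

```python
from statistics import NormalDist

def two_sided_p_value(z_stat):
    """Double the smaller of the two one-sided P-values, capped at 1."""
    upper = 1 - NormalDist().cdf(z_stat)  # evidence for mu > mu0
    lower = NormalDist().cdf(z_stat)      # evidence for mu < mu0
    return min(1.0, 2 * min(upper, lower))

z_stat = 2.0  # hypothetical value for illustration
p_two_sided = two_sided_p_value(z_stat)
print(f"two-sided P-value: {p_two_sided:.4f}")
```

Because the normal distribution is symmetric, a statistic of -2 gives the same two-sided P-value as a statistic of 2.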

So don't just report P-values; give confidence intervals too. It's a little hard when the problem is harder than one-dimensional. But if it's a one-dimensional problem, then you have no excuse not to give a confidence interval, not just a P-value.

Okay, some final thoughts about the P-value.

You know, one of the problems with P-values is that they only consider significance, unlike confidence intervals. So with a P-value, it's difficult to distinguish practical significance from statistical significance. By itself, it's a little bit too crude of a summary of your data. It's a tremendously useful quantity, but it needs to be used with care.

There's quite a bit of work on the philosophy of whether P-values measure evidence. And the argument against the P-value is that absolute measures of the rareness of an event are not necessarily good measures of evidence for or against a hypothesis. And that's the intrinsic philosophical bind of the P-value: it's a measure of evidence by virtue of being a measure of the rareness of the result under the null hypothesis, in a certain sense.

And certainly P-values can be abused and are frequently misinterpreted. And that's one of the main issues, that the actual interpretation of the P-value is hard. It's the probability of obtaining a test statistic as or more extreme in favor of the alternative, where the calculation is done under the null hypothesis. That's the actual interpretation of a P-value.

And then people try to interpret P-values in all sorts of different ways, because that interpretation sounds a little complicated. But that is the actual interpretation. So the P-value's a confusing quantity. Get used to just regurgitating the whole definition correctly, so that you don't take shortcuts and give incorrect definitions. Because, let me also say, people love to complain about P-values as well. So you should get your P-value interpretations correct.
