Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.


From the course by Johns Hopkins University

Mathematical Biostatistics Boot Camp 2



From the lesson

Hypothesis Testing

In this module, you'll get an introduction to hypothesis testing, a core concept in statistics. We'll cover hypothesis testing for basic one and two group settings as well as power. After you've watched the videos and tried the homework, take a stab at the quiz.

- Brian Caffo, PhD, Professor, Biostatistics

Bloomberg School of Public Health

Okay, so now is a good time to talk about confidence intervals and their connection to hypothesis testing, since we've just covered hypothesis testing and two-sided hypothesis tests.

So now, consider testing H naught: mu equal to some value, versus H a: mu different from that value, like we just discussed in the previous slide with respect to the Respiratory Disturbance Index.

And then take the set of all possible values of mu naught for which you fail to reject H naught, and think about this set. These are, at some level, the values of mu that are supportable as null hypotheses; they're the reasonable values of mu. So it isn't that big of a stretch to guess that this set will form a confidence interval for mu.

What's interesting is that it forms exactly a 1 minus alpha level confidence interval for mu. So if you use a 5% type one error rate for the tests, then you get a 95% confidence interval, which is nice.

And the same works in reverse, which is probably the more useful direction for us. If a 95% interval contains mu naught, then we fail to reject H naught. Which makes sense, right? The value mu naught was supported as a potential value for mu when we created the confidence interval, so it would make sense that we'd elect to fail to reject H naught rather than conclude that mu is different from mu naught.

And, in the next slide, we'll go through the argument.

Okay, let's just briefly go through this argument. So suppose we do not reject H naught for a two-sided test of mu equal to mu naught versus mu different from mu naught. That means our test statistic, the absolute value of x bar minus mu naught divided by the standard error, s over square root n, is less than or equal to the t quantile with n minus 1 degrees of freedom at the 1 minus alpha over 2 level. Remember, we reject when the statistic is bigger than that particular t quantile.

Okay, so since s over square root of n is positive, we can just move it over to the right-hand side, and we have an equivalent statement to the inequality: the absolute value of x bar minus mu naught is less than or equal to that t quantile times s over square root of n. And that's exactly equivalent to the statement that mu naught lies between x bar minus t times the standard error and x bar plus t times the standard error. And so that is exactly the same as saying mu naught lies inside the confidence interval.

So this is equivalent to saying: if mu naught lies inside the confidence interval, then we would have failed to reject H naught. And you can reverse this argument to get the other direction. So that proves the statements we made on the previous slide.
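That equivalence can be sketched in code. This is a minimal illustration, not the lecture's actual RDI data: the sample mean, standard deviation, sample size, and the t quantile constant below are all assumed values; the point is only that the fail-to-reject condition and the interval-membership condition agree for every candidate mu naught.

```python
import math

# Assumed illustrative numbers (not the lecture's data); t_quant approximates
# the t quantile with 15 degrees of freedom at the 0.975 level.
x_bar, s, n = 32.0, 10.0, 16
t_quant = 2.131
se = s / math.sqrt(n)  # standard error of the mean

def fail_to_reject(mu0):
    # two-sided t test: fail to reject when |x_bar - mu0| / se <= t quantile
    return abs(x_bar - mu0) / se <= t_quant

def in_interval(mu0):
    # 95% confidence interval: x_bar +/- t_quant * se
    return x_bar - t_quant * se <= mu0 <= x_bar + t_quant * se

# The two criteria agree for every candidate mu0 on a grid of values.
mu_grid = [20 + 0.1 * i for i in range(241)]
assert all(fail_to_reject(m) == in_interval(m) for m in mu_grid)
```

Both functions test the very same inequality, just rearranged, which is all the slide's algebra amounts to.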

And it shows this sort of inherent duality between confidence intervals and two-sided hypothesis tests.

This has several uses. First of all, it tells you that if you create, say, a 95% confidence interval, it conveys a little more information than the result of a hypothesis test. Because, A, you can do the hypothesis test; but, B, it also gives you a sense of what values of mu are well supported. And this helps to combat things like the difference between scientific significance and statistical significance.

With statistical significance, you know, if we have a giant sample size in our respiratory disturbance index example, and our x bar is 30.01, well, that isn't very different from 30, and it may not be scientifically meaningful. The confidence interval would show us both what range of values of mu is estimated, and that, in fact, our interval is quite close to 30, even if 30 isn't in it.

Plus, we can still mechanically perform the hypothesis test. So that's why, I think, in general, people have a preference: when you can, report a confidence interval rather than simply the result of a hypothesis test.

Okay, let's introduce the concept of P-values. So, when we had a sample of size 100 and we were doing the z test, we rejected the one-sided hypothesis test when alpha was 0.05. Would we have rejected if alpha was 1%? How about 0.001%? Of course, at some point we're going to get to the point where the z quantile is larger than our observed test statistic, and that will correspond to a specific alpha level. That value, the smallest alpha for which you still reject the null hypothesis, is called the attained significance level.

And this is equivalent to, but philosophically, I guess, a little bit different from, an entity called the P-value. The P-value, which again is the same number but conceptually a different thing in my opinion, is the probability, under the null hypothesis, of obtaining evidence as or more extreme than what was observed by chance alone, where chance is governed by the null probability distribution.

P-values were invented by the great statistician Fisher. And the attained significance level has a kind of easy logic to it, right?

You know, it just says: why don't we report the smallest significance level for which you reject? Then, if I give someone that number, they'll know, whatever their alpha level is, whether or not they reject. If their alpha happens to be bigger than that smallest significance level, then they would reject; if it's smaller than that smallest significance level, then they would fail to reject.

So, thinking about the P-value as the attained significance level, at that level it's merely a convenient thing to report. Because regardless of a person's alpha level, they can compare it to the P-value and tell you whether or not they reject.
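That reporting logic can be sketched in a couple of lines; the P-values and alpha level below are made up for illustration.

```python
# Reject exactly when the reported P-value is below the reader's own alpha.
def decision(p_value, alpha):
    return "reject H0" if p_value < alpha else "fail to reject H0"

print(decision(0.22, 0.05))   # fail to reject H0
print(decision(0.003, 0.05))  # reject H0
```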

The P-value, on the other hand, has a more interesting interpretation, because at some level people claim that it's a measure of evidence. So here's the logic: if the P-value is small, then either the null hypothesis is true and we've observed something that's very unlikely given that the null hypothesis is true, or the null hypothesis is false. And that's why Fisher introduced the P-value.

He thought the P-value was a convenient, calibrated entity, because it was a probability that would tell you, in a sense, whether getting a test statistic as or more extreme than you observed was rare under the null hypothesis. And if it was rare, then that casts some doubt on the veracity of the null hypothesis. You know, I think this use of the P-value as a measure of evidence is a little bit more controversial.

The attained significance level, which again is the exact same number with a different interpretation, is maybe a less controversial entity. It's merely a mathematical answer to the question: what's the smallest alpha level for which I would have rejected the null hypothesis?

Okay, so let's calculate the P-value from our example; let's do it for our t statistic. If the sample size is 16, our test statistic was 0.8. What's the probability of getting a t statistic as large or larger than 0.8? Well, that's pt, R's t distribution probability function, evaluated at 0.8 with 15 degrees of freedom. And lower.tail equals FALSE just means that we want the probability above 0.8, not below 0.8.

So, this works out to be 22%, which of course is larger than, say, 5%, which we knew because we failed to reject the null hypothesis: if the P-value is larger than alpha, you're going to fail to reject; if it's smaller than alpha, you will reject. So the probability of seeing evidence as or more extreme than actually obtained, calculated under the null hypothesis, is 22%.
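The pt call itself is base R. As a rough cross-check without any statistical library, the same tail probability can be computed in Python by numerically integrating the t density; the upper integration bound and step count below are arbitrary choices made for this sketch, not anything from the lecture.

```python
import math

# Cross-check of pt(0.8, 15, lower.tail = FALSE) by integrating the t density.
def t_density(t, df):
    # density of the t distribution with df degrees of freedom
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(x, df, upper=60.0, steps=100_000):
    # trapezoidal rule over [x, upper]; the density beyond 60 is negligible here
    h = (upper - x) / steps
    total = 0.5 * (t_density(x, df) + t_density(upper, df))
    for i in range(1, steps):
        total += t_density(x + i * h, df)
    return total * h

p_value = t_upper_tail(0.8, 15)
print(round(p_value, 2))  # roughly 0.22 -- larger than 0.05, so fail to reject
```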

Okay, let's just show the computing of the P-value in this case. So our test statistic, x bar minus 30 over s over square root 16, worked out to be 0.8. And the probability of being 0.8 or larger under the t distribution with 15 degrees of freedom works out to be 22%. So let's draw a picture: there's our t distribution, and right here is where our test statistic, 0.8, falls; the probability of lying above it under the t distribution is 22%, and you can see this area right here. Obviously 0.8 is below the upper fifth percent quantile, which would be up here somewhere, so we know we would fail to reject; and we also know from our P-value, because 22% is larger than 5%, that we would fail to reject.

Okay, let's do some notes. By reporting a P-value, the reader can perform the hypothesis test at whatever alpha level he or she chooses. That's because the P-value is mathematically equivalent to the attained significance level. So, if the P-value is less than alpha, you reject the null hypothesis; if the P-value is bigger than alpha, you fail to reject.

For two-sided hypothesis tests, my recommended P-value calculation is just to double the smaller of the two one-sided P-values. That's an easy procedure, and it's generally right.
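A sketch of that doubling rule; the one-sided P-values below are illustrative, and the result is capped at 1 so it stays a probability.

```python
# Two-sided P-value per the lecture's recommendation: double the smaller of
# the two one-sided P-values, capped at 1. Inputs here are made-up examples.
def two_sided_p(p_lower, p_upper):
    return min(1.0, 2 * min(p_lower, p_upper))

print(two_sided_p(0.78, 0.22))  # 0.44
```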

So don't just report P-values; give confidence intervals. That's a little hard when the problem is more than one-dimensional, but if it's a one-dimensional problem, then you have no excuse not to give a confidence interval rather than just a P-value.

Okay, some final thoughts about the P-value. One of the problems with P-values is that they only consider statistical significance, unlike confidence intervals; with a P-value alone, it's difficult to distinguish practical significance from statistical significance. By itself, it's a little too crude a summary of your data. It's a tremendously useful quantity, but it needs to be used with care.

There's quite a bit of work on the philosophy of whether P-values measure evidence. The argument against the P-value is that absolute measures of the rareness of an event are not necessarily good measures of evidence for or against a hypothesis. And that's the intrinsic philosophical bind of the P-value: it's a measure of evidence by virtue of being, in a certain sense, a measure of the rareness of the observed data under the null hypothesis.

And certainly P-values can become somewhat abusively used and are frequently misinterpreted. One of the main issues is that the actual interpretation of the P-value is hard.

It's the probability of obtaining a test statistic as or more extreme in favor of the alternative, where the calculation is done under the null hypothesis. That's the actual interpretation of a P-value. People try to interpret P-values in all sorts of other ways, because that interpretation sounds a little complicated. But that is the actual interpretation.

So the P-value is a confusing quantity. Get used to regurgitating the whole definition correctly, so that you don't take shortcuts and give incorrect definitions. Because, let me also say, people love to complain about P-values, so you should get your P-value interpretations correct.

Â Coursera provides universal access to the worldâ€™s best education,
partnering with top universities and organizations to offer courses online.