Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.

From the course by Johns Hopkins University

Mathematical Biostatistics Boot Camp 2

34 ratings

Johns Hopkins University

From the lesson

Hypothesis Testing

In this module, you'll get an introduction to hypothesis testing, a core concept in statistics. We'll cover hypothesis testing for basic one and two group settings as well as power. After you've watched the videos and tried the homework, take a stab at the quiz.

- Brian Caffo, PhD, Professor, Biostatistics

Bloomberg School of Public Health

Okay, so we went through some extra steps there. In general, we don't convert the constant that we're interested in back to the original scale. We just take the standardized mean and compare it to a standard normal quantile. So in this example, remember, 32 was our empirical mean, 30 is the mean under the null hypothesis, and the standard error of the mean in this case was 10 divided by the square root of 100, which is 1. So the mean, expressed in standard error units from the hypothesized mean, works out to be two: 32 minus 30, divided by the standard error, is two. So our mean is two standard error units away from the hypothesized mean. Then we would compare that to the 95th percentile of the standard normal distribution, which is 1.645, so we would directly compare 2 and 1.645. That's just a quicker way to do it. So we can [UNKNOWN] these rules for a normal test for a single mean and give a couple of simple rules. Let's just call this the Z test. Because we're talking about testing a single mean, we're either willing to assume that our data is Gaussian, or that the central limit theorem is a good enough approximation to apply.
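The arithmetic in that example can be checked with a few lines of Python. This is only a sketch using the numbers quoted above (mean 32, null mean 30, standard deviation 10, sample size 100); the variable names are made up for illustration, and `statistics.NormalDist` from the standard library supplies the normal quantile.

```python
from math import sqrt
from statistics import NormalDist

x_bar = 32.0   # empirical mean from the example
mu_0 = 30.0    # mean under the null hypothesis
sigma = 10.0   # standard deviation from the example
n = 100        # sample size from the example

standard_error = sigma / sqrt(n)           # 10 / 10 = 1
z_stat = (x_bar - mu_0) / standard_error   # mean in standard error units
z_crit = NormalDist().inv_cdf(0.95)        # 95th percentile, about 1.645

# One-sided test of mu > mu_0 at the 5% level: reject when z_stat exceeds z_crit.
reject = z_stat > z_crit
print(z_stat, round(z_crit, 3), reject)
```

Here the standardized mean of 2 is above the 1.645 cutoff, so the null hypothesis is rejected, matching the direct comparison described above.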

Suppose we want to test the null hypothesis that mu equals mu naught. H naught is mu equals mu naught, versus one of three alternatives: H1, mu being less than mu naught; H3, mu being greater than mu naught; and the middle one here, H2, mu not equal to mu naught. And that one we'll have to talk a little bit about. In all three cases the idea is the same. We're going to reject in favor of H1 if our sample mean is small enough, right, if it's enough below mu naught. We're going to reject in the case of H3 if our sample mean is enough above mu naught. And in the second case, we're going to reject if our sample mean is enough different from mu naught, either too large or too small. And again, just like before, the logical way to do this is to express the mean in standard error units, in standardized units. So when we calculate our test statistic, it's x bar minus mu naught, the hypothesized mean under the null, divided by the standard error, s over square root n. That then is a z-score. It is the sample mean expressed in standard error units.

So if we get, for example, three, we would know that the sample mean is three standard errors above the hypothesized mean, which would be unlikely, right? So that maybe casts some doubt on the hypothesized mean. If we observe a sample mean that is, say, four standard errors below the hypothesized mean, again, that would be evidence in favor of H1. If it was four standard errors above the hypothesized mean, that would be evidence in favor of H3. And we can actually force alpha-level error rates. In our previous example, alpha was 5%. Alpha, again, remember, is the probability of a type 1 error: falsely rejecting the null hypothesis when, in fact, it is true. Then, under H1, we would reject if our test statistic was less than negative Z1 minus alpha. So in this case, if alpha is 0.05, then we would look up the 95th percentile and take the negative of it. We could equivalently look up the fifth percentile, because the fifth percentile is the negative of the 95th percentile, but I wanted to use the same quantile every time. So in that case it would be negative 1.645, okay? In H3, we're going to do exactly what we did in our example: we compare the test statistic to the positive upper quantile, Z1 minus alpha, which is the 95th percentile if the error rate we want to control is 5%.

And then for the second case, H2, we will reject if our test statistic is either too large, bigger than Z1 minus alpha over 2, or too small, smaller than negative Z1 minus alpha over 2. And the reason we divide alpha by 2 is that, remember, we want a 100 times alpha percent chance of rejecting the null hypothesis falsely.
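The three rejection rules can be written down as a small Python sketch. The function names `z_statistic` and `rejects` and the labels "less", "two-sided", and "greater" (standing in for H1, H2, and H3) are assumptions made for illustration; they do not come from the lecture.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(x_bar, mu_0, s, n):
    """Sample mean expressed in standard error units from mu_0."""
    return (x_bar - mu_0) / (s / sqrt(n))


def rejects(ts, alternative, alpha=0.05):
    """Apply the Z test rejection rule for one of the three alternatives."""
    quantile = NormalDist().inv_cdf
    if alternative == "less":        # H1: mu < mu_0
        return ts < -quantile(1 - alpha)
    if alternative == "two-sided":   # H2: mu != mu_0
        return abs(ts) > quantile(1 - alpha / 2)
    if alternative == "greater":     # H3: mu > mu_0
        return ts > quantile(1 - alpha)
    raise ValueError(f"unknown alternative: {alternative!r}")


ts = z_statistic(x_bar=32, mu_0=30, s=10, n=100)
for alt in ("less", "two-sided", "greater"):
    print(alt, rejects(ts, alt))
```

Using the same upper quantile in every branch, and negating it where needed, mirrors the "use the same quantile every time" convention from the lecture.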

So if the null hypothesis is true, we want only a 100 times alpha percent chance of rejecting the null hypothesis. The way we're going to do this is to divide that probability: half of it for accidentally rejecting because the statistic is too large, and the other half for accidentally rejecting because it's too small. That seems like a pretty sensible rule. The execution of that rule is just to say: take my test statistic; if it's negative, I throw out the negative sign; if it's positive, I leave it alone. And then I compare it to the upper quantile from the normal distribution. But instead of looking at the 5% upper tail, I look at the 2.5% upper tail. If I wanted a 10% type 1 error rate, then I would put 5% in each tail and look at the 95th percentile, and so on. Okay?

So let me briefly describe more of this two-tailed test idea. If alpha is 0.05, then 1 minus 0.05 over 2 works out to be 0.975, and the 0.975th quantile of the normal distribution works out to be about 1.96. Let me get some x and y values and plot the standard normal again, which you can see over there. What we have is a test statistic that we calculate, and we're going to reject if it's too large or too small. So we take 1.96, and that puts 2.5% in the upper tail, and then we take negative 1.96, and that puts 2.5% in the lower tail. Of course, that puts 95% in the middle. So we're going to reject if our test statistic is above positive 1.96, which has a 2.5% chance under the null hypothesis, or if our test statistic is below negative 1.96, which also has a 2.5% chance under the null hypothesis.

So the union of those two events has a 5% chance under the null hypothesis, and rejecting if the statistic is bigger than positive 1.96 or smaller than negative 1.96 is the same thing as rejecting if its absolute value is bigger than 1.96. So in the way that we're executing hypothesis testing, we've forced the type 1 error rate to be small; it's usually 5% or so. So if we reject the null hypothesis, either our model is wrong, or several other things maybe could have gone wrong, or there's a low probability that we've made an error.
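The tail probabilities behind the 1.96 cutoff can be verified numerically. This is a minimal sketch with illustrative variable names, again leaning on `statistics.NormalDist` for the standard normal distribution.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1
alpha = 0.05

# Upper quantile for the two-sided test: the 0.975th quantile, about 1.96.
z_two_sided = std_normal.inv_cdf(1 - alpha / 2)

lower_tail = std_normal.cdf(-z_two_sided)      # P(Z < -1.96), about 2.5%
upper_tail = 1 - std_normal.cdf(z_two_sided)   # P(Z > 1.96), about 2.5%
middle = 1 - lower_tail - upper_tail           # about 95%

# The two rejection events are disjoint, so their probabilities add.
type_one_error = lower_tail + upper_tail
print(round(z_two_sided, 2), round(lower_tail, 3), round(upper_tail, 3))
```

Adding the two tails recovers the 5% type 1 error rate, which is why the two-sided rule uses alpha over 2 in each tail.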

On the other hand, we have not fixed anything to do with the type two error rate, which is usually called beta. So therefore we tend to say "fail to reject H naught" rather than "accept H naught". To give you a classical example of this, imagine you have a very small sample size, you want to test some scientific hypothesis, and you fail to reject the null hypothesis. Well, regardless of your sample size, you've controlled the type one error rate, so that if the null is true there's only a 5% chance that you will reject it incorrectly. But if the alternative is true, it's possible that your small sample size leads to a lot of variability in the mean, right? Because, remember, the standard error of the mean is sigma divided by square root n. So if n is small, the variability of the mean is going to be larger. So if your sample size is small and you fail to reject H naught, it's not fair to say that you should conclude H naught. If you only have, say, a sample of size three, then maybe you didn't have a good chance of rejecting the null hypothesis anyway, because you didn't collect enough data to really evaluate it. So at any rate, that's a bit of nomenclature. That's why there's a tendency in basically every statistics textbook that teaches hypothesis testing to say "fail to reject H naught" rather than "accept H naught". If you want to say that you accept H naught, there's an implicit notion that the type two error rate is small, and in most problems less is known about the type two error. This goes to the subject of things like power, which we'll talk about later on in study design.
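To make the small-sample point concrete, here is a rough sketch of the chance of rejecting when the alternative is true, for a one-sided test of mu greater than mu naught. Everything in it is an assumption for illustration: the function name `power_one_sided`, a known sigma, a normal sampling distribution, and a true mean of 32 against a null mean of 30 with sigma 10 (numbers borrowed from the earlier example).

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided(mu_0, mu_a, sigma, n, alpha=0.05):
    """P(reject H0: mu = mu_0 in favor of mu > mu_0) when the true mean is mu_a.

    Assumes a known sigma and a normally distributed sample mean.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    # How far the true mean sits from mu_0, in standard error units.
    shift = (mu_a - mu_0) / (sigma / sqrt(n))
    return 1 - std_normal.cdf(z_crit - shift)


power_small = power_one_sided(mu_0=30, mu_a=32, sigma=10, n=3)
power_large = power_one_sided(mu_0=30, mu_a=32, sigma=10, n=100)
beta_small = 1 - power_small   # type two error rate with n = 3

print(round(power_small, 2), round(power_large, 2))
```

Under these assumptions, a sample of size three rejects only rarely even though the alternative is true, so failing to reject says little about H naught. The type one error rate is 5% in both cases; only beta changes with n.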

You know, one of the ways to try and combat issues like this is, prior to conducting the study, to design it in such a way that you will have a high probability of rejecting the null hypothesis if, in fact, the alternative is true. And one of the things you have under your control for doing that is to have a large sample size. So that's one point. At any rate, the tendency is to say "fail to reject H naught" rather than "accept H naught". I think a classic phrase that people always say is "absence of evidence is not evidence of absence", which is another way to put that.

The other point I'm making in this slide is that statistical significance is not the same thing as scientific significance in the context of hypothesis testing. The most common argument in favor of this point is that for most of our hypothesis test procedures we have sharply specified nulls. You know, H naught is that mu is exactly 5 or something like that, depending on the problem. And so it's possible, if, let's say, you have an enormous sample size, to get a sample mean that's 5.01 with an incredibly small standard error, because you have an enormous sample size, and still reject the null hypothesis, even though 5.01 isn't in any scientific sense different from five. And so that's the point that's often made: just because you reject the null hypothesis in terms of executing the statistical test, that doesn't mean the difference that you've detected is in fact meaningful. Now, I've read an argument at one point by a person saying that in some instances, for example when you have randomized comparative trials, [UNKNOWN] hypotheses are still meaningful and even small deviations are important. But, you know, these are kind of subtle issues.
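The 5.01 versus 5 example can be put into numbers. In this sketch the standard deviation of 1 and the two sample sizes are made-up values chosen only to illustrate the point; the lecture does not specify them.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 5.0     # sharply specified null value from the example
x_bar = 5.01   # observed sample mean from the example
s = 1.0        # assumed standard deviation (not given in the lecture)
z_crit = NormalDist().inv_cdf(0.975)   # two-sided 5% cutoff, about 1.96

results = {}
for n in (100, 1_000_000):             # assumed modest and enormous sample sizes
    standard_error = s / sqrt(n)
    z_stat = (x_bar - mu_0) / standard_error
    results[n] = (z_stat, abs(z_stat) > z_crit)
    print(n, round(z_stat, 2), results[n][1])
```

The observed difference of 0.01 is the same in both runs. Only the standard error changes, and with the enormous sample the test rejects even though the difference may not matter scientifically.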
I think the main basic issue, the one that's at least generally understood, is that it's not always the case that statistical significance and scientific significance are the same thing. So I would like you to at least be aware of that, and you can read more about it. Then, in the previous slide, we said we'll reject if our test statistic is below this value in the case where we're testing H1; we'll reject if its absolute value is above this particular value in the case of H2; and we'll reject if it's above this value in the case of H3. Those were the rules we came up with in the previous slide. So in H1, the negative normal quantile and below is called the rejection region. In H2, it's the upper quantile and above or the negative quantile and below; in other words, the absolute value being above a large value is the rejection region in that case. And in the third case, the upper normal quantile and above is the rejection region. So the collection of values of the test statistic for which you reject the null hypothesis is called a rejection region. Just a bit of nomenclature there.
