0:00
We're going to wrap up our discussion on working with one unknown
population proportion by talking about doing a hypothesis test for a proportion.
Let's go through the steps for doing a hypothesis test,
these are going to look very similar to what we've seen before.
First, we set our hypothesis.
In this case, our unknown population parameter is denoted by p,
as opposed to mu for means, so our null hypothesis sets that p
equal to some null value, and our alternative hypothesis says that p
can be less than, greater than, or not equal to that null value.
Next, we calculate our point estimate.
In this case, that's the sample proportion, p-hat.
Then we check our conditions.
The first condition is independence.
We want to make sure that the
sampled observations are independent of each other.
This can be ensured either
through random sampling or random assignment,
depending on whether you are doing an observational study or an experiment.
And if you're sampling without replacement, we want the
sample size to be less than 10% of the population.
In terms of sample size and skew, we want to make sure we
have at least ten expected successes and ten expected failures in our sample.
Note that here I've used p,
instead of p hat, and that is because in a hypothesis
test, we have to assume that the null hypothesis is true.
If you think about the definition of a p value, it says: the probability
of the observed or a more extreme outcome, if the null hypothesis is true.
So, when going through the conditions, or any other portion of the
hypothesis test, we must assume that the null is true, and therefore,
wherever we see a p, we plug in whatever the null
value for that p is, that's set forth in the null hypothesis.
So, we could read this as not ten observed successes and ten observed failures,
but instead as ten expected successes and ten expected failures.
Next step is to draw the sampling distribution.
Remember, we always,
always, always want to draw our curve before we calculate our p
value, and we want to shade where the p value belongs:
is it in one tail, and if so, is it the upper tail or
the lower tail, or is it a two-tailed test? Then we calculate our test statistic.
The test statistic is always of the form
observed minus null divided by the standard error.
That's observed sample proportion p hat minus
the null value p that comes from the
null hypothesis divided by the standard error, and
we calculate that standard error as the square root of p times 1 minus p over n.
Note again that I've said p and not p-hat,
because we are again assuming that the null hypothesis
is true and therefore we are using what the
null hypothesis has set forth as our true population parameter.
We don't know if that's the case, but we must assume
that the null is true as we proceed through the hypothesis test.
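As a rough sketch of the test statistic just described, the Python below computes observed minus null divided by the standard error, with the standard error built from the null value rather than from p-hat. The function names and the example inputs (a sample proportion of 0.56 from 100 observations, tested against a null value of 0.5) are made up for illustration and are not from the lecture.

```python
import math


def standard_error_null(p0: float, n: int) -> float:
    """Standard error of p-hat, computed with the null value p0 (not p-hat)."""
    return math.sqrt(p0 * (1 - p0) / n)


def z_statistic(p_hat: float, p0: float, n: int) -> float:
    """Test statistic: (observed - null) / standard error."""
    return (p_hat - p0) / standard_error_null(p0, n)


# Hypothetical inputs, chosen only to keep the arithmetic easy to follow.
z = z_statistic(p_hat=0.56, p0=0.5, n=100)
print(round(z, 2))
```

Here the standard error is the square root of 0.5 times 0.5 over 100, which is 0.05, so the statistic is 0.06 divided by 0.05.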
Lastly, we make a decision and interpret it in context of the research question.
If the p value is less than our significance level, we reject the
null hypothesis and decide that the data
provide convincing evidence for the alternative hypothesis.
If, in fact, the p value
is greater than our significance level, we fail to reject the null hypothesis
and conclude that the data do not
provide convincing evidence for the alternative hypothesis.
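The shading and decision steps can be sketched in Python as follows. This is a minimal illustration, not a prescribed implementation: the function names, the labels "greater", "less" and "two-sided" for the alternative, and the default significance level of 0.05 are all assumptions made for the example. The normal CDF is built from the standard library's `math.erfc`.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, P(Z <= z), via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def p_value(z: float, alternative: str) -> float:
    """Area of the shaded tail(s) under the null sampling distribution."""
    if alternative == "greater":    # upper tail
        return normal_cdf(-z)
    if alternative == "less":       # lower tail
        return normal_cdf(z)
    if alternative == "two-sided":  # both tails
        return 2 * normal_cdf(-abs(z))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


def decide(p_val: float, alpha: float = 0.05) -> str:
    """Reject the null only when the p value is below the significance level."""
    return "reject H0" if p_val < alpha else "fail to reject H0"


# Hypothetical test statistic, used only to exercise the functions.
print(decide(p_value(1.2, "greater")))
```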
3:33
So, just to clarify this discussion about when do we use p, and when do we use
p-hat, the moral of the story is, we
use the sample proportion when there's nothing else known.
And we use the population proportion, or at least
the null hypothesized value of the
population proportion, when we're doing a hypothesis
test, which dictates that we must assume that the null hypothesis is true.
So, if I want to check the success-failure condition for our confidence
interval, I would use the
observed proportions, the observed sample proportions.
If, on the other hand, I'm checking the
success-failure condition for a hypothesis test, I use the
expected counts and plug in the p that comes from my null hypothesis.
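One way to see the difference is that the check itself stays the same and only the proportion passed in changes. The sketch below assumes a hypothetical helper named `success_failure_met`; the sample values are made up for illustration.

```python
def success_failure_met(p: float, n: int, minimum: float = 10) -> bool:
    """True when both n*p and n*(1-p) are at least `minimum`."""
    successes = n * p
    failures = n * (1 - p)
    return successes >= minimum and failures >= minimum


# Hypothetical sample: 100 observations with a sample proportion of 0.08.
n, p_hat, p0 = 100, 0.08, 0.5

# Confidence interval: nothing else is known, so use the observed p-hat.
print(success_failure_met(p_hat, n))  # 8 observed successes

# Hypothesis test: assume the null is true, so use the null value p0.
print(success_failure_met(p0, n))     # 50 expected successes and failures
```

With these made-up numbers, the same sample fails the check for a confidence interval but passes it for a hypothesis test, because the two checks use different proportions.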
4:47
A poll found that 60% of 1,983
randomly sampled American adults believe in evolution.
Does this provide convincing evidence that
a majority of Americans believe in evolution?
And when we say majority, what we mean is more than 50%.
So if the question is, is the true proportion
of Americans who believe in evolution greater than 50%,
then our alternative hypothesis should state p is greater than 0.5.
And using this, we can easily figure out what the null hypothesis
should be, because we keep the same population parameter and the same null
value, except we simply set it equal to that number as opposed
to giving a direction one way or another or saying not equal to.
Remember, the null hypothesis always has an equal sign in it,
whereas the alternative could have a greater
than, less than or not equal to sign,
depending on the research question that's being posed.
We are also given that the sample proportion is 0.6.
So, in this sample, definitely more than 50%
of the respondents believe in evolution, but what
we are checking to see is whether this
difference that we're observing between the sample proportion and
what we're hypothesizing is statistically significant.
In other words, does this particular sample yield
convincing evidence of a majority of Americans believing in evolution?
Another input that we're going to need is our sample size and that's 1,983.
Before we move on to actually doing
inference, remember, we must always check our conditions.
The first condition, as usual, is about independence.
1,983 is definitely less than 10% of all Americans and we have a random sample.
And therefore we can assume that whether one American
in the sample believes in evolution is independent of another.
The second condition is about the sample
size or the skew of the sampling distribution.
And remember, we check this
for proportions using the success-failure condition.
And because we're doing a hypothesis test and
because, within the hypothesis test we have to assume that the null is true, we
would use the p as set forth by the null hypothesis in checking this condition.
So, the numbers of successes and failures that are expected in this sample
are both going to be 1,983 times 50%, or 0.5,
which gives us 991.5, which is obviously greater
than ten.
We didn't calculate the two separately
because we are multiplying by the same 0.5
either way, whether we're calculating expected successes or expected failures.
7:33
Then, since the success-failure condition is met,
we can assume a nearly normal sampling distribution for our sample proportion.
Now that we've checked our conditions and got them out of the
way, and given a set of hypotheses and
characteristics of the sample, we can finally calculate our p value.
Before we get there, we need a test statistic.
And before we get there, we need to draw the sampling distribution.
So first, let's try to write it out.
p hat is distributed nearly normally, according to our conditions
and according to the central limit theorem.
The center of that distribution should be at the true population parameter.
We don't know the true population parameter.
But, since we are doing a hypothesis test,
we are assuming the null hypothesis to be true.
Therefore, we can plug in the value of the
population parameter that we set forth in our hypothesis.
And assume that that is indeed the true
population parameter for the purpose of this hypothesis test.
The standard error of the distribution can be calculated as
the square root of 0.5 times 0.5 divided by our
sample size, which comes out to be roughly 0.0112.
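The arithmetic for this example can be checked with a few lines of Python. The variable names are chosen here for illustration; the inputs are the sample size of 1,983 and the null value of 0.5 from the example.

```python
import math

n = 1983   # sample size from the poll
p0 = 0.5   # null value: H0 says p = 0.5

# Success-failure condition under the null: expected counts, not observed ones.
expected_successes = n * p0
expected_failures = n * (1 - p0)
print(expected_successes, expected_failures)

# Under H0 the sampling distribution of p-hat is centered at p0,
# with standard error sqrt(p0 * (1 - p0) / n).
se = math.sqrt(p0 * (1 - p0) / n)
print(round(se, 4))
```

Both expected counts come out to 991.5, and the standard error rounds to 0.0112, so under the null hypothesis p hat is nearly normal with mean 0.5 and standard error of about 0.0112.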