0:00
Now that you're familiar with the t distribution,
in this video we're going to illustrate methods for
doing inference with the t distribution using real data from a published study.
The study that we're using is called Playing a computer game
during lunch affects fullness, memory for lunch, and later snack intake.
In this study, the researchers evaluated the relationship between being distracted
and recall of food consumed and snacking, with the idea that if you're distracted
while you're eating, you may not remember what you eat.
They also hypothesized that failure to recall food consumed might lead to
increased snacking later on.
The sample for this study consisted of 44 volunteer participants, half men, half women.
These 44 participants were randomized into two groups.
One group was asked to play solitaire on the computer while eating and
was asked to win as many games as possible, and
the other group was asked to eat lunch without any distractions, focusing on what
they were eating and thinking about the taste of the food as they ate.
Both groups were provided the same amount of lunch and then after lunch
while they were waiting around, they were offered biscuits to snack on.
1:14
The researchers measured how many grams of biscuits the subjects consumed.
The summary statistics suggest that the distracted eating group
snacked more after lunch, with an average of 52.1 grams of
biscuits compared to 27.1 grams for the other group.
We're also given the standard deviations for both groups,
as well as the sample sizes, which we know are 22 in each group.
The goal in this video is to estimate the average snacking level for
distracted eaters.
1:43
Estimating a population parameter entails a confidence interval,
which is always of the form point estimate plus or minus a margin of error.
The margin of error is a critical value times the standard error, and since we're
doing inference on the mean, we're going to use the t star for our critical value.
The standard error of x bar is s over the square root of n, so
the only new item here is the critical t score.
To figure out this value, we need to determine the degrees of
freedom associated with the t distribution that we need to use for this data.
When working with data from only one sample,
and estimating a single mean, the degrees of freedom is n - 1.
We lose one degree of freedom because we're estimating the standard error
of the sample mean using the sample standard deviation.
Putting all of this together, the confidence interval for
a single population mean can be estimated using x bar plus or
minus t star with n minus 1 degrees of freedom times s over the square root of n.
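As an aside that's not in the video: this recipe translates directly into a few lines of R. The function below is just a sketch; the name t_ci and its arguments are made up for illustration.

t_ci <- function(xbar, s, n, conf_level = 0.95) {
  se     <- s / sqrt(n)                               # standard error of the mean
  t_star <- qt(1 - (1 - conf_level) / 2, df = n - 1)  # critical t score
  xbar + c(-1, 1) * t_star * se                       # point estimate +/- margin of error
}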
There are a variety of ways of finding the critical t score.
The first is using the t table.
First, calculate the degrees of freedom, which in this case is 22 - 1 = 21.
Then locate that row on the t table and find the corresponding tail area for
the desired confidence level.
At this point it's a good idea to draw the t curve and
mark the confidence level in the center of the curve.
If we have 95% in the center, then we have 5% left for the two tails.
We locate this value as the two tail area on the table and grab the critical
t score at the intersection of the row and the column we marked, which is 2.08.
Alternatively, we can find the critical t score using R.
Once again, let's draw the t curve and
mark our confidence level in the center of the curve.
3:41
With 95% in the middle, we have 2.5% in each tail, so the percentile
we're interested in is 0.975. Use this value as the percentile in the qt function,
along with the appropriate degrees of freedom, and
we arrive at roughly the same critical t score, 2.08, off only by rounding.
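For reference, the lookup the video describes amounts to a single qt call, sketched here:

# 95% in the middle leaves 2.5% in each tail, so we want the 97.5th percentile
qt(0.975, df = 21)
# [1] 2.079614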
Note that we always use a positive critical value.
Remember that the confidence level is always the middle symmetric
area in the center of the curve and once you mark that, you can
easily determine the tail areas and use that value to find your critical t score.
We finally have all of our building blocks and we can now construct the confidence
interval for the average snacking level of distracted eaters.
The formula is x bar plus or minus t star times s over the square root of n.
That is, 52.1 plus or minus 2.08 times 45.1 over the square root of 22, which yields
a standard error of approximately 9.62 and a margin of error of approximately 20,
resulting in a confidence interval ranging from 32.1 to 72.1 grams.
Obviously, you can punch all of this into a calculator and
get the result in one step, but it's always nice to take things step
by step the first time you're solving a particular type of problem.
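As a sketch of that one-step calculation, here is the full interval in R, using the summary statistics quoted above (the variable names are my own):

xbar <- 52.1                        # sample mean (grams)
s    <- 45.1                        # sample standard deviation (grams)
n    <- 22                          # sample size
se   <- s / sqrt(n)                 # standard error, approximately 9.62
me   <- qt(0.975, df = n - 1) * se  # margin of error, approximately 20
xbar + c(-1, 1) * me                # interval: roughly 32.1 to 72.1 grams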
5:04
Next, suppose the suggested serving of these biscuits is 30 grams.
Do these data provide convincing evidence that the amount of snacks consumed by
distracted eaters post lunch is different from the suggested serving size?
Once again, our givens include the sample mean, the standard deviation,
and the sample size.
And let's also note the standard error we calculated earlier, 9.62.
The null hypothesis is that the population mean mu is equal to 30 grams and
the alternative is that mu is not equal to 30,
since we're interested in a difference from this value in either direction.
The test statistic, the t score, can be calculated as the sample mean of
52.1 minus the null value of 30, divided by the standard error of 9.62, which yields 2.3.
The degrees of freedom, n - 1, is 21.
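In R, the test statistic could be computed as follows (a sketch; the variable names are made up):

mu_0  <- 30                  # null value: suggested serving size (grams)
se    <- 45.1 / sqrt(22)     # standard error, approximately 9.62
t_obs <- (52.1 - mu_0) / se  # observed t score, approximately 2.3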
To find the p-value we draw the t curve and
mark our observed test statistic, 2.3, as well as -2.3, since we have
a two-sided alternative hypothesis, and shade both tails.
6:16
The pt function in R gives the tail area under the distribution
for any specified cutoff value.
So pt of 2.3 with 21 degrees of freedom will give you a tail area of 0.984.
What we really need is the complement of this value times 2, for
the two tails, which comes out to approximately 0.0318.
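Sketched in R, that computation is:

pt(2.3, df = 21)                          # area below 2.3: approximately 0.984
2 * (1 - pt(2.3, df = 21))                # two-tail p-value: approximately 0.0318
2 * pt(2.3, df = 21, lower.tail = FALSE)  # equivalent, without the explicit complement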
We could also get this value using a t table.
Step one, determine the degrees of freedom, which we know by now is 21.
We focus on that row of the table, then we locate the calculated t score, 2.3, in
the corresponding degrees of freedom row. A quick aside here:
we calculated a positive t score, but
note that there are no negative t scores on this table.
So even if the calculated t score were negative, we could still use this table;
we would simply work with the absolute value of the calculated t score.
Next, we grab the one- or
two-tail p-value from the top of the table, depending on our alternative hypothesis.
In this case, we had a two-sided alternative hypothesis, so
our p-value is going to be somewhere between 0.02 and 0.05.
While this answer is less precise than the exact value R gives,
we still have sufficient information on the p-value
to compare it to the significance level of the test and make a decision.
7:35
So to recap, we focused on one group from the study that we were introduced to,
the distracted eaters, and we were provided some summary statistics on this group.
We calculated a 95% confidence interval ranging from 32.1 to 72.1 grams.
And then we did the hypothesis test where we compared how much these people ate to
the suggested serving size.
We found a p-value of approximately 3.18%,
which is less than the standard significance level of 5%.
So we rejected the null hypothesis, and
concluded that these data provide convincing evidence that distracted
eaters consume an amount different than the suggested serving size.
Since both the estimation and
the testing were done using the same underlying inferential framework and
the same distribution, the results should agree with each other.
The null hypothesis sets mu equal to 30, and we rejected this null hypothesis.
Similarly, the confidence interval does not contain the value of 30.
Therefore, these two methods agree.
8:39
One important task that we skipped over initially is checking the conditions.
We have random assignment, and
we can assume that 22 is less than 10% of all distracted eaters.
Therefore, we are going to assume that one distracted eater in the sample
is independent of another.
8:57
We're not given a visualization of the distribution of biscuit consumption to
check the sample size / skew condition.
However, given the sample statistics, we can roughly sketch it out.
The sample mean is 52.1 grams and
there's a natural boundary at 0, since one cannot eat less than 0 grams of biscuits.
So the 68, 95, 99.7 rule is just not going to apply here,
because if we were to go even one standard deviation below the mean
(52.1 - 45.1 = 7 grams), we would nearly hit that natural boundary of 0 grams.
Therefore, the data are likely right skewed.
The t distribution is pretty robust to skew, but ideally we would like to see
a visualization of this distribution and assess the sample size / skew
condition accordingly, especially given the low sample size.