Hello. Anderson Smith here. We are talking about ways that we can evaluate
a causal claim that there is a difference in
the independent variable that is causing a difference in the dependent variable.
And often we have to use statistics to do that and we use
inferential statistics to tell
us whether the manipulation of
the independent variable(s) significantly affect(s) the dependent variable(s).
So the statistical procedures that we use tell us,
if we get a difference, whether that difference is a real difference or not.
So we have a difference between the effect of
one level of the independent variable and the other level of the independent variable.
So we want to know is there really a difference between those two means,
and is that difference significant?
That is, the means are not the same; they are unequal.
But if we see that difference it might be
that in fact the difference that we are observing is simply due to chance.
That is, the two means are really the same;
we have not found a significant difference.
And that is often called the null hypothesis.
So we want to find out if it is significant or is due to chance.
We often use a standard of 5%:
if the difference is significant,
the probability of getting a difference that large by chance alone is < 5%.
And if it could be due to chance, that probability is > 5%.
That is an arbitrary standard,
but it is the standard that we use.
So let's say we have a difference where the p is > 5%.
That could mean there is no difference,
that is, the null hypothesis. They are equal.
The difference that we are observing is not a significant difference.
It could mean that, but it could also mean that there is a difference,
that in fact the means are not the
same, but we can't find it because we don't have a powerful enough experiment.
This is why testing the null hypothesis is very
difficult: we may keep the null hypothesis because there really is no difference,
or the means are different and we just can't detect it because we have a sloppy experiment or
we don't have enough power in the statistical procedure to really see the difference.
Power: The probability of detecting a difference really depends upon
the expected relationship, whether we expect to have
a large difference or a small difference, and upon the sample size.
If we expect to see a small difference,
we have got to have a much larger sample to see the difference.
If we have a big difference then we can have a smaller sample to see the difference.
So the number of people that we test really determines the power that we have in
detecting the difference, of getting that p < 5%.
So we have two kinds of error.
Type 1 error is rejecting the null hypothesis when there really is no difference.
Type 2 error is accepting the null hypothesis,
that is, saying there is no difference when the difference is really present.
And remember, for power there is
a standard, just like the p < 5%.
The power needs to be about 0.80 or higher,
and again that is a statistical calculation which takes the sample size
and the expected difference and gives us
the power of the experiment to detect the difference.
So to achieve a power of 0.80, for example,
if we expect the difference to be large then we need about 52 subjects.
If we expect it to be sort of a medium difference,
we need 128 subjects, and if we really think it is a small difference
but a meaningful one then we have to have 788 subjects.
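
To see where sample sizes like these come from, here is a minimal sketch in Python. It rests on several assumptions that are not stated in the lecture: that the counts are totals across two equal groups in a two-tail test at 5%, that "large", "medium", and "small" mean Cohen's conventional effect sizes of 0.8, 0.5, and 0.2, and that a normal approximation with a small correction term is close enough to the exact t-based tables. The function name total_subjects is made up for illustration.

```python
import math
from statistics import NormalDist

def total_subjects(effect_size, alpha=0.05, power=0.80):
    """Approximate total N for a two-group, two-tail comparison of means.

    Assumption: normal approximation plus a z^2/4 correction per group,
    not the exact noncentral-t calculation used by published tables.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # cutoff for two-tail alpha
    z_power = NormalDist().inv_cdf(power)          # cutoff for desired power
    per_group = 2 * (z_alpha + z_power) ** 2 / effect_size ** 2 + z_alpha ** 2 / 4
    return 2 * math.ceil(per_group)                # two equal groups

# Assumed effect sizes (Cohen's conventions), not values given in the lecture.
for label, d in [("large", 0.8), ("medium", 0.5), ("small", 0.2)]:
    print(label, total_subjects(d))
```

Under those assumptions the sketch gives 52, 128, and 788, and it shows the pattern described above: the smaller the expected difference, the larger the sample has to be.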
We look that up in tables from the textbook;
in fact these numbers actually come from one of your readings.
It is just another way of asking, for the p that we get,
is the experiment powerful enough to really detect the difference?
That is what we are trying to do in making causal claims.
We want to say we believe that
this independent variable manipulation is causing this difference in
the dependent variable, and that result really depends upon the power of the test.
Let's talk about that significance test,
the test that tells us whether the p < 5% or > 5%.
Is it significant or is it due to chance?
Well, if we only have two means that we are comparing,
a very simple experiment with just one variable and
two means, we can use the statistic called the t-test, which simply
tells us whether or not the difference between the two means, based on
the variability found in the experiment, is significant or not,
that is, p < 0.05.
If we have more than two means, so we are testing maybe
more than one variable or more than two levels of a variable, then we would use Analysis of Variance.
Again it is a test that we use when we have more than
just one comparison to tell us whether there
is a difference among any of the means in the experiment.
The t Statistic: When we have a simple comparison of two means,
that is the independent samples t-test,
and we are just comparing one mean to the other.
We also have to understand, is it a one-tail test or a two-tail test?
We have two means.
Now, is it possible only that this mean is higher than the other mean,
and that is the only direction that can occur?
Or is it possible that the mean can be higher or be lower?
That is a two-tail test.
So we want to know whether it is significantly lower or significantly
higher in the same test; that is two-tail.
If we know the difference has to be in one direction,
that is a one-tail test.
So, what do we need to do to have this kind of test?
We need two sample means,
we need an estimate of variance, that is the two standard deviations, and we need
a sample size, which is determined by how big we expect the difference to be.
So a t-test is really this formula.
It's the difference between the two means divided by the square root of
the sum of each group's variance, which is the square of
the standard deviation, divided by
the number of subjects we have in that group.
We take the square root because variance is the standard deviation
squared, and that gives us the t, which is the test.
Then you look that up in a statistics table, and it
tells us whether that t, with those degrees of
freedom and that sample size, is large enough to give us a significant difference.
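
The formula just described can be sketched in a few lines of Python. This is only an illustration, assuming the form t = (mean 1 - mean 2) / sqrt(s1 squared / n1 + s2 squared / n2) with degrees of freedom (n1 - 1) + (n2 - 1). The function name t_statistic and the scores below are made up for the sketch; they are not data from the textbook.

```python
import math
import statistics

def t_statistic(group_a, group_b):
    """Return (t, degrees of freedom) for two independent samples."""
    mean_a = statistics.mean(group_a)
    mean_b = statistics.mean(group_b)
    var_a = statistics.variance(group_a)  # sample variance = SD squared
    var_b = statistics.variance(group_b)
    # Each variance is divided by its own n, then we take the square root.
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    t = (mean_a - mean_b) / standard_error
    df = (len(group_a) - 1) + (len(group_b) - 1)
    return t, df

# Hypothetical scores, invented purely for illustration.
group_a = [12, 15, 11, 14, 13]
group_b = [9, 10, 8, 11, 7]
t, df = t_statistic(group_a, group_b)
print(t, df)
```

The bigger the difference between the means relative to the variability, the bigger the t, which is why a sloppy experiment with a lot of variability makes a real difference harder to detect.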
Let's use an example. This is an example again
from the text: psychologists want to know if
calorie estimation for people that eat
junk food is different from people that eat non-junk food, healthy food.
Is the calorie estimation different?
Are we good at estimating the number of calories?
So here are the results from the junk food eaters, eight of them.
They guessed that the food in front of them in a picture is 180 calories,
220 calories, 150, 85, 200.
So different estimates, different guesstimates I should say,
made by the people that eat junk food of
the number of calories in the food that is shown in the picture.
The non-junk food eaters have these estimates;
we only have seven of those.
So with the t-test,
we take the difference of
the two means and we divide it by the square root of each standard deviation squared
divided by its n,
n of 8, n of 7.
The t that results is 2.42,
then we can look that up in a table, or you can
find on the internet a t-test calculator, or you can find it in
any statistics textbook, and you look
up the t-score of 2.42; that is the t score of that comparison.
Degrees of freedom: it is each n minus 1, added together, so (8-1) + (7-1) = 13.
And we know it is a two-tail test because we believe that
the junk food eaters could estimate more calories or estimate
fewer calories, and we get a p = 0.0306 and that is less than 0.05.
So our difference between the two means is significant; there is a difference,
they are not equal, and we can talk about
there being a causal relationship between the independent variable, junk food eaters
versus non-junk food eaters, and
the dependent variable, their estimation of calories.
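
Instead of a printed table, the lookup step can be sketched in Python. This is a rough illustration, not the calculator used in the lecture: it integrates the t distribution numerically with Simpson's rule, using only the t of 2.42 and the 13 degrees of freedom from the example. The helper names t_pdf and two_tailed_p are made up for the sketch.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tail p: the area outside -|t| .. +|t| (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)  # Simpson weights
    central_area = 2 * (h / 3) * total  # area from -|t| to +|t| by symmetry
    return 1 - central_area

df = (8 - 1) + (7 - 1)        # 13, as in the example
p = two_tailed_p(2.42, df)
print(round(p, 4), p < 0.05)
```

The result should land close to the p quoted above and below 0.05; a small gap is to be expected if the 2.42 was rounded before the lookup. For a one-tail test in the predicted direction, the p would be half of this value.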
If we have more than two means,
we have to use an Analysis of Variance which is a much more complicated statistic.
And in fact, even though it is used whenever we have more than two means, it
really covers lots of different designs.
What we are testing is the null hypothesis that all the means are
equal, and the expectation is that there are going to be differences in the means,
that not all the means are equal.
And that test, unlike the t-test, uses the F distribution, so it is an F-test.
So the statistic we are looking for is an F. Then we can look the F value up, given
the degrees of freedom, in a table or on the internet and come up with a p,
to see whether the F statistic is significant at p < 0.05.
As I said there are different tests for different kinds of designs.
But basically what you are doing in
all Analysis of Variance designs is you are
getting a computation of the variance between groups,
the variability between the groups that you are studying, and the variance within groups,
that is within a single group, and then you are
comparing them as a ratio, and that gives you the F statistic.
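
That ratio can be sketched in Python for the simplest case, a one-way design with independent groups. This is a minimal illustration only; repeated measures and factorial designs need different computations. The function name f_statistic and the three groups of scores are made up for the sketch.

```python
import statistics

def f_statistic(groups):
    """One-way ANOVA: return (F, df between groups, df within groups)."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_scores)
    # Between groups: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: how far each score sits from its own group mean.
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical scores for three groups, invented purely for illustration.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f, df_between, df_within = f_statistic(groups)
print(f, df_between, df_within)
```

A large F means the groups differ from each other by more than the scores vary inside each group.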
I am not going into details about the statistics,
this is not a statistics course, but I want to point
out that there are statistics that tell us that there are differences,
inferential statistics that tell us there are differences among the means.
And just like t,
F tables can be used to determine the p for that particular experiment.
As I mentioned, there are many different Analyses of Variance,
some used for repeated measures designs,
some used for factorial designs, but they all have
that same common way of looking at significant differences.
Analysis of Variance will only tell us that the means are different.
It won't tell us which means are different
from which other means, and so we have to use
something called multiple comparison tests or post-hoc tests,
which do tell us whether any individual mean in the Analysis of Variance is different from
any other individual mean, and that is often what we
have to do when we have multiple variables, for example.
So, statistics are used to analyze the design.
The data analyses are used to determine
whether or not the inferences that we are making
about relationships between variables are significant.
And then the interpretations and conclusions are based on this analysis.
A is better than B,
that means we have done a test to show that in fact
the probability of getting A better than B in our experiment by chance is < 5%.
That is then fed back into the research literature, which then allows us to
increase the body of knowledge about the relationship that we are interested in researching.