So the way that people handle this in advance, in experimental design, is

with power.

So basically, the power is the probability that,

if there's a real effect in the data set, you'll be able to detect it.

So, it depends on a few different things, it depends on the sample size,

it depends on how different the means are between the two groups,

like we saw the red and the blue lines.

And it depends on how variable they are, so

we saw that there was variation around the means in both the X and the Y data sets.

So this is actually code from the R statistical programming language.

You don't have to worry about the code in this lecture but

you can just see that, for example, if we want to do a t-test

comparing the two groups, which is a particular kind of statistical test.

The probability that we'll detect an effect of size 5,

that's what we have as delta there, with a variability of 10, that is,

a standard deviation of 10 in each group, and 10 samples per group, is 18%.
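The exact code on the slide isn't shown here, but this calculation can be reproduced with R's built-in `power.t.test` function, something like:

```r
# Two-sample t-test: effect size (delta) of 5, standard deviation of 10
# in each group, 10 samples per group, default significance level of 0.05
power.t.test(n = 10, delta = 5, sd = 10)$power
# about 0.18, i.e. an 18% chance of detecting the effect
```
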

So it's not very likely that even if there's an effect we'll detect it. But

what you can do is go back and run the calculation the other way:

say, as is customary, that we want 80% power.

In other words, we want an 80% chance of detecting an effect if it's really there.

So for an effect size of 5 and a standard deviation of 10,

you can calculate back out how many samples you need to collect.

In this case, by doing the calculation,

we see we need 64 samples from each group

in order to have an 80% chance of detecting that particular effect size.
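That reverse calculation can be sketched with the same `power.t.test` function, this time fixing the power at 80% and leaving the sample size free so R solves for it:

```r
# Solve for the per-group sample size that gives 80% power
power.t.test(delta = 5, sd = 10, power = 0.8)$n
# about 63.8, so 64 samples per group after rounding up
```
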

But similarly, you can do that calculation by asking how many samples you

need for each group if you're only going to be doing

a test in one direction or the other.

So suppose I know that expression levels will always be

higher in the cancer samples than in the control samples.

Then it's possible to collect fewer samples and still

get the same power, because you actually have a little bit more information.
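A sketch of that one-sided calculation, again with `power.t.test`; the `alternative = "one.sided"` argument encodes the extra assumption that the difference can only go one way:

```r
# One-sided test: we assume expression can only be higher in cancer samples
n_one <- power.t.test(delta = 5, sd = 10, power = 0.8,
                      alternative = "one.sided")$n
n_two <- power.t.test(delta = 5, sd = 10, power = 0.8)$n
c(one.sided = n_one, two.sided = n_two)
# the one-sided test needs fewer samples per group for the same power
```
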

Later statistics classes will talk more about power and

how you calculate it.

But the basic idea to keep in mind is that power is actually a curve.

It's never just one number, even though you might hear 80% thrown around quite

a bit when talking about power; the idea is that there is a curve.

So in this plot,

I'm showing on the x-axis all the different potential sizes of an effect.

It could be 0, that's the center of the plot, or it could be very high or

very low, and then on the y-axis is the power for different sample sizes.

The black line corresponds to a sample size of 5, the blue line to a sample

size of 10, and the red line to a sample size of 20.

So as you can see,

as you move out from the center of the plot, the power goes up.

So the bigger the effect,

the easier it is to detect. Also, as the sample size goes up,

you see, from the black to the blue to the red curve, you get more power as well.
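A curve like the one described above can be sketched by evaluating `power.t.test` over a grid of hypothetical effect sizes; the lecture's plotting code isn't shown, so this is just an illustration:

```r
deltas <- seq(-10, 10, by = 0.5)   # hypothetical effect sizes, 0 in the center
power_curve <- function(n) {
  # power is symmetric in the effect size, hence abs()
  sapply(abs(deltas), function(d) power.t.test(n = n, delta = d, sd = 10)$power)
}
plot(deltas, power_curve(5), type = "l", ylim = c(0, 1),
     xlab = "effect size", ylab = "power")       # black: n = 5 per group
lines(deltas, power_curve(10), col = "blue")     # blue:  n = 10 per group
lines(deltas, power_curve(20), col = "red")      # red:   n = 20 per group
```
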

So as you vary these different parameters, you get different power and so

a power calculation is a hypothetical calculation based on what you

think the effect size might be and what sample size you can get.

And so it's important, before performing a study, to pay attention

to the power you might have, so you don't run the study

and end up at the end of the day failing to detect a difference

even when there might have been one there.