
One of the things you've almost certainly heard about in statistics is the idea of statistical significance, or p-values. So this lecture is going to tell you a little bit about some of the high-level thinking behind statistical significance.

So the basic idea here is that we want to know whether observed differences in the sample are replicable, or, more generally, what people call "real." Now, "real" is a little bit of a fuzzy concept; what does that mean, exactly? But what it's supposed to imply is that there's a genuine difference between the two groups, usually in the mean value of the measurements you're taking.

So here is an example with three genes. For each gene there are measurements from two groups, a red group and a blue group, with three dots from each group. On the y-axis are the log expression values, so for each gene you see a plot of the six data points corresponding to that gene.

For Gene 1, there's not much difference in the means, and most of what difference there is comes from one little outlier in the blue group. That suggests that while the means might be different, the variability is also high enough that it's hard to conclude there's any real difference.

For Gene 2, you see what appears to be a pretty clear difference: the three red dots and the three blue dots are each tightly clustered, and they clearly have different mean levels.

Gene 3, on the other hand, is another example where it looks like there might be a difference. The red dots seem to be a little bit higher than the blue dots, and both groups are also very tightly clustered. But the groups aren't very far apart, and the variability is comparable in size to the difference in means.

So, how do we distinguish these cases? How do we know when an observed difference is large enough that we would call it a real difference?

The most common statistic people use, and the one you've almost certainly heard about, is the t-statistic. The t-statistic actually has a general form that is also widely used in a number of other statistics.

So imagine you have measurements from two groups, and we've labeled one group's values Y and the other group's values X. Then the t-statistic equals the average of the Y values minus the average of the X values, divided by a measure of variability. Here we estimate how variable the Y values are with s-squared of Y, and how variable the X values are with s-squared of X. These are estimates we could go into more detail about in a statistics class, but for now you can just think of the denominator as scaling the difference between Y and X into units of variability. If the two averages are very far apart in variability units, then we might believe the difference is real; if not, then maybe not.
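To make this concrete, here is a rough sketch of how such a statistic might be computed. The lecture doesn't specify the exact form of the denominator, so this assumes the common unpooled version, where the measure of variability is the square root of s²_Y/n plus s²_X/m; the function name is just illustrative.

```python
import math

def t_statistic(y, x):
    """Two-sample t-statistic: difference in means scaled by variability.

    Assumes the unpooled standard error sqrt(s2_y/n + s2_x/m),
    one common choice for the denominator.
    """
    n, m = len(y), len(x)
    mean_y = sum(y) / n
    mean_x = sum(x) / m
    # Sample variances s^2_Y and s^2_X (with n - 1 in the denominator).
    s2_y = sum((v - mean_y) ** 2 for v in y) / (n - 1)
    s2_x = sum((v - mean_x) ** 2 for v in x) / (m - 1)
    # Difference in means, expressed in units of variability.
    return (mean_y - mean_x) / math.sqrt(s2_y / n + s2_x / m)
```

For well-separated, tightly clustered groups like Gene 2's, this value is large; when the group means are equal, it is zero.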

So a big t-statistic means we think it's more likely that there's a difference, and a small t-statistic means it may be less likely that there's a difference.

So, how do we actually quantify what we mean by how statistically significant a result is? The most common approach, and probably the most widely used and well-known statistic ever created, is the p-value.

The idea here is this: suppose we've calculated a t-statistic comparing the difference between two groups, and suppose that statistic is equal to two. Is that a big value or a little value? Well, one way to figure that out, and a way that's commonly used, is what's called a permutation test.

Basically, you take the group labels you're using, the values you're calling X and the values you're calling Y, and you scramble them up. So some of the X values get labeled like Y, some of the Y values get labeled like X, and you do that randomly, over and over again. Each time you create a random labeling, you recalculate the statistic.

What's going on here? We've broken the relationship between the label and the data, because we've randomly scrambled them, so we wouldn't expect there to be any association. Then we can make a histogram, like the one here, of all the statistics from these random scrambles and see where the original statistic lands in that distribution.

To calculate a p-value, you basically count up how many times the scrambled statistics were larger than your observed statistic. Usually you do this in absolute value; in other words, you don't care whether a scrambled statistic is bigger or smaller than the value you got, only how extreme it is. So you calculate the fraction of times the scrambled statistics are bigger in absolute value than the observed statistic, and that fraction is your p-value.
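The scramble-and-recount procedure described above can be sketched roughly like this. The function names are illustrative, not from the lecture, and a real analysis would typically reach for a library routine such as `scipy.stats.permutation_test` instead.

```python
import random
import statistics

def t_stat(y, x):
    """Difference in means scaled by the unpooled standard error."""
    se = (statistics.variance(y) / len(y) + statistics.variance(x) / len(x)) ** 0.5
    return (statistics.mean(y) - statistics.mean(x)) / se

def permutation_p_value(y, x, n_perm=10_000, seed=0):
    """Two-sided permutation p-value: the fraction of label scrambles whose
    statistic is at least as extreme (in absolute value) as the observed one."""
    rng = random.Random(seed)
    observed = abs(t_stat(y, x))
    pooled = list(y) + list(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                      # scramble the group labels
        y_perm, x_perm = pooled[:len(y)], pooled[len(y):]
        if abs(t_stat(y_perm, x_perm)) >= observed:
            hits += 1
    return hits / n_perm
```

Note that with three observations per group, as in the gene example, there are only 20 distinct label assignments, so the smallest achievable two-sided p-value is about 0.1; larger samples give a much finer permutation distribution.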

The p-value is widely used to calculate statistical significance. It's also widely interpreted and misinterpreted, and it has some properties that are very useful.

In general, what you've probably heard is that p-values that are low, so closer to zero, are reported as statistically significant. The usual cutoff is 0.05; p-values higher than that are often considered not statistically significant.

It's important to know what a p-value is and what a p-value isn't. In fact, the best way to get a statistician's blood pressure up is to misinterpret a p-value.

The p-value is interpreted as the probability of observing a statistic as extreme or more extreme than the one you calculated in the real data, if the null hypothesis is true. That seems like a mouthful because it sort of is; it's a bit of a hard concept to think about. It's basically asking how often, in the null data, the data where we scrambled the labels, the statistic is bigger than the one we actually calculated.

A few things the p-value is not, and these will almost certainly get you in trouble with statisticians: the p-value is not the probability that the null is true, in other words, the probability that there's no difference between the groups. It's not the probability that the alternative is true, in other words, the probability that there is a difference. And it's not a measure of statistical evidence. If you're using any of these interpretations, you're potentially walking into a world of hurt. So you should stick with the standard, if slightly unwieldy, definition of what a p-value means.

So a common mistake is to misinterpret the p-value. Here's an example from the New York Times where they're actually trying to describe what the p-value means, and you see a 0.05 there. 0.05 is the common cutoff: if a p-value is less than 0.05, people often call the result significant.

There is absolutely no reason why 0.05 is the cutoff, other than that one time someone asked one of the original developers and users of p-values, "What would be a good cutoff?" and he said, "I guess 0.05 might be all right." But that has now propagated throughout the entire medical establishment as the defining cutoff.

In general, what happens is that people over-interpret or misinterpret p-values, and that's what gets us into a lot of trouble with issues of statistical significance, and why you've heard things like "maybe most published medical research is false."
