This course covers the design, acquisition, and analysis of Functional Magnetic Resonance Imaging (fMRI) data. A book related to the class can be found here: https://leanpub.com/principlesoffmri


From the course by Johns Hopkins University

Principles of fMRI 1



From the lesson

Week 4


- Martin Lindquist, PhD, MSc, Professor of Biostatistics, Bloomberg School of Public Health | Johns Hopkins University
- Tor Wager, PhD, Department of Psychology and Neuroscience, The Institute of Cognitive Science | University of Colorado at Boulder

Hi, in this module we're going to be talking about the multiple comparison problem in fMRI.

So, to recap what we talked about a few modules ago: when we want to fit the GLM in order to localize areas that are active in response to a task, we begin by constructing a model for each voxel of the brain. This is typically done in the massive univariate approach, where every voxel has a separate model, and we usually use a GLM-type approach. Here is a cartoon showing how we can create a design matrix for two different conditions, A and B, and then we put this into the GLM model as follows.
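The massive univariate fit described here can be sketched as an ordinary least squares problem solved per voxel. Below is a minimal illustration with a made-up block design and simulated data; the regressors, effect sizes, and noise level are all hypothetical (not from the lecture), and real analyses would also convolve the regressors with a hemodynamic response function:

```python
import numpy as np

# Hypothetical block design: condition A on for scans 0-9 of every 40,
# condition B on for scans 20-29 of every 40, plus an intercept column.
rng = np.random.default_rng(0)
scan = np.arange(100)
cond_a = ((scan % 40) < 10).astype(float)
cond_b = (((scan % 40) >= 20) & ((scan % 40) < 30)).astype(float)
X = np.column_stack([cond_a, cond_b, np.ones(100)])   # design matrix

# Simulated time series for a single voxel: y = X beta + noise.
true_beta = np.array([2.0, 0.5, 10.0])
y = X @ true_beta + rng.normal(scale=1.0, size=100)

# Massive univariate approach: the same model is fit separately at each
# voxel. For one voxel, the OLS estimate is beta_hat = (X'X)^{-1} X'y.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)   # should land near [2.0, 0.5, 10.0]
```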

Now, once we do this and estimate the parameters of the model, we can perform a statistical test to determine whether or not there is task-related activation present in the voxel. Typically we test some hypothesis c-transpose beta = 0, where c-transpose beta is some linear combination of the beta parameters. For example, we might test whether condition A minus condition B is equal to 0, against the alternative that the difference is not equal to 0. We do this at every voxel of the brain, and then we can summarize the results, that is, the t-statistics obtained by performing this hypothesis test, in a statistical image such as the one shown here. Each voxel now has a value corresponding to the t-statistic of the statistical test at that voxel.
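A contrast test of c-transpose beta = 0 can be sketched as follows. The design, contrast vector, and effect sizes are hypothetical, and this ignores details such as HRF convolution and temporal autocorrelation that a real fMRI analysis would handle:

```python
import numpy as np

# Same hypothetical design as before: two block regressors plus intercept.
rng = np.random.default_rng(1)
scan = np.arange(100)
cond_a = ((scan % 40) < 10).astype(float)
cond_b = (((scan % 40) >= 20) & ((scan % 40) < 30)).astype(float)
X = np.column_stack([cond_a, cond_b, np.ones(100)])
y = X @ np.array([2.0, 0.5, 10.0]) + rng.normal(size=100)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
df = X.shape[0] - np.linalg.matrix_rank(X)       # residual degrees of freedom
sigma2 = resid @ resid / df                      # residual variance estimate

# Contrast c'beta with c = [1, -1, 0], i.e. condition A minus condition B.
c = np.array([1.0, -1.0, 0.0])
var_contrast = sigma2 * (c @ np.linalg.inv(X.T @ X) @ c)
t_stat = (c @ beta_hat) / np.sqrt(var_contrast)  # t-statistic for c'beta = 0
print(t_stat)
```

In a whole-brain analysis, this same computation is repeated at every voxel to build the t-map.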

Now, the next stage: that's a nice map, but we want to determine which voxels are active and which are not. We need to find a way to threshold this t-map in order to find significant voxels and get a statistical parametric map, such as the one seen here, in which each significant voxel is color-coded according to the size of its p-value. So the question is: how do we determine this threshold?

Before we start talking about this and the multiple comparison problem it entails, let's go over some basic nomenclature for hypothesis testing. The null hypothesis H0 is a statement of no effect. Typically we want to test the hypothesis that beta1 - beta2 = 0, and then we try to see if we can reject this null hypothesis and say that the two are indeed different from each other. The way we do this is through a test statistic T.

The test statistic measures the compatibility between the null hypothesis and the data. The way we quantify this compatibility is by calculating the p-value: the probability that the test statistic would take a value as or more extreme than the one actually observed if H0 is true. Mathematically, we can write this as P(T > t | H0), the probability that the test statistic T exceeds the observed value t given the null hypothesis. The null distribution shows us the feasible values the test statistic can take if the null hypothesis were indeed true, and if the p-value is small, our test statistic lies far out in the tails of those plausible values. The smaller the p-value, the less plausible it is that the null hypothesis holds, and in that case we might choose to reject the null hypothesis.

Typically, we decide on a fixed threshold, called the significance level. That is, we choose a threshold u_alpha that controls the false positive rate at some level alpha: we want the probability that the test statistic lies above u_alpha to equal alpha, where 0.05 is often used. In that case, we control the probability of making a false positive at, say, 5%.
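One way to see what u_alpha means is to simulate the null distribution and take its (1 - alpha) quantile. In this sketch a standard normal stands in for the null distribution of the test statistic, which is only an approximation (a t-distribution with the appropriate degrees of freedom would be exact):

```python
import numpy as np

rng = np.random.default_rng(2)
alpha = 0.05

# Simulate draws from the null distribution of the test statistic.
null_stats = rng.normal(size=1_000_000)

# u_alpha is the (1 - alpha) quantile of the null distribution,
# so that P(T > u_alpha | H0) = alpha.
u_alpha = np.quantile(null_stats, 1 - alpha)
print(round(u_alpha, 2))   # roughly 1.64 for a one-sided alpha = 0.05

# Check: the fraction of null statistics exceeding u_alpha is about alpha.
print(np.mean(null_stats > u_alpha))
```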

So, whenever we're doing hypothesis testing, we're ultimately making a binary decision: should we reject the null hypothesis, yes or no? When we make decisions like this, there are two types of errors we can make.

One is called a Type I error, which happens if the null hypothesis is true but we mistakenly reject it. This is also called a false positive: the null hypothesis is indeed true, but we decide to reject it. We control this through the significance level alpha. If we want to guard against false positives, we can make the alpha level very small, which means we need a lot of evidence to reject the null hypothesis.

The other is a Type II error, which occurs when the null hypothesis is false but we fail to reject it. This is a false negative: we really should be rejecting the null hypothesis, but we don't, because we don't have enough evidence to do so. Which is more serious, a Type I or a Type II error, will depend on the situation.

The probability that a hypothesis test correctly rejects a false null hypothesis, which is a good thing, is called the power of the test. We want a powerful test, because if the null hypothesis is false, we want to be able to reject it. These are terms that are often used when talking about hypothesis testing.
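These error rates can be estimated by simulation. In this sketch the null and alternative distributions are taken to be normal, and the effect size is a made-up value chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
threshold = 1.645      # approximate one-sided alpha = 0.05 cutoff
effect = 2.0           # hypothetical true effect under the alternative

# Type I error rate: fraction of null statistics exceeding the threshold.
null_stats = rng.normal(size=100_000)
type1 = np.mean(null_stats > threshold)

# Power: fraction of alternative statistics exceeding the threshold,
# i.e. the probability of correctly rejecting a false null hypothesis.
alt_stats = rng.normal(loc=effect, size=100_000)
power = np.mean(alt_stats > threshold)

print(round(type1, 3))   # close to 0.05
print(round(power, 3))   # roughly 0.64 for this effect size
```

Raising the threshold would drive the Type I error rate down but would also reduce power, which is exactly the trade-off discussed next.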

So choosing an appropriate threshold is complicated in the fMRI setting by the fact that we're dealing with a family of tests. If more than one hypothesis test is performed, the risk of making at least one Type I error is inflated: it's greater than the alpha level of a single test. For example, if we control the Type I error rate at, say, 0.05, that's the rate for a single test. But if we perform hundreds of tests, there's a 5% chance of making a mistake on each of those tests, and eventually we're going to wind up making one. The more tests one performs, the greater the likelihood of getting at least one false positive. When we're performing, say, 100,000 tests, it's very likely that we'll make false positives if we don't control for this appropriately.

So again, which of these 100,000 voxels are significant in this statistical map? We've now performed 100,000 different hypothesis tests, and if we assumed they were all independent and thresholded each at the 0.05 level, we'd expect about 5,000 false positive voxels, because roughly one out of every 20 times we would make a mistake.
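Under independence, the chance of at least one false positive in m tests at level alpha is 1 - (1 - alpha)^m, and the expected number of false positives is alpha * m. A quick check of the numbers quoted in the lecture:

```python
# Family-wise error inflation for m independent tests at level alpha:
#   P(at least one Type I error) = 1 - (1 - alpha)^m
#   E[number of false positives] = alpha * m
alpha = 0.05

for m in [1, 10, 100, 100_000]:
    fwer = 1 - (1 - alpha) ** m      # already ~0.99 by m = 100
    expected_fp = alpha * m          # 5,000 expected at m = 100,000
    print(m, round(fwer, 4), expected_fp)
```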

With 5,000 false positive voxels, entire regions of the brain could be deemed active even though they shouldn't have been, and this can be a very serious problem.

So choosing a threshold is ultimately a balance between sensitivity, which is the true positive rate, and specificity, which is the true negative rate. We looked at this little example in an earlier module, but I think it's worth looking at again. For this statistical map, we could threshold at any given level; here I show five examples with thresholds at 1, 2, 3, 4, and 5. If you choose a low threshold, you get a lot of active voxels. In that case you're probably finding all the truly active voxels of the brain, but you're probably also declaring a lot of voxels active that shouldn't be, and that's no good. On the other hand, if you choose a very stringent threshold, say 5, you can be fairly confident that the regions that survive are truly active, but you've probably missed some activations. So we have to find some middle ground, and we want to do this in a principled way.
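The sensitivity/specificity trade-off can be illustrated on a simulated map. Here the number of truly active voxels and their effect size are arbitrary choices for illustration, not values from the lecture:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical "statistical map": 100,000 null voxels plus 500 truly
# active voxels whose statistics are shifted upward by an effect of 4.
null_voxels = rng.normal(size=100_000)
active_voxels = rng.normal(loc=4.0, size=500)
t_map = np.concatenate([null_voxels, active_voxels])
truth = np.concatenate([np.zeros(100_000, bool), np.ones(500, bool)])

# Sweep the thresholds 1 through 5, counting detections of each kind.
results = {}
for u in [1, 2, 3, 4, 5]:
    detected = t_map > u
    true_pos = int(np.sum(detected & truth))
    false_pos = int(np.sum(detected & ~truth))
    results[u] = (true_pos, false_pos)
    print(u, true_pos, false_pos)
```

Low thresholds catch nearly all truly active voxels but admit thousands of false positives; at a threshold of 5 the false positives essentially vanish, but so do many genuine activations.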

So how do we choose the threshold that determines which voxels are active in a principled way that we can defend and believe in? That's what the next couple of modules are about.

There exist several different ways of quantifying the likelihood of obtaining false positives. One way is to control the family-wise error rate: the probability of making any false positives at all. This provides very strict control over multiple comparisons, guarding against making even a single false positive. A somewhat more lenient approach, which has become increasingly popular, is the False Discovery Rate, or FDR. The False Discovery Rate controls the expected proportion of false positives among all rejected tests. In the coming modules we'll talk about the family-wise error rate and the false discovery rate in turn.
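As a rough preview of the difference between the two criteria, this simulation estimates both quantities for plain uncorrected thresholding; all numbers are hypothetical, and the actual correction methods are covered in the next modules:

```python
import numpy as np

rng = np.random.default_rng(4)
m_null, m_active = 1000, 50
threshold = 1.645          # approximate uncorrected one-sided 0.05 cutoff

any_fp = 0                 # count of experiments with >= 1 false positive
fdr_props = []             # false discovery proportion per experiment
for _ in range(2000):      # Monte Carlo over repeated experiments
    stats = np.concatenate([rng.normal(size=m_null),
                            rng.normal(loc=3.0, size=m_active)])
    rejected = stats > threshold
    false_pos = np.sum(rejected[:m_null])
    any_fp += false_pos > 0
    if rejected.sum() > 0:
        fdr_props.append(false_pos / rejected.sum())

# Family-wise error rate: probability of making ANY false positives.
print(any_fp / 2000)       # near 1 for uncorrected thresholding
# False discovery proportion: false positives among all rejections.
print(np.mean(fdr_props))
```

With 1,000 null tests at an uncorrected 0.05 threshold, at least one false positive is almost guaranteed, and around half of all rejections are false: this is what FWER and FDR control, respectively, are designed to fix.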

So, that's the end of this module. This was just a brief introduction to the problem at hand with multiple comparisons. In the next couple of modules, we'll go into detail and talk about methods for controlling the family-wise error rate and the false discovery rate. See you then, bye.
