0:01

In the last section we argued that a good basic coding model for many neural systems is a combination of a linear filter, or feature, that extracts some component from the stimulus, and a nonlinear input-output function that maps the filtered stimulus onto the firing rate. Our goal in this section is to understand how to find the components of such a model. You'll be doing this for yourself in the homework. We'll then go on to think about how to modify this model to incorporate other important neuronal properties.

0:31

Let's step back to the original problem, which is to build a model like this. To build this general model, our problem is dimensionality. Let's cast our minds back to the case of the movie we showed the retina. We can define a movie in terms of the intensity of three colors in every pixel of, say, a one-megapixel image. And to capture any time dependence, we'll also need to keep enough frames of the movie to go back for maybe a second. So each example of a stimulus is given by about 3 million pixel values times maybe 100 time points, on the order of 300 million values. That's just one stimulus. To sample the distribution of possible stimuli, when each is specified by hundreds of millions of values, is just impossible. It would be impossible to fill up that response distribution even if our stimulus had just 100 dimensions. The amount of data needed is unmanageable. So we need a strategy to pull out one, two, or a few meaningful components in that image, so that we have any hope of even computing this response function. So to proceed at all we need to find the feature that drives the neuron. To do this, we'll sample the responses of this system to many stimuli, but not to build the complete model, just enough so we can learn what it is that really drives the cell. That will let us go from a model that depends on arbitrary characteristics of the input to one that depends only on the key characteristics.

1:57

So, we're going to start with a very high-dimensional description, let's say a time-varying waveform or an image, and pick out a small set of relevant dimensions. That's our goal. So how do we think about an arbitrary stimulus as a high-dimensional vector?

2:15

So we start with our s(t). What we're going to do is discretize it: we take time t1 and the value of the stimulus at that time, which we'll call s1; then time t2 and the value of the stimulus at that time; and we plot these values in this two-dimensional space.

2:35

As we keep taking more and more time points, that gives us more and more axes in this diagram in which we're now plotting that stimulus. So this is s(t), plotted as the components of its representation at these different time points.

3:07

One common and useful choice of stimulus is Gaussian white noise. Gaussian white noise is a randomly varying input, generated by choosing a new Gaussian random number at each time step. In practice, the time step sets a cut-off on the highest frequency that's represented in the signal. White noise therefore contains a very broad spectrum of frequencies, and in fact, depending on how the noise is smoothed in practical applications, almost all frequencies that are there are present in the signal with equal power. Here's an example of a white noise input that's been smoothed a little bit. You'll be using an example of white noise in your problem sets.
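In code, generating such a stimulus takes only a few lines. Here's a minimal sketch in Python with NumPy; the time step, duration, and boxcar smoothing width are illustrative choices, not values from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

dt = 0.001                    # time step in seconds (illustrative); sets the frequency cut-off
n = int(10.0 / dt)            # 10 seconds of stimulus

# Gaussian white noise: a fresh Gaussian random number at every time step
noise = rng.standard_normal(n)

# Lightly smooth it with a short boxcar, as in the plotted example
width = 5                     # smoothing window in time steps (illustrative)
smoothed = np.convolve(noise, np.ones(width) / width, mode="same")
```

The smoothing trades the hard cut-off set by the time step for a gentler roll-off at high frequencies.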

3:47

Now each chunk of white noise, let's say a hundred time units long, can be plotted in a hundred-dimensional space. The axes I've drawn here might describe the value at time t1, the value at time t2, et cetera. As we continue to stimulate with new examples of white noise, the different examples are plotted as different points, and they start to fill up a distribution, because each one of these examples is chosen randomly.

4:46

Now, a multidimensional distribution that's Gaussian in all directions is called a multivariate Gaussian. The beauty of such a distribution is that it's Gaussian no matter how you look at it. Suppose we chose to look at the distribution of stimuli projected onto some other dimension, one that's not among our original time points but is some linear combination of them. Let's take a new vector and project our stimuli onto that new vector. We would find that even along that new vector, the distribution is again Gaussian.
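We can check this numerically. A quick sketch (the dimensionality and sample count are arbitrary): draw white-noise chunks, pick a random unit vector, and look at the projected values.

```python
import numpy as np

rng = np.random.default_rng(6)

# 20,000 "chunks" of white noise, each a point in 100-dimensional space
stimuli = rng.standard_normal((20000, 100))

# An arbitrary new direction: some unit-norm linear combination of the axes
w = rng.standard_normal(100)
w /= np.linalg.norm(w)

# Project every stimulus onto w; the projections are again Gaussian,
# with mean 0 and variance 1 since the original coordinates are standard normal
proj = stimuli @ w
```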

5:23

Now let's take a look at the stimuli that trigger spikes. Here's one, and let's say there's a bunch more. You'll notice that there's some structure in this group of points. Ordinarily, if I were really plotting an arbitrary choice of three of the hundred possible dimensions, I wouldn't be able to see this. I need to search for the right way to rotate this hundred-dimensional cloud so that I can see that structure. One way to find a good coordinate axis is to take the average of these points.

6:07

So let's imagine we now take this vector through the data, and let's project all these spike-triggering points onto that vector. They're all going to have projections that are large and similar to one another, so this will be the distribution of points projected onto the spike-triggered average.

6:33

So while I wanted to give you a geometrical perspective on what you're doing, that might seem a little abstract. Operationally, it's quite straightforward and intuitive. Let's say you gave this system a long random white noise stimulus like this one, which is just a scalar quantity that's varying randomly in time, and the neuron spiked several times during this presentation. Here's a spike, here's another spike, here's another spike.

7:00

For every time there's a spike, we look back in time at the chunk of stimulus preceding that spike, grab it, and put it down in this list. This will be one example of your spike-triggering stimulus set.
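The spike-triggered average is then just that: collect the stimulus chunk preceding each spike and average them. A minimal sketch in Python; the spike-generation rule here is a made-up toy, just so there are some spikes to average:

```python
import numpy as np

def spike_triggered_average(stimulus, spike_indices, window):
    """Average the `window` stimulus values preceding each spike."""
    chunks = [stimulus[i - window:i]        # the chunk of stimulus before spike i
              for i in spike_indices if i >= window]
    return np.mean(chunks, axis=0)

# Toy demonstration: a "neuron" that spikes whenever the stimulus jumps upward
rng = np.random.default_rng(1)
stim = np.convolve(rng.standard_normal(20000), np.ones(5) / 5, mode="same")
spikes = np.where(np.diff(stim) > 0.5)[0] + 1
sta = spike_triggered_average(stim, spikes, window=20)
```

With real data, `spike_indices` would come from the recorded spike train rather than a threshold rule.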

7:24

So what you're doing is approximating whatever is common to all of the stimuli that triggered a spike. If all goes well, you'll see that this average is much less noisy than the individual examples, and it generally looks quite sensible. So what this system apparently likes to see is an input that ramps up a bit and then goes down.

7:52

Here's an example of the same procedure, but where the stimulus is not just a scalar value, but more like an image. Every column here is an image, with pixels of different colors, perhaps one that's been unwrapped into a single vector of values. The spike-triggered average, now averaged over these chunks of spatiotemporal data that precede every spike, has both a time dimension and a space dimension.

8:25

Now let's go back to dealing only with time. In the time representation that we introduced before, our spike-triggered average is some vector. We'll take it to be a unit vector; let's call it f. This is the object of our desire, the single feature that captures a relevant component of the stimulus. Now, recall the previous section of the lecture. What do we do with this identified feature? We use it as a linear filter. Linear filtering, we said, is the same as convolution, and it's also the same as projection. Let's take some arbitrary stimulus s; remember, we can represent it as a point in this high-dimensional space.

9:02

If we filter it with this spike-triggered average, that's the same as projecting it, as a vector, onto the spike-triggered average, which is also a vector. So what does that mean? We have this vector of our stimulus s. To project s onto f means that we take the component of s that's aligned along the direction of f. This is s·f.
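To make the filtering-equals-projection point concrete, here's a small numerical check (the filter length and signals are arbitrary): the output of a linear filter at time t is exactly the dot product of the feature with the most recent chunk of stimulus.

```python
import numpy as np

rng = np.random.default_rng(2)
f = rng.standard_normal(10)
f /= np.linalg.norm(f)                 # a unit-vector feature, as in the text
s = rng.standard_normal(200)           # some arbitrary stimulus

# Filtering: sum the feature against the most recent stimulus values
t = 150
filtered = sum(f[k] * s[t - k] for k in range(len(f)))

# Projection: dot the last 10 stimulus values with the (time-reversed) feature
chunk = s[t - len(f) + 1 : t + 1]
projection = chunk @ f[::-1]           # identical to the filter output
```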

9:46

Okay. Now we've seen that a good way to find a feature that drives the neural system is to stimulate with white noise and use reverse correlation to compute the spike-triggered average. This is a good approximation to our feature, f1. Now, how do we proceed to compute the input/output function of the system with respect to this feature?

10:07

Remember that we're trying to find the probability of a spike given the stimulus, but where the stimulus is now replaced only by the component of the stimulus that's extracted by the linear filter we've identified. We can find this relationship from quantities we can measure in data by rewriting it using an identity about conditional distributions known as Bayes' rule. We can rewrite this probability of a spike given s1 in terms of the probability of s1 given a spike, times the probability of a spike, divided by the probability of s1.

10:41

Let's see what this means. We have this now in terms of two distributions: here the prior, now only with respect to that one variable that we've extracted from the stimulus, and here what we call the spike-conditional distribution.

11:07

We run a long stimulus and collect a bunch of spikes. We project the stimulus onto our feature, f1, extracting the component s1. Here's s1 and here are the spikes. We use this long stimulus run to make a histogram of s1 here.

11:47

Hopefully, that distribution is different from the prior. We take their ratio, as we see here, and scale it by the overall probability of a spike.
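Putting Bayes' rule to work is then just histograms and a ratio. Here's a sketch in Python, with a made-up sigmoidal "neuron" standing in for real data; the function name and bin count are my own choices:

```python
import numpy as np

def estimate_nonlinearity(s1_all, s1_spike, p_spike, bins=20):
    """Estimate P(spike | s1) = P(s1 | spike) * P(spike) / P(s1) from histograms."""
    edges = np.linspace(s1_all.min(), s1_all.max(), bins + 1)
    p_s1, _ = np.histogram(s1_all, edges, density=True)         # prior over s1
    p_s1_spk, _ = np.histogram(s1_spike, edges, density=True)   # spike-conditional
    with np.errstate(divide="ignore", invalid="ignore"):
        rate = np.where(p_s1 > 0, p_s1_spk * p_spike / p_s1, 0.0)
    return edges, rate

# Toy data: spikes are more likely when the projection s1 is large
rng = np.random.default_rng(3)
s1 = rng.standard_normal(50000)
spiked = rng.random(50000) < 1.0 / (1.0 + np.exp(-3.0 * s1))
edges, rate = estimate_nonlinearity(s1, s1[spiked], spiked.mean())
```

The recovered `rate` curve should rise with s1, mirroring the sigmoid that generated the spikes.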

12:09

Let's say that our neuron fires at random times. When we build a histogram of our stimulus, that is, the prior distribution, and a histogram of the special stimuli that trigger spikes, we'll find that those stimuli are actually not so special: the stimulus points associated with spike times look like a random sampling from the Gaussian prior, so their distribution is just the same as the prior. This could mean either that the stimulus had nothing to do with the firing of the neuron in the first place, or else that we chose the wrong component and filtered out whatever it was about the stimulus that this neuron is actually responding to.

12:54

What we want to see is a nice difference between the prior and the spike-conditional distribution, which will result in an input/output curve with interesting structure. Here, our input/output curve tells us that the neuron, as we saw previously, tends to fire, that is, it predicts a high firing rate, when the projections onto our identified feature are large. This is success.

13:20

So now let's go back to the basic coding model that we developed and think about what's missing. We managed to get our dimensionality all the way down to 1. Was that necessarily a good idea? Let's relax that a bit and add back something potentially important: the possibility of sensitivity to multiple features. The need for this should be intuitive; we base all our decisions on many input features, and here's one of the most important ones. Unless you have a brain full of Pamela Anderson neurons, though personally I think I only have a Pamela Anderson neuron, generally we choose a partner or a friend on the basis of many characteristics: flexibility, generosity, the ability to cook, political affinity. There are also many characteristics that enter into the description of a person that may not matter to you at all for their suitability as a friend, maybe their eye color or their height or their typing speed.

14:24

To express this in terms of the models that we've been looking at so far, what we mean is that now we want to consider not just one but several filters, each selecting a different component of the input. The nonlinear response function now combines the responses of those different components in possibly non-trivial ways. Let's take a simple auditory example. Let's imagine we have a chord-detecting neuron. So f1, the first feature, selects frequency one, and f2 selects the second frequency. But only when both frequency one and frequency two are present in the input will we get a large firing rate, given this nonlinearity.
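Here's what such a two-feature model might look like in code. This is a hypothetical sketch, not the lecture's actual model: the two features are pure tones at two assumed frequencies, and the nonlinearity is an AND over the two projections.

```python
import numpy as np

fs = 1000                                 # sample rate in Hz (assumed)
t = np.arange(200) / fs                   # a 200 ms stimulus window

f1 = np.sin(2 * np.pi * 50 * t)           # feature 1: a 50 Hz component
f2 = np.sin(2 * np.pi * 120 * t)          # feature 2: a 120 Hz component
f1 /= np.linalg.norm(f1)
f2 /= np.linalg.norm(f2)

def firing_rate(stimulus, threshold=2.0):
    """AND-like nonlinearity: fire only if BOTH projections are large."""
    return 1.0 if (stimulus @ f1 > threshold and stimulus @ f2 > threshold) else 0.0

# The chord contains both frequencies; the single tone contains only one
chord = 5 * np.sin(2 * np.pi * 50 * t) + 5 * np.sin(2 * np.pi * 120 * t)
single_tone = 5 * np.sin(2 * np.pi * 50 * t)
```

Because the nonlinearity combines the projections with an AND, the model responds to the chord but not to either tone alone.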

15:12

Let's go back to our picture of the white noise experiment to think about how we could find these features in data. We saw that we could take the average of the points and compute the spike-triggered average, but we can extract more information from that cloud of points.

Â 15:27

One could also, for example, compute the next order moment, or its covariance.

Â To do this, we apply a method something like principal component analysis, or PCA.

15:37

I realize that most of you probably aren't familiar with this technique, and we don't really have time here to build up the tools that we need to derive it properly, so I'll just describe a little bit about what it does. Its job is to find low-dimensional structure in that cloud of points.

15:52

So PCA is a general, famous, and kind of magical tool for discovering low-dimensional structure in high-dimensional data. Here's an illustration of what it gives you. Let's say you have a cloud of data where each data point has an x, y, z coordinate, and we plot it in this three-dimensional space.

16:13

But in fact, unbeknownst to us, all the data actually lie on a two-dimensional plane. So if we run PCA on this data, we'll discover that there are two so-called principal components, and these components correspond to an orthogonal set of vectors that span that two-dimensional cloud.
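Here's a small numerical version of that picture: synthetic 3-D points that secretly lie on a plane, and PCA done as an eigendecomposition of the covariance matrix. The particular spanning vectors are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)

# All points are random combinations of two fixed vectors, so the cloud
# lies on a 2-D plane inside the 3-D space
u = np.array([1.0, 2.0, 0.5])
v = np.array([-1.0, 0.0, 2.0])
coeffs = rng.standard_normal((500, 2))
data = np.outer(coeffs[:, 0], u) + np.outer(coeffs[:, 1], v)

# PCA: diagonalize the covariance of the centered data
centered = data - data.mean(axis=0)
cov = centered.T @ centered / len(data)
eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order

# Two eigenvalues are substantial and one is (numerically) zero:
# the two principal components span the hidden plane
```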

16:32

This feat of discovery doesn't look super impressive when all we're doing is reducing three dimensions to two; we could have just rotated our axes around and noticed that. But what if, when we start, as we generally do, we have hundreds of dimensions, and we're hoping that our data has some low-dimensional structure? We'll never find it by plotting one coordinate against another, because the dimensions that are important are some unknown linear combination of the original coordinates. Here we had x, y, and z, and our plane is defined by some linear combination of our original axes.

17:05

Generally, the dimensions that pick out the relevant structure in the data will be some linear combination of our stimulus coordinates in their original basis, perhaps time or space. For those of you with some linear algebra, PCA gives us a new basis set in which to represent our data: a basis set that generally is a lot smaller than our original representation, so we get a lot of compression, and also one that's well matched to our particular data set, unlike a standard basis set such as a Fourier basis.

17:47

Most faces can be pretty well reconstructed from a small set, maybe seven or eight, of principal components computed from a big bunch of faces. These are called eigenfaces. If we have a new face that we want to fit with these faces, we can construct it as a linear combination of Fred, George, Bob, and Bill.

18:06

So if we can represent any new face in terms of sums of these computed eigenfaces, instead of the intensity values of each pixel in the image, we can represent the face using seven or eight numbers instead of hundreds.

18:22

Dimensionality reduction using PCA has a lot of practical uses in neuroscience experiments too. For example, it can be used to sort out spike waveforms that were recorded on the same electrode from two or more different neurons. Let's say one neuron gives a spike with a nice clean signal that looks like this.

18:54

PCA can pick out two components that capture the largest amount of variance in the data. Now you project each noisy data point, each example of a recording, onto these two components.

19:08

Usually, this will keep the two components that span the waveforms of the two neurons' spikes; all the components that get thrown away are just noise. You can then plot all of the different data points that were recorded, projected onto those two features, so now you're seeing every data point projected into the space defined by feature one and feature two. And in this new two-dimensional coordinate frame, the waveforms from the two cells are now clearly separable.
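A sketch of that spike-sorting pipeline, with two made-up waveform templates standing in for the two cells (the shapes, noise level, and counts are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.arange(30)

# Two hypothetical spike-waveform templates recorded on the same electrode
cell_a = np.exp(-(t - 10.0) ** 2 / 8.0)        # narrow positive bump
cell_b = -np.exp(-(t - 15.0) ** 2 / 20.0)      # broader negative bump

# 200 noisy recordings of each cell's spike
waveforms = np.vstack([cell_a + 0.1 * rng.standard_normal((200, 30)),
                       cell_b + 0.1 * rng.standard_normal((200, 30))])

# PCA via SVD of the centered data; keep the top two components
centered = waveforms - waveforms.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
scores = centered @ vt[:2].T       # every waveform as a point in the 2-D feature space

# In this plane the recordings from the two cells fall into two separated clusters
```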

19:40

So let's go back to white noise and neural coding. Here's an example from neural coding where PCA was used to find multiple features, and where that turned out to be very helpful. Here you're looking at a scatter plot of all the stimuli that drove a retinal ganglion cell to fire. Each stimulus, each blue dot, was 100 time steps of a white noise flicker, so just a scalar that varied in time. But now we've reduced each one of those stimuli to a point in two dimensions by projecting it onto the two features that we found, feature one and feature two.

20:14

For this particular retinal ganglion cell, the spike-triggered average was close to zero, and this picture shows you why. When we look at the stimuli that trigger spikes, it turns out there were two groups of stimuli that drove the neuron, and the average of the entire set is here, approximately. It's near zero.

21:15

So this neuron likes it both when the light goes on and when the light goes off. If we averaged all of those stimuli together, we'd get nothing. But if we use this technique, where we can pull out two different features and plot our stimuli in that two-dimensional space, that structure is revealed.

21:34

It's important to realize that the two features, f1 and f2, that we found here are not themselves the on and the off feature, but the analysis allowed us to find a coordinate system in which we could see that structure.

21:49

Okay. I've been making a lot of use of your linear algebra neurons. Let's give them a bit of downtime with the relaxing view of a little eigenpuppy. Although we were not necessarily able to go into all the details, I hope you got the flavor of how these kinds of models are constructed, and a sense for why multidimensional models can be useful. There are a lot of good resources for learning more about these techniques, and we will post them on the website.