Welcome back. In the previous lecture, we considered this dataset, and we asked the question: can a network of neurons learn to represent such data?

As you'll recall, when we applied the covariance rule to this dataset, it ended up finding a weight vector that was aligned with the direction of maximum variance of the dataset. But we noted at the time that this was not a very satisfying model of the data, because the inputs appear to consist of two clusters of data points.

So let's ask the question: can neurons learn to represent such clusters?

Here is one way in which we can use neurons to represent clusters.

So let's use a feedforward network with two output neurons, neuron A and neuron B, and let's use neuron A to represent cluster A and neuron B to represent cluster B.

We can do that by making the weight vector WA the center of cluster A, and the weight vector WB the center of cluster B.

Since this is a feedforward network, here is input component 1 and input component 2; U1 and U2 together comprise the input vector U. The output of each of the two neurons is just the dot product between that neuron's weight vector and the input vector. So the question that I would like to ask

you this: if I give you a particular input, such as this one here, which neuron do you think will fire the most? In other words, which neuron will have the higher output firing rate for this particular input, neuron A or neuron B?

Notice that this particular input is closer to neuron B: the distance from this input to WB, the center of cluster B, appears to be shorter than the distance from this input to the weight vector WA, the center of cluster A. So which neuron do you think will have the higher activity, neuron A or neuron B?

If you answered neuron B you would be correct.

The most active neuron in the network is going to be the one whose weight vector is closest to the input. So in this case, for this particular input U, the closest weight vector is WB, and therefore the most active neuron is going to be neuron B.

We can show that by looking at the Euclidean distance between the input and each of the two weight vectors, WA and WB. If you expand and simplify all the terms, the square of the Euclidean distance turns out to be equal to the square of the length of the input vector, plus the square of the length of the weight vector, minus 2 times the output activity of the neuron: |U - W|^2 = |U|^2 + |W|^2 - 2 W·U. And so if we assume that the length of

the input vector has been normalized to 1, and similarly the length of the weight vector has been normalized to 1, then minimizing the squared distance between the input and the weight vector turns out to be the same as maximizing the activity of that particular neuron.
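To make this concrete, here is a small numerical sketch, using made-up unit-length vectors rather than the lecture's actual data, showing that the squared-distance identity holds and that the closer weight vector belongs to the more active neuron:

```python
import numpy as np

# Hypothetical weight vectors for neurons A and B, normalized to unit length.
w_A = np.array([1.0, 0.0])
w_B = np.array([0.0, 1.0])

# A unit-length input that lies closer to w_B.
u = np.array([0.6, 0.8])

# Output of each neuron: the dot product of its weight vector with the input.
v_A = np.dot(w_A, u)
v_B = np.dot(w_B, u)

# Squared Euclidean distance: |u - w|^2 = |u|^2 + |w|^2 - 2 w.u
d_A = np.sum((u - w_A) ** 2)
d_B = np.sum((u - w_B) ** 2)

# With |u| = |w| = 1, the squared distance is just 2 - 2 w.u, so the
# neuron with the smaller distance is the one with the larger output.
print(v_A, v_B)   # 0.6 0.8  -> neuron B has the higher output
print(d_A, d_B)   # 0.8 0.4  -> and w_B is the closer weight vector
```

Because both lengths are 1, ranking neurons by output firing rate and ranking them by closeness of their weight vector give the same winner.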

Now suppose I give you a new input, UT. And here's that new input.

How will you update the weights of the two neurons, given this new input UT?

Well, let's think about that for a little bit. The first thing we need to do is figure out which cluster this new input belongs to: is it cluster A or cluster B?

And we can do that by looking at the distance between the new data point and each of the cluster centers, that is, the distance between WA and the new data point, and between WB and the new data point. In this case, it appears that cluster A is the one this new input might belong to, because the distance from the input to that cluster's center is the shortest.

And so now that we've figured out which cluster this new input might belong to, we can update the weight vector, which is the center of that cluster, to include this new data point. So how would we do that?

One way of doing that is to set the weight vector to be the running average of all the inputs in that particular cluster, including this new input. So do you remember the equation for computing the running average? Well, here it is.

We can derive the equation for the running average by starting with the expression for the average, which we all know: the sum of the data points in cluster A, divided by the number of such points, which we're calling T. If we express that sum as the sum of all the data points except data point T, plus data point T itself, and then simplify, what we find is an equation with two terms: the weight vector before we got the new data point, plus an additional term that includes the new data point, WA(T) = WA(T-1) + (1/T)(UT - WA(T-1)).

And so we can now write the weight update rule: delta W, the change in the weight vector W for neuron A, is equal to some epsilon times the difference between the new input and the weight vector for that particular neuron. Now, if we set epsilon equal to 1 over T, this equation computes the running average.
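As a quick sketch, with a few hypothetical cluster points rather than the lecture's data, applying the delta rule with epsilon = 1/T at step T reproduces the exact average of the inputs seen so far:

```python
import numpy as np

# Illustrative points assigned to one cluster (hypothetical data).
inputs = [np.array([2.0, 0.0]), np.array([4.0, 2.0]), np.array([3.0, 4.0])]

w = np.zeros(2)            # weight vector = running mean of the cluster
for t, u in enumerate(inputs, start=1):
    eps = 1.0 / t          # eps = 1/T reproduces the exact running average
    w = w + eps * (u - w)  # delta rule: dw = eps * (u - w)

print(w)                   # [3. 2.] == mean of the three inputs
```

After the loop, w equals np.mean(inputs, axis=0) exactly, which is what the derivation above predicts.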

Or you can keep epsilon as some small, constant positive value. The 1-over-T schedule makes epsilon go to 0 for very large T, so the weights eventually stop changing; a constant positive epsilon, on the other hand, allows the algorithm to remain adaptive to new inputs for an indefinite period of time.
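Putting the pieces together, here is a minimal competitive-learning sketch. It assumes synthetic 2-D data drawn from two hypothetical clusters (stand-ins for clusters A and B) and a constant epsilon; each input is assigned to the neuron whose weight vector is closest (the winner), and only the winner's weights move toward the input:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D data from two hypothetical clusters (not the lecture's data).
cluster_a = rng.normal(loc=[2.0, 2.0], scale=0.3, size=(100, 2))
cluster_b = rng.normal(loc=[-2.0, -2.0], scale=0.3, size=(100, 2))
data = rng.permutation(np.vstack([cluster_a, cluster_b]))

# One weight vector per output neuron, initialized to two of the inputs.
W = data[:2].copy()
eps = 0.05                 # small constant rate: stays adaptive indefinitely

for u in data:
    # Competition: the winner is the neuron whose weight vector is closest.
    winner = np.argmin(np.sum((W - u) ** 2, axis=1))
    # Update only the winner, moving its weights toward the input.
    W[winner] += eps * (u - W[winner])

print(W)  # each row ends up near one cluster center, (2, 2) or (-2, -2)
```

Note that the winner here is chosen by smallest Euclidean distance rather than largest dot product, since these synthetic inputs are not normalized to unit length; as shown earlier, the two criteria coincide when inputs and weights have length 1.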