0:00

To extend the learning rule for a linear neuron to a learning rule we can use for multilayer nets of nonlinear neurons, we need two steps. First, we need to extend the learning rule to a single nonlinear neuron. We're going to use logistic neurons, although many other kinds of nonlinear neurons could be used instead.

We're now going to generalize the learning rule for a linear neuron to a logistic neuron, which is a nonlinear neuron.

So, a logistic neuron computes its logit, z, which is its total input: its bias plus the sum, over all its input lines, of the value on an input line, xi, times the weight on that line, wi. It then gives an output, y, that's a smooth nonlinear function of that logit. As shown in the graph here, that function is approximately zero when z is big and negative, approximately one when z is big and positive, and in between it changes smoothly and nonlinearly.
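As a concrete sketch of what's being described (the variable names and example values here are my own, not from the lecture), the forward pass of a logistic neuron looks like this:

```python
import math

def logistic_neuron(x, w, b):
    """Compute the logit and output of a logistic neuron.

    z = b + sum_i x_i * w_i   (the logit, i.e. the total input)
    y = 1 / (1 + e^(-z))      (smooth, nonlinear squashing of the logit)
    """
    z = b + sum(xi * wi for xi, wi in zip(x, w))
    y = 1.0 / (1.0 + math.exp(-z))
    return z, y

# Example with made-up inputs, weights, and bias:
z, y = logistic_neuron(x=[1.0, 0.5], w=[2.0, -1.0], b=0.5)
# For a big negative z, y is close to 0; for a big positive z, close to 1.
```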

The fact that it changes continuously gives it nice derivatives, which make learning easy.

So, to get the derivatives of the output of a logistic neuron with respect to the weights, which is what we need for learning, we first need to compute the derivative of the logit itself, the total input, with respect to a weight. That's very simple: the logit is just the bias plus the sum, over all the input lines, of the value on an input line times the weight. So, when we differentiate with respect to wi, we just get xi. The derivative of the logit with respect to wi is xi, and similarly, the derivative of the logit with respect to xi is wi.
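These two derivatives can be sanity-checked numerically with finite differences (a sketch with made-up values, not from the lecture):

```python
import math

def logit(x, w, b):
    # z = b + sum_i x_i * w_i
    return b + sum(xi * wi for xi, wi in zip(x, w))

x, w, b, eps = [1.5, -0.3], [0.8, 2.0], 0.1, 1e-6

# dz/dw0 by finite differences should equal x0 ...
dz_dw0 = (logit(x, [w[0] + eps, w[1]], b) - logit(x, w, b)) / eps
# ... and dz/dx0 should equal w0.
dz_dx0 = (logit([x[0] + eps, x[1]], w, b) - logit(x, w, b)) / eps

print(round(dz_dw0, 4), round(dz_dx0, 4))  # ≈ x0 and w0
```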

The derivative of the output with respect to the logit is also simple if you express it in terms of the output. The output is y = 1/(1 + e^-z), and dy/dz is just y(1 - y). That's not obvious.
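For reference, the standard algebra behind this identity goes as follows:

```latex
y = \frac{1}{1 + e^{-z}}
\qquad\Rightarrow\qquad
\frac{dy}{dz} = \frac{e^{-z}}{(1 + e^{-z})^{2}}
             = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}}
             = y(1 - y),
\quad\text{since}\quad
1 - y = \frac{e^{-z}}{1 + e^{-z}}.
```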

For those of you who like to see the math, I've put it on the next slide. The math is tedious but perfectly straightforward, so you can go through it by yourself.

Now that we've got the derivative of the output with respect to the logit, and the derivative of the logit with respect to the weight, we can start to figure out the derivative of the output with respect to the weight. We just use the chain rule again.

So, dy/dwi is dz/dwi times dy/dz. And dz/dwi, as we just saw, is xi, and dy/dz is y(1 - y). So we now have dy/dwi for a logistic neuron, and all we need to do is use the chain rule once more and multiply by dE/dy. We get something that looks very like the delta rule.

So, the way the error changes as we change the weight, dE/dwi, is just the sum, over all the training cases n, of the value on an input line, x_i^n, times the residual, the difference between the target and the actual output of the neuron. But it's got this extra term in it, which comes from the slope of the logistic function: y^n(1 - y^n).

So, a slight modification of the delta rule gives us the gradient descent learning rule for training a logistic unit.
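Putting the pieces together, here is a minimal batch gradient-descent loop for a single logistic unit using this modified delta rule. The toy dataset (the OR function), learning rate, and epoch count are my own choices for illustration, not from the lecture:

```python
import math

def train_logistic_unit(inputs, targets, epochs=1000, lr=0.5):
    """Batch gradient descent for one logistic neuron.

    Per epoch, each weight changes by
        lr * sum_n x_i^n * y^n (1 - y^n) * (t^n - y^n),
    i.e. the delta rule with the extra slope term y(1 - y).
    """
    n_in = len(inputs[0])
    w = [0.0] * n_in
    b = 0.0
    for _ in range(epochs):
        dw = [0.0] * n_in
        db = 0.0
        for x, t in zip(inputs, targets):
            z = b + sum(xi * wi for xi, wi in zip(x, w))
            y = 1.0 / (1.0 + math.exp(-z))
            slope = y * (1.0 - y)          # extra term from the logistic
            for i in range(n_in):
                dw[i] += x[i] * slope * (t - y)
            db += slope * (t - y)
        w = [wi + lr * d for wi, d in zip(w, dw)]
        b += lr * db
    return w, b

# Toy example: learn the OR function.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
T = [0, 1, 1, 1]
w, b = train_logistic_unit(X, T)
```

After training, the unit's output should be below 0.5 for [0, 0] and above 0.5 for the other three cases.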
