0:08

From your previous exposure to calculus, you probably learned how to compute derivatives quickly, without the definition, by following a certain set of rules or laws. We're going to cover those rules in this lesson. Such a topic has the potential to be boring, but pay attention, because we're going to cover why the rules hold, and in doing so, we'll get a better grasp of the notion of a derivative.

0:39

Recall from our previous lesson that we had two different ways of defining the derivative. One is in terms of a rate of change: it is the limit, as the input approaches a, of the change in the output over the change in the input.
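In symbols, this first definition can be written as:

```latex
\frac{df}{dx}\bigg|_{a} \;=\; \lim_{x \to a} \frac{f(x) - f(a)}{x - a}
```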

1:01

Our second definition, a somewhat stronger one, is in terms of first-order variation. One changes the input to f by a small amount, h, and looks at how the change in the output depends upon h. If you like, the derivative is the coefficient of the first-order term in the Taylor series of f at a.
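In symbols, this second definition says that:

```latex
f(a + h) \;=\; f(a) \;+\; \frac{df}{dx}\bigg|_{a}\, h \;+\; O(h^2)
```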

1:30

From these definitions flows the interpretation of the derivative as the rate of change of the output with respect to change in the input. This is the interpretation that you will want to have memorized. A few remarks are in order. First of all, the derivative, as you know, most certainly depends upon the input value a that you are examining.

2:02

Secondly, a derivative concerns rates of change. We're not actually measuring how much the output changes; we are looking at the rate of change as the change in the input gets closer and closer to zero. Lastly, the derivative at a is telling you a rate of change. It's telling you how fluctuations, or changes, h, in the input are amplified, if the derivative is positive and larger than 1; or damped, if the derivative is, say, less than 1; or reversed, in the case where the derivative is negative.
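As a quick numerical illustration of amplified, damped, and reversed rates of change, here is a small sketch using a central-difference estimate of the derivative (the helper name rate_of_change is ours, not from the lecture):

```python
def rate_of_change(f, a, h=1e-6):
    # Central-difference estimate of the derivative of f at a.
    return (f(a + h) - f(a - h)) / (2 * h)

# Amplified: derivative larger than 1, so small input changes grow.
print(rate_of_change(lambda x: 3 * x, 0.0))    # roughly 3

# Damped: derivative between 0 and 1, so small input changes shrink.
print(rate_of_change(lambda x: x / 4, 0.0))    # roughly 0.25

# Reversed: negative derivative, so small input changes flip sign.
print(rate_of_change(lambda x: -2 * x, 0.0))   # roughly -2
```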

3:27

The first rule is that of linearity. It consists of two parts. First, what you might call a summation rule: the derivative of the function u plus v is the derivative of u plus the derivative of v. The second part of linearity says that if we multiply u by a constant c and take the derivative, what we get is that constant c times the derivative of u.
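In symbols, the two parts of linearity read:

```latex
\frac{d}{dx}(u + v) \;=\; \frac{du}{dx} + \frac{dv}{dx},
\qquad
\frac{d}{dx}(c\,u) \;=\; c\,\frac{du}{dx}
```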

4:04

The second rule, the product rule, states that the derivative of the product of two functions, u and v, is u times the derivative of v plus v times the derivative of u.
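In symbols:

```latex
\frac{d}{dx}(u \cdot v) \;=\; u\,\frac{dv}{dx} \;+\; v\,\frac{du}{dx}
```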

4:20

Lastly, and perhaps most importantly, the chain rule states that the derivative of a composition of two functions is the product of the individual derivatives. We'll have some more to say about that in a moment.
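In symbols, with the evaluation points made explicit:

```latex
\frac{d}{dx}\,u(v(x)) \;=\; \frac{du}{dv}\bigg|_{v(x)} \cdot \frac{dv}{dx}\bigg|_{x}
```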

4:38

For now, let's focus on the rules of linearity. The summation rule can be visualized rather simply: one keeps track of how u changes and how v changes, and keeping track of how u plus v changes is really not that difficult. Likewise, when you multiply u by a constant, what happens to its rate of change? It is similarly scaled. That picture is at least reasonable. There's another way to think about these rules as well; this is a way that I often think about it.

5:21

If you take the sum of the two functions and then differentiate, it's the same as differentiating the pieces and then adding them together. Likewise, you can multiply by a constant and then differentiate, or you can differentiate and then multiply by the constant; whichever path you travel, you will get to the same place.

5:48

These pictures illuminate but do not really justify the differentiation rules. To give a better justification, let's use our definition of the derivative in terms of first-order variation and employ the language of big-O. First of all, what does u + v evaluated at x + h mean? We take u, evaluate it at x + h, and add to it v evaluated at x + h. Now, by our definition of the derivative of u, we know that the term on the left is u plus the derivative of u times h, plus something in big-O of h squared. Likewise with v.

6:50

What are the first-order terms, those that have an h? Well, on the left we have du/dx, and we add to it the term from the right, dv/dx. Everything else is something in big-O of h squared plus something in big-O of h squared, which is itself in big-O of h squared, naturally. So we see that the first-order coefficient is the sum of the derivatives, as expected. Likewise, when you multiply u by a constant, c, and evaluate that at x + h, it is simply c times u plus du/dx times h plus something in big-O of h squared. The zeroth-order term is c times u. The first-order term is c times du/dx times h. Everything else is a constant times big-O of h squared. But remember, in big-O constants don't count, and so we see from the first-order term that the derivative is c times du/dx.
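Written out, the expansions just described are:

```latex
(u+v)(x+h)
  = \Big(u + \frac{du}{dx}\,h + O(h^2)\Big) + \Big(v + \frac{dv}{dx}\,h + O(h^2)\Big)
  = (u+v) + \Big(\frac{du}{dx} + \frac{dv}{dx}\Big)h + O(h^2)
\\[6pt]
(c\,u)(x+h)
  = c\Big(u + \frac{du}{dx}\,h + O(h^2)\Big)
  = c\,u + c\,\frac{du}{dx}\,h + O(h^2)
```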

8:07

Likewise for the product rule: it's not hard to visualize the first-order variation of u times v in terms of du and dv, but to write this out using our language of big-O is very simple. If we take u times v and evaluate it at x + h, that is u at x + h times v at x + h. We can substitute in the expansions for u and v and perform this multiplication. If we multiply these terms as if they were polynomials, what do we get? The zeroth-order term is simply u times v.

8:55

All of the terms that are of first order in h have coefficients u times dv/dx plus v times du/dx. Everything else in that multiplication, as you can check, is going to be in big-O of h squared. So the derivative can be read off from the first-order term, and we get the product rule.
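The multiplication just described, written out:

```latex
u(x+h)\,v(x+h)
  = \Big(u + \frac{du}{dx}\,h + O(h^2)\Big)\Big(v + \frac{dv}{dx}\,h + O(h^2)\Big)
  = u\,v + \Big(u\,\frac{dv}{dx} + v\,\frac{du}{dx}\Big)h + O(h^2)
```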

9:52

Now, what do these rates of change, what does the derivative, look like in this case? Well, dv is telling you at what rate the output of v changes with respect to change in the input; du tells you at what rate the output of u changes with respect to change in the input. What is the derivative of the composition? Well, it tells you: when you change the input to v at a certain rate, at what rate does the final output, that of u composed with v, change? And the answer, as one can intuit, is that these rates of change multiply.

10:44

The justification of the chain rule is going to follow the same ideas, but with a bit more manipulation involved. If we consider u composed with v and evaluate the input at x + h, then clearly our first step should be to expand out v using what we know about the derivative of v with respect to x. Everything else in the variation is in big-O of h squared.

11:32

Now we're evaluating u not at v, but at v plus some perturbation, some variation term, which we know has the structure of dv/dx times h plus something in big-O of h squared. Forget about that structure for the moment; we're going to expand this out. The zeroth-order term is u(v). Next comes the derivative of u with respect to v times this perturbation term,

12:11

plus something in big-O of that perturbation term squared. And now here come the final steps. If we look at that big-O on the right and view it as a function of h, then, squaring everything in sight, we see that every term has a power of h that is at least 2, and so we can replace it by a big-O of h squared. Now for the terms in the middle: when we distribute the multiplication of du/dv over all of the terms in the parentheses, the latter term, du/dv times something in big-O of h squared, yields again something in big-O of h squared. And so, combining those big-O's of h squared, we are left with a first-order term in h that has coefficient du/dv times dv/dx. And that is the proper form of the chain rule.
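The whole computation, written out:

```latex
u\big(v(x+h)\big)
  = u\Big(v + \frac{dv}{dx}\,h + O(h^2)\Big)
  = u(v) + \frac{du}{dv}\Big(\frac{dv}{dx}\,h + O(h^2)\Big) + O(h^2)
  = u(v) + \frac{du}{dv}\,\frac{dv}{dx}\,h + O(h^2)
```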

13:24

Now, one has to be careful. The differential notation is especially misleading at this point: you must evaluate your derivatives at the correct inputs. Let's look at a simple example. Let's say that f(x) is the exponential function e to the x. What is the derivative of f composed with itself? Well, according to the chain rule, it should be df/dx times df/dx. Well, that's not exactly what it is, because we have to be careful about the inputs. If we were to evaluate this derivative at x = 0, then the second term must be evaluated at x = 0. The first term in this product is evaluated not at x = 0, but at x = e to the 0, or 1. That would give us a value of e times 1, which is, of course, e. Now, this looks awfully confusing. What's really going on? Well, you know how to differentiate functions, and you know how to use the chain rule. Let's think about what f composed with f is. It's really e to the e to the x. And if I were to ask you how to differentiate that, you would say: well, first I differentiate the e to the x and evaluate that at e to the x. That gives me e to the e to the x, but because it's the chain rule, I have to multiply by the derivative of the exponent, in this case e to the x. Now, if you evaluate both of these at x = 0, you get the same computation as above, with the final value of e. Be careful about where you evaluate your derivatives.
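A quick numerical sanity check of this worked example (the helper name numerical_derivative is ours, not from the lecture): the derivative of e to the e to the x at x = 0 should come out to e.

```python
import math

def numerical_derivative(f, a, h=1e-6):
    # Central-difference approximation to the derivative of f at a.
    return (f(a + h) - f(a - h)) / (2 * h)

f = math.exp                # f(x) = e^x
ff = lambda x: f(f(x))      # (f o f)(x) = e^(e^x)

# Chain rule: (f o f)'(0) = f'(f(0)) * f'(0) = e^1 * e^0 = e.
predicted = math.e
estimate = numerical_derivative(ff, 0.0)
print(estimate, predicted)  # both approximately 2.71828
```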

15:36

There are a number of other differentiation rules that you may have seen in your previous exposure to calculus: the reciprocal rule, the quotient rule, the inverse rule. If you remember seeing these, then take a brief look. If you've not seen them before, then you might want to work through the following examples, such as showing that the derivative of secant is secant times tangent, or that the derivative of tangent is really secant squared, or that the derivative of log of x, that is, the inverse of the exponential function, is in fact 1/x. I'll leave it to you to take a more careful look at some of these rules and to practice in the homework sets.
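For reference, those three derivatives are:

```latex
\frac{d}{dx}\sec x = \sec x\,\tan x,
\qquad
\frac{d}{dx}\tan x = \sec^2 x,
\qquad
\frac{d}{dx}\ln x = \frac{1}{x}
```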

16:27

And so we see the basic rules for differentiation. But wait, there's more. If you look at the bonus material, you'll see how some of the ghosts of these rules haunt other realms of mathematics. In our next lesson, we'll cover one of the canonical applications of derivatives: linearization.
