So we wrote out gradient descent for J(w), as if w were your only parameter.

In logistic regression, your cost function is a function of both w and b.

So in that case, the inner loop of gradient descent, that is this thing here,

this thing you have to repeat becomes as follows.

You end up updating w as w minus the learning rate times

the derivative of J(w,b) with respect to w.

And you update b as b minus the learning rate times

the derivative of the cost function with respect to b.

So these two equations at the bottom are the actual update you implement.
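As a minimal sketch of that repeated update, here is the loop in Python. The cost function, its minimum, the learning rate, and the step count are all made up for illustration (a simple bowl shape stands in for the logistic regression cost); only the two update lines inside the loop come from the lecture.

```python
def cost(w, b):
    # Hypothetical bowl-shaped cost, used only for illustration.
    # Its minimum is at w = 3, b = -1.
    return (w - 3.0) ** 2 + (b + 1.0) ** 2


def gradients(w, b):
    # Derivatives of the hypothetical cost with respect to w and to b.
    return 2.0 * (w - 3.0), 2.0 * (b + 1.0)


def gradient_descent(w, b, learning_rate=0.1, num_steps=200):
    # Repeat the two updates: each parameter moves against its own slope.
    for _ in range(num_steps):
        dJ_dw, dJ_db = gradients(w, b)
        w = w - learning_rate * dJ_dw
        b = b - learning_rate * dJ_db
    return w, b


w_final, b_final = gradient_descent(w=0.0, b=0.0)
print(w_final, b_final, cost(w_final, b_final))
```

Both derivatives are computed before either parameter changes, so w and b are updated from the same point on the cost surface.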

As an aside I just want to mention one notational convention in calculus that

is a bit confusing to some people.

I don't think it's super important that you understand calculus, but

in case you see this I want to make sure that you don't think too much of this.

Which is that in calculus, this term here,

we actually write as follows, with that funny squiggle symbol.

So this symbol, this is actually just a lower case d

in a fancy font, in a stylized font. When you see this expression, all this

means is the derivative of J(w,b), or really the slope of the function

J(w,b), how much that function slopes in the w direction.

And the rule of the notation in calculus, which I think isn't totally logical

and just makes things much more complicated than they need to be, is that if J is a function of two or

more variables, then instead of using lowercase d you use this funny symbol.

This is called a partial derivative symbol.

But don't worry about this,

and if J is a function of only one variable, then you use lowercase d.

So the only difference between whether you use this funny

partial derivative symbol or lowercase d as we did on top,

is whether J is a function of two or more variables.

In which case, you use this symbol, the partial derivative symbol, or

if J is only a function of one variable then you use lower case d.

This is one of those funny rules of notation in calculus that

I think just make things more complicated than they need to be.

But if you see this partial derivative symbol all it means is you're measuring

the slope of the function, with respect to one of the variables.
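To make "the slope with respect to one of the variables" concrete, here is a rough numerical sketch in Python. The function J below is hypothetical, and the finite-difference helpers are just one simple way to estimate a slope, not something the lecture prescribes. The idea is to nudge one variable while holding the other fixed.

```python
def J(w, b):
    # Hypothetical function of two variables, chosen only for illustration.
    return w ** 2 + 3.0 * w * b + b ** 2


def slope_in_w(f, w, b, eps=1e-6):
    # Nudge w a little in both directions while b stays fixed.
    return (f(w + eps, b) - f(w - eps, b)) / (2.0 * eps)


def slope_in_b(f, w, b, eps=1e-6):
    # Nudge b a little in both directions while w stays fixed.
    return (f(w, b + eps) - f(w, b - eps)) / (2.0 * eps)


# By hand, the slopes of this J are 2w + 3b in the w direction
# and 3w + 2b in the b direction, so at w = 1, b = 2 they are 8 and 7.
dJ_dw = slope_in_w(J, 1.0, 2.0)
dJ_db = slope_in_b(J, 1.0, 2.0)
print(dJ_dw, dJ_db)
```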

And similarly, to adhere to the formally correct mathematical

notation in calculus, because here J has two inputs, not just one,

this thing at the bottom should be written with this partial derivative symbol.

But it really means the same thing as, almost the same thing as lower case d.
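Written out with the partial derivative symbol, the two updates would look roughly like this. The symbol \alpha for the learning rate is an assumption here, since the transcript only says "the learning rate".

```latex
w := w - \alpha \, \frac{\partial J(w,b)}{\partial w},
\qquad
b := b - \alpha \, \frac{\partial J(w,b)}{\partial b}
```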

Finally, when you implement this in code,

we're going to use the convention that this quantity, really the amount by which

you update w, we'll denote as the variable dw in your code.

And this quantity, right?

The amount by which you want to update b

we'll denote by the variable db in your code.
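Here is a minimal sketch of how the dw and db naming might look in code, assuming the standard cross-entropy cost for logistic regression with a single input feature. The tiny dataset, the learning rate, and the number of steps are made up for illustration, and the helper names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cost_and_gradients(w, b, xs, ys):
    # Average cross-entropy cost J(w, b) and its derivatives.
    # dw holds the derivative of J with respect to w, db the one for b.
    m = len(xs)
    cost, dw, db = 0.0, 0.0, 0.0
    for x, y in zip(xs, ys):
        a = sigmoid(w * x + b)  # predicted probability that y is 1
        cost += -(y * math.log(a) + (1 - y) * math.log(1 - a))
        dw += (a - y) * x
        db += a - y
    return cost / m, dw / m, db / m


# Made-up training data: negative inputs are class 0, positive are class 1.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

w, b = 0.0, 0.0
learning_rate = 0.5  # illustrative value, not from the lecture

initial_cost, _, _ = cost_and_gradients(w, b, xs, ys)
for _ in range(100):
    _, dw, db = cost_and_gradients(w, b, xs, ys)
    w = w - learning_rate * dw
    b = b - learning_rate * db
final_cost, _, _ = cost_and_gradients(w, b, xs, ys)

print(initial_cost, final_cost, w, b)
```

In this sketch dw and db hold the derivatives themselves, and the learning rate is applied at the update lines.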

All right, so, that's how you can implement gradient descent.

Now if you haven't seen calculus for a few years, I know that that might seem like

a lot more derivatives in calculus than you might be comfortable with so far.

But if you're feeling that way, don't worry about it.

In the next video, we'll give you better intuition about derivatives.

And even without the deep mathematical understanding of calculus,

with just an intuitive understanding of calculus

you will be able to make neural networks work effectively.

So with that, let's go on to the next video where we'll talk a little bit more about

derivatives.