
We have talked about how vectorization lets you speed up your code significantly.

In this video, we'll talk about how you can vectorize the implementation of logistic regression, so it can process an entire training set, that is, implement a single iteration of gradient descent with respect to an entire training set, without using even a single explicit for loop.

I'm super excited about this technique, and when we talk about neural networks later, you'll see that they too can be implemented without using even a single explicit for loop.

Let's get started. Let's first examine the forward propagation steps of logistic regression.

So, if you have m training examples, then to make a prediction on the first example, you need to compute z, using this familiar formula, then compute the activation, a, for the first example.

Then to make a prediction on the second training example, you need to compute that. Then, to make a prediction on the third example, you need to compute that, and so on. And you might need to do this m times, if you have m training examples.
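This per-example loop can be sketched in NumPy as follows; the sizes n_x and m here are made-up illustration values, not from the lecture.

```python
import numpy as np

def sigmoid(z):
    # Element-wise logistic function
    return 1 / (1 + np.exp(-z))

# Illustrative sizes (assumed for this sketch): n_x features, m examples
n_x, m = 3, 5
rng = np.random.default_rng(0)
X = rng.standard_normal((n_x, m))  # each column is one training example
w = rng.standard_normal((n_x, 1))  # weight vector
b = 0.5                            # bias (a real number)

# Explicit for loop: one prediction per training example
a = np.zeros((1, m))
for i in range(m):
    x_i = X[:, i:i+1]              # i-th example, shape (n_x, 1)
    z_i = np.dot(w.T, x_i) + b     # z for example i
    a[0, i] = sigmoid(z_i[0, 0])   # activation for example i
```

This is exactly the loop the vectorized version will eliminate.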

So, it turns out that in order to carry out the forward propagation step, that is, to compute these predictions on all m training examples, there is a way to do so without needing an explicit for loop. Let's see how you can do it.

First, remember that we defined the matrix capital X to be your training inputs, stacked together in different columns like this. So, this is a matrix that is an n_x by m matrix. I'm writing this as a Python NumPy shape; this just means that X is an n_x by m dimensional matrix.
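As a concrete illustration of this stacking, with made-up numbers, n_x = 2 features and m = 3 examples:

```python
import numpy as np

# Three hypothetical training examples, each a column vector of n_x = 2 features
x1 = np.array([[1.0], [2.0]])
x2 = np.array([[3.0], [4.0]])
x3 = np.array([[5.0], [6.0]])

# Stack them horizontally (as columns) into capital X
X = np.hstack([x1, x2, x3])
print(X.shape)   # (2, 3), i.e. an n_x by m matrix
```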

Now, the first thing I want to do is show how you can compute z1, z2, z3 and so on, all in one step, in fact, with one line of code. So, I'm going to construct a 1 by m matrix, that's really a row vector, where I'm going to compute z1, z2, and so on, down to zm, all at the same time.

It turns out that this can be expressed as w transpose times the capital matrix X, plus this vector [b, b, ..., b], where this [b, b, ..., b] thing is a 1 by m vector, or 1 by m matrix, that is, an m-dimensional row vector.

So hopefully you're familiar with matrix multiplication. You can see that w transpose is being multiplied by the columns x1, x2, and so on up to xm, and this w transpose will be a row vector like that.

And so this first term will evaluate to w transpose x1, w transpose x2, and so on, dot, dot, dot, w transpose xm, and then when we add this second term [b, b, ..., b], you end up adding b to each element. So you end up with another 1 by m vector: that's the first element, that's the second element, and so on, and that's the m-th element.

And if you refer to the definitions above, this first element is exactly the definition of z1, the second element is exactly the definition of z2, and so on.

So just as capital X was obtained when you took your training examples and stacked them next to each other, stacked them horizontally, I'm going to define capital Z to be this, where you take the lowercase z's and stack them horizontally. So when you stack the lowercase x's corresponding to the different training examples horizontally, you get this variable capital X; in the same way, when you take these lowercase z variables and stack them horizontally, you get this variable capital Z.

And it turns out that in order to implement this, the NumPy command is Z = np.dot(w.T, X) + b, that's w transpose X, plus b.

Now there is a subtlety in Python, which is that here b is a real number, or if you want, a 1 by 1 matrix, just a normal real number. But when you add this vector to this real number, Python automatically takes this real number b and expands it out to a 1 by m row vector.

So in case this operation seems a little bit mysterious, this is called broadcasting in Python, and you don't have to worry about it for now; we'll talk about it some more in the next video.
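A minimal illustration of the broadcasting behavior just described (the numbers are arbitrary):

```python
import numpy as np

# Adding a plain real number to a 1-by-m row vector:
# NumPy expands b into [b, b, b] automatically (broadcasting)
row = np.array([[1.0, 2.0, 3.0]])
b = 10.0
print(row + b)   # [[11. 12. 13.]]
```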

But the takeaway is that with just this one line of code, you can calculate capital Z, and capital Z is going to be a 1 by m matrix that contains all of the lowercase z's, lowercase z1 through lowercase zm.
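Putting that one line of code next to the loop version, as a sketch with made-up shapes, confirms both compute the same Z:

```python
import numpy as np

# Illustrative sizes (assumed): n_x features, m examples
n_x, m = 3, 4
rng = np.random.default_rng(1)
X = rng.standard_normal((n_x, m))  # examples stacked as columns
w = rng.standard_normal((n_x, 1))
b = 0.25                           # scalar; broadcast across the row

# Vectorized: all the z's in one line
Z = np.dot(w.T, X) + b             # shape (1, m)

# The same computation with an explicit loop, for comparison
Z_loop = np.zeros((1, m))
for i in range(m):
    Z_loop[0, i] = np.dot(w.T, X[:, i:i+1])[0, 0] + b

print(np.allclose(Z, Z_loop))      # True
```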

So that was Z; how about these values a? What we'd like to do next is find a way to compute a1, a2, and so on to am, all at the same time. And just as stacking the lowercase x's resulted in capital X, and stacking the lowercase z's horizontally resulted in capital Z, stacking the lowercase a's is going to result in a new variable, which we're going to define as capital A.

And in the programming assignment, you'll see how to implement a vector-valued sigmoid function, so that the sigmoid function inputs this capital Z as a variable and very efficiently outputs capital A. You'll see the details of that in the programming assignment.
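One way such a vector-valued sigmoid might look; this is a sketch, and the programming assignment's own version may differ:

```python
import numpy as np

def sigmoid(Z):
    # np.exp is element-wise, so this works on the whole 1-by-m matrix at once
    return 1 / (1 + np.exp(-Z))

Z = np.array([[0.0, 2.0, -2.0]])   # a 1-by-m row of z values (made-up numbers)
A = sigmoid(Z)                      # all the activations a1..am at once
print(A.shape)                      # (1, 3)
```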

So just to recap, what we've seen on this slide is that instead of needing to loop over the m training examples to compute lowercase z and lowercase a one at a time, you can implement this one line of code to compute all the z's at the same time, and then this one line of code, with an appropriate implementation of lowercase sigma, to compute all the lowercase a's at the same time.

So this is how you implement a vectorized implementation of forward propagation for all m training examples at the same time.

So to summarize, you've just seen how you can use vectorization to very efficiently compute all of the activations, all the lowercase a's, at the same time. Next, it turns out you can also use vectorization very efficiently to compute the backward propagation, to compute the gradients. Let's see how you can do that in the next video.
