0:30

And when you're implementing machine learning algorithms, if you're able to take advantage of these linear algebra libraries, or these numerical linear algebra libraries, and make calls to routines in them rather than write code yourself to do things that these libraries could be doing, then often you get code that, first, is more efficient, so it just runs more quickly and takes better advantage of any parallel hardware your computer may have, and so on. And second, it also means that you end up with less code that you need to write, so a simpler implementation that is therefore maybe also more likely to be bug free. As a concrete example, rather than writing code yourself to multiply matrices, you can let Octave do it by typing A times B; that will use a very efficient routine to multiply the two matrices. And there's a bunch of examples like these where, if you use appropriate vectorized implementations, you get much simpler code and much more efficient code.

Let's look at some examples. Here's our usual hypothesis for linear regression, and if you want to compute h of x, notice that there's a sum on the right. So, one thing you could do is compute the sum from j equals zero to j equals n yourself. Another way to think of this is to think of h of x as theta transpose x, and what you can do is think of this as computing the inner product between two vectors, where theta is your vector, let's say, theta zero, theta one, theta two, if you have two features, if n equals two, and you think of x as the vector x0, x1, x2. And these two views can give you two different implementations.

Here's what I mean. Here's an unvectorized implementation for how to compute h of x, and by unvectorized I mean without vectorization. We might first initialize the prediction to be 0.0. This is eventually going to be our prediction; it's going to eventually be h of x. And then I'm going to have a for loop, for j equals 1 through n plus 1, where prediction gets incremented by theta(j) times x(j). So it's computing this expression over here. By the way, I should mention, in these vectors I wrote over here, I had these vectors being zero-indexed, so there's theta zero, theta one, theta two. But because Matlab is one-indexed, theta zero in Matlab we would end up representing as theta(1), and this second element ends up as theta(2), and this third element ends up as theta(3), just because vectors in Matlab are indexed starting from one, even though I wrote theta and x here starting indexing from zero, which is why here I have a for loop where j goes from 1 through n plus 1, rather than j going from zero up to n, right? So, this is an unvectorized implementation, in that we have a for loop that's summing up the elements of the sum. In contrast, here's how you would write a vectorized implementation, which is that you would think of x and theta as vectors, and you just set prediction equals theta transpose times x, so just computing like so. So, instead of writing all these lines of code with a for loop, you instead just have one line of code, and what this line of code on the right will do is use Octave's highly optimized numerical linear algebra routines to compute this inner product between the two vectors, theta and x. And not only is the vectorized implementation simpler, it will also run much more efficiently.

4:15

So that was Octave, but the issue of vectorization applies to other programming languages as well. Let's look at an example in C++. Here's what an unvectorized implementation might look like. We again initialize a prediction to 0.0, and then we now have a for loop, for j equals 0 up to n, prediction += theta[j] * x[j], where again you have this explicit for loop that you write yourself. In contrast, using a good numerical linear algebra library in C++, you can instead write code that might look like this. So, depending on the details of your numerical linear algebra library, you might have a C++ object which is the vector theta, and a C++ object which is the vector x, and you just take theta.transpose() times x, where this times becomes a C++ overloaded operator, so that you can just multiply these two vectors in C++. And depending on the details of your numerical linear algebra library, you might end up using a slightly different syntax, but by relying on the library to do this inner product, you can get a much simpler piece of code and a much more efficient one.

Let's now look at a more sophisticated example. Just to remind you, here's our update rule for gradient descent for linear regression, and so we update theta j using this rule for all values of j equals 0, 1, 2, and so on. And if we just write out these equations for theta zero, theta one, theta two, assuming we have two features, so n equals 2, then these are the updates we perform to theta zero, theta one, theta two, where you might remember my saying in an earlier video that these should be simultaneous updates.
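As a sketch of what those simultaneous updates look like when written out with explicit loops, here is Python with NumPy standing in for the lecture's Octave; the training set values are made up for illustration:

```python
import numpy as np

# Made-up training set with n = 2 features; column 0 is the intercept x_0 = 1.
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 4.0, 5.0]])    # each row is one example x^(i)
y = np.array([1.0, 2.0])
theta = np.array([0.1, 0.2, 0.3])
alpha, m = 0.01, len(y)

# One gradient descent step, written out with explicit loops.
# temp holds the new values so that all three updates use the OLD theta,
# which is the simultaneous update the lecture insists on.
temp = np.zeros_like(theta)
for j in range(len(theta)):
    total = 0.0
    for i in range(m):
        h_i = sum(theta[k] * X[i, k] for k in range(len(theta)))  # h(x^(i))
        total += (h_i - y[i]) * X[i, j]
    temp[j] = theta[j] - alpha * (1.0 / m) * total
theta = temp
```

Note that writing `theta[j] = ...` directly inside the loop would break the simultaneity, since later updates would see already-changed parameters.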

So, let's see if we can come up with a vectorized implementation of this. Here are my same three equations written in a slightly smaller font, and you can imagine that one way to implement these three lines of code is to have a for loop that says, for j equals 0 through 2, update theta j, or something like that. But instead, let's come up with a vectorized implementation and see if we can find a simpler way to take these three lines of code, or this for loop that effectively does these three steps one step at a time, and compress them into one line of vectorized code. Here's the idea.

What I'm going to do is think of theta as a vector, and I'm going to update theta as theta minus alpha times some other vector, delta, where delta is going to be equal to one over m, sum from i equals one through m, and then this term over on the right, okay? So, let me explain what's going on here. Here, I'm going to treat theta as a vector, so it's an n plus one dimensional vector, and I'm saying that theta gets updated as an n plus one dimensional vector. Alpha is a real number, and delta here is a vector, so this subtraction operation, that's a vector subtraction, okay? Because alpha times delta is a vector, and so I'm saying theta gets this vector, alpha times delta, subtracted from it. So, what is the vector delta? Well, this vector delta looks like this, and what it is meant to be is really this thing over here. Concretely, delta will be an n plus one dimensional vector, and the very first element of the vector delta is going to be equal to that. So if we index delta from zero, so there's delta zero, delta one, delta two, what I want is that delta zero is equal to this first box in green up above, and indeed, you might be able to convince yourself that delta zero is this one over m, sum of h of x(i) minus y(i), times x(i) zero. So, let's just make sure that we're on the same page about how delta really is computed. Delta is one over m times the sum over here, and what is this sum? Well, this term over here,

9:13

that's a real number, and the second term over here, x(i), that term over there is a vector, right? Because x(i) may be a vector that would be, say, x(i) zero, x(i) one, x(i) two, right? And, what is the summation? Well, what the summation is saying is that this term over here is equal to h of x(1) minus y(1) times x(1), plus h of x(2) minus y(2) times x(2), plus, and so on, okay? Because this is a summation over i, so as i ranges from i equals 1 through m, you get these different terms, and you're summing up these terms here. And the meaning of each of these terms is a lot like, if you remember, the earlier quiz in this course, right? You saw this equation, and we said that, in order to vectorize this code, we would instead set u = 2v + 5w. So, we're saying that the vector u is equal to two times the vector v plus five times the vector w. So, this is an example of how to add different vectors.
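That quiz example takes one line with no explicit loop; here it is in Python with NumPy standing in for the lecture's Octave, with made-up values for v and w:

```python
import numpy as np

# Scale two vectors and add them in a single vectorized expression.
v = np.array([1.0, 0.0, 2.0])
w = np.array([0.0, 1.0, 1.0])
u = 2 * v + 5 * w
print(u)  # [2. 5. 9.]
```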

And this summation is the same thing. This is saying that the summation over here is just some real number, kind of like the number two or some other number, times the vector x(1). So this is kind of like two times v, except with some other number times x(1). And then, instead of plus five times w, we instead have some other real number times some other vector, and then you add on other vectors, plus dot dot dot, plus the other vectors. So overall, this thing over here, that whole quantity, that delta, is just some vector. And concretely, if n equals 2, the three elements of delta correspond exactly to this first thing, the second thing, and this third thing, which is why, when you update theta according to theta minus alpha delta, we end up carrying out exactly the same simultaneous updates as the update rules that we have on top. So, I know that there was a lot that happened on this slide, but again, feel free to pause the video, and if you aren't sure what just happened, I'd encourage you to step through the slide to make sure you understand why it is that this update here, with this definition of delta, is equal to this update on top. And if it's still not clear, one insight is that this thing over here, that's exactly the vector x, and so we're just taking all three of these computations and compressing them into one step with this vector delta, which is why we can come up with a vectorized implementation of this step of linear regression this way.
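Putting the whole slide together, the vectorized update can be sketched in Python with NumPy standing in for the lecture's Octave; the training set values here are made up for illustration:

```python
import numpy as np

# Made-up training set: m = 2 examples, n = 2 features, column 0 is x_0 = 1.
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 4.0, 5.0]])
y = np.array([1.0, 2.0])
theta = np.array([0.1, 0.2, 0.3])
alpha, m = 0.01, len(y)

# delta = (1/m) * sum_i (h(x^(i)) - y^(i)) * x^(i), with no explicit loops:
# X @ theta computes all the h(x^(i)) at once, and X.T @ errors forms the
# weighted sum of the x^(i) vectors.
delta = (1.0 / m) * (X.T @ (X @ theta - y))

# theta := theta - alpha * delta updates all n+1 parameters simultaneously.
theta = theta - alpha * delta
print(theta)  # approximately [0.096 0.188 0.284]
```

Because the whole right-hand side is computed before theta is reassigned, the simultaneous-update requirement is satisfied automatically.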

I hope this step makes sense, and do look at the video and see if you can understand it. In case you don't quite follow the equivalence of this math, if you implement this, it turns out to be the right answer anyway. So even if you didn't quite understand the equivalence, if you just implement it this way, you'll be able to get linear regression to work. But if you're able to figure out why these two steps are equivalent, then hopefully that will give you a better understanding of vectorization as well. And finally,

if you're implementing linear regression using more than one or two features, and sometimes we use linear regression with tens or hundreds or thousands of features, then if you use the vectorized implementation of linear regression, your code will run much faster than if you had, say, your old for loop that was updating theta zero, then theta one, then theta two yourself. So, using a vectorized implementation, you should be able to get a much more efficient implementation of linear regression. And vectorizing the later algorithms that we'll see in this course is a good trick, whether in Octave or some other language, like C++ or Java, for getting your code to run more efficiently.
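To see that speed difference for yourself, here is a small benchmark sketch in Python with NumPy standing in for the lecture's Octave; the vector size is arbitrary, and the exact timings will vary by machine:

```python
import time
import numpy as np

n = 100_000
rng = np.random.default_rng(0)
theta = rng.random(n)
x = rng.random(n)

# Unvectorized inner product: an explicit Python loop.
t0 = time.perf_counter()
prediction_loop = 0.0
for j in range(n):
    prediction_loop += theta[j] * x[j]
loop_time = time.perf_counter() - t0

# Vectorized inner product: one optimized library call.
t0 = time.perf_counter()
prediction_vec = theta @ x
vec_time = time.perf_counter() - t0

# Both give the same answer up to floating-point rounding, but the
# vectorized call is typically orders of magnitude faster.
print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.6f}s")
```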
