0:00

In this video I'd like to tell you about the idea of vectorization.

So, whether you're using Octave or a similar language like MATLAB, or whether you're using Python [INAUDIBLE], R, Java, or C++, all of these languages either have built into them, or have readily and easily accessible, different numerical linear algebra libraries. They're usually very well written and highly optimized, often developed by people that have PhDs in numerical computing, or who really specialize in numerical computing.

And when you're implementing machine learning algorithms, if you're able to take advantage of these numerical linear algebra libraries, and make some routine calls to them rather than write code yourself to do things that these libraries could be doing, then often you get code that, first, is more efficient, so it just runs more quickly and takes better advantage of any parallel hardware your computer may have, and so on. And second, it also means that you end up with less code that you need to write, so it's a simpler implementation that is therefore maybe also more likely to be bug free.

And as a concrete example, rather than writing code yourself to multiply matrices, if you let Octave do it by typing a times b, it will use a very efficient routine to multiply the two matrices. And there are a bunch of examples like these where, if you use appropriate vectorized implementations, you get much simpler code and much more efficient code. Let's look at some examples.

1:33

Here's our usual hypothesis for linear regression, and if you want to compute h(x), notice that there's a sum on the right. And so one thing you could do is compute the sum from j = 0 to j = n yourself. Another way to think of this is to think of h(x) as theta transpose x, and what you can do is think of this as computing the inner product between two vectors, where theta is your vector, say, theta 0, theta 1, theta 2, if you have two features, so n equals two, and if you think of x as the vector x0, x1, x2. These two views give you two different implementations.

Here's what I mean. Here's an unvectorized implementation of how to compute h(x), and by unvectorized, I mean without vectorization. We might first initialize prediction to be 0.0. The prediction is going to eventually be h(x), and then I'm going to have a for loop for j = 1 through n+1, where prediction gets incremented by theta(j) * x(j). So it's computing this expression over here.

By the way, I should mention, in these vectors that I wrote over here, I had these vectors being zero-indexed. So I had theta 0, theta 1, theta 2. But because MATLAB is one-indexed, theta 0 in MATLAB we would end up representing as theta(1), the second element ends up as theta(2), and the third element ends up as theta(3), just because our vectors in MATLAB are indexed starting from 1, even though I wrote theta and x here starting indexing from 0. Which is why here I have a for loop where j goes from 1 through n+1, rather than from 0 up to n, right?

But so this is an unvectorized implementation, in that we have a for loop that is summing up the n+1 terms of the sum.

In contrast, here's how you would write a vectorized implementation, in which you think of x and theta as vectors. You just write prediction = theta' * x; you're computing it like so. So instead of writing all these lines of code with a for loop, you instead just have one line of code. And what this line of code on the right will do is use Octave's highly optimized numerical linear algebra routines to compute the inner product between the two vectors, theta and x. And not only is the vectorized implementation simpler, it will also run much more efficiently.
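As an illustrative sketch of these two implementations (not from the lecture itself; written in Python with NumPy rather than Octave, and the numbers in theta and x are made up):

```python
import numpy as np

# Hypothetical values with n = 2 features, so theta and x each have n + 1 = 3 entries.
theta = np.array([1.0, 2.0, 3.0])  # theta 0, theta 1, theta 2
x = np.array([1.0, 5.0, 6.0])      # x0 = 1, x1, x2

# Unvectorized: loop over each term of the sum yourself.
prediction = 0.0
for j in range(len(theta)):
    prediction += theta[j] * x[j]

# Vectorized: one call to an optimized inner-product routine.
prediction_vec = theta @ x

# Both compute the same h(x).
```

The vectorized line is the NumPy analogue of the Octave expression prediction = theta' * x.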

4:54

In contrast, using a good numerical linear algebra library in C++, you can instead write code that might look like this. So depending on the details of your numerical linear algebra library, you might be able to have an object, a C++ object, which is the vector theta, and a C++ object which is the vector x, and you just take theta.transpose() * x, where this times is a C++ overloaded operator, so you can just multiply these two vectors in C++. And depending on the details of your numerical linear algebra library, you might end up using a slightly different syntax, but by relying on the library to do this inner product, you can get a much simpler piece of code and a much more efficient one.

5:40

Let's now look at a more sophisticated example. Just to remind you, here's our update rule for gradient descent for linear regression. And so we update theta j using this rule for all values of j = 0, 1, 2, and so on. And if I just write out these equations for theta 0, theta 1, theta 2, assuming we have two features, so n = 2, then these are the updates we perform for theta 0, theta 1, theta 2, where you might remember my saying in an earlier video that these should be simultaneous updates.

6:20

Here are my same three equations written in a slightly smaller font, and you can imagine that one way to implement these three lines of code is to have a for loop that says, for j = 0 through 2, update theta j, or something like that. But instead, let's come up with a vectorized implementation and see if we can find a simpler way to basically compress these three lines of code, or a for loop that effectively does these three steps one at a time. Let's see if we can take these three steps and compress them into one line of vectorized code.

Here's the idea. What I'm going to do is think of theta as a vector, and I'm going to update theta as theta minus alpha times some other vector delta, where delta is going to be equal to 1 over m times the sum from i = 1 through m of this term over on the right, okay? So, let me explain what's going on here.

7:31

Here, I'm going to treat theta as a vector, so this is an (n+1)-dimensional vector, and I'm saying that theta gets updated here as a vector in R^(n+1). Alpha is a real number, and delta here is a vector. So this subtraction operation is a vector subtraction, okay? Because alpha times delta is a vector, and so I'm saying theta gets this vector, alpha times delta, subtracted from it.

So, what is the vector delta? Well, this vector delta looks like this, and what it's meant to be is really this thing over here. Concretely, delta will be an (n+1)-dimensional vector, and the very first element of the vector delta is going to be equal to that. So, if we have delta, and we index it from 0, so it's delta 0, delta 1, delta 2, what I want is that delta 0 is equal to this first box in green up above. And indeed, you might be able to convince yourself that delta 0 is this 1 over m sum of (h(x(i)) minus y(i)) times x(i)0.

So, let's just make sure we're on the same page about how delta really is computed. Delta is 1 over m times this sum over here, and what is this sum? Well, this term over here is a real number, and the second term over here, x(i), is a vector, right, because x(i) may be a vector that would be, say, x(i)0, x(i)1, x(i)2. And what is the summation? Well, what the summation is saying is that this term, that is, this term over here, is equal to (h(x(1)) minus y(1)) times x(1), plus (h(x(2)) minus y(2)) times x(2), plus, and so on, okay? Because this is a summation over i, so as i ranges from i = 1 through m, you get these different terms, and you're summing up these terms here.

And the meaning of these terms, this is a lot like, if you remember from the earlier quiz, right, you saw this equation. We said that in order to vectorize this code we would instead say u = 2v + 5w. So we're saying that the vector u is equal to two times the vector v plus five times the vector w. So this is an example of how to add different vectors, and this summation is the same thing. This is saying that the summation over here is just some real number, right? That's kind of like the number two, or some other number, times the vector x(1). So it's kind of like 2v, or some other number times x(1), and then, instead of 5w, we instead have some other real number times some other vector, and then you add on other vectors, plus dot, dot, dot, plus the other vectors. Which is why, overall, this thing over here, that whole quantity, that delta, is just some vector.
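As a small sketch of this vector-addition picture (not in the lecture; Python with NumPy, all numbers made up), the quiz example and the structure of delta look like this:

```python
import numpy as np

# The quiz example: u = 2v + 5w, one vectorized line instead of a loop.
v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])
u = 2 * v + 5 * w

# Delta has the same shape: a sum of (real number) * (vector) terms.
# The scalars stand in for the h(x(i)) - y(i) errors, and the vectors
# stand in for the training examples x(i); all values are made up.
scalars = [0.5, -1.2, 2.0]                     # like h(x(i)) - y(i)
vectors = [np.array([1.0, 2.0, 3.0]),          # like x(1)
           np.array([1.0, 0.0, 1.0]),          # like x(2)
           np.array([1.0, 4.0, 2.0])]          # like x(3)
m = len(scalars)
delta = (1.0 / m) * sum(s * x for s, x in zip(scalars, vectors))
```

Each term of the sum is a real number times a vector, so the whole quantity delta comes out as just some vector, exactly as described above.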

11:47

So, I know that there was a lot that happened on this slide, but again, feel free to pause the video, and if you aren't sure what just happened, I'd encourage you to step through this slide to make sure you understand why it is that this update here, with this definition of delta, is equal to the update on top. And if it's still not clear, one insight is that this thing over here is exactly the vector x, and so we're just taking all three of these computations and compressing them into one step with this vector delta, which is why we can come up with a vectorized implementation of this step in this way.
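To make the equivalence concrete, here is a hedged sketch in Python with NumPy rather than the lecture's Octave (the dataset, step size, and the use of a design matrix X whose rows are the examples x(i) are all illustrative assumptions). It checks that the per-parameter for loop and the single vectorized update theta = theta - alpha * delta give the same simultaneous update:

```python
import numpy as np

# Made-up dataset: m = 4 examples, n = 2 features (plus the x0 = 1 intercept term).
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 0.0, 1.0],
              [1.0, 4.0, 2.0],
              [1.0, 1.0, 5.0]])   # row i is x(i) transposed
y = np.array([10.0, 2.0, 12.0, 13.0])
theta = np.array([1.0, 1.0, 1.0])
alpha = 0.01
m = len(y)

# Unvectorized: one update per parameter, each with its own sum over the m examples.
errors = X @ theta - y            # h(x(i)) - y(i) for every i, from the old theta
theta_loop = theta.copy()
for j in range(len(theta)):
    theta_loop[j] = theta[j] - alpha * (1.0 / m) * np.sum(errors * X[:, j])

# Vectorized: delta packs all n + 1 sums into one vector,
# so the whole simultaneous update is a single vector subtraction.
delta = (1.0 / m) * (X.T @ (X @ theta - y))
theta_vec = theta - alpha * delta
```

Because errors and delta are both computed from the old theta, the simultaneous-update requirement from the earlier video is satisfied automatically in the vectorized form.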

12:37

So, I hope this step makes sense, and do look at the video and see if you can understand it. In case you don't quite understand the equivalence of this math, if you implement this, it turns out to be the right answer anyway. So, even if you didn't quite understand the equivalence, if you just implement it this way, you'll be able to get linear regression to work. But if you're able to figure out why these two steps are equivalent, then hopefully that will give you a better understanding of vectorization as well.

And finally, if you are implementing linear regression with more than one or two features (sometimes we use linear regression with tens, hundreds, or thousands of features), then if you use the vectorized implementation of linear regression, you'll see that it will run much faster than if you had, say, your old for loop that was updating theta 0, then theta 1, then theta 2, one at a time. So, using a vectorized implementation, you should be able to get a much more efficient implementation of linear regression. And when you vectorize the later algorithms that we'll see in this class, this is a good trick, whether in Octave or some other language like C++ or Java, for getting your code to run more efficiently.
