0:00

In this video, we'll apply the divide-and-conquer algorithm design paradigm to the problem of multiplying matrices. This will culminate in the study of Strassen's matrix multiplication algorithm. And this is a super cool algorithm for two reasons. First of all, Strassen's algorithm is completely non-trivial. It's totally non-obvious, very clever, and not at all clear how Strassen ever came up with it. The second cool feature is that it's for such a fundamental problem. For as long as computers have been in use, from the time they were invented up to today, a lot of their cycles have been spent multiplying matrices. It just comes up all the time in important applications. So let me first make sure we're all clear on what the problem is of multiplying two matrices.

0:48

The ideas we'll talk about are also relevant for multiplying non-square matrices, but we're not going to discuss that in this video. The entries in these matrices can be whatever you want. Maybe they're integers, maybe they're rationals, maybe they're from some finite field. It depends on the application, but the point is that they're entries we can add and multiply.

1:30

So if z_ij is this red entry over in the z matrix, it would be derived from the corresponding row of the x matrix and the corresponding column of the y matrix. And recall what I mean by dot product: that just means you take the products of the individual components and then add up the results.

1:51

So ultimately the z_ij entry boils down to a sum over n terms, where each of the constituent products is x_ik, the (i,k) entry of the matrix x, times y_kj, the (k,j) entry of the matrix y, with k ranging from 1 to n. So that's how z_ij is defined for a given pair of indices i and j.
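As a quick sketch (my own illustration, not from the lecture, using plain Python lists for matrices), a single entry z_ij is just this dot product:

```python
def entry(x, y, i, j):
    """z[i][j] = sum over k of x[i][k] * y[k][j] (0-indexed here)."""
    n = len(x)
    return sum(x[i][k] * y[k][j] for k in range(n))

x = [[1, 2],
     [3, 4]]
y = [[5, 6],
     [7, 8]]
print(entry(x, y, 0, 0))  # 1*5 + 2*7 = 19
```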

2:15

One thing to note is that where we often use n to denote the input size, here we're using n to denote the dimension of each of these matrices. The input size is not n; the input size is quite a bit bigger than n. Specifically, each of these is an n x n matrix containing n squared entries.

2:32

Since we presumably have to read the input, which has size n squared, and produce the output, which also has size n squared, the best we could really hope for from a matrix multiplication algorithm is a running time of n squared. So the question is how close we can get to that.

2:47

Before we talk about algorithms for matrix multiplication, let me just make sure we're all crystal clear on exactly what the problem is, so let's actually spell out the result of multiplying two 2 x 2 matrices. We can parameterize two generic 2 x 2 matrices by giving the first one entries a, b, c, and d, where these four entries could be anything. And then we're multiplying by a second 2 x 2 matrix; let's call its entries e, f, g, and h. Now what's the result of multiplying these? Again, it's going to be a 2 x 2 matrix, where each entry is the dot product of the relevant row of the first matrix and column of the second matrix. So to get the upper-left entry, we take the dot product of the upper row of the first matrix and the left column of the second matrix, which gives ae + bg. To get the upper-right entry, we take the dot product of the top row of the first matrix with the right column of the second matrix, which gives af + bh. Filling in the other entries in the same way, we get ce + dg and cf + dh.

Okay, so that's multiplying two matrices, and we've already discussed the definition in general. Now, suppose you had to write a program to actually compute the result of multiplying two n x n matrices. One natural way to do that would be to return to the definition, which defines each of the n squared entries of the z matrix as a suitable sum of products of entries of the x and y matrices. So in the next quiz, I'd like you to figure out exactly what the running time of that algorithm would be as a function of the matrix dimension n, where, as usual, we count the addition or multiplication of two individual entries as a constant-time operation.

4:47

The correct response to this quiz is the third answer: the straightforward iterative algorithm runs in cubic time relative to the matrix dimension n. To see this, just recall the definition of matrix multiplication. The definition tells us that each entry z_ij of the output matrix z is the sum, from k = 1 to n, of x_ik times y_kj. That is, it's the dot product of the ith row of the x matrix and the jth column of the y matrix. I'm certainly assuming that we have the matrices represented in a way that lets us access a given entry in constant time, and under that assumption, remember, each of these products takes only constant time. So then to compute z_ij we just have to add up these n products, which takes theta(n) time per entry, and there are n squared entries that we have to compute: n choices for i, n choices for j. So that gives us n squared times n, or cubic running time overall, for the natural algorithm, which is really just a triple for loop that computes each entry of the output array separately using the dot product.
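That triple for loop can be sketched in a few lines of Python (my own illustration, using nested lists for the matrices):

```python
def matmul_naive(x, y):
    """Straightforward Theta(n^3) multiply of two n x n matrices."""
    n = len(x)
    z = [[0] * n for _ in range(n)]
    for i in range(n):          # n choices of row i
        for j in range(n):      # n choices of column j
            for k in range(n):  # Theta(n) work per entry z[i][j]
                z[i][j] += x[i][k] * y[k][j]
    return z

# The 2 x 2 example from earlier: [[ae+bg, af+bh], [ce+dg, cf+dh]]
print(matmul_naive([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# → [[19, 22], [43, 50]]
```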

So the question, as always for the keen algorithm designer, is: can we do better? Can we beat n cubed time for multiplying two matrices?

6:09

And we might be especially emboldened by the progress we've already seen in multiplying two integers. We applied the divide-and-conquer paradigm to the problem of multiplying two integers, and we had both a naive recursive algorithm and a seemingly better algorithm due to Gauss, which made only three recursive calls. Now, we haven't yet analyzed the running time of that algorithm, but as we'll see later, it does indeed beat the quadratic running time of the grade-school algorithm. So it's very natural to ask: can we do exactly the same thing here? There's the obvious n cubed algorithm that follows straight from the definition; perhaps, analogous to Gauss, we could have some clever divide-and-conquer method that beats cubic time. So that's what we're going to explore next.

6:50

Let's recall the divide-and-conquer paradigm; what does it mean to use it? Well, we first have to identify smaller subproblems. So if we want to multiply two n x n matrices, we have to identify multiplications of smaller matrices that we can solve recursively. Once we've figured out how we want to divide the given problem into smaller ones, then in the conquer step we simply invoke our own algorithm recursively; that's going to recursively multiply the smaller matrices together. And then, in general, we'll have to combine the results of the recursive calls to get the solution for the original problem. In our case, that means getting the product of the original two matrices from the products of whatever submatrices we identify. So how would we apply the divide-and-conquer paradigm to matrices? We're given two n x n matrices, and we have to somehow identify smaller pairs of square matrices that we can multiply recursively.

7:41

So the idea, I think, is fairly natural. We start with a big n by n matrix X: there are n rows and n columns, and we have to somehow divide it into smaller pieces. Now, the first thing you might think of is to split it into its left half and its right half, analogous to what we've been doing with arrays. But then we'd break X into two matrices that are no longer square, which are n over 2 in one dimension and have length n in the other, and we want to recursively call a subroutine that multiplies square matrices. So what seems like the clear thing to do is to divide X into quadrants. So we have four pieces of X, each of which is going to be n over 2 by n over 2, corresponding to the four quarters of this matrix. Let's call these quadrants, or blocks in matrix terminology, A, B, C, and D. All of these are n over 2 by n over 2 matrices. As usual, for simplicity, I'm assuming that n is even, and as usual it doesn't really matter. And we can do the same trick with Y.

9:03

What I mean by that is that the product of X and Y can be expressed in terms of these quadrants: each of the four quadrants, the four corners, of the product can be written as a suitable arithmetic expression of the quadrants of X and Y. Here are exactly what those formulas are; they're exactly analogous to the formulas for multiplying a pair of 2 by 2 matrices.

9:26

I'm not going to formally prove this fact. I'm sure many of you have seen it before or are familiar with it, and if you haven't, it's actually quite easy to prove. It's not obvious, in the sense that you can't necessarily see it off the top of your head, but if you go back to the definition, it's quite easy to verify that indeed, when you multiply X and Y, you can express the blocks of the product in terms of the blocks of X and Y.
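This blockwise identity is easy to check numerically; here is a small sketch (my own helper names, plain Python lists, n even):

```python
def split(m):
    """Split an n x n matrix (n even) into quadrants A, B, C, D."""
    h = len(m) // 2
    return ([row[:h] for row in m[:h]], [row[h:] for row in m[:h]],
            [row[:h] for row in m[h:]], [row[h:] for row in m[h:]])

def add(p, q):
    return [[p[i][j] + q[i][j] for j in range(len(p))] for i in range(len(p))]

def mul(p, q):
    """Definition-based multiply, used here on the blocks."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def assemble(tl, tr, bl, br):
    """Glue four quadrants back into one matrix."""
    return [l + r for l, r in zip(tl, tr)] + [l + r for l, r in zip(bl, br)]

x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
y = [[1, 0, 2, 0], [0, 1, 0, 2], [3, 0, 1, 0], [0, 3, 0, 1]]
A, B, C, D = split(x)
E, F, G, H = split(y)
blockwise = assemble(add(mul(A, E), mul(B, G)), add(mul(A, F), mul(B, H)),
                     add(mul(C, E), mul(D, G)), add(mul(C, F), mul(D, H)))
print(blockwise == mul(x, y))  # → True
```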

9:46

What we just did is completely analogous to what we did for integer multiplication: there, we wanted to multiply two integers, x and y, we broke them into pairs of n-over-2-digit numbers, and then we took the expansion and observed how it could be written in terms of products of n-over-2-digit numbers. It's the same thing going on here, except with matrices. So now we're in business as far as a recursive approach goes. We want to multiply x and y, which are n by n matrices, and we've recognized that we can express the product x times y in terms of products of n over 2 by n over 2 matrices, things we're able to multiply recursively, plus additions. And additions are clearly easy: adding two matrices with, say, n squared entries each takes time linear in the number of entries, so it's n squared time to add two matrices that are n by n. This immediately leads us to our first recursive algorithm.

10:43

Our first recursive algorithm is simply to evaluate all of these expressions in the obvious way. Specifically, in step one, we recursively compute all of the necessary products.

10:54

And observe that there are eight products we have to compute: eight products of n over 2 by n over 2 matrices. There are four entries in this expansion of x times y, each of the blocks is the sum of two products, and none of the products recurs; they're all distinct. So naively, if we want to evaluate this, we have to do eight different products of n over 2 by n over 2 matrices.
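As a sketch, this eight-product recursive algorithm might look like the following in Python (my own helper names; n assumed to be a power of 2, as in the lecture's simplification):

```python
def matmul_recursive(x, y):
    """Naive divide and conquer: 8 recursive calls on n/2 x n/2 blocks."""
    n = len(x)
    if n == 1:  # base case: 1 x 1 matrices
        return [[x[0][0] * y[0][0]]]
    h = n // 2
    quad = lambda m, r, c: [row[c:c + h] for row in m[r:r + h]]
    A, B, C, D = quad(x, 0, 0), quad(x, 0, h), quad(x, h, 0), quad(x, h, h)
    E, F, G, H = quad(y, 0, 0), quad(y, 0, h), quad(y, h, 0), quad(y, h, h)
    add = lambda p, q: [[p[i][j] + q[i][j] for j in range(h)] for i in range(h)]
    # the eight distinct recursive products, two per block of the output
    tl = add(matmul_recursive(A, E), matmul_recursive(B, G))  # AE + BG
    tr = add(matmul_recursive(A, F), matmul_recursive(B, H))  # AF + BH
    bl = add(matmul_recursive(C, E), matmul_recursive(D, G))  # CE + DG
    br = add(matmul_recursive(C, F), matmul_recursive(D, H))  # CF + DH
    return [tl[i] + tr[i] for i in range(h)] + [bl[i] + br[i] for i in range(h)]

print(matmul_recursive([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# → [[19, 22], [43, 50]]
```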

11:33

Now the question you should be asking is: is this a good algorithm? Was this recursive approach good for anything: splitting x and y into blocks, expanding the product in terms of those blocks, then recursively computing each of the products? And I want to say it's totally not obvious; it is not clear what the running time of this recursive algorithm is. I'm going to go ahead and give you a spoiler, which will follow from the master method that we'll talk about in the next lecture. It turns out that with this kind of recursive algorithm, where you do eight recursive calls, each on a problem with dimension half as large as what you started with, and then do quadratic work outside the calls, the running time is going to be cubic. So it's exactly the same as the straightforward iterative algorithm that follows from the definition. That one was clearly cubic; this one, although it's not obvious, is cubic as well. So it's no better and no worse than the straightforward iterative algorithm.
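Anticipating the master-method notation (a sketch; the formal treatment comes in the next lectures), the recurrence for this eight-call algorithm is:

```latex
T(n) = 8\,T(n/2) + \Theta(n^2), \qquad \text{which solves to } T(n) = \Theta(n^3).
```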

12:29

Now, in case you're feeling disappointed that we went through all this work with this seemingly clever divide-and-conquer approach to matrix multiplication, only to come out at the end no better than the iterative algorithm, well, there's really no reason to despair. Remember, back in integer multiplication, we had a straightforward recursive algorithm in which we had to do four recursive calls, products of n-over-2-digit numbers. But then we had Gauss's trick, which said that if we use cleverer products and cleverer additions and subtractions, we can get away with only three recursive calls. And as we'll see later, that is indeed a significant savings for integer multiplication. Now, so far we've done nothing analogously clever for this matrix multiplication problem. All we did was the naive expansion in terms of submatrices and the naive evaluation of the resulting expressions. So the $64,000 question is: can we do something clever to reduce the number of recursive calls from 8 down to something lower? And that is where Strassen's algorithm comes in.

13:27

At a high level, Strassen's algorithm has two steps, just like the first recursive algorithm we discussed. In step one, it recursively computes products of smaller matrices, roughly n over 2 by n over 2 matrices, but only seven of them, cleverly chosen.

13:54

Step two, then, is to actually produce the product of x and y, that is, to produce each of the four blocks we saw, with suitable additions and subtractions of these seven products. And these additions and subtractions are much less straightforward than in the first recursive algorithm.

14:12

While the additions and subtractions involved will be somewhat more numerous than they were in the naive recursive algorithm, that only changes the work in that part of the algorithm by a constant factor. So we'll still spend only theta(n squared) work adding and subtracting things, and we get a huge win in decreasing the number of recursive calls from 8 to 7. Now, just in case you have the intuition that shaving off one of eight recursive calls should only decrease the running time of the algorithm by one eighth, by 12.5%, in fact it has a tremendously more amplified effect, because we do one less recursive call over and over and over again as we keep recursing in the algorithm. So it makes a fundamental difference in the eventual running time, as we'll explore in detail in the next set of lectures when we discuss the master method. So again, a bit of a spoiler alert: what you'll see in the next set of lectures is that Strassen's algorithm does indeed beat cubic time; it's better than n cubed time. You'll have to watch the next set of lectures if you want to know just what the running time is, but I'll tell you now that the savings of the eighth recursive call is what changes the algorithm from cubic to subcubic.

15:20

Now, 1969 was, obviously, quite a bit before my time, but by all accounts, from people I've talked to who were around then and from what the books say, Strassen's algorithm totally blew people's minds at the time. Everybody had just been assuming that there was no way you could do better than the iterative, cubic algorithm. It seemed that matrix multiplication intuitively, fundamentally required all of the calculations spelled out in the definition. So Strassen's algorithm is an early glimpse of the magic and the power of clever algorithm design: that if you really apply serious ingenuity, even to super fundamental problems, you can come up with fundamental savings over the more straightforward solutions.

16:01

So those are the main points I wanted to make about Strassen's algorithm: how you can beat cubic time by saving a recursive call, with suitably chosen clever products and clever additions and subtractions. Now, maybe a few of you are wondering: what are these cleverly chosen products? Can you really do this? And I don't blame you. There's no reason to believe me just because I've spelled out this high-level idea; it's not obvious that it should work, and you might want to actually see the products. So for those of you who feel that way, this last slide is for you.

16:30

So here is Strassen's algorithm in its somewhat gory detail. Let me tell you what the seven products are that we're going to form. I'll label them P1 through P7, and they're all going to be defined in terms of the blocks of the input matrices x and y. Just to remind you, we think of x in terms of its blocks A, B, C, D, and we think of y in terms of its blocks E, F, G, H, and remember that A through H are all n over 2 by n over 2 submatrices. So here are the seven products P1 through P7:

P1 = A(F - H)
P2 = (A + B)H
P3 = (C + D)E
P4 = D(G - E)
P5 = (A + D)(E + H)
P6 = (B - D)(G + H)
P7 = (A - C)(E + F)

17:49

I hope you'll agree that these are indeed only seven products, and we can compute them with seven recursive calls. We preprocess with a few additions and subtractions: we compute F minus H, A plus B, C plus D, and so on, forming all of these new matrices from the blocks. Then, with seven recursive calls, we do the seven products, each operating on n over 2 by n over 2 matrices. Now the question is: why is this useful? Why on earth would we want to know the values of these seven products? The amazing other part of the algorithm is that from just these seven products, we can, using only additions and subtractions, recover all four of the blocks of x times y. Recall that we already expressed x times y in terms of blocks.

18:36

We previously computed this to be AE + BG in the upper-left corner, with similar expressions for the upper-right, lower-left, and lower-right blocks. So this we already know. The content of the claim is that these four blocks also arise from the seven products in the following way.

18:57

The claim here is that these two different expressions for x times y are exactly the same, block by block. In other words, the claim is that this crazy expression P5 + P4 - P2 + P6, where those are four of the products listed above, is precisely AE + BG. Similarly, we're claiming that P1 + P2 is exactly AF + BH; that's actually easy to see. P3 + P4 is CE + DG; that's also easy to see. And then the other complicated one is that P1 + P5 - P3 - P7 is exactly the same as CF + DH. So all four of these hold. Now, I don't know why you'd believe me unless I showed you some of this derivation.

19:45

So let's just look at the proof of one of the cases, the upper-left corner. That is, let's expand out this crazy expression P5 + P4 - P2 + P6 and see what we get. From P5 we get AE + AH + DE + DH. Then we add P4, which gives us + DG - DE. Then we subtract P2, which gives us - AH - BH. And then we add in P6, which gives us + BG + BH - DG - DH. So what happens next? Well, now we look for cancellations: we cancel the AH's, we cancel the DE's, we cancel the DH's, we cancel the DG's, we cancel the BH's, and, holy cow, what do we get? We get AE + BG. That is, we get exactly what we were supposed to in the upper-left block of x times y. So we've actually verified that this equation holds for the upper-left block. It's quite easy to see that it holds for the upper-right and lower-left blocks, and a comparable calculation verifies it for the lower-right block. So, summarizing: because this claim holds, because we really can recover the four blocks of x times y from these seven products, Strassen's algorithm works in the following way. You compute the seven products P1 through P7 using seven recursive calls, and then you compute the four blocks using some extra additions and subtractions, as shown in the claim. So it's seven recursive calls on n over 2 by n over 2 matrices, plus theta(n squared) work for the necessary additions and subtractions, and as you'll see in the master method lecture, that is indeed sufficient for subcubic time.
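Putting the whole thing together, here is a compact sketch of Strassen's algorithm in Python (my own helper names; nested lists for matrices, n assumed to be a power of 2):

```python
def strassen(x, y):
    """Strassen's multiply: 7 recursive calls plus Theta(n^2) additions/subtractions."""
    n = len(x)
    if n == 1:  # base case: 1 x 1 matrices
        return [[x[0][0] * y[0][0]]]
    h = n // 2
    quad = lambda m, r, c: [row[c:c + h] for row in m[r:r + h]]
    A, B, C, D = quad(x, 0, 0), quad(x, 0, h), quad(x, h, 0), quad(x, h, h)
    E, F, G, H = quad(y, 0, 0), quad(y, 0, h), quad(y, h, 0), quad(y, h, h)
    add = lambda p, q: [[p[i][j] + q[i][j] for j in range(h)] for i in range(h)]
    sub = lambda p, q: [[p[i][j] - q[i][j] for j in range(h)] for i in range(h)]
    p1 = strassen(A, sub(F, H))          # P1 = A(F - H)
    p2 = strassen(add(A, B), H)          # P2 = (A + B)H
    p3 = strassen(add(C, D), E)          # P3 = (C + D)E
    p4 = strassen(D, sub(G, E))          # P4 = D(G - E)
    p5 = strassen(add(A, D), add(E, H))  # P5 = (A + D)(E + H)
    p6 = strassen(sub(B, D), add(G, H))  # P6 = (B - D)(G + H)
    p7 = strassen(sub(A, C), add(E, F))  # P7 = (A - C)(E + F)
    tl = add(sub(add(p5, p4), p2), p6)   # AE + BG = P5 + P4 - P2 + P6
    tr = add(p1, p2)                     # AF + BH = P1 + P2
    bl = add(p3, p4)                     # CE + DG = P3 + P4
    br = sub(sub(add(p1, p5), p3), p7)   # CF + DH = P1 + P5 - P3 - P7
    return [tl[i] + tr[i] for i in range(h)] + [bl[i] + br[i] for i in range(h)]

print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# → [[19, 22], [43, 50]]
```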

22:05

And this sort of illustrates the difference between checking somebody's proof and coming up with it. Given that I've told you the magical seven products, and how from them you can recover the four desired blocks of x times y, it's really just mechanical to verify that it works. It's a totally different story how you would come up with P1 through P7 in the first place. So how did Strassen come up with them? Honestly, your guess is as good as mine.
