The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

Loading...

From the course by Stanford University

Greedy Algorithms, Minimum Spanning Trees, and Dynamic Programming

248 ratings

Stanford University

248 ratings

Course 3 of 4 in the Specialization Algorithms

The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

From the lesson

Week 4

Advanced dynamic programming: the knapsack problem, sequence alignment, and optimal binary search trees.

- Tim RoughgardenProfessor

Computer Science

So now that we have our magic formula, our recurrence, all that's left to do is to systematically solve the subproblems. As usual, it is crucial that we solve the subproblems in the right order, from smallest to largest. How should we measure the size of a subproblem in the optimal binary search tree problem? The natural way to do it is the number of items in the subproblem. So if you're starting at I and you're going till J, the number of items in that subproblem is j - i + 1 that, and that's going to be our measure of subproblem size. So let's bust out our [INAUDIBLE] array. The number of dimensions of this array is going to be two and that's because we have two different degrees of freedom for annexing subproblems, one for the start of the contiguous interval, one for the end.

So the outer for loop is going to control the subproblem size. It's going to ensure that we solve all smaller subproblems before proceeding to larger subproblems. Specifically, we'll be using an index s S and in the iteration of this outer for loop, whatever the current value of S is, we're only going to consider subproblems of size s + 1. So you should think of s as representing the difference between the larger index j and the earlier index i. The inner for loop controls the first item in the contiguous interval that we're looking at, so that's just i. And now, all we have to do is rewrite the reccurrence in terms of the array entries and with this change variable where S corresponds to j - i. That is for a given subproblem, starting with the item i and ending with the item i + s. We just by by brute force pick the best route, so the route here is going to be similar to i and i + s. Regardless of the choice of the route, we pick up the constant, the sum of the Pks where here, K, is ranging from the first item i to the last item i + s. And then we also look at the previously computed optimal solution values for the two relevant subproblems, one starting in i, ending in r - 1. The other starting in r + 1 and ending at i + s. So a couple of quick comments about the two array lookups on the right-hand side of this formula. so first of all, if you choose i to be, the root to be the first item i, then the first lookup doesn't make sense. If you choose it to be the last item, the second array lookup doesn't make sense. In that case, it's understood we're just going to interpret these lookups as zero. Of course, in an actual implementation, you'd have to include that code but, I'll let you take care of that on your own. So the second comment is just our usual sanity check and again, you should always do this when you write out a dynamic programming algorithm. When you write down your formula to populate the array entries, make sure that on the right-hand side, whenever you do an array lookup, that is indeed already computed and available for constant time lookup. So in this case, whatever our choice of the root is, the two relevant subproblems are going to involve strictly fewer items than what we started with. And therefore, the two subproblem lookups on the right-hand side will indeed have been computed in some previous iteration of the outer for loop. Remember, the outer for loop is ensuring we solve some problems from smallest number of items up to largest number of items. And of course, after the two for loops complete, what we really care about is the answer in A of one comma n, that is the optimal binary search tree value for all of the items that's the eventual output. Some students like to think about these double for loops pictorially. So let's imagine A, the 2-D array is laid out as a grid. So imagine the x-axis correspond to the index i, that is the first item in the set of items we're looking at and the y-axis corresponding to j, the last item in the current set. And let me single out the diagonal of this grid, so these are subproblems corresponding to i = j, that is subproblems with a single element. Now, we only ever solve problems where j is at least as large as i, so that means we're really only filling in the upper left or northwestern part of this table. So we never bother to fill in the southeastern, the bottom right part of this table. We just sort of think of it all as zero. Now, in the first outer iteration. So, when s0. = 0, that's when our dynamic programming algorithm solves, in turn, each of the n single item problems. So, in the first iteration of this double for loop, it's going to solve the subproblem A11. In the next iteration of the inner for loop, it's going to proceed the A22, then A33, and so on. In each of those, both of the array lookups are going to just correspond to zero and we're just going to fill in this diagonal with the base cases, where Aii is just the probability of item i. Then, as the dynamic programming algorithm proceeds, we're going to be filling in the upper left portion of this table diagonal by diagonal. Each time we increment s, the index in the outer for loop, we're going to march up to the next northwesternmost diagonal, and then as we step through the possible values of i, we're going to fill in that diagonal one at a time moving from southwest to northeast. When we're filling in the value of a subproblem on one of these diagonals, all we need to do is lookup the value for two subproblems on lower diagonals. Lower diagonals correspond to subproblems with strictly fewer items. So that's it. That's the dynamic programming algorithm that computes the value of an optimal binary search tree given a set of items with probabilities. I'm not going to say anything about correctness. It's the, the same story as we've seen in the past. All the heavylifting is improving the optimal substructure lemma, that gave us the correctness of our occurrence given that our magic formula is correct and we're just applying it systematically, correctness of the dynamic programming algorithm follows in a straightforward way, just by induction. Let me, however, make some comments about the running time. So, let's just follow the usual procedure. Let's just look at how many subproblems got to get solved and then how much work has to get done to solve each of those subproblems. So as far as the number of subproblems, it's all possible choices of i and j, where i is at most j, or in other words, it's essentially half of that n by n grid. So this is roughly n squared over two, let's just call it theta of n squared, so a quadratic number of subproblems. Now, for each of the subproblems, we have to evaluate this recurrence, we have to evaulate the formula, which conceptually is a breadth-first search through the number of candidates that we've identified. And a disctinction between this dynamic programming algorithm and all of the other ones we've seen recently, sequence allignment, knapsack, computing independent set of the line graphs, is that it's actually kind of a lot of options for what the optimal solution can be. That is, our breadth-first search, for the first time, is not over a merely constant number of possibilities. We have to try every possible route, each of the items in our given subproblem is a candidate route and we try them all. So, given a start item of i and an end item of j, there's j minus i plus one total items and we have to do constant work for each one of those choices. So there will be subproblems, some subproblems that we can evaluate quickly and only say constant time if i and j are very close to each other, but for a constant fraction of the subproblems we have to deal with, this is going to be linear time, theta of n time. So over all, that gives us a cubic running time, theta of n cubed. Alright, so I would say this running time is sort of okay, not great. So it is polynomial time, that's good. That's certainly way, way, way faster than enumerating all of the exponentially many possible binary search trees, so it blows away breath-first search. But it's not something I would call blazingly fast or for free primitive or anything like that. So you're going to be able be to solve problem sizes with n in the 100's, but probably not n in the 1000's. So that will cover some applications where you'd want to use this optimal binary search tree algorithm, but not all of them. So it's good for some things, but it's not a universal solution. On the other hand, here's a fun fact.

And the fun fact is you can actually speed up this dynamic programming algorithm significantly. You can keep with the exact same 2-D array with the exact same semantics. Again, each index is going to correspond to the subproblem with the optimal binary search tree between items i and j inclusive. But, you can actually fill up this entire table all n squared entries using only a total of n squared time. That is on average, constant work per subproblem. So this fun fact, it's very clever. It's really more intricate than what we discussed in this video here, but it, it's not impossible to read. So if you're interested, I encourage you to go back to the original papers or search the web for some other resources on this optimized speed up version of this dynamic programming algorithm. I mean, at a very high level, sort of from 30,000 feet, the goal is to avoid doing this breadth-first search over all possible routes in every single subproblem. And it turns out there's structure, nice structure in this optimal binary search tree problem that allows you to piggyback on the work done in smaller subproblems. So, in smaller subproblems, you already searched over a bunch candidate roots and it turns out using the results of those previous breadth-first search is, you can make inferences about which subset of the current set of roots might conceivably be the ones that determine the recurrence. And so, that lets you avoid searching over all of the possible candidates for the roots, instead focusing just on a very small set. In fact, the average, on average, constant number of possible roots over all of the subproblems. And needless to say, this speeding up of the running time from cubic to quadratic really significantly increases the problem sizes that you can now apply this algorithm to. So now, instead of being stuck in the hundreds, you'd certainly be able to solve problem sizes in the 1000's, possibly even in the 10,000's using this quadratic time algorithm. Very cool.

Coursera provides universal access to the world’s best education, partnering with top universities and organizations to offer courses online.