The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

Loading...

From the course by Stanford University

Greedy Algorithms, Minimum Spanning Trees, and Dynamic Programming

513 ratings

The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

From the lesson

Week 3

Huffman codes; introduction to dynamic programming.

- Tim RoughgardenProfessor

Computer Science

So I know you're probably catching your breath from that last computation.

So let's zoom out. Let's make sure we don't lose the forest

for the trees. And see that we're actually mostly there.

We're just one lemma away from finishing the Proof of Theorem.

The proof of correctness of Huffman's algorithm.

So what do we have going for us. Well, first of all we've got the

inductive hypothesis. Remember and any proof by induction you better rely very

heavily on the inductive hypothesis. What does it say?

It says when Huffman's algorithm recurses on the smaller subproblem with the

smaller alpha bit signal prime, it solves it optimally.

It returns a tree which minimizes the average encoding length with respects to

the alpha bit sigma prime in the corresponding frequencies.

So, we're going to call that tree, the output of the recursive call, T prime

hat. So let me amplify the power of this

inductive hypothesis by combining it with the computation we did in the last slot.

So what do we know so far? We know for this smallest subproblem

which frankly, we don't care about in its own right.

But, nevertheless, for this smaller sub problem, with alphabet sigma prime, the

recursive call solves it optimally. It returns to us, this tree T hat prime.

And among all trees with leaves inable according to sigma prime.

This tree is as good as it gets, it minimizes the average encoding length.

But what have we learned in the last slide?

We learned that there's a one to one correspondence between feasible

solutions, between trees for the smallest subproblem with the alphabet sigma prime,

and feasible solutions to the original problem, the one we care about, that have

a certain form in which, it just so happens that A and B are siblings, that

they share a common parent. Moreover, this correspondence preserves

the objective function value, the average encoding length or at least it preserves

it up to a constant which is good enough for our purposes.

So the upshot is, in minimizing average encoding length

over all feasible solutions for the smallest subproblem, a recursive call is

actually doing more for us. It's actually minimizing the average

encoding length for the original problem with the original alphabet sigma over a

subset of the feasible solutions, the feasible solutions in which A and B are

siblings. Now, does this do us any good?

Well, it depends, what was our original goal?

Our goal was to get the smallest average encoding if possible over all feasible

solutions in the world. And so what have we just figured out?

We figured out, well we're getting the best possible scenario amongst some

solutions. Those in which A and B happen to be

siblings. So, we're in trouble if there is no

optimal solution in which A and B are siblings.

Then it doesn't do us any good to optimize only over these crappy

solutions. On the other hand, if there is an optimal

solution lying in this set XAB in which A and B are siblings, then we're golden.

Then there's an optimal solution in this set, while recursive call is finding the

best solution in the sets. So, we're going to find an optimal

solution. So that's really the big question.

Can we guarantee that it was safe to merge A and B in the very first

iteration? Can we guarantee there is an optimal solution, a best possible tree

that also has that property? That also has A and B as siblings.

So the key level, which I approve in the next slide in which we'll complete the

proof of the correctness of the apartment spherum, is indeed there is always an

optimal solution in the set XABA an optimal solution on which A and B, the

two lowest frequency symbols are indeed siblings.

So I'll give you a full-blown proof of this Key Lemma in the next slide.

But let me first just orientate you and you know, just develop your intuition

about why you'd hope this Key Lemma would be true.

The intuition is really the same as that as when we developed the greedy algorithm

in the first place, namely when you which symbols do you want to incur the longest

encoding lengths? Well, you want them to be the lowest

frequency symbols, that you want to minimize the average.

So if you look at any old tree, there's going to be you know, some

bottommost layer. There's going to be some deepest possible

leaves. And, you really hope that A and B, the

lowest frequency leaves are going to be in that lowest level.

Now they might both be in the lowest level and not, and might not be siblings.

But, if you think about it, amongst symbols at the same level, they're all

suffering exactly the same encoding length anyways.

So you can permute them in any way you want and it's not going to affect your

average encoding length. So once you put A and B at the bottommost

level you may as well put them as siblings and it's not going to make any

difference. So, the final punchline being, you can

take any tree like say an optimal tree. And by swapping.

Changing the lowest frequency elements A and B to the bottom and to be common

siblings, you can't make the tree worse you can only make it better, and that's

going to show that there is an optimal tree with the design property where A and

B are in fact siblings. All right, so that's a reasonably

accurate intuition, but it's not a proof. Here is the proof,

so we're going to use an exchange argument.

We're going to take an arbitrary optimal tree a tree minimizing the average

encoding length. Let's call it T star,

there's may be many such trees, just pick any one, I don't care which.

And the plan is to show an even better tree or at least as good tree which

satisfies the properties that we want where A and B show up as siblings.

So lets figure out where we're going to swap the symbols A and B to.

So we've got our tree, T star. And let's look at the deepest level of T

star. And let's find our, you know, any pair of

siblings at this deepest level, call them X and Y.

So in this green tree on the right we have two choices of pairs of siblings at

the lowest level. It doesn't matter which one we pick.

So let's just say I pick the leftmost pair,

call them X and Y. So A and B have to be somewhere in this

tree. Let me just pick a couple of the leaves

to be examples for where A and B might actually be in T star.

And so now we're going to execute the exchange.

We're going to obtain a new tree T hat from T star, merely by swapping the

labels of the leaves X and A and similarly swapping the labels of the

leaves Y and B. So after this exchange we get a new valid

tree. So once again, you know, the leaves are

labeled with the various symbols of sigma.

And, moreover, by our choice of X and Y, we've now via this exchange forced the

tree to conform to the property. We've forced A and B to be siblings.

They take the original positions of X and Y which were chosen to be siblings.

So to finish the proof, we're going to show that the averaging coding length of

T hat can only be less than T star. Since T star was optimal, that means that

T hat has to be optimal as well, since it also satisfies the desired property that

A and B are siblings, that's going to complete the proof the, the Key Lemma and

therefore the proof of the correctness of Huffman's algorithm.

The reason intuitively is that when we do the swap, A and B inherit the previous

debts of X and Y and conversely X and Y inherit the old debts of A and B.

So A and B can only get worse and X and Y can only get better, and X and Y are the

ones with the higher frequencies. So the overall cost benefit analysis is a

net win and the tree only improves. But, to make that precise, let me just

show you the exact computation. So let's look at the difference between

the average encoding length given by the 3T star and that after the swap given by

the 3T hat. So analogously, to our computation on the

previous slide most of the stuff cancels because the trees T star and T hat

frankly just all that aren't all that different.

The only things that are different are the positions of the symbols A, B, X, and

Y. So we're going to have four positive

terms contributed by T star for these four symbols.

And we're going to have four negative terms contributed by the tree T hat again

for these exact same four symbols. So let me just show you,

what happens after the dust settles and after you take away all the cancellation?

You can write the eight terms that remain in a slightly sneaky but you know,

clarifying way as follows. So we're going to have one product

representing the four terms involving the symbols A and X, and then an analogous

term for the four terms involving B and Y.

So in the first term we look at the product of two differences.

On the one hand we look at the difference between the frequencies of X and A.

Remember those are the two things that got swapped with each other.

And then we also look at the difference between their depths in the original tree

T star. And then because B and Y got swapped we

have an entirely analogous term involving them.

The reason I re-wrote this difference in a slightly sneaky way, is it now becomes

completely obvious, what role the various hypotheses play.

Specifically, why is it important that A and B have the lowest possible

frequencies? We actually haven't used that yet in our

proof, but it's fundamental to our greedy algorithm.

Secondly, why did we choose X and Y to be in the deepest level of the original tree

T star. Well.

Let's start with the first fare products. The differences between, between

frequencies. So the frequency of X minus the frequency

of A, and similarly between Y and B. Well because A and B have the smallest

frequencies of them all. These differences have to be

non-negative. The frequencies of X and Y can only be

more. The second pair of differences also must

be non-negative and this is simply because we chose X and Y to be at the

deepest level. Their depths have to be at least as large

as that of A and B in the original 3T star.

Since all four differences are nonnegative, that means the sum of their

products is also nonnegative. That means the average encoding length of

T star is at least as big as T hat. So that's where T hat has to also be

optimal. It in addition satisfies the desired

property that A and B are siblings. And so that means our recursive call,

even though we've committed to merging A and B, we've committed to returning a

tree in which their siblings that was without loss, that was a safe merge.

That's why Huffman's Algorithm at the end of the day can return an optimal tree,

one that minimizes the average encoding length.

We're all possible binary prefix-free codes.

So let me just wrap up with a few notes about how to implement Huntsman's

algorithm and what kind of running time you should be able to achieve.

So if you just naively implement the pseudocode I showed you, you're going to

get a not particularly great quadratic time algorithm that moreover uses a lot

of recursion. So why is it going to be quadratic time?

Well, you have one recursive call each time.

And you always reduce the number of symbols by one.

So the total number of recursive calls is going to be linear in the number of

symbols in your alphabet in. And how much work did you do in each

recursive call, not counting the work done by later calls?

Well, you have to search to figure out the lowest frequency symbols A and B.

So that's basically some minimum computations.

So that's going to take linear time for each of the linear recursive calls.

Now the fact that we keep doing these repeated minimum computations each

recursive call, I hope triggers a light bulb, that maybe a data structure would

be helpful. Which data structure's raison d'etre is

to speed up repeated minimum computations?

Well as we've seen, for example, in Dijkstra's and Prim's, Prim's Algorithms,

that would be the heap. So of course, we need to find a heap, you

have to say what the keys are. But since we always want to know what the

symbols with the smallest frequencies are, the obvious thing to do is to use

frequencies as keys. So the only question then is what happens

when you do a merge? Well, the obvious thing is to just to add

the frequencies of those two symbols and reinsert into the heap the new meta

symbol with its key equal to what? The sum of the keys of the two elements

that you just extracted from the heap. So not only is that going to work great,

it's going to be N log N time, it's very easy to implement in iterative manner.

So that's starting to be a really blazing fast implementation of Huffman's

algorithm. In fact you can do even better.

You can boil down an implementation of Huffman's algorithm to a single

indication of a sorting sub-routine followed by a merely linear amount of

work. Now in the worst case you're still going

to be stuck asymptotically with an N log N of limitation because you can't sort

better than N log N if you know nothing about what you're sorting.

But in this implementation the constants are going to be even better, and as we

discussed in part one, there are sometimes special cases where you can

beat N log N for sorting. So for example if all, all of your data

is representable with a small number of bits, then you can do better than N log

N. So I'm going to leave this even faster

implemtation as a somewhat nontrivial exercise.

I'll give you a hint though, which is, if you do sorting as a pre-processing step

you actually don't even need to use a heap data structure for this

implementation. You can get away with an even more

primitive data structure of a queue. You might however, want to use two

queues. So, next time you find yourself in line

for coffee, that's a nice thing to think about.

How do you implement Huffman's Algorithm merely with one invocation to sorting, so

that the symbols are sorted by frequency, followed by linear work using two queues?

Coursera provides universal access to the world’s best education,
partnering with top universities and organizations to offer courses online.