So let me explain what I hope is clear and what is maybe unclear at this

juncture. I hope what's intuitively clear is that

the bottom of approaches, a systematic way to build trees that have a prescribed

set of leaves. So what do we want?

We want trees whose leaves are labeled with the symbols of the alphabet sigma.

So if we have an alphabet with n symbols, we're going to start with just the N

leaves. What does a merger do?

Well, there's two things. First of all, it introduces a new

internal node, an unlabelled node. And secondly, it takes two of our old

subtrees and fuses them into one, merges them into one.

We take two subtrees, we make one the left child of this new internal node, the

other, the right child of this new internal node.

So that drops the number of subtrees we're working with by one.

So if we start with the N leaves and we do N minus one successive mergers, what

happens? Well on the one hand, we introduce N

minus one new unlabeled internal nodes. And on the other hand, we construct a

single tree. And the leaves of this tree that we get

are in one to one correspondence with the alphabet letters as desired.

Now what I don't expect you to have an intuition for is what should we be

merging with what and why? Forget about, you know, how do we get an

optimal tree at the end of the day. I mean, even just if we wanted to design

a greedy algorithm. If we just wanted to make a myopic

decision, that looks good right now, how would we even do that?

What's our greedy criteria that's going to guide us to merge a particular pair of

trees together? So we can re-frame this quandary in the

same kind of question we asked for minimum cost spanning trees and really

more generally with greedy algorithms. When you're making irrevocable decisions

which strikes fear in your heart, is that this decision will come back and haunt

you later on. You'll only realize at the end of the

algorithm that you made some horrible mistake early on in the algorithm.

So just as for MST's, we ask, you know, when can we be sure that including an

edge irrevocably is not a mistake, it's safe in the tree that we're building?

Here we want to ask, you know, we have to do a merger, we want to do successive

mergers and how do we know that a merger is safe?

That it doesn't prevent us from eventually computing an optimum solution.

Well here's the way to look at things that will at least give us an intuitive

conjecture for this question. We'll save the proof for the next video.

So what are the ramifications when we merge two subtrees, each containing a

collection of symbols? Well, when we merge two subtrees, we

introduce a new internal node which unites these two subtrees under them, and

if you think about it, at the end of the day on the final tree, this is yet

another internal node that's going to be on the root to leaf path,

for all of the leaves in these two sub trees.

That is, if you're a symbol and you're watching your subtree get merged with

somebody else. You're bummed out, your like, man that's

another bit in my encoding. That's yet one more node I have to pass

through to get back to the root. I think this will become even more clear

if we look at an example. So naturally, we'll use our four symbol

alphabet A, B, C, D. And initially, before we've merged

anything, each of these is just its own leaf A, B.

C, D. So there's no internal nodes above 'em.

In the sense, everybody's encoding length at the beginning is zero bits.

So now, imagine we've merged C and D, we introduce a new internal node, which is

the common an-ancestor of these two leaves.

And as a result, C and D are bummed out. They said, well, there's one bit that

we're going to, to have to incur our encoding length, there's one new internal

node we're always going to have to pass through enroute back to the root of the

eventual tree. Now suppose next we merge B with the

subtree containing both C and D. Well everybody but A is bummed out about,

about this merger because we introduce another internal node, and it contributes

one bit to the encoding of each of B, C, and D.

It contributes an extra one to the encoding of C and D, and it contributes a

zero to the encoding of B. So all of their encodings in some sense

inc, get incremented, go up by one, compared to how things were before.

Now in the final iteration, we have to merge everything together.

We have no choice, there's only two sub-trees left.

So here, everybody's encoding length gets bumped up by one.

So, A finally picks up a bit at zero to encode it and B, C, and D each pick up an

additional bit of one, which is prepenned into their encodings thus far.

And, in general what you'll notice is that the final encoding length of a

symbol, is precisely the number of mergers that its subtree has to endure.

Every time your subtree gets merged with somebody else you pick up another bit in

your encoding, and that's because there's one more internal node that you're going

to have to traverse enroute to the root of the final tree.

In this example, the symbols C and D, well they got merged with somebody else

in each of the three iterations. So, they're the two symbols that wind up

with the encoding length of three. The symbol B, it didn't get merged with

anybody in the first iteration, only the second two.

That's why it has an encoding length of two.

So this is really helpful. So this, lets us relate, the operations

that our algorithm actually does, namely mergers back to the objective function

that we care about, namely the average encoding length.

Mergers increase the encoding lengths of the participating symbols by one.

So this allows us to segway into a design of a greedy heuristic for how to do

mergers. Let's just think about the very first

iteration. So we have our N original symbols, and we

have to pick two to merge. And remember the consequences of a merge

is going to be an increase in the encoding length by one bit, whichever two

symbols we're going to pick. Now we want to do is minimize the average

encoding length with respect to the frequencies that were given.

So which pair of symbols are we the least unhappy to suffer an increment to their

encoding length, was going to be the symbols that are the least frequent.

That's going to increase your averaging poding length by the least.

So that's the green merging hiuristic. Somethings gotta increase.

Pick the ones that are the least frequent to be the ones that get incremented.

So that seems like a really good idea of how to proceed in the first iteration.

The next question is, how are we going to recurse?

So, I'll let you think about that in the following quiz.