The primary topics in this part of the specialization are: greedy algorithms (scheduling, minimum spanning trees, clustering, Huffman codes) and dynamic programming (knapsack, sequence alignment, optimal search trees).

From the course by Stanford University

Greedy Algorithms, Minimum Spanning Trees, and Dynamic Programming

From the lesson

Week 2

Kruskal's MST algorithm and applications to clustering; advanced union-find (optional).

- Tim Roughgarden, Professor, Computer Science

I need to begin by setting up some notation and definitions. The first one is that of rank blocks. Given a value of n, the number of objects in the data structure, we define the rank blocks as follows. 0 gets its own block, as does 1. The next block is {2, 3, 4}. The next block starts at 5, and its last entry is 2 raised to the largest rank of the previous block, that is, 2^4, also known as 16. The next block starts at 17 and goes up to 2 raised to the largest rank of the previous block, so 2^16, also known as 65536. And so on, until we have enough rank blocks to capture n. Actually, if you think about it, because ranks can only go up to log n, we only have to go up to log n, but that doesn't affect anything by more than a constant factor, so let's just go all the way up to n.

So, how many rank blocks are there? Focus just on the largest member of each rank block: the largest member of a given rank block is 2 raised to the largest member of the previous rank block. So, for the t-th rank block, the largest entry is roughly 2^2^...^2, a tower of 2s of height t. How many times do you need to exponentiate before you reach n? By definition, log*(n) times. So that's how many rank blocks there are; that's where log*(n) enters the analysis.
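To make the construction concrete, here is a small sketch; the helper names `rank_blocks` and `log_star` are my own, not from the lecture. It generates the blocks {0}, {1}, {2..4}, {5..16}, {17..65536}, ... and computes the iterated logarithm:

```python
import math

def rank_blocks(n):
    """Rank blocks covering ranks up to n: after the seeded blocks
    {0}, {1}, {2..4}, each block runs from (previous top + 1) to 2^(previous top)."""
    blocks = [(0, 0), (1, 1), (2, 4)]
    while blocks[-1][1] < n:
        prev_top = blocks[-1][1]
        blocks.append((prev_top + 1, 2 ** prev_top))
    return blocks

def log_star(n):
    """Iterated logarithm: how many times log2 must be applied to reach <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count
```

For n = 100 this yields exactly the five blocks listed above, ending at 65536, and log_star(65536) = 4; the number of blocks needed to capture n is log*(n) up to an additive constant.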

I realize this must seem a very inscrutable definition, these weird rank blocks, so let me tell you how you should think about them. They're meant to encode an intuition we had on the previous slide: we should be happy if the rank of an object's parent is a lot bigger than the rank of that object itself. Why? Because when we traverse that object's parent pointer, we make a ton of progress through the rank space, and you can only make a ton of progress through the rank space a limited number of times. That means a small number of parent traversals, which means fast finds. So these rank blocks are just a clever way of defining sufficient progress, of saying when we're going to be happy with the gap between the rank of an object and the rank of its parent. Specifically, we're happy whenever these two ranks lie in different rank blocks; we're not happy if they lie in the same rank block. For example, if you're at an object with rank 8 and its parent has rank 18, then we're happy, because 18 is in the rank block after the one that contains 8. On the other hand, if you go from rank 8 to rank 15, then we're not happy, okay? We'll call that not a big gap between the rank of the object and the rank of the object's parent. So let's build on that idea to make

another definition. Consider a given snapshot in time; that is, we've done some sequence of finds and some sequence of unions. We're going to call some objects at this point in time good and some of them bad. Here are the criteria by which you are good. First, if you're the root of your tree, you're good. If you're a direct descendant of the root, that is, if your parent is the root of your tree, then you're also good. If not, if you're deeper in the tree, then you're good only if your parent's rank is in a strictly later rank block than your own rank. So, in the previous example: if your rank is 8 and your parent's is 18, then you're good; if your rank is 8, your parent's is 15, and your parent is not a root, then you're bad. The role of this definition is to split

the work that we do into two parts. First, we do some work visiting good nodes; second, we do some work visiting bad nodes. The work visiting good nodes will be very easy to bound, no problem. We'll have to do a separate global analysis to bound the total work visiting bad nodes. This dichotomy is exactly the same as one we faced when analyzing Kruskal's algorithm, when we implemented it using the union-find data structure with eager unions. There, you recall, some parts of the work in Kruskal's algorithm were easy to bound iteration by iteration; for example, every cycle check cost only constant time. But there was a more complicated type of work, namely all of the leader pointer updates, which we bounded globally via a separate argument. The exact same thing happens here: good nodes we'll be able to bound operation by operation, whereas for bad nodes a global analysis will control the total work done over all of the operations. More precisely, I've set up the definitions so that the work done visiting good nodes is bounded by O(log* n) in every single operation.

Indeed, how many good nodes could you possibly visit during a find operation, say, from some object x? You start at x and traverse parent pointers all the way up to the root. Well, there's the root, and there's the direct descendant of the root; that's 2, so let's set those aside. What about the other good nodes you encounter on your path? By definition, when you visit a good node, the rank of its parent is in a later rank block than the rank of the node itself. That is, every time you traverse a parent pointer from a good node, you progress to a subsequent rank block. There are only log* n rank blocks, so you can only progress to a subsequent rank block log* n times. So, the total number of good nodes you see is O(log* n).

So now, let's express the total amount of work that we do over all of the find and union operations as the sum of two quantities: the work on the good nodes, which we now know is just O(log* n) per operation, plus the visits to the bad nodes, which at the moment we have no idea how big it is. So now, let's proceed to the global bound on the total amount of work performed on bad nodes; this is really the crux of the whole theorem. So, let's just review the

definitions real quick. What does it mean that you're a bad node? First of all, you're not the root. Second, you're not a direct descendant of the root; that is, you have a grandparent. And third, it is not the case that your parent's rank is in a later rank block: it is in exactly the same rank block as your own rank. That's what it means to be bad. So how much work do we spend on bad nodes? Let's analyze it one rank block at a time. Fix an arbitrary rank block; say, for some integer k, its smallest rank is k + 1 and its biggest rank is 2^k. Now, I told you what our two main building blocks were. The first one is the rank lemma; I'm going to ask you to remember that in a second. But first I want to put to use our other building block, which is that path compression increases the rank gap between objects and their parents.

That's what we're going to use right now. Specifically, consider a find operation and a bad object x that it visits. By virtue of being bad, x is not the root and is not a direct descendant of the root, so the root is a higher-up ancestor than x's parent. Therefore, x's parent will be changed in the subsequent path compression: it will be rewired to point directly to the root, a strict ancestor of its previous parent. Therefore, the rank of its new parent will be strictly bigger than the rank of its previous parent. That keeps happening every time x is visited while it looks bad: it keeps getting new parents, and those new parents keep having ranks strictly bigger than the previous ones. Well, how many times can this happen before the rank of x's parent has grown so big that it lies in a subsequent rank block? The biggest value in x's rank block is 2^k, and remember, x is a non-root, so its rank is frozen forever; it's always stuck in this rank block. Once its parent's rank gets updated, say, at least 2^k times, then that rank has to be so big that it lies in the next rank block.

At that point, x is no longer bad; its parent pointer has made so much progress that it reaches into another rank block, so now we call x good. And of course, once x becomes good in this fashion, it will be good forevermore: it is not a root and will never be a root again, its rank is frozen forever, and its parent's rank can only go up. So, once you're good, once your parent's rank is sufficiently large, it will be sufficiently large for the rest of time. Alright, so we're almost there.
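Concretely, the data structure under analysis can be sketched as follows; this is a generic union-find with union by rank and path compression (the class and method names are my own, not from the lecture). Note how find() rewires every node on the traversed path to point directly at the root, so a bad node's new parent is always a strict ancestor of its old one.

```python
class UnionFind:
    """Union by rank with path compression."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n  # a non-root's rank is frozen forever

    def find(self, x):
        # First pass: climb to the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass (path compression): rewire every visited node to the
        # root, a strict ancestor of its previous parent, so the rank of
        # each visited node's parent strictly increases.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        # Point the lower-rank root at the higher-rank one.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
```

A typical use: after union(0, 1) and union(1, 2), find(0) and find(2) return the same leader, while find(3) still returns a different one.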

Let's just make sure we don't forget anything that we have done. The total work is bounded in two parts. First, we visit O(log* n) good nodes in each operation; so, over all m operations, that's O(m log* n) for the good nodes. Plus, there's the number of visits over the m operations to the bad nodes. We're going to bound that work globally, but we'll proceed rank block by rank block. So fix one rank block, with ranks k+1 up to 2^k. What we showed on the last slide is that for every object x whose final frozen rank lies somewhere in this rank block, the number of times it can be visited while it's bad, before it becomes good forevermore, is bounded above by 2^k. So, we've now used one of our key building blocks, that path compression increases the ranks of parents; now let's use the other building block, the rank lemma.

The rank lemma, if you recall, says that at any given moment in time, and for any possible rank r, the number of objects that currently have rank r cannot be any bigger than n/2^r. So, let's use the rank lemma to upper bound how many nodes could possibly have their final frozen rank somewhere in this rank block, that is, somewhere between k+1 and 2^k. Let's just sum over all of the ranks in the rank block, starting at k+1 and going up to 2^k. By the rank lemma, for a given value of i, at most n/2^i objects eventually wind up with rank i, and by the usual geometric sum, this whole thing can be upper bounded by n/2^k. Now, this is going to look like

a magical coincidence, but of course, we made a number of definitions in the analysis; specifically, we structured the rank blocks so that this magic would happen. The number of inhabitants of a rank block, at most n/2^k, times the maximum number of visits to an inhabitant of the rank block while it's bad, 2^k, is independent of the rank block. Multiply these two things together, the number of visits per object, 2^k, and the number of objects, n/2^k, and what do we get? We get n. Now, this was only counting the visits to bad objects in a given rank block, but there aren't many rank blocks; remember, there are only log* n of them. So, the total amount of work spent visiting the bad nodes, summed over the rank blocks, is O(n log* n).
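As a quick numeric sanity check of this counting (a sketch, not part of the lecture; the helper name is hypothetical), for a block [k+1, 2^k] the rank lemma's geometric sum caps the block's population at n/2^k, and population times the per-object visit budget 2^k is at most n, independent of k:

```python
def block_population_bound(n, k):
    """Rank-lemma upper bound on objects whose final rank lies in [k+1, 2^k]:
    the sum of n / 2^i for i from k+1 to 2^k."""
    return sum(n / 2 ** i for i in range(k + 1, 2 ** k + 1))

n = 1_000_000
for k in [2, 3, 4]:
    population = block_population_bound(n, k)
    # geometric series: the sum never exceeds n / 2^k
    assert population <= n / 2 ** k
    # population times the per-object visit budget 2^k never exceeds n
    assert population * 2 ** k <= n
```

Summed over the log* n rank blocks, the n visits per block give the O(n log* n) bound on bad-node work.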

Combining the bounds for the good nodes and the bad nodes, we get O((m+n) log* n). At the beginning, I mentioned that the interesting case is when m is big Omega of n; in that case, this bound is just O(m log* n). Essentially, if you have a very sparse set of union operations, you can just apply this analysis to each of the directed trees separately.

So, that's it; that's the full story, the complete account of the brilliant Hopcroft-Ullman analysis: O(log* n) average operation time under path compression. Brilliant as it is, you can do even better. That's the subject of the next couple of videos.
