
So, in this video, we're gonna take it at least partway to the next level for the heap data structure. That is, we'll discuss some of the implementation details; i.e., how would you code a heap data structure from scratch? So remember what the point of a heap is. It's a container, and it contains objects, and each of these objects, in addition to possibly lots of other data, should have some kind of key. And we should be able to compare the keys of different objects: you know, Social Security numbers for different employees, edge weights for different edges in a network, or timestamps on different events, and so on. Now remember, for any data structure, the number one thing you should remember is what operations it supports (i.e., what the data structure is good for), and also what running time you can count on for those operations. So something
we promised in a previous video, and what will dictate the implementation in this video, are the two primary operations that a heap exports. First of all, you can insert stuff into it, and it takes time logarithmic in the number of objects that the heap is storing. And second of all, you can extract an object that has the minimum key value. Again, we're going to allow duplicates in our heaps, so if there are multiple objects that share a common minimum key value, the heap will return one of those; it's unspecified which one. As I mentioned earlier, you can also dress up heaps with additional operations: you can do batch inserts in linear rather than logarithmic time, and you can delete from the middle of the heap. I'm not gonna discuss those in this video; I'm just going to focus on how you'd implement insert and extract-min. If you wanna know how heaps really work, it's important to
keep in mind simultaneously two different views of a heap: one as a tree, and one as an array. So we're gonna start on this slide with the tree view. Conceptually, this will be useful to explain how the heap operations are implemented. Conceptually, we'll think of a heap not just as any old tree, but as a tree that's rooted; it'll be binary, meaning that each node will have zero, one, or two children; and third, it will be as complete as possible. So let me draw for your amusement an as-complete-as-possible binary tree that has nine nodes. If the tree had only seven nodes, it would have been obvious what "as complete as possible" means: we just would have had three completely filled-in levels. If it had fifteen nodes, it would have had four completely filled-in levels. If you're in between these two numbers that are powers of two minus one, well, we're going to call the tree as complete as possible if, in the bottom level, you fill in the leaves from left to right. So here the two extra leaves on the fourth level are both pushed as far to the left as possible. So, in our minds, this is how we visualize heaps. Let me next define the heap property. This
imposes an ordering on how the different objects are arranged in this tree structure. The heap property dictates that at every single node of this tree (it doesn't matter if it's the root, a leaf, or an internal node, whatever), at every node x, the key of the object stored at x should be no more than the keys of x's children. Now, x may have zero children if it's a leaf, it may have one child, or it may have two children; in all of those cases, zero, one, or two children, all of those children's keys should be at least the key at x. For example, here is a heap with seven nodes. Notice that I am allowing duplicates: there are three different objects that have the key value four in this heap. Another thing to realize is that while the heap property imposes useful structure on how the objects can be arranged, it in no way uniquely pins down their arrangement. So this exact same set of seven keys could be arranged differently and it would still be a heap. The important thing is that, in any heap, the root has to have a minimum-value key; just like in these two arrangements of these seven keys, the root is always a four, the minimum key value. So that's a good sign, given that one of the main operations we're supposed to quickly implement is to extract the minimum value. So at least we know where it's going to be: it's gonna be at the root of the heap. So while in our
minds we think of heaps as organized in a tree fashion, we don't literally implement them as trees. In something like search trees, you actually have pointers at each node, and you can traverse pointers to go from a parent to its children, from the children to the parents, yada, yada, yada. It turns out it's much more efficient in a heap to just directly implement it as an array. Let me show you by example how a tree like we had on the last slide maps naturally onto an array representation. So let's look at a slightly bigger heap, one that has nine elements, and let me draw an array with nine positions, labeled one, two, three, all the way up to nine. The way we're going to map this tree, which is in our mind, to this array implementation is really very natural: we're just going to group the nodes of this tree by their level. So the root is gonna be the only node at level zero, the children of the root are level one, their children constitute level two, and then we have a partial level three, which is just the last two nodes here. And now we just stick these nodes into the array, one level at a time. So the root winds up in the privileged first position; that's going to be the object which is the first copy of the four. Then we put in the level-one objects, the second copy of the four and the eight; then we put in level two, which has our third four along with the two nines; and then we have the last two nodes from level three rounding out the penultimate and final positions of the array. And you might be wondering how it is we seem to be having our cake and eating it, too. On the one hand, we have this nice, logical tree structure; on the other hand, we have this array implementation, and we're not wasting any space on the usual pointers you would have in a tree to traverse between parents and children. So where's the free lunch coming from? Well, the reason is that because we're able to keep this binary tree as balanced as possible, we don't actually need pointers to figure out who is whose parent and who is whose child. We can just read that off directly from the positions in the array. So let me be a little bit more specific. Say you have a node in the i-th position, and I'm assuming here that i is not one, right? The root does not have a parent. But any other object in position i does have a parent, and what that position is depends in a simple way on whether i is even or odd. If i is even, then the parent is just the object in position i/2; and if i is odd, then it's also going to be i/2, but okay, that's a fraction, so we take the floor, that is, we round down to the nearest integer. So for example, the objects in positions two and three have as their parent the object in position one, those in four and five have the one in position two as their parent, six and seven have as their parent the object in position three, and so on. And of course we can invert this function: we can equally easily determine who the children are of a given node. If we have an object in position i, then you notice that the children of i are gonna be at positions 2i and 2i+1. Of course those may be empty; a leaf doesn't have any children, and there may be one node that has only one child.
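These position calculations are simple enough to state directly in code. Here's a minimal sketch in Python, using the lecture's 1-based positions; the helper names `parent`, `left_child`, and `right_child` are my own labels, not something from the video:

```python
def parent(i):
    # Parent of position i (1-based); the root (i = 1) has no parent.
    # For even i this is exactly i/2; for odd i, integer division
    # gives the floor of i/2, as described in the lecture.
    return i // 2

def left_child(i):
    # Left child of position i, if it exists in the heap.
    return 2 * i

def right_child(i):
    # Right child of position i, if it exists in the heap.
    return 2 * i + 1
```

For instance, `parent(2)` and `parent(3)` are both 1, matching the example of positions two and three having the root as their parent.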

But in the common case of an internal node, it's gonna have two children, and they're gonna be in positions 2i and 2i+1. So rather than traversing pointers, it's very easy to go from a node to its parent or to either one of its children, just by doing the appropriate trivial calculation on its position. So this slide illustrates some of the lower-level reasons that heaps are quite a popular data structure. The first one is just in terms of storage: we don't need any overhead at all. We have these objects, and we're storing them directly in an array, with no extra space. Second of all, not only do we not have to have space for pointers, but we don't even have to do any traversing; all we do are these really simple divide-by-two or multiply-by-two operations, and using bit-shifting tricks those can be implemented extremely quickly. So in the next two slides, let me indicate, at a high level, how you would implement the two exported operations, namely insertion and extract-min, in time logarithmic in the size of the heap. Rather than give you any pseudocode, I'm just going to show you how these work by example; I think it will be obvious how they extend to the general case, and just based on this discussion, you'll be able to code up your own versions of insert and extract-min, if you so desire. So let's redraw the 9-node heap that we had on the previous slide. Again, I'm gonna keep drawing it as a tree and I'm gonna keep talking about it as a tree, but always keep in mind that the way it's really implemented is in terms of this array, and when I talk about the parent of a node, what that means is you go to the appropriate position given the position of the node from where you started. So let's suppose
we have an existing heap, like this blue heap here, and we're called upon to insert a new object, let's say with a key value k. Now remember, heaps are always supposed to be as perfectly balanced as possible. So if we want to maintain that property, there's really only one place we can try to put the new key k, and that's as the next leaf; that is, it's going to be the new rightmost leaf on the bottom level. Or, in terms of the array implementation, we just stick it in the first empty slot in the array, and if we keep track of the array size, we of course know in constant time where to put the new key. Now, whether or not we can get away with this depends on what the actual key value k is. But, you know, for starters we can say: what if we insert a key that's seven? Well, then we get to say, whew, we're done. The reason we're done is cuz we have not violated the heap property; it is still the case that every node has key no bigger than that of its children. In particular, this third copy of a four picked up a new child, but the new key seven is bigger than its own key four. So you can imagine that maybe we get lucky with another insert. Maybe the next insertion is a ten, and again we put that in the next available spot in the last level, and that becomes the second child of the third copy of the four. Again we have no violation of the heap property; no worries, we've still got a heap. And in these lucky events, insertion is even taking constant time: really, all we're doing is putting elements at the end of an array and not doing any rearranging. So where it gets interesting is when we do an insertion that violates the heap property. Let's suppose we do yet another insertion, a five, and it becomes the left child of this twelve. Now we got a
problem. So now we have an as-perfectly-balanced-as-possible binary tree, but the heap property is not satisfied. In particular, it's violated at the node twelve: it has one child, and the key of that child is less than its own key. That's no good. So is there some way we can restore the heap property? Well, a natural idea is just to swap the positions of the five and the twelve, and that's something that of course can be done in constant time, cuz again, from a node we can go to the parent or the child in constant time, just with a suitable trivial computation. So we say, okay, for starters we put the five at the end, but that's no good, so we're gonna swap the five with the twelve. And now we see we're not out of the woods. No longer is there a heap violation at the node twelve; that's been fixed, we've made it a leaf. But we've pushed the heap violation up: now instead it's the eight that has a problem. The eight used to have two children, with keys twelve and nine, and that was fine. Now the eight has two children with keys five and nine. The five is less than the eight; that's a violation of the heap property. But again, that's the only violation of the heap property. There's no other node you could have screwed up, because eight was the only node whose children we messed around with. Alright, so now we just do it again: we once more locally fix
the heap violation by swapping the five with the eight. And now we see we've restored order. The only place where there could possibly be a violation of the heap property is at the root: when we did this swap, the only node whose children we really messed with was the root four, and fortunately its new child has key five, which is bigger than it. One subtle point: you might be thinking that in addition to possibly screwing up the root by messing around with its children, maybe we could have screwed up the twelve by messing around with its parent. Alright, its parent used to be five, and now its parent is eight, so is there some possibility that its parent would all of a sudden have a key bigger than it? But if you think about it, this eight and this twelve were in a parent-child relationship in the original heap, right? Back in the blue heap, the twelve was under the eight, and now the twelve is under the eight yet again. Since we had the heap property for that pair before, we certainly have it now. So in general, as you push this five up the tree, there's only going to be one possible edge that could be out of order, and that's between where the five currently resides and whatever its parent is. When the five's parent was twelve, that was a violation; when the five's parent was eight, that was a violation; but now that we've pushed it up two levels and the five's parent is four, that's not a violation, because four is less than five. So in general, step two of insertion is: you do this swap, which is called a lot of different things. I'm gonna call it bubble-up, because that's how I learned it more years ago than I care to admit, but this is also sometimes called sift-up, heapify-up, and so on. So now I've told you just how to implement insertions by
repeated bubble-ups in a heap data structure. And this is really how it works; there's nothing important I haven't told you. But I'm not going to fill in all the details; I'll encourage you to do that on your own time, if it's something that interests you. The two main things that you should check are, first of all, that this bubbling-up process is gonna stop, and that when it stops, it stops with the heap property restored. The second thing that needs to be checked, which I think is easier to see, is that we do have the desired running time, logarithmic in the number of elements in the heap. The key observation is that, because this is a perfectly balanced binary tree, we know exactly how many levels there are: basically log base two of n levels, where n is the number of items in the heap. And what is the running time of this insertion procedure? Well, you only do a constant amount of work at each level, just a comparison and a swap; in the worst case, you'll have to swap at every single level, and there's a logarithmic number of levels. So that's insertion. Let's now talk about how
we implement the extract-min operation. Again, I'm gonna do this by example, and it's gonna be by repeated applications of a bubble-down procedure. So the extract-min operation is responsible for removing from the heap an object with minimum key value and handing it back to the client on a silver platter. So we pretty much have to rip out the root; remember, the minimum is guaranteed to be at the root. That's how the extract-min subroutine has to begin: we just pluck out the root and hand it back to the client. This removal of course leaves a gaping hole in our tree structure, and that's no good. One of the invariants we're responsible for maintaining is that we always have an as-perfectly-balanced-as-possible binary tree, and if you are missing a root, you certainly don't have an almost perfect binary tree. So what are we going to do about it? How do we fill this hole? Well, there's pretty much only one node that could fill this hole without causing other problems with the tree structure, and that is the very last node, the rightmost leaf at the bottom level. So the simple fix is to swap that up and have it take the place of the original root. In this case, the thirteen is going to get a massive promotion and get teleported all the way up to be the new root of this tree. So now we've resolved our structural challenges: we again have an as-perfectly-balanced-as-possible binary tree. But of course, now we've totally screwed up the heap property, right? The heap property says that at every node, including the root, the key value at that node has to be no more than both of the children's, and now it's all messed up. Right, so at the root, the key value is actually bigger than both of the children's.

And matters are a little bit trickier than they were with insertion. When we inserted at the bottom, because every node has a unique parent, if you wanna push a node upward in the tree, there's sort of only one place you can go: all you can do is swap with your parent, unless you're going to try to do something really crazy. If you want to do something local, you pretty much only have a unique parent you can swap with. Now, when you're trying to push a node down to its rightful position in the tree, there are two different swaps you could do, one with the left child and one with the right child, and the decision that we make matters. To see that concretely, let's think about this example. There's this thirteen at the root, which is totally not where it should be, and there are the two children, the four and the eight, and we could try swapping it with either one. So suppose we swap it in a misguided way with the right child, the eight. Now the eight becomes the root, and the thirteen gets pushed down a level. On the one hand, we made some progress, because now at least we don't have a violation between the thirteen and the eight. On the other hand, we still have violations involving the thirteen: the thirteen is still violated with respect to the twelve and the nine. And moreover, we've created a new problem between the eight and the four, right? Now that the eight is the root, it's still bigger than its left child, this four. So it's not even clear we made any progress at all when we swapped the thirteen with the eight, okay? So that was a bad idea, and if you think about what made it a bad idea, the stupid thing was to swap it with the larger child. That doesn't make any sense; we really want to swap it with the smaller child. Remember, every node should have a key no bigger than both of its children's. So if we're going to swap up either the four or the eight, one of those is going to become the parent of the other. The parent is supposed to be smaller, so evidently we should take the smaller of the two children and swap the thirteen with that.

So we should swap the thirteen with the four, not with the eight. And now we observe a phenomenon very much analogous to what we saw in insert. When we were bubbling up during insertion, it wasn't necessarily that we fixed violations of the heap property right away; we would fix one and then introduce another one that was higher up in the tree, and we had confidence that eventually we could push this violation up to the root of the tree and squash it, just like we're trying to win a game of Whack-a-Mole. Here it's the same idea, just in the opposite direction. So we swap the thirteen with the four. It's true we've created one new violation of the heap property, again involving the thirteen and its children, nine and four; but that's the only one, and we've pushed the heap violation further down the tree. Hopefully, again like in Whack-a-Mole, we'll squash it at the bottom. So after swapping the thirteen and the four, now we just gotta do the same thing. We say: okay, we're not done, we still don't have a heap; this thirteen is bigger than both of its children. But now, with our accumulated wisdom, we know we should definitely swap the thirteen with the four; we're not gonna try swapping with the nine, that's for sure. So we move the four up here, and the thirteen takes the four's old place. Boom! Now we're done.
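The whole bubble-down process just traced can be sketched in a few lines of Python. This is only a sketch under conventions of my own, not code from the video: the heap lives in a 1-based array `heap[1..n]` (index 0 unused, so the children of position i sit at 2i and 2i+1), keys are compared directly, and `bubble_down` is my name for the procedure:

```python
def bubble_down(heap, i, n):
    # heap[1..n] is the array view of the heap (heap[0] is unused).
    # Push the key at position i down by repeatedly swapping it with
    # its smaller child, until neither child is smaller.
    while True:
        left, right = 2 * i, 2 * i + 1
        smallest = i
        if left <= n and heap[left] < heap[smallest]:
            smallest = left
        if right <= n and heap[right] < heap[smallest]:
            smallest = right
        if smallest == i:
            return  # no child is smaller: heap property restored here
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest
```

For the running example (after the root four is removed and the thirteen promoted), `heap = [None, 13, 4, 8, 9, 4, 12, 9, 11]` and `bubble_down(heap, 1, 8)` leaves a four at the root and sinks the thirteen two levels down to a leaf, exactly as in the two swaps above.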

So now we have no violations remaining. The thirteen in its new position has no children, so there's no way it can have any violations; and the four, because it was the smaller child, is no bigger than the nine, so we haven't introduced a new violation there. And again we have these consecutive fours, but we know that's not gonna be a problem, because those were consecutive fours in the original heap as well. So you won't be surprised to hear that this procedure, by which you push something down by swapping it with its smaller child, is called bubble-down, and extract-min is nothing more than taking this last leaf, promoting it to the top of the tree, and bubbling down until the heap violation has been fixed. So again, on a conceptual level, that's all of the ingredients necessary for a complete from-scratch implementation of extracting the minimum from a heap, and as before, I'll leave it to you to check the details. First of all, you should check that this bubble-down has to halt at some point, and that when it halts, you do have a bona fide heap: the heap property is definitely restored. And second of all, the running time is logarithmic. Here the running-time analysis is exactly the same as before: we already observed that the height of a heap, because it's as balanced as possible, is essentially log base two of the number of elements in the heap, and in bubbling down, all you do is a constant amount of work per level, just a couple of comparisons and a swap. So that's a peek at what's under the hood in the heap data structure, a little bit about the guts of its implementation. After having seen this, I hope you feel like a little bit more hard-core of a programmer, a little bit more hard-core of a computer scientist.
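To close the loop, here is one way the ingredients from this video might be assembled into a from-scratch heap. This is a sketch under assumptions of my own (the `MinHeap` class name, bare keys rather than objects with keys, and a 1-based Python list with an unused slot at index 0, so the parent/child arithmetic matches the lecture's formulas):

```python
class MinHeap:
    """A from-scratch binary min-heap in a 1-based array: the parent of
    position i is i // 2, and the children sit at 2i and 2i + 1."""

    def __init__(self):
        self._a = [None]  # index 0 is an unused placeholder

    def __len__(self):
        return len(self._a) - 1

    def insert(self, key):
        # Step 1: stick the new key in the first empty slot, i.e. as
        # the new rightmost leaf on the bottom level.
        self._a.append(key)
        # Step 2: bubble up while the key is smaller than its parent.
        i = len(self._a) - 1
        while i > 1 and self._a[i] < self._a[i // 2]:
            self._a[i], self._a[i // 2] = self._a[i // 2], self._a[i]
            i //= 2

    def extract_min(self):
        # Pluck out the root, promote the last leaf, then bubble it down
        # by repeatedly swapping with the smaller child.
        if len(self) == 0:
            raise IndexError("extract_min from an empty heap")
        root = self._a[1]
        last = self._a.pop()
        if len(self) > 0:
            self._a[1] = last
            i, n = 1, len(self)
            while True:
                left, right, smallest = 2 * i, 2 * i + 1, i
                if left <= n and self._a[left] < self._a[smallest]:
                    smallest = left
                if right <= n and self._a[right] < self._a[smallest]:
                    smallest = right
                if smallest == i:
                    break
                self._a[i], self._a[smallest] = self._a[smallest], self._a[i]
                i = smallest
        return root
```

Inserting keys one by one and then repeatedly calling `extract_min` hands them back in sorted order, which is a quick sanity check that both the bubble-up and bubble-down steps preserve the heap property.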
