
In this sequence of videos, we'll discuss our last but not least data structure, namely the Balanced Binary Search Tree. As with our discussion of other data structures, we'll begin with the what. That is, we'll take the client's perspective and ask what operations are supported by this data structure, and what you can actually use it for. Then we'll move on to the how and the why: we'll peer under the hood of the data structure, look at how it's actually implemented, and then use that understanding of the implementation to see why the operations have the running times that they do. So what is a Balanced Binary Search Tree good for?

Well, I recommend thinking about it as a dynamic version of a sorted array. That is, if you have data stored in a Balanced Binary Search Tree, you can do pretty much anything with the data that you could if it were just a static sorted array, but in addition, the data structure can accommodate insertions and deletions; it can handle a dynamic set of data that changes over time. So to motivate the operations that a Balanced Binary Search Tree supports, let's just start with a sorted array and look at some of the things you can easily do with data that happens to be stored in such a way. So let's think about an array that has numerical data, although generally, as we've said, in data structures the numbers are associated with other data that's what you actually care about, and the numbers are just unique identifiers for the records. So these might be employee ID numbers, social security numbers, packet ID numbers in a networking context, etcetera.

So what are some things that are easy to do given that your data is stored as a sorted array? Quite a bunch of things, it turns out. First of all, you can search, and recall that searching in a sorted array is generally done using binary search. This is how we used to look up phone numbers when we had physical phone books: you'd start in the middle of the phone book; if the name you were looking for was less than the midpoint, you'd recurse on the left-hand side, otherwise you'd recurse on the right-hand side. As we discussed back in the Master Method lectures long ago, this is going to run in logarithmic time. Roughly speaking, every time you recurse, you've thrown out half of the array, so you're guaranteed to terminate within a logarithmic number of iterations; thus binary search takes logarithmic time.
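To make the procedure concrete, here is a minimal binary search sketch in Python. It follows the recurse-left/recurse-right idea just described, written iteratively; aside from the 17, 23, and 30 that appear in the lecture's running example, the array values are made up for illustration.

```python
def binary_search(arr, target):
    """Return the index of target in sorted arr, or -1 if absent. O(log n)."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2      # probe the middle element
        if arr[mid] == target:
            return mid
        elif target < arr[mid]:
            hi = mid - 1          # "recurse" on the left half
        else:
            lo = mid + 1          # "recurse" on the right half
    return -1

keys = [3, 6, 10, 11, 17, 23, 30, 36]  # illustrative sorted data
print(binary_search(keys, 23))   # 5 (the 0-indexed position of 23)
print(binary_search(keys, 21))   # -1 (21 is not present)
```

Each iteration discards half the remaining range, so the loop runs at most a logarithmic number of times.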

Something else we discussed in previous lectures is the selection problem. Previously, we discussed this in the much harder context of unsorted arrays. Remember, in the selection problem, in addition to the array you're given an order statistic. So if your target order statistic is seventeen, that means you're looking for the seventeenth smallest number stored in the array. In previous lectures, we worked very hard to get a linear-time algorithm for this problem in unsorted arrays. Now, in a sorted array, suppose you want to know the seventeenth smallest element. Pretty easy problem: just return whatever element happens to be in the seventeenth position of the array. Since the array is sorted, that's where it is, so no problem. With the array already sorted, you can solve the selection problem in constant time. Of course, two special cases of the selection problem are finding the minimum element of the array, which is just the order statistic problem with i = 1, and the maximum element, which is just i = n. These correspond to returning the element in the first position and the last position of the array, respectively. Well, let's do some more brainstorming. What other operations could we implement on a sorted array? Well, here are a couple more.

There are operations called the predecessor and successor operations, and the way these work is that you start with one element. Say you start with a pointer to the 23, and you want to know where in this array the next smallest element is; that's the predecessor query. The successor operation returns the next largest element in the array. So the predecessor of the 23 is the 17, and the successor of the 23 would be the 30. Again, in a sorted array these are trivial, right? You just know that the predecessor is one position back in the array and the successor is one position forward. So given a pointer to the 23, you can return the 17 or the 30 in constant time. What else? Well, how about the rank operation? We haven't discussed this operation in the past. What rank asks is how many keys stored in the data structure are less than or equal to a given key. So, for example, the rank of 23 would be equal to 6.

Because 6 of the 8 elements in the array are less than or equal to 23. And if you think about it, implementing the rank operation is really no harder than implementing search. All you do is search for the given key, and wherever the search terminates in the array, you just look at that position, and boom, that's the rank of the element. So, for example, if you do a binary search for 23, then when it terminates you discover that it's in position number six, so you know the rank is six. If you do an unsuccessful search, say you search for 21, then you get stuck in between the 17 and the 23, and at that point you can conclude that the rank of 21 in this array is five. Let me just wrap up the list with a final operation which is trivial to implement in a sorted array: namely, you can output, or print, the stored keys in sorted order, say from smallest to largest. Naturally, all you do here is a single scan from left to right through the array, outputting whatever element you see next. The time required is constant per element, or linear overall. So that's a quite impressive list of supported operations.
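Here is a sketch of that whole list of sorted-array operations in Python. The array again reproduces the lecture's keys 17, 23, and 30, with the remaining values made up; the helper names are illustrative, and rank is implemented with the standard-library bisect module, which does the binary search for us.

```python
import bisect

keys = [3, 6, 10, 11, 17, 23, 30, 36]  # illustrative sorted data, n = 8

def select(arr, i):
    """The i-th smallest element is simply at position i-1. O(1)."""
    return arr[i - 1]

# Min and max are the special cases i = 1 and i = n. O(1).
minimum, maximum = keys[0], keys[-1]

def predecessor(arr, j):
    """Next smaller element than the one at index j: one step back. O(1)."""
    return arr[j - 1] if j > 0 else None

def successor(arr, j):
    """Next larger element than the one at index j: one step forward. O(1)."""
    return arr[j + 1] if j < len(arr) - 1 else None

def rank(arr, k):
    """Number of keys <= k, found by binary search. O(log n)."""
    return bisect.bisect_right(arr, k)

print(select(keys, 6))                           # 23, the 6th smallest
print(minimum, maximum)                          # 3 36
print(predecessor(keys, 5), successor(keys, 5))  # 17 30 (neighbors of 23)
print(rank(keys, 23), rank(keys, 21))            # 6 5
print(*keys)                                     # one linear scan, sorted output
```

Note how rank works for the unsuccessful search too: bisect_right for 21 stops between the 17 and the 23 and reports 5.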

Could you really be so greedy as to want still more from our data structure? Well, yeah, certainly. We definitely want more than just what we have on this slide, the reason being that these are operations on a static data set, one which is not changing over time. But the world in general is dynamic. For example, if you are running a company and keeping track of the employees, sometimes you get new employees and sometimes employees leave. That is, we want a data structure that not only supports these kinds of operations but also insertions and deletions. Now, of course, it's not that it's impossible to implement insert or delete in a sorted array; it's just that they're going to run way too slowly. In general, you have to copy over a linear amount of stuff on an insertion or deletion if you want to maintain the sorted array property. This linear-time performance for insertion and deletion is unacceptable unless you barely ever do those operations. So the raison d'être of the Balanced Binary Search Tree is to implement this exact same set of operations, just as rich as the set supported by a sorted array, but in addition, insertions and deletions. Now, a few of these operations won't be quite as fast: we have to give up a little bit, and instead of constant time some of them will run in logarithmic time. But we still get logarithmic time for all of these operations, linear time for outputting the elements in sorted order, and in addition we'll be able to insert and delete in logarithmic time. So let me just spell that out in a little more detail.

A Balanced Binary Search Tree will act like a sorted array, plus it will have fast, meaning logarithmic-time, inserts and deletes. So let's go ahead and spell out all of those operations. Search is going to run in O(log n) time, just like before. Select runs in constant time in a sorted array, and here it's going to take logarithmic time, so we'll give up a little bit on the selection problem, but we'll still be able to do it quite quickly. Even for the special cases of finding the minimum or the maximum in our data structure, we're going to need logarithmic time in general. Same thing for finding predecessors and successors: they're no longer constant time; they go up to logarithmic. Rank took us logarithmic time even in the sorted-array version, and it will remain logarithmic here. As we'll see, we lose essentially nothing over the sorted array if we want to output the key values in sorted order, say from smallest to largest. And crucially, we have two more fast operations compared to the sorted-array data structure. We can insert things: if you hire a new employee, you can insert them into your data structure; if an employee decides to leave, you can remove them from the data structure. You do not have to spend linear time like you did for the sorted array; you only have to spend logarithmic time, where as always n is the number of keys being stored in the data structure.

So the key takeaway here is that if you have data whose keys come from a totally ordered set, like, say, numeric keys, then a Balanced Binary Search Tree supports a very rich collection of operations. So if you anticipate doing a lot of different processing using the ordering information of these keys, then you really might want to consider a Balanced Binary Search Tree to maintain them. One thing to keep in mind, though, is that we have seen a couple of other data structures which don't do quite as much as balanced binary search trees, but what they do, they do better. We just discussed, on the last slide, the sorted array: if you have a static data set and don't need inserts and deletes, then by all means don't bother with a Balanced Binary Search Tree; just use a sorted array, because it will do everything super fast. But we also saw dynamic data structures which don't do as much, but what they do, they do very well.

So we saw a heap, and what a heap is good for is this: it's just as dynamic as a search tree, allowing insertions and deletions, both in logarithmic time. And in addition, it keeps track of the minimum element or the maximum element. Remember, in a heap you can choose whether you want to keep track of the minimum or the maximum, but unlike a search tree, a heap does not simultaneously keep track of both the minimum and the maximum. So if you just need those three operations, insertion, deletion, and remembering the smallest, and this would be the case, for example, in a priority queue or scheduling application as discussed in the heap videos, then a Binary Search Tree is overkill and you might want to consider a heap instead. In fact, the benefits of a heap don't show up in big-O notation here; both have logarithmic operation times, but the constant factors, both in space and time, are going to be better with a heap than with a Balanced Binary Search Tree.
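As a quick illustration of that trade-off, here is a sketch of the three heap operations using Python's standard-library heapq module, which maintains a min-heap; the keys are made up for illustration.

```python
import heapq

h = []
for key in [23, 6, 30, 17]:
    heapq.heappush(h, key)   # insert: O(log n) per push

print(h[0])                  # 6: peeking at the minimum is O(1)
print(heapq.heappop(h))      # 6: extract-min is O(log n)
print(h[0])                  # 17: the new minimum after the pop
```

Note that, matching the lecture's point, heapq tracks only the minimum; to track the maximum instead you would store negated keys, and it never tracks both simultaneously.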

The other dynamic data structure that we discussed is the hash table. What hash tables are really, really good at is handling insertions and searches, that is, lookups. Sometimes, depending on the implementation, they also handle deletions really well. So if you don't actually need to remember things like minima, maxima, or ordering information on the keys, and you just have to remember what's there and what's not, then the data structure of choice is definitely the hash table, not the Balanced Binary Search Tree. Again, the Balanced Binary Search Tree would be fine, and it would give you logarithmic lookup time, but it's kind of overkill for the problem if all you need is fast lookups. A hash table, recall, will give you constant-time lookups, so that will be a noticeable win over the Balanced Binary Search Tree. But if you want a very rich set of operations for processing your data, then the Balanced Binary Search Tree could be the optimal data structure for your needs.
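To close with a small preview of the "how" that the upcoming videos cover, here is a minimal sketch of a plain, unbalanced binary search tree supporting just search and insert. This is not the lecture's implementation: real balanced variants, such as red-black trees, add rebalancing on insert so that the tree height, and hence all of the operations above, stays O(log n). The keys are illustrative.

```python
class Node:
    """One tree node: a key plus left and right child pointers."""
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Insert key into the (unbalanced) BST rooted at root; return the root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def search(root, key):
    """Walk down the tree, going left or right by comparison. O(height)."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root is not None

root = None
for k in [17, 6, 23, 30]:
    root = insert(root, k)

print(search(root, 23))   # True
print(search(root, 21))   # False
```

Without rebalancing, inserting keys in sorted order would degrade this tree to a linked list with linear-height operations, which is exactly the problem the balancing machinery in the next videos solves.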
