So, in this video, we're going to start reasoning about the performance of hash tables. In particular, we'll make precise the idea that, properly implemented, they guarantee constant-time lookup. So, let me just briefly remind you of the road map that we're in the middle of. We observed that every fixed hash function is subject to a pathological data set, and so we're exploring the solution of making a real-time decision of which hash function to use. We've already gone over this really quite interesting definition of universal hash functions, and that's the proposed definition of a good random hash function. Moreover, in the previous video I showed you that there are simple such families of hash functions: they don't take much space to store, and they don't take much time to evaluate. The plan for this video is to prove formally that if you draw a hash function uniformly at random from a universal family of hash functions, like the one we saw in the previous video, then you're guaranteed expected constant time for all of the supported operations.

So, here's the definition once more of a universal family of hash functions. We'll be using this definition, of course, when we prove that these hash functions give good performance. So, remember, we're talking now about a set of hash functions. These are all of the conceivable real-time decisions you might make about which hash function to use. The universe is fixed; this is something like IP addresses. The number of buckets is fixed.
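To make this concrete, here is a small Python sketch of what drawing a hash function from a family of this kind might look like. The details, treating an IP address as a 4-tuple of bytes, the particular prime, and the coefficient scheme, are illustrative assumptions, not something specified in this video:

```python
import random

N_BUCKETS = 10_007  # a prime near the "something like 10,000" buckets mentioned here

def draw_hash_function():
    """Make the real-time random decision: draw one hash function from the
    family by choosing four coefficients uniformly at random (assumed scheme)."""
    a = [random.randrange(N_BUCKETS) for _ in range(4)]
    def h(ip):
        # ip is assumed to be a 4-tuple of bytes, e.g. (173, 194, 33, 10)
        return sum(ai * xi for ai, xi in zip(a, ip)) % N_BUCKETS
    return h
```

Each call to `draw_hash_function` yields a different member of the family script H; the analysis in this video is in expectation over that draw.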

You know, that's going to be something like 10,000, say. And what it means for a family to be universal is that the probability that any given pair of items collides is as good, meaning as small, as with the gold standard of completely perfect uniform random hashing. That is, for each pair x, y of distinct elements of the universe, so for example, for each distinct pair of IP addresses, the probability of a collision, over the choice of the random hash function little h from the family script H, is no more than 1/n, where n is the number of buckets. So, if you have 10,000 buckets, the probability that any given pair of IP addresses winds up getting mapped to the same bucket is at most one in 10,000. Let me now spell out

the precise guarantee we're going to prove if you use a randomly chosen hash function from a universal family. For this video, we're only going to talk about hash tables implemented with chaining, with one linked list per bucket. We'll be able to give a completely precise mathematical analysis with this collision-resolution scheme. We're going to analyze the performance of this hash table in expectation over the choice of a hash function little h drawn uniformly from a universal family script H. So, for example, for the family that we constructed in the previous video, this just amounts to choosing each of the four coefficients uniformly at random; that's how you select a hash function uniformly at random from that family.

So, this theorem, and also the definition of universal hash functions, dates back to a 1979 research paper by Carter and Wegman. The idea of hashing dates back quite a bit before that, certainly to the 50s. So, this just kind of shows us that sometimes ideas have to percolate for a while before you find the right way to explain what's going on. Carter and Wegman provided this very clean and modular way of thinking about the performance of hashing through this universal hashing definition. And the guarantee is exactly the one that I promised way back when we talked about what operations are supported by hash tables and what kind of performance you should expect: constant-time performance. As always, with hashing,

there is some fine print, so let me just be precise about what the caveats of this guarantee are. First of all, this guarantee is necessarily in expectation; it's on average over the choice of the hash function, little h. But I want to reiterate that this guarantee does hold for an arbitrary data set. This guarantee is quite reminiscent of the one we had for the randomized Quicksort algorithm. In Quicksort, we made no assumptions about the data; it was a completely arbitrary input array, and the guarantee said that, on average over the real-time randomized decisions of the algorithm, no matter what the input is, the expected running time was n log n. Here we're saying, again, no assumptions about the data. It doesn't matter what you're storing in the hash table: in expectation over the real-time random decision of which hash function you use, you should expect constant-time performance, no matter what that data set is. So, the second caveat is something we've talked about before. Remember, the

key to having good hash table performance: not only do you need a good hash function, which is what this universality property is providing us, but you also need to control the load of the hash table. Of course, to get constant-time performance, as we've discussed, a necessary condition is that you have enough buckets to hold, more or less, the stuff that you're storing. That is, the load alpha, the number of objects in the table divided by the number of buckets, should be constant for this theorem to hold. And finally, whenever you do a hash table operation, you have to, in particular, invoke the hash function on whatever key you're given. So, in order to have constant-time performance, it had better be the case that it only takes constant time to evaluate your hash function. And that's, of course, something we also discussed in the previous video, when we emphasized the importance of having simple universal hash functions, like those random linear combinations. In general, the mathematical analysis of hash table

performance is a quite deep field, and there are some quite mathematically interesting results that are well outside the scope of this course. But what's cool is that for this theorem I will be able to provide you a full and rigorous proof. So, for hash tables with chaining, using randomly chosen universal hash functions, I'm going to now prove that you do get constant-time performance.

Right, so hash tables support various operations: Insert, Delete, and Lookup. But really, if we can just bound the running time of an unsuccessful lookup, that's going to be enough to bound the running time of all of these operations. Remember, in a hash table with chaining, you first hash to the appropriate bucket and then you do the appropriate Insert, Delete, or Lookup in the linked list in that bucket. So, the worst case, as far as traversing this linked list goes, is if you're looking for something but it's not there, because you have to look at every single element in the list, all the way to the end, before you can conclude that the lookup has failed.
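As a minimal sketch of the chaining scheme being analyzed (the class and method names are mine, not from the video): an unsuccessful lookup evaluates h once and then scans the entire list in that bucket.

```python
class ChainedHashTable:
    """Hash table with chaining: one linked list (here, a Python list) per bucket."""

    def __init__(self, n_buckets, h):
        self.h = h                        # hash function chosen at random, up front
        self.buckets = [[] for _ in range(n_buckets)]

    def insert(self, key):
        # Constant time: one hash evaluation plus an append.
        self.buckets[self.h(key)].append(key)

    def lookup(self, key):
        # One hash evaluation, then a scan of the bucket's list; an
        # unsuccessful lookup scans the whole list before failing.
        return key in self.buckets[self.h(key)]
```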

Of course, insertion, as we discussed, is always constant time. For deletion and successful searches, well, you might get lucky and stop early before you hit the end of the list. So, all we're going to bother to analyze is unsuccessful lookups; that will carry over to all of the other operations.

A little more precisely, let's let S be the data set, that is, the objects that we are storing in our hash table. And remember that to get constant-time lookup, it really needs to be the case that the load is constant, so we are assuming that the size of S is big-O of the number of buckets n. And let's suppose we are searching for some other object which is not in S; call it x. Again, I want to emphasize we are making no assumptions about what this data set S is, other than that its size is comparable to the number of buckets.

Conceptually, doing a lookup in a hash table with chaining is a very simple thing: you just hash to the appropriate bucket and then you scan through the linked list in that bucket. So, it's very easy to write down what the running time of this unsuccessful lookup is. It's got two parts. The first thing you do is evaluate the hash function to figure out the right bucket; and again, remember we're assuming that we have a simple enough hash function that it takes constant time. Now, of course, the magic of hash functions is that, given this hash value, we can zip right to where the linked list is to search for x, using random access into our array of buckets. So, we go straight to the appropriate place in our array of buckets and we just search through the list, ultimately failing to find what we're looking for. Traversing a linked list, as you all know, takes time proportional to the length of the list. So, we find something that we discussed informally in the past, which is that the running time of hash table operations implemented with chaining is governed by the list lengths. That's really the key quantity we have to understand, so let's give it a name, Capital L; it's important enough to deserve a name. What I want you to appreciate at this point is that this key quantity, Capital L, the length of the list in x's bucket, is a random variable.
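To see that Capital L really is a random variable, here is a tiny illustration; the data set and the two hash functions are hypothetical, chosen just for the example. The same data set S and lookup key x give different list lengths under different choices of h:

```python
S = [10, 20, 30, 40]   # the stored data set (hypothetical small example)
x = 5                  # the key we look up unsuccessfully (not in S)

def list_length_in_xs_bucket(h, n_buckets=4):
    buckets = [[] for _ in range(n_buckets)]
    for y in S:
        buckets[h(y)].append(y)
    return len(buckets[h(x)])   # this is the random variable Capital L

h1 = lambda k: k % 4           # under h1, x's bucket happens to be empty: L = 0
h2 = lambda k: (k // 10) % 4   # under h2, the object 40 lands in x's bucket: L = 1
```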

It is a function of which hash function little h we wind up selecting in real time. For some choices of our hash function little h, this list length might be as small as zero, but for other choices of the hash function h, this list length could be bigger. This is exactly analogous to Quicksort, where, depending on the real-time decision of which random pivot element you use, you're going to get a different number of comparisons, a different running time. So, the list length, and hence the time for the unsuccessful search, depends on the hash function little h, which we're choosing at random.

So, let's recall what it is we're trying to prove. We're trying to prove an upper bound on the running time of an unsuccessful lookup on average, where the "on average" is over the choice of the hash function little h. We've expressed this lookup time in terms of the average list length in x's bucket, where again the average is over the random choice of h.
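In symbols, the reduction just made is this: the lookup time is a constant (one hash evaluation plus one array access) plus time proportional to the list length L, so by linearity of expectation, for some constants $c_1, c_2$,

```latex
\text{lookup time} \;\le\; c_1 + c_2 L
\quad\Longrightarrow\quad
\mathbf{E}_h[\text{lookup time}] \;\le\; c_1 + c_2\,\mathbf{E}_h[L],
```

so it suffices to show that $\mathbf{E}_h[L] = O(1)$.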

Summarizing, we've reduced what we care about, the expected time for a lookup, to understanding the expected value of this random variable Capital L, the list length in x's bucket. So, that's what we've got to do: we've got to compute the expected value of this random variable Capital L. Now, to do that, I want to jog your memory of a general technique for analyzing expectations which you haven't seen in a while. The last time we saw it, I believe, was when we were doing the analysis of randomized Quicksort and counting its comparisons. So, here's a general decomposition principle, which we're now going to use in exactly the same way as we did in Quicksort to analyze the performance of hashing with chaining.
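Stated in symbols, the decomposition principle is the same one used in the Quicksort comparison count:

```latex
Y = \sum_{i} X_i \ \text{ with each } X_i \in \{0,1\}
\quad\Longrightarrow\quad
\mathbf{E}[Y] \;=\; \sum_{i} \mathbf{E}[X_i] \;=\; \sum_{i} \Pr[X_i = 1].
```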

This applies when you want to understand the expectation of a random variable which is complicated, but which you can express as the sum of much simpler random variables, ideally 0-1, or indicator, random variables. The first step is to figure out the random variable, Capital Y as I'm calling it here, that you really care about. Now, we finished the last slide by completing step one: what we really care about is Capital L, the list length in x's bucket. That governs the running time of an unsuccessful lookup, so clearly that's what we really care about.

Step two of the decomposition principle is, well, you know, the random variable you care about is complicated and hard to analyze directly, so decompose it as a sum of 0-1 indicator random variables. That's what we're going to do at the beginning of the next slide. Why is it useful to decompose a complicated random variable into a sum of 0-1 random variables? Well, then you're in the wheelhouse of linearity of expectation. You get that the expected value of the random variable you care about is just the sum of the expected values of the different indicator random variables, and those expectations are generally much easier to understand. And that will again be the case here, in this theorem about the performance of hash tables with chaining. So, let's apply this three-step decomposition principle to

complete the proof of the Carter-Wegman theorem. For the record, let me just remind you about the random variable that we actually care about: that's Capital L. The reason that's a random variable is because it depends on the choice of the hash function little h. This number could vary between zero and something much, much bigger than zero, depending on which h you choose. So, this is complicated and hard to analyze directly; let's try to express it as a sum of 0-1 random variables. Here are the 0-1 random variables that are going to be the constituents of Capital L. We're going to have one such variable for each object y in the data set. Now, remember, this is an unsuccessful search: x is not in the data set Capital S, so if y is in the data set, x and y are necessarily different. And we will define each random variable z sub y as follows: it's one if y collides with x under h, and zero otherwise. So, for a given zy, we

have fixed objects x and y. So, x is some IP address, say, and y is some distinct IP address; x is not in our hash table, y is in our hash table. And now, depending on which hash function we wind up using, these two distinct IP addresses may or may not get mapped to the same bucket of our hash table. So, this is an indicator random variable, just indicating whether or not they collide, whether or not we unluckily choose a hash function little h that sends these distinct IP addresses x and y to exactly the same bucket. Okay, so that's zy; clearly, by definition, it's a 0-1

random variable. Now, here's what's cool about these random variables: Capital L, the list length that we care about, decomposes precisely into the sum of the zy's. That is, we can write Capital L as being equal to the sum, over the objects y in the hash table, of zy. If you think about it, this equation is always true, no matter what the hash function h is. That is, if we choose some hash function that maps this IP address x to, say, bucket number seventeen, Capital L is just counting how many other objects in the hash table wind up getting mapped to bucket number seventeen. So, maybe ten different objects got mapped to bucket number seventeen; those are exactly the ten different values of y that will have their zy equal to 1, right? So, L is just counting the number of objects in the data set S that collide with x. A given zy is just counting whether or not a given object y in the hash table collides with x. So, summing over all the possible things that could collide with x, summing over the zy's, we of course get the total number of things that collide with x, which is exactly the population of x's bucket in the hash table. Alright, so we've got all of our

ducks lined up in a row. Now, if we just remember all of the things we have going for us, we can follow our nose and nail the proof of this theorem. So, what is it that we have going for us? Well, in addition to this decomposition of the list length into indicator random variables, we've got linearity of expectation going for us, and we've got the fact that our hash function is drawn from a universal family going for us. And we've got the fact that we've chosen the number of buckets n to be comparable to the data set size. We want to use all of those assumptions and finish the proof that the expected performance is constant. So, we're going to have a few inequalities, and we're going to begin with the thing that we really care about: the average list length in x's bucket. Remember, we saw on the previous slide that this is what governs the expected performance of the lookup. If we can prove that the expected value of Capital L is constant, we're done; we've finished the theorem. So, the whole point of this decomposition principle is to

apply linearity of expectation, which says that the expected value of a sum of random variables equals the sum of the expected values. Because L can be expressed as the sum of these zy's, we can reverse the summation and the expectation: we can first sum over the y's, over the objects in the hash table, and then take the expected value of zy. Now, something which came up in our Quicksort analysis, but which you might have forgotten, is that 0-1 random variables have particularly simple expectations. So, the next quiz is just going to jog your memory about why 0-1 random variables are so nice in this context. Okay, so the answer to this quiz

is the third one: the expected value of zy is simply the probability that x and y collide. That just follows from the definition of the random variable zy and the definition of expectation. Namely, recall how we defined zy: it's one if this object y in the hash table happens to collide, under the hash function h, with the object x that we are looking for, and zero otherwise. Again, this will be one for some hash functions and zero for other hash functions. And then we just have to compute expectations. One way to compute the expected value of a 0-1 random variable is to say, well, there are the cases where the random variable evaluates to zero, and there are the cases where the random variable evaluates to one, and of course we can cancel the zeros. So, this just equals the probability that zy = 1. And since zy being 1 is exactly the same thing as x and y colliding, that's what gives us the answer. Okay, so it's the probability that x collides with y. So,

plugging that into our derivation: we again have the sum, over all the objects y in our hash table, of the expected value of zy, but we can write that in the more interpretable form, the probability that this particular object y in the hash table collides with the thing we are looking for, x. Now, we know something pretty cool about the probability that a given pair of distinct elements, like x and y, collide with each other. What is it that we know? Okay, so I hope you answered the second response to this quiz. This is really, in some sense, the key point of the analysis; this is the role that being a universal family of hash functions plays in this performance guarantee. What does it mean to be universal? It means that for any pair of distinct objects, like the x and y in this proof, if you make a random choice of a hash function, the probability of collision is as good as with perfectly random hashing. Namely, at most

1/n, where n is the number of buckets. So, now I can return to the derivation. What that quiz reminds you is that the definition of a universal family of hash functions guarantees that this probability, for each y, is at most 1/n, where n is the number of buckets in the hash table. So, let's just rewrite that: we've upper bounded the expected list length by a sum, over the objects in the data set, of 1/n. And this, of course, is just equal to the number of objects in the data set, the cardinality of S, divided by n. And what is this? This is simply the load; this is the definition of the load alpha, which we are assuming is constant. Remember, that was one of the caveats in the theorem. So, that's why, as long as you have a hash function which you can compute quickly, in constant time, and as long as you keep the load under control, so the number of buckets is commensurate with the size of the data set you're storing, universal hash functions in a hash table with chaining guarantee expected constant-time performance.
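Putting the whole derivation of this video on one line:

```latex
\mathbf{E}_h[L]
\;=\; \mathbf{E}_h\Big[\sum_{y \in S} z_y\Big]
\;=\; \sum_{y \in S} \mathbf{E}_h[z_y]
\;=\; \sum_{y \in S} \Pr_h[h(x) = h(y)]
\;\le\; \sum_{y \in S} \frac{1}{n}
\;=\; \frac{|S|}{n}
\;=\; \alpha \;=\; O(1).
```

The second equality is linearity of expectation, the third is the fact that zy is an indicator, and the inequality is exactly the definition of a universal family.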
