0:02

Next we'll look at separate chaining, a collision-resolution strategy that makes use of elementary linked lists. So, what are we supposed to do when two different keys hash to the same index? The birthday problem tells us that we're going to have collisions; you'd need a quadratic amount of memory to avoid them. And for load balancing, a coupon-collector analysis tells us that the collisions are going to be evenly distributed around the table. So, what we

want to do is have an easy way to deal with collisions. And the first way we'll look at is called separate chaining. It's a very old idea, dating back to 1953, and the idea is just to build a linked list for each of the table positions. So, we'll

have a table that's smaller than the number of keys that we have, and the hash function will map each key to some integer. In this case, we have a table of size five and we use single-letter keys, so the hash function will map any key to an integer between zero and four. Then, to [cough] do an insertion, we just keep a linked list at the table position corresponding to the hash value. So S hashes to position two; it goes on the linked list whose first link is at position two. And E goes to zero, and A goes to

zero. And for search, if we want to look for C in this table, say, we find the hash value for C and look down the list at that position to see if we can find C. So we have to look through the whole list for a search, but you only have to look through one list out of all the lists. Essentially, if you have M entries in the hash table and N keys, the length of the list you're going to look at is about N over M, because the keys are evenly distributed. So that's a straightforward

and simple scheme for implementing symbol tables with hashing. Now, we could use a bag or some similar data structure and hide the linked-list structure underneath, and that's a perfectly fine way to proceed in modern programming. This

implementation builds the linked lists directly. Now, for a practical situation we pick some value of M. You could make it so that the hash table itself grows once it gets really full, and such hybrid methods are easy to implement.
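The scheme described so far can be sketched in Java. This is a minimal version under my own naming (real implementations add things like resizing and deletion); the chains are built directly out of nodes, as in the code discussed below:

```java
// A minimal sketch of a symbol table using hashing with separate chaining.
public class SeparateChainingHashST<Key, Value> {
    private final int M;      // number of chains (table size)
    private final Node[] st;  // st[i] = first node of chain i

    // Nodes hold plain Objects because Java forbids arrays of generics.
    private static class Node {
        Object key, val;
        Node next;
        Node(Object key, Object val, Node next) {
            this.key = key; this.val = val; this.next = next;
        }
    }

    public SeparateChainingHashST(int M) {
        this.M = M;
        st = new Node[M];
    }

    // Mask off the sign bit of the hash code, then take the remainder mod M.
    private int hash(Key key) {
        return (key.hashCode() & 0x7fffffff) % M;
    }

    @SuppressWarnings("unchecked")
    public Value get(Key key) {
        int i = hash(key);
        for (Node x = st[i]; x != null; x = x.next)  // traverse chain i
            if (key.equals(x.key)) return (Value) x.val;  // cast from Object
        return null;  // not found
    }

    public void put(Key key, Value val) {
        int i = hash(key);
        for (Node x = st[i]; x != null; x = x.next)
            if (key.equals(x.key)) { x.val = val; return; }  // reset value
        st[i] = new Node(key, val, st[i]);  // new node at front of chain i
    }
}
```

A client just calls `st.put("S", 0)` and `st.get("S")`; each operation touches only the one chain that the key hashes to.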

So we won't talk too much about that. In terms of implementation details, our keys and values have to be Objects, because we can't have an array of generics in Java: since we're making an array of nodes, a node would involve generics if we used the Key and Value types directly. So we have to make them Objects, and then when we get things out, we have to cast. So [cough] this is the get procedure. So, to look for a key

in the hash table, we compute the hash value. Our hash function pulls out the system hash code, makes it positive by masking off the sign bit, and then takes the remainder mod M to get a number between zero and M minus one. So we take that number, i, and then we just go to that list, and this is the standard code for traversing a linked list: start at the first node, and as long as it's not null, go x = x.next. And if you find a key that's equal to the key you're looking for, return the value (we have to cast it to Value because of the generic array-creation problem in Java); otherwise return null. So

that's not much code, and it's trivial code at that, for doing an efficient symbol-table search using hashing. And insertion is not much more difficult: you do the same thing, and if you find a node on the linked list whose key equals the search key, you reset the value and return. Otherwise, you make a new node and put it at the beginning of the linked list with the standard code: replace st[i] with a new node that links to the old st[i]. So again, very little code to implement search and insert using

hashing, and that's why it's so popular. And what about the analysis? Well, again, the [cough] standard probabilistic analysis of the balls-and-bins problem tells us a lot about what goes on. If the uniform hashing assumption holds, the probability that the number of keys in a list is within a constant factor of N over M is extremely close to one. So it means that we've divided the search cost, which would be N if we had sequential search, by a factor of M. And in many applications, even setting M = 100 or 1,000 is going to be very effective, and that's why so many system programs use this method. So, the number of

probes for search and insert is proportional to N over M. Now typically, what a programmer would do is try to make M about equal to the number of keys divided by five, say. If you make M too large, you use too much space and you'll have empty chains or very short chains. And if you make M too small, then the chains are too long and you have to search through them all. So let's say M is N over five; then you get constant-time searches and not much extra space. You have extra space for the links that implement the linked lists, but the rest of the table is not much extra space. Those are typical parameters. If you want a full-service symbol table

which is going to grow from small to huge and then back down to small again, then you'd want to use array resizing to make sure that M is always within a constant factor of N, but we'll leave that detail out for now. So that

brings us to this summary. With red-black trees, we were happy with log base two of N for search and insert; with separate chaining, you can really get it down to a constant number of operations for search and insert. So hashing is going to be preferred for short keys where the hash function is easy to compute, and where we don't need ordered iteration or any of the ordered symbol-table operations, because it gives really fast access to the symbol table. That's our first collision-resolution method, hashing with separate chaining.
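To make the load-balancing claim concrete, here is a toy balls-and-bins experiment (the values of M, N, and the seed are arbitrary choices of mine): it distributes N random keys over M chains using the masked-hashCode-mod-M function and reports the average and longest chain length.

```java
import java.util.Random;

public class ChainLengths {
    public static void main(String[] args) {
        int M = 1000, N = 5000;       // table size and number of keys
        int[] count = new int[M];     // count[i] = length of chain i
        Random rnd = new Random(42);  // fixed seed for repeatability
        for (int k = 0; k < N; k++) {
            Integer key = rnd.nextInt();  // stand-in for an arbitrary key
            int i = (key.hashCode() & 0x7fffffff) % M;  // mask sign bit, mod M
            count[i]++;
        }
        int max = 0;
        for (int c : count) max = Math.max(max, c);
        System.out.println("average chain length = " + (double) N / M);  // 5.0
        System.out.println("longest chain = " + max);
    }
}
```

With these parameters the average chain length is exactly N/M = 5, and the longest chain stays within a small constant factor of that, which is the balls-and-bins behavior the analysis predicts.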
