Now the question is, how to choose the size of your hash table?
Of course, you control the amount of memory used with m, which is the
cardinality of the hash function and which is equal to the size of the hash table.
But you also control the speed of the operations.
So ideally, in practice, you want your load factor alpha to be between 0.5 and 1.
You want it to be below 1 because otherwise you store too many keys in
the same hash table and then everything becomes slow.
But you also don't want alpha to be too small, because that way you will
waste a lot of memory.
If alpha is at least one-half, then you basically use linear memory
to store your n keys and your memory overhead is small.
And operations still run in time O(1 + alpha), which is constant
on average if alpha is between 0.5 and 1.
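Just as a quick illustration (the numbers here are made up, not from the lecture), this is how the load factor works out:

    # Illustrative numbers only: load factor alpha = n / m.
    n = 700      # number of keys stored in the hash table
    m = 1000     # size of the hash table (cardinality of the hash function)
    alpha = n / m
    print(alpha)      # 0.7, inside the recommended range between 0.5 and 1
    print(1 + alpha)  # operations take O(1 + alpha) time on average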
The question is what to do if you don't know in advance how
many keys you want to store in your hash table.
Of course, one solution is to start with a very big hash table, so
that all the keys will definitely fit.
But this way you will waste a lot of memory.
So, what we can do is copy the idea you learned in the lesson about
dynamic arrays.
You start with a small hash table and
then you grow it organically as you put in more and more keys.
Basically, you resize the hash table and
make it twice as big as soon as alpha becomes too large.
And then, you need to do what is called a rehash.
You need to copy all the keys from the current hash table to the new bigger hash
table.
And of course,
you will need a new hash function with twice the cardinality to do that.
So here is the code which tries to keep the loadFactor below 0.9.
And 0.9 is just a number I selected; you could put 1 here or
0.8, that doesn't really matter.
So first we compute the current loadFactor, which is the ratio
of the number of keys stored in the table to the size of the hash table.
And if that loadFactor just became bigger than 0.9,
we create a new hash table of twice the size of our current hash table.
We also choose a new random hash function from the universal family
with twice the cardinality, corresponding to the new hash table size.
And then we take each object from our current hash table, and
we insert it in the new hash table using the new hash function.
So we basically copy all the keys to the new hash table.
And then we substitute our current hash table with the bigger one and the current
hash function with the hash function corresponding to the new hash table.
That way, the loadFactor decreases roughly by a factor of two.
Because we probably just added one new element,
the loadFactor became just a little more than 0.9.
And then we doubled the size of the hash table while the number of keys
stayed the same, so the loadFactor became roughly 0.45,
which is below 0.9, which is what we wanted.
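Since the code from the slide is not reproduced in this transcript, here is a minimal Python sketch of such a Rehash procedure for a chaining hash table with integer keys; the class layout, the field names, and the helper choose_universal_hash_function are assumptions, not the exact code from the lecture.

    import random

    def choose_universal_hash_function(m):
        # A random function from the standard universal family for integers:
        # h(x) = ((a*x + b) mod p) mod m, with p a fixed big prime.
        p = 10**9 + 7
        a = random.randint(1, p - 1)
        b = random.randint(0, p - 1)
        return lambda x: ((a * x + b) % p) % m

    class HashTable:
        # A chaining hash table for integer keys (a sketch, not the lecture's code).
        def __init__(self, size=8):
            self.size = size                                 # m, the number of buckets
            self.num_keys = 0                                # n, the number of stored keys
            self.buckets = [[] for _ in range(size)]
            self.hash = choose_universal_hash_function(size)

        def rehash(self):
            load_factor = self.num_keys / self.size
            if load_factor > 0.9:                            # the threshold from the lecture
                new_size = 2 * self.size                     # a twice bigger hash table
                new_hash = choose_universal_hash_function(new_size)
                new_buckets = [[] for _ in range(new_size)]
                # Copy each key into the new table using the new hash function.
                for bucket in self.buckets:
                    for key in bucket:
                        new_buckets[new_hash(key)].append(key)
                # Substitute the current table and hash function with the new ones.
                self.size = new_size
                self.hash = new_hash
                self.buckets = new_buckets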
So to achieve that, you need to call this procedure rehash after
each operation which inserts something in your hash table.
And it could work slowly when this happens, because the rehash
procedure needs to copy all the keys from your current hash
table to the new, bigger hash table, and that takes linear time.
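Continuing the sketch above (again an assumed interface, not the exact code from the lecture), an insert operation adds the key and then calls rehash, which does real work only when the load factor crosses the threshold:

    # Continuing the HashTable sketch above.
    def insert(self, key):
        bucket = self.buckets[self.hash(key)]
        if key not in bucket:              # don't store duplicate keys
            bucket.append(key)
            self.num_keys += 1
        self.rehash()                      # cheap check; copies all keys only rarely

    HashTable.insert = insert              # attach the method to the sketch class

    table = HashTable()
    for key in range(100):
        table.insert(key)
    print(table.size, table.num_keys)      # 128 100: the table has doubled from 8 to 128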
But similarly to dynamic arrays, the amortized running time will still be
constant on average, because the rehash will happen only rarely.
So you reach a certain level of load factor and
you double the size of your table.
And then it will take twice as long to again reach a load factor that is too high.
And then you'll again double the size of your hash table.
So the more keys you put in, the longer it takes until the next rehash.
So rehashes will be really rare, and
that's why they won't significantly influence the running time of your operations.
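To see how rare the rehashes are, here is a small experiment with the sketch above (just an illustration under the same assumptions): inserting 10,000 keys into the sketch table triggers only 11 doublings.

    # Count how many times the sketch table doubles during 10,000 insertions.
    table = HashTable()
    sizes_seen = [table.size]
    for key in range(10000):
        table.insert(key)
        if table.size != sizes_seen[-1]:   # the table just doubled, so a rehash happened
            sizes_seen.append(table.size)
    print(sizes_seen)   # [8, 16, 32, ..., 16384]: only 11 rehashes for 10,000 insertions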