And the natural greedy algorithm that is optimal is called the furthest in the
future algorithm. So what is the furthest in the future
algorithm? Well, it's exactly what you think it
would be. It's basically what seems like a good
idea at the moment you have to perform an eviction from the cache.
Basically you want to put off judgment day.
You want to put off the regret of evicting this particular piece of data as
long as possible. When are you going to regret evicting a
piece of data? Well, it's when it gets requested next.
So if we have a few things in the cache, and one is requested next,
one is requested in seven time steps, and one is requested in 70 time
steps, that last one is the one you want to evict now, because it will take the longest
until you actually regret that eviction. So, for example, in the example on the
previous slide, you can check that the furthest in the future algorithm would in
fact evict the ones you want to evict, a and b,
not the ones that we evicted in the example, c and d.
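To make this concrete, here is a minimal Python sketch of the furthest in the future eviction rule, under the assumption that the remaining request sequence is available as a list; the function name and the example are purely illustrative.

```python
# Hedged sketch: furthest-in-the-future eviction, assuming clairvoyant access
# to the remaining request sequence (names here are illustrative).

def evict_furthest_in_future(cache, future_requests):
    """Return the cached item whose next request lies furthest in the future."""
    def next_use(item):
        try:
            # Position of the item's next request in the remaining sequence.
            return future_requests.index(item)
        except ValueError:
            # Never requested again: the safest possible eviction.
            return float('inf')
    return max(cache, key=next_use)

# With cache {a, b, c} and upcoming requests c, a, b, the rule evicts b,
# since it is the item needed latest.
print(evict_furthest_in_future({'a', 'b', 'c'}, ['c', 'a', 'b']))  # -> b
```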
Now at this point, many of you are probably justifiably scratching your
heads. You're wondering, you know, why is this
useful? It doesn't seem like this is what we
wanted. The objection to this result is that the furthest in the future
algorithm is clairvoyant. Its very definition assumes that you know the
future; it assumes that at the moment you have to make an eviction, you're
aware of when each of the pieces of data in the cache will be requested next.
But if you think for a minute about the motivating applications for studying the
optimal caching problem, this assumption simply doesn't hold. You simply do not
know the future; you simply do not know when each of the pieces of data in your
cache will be requested next. So this algorithm is not even well defined; it is
unimplementable. Despite that, this is still an extremely
useful result to know. Why?
Well, two reasons. First of all, this unimplementable
algorithm can nevertheless serve as a guideline for practical,
implementable algorithms. For example, it naturally motivates the
LRU, or least recently used, caching algorithm.
So what you do in the LRU algorithm is that, instead of looking forward into the
future, which you generally can't do, you look into the past.
And you say, well, let me guess that whatever's been requested recently will
be requested again soon, and whatever hasn't been
requested for a long time will continue not to be requested for a long time.
So, as a proxy for the piece of data that's going to be referenced the
furthest in the future, you look for the one whose most recent reference is
the furthest back in the past. So that's the LRU algorithm.
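As a rough illustration, here is a minimal Python sketch of the LRU eviction choice, under the assumption that we track the last time each cached item was requested; the names below are made up for this example.

```python
# Hedged sketch: LRU eviction, assuming we track the time of each cached
# item's most recent request (the names below are illustrative).

def evict_least_recently_used(last_used):
    """last_used maps each cached item to the time it was most recently requested."""
    # Evict the item whose last request is furthest back in the past.
    return min(last_used, key=last_used.get)

# Item b was last touched at time 2, longest ago, so it is the one evicted.
print(evict_least_recently_used({'a': 9, 'b': 2, 'c': 7}))  # -> b
```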
And as long as the data exhibits what's called locality of reference,
meaning whatever's been requested a lot in the recent past is also what's
going to be requested in the near future, then LRU is going to approximate furthest
in the future. And indeed, in many applications, LRU is
the gold standard amongst practical, implementable caching algorithms.
The second reason this theorem is useful in practice is that it serves as an
idealized benchmark, a hypothetical perfect scenario against
which you can compare your latest and greatest caching heuristic.
So, for example, maybe you have a caching application, and you start by implementing
the LRU, least recently used, caching algorithm. Then, as a sanity check, you
probably want to go back later, once you have hindsight: you look at the last few
days of traces, of logs of page requests, and you say, how well did we do?
Let's look at how well our caching algorithm, LRU, did,
and let's look at how well we would have done had we known the future.
And hopefully, you're just a few percent away.
And then you can conclude that, yes indeed, the data seem to have locality of
reference; yes indeed, LRU is doing almost as well
as if we knew the future, and we can proceed.
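For concreteness, here is one possible Python sketch of that sanity check: replay a logged request trace through a simple cache simulator under both eviction rules and compare the miss counts. The trace format, the policy interface, and all names here are assumptions made for illustration, not a prescribed methodology.

```python
# Hedged sketch: compare the miss count of LRU against the clairvoyant
# furthest-in-the-future benchmark on a logged trace (all names illustrative).

def count_misses(trace, cache_size, choose_victim):
    """Simulate a cache of the given size on trace, evicting via choose_victim."""
    cache, misses = set(), 0
    for t, item in enumerate(trace):
        if item in cache:
            continue                      # hit: nothing to do
        misses += 1
        if len(cache) == cache_size:      # cache full: evict something
            cache.remove(choose_victim(cache, t, trace))
        cache.add(item)
    return misses

def lru_victim(cache, t, trace):
    # Least recently used: the item whose last request before time t is earliest.
    return min(cache, key=lambda x: max(i for i in range(t) if trace[i] == x))

def fif_victim(cache, t, trace):
    # Furthest in the future: the item whose next request at or after t is latest.
    rest = trace[t:]
    return max(cache, key=lambda x: rest.index(x) if x in rest else float('inf'))

trace = ['a', 'b', 'c', 'a', 'd', 'b', 'a', 'c', 'd', 'b']
print(count_misses(trace, 3, lru_victim))  # misses under LRU
print(count_misses(trace, 3, fif_victim))  # misses under furthest in the future
```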
On the other hand, if you go back through the last few days of logs, and you find
that your caching algorithm is doing much worse than furthest in the future, then
it's back to the drawing board with respect to your caching algorithm.
You should work harder, understand the data better, and come up with a smarter
heuristic. So for almost all of the greedy
algorithms that we cover in this course, I'm going to explain to you why they are
correct. I'm going to prove it rigorously.
This algorithm is an exception; I'm actually not going to prove this
theorem for you. The way the proof works is by what's called an
exchange argument. Again, you may not have seen any
examples of those yet, but you will soon. But the exchange argument needed to prove
this result is, as far as I know, pretty tricky, believe it or not.
The algorithm is so natural that you might think this result is a little
self-evident. Try to prove it rigorously.
Not easy. Not easy at all.
Indeed, if you look at a textbook in, say, operating systems or a field like that,
generally you'll see a description of this algorithm
and the claim that it's optimal, but you won't find the proof.
Some algorithms textbooks, for example Algorithm Design by Kleinberg and Tardos,
do include the proof of this theorem. Those of you that are interested, I
challenge you to prove it yourself without looking it up on the web or in a
textbook. I think if you try to prove it you'll
appreciate the subtleties that come up in understanding whether greedy algorithms
are correct or not. In the best-case scenario, I would love,
love to see a simple proof of this theorem, something that I could explain
in a class like this in, say, five minutes or less.
That would be amazing.