So, that's what we're going to show. To get started let's fix an arbitrary cut
A, B. Using the assumption the input graph is
connected, it certainly contains at least one edge that crosses this cut.
The question is whether or not T star contains a crossing edge but certainly
the original graph G contains a crossing edge.
So, here's the key point of the argument. Kruskal's algorithm, by definition, it
makes a single scan through all of the edges.
That is, it considers every edge of the original input graph exactly once.
So, what I want you to do is, I want you to think about this cut A, B which has at
least one edge of G crossing. And I want you to fast forward Kruskals
algorithm until the moment that it first considers an edge crossing this cut AB.
The key claim is that this first edge seen by Kruskal's algorithm, is
definitely be, going to be included in the final output, T star.
So why is this true? Well, let me remind you of the lonely cut
corollary. The lonely cut corollary says that if an
edge is the sole edge crossing a cut, if it's lonely in a cut, then it cannot
participate in any cycle, right?
If it was in a cycle, that cycle would loop back around and cross the cut a
second time. Now, what does that have to do with the
picture here? Well, if this is the first edge that
Kruskal sees crossing this cut, then certainly the tree so far T star contains
nothing crossing this cut, right? It hasn't even seen anything crossing
this cut yet. So at the moment it encounters the first
edge, there's no way including that first edge will create a cycle because that
first edge is going to be lonely in this cut AB.
So again summarising, the first edge crossing a cut is guaranteed to be chosen
by a cross because the algorithmic cannot create a cycle.
That's why there's at least one edge of Kruskal's output crossing this particular
cut AB. Since AB was arbitrary, all edges,
all cuts have some edge of T star crossing them, that's why T star is
connected. So this completes the part of the proof
where we argue that Kruskal's algorithm outputs a spanning tree.
Now we have to move on and argue that it's actually a minimum cost spanning
tree. We're going to argue this in the same way
as we did for Prim's algorithm. We're going to argue that every edge ever
selected by Kruskal's algorithm is justified by the cut property.
Satisfies the hypotheses of the cut property.
Recall from our correctness proof for Prim's Algorithm,
this is enough to argue correctness. That the output is a minimum cost
spanning tree, right?
An edge justified by the cut properties, not a mistake.
It's got to belong to the minimum cost spanning tree.
If you have an algorithm that outputs a spanning tree, and it never made a
mistake, that's got to be the minimum cost spanning tree.
And that's going to be the case for Kruskal.
Now this statement was easy to prove for Prim's algorithm and that's because the
definition of Prim's algorithm, it selects edges based on being the cheapest
in some cut. So it's tailor made for the application
of the cut property. Not so for Kruskal's algorithm.
If you look at the pseudocode, nowhere does the pseudocode discuss taking cheap
edges across cuts. So we have to show that Kruskal's
algorithm in effect is inadvertently at every edge picking the cheapest edge
crossing some cut. And we actually have to exhibit what that
cut is in our proof of correctness. So, that's what we have to do here.
So to argue that, let's just sort of freeze Kruskal's algorithm at some
arbitrary iteration, where it adds a new edge.
We need to show that this edge is justified by the cut property.
So, maybe this edge has end points U and V, and let's let capital T denote the
current value of the edges selected by the algorithm so far.
So let's think about what this state of the world looks like, at a, sort of,
intermediate iteration of Kruskal's algorithm.
So Kruskal's algorithm maintains the invariant there's no cycles but remember
it doesn't maintain any invariant of the current edges forming a connected set so
in general in an intermediate iteration of Kruskal's algorithm, you've got a
bunch of pieces, a bunch of little mini trees floating around the graph.
Connect the components if you like, with respect to the current edge shed, capital
T. And there can be some, you know, isolated
vertices floating around as well. What more can we say?
Well, if the current edge has endpoints U and V and Kruskal is deciding to add this
edge UV to the current set capital T, we certainly know that capital T has no path
between U and V, right?
If it did have a path, then this new edge would create a cycle, then Kruskal would
skip the edge. Since it isn't skipping the edge, U and V
are currently in different pieces. They're in different components with
respect to the current edge set. Now remember, ultimately, if we're going
to apply the cut property, we have to somehow exhibit some cut from somewhere,
justifying this particular edge. So here's where we get the cut from, and
it's going to be a very similar argument to when we proved the empty cut limit
characterizing disconnectedness in terms of empty cuts.
We're going to say look, with respect to the tree edges we have so far, with
respect to the green edges, there's no path from U to V.
That means we can find a cut such that U is on one side, V is on the other side,
and there's no edges crossing the cut. So, here's the cool part of the proof.
And this is also the part where we actually use the fact that Kruskal was a
greedy algorithm. That it considers the edges from cheapest
to most expensive. Notice that we haven't actually used that
fact up to this point and we better use that fact.
So, the claim is that not only is this edge we're adding UV, not only does it
cross this cut AB, but actually it's the cheapest edge crossing the cut.
Nothing from the original input graph G that crosses it can be cheaper.
So why is that true? Well, let's remember how we wrapped up
the previous slide when we were arguing that the output of Kruskal's algorithm is
connected. What did we argue?
We said, well, fix any cut you like. Any cut of the graph.
And freeze Kruskal's algorithm the very first time it considers some edge
crossing that cut. We noticed that Kruskal's algorithm is
guaranteed to include that first edge crossing the cut in its final solution.
There's no way that, that first edge considered can create a cycle with
respect to the edges already chosen. So, it's not going to trigger the
exclusion criterion in Kruskal's algorithm.
And that edge is going to get picked. So what's the significance about being
the first edge crossing a cut well because of the gradient criterian because
Kruskal considers edges from cheapest and most expensive.
The first edge it encounters crossing a cut is also necessarily the cheapest edge
that's crossing the cut. So here's how we weave these different
strands together to complete the correctness proof.
Alright, so let's remember where we are. We're focusing on a single iteration of
Kruskal's algorithm. It's about to add this edge U, V to the
edges, capital T, that it's chosen in the past.
We've exhibited a cut. A, B with a property that no previously
chosen edges, no edges of capital T cross this cut, and UV is going to be the first
chosen edge crossing this cut. We just argued, that Kruskal's algorithm
is guaranteed to pick the first edge crossing a cut.
So by virtue of their not yet being any chosen edges, crossing the cut AB, it
must be the case that this current edge UV is the first one that's ever been seen
by Kruskal's algorithm that crosses this cut AB.
Now, in being the first edge it's ever seen crossing this cut. It must also be
the cheapest edge of the input graph that crosses this cut.
That is the hypothesis of the cut property.
That is why this edge UV and this current arbitrary iteration is justified by the
cut property.