So, that's what we're going to show. To get started let's fix an arbitrary cut

A, B. Using the assumption the input graph is

connected, it certainly contains at least one edge that crosses this cut.

The question is whether or not T star contains a crossing edge but certainly

the original graph G contains a crossing edge.

So, here's the key point of the argument. Kruskal's algorithm, by definition, it

makes a single scan through all of the edges.

That is, it considers every edge of the original input graph exactly once.

So, what I want you to do is, I want you to think about this cut A, B which has at

least one edge of G crossing. And I want you to fast forward Kruskals

algorithm until the moment that it first considers an edge crossing this cut AB.

The key claim is that this first edge seen by Kruskal's algorithm, is

definitely be, going to be included in the final output, T star.

So why is this true? Well, let me remind you of the lonely cut

corollary. The lonely cut corollary says that if an

edge is the sole edge crossing a cut, if it's lonely in a cut, then it cannot

participate in any cycle, right?

If it was in a cycle, that cycle would loop back around and cross the cut a

second time. Now, what does that have to do with the

picture here? Well, if this is the first edge that

Kruskal sees crossing this cut, then certainly the tree so far T star contains

nothing crossing this cut, right? It hasn't even seen anything crossing

this cut yet. So at the moment it encounters the first

edge, there's no way including that first edge will create a cycle because that

first edge is going to be lonely in this cut AB.

So again summarising, the first edge crossing a cut is guaranteed to be chosen

by a cross because the algorithmic cannot create a cycle.

That's why there's at least one edge of Kruskal's output crossing this particular

cut AB. Since AB was arbitrary, all edges,

all cuts have some edge of T star crossing them, that's why T star is

connected. So this completes the part of the proof

where we argue that Kruskal's algorithm outputs a spanning tree.

Now we have to move on and argue that it's actually a minimum cost spanning

tree. We're going to argue this in the same way

as we did for Prim's algorithm. We're going to argue that every edge ever

selected by Kruskal's algorithm is justified by the cut property.

Satisfies the hypotheses of the cut property.

Recall from our correctness proof for Prim's Algorithm,

this is enough to argue correctness. That the output is a minimum cost

spanning tree, right?

An edge justified by the cut properties, not a mistake.

It's got to belong to the minimum cost spanning tree.

If you have an algorithm that outputs a spanning tree, and it never made a

mistake, that's got to be the minimum cost spanning tree.

And that's going to be the case for Kruskal.

Now this statement was easy to prove for Prim's algorithm and that's because the

definition of Prim's algorithm, it selects edges based on being the cheapest

in some cut. So it's tailor made for the application

of the cut property. Not so for Kruskal's algorithm.

If you look at the pseudocode, nowhere does the pseudocode discuss taking cheap

edges across cuts. So we have to show that Kruskal's

algorithm in effect is inadvertently at every edge picking the cheapest edge

crossing some cut. And we actually have to exhibit what that

cut is in our proof of correctness. So, that's what we have to do here.

So to argue that, let's just sort of freeze Kruskal's algorithm at some

arbitrary iteration, where it adds a new edge.

We need to show that this edge is justified by the cut property.

So, maybe this edge has end points U and V, and let's let capital T denote the

current value of the edges selected by the algorithm so far.

So let's think about what this state of the world looks like, at a, sort of,

intermediate iteration of Kruskal's algorithm.

So Kruskal's algorithm maintains the invariant there's no cycles but remember

it doesn't maintain any invariant of the current edges forming a connected set so

in general in an intermediate iteration of Kruskal's algorithm, you've got a

bunch of pieces, a bunch of little mini trees floating around the graph.

Connect the components if you like, with respect to the current edge shed, capital

T. And there can be some, you know, isolated

vertices floating around as well. What more can we say?

Well, if the current edge has endpoints U and V and Kruskal is deciding to add this

edge UV to the current set capital T, we certainly know that capital T has no path

between U and V, right?

If it did have a path, then this new edge would create a cycle, then Kruskal would

skip the edge. Since it isn't skipping the edge, U and V

are currently in different pieces. They're in different components with

respect to the current edge set. Now remember, ultimately, if we're going

to apply the cut property, we have to somehow exhibit some cut from somewhere,

justifying this particular edge. So here's where we get the cut from, and

it's going to be a very similar argument to when we proved the empty cut limit

characterizing disconnectedness in terms of empty cuts.

We're going to say look, with respect to the tree edges we have so far, with

respect to the green edges, there's no path from U to V.

That means we can find a cut such that U is on one side, V is on the other side,

and there's no edges crossing the cut. So, here's the cool part of the proof.

And this is also the part where we actually use the fact that Kruskal was a

greedy algorithm. That it considers the edges from cheapest

to most expensive. Notice that we haven't actually used that

fact up to this point and we better use that fact.

So, the claim is that not only is this edge we're adding UV, not only does it

cross this cut AB, but actually it's the cheapest edge crossing the cut.

Nothing from the original input graph G that crosses it can be cheaper.

So why is that true? Well, let's remember how we wrapped up

the previous slide when we were arguing that the output of Kruskal's algorithm is

connected. What did we argue?

We said, well, fix any cut you like. Any cut of the graph.

And freeze Kruskal's algorithm the very first time it considers some edge

crossing that cut. We noticed that Kruskal's algorithm is

guaranteed to include that first edge crossing the cut in its final solution.

There's no way that, that first edge considered can create a cycle with

respect to the edges already chosen. So, it's not going to trigger the

exclusion criterion in Kruskal's algorithm.

And that edge is going to get picked. So what's the significance about being

the first edge crossing a cut well because of the gradient criterian because

Kruskal considers edges from cheapest and most expensive.

The first edge it encounters crossing a cut is also necessarily the cheapest edge

that's crossing the cut. So here's how we weave these different

strands together to complete the correctness proof.

Alright, so let's remember where we are. We're focusing on a single iteration of

Kruskal's algorithm. It's about to add this edge U, V to the

edges, capital T, that it's chosen in the past.

We've exhibited a cut. A, B with a property that no previously

chosen edges, no edges of capital T cross this cut, and UV is going to be the first

chosen edge crossing this cut. We just argued, that Kruskal's algorithm

is guaranteed to pick the first edge crossing a cut.

So by virtue of their not yet being any chosen edges, crossing the cut AB, it

must be the case that this current edge UV is the first one that's ever been seen

by Kruskal's algorithm that crosses this cut AB.

Now, in being the first edge it's ever seen crossing this cut. It must also be

the cheapest edge of the input graph that crosses this cut.

That is the hypothesis of the cut property.

That is why this edge UV and this current arbitrary iteration is justified by the

cut property.