0:00

We're now ready to present all the details of Kruskal's algorithm. Namely, we will prove that the Kruskal strategy is optimal, that is, it produces a spanning tree of minimum total weight, and we will also present implementation details of this algorithm.

Recall that the idea of this algorithm is the following. We start with an empty set X, and we repeatedly add to this set the next lightest edge that doesn't produce a cycle. It is not difficult to see that at any point in time the set of edges X forms a forest, that is, a collection of trees.

Let me illustrate this. Assume that we have some set of vertices. Initially, the set X is empty, which means that each of our vertices forms a tree of size one, namely a tree that contains one vertex and no edges. Initially, each vertex is isolated in our set X. Now we start adding edges: say this is the first one, then we add this edge, then this edge, then this edge, then this edge. At this point in time, our set X consists of three trees. This is the first tree, T1, this is the second tree, T2, and this is the third tree, T3. In particular, the tree T3 contains just one vertex; it is an isolated vertex, okay?

2:38

One part of this cut is going to be the set S, so this is one part of our partition, and all the other vertices form the other part of this partition. In this case, we see that e is the lightest edge that joins two vertices from different parts, which means in turn that the cut property justifies that adding e in this case is indeed a safe move. In other words, if our current set X is a part of some minimum spanning tree, then X with e added is also a part of some minimum spanning tree.

3:25

Once again, initially the set X in Kruskal's algorithm is empty, which means that each vertex of our initial graph forms a separate connected component. This is how the set X looks initially: each vertex lies in a separate connected component. Then we start adding edges to X. This creates a forest. In this forest we currently have three trees: the first tree, the second one, and the third one. Assume now that the next lightest edge that Kruskal's algorithm considers is the following one.

4:17

First of all, we need to be able to check whether it joins two vertices that lie in the same tree, or in other words, in the same connected component. In this case they do lie in the same connected component, so Kruskal's algorithm will not add it to the set X, because otherwise it would produce a cycle in our set X. Now assume that the next edge that Kruskal's algorithm tries is the following one. Again, we need to check whether its two endpoints lie in the same connected component. In this case, they do not: they lie in different connected components. So we add this edge, and at this point we should update the data structures that we use to indicate that we have now actually merged the trees T1 and T2.

So what we need to check in our data structure is whether two given vertices lie in the same set, or in the same connected component; and if they lie in different connected components, we need to merge the corresponding two trees. The perfect choice of data structure for this algorithm is, of course, the disjoint sets data structure. Once again, to check whether two given vertices lie in different connected components, we just check whether Find of one endpoint is equal to Find of the other endpoint of this edge. If the results are different, then the vertices lie in different connected components. And when adding an edge to the set X, we need to merge the corresponding two trees, and this is done by calling the method Union on the corresponding two endpoints.
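As a sketch of what this data structure might look like in code, here is a minimal disjoint sets implementation in Python with union by rank and path compression; the class and method names are my own choices for illustration, not from the lecture.

```python
class DisjointSets:
    """Disjoint sets (union-find) with union by rank and path compression."""

    def __init__(self, elements):
        # MakeSet for every element: each starts as its own root of rank 0.
        self.parent = {x: x for x in elements}
        self.rank = {x: 0 for x in elements}

    def find(self, x):
        # Follow parent links to the root, compressing the path as we go.
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        # Merge the trees containing x and y; attach the tree of smaller
        # rank under the root of larger rank (union by rank).
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
```

To test whether an edge {u, v} would create a cycle, compare find(u) with find(v); when they differ, it is safe to add the edge and then call union(u, v).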

We will now illustrate this on a toy example with six vertices. Let's call them A, B, C, D, E, F, and let's assume that we have a disjoint sets data structure; let me show its contents. Initially, each vertex lies in a separate set. Now we start processing edges in order of non-decreasing weight. The first, lightest edge is AE. We check whether A and E, at this point, lie in different connected components. For this, we call Find of A and Find of E. This gives us different IDs because they lie in different sets. So we add this edge to our current solution, and we also notify our data structure that A and E now lie in the same connected component. So now it looks like this.

The next lightest edge is the edge CF. Again we ask our data structure whether C and F belong to the same set, and it replies that they do not, because Find of C is not equal to Find of F; so it is safe to add this edge to our solution. We also notify our data structure that C and F now lie in the same set by calling Union of C and F. So now C and F lie in the same set.

7:56

The next edge is AD, and we see that A and D lie in different connected components, so we add this edge to our solution and also notify our data structure that we need to merge the sets containing the vertex A and the vertex D. So now we have three different disjoint sets in our data structure, which correspond exactly to the vertex sets of our three trees. The first tree contains the vertices A, E, D; the second one contains the vertex B; and the last one contains the vertices C and F.

Now, the next lightest edge is DE; it has weight 3. However, we see that D and E belong to the same connected component. This, in turn, means that if we added the edge DE to our current solution, this would produce a cycle. So we just skip the edge DE and continue to the next lightest edge. The next one is AB, and we see that A and B lie in different connected components, so it is safe to add the edge AB to the current solution. We also need to merge the corresponding two sets.

9:44

The next edge is of weight 8, and at this point we also merge two sets. Now all our vertices lie in the same connected component, which means that we have constructed an optimal spanning tree, that is, a spanning tree of minimum total weight.

The pseudocode of Kruskal's algorithm looks as follows. First, for each vertex of our graph, we create a separate disjoint set. We do this by calling the MakeSet method of the disjoint sets data structure. Then we initialize the set of edges X to the empty set. The next step is to sort all the edges of our graph by weight. Then we process all the edges in order of non-decreasing weight; this is done in this for loop. For each such edge {u, v}, we need to check whether adding it to X is safe or not, namely, whether it produces a cycle or not. To do this, we just check whether u and v belong to different connected components, that is, whether Find(u) is equal to Find(v) or not. If the results are different, then u and v lie in different connected components. In this case, it is safe to add the edge {u, v} to the set X, and we do this in this line. In this case we also need to notify our data structure that all the vertices that previously lay in the connected components of u and v now lie in the same connected component, because we have just joined these two trees; this is done by calling Union of u and v. Finally, in the end, we just return the resulting set X.
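Putting the steps above together, a minimal runnable sketch of this pseudocode in Python might look as follows. The graph representation (a list of weighted edge tuples) and all names are my own assumptions for illustration; for brevity, the union-find here uses only path halving, a simple form of path compression, without union by rank.

```python
def kruskal(vertices, edges):
    """Return the edges of a minimum spanning tree of an undirected graph.

    vertices: iterable of vertex labels.
    edges: list of (weight, u, v) tuples.
    """
    # MakeSet for each vertex: every vertex is its own parent initially.
    parent = {v: v for v in vertices}

    def find(x):
        # Find the root of x's set, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    X = []  # the set of tree edges
    for w, u, v in sorted(edges):  # process in non-decreasing weight
        ru, rv = find(u), find(v)
        if ru != rv:          # u and v are in different components,
            X.append((u, v))  # so adding {u, v} cannot create a cycle
            parent[ru] = rv   # Union: merge the two trees
    return X
```

On a connected graph with V vertices, the returned list contains exactly V - 1 edges.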

It remains to estimate the running time of Kruskal's algorithm. We start by sorting the edges, and this requires O(E log E) time. This in turn can be rewritten as O(E log V^2), just because a simple graph has at most V^2 edges, and then as O(2E log V), which is just O(E log V).
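Spelled out as a chain of estimates (using the fact that a simple graph has at most V^2 edges, so log E <= log V^2 = 2 log V):

```latex
O(E \log E) \;=\; O\left(E \log V^2\right) \;=\; O(2E \log V) \;=\; O(E \log V)
```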

12:42

Then we also make at most V - 1 calls to the Union procedure. Why V - 1? Well, initially, when the set X is empty, each vertex of our graph forms a separate connected component, so we have V connected components. Each time we call Union, we actually merge two connected components. And at the end of the run of our algorithm we have just one connected component, so all the vertices lie in the same tree. So we go from V connected components down to one, and each call to Union reduces the number of connected components by one, which means that we make exactly V - 1 calls to the Union procedure.

Okay, so we have roughly E calls to the Find procedure and roughly V calls to the Union procedure. Now recall that if we implement the disjoint sets data structure as a forest, that is, as a collection of disjoint trees, and we use the union by rank heuristic, then the upper bound on the running time of each such operation is just O(log V), which gives us a bound of O((E + V) log V) for this stage.

13:57

Recall also that in our case the graph is connected, which means that E is at least V - 1, which in turn means that E + V is O(E). This allows us to rewrite the bound as follows: the upper bound on the running time of the second step is actually the same as for the first step, namely O(E log V). This gives us that the upper bound on the running time of the whole algorithm is O(E log V).

Now recall that if, for our implementation of the disjoint sets data structure, we use both the union by rank heuristic and the path compression heuristic, then we can state a stronger upper bound: instead of log V here, we actually have log* V, the iterated logarithm. This gives us a stronger upper bound for the second step. Unfortunately, this doesn't improve the total running time, because the upper bound for sorting all the edges still dominates the upper bound for the second step.

However, in some applications the edges are given to us already in sorted order, which means that in this case we do not need to spend O(E log V) time sorting the edges. The total running time in this case becomes O(E log* V), which makes Kruskal's algorithm even more efficient: the running time is upper bounded by O(E log* V), which is nearly linear.