Kruskal's algorithm for Minimum Spanning Tree

Given an undirected weighted graph, a minimum spanning tree (MST) is a subset of the edges of the graph that connects all the vertices, forms a tree, and has the minimum possible total edge weight. For an MST to exist, the graph must be connected (that is, every pair of nodes must be reachable from each other).

[Figure: example of a minimum spanning tree. Total edge weight = 5 + 8 + 8 + 4 + 11 + 6 = 42]

Kruskal's algorithm builds the minimum spanning tree one edge at a time. To see what that means, watch the following video.

[Video with step-by-step example]

Algorithm steps

Start with an empty tree.

Iterate over the edges sorted by weight.

If adding edge (u, v) to the current tree does not create a cycle, then add it, else discard it.

Done!
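
The steps above can be sketched in Python as follows. This is a minimal sketch: the graph format (a list of (weight, u, v) tuples) and the function names are illustrative choices, not fixed by the text. It uses a simple union-find with path halving for the cycle check, which is explained in more detail further below.

```python
def kruskal(n, edges):
    """Return (total weight, chosen edges) of an MST of a connected graph.

    n     -- number of vertices, labelled 0..n-1
    edges -- list of (weight, u, v) tuples
    """
    parent = list(range(n))          # each vertex starts in its own set

    def root(node):                  # find the set representative
        while parent[node] != node:
            parent[node] = parent[parent[node]]   # path halving
            node = parent[node]
        return node

    total, chosen = 0, []
    for w, u, v in sorted(edges):    # iterate over the edges sorted by weight
        ru, rv = root(u), root(v)
        if ru != rv:                 # adding (u, v) creates no cycle
            parent[ru] = rv          # merge the two components
            total += w
            chosen.append((u, v))
    return total, chosen
```

For example, kruskal(3, [(1, 0, 1), (2, 1, 2), (3, 0, 2)]) picks the two lighter edges and returns a total weight of 3.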

How to check for a cycle?

Given a graph with n nodes and m edges, if we use an adjacency list representation for the current tree, and BFS / DFS to see if adding edge (u, v) forms a cycle, then each check takes O(n) time (the current tree has at most n - 1 edges), and the algorithm complexity is O(mn).
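
A sketch of this naive check, assuming the current tree is stored as an adjacency list adj (a hypothetical name): edge (u, v) closes a cycle exactly when v is already reachable from u in the tree.

```python
from collections import deque

def creates_cycle(adj, u, v):
    """BFS from u in the current tree; (u, v) closes a cycle iff v is reached."""
    seen = {u}
    queue = deque([u])
    while queue:
        node = queue.popleft()
        if node == v:
            return True              # v already reachable from u
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False                     # u and v are in different components
```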

However, checking whether adding (u, v) forms a cycle can be done in O(log n) using a disjoint-set data structure, which gives a run-time complexity of O(m log n) (the initial sort of the edges also fits in this bound).

Bonus: Disjoint-set data-structures

The following is an efficient implementation of a disjoint-set data structure in Python, with a detailed explanation in the comments.

## implementation of disjoint-set data structure (very efficient!)
# each "set" is a tree, and the "set representative" is the tree root
# hence, two nodes are in the same set if root(u) == root(v)

# initially, everything is in its own set, hence parent[node] = node
# (nn is the number of elements)
parent = list(range(nn))
size = [1] * nn

# to find the root, start from the node and keep going to parent[node];
# stop when parent[node] == node.
# in addition, we set parent[node] = root(node) so that the next time we look
# for root(node), we'll get there in 1 step! (this is "path compression")
def root(node):
    if parent[node] != node:
        parent[node] = root(parent[node])
    return parent[node]

# to merge two sets, we can simply do parent[root(u)] = root(v).
# to ensure that tree heights are O(log n), we make the root of the smaller
# tree a child of the root of the larger tree
# (since then a node's root can't change > log n times)
def merge(uu, vv):
    root_uu, root_vv = root(uu), root(vv)
    assert root_uu != root_vv, (root_uu, root_vv)
    if size[root_uu] < size[root_vv]:
        parent[root_uu] = root_vv    # smaller tree goes under the larger root
        size[root_vv] += size[root_uu]
    else:
        parent[root_vv] = root_uu
        size[root_uu] += size[root_vv]
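
As a quick sanity check, here is a small usage sketch of the structure above. The helpers are restated so the snippet runs on its own, with the number of elements fixed to 5 (an arbitrary choice for the demo).

```python
n = 5
parent = list(range(n))              # everything starts in its own set
size = [1] * n

def root(node):
    if parent[node] != node:
        parent[node] = root(parent[node])   # path compression
    return parent[node]

def merge(uu, vv):
    root_uu, root_vv = root(uu), root(vv)
    assert root_uu != root_vv
    if size[root_uu] < size[root_vv]:       # union by size
        parent[root_uu] = root_vv
        size[root_vv] += size[root_uu]
    else:
        parent[root_vv] = root_uu
        size[root_uu] += size[root_vv]

merge(0, 1)
merge(2, 3)
print(root(0) == root(1))   # True: 0 and 1 are in the same set
print(root(1) == root(2))   # False: {0, 1} and {2, 3} are still separate
merge(1, 2)
print(root(0) == root(3))   # True: all four are now in one set
```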

Correctness of Kruskal's algorithm

The cut property

The main idea behind both Prim's algorithm and Kruskal's algorithm is the cut property.

Consider any subset S of all vertices V of the graph. Consider all edges going from a vertex in S to a vertex outside S. The cut property states that, if e is the edge with minimum weight among all these edges, then e must be part of the minimum spanning tree.

Proof of cut property: The proof is quite simple. Assume the contrary, that is, the MST of the graph does not include e. Adding e to the MST creates a cycle. Since e goes from S to outside S, this cycle must cross the cut at least once more, at some other edge e'. Removing e' and keeping e yields another spanning tree, and since e has smaller weight than e', the new tree has smaller total weight. This is a contradiction, since the original tree was a minimum spanning tree. (If several crossing edges tie for the minimum weight, the same swap shows that some MST contains e.)

Correctness of Kruskal's algorithm

In Kruskal's algorithm, we consider the edges in order of increasing weight, and we include an edge only if it does not form a cycle with the already chosen edges.

Let the current edge be (u, v). Let S be the set of all vertices that u is connected to via edges already chosen. Since (u, v) does not form a cycle, v lies outside S. Moreover, any lighter edge from S to outside S would have been considered earlier, and since a crossing edge cannot form a cycle with the chosen edges, it would already have been added, putting its other endpoint inside S. Hence (u, v) is the minimum-weight edge from S to all vertices outside S, and by the cut property, (u, v) must be part of the MST.
