CommonLounge Archive

Kruskal's algorithm for Minimum Spanning Tree

July 04, 2018

Given an undirected weighted graph, a minimum spanning tree (MST) is a subset of the edges of the graph which form a tree and have the minimum total edge weight. For a MST to exist, the graph must be connected (that is, every pair of nodes must be reachable from each other).

Example of Minimum Spanning Tree. Total edge weight = 5 + 8 + 8 + 4 + 11 + 6 = 42

Kruskal’s algorithm build’s the minimum spanning tree one edge at a time. To see what that means, watch the following video.

Video with step-by-step example

Algorithm steps

  1. Start with an empty tree.
  2. Iterate over the edges sorted by weight.
  3. If adding edge (u, v) to the current tree does not create a cycle, then add it, else discard it.
  4. Done!

How to check for a cycle?

Given a graph with $n$ nodes and $m$ edges, if we use an adjacency list representation for the current tree, and BFS / DFS to see if adding edge (u, v) forms a cycle, then the algorithm complexity is O($mn$).

However, checking if adding (u, v) forms a cycle can be done in O($\log n$) using a disjoint-set data structure, which gives a run-time complexity of O($m \log n$).

Bonus: Disjoint-set data-structures

The following is an efficient implementation of disjoint-set data-structures in Python along-with a detailed explanation in the comments.

from collections import defaultdict
import heapq
# each "set" is a tree, and the "set representative" is the tree root
# hence, two nodes are in the same set if root(u) == root(v)
# initially, everything is in its own set. hence parent(node) = node
n = 5
parent = list(range(n))
size = [1]*n
# to find the root, start from the node and keep going to parent[node]. stop when parent[node] = node.
# in addition, we set "parent[node] = root(node)" so that next time we look for root(node), we'll get there in 1 step!
def root(node):
    if not parent[node] == node:
        parent[node] = root(parent[node])
    return parent[node]
# to merge the sets, we can simply do parent[root(u)] = root(v)
# to ensure that tree heights are O(log n), we make the root of the smaller tree a child of the root of the larger tree
# (since a node's root can't change > log n times)
def merge(uu, vv):
    root_uu, root_vv = root(uu), root(vv)
    assert root_uu != root_vv, (root_uu, root_vv)
    if size[root_uu] < size[root_vv]:
        parent[root_uu] = root_vv
        size[root_uu] += size[root_vv]
    else:
        parent[root_vv] = root_uu
        size[root_vv] += size[root_uu]
# initialisation
graph = defaultdict(list)
dist = [float('inf')]*n
cost = 0
mst = []
# adding the following edges
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 4), (1, 3)]
weight = [1, 4, 2, 6, 7, 5, 3];
sortedlist = []
for i in range (0, len(weight)):
    sortedlist.append((weight[i], edges[i]))
# sorts in ascending order of weights of edges
sortedlist.sort()
for i in range(0, len(sortedlist)):
    # If the two nodes are in disjoint sets then the
    # roots would be different, so we merge them
    if(root(sortedlist[i][1][0]) != root(sortedlist[i][1][1])):
        merge(sortedlist[i][1][0], sortedlist[i][1][1])
        cost += sortedlist[i][0]
        mst.append((sortedlist[i][1][0], sortedlist[i][1][1]))
print("Minimum cost of spanning tree is: " + str(cost))
print("Following are the edges in MST: ")
for i in range(0,len(mst)):
    print(str(mst[i][0]) + " " + str(mst[i][1])) 

The following is the cpp code for Kruskal’s implementation to find minimum spanning tree:

#include <bits/stdc++.h>
using namespace std;
int n,m;
vector<int>rt, sz;
vector<pair<int, int> >mst;
void init()
{
  /*
    Initialises the root of every
    node as itself and size 
    disjoint set as 1
  */
  for (int i = 0; i < n; i++)
  {
    rt[i] = i;
    sz[i] = 1;
  }
}
int root(int i)
{
  while(i != rt[i])
  {
    rt[i] = rt[rt[i]];
    i = rt[i];
  }
  return i;
}
void join(int a, int b)
{
  int rta = root(a);
  int rtb = root(b);
  int sza = sz[rta];
  int szb = sz[rtb];
  /*
    To merge the sets, we can simply do parent[root(u)] = root(v).
    To ensure that tree heights are O(log n), we make the root of 
    the smaller tree a child of the root of the larger tree.
    (since a node's root can't change more than log n times)
   */
  if(sza >= szb)
  {
    rt[rtb] = rt[rta];
    sz[rta] += sz[rtb];
    sz[rtb] = 0;
  }
  else
  {
    rt[rta] = rt[rtb];
    sz[rtb] += sz[rta];
    sz[rta] = 0;
  }
}
int main() 
{
  n = 5;
  pair<int, int>edges[] = { {0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 0}, {1, 4}, {1, 3} };
  int weight[] = {1, 4, 2, 6, 7, 5, 3};
  m = sizeof(weight) / sizeof(weight[0]);
  rt.resize(n);
  sz.resize(n);
  init();
  vector<pair<int, int> >v;
  for (int i = 0; i < m; i++)
  {
    v.push_back({weight[i], i});
  }
  // sorts in ascending order of weights of edges
  sort(v.begin(), v.end());
  int cost = 0;
  for (int i = 0; i < m; i++)
  {
    int idx = v[i].second;
    /* If the two nodes are in disjoint sets
       then the roots would be different, in which
       case we join the two nodes
    */
    if (root(edges[idx].first) != root(edges[idx].second))
    {
      join(edges[idx].first, edges[idx].second);
      cost += v[i].first;
      mst.push_back({edges[idx].first, edges[idx].second});
    }
  }
  cout << "Minimum cost of spanning tree is: " << cost << endl;
  cout << "Following are the edges in MST: " << endl;
  for(auto i: mst)
    cout << i.first << " " << i.second << endl;
  return 0;
}

Correctness of Kruskal’s algorithm

The cut property

The main idea behind both Prim’s algorithm and Kruskal’s algorithm is the cut property.

Consider any subset S of all vertices V of the graph. Consider all edges going from a vertex in S to a vertex outside S. The cut property states that, if $e$ is the edge with minimum weight among all these edges, then $e$ must be part of the minimum spanning tree.

Proof of cut property: The proof is quite simple. Assume the contrary, that is, the MST of the graph does not include $e$. Since the minimum spanning tree spans all vertices, it has at-least one edge $e'$ going from S to outside S. If this edge is replaced by $e$, we get a spanning tree with lesser weight. This is a contradiction, since the original tree was a minimum spanning tree.

Correctness of Kruskal’s algorithm

In Kruskal’s algorithm, in each step, we consider edges sorted by weight. We include an edge only if connecting it’s end-points don’t form a cycle with already chosen edges.

Let the current edge be (u, v). Consider S = set of all edges u is connected to via edges already chosen in the MST. The current edge (u, v) is the minimum-weight edge from set S to all vertices outside S. Hence, (u, v) must be part of the MST.


© 2016-2022. All rights reserved.