Merge-sort: Detailed tutorial with Python & C++ implementation

September 15, 2016

Merge sort is an efficient and recursive algorithm for sorting an array. To sort an array recursively, merge-sort splits the array into two and sorts each half. Then, it uses the merge operation to combine the two sorted halfs into a single sorted array.

First, we will see how the merge operation works. Then, we will see how the merge-sort algorithm works as whole.

Merge operation

Suppose that we already have two sorted arrays, say A and B. Then the merge operation merges them into a single sorted array C.

For example, let’s say we have

A = [22, 43, 59]
B = [17, 31, 64]

We want to create a single sorted array C, so:

C = [17, 22, 31, 44, 59, 64]

The merge operation works as follows.

We initialize two pointers, i and j, which point to the first element of the array A and array B respectively. C starts as an empty array.
If A[i] is less than (or equal to) B[j], we add element A[i] to array C and increment i so that it points to the next element in array A. Otherwise, we add element B[j] to array C and increment j so that it points to the next element in array B.
Repeat step 2 while we have not exhausted all elements of arrays A and B. At some point, one of i or j will reach the end of the array. When that happens, we add the remaining elements of the other array to C.

For our example above, we start with:

A = [22, 43, 59] and B = [17, 31, 64]
C = []
i = 0 (points to 22) and j = 0 (points to 17)

Step 2:

We compare A[0] and B[0]. B[0] = 17 is smaller. So we add 17 to C and update j.

C = [17]
i = 0 (points to 22) and j = 1 (points to 31)

Step 3 = keep repeating above:

We compare A[0] and B[1]. A[0] = 22 is smaller. So we add 22 to C and update i.

C = [17, 22]
i = 1 (points to 43) and j = 1 (points to 31)

We compare A[1] and B[1]. B[1] = 31 is smaller. So we add 31 to C and update j.

C = [17, 22, 31]
i = 1 (points to 43) and j = 2 (points to 64)

We compare A[1] and B[2]. A[1] = 43 is smaller. So we add 43 to C and update i.

C = [17, 22, 31, 43]
i = 2 (points to 59) and j = 2 (points to 64)

We compare A[2] and B[2]. A[2] = 59 is smaller. So we add 59 to C and update i.

C = [17, 22, 31, 43, 59]
i = 3 (reached end) and j = 2 (points to 64)

Now that we have reached the end of A, we can just add the remaining elements of B to C. Since j = 2, there’s only one remaining element in B: 64.

C = [17, 22, 31, 43, 59, 64]

After going through all values in A, we append the remaining elements of array B without any further comparisons.

Done! We have built the sorted array C.

The two arrays to be merged need not be of the same sizes. The merge operation can work on arrays of different sizes.

Run-time: Each element is processed and put into array C only once. Hence, the time complexity of merge operation is O(n), where n is the total number of elements in A and B combined.

Python implementation of merge operation

def merge(left, right):
    left_length = len(left)
    right_length = len(right)
    array = []
    i = 0
    j = 0
    while i < left_length and j < right_length:
        if left[i] < right[j]:
            array.append(left[i])
            i += 1
        else:
            array.append(right[j])
            j += 1
    while i < left_length:
        array.append(left[i])
        i += 1
    while j < right_length:
        array.append(right[j])
        j += 1
    return array
a = [22, 43, 59]
b = [17, 31, 64]
print("Initial arrays:", a, b)
print("Merged array:", merge(a, b))

C++ implementation of merge operation

void merge(int a[], int b[], int a_size, int b_size)
{
    /*
        Size of the new array c 
        will be the sum of a_size and b_size
    */
    int c_size = a_size + b_size;
    int c[c_size];
    // Keep adding the smaller element to C
    int i = 0, j = 0, k = 0;
    while (i < a_size && j < b_size)    // loop_0
    {
        if (a[i] <= b[j])
            c[k++] = a[i++];
        else 
            c[k++] = b[j++];
    }
    // Add remaining elements from A or B to C. Only *one* of the below loops execute.
    while (i < a_size)                 // loop_1
        c[k++] = a[i++];
    while (j < b_size)                 // loop_2
        c[k++] = b[j++];
    cout << "Sorted array" << endl;
    for (int i = 0; i < c_size; i++ )
        cout << c[i] << " ";
}
int main() 
{
    // Initialising the arrays a and b
    int size_a = 3, size_b = 3; 
    int a[] = {22, 43, 59};    
    int b[] = {17, 31, 64};
    cout << "Initial arrays" << endl;
    for (int i = 0; i < 3; i++) cout << a[i] << " ";
    cout << endl;
    for (int i = 0; i < 3; i++) cout << b[i] << " ";
    cout << endl;
    merge(a, b, size_a, size_b);
    return 0;
}

In the above code, exactly one of loop_1 and loop_2 will run depending on which of the initial arrays has leftover elements.

Merge Sort Algorithm

Now that we know how to merge two sorted arrays into a single sorted array, let’s see how merge-sort works.

We want to sort an array with N elements. For example, let’s say we have an array of $8$ elements. If we were given two already sorted arrays of size $4$ each, then using the merge operation we can obtain a single sorted array of length $8$.

The question that arises here is how do we get the two sorted arrays. This is achieved by recursively calling the function mergeSort() on array’s first half and second half. Recall that recursion is a process in which a function calls itself — we learnt about it in this CommonLounge tutorial on Recursion.

Here’s the pseudo-code for the above:

function mergeSort(array A)
    if length(A) <= 1: don't do anything (base case)
    recursively sort first-half of A
    recursively sort second-half of A
    merge sorted halves into single sorted array

The mergeSort() function calls itself on the first-half and second-half of the array A. The base case for recursion is when we call mergeSort() for an array of size 1, since a single element array is always sorted.

From two arrays of size $1$ to get a sorted array of size $2$. From two sorted arrays of size 2, we get a sorted array of size 4. This process repeats itself and we obtain a sorted array of any given length.

Animation for merge-sort

Python 3 implementation

def merge(left, right):
    left_length = len(left)
    right_length = len(right)
    array = []
    i = 0
    j = 0
    while i < left_length and j < right_length:
        if left[i] < right[j]:
            array.append(left[i])
            i += 1
        else:
            array.append(right[j])
            j += 1
    while i < left_length:
        array.append(left[i])
        i += 1
    while j < right_length:
        array.append(right[j])
        j += 1
    return array
def merge_sort(array):
    length = len(array)
    # base case
    if length <= 1: return array
    # recursion
    midpoint = length // 2
    left_half = merge_sort(array[:midpoint])
    right_half = merge_sort(array[midpoint:])
    return merge(left_half, right_half)
array = [10, 20, 5, 8, 7, 2, 15, 12]
print("Initial array:", array)
print("Sorted array:", merge_sort(array))

C++ Implementation

Code for merge-sort in C++:

#include <bits/stdc++.h>
using namespace std;
void merge(int arr[], int left, int right)
{
    int mid = (left + right) / 2;
    int lsize = mid - left + 1, rsize = right - mid;
    int pos = left;
    int leftarr[lsize];
    int rightarr[rsize];
    // Create a copy of left and right sorted arrays.
    for (int i = 0; i < lsize; i++)
        leftarr[i] = arr[left+i];
    for (int i = 0; i < rsize; i++)
        rightarr[i] = arr[mid+1+i];
    // Keep adding the smaller element to C
    int i = 0, j = 0; 
    while (i < lsize && j < rsize)
    {
        if (leftarr[i] <= rightarr[j]) 
            arr[pos++] = leftarr[i++];      
        else 
            arr[pos++] = rightarr[j++];      
    }
    // Add the remaining elements in either of the two arrays.
    while (i < lsize)
        arr[pos++] = leftarr[i++];
    while (j < rsize)
        arr[pos++] = rightarr[j++];
}
/*
    Function mergeSort recursively call itself on the left half 
    and right half and then calls merge function to merge the two
    sorted halves
*/
void mergeSort(int arr[], int left, int right)
{
    // The following if condition is the base case. 
    if (left == right)    
        return;
    int mid = (left + right) / 2;
    mergeSort(arr, left, mid);
    mergeSort(arr, mid+1, right);
    merge(arr, left, right);
}
int main()
{
    int a[] = {10, 20, 5, 8, 7, 2, 15, 12};
    int n = 8;
    // Display the array 
    cout << "Initial array: "; 
    for (int i = 0; i < n; i++)
        cout << a[i] << " ";
    cout << endl; 
    /*
        Call function mergeSort on input array
        mergeSort(array, start, end)
    */
    mergeSort(a, 0, n-1);
    // Display the array 
    cout << "Sorted array: "; 
    for (int i = 0; i < n; i++)
        cout << a[i] << " ";
    cout << endl; 
    return 0;
}

Execution flow

Let us understand the execution flow of the merge-sort implementation given above by the following example.

Let the given array be a = [8, 6, 4, 2].

mergeSort(a, 0, 3) is called from the main function.
mergeSort(a, 0, 1) and mergeSort(a, 2, 3) are called.
From mergeSort(a, 0, 1), the calls to mergeSort(a, 0, 0) and mergeSort(a, 1, 1) will return as the condition left == right holds true. It then merges them to get [6, 8].
Similarly, the call the second-half, mergeSort(a, 2, 3), would return [2, 4].
Code execution returns to mergeSort(a, 0, 3). The two halves are merged and we get the following array: a = [2, 4, 6, 8].

Run-time Analysis

The easiest way to understand the run-time complexity of merge-sort is to look at the recursion graph.

Since we keep on dividing $n$ by $2$ in each subsequent layer till we reach a single element array, there are $O(\log n)$ layers in the tree.
Within each layer, each element appears exactly once. Hence, each layer executes in $O(n)$ operations.

Therefore, the algorithm complexity is $O(n\times \log n)$.

Recursion graph of merge-sort

For example, above you can see the diagram of the recursion graph of merge-sort on an array of size $7$. The graph has roughly $2\times n\log_2 n$ layers.

Each of the layers in the lower half of the graph executes in $O(n)$ operations as the merge operation has linear complexity.

Combining [27, 38] with [3, 43] takes 4 operations, and combining [9, 82] and [10] takes 3 operations. For that entire layer, the number of operations is roughly 7 operations which is basically $O(n)$. Similarly, we can observe for all layers where the merge operation executes.

Hence, the complexity of merge sort is $O(n\log n)$.

For n = 100,000, merge-sort is about 5,000 times faster than an O(n²) sorting algorithm!

Video Tutorial / Review

The first 2.5 minutes of the video above explain the merge-sort algorithm. The next 3 minutes (optional) compare merge-sort and quick-sort in one case where quicksort’s pivot choices get lucky, and another where quicksort gets unlucky.

Summary

In this tutorial you learned about

merge sort — a recursive algorithm for sorting
merge operation — is the core of the merge-sort algorithm. it merges two sorted arrays into a single sorted array
References is an efficient algorithm which executes in $O(n\log n)$