Hands-on: Implementing Merge-sort

June 29, 2018

Now that we have learnt about merge-sort, it’s time to implement it. You can work in any programming language of your choice - Python, C++, Java, or anything else you want. Let’s get started!

Implementing Merge-sort

Python

Below, we provide a short Python code template for you to implement merge-sort.

# merge_sort.py 
def merge_sort(array):
    # TODO: base case 
    # TODO: split in two and recurse 
    return merge(left, right)
def merge(left, right):
    merged = list()
    ii, jj = 0, 0 # pointers for left and right 
    # TODO: create merged lists 
    # check if pointers have reached the end of the list 
    assert ii == len(left), (ii, len(left))
    assert jj == len(right), (jj, len(right))
    return merged
array = [5, 10, 8, 7, 3, 7, 6, 12, 2, 7]
result = merge_sort(array)
assert result == [2, 3, 5, 6, 7, 7, 7, 8, 10, 12], result

C++, Java, etc

If you’re not using Python, have your implementation accept test cases from standard input, and print results to standard output.

Input description: The first line contains the number of test cases T. This is followed by T lines. Each line starts with a number N, the size of the array. On the same line, the N numbers of the array are listed one after the other.

Sample input:

2
3 227 198 19
10 458 687 945 378 479 421 33 295 922 840

Output description: Output should contain T lines. Each line should just contain the numbers in sorted order, separated by spaces.

Sample output:

19 198 227
33 295 378 421 458 479 687 840 922 945

Notes

The trickiest part about implementing merge-sort is the merge function. Specifically, be careful about handling the pointers when one of the segments has been scanned completely, but the other segment’s pointer still has some elements to cover.

Grading your solution

Python

You can use the following file to grade your solution.

# grader.py
from __future__ import print_function
import datetime
import random
from merge_sort import merge_sort
npassed, nfailed = 0, 0
for itest in range(1, 13):
    print('Test case:', itest)
    nelements = random.randint(int((10**(itest/2.))/2), int(10**(itest/2.)))
    print('Array size:', nelements)
    array = [random.randint(1, 100*nelements) for _ in range(nelements)]
    tic = datetime.datetime.now()
    submission = merge_sort(array)
    toc = datetime.datetime.now()
    answer = sorted(array)
    correct = len(submission) == len(answer) and (submission == answer)
    if correct:
        print('PASSED (Runtime = %.2f seconds)' % (toc-tic).total_seconds())
        npassed += 1
    else:
        print('FAILED')
        nfailed += 1
        print(array)
        print(submission)
        print(answer)
    print('='*100)
print('TOTAL PASSED: %d. TOTAL FAILED: %d.' % (npassed, nfailed))

Save the file above, and then run the following command:

python grader.py

and you should see the results of running the grader.

C++, Java, etc

You’ll find all the files required to check your solution here: Algorithms - Sorting. In particular, the folder includes 3 files - input, answer and grader.py.

Do the following to check your solution.

{COMMAND TO RUN C++/JAVA CODE} < input > output 
python grader.py output answer

Solution

The solution for this assignment can be found here: merge_sort.py. The solution includes two implementations of the merge() function. The first one is easier to read and understand. The second one, merge_concise(), is more concise. Either way, the code takes about 5 seconds to run for input sizes around 700,000. Note that if you implemented the solution in C++ / Java, your code should be executing about > 5x faster.