CommonLounge Archive

Hands-on: K-sort (with progressive 'hint-by-hint' solution)

July 20, 2018

We’ve just implemented the heap data structure and the heap-sort algorithm. Now, we’ll solve the K-sort problem using heaps. You can work in any programming language of your choice - Python, C++, Java, or anything else you want. Let’s get started!

Problem Statement

We’re given $k$ sorted arrays. We need to combine the sorted arrays into a single sorted array in O($n \log k$) time, where $n$ is the total number of elements in all the arrays combined.

Note: The desired complexity is O($n \log k$), not O($n \log n$).

Constraints: $n \leq 5000000$ and $k \leq 50$.

IO description

Input description: The first line contains the number of test cases T. Each test case begins with a line with a single number $k$ (number of sorted arrays). Each of the next $k$ lines describes a sorted array. The first number in the line is $n_i$, the number of elements in array $i$. Which is then followed by $n_i$ numbers which make the sorted array.

Sample Input:

2
2
3 16 30 126
2 21 113
4
2 95 95
3 16 30 126
2 21 113
1 42

Output description: Output should contain T lines. Each line should contain the answer for the respective test case - the combined sorted array.

Sample output:

16 21 30 113 126
16 21 30 42 95 95 113 126

Explanation:

The first test case consists of 2-sorted arrays that need to be combined. The second test case consists of 4-sorted arrays that need to be combined.

Grading your solution

You can find all the required files to grade your solution here.

First run your solution with the provided input

{COMMAND TO RUN Python/C++/Java CODE} < input > output

Then, grade your output using the grader and answer provided.

python grader.py output answer

Progressive ‘hint-by-hint’ solution

Only look at the hints if you can’t solve the problem for at-least 20-minutes. Read only one hint that you could not figure out on your own. Then spend at-least 20-minutes trying to solve the problem again.

Hint 1:

We are given k sorted arrays. Somehow, we need to use the sorted-ness of the given arrays to do sorting in O(n log k) instead of O(n log n).

Hint 2:

When we were sorting a general array of size n, we had to add all the elements to a heap, and hence the heap operations took log n time each. Now, it’s possible for us to ensure the total size of the heap is at-most k, so each operation takes time log k.

Hint 3:

We essentially need to perform the merge step from merge-sort, but with k arrays instead of 2. We can use a heap to do this efficiently.

Full solution

Here is the official solution for this assignment. The code executes in ~10 seconds. If you implemented your solution in C++ or Java, you should expect your solution to run at-least 5x faster.

A note on the run-time

The above heap-based solution executes in ~10 seconds.

If we simply combine all the elements into a single array and sort the array using quick-sort, then we get the following run-times.

  1. Our quick-sort implementation: 47 seconds.
  2. Our in-place quick-sort implementation: 30 seconds.
  3. Python’s optimized quick-sort implementation: 6 seconds.

As we can see, Pythons optimized implementation for quick-sort is faster than the optimization we achieved because of O($\log k$) vs O($\log n$), however we achieve a significant speed-up when we use unoptimized quick-sort implementation, even if we use in-place quick-sort.


© 2016-2022. All rights reserved.