Understanding Mergesort

Mergesort is a sorting algorithm that is interesting to compare with quicksort.

Mergesort and Quicksort are both recursive Divide & Conquer algorithms
Quicksort is randomized, meaning that the number of recursive calls required cannot be determined ahead
Mergesort has a completely deterministic runtime - but this is typically slower than Quicksort. On the other hand, Quicksort can have a much worse runtime, if the input data is really bad (for example, quicksort doesn't perform too well on already sorted input).
Mergesort is like a good student at sorting: reliable; whereas Quicksort is more like the mad genius that can be brilliant and completely disappoint at other times.

The basic structure of quicksort was this:

partition the input data
recursively sort the lower half
recursively sort the upper half

The basic structure of mergesort is this:

recursively sort the lower half
recursively sort the upper half
merge the two sorted halfs: because both halfs are already sorted, this is much more simple algorithm than the partition function.

From this description you can see already what quicksort does different: by first looking at the data and using partial sorting information gained from looking at the input data it can perform better - in some cases - than mergesort can.

Mergesort has one additional drawback: it needs additional memory to store the data. Let's look at two slightly different implementations of Mergesort in Python, and then check the performance compared with Quicksort.

The Unsurprising Implementation

The first implementation works "without surprises", meaning it doesn't use python-specific list manipulation functions. The merge process looks difficult, but it really isn't:

create a temporary array of the same size as the input array (or rather: input range)
Since both halfs of the input are already sorted, we need to look only at a single element in both lists in order to fill the temporary array
There may be traililng elements in both halfs that need to be added to the end of the temp array
Finally, write back the temp array to the original array.

This is the code:

def merge(A, start, mid, stop):

    # create temporary array
    width = (stop+1-start)
    temp = [None] * width

    index_in_lower_half = start
    index_in_upper_half = mid+1
    write_index = 0

    # insert the contents of the already sorted array halfs into the temporary array
    while (index_in_lower_half <= mid) and (index_in_upper_half <= stop):

        a = A[index_in_lower_half]
        b = A[index_in_upper_half]

        if a < b:
            temp[write_index] = a
            index_in_lower_half += 1
        else:
            temp[write_index] = b
            index_in_upper_half += 1

        write_index += 1

    # insert any trailing elements from the lower halg
    if index_in_lower_half <= mid:

        while index_in_lower_half <= mid:
            temp[write_index] = A[index_in_lower_half]
            index_in_lower_half += 1
            write_index += 1

    # insert any trailing elements from the upper half
    if index_in_upper_half <= stop:

        while index_in_upper_half <= stop:
            temp[write_index] = A[index_in_upper_half]
            index_in_upper_half += 1
            write_index += 1

    # write back temp array to the original input array
    write_index -= 1
    while write_index >= 0:
        A[start+write_index] = temp[write_index]
        write_index -= 1

def mergesort_range1(A, start, stop):
    if start < stop:
        mid = start + (stop-start)/2
        mergesort_range1(A, start, mid)
        mergesort_range1(A, mid+1, stop)
        merge1(A, start, mid, stop)

def mergesort1(A):
    if not A:
        return

    mergesort_range1(A, 0, len(A)-1)
    return A

The surprising implementation

Python has lots of builtin list manipulation functions. In Python, allocating temporary lists from builtin functions can be faster than fine-tuned code that doesn't allocate additional memory! The following example is a more pythonesque way of doing mergesort, where the merge-part creates a list dynamically, on the fly.

def mergesort2b(array):
    size = len(array)
    if size < 2:
        return array
    else:
        middle = size / 2
        left = mergesort2(array[:middle])
        right = mergesort2(array[middle:])

        result = []
        i, j = 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                result.append(left[i])
                i += 1
            else:
                result.append(right[j])
                j += 1
        result += left[i:]
        result += right[j:]
        return result

This code has the same basic structure, but it doesn't pre-allocate an array of n items, instead it appends the results as it goes along, thereby skipping the need for a write index. And, surprisingly, it turns out to be faster than the code above. There is a further trick that can be applied: list.pop() is typically the fastest way to get an element from the end of a list, and reduce the list size. The only problem is: we're taking items from the front of the two halfs. Well, we can change that: by first reversing the two lists to be merged, we can then compare and pop items from the end of those two lists, and finally reverse them back for the rest. This is how the code looks like, and it was the fastest one I could come up with in Python:

def mergesort2(array):
    size = len(array)
    if size < 2:
        return array

    middle = size / 2
    left = mergesort2(array[:middle])
    right = mergesort2(array[middle:])

    # reverse both lists, so that we can take items from the back of the list
    left.reverse()
    right.reverse()

    result = []

    while left and right:

        # note: no read indices, and we're popping items from the back of the list
        if left[-1] <= right[-1]:
            result.append(left.pop())
        else:
            result.append(right.pop())

    # reverse lists back, so that they can be added to the result list
    left.reverse()
    right.reverse()
    if left:
        result.extend(left)
    if right:
        result.extend(right)
    return result

But how does it perform compared with Quicksort? Well, actually disappointingly so. For the standard test of 1 million floats, these were the results:

5.83 5.69 5.69 5.68 5.71

Method	Run 1	Run 2	Run 3	Run 4	Run 5
Quicksort Hetland
Quicksort Pyalda	6.38	6.31	6.30	6.37	6.33
Mergesort	8.10	8.08	8.08	8.09	8.01

Looking at this, you may wonder why bother with Mergesort at all? Well, do a simple example: try sorting an already sorted list of 1 million floats. You will find that while mergesort runs as expected (actually, a bit faster), both Quicksort implementations will break and throw a RuntimeError: maximum recursion depth exceeded in cmp, because their recursion depth will exceed the maximum. You cannot sort a million floats in (our recursive implementation of) quicksort, because it'll essentially require a million-deep stack, and unless you are really rich your machine won't support that.
This is what was mentioned at the top of this page: mergesort is boring, but reliable. quicksort is a genius, but sometimes it'll disappoint you badly.

And now, for a real surprise, turn to Heapsort next...