Historical Note

This page was migrated from the original p-nand-q.com site which was last updated in 2015. The content has been preserved exactly as it was, with only formatting updated for modern browsers. Over the coming days and weeks, the content will be reviewed and may be updated for accuracy and relevance. If you find any issues, please contact me.

Heaps

A heap is a common data structure that can be used to maintain a priority queue, implement a sorting mechanism, and generally keep track of extremal (minimal or maximal) values in an efficient manner. Note that the Python Standard Library contains a heap implementation in heapq, but we're going to write our own first and then take a peak inside the reference implementation.

The Wikipedia article on Heaps contains pointers to a couple of specialized heaps with additional features, but we're going to start off easy with a very simple one: a min heap.

A simple Min Heap implementation

We're going to start with a min heap, and then check how we can generalize it to a MaxHeap. We need a capacity for the heap, and since each level is going to have 2**n items maximum, it makes sense to start off with a capacity that is a power of 2.

class MinHeap(object):
    def __init__(self, capacity = 2 ** 20):
        # the capacity should be a power of two - since all levels in a heap
        # are densely populated, and why would you want to waste space?
        # the default is 1MB of arbitrary data
        self.data = [None] * capacity
        self.used = 0

Inserting an item is a two-step process:

class MinHeap(object):
    ...
    def Insert(self, value):
        assert self.used < len(self.data)

        self.data[self.used] = value
        self.used += 1
        self.PushUp(self.used-1)

That was the easy part. Now for the fun part: the item is at the end of the heap. we need to push it up to a position, where the heap property is satisfied.

class MinHeap(object):
    ...
    def PushUp(self, item_position):
        assert item_position >= 0
        assert item_position < self.used

        # the value to be pushed up the tree
        item_value = self.data[item_position]

        # when does the loop end? when the item has reached 
        # the top position - there is no place higher than the top!
        while item_position > 0:

            # get index and value of parent
            parent_position = GetIndexOfParent(item_position)
            parent_value = self.data[parent_position]

            # if the parent is less than this value, we need to push ourselves up
            if parent_value > item_value:

                self.data[parent_position] = item_value
                self.data[item_position] = parent_value
                item_position = parent_position

            else:
                # the parent is larger, so we are in a good position
                break

        return item_position

This code is missing an important function: GetIndexOfParent. Now, in the literature (e.g. Cormen et al) you'll find the following definitions:

This won't work with Python, because Python arrays are zero-based. So i is really (i+1), and from the result we need to subtract 1 to get to the actual position. Sounds complicated? Let's try to insert it:

Therefor we have:

def GetIndexOfParent(n):
    return (n-1)/2

def GetIndexOfLeftChild(n):
    return (2*n)+1

def GetIndexOfRightChild(n):
    return (2*n)+2

Getting the minimum is trivial:

class MinHeap(object):
    ...
    def GetMin(self):
        assert self.used > 0
        return self.data[0]

Extracting the minimum is another important operation: we special case for a heap that has only one element (so there is no need to invoke Heapify()).

class MinHeap(object):
    ...
    def ExtractMin(self):
        assert self.used > 0
        result = self.data[0]

        if self.used > 1:
        
            # move item from the end of the heap to the top, and heapify it (move it down
            # until the heap property is satisified again)
            self.data[0] = self.data[self.used-1]
            self.used -= 1
            self.Heapify()

        else:
            self.used = 0

        return result

    def Heapify(self):

        # start at the top
        index_of_this_node = 0
        value_of_this_node = self.data[index_of_this_node]

        # loop end criteria: we've reached the last most item in the heap
        while index_of_this_node < self.used:

            # get left child (or None)
            index_of_left_node = GetIndexOfLeftChild(index_of_this_node)
            value_of_left_node = value_of_this_node
            if index_of_left_node < self.used:
                value_of_left_node = self.data[index_of_left_node]

            # get right child (or None)
            index_of_right_node = GetIndexOfRightChild(index_of_this_node)
            value_of_right_node = value_of_this_node
            if index_of_right_node < self.used:
                value_of_right_node = self.data[index_of_right_node]

            # find index of the node that is bigger than ours
            index_of_max_node = index_of_left_node
            value_of_max_node = value_of_left_node
            if value_of_right_node < value_of_max_node:
                value_of_max_node = value_of_right_node
                index_of_max_node = index_of_right_node

            # if there is a match, swap the nodes
            if value_of_max_node < value_of_this_node:
                self.data[index_of_this_node] = value_of_max_node
                self.data[index_of_max_node] = value_of_this_node
                index_of_this_node = index_of_max_node

            else:
                break;

Let's add some more helper functions:

class MinHeap(object):
    ...
    def IsEmpty(self):
        return self.used == 0

    def GetCount(self):
        return self.used
        
    def GetDepth(self):
        if self.used < 2:
            return self.used

        level = 1
        pos = self.used -1

        while pos > 0:
            pos = GetIndexOfParent(pos)
            level += 1

        return level

Pretty-Printing a Heap

One of the more interesting functions to write for a heap class is a repr() method to print the heap as a string. We're going to go for a simple plaintext function. For example, a heap with 15 items could look like this:

3            
     15          9      
  17    15    12    34  
44 22 40 32 46 23 47

The code for this has several steps:

class MinHeap(object):
    ...
    def __repr__(self):
        # special case: empty heap
        if self.used == 0:
            return "<MinHeap: None>"

        max_cell_width = 1

        # We expect 2i items on level i
        expected_size_of_this_level = 1
        levels = []
        reprs_on_this_level = None

        # We enumerate all items
        for i in range(self.used):
            if reprs_on_this_level is None:
                reprs_on_this_level = []
                levels.append(reprs_on_this_level)

            # Convert the items on this level to strings and store them in a list
            k = repr(self.data[i])
            reprs_on_this_level.append(k)

            # As we go along, keep track of the maximum string size needed to represent a single element.
            if len(k) > max_cell_width:
                max_cell_width = len(k)

            if len(reprs_on_this_level) == expected_size_of_this_level:

                # We expect 2**i items on level i
                expected_size_of_this_level *= 2
                reprs_on_this_level = None

        max_cell_width += 1

        # Center a single word around the expected cell size for this level, adding spaces if necessary
        def center(word, max_cell_width):
            result = [' '] * max_cell_width

            word = list(word)
            insert_pos = (max_cell_width - len(word))/2
            for c in word:
                result[insert_pos] = c
                insert_pos += 1
            return "".join(result)

        # We enumerate the list of string lists in reverse order
        for i in range(len(levels)-1,-1,-1):

            level = levels[i]
            for k in range(len(level)):
                level[k] = center(level[k], max_cell_width)

            # Convert the list of cells to a single string
            levels[i] = "".join(levels[i])

            # Adjust the expected cell size: cell size grows by 2**n
            max_cell_width *= 2

        # Finally, enumerate all strings again from the top down: adding newlines to combine all the lines
        return "\n".join(levels)

Comparing our code with the reference implementation

As mentioned in the beginning, the python standard library contains a reference implementation of the heap data structure. You can read the source on the web at http://svn.python.org/projects/python/trunk/Lib/heapq.py. A couple of comments on that code: