# Heaps

A heap is a common data structure that can be used to maintain a priority queue, implement a sorting mechanism, and generally keep track of extremal (minimal or maximal) values in an efficient manner. Note that the Python Standard Library contains a heap implementation in heapq, but we're going to write our own first and then take a peak inside the reference implementation.

The Wikipedia article on Heaps contains pointers to a couple of specialized heaps with additional features, but we're going to start off easy with a very simple one: a min heap.

## A simple Min Heap implementation

We're going to start with a min heap, and then check how we can generalize it to a MaxHeap. We need a capacity for the heap, and since each level is going to have 2**n items maximum, it makes sense to start off with a capacity that is a power of 2.

```class MinHeap(object):
def __init__(self, capacity = 2 ** 20):
# the capacity should be a power of two - since all levels in a heap
# are densely populated, and why would you want to waste space?
# the default is 1MB of arbitrary data
self.data = [None] * capacity
self.used = 0
```

Inserting an item is a two-step process:

• Insert the item at the end of the heap
• Push the item up until we've reached a proper insert position
```class MinHeap(object):
...
def Insert(self, value):
assert self.used < len(self.data)

self.data[self.used] = value
self.used += 1
self.PushUp(self.used-1)
```

That was the easy part. Now for the fun part: the item is at the end of the heap. we need to push it up to a position, where the heap property is satisfied.

```class MinHeap(object):
...
def PushUp(self, item_position):
assert item_position >= 0
assert item_position < self.used

# the value to be pushed up the tree
item_value = self.data[item_position]

# when does the loop end? when the item has reached
# the top position - there is no place higher than the top!
while item_position > 0:

# get index and value of parent
parent_position = GetIndexOfParent(item_position)
parent_value = self.data[parent_position]

# if the parent is less than this value, we need to push ourselves up
if parent_value > item_value:

self.data[parent_position] = item_value
self.data[item_position] = parent_value
item_position = parent_position

else:
# the parent is larger, so we are in a good position
break

return item_position
```

This code is missing an important function: `GetIndexOfParent`. Now, in the literature (e.g. Cormen et al) you'll find the following definitions:

• `PARENT(i) = i/2`
• ` LEFT(i) = 2i`
• ` RIGHT(i) = 2i+1`

This won't work with Python, because Python arrays are zero-based. So `i` is really `(i+1)`, and from the result we need to subtract 1 to get to the actual position. Sounds complicated? Let's try to insert it:

• `PARENT(i) = PARENT(n+1) - 1 = (n+1)/2 - 1 = (n+1)/2 - (2/2) = (n+1-2)/2 = (n-1)/2`
• ` LEFT(i) = LEFT(n+1) - 1 = 2*(n+1) - 1 = 2*n + 2- 1 = 2*n + 1`
• ` RIGHT(i) = RIGHT(n+1) - 1 = 2*(n+1) - 1 + 1 = 2 * n + 2`

Therefor we have:

```def GetIndexOfParent(n):
return (n-1)/2

def GetIndexOfLeftChild(n):
return (2*n)+1

def GetIndexOfRightChild(n):
return (2*n)+2
```

Getting the minimum is trivial:

```class MinHeap(object):
...
def GetMin(self):
assert self.used > 0
return self.data
```

Extracting the minimum is another important operation: we special case for a heap that has only one element (so there is no need to invoke Heapify()).

```class MinHeap(object):
...
def ExtractMin(self):
assert self.used > 0
result = self.data

if self.used > 1:

# move item from the end of the heap to the top, and heapify it (move it down
# until the heap property is satisified again)
self.data = self.data[self.used-1]
self.used -= 1
self.Heapify()

else:
self.used = 0

return result

def Heapify(self):

# start at the top
index_of_this_node = 0
value_of_this_node = self.data[index_of_this_node]

# loop end criteria: we've reached the last most item in the heap
while index_of_this_node < self.used:

# get left child (or None)
index_of_left_node = GetIndexOfLeftChild(index_of_this_node)
value_of_left_node = value_of_this_node
if index_of_left_node < self.used:
value_of_left_node = self.data[index_of_left_node]

# get right child (or None)
index_of_right_node = GetIndexOfRightChild(index_of_this_node)
value_of_right_node = value_of_this_node
if index_of_right_node < self.used:
value_of_right_node = self.data[index_of_right_node]

# find index of the node that is bigger than ours
index_of_max_node = index_of_left_node
value_of_max_node = value_of_left_node
if value_of_right_node < value_of_max_node:
value_of_max_node = value_of_right_node
index_of_max_node = index_of_right_node

# if there is a match, swap the nodes
if value_of_max_node < value_of_this_node:
self.data[index_of_this_node] = value_of_max_node
self.data[index_of_max_node] = value_of_this_node
index_of_this_node = index_of_max_node

else:
break;
```

Let's add some more helper functions:

```class MinHeap(object):
...
def IsEmpty(self):
return self.used == 0

def GetCount(self):
return self.used

def GetDepth(self):
if self.used < 2:
return self.used

level = 1
pos = self.used -1

while pos > 0:
pos = GetIndexOfParent(pos)
level += 1

return level
```

## Pretty-Printing a Heap

One of the more interesting functions to write for a heap class is a `repr()` method to print the heap as a string. We're going to go for a simple plaintext function. For example, a heap with 15 items could look like this:

```           3
15          9
17    15    12    34
44 22 40 32 46 23 47
```

The code for this has several steps:

• We enumerate all items
• Convert the items on this level to strings and store them in a list
• As we go along, keep track of the maximum string size needed to represent a single element.
• We expect 2i items on level i
• A new level starts when the items on the current level have reached the expected size.
• We enumerate the list of string lists in reverse order
• For each item in the current level:
• Center it around the expected cell size for this level, adding spaces if necessary
• Convert the list of cells to a single string
• Adjust the expected cell size: cell size grows by 2n
• Finally, enumerate all strings again from the top down: adding newlines to combine all the lines
```class MinHeap(object):
...
def __repr__(self):
# special case: empty heap
if self.used == 0:
return "<MinHeap: None>"

max_cell_width = 1

# We expect 2<sup>i</sup> items on level i
expected_size_of_this_level = 1
levels = []
reprs_on_this_level = None

# We enumerate all items
for i in range(self.used):
if reprs_on_this_level is None:
reprs_on_this_level = []
levels.append(reprs_on_this_level)

# Convert the items on this level to strings and store them in a list
k = repr(self.data[i])
reprs_on_this_level.append(k)

# As we go along, keep track of the maximum string size needed to represent a single element.
if len(k) > max_cell_width:
max_cell_width = len(k)

if len(reprs_on_this_level) == expected_size_of_this_level:

# We expect 2**i items on level i
expected_size_of_this_level *= 2
reprs_on_this_level = None

max_cell_width += 1

# Center a single word around the expected cell size for this level, adding spaces if necessary
def center(word, max_cell_width):
result = [' '] * max_cell_width

word = list(word)
insert_pos = (max_cell_width - len(word))/2
for c in word:
result[insert_pos] = c
insert_pos += 1
return "".join(result)

# We enumerate the list of string lists in reverse order
for i in range(len(levels)-1,-1,-1):

level = levels[i]
for k in range(len(level)):
level[k] = center(level[k], max_cell_width)

# Convert the list of cells to a single string
levels[i] = "".join(levels[i])

# Adjust the expected cell size: cell size grows by 2**n
max_cell_width *= 2

# Finally, enumerate all strings again from the top down: adding newlines to combine all the lines
return "\n".join(levels)
```

## Comparing our code with the reference implementation

As mentioned in the beginning, the python standard library contains a reference implementation of the heap data structure. You can read the source on the web at http://svn.python.org/projects/python/trunk/Lib/heapq.py. A couple of comments on that code:

• The code starts out with a very interesting and thorough discussion of heaps - much better than I can explain it ;)
• The code doesn't define a class - it imposes the heap structure on items in an existing list. It is vaguely interesting but ultimately pointless to discuss which approach is better for reuse of code...
• `heappush` is essentially the same as the `Insert()` method above: insert at the end, then move the item to the correct insert position
• `heappop` also has the same essential structure, but
• `_siftup` (the equivalent of `Heapify()` above) is much more dense (and probably considerably faster). But note the extensive comment describing the operation