Understanding how to create a heap in Python

this is a slightly modified version of the code found here : http://code.activestate.com/recipes/577086-heap-sort/

def HeapSort(A,T):
    def heapify(A):
        start = (len(A) - 2) / 2
        while start >= 0:
            siftDown(A, start, len(A) - 1)
            start -= 1

    def siftDown(A, start, end):
        root = start
        while root * 2 + 1 <= end:
            child = root * 2 + 1
            if child + 1 <= end and T.count(A[child]) < T.count(A[child + 1]):
                child += 1
            if child <= end and T.count(A[root]) < T.count(A[child]):
                A[root], A[child] = A[child], A[root]
                root = child
            else:
                return

    heapify(A)
    end = len(A) - 1
    while end > 0:
        A[end], A[0] = A[0], A[end]
        siftDown(A, 0, end - 1)
        end -= 1


if __name__ == '__main__':
    text = "the quick brown fox jumped over the the quick brown quick log log"
    heap = list(set(text.split()))
    print heap

    HeapSort(heap,text)
    print heap

Output

['brown', 'log', 'jumped', 'over', 'fox', 'quick', 'the']
['jumped', 'fox', 'over', 'brown', 'log', 'the', 'quick']

you can visualize the program here http://goo.gl/2a9Bh


In Python 2.X and 3.x, heaps are supported through an importable library, heapq. It supplies numerous functions to work with the heap data structure modelled in a Python list. Example:

>>> from heapq import heappush, heappop
>>> heap = []
>>> data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
>>> for item in data:
        heappush(heap, item)

>>> ordered = []
>>> while heap:
        ordered.append(heappop(heap))

>>> ordered
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> data.sort()
>>> data == ordered
True

You can find out more about Heap functions: heappush, heappop, heappushpop, heapify, heapreplace in heap python docs.


Here's another variation based on Sedgewick

The heap is represented internally in an array where if a node is at k, it's children are at 2*k and 2*k + 1. The first element of the array is not used, to make the math more convenient.

To add a new element to the heap, you append it to the end of the array and then call swim repeatedly until the new element finds its place in the heap.

To delete the root, you swap it with the last element in the array, delete it and then call sink until the swapped element finds its place.

swim(k):
  while k > 1 and less(k/2, k):
    exch(k, k/2)
    k = k/2

sink(k):
  while 2*k <= N:
    j = 2*k
    if j < N and less(j, j+1):
      j++
    if not less(k, j):
      break
    exch(k, j)
    k = j

Here's a visualization of heap insert, inserting the first 15 letters of the alphabet: [a-o]

heap insert visualization


Your confusion may come from the fact that the Python module heapq does not define a heap as a data type (a class) with its own methods (e.g. as in a deque or a list). It instead provides functions that you can run on a Python list.

It's best to think of heapq as a module providing a set of algorithms (methods) to interpret lists as heaps and manipulate them accordingly. Note that it's common to represent heaps internally as arrays (as an abstract data structure), and Python already has lists serving that purpose, so it makes sense for heapq to just provide methods to manipulate lists as heaps.

Let's see this with an example. Starting with a simple Python list:

>>> my_list = [2, -1, 4, 10, 0, -20]

To create a heap with heapq from my_list we just need to call heapify which simply re-arranges the elements of the list to form a min-heap:

>>> import heapq
>>> # NOTE: This returns NoneType:
>>> heapq.heapify(my_list)

Note that you can still access the list underlying the heap, since all heapify has done is change the value referenced by my_list:

>>> my_list
[-20, -1, 2, 10, 0, 4]

Popping elements from the heap held by my_list:

>>> [heapq.heappop(my_list) for x in range(len(my_list))]
[-20, -1, 0, 2, 4, 10]

Tags:

Python

Heap