Binomial Heaps

For simplicity, we'll continue to work with heaps of Ints. Our implementation (BinomialHeaps.elm) exports the same type signatures as the previous implementations of min-heaps.

Binomial Trees: Representation and Invariants

type alias Rank = Int
type Tree = Node Rank Int (List Tree)

rank (Node r _ _) = r
root (Node _ x _) = x

Binomial trees of rank r are defined inductively as follows.

  • Node 0 n [] is a binomial tree of rank 0.
  • Node r n ts is a binomial tree of rank r > 0 if ts is a list of r binomial trees with ranks r-1 through 0, respectively (that is, in decreasing order of rank).

A binomial tree of rank r has 2^r nodes.
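As a quick sanity check of this size claim, we can count nodes directly; the size helper and the example tree below are just for illustration and are not part of the module.

size : Tree -> Int
size (Node _ _ ts) = 1 + List.sum (List.map size ts)

-- for example, a rank-2 tree has 2^2 = 4 nodes:
--   size (Node 2 0 [Node 1 1 [Node 0 2 []], Node 0 3 []]) == 4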

Linking

A binomial tree of rank r + 1 is formed by linking together two trees of rank r, making one the leftmost child of the other.

The link function below links two binomial trees, choosing to keep the smaller of the two elements at the root. Therefore, if t1 and t2 are both heap-ordered, then so is the result.

link : Tree -> Tree -> Tree
link t1 t2 =
  let (Node r x1 c1) = t1
      (Node _ x2 c2) = t2
  in
    if x1 <= x2
      then Node (1+r) x1 (t2::c1)
      else Node (1+r) x2 (t1::c2)
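For example, linking two rank-0 trees yields a rank-1 tree whose root is the smaller of the two elements (linkExample is just an illustration, not part of the module):

linkExample : Bool
linkExample =
  link (Node 0 5 []) (Node 0 2 []) == Node 1 2 [Node 0 5 []]   -- True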

Binomial Heaps: Representation and Invariants

type alias InternalHeap = List Tree
type Heap = WrapHeap InternalHeap

A binomial heap is a list of heap-ordered binomial trees, kept in strictly-increasing order of rank. A binomial heap containing n elements is represented using at most O(log n) binomial trees, analogous to the binary representation of n. For example, a heap with six elements is represented by two trees, of ranks 1 and 2, just as 6 is written 110 in binary.

Heap Interface

empty : Heap
empty = WrapHeap []

isEmpty : Heap -> Bool
isEmpty h = h == empty

The findMin function searches for the smallest root among all of the binomial trees, taking O(log n) time.

findMin : Heap -> Maybe Int
findMin (WrapHeap ts) =
  case List.map root ts of
    []    -> Nothing
    n::ns -> Just (List.foldl min n ns)

See Homework 3 for a way to implement findMin so that it runs in O(1) time, as for other heap implementations.

Insertion

Inserting into a binomial heap requires pairwise linking of Trees with equal rank, analogous to the "sum" and "carry" bits in binary addition.

insert : Int -> Heap -> Heap
insert x (WrapHeap ts) = WrapHeap (insertTree (Node 0 x []) ts)

insertTree : Tree -> InternalHeap -> InternalHeap
insertTree t ts = case ts of
  []      -> [t]
  t'::ts' ->
    if rank t == rank t' then insertTree (link t t') ts'
    else if rank t < rank t' then t :: ts
    else Debug.crash "insertTree: impossible"

There are O(m) recursive calls to insertTree (where m is the length of ts), each of which performs O(1) work (to link two trees). Thus, insert runs in O(m) = O(log n) time.
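For example, inserting into a heap whose trees have ranks 0 and 1 (three elements, like binary 011) triggers two links, just as adding 1 to 011 carries twice. The check below is only an illustration and assumes access to the internal constructors:

insertExample : Bool
insertExample =
  insert 0 (WrapHeap [Node 0 1 [], Node 1 2 [Node 0 3 []]])
    == WrapHeap [Node 2 0 [Node 1 2 [Node 0 3 []], Node 0 1 []]]   -- True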

Merging

merge : Heap -> Heap -> Heap
merge (WrapHeap ts1) (WrapHeap ts2) = WrapHeap (merge_ ts1 ts2)

merge_ : InternalHeap -> InternalHeap -> InternalHeap
merge_ ts1 ts2 = case (ts1, ts2) of
  ([], _) -> ts2
  (_, []) -> ts1
  (t1::ts1', t2::ts2') ->
    if rank t1 < rank t2 then t1 :: merge_ ts1' ts2
    else if rank t2 < rank t1 then t2 :: merge_ ts2' ts1
    else insertTree (link t1 t2) (merge_ ts1' ts2')

To analyze the running time of merge_, let m be the total number of trees in both ts1 and ts2. The first two cases run in O(1) time. Each recursive call to merge_ decreases m by one (in the third and fourth cases) or two (in the fifth case). The cons operations in the third and fourth cases require O(1) work. In the fifth case, link requires O(1) time, insertTree requires O(m) time, and the recursive call to merge_ requires T(m-2) time. There are O(m) recursive calls, each of which requires at most O(m) time. The result is an O(m^2) running time, which is O(log^2 n) where n is the total number of elements contained in the two heaps being merged.

A more subtle analysis, however, can be used to argue that the implementation of merge_ runs in O(log n) time. The argument requires a more careful accounting of how many times link is called (which is the crux of both insertTree and merge_) based on the analogy between merging lists and adding two numbers in binary representation.
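To illustrate the analogy, merging a 3-element heap (ranks 0 and 1, like 011) with a 5-element heap (ranks 0 and 2, like 101) yields a single tree of rank 3, mirroring 011 + 101 = 1000. The helpers fromList and ranks below serve only this illustration and are not part of the module:

fromList : List Int -> Heap
fromList = List.foldl insert empty

ranks : Heap -> List Rank
ranks (WrapHeap ts) = List.map rank ts

-- ranks (fromList [1,2,3])                                 == [0,1]
-- ranks (fromList [4,5,6,7,8])                             == [0,2]
-- ranks (merge (fromList [1,2,3]) (fromList [4,5,6,7,8]))  == [3]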

For our purposes, we will consider an alternative, one-pass definition (also drawn from this post) that is slightly easier to analyze.

merge' : InternalHeap -> InternalHeap -> InternalHeap
merge' ts1 ts2 = case (ts1, ts2) of
  ([], _) -> ts2
  (_, []) -> ts1
  (t1::ts1', t2::ts2') ->
    if rank t1 < rank t2 then t1 :: merge' ts1' ts2
    else if rank t2 < rank t1 then t2 :: merge' ts1 ts2'
    else merge_wc (link t1 t2) ts1' ts2'

merge_wc : Tree -> InternalHeap -> InternalHeap -> InternalHeap
merge_wc t ts1 ts2 = case (ts1, ts2) of
  ([], _) -> insertTree t ts2
  (_, []) -> insertTree t ts1
  (t1::ts1', t2::ts2') ->
    let (r,r1,r2) = (rank t, rank t1, rank t2) in
    if      r <  r1 && r <  r2 then t :: merge' ts1 ts2
    else if r <  r1 && r == r2 then merge_wc (link t t2) ts1 ts2'
    else if r == r1 && r <  r2 then merge_wc (link t t1) ts1' ts2
    else if r == r1 && r == r2 then t :: merge_wc (link t1 t2) ts1' ts2'
    else Debug.crash "merge_wc: impossible"

Let T(m) and S(m) be the running times of merge' and merge_wc, respectively, where m is an upper bound on the number of trees in both input lists combined. Consider each of the five cases of merge':

  • Cases 1 and 2: T(m) = O(1)
  • Cases 3 and 4: T(m) = O(1) + T(m-1)
  • Case 5: T(m) = O(1) + S(m-2)

Consider each of the six cases of merge_wc:

  • Cases 1 and 2: S(m) = O(m)
  • Case 3: S(m) = O(1) + T(m)
  • Cases 4, 5, and 6: S(m) = O(1) + S(m-1)

There are at most O(m) mutually recursive calls between the two functions. The last call to merge_wc may take O(m) time, but all other calls take O(1) time. Thus, the worst-case running time for each of these two functions is O(m)+O(m) = O(m) time. That is, O(log n) time.

Deletion

-- removes the tree with the smallest root, returning it along with the remaining trees
removeMinTree : InternalHeap -> (Tree, InternalHeap)
removeMinTree ts = case ts of
  []     -> Debug.crash "removeMinTree: empty heap"
  [t]    -> (t, [])
  t::ts' ->
    let (t',ts'') = removeMinTree ts' in
    if root t < root t'
      then (t, ts')
      else (t', t::ts'')

deleteMin : Heap -> Maybe Heap
deleteMin (WrapHeap ts) = case ts of
  [] -> Nothing
  _  -> let (Node _ _ ts1, ts2) = removeMinTree ts in
        -- the children ts1 are stored in decreasing rank, so reverse them
        -- to obtain a valid binomial heap before merging with the rest
        Just (WrapHeap (merge_ (List.reverse ts1) ts2))
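For example (an illustration only), deleting the minimum of a two-element heap promotes the reversed list of children of the removed root:

deleteMinExample : Bool
deleteMinExample =
  deleteMin (WrapHeap [Node 1 1 [Node 0 2 []]])
    == Just (WrapHeap [Node 0 2 []])   -- True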

Finding (Revisited)

We can reuse the removeMinTree helper function to reimplement findMin.

findMin : Heap -> Maybe Int
findMin (WrapHeap ts) =
  case ts of
    [] -> Nothing
    _  -> Just (root (fst (removeMinTree ts)))

Pre- and Post-Conditions

Let's revisit the merge function above and think a bit more carefully about why the output heap satisfies the intended invariants.

First, assume we have implemented a couple of predicates that identify valid binomial trees and binomial heaps, respectively:

binTree : Tree -> Bool
binHeap : InternalHeap -> Bool
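For concreteness, here is one way these predicates might be implemented (a sketch; the helper childRanksOk is not part of the module). They check the child ranks, the heap order at each node, and the strictly-increasing ranks across the list:

binTree : Tree -> Bool
binTree (Node r x ts) =
  childRanksOk r ts
    && List.all (\t -> x <= root t) ts
    && List.all binTree ts

childRanksOk : Rank -> List Tree -> Bool
childRanksOk r ts = case ts of
  []     -> r == 0
  t::ts' -> rank t == r - 1 && childRanksOk (r - 1) ts'

binHeap : InternalHeap -> Bool
binHeap ts = case ts of
  []           -> True
  [t]          -> binTree t
  t1::t2::rest -> binTree t1 && rank t1 < rank t2 && binHeap (t2::rest)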

Next, we can define the following function to compute the rank of the first binomial tree in a binomial heap, if any:

rankFst : InternalHeap -> Int
rankFst ts =
  case ts of
    []   -> Random.maxInt   -- pretend this is "infinity"
    t::_ -> rank t

Now let's reason about the functions on which merge depends, namely, link and insertTree.

In comments, we'll write some pre-conditions on arguments that we expect all callers to satisfy and some post-conditions on the return value that our implementation intends to satisfy.

   link :     Tree  ->     Tree  ->      Tree
-- link : (t1:Tree) -> (t2:Tree) -> (ret:Tree)
--
--   @pre: binTree t1 && binTree t2 && rank t1 == rank t2
--  @post: binTree ret && rank ret == 1 + rank t1

A first attempt for insertTree:

   insertTree :    Tree ->      InternalHeap ->       InternalHeap
-- insertTree : (t:Tree) -> (ts:InternalHeap) -> (ret:InternalHeap)
--
--   @pre: binTree t && binHeap ts && rank t <= rankFst ts
--  @post: binHeap ret

And a first attempt for merge_:

   merge_ : InternalHeap -> InternalHeap -> InternalHeap
-- merge_ : (ts1:InternalHeap) -> (ts2:InternalHeap) -> (ret:InternalHeap)
--
--   @pre: binHeap ts1 && binHeap ts2
--  @post: binHeap ret

In the case where t1 is cons-ed onto the recursive call to merge_, the binHeap ret predicate follows from (i) the post-condition established by the recursive call, (ii) the conditional check, and (iii) the relationship between t1 and ts1' established by the assumption binHeap ts1. The case where t2 is cons-ed onto the recursive call is similar.

Now consider the case when insertTree is called with the return value from the recursive call to merge_. The post-condition of merge_ above is not sufficient to satisfy the pre-conditions required by insertTree. So, we need to establish a stronger post-condition:

   merge_ : InternalHeap -> InternalHeap -> InternalHeap
-- merge_ : (ts1:InternalHeap) -> (ts2:InternalHeap) -> (ret:InternalHeap)
--
--   @pre: binHeap ts1 && binHeap ts2
--  @post: binHeap ret &&
--         rankFst ret >= min (rankFst ts1) (rankFst ts2)    (added)

This will satisfy the pre-conditions on insertTree, but now the post-condition of insertTree is not sufficient to satisfy the new, stronger post-condition of merge_. So, we need to strengthen the post-condition of insertTree:

   insertTree :    Tree ->      InternalHeap ->       InternalHeap
-- insertTree : (t:Tree) -> (ts:InternalHeap) -> (ret:InternalHeap)
--
--   @pre: binTree t && binHeap ts && rank t <= rankFst ts
--  @post: binHeap ret &&
--         rankFst ret >= rank t                (added)

This is the kind of informal reasoning we need to do on paper to argue why merge does indeed produce a valid binomial heap.

To gain more confidence, we could add defensive, dynamic checks at run-time to make sure that we haven't missed something. Or we could use powerful theorem provers to make certain that we've written the code correctly. Both of these are topics for another day.
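As a taste of the first option, here is one hypothetical way to wrap insertTree with its contract; the names check and insertTreeChecked (and the binTree and binHeap predicates sketched above) are not part of the module:

check : String -> Bool -> a -> a
check msg ok x =
  if ok then x else Debug.crash ("contract violated: " ++ msg)

insertTreeChecked : Tree -> InternalHeap -> InternalHeap
insertTreeChecked t ts =
  let pre  = binTree t && binHeap ts && rank t <= rankFst ts
      ret  = insertTree t ts
      post = binHeap ret && rankFst ret >= rank t
  in
    check "insertTree pre" pre (check "insertTree post" post ret)

Such a wrapper could stand in for insertTree while testing and be dropped once we trust the invariants.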


Reading

Required

  • Okasaki, Chapter 3.2