Red-Black Tree Delete

Deleting an element from a red-black tree is considerably harder than inserting one. Matt Might presents a deletion algorithm that extends the temporary-invariant-violation plus bubble-and-rotate approach for insertion. Today, we'll reformulate his approach using slightly different terminology and drawing conventions to match the ones we used last time. The code below is found in RedBlackDelete.elm.

There will be new kinds of temporary invariant violations, tracked using three new kinds of nodes:

  • the black empty tree BE which counts as one black node,
  • the double black node T BB left x right which counts as two black nodes, and
  • the double red node T RR left x right which counts as negative one black node.

We extend our Tree type as follows (differences in terminology and weights compared to the original are noted in parentheses):

type Tree  = E    --  0  "empty"        ("black leaf"        +1)
           | BE   -- +1  "black empty"  ("double black leaf" +2)

           | T Color Tree Int Tree

type Color = BB   -- +2  "double black" ("double black")
           | B    -- +1
           | R    --  0
           | RR   -- -1  "double red"   ("negative black")

We extend our convention of drawing square boxes for Black nodes and round circles for Red nodes to the new kinds of Trees:

Balance: Redux

Before we start, let's split the original balance function into pieces, which will set us up for some changes later. The Balance alias is shorthand for the type of balance as before, and the new BalanceMaybe alias describes balancing functions that may or may not perform a transformation:

type alias Balance = Color -> Tree -> Int -> Tree -> Tree
type alias BalanceMaybe = Color -> Tree -> Int -> Tree -> Maybe Tree

The four rotation cases of balance can be factored into a one function ...

balance_B_R_R : BalanceMaybe
balance_B_R_R color l val r =
  case (color, l, val, r) of
    (B, T R (T R a x b) y c, z, d) -> Just <| T R (T B a x b) y (T B c z d)
    (B, T R a x (T R b y c), z, d) -> Just <| T R (T B a x b) y (T B c z d)
    (B, a, x, T R (T R b y c) z d) -> Just <| T R (T B a x b) y (T B c z d)
    (B, a, x, T R b y (T R c z d)) -> Just <| T R (T B a x b) y (T B c z d)
    _                              -> Nothing

... and the default case can be handled as follows:

balance : Balance
balance c l val r =
              (balance_B_R_R c l val r)
  `maybeElse` (T c l val r)

We use the helper function

maybeElse : Maybe a -> a -> a
maybeElse mx x =
  case mx of
    Just x  -> x
    Nothing -> x

to provide a "default" value in case the balance_B_R_R function does not perform a rotation. (The maybeElse function is the same as flip Maybe.withDefault.)

We will define additional re-balancing functions later. To stitch them together easily (that is, to try one after another until one succeeds), we define another helper function for "adding" Maybe values:

maybePlus : Maybe a -> Maybe a -> Maybe a
maybePlus mx my =
  case mx of
    Just x  -> mx
    Nothing -> my

Deletion Algorithm: Overview

There are three phases in the algorithm.

1. Removal

To remove an element y, we first need to traverse the Tree, making left or right turns as dictated by the search order property until we find y or reach an Empty.

Empty

If we reach Empty, there is nothing to remove.

Node with 0 Children

If we find y at a node with zero children, there are two cases.

  • This is the case where a black empty is created, to track the removal of a black node.

Node with 1 Children

Now, let's say we find y at a node with one child. The following six cases are impossible, the first two because the original tree cannot have red-red violations and the last four because the original tree would not have satisfied the black height property.

There are only two cases that can appear. In both cases, we can locally re-establish the red-black tree invariants:

Node with 2 Children

Lastly, let's say we find y at a node with two children. We can reduce this case to removing y from a node with zero or one child. See the article for an example (look for the diagram with blue and green nodes).

2. Bubbling

If the removal phase converts a black node to a black empty, the black empty must be bubbled upwards and dealt with. There are six configurations in which a black node could have been transformed to BE.

1a
  • The black empty becomes empty.
  • This may introduce a R/R violation, which can be handled with the original balance_B_R_R function.
2a
  • The black node becomes double black.
  • This may introduce a R/R violation below a BB node, which will be handled by a new balance_BB_R_R function.
3a
  • The black node becomes double black.
  • The red node becomes double red.
  • A new balance_BB_RR function will eliminate RR nodes, which always appear below BB ones.

There are three cases analogous to the above:

1b
2b
3b

Some transformations above introduce double black nodes that, like black empties, must be bubbled upwards. So, there are six more cases analogous to the ones above.

In each case above, the transformation re-colored the parent with one more unit of black and re-colored each child with one less. The same strategy will apply to the new cases.

1a' 1b'
2a' 2b'
3a' 3b'

3. Rebalancing

The last aspect of the algorithm is to implement the two new re-balancing functions to clean up the effects of bubbling.

Recall the original balance_B_R_R (below, left). The four cases for balance_BB_R_R (below, right) are similar.

Because only red nodes turn into double red ones, there are only two cases that balance_BB_RR must handle.

In both cases, the original balance_B_R_R must be called (at most once) to handle a new R/R violation that may be introduced.

Deletion Algorithm: Code

We will now implement the three steps.

3. Rebalancing

The balance_BB_R_R function is quite similar to balance_B_R_R:

balance_BB_R_R : BalanceMaybe
balance_BB_R_R color l val r =
  case (color, l, val, r) of
    (BB, T R (T R a x b) y c, z, d) -> Just <| T B (T B a x b) y (T B c z d)
    (BB, T R a x (T R b y c), z, d) -> Just <| T B (T B a x b) y (T B c z d)
    (BB, a, x, T R (T R b y c) z d) -> Just <| T B (T B a x b) y (T B c z d)
    (BB, a, x, T R b y (T R c z d)) -> Just <| T B (T B a x b) y (T B c z d)
    _                               -> Nothing

One more new balancing function:

balance_BB_RR : BalanceMaybe
balance_BB_RR color l val r =
  case (color, l, val, r) of
    (BB, T RR (T B a w b) x (T B c y d), z, e) -> Just <| T B (balance B (T R a w b) x c) y (T B d z e)
    (BB, a, w, T RR (T B b x c) y (T B d z e)) -> Just <| T B (T B a w b) x (balance B c y (T R d z e))
    _                                          -> Nothing

Finally, we stitch them all together:

balance : Balance
balance c l val r =
              (balance_B_R_R c l val r)
  `maybePlus` (balance_BB_R_R c l val r)
  `maybePlus` (balance_BB_RR c l val r)
  `maybeElse` (T c l val r)

2. Bubbling

We define two helper functions for incrementing and decrementing a node's color:

incr c =
  case c of
    BB -> Debug.crash "incr BB"
    B  -> BB
    R  -> B
    RR -> R

decr c =
  case c of
    BB -> B
    B  -> R
    R  -> RR
    RR -> Debug.crash "decr RR"

The 12 cases for our bubbling function are organized according to the three tables above. If we wanted to, we could also factor this function into smaller pieces like we did for balance.

bubble_BB t =
  case t of

    -- cases 1a, 2a, 3a
    T c1 (T c2 a x b) y BE ->
      case (c1, c2) of
        (R, B) -> balance (incr c1) (T (decr c2) a x b) y E
        (B, B) -> balance (incr c1) (T (decr c2) a x b) y E
        (B, R) -> balance (incr c1) (T (decr c2) a x b) y E
        _      -> t

    -- cases 1b, 2b, 3b
    T c1 BE y (T c3 c z d) ->
      case (c1, c3) of
        (R, B) -> balance (incr c1) E y (T (decr c3) c z d)
        (B, B) -> balance (incr c1) E y (T (decr c3) c z d)
        (B, R) -> balance (incr c1) E y (T (decr c3) c z d)
        _      -> t

    -- cases 1a', 1b', 2a', 2b', 3a', 3b'
    T c1 (T c2 a x b) y (T c3 c z d) ->
      case (c1, c2, c3) of
        (R, B, BB) -> balance (incr c1) (T (decr c2) a x b) y (T (decr c3) c z d)
        (R, BB, B) -> balance (incr c1) (T (decr c2) a x b) y (T (decr c3) c z d)
        (B, B, BB) -> balance (incr c1) (T (decr c2) a x b) y (T (decr c3) c z d)
        (B, BB, B) -> balance (incr c1) (T (decr c2) a x b) y (T (decr c3) c z d)
        (B, R, BB) -> balance (incr c1) (T (decr c2) a x b) y (T (decr c3) c z d)
        (B, BB, R) -> balance (incr c1) (T (decr c2) a x b) y (T (decr c3) c z d)
        _          -> t

    _ -> t

1. Removal

Lastly, the top-level removal procedure. As with insert, a helper function rem does the traversal, and then the last step is to color the resulting root node black.

remove : Int -> Tree -> Tree
remove x t =
  case rem x t of
    T _ l y r -> T B l y r
    _         -> Debug.crash "remove"

rem n t =
  case t of

    BE -> Debug.crash "rem"
    E -> E

    -- 0 children
    T R E x E -> if n == x then BE else t
    T B E x E -> if n == x then T BB E x E else t

    -- 1 child
    T B (T R E x E) y E -> if n == y then T B E x E else t
    T B E y (T R E z E) -> if n == y then T B E z E else t
    T _ E _ _           -> Debug.crash "rem"
    T _ _ _  E          -> Debug.crash "rem"

    -- 2 children
    T c l y r ->
      if n < y then balance c (bubble_BB (rem n l)) y r
      else if n > y then balance c l y (bubble_BB (rem n r))
      else {- n == y -} Debug.crash "TODO"

Calls to bubble_BB may return trees with red roots (via balance_B_R_R via balance), so we propagate these red-red violations upwards and call balance to fix downward violations. We leave the case for removing a node with two children as an exercise. We use Debug.crash for the cases that should never appear.


Reading

Optional