Testing and Beyond

To wrap up, we will talk about a few mechanisms for testing programs, which is, of course, a crucial part of software development in the large. We will use our RedBlackTree.elm module from before (the original version with membership and insertion but not deletion) as a case study.

Random Testing

elm-test is a library for writing tests that is equipped with support for fuzzing, namely, the use of randomness to help generate large numbers of relevant test cases. (Note: We will be using version 3.1.0 of elm-test rather than 4.0.0.). The Test module provides an API for defining tests, which comprise Expectations (defined in Expect) about the results of computations.

We will use the html-test-runner package by Richard Feldman for running tests as a browser application; there are also ways to run tests as a standalone command-line application. Our test runner is defined in TestRedBlackTree.elm. To try it out:

% git clone https://github.com/rtfeldman/html-test-runner.git
% cd html-test-runnner
% git checkout aabeebe4ff266c94700e475151a1e14c6609578f
% mv /PATH/TO/TestRedBlackTree.elm example/
% cd example
% elm-reactor

External Invariants for Red-Black Tree Insertion

Recall that we defined Boolean predicates to identify valid red-black trees:

rb t = bso t && noRedRed t && okBlackHeight t

Our approach for testing our red-black tree implementation — insert in particular — will be to insert a bunch of elements, perform an in-order traversal of the resulting tree, and compare the resulting list of elements to the sorted list that we expect; this will check the binary search order property. We will also test that the noRedRed and okBlackHeight properties hold.

We start by defining functions to convert between a list of elements and a tree:

toList : Tree a -> List a
toList t = case t of
  E                -> []
  T _ left x right -> toList left ++ [x] ++ toList right

fromList : List comparable -> Tree comparable
fromList xs = List.foldl insert empty xs

Then we define a testing function:

testSort : List comparable -> Test
testSort xs =
  test ("sort: " ++ toString xs) <| \() ->
    let tree = fromList xs in
    let result = toList tree in
    let expected = List.sort xs in
    let errorMessage = 
      String.join "\n\n" <|
        [ "toList <| " ++ toString tree
        , " == " ++ toString result
        , " /= " ++ toString expected
        ]
    in
    result
      |> Expect.equal expected
      |> Expect.onFail errorMessage

We define a single test case in main:

main : Test.Runner.Html.TestProgram
main =
  [ testSort (List.range 1 100) ]
  |> concat
  |> Test.Runner.Html.run

We can open the resulting application in elm-reactor to see whether the test case passes or fails.

Now let's extend the function to also test the noRedRed and okBlackHeight properties:

testAllProperties : List comparable -> Test
testAllProperties xs =
  let tree = fromList xs in
  let testSortedElements =
    test "bso" <| \() ->
      let result = toList tree in
      let expected = List.sort xs in
      let errorMessage = 
        String.join "\n\n" <|
          [ "toList <| " ++ toString tree
          , " == " ++ toString result
          , " /= " ++ toString expected
          ]
      in
      result
        |> Expect.equal expected
        |> Expect.onFail errorMessage
  in
  let testHeight =
    test "h <= 2*bh + 1" <| \() ->
      case blackHeight tree of
        Nothing -> Expect.pass
        Just bh ->
          let h = height tree in
          let errorMessage = 
            String.join "\n\n" <|
              [ toString tree, " h = " ++ toString h, " bh = " ++ toString bh ]
          in
          h |> Expect.atMost (2 * bh + 1)
            |> Expect.onFail errorMessage
  in
  let testNoRedRed =
    let errorMessage = toString tree in
    test "noRedRed" <| \() ->
      noRedRed tree
        |> Expect.equal True
        |> Expect.onFail errorMessage
  in
  describe ("test all properties of tree with: " ++ toString xs) <|
    [ testSortedElements
    , testHeight
    , testNoRedRed
    ]

Notice how

describe : String -> List Test -> Test

turns a list of Tests into a single Test.

main : Test.Runner.Html.TestProgram
main =
  [ testSort (List.range 1 100)
  , testAllProperties (List.range 1 100)
  ]
  |> concat
  |> Test.Runner.Html.run

To automatically generate many more test cases, we use

fuzzWith
    :  FuzzOptions
    -> Fuzzer a
    -> String
    -> (a -> Expectation)
    -> Test

type alias FuzzOptions = { runs : Int }

The Fuzz library provides many useful primitives for generating values of different types, such as:

string : Fuzzer String
int : Fuzzer Int
intRange : Int -> Int -> Fuzzer Int
list : Fuzzer a -> Fuzzer (List a)
maybe : Fuzzer a -> Fuzzer (Maybe a)

We will use Fuzz.list Fuzz.int and Fuzz.list (Fuzz.intRange -10000 10000) to generate random lists of integers. Then we need a (List Int -> Expectation) function. Our testAllProperties function, however, returned a Test. We can rework that function to return the underlying Expectations instead (and also use our new sortAndRemoveDupes):

insertThenCheckTree xs =
  let tree = fromList xs in
  let checkSortedElements =
    let result = toList tree in
    let expected = List.sort xs in
    let errorMessage = 
      String.join "\n\n" <|
        [ "Checking that all elements are found and in sorted order"
        , "toList <| " ++ toString tree
        , " == " ++ toString result
        , " /= " ++ toString expected
        ]
    in
    result
      |> Expect.equal expected
      |> Expect.onFail errorMessage
  in
  let checkHeight =
    case blackHeight tree of
      Nothing -> Expect.pass
      Just bh ->
        let h = height tree in
        let errorMessage = 
          String.join "\n\n" <|
            [ "Checking that h <= 2*bh + 1"
            , toString tree
            , " h = " ++ toString h
            , " bh = " ++ toString bh
            ]
        in
        h |> Expect.atMost (2 * bh + 1)
          |> Expect.onFail errorMessage
  in
  let checkNoRedRed =
    let errorMessage =
      String.join "\n\n" <|
        [ "Checking no red-red violations", toString tree ]
    in
    noRedRed tree
      |> Expect.equal True
      |> Expect.onFail errorMessage
  in
  Expect.all
    (List.map always [ checkSortedElements
                     , checkHeight
                     , checkNoRedRed
                     ]) ()

Now:

main : Test.Runner.Html.TestProgram
main =
  [ fuzzWith { runs = 1000 } (list int)
      "1,000 randomly generated trees"
      insertThenCheckTree
  , fuzzWith { runs = 1000 } (list (intRange -10000 10000))
      "1,000 more randomly generated trees"
      insertThenCheckTree
  ]
  |> concat
  |> Test.Runner.Html.run

If we run this, we see that we get a counterexample. Ah, our implementation disallowed duplicate elements. So, rather than simply using List.sort, we need to also remove duplicates.

sortAndRemoveDupes : List comparable -> List comparable
sortAndRemoveDupes =
  let removeDupes xs =
    case xs of
      []         -> []
      [x]        -> [x]
      x::y::rest -> if x == y
                    then removeDupes (y::rest)
                    else x :: removeDupes (y::rest)
  in
  List.sort >> removeDupes

insertThenCheckTree xs =
  ...
    let expected = sortAndRemoveDupes xs in
  ...

Great, we can now quickly verify that, for a large number of examples, our resulting red-black trees are well-formed.

Exercise: Write some tests for other data structures we have implemented.

So far, we have tested the external interface for insert on a set of examples. There are two remaining questions we might ask.

First: What about additional internal invariants for the insert, ins, and balance functions? In particular, there are temporary, single red-red violations that get threaded through, and resolved by, these functions. To test these, we could write a simple Fuzzer Tree and then filter the ones that don't satisfy the pre-conditions for a function. But this is unlikely to be very effective, because most randomly generated Trees will not satisfy the invariants. Instead, we could write a smarter Fuzzer Tree that actually uses the internal invariants to generate Trees. You may wish to try this out as an exercise.

Second: What about inputs that are not exercised by the test cases? We could keep testing (forever) or augment our methodology with one of the following two approaches.

Contracts (Run-Time Testing)

We can turn every run-time observation (i.e. function call) into a test. This will impose significant performance overhead, but it may be worth it during developing and debugging, and perhaps even in a deployment setting.

The idea is to augment input types with pre-conditions that the function assumes of its arguments (it is the responsibility of callers to ensure them) and augment output types with post-conditions that the function guarantees will hold (callers of the function get to assume them).

We can systematically decorate all function types with pre- and post-conditions.

foo
  : S  -- pre-condition:  foo_pre x == True,  where foo_pre : S -> Bool
 -> T  -- post-condition: foo_post e == True, where foo_post : T -> Bool
foo x =
  e

Better yet, post-conditions can depend on input values:

foo
  : S  -- pre-condition:  foo_pre x == True,    where foo_pre : S -> Bool
 -> T  -- post-condition: foo_post x e == True, where foo_post : S -> T -> Bool
foo x =
  e

Pre- and post-conditions are then checked upon every function call:

foo x =
  if foo_pre x == False
  then violation "foo" "argument" x
  else
    let ret = e in
    if foo_post x ret == False
    then violation "foo" "return value" (x, ret)
    else ret

violation func name values =
  Debug.crash <|
    String.join "\n"
      [ ""
      , func
      , "assertion violation for: " ++ name
      , toString values
      ]

This approach to run-time checking of invariants is often referred to as contract checking.

Writing all of our functions in this style is tedious. We can define a helper function to help:

monitor : String -> (a -> Bool) -> (a -> b -> Bool) -> (a -> b) -> (a -> b)
monitor f pre post foo =
  \x ->
    if pre x == False then violation f "1st arg" x
    else
      let ret = foo x in
      if post x ret == False
      then violation f "return value" (x, ret)
      else ret

And then:

foo = monitor "foo" foo_pre foo_post <| \x ->
  e

We can also define another helper function...

maybeMonitor flag f pre post foo =
  if flag then monitor f pre post foo else foo

... to make it quick to toggle contract checking on and off:

foo = maybeMonitor True "foo" foo_pre foo_post <| \x ->
  e

What about "multiple argument" (i.e. curried) functions? Contracts.elm defines a few helpers to avoid writing deeply nested calls to the general monitor function. Another option, if applicable, is to write functions in an uncurried style so that "all" arguments are packed into a single tuple.

One thing to note about our definitions of contracts is that they don't play nicely with elm-test because they Debug.crash (exceptions should be handled in a newer version of elm-test). But at least we can check the console to see the first test that crashed, if any.

Internal Invariants for Red-Black Tree Insertion

RedBlackTreeContracts.elm

Analogs to rb and noRedRed to describe temporary red-red violations:

rbExceptMaybeRoot t = bso t && maybeOneRedRed t && okBlackHeight t

maybeOneRedRed t = oneRedRed t || noRedRed t

oneRedRed t = case t of
  E                             -> False
  T R (T R _ _ _) _ (T R _ _ _) -> False
  T R (T R l1 _ r1) _ r         -> noRedRed l1 && noRedRed r1 && noRedRed r
  T R l _ (T R l2 _ r2)         -> noRedRed l && noRedRed l2 && noRedRed r2
  T _ l _ r                     -> False

The input and output trees for insert should be valid red-black trees:

insert : comparable -> Tree comparable -> Tree comparable
insert =
  maybeMonitor.two "insert"
      (\x -> True)
      (\_ t -> rb t)
      (\_ _ ret -> rb ret) <|
  \x t ->
    ...

Although not necessary for specifying and checking well-formedness of red-black trees, we can, furthermore, state an invariant relating the black heights of the trees:

      ...
      (\_ t ret -> rb ret && (bh ret == bh t || bh ret == 1 + bh t)) <|
      ...

In contrast to insert, the output of ins is almost a red-black tree; there may be a single red-red violation at the root. We can also specify an invariant about its black height.

ins : comparable -> Tree comparable -> Tree comparable
ins =
  maybeMonitor.two "ins"
      (\x -> True)
      (\_ t -> rb t)
      (\_ t ret -> rbExceptMaybeRoot ret && bh t == bh ret) <|
  \x t ->
    ...

One possible contract for the balance function says that the left and right subtrees of the input tree and the entire output tree may have a red-red violation at their roots:

balance : Color -> Tree comparable -> comparable -> Tree comparable -> Tree comparable
balance =
  maybeMonitor.four "balance"
      (\c -> True)
      (\c l -> rbExceptMaybeRoot l)
      (\c l val -> True)
      (\c l val r -> rbExceptMaybeRoot r)
      (\c l val r ret -> rbExceptMaybeRoot ret) <|
  \c l val r ->
    ...

We can state stronger invariants: that the function is called with at most one red-red violation among the left and right subtrees, and that the function produces either an rb or rbExceptMaybeRoot depending on the arguments:

balance =
  maybeMonitor.four "balance"
      (\c -> True)
      (\c l -> rbExceptMaybeRoot l)
      (\c l val -> True)
      (\c l val r -> if oneRedRed l then rb r else rbExceptMaybeRoot r)
      (\c l val r ret -> if oneRedRed (T c l val r)
                         then rbExceptMaybeRoot ret
                         else rb ret) <|
  \c l val r ->
    ...

We can run our random test cases (much more slowly now) and verify that all of these invariants are satisfied by the observed test cases.

Programming with contracts in this style is tedious, however, and the error messages are really primitive when an assertion is violated. Many languages — Racket, C#, Eiffel, and others — provide automated support for contracts.

The syntax varies, but the following is one reasonable option, where the base types of the language are refined with Boolean-valued predicates (written in the same programming language):

foo : x:{ S | foo_pre x == True } -> ret:{ T | foo_post ret == True }
foo x =
  e

Using this syntax that streamlines base types and fine-grained predicates, we might rewrite our assertions as:

insert
    : x: comparable
   -> t: { Tree comparable | rb t }
   -> ret: { Tree comparable | rb ret && (bh ret == bh t || bh ret == 1 + bh t) }
insert x t =
  ...

ins : x: comparable
   -> t: { Tree comparable | rb t }
   -> ret: { Tree comparable | rbExceptMaybeRoot ret && bh t == bh ret }
ins x t =
  ...

balance
    : c: Color
   -> l: { Tree comparable | rbExceptMaybeRoot l }
   -> val: comparable
   -> r: { Tree comparable | if oneRedRed l then rb r else rbExceptMaybeRoot r }
   -> ret: { Tree comparable | if oneRedRed (T c l val r)
                               then rbExceptMaybeRoot ret
                               else rb ret }
balance c l val r ret =
  ...

In addition to systematically translating functions and function calls like we did manually, and tracking better stack traces for good error messages, contract systems must perform more sophisticated program transformations when dealing with additional language features — such as mutable references, objects, and exceptions — and also to reduce the performance overhead imposed by pervasive dynamic checking.

Exercise: Define contract predicates for the different external and internal functions of other data structures we have implemented. Then create and run tests using contracts to develop confidence that the internal invariants that we intend are indeed correct.

Static Verification

Tests are nice because they build confidence about correctness before deployment without slowing down our programs. Contracts are nice because they are integrated within the programs themselves, so that the testing is always on, but they impose significant performance penalties. When possible, it is desirable to statically verify that the pre- and post-conditions are true for all inputs.

A variety of advanced type systems permit static verification of the kinds of data structure properties we have considered, well beyond just the relatively simple type and data structure properties tracked by a system like Elm, or OCaml, or Haskell. (These languages have incredibly useful type systems, but they are "simple" relative to the kinds of precise predicates we want to describe.) No matter how fancy the type or verification system, however, there will always be properties that are beyond the reach of static reasoning. So, we always need to test, but the question is where to draw the boundary.

There is a long history of progress on static software verification; recent efforts that bring these techniques to bear in the context of functional programming languages include: