To wrap up, we will talk about a few mechanisms for testing programs, which is, of course, a crucial part of software development in the large. We will use our RedBlackTree.elm
module from before (the original version with membership and insertion but not deletion) as a case study.
elm-test
is a library for writing tests that is equipped with support for fuzzing, namely, the use of randomness to help generate large numbers of relevant test cases. (Note: We will be using version 3.1.0 of elm-test
rather than 4.0.0.). The Test
module provides an API for defining tests, which comprise Expectation
s (defined in Expect
) about the results of computations.
We will use the html-test-runner
package by Richard Feldman for running tests as a browser application; there are also ways to run tests as a standalone command-line application. Our test runner is defined in TestRedBlackTree.elm
. To try it out:
% git clone https://github.com/rtfeldman/html-test-runner.git
% cd html-test-runnner
% git checkout aabeebe4ff266c94700e475151a1e14c6609578f
% mv /PATH/TO/TestRedBlackTree.elm example/
% cd example
% elm-reactor
Recall that we defined Boolean predicates to identify valid red-black trees:
rb t = bso t && noRedRed t && okBlackHeight t
Our approach for testing our red-black tree implementation — insert
in particular — will be to insert a bunch of elements, perform an in-order traversal of the resulting tree, and compare the resulting list of elements to the sorted list that we expect; this will check the binary search order property. We will also test that the noRedRed
and okBlackHeight
properties hold.
We start by defining functions to convert between a list of elements and a tree:
toList : Tree a -> List a
toList t = case t of
E -> []
T _ left x right -> toList left ++ [x] ++ toList right
fromList : List comparable -> Tree comparable
fromList xs = List.foldl insert empty xs
Then we define a testing function:
testSort : List comparable -> Test
testSort xs =
test ("sort: " ++ toString xs) <| \() ->
let tree = fromList xs in
let result = toList tree in
let expected = List.sort xs in
let errorMessage =
String.join "\n\n" <|
[ "toList <| " ++ toString tree
, " == " ++ toString result
, " /= " ++ toString expected
]
in
result
|> Expect.equal expected
|> Expect.onFail errorMessage
We define a single test case in main
:
main : Test.Runner.Html.TestProgram
main =
[ testSort (List.range 1 100) ]
|> concat
|> Test.Runner.Html.run
We can open the resulting application in elm-reactor
to see whether the test case passes or fails.
Now let's extend the function to also test the noRedRed
and okBlackHeight
properties:
testAllProperties : List comparable -> Test
testAllProperties xs =
let tree = fromList xs in
let testSortedElements =
test "bso" <| \() ->
let result = toList tree in
let expected = List.sort xs in
let errorMessage =
String.join "\n\n" <|
[ "toList <| " ++ toString tree
, " == " ++ toString result
, " /= " ++ toString expected
]
in
result
|> Expect.equal expected
|> Expect.onFail errorMessage
in
let testHeight =
test "h <= 2*bh + 1" <| \() ->
case blackHeight tree of
Nothing -> Expect.pass
Just bh ->
let h = height tree in
let errorMessage =
String.join "\n\n" <|
[ toString tree, " h = " ++ toString h, " bh = " ++ toString bh ]
in
h |> Expect.atMost (2 * bh + 1)
|> Expect.onFail errorMessage
in
let testNoRedRed =
let errorMessage = toString tree in
test "noRedRed" <| \() ->
noRedRed tree
|> Expect.equal True
|> Expect.onFail errorMessage
in
describe ("test all properties of tree with: " ++ toString xs) <|
[ testSortedElements
, testHeight
, testNoRedRed
]
Notice how
describe : String -> List Test -> Test
turns a list of Test
s into a single Test
.
main : Test.Runner.Html.TestProgram
main =
[ testSort (List.range 1 100)
, testAllProperties (List.range 1 100)
]
|> concat
|> Test.Runner.Html.run
To automatically generate many more test cases, we use
fuzzWith
: FuzzOptions
-> Fuzzer a
-> String
-> (a -> Expectation)
-> Test
type alias FuzzOptions = { runs : Int }
The Fuzz
library provides many useful primitives for generating values of different types, such as:
string : Fuzzer String
int : Fuzzer Int
intRange : Int -> Int -> Fuzzer Int
list : Fuzzer a -> Fuzzer (List a)
maybe : Fuzzer a -> Fuzzer (Maybe a)
We will use Fuzz.list Fuzz.int
and Fuzz.list (Fuzz.intRange -10000 10000)
to generate random lists of integers. Then we need a (List Int -> Expectation)
function. Our testAllProperties
function, however, returned a Test
. We can rework that function to return the underlying Expectation
s instead (and also use our new sortAndRemoveDupes
):
insertThenCheckTree xs =
let tree = fromList xs in
let checkSortedElements =
let result = toList tree in
let expected = List.sort xs in
let errorMessage =
String.join "\n\n" <|
[ "Checking that all elements are found and in sorted order"
, "toList <| " ++ toString tree
, " == " ++ toString result
, " /= " ++ toString expected
]
in
result
|> Expect.equal expected
|> Expect.onFail errorMessage
in
let checkHeight =
case blackHeight tree of
Nothing -> Expect.pass
Just bh ->
let h = height tree in
let errorMessage =
String.join "\n\n" <|
[ "Checking that h <= 2*bh + 1"
, toString tree
, " h = " ++ toString h
, " bh = " ++ toString bh
]
in
h |> Expect.atMost (2 * bh + 1)
|> Expect.onFail errorMessage
in
let checkNoRedRed =
let errorMessage =
String.join "\n\n" <|
[ "Checking no red-red violations", toString tree ]
in
noRedRed tree
|> Expect.equal True
|> Expect.onFail errorMessage
in
Expect.all
(List.map always [ checkSortedElements
, checkHeight
, checkNoRedRed
]) ()
Now:
main : Test.Runner.Html.TestProgram
main =
[ fuzzWith { runs = 1000 } (list int)
"1,000 randomly generated trees"
insertThenCheckTree
, fuzzWith { runs = 1000 } (list (intRange -10000 10000))
"1,000 more randomly generated trees"
insertThenCheckTree
]
|> concat
|> Test.Runner.Html.run
If we run this, we see that we get a counterexample. Ah, our implementation disallowed duplicate elements. So, rather than simply using List.sort
, we need to also remove duplicates.
sortAndRemoveDupes : List comparable -> List comparable
sortAndRemoveDupes =
let removeDupes xs =
case xs of
[] -> []
[x] -> [x]
x::y::rest -> if x == y
then removeDupes (y::rest)
else x :: removeDupes (y::rest)
in
List.sort >> removeDupes
insertThenCheckTree xs =
...
let expected = sortAndRemoveDupes xs in
...
Great, we can now quickly verify that, for a large number of examples, our resulting red-black trees are well-formed.
Exercise: Write some tests for other data structures we have implemented.
So far, we have tested the external interface for insert
on a set of examples. There are two remaining questions we might ask.
First: What about additional internal invariants for the insert
, ins
, and balance
functions? In particular, there are temporary, single red-red violations that get threaded through, and resolved by, these functions. To test these, we could write a simple Fuzzer Tree
and then filter the ones that don't satisfy the pre-conditions for a function. But this is unlikely to be very effective, because most randomly generated Tree
s will not satisfy the invariants. Instead, we could write a smarter Fuzzer Tree
that actually uses the internal invariants to generate Tree
s. You may wish to try this out as an exercise.
Second: What about inputs that are not exercised by the test cases? We could keep testing (forever) or augment our methodology with one of the following two approaches.
We can turn every run-time observation (i.e. function call) into a test. This will impose significant performance overhead, but it may be worth it during developing and debugging, and perhaps even in a deployment setting.
The idea is to augment input types with pre-conditions that the function assumes of its arguments (it is the responsibility of callers to ensure them) and augment output types with post-conditions that the function guarantees will hold (callers of the function get to assume them).
We can systematically decorate all function types with pre- and post-conditions.
foo
: S -- pre-condition: foo_pre x == True, where foo_pre : S -> Bool
-> T -- post-condition: foo_post e == True, where foo_post : T -> Bool
foo x =
e
Better yet, post-conditions can depend on input values:
foo
: S -- pre-condition: foo_pre x == True, where foo_pre : S -> Bool
-> T -- post-condition: foo_post x e == True, where foo_post : S -> T -> Bool
foo x =
e
Pre- and post-conditions are then checked upon every function call:
foo x =
if foo_pre x == False
then violation "foo" "argument" x
else
let ret = e in
if foo_post x ret == False
then violation "foo" "return value" (x, ret)
else ret
violation func name values =
Debug.crash <|
String.join "\n"
[ ""
, func
, "assertion violation for: " ++ name
, toString values
]
This approach to run-time checking of invariants is often referred to as contract checking.
Writing all of our functions in this style is tedious. We can define a helper function to help:
monitor : String -> (a -> Bool) -> (a -> b -> Bool) -> (a -> b) -> (a -> b)
monitor f pre post foo =
\x ->
if pre x == False then violation f "1st arg" x
else
let ret = foo x in
if post x ret == False
then violation f "return value" (x, ret)
else ret
And then:
foo = monitor "foo" foo_pre foo_post <| \x ->
e
We can also define another helper function...
maybeMonitor flag f pre post foo =
if flag then monitor f pre post foo else foo
... to make it quick to toggle contract checking on and off:
foo = maybeMonitor True "foo" foo_pre foo_post <| \x ->
e
What about "multiple argument" (i.e. curried) functions? Contracts.elm
defines a few helpers to avoid writing deeply nested calls to the general monitor
function. Another option, if applicable, is to write functions in an uncurried style so that "all" arguments are packed into a single tuple.
One thing to note about our definitions of contracts is that they don't play nicely with elm-test
because they Debug.crash
(exceptions should be handled in a newer version of elm-test
). But at least we can check the console to see the first test that crashed, if any.
Analogs to rb
and noRedRed
to describe temporary red-red violations:
rbExceptMaybeRoot t = bso t && maybeOneRedRed t && okBlackHeight t
maybeOneRedRed t = oneRedRed t || noRedRed t
oneRedRed t = case t of
E -> False
T R (T R _ _ _) _ (T R _ _ _) -> False
T R (T R l1 _ r1) _ r -> noRedRed l1 && noRedRed r1 && noRedRed r
T R l _ (T R l2 _ r2) -> noRedRed l && noRedRed l2 && noRedRed r2
T _ l _ r -> False
The input and output trees for insert
should be valid red-black trees:
insert : comparable -> Tree comparable -> Tree comparable
insert =
maybeMonitor.two "insert"
(\x -> True)
(\_ t -> rb t)
(\_ _ ret -> rb ret) <|
\x t ->
...
Although not necessary for specifying and checking well-formedness of red-black trees, we can, furthermore, state an invariant relating the black heights of the trees:
...
(\_ t ret -> rb ret && (bh ret == bh t || bh ret == 1 + bh t)) <|
...
In contrast to insert
, the output of ins
is almost a red-black tree; there may be a single red-red violation at the root. We can also specify an invariant about its black height.
ins : comparable -> Tree comparable -> Tree comparable
ins =
maybeMonitor.two "ins"
(\x -> True)
(\_ t -> rb t)
(\_ t ret -> rbExceptMaybeRoot ret && bh t == bh ret) <|
\x t ->
...
One possible contract for the balance
function says that the left and right subtrees of the input tree and the entire output tree may have a red-red violation at their roots:
balance : Color -> Tree comparable -> comparable -> Tree comparable -> Tree comparable
balance =
maybeMonitor.four "balance"
(\c -> True)
(\c l -> rbExceptMaybeRoot l)
(\c l val -> True)
(\c l val r -> rbExceptMaybeRoot r)
(\c l val r ret -> rbExceptMaybeRoot ret) <|
\c l val r ->
...
We can state stronger invariants: that the function is called with at most one red-red violation among the left and right subtrees, and that the function produces either an rb
or rbExceptMaybeRoot
depending on the arguments:
balance =
maybeMonitor.four "balance"
(\c -> True)
(\c l -> rbExceptMaybeRoot l)
(\c l val -> True)
(\c l val r -> if oneRedRed l then rb r else rbExceptMaybeRoot r)
(\c l val r ret -> if oneRedRed (T c l val r)
then rbExceptMaybeRoot ret
else rb ret) <|
\c l val r ->
...
We can run our random test cases (much more slowly now) and verify that all of these invariants are satisfied by the observed test cases.
Programming with contracts in this style is tedious, however, and the error messages are really primitive when an assertion is violated. Many languages — Racket, C#, Eiffel, and others — provide automated support for contracts.
The syntax varies, but the following is one reasonable option, where the base types of the language are refined with Bool
ean-valued predicates (written in the same programming language):
foo : x:{ S | foo_pre x == True } -> ret:{ T | foo_post ret == True }
foo x =
e
Using this syntax that streamlines base types and fine-grained predicates, we might rewrite our assertions as:
insert
: x: comparable
-> t: { Tree comparable | rb t }
-> ret: { Tree comparable | rb ret && (bh ret == bh t || bh ret == 1 + bh t) }
insert x t =
...
ins : x: comparable
-> t: { Tree comparable | rb t }
-> ret: { Tree comparable | rbExceptMaybeRoot ret && bh t == bh ret }
ins x t =
...
balance
: c: Color
-> l: { Tree comparable | rbExceptMaybeRoot l }
-> val: comparable
-> r: { Tree comparable | if oneRedRed l then rb r else rbExceptMaybeRoot r }
-> ret: { Tree comparable | if oneRedRed (T c l val r)
then rbExceptMaybeRoot ret
else rb ret }
balance c l val r ret =
...
In addition to systematically translating functions and function calls like we did manually, and tracking better stack traces for good error messages, contract systems must perform more sophisticated program transformations when dealing with additional language features — such as mutable references, objects, and exceptions — and also to reduce the performance overhead imposed by pervasive dynamic checking.
Exercise: Define contract predicates for the different external and internal functions of other data structures we have implemented. Then create and run tests using contracts to develop confidence that the internal invariants that we intend are indeed correct.
Tests are nice because they build confidence about correctness before deployment without slowing down our programs. Contracts are nice because they are integrated within the programs themselves, so that the testing is always on, but they impose significant performance penalties. When possible, it is desirable to statically verify that the pre- and post-conditions are true for all inputs.
A variety of advanced type systems permit static verification of the kinds of data structure properties we have considered, well beyond just the relatively simple type and data structure properties tracked by a system like Elm, or OCaml, or Haskell. (These languages have incredibly useful type systems, but they are "simple" relative to the kinds of precise predicates we want to describe.) No matter how fancy the type or verification system, however, there will always be properties that are beyond the reach of static reasoning. So, we always need to test, but the question is where to draw the boundary.
There is a long history of progress on static software verification; recent efforts that bring these techniques to bear in the context of functional programming languages include: