Patterns and pattern matching

Pattern matching is an elegant, useful feature of modern programming language design. Pattern matching is part of common idioms in functional programming languages including Standard ML, Haskell and their derivatives. Pattern matching is also part of the design of Typed Racket.

Programs that make use of pattern matching are cleaner, easier to read and more pleasant to work with than their pattern-free counterparts. Pattern matching is not addressed in How to Design Programs, but since it offers so many advantages, and no disadvantages, and is so well-suited to the style of programming taught in this course, we include it in our curriculum.

Typed Racket’s pattern matching offers a rich collection of syntactic forms, only some of which are included in this document. Note that you will find many other pattern matching forms in the language documentation, but, for the purposes of this course, you only need a few of the available forms.

Pattern matching basics

As conditional expressions are written with cond, pattern matching expressions are written with match. A match expression has the following general form:

(match exp_0
  [pat_1 exp_1]
  [pat_2 exp_2]
  ...
  [pat_k exp_k])

Match evaluation is similar to cond evaluation. First the expression exp_0 is evaluated to a value we’ll call v. Then v is compared against each pattern in turn, in the order given: pat_1, then pat_2, and so on, up to pat_k. If v matches pat_1, then the value of the whole expression is exp_1, and evaluation within the match goes no further. If pat_1 doesn’t match, then v is checked against pat_2, and the value is exp_2. Evaluation continues from pattern to pattern until a match is found (or none is).

Consider this match expression:

(match (+ 1 1)
  [2 "A"]
  [3 "B"])

In this case, (+ 1 1) is evaluated to 2. This is compared first against the first pattern in order, which is a constant pattern 2. This is a match, so the value of the whole expression is the string "A". With the following alteration, the value of the expression is "B":

(match (+ 1 2)
  [2 "A"]
  [3 "B"])

There are many useful kinds of patterns besides constant patterns: one such kind is variable patterns. A variable pattern consists of a variable name, and a) always succeeds and b) gives a name to the value matched. Consider this example:

(match (* 10 3)
  [n (+ n 1)])

The value of this expression is 31. Why? The expression immediately following match evaluates to 30. Then 30 is compared against the variable pattern n, which succeeds in matching (as variable patterns always do) and furthermore binds the name n to the expression’s value. When the expression (+ n 1) is evaluated, n is bound to 30, and we finally arrive at 31.

Because variable patterns always match, they can be used as catch-all or default patterns in a pattern match, playing a role not entirely unlike what else does in the last branch of a cond. For example:

(match (+ 8 9)
  [0 "zero"]
  [n "not zero"])

In this expression, the match of evaluated 17 against the constant pattern 0 fails. The match against n necessarily succeeds, and the value of the expression altogether is the string "not zero".

The variable pattern n in the last example is a suitable candidate for a wildcard pattern, written with the underscore character _. Recall that a variable pattern both matches a value and binds a name to it, but in this case (and many cases in practice) the name is not needed, since n is not subsequently used. In such cases as this, a wildcard pattern, informally known as a "don’t care" pattern, is a good choice. A wildcard pattern always succeeds in matching, like a variable pattern does, but doesn’t bind; that is, it associates no name with the matched value.

(match (+ 8 9)
  [0 "zero"]
  [_ "not zero"])

The underscore serves not only to catch any expression that falls past the constant pattern 0, but it also expresses clearly that the particular value of the matching expression is of no interest, only the fact that it did not match any previous pattern.

Matching structures and nested patterns

Matching against structures is a convenient way to, first, identify the particular struct under consideration, and also to name the individual components in the structure and thereby eliminate calls to selectors.

Define a point in 2-space as follows:

(define-struct Point
  ([x : Real]
   [y : Real]))

Now assume there exists an expression of type Point bound to a variable p, and consider the following match:

(match p
  [(Point x y) (+ x y)])

This expression performs the (admittedly contrived) operation of adding together the x and y components of the point p. The pattern (Point x y) serves several purposes: Point ensures that p is actually a Point struct, and furthermore the names x and y are bound to its two components. Syntactically, this is a real bargain, although it will take practice and experience to appreciate why. Without pattern matching, a similar expression looks like this:

(+ (Point-x p) (Point-y p))

On the whole, the latter expression is no longer (in number of characters) than the first; however, in the part of the expression where the action is — namely, where x and y are added together — the former expression is a clean (+ x y), while the latter is cluttered up with selectors (Point-x and Point-y). Pattern matching often has this clarifying effect by separating the deconstruction of a compound value from computing with its elements.

Using nested patterns with nested structures pushes the advantages of structure matching still further. Consider a line segment structure containing two points as follows:

(define-struct Seg
  ([a : Point]
   [b : Point]))

Now we consider the following (contrived) computation: to subtract the product of the segments y components from the product of its x components. We can write the expression with nested patterns like this:

(match s
  [(Seg (Point ax ay) (Point bx by)) (- (* ax bx) (* ay by))])

Compare that expression to this one that uses selectors:

(- (* (Point-x (Seg-a s)) (Point-x (Seg-b s)))
   (* (Point-y (Seg-a s)) (Point-y (Seg-b s))))

In the second expression, the selectors overwhelm the syntax and make reading and understanding the expression (comparatively) difficult; the matched alternative above (- (* ax bx) (* ay by)) is crystal clear by comparison.

Note also that this second example of pattern matching illustrates that the variable names to which values from within a structure are bound do not need to have the same names as the fields. While in the first example, we found it convenient to use x as the name for the contents of the x field of the structure, in the second example, we chose the names ax and bx.

Pattern matching in function definitions

Pattern matching is especially useful in the definition of functions. Here is a function definition using only selectors:

(: q1? : Point -> Boolean)
;; test whether the point is in the first quadrant
(define (q1? p)
  (and (positive? (Point-x p))
       (positive? (Point-y p))))

When we rewrite the function using a match, the code is not shorter, but its main expression is free of selectors and hence clearer:

(: pat-q1? : Point -> Boolean)
;; reimplementation of q1? using pattern matching
(define (pat-q1? p)
  (match p
    [(Point x y)
     (and (positive? x)
          (positive? y))]))

With nested structures, nested pattern matching cleans up and clarifies the computation to an even greater extent. The following two functions compute the same property of a line segment; compare the syntax of one to the other.

(: p1 : Seg -> Boolean)
(define (p1 s)
  (< (* (Point-x (Seg-a s)) (Point-y (Seg-b s)))
     (* (Point-x (Seg-b s)) (Point-y (Seg-a s)))))

(: p2 : Seg -> Boolean)
(define (p2 s)
  (match s
    [(Seg (Point ax ay) (Point bx by))
     (< (* ax by) (* bx ay))]))

Matches and disjoint unions

Recall the definition of the Shape type from Chapter 5.

(define-type Shape (U Circle Rect))

We can use pattern matching to write functions that work over union types like Shape. For example, the shape-color function can be written using pattern matching as

(: shape-color : Shape -> Color)
;; Get the color of a shape.
;;
(define (shape-color shape)
  (match shape
    [(Circle _ _ col) col]
    [(Rect _ _ _ col) col]))

We use wild cards for the parts of the shapes that we are not interested in.

Matching multiple expressions with `match*`

Typed Racket also provides a mechanism for matching against multiple values with multiple patterns. For example, say we wanted a function to test if two shapes are the same shape and color. We could write that as

(: same-shape-and-color? : Shape Shape -> Boolean)
;; test if two shapes are the same kind and have the same color.
;;
(define (same-shape-and-color? shape1 shape2)
  (match* (shape1 shape2)
    [((Circle _ _ col1) (Circle _ _ col2)) (symbol=? col1 col2)]
    [((Rect _ _ _ col1) (Rect _ _ _ col2)) (symbol=? col1 col2)]
    [(_ _) #f]))

Pattern-matching gotchas

You have to be careful to avoid using the identifiers true, false, and empty in patterns. These identifiers are variables (not literals), which means that they will match any value. For example, the following function will always return false.

(: flip : Boolean -> Boolean)
(define (flip b)
  (match b ([true false] [false true])))

What happens here is that the value of b is assigned to a new variable named true, rather than being matched to the value true. The expression then results in false, the expression part of the first match.

The correct way to write it is to use the #f and #t literals in the patterns:

(: flip : Boolean -> Boolean)
(define (flip b)
  (match b [#t false] [#f true]))

Another common mistake with pattern matching is assuming that a successful match induces type information the way that type filters for type tests do. For example, the following function will result in a type error

(: shape-color : Shape -> Color)
;; Get the color of a shape.
;;
(define (shape-color shape)
  (match shape
    [(Circle _ _ _) (Circle-color shape)]
    [_ (Rect-color shape)]))

Even though matching shape against the Circle pattern is only possible if shape has the Circle type, the flow-sensitivity of Typed Racket’s type system does not extend to pattern matching. Likewise, even though failing to match the Circle pattern is only possible if shape has type Rect, the Typed Racket type checker is not clever enough to realize that fact.

Last updated 2019-12-29 10:59:14 -0600

Table of Contents | Back to CS151 Home