We’ve seen how lists are built with [] and
(:), and deconstructing with patterns using []
and (:). Now we’ll see a general way for users to define
new types that have one or more kinds of values.
As an example, currency converter. For simplicity, only two kinds of currency.
data Currency
= USD Float -- US Dollars
| JPY Float -- Japanese Yen
deriving (Eq, Show)
The data keyword defines a new algebraic data type
(ADT), or simply datatype, in this case called
Currency. Values of type Currency are either
the data constructor called USD and a single
Float or the data constructor JPY and a single
Float. The smallest denomination of US currency, one cent,
is one-hundredth of a dollar; the smallest denomination of Japanese
currency is one yen. We choose to track JPY with
Floats to provide finer-grained precision; these will be
rounded to Integer values later.
The deriving (Eq, Show) clause instructs Haskell to
automatically implement a function called
show :: Currency -> String which is described by the
Show type class, and an equality function
(==) :: Currency -> Currency -> Bool which described
by the Eq type class. We’ll talk more about these later.
Try the interactions below with and without the deriving
clause.
> USD 11.99
> JPY 1353
> USD 11.99 == USD 11.99
> USD 11.99 == JPY 1353
To extract data wrapped by data constructors, we use patterns involving the data constructors and variables. For example, the following function allows adding two values in either currency.
yenPerDollar = 112.86 -- as of 9/27/17
dollarsPerYen = 1 / yenPerDollar
add :: Currency -> Currency -> Currency
add (USD d1) (USD d2) = USD (d1 + d2)
add (JPY y1) (JPY y2) = JPY (y1 + y2)
add (USD d) (JPY y) = USD (d + y * dollarsPerYen)
add (JPY y) (USD d) = JPY (y + d * yenPerDollar)
Note that any amount has two valid representations; the last two equations choose to retain the base currency of the first argument.
In general, a datatype definition contains one or more data
constructors C, each beginning with
a capital letter (e.g. A, B, C,
D below). The name of the new type T (e.g. T below) must also
start with a capital letter.
data T
= A
| B Int
| C Bool String
| D (Char, String)
deriving (Eq, Show)
Each data constructor wraps zero or more values:
A does not carry any additional data,B carries a single Int,C carries one Bool and one
String, andD carries one pair containing a Char and a
String.Data constructors are functions that, when applied to arguments of
the specified types, produce values of the datatype (in this case,
T).
> :t A
> :t B
> :t C
> :t D
Three kinds of patterns p can be used to match and destruct values of a datatype:
_;
andFor example:
tToInt :: T -> Int
tToInt A = 0
tToInt (B i) = 1
tToInt (C b s) = 2
tToInt (D (c, s)) = 3
At run-time, a variable pattern x (of some type S) successfully matches any value v (of type S); v is then bound to x in the right-hand side of the equation.
The wildcard pattern _, like a variable, matches any value but does not introduce a name for it.
A data constructor pattern C p_1 …
p_n (of some type S) matches only those values v (of
type S) that are constructed with C and whose n values match the
patterns p_1 through p_n, respectively. If C carries no data, then there are no
subsequent patterns (i.e. n= 0).
Even though a datatype constructor may take multiple arguments
(e.g. C in our example) and, thus, may be partially applied
like other functions, all data constructors must be “fully applied” to
enough patterns according to its definition.
The definition of tToInt above defines multiple
equations — one for each data constructor — to handle all possible
argument values of type T. Another way to destruct values
of type T is to use a case
expression.
tToInt :: T -> Int
tToInt t =
case t of
A -> 0
B i -> 1
C b s -> 2
D (c, s) -> 3
Each branch of case expression must return the same type
of value (in this case, Int). The equations in the original
version of tToInt above are syntactic sugar for this
version using case.
A data definition can refer to refer to one or more
(lowercase) type variables (a, b, etc.
below).
data T a b c ... = ...
The type variables can be referred to in the types of the data constructors.
As an example to motivate polymorphic ADTs, recall our
head and tail functions from before.
head :: [a] -> a
head (x:xs) = x
tail :: [a] -> [a]
tail (x:xs) = xs
By omitting equations for the empty list, these functions implicitly fail: they crash with inexhaustive pattern match errors at run-time.
Versions with more explicit failure:
head :: [a] -> a
head [] = undefined
head (x:xs) = x
tail :: [a] -> [a]
tail [] = undefined
tail (x:xs) = xs
> :t undefined
Whoa! undefined has every type? This is only
okay because undefined throws an exception at run-time,
halting program execution at that point.
This isn’t much better than before. Using undefined is,
however, a cool trick in the type system that allows exceptions to be
raised anywhere in the code, while still accurately typechecking other
expressions. Using undefined is often helpful during
program development, allowing you to put “placeholders” for expressions
that you haven’t finished yet while still allowing the rest of the
program to type check and run.
Now, a better way than crashing to represent failure.
data Maybe a
= Nothing
| Just a
deriving (Eq, Show)
Notice that Maybe is polymorphic; it takes one type
variable a. A value of type Maybe a is either
Nothing with no additional data, or Just with
a single value of type a.
> :t Just True
> :t Just 1
> :t Just (+)
> :t Nothing
> :t Nothing :: Maybe Bool
> :t Nothing :: Maybe (Maybe (Maybe Bool))
We can use Maybe values to deal with “errors” much more
effectively.
maybeHead :: [a] -> Maybe a
maybeHead [] = Nothing
maybeHead (x:xs) = Just x
maybeTail :: [a] -> Maybe [a]
maybeTail [] = Nothing
maybeTail (x:xs) = Just xs
This allows (requires) callers of these functions to handle each of the two cases explicitly.
The Maybe type is extremely useful and is defined in the
standard Prelude. Its cousin Either is also
quite useful.
Datatypes can also be recursive.
data List a
= Nil
| Cons a (List a)
deriving (Eq, Show)
What does this type remind you of?
The built-in lists are just like the List type above,
but with special syntax:
[] and (:), rather than names starting
with capital latters, are data constructors,[1,2,3,4] is syntactic sugar for nested
calls to (:),[], rather than a name starting with
a capital letter, and[a] is syntactic sugar for
[] a.For example:
> :t []
> :t [] :: [a]
> :t [] :: [] a
> :t "abc"
> :t "abc" :: [Char]
> :t "abc" :: [] Char
> let (x:xs) = [1, 2, 3]
> let ((:) x xs) = [1, 2, 3]
data Pair a b = Pair a b deriving (Eq, Show)
data Triple a b c = Triple a b c deriving (Eq, Show)
data Tuple4 a b c d = Tuple4 a b c d deriving (Eq, Show)
Built-in pairs are just like these datatypes, again with special syntactic support.
> :t ('a', 2 :: Int)
> :t (,) 'a' 2 :: (,) Char Int
> :t (,)
> :t (,,)
> :t (,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,)
> let (a, b, c) = (1, 2, 3)
> let ((,,) a b c) = (1, 2, 3)
Sometimes it is useful to have a dummy value.
data Unit = Unit deriving (Eq, Show)
The built-in unit value and type are just like this datatype, again with special syntactic support.
> :t ()
> :t () :: ()
Even booleans can be (and are) defined as a datatype.
data Bool
= False
| True
deriving (Eq, Show)
If-expressions, multi-way if-expressions, and guards are syntactic
sugar for pattern matching on Bools.
absoluteValue n
| n >= 0 = n
| otherwise = -n
-- source file {-# LANGUAGE MultiWayIf #-}
-- ghc command-line option -XMultiWayIf
-- ghci :set -XMultiWayIf
absoluteValue n =
if | n >= 0 -> n
| otherwise -> -n
absoluteValue n =
if n >= 0 then n
else if otherwise then -n
else undefined
absoluteValue n =
case n >= 0 of
True -> n
False ->
case otherwise of
True -> -n
False -> undefined
Recall the currency example above. We were careful to use
ds for dollar Floats and ys for
yen Floats, but it’s easy to accidentally, for example, add
these two kinds of Floats, leading to a non-sensical, and
probably hard-to-detect-and-debug, result.
One approach might be to define type aliases.
type Dollars = Float
type Yen = Float
But remember, type aliases do not create new types, just synonyms.
So, Dollars and Yen, which are synonymous with
Float and with each other, can still be misused.
Another option is to define wrapper (or “box”) types with single data
constructors, for the purpose of forcing the programmer to construct and
destruct the different Float types, with the usual support
from the typechecker to prevent errors.
data Currency
= USD Dollars -- US Dollars
| JPY Yen -- Japanese Yen
deriving (Eq, Show)
data Dollars = DollarsFloat Float deriving (Eq, Show)
data Yen = YenFloat Float deriving (Eq, Show)
Note, it is common to see definitions like the following:
data Dollars = Dollars Float deriving (Eq, Show)
data Yen = Yen Float deriving (Eq, Show)
Notice how the type and data constructor names are the same; this is not a problem, because types and expressions “live” in different places in the language syntax, so there’s never any ambiguity.
Now, we redefine add as follows, in terms of several
helper functions that significantly reduce the chances that we mistreat
the actual Float values.
add :: Currency -> Currency -> Currency
add (USD d1) (USD d2) = USD (addDollars d1 d2)
add (JPY y1) (JPY y2) = JPY (addYen y1 y2)
add (USD d) (JPY y) = USD (addDollars d (convertYenToDollars y))
add (JPY y) (USD d) = JPY (addYen y (convertDollarsToYen d))
addDollars :: Dollars -> Dollars -> Dollars
addDollars (Dollars d1) (Dollars d2) = Dollars (d1 + d2)
addYen :: Yen -> Yen -> Yen
addYen (Yen y1) (Yen y2) = Yen (y1 + y2)
convertYenToDollars :: Yen -> Dollars
convertYenToDollars (Yen y) = Dollars (y * dollarsPerYen)
convertDollarsToYen :: Dollars -> Yen
convertDollarsToYen (Dollars d) = Yen (d * yenPerDollar)
When data constructors carry multiple values, it can be hard to remember which components are meant to represent what (especially when multiple components have the same types). Furthermore, it can be tedious to write accessor functions that retrieve particular components carried by a data constructor.
Haskell offers an embellishment of data constructors that addresses
these two issues. Consider the following definition where the data
constructors A and B use record
syntax to give names to its data values.
data T'
= A'
| B' Int
| C' { foo :: Bool, baz :: String }
| D' { bar :: Char, baz :: String }
deriving (Show)
All data constructors can be used to construct values as before.
> :t A'
> :t B'
> :t C'
> :t D'
But now, record construction can also be used to create
A and B values.
> C' { foo = True, baz = "hello" }
> D' { bar = 'a', baz = "hello" }
> D' { baz = "hello", bar = 'a' }
In addition, accessor functions have been automatically generated.
> :t foo
> :t bar
> :t baz
Furthermore, there is a way to create a new record by copying all of its fields except for some.
> let c' = C' { foo = True, baz = "hello" }
> c' { baz = "goodbye" }
When multiple data constructors, update-copy expressions may crash:
> c' { bar = 'b' }
Should the following be an acceptable data
definition?
data Data = One { data :: Int } | Two { data :: Bool }
NamedFieldPuns
and RecordWildCards
can be handy when programming with record syntax. And punny code can
spark joy.
We will start seeing the keyword newtype popping up in
code examples later on. This is a detail we don’t need to worry about
for now.
Data constructors tag, or label, the values they carry in order to distinguish them from values created with different data constructors for the same type. As we have seen, it is sometimes useful to define a new datatype even with only one data constructor.
In such cases, tagging and untagging (or constructing and destructing, or boxing and unboxing) values is useful for enforcing invariants while programming, but these operations add unnecessary run-time overhead: there is only one kind of value, so they ought to be free of labels at run-time.
Haskell allows datatypes with exactly one constructor (such as
Dollars and Yen), and which carries exactly
one value, to be defined with the keyword newtype in place
of data, such as
newtype Box a = Box a
For the purposes of programming, newtype is almost
exactly the same as data. But it tells the compiler to
optimize the generated code by not including explicit Box
labels at run-time. We will get into the habit of using
newtype whenever we define a datatype with one constructor,
without delving into the subtle differences between
using newtype and data.
Furthermore, we will often write expressions of the form
Box . doSomething . unbox
to unwrap (unbox), transform (doSomething),
and rewrap (Box) values. So, we will use records so that
unwrapping functions are generated automatically.
newtype Box a = Box { unbox :: a }