Most mainstream functional languages employ an eager (a.k.a. strict) evaluation strategy, where an expression is evaluated entirely even if its resulting value is not needed or only parts of it are needed. As we will see, there are sometimes advantages in lazily evaluating certain expressions. There are two important aspects of lazy evaluation:
suspending (a.k.a delaying) a computation until its result is actually needed (a.k.a demanded or forced); and
memoizing (a.k.a caching) the result of a suspended computation in case its value is demanded again.
As an example to motivate and illustrate lazy evaluation, we will work through an encoding of natural numbers and some simple operations on them.
type Nat
fromInt : Int -> Nat
toInt : Nat -> Int
toString : Nat -> String
plus : Nat -> Nat -> Nat
equals : Nat -> Nat -> Bool
Nat
.elm
We start by inductively defining Nat
ural numbers to be either Z
ero or the S
uccessor of some other Nat
ural number.
type Nat = Z | S Nat
Next, let's define functions toInt
and fromInt
to convert Nat
ural numbers to and from Int
egers.
fromInt : Int -> Nat
fromInt i =
if i <= 0 then
Z
else
S (fromInt (i-1))
If we take fromInt
for a spin, we see that it busts the stack rather quickly.
> fromInt 0
Z : Nat
> fromInt 1
S Z : Nat
> fromInt 10
S (S (S (S (S (S (S (S (S (S Z))))))))) : Nat
> fromInt 10000
RangeError: Maximum call stack size exceeded
Ah, right, we should define it tail-recursively so that it runs in constant stack space.
fromInt =
let
foo acc i =
if i <= 0 then
acc
else
foo (S acc) (i-1)
in
foo Z
That should do the trick...
> fromInt 10000
RangeError: Maximum call stack size exceeded
Or not. Okay, it's time for a little more investigative journalism. We could fire up Elm Reactor to start debugging. Or we can be lazy (pun intended) and continue to poke around at the REPL.
> let _ = fromInt 10000 in ()
() : ()
That's interesting. The call to fromInt
was not the problem. So the act of printing the resulting Nat
causes the stack overflow? Let's write our own tail-recursive printing function to test this hypothesis.
toString : Nat -> String
toString n =
let
foo acc n =
case n of
Z -> acc
S n_ -> foo ("S" ++ acc) n_
in
foo "Z" n
Sure enough, that does the trick...
> fromInt 10000 |> toString
" ... SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSZ" : String
> fromInt 10000000 |> toString
" ... SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSZ" : String
... until we run out of heap space, that is.
> fromInt 100000000 |> toString
FATAL ERROR: JS Allocation failed - process out of memory
Okay, now that we have sorted out our call stack and heap space concerns, let's return to the task of programming with Nat
s. First, a bit of refactoring:
foldl : (b -> b) -> b -> Nat -> b
foldl f acc n =
case n of
Z -> acc
S n_ -> foldl f (f acc) n_
toString =
foldl ((++) "S") "Z"
And a function to convert Nat
s to Int
s:
toInt : Nat -> Int
toInt =
foldl ((+) 1) 0
We can add two Nat
s together by peeling S
uccessor labels off of y
one at a time and wrapping them around x
.
plus : Nat -> Nat -> Nat
plus x y =
let foo acc n =
case n of
Z -> acc
S n_ -> foo (S acc) n_
in
foo x y
Or:
plus x y = foldl S y x
Or:
plus x y = foldl S x y
Or:
plus = foldl S
This plus
function encodes the usual notion of addition for our Nat
type.
> plus (fromInt 0) (fromInt 0) |> toString
"Z" : String
> plus (fromInt 0) (fromInt 2) |> toString
"SSZ" : String
> plus (fromInt 10) (fromInt 2) |> toString
"SSSSSSSSSSSSZ" : String
> plus (fromInt 10) (fromInt 2) |> toInt
12 : Int
(Read up to at least this point before watching video part one and part two )
We can define eq
uality for Nat
s by peeling off one data constructor at a time.
equals : Nat -> Nat -> Bool
equals x y =
case (x, y) of
(Z, Z) -> True
(S x_, S y_) -> equals x_ y_
_ -> False
This seems to work just fine...
> equals (fromInt 0) (fromInt 0)
True : Bool
> equals (fromInt 0) (fromInt 2)
False : Bool
> equals (fromInt 10) (fromInt 2)
False : Bool
... but it is really slow for some comparisons that seem like they should be easy to decide quickly.
> equals (fromInt 10000) (fromInt 10000000)
False : Bool
> equals (fromInt 0) (fromInt 10000000)
False : Bool
The problem is that, under eager evaluation, both Nat
ural numbers are evaluated completely before calling equals
, which then very quickly decides the last two disequalities.
ThunkNat
.elm
A common approach to delaying the evaluation of an expression e
of type a
in an eager language is to define a function \() -> e
, called a thunk, that waits for a dummy argument before evaluating the expression.
We will port the implementations above in order to delay computing the representations of natural numbers. In our new representation of Nat
s, a S
uccessor value stores the delayed computation of the Nat
that it succeeds. The force
function is used to evaluate a suspended computation.
type Nat = Z | S (Thunk Nat)
type alias Thunk a = () -> a
force : Thunk a -> a
force thunk =
thunk ()
Note that implementing a function like the following is not a good idea, because a call to delay
will force its argument to be evaluated!
delay : a -> Thunk a
delay e =
\() -> e
To implement fromInt
, we no longer need to implement a (tail-recursive) helper function because there are no direct recursive calls; instead, the latter case immediately returns a S
uccessor value (which may some time later lead to a call to fromInt
).
fromInt i =
if n <= 0 then
Z
else
S (\_ -> fromInt (i-1))
Notice that our new representation of non-Z
ero numbers is quite different from before:
> import Nat as N
> import ThunkNat exposing (..)
> N.fromInt 10
S (S (S (S (S (S (S (S (S (S Z))))))))) : N.Nat
> fromInt 10
S <function> : Nat
Unlike fromInt
, toInt
does need to make recursive calls immediately, because the resulting type (Int
) does not have the notion of delayed computation built in to its representation. Therefore, we will want to employ the tail-recursive helper strategy.
toInt =
let foo acc n =
case n of
Z -> acc
S n_ -> foo (1 + acc) (force n_)
in
foo 0
As before, fromInt
and toInt
are inverses:
> fromInt 100000000 |> toInt
100000000 : Int
Notice how toInt
uses force
to evaluate all of the nested suspensions that are stored within a Nat
. A function like this is called monolithic, whereas a function like fromInt
is called incremental because it does not trigger the evaluation of all nested suspensions.
Another example of a monolithic function is toString
.
toString =
let foo acc n =
case n of
Z -> acc
S n_ -> foo ("S" ++ acc) (force n_)
in
foo "Z"
However, this function is no longer needed for its original purpose above, because printing the representation of a S
uccessor value is now very quick.
> fromInt 10000
S <function> : Nat
> fromInt 10000 |> toString
" ... SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSZ" : String
We can now return to our motivation for delaying the evaluation of Nat
s.
equals x y =
case (x, y) of
(Z, Z) -> True
(S x_, S y_) -> equals (force x_) (force y_)
(_, _) -> False
When x
and y
represent the same number, equals
evaluates all of the nested suspensions in both x
and y
. Otherwise, it evaluates only enough of their delayed representations in order to demonstrate a difference in corresponding data constructors. As a result, all of the following comparisons run quickly, unlike with our original Nat.elm
implementation.
> equals (fromInt 0) (fromInt 0)
True : Bool
> equals (fromInt 0) (fromInt 2)
False : Bool
> equals (fromInt 10) (fromInt 2)
False : Bool
> equals (fromInt 10000) (fromInt 10000000)
False : Bool
> equals (fromInt 0) (fromInt 10000000)
False : Bool
We finish porting the original module with the following incremental implementation of plus
.
plus x y =
case y of
Z -> x
S y_ -> S (\_ -> plus x (force y_))
As a result, the following comparison evaluates quickly.
> equals (fromInt 0) (plus (fromInt 10000000) (fromInt 10000000))
False : Bool
What happens if we ask Elm to compare Nat
s using built-in equality?
> fromInt 0 == fromInt 0
-- PARSE ERROR ------------------------------------------------------------- elm
...
> (fromInt 0 == fromInt 0)
True : Bool
> (fromInt 0 == fromInt 1)
False : Bool
> (fromInt 1 == fromInt 2)
Error: Trying to use `(==)` on functions. There is no way to know if
functions are "the same" in the Elm sense. ...
Elm throws a run-time error when trying to compare two different function values. Fair enough. Notice that "physical equality" between function values is supported, however.
> fromInt 1 == fromInt 1
Error: Trying to use `(==)` on functions. There is no way to know if
functions are "the same" in the Elm sense.
> let foo = fromInt 1 in foo == foo
True : Bool
Defining suspensions and incremental functions can be really valuable techniques, but there's no free lunch. The representation of a thunk is a closure, which is a function to evaluate along with bindings for all free variables referred to by the function. Delaying computations willy nilly, then, can lead to a huge number of these closures building up. So one should restrict the use of thunks to situations where the benefits of being able to define incremental functions outweighs the overheads associated with delayed computations.
Another concern is that the same delayed computation may be demanded more than once. If the computation takes significant resources to evaluate, then redoing the work every time is undesirable.
As a result, many eager languages — such as OCaml, Standard ML, and Racket — offer special-purpose constructs for manipulating delayed computations with the guarantee that the result of forcing a delayed computation is cached in case it is forced again in the future. The term lazy evaluation is often used to describe support for delayed computations with the guarantee of evaluating any such computation at most once.
Prior versions of Elm included library support for lazy evaluation, with the following interface:
lazy : (() -> a) -> Lazy a
force : Lazy a -> a
The lazy
function turned a Thunk a
into a Lazy a
value — the thunk was evaluated the first time it was force
d; if and when force
d again, the previous result was retrieved. Under the hood, this library was implemented using imperative features in JavaScript: each thunk had a corresponding mutable variable (called isForced
) to track whether the particular thunk had been evaluated and a mutable variable (called value
) to store its result.
Elm v0.19 does not provide language or library support for memoizing delayed computations. We can accomplish a bit of the same by explicitly maintaining caches for delayed computations; take a look at the optional reading below if you're curious. But in class, we'll keep things simple and just pretend that the system performs memoization.