CMSC 22610
Implementation of Computer Languages - I
Winter 2009



Project Overview

1  Introduction

The project for the course is to implement a small functional programming language, called LangF. (Students who have taken CMSC 22100 should recognize it as an enrichment of System F, the polymorphic λ-calculus.) The project will be divided into four parts: the scanner, which coverts a stream of input characters into (a stream of) tokens; the parser, which analyzes the syntactic structure of the stream of tokens and produces a parse tree; the type checker, which checks the parse tree for type correctness and produces a typed abstract syntax tree; and a simple code generator that takes the abstract syntax tree and produces an object file for interpretation by a virtual machine. Each part of the project builds upon the previous parts (but reference solutions for previous parts will be available). The implementation will be undertaken using the Standard ML programming language and submission of the project milestones will be managed using the course GForge server. Programming projects will be individual efforts (no group submissions).

1.1  Project Schedule

The tentative schedule for the project assignments is as follows:

Out date:    Project description:    Due date: 
January 8 Scanner January 23
January 22 Parser February 6
February 12 Type checker February 27
February 26 Code generator March 20

2  LangF

LangF is a strongly-typed, call-by-value, higher-order, polymorphic, functional programming language. The syntax and semantics of LangF are similar to other functional programming languages (e.g., Standard ML, Haskell), but with many simplifications and a more explicit type system. LangF does not have type inference, exceptions, references, or a module system. LangF does have first-class functions, datatypes, and explicit and first-class polymorphism.

2.1  Types and Values

LangF supports two primitive types of values: integers and strings. In addition, LangF has datatype-constructed values (including predefined datatypes for booleans and units), function values, and polymorphic-function values.

2.2  Programs

A LangF program is a sequence of declarations, which are either type, datatype, value, or function definitions, followed by an expression. Executing a LangF program means evaluating each of the declarations (making their definitions available to the subsequent declarations and expression) and then evaluating the expression. Here is a very simple LangF program that computes 5! by defining the fact function followed by the expression fact 5:

(* program declarations *) fun fact (n: Integer) : Integer = if n == 0 then 1 else n * fact (n - 1) ; (* program expression *) fact 5

Note that a ; is used to separate the declarations from the expression (but multiple declarations are not separated by any delimiter) and a (* is used to start a comment and *) is used to terminate a comment; comments may be nested.

2.3  Simple Expressions

LangF is an expression language, which means that all computation is done by expressions (there are no statements). Furthermore, LangF is a call-by-value language, which means that (almost) all sub-expressions are evaluated to values before the expression itself is evaluated. The body of the fact function in the example program above has introduced many of the simple LangF expressions:

Thus, many of the simple LangF expressions deal with values of the primitive integer and string types and the built-in boolean datatype. For reference, the following is the full collection of integer-centric expressions:

expression formmeaning
integerinteger constant
Exp == Expinteger equality
Exp <> Expinteger inequality
Exp < Expinteger less-than
Exp <= Expinteger less-than-or-equal
Exp > Expinteger greater-than
Exp >= Expinteger greater-than-or-equal
Exp + Expinteger addition
Exp - Expinteger subtraction
Exp * Expinteger multiplication
Exp / Expinteger division
Exp % Expinteger modulus
~ Expunary integer negation

the following is the full collection of string-centric expressions:

expression formmeaning
stringstring constant
Exp ^ Expstring concatenation

and the following is the full collection of boolean-centric expressions:

expression formmeaning
if Exp then Exp else Expconditional
Exp orelse Expshort-circuiting disjunction
Exp andalso Expshort-circuiting conjunction

2.4  Value Declarations

A LangF value declaration simply binds a name to an expression; the name may be used in subsequent declarations and expressions. When a value declaration is evaluated, the expression is evaluated to a value and the value is bound to the name (making the value available to subsequent declarations and expressions through the name). The following simple value declaration binds the name two to the integer expression 1 + 1:

val two = 1 + 1

Note that a value declaration is introduced with the val keyword and that (value variable) names are written with a leading lower-case letter. A LangF value declaration may optionally assert the type of the bound expression:

val four : Integer = two + two

Such type assertions can be useful to understand why a program has (or does not have) type errors.

2.5  Function Declarations and Anonymous Function Expressions

A LangF function declaration defines a function and introduces a name for the function; the function name may be used in subsequent declarations and expressions. For example, we might wish to define a function to add two integers:

fun add (a: Integer) (b: Integer) : Integer = a + b

Note that a function declaration is introduced with the fun keyword, followed by the function name, one or more function parameters (each a variable with its type), the function result type, and an expression for the function body. Functions in LangF are first-class: they may be nested, taken as arguments, and returned as results. A function with more than one parameter is a curried function; thus the following program evaluates to 4:

fun add (a: Integer) (b: Integer) : Integer = a + b val _ : Integer -> Integer -> Integer = add val inc : Integer -> Integer = add 1 ; inc (add 1 2)

Note that the function type is introduced with the syntax Type -> Type; the add function has the type Integer -> Integer -> Integer; applying it to the argument 1 returns a function having the type Integer -> Integer (which is bound to the name inc).

A LangF function declaration may be recursive and a collection of function declarations may be mutually recursive. Thus, the function body may use the function names introduced by the function declaration(s). For example, in the factorial function, the function name fact is used in the function body:

fun fact (n: Integer) : Integer = if n == 0 then 1 else n * fact (n - 1)

As an example of mutually recursive functions, consider the isEven and isOdd functions:

fun isEven (n: Integer) : Bool = if n == 0 then True else isOdd (n - 1) and isOdd (n: Integer) : Bool = if n == 0 then False else isEven (n - 1)

Note that functions declarations in a collection of mutually recursive function declarations are separated with the and keyword. Also note that within a collection of mutually recursive function declarations, all of the introduced function names must be distinct.

In addition to function declarations, LangF has anonymous function expressions. For example, we might wish to return a function to multiply two integers (without introducing a name for the function):

fn (x: Integer) (y: Integer) => x * y

Note that an anonymous function expression is introduced with the fn keyword, followed by one or more function parameters (each a variable with its type) and an expression for the function body. A LangF anonymous function expression cannot be recursive (as there is no name for the function).

2.6  Polymorphic Functions

A defining characteristic of LangF (taken from System F, the polymorphic λ-calculus) is polymorphism or type abstraction. The prototypical example of a polymorphic function is the identity function. The identity function simply returns its argument (without performing any computation on it). Thus, the behavior of the function is the same for all possible types of its argument (and result). The function declaration for the identity function introduces one function parameter (a type variable) to be used as the type of the second function parameter and the result type:

fun id ['a] (x: 'a) : 'a = x

Note that type variables are written with a leading ' and lower-case letter.

Like (ordinary) functions, polymorphic functions in LangF are first-class: they may be nested, taken as arguments, and returned as results. To use a polymorphic function, it must be applied to a type, rather than to an expression. The result of applying a polymorphic function to a type is a value having the type produced by instantiating the type variable with the applied type. For example, the result of applying the identity function to the integer type is a function having the type Integer -> Integer:

fun id ['a] (x: 'a) : 'a = x val _ : ['a] -> 'a -> 'a = id val _ : Integer -> Integer = id [Integer] val zero : Integer = id [Integer] 0

Note that the polymorphic function type is introduced with the syntax [ tyvarid ] -> Type. Also note that the type variable in a function parameter and in a polymorphic function type is a binding occurrence of the type variable; two polymorphic function types are equal if each of the bound type variables in one can be renamed to match the bound type variables in the other:

fun id ['a] (x: 'a) : 'a = x val _ : ['b] -> 'b -> 'b = id val _ : ['c] -> 'c -> 'c = id

Project 3 will discuss this aspect in more detail.

Anonymous function expressions may also be used to return polymorphic functions:

val twice : ['a] -> ('a -> 'a) -> 'a -> 'a = fn ['a] (f: 'a -> 'a) (x: 'a) => f (f x) val two : Integer = twice [Integer] (fn (x: Integer) => x + 1) 0 val zzzz : String = twice [String] (fn (s: String) => s ^ s) "z"

In function declarations and anonymous function expressions, type variable and (value) variable parameters may be mixed; however, a type variable parameter must occur before any use of the type variable in the types of (value) variable parameters:

val revApp = fn ['a] (x: 'a) ['b] (f: 'a -> 'b) => f x val _ : ['b] -> (Integer -> 'b) -> 'b = revApp [Integer] 1 val two = revApp [Integer] 1 [Integer] (fn (x: Integer) => x * 2)

The previous examples have also demonstrated that a function with more than one parameter (either type variable parameters or (value) variable parameters) is a curried function and can be partially applied to types or expression arguments.

2.7  Type Declarations

A LangF type declaration introduces another name for a type; the new type name may be used in subsequent declarations and expressions. For example, we might wish to abbreviate the type of an integer comparison function (a function from two integers to a boolean):

type IntCmp = Integer -> Integer -> Bool val intEq : IntCmp = fn (x: Integer) (y: Integer) = x == y

Note that a type declaration is introduced with the type keyword and that type names are written with a leading upper-case letter.

A LangF type declaration may also include type parameters, which must be instantiated at each use of the new type name. For example, we might wish to abbreviate the type of a general comparison function (a function from two values of the same (but any) type to a boolean) and then define the type of an integer comparison function in terms of the general comparison function:

type Cmp ['a] = 'a -> 'a -> Bool type IntCmp = Cmp [Integer]

Note that type parameters and type arguments are written in [] and, as in function parameters, that type variables are written with a leading ' and lower-case letter. Multiple type parameters and type arguments are separated by ,s and [ ] can be used to be explicit about the absence of type parameters and type arguments:

type BinOp ['a, 'b] = 'a -> 'a -> 'b type Cmp ['a] = BinOp ['a, Bool] type IntCmp [] = Cmp [Integer]

Unlike polymorphic functions, a type name cannot be partially applied; at every use of the type name, all type parameters must be instantiated.

2.8  Datatype Declarations and Constructor and Case Expressions

A LangF datatype declaration introduces a new type along with constructors; the constructors provide the means to create values of the new type and to take apart values of the new type. Each constructor is declared with the types of its argument(s). A very simple datatype declaration is the one for booleans that introduces two constructors (True and False), each with no arguments:

datatype Bool = True | False

Note that a datatype declaration is introduced with the datatype keyword and that constructor names are written with a leading upper-case letter. A slightly more complicated datatype declaration is one that represents publications, which can be either a book (with an author and a title) or an article (with an author, a title, and a journal name):

datatype Publication = Book {String, String} | Article {String, String, String}

Note that the types of a constructor's argument(s) are written in {} separated by ,s and { } can be used to be explicit about the absence of arguments.

To create a value having a datatype type, a constructor name is applied to argument(s):

val pub1 : Publication = Book {"Lawrence Paulson", "ML for the Working Programmer"} val pub2 : Publication = Article {"Andrew Appel", "A Critique of Standard ML", "Journal of Functional Programming"}

As in constructor declarations, the arguments of a constructor are written in {} separated by ,s and { } can be used to be explicit about the absence of arguments. Unlike functions, a constructor name cannot be partially applied; at every use of the constructor name, all arguments must be given.

To take apart a value having a datatype type, a collection of match rules (each with a pattern involving the constructor names and an expression) is checked successively against an argument; when a pattern matches the argument, its expression is returned as the result:

fun not (b: Bool) : Bool = case b of True => False | False => True end fun author (p: Publication) : String = case p of Book {author, title} => author | Article {author, title, journal} => author end val lawrence : String = author pub1 val andrew : String = author pub2

Note that a case expression is introduced with the case keyword, followed by the case-expression argument, the of keyword, one or more match rules, and terminated by the end keyword. Each match rule is introduced with the syntax Pat => Exp and separated by |s. In a match rule, the pattern is either a constructor name applied to simple pattern(s) or a simple pattern; a simple pattern is either a variable name or an _. When a pattern matches a value, pattern variable names are bound to the corresponding value components and may be used in the match rule expression. Thus, in the evaluation of author pub1 in the example above, the pattern Book {authortitle} matches the value Book {"Lawrence Paulson""ML for the Working Programmer"}, making author bound to "Lawrence Paulson". A subtle point is that a case expression must have exactly one a match rule for every possible argument; Project 3 will discuss this aspect in more detail.

The following example uses _ patterns:

fun isBook (p: Publication) : Bool = case p of Book {_, _} => True | _ => False end fun isJournal (p: Publication) : Bool = case p of Journal {_, _, _} => True | _ => False end

A LangF datatype declaration may also include type parameters (yielding a polymorphic datatype), which must be instantiated at each use of the new type name. The types of a constructor's arguments(s) may use the type parameters. For example, the pair datatype takes two type parameters and introduces a constructor with two arguments of the types of the parameters:

datatype Pair ['a, 'b] = Pair {'a, 'b}

To create a value having a polymorphic-datatype type, a constructor name is applied to type argument(s) and argument(s):

val one_hello = Pair [Integer, String] {1, "hello"}

The type arguments of a constructor are written in [] separated by ,s and [ ] can be used to be explicit about the absence of type arguments. Unlike polymorphic functions, a polymorphic-constructor name cannot be partially applied; at every use of the polymorphic-constructor name, all type arguments and arguments must be given.

Similarly, to match a value having a polymorphic-datatype type, a constructor name is applied to type argument(s) and simple pattern(s) in a match rule:

fun fst ['a] ['b] (p: Pair ['a, 'b]) : 'a = case p of Pair ['a, 'b] {x, y} => x end fun snd ['a] ['b] (p: Pair ['a, 'b]) : 'b = case p of Pair ['a, 'b] {x, y} => y end val one : Integer = fst [Integer] [String] one_hello val hello : String = snd [Integer] [String] one_hello

A LangF datatype declaration may be recursive and a collection of datatype declarations may be mutually recursive. Thus, the types of a constructor's argument(s) may use the type names introduced by the datatype declaration(s). For example, in the list datatype, the type of the second argument of the Cons constructor is itself the list datatype:

datatype List ['a] = Nil | Cons {'a, List ['a]}

As an example of mutually recursive datatypes, consider variadic trees that are either empty or a node with an element and a forest and forests that are either empty or a node with a tree and a forest:

datatype Tree ['a] = EmptyT | Forest {'a, Forest ['a]} and Forest ['a] = EmptyF | Tree {Tree ['a], Forest ['a]}

Note that datatype declarations in a collection of mutually recursive datatype declarations are separated with the and keyword. Also note that within a collection of mutually recursive datatype declarations, all of the introduced type names must be distinct and all of the introduced constructor names must be distinct; however, type names and constructor names need not be distinct (as in the pair datatype declaration and in the tree/forest datatype declarations).

The following mutually recursive functions compute the height of variadic trees and forests:

fun max (x: Integer) (y: Integer) : Integer = if x > y then x else y fun heightTree ['a] (t: Tree 'a) : Integer = case t of EmptyT ['a] => 0 | Forest ['a] {_, f} => 1 + (heightForest ['a] f) end and heightForest ['a] (f: Forest 'a) : Integer = case f of EmtpyF ['a] => 0 | Tree ['a] {t, f} => max (heightTree ['a] t) (heightForest ['a] f)

2.9  Miscellaneous Expressions

LangF include a few more expressions that have not been introduced in the previous examples.

A let expression introduces declarations with limited scope; that is, the names (type names, constructor names, function names) introduced by the declarations can only be used in the body of the let expression:

val sixteen = let fun square (n: Integer): Integer = n * n in square (square 2) end (* cannot use 'square' in later declarations or expressions. *)

Note that a let expression is introduced with the let keyword, followed by one or more declarations, the in keyword, followed by the let-expression body, and terminated by the end keyword.

A type-constraint expression asserts the type of an expression:

val three = ((id [Integer] : Integer -> Integer) (3 : Integer) : Integer)

Note that a type-constraint expression is introduced with the syntax Exp : Type. Such type assertions can be useful to understand why a program has (or does not have) type errors.

Finally, a sequence expression evaluates multiple expressions, but only returns the result of the final expression:

val zero = (print "Hello " ; print "world!\n" ; 0) val one = inc zero

Note that a sequence expression is written in () separated by ;. Evaluating these declarations first prints the string "Hello ", then prints string "world!\n", then binds zero to the value 0, and finally binds one to the value 1. As a convenience, a sequence expression appearing in the body of a let expression can be written without the delimiting parentheses:

val nine = let fun square (n: Integer) : Integer = n * n in print "Hello " ; print "world!\n" ; square 3 end

Evaluating this declaration first prints the string "Hello ", then prints string "world!\n", then binds nine to the value 9.

2.10  Predefined Types, Constructors, and Functions

A LangF program may use a number of predefined types, constructors, and functions. Such predefined types, constructors, and functions are provided both for convenience and for behaviors that cannot otherwise be expressed in a LangF program. We will expand the collection of predefined types, constructors, and functions over the course of the project; for now, we assume that the following are provided:

type Integer = ... type String = ... datatype Unit = Unit datatype Bool = True | False fun print (s: String) : Unit = ... fun fail ['a] (s: String) : 'a = ...

Note that the predefined types, constructors, and functions are bound to type names, constructor names, and function names; in particular, they are not keywords.

The Integer and String types correspond to the primitive types of integers and strings; we have seen many examples that use these types.

The Bool datatype corresponds to the type returned by the integer-comparison expression forms and the type expected by the boolean-centric expression forms. The Unit datatype corresponds to a type with exactly one constructor Unit; it is useful as the return type of functions that should be evaluated for a side-effect, without returning a value.

The print function prints its string argument. The fail polymorphic function also prints its string argument, but then immediately terminates the LangF program. It is useful for aborting the program under unrecoverable error conditions or unexpected arguments. For example, the fact function defined above will go into an infinite loop when called with a negative argument. The follwing safeFact function aborts the program (with a somewhat helpful error message) when called with a negative argument, rather than going into a (less helpful) infinite loop:

fun safeFact (n: Integer) : Integer = if n < 0 then fail [Integer] "safeFact: n < 0" else fact n

3  Example Programs

The following example programs demonstrate many of the features of the LangF programming language.

(* fib.lgf *) fun intToString (i:Integer) : String = if i < 0 then "~" ^ (intToString (~i)) else if i == 0 then "0" else if i == 1 then "1" else if i == 2 then "2" else if i == 3 then "3" else if i == 4 then "4" else if i == 5 then "5" else if i == 6 then "6" else if i == 7 then "7" else if i == 8 then "8" else if i == 9 then "9" else (intToString (i / 10)) ^ (intToString (i % 10)) fun fib (n:Integer) : Integer = if n <= 1 then 1 else fib (n - 1) + fib (n - 2) val n = 20 val fib_n = fib n ; (print "fib "; print (intToString n); print " = "; print (intToString fib_n); print "\n")
(* int-to-string.lgf *) fun intToString (i:Integer) : String = if i < 0 then "~" ^ (intToString (~i)) else if i == 0 then "0" else if i == 1 then "1" else if i == 2 then "2" else if i == 3 then "3" else if i == 4 then "4" else if i == 5 then "5" else if i == 6 then "6" else if i == 7 then "7" else if i == 8 then "8" else if i == 9 then "9" else (intToString (i / 10)) ^ (intToString (i % 10)) ; print (intToString 42 ^ "\n")
(* fact-y.lgf *) fun intToString (i:Integer) : String = if i < 0 then "~" ^ (intToString (~i)) else if i == 0 then "0" else if i == 1 then "1" else if i == 2 then "2" else if i == 3 then "3" else if i == 4 then "4" else if i == 5 then "5" else if i == 6 then "6" else if i == 7 then "7" else if i == 8 then "8" else if i == 9 then "9" else (intToString (i / 10)) ^ (intToString (i % 10)) (* the Y combinator *) fun y ['a] (f: ('a -> 'a) -> ('a -> 'a)) (x: 'a) : 'a = f (fn (x : 'a) => y ['a] f x) x val factY = fn (fact: Integer -> Integer) (n: Integer) => if n == 0 then 1 else n * fact (n - 1) val fact : Integer -> Integer = y [Integer] factY val n = 5 val fact_n = fact n ; (print "fact "; print (intToString n); print " = "; print (intToString fact_n); print "\n")
(* list.lgf *) fun intToString (i:Integer) : String = if i < 0 then "~" ^ (intToString (~i)) else if i == 0 then "0" else if i == 1 then "1" else if i == 2 then "2" else if i == 3 then "3" else if i == 4 then "4" else if i == 5 then "5" else if i == 6 then "6" else if i == 7 then "7" else if i == 8 then "8" else if i == 9 then "9" else (intToString (i / 10)) ^ (intToString (i % 10)) datatype List ['a] = Nil | Cons {'a, List ['a]} fun foldl ['a] ['b] (f: 'a -> 'b -> 'b) (b: 'b) (l: List ['a]) : 'b = case l of Nil ['a] => b | Cons ['a] {hd, tl} => foldl ['a] ['b] f (f hd b) tl end val rev = fn ['a] => foldl ['a] [List ['a]] (fn (hd: 'a) (tl: List ['a]) => Cons ['a] {hd, tl}) (Nil ['a]) fun tabulate ['a] (n: Integer) (f: Integer -> 'a) : List ['a] = let fun loop (i: Integer) (acc: List ['a]) : List ['a] = if i <= n then loop (i + 1) (Cons ['a] {f i, acc}) else rev ['a] acc in if n < 0 then fail [List ['a]] "tabulate: n < 0" else loop 0 (Nil ['a]) end val n = 4999 val sum_n = foldl [Integer] [Integer] (fn (x: Integer) (y: Integer) => x + y) 0 (tabulate [Integer] n (fn (i: Integer) => i)) ; (print "foldl [Integer] [Integer]\n"; print " (fn (x: Integer) (y: Integer) => x + y)\n"; print " 0 (tabulate [Integer] "; print (intToString n); print " (fn (i: Integer) => i)) = "; print (intToString sum_n); print "\n")

4  LangF syntax

For reference, the following is the collected syntax of LangF. We assume the following kinds of terminal symbols: type variable identifiers (tyvarid, with a leading ' and lower-case letter), type constructor identifiers (tyconid, with a leading upper-case letter) type; data constructor identifiers (daconid, with a leading upper-case letter); variable identifiers (varid, with a leading lower-case letter); integer literals (integer); and string literals (string). As written the grammar is ambiguous; Project 2 will discuss how to resolve the ambiguity.

Prog
    ::= Exp
    | (Decl)* ; Exp
Decl
    ::= type tyconid TypeParams = Type
    | datatype DataDecl (and DataDecl)*
    | val SimplePat (: Type)? = Exp
    | fun FunDecl (and FunDecl)*
TypeParams
    ::=
    | [ ]
    | [ tyvarid (, tyvarid)* ]
Type
    ::= Type -> Type
    | [ tyvarid ] -> Type
    | tyconid TypeArgs
    | tyvarid
    | ( Type )
TypeArgs
    ::=
    | [ ]
    | [ Type (, Type)* ]
DataDecl
    ::= tyconid TypeParams = DaConDecl (| DaConDecl)*
DaConDecl
    ::= daconid DaConArgTys
DaConArgTys
    ::=
    | { }
    | { Type (, Type)* }
FunDecl
    ::= varid Param+ : Type = Exp
Param
    ::= ( varid : Type )
    | [ tyvarid ]
Exp
    ::= let Decl+ in Exp (; Exp)* end
    | fn Param+ => Exp
    | if Exp then Exp else Exp
    | case Exp of MatchRule (| MatchRule)* end
    | Exp orelse Exp
    | Exp andalso Exp
    | Exp : Type
    | Exp op Exp op ∈ {==,<>,<,<=,>,>=,+,-,*,/,%,^}
    | ~ Exp
    | daconid TypeArgs DaConArgs
    | Exp ApplyArg
    | varid
    | ( Exp (; Exp)* )
    | integer
    | string
MatchRule
    ::= Pat => Exp
Pat
    ::= daconid TypeArgs DaConPats
    | SimplePat
DaConPats
    ::=
    | { }
    | { SimplePat (, SimplePat)* }
SimplePat
    ::= varid
    | _
DaConArgs
    ::=
    | { }
    | { Exp (, Exp)* }
ApplyArg
    ::= Exp
    | [ Type ]

Document History

January 16, 2009
 
Section 4: let Decl+ in Exp (; Exp)+let Decl+ in Exp (; Exp)*
January 8, 2009
 
Section 2.8: val one_hello : Pair [IntegerString] {1, "hello"}val one_hello = Pair [IntegerString] {1, "hello"}

Section 2.8: and Forest ['a] = EmptyF | Tree of {Tree ['a], Forest ['a]}and Forest ['a] = EmptyF | Tree {Tree ['a], Forest ['a]}

January 6, 2009
Original version

Last modified: Sun Mar 15 19:19:10 CDT 2009