LangF Type Checker
Due: February 27, 2009
1 Introduction
The third project is to implement a type checker for LangF, which
checks whether or not a parse tree is statically correct and
produces a typed abstract syntax tree (AST). The abstract
syntax tree includes information about the binding sites of
identifiers and about the types of variables and expressions. The
project seed code will provide hand-written and ML-ULex based scanners
(but you may also use your hand-written scanner from Projects 1 and
2), ML-Antlr and ML-Yacc based parsers (but you may also use your
parser specification from Project 2), and modules for implementing the
abstract-syntax-tree representation. The bulk of this document is a
formal specification of the typing rules for LangF.1 The type system for LangF is essentially an
enrichment of the System F type system.
2 LangF Syntactic Restrictions
There are a number of syntactic restrictions that should be enforced
by the type checker. Although these restrictions could be specified
as part of the typing rules below, it is easier to specify them
separately.
-
The type variable identifiers in the type parameters of a
type declaration must be distinct.
- The type variable identifiers in the type parameters of each
DataDecl of a datatype declaration must be distinct.
- The type constructor identifiers in a datatype declaration
must be distinct.
- The data constructor identifiers in a datatype declaration
must be distinct. (On the other hand, a datatype declaration
may introduce the same constructor name for both a type constructor
and a data constructor. For example,
datatype
Unit
=
Unit
.)
- The variable identifiers denoting function variable identifiers
in a fun declaration must be distinct.
- The variable identifiers in a pattern must be distinct.
- Integer literals must be in the range −230…230−1
(i.e., representable as a 31-bit 2's-complement integer).
- Extra credit: The patterns in a case expression
must be irredundant and exhaustive. Consider the
match rules
Pat1 => Exp1 | … | Pat1 => Expn.
For the patterns to be irredundant, each Patj must match
some value (of the right type) that is not matched by Pati
for any i < j. For the patterns to be exhaustive, every value (of
the right type) must be matched by some Pati.
3 LangF Types
In the LangF typing rules, we distinguish between syntactic
types as they appear in the program text (or parse-tree
representation) and semantic types that are inferred for
various syntactic forms. To understand why we make this distinction,
consider the following LangF program:
datatype T = A {Integer} | B
val x : T = A {1}
datatype T = C {Integer} | D
val y : T = B
; 0
There is a type error at line 4 in the declaration
val
y
:
T
=
B
, because the type of the data constructor
expression B
is the type constructor corresponding to the
datatype declaration at line 1, but the type constraint
T
is the type constructor corresponding to the
datatype declaration at line 3. The datatype declaration at
line 3 shadows the datatype declaration at line 1.
However, in the parse-tree representation, all instances of
T
correspond to the same type constructor name (that is,
as values of the ParseTree
.
TyVarName
.
t
type, they all
carry the same Atom
.
atom
in the node
field).
τ ∈ Type | ::= | ∀ α → τr | type function, α ∈ TyVar |
| | | τa → τr | function |
| | | θ(k)(τ1, …, τk) | type constructor, θ ∈ TyCon |
| | | α | type variable |
|
Figure 1: LangF semantic types |
The abstract syntax of LangF semantic types is given in
Figure 1 (and represented by the
AbsSynTree
.
Type
.
t
datatype in the project seed code). A
semantic type is formed from type variables, type constructors
(including nullary and abstract type constructors like
bool(0) and integer(0)), function types,
and type-function types. (Note that a k-arity type constructor
records its arity as a superscript.)
Although this syntax mirrors that of the syntactic types, there are
some crucial differences. First, there will be distinct type
variables (α) and type constructors (θ) for each binding
occurence of a type variable name and type constructor name in the
parse-tree representation. Hence, type checking the datatype
declaration at line 1 will introduce one type constructor, say
t1(0), and type checking the datatype declaration
at line 3 will introduce a different type constructor, say
t2(0), which are not equal. Second, we will consider
semantic types equal upto renaming of type-function type variables.
That is, we will consider the semantic types ∀ α
→ α → α → bool(0)
and ∀ β → β → β →
bool(0) to be equal, whereas the parse trees
corresponding to ['
a
] -> '
a
-> '
a
->
bool
and
['
b
] -> '
b
-> '
b
->
bool
are not equal, because they use
different type variable names.
4 Identifiers and Environments
The typing rules for LangF use a number of different environments to
track binding information. There is a separate environment for each
kind of identifier in the parse-tree repesentation:
TVE | ∈ | TyVarEnv = TyVarId ⇀ TyVar | type-variable environment |
TCE | ∈ | TyConEnv = TyConId ⇀ TyCon ∪ (TyVar* × Type) | type-constructor environment |
DCE | ∈ | DaConEnv = DaConId ⇀ TyVar* × Type* × TyCon | data-constructor environment |
VE | ∈ | VarEnv = VarId ⇀ Type | variable environment |
|
where TyVarId is the set of syntactic type-variable identifiers
('
a
in the LangF concrete syntax, tyvarid in the
LangF grammar, or ParseTree
.
TyVarName
.
t
in the project
seed code), TyConId is the set of syntactic type-constructor
identifiers (T
, tyconid, or
ParseTree
.
TyConName
.
t
), DaConId is the set of
syntactic data-constructor identifiers (A
, daconid,
or ParseTree
.
DaConName
.
t
), VarId is the set of
syntactic variable identifiers (x
, varid, or
ParseTree
.
VarName
.
t
), Type is the set of
semantic types (τ in Figure 1 or
AbsSynTree
.
Type
.
t
in the project seed code),
TyVar is the set of semantic type variables
(α or AbsSynTree
.
TyVar
.
t
), and
TyCon is the set of semantic type constructors
(θ or AbsSynTree
.
TyCon
.
t
).
We define the extension of an environment E by another
environment E′ as follows:
(E ⊕ E′)(x) | = | ⎧
⎨
⎩ | E′(x) | if x ∈ dom(E′) |
E(x) | if x ∉ dom(E′) |
|
|
|
Since each of the environments has a different domain, it is
convenient to consider a combined environment:
E | ∈ | Env = TyVarEnv × TyConEnv × DaConEnv × VarEnv | combined environment |
|
For combined environments E =
〈TVE,TCE,DCE,VE〉 and
E′ =
〈TVE′,TCE′,DCE′,VE′〉,
we define lookup by lookup in the environment appropriate to the
identifier:
E(x) | = | | ⎧
⎪
⎨
⎪
⎩ | TVE(x) | if x ∈ TyVarId |
TCE(x) | if x ∈ TyConId |
DCE(x) | if x ∈ DaConId |
VE(x) | if x ∈ VarId |
|
|
|
and define extension by extension of each environment:
E ⊕ E′ | = | 〈
TVE ⊕ TVE′,
TCE ⊕ TCE′,
DCE ⊕ DCE′,
VE ⊕ VE′
〉 |
|
5 LangF Typing Rules
The typing rules for LangF provide a specification for the static
correctness of LangF programs. The general form of a judgement, as
used in the LangF typing rules, is
which can be read as “in Context, Term has
Descr.” The context is usually an environment, but may
include other information, while the description is usually a semantic
type and/or an (extended) environment. The judgement forms used in
the typing rules for LangF are given in Figure 2.
The typing rules for LangF are syntax directed, which means
that there is a typing rule for each (major) syntactic form in the
parse-tree representation of LangF programs. In the following, a
typing rule is labelled by the SML data constructor in the
ParseTree
module that corresponds to the syntactic form
handled by the typing rule.
E ⊢ Type ⇒ τ | type checking a type |
E ⊢ Param ⇒ E′; T | type checking a parameter |
E; τ ⊢ SimplePat ⇒ E′ | type checking a simple pattern |
E; τ ⊢ Pat ⇒ E′ | type checking a pattern |
E ⊢ Exp ⇒ τ | type checking an expression |
E; τ ⊢ MatchRule ⇒ τ′ | type checking a match rule |
E ⊢ Decl ⇒ E′ | type checking a declaration |
E;〈α1,…,αn〉;θ(n) ⊢ DaConDecl ⇒ E′ | type checking a data constructor declaration |
⊢ Prog ⇒ ✓ | type checking a program |
|
Figure 2: LangF judgement forms |
The typing rules for types check for well-formedness and translate the
syntactic types to semantic types. The typing rules for types use a
judgement of the form
which can be read as “in the environment E, the syntactic
type Type is well-formed and translates to the semantic type
τ.”
Type checking a type-function type requires introducing a new semantic
type variable (α) for the syntactic type variable identifier
(tyvarid) when checking the result type.
E ⊕ { tyvarid ↦ α | α fresh } ⊢ Typer ⇒ τr
| T_TyFn |
| |
E ⊢ [ tyvarid ] -> Typer ⇒ ∀ α → τr
| |
|
|
Type checking a function type requires checking the argument type and
the result type.
E ⊢ Typea ⇒ τa
E ⊢ Typer ⇒ τr
| T_Fn |
| |
E ⊢ Typea -> Typer ⇒ τa → τr
| |
|
|
There are two rules for type checking a type-constructor application,
depending on whether the type constructor identifier corresponds to a
type definition or a datatype definition (or an abstract
type). For type definitions, we check the actual (syntactic)
type arguments and then substitute the actual (semantic) type
arguments for the formal type parameters to produce a new (semantic)
type.
tyconid ∈ dom(E)
E(tyconid) = (〈 α1,…,αn 〉, τ)
E ⊢ Type1 ⇒ τ1
…
E ⊢ Typen ⇒ τn
| T_TyCon(type) |
| |
E ⊢ tyconid [ Type1 , … , Typen ] ⇒ τ[τ1/α1,…,τn/αn]
| |
|
|
For datatype definitions (or abstract types), we check the type
arguments and then construct a new (semantic) type.
tyconid ∈ dom(E)
E(tyconid) = θ(n)
E ⊢ Type1 ⇒ τ1
…
E ⊢ Typen ⇒ τn
| T_TyCon(datatype) |
| |
E ⊢ tyconid [ Type1 , … , Typen ] ⇒ θ(n)(τ1,…,τn)
| |
|
|
Type checking a type variable identifier returns its semantic type
variable, as recorded in the environment.
tyvarid ∈ dom(E)
E(tyvarid) = α
| T_TyVar |
| |
E ⊢ tyvarid ⇒ α
| |
|
|
5.2 Parameters
Type checking a parameter returns a new environment, which includes a
binding for the type variable identifier or variable identifier in the
parameter, and a meta-function that constructs a (semantic) type when
given the (semantic) type of the expression over which the parameter
abstracts. Note that the output environment only includes bindings
for the identifiers appearing in the parameter; the output environment
is not an extension of the input environment.
Type checking a variable parameter requires type checking the declared
type and returns an environment with a single binding and a
meta-function that creates a (semantic) function type.
E ⊢ Type ⇒ τ
E′ = { varid ↦ τ }
T′(x) = τ → x
| P_VarName |
| |
E ⊢ ( varid : Type ) ⇒ E′; T′
| |
|
|
Type checking a type variable parameter requires introducing a new
semantic type variable (α) and returns an environment with a
single binding and a meta-function that creates a (semantic)
type-function type.
E′ = { tyvarid ↦ α | α fresh }
T′(x) = ∀ α → x
| P_TyVarName |
| |
E ⊢ [ tyvarid ] ⇒ E′; T′
| |
|
|
5.2.1 Multiple Parameters
Type checking a list of parameters requires type checking each
paramter in turn. Since type variable identifiers in earlier
parameters are bound in later parameters, each parameter is type
checked in an environment extended with output environments of each
earlier parameter. The final output environment is the extension of
all the individual output environments, while the final meta-function
is the composition of all the individual meta-functions.
n ≥ 0
E ⊢ Param1 ⇒ E1; T1
…
E ⊕ E1 ⊕ … ⊕ En−1 ⊢ Paramn ⇒ En; Tn
E′ = E1 ⊕ … ⊕ En−1 ⊕ En
T′ = T1 ∘ … ∘ Tn
| Params |
| |
E ⊢ Param1 … Paramn ⇒ E′; T′
| |
|
|
5.3 Simple Patterns and Patterns
Type checking a pattern is similar to type checking a parameter, in
that it returns a new environment, which includes a binding for any
variable identifier in the pattern. Again, the output environment
only includes bindings for the identifiers appearing in the pattern;
the output environment is not an extension of the input environment.
The typing rules for patterns use judgements of the form:
E; τ ⊢ SimplePat ⇒ E′
E; τ ⊢ Pat ⇒ E′
|
where τ is the (semantic) type of values matched by the pattern.
5.3.1 Simple Patterns
Type checking a variable identifier simple pattern returns an
environment with a single binding (of the variable to the input
(semantic) type).
E′ = { varid ↦ τ }
| P_VarName |
| |
E; τ ⊢ varid ⇒ E′
| |
|
|
Type checking a wildcard simple pattern returns an empty environment.
E′ = { }
| P_Wild |
| |
E; τ ⊢ _ ⇒ E′
| |
|
|
Type checking a data constructor pattern requires checking a number of
things. First, the (semantic) type of values being matched by the
pattern must be a type-constructor type
(θ(n)(τ1′,…,τn′)). Second, the data
constructor identifier is looked up in the environment, returning its
semantic type constructor (θ(n), which must match the type
of values being matched by the pattern), the formal types of its
arguments (〈τ1,…,τm〉), and the formal type
variables (〈α1,…,αn〉) over which the
type constructor and the formal types of its arguments is abstracted.
Third, each of the actual type arguments is type checked (E
⊢ Typei ⇒ τi′). Fourth, each of the
constituent simple patterns is type checked with an input type equal
to the substitution of the actual type arguments for the formal type
variables in the formal type of the corresponding data constructor
argument (E;
τi[α1/τ1′,…,αn/τn′] ⊢
SimplePati ⇒ Ei). The final output
environment is the extension of all the individual output environments
of the constituent simple patterns.
τ = θ(n)(τ1′,…,τn′)
daconid ∈ dom(E)
E(daconid) = (〈α1,…,αn〉,〈τ1,…,τm〉,θ(n))
E ⊢ Type1 ⇒ τ1′
…
E ⊢ Typen ⇒ τn′
E; τ1[α1/τ1′,…,αn/τn′] ⊢ SimplePat1 ⇒ E1
…
E; τm[α1/τ1′,…,αn/τn′] ⊢ SimplePatm ⇒ Em
E′ = E1 ⊕ … ⊕ Em
| P_DaCon |
| |
E; τ ⊢ daconid [ Type1 , … , Typen ] { SimplePat1 , … , SimplePatm } ⇒ E′
| |
|
|
Note that in the typing rule for type checking a simple pattern, the
premise judgement corresponds to the rules given in
Section 5.3.1, while the conclusion judgement
corresponds to the rules given in this section.
E; τ ⊢ SimplePat ⇒ E′
| P_SimplePat |
| |
E; τ ⊢ SimplePat ⇒ E′
| |
|
|
5.4 Expressions
The typing rules for expressions use a judgement of the form
which can be read as “in the environment E, the expression
Exp has the semantic type τ.”
Type checking an anonymous function requires type checking the
parameters and then type checking the function body in an environment
extended with the bindings of the parameters. The anonymous function
will have either a type-function type or a function type, as
constructed by the meta-function from the type checking of the
parameters.
E ⊢ Params ⇒ E′; T
E ⊕ E′ ⊢ Exp ⇒ τ
| E_Fn |
| |
E ⊢ fn Params => Exp ⇒ T(τ)
| |
|
|
Type checking an if expression requires checking that the
condition expression has the boolean type and that the then
expression and the else expression have the same type.
E ⊢ Expc ⇒ bool(0)
E ⊢ Expt ⇒ τ
E ⊢ Expe ⇒ τ
| E_If |
| |
E ⊢ if Expi then Expt else Expe ⇒ τ
| |
|
|
Type checking an orelse or andalso expression requires that
both operands have the boolean type and expression itself has the
boolean type.
E ⊢ Expl ⇒ bool(0)
E ⊢ Expr ⇒ bool(0)
| E_Orelse |
| |
E ⊢ Expl orelse Expr ⇒ bool(0)
| |
|
|
E ⊢ Expl ⇒ bool(0)
E ⊢ Expr ⇒ bool(0)
| E_Andalso |
| |
E ⊢ Expl andalso Expr ⇒ bool(0)
| |
|
|
Type checking a constraint expression requires checking that the
expression and the type constraint have the same (sematic) type.
E ⊢ Exp ⇒ τ
E ⊢ Type ⇒ τ
| E_Constraint |
| |
E ⊢ Exp : Type ⇒ τ
| |
|
|
Type checking the binary and unary operations requires checking that
the operands have the types appropriate for the operation.
op ∈ {==,<>,<,<=,>,>=}
E ⊢ Expl ⇒ integer(0)
E ⊢ Expr ⇒ integer(0)
| E_BinOp(CmpOp) |
| |
E ⊢ Expl op Expr ⇒ bool(0)
| |
|
|
op ∈ {^}
E ⊢ Expl ⇒ string(0)
E ⊢ Expr ⇒ string(0)
| E_BinOp(ConcatOp) |
| |
E ⊢ Expl op Expr ⇒ string(0)
| |
|
|
op ∈ {+,-,*,/,%}
E ⊢ Expl ⇒ integer(0)
E ⊢ Expr ⇒ integer(0)
| E_BinOp(ArithOp) |
| |
E ⊢ Expl op Expr ⇒ integer(0)
| |
|
|
op ∈ {~}
E ⊢ Exp ⇒ integer(0)
| E_UnOp(ArithOp) |
| |
E ⊢ op Exp ⇒ integer(0)
| |
|
|
Type checking a data constructor expression requires checking a number
of things. First, the data constructor identifier is looked up in the
environment, returning its semantic type constructor (θ(n),
the formal types of its arguments
(〈τ1,…,τm〉), and the formal type variables
(〈α1,…,αn〉) over which the type
constructor and the formal types of its arguments is abstracted.
Second, each of the actual type arguments is type checked (E
⊢ Typei ⇒ τi′). Third, each of the
actual expression arguments is type checked, and must return a type
equal to the substitution of the actual type arguments for the formal
type variables in the formal type of the corresponding data
constructor argument (E;
τi[α1/τ1′,…,αn/τn′] ⊢
SimplePati ⇒ Ei). The final output
type is the type constructor applied to the actual type arguments.
daconid ∈ dom(E)
E(daconid) = (〈α1,…,αn〉,〈τ1,…,τm〉,θ(n))
E ⊢ Type1 ⇒ τ1′
…
E ⊢ Typen ⇒ τn′
E ⊢ Exp1 ⇒ τ1[α1/τ1′,…,αn/τn′]
…
E ⊢ Expm ⇒ τm[α1/τ1′,…,αn/τn′]
| E_DaCon |
| |
E ⊢ daconid [ Type1 , … , Typen ] { Exp1 , … , Expm } ⇒ θ(n)(τ1′,…,τn′)
| |
|
|
There are two rules for type checking an application, depending on the
kind of the apply argument. For an expression apply argument, we
check that the function expression has a function type and that the
argument expression has the argument type.
E ⊢ Expf ⇒ τa → τr
E ⊢ Expa ⇒ τa
| E_Apply(A_Exp) |
| |
E ⊢ Expf Expa ⇒ τr
| |
|
|
For a type apply argument, we check that the function expression has a
type-function type and that the argument type is well-formed. The
type of the application expression is the subtitution of the
(semantic) type argument for the abstracted type variable in the
result type.
E ⊢ Expf ⇒ ∀ α → τr
E ⊢ Typea ⇒ τa
| E_Apply(A_Type) |
| |
E ⊢ Expf [ Typea ] ⇒ τr[τa/α]
| |
|
|
Type checking a variable identifier returns its semantic type, as
recorded in the environment.
varid ∈ dom(E)
E(varid) = τ
| E_VarName |
| |
E ⊢ varid ⇒ τ
| |
|
|
Type checking an integer or string constant returns the obvious type.
| E_Integer |
| |
E ⊢ integer ⇒ integer(0)
| |
|
|
| E_String |
| |
E ⊢ string ⇒ string(0)
| |
|
|
Type checking a sequence expression requires type checking each
expression, but only returns the type of the final expression.
n ≥ 0
E ⊢ Exp1 ⇒ τ1
…
E ⊢ Expn ⇒ τn
| E_Seq |
| |
E ⊢ ( Exp1 ; … ; Expn ) ⇒ τn
| |
|
|
Type checking a let expression requires checking the declarations
and then type checking the body expression in an environment extended
with the bindings of the declarations.
E ⊢ Decls ⇒ E′
n ≥ 0
E ⊕ E′ ⊢ Exp1 ⇒ τ1
…
E ⊕ E′ ⊢ Expn ⇒ τn
Θ(τn) ⊆ Θ(E)
| E_Let |
| |
E ⊢ let Decls in Exp1 ; … ; Expn end ⇒ τ
| |
|
|
The Θ(τn) ⊆ Θ(E) premise checks that
every (semantic) type constructor in τn is available in the
environment E. We define Θ(τ) and
Θ(E) as follows:
Θ(∀ α → τa) | = | Θ(τa) |
Θ(τa → τr) | = | Θ(τa) ∪ Θ(τr) |
Θ(θ(k)(τ1,…,τk) | = | {θ(k)} ∪ Θ(τ1) ∪ … ∪ Θ(τk) |
Θ(α) | = | {} |
|
Θ(〈TVE,TCE,DCE,VE〉) | = | { θ(k) | tyconid ∈ dom(TCE) and TCE(tyconid) = θ(k)} |
|
|
This prevents type constructors introduced by datatype
declarations in Decls from appearing in the type of Exp.
For example, the following LangF program does not type check:
(* Type constructor escapes the scope of its definition. *)
let
datatype T = A {Integer} | B {String}
fun f (x: Integer) : T = A {x}
in
f
end
On the other hand, the following LangF program does type check:
let
type T = Integer
fun f (x: Integer) : T = Integer
in
f
end
because the type declaration does not introduce a fresh
(semantic) type constructor.
Type checking a case expression requires type checking the
scruitinee and checking the match rules against the type of the
scruitinee.
E ⊢ Exp ⇒ τ
E; τ ⊢ MatchRules ⇒ τ′
| E_Case |
| |
E ⊢ case Exp of MatchRules end ⇒ τ′
| |
|
|
5.5 MatchRules
Type checking a match rule requires checking the right-hand-side
expression in the environment extended with the bindings from the
left-hand-side pattern.
E; τ ⊢ Pat ⇒ E′
E ⊕ E′ ⊢ Exp ⇒ τ′
| MatchRule |
| |
E; τ ⊢ Pat => Exp ⇒ τ′
| |
|
|
5.5.1 Multiple Match Rules
Type checking a list of match rules requires that every match rule has
the same result type.
n ≥ 0
E; τ ⊢ MatchRule1 ⇒ τ
…
E; τ ⊢ MatchRulen ⇒ τ
| MatchRules |
| |
E; τ ⊢ MatchRule1 | … | MatchRulen ⇒ τ′
| |
|
|
5.6 Declarations
The typing rules for declarations use a judgement of the form
which can be read as “in the environment E, the
declaration Decl returns the environment E′.” The
output environment only includes bindings for the identifiers defined
in the declaration; the output environment is not an extension of the
input environment.
5.6.1 Type Declarations
Type checking a type declaration requires introducing new
semantic type variables for the syntactic type variable identifiers
when checking the right-hand-side type. The final environment is an
environment with a binding for the (syntactic) type constructor
identifier that maps to the (semantic) type variables (acting as
formal type parameters) and the (semantic) type.
Etv = { tyvaridi ↦ αi | 1 ≤ i ≤ n and αi fresh }
E ⊕ Etv ⊢ Type ⇒ τ
E′ = { tyconid ↦ (〈 α1,…,αn〉, τ) }
| D_Type |
| |
E ⊢ type tyconid [ tyvarid1 , … , tyvaridn ] = Type ⇒ E′
| |
|
|
The typing rule for a datatype declaration is somewhat
complicated. A new semantic type constructor (θi) is
introduced for each syntactic type constructor identifier
(tyconidi). Furthermore, a new semantic type variable
(αi,j) is introduced for each syntactic type variable
identifier (tyvaridi,j). Each of the data constructor
declarations is type checked in an environment that includes
all of the type constructors (Etc) and
only the appropriate type variables
(Etvi). Note that type checking a data
constructor declaration takes the type variables and the type
constructor as part of the context; it returns an environment of
bindings for the data constructor identifiers defined by the data
constructor declaration. The final output environment includes all of
the type constructors and all of the data constructors (but none of
the type variables).
n ≥ 0
Etc = { tyconidi ↦ θ(mi) | 1 ≤ i ≤ n and θi fresh }
Etv1 = { tyvaridi,1 ↦ αi,1 | 1 ≤ i ≤ m1 and αi,1 fresh }
E ⊕ Etc ⊕ Etv1; (〈 α1,1,…,α1,m1 〉, θ(m1)) ⊢ DaConDecls1 ⇒ EDC1
…
Etvn = { tyvaridi,n ↦ αi,n | 1 ≤ i ≤ mn and αi,n fresh }
E ⊕ Etc ⊕ Etvn; (〈 αn,1,…,αn,mn 〉, θ(mn)) ⊢ DaConDeclsn ⇒ EDCn
E′ = Etc ⊕ EDC1 ⊕ … ⊕ EDCn
| D_Datatype |
| |
E ⊢ | datatype | tyconid1 [ tyvarid1,1 , … , tyvarid1,m1 ] = DaConDecls1 |
… |
and | tyconidn [ tyvaridn,1 , … , tyvaridn,mn ] = DaConDeclsn |
| ⇒ E′
|
| |
|
|
5.6.2 Data Constructor Declarations
Type checking a single data constructor declaration requires type
checking each of its declared argument types. The result environment
binds the data constructor identifier to its type constructor
(θ(n)), the formal types of its arguments
(〈τ1,…,τm〉), and the formal type variables
(〈α1,…,αn〉) over which the type
constructor and the formal types of its arguments is abstracted. The
type constructor and the formal type variables are provided by the
context.
E ⊢ Type1 ⇒ τ1
…
E ⊢ Typem ⇒ τm
E′ = { daconid ↦ (〈 α1,…,αn 〉, 〈 τ1,…,τm 〉, θ(n)) }
| DaConDecl |
| |
E; 〈 α1,…,αn 〉; θ(n) ⊢ daconid { Type1 , … , Typem } ⇒ E′
| |
|
|
Type checking a list of data constructor declarations requires type
checking each data constructor declaration. The final output
environment is the extension of all the individual output
environments.
m ≥ 0
E; 〈 α1,…,αn 〉; θ(n) ⊢ DaConDecl1 ⇒ EDC1
…
E; 〈 α1,…,αn 〉; θ(n) ⊢ DaConDeclm ⇒ EDCm
E′ = EDC1 ⊕ … ⊕ EDCm
| DaConDecls |
| |
E; 〈 α1,…,αn 〉; θ(n) ⊢ DaConDecl1 | … | DaConDeclsm ⇒ E′
| |
|
|
5.6.3 Value Declarations
Type checking a val declaration requires type checking the
expression and type checking the pattern, which returns the bindings
to be returned by the declaration. If there is a type assertion, then
it must return the same (semantic) type as the expression.
E ⊢ Exp ⇒ τ
(E ⊢ Type ⇒ τ)?
E; τ ⊢ SimplePat ⇒ E′
| D_Val |
| |
E ⊢ val SimplePat (: Type)? = Exp ⇒ E′
| |
|
|
Type checking a fun declaration requires constructing an
environment that assigns a type to each of the function variable
identifiers in the declaration. First, each of the parameter lists is
type checked (E ⊢ Paramsi ⇒
EPi), returning an environment and a meta-function.
The environment is used to type check the return type (E
⊕ EPi ⊢ Typei ⇒
τi) and the meta-function is used to construct the type of the
function (T(τi)). Each of the function bodies is type
checked in an environment extended with the appropriate parameters and
the function variable identifiers (E ⊕ E′
⊕ EPi ⊢ Expi ⇒
τi); note that each function body must have its declared return
type.
n ≥ 0
E ⊢ Params1 ⇒ EP1; T1
E ⊕ EP1 ⊢ Type1 ⇒ τ1
…
E ⊢ Paramsn ⇒ EPn; Tn
E ⊕ EPn ⊢ Typen ⇒ τn
E′ = { varidi ↦ Ti(τi) | 1 ≤ i ≤ n }
E ⊕ E′ ⊕ EP1 ⊢ Exp1 ⇒ τ1
…
E ⊕ E′ ⊕ EPn ⊢ Expn ⇒ τn
| D_Fun |
| |
E ⊢ | fun | varid1 Params1 : Type1 = Exp1 |
… |
and | varidn Paramsn : Typen = Expn |
| ⇒ E′
|
| |
|
|
5.6.4 Multiple Declarations
Type checking a list of declarations requires type checking each
declaration in turn. Since identifiers in earlier declarations are
bound in later declarations, each declaration is type checked in an
environment extended with output environments of each earlier
declaration. The final output environment is the extension of all the
individual output environments.
E ⊢ Decl1 ⇒ E1
…
E ⊕ E1 ⊕ … ⊕ En−1 ⊢ Decln ⇒ En
E′ = E1 ⊕ … ⊕ En−1 ⊕ En
| Decls |
| |
E ⊢ Decl1 … Decln ⇒ E′
| |
|
|
5.7 Programs and the Initial Environment
The typing rule for programs use a judgement of the form
which can be read as “the program Prog is statically
correct.”
Type checking a program is similar to type checking a let
expression; it requires type checking the declarations and then
checking the expression in an environment extended with the bindings
of the declarations.
E0 ⊢ Decls ⇒ E′
E0 ⊕ E′ ⊢ Exp ⇒ τ
| Prog |
| |
⊢ Decls ; Exp ⇒ ✓
| |
|
|
The declarations and expression are type checked in the context of an
initial environment E0 that provides predefined
type constructors, data constructors, and variables. This initial
environment is defined as follows:
E0 | = | 〈TVE0,TCE0,DCE0,VE0〉 |
|
|
TVE0 | = | { } |
|
TCE0 | = | ⎧
⎪
⎨
⎪
⎩ | Bool | ↦ | bool(0) |
Integer | ↦ | integer(0) |
String | ↦ | string(0) |
Unit | ↦ | unit(0) |
| ⎫
⎪
⎬
⎪
⎭ |
|
|
DCE0 | = | ⎧
⎪
⎨
⎪
⎩ | False | ↦ | (〈〉,〈〉,bool(0)) |
True | ↦ | (〈〉,〈〉,bool(0)) |
Unit | ↦ | (〈〉,〈〉,unit(0)) |
| ⎫
⎪
⎬
⎪
⎭ |
|
|
VE0 | = | ⎧
⎪
⎪
⎪
⎨
⎪
⎪
⎪
⎩ | argc | ↦ | unit(0) → integer(0) |
arg | ↦ | integer(0) → string(0) |
fail | ↦ | ∀ α → string(0) → α |
print | ↦ | string(0) → unit(0) |
size | ↦ | string(0) → integer(0) |
sub | ↦ | string(0) → integer(0) → integer(0) |
| ⎫
⎪
⎪
⎪
⎬
⎪
⎪
⎪
⎭ |
|
|
|
6 Conversion to Abstract Syntax Tree
In addition to checking that a program is statically correct, a type
checker must produce a program representation that can be used by the
rest of the compiler for optimization and code generation. In the
LangF compiler, the type checker will produce a typed abstract syntax
tree (the structure
AbsSynTree
:
ABS_SYN_TREE
module in
the project seed code). The typed abstract syntax tree (AST)
representation is very close to the parse tree (PT) representation,
but with some crucial differences:
-
The AST forms include no source location information.
- Type variables, type constructors, data constructors, and
variables in the AST representation are implemented by structures
with the
ID
signature
(langfc-src/common/id.sig). As discussed in
Section 3, semantic type variables, type constructors,
data constructors, and variables are used to distinguish different
binding occurrences, which otherwise have the same syntactic type
variable names, type constructor names, data constructor names, or
variable names. There are operations in the ID
signature that can be used to create fresh identifiers, not equal to
any other identifier previously created.
- Variables bound (and wildcards) in simple patterns include their
type.
- There is no type constraint expression form; instead, every
expression form is annotated with its type.
- There is no type declaration form; all type abbreviations
will have been expanded during type checking and conversion to the
abstract syntax tree.
- The val declaration form has no type constraint; the simple
pattern includes the type.
We have already introduced one form in the abstract syntax tree
representation: the semantic types from Section 3.
Similarly, we have already introduced one judgement for translating a
parse tree representation form into an abstract syntax tree
representation form: the judgement for type checking types from
Section 5.1:
E ⊢ Type ⇒ τ |
in the environment E, the parse tree type Type is
well-formed and translates to the abstract syntax tree type τ.
|
|
We can imagine other judgements that combine type checking with
translation to the abstract syntax tree:
E ⊢ Exp ⇒ τ; e |
in the environment E, the parse tree
expression Exp has the abstract syntax tree type τ and
translates to the abstract syntax tree expression e.
|
|
E ⊢ Decl ⇒ E′; d |
in the environment E, the
parse tree declaration Decl returns the environment
E′ and translates to the abstract syntax tree declaration d.
|
|
⊢ Prog ⇒ p |
the parse tree program Prog is statically correct and
translates to the abstract syntax tree program p.
|
|
The inference rules for these judgements will be very similar to those
given in Section 5, except they will construct an
appropriate output abstract syntax tree form.
7 Requirements
You should implement a type checker for LangF that enforces the type
system from Section 5 and produces a typed abstract
syntax tree. Your implementation should include (at least) the
following modules:
structure Environment : ENVIRONMENT
structure TypeChecker : TYPE_CHECKER
The ENVIRONMENT
signature in the project seed code is as
follows:
signature ENVIRONMENT =
sig
(* The combined environment is composed of a type variable
* environment, a type constructor environment, a data constructor
* environment, and a variable environment. Each individual
* environment has its own domain and co-domain.
*
* For a simple binding analysis, we can take the co-domain to be
* 'unit', since we only need to know whether or not the identifier
* is in the environment.
*
* For type checking (without producing an abstract syntax tree),
* you will require co-domains similar to those in the project
* description.
*
* For type checking and producing an abstract syntax tree,
* you will require additional components in the co-domains.
*)
structure TyVarEnv :
sig
type dom = ParseTree.TyVarName.t
type cod = unit
end
structure TyConEnv :
sig
type dom = ParseTree.TyConName.t
type cod = unit
end
structure DaConEnv :
sig
type dom = ParseTree.DaConName.t
type cod = unit
end
structure VarEnv :
sig
type dom = ParseTree.VarName.t
type cod = unit
end
type t
(* The empty environment {}. *)
val empty : t
(* Create an environment with a single TyVar entry. *)
val singletonTyVar : TyVarEnv.dom * TyVarEnv.cod -> t
(* Lookup a TyVar in the environment. *)
val lookupTyVar : t * TyVarEnv.dom -> TyVarEnv.cod option
(* Create an environment with a single TyCon entry. *)
val singletonTyCon : TyConEnv.dom * TyConEnv.cod -> t
(* Lookup a TyCon in the environment. *)
val lookupTyCon : t * TyConEnv.dom -> TyConEnv.cod option
(* Create an environment with a single DaCon entry. *)
val singletonDaCon : DaConEnv.dom * DaConEnv.cod -> t
(* Lookup a DaCon in the environment. *)
val lookupDaCon : t * DaConEnv.dom -> DaConEnv.cod option
(* Create an environment with a single Var entry. *)
val singletonVar : VarEnv.dom * VarEnv.cod -> t
(* Lookup a Var in the environment. *)
val lookupVar : t * VarEnv.dom -> VarEnv.cod option
(* Implements E1 (+) E2. *)
val extend : t * t -> t
(* Implements \Theta(E). *)
val tycons : t -> AbsSynTree.TyCon.Set.set
(* The initial environment E_0. *)
val initial: t
You will need to extend the endENVIRONMENT
signature (and the
Environment
structure) with new co-domain types as
required by your type-checker implementation.
The TYPE_CHECKER
signature is as follows:
signature TYPE_CHECKER =
sig
val typeCheck : ErrorStream.t *
ParseTree.Prog.t ->
AbsSynTree.Prog.t option
The endstructure
ParseTree
:
PARSE_TREE
and
structure
AbsSynTree
:
ABS_SYN_TREE
modules are provided
in the seed code; the PARSE_TREE
signature implementation
is at langfc-src/parse-tree/parse-tree.sig; the
ParseTree
structure implementation is at
langfc-src/parse-tree/parse-tree.sml; the
ABS_SYN_TREE
signature implementation is at
langfc-src/abs-syn-tree/abs-syn-tree.sig; and the
AbsSynTree
structure implementation is at
langfc-src/abs-syn-tree/abs-syn-tree.sml.
7.1 Errors
To support error reporting, the TypeChecker.typeCheck
function takes an argument of the type ErrorStream
.
t
. The
ErrorStream
:
ERROR_STREAM
module is provided in the seed
code and provides a common error reporting utility in the LangF
compiler. (The module was used implicitly in Projects 1 and 2.) The
ERROR_STREAM
signature implementation is at
langfc-src/common/error-stream.sig; the
ErrorStream
structure implementation is at
langfc-src/common/error-stream.sml.
Your type checker should report reasonable error messages. You should
report violations of the syntactic restrictions of
Section 2, unbound identifiers, and type
errors. There is a lot of room for creativity and style in reporting
errors. For a program with multiple type errors, you are required to
report at least one error message, but need not report more than one
error message.
8 GForge and Submission
Sources for Project 3 have been (or will shortly be) committed to your
repository in the project3 sub-directory. You will need to
update your local copy, by running the command:
from the cnetid-proj directory.
We will collect projects from the SVN repositories at 10pm on Friday,
February 27; make sure that you have committed your final version
before then. To do so, run the command:
from the cnetid-proj directory.
9 Hints
- Start early!
- Study the interfaces. You will need to be familiar with the
types and operations in the
PARSE_TREE
,
ABS_SYN_TREE
, and ID
signatures. - Think about how to represent the environment(s). In the parse
tree representation, each structure implementing a kind of
identifier provides
structure
Set
:
ORD_SET
,
structure
Map
:
ORD_MAP
, and
structure
Tbl
:
MONO_HASH_TABLE
modules for finite maps,
sets, and hash tables over that kind of identifier;
you can view the ORD_SET
signature
here,
the ORD_MAP
signature
here,
and the MONO_HASH_TABLE
signature
here.
- Think about the structure of the type-checker implementation.
Each judgement can be implemented as a function; just as the parse
tree datatypes for expressions, match rules, and declarations are
mutually recursive, the functions for judgements that type check
expressions, match rules, and declarations will be mutually
recursive. Each function for a judgement will have a case for each
typing rule with that judgement as the conclusion.
- Work in stages. First implement a simple binding checker that
only checks that the program has no unbound variables (but does not
check types and does not produce an abstract syntax tree). This
binding checker will establish the basic structure of the
implementation. Next, implement a type checker that checks for
unbound variables and checks types (but does not produce an abstract
syntax tree). This type checker will require extending the simple
binding checker, but will very closely match the typing rules from
Section 5. Next, implement a full type checker
that checks for unbound variables, checks types, and produces an
abstract syntax tree. This full type checker will require
additional information to be carried in the environment and to be
returned by each type checking function. Finally, extend the full
type checker to additionally check the syntactic restrictions of
Section 2.
- Work on error reporting last. Detecting errors and producing
good error messages can be difficult; it is more important for your
type checker to work on good programs than for it to “work” on bad
programs. Again, work in stages. First implement a type checker
that stops after detecting the first error. Then implement a type
checker that continues after detecting an error.
- To complete the assignment, you should only need to make changes
to the
cnetid-proj/project3/langfc-src/type-checker/environment.sig,
cnetid-proj/project3/langfc-src/type-checker/environment.sml,
and
cnetid-proj/project3/langfc-src/type-checker/type-checker.sml
files.
- Executing the compiler (from the
cnetid-proj/project3 directory) with the command
./bin/langfc -Ckeep-type-check=true file.lgf
|
will produce a file.type-check.ast file that
contains the abstract syntax tree returned by the type checker. Use
this control and its output to check that your type checker is
working as expected. The tests/type-checker directory
includes a number of tests (of increasing complexity); for each
testNN.lgf file, if the test has type errors,
there is a testNN.err file containing sample error
messages to be reported by the type checker. If the test has no
type errors, then there is no output file; rather, the abstract
parse tree returned by the type checker should convert to the core
intermediate representation and type check in the core intermediate
representation without errors. - As in past projects, you are not required to match the sample
error messages exactly. In particular, you need not have
particularly “pretty” error messages. The sample error messages
(and sample solution) have gone to some length to produce good error
messages, with types written using the identifiers in the parse tree
representation. You will probably find it much easier to print
types using the identifiers in the abstract syntax tree
representation (using the functions
Layout
.
toString
and
AST
.
Type
.
layout
), which will print indentifiers with a
uniqueifying suffix. For example, in test64.lgf, the
sample error message is:
test46.lgf:5.0-test46.lgf:6.0 Error:
Constraint and expression of 'val' disagree.
constraint: T
expression: ?T?
in: val y : T = B
where the constraint type is printed using the tycon identifier
T
from the parse tree representation, and the expression
type is also printed using the tycon identifier T
from
the parse tree representation, but printed as ?
T
?
to
indicate that this tycon is shadowed. It will be much easier to
produce an error message like:
test46.lgf:5.0-test46.lgf:6.0 Error:
Constraint and expression of 'val' disagree.
constraint: T__019
expression: T__015
in: val y : T = B
where the constraint type and the expression type are printed using
the tycon identifiers from the abstract syntax tree representation
(with their uniqueifying suffixes). The fact that one tycon is
shadowed by the other (in the parse tree representation) is
indicated by the different type constructor indentifiers in the
abstract syntax tree representation.
-
February 24, 2009
-
Section 5.4:
Θ(〈TVE,TCE,DCE,VE〉) = cod(TCE) ⇒
Θ(〈TVE,TCE,DCE,VE〉) =
{ θ(k) | tyconid ∈ dom(TCE) and TCE(tyconid) = θ(k)}
- February 23, 2009
-
Section 9: Added discussion of unit tests and error messages.
- February 16, 2009
-
Section 3:
the datatype declaration at line 1 will introduce a different
type constructor
⇒
the datatype declaration at line 3 will introduce a different
type constructor
- February 13, 2009
-
-
Made redundancy and exhaustiveness checking extra credit.
- Fixed typing rule for let to handle sequenced expressions
in body.
- For a program with multiple type errors, require at least one
error message, but not necessarily more than one.
- February 12, 2009
- Original version