CMSC 22610
Implementation of Computer Languages - I
Winter 2009



LangF Type Checker


Due: February 27, 2009
 

1  Introduction

The third project is to implement a type checker for LangF, which checks whether or not a parse tree is statically correct and produces a typed abstract syntax tree (AST). The abstract syntax tree includes information about the binding sites of identifiers and about the types of variables and expressions. The project seed code will provide hand-written and ML-ULex based scanners (but you may also use your hand-written scanner from Projects 1 and 2), ML-Antlr and ML-Yacc based parsers (but you may also use your parser specification from Project 2), and modules for implementing the abstract-syntax-tree representation. The bulk of this document is a formal specification of the typing rules for LangF.1 The type system for LangF is essentially an enrichment of the System F type system.

2  LangF Syntactic Restrictions

There are a number of syntactic restrictions that should be enforced by the type checker. Although these restrictions could be specified as part of the typing rules below, it is easier to specify them separately.

3  LangF Types

In the LangF typing rules, we distinguish between syntactic types as they appear in the program text (or parse-tree representation) and semantic types that are inferred for various syntactic forms. To understand why we make this distinction, consider the following LangF program:

datatype T = A {Integer} | B val x : T = A {1} datatype T = C {Integer} | D val y : T = B ; 0

There is a type error at line 4 in the declaration val y : T = B, because the type of the data constructor expression B is the type constructor corresponding to the datatype declaration at line 1, but the type constraint T is the type constructor corresponding to the datatype declaration at line 3. The datatype declaration at line 3 shadows the datatype declaration at line 1. However, in the parse-tree representation, all instances of T correspond to the same type constructor name (that is, as values of the ParseTree.TyVarName.t type, they all carry the same Atom.atom in the node field).


τ ∈ Type::=∀ α → τrtype function, α ∈ TyVar 
 |τa → τrfunction 
 |θ(k)1, …, τk)type constructor, θ ∈ TyCon 
 |αtype variable
Figure 1: LangF semantic types

The abstract syntax of LangF semantic types is given in Figure 1 (and represented by the AbsSynTree.Type.t datatype in the project seed code). A semantic type is formed from type variables, type constructors (including nullary and abstract type constructors like bool(0) and integer(0)), function types, and type-function types. (Note that a k-arity type constructor records its arity as a superscript.)

Although this syntax mirrors that of the syntactic types, there are some crucial differences. First, there will be distinct type variables (α) and type constructors (θ) for each binding occurence of a type variable name and type constructor name in the parse-tree representation. Hence, type checking the datatype declaration at line 1 will introduce one type constructor, say t1(0), and type checking the datatype declaration at line 3 will introduce a different type constructor, say t2(0), which are not equal. Second, we will consider semantic types equal upto renaming of type-function type variables. That is, we will consider the semantic types ∀ α → α → α → bool(0) and ∀ β → β → β → bool(0) to be equal, whereas the parse trees corresponding to ['a] -> 'a -> 'a -> bool and ['b] -> 'b -> 'b -> bool are not equal, because they use different type variable names.

4  Identifiers and Environments

The typing rules for LangF use a number of different environments to track binding information. There is a separate environment for each kind of identifier in the parse-tree repesentation:

TVETyVarEnv = TyVarId ⇀ TyVartype-variable environment 
TCETyConEnv = TyConId ⇀ TyCon ∪ (TyVar* × Type)type-constructor environment 
DCEDaConEnv = DaConId ⇀ TyVar* × Type* × TyCondata-constructor environment 
VEVarEnv = VarId ⇀ Typevariable environment

where TyVarId is the set of syntactic type-variable identifiers ('a in the LangF concrete syntax, tyvarid in the LangF grammar, or ParseTree.TyVarName.t in the project seed code), TyConId is the set of syntactic type-constructor identifiers (T, tyconid, or ParseTree.TyConName.t), DaConId is the set of syntactic data-constructor identifiers (A, daconid, or ParseTree.DaConName.t), VarId is the set of syntactic variable identifiers (x, varid, or ParseTree.VarName.t), Type is the set of semantic types (τ in Figure 1 or AbsSynTree.Type.t in the project seed code), TyVar is the set of semantic type variables (α or AbsSynTree.TyVar.t), and TyCon is the set of semantic type constructors (θ or AbsSynTree.TyCon.t).

We define the extension of an environment E by another environment E′ as follows:

(E ⊕ E′)(x)=


E′(x)if xdom(E′) 
E(x)if xdom(E′) 

Since each of the environments has a different domain, it is convenient to consider a combined environment:

EEnv = TyVarEnv × TyConEnv × DaConEnv × VarEnvcombined environment

For combined environments E = ⟨TVE,TCE,DCE,VE⟩ and E′ = ⟨TVE′,TCE′,DCE′,VE′⟩, we define lookup by lookup in the environment appropriate to the identifier:

  E(x)=
  



      TVE(x)if xTyVarId 
      TCE(x)if xTyConId 
      DCE(x)if xDaConId 
      VE(x)if xVarId

and define extension by extension of each environment:

  E ⊕ E=  ⟨ TVE ⊕ TVE′, TCE ⊕ TCE′, DCE ⊕ DCE′, VE ⊕ VE′ ⟩

5  LangF Typing Rules

The typing rules for LangF provide a specification for the static correctness of LangF programs. The general form of a judgement, as used in the LangF typing rules, is

Context ⊢   Term   ⇒ Descr

which can be read as “in Context, Term has Descr.” The context is usually an environment, but may include other information, while the description is usually a semantic type and/or an (extended) environment. The judgement forms used in the typing rules for LangF are given in Figure 2.

The typing rules for LangF are syntax directed, which means that there is a typing rule for each (major) syntactic form in the parse-tree representation of LangF programs. In the following, a typing rule is labelled by the SML data constructor in the ParseTree module that corresponds to the syntactic form handled by the typing rule.


E ⊢   Type   ⇒ τtype checking a type 
E ⊢   Param   ⇒ E′; Ttype checking a parameter 
E; τ ⊢   SimplePat   ⇒ Etype checking a simple pattern 
E; τ ⊢   Pat   ⇒ Etype checking a pattern 
E ⊢   Exp   ⇒ τtype checking an expression 
E; τ ⊢   MatchRule   ⇒ τ′type checking a match rule 
E ⊢   Decl   ⇒ Etype checking a declaration 
E;⟨α1,…,αn⟩;θ(n) ⊢   DaConDecl   ⇒ Etype checking a data constructor declaration 
⊢   Prog   ⇒ ✓type checking a program
Figure 2: LangF judgement forms

5.1  Types

The typing rules for types check for well-formedness and translate the syntactic types to semantic types. The typing rules for types use a judgement of the form

E ⊢   Type   ⇒ τ

which can be read as “in the environment E, the syntactic type Type is well-formed and translates to the semantic type τ.”

Type checking a type-function type requires introducing a new semantic type variable (α) for the syntactic type variable identifier (tyvarid) when checking the result type.

E ⊕ { tyvarid ↦ α  |  α  fresh } ⊢   Typer   ⇒ τr  T_TyFn
E ⊢   [ tyvarid ] -> Typer   ⇒ ∀ α → τr  

Type checking a function type requires checking the argument type and the result type.

E ⊢   Typea   ⇒ τa           E ⊢   Typer   ⇒ τr  T_Fn
E ⊢   Typea -> Typer   ⇒ τa → τr  

There are two rules for type checking a type-constructor application, depending on whether the type constructor identifier corresponds to a type definition or a datatype definition (or an abstract type). For type definitions, we check the actual (syntactic) type arguments and then substitute the actual (semantic) type arguments for the formal type parameters to produce a new (semantic) type.

tyconid ∈ dom(E)           E(tyconid) = (⟨ α1,…,αn ⟩, τ)                    E ⊢   Type1   ⇒ τ1           …           E ⊢   Typen   ⇒ τn  T_TyCon(type)
E ⊢   tyconid [ Type1 , … , Typen ]   ⇒ τ[τ11,…,τnn]  

For datatype definitions (or abstract types), we check the type arguments and then construct a new (semantic) type.

tyconid ∈ dom(E)           E(tyconid) = θ(n)                    E ⊢   Type1   ⇒ τ1           …           E ⊢   Typen   ⇒ τn  T_TyCon(datatype)
E ⊢   tyconid [ Type1 , … , Typen ]   ⇒ θ(n)1,…,τn)  

Type checking a type variable identifier returns its semantic type variable, as recorded in the environment.

tyvarid ∈ dom(E)           E(tyvarid) = α  T_TyVar
E ⊢   tyvarid   ⇒ α  

5.2  Parameters

Type checking a parameter returns a new environment, which includes a binding for the type variable identifier or variable identifier in the parameter, and a meta-function that constructs a (semantic) type when given the (semantic) type of the expression over which the parameter abstracts. Note that the output environment only includes bindings for the identifiers appearing in the parameter; the output environment is not an extension of the input environment.

Type checking a variable parameter requires type checking the declared type and returns an environment with a single binding and a meta-function that creates a (semantic) function type.

E ⊢   Type   ⇒ τ           E′ = { varid ↦ τ }           T′(x) = τ → x  P_VarName
E ⊢   ( varid : Type )   ⇒ E′; T 

Type checking a type variable parameter requires introducing a new semantic type variable (α) and returns an environment with a single binding and a meta-function that creates a (semantic) type-function type.

E′ = { tyvarid ↦ α  |  α  fresh }           T′(x) = ∀ α → x  P_TyVarName
E ⊢   [ tyvarid ]   ⇒ E′; T 

5.2.1  Multiple Parameters

Type checking a list of parameters requires type checking each paramter in turn. Since type variable identifiers in earlier parameters are bound in later parameters, each parameter is type checked in an environment extended with output environments of each earlier parameter. The final output environment is the extension of all the individual output environments, while the final meta-function is the composition of all the individual meta-functions.

n ≥ 0           E ⊢   Param1   ⇒ E1T1                              …            E ⊕ E1 ⊕ … ⊕ En−1 ⊢   Paramn   ⇒ EnTn                    E′ = E1 ⊕ … ⊕ En−1 ⊕ En           T′ = T1 ∘ … ∘ Tn  Params
E ⊢   Param1 … Paramn   ⇒ E′; T 

5.3  Simple Patterns and Patterns

Type checking a pattern is similar to type checking a parameter, in that it returns a new environment, which includes a binding for any variable identifier in the pattern. Again, the output environment only includes bindings for the identifiers appearing in the pattern; the output environment is not an extension of the input environment. The typing rules for patterns use judgements of the form:

E; τ ⊢   SimplePat   ⇒ E′        E; τ ⊢   Pat   ⇒ E

where τ is the (semantic) type of values matched by the pattern.

5.3.1  Simple Patterns

Type checking a variable identifier simple pattern returns an environment with a single binding (of the variable to the input (semantic) type).

E′ = { varid ↦ τ }  P_VarName
E; τ ⊢   varid   ⇒ E 

Type checking a wildcard simple pattern returns an empty environment.

E′ = { }  P_Wild
E; τ ⊢   _   ⇒ E 

5.3.2  Patterns

Type checking a data constructor pattern requires checking a number of things. First, the (semantic) type of values being matched by the pattern must be a type-constructor type (θ(n)1′,…,τn′)). Second, the data constructor identifier is looked up in the environment, returning its semantic type constructor (θ(n), which must match the type of values being matched by the pattern), the formal types of its arguments (⟨τ1,…,τm⟩), and the formal type variables (⟨α1,…,αn⟩) over which the type constructor and the formal types of its arguments is abstracted. Third, each of the actual type arguments is type checked (E ⊢   Typei   ⇒ τi′). Fourth, each of the constituent simple patterns is type checked with an input type equal to the substitution of the actual type arguments for the formal type variables in the formal type of the corresponding data constructor argument (E; τi11′,…,αnn′] ⊢   SimplePati   ⇒ Ei). The final output environment is the extension of all the individual output environments of the constituent simple patterns.

τ = θ(n)1′,…,τn′)           daconid ∈ dom(E)           E(daconid) = (⟨α1,…,αn⟩,⟨τ1,…,τm⟩,θ(n))                    E ⊢   Type1   ⇒ τ1′           …           E ⊢   Typen   ⇒ τn′                    E; τ111′,…,αnn′] ⊢   SimplePat1   ⇒ E1           …           E; τm11′,…,αnn′] ⊢   SimplePatm   ⇒ Em                     E′ = E1 ⊕ … ⊕ Em  P_DaCon
E; τ ⊢   daconid [ Type1 , … , Typen ] { SimplePat1 , … , SimplePatm }   ⇒ E 

Note that in the typing rule for type checking a simple pattern, the premise judgement corresponds to the rules given in Section 5.3.1, while the conclusion judgement corresponds to the rules given in this section.

E; τ ⊢   SimplePat   ⇒ E P_SimplePat
E; τ ⊢   SimplePat   ⇒ E 

5.4  Expressions

The typing rules for expressions use a judgement of the form

E ⊢   Exp   ⇒ τ

which can be read as “in the environment E, the expression Exp has the semantic type τ.”

Type checking an anonymous function requires type checking the parameters and then type checking the function body in an environment extended with the bindings of the parameters. The anonymous function will have either a type-function type or a function type, as constructed by the meta-function from the type checking of the parameters.

E ⊢   Params   ⇒ E′; T           E ⊕ E′ ⊢   Exp   ⇒ τ  E_Fn
E ⊢   fn Params => Exp   ⇒ T(τ)  

Type checking an if expression requires checking that the condition expression has the boolean type and that the then expression and the else expression have the same type.

E ⊢   Expc   ⇒ bool(0)           E ⊢   Expt   ⇒ τ           E ⊢   Expe   ⇒ τ  E_If
E ⊢   if Expi then Expt else Expe   ⇒ τ  

Type checking an orelse or andalso expression requires that both operands have the boolean type and expression itself has the boolean type.

E ⊢   Expl   ⇒ bool(0)           E ⊢   Expr   ⇒ bool(0)  E_Orelse
E ⊢   Expl orelse Expr   ⇒ bool(0)  
E ⊢   Expl   ⇒ bool(0)           E ⊢   Expr   ⇒ bool(0)  E_Andalso
E ⊢   Expl andalso Expr   ⇒ bool(0)  

Type checking a constraint expression requires checking that the expression and the type constraint have the same (sematic) type.

E ⊢   Exp   ⇒ τ           E ⊢   Type   ⇒ τ  E_Constraint
E ⊢   Exp : Type   ⇒ τ  

Type checking the binary and unary operations requires checking that the operands have the types appropriate for the operation.

op ∈ {==,<>,<,<=,>,>=}           E ⊢   Expl   ⇒ integer(0)           E ⊢   Expr   ⇒ integer(0)  E_BinOp(CmpOp)
E ⊢   Expl op Expr   ⇒ bool(0)  
op ∈ {^}           E ⊢   Expl   ⇒ string(0)           E ⊢   Expr   ⇒ string(0)  E_BinOp(ConcatOp)
E ⊢   Expl op Expr   ⇒ string(0)  
op ∈ {+,-,*,/,%}           E ⊢   Expl   ⇒ integer(0)           E ⊢   Expr   ⇒ integer(0)  E_BinOp(ArithOp)
E ⊢   Expl op Expr   ⇒ integer(0)  
op ∈ {~}           E ⊢   Exp   ⇒ integer(0)  E_UnOp(ArithOp)
E ⊢   op Exp   ⇒ integer(0)  

Type checking a data constructor expression requires checking a number of things. First, the data constructor identifier is looked up in the environment, returning its semantic type constructor (θ(n), the formal types of its arguments (⟨τ1,…,τm⟩), and the formal type variables (⟨α1,…,αn⟩) over which the type constructor and the formal types of its arguments is abstracted. Second, each of the actual type arguments is type checked (E ⊢   Typei   ⇒ τi′). Third, each of the actual expression arguments is type checked, and must return a type equal to the substitution of the actual type arguments for the formal type variables in the formal type of the corresponding data constructor argument (E; τi11′,…,αnn′] ⊢   SimplePati   ⇒ Ei). The final output type is the type constructor applied to the actual type arguments.

daconid ∈ dom(E)           E(daconid) = (⟨α1,…,αn⟩,⟨τ1,…,τm⟩,θ(n))           E ⊢   Type1   ⇒ τ1′           …           E ⊢   Typen   ⇒ τn′           E ⊢   Exp1   ⇒ τ111′,…,αnn′]           …           E ⊢   Expm   ⇒ τm11′,…,αnn′]  E_DaCon
E ⊢   daconid [ Type1 , … , Typen ] { Exp1 , … , Expm }   ⇒ θ(n)1′,…,τn′)  

There are two rules for type checking an application, depending on the kind of the apply argument. For an expression apply argument, we check that the function expression has a function type and that the argument expression has the argument type.

E ⊢   Expf   ⇒ τa → τr           E ⊢   Expa   ⇒ τa  E_Apply(A_Exp)
E ⊢   Expf Expa   ⇒ τr  

For a type apply argument, we check that the function expression has a type-function type and that the argument type is well-formed. The type of the application expression is the subtitution of the (semantic) type argument for the abstracted type variable in the result type.

E ⊢   Expf   ⇒ ∀ α → τr           E ⊢   Typea   ⇒ τa  E_Apply(A_Type)
E ⊢   Expf [ Typea ]   ⇒ τra/α]  

Type checking a variable identifier returns its semantic type, as recorded in the environment.

varid ∈ dom(E)           E(varid) = τ  E_VarName
E ⊢   varid   ⇒ τ  

Type checking an integer or string constant returns the obvious type.

  E_Integer
E ⊢   integer   ⇒ integer(0)  
  E_String
E ⊢   string   ⇒ string(0)  

Type checking a sequence expression requires type checking each expression, but only returns the type of the final expression.

n ≥ 0           E ⊢   Exp1   ⇒ τ1           …           E ⊢   Expn   ⇒ τn  E_Seq
E ⊢   ( Exp1 ; … ; Expn )   ⇒ τn  

Type checking a let expression requires checking the declarations and then type checking the body expression in an environment extended with the bindings of the declarations.

E ⊢   Decls   ⇒ E′                    n ≥ 0           E ⊕ E′ ⊢   Exp1   ⇒ τ1           …           E ⊕ E′ ⊢   Expn   ⇒ τn           Θ(τn) ⊆ Θ(E)  E_Let
E ⊢   let Decls in Exp1 ; … ; Expn end   ⇒ τ  

The Θ(τn) ⊆ Θ(E) premise checks that every (semantic) type constructor in τn is available in the environment E. We define Θ(τ) and Θ(E) as follows:

Θ(∀ α → τa)=Θ(τa
Θ(τa → τr)=Θ(τa) ∪ Θ(τr
Θ(θ(k)1,…,τk)=(k)} ∪ Θ(τ1) ∪ … ∪ Θ(τk
Θ(α)={} 
 
Θ(⟨TVE,TCE,DCE,VE⟩)={ θ(k)  |  tyconid ∈ dom(TCE)  and  TCE(tyconid) = θ(k)}

This prevents type constructors introduced by datatype declarations in Decls from appearing in the type of Exp. For example, the following LangF program does not type check:

(* Type constructor escapes the scope of its definition. *) let datatype T = A {Integer} | B {String} fun f (x: Integer) : T = A {x} in f end

On the other hand, the following LangF program does type check:

let type T = Integer fun f (x: Integer) : T = Integer in f end

because the type declaration does not introduce a fresh (semantic) type constructor.

Type checking a case expression requires type checking the scruitinee and checking the match rules against the type of the scruitinee.

E ⊢   Exp   ⇒ τ           E; τ ⊢   MatchRules   ⇒ τ′  E_Case
E ⊢   case Exp of MatchRules end   ⇒ τ′  

5.5  MatchRules

Type checking a match rule requires checking the right-hand-side expression in the environment extended with the bindings from the left-hand-side pattern.

E; τ ⊢   Pat   ⇒ E′           E ⊕ E′ ⊢   Exp   ⇒ τ′  MatchRule
E; τ ⊢   Pat => Exp   ⇒ τ′  

5.5.1  Multiple Match Rules

Type checking a list of match rules requires that every match rule has the same result type.

n ≥ 0           E; τ ⊢   MatchRule1   ⇒ τ           …           E; τ ⊢   MatchRulen   ⇒ τ  MatchRules
E; τ ⊢   MatchRule1 | … | MatchRulen   ⇒ τ′  

5.6  Declarations

The typing rules for declarations use a judgement of the form

E ⊢   Decl   ⇒ E

which can be read as “in the environment E, the declaration Decl returns the environment E′.” The output environment only includes bindings for the identifiers defined in the declaration; the output environment is not an extension of the input environment.

5.6.1  Type Declarations

Type checking a type declaration requires introducing new semantic type variables for the syntactic type variable identifiers when checking the right-hand-side type. The final environment is an environment with a binding for the (syntactic) type constructor identifier that maps to the (semantic) type variables (acting as formal type parameters) and the (semantic) type.

Etv = { tyvaridi ↦ αi  |  1 ≤ i ≤ n  and  αi  fresh }           E ⊕ Etv ⊢   Type   ⇒ τ           E′ = { tyconid ↦ (⟨ α1,…,αn⟩, τ) }  D_Type
E ⊢   type tyconid [ tyvarid1 , … , tyvaridn ] = Type   ⇒ E 

The typing rule for a datatype declaration is somewhat complicated. A new semantic type constructor (θi) is introduced for each syntactic type constructor identifier (tyconidi). Furthermore, a new semantic type variable (αi,j) is introduced for each syntactic type variable identifier (tyvaridi,j). Each of the data constructor declarations is type checked in an environment that includes all of the type constructors (Etc) and only the appropriate type variables (Etvi). Note that type checking a data constructor declaration takes the type variables and the type constructor as part of the context; it returns an environment of bindings for the data constructor identifiers defined by the data constructor declaration. The final output environment includes all of the type constructors and all of the data constructors (but none of the type variables).

n ≥ 0           Etc = { tyconidi ↦ θ(mi)  |  1 ≤ i ≤ n  and  θi  fresh }           Etv1 = { tyvaridi,1 ↦ αi,1  |  1 ≤ i ≤ m1  and  αi,1  fresh }           E ⊕ Etc ⊕ Etv1; (⟨ α1,1,…,α1,m1 ⟩, θ(m1)) ⊢   DaConDecls1   ⇒ EDC1           …           Etvn = { tyvaridi,n ↦ αi,n  |  1 ≤ i ≤ mn  and  αi,n  fresh }           E ⊕ Etc ⊕ Etvn; (⟨ αn,1,…,αn,mn ⟩, θ(mn)) ⊢   DaConDeclsn   ⇒ EDCn           E′ = Etc ⊕ EDC1 ⊕ … ⊕ EDCn  D_Datatype
E ⊢   
datatype tyconid1 [ tyvarid1,1 , … , tyvarid1,m1 ] = DaConDecls1
… 
and tyconidn [ tyvaridn,1 , … , tyvaridn,mn ] = DaConDeclsn
   ⇒ E
 

5.6.2  Data Constructor Declarations

Type checking a single data constructor declaration requires type checking each of its declared argument types. The result environment binds the data constructor identifier to its type constructor (θ(n)), the formal types of its arguments (⟨τ1,…,τm⟩), and the formal type variables (⟨α1,…,αn⟩) over which the type constructor and the formal types of its arguments is abstracted. The type constructor and the formal type variables are provided by the context.

E ⊢   Type1   ⇒ τ1           …           E ⊢   Typem   ⇒ τm                    E′ = { daconid ↦ (⟨ α1,…,αn ⟩, ⟨ τ1,…,τm ⟩, θ(n)) }  DaConDecl
E; ⟨ α1,…,αn ⟩; θ(n) ⊢   daconid { Type1 , … , Typem }   ⇒ E 

Type checking a list of data constructor declarations requires type checking each data constructor declaration. The final output environment is the extension of all the individual output environments.

m ≥ 0           E; ⟨ α1,…,αn ⟩; θ(n) ⊢   DaConDecl1   ⇒ EDC1           …           E; ⟨ α1,…,αn ⟩; θ(n) ⊢   DaConDeclm   ⇒ EDCm                    E′ = EDC1 ⊕ … ⊕ EDCm  DaConDecls
E; ⟨ α1,…,αn ⟩; θ(n) ⊢   DaConDecl1 | … | DaConDeclsm   ⇒ E 

5.6.3  Value Declarations

Type checking a val declaration requires type checking the expression and type checking the pattern, which returns the bindings to be returned by the declaration. If there is a type assertion, then it must return the same (semantic) type as the expression.

E ⊢   Exp   ⇒ τ           (E ⊢   Type   ⇒ τ)?           E; τ ⊢   SimplePat   ⇒ E′            D_Val
E ⊢   val SimplePat (: Type)? = Exp   ⇒ E 

Type checking a fun declaration requires constructing an environment that assigns a type to each of the function variable identifiers in the declaration. First, each of the parameter lists is type checked (E ⊢   Paramsi   ⇒ EPi), returning an environment and a meta-function. The environment is used to type check the return type (EEPi ⊢   Typei   ⇒ τi) and the meta-function is used to construct the type of the function (Ti)). Each of the function bodies is type checked in an environment extended with the appropriate parameters and the function variable identifiers (EE′ ⊕ EPi ⊢   Expi   ⇒ τi); note that each function body must have its declared return type.

n ≥ 0           E ⊢   Params1   ⇒ EP1T1           E ⊕ EP1 ⊢   Type1   ⇒ τ1                    …           E ⊢   Paramsn   ⇒ EPnTn           E ⊕ EPn ⊢   Typen   ⇒ τn                    E′ = { varidi ↦ Tii)  |  1 ≤ i ≤ n }           E ⊕ E′ ⊕ EP1 ⊢   Exp1   ⇒ τ1           …           E ⊕ E′  ⊕ EPn ⊢   Expn   ⇒ τn  D_Fun
E ⊢   
fun varid1 Params1 : Type1 = Exp1
… 
and varidn Paramsn : Typen = Expn
   ⇒ E
 

5.6.4  Multiple Declarations

Type checking a list of declarations requires type checking each declaration in turn. Since identifiers in earlier declarations are bound in later declarations, each declaration is type checked in an environment extended with output environments of each earlier declaration. The final output environment is the extension of all the individual output environments.

E ⊢   Decl1   ⇒ E1           …           E ⊕ E1 ⊕ … ⊕ En−1 ⊢   Decln   ⇒ En                    E′ = E1 ⊕ … ⊕ En−1 ⊕ En  Decls
E ⊢   Decl1 … Decln   ⇒ E 

5.7  Programs and the Initial Environment

The typing rule for programs use a judgement of the form

 ⊢   Prog   ⇒ ✓

which can be read as “the program Prog is statically correct.”

Type checking a program is similar to type checking a let expression; it requires type checking the declarations and then checking the expression in an environment extended with the bindings of the declarations.

E0 ⊢   Decls   ⇒ E′           E0 ⊕ E′ ⊢   Exp   ⇒ τ  Prog
⊢   Decls ; Exp   ⇒ ✓  

The declarations and expression are type checked in the context of an initial environment E0 that provides predefined type constructors, data constructors, and variables. This initial environment is defined as follows:

E0=TVE0,TCE0,DCE0,VE0⟩ 
 
 
TVE0={ } 
 
TCE0=




Boolbool(0) 
Integerinteger(0) 
Stringstring(0) 
Unitunit(0)




 
DCE0=




False(⟨⟩,⟨⟩,bool(0)
True(⟨⟩,⟨⟩,bool(0)
Unit(⟨⟩,⟨⟩,unit(0))




 
VE0=








argcunit(0) → integer(0) 
arginteger(0) → string(0) 
fail∀ α → string(0) → α 
printstring(0) → unit(0) 
sizestring(0) → integer(0) 
substring(0) → integer(0) → integer(0)








6  Conversion to Abstract Syntax Tree

In addition to checking that a program is statically correct, a type checker must produce a program representation that can be used by the rest of the compiler for optimization and code generation. In the LangF compiler, the type checker will produce a typed abstract syntax tree (the structure AbsSynTree : ABS_SYN_TREE module in the project seed code). The typed abstract syntax tree (AST) representation is very close to the parse tree (PT) representation, but with some crucial differences:

We have already introduced one form in the abstract syntax tree representation: the semantic types from Section 3. Similarly, we have already introduced one judgement for translating a parse tree representation form into an abstract syntax tree representation form: the judgement for type checking types from Section 5.1:

E ⊢   Type   ⇒ τ
in the environment E, the parse tree type Type is well-formed and translates to the abstract syntax tree type τ.

We can imagine other judgements that combine type checking with translation to the abstract syntax tree:

E ⊢   Exp   ⇒ τ; e
in the environment E, the parse tree expression Exp has the abstract syntax tree type τ and translates to the abstract syntax tree expression e.
 
 
E ⊢   Decl   ⇒ E′; d
in the environment E, the parse tree declaration Decl returns the environment E′ and translates to the abstract syntax tree declaration d.
 
 
⊢   Prog   ⇒ p
the parse tree program Prog is statically correct and translates to the abstract syntax tree program p.

The inference rules for these judgements will be very similar to those given in Section 5, except they will construct an appropriate output abstract syntax tree form.

7  Requirements

You should implement a type checker for LangF that enforces the type system from Section 5 and produces a typed abstract syntax tree. Your implementation should include (at least) the following modules:

structure Environment : ENVIRONMENT structure TypeChecker : TYPE_CHECKER

The ENVIRONMENT signature in the project seed code is as follows:

signature ENVIRONMENT = sig (* The combined environment is composed of a type variable * environment, a type constructor environment, a data constructor * environment, and a variable environment. Each individual * environment has its own domain and co-domain. * * For a simple binding analysis, we can take the co-domain to be * 'unit', since we only need to know whether or not the identifier * is in the environment. * * For type checking (without producing an abstract syntax tree), * you will require co-domains similar to those in the project * description. * * For type checking and producing an abstract syntax tree, * you will require additional components in the co-domains. *) structure TyVarEnv : sig type dom = ParseTree.TyVarName.t type cod = unit end structure TyConEnv : sig type dom = ParseTree.TyConName.t type cod = unit end structure DaConEnv : sig type dom = ParseTree.DaConName.t type cod = unit end structure VarEnv : sig type dom = ParseTree.VarName.t type cod = unit end type t
(* The empty environment {}. *) val empty : t (* Create an environment with a single TyVar entry. *) val singletonTyVar : TyVarEnv.dom * TyVarEnv.cod -> t (* Lookup a TyVar in the environment. *) val lookupTyVar : t * TyVarEnv.dom -> TyVarEnv.cod option (* Create an environment with a single TyCon entry. *) val singletonTyCon : TyConEnv.dom * TyConEnv.cod -> t (* Lookup a TyCon in the environment. *) val lookupTyCon : t * TyConEnv.dom -> TyConEnv.cod option (* Create an environment with a single DaCon entry. *) val singletonDaCon : DaConEnv.dom * DaConEnv.cod -> t (* Lookup a DaCon in the environment. *) val lookupDaCon : t * DaConEnv.dom -> DaConEnv.cod option (* Create an environment with a single Var entry. *) val singletonVar : VarEnv.dom * VarEnv.cod -> t (* Lookup a Var in the environment. *) val lookupVar : t * VarEnv.dom -> VarEnv.cod option (* Implements E1 (+) E2. *) val extend : t * t -> t (* Implements \Theta(E). *) val tycons : t -> AbsSynTree.TyCon.Set.set (* The initial environment E_0. *) val initial: t

You will need to extend the endENVIRONMENT signature (and the Environment structure) with new co-domain types as required by your type-checker implementation.

The TYPE_CHECKER signature is as follows:

signature TYPE_CHECKER = sig val typeCheck : ErrorStream.t * ParseTree.Prog.t -> AbsSynTree.Prog.t option

The endstructure ParseTree : PARSE_TREE and structure AbsSynTree : ABS_SYN_TREE modules are provided in the seed code; the PARSE_TREE signature implementation is at langfc-src/parse-tree/parse-tree.sig; the ParseTree structure implementation is at langfc-src/parse-tree/parse-tree.sml; the ABS_SYN_TREE signature implementation is at langfc-src/abs-syn-tree/abs-syn-tree.sig; and the AbsSynTree structure implementation is at langfc-src/abs-syn-tree/abs-syn-tree.sml.

7.1  Errors

To support error reporting, the TypeChecker.typeCheck function takes an argument of the type ErrorStream.t. The ErrorStreamERROR_STREAM module is provided in the seed code and provides a common error reporting utility in the LangF compiler. (The module was used implicitly in Projects 1 and 2.) The ERROR_STREAM signature implementation is at langfc-src/common/error-stream.sig; the ErrorStream structure implementation is at langfc-src/common/error-stream.sml.

Your type checker should report reasonable error messages. You should report violations of the syntactic restrictions of Section 2, unbound identifiers, and type errors. There is a lot of room for creativity and style in reporting errors. For a program with multiple type errors, you are required to report at least one error message, but need not report more than one error message.

8  GForge and Submission

Sources for Project 3 have been (or will shortly be) committed to your repository in the project3 sub-directory. You will need to update your local copy, by running the command:

svn update

from the cnetid-proj directory.

We will collect projects from the SVN repositories at 10pm on Friday, February 27; make sure that you have committed your final version before then. To do so, run the command:

svn commit

from the cnetid-proj directory.

9  Hints

Document history

February 24, 2009
 
Section 5.4: Θ(⟨TVE,TCE,DCE,VE⟩) = cod(TCE) ⇒
Θ(⟨TVE,TCE,DCE,VE⟩) = { θ(k)  |  tyconiddom(TCE)  and  TCE(tyconid) = θ(k)}
February 23, 2009
 
Section 9: Added discussion of unit tests and error messages.
February 16, 2009
 
Section 3: the datatype declaration at line 1 will introduce a different type constructor ⇒
the datatype declaration at line 3 will introduce a different type constructor
February 13, 2009
 
  • Made redundancy and exhaustiveness checking extra credit.
  • Fixed typing rule for let to handle sequenced expressions in body.
  • For a program with multiple type errors, require at least one error message, but not necessarily more than one.
February 12, 2009
Original version

1
Remember, a specification is a description of a property (yes/no question; true/false statement). It does not define (though it may suggest) an implementation for deciding whether or not the property holds. A significant component of this project is to develop the skills needed to produce an impementation from a specification.

Last modified: Tue Feb 24 09:59:23 CST 2009