CMSC 22610 :: Implementation of Computer Languages - I :: Winter 2009

LangF Type Checker

Due: February 27, 2009

1 Introduction

The third project is to implement a type checker for LangF, which checks whether or not a parse tree is statically correct and produces a typed abstract syntax tree (AST). The abstract syntax tree includes information about the binding sites of identifiers and about the types of variables and expressions. The project seed code will provide hand-written and ML-ULex based scanners (but you may also use your hand-written scanner from Projects 1 and 2), ML-Antlr and ML-Yacc based parsers (but you may also use your parser specification from Project 2), and modules for implementing the abstract-syntax-tree representation. The bulk of this document is a formal specification of the typing rules for LangF.¹ The type system for LangF is essentially an enrichment of the System F type system.

2 LangF Syntactic Restrictions

There are a number of syntactic restrictions that should be enforced by the type checker. Although these restrictions could be specified as part of the typing rules below, it is easier to specify them separately.

The type variable identifiers in the type parameters of a type declaration must be distinct.
The type variable identifiers in the type parameters of each DataDecl of a datatype declaration must be distinct.
The type constructor identifiers in a datatype declaration must be distinct.
The data constructor identifiers in a datatype declaration must be distinct. (On the other hand, a datatype declaration may introduce the same constructor name for both a type constructor and a data constructor. For example, datatype Unit = Unit.)
The variable identifiers denoting function variable identifiers in a fun declaration must be distinct.
The variable identifiers in a pattern must be distinct.
Integer literals must be in the range −2³⁰…2³⁰−1 (i.e., representable as a 31-bit 2's-complement integer).
Extra credit: The patterns in a case expression must be irredundant and exhaustive. Consider the match rules Pat₁ => Exp₁ | … | Pat₁ => Exp_n. For the patterns to be irredundant, each Pat_j must match some value (of the right type) that is not matched by Pat_i for any i < j. For the patterns to be exhaustive, every value (of the right type) must be matched by some Pat_i.

3 LangF Types

In the LangF typing rules, we distinguish between syntactic types as they appear in the program text (or parse-tree representation) and semantic types that are inferred for various syntactic forms. To understand why we make this distinction, consider the following LangF program:

datatype T = A {Integer} | B val x : T = A {1} datatype T = C {Integer} | D val y : T = B ; 0

There is a type error at line 4 in the declaration val y : T = B, because the type of the data constructor expression B is the type constructor corresponding to the datatype declaration at line 1, but the type constraint T is the type constructor corresponding to the datatype declaration at line 3. The datatype declaration at line 3 shadows the datatype declaration at line 1. However, in the parse-tree representation, all instances of T correspond to the same type constructor name (that is, as values of the ParseTree.TyVarName.t type, they all carry the same Atom.atom in the node field).

τ ∈ Type ::= ∀ α → τ_r type function, α ∈ TyVar

| τ_a → τ_r function

| θ^(k)(τ₁, …, τ_k) type constructor, θ ∈ TyCon

| α type variable

Figure 1: LangF semantic types

The abstract syntax of LangF semantic types is given in Figure 1 (and represented by the AbsSynTree.Type.t datatype in the project seed code). A semantic type is formed from type variables, type constructors (including nullary and abstract type constructors like bool⁽⁰⁾ and integer⁽⁰⁾), function types, and type-function types. (Note that a k-arity type constructor records its arity as a superscript.)

Although this syntax mirrors that of the syntactic types, there are some crucial differences. First, there will be distinct type variables (α) and type constructors (θ) for each binding occurence of a type variable name and type constructor name in the parse-tree representation. Hence, type checking the datatype declaration at line 1 will introduce one type constructor, say t₁⁽⁰⁾, and type checking the datatype declaration at line 3 will introduce a different type constructor, say t₂⁽⁰⁾, which are not equal. Second, we will consider semantic types equal upto renaming of type-function type variables. That is, we will consider the semantic types ∀ α → α → α → bool⁽⁰⁾ and ∀ β → β → β → bool⁽⁰⁾ to be equal, whereas the parse trees corresponding to ['a] -> 'a -> 'a -> bool and ['b] -> 'b -> 'b -> bool are not equal, because they use different type variable names.

4 Identifiers and Environments

The typing rules for LangF use a number of different environments to track binding information. There is a separate environment for each kind of identifier in the parse-tree repesentation:

TVE	∈	TyVarEnv = TyVarId ⇀ TyVar	type-variable environment
TCE	∈	TyConEnv = TyConId ⇀ TyCon ∪ (TyVar^* × Type)	type-constructor environment
DCE	∈	DaConEnv = DaConId ⇀ TyVar^* × Type^* × TyCon	data-constructor environment
VE	∈	VarEnv = VarId ⇀ Type	variable environment

where TyVarId is the set of syntactic type-variable identifiers ('a in the LangF concrete syntax, tyvarid in the LangF grammar, or ParseTree.TyVarName.t in the project seed code), TyConId is the set of syntactic type-constructor identifiers (T, tyconid, or ParseTree.TyConName.t), DaConId is the set of syntactic data-constructor identifiers (A, daconid, or ParseTree.DaConName.t), VarId is the set of syntactic variable identifiers (x, varid, or ParseTree.VarName.t), Type is the set of semantic types (τ in Figure 1 or AbsSynTree.Type.t in the project seed code), TyVar is the set of semantic type variables (α or AbsSynTree.TyVar.t), and TyCon is the set of semantic type constructors (θ or AbsSynTree.TyCon.t).

We define the extension of an environment E by another environment E′ as follows:

(E ⊕ E′)(x)

⎧
⎨
⎩

E′(x)	if x ∈ dom(E′)
E(x)	if x ∉ dom(E′)

Since each of the environments has a different domain, it is convenient to consider a combined environment:

∈

Env = TyVarEnv × TyConEnv × DaConEnv × VarEnv

combined environment

For combined environments E = ⟨TVE,TCE,DCE,VE⟩ and E′ = ⟨TVE′,TCE′,DCE′,VE′⟩, we define lookup by lookup in the environment appropriate to the identifier:

E(x)

⎧
⎪
⎨
⎪
⎩

TVE(x)	if x ∈ TyVarId
TCE(x)	if x ∈ TyConId
DCE(x)	if x ∈ DaConId
VE(x)	if x ∈ VarId

and define extension by extension of each environment:

E ⊕ E′

⟨ TVE ⊕ TVE′, TCE ⊕ TCE′, DCE ⊕ DCE′, VE ⊕ VE′ ⟩

5 LangF Typing Rules

The typing rules for LangF provide a specification for the static correctness of LangF programs. The general form of a judgement, as used in the LangF typing rules, is

Context ⊢ Term ⇒ Descr

which can be read as “in Context, Term has Descr.” The context is usually an environment, but may include other information, while the description is usually a semantic type and/or an (extended) environment. The judgement forms used in the typing rules for LangF are given in Figure 2.

The typing rules for LangF are syntax directed, which means that there is a typing rule for each (major) syntactic form in the parse-tree representation of LangF programs. In the following, a typing rule is labelled by the SML data constructor in the ParseTree module that corresponds to the syntactic form handled by the typing rule.

E ⊢   Type   ⇒ τ type checking a type

E ⊢   Param   ⇒ E′; T type checking a parameter

E; τ ⊢   SimplePat   ⇒ E′ type checking a simple pattern

E; τ ⊢   Pat   ⇒ E′ type checking a pattern

E ⊢   Exp   ⇒ τ type checking an expression

E; τ ⊢   MatchRule   ⇒ τ′ type checking a match rule

E ⊢   Decl   ⇒ E′ type checking a declaration

E;⟨α₁,…,α_n⟩;θ⁽ⁿ⁾ ⊢   DaConDecl   ⇒ E′ type checking a data constructor declaration

⊢   Prog   ⇒ ✓ type checking a program

Figure 2: LangF judgement forms

5.1 Types

The typing rules for types check for well-formedness and translate the syntactic types to semantic types. The typing rules for types use a judgement of the form

E ⊢ Type ⇒ τ

which can be read as “in the environment E, the syntactic type Type is well-formed and translates to the semantic type τ.”

Type checking a type-function type requires introducing a new semantic type variable (α) for the syntactic type variable identifier (tyvarid) when checking the result type.

E ⊕ { tyvarid ↦ α \| α fresh } ⊢ Type_r ⇒ τ_r	T_TyFn

E ⊢ `[` tyvarid `]` `->` Type_r ⇒ ∀ α → τ_r

Type checking a function type requires checking the argument type and the result type.

E ⊢ Type_a ⇒ τ_a E ⊢ Type_r ⇒ τ_r	T_Fn

E ⊢ Type_a `->` Type_r ⇒ τ_a → τ_r

There are two rules for type checking a type-constructor application, depending on whether the type constructor identifier corresponds to a type definition or a datatype definition (or an abstract type). For type definitions, we check the actual (syntactic) type arguments and then substitute the actual (semantic) type arguments for the formal type parameters to produce a new (semantic) type.

tyconid ∈ dom(E) E(tyconid) = (⟨ α₁,…,α_n ⟩, τ) E ⊢ Type₁ ⇒ τ₁ … E ⊢ Type_n ⇒ τ_n	T_TyCon(`type`)

E ⊢ tyconid `[` Type₁ `,` … `,` Type_n `]` ⇒ τ[τ₁/α₁,…,τ_n/α_n]

For datatype definitions (or abstract types), we check the type arguments and then construct a new (semantic) type.

tyconid ∈ dom(E) E(tyconid) = θ⁽ⁿ⁾ E ⊢ Type₁ ⇒ τ₁ … E ⊢ Type_n ⇒ τ_n	T_TyCon(`datatype`)

E ⊢ tyconid `[` Type₁ `,` … `,` Type_n `]` ⇒ θ⁽ⁿ⁾(τ₁,…,τ_n)

Type checking a type variable identifier returns its semantic type variable, as recorded in the environment.

tyvarid ∈ dom(E) E(tyvarid) = α	T_TyVar

E ⊢ tyvarid ⇒ α

5.2 Parameters

Type checking a parameter returns a new environment, which includes a binding for the type variable identifier or variable identifier in the parameter, and a meta-function that constructs a (semantic) type when given the (semantic) type of the expression over which the parameter abstracts. Note that the output environment only includes bindings for the identifiers appearing in the parameter; the output environment is not an extension of the input environment.

Type checking a variable parameter requires type checking the declared type and returns an environment with a single binding and a meta-function that creates a (semantic) function type.

E ⊢ Type ⇒ τ E′ = { varid ↦ τ } T′(x) = τ → x	P_VarName

E ⊢ `(` varid `:` Type `)` ⇒ E′; T′

Type checking a type variable parameter requires introducing a new semantic type variable (α) and returns an environment with a single binding and a meta-function that creates a (semantic) type-function type.

E′ = { tyvarid ↦ α \| α fresh } T′(x) = ∀ α → x	P_TyVarName

E ⊢ `[` tyvarid `]` ⇒ E′; T′

5.2.1 Multiple Parameters

Type checking a list of parameters requires type checking each paramter in turn. Since type variable identifiers in earlier parameters are bound in later parameters, each parameter is type checked in an environment extended with output environments of each earlier parameter. The final output environment is the extension of all the individual output environments, while the final meta-function is the composition of all the individual meta-functions.

n ≥ 0 E ⊢ Param₁ ⇒ E₁; T₁ … E ⊕ E₁ ⊕ … ⊕ E_n−1 ⊢ Param_n ⇒ E_n; T_n E′ = E₁ ⊕ … ⊕ E_n−1 ⊕ E_n T′ = T₁ ∘ … ∘ T_n	Params

E ⊢ Param₁ … Param_n ⇒ E′; T′

5.3 Simple Patterns and Patterns

Type checking a pattern is similar to type checking a parameter, in that it returns a new environment, which includes a binding for any variable identifier in the pattern. Again, the output environment only includes bindings for the identifiers appearing in the pattern; the output environment is not an extension of the input environment. The typing rules for patterns use judgements of the form:

E; τ ⊢ SimplePat ⇒ E′ E; τ ⊢ Pat ⇒ E′

where τ is the (semantic) type of values matched by the pattern.

5.3.1 Simple Patterns

Type checking a variable identifier simple pattern returns an environment with a single binding (of the variable to the input (semantic) type).

E′ = { varid ↦ τ }	P_VarName

E; τ ⊢ varid ⇒ E′

Type checking a wildcard simple pattern returns an empty environment.

E′ = { }	P_Wild

E; τ ⊢ `_` ⇒ E′

5.3.2 Patterns

Type checking a data constructor pattern requires checking a number of things. First, the (semantic) type of values being matched by the pattern must be a type-constructor type (θ⁽ⁿ⁾(τ₁′,…,τ_n′)). Second, the data constructor identifier is looked up in the environment, returning its semantic type constructor (θ⁽ⁿ⁾, which must match the type of values being matched by the pattern), the formal types of its arguments (⟨τ₁,…,τ_m⟩), and the formal type variables (⟨α₁,…,α_n⟩) over which the type constructor and the formal types of its arguments is abstracted. Third, each of the actual type arguments is type checked (E ⊢ Type_i ⇒ τ_i′). Fourth, each of the constituent simple patterns is type checked with an input type equal to the substitution of the actual type arguments for the formal type variables in the formal type of the corresponding data constructor argument (E; τ_i[α₁/τ₁′,…,α_n/τ_n′] ⊢ SimplePat_i ⇒ E_i). The final output environment is the extension of all the individual output environments of the constituent simple patterns.

τ = θ⁽ⁿ⁾(τ₁′,…,τ_n′) daconid ∈ dom(E) E(daconid) = (⟨α₁,…,α_n⟩,⟨τ₁,…,τ_m⟩,θ⁽ⁿ⁾) E ⊢ Type₁ ⇒ τ₁′ … E ⊢ Type_n ⇒ τ_n′ E; τ₁[α₁/τ₁′,…,α_n/τ_n′] ⊢ SimplePat₁ ⇒ E₁ … E; τ_m[α₁/τ₁′,…,α_n/τ_n′] ⊢ SimplePat_m ⇒ E_m E′ = E₁ ⊕ … ⊕ E_m	P_DaCon

E; τ ⊢ daconid `[` Type₁ `,` … `,` Type_n `]` `{` SimplePat₁ `,` … `,` SimplePat_m `}` ⇒ E′

Note that in the typing rule for type checking a simple pattern, the premise judgement corresponds to the rules given in Section 5.3.1, while the conclusion judgement corresponds to the rules given in this section.

E; τ ⊢ SimplePat ⇒ E′	P_SimplePat

E; τ ⊢ SimplePat ⇒ E′

5.4 Expressions

The typing rules for expressions use a judgement of the form

E ⊢ Exp ⇒ τ

which can be read as “in the environment E, the expression Exp has the semantic type τ.”

Type checking an anonymous function requires type checking the parameters and then type checking the function body in an environment extended with the bindings of the parameters. The anonymous function will have either a type-function type or a function type, as constructed by the meta-function from the type checking of the parameters.

E ⊢ Params ⇒ E′; T E ⊕ E′ ⊢ Exp ⇒ τ	E_Fn

E ⊢ `fn` Params `=>` Exp ⇒ T(τ)

Type checking an if expression requires checking that the condition expression has the boolean type and that the then expression and the else expression have the same type.

E ⊢ Exp_c ⇒ *bool⁽⁰⁾ E ⊢ Exp_t ⇒ τ E ⊢ Exp*_e ⇒ τ	E_If

E ⊢ `if` Exp_i `then` Exp_t `else` Exp_e ⇒ τ

Type checking an orelse or andalso expression requires that both operands have the boolean type and expression itself has the boolean type.

E ⊢ Exp_l ⇒ *bool⁽⁰⁾ E ⊢ Exp_r ⇒ bool*⁽⁰⁾	E_Orelse

E ⊢ Exp_l `orelse` Exp_r ⇒ *bool*⁽⁰⁾

E ⊢ Exp_l ⇒ *bool⁽⁰⁾ E ⊢ Exp_r ⇒ bool*⁽⁰⁾	E_Andalso

E ⊢ Exp_l `andalso` Exp_r ⇒ *bool*⁽⁰⁾

Type checking a constraint expression requires checking that the expression and the type constraint have the same (sematic) type.

E ⊢ Exp ⇒ τ E ⊢ Type ⇒ τ	E_Constraint

E ⊢ Exp `:` Type ⇒ τ

Type checking the binary and unary operations requires checking that the operands have the types appropriate for the operation.

op ∈ {`==`,`<>`,`<`,`<=`,`>`,`>=`} E ⊢ Exp_l ⇒ *integer⁽⁰⁾ E ⊢ Exp_r ⇒ integer*⁽⁰⁾	E_BinOp(CmpOp)

E ⊢ Exp_l op Exp_r ⇒ *bool*⁽⁰⁾

op ∈ {`^`} E ⊢ Exp_l ⇒ *string⁽⁰⁾ E ⊢ Exp_r ⇒ string*⁽⁰⁾	E_BinOp(ConcatOp)

E ⊢ Exp_l op Exp_r ⇒ *string*⁽⁰⁾

op ∈ {`+`,`-`,*``,`/`,`%`*} E ⊢ Exp_l ⇒ integer⁽⁰⁾ E ⊢ Exp_r ⇒ integer*⁽⁰⁾	E_BinOp(ArithOp)

E ⊢ Exp_l op Exp_r ⇒ *integer*⁽⁰⁾

op ∈ {`~`} E ⊢ Exp ⇒ *integer*⁽⁰⁾	E_UnOp(ArithOp)

E ⊢ op Exp ⇒ *integer*⁽⁰⁾

Type checking a data constructor expression requires checking a number of things. First, the data constructor identifier is looked up in the environment, returning its semantic type constructor (θ⁽ⁿ⁾, the formal types of its arguments (⟨τ₁,…,τ_m⟩), and the formal type variables (⟨α₁,…,α_n⟩) over which the type constructor and the formal types of its arguments is abstracted. Second, each of the actual type arguments is type checked (E ⊢ Type_i ⇒ τ_i′). Third, each of the actual expression arguments is type checked, and must return a type equal to the substitution of the actual type arguments for the formal type variables in the formal type of the corresponding data constructor argument (E; τ_i[α₁/τ₁′,…,α_n/τ_n′] ⊢ SimplePat_i ⇒ E_i). The final output type is the type constructor applied to the actual type arguments.

daconid ∈ dom(E) E(daconid) = (⟨α₁,…,α_n⟩,⟨τ₁,…,τ_m⟩,θ⁽ⁿ⁾) E ⊢ Type₁ ⇒ τ₁′ … E ⊢ Type_n ⇒ τ_n′ E ⊢ Exp₁ ⇒ τ₁[α₁/τ₁′,…,α_n/τ_n′] … E ⊢ Exp_m ⇒ τ_m[α₁/τ₁′,…,α_n/τ_n′]	E_DaCon

E ⊢ daconid `[` Type₁ `,` … `,` Type_n `]` `{` Exp₁ `,` … `,` Exp_m `}` ⇒ θ⁽ⁿ⁾(τ₁′,…,τ_n′)

There are two rules for type checking an application, depending on the kind of the apply argument. For an expression apply argument, we check that the function expression has a function type and that the argument expression has the argument type.

E ⊢ Exp_f ⇒ τ_a → τ_r E ⊢ Exp_a ⇒ τ_a	E_Apply(A_Exp)

E ⊢ Exp_f Exp_a ⇒ τ_r

For a type apply argument, we check that the function expression has a type-function type and that the argument type is well-formed. The type of the application expression is the subtitution of the (semantic) type argument for the abstracted type variable in the result type.

E ⊢ Exp_f ⇒ ∀ α → τ_r E ⊢ Type_a ⇒ τ_a	E_Apply(A_Type)

E ⊢ Exp_f `[` Type_a `]` ⇒ τ_r[τ_a/α]

Type checking a variable identifier returns its semantic type, as recorded in the environment.

varid ∈ dom(E) E(varid) = τ	E_VarName

E ⊢ varid ⇒ τ

Type checking an integer or string constant returns the obvious type.

	E_Integer

E ⊢ integer ⇒ *integer*⁽⁰⁾

	E_String

E ⊢ string ⇒ *string*⁽⁰⁾

Type checking a sequence expression requires type checking each expression, but only returns the type of the final expression.

n ≥ 0 E ⊢ Exp₁ ⇒ τ₁ … E ⊢ Exp_n ⇒ τ_n	E_Seq

E ⊢ `(` Exp₁ `;` … `;` Exp_n `)` ⇒ τ_n

Type checking a let expression requires checking the declarations and then type checking the body expression in an environment extended with the bindings of the declarations.

E ⊢ Decls ⇒ E′ n ≥ 0 E ⊕ E′ ⊢ Exp₁ ⇒ τ₁ … E ⊕ E′ ⊢ Exp_n ⇒ τ_n Θ(τ_n) ⊆ Θ(E)	E_Let

E ⊢ `let` Decls `in` Exp₁ `;` … `;` Exp_n `end` ⇒ τ

The Θ(τ_n) ⊆ Θ(E) premise checks that every (semantic) type constructor in τ_n is available in the environment E. We define Θ(τ) and Θ(E) as follows:

Θ(∀ α → τ_a)	=	Θ(τ_a)
Θ(τ_a → τ_r)	=	Θ(τ_a) ∪ Θ(τ_r)
Θ(θ^(k)(τ₁,…,τ_k)	=	{θ^(k)} ∪ Θ(τ₁) ∪ … ∪ Θ(τ_k)
Θ(α)	=	{}

Θ(⟨TVE,TCE,DCE,VE⟩)	=	{ θ^(k) \| tyconid ∈ dom(TCE) and TCE(tyconid) = θ^(k)}

This prevents type constructors introduced by datatype declarations in Decls from appearing in the type of Exp. For example, the following LangF program does not type check:

(* Type constructor escapes the scope of its definition. *) let datatype T = A {Integer} | B {String} fun f (x: Integer) : T = A {x} in f end

On the other hand, the following LangF program does type check:

let type T = Integer fun f (x: Integer) : T = Integer in f end

because the type declaration does not introduce a fresh (semantic) type constructor.

Type checking a case expression requires type checking the scruitinee and checking the match rules against the type of the scruitinee.

E ⊢ Exp ⇒ τ E; τ ⊢ MatchRules ⇒ τ′	E_Case

E ⊢ `case` Exp `of` MatchRules `end` ⇒ τ′

5.5 MatchRules

Type checking a match rule requires checking the right-hand-side expression in the environment extended with the bindings from the left-hand-side pattern.

E; τ ⊢ Pat ⇒ E′ E ⊕ E′ ⊢ Exp ⇒ τ′	MatchRule

E; τ ⊢ Pat `=>` Exp ⇒ τ′

5.5.1 Multiple Match Rules

Type checking a list of match rules requires that every match rule has the same result type.

n ≥ 0 E; τ ⊢ MatchRule₁ ⇒ τ … E; τ ⊢ MatchRule_n ⇒ τ	MatchRules

E; τ ⊢ MatchRule₁ `\|` … `\|` MatchRule_n ⇒ τ′

5.6 Declarations

The typing rules for declarations use a judgement of the form

E ⊢ Decl ⇒ E′

which can be read as “in the environment E, the declaration Decl returns the environment E′.” The output environment only includes bindings for the identifiers defined in the declaration; the output environment is not an extension of the input environment.

5.6.1 Type Declarations

Type checking a type declaration requires introducing new semantic type variables for the syntactic type variable identifiers when checking the right-hand-side type. The final environment is an environment with a binding for the (syntactic) type constructor identifier that maps to the (semantic) type variables (acting as formal type parameters) and the (semantic) type.

E_tv = { tyvarid_i ↦ α_i \| 1 ≤ i ≤ n and α_i fresh } E ⊕ E_tv ⊢ Type ⇒ τ E′ = { tyconid ↦ (⟨ α₁,…,α_n⟩, τ) }	D_Type

E ⊢ `type` tyconid `[` tyvarid₁ `,` … `,` tyvarid_n `]` `=` Type ⇒ E′

The typing rule for a datatype declaration is somewhat complicated. A new semantic type constructor (θ_i) is introduced for each syntactic type constructor identifier (tyconid_i). Furthermore, a new semantic type variable (α_i,j) is introduced for each syntactic type variable identifier (tyvarid_i,j). Each of the data constructor declarations is type checked in an environment that includes all of the type constructors (E_tc) and only the appropriate type variables (E_{tv_i}). Note that type checking a data constructor declaration takes the type variables and the type constructor as part of the context; it returns an environment of bindings for the data constructor identifiers defined by the data constructor declaration. The final output environment includes all of the type constructors and all of the data constructors (but none of the type variables).

n ≥ 0 E_tc = { tyconid_i ↦ θ^(m_i) | 1 ≤ i ≤ n and θ_i fresh } E_tv₁ = { tyvarid_i,1 ↦ α_i,1 | 1 ≤ i ≤ m₁ and α_i,1 fresh } E ⊕ E_tc ⊕ E_tv₁; (⟨ α_1,1,…,α_1,m₁ ⟩, θ^(m₁)) ⊢ DaConDecls₁ ⇒ E_DC₁ … E_{tv_n} = { tyvarid_i,n ↦ α_i,n | 1 ≤ i ≤ m_n and α_i,n fresh } E ⊕ E_tc ⊕ E_{tv_n}; (⟨ α_n,1,…,α_{n,m_n} ⟩, θ^(m_n)) ⊢ DaConDecls_n ⇒ E_{DC_n} E′ = E_tc ⊕ E_DC₁ ⊕ … ⊕ E_{DC_n} D_Datatype

E ⊢

`datatype`	tyconid₁ `[` tyvarid_1,1 `,` … `,` tyvarid_1,m₁ `]` `=` DaConDecls₁
…
`and`	tyconid_n `[` tyvarid_n,1 `,` … `,` tyvarid_{n,m_n} `]` `=` DaConDecls_n

⇒ E′

5.6.2 Data Constructor Declarations

Type checking a single data constructor declaration requires type checking each of its declared argument types. The result environment binds the data constructor identifier to its type constructor (θ⁽ⁿ⁾), the formal types of its arguments (⟨τ₁,…,τ_m⟩), and the formal type variables (⟨α₁,…,α_n⟩) over which the type constructor and the formal types of its arguments is abstracted. The type constructor and the formal type variables are provided by the context.

E ⊢ Type₁ ⇒ τ₁ … E ⊢ Type_m ⇒ τ_m E′ = { daconid ↦ (⟨ α₁,…,α_n ⟩, ⟨ τ₁,…,τ_m ⟩, θ⁽ⁿ⁾) }	DaConDecl

E; ⟨ α₁,…,α_n ⟩; θ⁽ⁿ⁾ ⊢ daconid `{` Type₁ `,` … `,` Type_m `}` ⇒ E′

Type checking a list of data constructor declarations requires type checking each data constructor declaration. The final output environment is the extension of all the individual output environments.

m ≥ 0 E; ⟨ α₁,…,α_n ⟩; θ⁽ⁿ⁾ ⊢ DaConDecl₁ ⇒ E_DC₁ … E; ⟨ α₁,…,α_n ⟩; θ⁽ⁿ⁾ ⊢ DaConDecl_m ⇒ E_{DC_m} E′ = E_DC₁ ⊕ … ⊕ E_{DC_m}	DaConDecls

E; ⟨ α₁,…,α_n ⟩; θ⁽ⁿ⁾ ⊢ DaConDecl₁ `\|` … `\|` DaConDecls_m ⇒ E′

5.6.3 Value Declarations

Type checking a val declaration requires type checking the expression and type checking the pattern, which returns the bindings to be returned by the declaration. If there is a type assertion, then it must return the same (semantic) type as the expression.

E ⊢ Exp ⇒ τ (E ⊢ Type ⇒ τ)^? E; τ ⊢ SimplePat ⇒ E′	D_Val

E ⊢ `val` SimplePat (`:` Type)^? `=` Exp ⇒ E′

Type checking a fun declaration requires constructing an environment that assigns a type to each of the function variable identifiers in the declaration. First, each of the parameter lists is type checked (E ⊢ Params_i ⇒ E_{P_i}), returning an environment and a meta-function. The environment is used to type check the return type (E ⊕ E_{P_i} ⊢ Type_i ⇒ τ_i) and the meta-function is used to construct the type of the function (T(τ_i)). Each of the function bodies is type checked in an environment extended with the appropriate parameters and the function variable identifiers (E ⊕ E′ ⊕ E_{P_i} ⊢ Exp_i ⇒ τ_i); note that each function body must have its declared return type.

n ≥ 0 E ⊢ Params₁ ⇒ E_P₁; T₁ E ⊕ E_P₁ ⊢ Type₁ ⇒ τ₁ … E ⊢ Params_n ⇒ E_{P_n}; T_n E ⊕ E_{P_n} ⊢ Type_n ⇒ τ_n E′ = { varid_i ↦ T_i(τ_i) | 1 ≤ i ≤ n } E ⊕ E′ ⊕ E_P₁ ⊢ Exp₁ ⇒ τ₁ … E ⊕ E′ ⊕ E_{P_n} ⊢ Exp_n ⇒ τ_n D_Fun

E ⊢

`fun`	varid₁ Params₁ `:` Type₁ `=` Exp₁
…
`and`	varid_n Params_n `:` Type_n `=` Exp_n

⇒ E′

5.6.4 Multiple Declarations

Type checking a list of declarations requires type checking each declaration in turn. Since identifiers in earlier declarations are bound in later declarations, each declaration is type checked in an environment extended with output environments of each earlier declaration. The final output environment is the extension of all the individual output environments.

E ⊢ Decl₁ ⇒ E₁ … E ⊕ E₁ ⊕ … ⊕ E_n−1 ⊢ Decl_n ⇒ E_n E′ = E₁ ⊕ … ⊕ E_n−1 ⊕ E_n	Decls

E ⊢ Decl₁ … Decl_n ⇒ E′

5.7 Programs and the Initial Environment

The typing rule for programs use a judgement of the form

⊢ Prog ⇒ ✓

which can be read as “the program Prog is statically correct.”

Type checking a program is similar to type checking a let expression; it requires type checking the declarations and then checking the expression in an environment extended with the bindings of the declarations.

E₀ ⊢ Decls ⇒ E′ E₀ ⊕ E′ ⊢ Exp ⇒ τ	Prog

⊢ Decls `;` Exp ⇒ ✓

The declarations and expression are type checked in the context of an initial environment E₀ that provides predefined type constructors, data constructors, and variables. This initial environment is defined as follows:

E₀ = ⟨TVE₀,TCE₀,DCE₀,VE₀⟩

TVE₀ = { }

TCE₀

⎧
⎪
⎨
⎪
⎩

`Bool`	↦	*bool*⁽⁰⁾
`Integer`	↦	*integer*⁽⁰⁾
`String`	↦	*string*⁽⁰⁾
`Unit`	↦	*unit*⁽⁰⁾

⎫
⎪
⎬
⎪
⎭

DCE₀

⎧
⎪
⎨
⎪
⎩

`False`	↦	(⟨⟩,⟨⟩,*bool*⁽⁰⁾)
`True`	↦	(⟨⟩,⟨⟩,*bool*⁽⁰⁾)
`Unit`	↦	(⟨⟩,⟨⟩,*unit*⁽⁰⁾)

⎫
⎪
⎬
⎪
⎭

VE₀

⎧
⎪
⎪
⎪
⎨
⎪
⎪
⎪
⎩

`argc`	↦	*unit⁽⁰⁾ → integer*⁽⁰⁾
`arg`	↦	*integer⁽⁰⁾ → string*⁽⁰⁾
`fail`	↦	∀ α → *string*⁽⁰⁾ → α
`print`	↦	*string⁽⁰⁾ → unit*⁽⁰⁾
`size`	↦	*string⁽⁰⁾ → integer*⁽⁰⁾
`sub`	↦	*string⁽⁰⁾ → integer⁽⁰⁾ → integer*⁽⁰⁾

⎫
⎪
⎪
⎪
⎬
⎪
⎪
⎪
⎭

6 Conversion to Abstract Syntax Tree

In addition to checking that a program is statically correct, a type checker must produce a program representation that can be used by the rest of the compiler for optimization and code generation. In the LangF compiler, the type checker will produce a typed abstract syntax tree (the structure AbsSynTree : ABS_SYN_TREE module in the project seed code). The typed abstract syntax tree (AST) representation is very close to the parse tree (PT) representation, but with some crucial differences:

The AST forms include no source location information.
Type variables, type constructors, data constructors, and variables in the AST representation are implemented by structures with the ID signature (langfc-src/common/id.sig). As discussed in Section 3, semantic type variables, type constructors, data constructors, and variables are used to distinguish different binding occurrences, which otherwise have the same syntactic type variable names, type constructor names, data constructor names, or variable names. There are operations in the ID signature that can be used to create fresh identifiers, not equal to any other identifier previously created.
Variables bound (and wildcards) in simple patterns include their type.
There is no type constraint expression form; instead, every expression form is annotated with its type.
There is no type declaration form; all type abbreviations will have been expanded during type checking and conversion to the abstract syntax tree.
The val declaration form has no type constraint; the simple pattern includes the type.

We have already introduced one form in the abstract syntax tree representation: the semantic types from Section 3. Similarly, we have already introduced one judgement for translating a parse tree representation form into an abstract syntax tree representation form: the judgement for type checking types from Section 5.1:

E ⊢ Type ⇒ τ

in the environment E, the parse tree type Type is well-formed and translates to the abstract syntax tree type τ.

We can imagine other judgements that combine type checking with translation to the abstract syntax tree:

E ⊢ Exp ⇒ τ; e	in the environment E, the parse tree expression Exp has the abstract syntax tree type τ and translates to the abstract syntax tree expression e.

E ⊢ Decl ⇒ E′; d	in the environment E, the parse tree declaration Decl returns the environment E′ and translates to the abstract syntax tree declaration d.

⊢ Prog ⇒ p	the parse tree program Prog is statically correct and translates to the abstract syntax tree program p.

The inference rules for these judgements will be very similar to those given in Section 5, except they will construct an appropriate output abstract syntax tree form.

7 Requirements

You should implement a type checker for LangF that enforces the type system from Section 5 and produces a typed abstract syntax tree. Your implementation should include (at least) the following modules:

structure Environment : ENVIRONMENT structure TypeChecker : TYPE_CHECKER

The ENVIRONMENT signature in the project seed code is as follows:

signature ENVIRONMENT = sig (* The combined environment is composed of a type variable * environment, a type constructor environment, a data constructor * environment, and a variable environment. Each individual * environment has its own domain and co-domain. * * For a simple binding analysis, we can take the co-domain to be * 'unit', since we only need to know whether or not the identifier * is in the environment. * * For type checking (without producing an abstract syntax tree), * you will require co-domains similar to those in the project * description. * * For type checking and producing an abstract syntax tree, * you will require additional components in the co-domains. *) structure TyVarEnv : sig type dom = ParseTree.TyVarName.t type cod = unit end structure TyConEnv : sig type dom = ParseTree.TyConName.t type cod = unit end structure DaConEnv : sig type dom = ParseTree.DaConName.t type cod = unit end structure VarEnv : sig type dom = ParseTree.VarName.t type cod = unit end type t

(* The empty environment {}. *) val empty : t (* Create an environment with a single TyVar entry. *) val singletonTyVar : TyVarEnv.dom * TyVarEnv.cod -> t (* Lookup a TyVar in the environment. *) val lookupTyVar : t * TyVarEnv.dom -> TyVarEnv.cod option (* Create an environment with a single TyCon entry. *) val singletonTyCon : TyConEnv.dom * TyConEnv.cod -> t (* Lookup a TyCon in the environment. *) val lookupTyCon : t * TyConEnv.dom -> TyConEnv.cod option (* Create an environment with a single DaCon entry. *) val singletonDaCon : DaConEnv.dom * DaConEnv.cod -> t (* Lookup a DaCon in the environment. *) val lookupDaCon : t * DaConEnv.dom -> DaConEnv.cod option (* Create an environment with a single Var entry. *) val singletonVar : VarEnv.dom * VarEnv.cod -> t (* Lookup a Var in the environment. *) val lookupVar : t * VarEnv.dom -> VarEnv.cod option (* Implements E1 (+) E2. *) val extend : t * t -> t (* Implements \Theta(E). *) val tycons : t -> AbsSynTree.TyCon.Set.set (* The initial environment E_0. *) val initial: t

You will need to extend the endENVIRONMENT signature (and the Environment structure) with new co-domain types as required by your type-checker implementation.

The TYPE_CHECKER signature is as follows:

signature TYPE_CHECKER = sig val typeCheck : ErrorStream.t * ParseTree.Prog.t -> AbsSynTree.Prog.t option

The endstructure ParseTree : PARSE_TREE and structure AbsSynTree : ABS_SYN_TREE modules are provided in the seed code; the PARSE_TREE signature implementation is at langfc-src/parse-tree/parse-tree.sig; the ParseTree structure implementation is at langfc-src/parse-tree/parse-tree.sml; the ABS_SYN_TREE signature implementation is at langfc-src/abs-syn-tree/abs-syn-tree.sig; and the AbsSynTree structure implementation is at langfc-src/abs-syn-tree/abs-syn-tree.sml.

7.1 Errors

To support error reporting, the TypeChecker.typeCheck function takes an argument of the type ErrorStream.t. The ErrorStream: ERROR_STREAM module is provided in the seed code and provides a common error reporting utility in the LangF compiler. (The module was used implicitly in Projects 1 and 2.) The ERROR_STREAM signature implementation is at langfc-src/common/error-stream.sig; the ErrorStream structure implementation is at langfc-src/common/error-stream.sml.

Your type checker should report reasonable error messages. You should report violations of the syntactic restrictions of Section 2, unbound identifiers, and type errors. There is a lot of room for creativity and style in reporting errors. For a program with multiple type errors, you are required to report at least one error message, but need not report more than one error message.

8 GForge and Submission

Sources for Project 3 have been (or will shortly be) committed to your repository in the project3 sub-directory. You will need to update your local copy, by running the command:

svn update

from the cnetid-proj directory.

We will collect projects from the SVN repositories at 10pm on Friday, February 27; make sure that you have committed your final version before then. To do so, run the command:

svn commit

from the cnetid-proj directory.

9 Hints

Start early!
Study the interfaces. You will need to be familiar with the types and operations in the PARSE_TREE, ABS_SYN_TREE, and ID signatures.
Think about how to represent the environment(s). In the parse tree representation, each structure implementing a kind of identifier provides structure Set: ORD_SET, structure Map: ORD_MAP, and structure Tbl: MONO_HASH_TABLE modules for finite maps, sets, and hash tables over that kind of identifier; you can view the ORD_SET signature here, the ORD_MAP signature here, and the MONO_HASH_TABLE signature here.
Think about the structure of the type-checker implementation. Each judgement can be implemented as a function; just as the parse tree datatypes for expressions, match rules, and declarations are mutually recursive, the functions for judgements that type check expressions, match rules, and declarations will be mutually recursive. Each function for a judgement will have a case for each typing rule with that judgement as the conclusion.
Work in stages. First implement a simple binding checker that only checks that the program has no unbound variables (but does not check types and does not produce an abstract syntax tree). This binding checker will establish the basic structure of the implementation. Next, implement a type checker that checks for unbound variables and checks types (but does not produce an abstract syntax tree). This type checker will require extending the simple binding checker, but will very closely match the typing rules from Section 5. Next, implement a full type checker that checks for unbound variables, checks types, and produces an abstract syntax tree. This full type checker will require additional information to be carried in the environment and to be returned by each type checking function. Finally, extend the full type checker to additionally check the syntactic restrictions of Section 2.
Work on error reporting last. Detecting errors and producing good error messages can be difficult; it is more important for your type checker to work on good programs than for it to “work” on bad programs. Again, work in stages. First implement a type checker that stops after detecting the first error. Then implement a type checker that continues after detecting an error.
To complete the assignment, you should only need to make changes to the cnetid-proj/project3/langfc-src/type-checker/environment.sig, cnetid-proj/project3/langfc-src/type-checker/environment.sml, and cnetid-proj/project3/langfc-src/type-checker/type-checker.sml files.
Executing the compiler (from the cnetid-proj/project3 directory) with the command
./bin/langfc -Ckeep-type-check=true file.lgf

will produce a file.type-check.ast file that contains the abstract syntax tree returned by the type checker. Use this control and its output to check that your type checker is working as expected. The tests/type-checker directory includes a number of tests (of increasing complexity); for each testNN.lgf file, if the test has type errors, there is a testNN.err file containing sample error messages to be reported by the type checker. If the test has no type errors, then there is no output file; rather, the abstract parse tree returned by the type checker should convert to the core intermediate representation and type check in the core intermediate representation without errors.
As in past projects, you are not required to match the sample error messages exactly. In particular, you need not have particularly “pretty” error messages. The sample error messages (and sample solution) have gone to some length to produce good error messages, with types written using the identifiers in the parse tree representation. You will probably find it much easier to print types using the identifiers in the abstract syntax tree representation (using the functions Layout.toString and AST.Type.layout), which will print indentifiers with a uniqueifying suffix. For example, in test64.lgf, the sample error message is:

test46.lgf:5.0-test46.lgf:6.0 Error: Constraint and expression of 'val' disagree. constraint: T expression: ?T? in: val y : T = B
where the constraint type is printed using the tycon identifier T from the parse tree representation, and the expression type is also printed using the tycon identifier T from the parse tree representation, but printed as ?T? to indicate that this tycon is shadowed. It will be much easier to produce an error message like:

test46.lgf:5.0-test46.lgf:6.0 Error: Constraint and expression of 'val' disagree. constraint: T__019 expression: T__015 in: val y : T = B
where the constraint type and the expression type are printed using the tycon identifiers from the abstract syntax tree representation (with their uniqueifying suffixes). The fact that one tycon is shadowed by the other (in the parse tree representation) is indicated by the different type constructor indentifiers in the abstract syntax tree representation.

Document history

February 24, 2009

Section 5.4: Θ(⟨TVE,TCE,DCE,VE⟩) = cod(TCE) ⇒
Θ(⟨TVE,TCE,DCE,VE⟩) = { θ^(k) | tyconid ∈ dom(TCE) and TCE(tyconid) = θ^(k)}

February 23, 2009

Section 9: Added discussion of unit tests and error messages.

February 16, 2009

Section 3: the datatype declaration at line 1 will introduce a different type constructor ⇒
the datatype declaration at line 3 will introduce a different type constructor

February 13, 2009

Made redundancy and exhaustiveness checking extra credit.
Fixed typing rule for let to handle sequenced expressions in body.
For a program with multiple type errors, require at least one error message, but not necessarily more than one.

February 12, 2009

Original version

1: Remember, a specification is a description of a property (yes/no question; true/false statement). It does not define (though it may suggest) an implementation for deciding whether or not the property holds. A significant component of this project is to develop the skills needed to produce an impementation from a specification.

CMSC 22610 Implementation of Computer Languages - I Winter 2009

LangF Type Checker

1 Introduction

2 LangF Syntactic Restrictions

3 LangF Types

4 Identifiers and Environments

5 LangF Typing Rules

5.1 Types

5.2 Parameters

5.2.1 Multiple Parameters

5.3 Simple Patterns and Patterns

5.3.1 Simple Patterns

5.3.2 Patterns

5.4 Expressions

5.5 MatchRules

5.5.1 Multiple Match Rules

5.6 Declarations

5.6.1 Type Declarations

5.6.2 Data Constructor Declarations

5.6.3 Value Declarations

5.6.4 Multiple Declarations

5.7 Programs and the Initial Environment

6 Conversion to Abstract Syntax Tree

7 Requirements

7.1 Errors

8 GForge and Submission

9 Hints

Document history

CMSC 22610
Implementation of Computer Languages - I
Winter 2009