Language Overview¶
This page provides an overview of the language you will be working with throughout the quarter. This quarter you will be building an optimizing compiler for a subset of the Go language that we are calling Golite. The language will have very similar syntax and structure. However, certain aspects of the syntax will be more restrictive and structured than normal Go syntax so please make sure you read over the EBNF grammar carefully and fully understand the semantics of the language. Your first task is to make sure you understand the EBNF grammar for Golite.
All programs in this language are command-line applications that are coded in a single file (with any name). The file extension should be .golite
. If you wish to run your Golite programs using the Go compiler (i.e., for testing purposes) then change the extension to .go
.
Note
This document may be updated throughout the quarter. Please pay close attention to Ed for announcements about updates to this document.
GoLite Grammar¶
The following CFG grammar partially describes the Golite syntax. In the EBNF below, non-terminals are highlighted in blue with a =
instead of an ->
as shown in class. You’ll notice a series of semi-colons lined up at the end of each line in the EBNF. Please ignore those semi-colons because they are needed to signal the end of a production. The language does contain semicolons to signal the end of a statement but those are terminals and are indicated as ';'
. The terminals are highlighted in green with single quotes. Don’t be frighten by the length of the grammar. We are just being very explicit about everything but the language is relatively small in comparison to the full Go language. Additional notation is provided in the below chart.
Usage |
Notation |
Meaning |
Definition |
|
The definition of a production |
Repetition |
|
Zero or more occurrences of the symbol(s) defined in the braces. |
Optional |
|
The symbol(s) defined in subscript brackets are not required to appear. |
Grouping |
|
Any terminal symbol from within this group is valid. |
Please make sure to post on Ed if you do not understand the notation below. Do not just assume. We will help clarify.
Program = Package Import Types Declarations Functions 'eof' ;
Package = 'package' 'id' ';' ;
Import = 'import' '"' 'fmt' '"' ';' ;
Types = {TypeDeclaration} ;
TypeDeclaration = 'type' 'id' 'struct' '{' Fields '}' ';' ;
Fields = Decl ';' {Decl ';'} ;
Decl = 'id' Type ;
Type = 'int' | 'bool' | '*' 'id' ;
Declarations = {Declaration} ;
Declaration = 'var' Ids Type ';' ;
Ids = 'id' {',' 'id'} ;
Functions = {Function} ;
Function = 'func' 'id' Parameters ReturnType '{' Declarations Statements '}' ;
Parameters = '(' [ Decl {',' Decl}] ')' ;
ReturnType = type | 'ε' ;
Statements = {Statement} ;
Statement = Block | Assignment | Print | Conditional | Loop | Return | Read | Invocation ;
Block = '{' Statements '}' ;
Assignment = LValue '=' Expression ';' ;
Read = 'fmt' '.' 'Scan' '(' '&' 'id' ')' ';' ;
Print = 'fmt' '.' 'Print' '(' 'id' ')' ';' ;
Print = 'fmt' '.' 'Println' '(' 'id' ')' ';' ;
Conditional = 'if' '(' Expression ')' Block ['else' Block] ;
Loop = 'for' '(' Expression ')' Block ;
Return = 'return' [Expression] ';' ;
Invocation = 'id' Arguments ';' ;
Arguments = '(' [Expression {',' Expression}] ')' ;
LValue = 'id' {'.' id} ;
Expression = BoolTerm {'||' BoolTerm} ;
BoolTerm = EqualTerm {'&&' EqualTerm} ;
EqualTerm = RelationTerm {('=='| '!=') RelationTerm} ;
RelationTerm = SimpleTerm {('>'| '<' | '<=' | '>=') SimpleTerm} ;
SimpleTerm = Term {('+'| '-') Term} ;
Term = UnaryTerm {('*'| '/') UnaryTerm} ;
UnaryTerm = '!' SelectorTerm | '-' SelectorTerm | SelectorTerm ;
SelectorTerm = Factor {'.' 'id'} ;
Factor = '(' Expression ')' | 'id' [Arguments] | 'number' | 'true' | 'false' | 'nil' ;
Lexical Analysis Information¶
The following rules provides additional information needed for lexical analysis:
A valid program ends with an end-of-file indicator (EOF). No other text must follow it.
An identifier token must begin with a letter from the English alphabet. The letter can be capitalized or lowercase. An identifier can contain zero or more integer digits (0 to 9), lowercase or uppercase English letters following the first letter.
A number token is an integer. Integers contain one or more digits (0 to 9).
As with most languages, a token is formed by taking the longest possible sequence of constituent characters. Whitespace (i.e., one or more spaces, tabs, or newlines) may precede or follow any token. For example,
"a=1"
and"a = 1"
are equivalent. Whitespace does delimit tokens. For example,"abc"
is one token whereas"a bc"
is two tokens.A comment begins with
//
and consists of all characters up to a new line. We do not have multiline comments in this language.
Language Semantics¶
The semantics for GoLite are provide informally. Please ask questions on Ed if you need clarification.
Scoping: local declarations and parameters may hide global declarations (and functions), but a local declaration may not redefine a parameter.
- Identifiers within the same scope can not be redeclared:
Global variables with the same name cannot be defined (compiler produces an error)
Local variables of the same function cannot be defined (compiler produces an error)
Two
struct
definitions with same name cannot be defined (compiler produces an error)Two fields within a struct cannot be defined (compiler produces an error)
Two functions with the same name cannot be defined (compiler produces an error)
You are free to define variables and functions in a separate namespace/scope then
struct
names.The entry point into a program begins in the a function named
main
that takes no arguments. All valid programs must define amain
function.struct
type may only include fields of the primitive types andstruct
types defined before it in a file. You must also allow fields of its own type.The only package allowed must be defined as
main
.The language restricts Mutually-Recursive functions; however, it does allow normal recursion.
The semantics of
if
andfor
statements is the same as Go semantics. However, they both require a boolean expression and parentheses around the boolean expression.Assignment statements require the left-hand side and right-hand side to have same type. However,
nil
can be defined to anystruct
type.All
struct
declarations must be defined with thenew(Type)
function. This is a special function defined in the standard library of the language that allocates a pointer to astruct
type. Allstruct
values in the language are pointers to their types. You can not use the&
to make a pointer to a struct in Golite.struct
values are deallocated by the garbage collector.All allocated memory must be freed (i.e., deallocated) by the programmer. A programmer can deallocate memory that was allocated with the
new
function by callingdelete(pointer)
. Thedelete
function make take in a pointer to a memory address (i.e., it must take in a struct value). Make sure the argument is of type struct.The
.
operator is used to access a field in astruct
value.All arguments in the language are passed by value (including pointers to
struct
values).GoLite only allows the reading and printing of integer values. You can use the
fmt.Print(int)
to print an integer variable without a newline orfmt.Println(int)
to print an integer variable with a newline. No other types can be printed.All arithmetic and relational operators require integer operands.
Equality operators require operands of integer or
struct
type. The operands must have the same type.struct
values are compared by address.Boolean operators require boolean operands.
The language does not require Short-Circuiting via boolean expressions.