CS326 - Project 3 : Symbol tables and type-checking

Due Date: Friday, May 4, 11:59 p.m.

In this part of the project, you are going to extend your parser with a symbol table and add some semantic checks.

Introduction

In FLAME programs, identifiers are used to refer to function names and variables. For example, consider the following program:
fun gcd(x:int, y:int)
    g: int;
    begin
       g := y;
       while x > 0 do
           begin
              g := x;
              x := y - (y/x)*x;
              y := g
           end;
       return g
    end

fun main()
   x : int;
   y : int;
   r : float;
   begin
      print("Enter two numbers\n");
      read(x);
      read(y);
      r := gcd(x,y);
      write(v)
   end
In the the gcd() function, the identifiers x, y, and g are used. In main(), the identifiers x,y,r,v, and gcd are used.

Identifiers must always be declared before they are used. In this program, identifiers are either declared by a function definition, function parameters, or by a declaration statement such as "g:int;". If an undeclared identifier appears, it is a programming error. For example, the "write(v)" statement in main() is an error because "v" is never declared.

Identifiers also have scope. For example, the identifier "g" declared in gcd() is only visible inside the gcd() function. It can't be accessed in main() or any other functions. Similarly, the identifiers "x" and "y" which are used in both gcd() and main() are different. That is, the "x" declared in main() is a complelely different variable than the "x" declared as a parameter of gcd().

To keep track of identifiers, your compiler needs to manage a symbol table. The symbol table simply keeps track of everything that is known about each identifier that appears in a program. The interface to the symbol table primarily occurs in two places:

By adding the symbol table to your parser, you will be able to perform a variety of semantic checks to your parser. Specifically, you will be able to identify the following errors such as the following:

Part 1 - Fix Project 2

Modify your parser to fix all of the tests you failed in Project 2. Specific attention should be given to tests in which valid FLAME programs fail to parse correctly! You will recover half of the points deducted in Project 2 by passing the tests when Project 3 is handed in. Please note: since Project 3 adds additional error messages to the compiler, the project 2 tests will be modified to expect these error messages as normal compiler output.

Part 2 - Symbol Table

Create a file "symtab.py" that implements a simple symbol table for FLAME. Minimally, your implementation should probably include the following features:

Part 3 - Modify your lexer

Modify your lexer to attach symbol table entries to each identifier token. If no symbol table entry exists, you should create a new entry and attach it to the token.

Keep in mind that the lexer interface is fairly dumb. All it does is find an existing symbol table entry (if any) and return it with the token. If no such entry exists, it creates an empty entry and returns it. The primary reason for doing this in the lexer is that the parser is always going to examine the symbol table for every identifier that it encounters. Rather than trying to modify the parser to do the lookup, it is usually easier and more efficient to do it in the lexer and pass the symbol table entry along with the token.

Part 4 - Modify your parser

Modify your parser to manage the symbol table as follows: To do this, you will need to be a little sneaky with your grammar. Here is one way to do it:
def p_function(t):
    'function : fundecl ID LPAREN parmlist RPAREN .... statements '
    ... do whatever ...

    # Restore old symbol table
    symtab.pop_symtab()

def p_fundecl(t):
    'fundecl : FUN'
    # Create a new symbol table
    symtab.new_symtab()
In this case, a new symbol table will be created as soon as the "FUN" token is seen (and reduced) on the input. Afterwards, all identifiers inside the function will be placed into the new symbol table. The only exception is that the "ID" right after "funcdecl" is still placed in the old symbol table. However, not to worry---this is exactly the behavior you want. Finally, when the entire function declaration is reduced, the symbol table will be restored.

Now, modify your parser as follows:

Note: there are some subtle issues of scope. For instance, if a symbol is not declared locally, but is declared globally, your parser will have to check for this in selected locations.

Part 5 - Type Checking

Now that you've got basic identifier and type handling working, it is time to add some type checking to the compiler. Most of this work involves modifications to your arithmetic expression handling.

Implementation Hints

Handin procedures

  1. Your parser must be contained in a file 'flameparse.py'. We must be able to run this file as follows to run a test:
    % python flameparse.py testname.flm
    

  2. Make sure you have a README file that includes your name and anything notable about your implementation.

  3. Create a Unix tar file for your project. This tar file should contain the 'flame' directory and be based on your login name. For example:
    % tar -cf yourlogin.tar flame
    

  4. On classes.cs.uchicago.edu, copy your tar file to the directory /stage/classes/current/CS326/handin/project3 For example:
    % cp yourlogin.tar /stage/classes/current/CS326/handin/project3
    

  5. Late assignments are not accepted. The handin directory will be closed after the due date.
IF WE CAN NOT UNPACK OR RUN YOUR HANDIN USING THE SUPPLIED TEST SCRIPTS, YOU WILL RECEIVE NO CREDIT!