CS326 - Project 4 : Code Generation

Due Date: Friday, May 18, 11:59 p.m.

Well, if you've made it this far, your compiler is well on its way to producing code. In this project, you will be extending your compiler to produce runnable assembly code for use on Sun SPARC workstations. By the end of this project, your compiler will finally be able to create runnable FLAME programs.

DO NOT WAIT UNTIL THE LAST MINUTE TO START THIS PROJECT!

Fixing Project 3

If your implementation of type checking from project 3 is broken, it is absolutely critical that you get that working. By far, the most difficult part of project 3 is type inference on functions. If you are feeling exhausted or frustrated with that part of the project, here is a simple fix that will let you continue with project 4: simply set all function return types to an "int". Obviously, doing this will not allow you to pass every single final test. However, you will still be able to pass a large number of the tests.

If your compiler is still not parsing correct programs, it would be a very good idea to get that working as soon as possible. Please see Ian or myself.

Introduction

Your primary task in this project is to take the syntax tree created by your parser and to produce a file of SPARC assembly code that can be assembled into a runnable program.

For example, if a user supplies a FLAME program hello.flm like this:

fun main() 
   begin
       print("Hello World\n")
   end

The output of your compiler will be a file hello.s that looks something like this:

! generated by flame.py

    .section     ".text"
    
! function: main

    .global main
    .type main,2

main:

    save %sp, 64, %sp
    sethi %hi(.L1), %o0
    or    %o0, %lo(.L1), %o0
    call  flprint
    nop

    call _exit
    nop

    .section     ".rodata"

.L1:   .asciz  "Hello World\n"

This file will then be assembled with the Sun assembler and linked with a small FLAME runtime library to produce an executable. For example:

% python flame.py hello.flm
% as hello.s
% ld hello.o flame.o -lc 
% a.out
Hello World
%

This project assumes no prior experience with SPARC assembly language (details follow). However, you will have to do a considerable amount of experimentation to make sure things work correctly.

Part 0 - Administrative Details

Completing this part of the project will require access to a Sun Solaris machine. If you were taking operating systems last quarter, your class account should still work on the machine stonecrusher.cs.uchicago.edu. If you were not in OS I have gone ahead and added you to this machine (passwords should be the same as your CS account). If you do not have access, please let me know ASAP.

Part 1 - Clean up the interface

So far, you have developed a couple of different parser modules such as flamelex.py and flameparse.py. Before starting on code generation, write a new module that provides a new main program for your compiler.

Create a file 'flame.py'.
Write a script or main program that starts by getting the filename from the command line and setting an output file name. For example:
```
import sys
import os.path

filename = sys.argv[1]
outname = os.path.splitext(filename)[0] + ".s"
```
Read the input file and invoke your parser. For example:
```
f = open(filename)
data = f.read()
f.close()

import flameparse
top = flameparse.parse(data)
```
The output of your parser should be the root node of a well-formed syntax tree representing the contents of the input program. If a syntax error or type error occurred during parsing, you may want to have your parser return None to indicate an error.

Create a new file flamegen.py and put the following function in it:

def generate(file,top):
    print >>file, "! Created by flame.py"
    print >>file, "! Yourname, CS326 (Spring 2001)"

Modify your main program to call generate() as follows:

top = flameparse.parse(data)
if top:
     import flamegen
     outf = open(outfile,"w")
     flamegen.generate(outf,top)
     outf.close()

Test our your new interface by running it on a simple program. You should compiler should produce a .s file like this:
```
% python flame.py simple.flm
% more simple.s
! Created by flame.py
! Yourname, CS326 (Spring 2001)
%
```
Congratulations, your compiler just produced its first output file.
To simplify your debugging, you may want to add some better error checking or special command line options to your main program. For instance, I added support for a "-t" option that would print out the parse tree. We're not going to check for any of this--this is only used to help you debug your compiler.

Part 2 - Walking the Syntax Tree

Next, your task is to write code that walks your syntax tree and looks for specific types of nodes. Think about how your program is structured. For instance, a program is a list of function declarations, a function is a list of statements, a statement may be one of read(), write(), print(), assignment, while, if-else, break, return, and so forth.

To walk the tree, create a collection of functions that try to print information about each statement in the source program. For example (note: you will have to modify this code to match your syntax tree representation):

def emit_program(out,top):
    print >>out,"\n! program"
    funclist = (get list of functions)
    for f in funclist:
          emit_function(out,f)

def emit_function(out,func):
    print >>out,"\n! function: %s (start) " % func.name
    ...
    statements = (get list of statements)
    emit_statements(out,statements)
    ...
    print >>out,"! function: %s (end) " % func.name

def emit_statements(out,statements):
    for s in statements:
         emit_statement(out,s)

def emit_statement(out,s):
    if s.name == 'print':
         emit_print(out,s)
    elif s.name == 'read':
         emit_read(out,s)
    elif s.name == 'write':
         emit_write(out,s):
    elif s.name == 'while':
         emit_while(out,s):
    ...

def emit_print(out,s):
    print >>out, "\n! print (start)"
    ...
    print >>out, "! print (end)"

def emit_while(out,s):
    print >>out, "\n! while (start)"
    ...
    statement = (get body of while)
    emit_statement(out,statement)
    ...
    print >>out, "! while (end)"

For each statement in your program, make the corresponding emit function print a starting and ending marker as shown in the example. Note: lines starting with "!" are comments in assembler.

Now, modify your generate() function to walk through a program and print information about each function and individual statements. For example, if you supplied the following program:

/* Compute GCD */
fun main()
    x: int;
    y: int;
    g: int;
    begin
       read(x);
       read(y);
       g := y;
       while x > 0 do
           begin
              g := x;
              x := y - (y/x)*x;
              y := g
           end;
       write(g)
    end

The output of your compiler should look roughly like this:

! Created by flame.py
! Yourname, CS326 (Spring 2001)

! program

! function: main (start)

! read (start)
! read (end)

! read (start)
! read (end)

! assign (start)
! assign (end)

! while (start)

! assign (start)
! assign (end)

! assign (start)
! assign (end)

! assign (start)
! assign (end)

! while(end)

! write (start)
! write (end)

! function: main (end)

Notes:

Your output should only include information about statements and the overall structure of the program. No details about expressions, parameter lists, or declarations will be printed.
Ignore nested functions for now.
Please include both a starting and ending message for each type of statement. This will prove to be useful later on (take my word for it).
If you did your job right in the parser, you should not have to perform any kind of checking of the syntax tree---type checking should have caught all of the potential problems before getting to this phase of the compiler.

Part 3 - Expression Pseudocode

One of the most simple methods for evaluating arithmetic expressions is to use an expression evaluation stack (if you have ever used an HP calculator with RPN, you have seen this technique in action). In a nutshell, here are the rules:

Keep a stack of values
When numbers and identifiers appear, they are pushed onto the stack.
A binary operator (e.g., "+") pops two values off the stack, computes the result, and pushes it on the stack.
A unary operator pops the top value off the stack, changes the value, and pushes the result.
To get the value of a expression, simply pop the value off the stack.

For example, if you have an expression 2+3*4-9, it can be evaluated as follows:

push 2          # Stack : 2
push 3          # Stack : 3 2
push 4          # Stack : 4 3 2
mutiply         # Stack : 12 2
add             # Stack : 14
push 9          # Stack : 9 14
sub             # Stack : 5
val := pop      # Stack :          # val gets the value 5

Write a function eval_expression() that produces pseudocode for expression evaluation using this technique. For example:

def eval_expression(out,expr):
    if expr.name == 'number':
          print >>out, "!   push", expr.value
    elif expr.name == 'id':
          print >>out, "!   push", expr.value
    elif expr.name == '+':
          left = expr.children[0]
          right = expr.children[1]
          eval_expression(out,left)
          eval_expression(out,right)
          print >>out, "!   add"
    elif expr.name == '-':
          left = expr.children[0]
          right = expr.children[1]
          eval_expression(out,left)
          eval_expression(out,right)
          print >>out, "!   sub"
    ...

In addition, produce similar pseudocode for the relational operations and boolean operators (and,or,not).

Now, modify the emit functions for the statements involving expressions to produce expression pseudocode. For example:

def emit_write(out,s):
    print >>out, "\n! write (start)"
    expr = (get expr from s)
    eval_expression(out,expr)
    print >>out, "!   expr := pop"
    print >>out, "!   write(expr)"
    print >>out, "! write (end)"

With these modifications, the output of your compiler will look something like this:

! Created by flame.py
! Yourname, CS326 (Spring 2001)

! program

! function: main (start)

! read (start)
! read (end)

! read (start)
! read (end)

! assign (start)
!   push y
!   g := pop
! assign (end)

! while (start)
!   push x
!   push 0
!   gt
!   relop := pop

! assign (start)
!   push x
!   g := pop
! assign (end)

! assign (start)
!   push y
!   push y
!   push x
!   div
!   push x
!   mul
!   sub
!   x := pop
! assign (end)

! assign (start)
!   push g
!   y := pop
! assign (end)

! while(end)

! write (start)
!   push g
!   expr := pop
!   write(expr)
! write (end)

! function: main (end)

Notes:

For function calls, simply produce pseudocode such as "! push blah()"
Arrays will involve multiple levels of expression evaluation (to compute the index). To handle an array access such as x:= 2 + a[4*i+10], your pseudocode will look roughly like this:
```
!   push 2
!   push 4
!   push i
!   mul
!   push 10
!   add
!   index := pop
!   push a[index]
!   add
!   x := pop
```

Part 4 - Control Flow Pseudocode

Now, modify the emit functions for the control flow statements to include a little more information about program control flow. For example:

def emit_while(out,s):
    print >>out, "\n! while (start)"
    print >>out, "! test:"
    relop = (get relation expression from s)
    eval_expression(out,relop)
    print >>out, "!   relop := pop"
    print >>out, "!   if not relop: goto done"

    statement = (get statement from s)
    emit_statement(out,statement)

    print >>out, "! goto test"
    print >>out, "! done:"
    print >>out, "! while (end)"

With these changes, your program should now look like this:

! Created by flame.py
! Yourname, CS326 (Spring 2001)

! program

! function: main (start)

! read (start)
! read (end)

! read (start)
! read (end)

! assign (start)
!   push y
!   g := pop
! assign (end)

! while (start)
! test:
!   push x
!   push 0
!   gt
!   relop := pop
!   if not relop: goto done

! assign (start)
!   push x
!   g := pop
! assign (end)

! assign (start)
!   push y
!   push y
!   push x
!   div
!   push x
!   mul
!   sub
!   x := pop
! assign (end)

! assign (start)
!   push g
!   y := pop
! assign (end)

! goto test
! done:
! while(end)

! write (start)
! push g
! expr := pop
! write(expr)
! write (end)

! function: main (end)

Notes:

Most control flow constructs have entry and exit points. For example, the structure of a while loop might look like this:

test:
    compute relational expression
    if false: goto done

    statements
    goto test           # Go back for another test

done:
    ... continue with program ...

An if statement is very similar. For example:

compute relational expression
if false: goto else
   
statements for true
goto next

else:

statements for false

next:
... next statement ...

Part 5 - Function calls

Next, modify the expression evaluation and statement code to produce pseudocode for a function call. This is really pretty easy---just evaluate each function argument separately (as an expression) and "store" the result. For example, if you have a function call like this:

x := foo(3,2*x+y,b) + 2

The pseudocode would look like this:

!  push 3
!  arg1 := pop
!  push 2
!  push x
!  mul
!  push y
!  add
!  arg2 := pop
!  push b
!  arg3 := pop
!  push foo(arg1,arg2,arg3)
!  push 2
!  add
!  x := pop

Intermission

By now, the general idea of what is happening should be starting to click. All of the pseudocode comments are really describing the operation of the program in some low-level sense. In fact, what you have really been doing is creating an informal intermediate representation of the program. If you were writing a production compiler, you might decide to generate some kind of intermediate representation that you could later target to many different types of machines. However, for our project, we are now going to start producing SPARC assembly code (using the pseudocode as a guide).

Part 6 - Program structure

The .s file produced by your compiler needs to split data between a text and a data segment. The text segment contains all of the machine instructions for the program. The data segment contains constants and global variables. For FLAME, the only items in the data segment are string literals and numeric constants (if necessary).

The general structure of your output with these two segments is specified in the Sun assembler as follows:

! Created by flame.py
! Yourname, CS326 (Spring 2001)

    .section     ".text"

Instructions

    .section     ".rodata"

Data

Since these two sections are generally created at the same time, the easiest way to deal with the data segment is to create a special in-memory string file for it like this:

import StringIO
data = StringIO.StringIO()

...
def emit_print(out,n):
    value = (get string literal value)
    label = new_label()
    # Drop a literal in the data segment
    print >>data, '%s:   .asciz  "%s"' % (label, value)
    ...
    # emit rest of print statement

def generate(out,top):
    ...
    print >>out,  '     .section  ".text"'
    print >>data, '     .section  ".rodata"'
    emit all of the code
    ...
    # Append the data segment at the end of the output
    print >>out, data.getvalue()

Within the assembly code created by your compiler, you will need to give names to various things. These names are often known as labels. Labels are always indicated by a name and a colon like this:

     .section ".text"

label1:
     statements
     
label2:
     statements

     .section ".rodata"

label3:
     data

label4:
     data

As a general rule, the only label names explicitly defined by a FLAME program are the names of functions. For all other labels, you should write a utility function new_label() that manufactures a new label name each time it is called. Usually these manufactured names correspond to parts of control-flow constructs such as if and while. They may also refer to global data in the data segment. Normally, these types of special labels are given names such as ".L12" or ".L23" so that they don't conflict with valid program function names (which can never start with a ".").

In addition, some labels may be declared as global so that they can be accessed externally. The .global directive is used to define a global label. All of your functions should be declared in this manner. For example:

      .global main
main:
      statements

To illustrate the use of labels, modify your emit functions to insert labels an appropriate locations. For example,

def emit_while(out,s):
    print >>out, "\n! while (start)"


    test_label = new_label()
    done_label = new_label()
    print >>out, "%s:" % test_label

    relop = (get relation expression from s)
    eval_expression(out,relop)
    print >>out, "!   relop := pop"
    print >>out, "!   if not relop: goto %s" % done_label
    
    statement = (get statement from s)
    emit_statement(out,statement)

    print >>out, "! goto %s" % test_label
    print >>out, "%s:" % done_label

    print >>out, "! while (end)"

With these changes, your output should now look like this (notice the use of .L1, .L2, and main):

! Created by flame.py
! Yourname, CS326 (Spring 2001)

    .section     ".text"

! program

! function: main (start)

        .global main
main:

! read (start)
! read (end)

! read (start)
! read (end)

! assign (start)
!   push y
!   g := pop
! assign (end)

! while (start)

.L1:

!   push x
!   push 0
!   gt
!   relop := pop
!   if not relop: goto .L2

! assign (start)
!   push x
!   g := pop
! assign (end)

! assign (start)
!   push y
!   push y
!   push x
!   div
!   push x
!   mul
!   sub
!   x := pop
! assign (end)

! assign (start)
!   push g
!   y := pop
! assign (end)

! goto .L1

.L2:

! while(end)

! write (start)
! push g
! expr := pop
! write(expr)
! write (end)

! function: main (end)

     .section ".rodata"

Part 7 - Function calls, stack frames, and registers

The general form of a subroutine on the SPARC is as follows:

funcname:
        save %sp, -96, %sp
        ...
        instructions
        ...
        ret
        restore

The first instruction (save) is used to allocate a new stack frame. The number 96 is the size of the stack frame size in bytes. The actual number you supply here will depend on the number of local variables, temporaries, and function calling conventions used by your compiler. The stack frame size must always be larger than 64 bytes and be a multiple of 8. The ret instruction is used to return from a subroutine. The restore instruction restores the stack pointer to its previous value (undoing the effects of the save instruction). (Note: the SPARC has delayed branches so the restore instruction appearing immediately after ret always executes before the ret actually takes effect).

Within the body of a function, two registers are used to hold information about the location of the stack frame. The %fp register contains the address of the top of the stack frame (the value of %sp before the save instruction executes). The %sp register contains the address of the bottom of the stack frame (after the save instruction). Please note that the stack grows downward so the value of %fp is always larger than than the value of %sp.

The first 64 bytes immediately above the stack pointer %sp are always reserved by the system for register storage. Your compiler should never access or store data in the range of memory addresses [%sp, %sp+63]!.

Local variables and temporaries are stored below the frame pointer. For example, if you had a FLAME function like this:

fun foo()
   x : int;
   y : float;
   a : int;
   b : int[20];
   begin
        ...
   end

You would need to allocate at least 4+4+4+4*20=92 bytes of memory for local variables. Thus, the start of your procedure might look like this:

foo:
     save %sp, -160, %sp   
     ...

The value of 160 is formed by taking the 92 bytes of memory needed for local variables, adding 64 bytes for register storage, and adding 4 bytes of padding to make the value a multiple of 8. Of course, if your procedure had additional temporary variables, you would need to allocate extra space for them as well.

The local variables themselves might be stored in the following memory locations:

              
x:int          -> [%fp - 4]              
y:float        -> [%fp - 8]
a:int          -> [%fp - 12]
b:int[20]      -> [%fp - 92]

                  _____________  < %fp
                 |   x:int     |
                 |-------------| < %fp - 4
                 |   y:float   |
                 |-------------| < %fp - 8
                 |   a:int     | 
                 |-------------| < %fp - 12
                 |   b[19]     |
                 |-------------|
                 |   b[18]     |
                 |-------------|    
                //            //
                 |-------------|
                 |   b[0]      |
                 |-------------| < %fp - 92
                 |    PAD      |
                 |-------------| < %fp - 96, %sp+64
                 |             |
                 |  RESERVED   |
                 |             |
                 |-------------| < %sp,  %fp - 160

Within each procedure, the following CPU registers may be used for temporary calculations.

%l0 - %l7. Locals.
%i0 - %i5. Inputs.
%o0 - %o5. Outputs.
%g0 - %g7. Globals.

Every procedure automatically gets its own set of private local registers %l0-%l7 (the save takes care of this). These registers are only visible to the current procedure and are never destroyed or modified by other subroutines. Your compiler should use these registers to perform expression evaluation and other intermediate calculations.

The input registers %i0-%i5 are commonly used to hold the first 6 input arguments to a function.

The output registers %o0-%o5 are used to hold output parameters when a procedure wants to call another subroutine. There is a special relationship between the %i and %o registers---specifically, whenever a procedure call is made, the output registers in the caller turn into the input registers in the called procedure. This shifting is really due to the behavior of SPARC register windows (described later).

The global registers %g0-%g7 are shared by all procedures and should only be used for temporary calculations such as computing memory addresses. The register %g0 is hardwired to the value 0. Registers %g4-%g7 are reserved by the operating system and should generally be avoided by your compiler.

The %i6 register is the same as the frame pointer %fp. The %o6 register is the same as the stack pointer %sp. The %i7 register contains the return address of a subroutine (the value of the PC that will be restored when a procedure returns).

Specific instructions:

Modify your code generator so that it produces proper function entry and exit code for each function. For starters, hardwire your compiler so that it produces the following code:
```
funcname:  
       save %sp, -128, %sp
       ...
.Ln:
       ret
       restore
```
The special .Ln label at the end is used to label the location of the function return. You will be using this a little later.

For the main() function, you need to need to add some special processing to terminate the running program. In this case, modify the exit code as follows:

.Ln:
        mov 0, %o0      ! only appears in main
        call _exit      ! only appears in main
        nop             ! only appears in main
        ret
        restore

The FLAME runtime library

A number of statements in the FLAME specification are used for I/O. Specifically, read(), write(), and print(). To implement these statements, a small runtime library is provided in the file flame.o. The functions in this library are as follows (C prototypes).

void  flwritei(int x);             /* Write an integer */
void  flwritef(float f);           /* Write a float */
int   flreadi();                   /* Read an integer */
float flreadf();                   /* Read a float */
void  flprint(char *s);            /* Print a string */

To call these functions from assembly language, you will need to use the following code sequences:

! call flwritei(int)
         mov  val, %o0
         call flwritei
         nop

! call flwritef(float)
         mov val, %o0
         call flwritef
         nop

! call flreadi()
         call flreadi
         nop
         st  %o0, result

! call flreadf()
         call flreadf
         nop
         st %o0, result

! call flprint()
         sethi %hi(addr), %o0
         or    %o0, %lo(addr), %o0
         call flprint
         nop

In these example, val is assumed to be a register, and result is assumed to be a memory location. addr is the address of the string literal in the program data segment. See the next section.

Part 8 - Hello World

Whew, now it's time for you to compile the traditional hello world program. Here is the FLAME program:

fun main()
    begin
        print("Hello World\n")
    end

To do it, implement the emit function for the print statement by making it produce the following code:

In the data segment (see Part 6), write a declaration like this:
```
.Ln:       .asciz    "Hello World\n"
```

Now, in the text segment, write the following assembly code

         sethi %hi(.Ln), %o0
         or    %o0, %lo(.Ln), %o0
         call  flprint
         nop

The label name .Ln should be manufactured by your compiler. It doesn't matter what it is as long as its not the same as any other label.

Now, let's try out your hello world program:

% flame.py hello.flm
% cat hello.s
! Created by flame.py
! Yourname, CS326 (Spring 2001)

     .section ".text"

! program

! function: main (start)

        .global main
main:
       save %sp, -128, %sp
       sethi %hi(.L2), %o0
       or    %o0, %lo(.L2), %o0
       call  flprint
       nop

.L1: 
       mov  0, %o0
       call _exit
       nop
       ret
       restore

! function: main (end)
       
     .section ".rodata"

.L2:   .asciz  "Hello World\n"
% as hello.s
% ld hello.o flame.o -lc
% a.out
Hello World
%

Congratulations, you have just compiled your first runnable FLAME program.

Part 9 - Storage allocation for local variables

Now, it's time for you to tackle the problem of storage management for local variables.

The local variables to each function should be contained in some kind of list or parse tree structure created back in Project 2/3. In this part of the project, you will walk through this list and figure out where the variable is suppose to be stored on the function stack frame. Recall that the local variables are usually stored at the top of the stack frame (right below the %fp).

First, you need to know the size of each datatype:

int           4 bytes (32 bits)
float         4 bytes (32 bits)
int[n]        4*n bytes
float[n]      4*n bytes

Now, write a function allocate_locals() that iterates over the local variable list and assigns a frame offset to each variable. The frame offset should be stored in the symbol table entry for each local. For example, if you had the function,

fun main()
   x : int;
   y : float;
   a : int;
   b : int[20];
   begin
        ...
   end

you might assign the following offset values (see Part 7):

x.symtab.frame_offset = -4      [%fp - 4]
y.symtab.frame_offset = -8      [%fp - 8]
a.symtab.frame_offset = -12     [%fp - 12]
b.symtab.frame_offset = -92     [%fp - 92]

The allocate_locals() function should return the total number of bytes needed to store the local variables.

Finally, make the following modifications to your code generator:

In the code for emitting a function, call allocate_locals() to figure out how much local variable space is needed and to assign frame offsets to the local variables.
Modify the code that prints the "save" statement to produce an accurately computed frame size. The frame size is equal to 64 plus the returned value of allocate_locals() rounded up to the nearest multiple of 8. For the above function you would produce save %sp, -160, %sp instead of the hardwired save %sp, -128, %sp that you used for the hello world program.

Part 10 - Simple Expression Evaluation

In Part 3, you generated pseudocode for expression evaluation using a stack. Now, it's time to translate this to real hardware. Doing this isn't so difficult if you think about the problem in the right frame of mind.

The easiest way to translate the stack pseudocode to a real implementation is to treat the %l0-%l7 registers as always representing the top 8 elements of the expression stack. Thus, if you wanted to evaluate the expression 2+3*4-9, it might be evaluated as follows:

SPARC assembly                      Pseudocode
----------------------------------------------
mov   2, %l0                        push 2
mov   3, %l1                        push 3
mov   4, %l2                        push 4
mov   %l1,%o0
call  .mul                          multiply
mov   %l2,%o1
mov   %o0, %l1
add   %l0,%l1,%l0                   add
mov   9, %l1                        push 9
sub   %l0,%l1,%l0                   sub
st    %l0, value                    val := pop

To implement this, create a couple of functions that implement the functionality of a stack machine. For example, your implementation might have the following functions or methods:

push(). Prepares to push an item on the stack. Returns a register name where the data should be placed.
pop(). Pops an item off of the stack. Returns a register name where the data is located. If the requested item has been stored in memory, it should be restored and placed in a register.

Here is a short example of how you would use these functions to implement the example above:

print >>out,"mov 2, %s" % push()
print >>out,"mov 3, %s" % push()
print >>out,"mov 4, %s" % push()
r = pop()
l = pop()
print >>out,"mov %s, %%o0" % l
print >>out,"call .mul"
print >>out,"mov %s, %%o1" % r
print >>out,"mov %%o0, %s" % push()
r = pop()
l = pop()
print >>out,"add %s, %s, %s" % (l,r,push())
print >>out,"mov 9, %s" % push()
r = pop()
l = pop()
print >>out,"sub %s, %s, %s" % (l,r,push())
result = pop()
print >>out,"st %s, %s" % (result,val)

One problem with the stack implementation is that once more than 8 values are on the stack, old values will have to be spilled to memory in order to accomodate new values. To handle memory spills, create a temporary variable region that is located below all of the local variables in the function stack frame. Now, modify the push() and pop() functions as follows:

If there are no more registers, push() should produce code that stores the oldest register to memory and increments a temporary variable counter. The old register name should then be made available.
If a pop() operation is performed, but the requested item is no longer stored in a register, you will need to emit some code to load the old value from memory back into a register.

The register spill behavior can be seen in the following example. Suppose you had this expression:

1+(2+(3+(4+(5+(6+(7+(8+(9+10))))))))

The SPARC assembly code for evaluating this on a stack might look like this:

mov  1, %l0                ! push 1
mov  2, %l1                ! push 2
mov  3, %l2                ! push 3
mov  4, %l3                ! push 4
mov  5, %l4                ! push 5
mov  6, %l5                ! push 6
mov  7, %l6                ! push 7
mov  8, %l7                ! push 8
st   %l0, [%fp - 160]      ! spill %l0 (1)
mov  9, %l0                ! push 9
st   %l1, [%fp - 164]      ! spill %l1 (2)
mov  10, %l1               ! push 10
add  %l0,%l1,%l0           ! (9+10)
add  %l7,%l0,%l7           ! 8+(9+10)
add  %l6,%l7,%l6           ! 7+(8+(9+10))
add  %l5,%l6,%l5           ! 6+(7+(8+(9+10)))
add  %l4,%l5,%l4           ! 5+(6+(7+(8+(9+10))))
add  %l3,%l4,%l3           ! 4+(5+(6+(7+(8+(9+10)))))
add  %l2,%l3,%l2           ! 3+(4+(5+(6+(7+(8+(9+10))))))
ld   [%fp - 164], %l1      ! recover 2
add  %l1, %l2, %l1         ! 2+(3+(4+(5+(6+(7+(8+(9+10)))))))
ld   [%fp - 160], %l0      ! recover 1
add  %l0, %l1, %l0         ! 1+(2+(3+(4+(5+(6+(7+(8+(9+10))))))))

You implementation will have to keep track of the maximum number of spilled variables during expression evaluation. This number will have to be added to the stack frame size given to the save statement.

Here are a few general code generation rules you will need to use to evaluate expressions:

Integer constants. If the value is in the range -4095 to 4095, use the following to push a value on the stack:
```
mov  constant, %ln
```
If the value has absolute value greater than 4095, do the following:
```
sethi %hi(.Lm), %g1
or    %g1, %lo(.Lm), %g1
ld    [%g1], %ln
...
       .section rodata

.Lm:    .word    constant
```
Floating point constants. Follow the same procedure as for integer constants greater than 4095. However, instead of declaring a .word, do this:
```
sethi %hi(.Lm), %g1
or    %g1, %lo(.Lm), %g1
ld    [%g1], %ln
...
       .section rodata

.Lm:    .float    floatconstant
```
Local variables and arguments. Generate code like this:
```
ld  [%fp + offset], %ln
```
where offset is the frame pointer offset (stored in the symbol table) for the local variable and %ln is the register corresponding to the top of the expression stack.
Binary operators. You will apply an operator to the top two items on the stack and place the result on the top of the stack. The SPARC only has a limited number of binary operations. In fact, it doesn't even support multiply or division. Here are the instruction sequences you will need to implement integer arithmetic:
- Addition (+), Subtraction (-)
```
add %li, %lj, %lk     ! %li + %lj -> %lk
sub %li, %lj, %lk     ! %li - %lj -> %lk
```
- Multiplication (*), Division (/)
```
mov %li, %o0          ! %li * %lj -> %lk
call .mul
mov %lj, %o1          ! Note: this is in branch delay slot
mov %o0, %lk

mov %li, %o0          ! %li / %lj -> %lk
call .div
mov %lj, %o1          ! Note: this is in branch delay slot
mov %o0, %lk
```

Relational Expressions. Depending on your implementation, you may find it easier to implement relation operators as integer operators with a 0 representing false and 1 represeting true. The following instructions can be used to evaluate various relational operators:

bitwise and, or

and %li, %lj, %lk     ! %li and %lj -> %lk
or %li, %lj, %lk      ! %li or  %lj -> %lk

<, <=, > >=, ==, !=

! <
   cmp %li, %lj       ! if %li < %lj then %lk = 1 else %lk = 0
   bl .Lm             ! branch less than
   mov 1, %lk         ! branch delay (always executed)
   mov 0, %lk
.Lm:

! <=
   cmp %li, %lj       ! if %li <= %lj then %lk = 1 else %lk = 0
   ble .Lm            ! branch less than or equal
   mov 1, %lk         ! branch delay (always executed)
   mov 0, %lk
.Lm:

! >
   cmp %li, %lj       ! if %li > %lj then %lk = 1 else %lk = 0
   bg .Lm             ! branch greater than
   mov 1, %lk         ! branch delay (always executed)
   mov 0, %lk
.Lm:

! >=
   cmp %li, %lj       ! if %li >= %lj then %lk = 1 else %lk = 0
   bge .Lm            ! branch greater than
   mov 1, %lk         ! branch delay (always executed)
   mov 0, %lk
.Lm:

! ==
   cmp %li, %lj       ! if %li == %lj then %lk = 1 else %lk = 0
   be .Lm             ! branch greater than
   mov 1, %lk         ! branch delay (always executed)
   mov 0, %lk
.Lm:

! !=
   cmp %li, %lj       ! if %li != %lj then %lk = 1 else %lk = 0
   bne .Lm            ! branch greater than
   mov 1, %lk         ! branch delay (always executed)
   mov 0, %lk
.Lm:

Unary operators. Simply apply the operator the top of the stack.
- Unary (-)
```
neg %li, %lj    ! -%li -> %lk
```
- Logical not (assuming 1=true, 0=false)
```
xor %li, 1, %lj    ! not %li -> %lk
```

Part 11 - Implement simple assignment

Implement simple variable assignment for integers. Specifically, the statement

a := expr

Gets translated into code like this,

! evaluate expression
    ...
! assignment
    st %lm, [%fp + offset]

where %lm is the register containing the result of the expression and offset is the frame offset of the variable.

Part 12 - Implement write

Implement the write(expr) statement so you can test your expression evaluator. This function requires the use of the FLAME library. Specifically, the code will look like this,

! evaluate expression
...
! write
          mov  %lm, %o0
          call flwritei
          nop

where %lm is the register containing the result of the expression given to write.

Test out your write() statement and assignment functions by compiling a few simple programs such as this:

fun main()
   a : int;
   b : int;
   c : int;
   begin
         a := 3;
         b := a + 10;
         c := b * 2;
         write(c)
   end

Part 13 - Control flow

Implement the basic control flow statements while and if-else. The general form of a while statement is as follows:

! while statement

.Lm:
! Evaluate relop
      ...

      cmp  %li, %g0           ! %li contains result of relop
      be   .Ln                ! == 0, false, exit
      nop

      ...
      statements
      ...
      ba  .Lm
      nop

.Ln:

The general form of a if-else-statement is as follows:

! if-else statement

! Evaluate relop
      ...
      cmp  %li, %g0           ! %li contains result of relop
      be   .Lm                ! == 0, false, goto else
      nop

! then part
      ...
      statements
      ...
      ba .Ln                ! skip else part
      nop

! else part
.Lm:
      ...
      statements
      ...

.Ln:

Part 14 - Arrays

When arrays are allocated on the stack, the frame offset refers to the location of the first element of the array. For example, an array int[100] might occupy the memory region [%fp - 600, %fp - 200).

To access an array element, you will need to do three things:

Compute the array index value.
Compute the array address.
Load the array value.

For example, to evaluate a[expr], you will have to generate code similar to this (note: there are many ways to do this):

! Evaluate expr
       ...
! compute offset
       sll  %lm, 2, %lm           ! Multiply by 4 (left shift 2 bits)
       add  %fp, %lm, %lm         ! Add to frame pointer
       ld   [%lm + offset], %ln

In addition, you may want to put arrays bounds checking in this code as well.

Storing to an array is similar. You will first need to compute the array index and memory address before a value can be stored.

Note: When evaluating expressions such as a + b[4*i-2] + c, the computation of the array index will use the same expression stack as the outer expression. Just make sure your stack allocator allows for nested expression evaluations and everything should work out.

When arrays are passed as parameters to function, they should be passed by reference (e.g., a pointer to the first element of the array is passed instead of a copy of the array itself).

Part 15 - Making function calls

To make a function call, you need to do the following:

Evaluate the expression for each function argument and store the results on the expression stack.
Pop the arguments off the expression stack and store in the appropriate %on registers (or stack).
Make the subroutine call using the call instruction.

When building the function arguments, you will have to push their values onto the expression stack until you are ready to make the complete function call. Thus, if you had the call:

foo(a1,a2,a3,a4)

The code you generate will roughly following this prototype:

evaluate a1       expression stack: a1
evaluate a2       expression stack: a2 a1                      
evaluate a3       expression stack: a3 a2 a1
evaluate a4       expression stack: a4 a3 a2 a1

! Make actual function call
t = pop()
store t, arg4
t = pop()
store t, arg3
t = pop()
store t, arg2
t = pop()
store t, arg1

call foo

Part 16 - Floating point

Defer to Project 5.

Part 17 - Nested functions

To implement nested functions you should use the technique of "displays" as described in chapter 7 of the book. In a nutshell:

For each identifier, stored a nesting level in the symbol table. This is an integer that defines the amount of nesting (0 for outer most functions).

Create a global array in the .data segment for the display. For example:

      .section ".data"

DISPLAY:   .word    0, 0, 0, 0, 0, 0, 0, 0, 0, 0
                   ! number of nesting levels here

Given a function with nesting level i, generate code so that the following steps are performed:
- The old value of DISPLAY[i] is saved to a temporary variable (on the stack)
- The value of the current %fp is written to DISPLAY[i].
- The old value of DISPLAY[i] is restored prior to function exit.
To look up a variable with nesting level i, simply use the value of DISPLAY[i] to get the corresponding frame pointer. Then reference the variable using the frame offset stored in the symbol table.

Implementation hints

It is not necessary to be an expert at assembly language to complete this part of the project. Most of what you will be doing is printing snap snippets of assembly instructions that are strung together to make a program.
When generating function code, it can be difficult to to know the exact stack frame size until you have generated all of the code. One way to deal with this is to print all of the function body code to a string and to add the stack frame allocation at the end. For instance, you might do this:
```
       import StringIO
       f = StringIO.StringIO()       # Create an in-memory file object
       
       emit_function_body(f,node)

       print >>outf,"! function (start)"
       print >>outf,"%s:" % function_name
       print >>outf,"     save %%sp, -%d, %%sp" % frame_size
       print >>outf, f.getvalue()
```
Of course, there is more than one way to do this.
To deal with register allocation, I found it easier to write a Register class with two methods push() and pop(). Within the class, I maintained state related to the current expression stack pointer, which registers were in use, which registers had been spilled to memory and so forth. Note: the expression stack has nothing to do with the call stack.

Handin procedures

Your compiler must be runnable from a file 'flame.py'. We must be able to run this file as follows:
```
% python flame.py testname.flm
% a.out
```
Make sure you have a README file that includes your name and anything notable about your implementation.
Create a Unix tar file for your project. This tar file should contain the 'flame' directory and be based on your login name. For example:
```
% tar -cf yourlogin.tar flame
```
On classes.cs.uchicago.edu, copy your tar file to the directory /stage/classes/current/CS326/handin/project4 For example:
```
% cp yourlogin.tar /stage/classes/current/CS326/handin/project4
```
Late assignments are not accepted. The handin directory will be closed after the due date.

IF WE CAN NOT UNPACK OR RUN YOUR HANDIN USING THE SUPPLIED TEST SCRIPTS, YOU WILL RECEIVE NO CREDIT!