Well, if you've made it this far, your compiler is well on its way to producing code. In this project, you will be extending your compiler to produce runnable assembly code for use on Sun SPARC workstations. By the end of this project, your compiler will finally be able to create runnable FLAME programs.
DO NOT WAIT UNTIL THE LAST MINUTE TO START THIS PROJECT!
If your compiler is still not parsing correct programs, it would be a very good idea to get that working as soon as possible. Please see Ian or myself.
For example, if a user supplies a FLAME program hello.flm like this:
The output of your compiler will be a file hello.s that looks something like this:fun main() begin print("Hello World\n") end
This file will then be assembled with the Sun assembler and linked with a small FLAME runtime library to produce an executable. For example:! generated by flame.py .section ".text" ! function: main .global main .type main,2 main: save %sp, 64, %sp sethi %hi(.L1), %o0 or %o0, %lo(.L1), %o0 call flprint nop call _exit nop .section ".rodata" .L1: .asciz "Hello World\n"
This project assumes no prior experience with SPARC assembly language (details follow). However, you will have to do a considerable amount of experimentation to make sure things work correctly.% python flame.py hello.flm % as hello.s % ld hello.o flame.o -lc % a.out Hello World %
import sys import os.path filename = sys.argv[1] outname = os.path.splitext(filename)[0] + ".s"
The output of your parser should be the root node of a well-formed syntax tree representing the contents of the input program. If a syntax error or type error occurred during parsing, you may want to have your parser return None to indicate an error.f = open(filename) data = f.read() f.close() import flameparse top = flameparse.parse(data)
def generate(file,top): print >>file, "! Created by flame.py" print >>file, "! Yourname, CS326 (Spring 2001)"
top = flameparse.parse(data) if top: import flamegen outf = open(outfile,"w") flamegen.generate(outf,top) outf.close()
Congratulations, your compiler just produced its first output file.% python flame.py simple.flm % more simple.s ! Created by flame.py ! Yourname, CS326 (Spring 2001) %
To walk the tree, create a collection of functions that try to print information about each statement in the source program. For example (note: you will have to modify this code to match your syntax tree representation):
For each statement in your program, make the corresponding emit function print a starting and ending marker as shown in the example. Note: lines starting with "!" are comments in assembler.def emit_program(out,top): print >>out,"\n! program" funclist = (get list of functions) for f in funclist: emit_function(out,f) def emit_function(out,func): print >>out,"\n! function: %s (start) " % func.name ... statements = (get list of statements) emit_statements(out,statements) ... print >>out,"! function: %s (end) " % func.name def emit_statements(out,statements): for s in statements: emit_statement(out,s) def emit_statement(out,s): if s.name == 'print': emit_print(out,s) elif s.name == 'read': emit_read(out,s) elif s.name == 'write': emit_write(out,s): elif s.name == 'while': emit_while(out,s): ... def emit_print(out,s): print >>out, "\n! print (start)" ... print >>out, "! print (end)" def emit_while(out,s): print >>out, "\n! while (start)" ... statement = (get body of while) emit_statement(out,statement) ... print >>out, "! while (end)"
Now, modify your generate() function to walk through a program and print information about each function and individual statements. For example, if you supplied the following program:
The output of your compiler should look roughly like this:/* Compute GCD */ fun main() x: int; y: int; g: int; begin read(x); read(y); g := y; while x > 0 do begin g := x; x := y - (y/x)*x; y := g end; write(g) end
Notes:! Created by flame.py ! Yourname, CS326 (Spring 2001) ! program ! function: main (start) ! read (start) ! read (end) ! read (start) ! read (end) ! assign (start) ! assign (end) ! while (start) ! assign (start) ! assign (end) ! assign (start) ! assign (end) ! assign (start) ! assign (end) ! while(end) ! write (start) ! write (end) ! function: main (end)
Write a function eval_expression() that produces pseudocode for expression evaluation using this technique. For example:push 2 # Stack : 2 push 3 # Stack : 3 2 push 4 # Stack : 4 3 2 mutiply # Stack : 12 2 add # Stack : 14 push 9 # Stack : 9 14 sub # Stack : 5 val := pop # Stack : # val gets the value 5
In addition, produce similar pseudocode for the relational operations and boolean operators (and,or,not).def eval_expression(out,expr): if expr.name == 'number': print >>out, "! push", expr.value elif expr.name == 'id': print >>out, "! push", expr.value elif expr.name == '+': left = expr.children[0] right = expr.children[1] eval_expression(out,left) eval_expression(out,right) print >>out, "! add" elif expr.name == '-': left = expr.children[0] right = expr.children[1] eval_expression(out,left) eval_expression(out,right) print >>out, "! sub" ...
Now, modify the emit functions for the statements involving expressions to produce expression pseudocode. For example:
With these modifications, the output of your compiler will look something like this:def emit_write(out,s): print >>out, "\n! write (start)" expr = (get expr from s) eval_expression(out,expr) print >>out, "! expr := pop" print >>out, "! write(expr)" print >>out, "! write (end)"
Notes:! Created by flame.py ! Yourname, CS326 (Spring 2001) ! program ! function: main (start) ! read (start) ! read (end) ! read (start) ! read (end) ! assign (start) ! push y ! g := pop ! assign (end) ! while (start) ! push x ! push 0 ! gt ! relop := pop ! assign (start) ! push x ! g := pop ! assign (end) ! assign (start) ! push y ! push y ! push x ! div ! push x ! mul ! sub ! x := pop ! assign (end) ! assign (start) ! push g ! y := pop ! assign (end) ! while(end) ! write (start) ! push g ! expr := pop ! write(expr) ! write (end) ! function: main (end)
! push 2 ! push 4 ! push i ! mul ! push 10 ! add ! index := pop ! push a[index] ! add ! x := pop
With these changes, your program should now look like this:def emit_while(out,s): print >>out, "\n! while (start)" print >>out, "! test:" relop = (get relation expression from s) eval_expression(out,relop) print >>out, "! relop := pop" print >>out, "! if not relop: goto done" statement = (get statement from s) emit_statement(out,statement) print >>out, "! goto test" print >>out, "! done:" print >>out, "! while (end)"
Notes:! Created by flame.py ! Yourname, CS326 (Spring 2001) ! program ! function: main (start) ! read (start) ! read (end) ! read (start) ! read (end) ! assign (start) ! push y ! g := pop ! assign (end) ! while (start) ! test: ! push x ! push 0 ! gt ! relop := pop ! if not relop: goto done ! assign (start) ! push x ! g := pop ! assign (end) ! assign (start) ! push y ! push y ! push x ! div ! push x ! mul ! sub ! x := pop ! assign (end) ! assign (start) ! push g ! y := pop ! assign (end) ! goto test ! done: ! while(end) ! write (start) ! push g ! expr := pop ! write(expr) ! write (end) ! function: main (end)
test: compute relational expression if false: goto done statements goto test # Go back for another test done: ... continue with program ...
compute relational expression if false: goto else statements for true goto next else: statements for false next: ... next statement ...
The pseudocode would look like this:x := foo(3,2*x+y,b) + 2
! push 3 ! arg1 := pop ! push 2 ! push x ! mul ! push y ! add ! arg2 := pop ! push b ! arg3 := pop ! push foo(arg1,arg2,arg3) ! push 2 ! add ! x := pop
The general structure of your output with these two segments is specified in the Sun assembler as follows:
Since these two sections are generally created at the same time, the easiest way to deal with the data segment is to create a special in-memory string file for it like this:! Created by flame.py ! Yourname, CS326 (Spring 2001) .section ".text" Instructions .section ".rodata" Data
Within the assembly code created by your compiler, you will need to give names to various things. These names are often known as labels. Labels are always indicated by a name and a colon like this:import StringIO data = StringIO.StringIO() ... def emit_print(out,n): value = (get string literal value) label = new_label() # Drop a literal in the data segment print >>data, '%s: .asciz "%s"' % (label, value) ... # emit rest of print statement def generate(out,top): ... print >>out, ' .section ".text"' print >>data, ' .section ".rodata"' emit all of the code ... # Append the data segment at the end of the output print >>out, data.getvalue()
As a general rule, the only label names explicitly defined by a FLAME program are the names of functions. For all other labels, you should write a utility function new_label() that manufactures a new label name each time it is called. Usually these manufactured names correspond to parts of control-flow constructs such as if and while. They may also refer to global data in the data segment. Normally, these types of special labels are given names such as ".L12" or ".L23" so that they don't conflict with valid program function names (which can never start with a ".")..section ".text" label1: statements label2: statements .section ".rodata" label3: data label4: data
In addition, some labels may be declared as global so that they can be accessed externally. The .global directive is used to define a global label. All of your functions should be declared in this manner. For example:
.global main main: statements
To illustrate the use of labels, modify your emit functions to insert labels an appropriate locations. For example,
With these changes, your output should now look like this (notice the use of .L1, .L2, and main):def emit_while(out,s): print >>out, "\n! while (start)" test_label = new_label() done_label = new_label() print >>out, "%s:" % test_label relop = (get relation expression from s) eval_expression(out,relop) print >>out, "! relop := pop" print >>out, "! if not relop: goto %s" % done_label statement = (get statement from s) emit_statement(out,statement) print >>out, "! goto %s" % test_label print >>out, "%s:" % done_label print >>out, "! while (end)"
! Created by flame.py ! Yourname, CS326 (Spring 2001) .section ".text" ! program ! function: main (start) .global main main: ! read (start) ! read (end) ! read (start) ! read (end) ! assign (start) ! push y ! g := pop ! assign (end) ! while (start) .L1: ! push x ! push 0 ! gt ! relop := pop ! if not relop: goto .L2 ! assign (start) ! push x ! g := pop ! assign (end) ! assign (start) ! push y ! push y ! push x ! div ! push x ! mul ! sub ! x := pop ! assign (end) ! assign (start) ! push g ! y := pop ! assign (end) ! goto .L1 .L2: ! while(end) ! write (start) ! push g ! expr := pop ! write(expr) ! write (end) ! function: main (end) .section ".rodata"
The first instruction (save) is used to allocate a new stack frame. The number 96 is the size of the stack frame size in bytes. The actual number you supply here will depend on the number of local variables, temporaries, and function calling conventions used by your compiler. The stack frame size must always be larger than 64 bytes and be a multiple of 8. The ret instruction is used to return from a subroutine. The restore instruction restores the stack pointer to its previous value (undoing the effects of the save instruction). (Note: the SPARC has delayed branches so the restore instruction appearing immediately after ret always executes before the ret actually takes effect).funcname: save %sp, -96, %sp ... instructions ... ret restore
Within the body of a function, two registers are used to hold information about the location of the stack frame. The %fp register contains the address of the top of the stack frame (the value of %sp before the save instruction executes). The %sp register contains the address of the bottom of the stack frame (after the save instruction). Please note that the stack grows downward so the value of %fp is always larger than than the value of %sp.
The first 64 bytes immediately above the stack pointer %sp are always reserved by the system for register storage. Your compiler should never access or store data in the range of memory addresses [%sp, %sp+63]!.
Local variables and temporaries are stored below the frame pointer. For example, if you had a FLAME function like this:
You would need to allocate at least 4+4+4+4*20=92 bytes of memory for local variables. Thus, the start of your procedure might look like this:fun foo() x : int; y : float; a : int; b : int[20]; begin ... end
The value of 160 is formed by taking the 92 bytes of memory needed for local variables, adding 64 bytes for register storage, and adding 4 bytes of padding to make the value a multiple of 8. Of course, if your procedure had additional temporary variables, you would need to allocate extra space for them as well.foo: save %sp, -160, %sp ...
The local variables themselves might be stored in the following memory locations:
Within each procedure, the following CPU registers may be used for temporary calculations.x:int -> [%fp - 4] y:float -> [%fp - 8] a:int -> [%fp - 12] b:int[20] -> [%fp - 92] _____________ < %fp | x:int | |-------------| < %fp - 4 | y:float | |-------------| < %fp - 8 | a:int | |-------------| < %fp - 12 | b[19] | |-------------| | b[18] | |-------------| // // |-------------| | b[0] | |-------------| < %fp - 92 | PAD | |-------------| < %fp - 96, %sp+64 | | | RESERVED | | | |-------------| < %sp, %fp - 160
The input registers %i0-%i5 are commonly used to hold the first 6 input arguments to a function.
The output registers %o0-%o5 are used to hold output parameters when a procedure wants to call another subroutine. There is a special relationship between the %i and %o registers---specifically, whenever a procedure call is made, the output registers in the caller turn into the input registers in the called procedure. This shifting is really due to the behavior of SPARC register windows (described later).
The global registers %g0-%g7 are shared by all procedures and should only be used for temporary calculations such as computing memory addresses. The register %g0 is hardwired to the value 0. Registers %g4-%g7 are reserved by the operating system and should generally be avoided by your compiler.
The %i6 register is the same as the frame pointer %fp. The %o6 register is the same as the stack pointer %sp. The %i7 register contains the return address of a subroutine (the value of the PC that will be restored when a procedure returns).
Specific instructions:
The special .Ln label at the end is used to label the location of the function return. You will be using this a little later.funcname: save %sp, -128, %sp ... .Ln: ret restore
.Ln: mov 0, %o0 ! only appears in main call _exit ! only appears in main nop ! only appears in main ret restore
To call these functions from assembly language, you will need to use the following code sequences:void flwritei(int x); /* Write an integer */ void flwritef(float f); /* Write a float */ int flreadi(); /* Read an integer */ float flreadf(); /* Read a float */ void flprint(char *s); /* Print a string */
In these example, val is assumed to be a register, and result is assumed to be a memory location. addr is the address of the string literal in the program data segment. See the next section.! call flwritei(int) mov val, %o0 call flwritei nop ! call flwritef(float) mov val, %o0 call flwritef nop ! call flreadi() call flreadi nop st %o0, result ! call flreadf() call flreadf nop st %o0, result ! call flprint() sethi %hi(addr), %o0 or %o0, %lo(addr), %o0 call flprint nop
To do it, implement the emit function for the print statement by making it produce the following code:fun main() begin print("Hello World\n") end
.Ln: .asciz "Hello World\n"
sethi %hi(.Ln), %o0 or %o0, %lo(.Ln), %o0 call flprint nop
Now, let's try out your hello world program:
Congratulations, you have just compiled your first runnable FLAME program.% flame.py hello.flm % cat hello.s ! Created by flame.py ! Yourname, CS326 (Spring 2001) .section ".text" ! program ! function: main (start) .global main main: save %sp, -128, %sp sethi %hi(.L2), %o0 or %o0, %lo(.L2), %o0 call flprint nop .L1: mov 0, %o0 call _exit nop ret restore ! function: main (end) .section ".rodata" .L2: .asciz "Hello World\n" % as hello.s % ld hello.o flame.o -lc % a.out Hello World %
The local variables to each function should be contained in some kind of list or parse tree structure created back in Project 2/3. In this part of the project, you will walk through this list and figure out where the variable is suppose to be stored on the function stack frame. Recall that the local variables are usually stored at the top of the stack frame (right below the %fp).
First, you need to know the size of each datatype:
Now, write a function allocate_locals() that iterates over the local variable list and assigns a frame offset to each variable. The frame offset should be stored in the symbol table entry for each local. For example, if you had the function,int 4 bytes (32 bits) float 4 bytes (32 bits) int[n] 4*n bytes float[n] 4*n bytes
you might assign the following offset values (see Part 7):fun main() x : int; y : float; a : int; b : int[20]; begin ... end
The allocate_locals() function should return the total number of bytes needed to store the local variables.x.symtab.frame_offset = -4 [%fp - 4] y.symtab.frame_offset = -8 [%fp - 8] a.symtab.frame_offset = -12 [%fp - 12] b.symtab.frame_offset = -92 [%fp - 92]
Finally, make the following modifications to your code generator:
The easiest way to translate the stack pseudocode to a real implementation is to treat the %l0-%l7 registers as always representing the top 8 elements of the expression stack. Thus, if you wanted to evaluate the expression 2+3*4-9, it might be evaluated as follows:
SPARC assembly Pseudocode ---------------------------------------------- mov 2, %l0 push 2 mov 3, %l1 push 3 mov 4, %l2 push 4 mov %l1,%o0 call .mul multiply mov %l2,%o1 mov %o0, %l1 add %l0,%l1,%l0 add mov 9, %l1 push 9 sub %l0,%l1,%l0 sub st %l0, value val := pop
To implement this, create a couple of functions that implement the functionality of a stack machine. For example, your implementation might have the following functions or methods:
One problem with the stack implementation is that once more than 8 values are on the stack, old values will have to be spilled to memory in order to accomodate new values. To handle memory spills, create a temporary variable region that is located below all of the local variables in the function stack frame. Now, modify the push() and pop() functions as follows:print >>out,"mov 2, %s" % push() print >>out,"mov 3, %s" % push() print >>out,"mov 4, %s" % push() r = pop() l = pop() print >>out,"mov %s, %%o0" % l print >>out,"call .mul" print >>out,"mov %s, %%o1" % r print >>out,"mov %%o0, %s" % push() r = pop() l = pop() print >>out,"add %s, %s, %s" % (l,r,push()) print >>out,"mov 9, %s" % push() r = pop() l = pop() print >>out,"sub %s, %s, %s" % (l,r,push()) result = pop() print >>out,"st %s, %s" % (result,val)
The SPARC assembly code for evaluating this on a stack might look like this:1+(2+(3+(4+(5+(6+(7+(8+(9+10))))))))
You implementation will have to keep track of the maximum number of spilled variables during expression evaluation. This number will have to be added to the stack frame size given to the save statement.mov 1, %l0 ! push 1 mov 2, %l1 ! push 2 mov 3, %l2 ! push 3 mov 4, %l3 ! push 4 mov 5, %l4 ! push 5 mov 6, %l5 ! push 6 mov 7, %l6 ! push 7 mov 8, %l7 ! push 8 st %l0, [%fp - 160] ! spill %l0 (1) mov 9, %l0 ! push 9 st %l1, [%fp - 164] ! spill %l1 (2) mov 10, %l1 ! push 10 add %l0,%l1,%l0 ! (9+10) add %l7,%l0,%l7 ! 8+(9+10) add %l6,%l7,%l6 ! 7+(8+(9+10)) add %l5,%l6,%l5 ! 6+(7+(8+(9+10))) add %l4,%l5,%l4 ! 5+(6+(7+(8+(9+10)))) add %l3,%l4,%l3 ! 4+(5+(6+(7+(8+(9+10))))) add %l2,%l3,%l2 ! 3+(4+(5+(6+(7+(8+(9+10)))))) ld [%fp - 164], %l1 ! recover 2 add %l1, %l2, %l1 ! 2+(3+(4+(5+(6+(7+(8+(9+10))))))) ld [%fp - 160], %l0 ! recover 1 add %l0, %l1, %l0 ! 1+(2+(3+(4+(5+(6+(7+(8+(9+10))))))))
Here are a few general code generation rules you will need to use to evaluate expressions:
If the value has absolute value greater than 4095, do the following:mov constant, %ln
sethi %hi(.Lm), %g1 or %g1, %lo(.Lm), %g1 ld [%g1], %ln ... .section rodata .Lm: .word constant
sethi %hi(.Lm), %g1 or %g1, %lo(.Lm), %g1 ld [%g1], %ln ... .section rodata .Lm: .float floatconstant
where offset is the frame pointer offset (stored in the symbol table) for the local variable and %ln is the register corresponding to the top of the expression stack.ld [%fp + offset], %ln
add %li, %lj, %lk ! %li + %lj -> %lk sub %li, %lj, %lk ! %li - %lj -> %lk
mov %li, %o0 ! %li * %lj -> %lk call .mul mov %lj, %o1 ! Note: this is in branch delay slot mov %o0, %lk mov %li, %o0 ! %li / %lj -> %lk call .div mov %lj, %o1 ! Note: this is in branch delay slot mov %o0, %lk
and %li, %lj, %lk ! %li and %lj -> %lk or %li, %lj, %lk ! %li or %lj -> %lk
! < cmp %li, %lj ! if %li < %lj then %lk = 1 else %lk = 0 bl .Lm ! branch less than mov 1, %lk ! branch delay (always executed) mov 0, %lk .Lm: ! <= cmp %li, %lj ! if %li <= %lj then %lk = 1 else %lk = 0 ble .Lm ! branch less than or equal mov 1, %lk ! branch delay (always executed) mov 0, %lk .Lm: ! > cmp %li, %lj ! if %li > %lj then %lk = 1 else %lk = 0 bg .Lm ! branch greater than mov 1, %lk ! branch delay (always executed) mov 0, %lk .Lm: ! >= cmp %li, %lj ! if %li >= %lj then %lk = 1 else %lk = 0 bge .Lm ! branch greater than mov 1, %lk ! branch delay (always executed) mov 0, %lk .Lm: ! == cmp %li, %lj ! if %li == %lj then %lk = 1 else %lk = 0 be .Lm ! branch greater than mov 1, %lk ! branch delay (always executed) mov 0, %lk .Lm: ! != cmp %li, %lj ! if %li != %lj then %lk = 1 else %lk = 0 bne .Lm ! branch greater than mov 1, %lk ! branch delay (always executed) mov 0, %lk .Lm:
neg %li, %lj ! -%li -> %lk
xor %li, 1, %lj ! not %li -> %lk
Gets translated into code like this,a := expr
where %lm is the register containing the result of the expression and offset is the frame offset of the variable.! evaluate expression ... ! assignment st %lm, [%fp + offset]
where %lm is the register containing the result of the expression given to write.! evaluate expression ... ! write mov %lm, %o0 call flwritei nop
Test out your write() statement and assignment functions by compiling a few simple programs such as this:
fun main() a : int; b : int; c : int; begin a := 3; b := a + 10; c := b * 2; write(c) end
The general form of a if-else-statement is as follows:! while statement .Lm: ! Evaluate relop ... cmp %li, %g0 ! %li contains result of relop be .Ln ! == 0, false, exit nop ... statements ... ba .Lm nop .Ln:
! if-else statement ! Evaluate relop ... cmp %li, %g0 ! %li contains result of relop be .Lm ! == 0, false, goto else nop ! then part ... statements ... ba .Ln ! skip else part nop ! else part .Lm: ... statements ... .Ln:
To access an array element, you will need to do three things:
In addition, you may want to put arrays bounds checking in this code as well.! Evaluate expr ... ! compute offset sll %lm, 2, %lm ! Multiply by 4 (left shift 2 bits) add %fp, %lm, %lm ! Add to frame pointer ld [%lm + offset], %ln
Storing to an array is similar. You will first need to compute the array index and memory address before a value can be stored.
Note: When evaluating expressions such as a + b[4*i-2] + c, the computation of the array index will use the same expression stack as the outer expression. Just make sure your stack allocator allows for nested expression evaluations and everything should work out.
When arrays are passed as parameters to function, they should be passed by reference (e.g., a pointer to the first element of the array is passed instead of a copy of the array itself).
The code you generate will roughly following this prototype:foo(a1,a2,a3,a4)
evaluate a1 expression stack: a1 evaluate a2 expression stack: a2 a1 evaluate a3 expression stack: a3 a2 a1 evaluate a4 expression stack: a4 a3 a2 a1 ! Make actual function call t = pop() store t, arg4 t = pop() store t, arg3 t = pop() store t, arg2 t = pop() store t, arg1 call foo
.section ".data" DISPLAY: .word 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ! number of nesting levels here
Of course, there is more than one way to do this.import StringIO f = StringIO.StringIO() # Create an in-memory file object emit_function_body(f,node) print >>outf,"! function (start)" print >>outf,"%s:" % function_name print >>outf," save %%sp, -%d, %%sp" % frame_size print >>outf, f.getvalue()
% python flame.py testname.flm % a.out
% tar -cf yourlogin.tar flame
% cp yourlogin.tar /stage/classes/current/CS326/handin/project4