Code Generation
Midterm Statistics
• Max: 54, Min: 28, Median: 45
• Questions (best score):
  – 1-5: 8/10
  – 6: 5/5
  – 7: 5/5
  – 8: 5/5
  – 9: 10/10
  – 11: 4/5
  – 12: 10/10
Code Generation: What to do?
• Input: intermediate representation (IR) and symbol table
• Output: semantically equivalent target code for a given machine architecture
• Compiler: translate IR to assembly code; Assembler: generate binary code; Linker: link the binary code (assembler and linker are part of the compiler tool chain)
Three main tasks:
• Instruction selection: choosing appropriate target machine instructions to implement the IR statements
• Register allocation and assignment: deciding what values to keep in which registers (interface with the hardware)
• Instruction ordering: deciding in what order to schedule the execution of instructions
Understanding Code Generation
Basic code generation (and then code generation for object-oriented
programs)
• MIPS assembly language
• Stack machine (architecture type)
• Simple source language
• Assembly programs make assumptions about the architecture:
  – which architecture type: stack machine? RISC? CISC?
  – how does the stack grow?
  – how many registers? what do the registers store?
Stack Machines
• A simple evaluation model
• No variables or registers
• A stack of values for intermediate results
Example of a Stack Machine Program
• Consider two instructions
– push i - place the integer i on top of the stack
– add - pop two elements, add them and put
the result back on the stack
• A program to compute 7 + 5
push 7
push 5
add
Stack Machine. Example
stack: …  →(push 7)→  7, …  →(push 5)→  5, 7, …  →(add)→  12, …
• Each instruction
– Takes its operands from the top of the stack
– Removes those operands from the stack
– Computes the required operation on them
– Pushes the result on the stack
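A machine this small is easy to simulate. The sketch below is illustrative only (the `run` helper and the tuple encoding of instructions are invented here); it executes the two instructions exactly as described above:

```python
# Minimal simulator for the two-instruction stack machine described above.
def run(program):
    stack = []
    for instr in program:
        if instr[0] == "push":          # push i: place the integer i on top
            stack.append(instr[1])
        elif instr[0] == "add":         # add: pop two values, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
    return stack

# The 7 + 5 example from the slide:
print(run([("push", 7), ("push", 5), ("add",)]))  # [12]
```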
Why Use a Stack Machine ?
• Each operation takes operands from the same
place and puts results in the same place
• This means a uniform compilation scheme
• And therefore a simpler compiler
Why Use a Stack Machine?
• Location of the operands is implicit
– Always on the top of the stack
• No need to specify operands explicitly
• No need to specify the location of the result
• Instruction “add” as opposed to “add r1, r2”
⇒ Smaller encoding of instructions
⇒ More compact programs
• This is one reason why Java Bytecodes use a
stack evaluation model
Optimizing the Stack Machine
• The add instruction does 3 memory operations
– Two reads and one write to the stack
– The top of the stack is frequently accessed
• Idea: keep the top of the stack in a register
(called accumulator)
– Register accesses are faster
• The “add” instruction is now
acc ← acc + top_of_stack
– Only one memory operation!
Stack Machine with Accumulator
Invariants
• The result of computing an expression is
always in the accumulator
• For an operation op(e1,…,en) push the
accumulator on the stack after computing
each of e1,…,en-1
– The result of en is in the accumulator before op
– After the operation pop n-1 values
• After computing an expression the stack is as
before
Stack Machine with Accumulator: Example
• Compute 7 + 5 using an accumulator
Code                        Acc    Stack
acc ← 7                     7      …
push acc                    7      7, …
acc ← 5                     5      7, …
acc ← acc + top_of_stack    12     7, …
pop                         12     …
A Bigger Example: 3 + (7 + 5)
Code Acc Stack
acc ← 3 3 <init>
push acc 3 3, <init>
acc ← 7 7 3, <init>
push acc 7 7, 3, <init>
acc ← 5 5 7, 3, <init>
acc ← acc + top_of_stack 12 7, 3, <init>
pop 12 3, <init>
acc ← acc + top_of_stack 15 3, <init>
pop 15 <init>
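The trace above can be reproduced mechanically. This sketch simulates the accumulator machine on 3 + (7 + 5); the instruction encoding and the `run` helper are invented for illustration:

```python
# Simulate the stack machine with accumulator on 3 + (7 + 5).
def run(program):
    acc, stack = 0, []                  # <init>: empty stack
    for instr in program:
        op = instr[0]
        if op == "li":                  # acc <- i
            acc = instr[1]
        elif op == "push":              # push acc
            stack.append(acc)
        elif op == "add_top":           # acc <- acc + top_of_stack
            acc = acc + stack[-1]
        elif op == "pop":
            stack.pop()
    return acc, stack

program = [("li", 3), ("push",), ("li", 7), ("push",),
           ("li", 5), ("add_top",), ("pop",), ("add_top",), ("pop",)]
print(run(program))   # (15, []) — result in acc, stack restored
```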
Notes
• It is very important that the stack is
preserved across the evaluation of a
subexpression
– Stack before the evaluation of 7 + 5 is 3, <init>
– Stack after the evaluation of 7 + 5 is 3, <init>
– The first operand is on top of the stack
From Stack Machines to MIPS
• The compiler generates code for a stack
machine with accumulator
• We want to run the resulting code on the
MIPS processor (or simulator)
• We implement stack machine instructions
using MIPS instructions and registers
Simulating a Stack Machine…
• The accumulator is kept in MIPS register $a0
• The stack is kept in memory
• The stack grows towards lower addresses
– Standard convention on the MIPS architecture
• The address of the next location on the stack
is kept in MIPS register $sp
– The top of the stack is at address $sp + 4
MIPS Assembly
MIPS architecture
– Prototypical Reduced Instruction Set Computer
(RISC) architecture
– Arithmetic operations use registers for operands
and results
– Must use load and store instructions to use
operands and results in memory
– 32 general purpose registers (32 bits each)
• We will use $sp, $a0 and $t1 (a temporary register)
• Read the SPIM handout for more details
A Sample of MIPS Instructions
– lw reg1 offset(reg2)
• Load 32-bit word from address reg2 + offset into reg1
– add reg1 reg2 reg3
• reg1 ← reg2 + reg3
– sw reg1 offset(reg2)
• Store 32-bit word in reg1 at address reg2 + offset
– addiu reg1 reg2 imm
• reg1 ← reg2 + imm
• “u” means overflow is not checked
– li reg imm
• reg ← imm
MIPS Assembly: Example
• The stack-machine code for 7 + 5 in MIPS
acc ← 7 li $a0 7
push acc sw $a0 0($sp)
addiu $sp $sp -4
acc ← 5 li $a0 5
acc ← acc + top_of_stack lw $t1 4($sp)
add $a0 $a0 $t1
pop addiu $sp $sp 4
• We now generalize this to a simple language…
A Small Language
• A language with integers and integer
operations
P → D; P | D
D → def id(ARGS) = E;
ARGS → id, ARGS | id
E → int | id | if E1 = E2 then E3 else E4
| E1 + E2 | E1 - E2 | id(E1,…,En)
A Small Language (Cont.)
• The first function definition f is the “main”
routine
• Running the program on input i means
computing f(i)
• Program for computing the Fibonacci numbers
def fib(x) = if x = 1 then 0 else
if x = 2 then 1 else
fib(x - 1) + fib(x - 2)
Code Generation Strategy
• For each expression e we generate MIPS code
that
– Computes the value of e in $a0
– Preserves $sp and the contents of the stack
• We define a code generation function cgen(e)
whose result is the code generated for e
Code Generation for Constants
• The code to evaluate a constant simply copies
it into the accumulator
cgen(i) = li $a0 i
• Note that this also preserves the stack, as
required
Code Generation for Add
cgen(e1 + e2) =
cgen(e1)
sw $a0 0($sp)
addiu $sp $sp -4
cgen(e2)
lw $t1 4($sp)
add $a0 $t1 $a0
addiu $sp $sp 4
• Possible optimization: Put the result of e1 directly in
register $t1 ?
Code Generation for Add. Wrong!
• Optimization: Put the result of e1 directly in $t1?
cgen(e1 + e2) =
cgen(e1)
move $t1 $a0
cgen(e2)
add $a0 $t1 $a0
• Try to generate code for : 3 + (7 + 5)
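Simulating the broken scheme shows the failure concretely: the recursive call for e2 clobbers $t1 before the outer add uses it. The register model below is a toy invented for illustration, not real MIPS:

```python
# Demonstrate why caching e1's result in $t1 fails: the recursive call
# for e2 overwrites $t1. Expressions are ints or nested ("+", e1, e2).
regs = {}

def cgen_wrong(e):
    if isinstance(e, int):
        regs["a0"] = e                 # li $a0 i
    else:
        _, e1, e2 = e
        cgen_wrong(e1)
        regs["t1"] = regs["a0"]        # move $t1 $a0
        cgen_wrong(e2)                 # ...this may overwrite $t1!
        regs["a0"] = regs["t1"] + regs["a0"]

cgen_wrong(("+", 3, ("+", 7, 5)))
print(regs["a0"])   # 19, not the correct 15: the saved 3 was clobbered by 7
```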
Code Generation Notes
• The code for + is a template with “holes” for
code for evaluating e1 and e2
• Stack-machine code generation is recursive
• Code for e1 + e2 consists of code for e1 and e2
glued together
• Code generation can be written as a recursive-
descent of the AST
– At least for expressions
Code Generation for Sub and Constants
• New instruction: sub reg1 reg2 reg3
– Implements reg1 ← reg2 - reg3
cgen(e1 - e2) =
cgen(e1)
sw $a0 0($sp)
addiu $sp $sp -4
cgen(e2)
lw $t1 4($sp)
sub $a0 $t1 $a0
addiu $sp $sp 4
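The two templates can be folded into one recursive generator. This is a Python sketch (the expression encoding and function name are invented here) that emits exactly the instruction sequences shown above:

```python
# Sketch of the recursive code generator for constants, + and -,
# emitting the MIPS templates from the slides.
# Expressions are ints or tuples ("+"/"-", e1, e2).
def cgen(e):
    if isinstance(e, int):
        return [f"li $a0 {e}"]
    op, e1, e2 = e
    mips_op = "add" if op == "+" else "sub"
    return (cgen(e1)
            + ["sw $a0 0($sp)", "addiu $sp $sp -4"]     # push acc
            + cgen(e2)
            + [f"lw $t1 4($sp)", f"{mips_op} $a0 $t1 $a0",
               "addiu $sp $sp 4"])                       # pop

print("\n".join(cgen(("+", 3, ("+", 7, 5)))))
```

Note how the code for e1 + e2 is literally the code for e1 and e2 glued into the template, matching the "holes" description in the next slide.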
Code Generation for Conditional
• We need flow control instructions
• New instruction: beq reg1 reg2 label
– Branch to label if reg1 = reg2
• New instruction: b label
– Unconditional jump to label
Code Generation for If (Cont.)
cgen(if e1 = e2 then e3 else e4) =
   cgen(e1)
   sw $a0 0($sp)
   addiu $sp $sp -4
   cgen(e2)
   lw $t1 4($sp)
   addiu $sp $sp 4
   beq $a0 $t1 true_branch
false_branch:
   cgen(e4)
   b end_if
true_branch:
   cgen(e3)
end_if:
The Activation Record
• Code for function calls and function
definitions depends on the layout of the
activation record
• A very simple AR suffices for this language:
– The result is always in the accumulator
• No need to store the result in the AR
– The activation record holds actual parameters
• For f(x1,…,xn) push xn,…,x1 on the stack
• These are the only variables in this language
The Activation Record (Cont.)
• The stack discipline guarantees that on
function exit $sp is the same as it was on
function entry
– No need to save $sp
• We need the return address
• It’s handy to have a pointer to start of the
current activation
– This pointer lives in register $fp (frame pointer)
– Reason for frame pointer will be clear shortly
The Activation Record
• Summary: For this language, an AR with the
caller’s frame pointer, the actual parameters,
and the return address suffices
• Picture: Consider a call to f(x,y). The AR will be:

              high addresses
         +--------+
   FP -> | old FP |  \
         |   y    |   } AR of f
         |   x    |  /
   SP -> +--------+
Code Generation for Function Call
• The calling sequence is the instructions (of
both caller and callee) to set up a function
invocation
• New instruction: jal label
– Jump to label, save address of next instruction in
$ra
– On other architectures the return address is
stored on the stack by the “call” instruction
Code Generation for Function Call (Cont.)
cgen(f(e1,…,en)) =
   sw $fp 0($sp)
   addiu $sp $sp -4
   cgen(en)
   sw $a0 0($sp)
   addiu $sp $sp -4
   …
   cgen(e1)
   sw $a0 0($sp)
   addiu $sp $sp -4
   jal f_entry
• The caller saves its value of the frame pointer
• Then it saves the actual parameters in reverse order
• The return address is saved in register $ra (by jal)
Code Generation for Function Definition
• New instruction: jr reg
– Jump to address in register reg
cgen(def f(x1,…,xn) = e) =
f_entry:
   move $fp $sp
   sw $ra 0($sp)
   addiu $sp $sp -4
   cgen(e)
   lw $ra 4($sp)
   addiu $sp $sp z      (z = 4*n + 8)
   lw $fp 0($sp)
   jr $ra
• Note: The frame pointer points to the top, not the bottom, of the frame
• The callee pops the return address, the actual arguments and the saved value of the frame pointer
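Both halves of the calling sequence can be sketched as emitters. The helpers below are illustrative only: `cgen_expr` is a stand-in for the full expression generator (here it handles only integer constants), and the label convention `f_entry` follows the template above:

```python
# Sketch of the calling-sequence templates from the slides.
def cgen_expr(e):
    return [f"li $a0 {e}"]                              # constants only

def cgen_call(f, args):
    code = ["sw $fp 0($sp)", "addiu $sp $sp -4"]        # save caller's FP
    for a in reversed(args):                            # push args in reverse
        code += cgen_expr(a) + ["sw $a0 0($sp)", "addiu $sp $sp -4"]
    return code + [f"jal {f}_entry"]                    # jal saves ret addr in $ra

def cgen_def(f, nargs, body_code):
    z = 4 * nargs + 8                                   # pop ra, args, saved FP
    return ([f"{f}_entry:", "move $fp $sp",
             "sw $ra 0($sp)", "addiu $sp $sp -4"]
            + body_code
            + ["lw $ra 4($sp)", f"addiu $sp $sp {z}",
               "lw $fp 0($sp)", "jr $ra"])

print("\n".join(cgen_call("f", [1, 2])))
```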
Calling Sequence: Example for f(x,y)
• Before call: FP points into the caller's frame; SP is at the first free slot
• On entry: the caller has pushed old FP, y, x; SP is just below x; FP is unchanged
• Before exit: the callee has pushed the return address; FP points at it; SP is just below
• After call: everything is popped; FP and SP are exactly as before the call
Code Generation for Variables
• Variable references are the last construct
• The “variables” of a function are just its
parameters
– They are all in the AR
– Pushed by the caller
• Problem: Because the stack grows when
intermediate results are saved, the variables
are not at a fixed offset from $sp
Code Generation for Variables (Cont.)
• Solution: use a frame pointer
– Always points to the return address on the stack
– Since it does not move it can be used to find the
variables
• Let xi be the ith (i = 1,…,n) formal parameter of
the function for which code is being generated
Code Generation for Variables (Cont.)
• Example: For a function def f(x1,x2) = e the
activation and frame pointer are set up as
follows:
        old FP
        x2              x2 is at FP + 8
        x1              x1 is at FP + 4
FP ->   return addr
SP ->

• Thus:  cgen(xi) = lw $a0 z($fp)    where z = 4*i
Summary
• The activation record must be designed
together with the code generator
• Code generation can be done by recursive
traversal of the AST
• Production compilers do different things
– Emphasis is on keeping values (esp. current stack
frame) in registers
– Intermediate results are laid out in the AR, not
pushed and popped from the stack
Allocating Temporaries in the AR
• Idea: Keep temporaries in the AR
• The code generator must assign a location in
the AR for each temporary
How Many Temporaries?
• Let NT(e) = # of temps needed to evaluate e
• NT(e1 + e2)
– Needs at least as many temporaries as NT(e1)
– Needs at least as many temporaries as NT(e2) + 1
• Space used for temporaries in e1 can be reused
for temporaries in e2
The Equations
NT(e1 + e2) = max(NT(e1), 1 + NT(e2))
NT(e1 - e2) = max(NT(e1), 1 + NT(e2))
NT(if e1 = e2 then e3 else e4) = max(NT(e1),1 + NT(e2), NT(e3), NT(e4))
NT(id(e1,…,en)) = max(NT(e1),…,NT(en))
NT(int) = 0
NT(id) = 0
Is this bottom-up or top-down?
What is NT(…code for fib…)?
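The equations translate directly into a recursive (bottom-up) function; running it on the body of fib from the earlier slide answers the question numerically. The tuple encoding of expressions is invented for illustration:

```python
# Compute NT(e), the number of temporaries needed, from the equations above.
# Expressions: ints, strings (ids), ("+"/"-", e1, e2),
# ("if", e1, e2, e3, e4), ("call", f, [args]).
def NT(e):
    if isinstance(e, (int, str)):
        return 0
    if e[0] in ("+", "-"):
        return max(NT(e[1]), 1 + NT(e[2]))
    if e[0] == "if":
        return max(NT(e[1]), 1 + NT(e[2]), NT(e[3]), NT(e[4]))
    if e[0] == "call":
        return max((NT(a) for a in e[2]), default=0)

# The body of fib from the earlier slide:
fib = ("if", "x", 1, 0,
       ("if", "x", 2, 1,
        ("+", ("call", "fib", [("-", "x", 1)]),
              ("call", "fib", [("-", "x", 2)]))))
print(NT(fib))   # 2
```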
The Revised AR
• For a function definition f(x1,…,xn) = e the AR
has 2 + n + NT(e) elements
– Return address
– Frame pointer
– n arguments
– NT(e) locations for intermediate results
Picture
        Old FP
        xn
        ...
        x1
FP ->   Return Addr.
        Temp NT(e)
        ...
        Temp 1
SP ->
Revised Code Generation
• Code generation must know how many
temporaries are in use at each point
• Add a new argument to code generation: the
position of the next available temporary
Code Generation for + (original)
cgen(e1 + e2) =
cgen(e1)
sw $a0 0($sp)
addiu $sp $sp -4
cgen(e2)
lw $t1 4($sp)
add $a0 $t1 $a0
addiu $sp $sp 4
Code Generation for + (revised)
cgen(e1 + e2, nt) =
cgen(e1, nt)
sw $a0 nt($fp)
cgen(e2, nt + 4)
lw $t1 nt($fp)
add $a0 $t1 $a0
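The revised template as a recursive sketch (Python, with an invented expression encoding); `nt` is threaded through as the offset of the next free temporary, so no pushes or pops are emitted:

```python
# Sketch of the revised generator: temporaries live at fixed offsets
# from $fp instead of being pushed on the stack.
# Expressions are ints or ("+", e1, e2); nt is the next free temp offset.
def cgen(e, nt):
    if isinstance(e, int):
        return [f"li $a0 {e}"]
    _, e1, e2 = e
    return (cgen(e1, nt)
            + [f"sw $a0 {nt}($fp)"]              # save e1 in temp at nt
            + cgen(e2, nt + 4)                   # e2 may use temps above nt
            + [f"lw $t1 {nt}($fp)", "add $a0 $t1 $a0"])

print("\n".join(cgen(("+", 3, ("+", 7, 5)), 0)))
```

Compared to the original template, each + costs three instructions instead of five, and $sp never moves during expression evaluation.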
Notes
• The temporary area is used like a small, fixed-
size stack
• Exercise: Write out cgen for other constructs
Code Generation for OO Languages
Outline
• Code generation for OO languages
– Object layout
– Dynamic dispatch
• Parameter passing mechanisms
Object Layout
• OO Slogan: If B is a subclass of A, then an
object of class B can be used wherever an
object of class A is expected
• This means that code in class A works
unmodified for an object of class B
Two Issues
• How are objects represented in memory?
• How is dynamic dispatch implemented?
Object Layout Example
Class A {
   a: Int <- 0;
   d: Int <- 1;
   f(): Int { a <- a + d };
};
Class B inherits A {
   b: Int <- 2;
   f(): Int { a };           // Override
   g(): Int { a <- a - b };
};
Class C inherits A {
   c: Int <- 3;
   h(): Int { a <- a * c };
};
Object Layout (Cont.)
• Attributes a and d are inherited by classes B
and C
• All methods in all classes refer to a
• For A methods to work correctly in A, B, and C
objects, attribute a must be in the same
“place” in each object
Object Layout (Cont.)
An object is like a struct in C. The reference
foo.field
is an index into a foo struct at an offset
corresponding to field
Objects in Cool are implemented similarly
– Objects are laid out in contiguous memory
– Each attribute stored at a fixed offset in object
– When a method is invoked, the object is self and
the fields are the object’s attributes
Cool Object Layout
• The first 3 words of Cool objects contain
header information:
Offset
Class Tag 0
Object Size 4
Dispatch Ptr 8
Attribute 1 12
Attribute 2 16
...
Cool Object Layout (Cont.)
• Class tag is an integer
– Identifies class of the object
• Object size is an integer
– Size of the object in words
• Dispatch ptr is a pointer to a table of methods
– More later
• Attributes in subsequent slots
• Lay out in contiguous memory
Subclasses
Observation: Given a layout for class A, a layout
for subclass B can be defined by extending
the layout of A with additional slots for the
additional attributes of B
Leaves the layout of A unchanged
(B is an extension)
Layout Picture
Offset:   0      4    8    12   16   20
Class
A         Atag   5    *    a    d
B         Btag   6    *    a    d    b
C         Ctag   6    *    a    d    c

(* = dispatch pointer; sizes are in words)
Subclasses (Cont.)
• The offset for an attribute is the same in a
class and all of its subclasses
– Any method for an A1 can be used on a subclass A2
• Consider the layout for a chain of subclasses An < … < A3 < A2 < A1:

   Header        <- start of the A1, A2 and A3 objects
   A1 attrs.     <- end of the A1 object
   A2 attrs      <- end of the A2 object
   A3 attrs      <- end of the A3 object
   ...
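The extension property can be sketched as a simple offset computation: a subclass lays out its parent's attributes first, so every inherited attribute keeps its offset. The `layout` helper and the 12-byte header constant follow the Cool layout above; the function name is invented:

```python
# Sketch: attribute offsets when a subclass layout extends its parent's.
# The 3-word header occupies offsets 0, 4, 8; attributes start at 12.
def layout(parent_attrs, own_attrs):
    attrs = parent_attrs + own_attrs          # parent's attributes come first
    return {name: 12 + 4 * i for i, name in enumerate(attrs)}

A = layout([], ["a", "d"])
B = layout(["a", "d"], ["b"])                 # B extends A's layout
print(A["a"], B["a"])   # 12 12 — same offset in the class and its subclass
print(B["b"])           # 20  — new attribute appended at the end
```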
Dynamic Dispatch
• Consider again our example
Class A {
   a: Int <- 0;
   d: Int <- 1;
   f(): Int { a <- a + d };
};
Class B inherits A {
   b: Int <- 2;
   f(): Int { a };
   g(): Int { a <- a - b };
};
Class C inherits A {
   c: Int <- 3;
   h(): Int { a <- a * c };
};
Dynamic Dispatch Example
• e.g()
– g refers to method in B if e is a B
• e.f()
  – f refers to the method in A if e is an A or C (inherited in the case of C)
  – f refers to the method in B for a B object
• The implementation of methods and dynamic
dispatch strongly resembles the
implementation of attributes
Dispatch Tables
• Every class has a fixed set of methods
(including inherited methods)
• A dispatch table indexes these methods
– An array of method entry points
– A method f lives at a fixed offset in the dispatch
table for a class and all of its subclasses
Dispatch Table Example
Offset    0    4
Class
A         fA
B         fB   g
C         fA   h

• The dispatch table for class A has only 1 method
• The tables for B and C extend the table for A to the right
• Because methods can be overridden, the method for f is not the same in every class, but is always at the same offset
Using Dispatch Tables
• The dispatch pointer in an object of class X
points to the dispatch table for class X
• Every method f of class X is assigned an
offset Of in the dispatch table at compile time
Using Dispatch Tables (Cont.)
• Every method must know what object is “self”
– “self” is passed as the first argument to all
methods
• To implement a dynamic dispatch e.f() we
– Evaluate e, obtaining an object x
– Find D by reading the dispatch-table field of x
– Call D[Of](x)
• D is the dispatch table for x
• In the call, self is bound to x
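The dispatch scheme can be sketched with Python lists as method tables; every name here is an invented stand-in for generated code, with objects modeled as dicts carrying their dispatch pointer:

```python
# Sketch of dispatch tables as arrays of method entry points.
# self is passed explicitly as the first argument, and a method keeps
# the same index in the table of every subclass.
def f_A(self): return self["a"]                 # stands in for A.f
def f_B(self): return self["a"] + 1             # stands in for B's override of f
def g_B(self): return self["a"] - self["b"]     # stands in for B.g

dispatch_A = [f_A]            # index 0: f
dispatch_B = [f_B, g_B]       # index 0: f (overridden), index 1: g

OF_f = 0                      # compile-time offset of f, same in A and B
x = {"a": 10, "b": 2, "dispatch": dispatch_B}   # a B object
print(x["dispatch"][OF_f](x))   # 11 — B's f runs, found at A's offset for f
```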
Parameter Passing Mechanisms
Parameter Passing Mechanisms
• There are many semantic issues in
programming languages centering on when
values are computed and the scope of names
• We’ll focus on parameter passing
– When are arguments of function calls evaluated?
– To what are formal parameters bound?
Call-By-Value
• To evaluate f(e)
– Evaluate e to a value v
– Bind v to the formal parameter in the function body
• Example
g(x) = x <- x + 1;
f(y) = { g(y); print(y); }
• Under call-by-value, f(0) = 0
The Stack Under Call-By-Value
y 0
f(y)
g(y) y 0
x <- x + 1 x 0
print(y)
y 0
x 1
y 0
Call-By-Value Discussion
• Under call-by-value, g(y) does not affect the
value of y
• y’s value, not y itself, is passed to g
– The formal parameter is stored in a different
location from the actual parameter
• Call-by-value is widely used
– C is a prominent example
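Python's parameter passing behaves like call-by-value for immutable values such as integers, so the slide's example can be reproduced directly (function names follow the slide):

```python
# Call-by-value: g receives y's value in a fresh location,
# so reassigning x cannot affect the caller's y.
def g(x):
    x = x + 1        # rebinds only the local x
    return x

def f(y):
    g(y)
    return y         # stands in for print(y)

print(f(0))   # 0 — g(y) did not change y
```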
Call-By-Reference
• To evaluate f(e)
– e is evaluated
– A pointer to e is passed as the argument
– f’s code accesses the argument through the
pointer
• If e is already a stored value (a variable) a
pointer to that location is used
• Otherwise, e is evaluated and stored in a
fresh location first
The Stack Under Call-By-Reference
y 0
f(y)
g(y) y 0
x <- x + 1 x
print(y)
y 1
x
y 1
Call-By-Reference Discussion
• Under Call-By-Reference, only the address is
passed
– References to the value dereference the pointer
• In the example, because x and y refer to the
same value, changes to x also change y
• Many languages pass large data structures
(e.g., arrays) by reference
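Python has no true call-by-reference, but a one-element list can stand in for the pointer in the slide's example (this is exactly "passing a pointer by value", as the next slide warns):

```python
# Simulated call-by-reference: the list cell plays the role of
# the shared location that both x and y refer to.
def g(x_ref):
    x_ref[0] = x_ref[0] + 1     # writes through the "pointer"

def f(y_ref):
    g(y_ref)
    return y_ref[0]             # stands in for print(y)

y = [0]
print(f(y))   # 1 — the change made through x is visible as y
```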
Call-By-Reference Discussion (Cont.)
• IMPORTANT: There is a difference between
passing a value by reference and passing a
pointer by value!
• In C, one usually does not pass structures themselves as arguments; one passes pointers to them (by value)
Call-By-Name
• Consider a definition f(x) = e
• To evaluate f(e’)
– e’ is passed unevaluated to f
– e’ is evaluated each time e refers to x
– The expression e’ is always evaluated in its original
environment
The Stack Under Call-By-Name
Example
   j(y) = { print(y); print(y); }
   { z <- 0; j(z++); }

z <- 0            z = 0
j(z++)            z = 0    y bound to z++
   print(y);      z = 1
   print(y);      z = 2
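Call-by-name can be simulated with thunks: the argument expression is passed unevaluated and re-evaluated at each use, in its original environment. `z_plus_plus` is an invented stand-in for the slide's z++:

```python
# Call-by-name via thunks: j receives the expression, not its value.
z = 0
def z_plus_plus():              # stands in for z++: return z, then increment
    global z
    old = z
    z = z + 1
    return old

def j(y):                       # y is a thunk, evaluated at each use
    return (y(), y())           # stands in for print(y); print(y)

result = j(z_plus_plus)
print(result)   # (0, 1) — each use of y re-evaluated z++
print(z)        # 2 — the argument expression ran twice
```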
Call-By-Name Discussion
• Note a function argument may be evaluated 0
or more times
– Evaluated 0 times if the argument is never used
• Call-By-Name is used in lazy functional
languages
• There are other, less important,
parameter passing mechanisms