Position of Code Generator: Principles of Compiler Design Lecture Notes
The final phase in the compiler model is the code generator. It takes as input an intermediate
representation of the source program and produces as output an equivalent target program. The
code generation techniques presented below can be used whether or not an optimizing phase
occurs before code generation.
[Figure: position of the code generator, showing the symbol table]
Prior to code generation, the source program must be scanned, parsed and translated by the
front end into an intermediate representation, along with the necessary type checking.
Therefore, the input to code generation is assumed to be error-free.
2. Target program:
c. Assembly language
- Code generation is made easier.
3. Memory management:
Names in the source program are mapped to addresses of data objects in run-time
memory by the front end and code generator.
This mapping makes use of the symbol table; that is, a name in a three-address statement
refers to a symbol-table entry for the name.
4. Instruction selection:
The instruction set of the target machine should be complete and uniform.
Instruction speeds and machine idioms are important factors when the efficiency of the
target program is considered.
The quality of the generated code is determined by its speed and size.
The former statement can be translated into the latter statement as shown below:
5. Register allocation
Instructions involving register operands are shorter and faster than those involving
operands in memory.
Register assignment: the specific register that a variable will reside in is picked.
Certain machines require even-odd register pairs for some operands and
results. For example, consider the division instruction of the form:
D x, y
TARGET MACHINE
Familiarity with the target machine and its instruction set is a prerequisite for designing a
good code generator.
The target computer is a byte-addressable machine with 4 bytes to a word.
It has n general-purpose registers, R0, R1, . . . , Rn-1.
It has two-address instructions of the form:
op source, destination
where op is an op-code, and source and destination are data fields.
It has the following op-codes :
MOV (move source to destination)
ADD (add source to destination)
SUB (subtract source from destination)
MODE        FORM    ADDRESS    ADDED COST
Absolute    M       M          1
Register    R       R          0
Literal     #c      c          1
For example: MOV R0, M stores the contents of register R0 into memory location M;
MOV 4(R0), M stores the value contents(4 + contents(R0)) into M.
Instruction costs :
Instruction cost = 1+cost for source and destination address modes. This cost corresponds
to the length of the instruction.
Address modes involving registers have cost zero.
Address modes involving memory location or literal have cost one.
Instruction length should be minimized if space is important. Doing so also minimizes
the time taken to fetch and perform the instruction.
For example: MOV R0, R1 copies the contents of register R0 into R1. It has cost one,
since it occupies only one word of memory.
The three-address statement a := b + c can be implemented by many different
instruction sequences:
i) MOV b, R0
ADD c, R0 cost = 6
MOV R0, a
ii) MOV b, a
ADD c, a cost = 6
In order to generate good code for the target machine, we must utilize its
addressing capabilities efficiently.
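The cost rule above can be checked mechanically. The following is a minimal sketch, not part of the original notes: the function names are hypothetical, registers are assumed to be written R0, R1, ..., and the indirect register mode (*R) is assumed to add no cost, as the registers-cost-zero rule suggests. Every other operand (absolute, literal, indexed) is charged one.

```python
import re

def mode_cost(operand: str) -> int:
    """Added cost of one operand's addressing mode (assumed rules, see lead-in)."""
    op = operand.strip()
    if re.fullmatch(r"\*?R\d+", op):
        return 0   # register or indirect register: no extra word
    return 1       # absolute, literal, indexed: one extra word

def instruction_cost(instr: str) -> int:
    """Cost = 1 + cost of the source and destination address modes."""
    _opcode, operands = instr.split(None, 1)
    return 1 + sum(mode_cost(o) for o in operands.split(","))

def sequence_cost(seq) -> int:
    return sum(instruction_cost(i) for i in seq)

seq_i = ["MOV b, R0", "ADD c, R0", "MOV R0, a"]
seq_ii = ["MOV b, a", "ADD c, a"]
print(sequence_cost(seq_i), sequence_cost(seq_ii))  # 6 6
```

Both sequences cost 6, as stated above; a sequence that keeps more operands in registers would score lower under the same rule.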
RUN-TIME STORAGE MANAGEMENT
Static allocation
GOTO callee.code_area /*It transfers control to the target code for the called procedure */
where,
callee.static_area – Address of the activation record
callee.code_area – Address of the first instruction for called procedure
#here + 20 – Literal return address which is the address of the instruction following GOTO.
GOTO *callee.static_area
This transfers control to the address saved at the beginning of the activation record.
The statement HALT is the final instruction that returns control to the operating system.
Stack allocation
Static allocation can become stack allocation by using relative addresses for storage in
activation records. In stack allocation, the position of the activation record is stored in a
register, so words in the activation record can be accessed as offsets from the value in this
register. The code needed to implement stack allocation is as follows:
Initialization of stack:
GOTO callee.code_area
where,
caller.recordsize – size of the activation record
#here +16 – address of the instruction following the GOTO
Basic Blocks
t1 := a * a
t2 := a * b
t3 := 2 * t2
t4 := t1 + t3
t5 := b * b
t6 := t4 + t5
Compiled and Prepared by Dr.Anusuya
Principles of Compiler Design lecture Notes
Output: A list of basic blocks with each three-address statement in exactly one block
Method:
1. We first determine the set of leaders, the first statements of basic blocks. The rules
we use are the following:
a. The first statement is a leader.
b. Any statement that is the target of a conditional or unconditional goto is a
leader.
c. Any statement that immediately follows a goto or conditional goto statement
is a leader.
2. For each leader, its basic block consists of the leader and all statements up to but not
including the next leader or the end of the program.
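The two steps of the method can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the notes' own implementation: statements are plain strings, jump targets are written as "goto (n)" with 1-based statement numbers, and the function names are hypothetical. The sample input is the three-address code of the dot-product example used in these notes.

```python
import re

def find_leaders(stmts):
    """Return the sorted 0-based indices of leaders, using rules (a) to (c)."""
    leaders = {0} if stmts else set()              # rule (a): first statement
    for i, s in enumerate(stmts):
        m = re.search(r"goto \((\d+)\)", s)
        if m:
            leaders.add(int(m.group(1)) - 1)       # rule (b): target of a goto
            if i + 1 < len(stmts):
                leaders.add(i + 1)                 # rule (c): statement after a goto
    return sorted(leaders)

def basic_blocks(stmts):
    """Each block runs from a leader up to, but not including, the next leader."""
    bounds = find_leaders(stmts) + [len(stmts)]
    return [stmts[a:b] for a, b in zip(bounds, bounds[1:])]

prog = [
    "prod := 0", "i := 1", "t1 := 4 * i", "t2 := a[t1]",
    "t3 := 4 * i", "t4 := b[t3]", "t5 := t2 * t4", "t6 := prod + t5",
    "prod := t6", "t7 := i + 1", "i := t7", "if i <= 20 goto (3)",
]
for n, block in enumerate(basic_blocks(prog), start=1):
    print(f"B{n}: {block}")
```

For this input the leaders are statements (1) and (3), giving one block of two statements and one block of ten.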
Consider the following source code for the dot product of two vectors a and b of length 20:
begin
    prod := 0;
    i := 1;
    do
    begin
        prod := prod + a[i] * b[i];
        i := i + 1;
    end
    while i <= 20
end
(1) prod := 0
(2) i := 1
(3) t1 := 4*i
(4) t2 := a[t1]
(5) t3 := 4*i
(6) t4 := b[t3]
(7) t5 := t2*t4
(8) t6 := prod+t5
(9) prod := t6
(10) t7 := i+1
(11) i := t7
(12) if i <= 20 goto (3)
A number of transformations can be applied to a basic block without changing the set of
expressions computed by the block. Two important classes of transformation are:
Structure-preserving transformations
Algebraic transformations
a := b + c        a := b + c
b := a - d        b := a - d
c := b + c        c := b + c
d := a - d        d := b
b) Dead-code elimination:
Suppose x is dead, that is, never subsequently used, at the point where the statement
x := y + z appears in a basic block. Then this statement may be safely removed without
changing the value of the basic block.
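A minimal sketch of dead-code elimination within one basic block is given here for illustration. It assumes statements are tuples (x, y, op, z) standing for x := y op z, that the set of names live on exit from the block is supplied by the caller, and that statements have no side effects; the function name is hypothetical. The block is scanned backwards so that liveness at each point is known when the statement is examined.

```python
def eliminate_dead_code(block, live_on_exit):
    """Drop every statement x := y op z whose result x is dead at that point."""
    live = set(live_on_exit)
    kept = []
    for dest, y, op, z in reversed(block):
        if dest not in live:
            continue                      # x is never used afterwards: remove
        live.discard(dest)                # x is redefined here
        # operands that are names (not constants) become live before this point
        live.update(v for v in (y, z) if v.isidentifier())
        kept.append((dest, y, op, z))
    kept.reverse()
    return kept

block = [
    ("t1", "a", "+", "b"),
    ("x", "y", "+", "z"),     # x is dead: not used below, not live on exit
    ("d", "t1", "*", "t1"),
]
print(eliminate_dead_code(block, live_on_exit={"d"}))
```

Here the statement x := y + z is removed, while t1 is kept because d depends on it.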
d) Interchange of statements:
t1 := b + c
t2 := x + y
We can interchange the two statements without affecting the value of the block if and
only if neither x nor y is t1 and neither b nor c is t2.
2. Algebraic transformations:
Algebraic transformations can be used to change the set of expressions computed by a basic
block into an algebraically equivalent set.
Examples:
i) x := x + 0 or x := x * 1 can be eliminated from a basic block without changing the set of
expressions it computes.
ii) The exponential statement x := y ** 2 can be replaced by x := y * y.
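The two examples above can be expressed as simple rewrite rules. The sketch below is illustrative only: it assumes statements are tuples (x, y, op, z) of strings standing for x := y op z, covers just the two cases listed, and uses hypothetical function names.

```python
def simplify(stmt):
    """Return the simplified statement, or None if it can be eliminated."""
    dest, y, op, z = stmt
    if y == dest and (op, z) in {("+", "0"), ("*", "1")}:
        return None                      # x := x + 0 or x := x * 1
    if op == "**" and z == "2":
        return (dest, y, "*", y)         # x := y ** 2  becomes  x := y * y
    return stmt

def simplify_block(block):
    out = []
    for stmt in block:
        new_stmt = simplify(stmt)
        if new_stmt is not None:
            out.append(new_stmt)
    return out

block = [
    ("x", "x", "+", "0"),
    ("x", "y", "**", "2"),
    ("w", "w", "*", "1"),
    ("v", "x", "+", "w"),
]
print(simplify_block(block))
```

A fuller version would also handle the commuted forms such as 0 + x and 1 * x; they are left out to keep the sketch short.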
Flow Graphs
A flow graph is a directed graph containing the flow-of-control information for the set of
basic blocks making up a program.
The nodes of the flow graph are basic blocks. It has a distinguished initial node.
E.g.: Flow graph for the vector dot product is given as follows:
B1:  prod := 0
     i := 1

B2:  t1 := 4 * i
     t2 := a[t1]
     t3 := 4 * i
     t4 := b[t3]
     t5 := t2 * t4
     t6 := prod + t5
     prod := t6
     t7 := i + 1
     i := t7
     if i <= 20 goto B2
B1 is the initial node. B2 immediately follows B1, so there is an edge from B1 to B2. The
target of the jump from the last statement of B2 is the first statement of B2, so there is
also an edge from B2 (last statement) to B2 (first statement).
B1 is the predecessor of B2, and B2 is a successor of B1.
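The edges of a flow graph can be derived from the basic blocks themselves. The following is a rough sketch under stated assumptions: blocks are given as a dictionary from block name to its statements in program order, jumps name their target block as in "goto B2", and the function name is hypothetical. An edge is added for a jump target, and for the next block in order unless the current block ends in an unconditional goto.

```python
import re

def flow_edges(blocks):
    """Return the set of (predecessor, successor) pairs between basic blocks."""
    names = list(blocks)                       # dict preserves program order
    edges = set()
    for k, name in enumerate(names):
        last = blocks[name][-1]
        m = re.search(r"goto (\w+)", last)
        if m:
            edges.add((name, m.group(1)))      # jump from last statement
        falls_through = not last.startswith("goto")
        if falls_through and k + 1 < len(names):
            edges.add((name, names[k + 1]))    # next block follows in order
    return edges

blocks = {
    "B1": ["prod := 0", "i := 1"],
    "B2": ["t1 := 4 * i", "t2 := a[t1]", "t3 := 4 * i", "t4 := b[t3]",
           "t5 := t2 * t4", "t6 := prod + t5", "prod := t6",
           "t7 := i + 1", "i := t7", "if i <= 20 goto B2"],
}
print(sorted(flow_edges(blocks)))  # [('B1', 'B2'), ('B2', 'B2')]
```

For the dot-product example this gives the fall-through edge from B1 to B2 and the loop edge from B2 back to itself.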
Loops
NEXT-USE INFORMATION
If the name in a register is no longer needed, then we remove the name from the register
and the register can be used to store some other name.
Symbol Table:
Y Live i
Z Live i
A code generator generates target code for a sequence of three-address statements and
effectively uses registers to store operands of the statements.
(or)
(or)
ADD Rj, Ri
A register descriptor is used to keep track of what is currently in each register. The
register descriptors show that initially all the registers are empty.
An address descriptor stores the location where the current value of the name can be
found at run time.
A code-generation algorithm:
The algorithm takes as input a sequence of three-address statements constituting a basic
block. For each three-address statement of the form x := y op z, perform the following
actions:
1. Invoke a function getreg to determine the location L where the result of the computation y
op z should be stored.
2. Consult the address descriptor for y to determine y’, the current location of y. Prefer the
register for y’ if the value of y is currently both in memory and a register. If the value
of y is not already in L, generate the instruction MOV y’, L to place a copy of y in L.
4. If the current values of y or z have no next uses, are not live on exit from the block, and
are in registers, alter the register descriptor to indicate that, after execution of
x := y op z, those registers will no longer contain y or z.
The assignment d := (a-b) + (a-c) + (a-c) might be translated into the following
three-address code sequence:
t := a - b
u := a - c
v := t + u
d := v + u
with d live at the end.
Register empty
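The code-generation algorithm can be tried on the sequence for d := (a-b) + (a-c) + (a-c) with a small sketch. Everything here is a simplification made for illustration: statements are tuples (x, y, op, z); the register and address descriptors are plain dictionaries; getreg reuses the register of y when y is no longer needed and otherwise takes an empty register, with no spilling; and the step that emits "OP z', L" is assumed from the usual form of the algorithm, since it is not listed above. The names gen_block and used_later are hypothetical.

```python
OPS = {"+": "ADD", "-": "SUB"}

def used_later(block, i, name):
    """True if name is used as an operand after statement i, before being redefined."""
    for dest, y, _op, z in block[i + 1:]:
        if name in (y, z):
            return True
        if dest == name:
            return False
    return False

def gen_block(block, live_on_exit, nregs=2):
    reg = {f"R{k}": None for k in range(nregs)}   # register descriptor
    loc = {}                                      # address descriptor: name -> register
    code = []

    def needed(name, i):
        return name in live_on_exit or used_later(block, i, name)

    def where(name):
        return loc.get(name, name)                # prefer the register copy

    for i, (x, y, op, z) in enumerate(block):
        # Step 1: getreg (simplified, no spilling)
        if y in loc and not needed(y, i):
            L = loc[y]
        else:
            free = [r for r, held in reg.items() if held is None]
            if not free:
                raise RuntimeError("spilling is not covered in this sketch")
            L = free[0]
        # Step 2: bring y into L if it is not already there
        if where(y) != L:
            code.append(f"MOV {where(y)}, {L}")
        # Assumed step: apply the operation with the current location of z
        code.append(f"{OPS[op]} {where(z)}, {L}")
        # Step 4: registers holding operands that are no longer needed become free
        for name in (y, z):
            if name in loc and not needed(name, i):
                reg[loc.pop(name)] = None
        if x in loc and loc[x] != L:
            reg[loc[x]] = None                    # old copy of x is stale
        reg[L], loc[x] = x, L                     # L now holds x
    # At block end, store live names that exist only in registers
    for name in sorted(live_on_exit):
        if name in loc:
            code.append(f"MOV {loc[name]}, {name}")
    return code

block = [("t", "a", "-", "b"), ("u", "a", "-", "c"),
         ("v", "t", "+", "u"), ("d", "v", "+", "u")]
for line in gen_block(block, live_on_exit={"d"}):
    print(line)
```

With two registers this prints MOV a, R0; SUB b, R0; MOV a, R1; SUB c, R1; ADD R1, R0; ADD R1, R0; MOV R0, d. The register holding t is reused for v and then for d because t and v have no next use at those points.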
The table shows the code sequences generated for the indexed assignment statements
a := b[i] and a[i] := b
The table shows the code sequences generated for the pointer assignments
a := *p and *p := a
STATEMENT    CODE           COST
a := *p      MOV *Rp, a     2
*p := a      MOV a, *Rp     2
if x < y goto z        CMP x, y
                       CJ< z

x := y + z             MOV y, R0
if x < 0 goto z        ADD z, R0
                       MOV R0, x
                       CJ< z