Unit 5 - Compiler Design - WWW - Rgpvnotes.in
Unit 5 - Compiler Design - WWW - Rgpvnotes.in
Tech
Subject Name: Compiler Design
Subject Code: IT-603
Semester: 6th
Downloaded from www.rgpvnotes.in
Unit V
Syllabus: Code Optimization and Generation: organization of code optimizer, basic blocks and
flow graphs, DAG representation of basic blocks, loops in flow graph, peephole optimization,
Basic of block optimization.
Unit Outcome: Design the structures and support required for compiling advanced language
features.
The code become inefficient because of two factors; one is programmer and the other factor
is compiler. This is the responsibility of both programmer and the compiler to generate
efficient code using proper usage of target code.
Classification of Optimization:
The classification of optimization can be done in two categories:
Machine
Dependent
Code
Optimization
Machine
Independent
The code should be analyzed completely and use alternative equivalent sequence
of source code that will produce a minimum amount of target code.
Use appropriate program structure in order to improve the efficiency of target
code.
From the source program eliminate the unreachable code.
Move two or more identical computations at one place and make use of the result
instead of each time computing the expressions.
Compile Time
Evaluation
Common Sub
Loop Optimization Expression
Elimination
Basic Blocks:
The basic block is a sequence of consecutive statements in which flow of control enters at the
beginning and leaves at the end without halt or possibility of branching. Basic blocks are
important concepts from both code generation and optimization point of view.
Some Terminologies used in Basic Blocks
Define and use: The three address statement a: = b+c is said to define a and to use b
and c.
Live and dead: The name in the basic block is said to be live at a given point if its value
is used after that point in the program. And the name (variable) in the basic block is
said to be dead at a given point if its value is never used after that point in the program.
Algorithm for Partitioning into blocks:
Any given program can be partitioned into basic blocks by using following algorithm.
1. First determine the leaders by using following rules:
The first statement is a leader.
Any target statement of conditional or unconditional goto is a leader.
Any statement that immediately follow a goto or unconditional goto is a
leader.
2. The basic block is formed starting at the leader statement and ending just before the
next leader statement appearing.
The nodes to the control flow graph are represented by basic block.
The block whose leader is the first statement is called initial block.
There is a directed edge from block B1 to block B2 if B2 immediately follows B1 in the
given sequence. We can say that B1 is a predecessor if B2.
Loop:
Loop is a collection of nodes in the flow graph such that,
All such nodes are strongly connected. That means always there is a path from any
node to any other node within that loop.
The collection of nodes has unique entry. That means always there is only one path
from a node outside the loop to the node inside the loop.
The loop that contains no other loop is called inner loop.
Loops on Flow Graph:
A graph representation of three-address statements, called a flow graph, is useful for
understanding code-generation algorithms, even if the graph is not explicitly constructed by
a code-generation algorithm. Nodes in the flow graph represent computations, and the edges
represent the flow of control.
Dominators:
In a flow graph, a node d dominates node n, if every path from initial node of the flow graph
to n goes through d. This will be denoted by d dom n. Every initial node dominates all the
remaining nodes in the flow graph and the entry of a loop dominates all nodes in the loop.
Similarly every node dominates itself.
Example:
In the flow graph below, initial node, node1 dominates every node; node 2 dominates itself;
node 3 dominates all but 1 and 2; node 4 dominates all but 1, 2 and 3; node 5 and 6 dominates
only themselves, since flow of control can skip around either by going through the other; node
7 dominates 7, 8, 9 and 10; node 8 dominates 8,9 and 10; node 9 and 10 dominates only
themselves.
A loop must have a single entry point, called the header. This entry point-dominates
all nodes in the loop, or it would not be the sole entry to the loop.
There must be at least one way to iterate the loop(i.e.)at least one path back to the
header One way to find all the loops in a flow graph is to search for edges in the flow
graph whose heads dominate their tails. If a→b is an edge, b is the head and a is the
tail. These types of edges are called as back edges.
Inner loops:
If we use the natural loops as “the loops”, then we have the useful property that unless two
loops have the same header, they are either disjointed or one is entirely contained in the
other. Thus, neglecting loops with the same header for the moment, we have a natural notion
of inner loop: one that contains no other loop.
When two natural loops have the same header, but neither is nested within the other, they
are combined and treated as a single loop.
Pre-Headers:
Several transformations require us to move statements “before the header”. Therefore begin
treatment of a loop L by creating a new block, called the pre-header. The pre-header has only
the header as successor, and all edges which formerly entered the header of L from outside L
instead enter the pre-header. Edges from inside loop L to the header are not changed. Initially
the pre-header is empty, but transformations on L may place statements in it.
Figure 5.7: (a) Two loops with the same header (b) Introduction of the pre-header
Reducible flow graphs:
Reducible flow graphs are special flow graphs, for which several code optimization
transformations are especially easy to perform, loops are unambiguously defined, dominators
can be easily calculated, and data flow analysis problems can also be solved efficiently.
Exclusive use of structured flow-of-control statements such as if-then-else, while-do,
continue, and break statements produces programs whose flow graphs are always reducible.
The most important properties of reducible flow graphs are that
Leaf nodes are labeled by identifiers or variable names or constants. Generally, leaves
represent R-values.
Interior nodes store operator values.
Nodes are also optionally given a sequence of identifiers for a label.
The DAG and flow graphs are two different pictorial representations. Each node of the flow
graph can be represented by DAG because each node of the flow graph is a basic block.
Example:
Case (i) x: = y op z
Case (ii) x: = op y
Case (iii) x: = y
With the help of the following step, the DAG can be constructed.
Step 1: If y is undefined then create a node (y). Similarly, if z is undefined create a: node (z)
Step 2: For the case (i) create a node (op) whose left child is node (y) and node (z) will be) the
right child. Also, check for any common sub-expressions. For the case (ii) determine whether
is a node labeled op, such node will have a child node (y). In case (iii) node n win be node
(node(y).
Step 3: Delete x from list of identifiers for node (x). Append x to the list of attached identifiers
for node n found in 2.
Applications of DAG:
1. Determining the common sub-expressions (expressions computed more than once).
2. Determining which names are used inside the block and computed outside the block.
3. Determining which statements of the block could have their computed value outside the
block.
4. Simplifying the list of quadruples by elimination the common sub-expressions and not
eliminating expressions performing the assignment of form x: =y unless and until it is a must.
Optimization of Basic Blocks:
There are two types of basic block optimizations. They are:
Structure
Preserving
Transformations
Algebraic
Transformations
Common Sub-
expression Renaming of
Elimination Temporary Variables
Common sub expressions need not be computed over and over again. Instead they can be
computed once and kept in store from where it’s referenced.
Example:
a: =b+c
b: =a-d
c: =b+c
d: =a-d
The 3rd and 4th statements compute the same expression: b+c and a-d
Basic block can be transformed to
a: = b+c
b: = a-d
c: = a
d: = b
Dead Code Elimination:
It is possible that a large amount of dead (useless) code may exist in the program. This might
be especially caused when introducing variables and procedures as part of construction or
error-correction of a program - once declared and defined, one forgets to remove them in
case they serve no purpose. Eliminating these will definitely optimize the code.
Renaming of Temporary Variables:
A statement t:=b+c where t is a temporary name can be changed to u:=b+c where u is another
temporary name, and change all uses of t to u. In this a basic block is transformed to its
equivalent block called normal-form block.
Interchange of Two Independent Adjacent Statements:
Two statements
t1:=b+c
t2:=x+y
can be interchanged or reordered in its computation in the basic block when value of t1 does
not affect the value of t2.
Algebraic Transformations:
Algebraic identities represent another important class of optimizations on basic blocks. This
includes simplifying expressions or replacing expensive operation by cheaper ones i.e.
reduction in strength. Another class of related optimizations is constant folding. Here we
evaluate constant expressions at compile time and replace the constant expressions by their
values. Thus the expression 2*3.14 would be replaced by 6.28.
The relational operators <=, >=, <, >, + and = sometimes generate unexpected common sub
expressions. Associative laws may also be applied to expose common sub expressions.
Example:
x: =x+0 can be removed
x: =y**2 can be replaced by a cheaper statement x:=y*y
The compiler writer should examine the language specification carefully to determine what
rearrangements of computations are permitted, since computer arithmetic does not always
obey the algebraic identities of mathematics. Thus, a compiler may evaluate x*y-x*z as x*(y-
z) but it may not evaluate a+(b-c) as (a+b)-c.
Peephole Optimization:
A statement-by-statement code-generations strategy often produces target code that
contains redundant instructions and suboptimal constructs. The quality of such target code
can be improved by applying “optimizing” transformations to the target program.
A simple but effective technique for improving the target code is peephole optimization, a
method for trying to improving the performance of the target program by examining a short
sequence of target instructions (called the peephole) and replacing these instructions by a
shorter or faster sequence, whenever possible.
The peephole is a small, moving window on the target program. The code in the peephole
need not be contiguous, although some implementations do require this. It is characteristic
of peephole optimization that each improvement may spawn opportunities for additional
improvements.
Characteristics of Peephole Optimization:
Machine
Idioms
Reduction in
Strength
Algebric
Simplification
The Flow of
Control
Unreachable Optimization
Code
Redundant
Instruction
Elimination
We can eliminate the second instruction since x is in already R0. But if MOV x, R0 is a label
statement then we cannot remove it.
Unreachable Code:
Especially the redundant loads and stores can be eliminated in this type of
transformations.
An unlabeled instruction immediately following an unconditional jump may be
removed.
This operation can be repeated to eliminate the sequence of instructions.
Example:
#define debug 0
If(debug) {
Print debugging information
}
In the intermediate representation the if statement may be translated as if-
If debug=1 goto L1
goto L2
L1: print debugging information
L2:
Additionally, one obvious peephole optimization is to eliminate jumps over jumps. Thus no
matter what the value of debugging, can replaced by:
If debug goto L2debug≠1
Print debugging information
L2:
Now, since debug set to 0 at the beginning of the program, constant propagation program,
should replace by
Goto L2
…….
L1: goto L2
If there are no jumps to L1 then it may be possible to eliminate the statement L1: goto L2
provided it preceded by an unconditional jump. Similarly, the sequence
If a<b goto L1
……
L1: goto L2
Can replaced by
If a<b goto L2
……
L1: goto L2
Algebraic Simplification:
So the target instructions have equivalent machine instructions for performing some
have operations.
Hence we can replace these target instructions by equivalent machine instructions in
order to improve the efficiency.
Example: Some machines have auto-increment or auto-decrement addressing modes.
These modes can use in a code for a statement like i=i+1.