[go: up one dir, main page]

0% found this document useful (0 votes)
72 views19 pages

Unit-4 Chapter-2

This document provides an overview of compiler design topics including intermediate code generation, three address code, type checking, and control flow statements. It discusses variants of syntax trees such as directed acyclic graphs, the value-number method for constructing DAGs, types and declarations, translating expressions, static type checking, and generating intermediate code for procedures.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
72 views19 pages

Unit-4 Chapter-2

This document provides an overview of compiler design topics including intermediate code generation, three address code, type checking, and control flow statements. It discusses variants of syntax trees such as directed acyclic graphs, the value-number method for constructing DAGs, types and declarations, translating expressions, static type checking, and generating intermediate code for procedures.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 19

Department: Computer Science and Engineering Year/Sem: III B.

Tech-II Sem
Name of the Course: Compiler Design(15A05601) Regulation: R15

4 II Part
Unit-3-
Intermediate Code Generation: Variants of syntax trees, three address code, Types and
declarations, Translations of expressions, Type checking, control flow statements, backpatching,
switch statements, intermediate code for procedure.

1. Variants of Syntax Trees


1 Directed Acyclic Graphs for Expressions 2 the Value-Number Method for Constructing DAG's
Variants of Syntax Trees
1 Directed Acyclic Graph for Expressions
2 The Value-Number Method for Constructing DAG's
3 Exercises for Section 6.1
Nodes in a syntax tree represent constructs in the source program; the children of a node
represent the meaningful components of a construct. A directed acyclic graph (hereafter
called a DAG) for an expression identifies the common subexpressions (subexpressions that
occur more than once) of the expression. As we shall see in this section, DAG's can be
constructed by using the same techniques that construct syntax trees.
1. Directed Acyclic Graphs for Expressions
Like the syntax tree for an expression, a DAG has leaves corresponding to atomic operands
and interior codes corresponding to operators. The difference is that a node N in a DAG has more
than one parent if N represents a com-mon subexpression; in a syntax tree, the tree for the common
subexpression would be replicated as many times as the subexpression appears in the original
expression. Thus, a DAG not only represents expressions more succinctly, it gives the compiler
important clues regarding the generation of efficient code to evaluate the expressions.
The leaf for a has two parents, because a appears twice in the expression. More interestingly, the
two occurrences of the common sub expression b - c are represented by one node, the node labeled
—. That node has two parents, representing its two uses in the sub expressions a* (b - c) and (b -
c)*d . Even though b and c appear twice in the complete expression, their nodes each have one
parent, since both uses are in the common sub expression b - c.

SRINIVASA RAMANUJAN INSTITUTE OF TECHNOLOGY, ANATHAPURAMU


1
Example: - Figure shows the DAG for the expression

a + a * (b - c) + (b - c)* d

2. The Value-Number Method for Constructing DAG's

The nodes of a syntax tree or DAG are stored in an array of records, as suggested by Fig.. Each row of
the array represents one record, and therefore one node. In each record, the first field is an
operation code, indicating the label of the node. In Fig.(b), leaves have one additional field, which
holds the lexical value (either a symbol-table pointer or a constant, in this case), and interior nodes
have two additional fields indicating the left and right children.

In this array, we refer to nodes by giving the integer index of the record for that node within the
array. This integer historically has been called the value number for the node or for the expression
represented by the node. For instance, in Fig. 6.6, the node labeled -I- has value number 3, and its
left and right children have value numbers 1 and 2, respectively. In practice, we could use pointers to
records or references to objects instead of integer indexes, but we shall still refer to the reference to

SRINIVASA RAMANUJAN INSTITUTE OF TECHNOLOGY, ANATHAPURAMU


2
a node as its "value number." If stored in an appropriate data structure, value numbers help us
construct expression DAG's efficiently; the next algorithm shows how.

Type Checking:-
 A compiler has to do semantic checks in addition to syntactic checks.
 Type Checking is the process of verifying that each operation executed in a program respects
the type system of the language.
 Semantic Checks
 Static – done during compilation
 Dynamic – done during run-time
 A type system is a collection of rules for assigning type expressions to the parts of a program.
Static Type Checking:-
1. Type checks: A compiler should report an error if an operator is applied to an incompatible
operand.(Incompatible Operand)
2. Flow of control checks: Places for transfer of control should be specified.(break statement)
3. Uniqueness checks: An object must be defined exactly (like the type of an identifier, statements
inside a case/switch statement.) (Uniquely declared identifier)
4. Name Related checks: Same name must appear two or more times.

Uses of Type Checking:-


1) Assignments
2) Overloading
3) Polymorphism types
4) Data structures
Type Expression:-
 The type of a language construct is denoted by a type expression.
 A type expression can be:
o A basic type
 a primitive data type such as integer, real, char, boolean, …
 type-error to signal a type error
 void : no type
o A type name
 a name can be used to denote a type expression.
Type Expressions for Primitive Data Types and constructs:-

SRINIVASA RAMANUJAN INSTITUTE OF TECHNOLOGY, ANATHAPURAMU


3
 Arrays: If T is a type expression, then array(I,T) is a type expression where I denotes index
range. Ex: array(0..99,int)
 Products: If T1 and T2 are type expressions, then their Cartesian product T1 x T2 is a type
expression. Ex: int X int
 Pointers: If T is a type expression, then pointer(T) is a type expression. Ex: pointer(int)
 Functions: We may treat functions in a programming language as mapping from a domain
type D to a range type R. So, the type of a function can be denoted by the type expression
D→R where D are R type expressions. Ex: int→int represents the type of a function which
takes an int value as parameter, and its return type is also int.

Example for Type Expressions for Struct Type:-


struct stud
{
char name[10];
float marks;
}
struct stud student[10];
Here the type name is “stud” having the type expression as:
struct ((name X array(0,1,2…….9,char)) X (marks float))
array(0,1,……9,stud)
Example for function:-
int sum(int a ,char b)
The type expressions for sum is:-
int X char int.

A Simple Type Checking System for Declaration


P → D;E
D → D;D
D → id:T , addtype(id.entry, T.type) -
T → char , T.type=char -
T → int , T.type=int -
T → real , T.type=real -
T → ↑T1 { T.type=pointer(T1.type) }

SRINIVASA RAMANUJAN INSTITUTE OF TECHNOLOGY, ANATHAPURAMU


4
T → array*int num+ of T1 { T.type=array(1..intnum.val,T1.type) }
Type Checking of Expressions
E → id { E.type=lookup(id.entry) }
E → char literal { E.type=char }
E → int literal { E.type=int }
E → real literal { E.type=real }
E → E1 + E2 { if (E1.type=int and E2.type=int) then E.type=int
else if (E1.type=int and E2.type=real) then E.type=real
else if (E1.type=real and E2.type=int) then E.type=real
else if (E1.type=real and E2.type=real) then E.type=real
else E.type=type-error }
E → E1 [E2] { if (E2.type=int and E1.type=array(s,t)) then E.type=t
else E.type=type-error }
E → E1 ↑ { if (E1.type=pointer(t)) then E.type=t
else E.type=type-error }

Type Checking of Statements


S  id = E { if (id.type=E.type then S.type=void
else S.type=type-error }
S  if E then S1 { if (E.type=boolean then S.type=S1.type
else S.type=type-error }
S  while E do S1 { if (E.type=boolean then S.type=S1.type
else S.type=type-error }

Type Checking of Functions


E  E1 ( E2 ) { if (E2.type=s and E1.type=st) then E.type=t
else E.type=type-error }
Ex: int f(double x, char y) { ... }
f: double x char  int
argument types return type
Structural Equivalence of Type Expressions
SRINIVASA RAMANUJAN INSTITUTE OF TECHNOLOGY, ANATHAPURAMU
5
• How do we know that two type expressions are equal?
• As long as type expressions are built from basic types (no type names), we may use
structural equivalence between two type expressions

A type checker implements a type system

What is Three Address Code (TAC)?


 In compiler design, Three Address Code is a form of an intermediate code which is generated
by the compiler for the purpose of implementing code optimization.
 To represent any statement, Three Address Code Instructions uses maximum three
addresses.
 Thus, each Three Address Form instruction will either consist of one address or two
addresses or three addresses but not more than three addresses.
 The Implementation of Three Address Code is done as a record with the address fields.

Usage of Temporaries:-
Temporaries are used to hold intermediate values when evaluated expressions need to be stored in
the symbol-table.
 They can require a lot of space for their values.
 Moreover wasting space implies wasting time. Indeed they have to be loaded into registers,
written back, etc.
 For nowadays compilers, time performance is a crucial issue since space is almost illumined.
Temporaries can be re-used!
 There are a lot of different approaches to doing this and a lot of discussion about when
during compilation it should be done.

General form of Three Address Code-

SRINIVASA RAMANUJAN INSTITUTE OF TECHNOLOGY, ANATHAPURAMU


6
In general, we represent the Three Address Code as-
a = b op c
Here,
 a, b and c represent the operands which may be either constants, names, or temporaries
which the compiler generates.
 op represents the operator

Common Three Address Instruction Forms/ Types of Three Address


Code:-
Following are the common Three Address Instructions forms-

1. Assignment statement-

 x = y op z and x = op y
 Here, x, y and z are the operands and op represents the operator.
 The result obtained after solving the right side expression of the assignment operator is
being assigned to the left side operand.
2. Copy statement-
x=y
 Here, x and y are the operands and ‘=’ is an assignment operator.
 The value of operand y is being copied / assigned to the operand x.
3. Conditional jump-
If x relop y goto X
 Here, x & y are the operands, X is the tag or label of the target statement and relop is a
relational operator.
 If the condition “x relop y” gets satisfied on executing the statement, then the control is
directly sent to the location specified by the label X and all the statement in between them
are skipped.
 If the condition “x relop y” fails on executing the statement, then the next statement
appearing in the usual sequence is executed.
4. Unconditional jump-
goto X
 Here, X is the tag or label of the target statement.
 When the statement is executed, the control is directly sent to the location specified by the
label X without checking any condition and all the statements in between them are skipped.

SRINIVASA RAMANUJAN INSTITUTE OF TECHNOLOGY, ANATHAPURAMU


7
5. Procedure call-
param x call p return y
 Here, p is a function which takes x as a parameter and returns y.

Example for Generation of TAC:-


Write Three Address Code for the expression given below-
- (a x b) + (c + d) – (a + b + c + d)
Solution-
We can write Three Address Code for the expression given as-
(1) T1 = a x b
(2) T2 = uminus T1
(3) T3 = c + d
(4) T4 = T2 + T3
(5) T5 = a + b
(6) T6 = T3 + T5
(7) T7 = T4 – T6
Example2:-
Write Three Address Code for the expression given below-

If A < B then 1 else 0

Solution-
We can write Three Address Code for the expression given as-
(1) If (A < B) goto (4)
(2) T1 = 0
(3) goto (5)
(4) T1 = 1
(5)
Example4: Demonstration for Boolean Expressions:
Write Three Address Code for the expression given below-
If A < B and C < D then t = 1 else t = 0
Solution-
We can write Three Address Code for the expression given as-

SRINIVASA RAMANUJAN INSTITUTE OF TECHNOLOGY, ANATHAPURAMU


8
(1) If (A < B) goto (3)
(2) goto (4)
(3) If (C < D) goto (6)
(4) t = 0
(5) goto (7)
(6) t = 1
(7)

Demonstration for Boolean Expressions using Tree Diagram

SRINIVASA RAMANUJAN INSTITUTE OF TECHNOLOGY, ANATHAPURAMU


9
Example4:-
while a < b do
if c < d then
x: = y + z
else
x := y - z
Then the translation is

L1: if a < b goto L2


goto Lnext
L2: if c < d goto L3
goto L4
L3: t1 := y + z
x := t1
goto L1
L4: t2 := y - z
x := t2
goto L1
Lnext:
Example5:- To demonstrate for switch statement

SRINIVASA RAMANUJAN INSTITUTE OF TECHNOLOGY, ANATHAPURAMU


10
switch E
begin
case V1 : S1
case V2 : S2
case Vn-1 : Sn-1
default : Sn
end
This case statement is translated into intermediate code that has the following form: Translation of a
case statement
code to evaluate E into t goto test
L1 : code for S1 goto next
L2 : code for S2 goto next
Ln-1 : code for Sn-1 goto next
Ln : code for Sn goto next
test: if t = V1 goto L1 if t = V2 goto L2
if t = Vn-1 goto Ln-1 goto Ln
next:
Note: - where t is the name holding the value of the selector expression E, and Ln is the label for the
default statement.
Example to demonstrate Three Address Code for Procedures:-

Example1: - To demonstrate three address code (TAC) For-loop

SRINIVASA RAMANUJAN INSTITUTE OF TECHNOLOGY, ANATHAPURAMU


11
Example2: - To demonstrate three address code (TAC) For-loop

SRINIVASA RAMANUJAN INSTITUTE OF TECHNOLOGY, ANATHAPURAMU


12
Example:- To demonstrate for An Array
Write three address code for following code
for(i = 1; i<=10; i++)
{
a[i] = x * 5;
}

Implementation of Three Address Code/ Data Structures for Three Address


Code/Representation of Three Address Code:-
There are 3 representations of three address code namely
1. Quadruple
2. Triples
3. Indirect Triples
1. Quadruple –
It is structure with consist of 4 fields namely op, arg1, arg2 and result. Op denotes the operator and
arg1 and arg2 denotes the two operands and result is used to store the result of the expression.
Advantage –
 Easy to rearrange code for global optimization.
 One can quickly access value of temporary variables using symbol table.
Disadvantage –
 Contain lot of temporaries.
 Temporary variable creation increases time and space complexity.

Example:-
Consider expression a = b * – c + b * – c.
The three address code is:
SRINIVASA RAMANUJAN INSTITUTE OF TECHNOLOGY, ANATHAPURAMU
13
t1 = uminus c
t2 = t1 * b
t3 = uminus c
t4 = t3 * b
t5 = t2 + t4
a = t5

2. Triples: - Fields are corresponding to temporary results are not used and references to

instructions are available.


Temporaries are not used and instead references to instructions are made.
Disadvantage:-
 Temporaries are implicit and difficult to rearrange code.
 It is difficult to optimize because optimization involves moving intermediate code. When a
triple is moved, any other triple referring to it must be updated also. With help of pointer
one can directly access symbol table entry.
Example – Consider expression a = b * – c + b * – c

SRINIVASA RAMANUJAN INSTITUTE OF TECHNOLOGY, ANATHAPURAMU


14
3. Indirect Triples: - In addition to triples we use a list of pointers to triples.

This representation makes use of pointer to the listing of all references to computations which is
made separately and stored. It’s similar in utility as compared to quadruple representation but
requires less space than it. Temporaries are implicit and easier to rearrange code. In addition to
triples we use a list of triples of pointers to triples.
Example – Consider expression a = b * – c + b * – c

Flow of Control Statements:-


Flow of control statements may be converted to Three Address Code (TAC) by using the
following functions:-
 new_label ():- Returns a new symbolic label each time it is called.
 gen_code ():- “generates” the code (string) passed as a parameter to it.
The following are attributes associated with the non-terminals for the code generation:-
Code: - Code Consists of Generated Three Address Code.
True: - Contains the label to which a jump takes place if the Boolean Expression associated (if any)
evaluates to:”True”
False: - Contains the label to which a jump takes place if the Boolean Expression associated (if any)
evaluates to:”False”.
Begin: - Contains the label / address pointing to the beginning of the code chunk for the statement
“generated” (if any) by the non-terminal.
Next: - Contains the label/ address pointing to the end of the code chunk for the statement
“generated” (if any) by the non terminal.
Basic Concepts involved:
 The basic idea of converting any flow of control statements to the three address code is to
simulate the “branching” of flow of control using the goto statement.

SRINIVASA RAMANUJAN INSTITUTE OF TECHNOLOGY, ANATHAPURAMU


15
 This can be done by skipping to different parts of the code (label) to mimic the different
flow of control branches.
Example:-
S if E then S1
S if E then S1 else S2
S while E do S1

1) S if E then S1
Semantic Rules for the above statement is
E.true:== new_label ();
E.false:= S.next;
S1.next:= S.next;
S.code:= E.code || gen_code (E.true’ :’) || S1.code.
2) S if E then S1 else S2
E.true:== new_label ();
E.false:= new_label ();
S1.next:= S.next;
S2.next:= S.next;
S.code:= E.code||gen_code (E.true ‘:’) ||S1.code||gen_code (‘goto’ S.next) ||
gen_code (E.false‘:’) ||S2.code

SRINIVASA RAMANUJAN INSTITUTE OF TECHNOLOGY, ANATHAPURAMU


16
3) S while E do S1
S. begin:= new_label ();
E.true:=new_label ();
E.false:= S.next;
S1.next:=S.begin;
S.code:=
gen_code(S.begin’:’)||E.code||gen_code(E.true’:’)||S1.code||gen_code(‘goto’S.begin)

Backpatching:-
The syntax directed definition we discussed before can be implemented in two or more passes (we
have both synthesized attributes and inheriting attributes).
 Build the tree first
 Walk the tree in the depth first order.
The main difficulty with code generation in one pass is that we may not know the target of a branch
when we generate code for flow-of –control statements.
Backpatching is the technique to get around this problem.
 Generate branch instructions with empty targets
 When the target is known, fill in the label of the branch instructions (backpatching).
The problem in generating three address codes in a single pass is that we may not know the labels
that control must go to at the time jump statements are generated.
 So to get around this problem a series of branching statements with the targets of the jumps
temporarily left unspecified is generated.
 Back Patching is putting the address instead of labels when the proper label is determined.

SRINIVASA RAMANUJAN INSTITUTE OF TECHNOLOGY, ANATHAPURAMU


17
Back patching Algorithms perform three types of operations
1) makelist (i) – creates a new list containing only i, an index into the array of quadruples
and returns pointer to the list it has made.
2) Merge (i, j) – concatenates the lists pointed to by i and j, and returns a pointer to the
concatenated list.
3) Backpatch (p, i) – inserts i as the target label for each of the statements on the list
pointed to by p.
Back patching for Boolean Expressions
 We now construct a translation scheme suitable for generating code for Boolean expressions
during bottom-up parsing.
 A marker nonterminal M in the grammar causes a semantic action to pick up, at appropriate
times, the index of the next instruction to be generated.
Example: x<100 || x>200 && x! =y

The grammar is as follows Boolean expressions


SRINIVASA RAMANUJAN INSTITUTE OF TECHNOLOGY, ANATHAPURAMU
18
E->E1 or M E2
E->E1 and M E2
E-> not E1
E-> (E1)
E->id1 relop id2
E->true
E->false
M->

Production Semantic Rule


E->E1 or M E2 { backpatch(E1.falselist, M.quad);
E.truelist=merge(E1.truelist,E2.truelist); E.falselist =
E2.falselist;}

E->E1 and M E2 {backpatch(E1.truelist, M.quad);


E.truelist=E2.truelist;
E.falselist = merge(E1.truelist, E2.truelist);}
E-> not E1 {E.truelist = E1.falselist, E.falselist = E1.truelist;}
E-> ( E1 ) {E.truelist = E1.truelist;
E.falselist = E1.falselist;}
E->id1 relop id2 {E.truelist = makelist(nextquad);
E.falselist = makelist(nextquad+1);
emit(‘if’ id1.place relop.op id2.place ‘goto ???’);
emit (‘goto ???’);
}
E->true {E.truelist=makelist (nextquad); emit (‘goto???’) ;}
E->false {E.falselist=makelist (nextquad); emit (‘goto???’) ;}
M-> Epsilon {M.quad = nextquad;}

Advantage of Backpatching:- Reduce Jump Instructions. If it reduces jump instructions internally it


specifies that there is reduce in time complexity and space complexity.

References:-
https://slideplayer.com/slide/9936796/
https://www.geeksforgeeks.org/three-address-code-compiler/
https://www.brainkart.com/article/Variants-of-Syntax-Trees_8160/

SRINIVASA RAMANUJAN INSTITUTE OF TECHNOLOGY, ANATHAPURAMU


19

You might also like