Compiler Final Modified

CHAPTER 1
INTRODUCTION
Language translator: Language translators convert programming source code into language
that the computer processor understands. Programming source code has various structures
and commands, but computer processors only understand machine language. Different types
of translations must occur to turn programming source code into machine language, which is
made up of bits of binary data.
Types of language translators:
Compiler: Compiler is a language translator which translates the higher language program into
machine language. The Compiler transforms source code into executable program. The
output of the compilation has been called object code or sometimes an object module. The
object code is machine code that the processor can process or "execute" one instruction at a
time.
A compiler requires
1) Determining the correctness of the syntax of programs.
2) Generating correct and efficient object code.
3) Run-time organization.
4) Formatting output according to assembler and/or linker conventions.
Assembler: An assembler translates assembly language into machine language. Assembly

language is one step removed from machine language. It uses computer-specific commands
and structure similar to machine language, but assembly language uses names instead of
numbers. An assembler is similar to a compiler, but it is specific to translating programs
written in assembly language into machine language. To do this, the assembler takes basic
computer instructions from assembly language and converts them into a pattern of bits for the
computer processor to use to perform its operations.
Interpreter: Interpreter analyzes and executes each line of source code in succession, without
looking at the entire program. The advantage of interpreters is that they can execute a
program immediately. Compilers require some time before an executable program emerges.
However, programs produced by compilers run much faster than the same programs executed
by an interpreter. Interpreter is useful during software development while compiler is useful
after software development.
Structure of Compiler:
A compiler consists of three main parts: the frontend, the middle-end, and the backend.
Front End: It checks whether the program is correctly written in terms of the programming
language syntax and semantics. Here legal and illegal programs are recognized. Errors are
reported, if any, in a useful way. Type checking is also performed by collecting type
information. The frontend then generates an intermediate representation of the source code
for processing by the middle-end.
Middle End: It is where optimization takes place. Typical transformations for optimization are
removal of useless or unreachable code, discovery and propagation of constant values,
relocation of computation to a less frequently executed place (e.g., out of a loop), or
specialization of computation based on the context. The middle-end generates another
intermediate representation for the following backend. Most optimization efforts are focused
on this part.
Back End: It is responsible for translating the intermediate representation from the middle-end
into assembly code. The target instructions are chosen for each intermediate representation
instruction. Variables are also selected for the registers.
Phases of compiler: The compilation process operates as a sequence of phases, each of

which transforms one representation of the source program to another.
1) Lexical analysis: Lexical analysis groups the programs in the form of lexemes. For each
lexeme, the lexical analyzer produces as output tokens. Tokens are basic unit of program.
Lexical analysis identifies operators, operands, keywords, variables, constants. It does not
count blank spaces and comments in the program, one token most be completed in one line.
Lexical analysis is based on DFA.
2) Syntax Analysis: Syntax analysis analyzes the stream of tokens generated by lexical analysis
and checks whether it satisfies the rules of the grammar of the programming language. It
takes the first components of the tokens produced by the lexical analyzer to create a tree like
intermediate representation that depicts the grammatical structure of the token stream. A
typical representation of this is a syntax tree or parse tree.
3) Semantic Analysis: A semantic analyzer uses the syntax tree and the information in the
symbol table to check the source program for semantic consistency with the language
definition. It also gathers type information and saves it in either the syntax tree or the symbol
table, for subsequent use during intermediate-code generation. An important part of semantic
analysis is type checking, where the compiler checks that each operator has matching
operands.
4) Intermediate code generation: After semantic analysis, many compilers generate all
explicit low-level or machine-like intermediate representation, this intermediate
representation should have two properties:
• It should be easy to produce.
• If should be easy to translate into target machine code.
5) Code optimization: Code optimization phase attempts to improve the intermediate code
so that better target code will result. Usually, better means faster, but other objectives may be
desired, such as shorter code that consumer less power. This phase is optional.
6) Code Generation: The intermediate instructions are translated into sequences of machine
instructions that perform the same task. A crucial aspect of codes generation is the
assignment of registers to hold variables.
7) Symbol table manager: A symbol table is a data structure used by a language translator
such as a compiler or interpreter, where each identifier in a program's source code is
associated with information relating to its declaration or appearance in the source, such as its
type, scope level and sometimes its location.
Grouping Of Phases into Passes:

1) One- pass compiler: If compiler performs all the activities in one scan (pass), then it is
one-pass compiler. In this source code is scanned only once.
2) Two- pass compiler: In two-pass compiler the source code is scanned twice.In first pass-
lexical analysis, syntax analysis, semantic analysis and intermediate code generation are
performed. In the second pass code optimization and code generation are performed.
The first pass is machine independent while the second pass is highly machine dependent.
Two pass compiler is preferred over one-pass compiler because memory requirement of
one-pass compiler is very high.
CHAPTER 2
PARSING: Parsing is a process or techniques used to determine whether a given string can be
generated by a grammar.
Types of parsing:
Top-Down parsing: It is a technique to generate string staring from the start symbol.
Bottom-up parsing: It is a technique to reduce the string into start symbol. Grammar of
programming language must be unambiguous. Bottom up parsers are more powerful than top
down parser.
Top down parser: Top down parser are also known as predictions parser or LL parser
Where
L = Left to right scanning of string
L = left most derivation.
Generally, top down parser uses LL(1) grammar.
LL (1)
Left to right left most derivation.

scanning of string
Look ahead: The value of look ahead depends on the decision that how many inputs are to be
look ahead to generate the string.
LL(0): No input to be looked

LL(1): Current input has to be looked
LL(2): Current & next input has to be looked
For a grammar to be LL or LR, It must be unambiguous-

Ex. Find the look ahead for the grammar.
SàCC
CàcC
Càd
Sol: Consider the string

cdd.
The look ahead for this grammar is 0 i.e. it is LL(0) because c will always be the
first character that can be generated.
Both LL & LR grammars are the subset of unambiguous grammar.
Relationship between LL & LR grammar:

LL(0) ÍLL(1) ÍLL(2) Í ………………… ÍLL (k)
&LR (0) Í LR (1) Í LR(2) Í……………..ÍLR(k)
& Every LL(0) is LR(0)

but every LR(0) is not LL(0).
First: FIRST of a non-terminal is the set of terminals that can occur as the first token in a string
derivable from that non-terminal.
Rules to calculate first:

1) If x is a terminal than first (x) = {x}.
2) If x is a non –terminal & x à y1 y2…….yk is a production, than place a in first (x) if some
i,a is in First (yi) and є is in all of first (y1)………………..,first (y;-1).
3) If x à ε is a production, then add ε to First (x).
Ex. Find the first set for the grammar.

SàA/ B/ ε
AàaA/a
BàbB/b
Sol:
First (S) = {a, b, ε}
First (A) = {a }
First (B) = { b}
Follow: FOLLOW of a non-terminal is the set of terminals that can occur immediately after
that non-terminal in a string derivable from the start symbol of the grammar.
Rules to calculate follow:

1. Add $ to the follow of starting symbol
2. If there is some production of sort Aàa B b , whatever is in the first of b should be
added to the follow of b except ε.
3. If there is some production of sort Aà a B whatever is in the follow of A should be
added to the follow of B.
Ex. Find first follow set of the grammar

EàE+T/T
TàT*F/F
Fà(E)/id
Sol: First (E) = {(, id} Follow (E) = {$, +, )}
First (T) = {(, id} Follow (T) = {$, +, ), *}
First (F) = {(, id} Follow (F) = {$, +, ), *}
Left recursion: A grammar is said to be left recursive if it has a non-terminal A, Such that
A à A a/b
Where
a¹ε
b may be ε
Removal of left recursion:

Aàb A`
A`àa A` / ε
Ex. Remove left recursion from the grammar.

Eà E+ T/T
TàT*F/F
Fà(E) / id
Sol: Eà E +T/ T since this production has left recursion
a b
\ Eà TE`
E` à +TE`/ε
Tà T *F / F since this production also has left recursion

a b
\ T à FT`
T`à*FT`/ε
So the grammar becomes
EàTE`
F`à +TE`/ ε
TàFT`
T`à*FT`/ ε
F à (E)/id
Left Factoring: A grammar is said to be left factored if it has a non-terminal A such that,
Aàab/ag
Removal of left factoring:

AàaA`
A`àb/g
Ex. Remove Left factoring from the grammar

Sà abA / abB
Aà a
Bà b
Sol: Sà abS`
S`àA/B
Aàa
Bàb
Predictive parsing: A non-backtracking top-down parser is called predictive parser. A

predictive parser is characterized by its ability to choose the production to apply solely
on the basis of the next input symbol and the current non - terminal being processed.
Predictive parser can parse only LL (1) grammar.
For a grammar to be LL (1), these three conditions must be satisfied:

1. The grammar should be unambiguous.
2. There should be no left recursion.
3. There should be no left factoring.
These are the necessary but not sufficient conditions for a grammar to be in LL (1).
How to create predictive parsing table:
Calculate the first and follow set of the grammar.
Number of column’s = Number of terminals (except ε) in the grammar + one column for $
Number of rows = Number of non-terminals in the grammar.
Steps:
1) For each terminal a in first (A) add the production Aàa to M (A, a).
2) If ε is in first of (a) then for each terminal b in follow (A), add Aàa , to M [A ,b]. If ε is
in first (a) and $ is in follow (A), add Aàa to M[A,$]
If any cell contains more than one entry then the grammar is not LL(1).
Ex. Construct the predictive parsing table and find whether the grammar is LL(1) or not.
EàTE`
E`à +TE`/ε
TàFT`
T`à*FT`/ε
F à(E)/id
Sol:
First (E) = {(, id} Follow (E) = {$,)}
1
First (E ) = {+, ε} Follow (E`) = {$,)}
First (T) = {(, id} Follow (T) = {+, $, )}
First (T`) = {*, ε} Follow (T`) = {+, $,)}
First (F) = {(, id} Follow (F) = {*, +, $,)}
As there is no cell which has more than one entry.

\ This grammar is LL(1).
Cs 2003-Q.56 Cs2006-Q.58
Bottom up parser: A bottom-up parse starts with the string of terminals itself and builds
from the leaves upward, working backwards to the start symbol by applying the productions
in
reverse. Bottom-up parsing algorithms are more powerful than top-down methods.
Bottom up parsers are also known as shift reduce or LR parser.
Parsing through bottom up parser: Bottom up parser uses stack for parsing.They are
also known as shift reduce parser.
4 action performed by bottom up parser.
1) Shift: in shift the next input symbol is shifted on the top of stack.
2) Reduce: In reduce the parser reduce the context of stack by non- terminal.
3) Accept: in accept the parser announces successful completion of parsing.
4) Error: In error the parser discovers that a syntax error has occurred & call’s on error
recovery routine.
Ex. Consider the grammar

EàE +E
EàE * E
Eà id
And the string id + id * id perform bottom up parsing.
Stack Input Action

$ id+ id * id $ shift
$ id + id * id $ reduce (Eàid)
$ E+ id * id $ shift
$E+ id * id $ shift
$ E + id * id $ reduce (Eàid)
$E+E * id $ reduce (Eà E+ E)
$E * id $ shift
$E* id $ Shift
$ E * id $ reduce (Eàid)
$E*E $ reduce (EàE*E)
$E $ Accept
Bottom up parser are also known as LR parser when

R = Right most derivation in reverse order.
Types of LR parsers:
1) SLRà Simple LR
2) CLRàCanonical LR
3) LALRàLook ahead LR
CLR is the most powerful bottom up parsing technique.

CLR> LALR > SLR.
SLR Parsing: SLR parser can parse LR(0) grammars. It is based on DFA.
LR (0) Items: An LR(0) item of a grammar G is a production of G with a dot (.) at some
position in the right hand side.
Steps:
1) Augment the grammar: If G is a grammar with start symbol S then G` is the

augmented grammar for G with a new start symbol S1 & production.
S`àS
2) Make the states of DFA. i.e. LR(0) items.
I) In the first state (I0) put all the LR(0).

II) Shift the dot(.) of production rule in I0, To make new states. If after dot (.) there is
a non- terminal, then add all the productions which start with that non-terminal
Ex. Construct LR (0) items for the grammar
SàCC -----(1)
CàcC ----(2)
Càd ----(3)
Sol: Augmented grammar:
S`àS
SàCC
CàcC
Càd
Construction of states of DFA.
I0: S`à.S
Sà.CC
Cà.cC
Cà.d
I1: S`àS. //change state if dot shift at end
I2: SàC.C // . variable is applied and add 2 production start with variable
Cà.cC
Cà.d
D->.cD
I3: Càc.C c.D

Cà.cC
Cà.d
I4: Càd. //change state if dot shift at end
I5: SàCC.
I6: CàcC.
Construction of SLR parsing table: The SLR parsing table consist of two parts:
1) A Parsing-action function ACTION
2) A goto function GOTO.
Structure of SLR parsing table:

1) Calculate follow set of original grammar.
2) The action function takes as argument a state ‘i’ and a terminal ‘a’ (or $, the input end
marker). The value of ACTION[I, a] can have any of 4 forms: //for terminal
a) Shift j, where j is a state
b) Reduce Aàb
c) Accept
d) Error
3) We extend the GOTO function defined on sets of items, to states if GOTO [Ii, A] = Ij,
then GOTO also maps a state ‘I’ and a non terminal ‘A’ to state j. // for non-terminal
The codes for the actions are:
1) Si means shift a stack state i.
2) rj means reduce by the production numbered j.
3) acc means accept.
4) Blank means error.
Steps to construct parsing table:

1) Construct C = {I0, I1………..In}, the collection of sets of LR(0) items for G’.
2) State I is constructed from Ii. The parsing actions for state I are determined as follows:
a) If [ Aà a.ab] is in Ii and GOTO (Ii ,a) = Ij, then set ACTION [I, a] to “shift j”
b) If [Aàa.] is in Ii, Then set ACTION [I, a] to “Reduce Aà a” for all a in follow (A);
here A may not be S`.
c) If [ S`àS.] is in Ii, then set ACTION[ I, $ ] to “accept”
3) The GOTO transitions for state I are constructed using the rule: If GOTO (Ii, a) = Ij, then
GOTO [I, A] = j. If any cell of the table contains two entries, we say that the grammar is
not SLR (1) or LR(0).
SLR parsing table of previous Ex.
Follow (C) = {c, d, $}

Follow (S) = {$}
Only Shift-Reduce & Reduce-Reduce conflicts can occur in the parsing table.
CLR parsing: CLR parser can parse LR(1) grammars. It is also based on DFA.
LR(1) items: The general form of an LR(1) item is [Aàa.b, a] where Aà ab is a production
and ‘a’ is a terminal or the right endmarker $.It is an LR(1) item where 1 refers to the length
of the second component called the look ahead of the item.
How to calculate Look ahead: The look ahead of the augmented production is always $.
If A à a.B b is an LR(1) item with look ahead g i.e.
Aàa.Bb, g
Then the look ahead of the production starting with B is. First (bg) i.e.
Bà.aba, First (bg). // any production starting with B
Steps:
1) Augment the grammar. The look ahead of the augmented production rule is always $.
2) Make the states of the DFA i.e. LR(1) items.
a. In the first state (I0) put all the LR(1) items and calculate look ahead for them.
b. Shift the dot (.) of the production rule in I0 to make new states. If after dot (.) there is
a non-terminal, than add all the productions which starts with that non-terminal and
calculate look ahead for them.
Ex. Construct LR (1) Items for the grammar
Sà CC ---- (1)
CàcC ---- (2)
Càd ----(3)
Sol: Augmented grammar:

S ’à S
SàCC
CàcC
Càd
Construction of state of DFA.
I0: S`à.S, $ // look ahead for initial production is $

Sà.CC, $ // look ahead of S is first of ($)={$}
Cà.cC, c/d // look ahead of C is first of C,$ that is first of C ={c,d}
Cà.d, c/d
I1: S`àS., $ // look ahead is calculated , now shift dot as done in SLR parsing
I2: SàC.C, $ // shift the dot and write all production starting from C with look ahead
Cà.cC, $
Cà.d, $
I3: Càc.C, c/d

Cà.cC, c/d // the next item of this production is already before this production
Cà.d, c/d
I4: Càd., c/d. // the next item of this production is already calculated in I4
I5: SàCC., $
I6: Càc.C, $
Cà.cC, $
Cà.d, $
I7: Càd., $
I8: CàcC., c/d
I9: CàcC., $
Steps to construct parsing table:
1) Construct C= {I0, I1, ……………………In}, the collection of sets of LR(1) items for G’
(Augmented Grammar). //done above
2) State i of the parser is constructed from Ii. the parsing action for state I is determined as
follows.
a) If [Aàa. a b, b ] is in Ii and GOTO (Ii, a) = Ij then set ACTION [I, a] to “shift j”. //for
terminal in Action transition
b) If [Aàa., a ] is in Ii A¹S`, then set ACTION [i, a] to “reduce Aàa.” // dot at last
c) If[S`àS. $ ] Is in Ii, then set ACTION [i, $] to “ accept”
If any conflicting actions result from the above rules, we say the grammar is not
LR(1).
3) The goto transition for state ‘I’ for all non-terminals A are constructed using the rule If GOTO
[Ii ,A] = Ij then GOTO [i, A]= j. // for Variable in goto transition
CLR parsing table of the previous Ex.:

LALR Parsing: In CLR there are some states with same items but different lookahead. In
LALR, these states can be combined by combining lookahead’s.
If merging is not possible then LALR is same as CLR.
Only reduce-Reduce conflict can occur during merging, then the grammar is CLR (1) but not
LALR (1).
Number of states in SLR and LALR are equal, but in CLR it is grates than that of SLR or LALR.
All the procedure for constructing LR(1) items & parsing table are same as that of CLR only
some states are merged.
LALR parsing table of previous ex.

In this question states I3 & I6 have same items, states I4 & I7 have same items and states I8 & I9
have same items so, we combine these state then the parsing table becomes.
Cs-2008, Q.55 Cs-2003, Q.57 Cs-2005, Q.60 Cs-2006, Q.7, Q.56, Q.57, Cs-2007,
Q.52,Q.53 Cs-2010, Q.38.
YACC (Yet another Compiler-compiler): YACC provides a general tool for imposing
structure on the input to a computer program. The YACC user prepares a specification of the
input process; this includes rules describing the input structure, code to be invoked when
these rules are recognized, and a low-level routine to do the basic input.
YACC then generates a function to control the input process. This function, called a parser,
calls the user-supplied low-level input routine to pick up the basic items (called tokens) from
the input stream.
These tokens are organized according to the input structure rules, called grammar rules; when
one of these rules has been recognized, then user code supplied for this rule, an action, is
invoked; actions have the ability to return values and make use of the values of other actions.
YACC is available as a command on the UNIX system. If input is gives to it then it gives the
cods of its parsing. It creates the code for LALR parsing.
YACC resolve shift-reduce conflict in favour of shift and removes reduce entry
YACC does not generate any error where ambiguous grammar is given as input.
Cs 2003, Q. 83 (a), Q.83 (b)
Operator Grammar: A grammar is said to be operator grammar when there exist no null
production and there should be no consecutive non-terminal on the right hand side.
Ex: EàE+E
EàE*E
Eàid
Operator Precedence Grammar: A grammar is said to be Operator Precedence Grammar

if
1. It is an Operator Grammar.
2. There exists a unique precedence relationship between the operators/terminals.
Ex. EàE+T/T
TàT*F/F
Fàid
Viable prefix: Viable prefix is any prefix of right most derivation.
Ex. EàE+E
EàExE
Eàid
Sol: Rightmost derivation for string id + id * id
EàE+ E
EàE+ E * E
Eà E+ E * id
Eà E+ id * id
Eàid+ id * id
Viable prefix: E+E*, E + id*, E +
Handle: A handle of a string is a substring that matches the right side of a productions and
whose deduction to a non-terminal on the left side of the production represents one step
along the reverse of a right most derivation.
Cs 2005, Q.59
Syntax Directed translation: Syntax-directed translation refers to a method of compiler

implementation where the source language translation is completely driven by the parser.
In other words, the parsing process and parse trees are used to direct semantic analysis and the
translation of the source program. This can be a separate phase of a compiler or we can augment
our conventional grammar with information to control the semantic analysis and translation.
Such grammars are called attribute grammars.
With each production in a grammar, we give semantic rules or actions, which describe
how to compute the attribute values associated with each grammar symbol in a production. The
attribute value for a parse node may depend on information from its children nodes below or its
siblings and parent node above.
Ex.
Production semantic rule.
EàE+E E.val = E1.val + E2.val
EàE*E E.val = E1.val * E2.val
Eàid E.val = id.val
A grammar in which each production rule is assigned some semantic rule is called as attributed
grammar. Or an SDD without side effect is also called attribute grammar.
Types of attributed grammar:

1) Synthesized Attributed Grammar.
2) Inherited Attributed Grammar.(L)
Synthesized Attributed Grammar: Those syntax directed definition in which value of node
depends on the value of child is known as synthesized attributed grammar.
Ex. E.val = E1.val + E2.val
i.e.
Inherited attributed grammar: Those syntax directed definition in which the value of node
depends on the value of its parent or siblings is known as inherited attributed grammar.
Ex. E1. val = E. val + E2. Val
Every synthesized attributed grammar is inherited attributed grammar but every inherited
attributed grammar is not synthesized attributed grammar.
L-Attributed Grammar: L-attributed grammar is a special case of inherited attributed

grammar. A grammar is said to be L- attributed grammar if the value of a node depends on
child or parent or left sibling.
Ex. E2.val = E.val + E1.val
i.e.
• Every L-attributed grammar is synthesized attributed grammar but every synthesized

attributed grammar is not L-attributed grammar.
• S-attributed grammar is suitable for bottom up evaluation.
Annoted parse tree: A parse tree showing the value of its attribute is called annoted parse
tree. The process of computing the attributes values at the nodes is called annotating (or
decorating) of the parse tree.
Ex. EàE1+E2 E.val= E1.val +E2.val

EàE1*E2 E.val = E1.val * E2.val
Eàid E.val = id.val
and the string 2+3*4.
Construct annoted parse tree for the grammar.
Final value of E is 14 not 16.

To find the output of side effect perform DFS from left to right
Cs 2003 Q.58 Cs 2006 Q.59 Cs 2004 Q.45
Format of Intermediate Codes:

1) Postfix Expression.
2) Syntax tree
3) Quadruple
4) Triple
Syntax tree: A parse tree contains redundant information which can be eliminated thus
producing a more economical representation of the source program is called syntax or
abstract tree.
Ex. c= a+ b * d
Quadruple: A quadruple has four fields, which we call op, arg1, arg2 and result, the op field
contains an internal code for the operator.
Ex. C = a + b * d.
Sol.
Triple: A triple has only three fields which we call op, arg1 & arg2, using triples, we refer to
the result of an operation x op y by its position.
Ex. C = a + b * d.
Sol.
Basic Block: A basic block is a sequence of consecutive statements in which flow of control
enters at the beginning & leaves at the end without halt.
Basic block partition:

1) First determine the set of leaders i.e., the first instruction in some basic block.
(a) The first instruction is a leader.
(b) Any instruction that is the target of conditional or unconditional jump is a leader.
(c) Any instruction that immediately follows a conditional or unconditional jump is a
leader.
2) For each leader, its basic block consists of the leader and all instructions upto but not
including the next leader or the end of the intermediate program.
Ex. S= 0;
a =100;
L1: if (a <10)goto L2;
S=S+a;
a=a-10;
goto L1;
L2: print (“s”);
Sol: The leaders in the program are:

1) S= 0;
2) if a <10
3) S=S+ a
4) Print s
\ Basic blocks:
B1 = S= 0;
a =100;
B2 = if (a <10) goto L2;
B3 = S=S+ a;
a=a-10;
goto L1;
B4: print (“s”);
Control Flow Graph: Once the program is partitioned into basic block we represent the
flow of control between them by a flow graph. The nodes of the flow graph are the basic
blocks. There is an edge from block B to block C if and only if it is possible for the first
instruction in block C to immediately follow the last instruction in block B.
Ex. Construct control flow graph of previous Ex.
Code optimization Techniques:
Basic Block Optimization:

1) Elimination of Common Sub expression: Common Sub expressions can be detected by
noticing, as a new node M is about to be added, whether there is an existing node N with
the same children, in the same order and with the same operator. If so, N Computes the
same value as M and may be used in its place.
Ex. A = a+ c * d
K = c+ c * d
Sol: In this the expression c * d is common in both the statements.

\ We can write these statements as
t=c*d
A = a+ t
K=C+t
2) Strength Reduction: The transformation of replacing an expensive operation such as
multiplication, by a cheaper one such as addition is known as strength reduction.
Ex. a = a * 4
Sol: As we know that when we have to multiply a binary number by 2n we can do this by left
shifting the number by n bits.
\ a = a << 2
3) Dead code Elimination: The statements that compute values that never get used, are
dead codes. The compiler should remove that statement.
4) Code Motion: An important Modification that decreases the amount of cods in a loop is
code motion, this transformation takes an expression that yields the same result
independent of the number of times a loop is executed, and evaluates the expression
before the loop.
Ex. for ( i=1; i<10; ++i)
{
C = 20;
K= i * C;
}
Sol: In this every time loop runs C is initialized a value of 20.
\ This code can be written as
C= 20;
For (i = 1; i<10; ++i)
{
K= i * c;
}
5) Constant Folding: If the expression involves some constant then evaluate them at
compile time so that run time run time reduces.
Ex: area = (22/7) * r * r

Sol: In this expression 22/7 is a constant and it is evaluated at compile time.
6) Simplification of expression with algebraic laws:
Ex. X + 0 = 0 + X = X
X*1=1*X=X
X-0=X
X/1=X
a * b + a * c = a * (b + c)
i.e. if possible apply algebraic identities to reduce the time consuming operations.
Cs 2006, Q.60 Cs 2004, Q. 32
Loop optimization technique: Loop optimization is used to minimize the number of

cache misses and pipeline hazards.
1) Loop Fusion: If Possible merge the two loop so that cache misses are reduced and run
time be saved.
Ex. For (i=1; i<=10; ++i)
{
a [i] = a [i] +1;
}
For (i=1; i<=10; ++i)
{
b[i] = b[i] +1;
}
Sol: We can merge these two loops as
For (i=1; i<=10; ++i)
{
a[i] = a[i] +1;
b[i] = b[i] +1;
}
2) Loop Fission: It is just the reverse process of loop fusion. If possible then divide the
loops.
3) Loop interchange: If possible then change the loop so that cache misses can be reduced.
Ex. For ( i=1; i<=10; ++i)
For (j=1; j<=10; ++j)
a[j] [i] = 0
Sol: The above ex. accesses the away column wise resulting in more cache misses. So, that
above code can be written as
For ( i=1; i<=10; ++i)

For ( j=1; j<=10; ++j)
a[i] [j] = 0.
It accesses the array row wise, resulting in less cache misses.
4) Loop unrolling: Operations from one iteration can not overlap with those from another.
One simple but highly effective technique to mitigate this problem is to unroll the loop a
small number of times before code scheduling.
Ex. For ( i=0; i<=N; i++)
{
S(i);
}
Sol: The above code can be written as
For (i=0; i+4<N; i+ = 4)
{
S(i);
S(i+1);
S(i+2);
S(i+3);
}
Machine Dependent Optimization or Peep Hole Optimization: A simple but
effective technique for locally improving the target code is peephole optimization, which is done
by examining a sliding window of target instructions (called the peephole) and replacing
instruction sequences within the peephole by a shorter or faster sequence, whenever possible.
The code in the peephole need not be contiguous. It is characteristic of peephole optimization
that each improvement may spawn opportunities for additional improvements.
Characteristics of peephole Optimization:
1) Redundant- Instruction Elimination: It is of two types:

a) Eliminating Redundant Loads & Stores:
If we see the instruction sequence
LD a, R0
St R0, a
In a target program, we can delete the store instruction because whenever it is executed,
the first instruction will ensure that the value of a has already been loaded into R0.
Redundant loads and stores of this type- would not be generated by the simple
code generation algorithm.
b) Eliminating Unreachable Code: An unlabeled instruction immediately following an
unconditional jump may be removed. This operation can be repeated to eliminate a sequence
of instructions.
Ex. If debug = = 1 goto L1
goto L2
L1: print debugging information
L2:
Sol: It can also be written as
If debug != 1 goto L2
print debugging information.
2) Flow of Control Optimization: Some intermediate code generation algorithm

frequently produce jumps to jumps, jumps to conditional jumps or conditional jumps
to jumps. These unnecessary jumps can be eliminated, by the peephole optimizations.
Ex. goto L1
………..
L1: goto L2
it can be replaced by
goto L2
………….
L1: goto L2
Ex: goto L1
…………..
L1: if a <b goto L2
L3:
.………….
it can be replaced by
if a<b goto L2
goto L3
…………….
L3:
(3)Algebraic simplification and Reduction in strength: Algebraic identities and

reduction in strength transformations can be applied in the peephole optimization, these are
same as that in basic block optimization.
(4) Use of machine idioms: The target machine may have hardware instructions to
implement certain specific operations efficiently. Detecting situations that permit use of these
instructions can reduce execution time significantly. For example, some machines have auto-
increment and auto-decrement addressing modes. These add or subtract from an operand
before or after using its value. These modes can also be used in code for statements like
X= X+1
Storage Classes of variable in C:
1) Auto
2) Register
3) Static
4) Extern.
By default variable belongs to auto class.
1) Auto: If the class of variable is not mentioned then by default it belongs to auto class.
When function is called auto variables are created and destroyed when function call
terminate. By default they contain garbage value. They are crated in stack.
2) Static: Static variables once created exists throughout the life time of the program.
Declaration is executed only once when function is called for the first time, but other code
is executed many times. By default these variables are initialized to 0. They are created in
heap.
3) Register: Default value of register variable is garbage. They are created in stack and can
be used only for integer variable. Programmer requests compiler to assign register to
variables, if variable declared to belong to register class. Access to register variable is fast
because they are present in register and register access is faster than that of memory
access.
4) Extern: Extern variables are crated in heap and they use the global value of the variable
generally.
Ex. void add( ) void main( )

{ {
int a= 1; add( );
cout<<a; add( );
a = a+1; add( );
} }
Output = 111
Ex. void add( ) void main ( )

{ {
static int a=1; add( );
cout <<a; add( );
a = a+1; add( );
} }
Output= 1 2 3
Ex. void add( ) void main ( )

{ {
static int a; add( )
a=1; add( );
cout <<a; add( );
a = a+1;
} }
Output: 111
Ex. Program: P1 Program: P2

int *p( ) int *p( )
{ {
int a; int *a
return &a; a = malloc( );
} return a
}
void main() void main ( )
{ {
int *a; int *a;
a = p( ); a = p( );
} }
which is correct p1 or p2?
Sol: Implementation of p1 is incorrect because ‘a’ is a auto variable and crated in stack but it will
be destroyed as function terminates. Therefore ‘a’ contains the address of variable which has
been destroyed & hence result in runtime error.
Implementation of p2 is correct because although ‘a’ is destroyed but address is always
present in heap, hence prints the address of variable which exists.
Parameter passing Techniques:
1) Call by value
2) Call by reference
3) Call by need
4) Call by name
5) Call by values result
1) Call by value: In this technique actual parameters and formal parameters have different
memory locations i.e. they do not share memory. When any function is called the value of
actual parameters are copied into formal parameters.
Ex. void swap (int a, int b) void main( )

{ {
int c; int i=10,j=20;
c = a; swap (i,j);
a = b; cout<< i <<j;
b=c; }
}
Sol: i, j- actual parameters

a,b- formal parameters
When function is called
After function termination.
So, it does not swap the values of i, j and original value is printed.
2) Call by Reference: In this technique actual parameters and formal parameters refers to same
memory location i.e. they share memory. Formal parameter refers to the actual parameter’s
memory location. If constants (literals) or expressions are passed then, it will behave as call
by value. Name of actual & formal parameter can be same or different.
Ex. void swap ( int a, int b) void main( )

{ {
int c; int i=10, j=20;
c=a; swap (i, i+j);
a=b; cout << i << j;
b=c; }
}
Sol: When function is called
After function termination
\ The values printed are swapped.

IT2007 Q.33
3) Call by need: Call by need is like call by value. Actual formal parameter does not share
memory. In this when needed actual parameters value is copied into the formal parameter,
therefore it is also known as lazy evaluation technique.
Ex. int a=10; void p (int x)

void main( ) {
{ a=100
P (a); cout << x
cout << a; x ++;
} cout << x;
}
Sol:
The values printed
100, 101, 100
Logic programming languages supports call by need like prolog.
4) Call by value Result: It is like call by value. Actual & formal parameter does not share
memory locations.
Values of actual parameter is copied into formal parameter at the time of
function call and at the end of function call the value of formal parameter is copied into the
actual parameter.

void main( ) {
{ a=100
P (a); cout << x
cout << a; x ++;
} cout << x;
}
Sol:
So output is – 10, 11, 11

If formal parameters are expression then both call by reference & call by value result behave like
call by value.It is also known as call by result.
5) Call by Name: Macros in C are like inline function in c++. Call by Name is like macro
substitution where during substitution formal parameters are replaced by actual parameters.
Only ALGOL Supports call by Name.

void main( ) {
{ a=100;
a= 100; cout << x;
cout << a; x++;
a++; cout<<x;
cout << a; }
cout << a;
}
The output is 100, 101, 101
Ex. Given program

void swap(int a, int b) void main( )
{ {
int c; int k[ ] = {10, 20, 30};
c = a; int i= 0;
a = b; swap (i, k[i]);
b = c; }
}
Solve it by call by reference & call by name method.
Sol: Call by Reference
Output: 10, 0
Call by name
void main( )
{
int k[] = {10, 20, 30};
int i =0;
c = i;
i = k[i];
k[i] = c;
}
Output: c = 0
10 = k[0]
k[10] = 0
Call by reference & call by result may produce different answers.
Scoping Techniques: There are 2 techniques to decide scope of variable:

1) Statics scoping
2) Dynamic scoping
1) Static scoping: Function which declare variables within its body then that function use local
variable if no such variable is declared then function use global variable. Technique used to
decide scope of variable in C is known as static scoping.
Ex. int a= 10
Void P() void Q( ) void main( )
{ { {
int a = 100; cout <<a P( );
Q ( ); } Q ( );
cout <<a }
}
Output: 10, 100, 10

In this function P uses its local variable a = 100 while function Q uses global variable a= 10.
2) Dynamic Scoping: In dynamic scoping all variables are crated within stack whether they are
global or local. In stack global variables are stored first. When a function is called its local
variable is stored in stack and at the end of function call this variable is destroyed.
Ex. int a= 10
Void P() void Q( ) void main( )
{ { {
int a = 100; cout <<a P( );
Q ( ); } Q ( );
cout <<a }
}
Sol:
Output 100, 100, 10
In dynamic scoping, scope is decided at runtime. C does not uses dynamic scoping.
Call by name: In this for every formal parameter compiler creates a thunk which contains the
actual parameter corresponding to formal parameter. In this whenever needed thunk is
created for formal parameter.
Cs-2001, Q.2.17, 2.18, 2.19 Cs-2003, Q.73, 74 IT 2007, Q. 34
Parsing through bottom up parser:Bottom up parser uses stack for parsing. They are
also known as shift reduce parser.
4 action performed by bottom up parser.
5) Shift: in shift the next input symbol is shifted on the top of stack.
6) Reduce: In reduce the parser reduce the context of stack by non- terminal.
7) Accept: in accept the parser announces successful completion of parsing.
8) Error: In error the parser discovers that a syntax error has occurred & call’s on error
recovery routine
Ex. Consider the grammar
EàE +E
EàE x E
Eà id
And the string id +id x id perform bottom up parsing.
Stack Input Action
$ id+ id x id $ shift
$ id + id x id $ reduce (Eàid)
$ E + id x id $ shift
$E+ id x id $ shift
$ E + id x id $ reduce (Eàid
$E+E x id $ reduce (Eà E+ E)
$ E x id $ shift
$Ex id $ Shift
$ E x id $ reduce (Eàid)
$ExE $ reduce(EàExE)
$ E$ Accept
Bottom up parser axe also known as LR parser when

R = Right most derivation in reverse order.

Compiler Final Modified

Uploaded by

Document Informationclick to expand document information

Copyright:

Available Formats

Compiler Final Modified

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Compiler Final Modified

Uploaded by

Copyright:

Available Formats

CHAPTER 1

Types of language translators:

Assembler: An assembler translates assembly language into machine language. Assembly

Phases of compiler: The compilation process operates as a sequence of phases, each of

Grouping Of Phases into Passes:

Left to right left most derivation.

LL(0): No input to be looked

For a grammar to be LL or LR, It must be unambiguous-

Sol: Consider the string

Relationship between LL & LR grammar:

& Every LL(0) is LR(0)

Rules to calculate first:

Ex. Find the first set for the grammar.

Rules to calculate follow:

Ex. Find first follow set of the grammar

Removal of left recursion:

Ex. Remove left recursion from the grammar.

Tà T *F / F since this production also has left recursion

Removal of left factoring:

Ex. Remove Left factoring from the grammar

Predictive parsing: A non-backtracking top-down parser is called predictive parser. A

For a grammar to be LL (1), these three conditions must be satisfied:

How to create predictive parsing table:

Calculate the first and follow set of the grammar.

As there is no cell which has more than one entry.

Ex. Consider the grammar

Stack Input Action

Bottom up parser are also known as LR parser when

CLR is the most powerful bottom up parsing technique.

1) Augment the grammar: If G is a grammar with start symbol S then G` is the

2) Make the states of DFA. i.e. LR(0) items.

I) In the first state (I0) put all the LR(0).

Sol: Augmented grammar:

Construction of states of DFA.

I1: S`àS. //change state if dot shift at end

I3: Càc.C c.D

I4: Càd. //change state if dot shift at end

Structure of SLR parsing table:

Steps to construct parsing table:

Follow (C) = {c, d, $}

Sol: Augmented grammar:

I0: S`à.S, $ // look ahead for initial production is $

I3: Càc.C, c/d

I8: CàcC., c/d

CLR parsing table of the previous Ex.:

LALR parsing table of previous ex.

Operator Precedence Grammar: A grammar is said to be Operator Precedence Grammar

Syntax Directed translation: Syntax-directed translation refers to a method of compiler

Types of attributed grammar:

L-Attributed Grammar: L-attributed grammar is a special case of inherited attributed

• Every L-attributed grammar is synthesized attributed grammar but every synthesized

Ex. EàE1+E2 E.val= E1.val +E2.val

Final value of E is 14 not 16.

Format of Intermediate Codes:

Basic block partition:

Sol: The leaders in the program are:

Basic Block Optimization:

Sol: In this the expression c * d is common in both the statements.