CD Unit-2 (R20)
UNIT-II
Syntax Analysis
Syntax analysis is the second phase of the compiler design process, in which the given input string is checked against the rules and structure of the formal grammar. It analyses the syntactic structure and checks whether the given input follows the correct syntax of the programming language.
Syntax analysis comes after the lexical analysis phase. Its output is known as the parse tree or syntax tree. The parse tree is developed with the help of the pre-defined grammar of the language. The syntax analyser checks whether a given program fulfills the rules implied by a context-free grammar. If it does, the parser creates the parse tree of that source program; otherwise, it displays error messages.
The parser or syntactic analyzer obtains a string of tokens from the lexical analyzer and
verifies that the string can be generated by the grammar for the source language. It reports any syntax
errors in the program. It also recovers from commonly occurring errors so that it can continue
processing its input.
Error Recovery Strategies:
Panic mode recovery:
On discovering an error, the parser discards input symbols one at a time until a synchronizing token is found. The synchronizing tokens are usually delimiters, such as a semicolon or end. This method has the advantage of simplicity and does not go into an infinite loop. When multiple errors in the same statement are rare, it is quite useful.
Phrase level recovery:
On discovering an error, the parser performs local correction on the remaining input that allows it to continue. Example: insert a missing semicolon or delete an extraneous semicolon.
Error productions:
The parser is constructed using augmented grammar with error productions. If an error
production is used by the parser, appropriate error diagnostics can be generated to indicate the
erroneous constructs recognized by the input.
Global correction:
Given an incorrect input string x and grammar G, certain algorithms can be used to find a parse tree
for a string y, such that the number of insertions, deletions and changes of tokens is as small as
possible. However, these methods are in general too costly in terms of time and space
Context-Free Grammars
Context Free Grammars (CFG) can be classified on the basis of the following two properties: the number of derivation trees and the number of strings they generate.
Note: A recursive context-free grammar that contains no useless rules necessarily produces an
infinite language.
Non-Recursive Grammars
S->Aa
A->b|c
The language generated by the above grammar is :{ba, ca}, which is finite.
Context free grammar is a formal grammar which is used to generate all possible strings in a
given formal language.
G = (V, T, P, S)
Where,
V is a finite set of variables (non-terminals), T is a finite set of terminals, P is a finite set of productions of the form A → α, and S is a designated variable called the start symbol.
In CFG, the start symbol is used to derive the string. You can derive the string by repeatedly replacing a non-terminal with the right-hand side of a production, until all non-terminals have been replaced by terminal symbols.
Example:
Production rules:
S → aSa
S → bSb
S→c
Now check whether the string abbcbba can be derived from the given CFG.
By applying the productions S → aSa and S → bSb recursively, and finally applying the production S → c, we get the string abbcbba (S ⇒ aSa ⇒ abSba ⇒ abbSbba ⇒ abbcbba).
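The membership check above can be sketched in code (this example is mine, not part of the notes): since every derivation of this grammar peels a matching a or b off both ends of the string and bottoms out at c, a short recursive function suffices.

```python
# Sketch: membership test for the language of S -> aSa | bSb | c.
def derives(s: str) -> bool:
    """Return True if s can be derived from S in the grammar above."""
    if s == "c":                              # base case: S -> c
        return True
    if len(s) >= 3 and s[0] == s[-1] and s[0] in "ab":
        return derives(s[1:-1])               # S -> aSa or S -> bSb
    return False

print(derives("abbcbba"))  # True
print(derives("abca"))     # False
```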
Derivations
Derivation is a sequence of applications of production rules. It is used to obtain the input string through these production rules. During parsing we have to take two decisions: which non-terminal to replace, and which production rule to use for it. To decide which non-terminal to replace, we have two options:
Left-most Derivation
In the leftmost derivation, at each step the leftmost non-terminal in the sentential form is replaced using a production rule. So in leftmost derivation we effectively build the string from left to right.
Example:
Production rules:
1. S=S+S
2. S=S-S
3. S = a | b |c
Input:
a-b+c
S=S+S
S=S-S+S
S=a-S+S
S=a-b+S
S=a-b+c
Right-most Derivation
In the rightmost derivation, at each step the rightmost non-terminal in the sentential form is replaced using a production rule. So in rightmost derivation we effectively build the string from right to left.
Example:
Production rules:
1) S = S + S
2) S = S - S
3) S = a | b |c
Input:
a-b+c
S=S-S
S=S-S+S
S=S-S+c
S=S-b+c
S=a-b+c
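Both derivation orders can be illustrated with a small sketch (my own, with a helper name of my choosing) that replaces the leftmost or rightmost S in a sentential form:

```python
# Sketch: apply a chosen production to the leftmost or rightmost 'S'
# in a sentential form (grammar S -> S+S | S-S | a | b | c).
def apply_production(form: str, rhs: str, leftmost: bool = True) -> str:
    i = form.find("S") if leftmost else form.rfind("S")
    return form[:i] + rhs + form[i + 1:]

# Leftmost derivation of a-b+c (choices follow the notes)
form = "S"
for rhs in ["S+S", "S-S", "a", "b", "c"]:
    form = apply_production(form, rhs, leftmost=True)
print(form)  # a-b+c

# Rightmost derivation of a-b+c (choices follow the notes)
form = "S"
for rhs in ["S-S", "S+S", "c", "b", "a"]:
    form = apply_production(form, rhs, leftmost=False)
print(form)  # a-b+c
```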
Parse Trees
o Parse tree is the graphical representation of a derivation. Its nodes are symbols, which can be terminals or non-terminals.
o In parsing, the string is derived using the start symbol. The root of the parse tree is that start symbol.
o Parse tree follows the precedence of operators. The deepest sub-tree is traversed first, so the operator in a parent node has lower precedence than the operator in its sub-tree.
Example:
Production rules:
1. T= T + T | T * T
2. T = a|b|c
Input:
a*b+c
Steps 1–5: (parse-tree construction figures omitted; the tree is grown from the root by expanding T = T + T, then T = T * T in the left sub-tree, and finally the leaves a, b and c)
Ambiguity
A grammar is said to be ambiguous if there exists more than one leftmost derivation, more than one rightmost derivation, or more than one parse tree for some input string. If no such string exists, the grammar is called unambiguous.
Example:
1. S = aSb | SS
2. S=∈
For the string aabb, the above grammar generates two parse trees:
If a grammar has ambiguity then it is not good for compiler construction. No method can automatically detect and remove ambiguity in general, but you can often remove ambiguity by re-writing the whole grammar without ambiguity.
Context Free Grammars(CFGs) are classified based on:
Number of Derivation trees
Number of strings
Depending on the Number of Derivation trees, CFGs are sub-divided into 2 types:
Ambiguous grammars
Unambiguous grammars
Ambiguous grammar: A CFG is said to be ambiguous if there exists more than one derivation
tree for the given input string i.e., more than one LeftMost Derivation Tree (LMDT)
or RightMost Derivation Tree (RMDT).
Definition: G = (V, T, P, S) is a CFG that is said to be ambiguous if and only if there exists a string in T* that has more than one parse tree, where V is a finite set of variables, T is a finite set of terminals, P is a finite set of productions of the form A → α (where A is a variable and α ∈ (V ∪ T)*), and S is a designated variable called the start symbol.
For Example:
Let us consider this grammar: E -> E+E | id. We can create two parse trees for the string id+id+id, each arising from a different leftmost derivation. As a fuller example, consider the grammar:
E -> I
E -> E + E
E -> E * E
E -> (E)
I -> ε | 0 | 1 | … | 9
From the above grammar the string 3*2+5 can be derived in two ways:
I) First leftmost derivation:
E ⇒ E*E ⇒ I*E ⇒ 3*E ⇒ 3*E+E ⇒ 3*I+E ⇒ 3*2+E ⇒ 3*2+I ⇒ 3*2+5
II) Second leftmost derivation:
E ⇒ E+E ⇒ E*E+E ⇒ I*E+E ⇒ 3*E+E ⇒ 3*I+E ⇒ 3*2+E ⇒ 3*2+I ⇒ 3*2+5
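A brute-force count of parse trees (a sketch of mine, not from the notes) confirms the two derivations above: split the token list at each operator and multiply the counts of parse trees for the two sub-spans.

```python
from functools import lru_cache

# Sketch: count parse trees for E -> E+E | E*E | I (I a digit)
# over a token list, illustrating ambiguity.
def count_trees(tokens):
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):                      # number of trees for toks[i:j]
        if j - i == 1:
            return 1 if toks[i].isdigit() else 0
        total = 0
        for k in range(i + 1, j):         # try each operator as the root
            if toks[k] in "+*":
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(toks))

print(count_trees(["3", "*", "2", "+", "5"]))  # 2
```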
Following are some examples of ambiguous grammar:
S -> aS | Sa | ε
E-> E +E | E*E| id
A -> AA | (A) | a
S -> SS|AB , A -> Aa|a , B -> Bb|b
Whereas following grammars are unambiguous:
S -> (L) | a, L -> LS | S
S -> AA , A -> aA , A -> b
Inherently ambiguous Language: Let L be a Context Free Language (CFL). If every Context-Free Grammar G with Language L = L(G) is ambiguous, then L is said to be an inherently ambiguous language. Ambiguity is a property of grammars, not languages. An ambiguous grammar is unlikely to be useful for a programming language because two or more parse tree structures for the same string (program) imply two different meanings (executable programs) for that program. An inherently ambiguous language would be absolutely unsuitable as a programming language because we would not have any way of fixing a unique structure for all its programs. For example,
L = {a^n b^n c^m} ∪ {a^n b^m c^m}
Note: Ambiguity of a grammar is undecidable, i.e., there is no algorithm for detecting and removing the ambiguity of an arbitrary grammar, but we can often remove ambiguity by disambiguating the grammar, i.e., rewriting it so that there is only one derivation (and hence one parse tree) possible for each string of the language.
2) If both left and right recursion are not present in grammar, then is the grammar
unambiguous? Explain with an example.
Ans – No, the grammar can still be ambiguous. If both left and right recursion are present in a grammar, then the grammar is ambiguous, but the reverse is not always true.
Example -
S -> aB | ab
A -> AB | a
B -> Abb | b
In the above example, although neither left nor right recursion is present, for the string { ab } we can make more than one parse tree.
From this example we can see that even if both left and right recursion are absent from a grammar, the grammar can still be ambiguous.
3) State whether the grammar is ambiguous or not.
S -> SAB | Є
A -> AaB | a
B -> AS | b
Ans – The grammar is Ambiguous.
If we put
B -> AS in S -> SAB
Then we get S -> SAAS and the grammar clearly contains both left and right recursion. Hence
the grammar is ambiguous.
Left Recursion
A grammar is left-recursive if it has a non-terminal ‘A’ whose derivation contains ‘A’ itself as the left-most symbol. Left-recursive grammar is considered a problematic situation for top-down parsers. Top-down parsers start parsing from the start symbol, which is itself a non-terminal. So, when the parser encounters the same non-terminal in its derivation, it cannot judge when to stop expanding the left non-terminal and it goes into an infinite loop.
E → E + T | T
Comparing with A → Aα | β:
∴ A = E, α = +T, β = T
∴ A → Aα | β is changed to A → βA′ and A′ → αA′ | ε
∴ A → βA′ means E → TE′, and A′ → αA′ | ε means E′ → +TE′ | ε
T → T * F | F
Comparing with A → Aα | β:
∴ A = T, α = *F, β = F
∴ A → βA′ means T → FT′, and A′ → αA′ | ε means T′ → *FT′ | ε
The production F → (E) | id does not have any left recursion.
∴ Combining the results, we get
E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id
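The transformation A → Aα | β ⇒ A → βA′, A′ → αA′ | ε can be sketched as a small routine over productions represented as symbol lists (the representation and function name are my own; [] stands for ε):

```python
# Sketch: eliminate immediate left recursion for one non-terminal.
# A -> A a1 | ... | b1 | ...  becomes  A -> b1 A' | ...,  A' -> a1 A' | ... | ε
def eliminate_immediate_lr(head, prods):
    rec = [p[1:] for p in prods if p and p[0] == head]       # the alphas
    nonrec = [p for p in prods if not p or p[0] != head]     # the betas
    if not rec:                                              # nothing to do
        return {head: prods}
    new = head + "'"
    return {
        head: [beta + [new] for beta in nonrec],             # A  -> beta A'
        new: [alpha + [new] for alpha in rec] + [[]],        # A' -> alpha A' | ε
    }

g = eliminate_immediate_lr("E", [["E", "+", "T"], ["T"]])
print(g)  # {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], []]}
```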
Example 2 − Eliminate the left recursion from the following grammar.
S → a | ^ | (T)
T → T, S | S
Solution
We have immediate left recursion in the T-productions.
Comparing T → T, S | S with A → Aα | β, where A = T, α = ", S" and β = S, we get
T → ST′
T′ → , ST′ | ε
Solution
The productions after removing the left recursion will be
E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id
Solution
Eliminating immediate left recursion among all the productions, we obtain
E → TE′
E′ → (T)E′ | ε
T → FT′
T′ → (F)T′ | ε
F → id
Direct Recursion
For each rule which contains a left-recursive option,
A --> Aα | β
introduce a new nonterminal A' and rewrite the rule as
A --> β A'
A' --> α A' | ε
Thus the production:
E --> E + T | T
is left-recursive with "E" playing the role of A, "+ T" playing the role of α, and "T" playing the role of β.
Introducing the new nonterminal E', the production can be replaced by:
E --> T E'
E' --> + T E' | ε
Of course, there may be more than one left-recursive part on the right-hand side. The general rule is to replace
A --> A α1 | A α2 | ... | A αn | β1 | β2 | ... | βm
by
A --> β1 A' | β2 A' | ... | βm A'
A' --> α1 A' | α2 A' | ... | αn A' | ε
Note that this may change the "structure". For the expressions above, the original grammar is left-associative, while the non-left-recursive one is now right-associative.
Indirect Recursion
Step one describes a rule to eliminate direct left recursion from a production. To eliminate
left-recursion from an entire grammar may be more difficult because of indirect left-
recursion.
For example
A --> B x y | x
B --> C D
Left Factoring
Left factoring is applied when two or more alternatives of a production share a common prefix, which leaves a top-down parser unable to choose between them. A production of the form
A → αβ1 / αβ2 ——- (1)
is rewritten, using a new non-terminal A’, as
A → αA’ ——- (2)
A’ → β1 / β2 ——- (3)
Example:
Consider the grammar and eliminate the left recursion
E ⟶ E + T / T ——- (1)
T ⟶ T * F / F ——- (2)
First, let’s take production (1) and rewrite it using the rule A → Aα / β ⇒ A → βA’, A’ → αA’ / Є:
E ⟶ TE’ ——- (3)
E’ ⟶ +TE’ / Є ——- (4)
Productions 3 and 4 replace the given production (1). Similarly, for production (2):
T ⟶ FT’ ——- (5)
T’ ⟶ *FT’ / Є ——- (6)
Productions 5 and 6 replace the given production (2). The resulting grammar, free of left recursion, is:
E ⟶ TE’
E’ ⟶ +TE’ / Є
T ⟶ FT’
T’ ⟶ *FT’ / Є
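The left-factoring rule itself can be sketched as code (my own representation: productions as symbol lists, [] for ε; this handles a single shared first symbol rather than the longest common prefix):

```python
from collections import defaultdict

# Sketch: one step of left factoring.
# A -> a b1 | a b2  becomes  A -> a A',  A' -> b1 | b2
def left_factor_once(head, prods):
    groups = defaultdict(list)
    for p in prods:
        groups[p[0] if p else None].append(p)   # group by first symbol
    result = {head: []}
    for first, group in groups.items():
        if first is not None and len(group) > 1:
            new = head + "'"
            result[head].append([first, new])    # A  -> a A'
            result[new] = [p[1:] for p in group] # A' -> b1 | b2 (may be ε)
        else:
            result[head].extend(group)
    return result

# Classic dangling-prefix example: S -> iEtS | iEtSeS | a
g = left_factor_once("S", [["i", "E", "t", "S"], ["i", "E", "t", "S", "e", "S"], ["a"]])
print(g)  # {'S': [['i', "S'"], ['a']], "S'": [['E', 't', 'S'], ['E', 't', 'S', 'e', 'S']]}
```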
Syntax analysis or parsing is the process of analyzing the string of symbols using grammar rules.
Parsing is divided into two types: top-down parsing and bottom-up parsing. Parsing means breaking something down into its constituent parts, for example, breaking a sentence down to explain what its verb does.
For data, the compiler unit performs the sequential parsing operation in top-down and bottom-up
parsing after its tokenizer, lexer, or scanner has produced its tokens.
Top-down parsing means parsing the input and constructing the parse tree starting from the root and going down to the leaves. It uses leftmost derivation to build the parse tree. On the contrary, bottom-up parsing is based on reverse rightmost derivation. A top-down parser constructs the parse tree from a grammar that must be free from left recursion and ambiguity, and the grammar must also be left-factored.
Top-down parsing is a method of parsing the input string provided by the lexical analyzer. The top-
down parser parses the input string and then generates the parse tree for it.
Construction of the parse tree starts from the root node i.e. the start symbol of the grammar. Then
using leftmost derivation it derives a string that matches the input string.
In the top-down approach, construction of the parse tree starts from the root node and ends with the leaf nodes. The leaf nodes represent the terminals that match the terminals of the input string.
1. To derive the input string, first, a production in grammar with a start symbol is applied.
2. Now at each step, the parser has to identify which production rule of a non-terminal must be
applied in order to derive the input string.
3. The next step is to match the terminals in the production with the terminals of the input string.
In top-down parsing with backtracking, the parser attempts multiple rules or productions to find a match for the input string, backtracking at every step of the derivation. So, if the applied production does not produce the required input string, or does not match the needed prefix, the parser can undo that expansion and try another alternative.
Example1 − Consider the Grammar
S→aAd
A→bc|b
Make the parse tree for the string abd. Also, show the parse tree when backtracking is required because the wrong alternative is chosen.
Solution
The derivation for the string abd will be −
S ⇒ a A d ⇒ abd (Required String)
If bc is substituted in place of non-terminal A then the string obtained will be abcd.
S ⇒ a A d ⇒ abcd (Wrong Alternative)
Figure (a) represents S → aAd
Figure (b) represents an expansion of tree with production A → bc which gives string abcd which
does not match with string abd. So, it backtracks & chooses another alternative, i.e., A → b in figure
(c) which matches with abd.
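The backtracking behaviour can be sketched as a brute-force parser (mine, not the notes'): a nonterminal tries its alternatives in order, and a failed alternative simply yields nothing, which amounts to backtracking to the next alternative.

```python
# Sketch: top-down parsing with backtracking for S -> aAd, A -> bc | b.
# The parser tries A -> bc first; on failure it backtracks to A -> b.
GRAMMAR = {"S": [["a", "A", "d"]], "A": [["b", "c"], ["b"]]}

def parse(symbols, s, i):
    """Try to match symbols against s starting at index i; yield end positions."""
    if not symbols:
        yield i
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:                       # nonterminal: try each alternative
        for alt in GRAMMAR[first]:
            yield from parse(alt + rest, s, i)
    elif i < len(s) and s[i] == first:         # terminal: must match the input
        yield from parse(rest, s, i + 1)

def accepts(s):
    return any(end == len(s) for end in parse(["S"], s, 0))

print(accepts("abd"))   # True (A -> b, reached after backtracking from A -> bc)
print(accepts("abcd"))  # True (A -> bc)
print(accepts("abb"))   # False
```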
A parser that uses a set of recursive procedures to recognize its input with no backtracking is called a recursive descent parser.
For implementing a recursive descent parser for a grammar:
The grammar must not be left recursive
The grammar must be left factored, which means it should not have common prefixes for alternates
We need an implementation language that has a recursion facility
Left factoring Problems:
Eliminate Left recursion and then left factor the following grammar
rexpr → rexpr + rterm | rterm
rterm → rterm rfactor | rfactor
rfactor → rfactor* | rprimary
rprimary → a | b
Design:
It can be written along the same lines as the brute-force approach, that is, one procedure for one non-terminal.
The procedures need not return anything, because they need not try another alternative as in brute force.
Only on failure do we call some error routine to take care of the error.
One can make use of transition diagrams for designing a recursive descent parser, as used in the case of scanners.
As there is no non-determinism in the transition diagrams, the parser code follows directly from them (the code is omitted in the source). We can further improve the transition diagrams to obtain better parser code; for example, the transition diagram for E' can be transformed by collapsing its tail-recursive call into a loop. The code derived from these reduced transition diagrams runs faster than the code written for the original transition diagrams.
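A hand-written recursive descent parser for the transformed grammar E → TE′, E′ → +TE′ | ε, T → FT′, T′ → *FT′ | ε, F → (E) | id might look like the following sketch (the class layout and names are my own; one procedure per non-terminal):

```python
# Sketch: recursive descent parser for the left-recursion-free expression grammar.
class Parser:
    def __init__(self, tokens):
        self.toks = tokens + ["$"]   # '$' marks end of input
        self.pos = 0

    def look(self):
        return self.toks[self.pos]

    def match(self, t):
        if self.look() != t:
            raise SyntaxError(f"expected {t}, got {self.look()}")
        self.pos += 1

    def E(self):                     # E -> T E'
        self.T(); self.Eprime()

    def Eprime(self):                # E' -> + T E' | ε
        if self.look() == "+":
            self.match("+"); self.T(); self.Eprime()

    def T(self):                     # T -> F T'
        self.F(); self.Tprime()

    def Tprime(self):                # T' -> * F T' | ε
        if self.look() == "*":
            self.match("*"); self.F(); self.Tprime()

    def F(self):                     # F -> ( E ) | id
        if self.look() == "(":
            self.match("("); self.E(); self.match(")")
        else:
            self.match("id")

def rd_accepts(tokens):
    p = Parser(tokens)
    try:
        p.E()
        return p.look() == "$"       # all input must be consumed
    except SyntaxError:
        return False

print(rd_accepts(["id", "+", "id", "*", "id"]))  # True
print(rd_accepts(["id", "+", "*"]))              # False
```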
Advantage:
Easy to hand construct a recursive descent parser from an appropriate grammar
Limitations
A recursive descent parser encodes state information in its run-time stack, or call stack, and this may not be particularly efficient.
It uses recursive procedure calls to implement a stack abstraction.
It is very difficult to automate its construction.
One major drawback of recursive-descent parsing is that it can be implemented only in those languages which support recursive procedure calls, and it suffers from the problem of left recursion.
LL (1) Grammars
When we construct a predictive parsing table for some grammars, the table may have some multiply defined entries. This happens when the given grammar is left recursive or ambiguous.
A grammar whose predictive parsing table has no multiply defined entries is known as an LL(1) grammar.
LL(1) means a left-to-right scan of the input producing a leftmost derivation, using 1 symbol of lookahead (this can be extended to k symbols of lookahead).
Rather than constructing a table to check for LL(1) Grammar, we can check for it using the
following definition of LL(1) Grammar.
Definition:
A grammar G is LL(1) if and only if whenever A → α | β are two distinct productions of G then the
following three conditions hold.
For no terminal a do both α and β derive strings beginning with a
At most one of α and β can derive ε
If β⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A)
The parser needs to find a production to use for nonterminal N when it sees lookahead token t. To select which production to use, it suffices to have a table that has, as a key, a pair (N, t) and gives the number of a production to use.
Let's illustrate with an LL(1) parsing table for the expression grammar that we used earlier, which
looks like this.
1. E → T R
2. R → ε
3. R → + E
4. T → F S
5. S → ε
6. S → * T
7. F → n
8. F → ( E )
Parsing table D below does the job. Each row is labeled by a nonterminal and each column by a
lookahead token, or the special symbol $ that indicates the end of the input.
D(N, t) is the production to use to expand N when the lookahead is t. Blank entries mean syntax
error.
Table D
      n    +    *    (    )    $
E     1              1
R          3              2    2
T     4              4
S          5    6         5    5
F     7              8
With lookahead n, D(E, n) = 1, so E is replaced by T R:
E
/\
T R
Next, D(T, n) = 4, so T is replaced by F S:
E
/\
T R
/\
F S
Then D(F, n) = 7, so F is replaced by n, and the token n is matched:
E
/\
T R
/\
F S
|
n
The lookahead changes to * and D(S, *) = 6. Since production 5 is S → * T, the table tells us to
replace S by * T.
E
/\
T R
/\
F S
| /\
n * T
Now the lookahead is n, and D(T, n) = 4. After using production 4 (T → F S), the table will tell us to use production 7 (F → n), giving
E
/\
T R
/\
F S
| /\
n * T
/\
F S
|
n
The parse is almost finished. Since there are no more tokens, the lookahead is $. D(S, $) = 5 and
D(R, $) = 2, which says to replace S and R by ε.
E
/\
/ \
T R
/\ |
F S ε
| /\
n * T
/\
F S
| |
n ε
A stack-based approach
We use two stacks, called Matched and Todo. Stack Matched only holds tokens.
At any given point, the string that has been derived is m t, where m is the contents of the Matched stack (from bottom to top) and t is the contents of the Todo stack (from top to bottom).
The action column tells the production that is used or, when a token is moved to the Matched stack and removed from the input, a match action.
Predictive parsing can be performed using a pushdown stack, avoiding recursive calls.
Initially the stack holds just the start symbol of the grammar.
At each step a symbol X is popped from the stack:
o if X is a terminal symbol then it is matched with lookahead and lookahead is
advanced,
o if X is a nonterminal, then using lookahead and a parsing table (implementing the
FIRST sets) a production is chosen and its right hand side is pushed onto the stack.
This process goes on until the stack and the input string become empty. It is useful to have
an end_of_stack and an end_of_input symbols. We denote them both by $.
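The loop just described can be sketched for the numbered expression grammar and table D given earlier (the dictionaries below transcribe that grammar and table; the function itself is my own):

```python
# Sketch: stack-based predictive (LL(1)) parsing.
# Productions 1..8 of the expression grammar; [] denotes ε.
PROD = {
    1: ("E", ["T", "R"]), 2: ("R", []), 3: ("R", ["+", "E"]),
    4: ("T", ["F", "S"]), 5: ("S", []), 6: ("S", ["*", "T"]),
    7: ("F", ["n"]), 8: ("F", ["(", "E", ")"]),
}
# Parsing table D: (nonterminal, lookahead) -> production number.
D = {("E", "n"): 1, ("E", "("): 1, ("R", "+"): 3, ("R", ")"): 2, ("R", "$"): 2,
     ("T", "n"): 4, ("T", "("): 4, ("S", "+"): 5, ("S", "*"): 6, ("S", ")"): 5,
     ("S", "$"): 5, ("F", "n"): 7, ("F", "("): 8}

def predictive_parse(tokens):
    input_ = tokens + ["$"]               # '$' marks end of input
    stack = ["$", "E"]                    # '$' at bottom, start symbol on top
    i = 0
    while stack:
        X = stack.pop()
        a = input_[i]
        if X == a:                        # terminal (or $) on top: match it
            i += 1
        elif (X, a) in D:                 # nonterminal: expand via the table
            _, rhs = PROD[D[(X, a)]]
            stack.extend(reversed(rhs))   # push RHS so its first symbol is on top
        else:                             # empty table entry: syntax error
            return False
    return i == len(input_)

print(predictive_parse(["n", "+", "n", "*", "n"]))  # True
print(predictive_parse(["n", "+", "*"]))            # False
```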
Example. Consider the grammar
S → aAa | BAa | ε
A → cA | bA | ε
B → b
and its predictive parsing table:

      a     b     c     $
S     aAa   BAa         ε
A     ε     bA    cA
B           b
where the empty slots correspond to error-entries. Consider the parsing of the word w = bcba.
THE ALGORITHM.
We know that the predictive parser performs a leftmost derivation while parsing the given sentence.
The given sentence may be a valid or an invalid sentence with respect to the specified grammar.
An error is detected during predictive parsing when the terminal on top of the stack does not match the next input symbol, or when a nonterminal X is on top of the stack, a is the next input symbol, and the parsing table entry PT[X, a] is empty, i.e., there is no entry for X in the parsing table.
Our predictive parser works as follows, with X the symbol on top of the stack and a the current input symbol:
while (stack is not empty) {
if (X is a terminal or $)
{
if (X == a)
{
pop X
move input ptr forward to next symbol
}
else // X != a
{ // Error found }
}
else // X is a nonterminal
{