
COMPILER DESIGN

UNIT-II
Syntax Analysis

Syntax Analysis: The Role of the Parser, Context-Free Grammars, Derivations,


Parse Trees, Ambiguity, Left Recursion, Left Factoring, Top Down Parsing: Pre
Processing Steps of Top Down Parsing, Backtracking, Recursive Descent Parsing,
LL (1) Grammars, Non-recursive Predictive Parsing, Error Recovery in
Predictive Parsing.

Syntax Analysis

Syntax analysis is the second phase of the compiler design process, in which the given input string is
checked for conformance to the rules and structure of the formal grammar. It analyses the syntactical
structure and checks whether the given input is in the correct syntax of the programming language.
Syntax analysis comes after the lexical analysis phase and is also known as parsing; its output is
the parse tree (or syntax tree), developed with the help of the pre-defined grammar of the language.
The syntax analyser checks whether a given program fulfils the rules implied by a context-free
grammar. If it does, the parser creates the parse tree of that source program. Otherwise, it will
display error messages.

The Role of the Parser

The parser or syntactic analyzer obtains a string of tokens from the lexical analyzer and
verifies that the string can be generated by the grammar for the source language. It reports any syntax
errors in the program. It also recovers from commonly occurring errors so that it can continue
processing its input.

Dr.SGIET, Markapur, CSE Department


COMPILER DESIGN
1. It verifies the structure generated by the tokens based on the grammar.
2. It constructs the parse tree.
3. It reports the errors.
4. It performs error recovery.

Issues :

The parser cannot detect errors such as:


1. Variable re-declaration
2. Use of a variable before initialization
3. Data type mismatch for an operation.
The above issues are handled by the Semantic Analysis phase.

Syntax error handling :


Programs can contain errors at many different levels. For example :
1. Lexical, such as misspelling an identifier, keyword or operator.
2. Syntactic, such as an arithmetic expression with unbalanced parentheses.
3. Semantic, such as an operator applied to an incompatible operand.
4. Logical, such as an infinitely recursive call.

Functions of error handler :


1. It should report the presence of errors clearly and accurately.
2. It should recover from each error quickly enough to be able to detect subsequent errors.
3. It should not significantly slow down the processing of correct programs.

Error recovery strategies :


The different strategies that a parser uses to recover from a syntactic error are:
1. Panic mode
2. Phrase level
3. Error productions
4. Global correction

Panic mode recovery:

On discovering an error, the parser discards input symbols one at a time until a synchronizing token
is found. The synchronizing tokens are usually delimiters, such as

semicolon or end. It has the advantage of simplicity and does not go into an infinite loop. When
multiple errors in the same statement are rare, this method is quite useful.
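The skipping loop at the heart of panic mode can be sketched as follows; the token list, error position, and set of synchronizing tokens are illustrative assumptions, not part of the notes:

```python
# Panic-mode recovery sketch: on a syntax error, discard input symbols
# one at a time until a synchronizing token (here ';' or 'end') is seen.
SYNC_TOKENS = {";", "end"}   # assumed delimiters for this sketch

def panic_mode_recover(tokens, pos):
    """Return the position just after the next synchronizing token."""
    while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
        pos += 1                         # discard one input symbol
    return min(pos + 1, len(tokens))     # resume after the delimiter

# An error is detected at position 1; parsing resumes after the ';'.
tokens = ["x", "@", "+", "y", ";", "z", "=", "1"]
print(panic_mode_recover(tokens, 1))     # → 5, the position of "z"
```

Because the loop only ever moves forward through the input, it cannot go into an infinite loop, which is exactly the advantage claimed above.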

Phrase level recovery:

On discovering an error, the parser performs local correction on the remaining input that
allows it to continue. Example: Insert a missing semicolon or delete an extraneous semicolon etc.

Error productions:

The parser is constructed using augmented grammar with error productions. If an error
production is used by the parser, appropriate error diagnostics can be generated to indicate the
erroneous constructs recognized by the input.

Global correction:

Given an incorrect input string x and grammar G, certain algorithms can be used to find a parse tree
for a string y, such that the number of insertions, deletions and changes of tokens is as small as
possible. However, these methods are in general too costly in terms of time and space.

Context-Free Grammars

Context Free Grammars (CFG) can be classified on the basis of following two properties:

1) Based on number of strings it generates.


 If CFG is generating finite number of strings, then CFG is Non-Recursive (or the grammar is
said to be Non-recursive grammar)
 If CFG can generate infinite number of strings then the grammar is said to
be Recursive grammar
During Compilation, the parser uses the grammar of the language to make a parse tree(or
derivation tree) out of the source code.
The grammar used must be unambiguous. An ambiguous grammar must not be used for parsing.

2) Based on number of derivation trees.


 If there is only one derivation tree, then the CFG is unambiguous.
 If there is more than one leftmost derivation, rightmost derivation or parse tree, then the
CFG is ambiguous.

Examples of Recursive and Non-Recursive Grammars

Recursive Grammars
1) S->SaS
S->b
The language(set of strings) generated by the above grammar is :{b, bab, babab,…}, which is
infinite.
2) S-> Aa
A->Ab|c

The language generated by the above grammar is :{ca, cba, cbba …}, which is infinite.

Note: A recursive context-free grammar that contains no useless rules necessarily produces an
infinite language.

Non-Recursive Grammars
S->Aa
A->b|c
The language generated by the above grammar is :{ba, ca}, which is finite.

Context free grammar is a formal grammar which is used to generate all possible strings in a
given formal language.

Context free grammar G can be defined by four tuples as:

G= (V, T, P, S)

Where,

G describes the grammar

T describes a finite set of terminal symbols.

V describes a finite set of non-terminal symbols.

P describes a set of production rules

S is the start symbol.

In CFG, the start symbol is used to derive the string. You can derive the string by repeatedly replacing
a non-terminal with the right-hand side of a production, until all non-terminals have been replaced by
terminal symbols.

Example:

L = { wcwR | w ∈ (a, b)* }

Production rules:

S → aSa
S → bSb
S→c

Now check that abbcbba string can be derived from the given CFG.

S ⇒ aSa
⇒ abSba
⇒ abbSbba
⇒ abbcbba

By applying the production S → aSa, S → bSb recursively and finally applying the production S →
c, we get the string abbcbba.
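The same derivation logic can be turned into a tiny recursive recognizer for this language; the sketch below is an illustration, not part of the original notes:

```python
# Recursive recognizer mirroring S -> aSa | bSb | c for L = { w c w^R }.
def derive_S(s):
    """Return True if the string s can be derived from S."""
    if s == "c":                                  # S -> c
        return True
    if len(s) >= 3 and s[0] == s[-1] and s[0] in "ab":
        return derive_S(s[1:-1])                  # S -> aSa or S -> bSb
    return False

print(derive_S("abbcbba"))   # → True, matching the derivation above
print(derive_S("abcab"))     # → False
```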

Derivations

A derivation is a sequence of production-rule applications used to obtain the input string from the
start symbol. During parsing we have to take two decisions. These are as follows:

o We have to decide the non-terminal which is to be replaced.


o We have to decide the production rule by which the non-terminal will be replaced.

We have two options to decide which non-terminal to be replaced with production rule.

Left-most Derivation

In the leftmost derivation, at each step the leftmost non-terminal in the sentential form is replaced
using a production rule, so the string is rewritten from left to right.

Example:

Production rules:

1. S=S+S
2. S=S-S
3. S = a | b |c

Input:

a-b+c

The left-most derivation is:

S=S+S
S=S-S+S
S=a-S+S
S=a-b+S

S=a-b+c

Right-most Derivation

In the rightmost derivation, at each step the rightmost non-terminal in the sentential form is replaced
using a production rule, so the string is rewritten from right to left.

Example:

Production rules:

1) S = S + S
2) S = S - S
3) S = a | b |c

Input:

a-b+c

The right-most derivation is:

S=S-S
S=S-S+S
S=S-S+c
S=S-b+c
S=a-b+c

Parse Trees

o Parse tree is the graphical representation of symbol. The symbol can be terminal or non-
terminal.
o In parsing, the string is derived using the start symbol. The root of the parse tree is that start
symbol.
o It is the graphical representation of symbol that can be terminals or non-terminals.
o Parse tree follows the precedence of operators. The deepest sub-tree traversed first. So, the
operator in the parent node has less precedence over the operator in the sub-tree.

The parse tree follows these points:

o All leaf nodes have to be terminals.
o All interior nodes have to be non-terminals.
o In-order traversal gives original input string.
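As a small illustration of the last point, here is a hypothetical tuple encoding of the parse tree for a-b+c (the derivation example from the previous section); reading its leaves in order recovers the input string. The encoding and helper are illustrative, not from the notes:

```python
# Sketch: reading a parse tree's leaves left to right recovers the input.
# A node is modeled as (symbol, children); a leaf has no children.
def leaves(node):
    sym, children = node
    if not children:            # leaf: a terminal symbol
        return sym
    return "".join(leaves(c) for c in children)

# Parse tree for a-b+c under S -> S+S | S-S | a | b | c:
tree = ("S", [("S", [("S", [("a", [])]),
                     ("-", []),
                     ("S", [("b", [])])]),
              ("+", []),
              ("S", [("c", [])])])
print(leaves(tree))   # → a-b+c
```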

Example:

Production rules:

1. T= T + T | T * T
2. T = a|b|c

Input:

a*b+c

The parse tree is built top-down in five steps (the step-by-step figures are not reproduced in this
copy), one step per production applied in the leftmost derivation:

T ⇒ T + T
⇒ T * T + T
⇒ a * T + T
⇒ a * b + T
⇒ a * b + c

Ambiguity

A grammar is said to be ambiguous if there exists more than one leftmost derivation, more than one
rightmost derivation, or more than one parse tree for some input string. If the grammar is not
ambiguous then it is called unambiguous.

Example:

1. S → aSb | SS
2. S → ε

For the string aabb, the above grammar generates two parse trees:


If the grammar has ambiguity then it is not good for compiler construction. No method can
automatically detect and remove ambiguity in general, but you can remove ambiguity by re-writing the
whole grammar without ambiguity.


Ambiguous grammar: A CFG is said to be ambiguous if there exists more than one derivation
tree for the given input string i.e., more than one LeftMost Derivation Tree (LMDT)
or RightMost Derivation Tree (RMDT).

Definition: A CFG G = (V, T, P, S) is said to be ambiguous if and only if there exists a string
in T* that has more than one parse tree, where V is a finite set of variables, T is a finite set of
terminals, P is a finite set of productions of the form A -> α (where A is a variable and α ∈ (V ∪
T)*), and S is a designated variable called the start symbol.
For Example:
Let us consider this grammar: E -> E+E | id. We can create two parse trees from this grammar to
obtain the string id+id+id. The following are the two parse trees generated by leftmost derivation:

Both the above parse trees are derived from the same grammar rules, but the parse trees are
different. Hence the grammar is ambiguous.

2. Let us now consider the following grammar:

Set of alphabets ∑ = {0,…,9, +, *, (, )}

E -> I
E -> E + E
E -> E * E
E -> (E)
I -> ε | 0 | 1 | … | 9
From the above grammar String 3*2+5 can be derived in 2 ways:
I) First leftmost derivation:
E ⇒ E * E
⇒ I * E
⇒ 3 * E
⇒ 3 * E + E
⇒ 3 * I + E
⇒ 3 * 2 + E
⇒ 3 * 2 + I
⇒ 3 * 2 + 5

II) Second leftmost derivation:
E ⇒ E + E
⇒ E * E + E
⇒ I * E + E
⇒ 3 * E + E
⇒ 3 * I + E
⇒ 3 * 2 + E
⇒ 3 * 2 + I
⇒ 3 * 2 + 5
Following are some examples of ambiguous grammar:
 S-> aS |Sa| Є
 E-> E +E | E*E| id
 A -> AA | (A) | a
 S -> SS|AB , A -> Aa|a , B -> Bb|b
Whereas following grammars are unambiguous:
 S -> (L) | a, L -> LS | S
 S -> AA , A -> aA , A -> b

Inherently ambiguous Language: Let L be a Context Free Language (CFL). If every Context-
Free Grammar G with Language L = L(G) is ambiguous, then L is said to be inherently
ambiguous Language. Ambiguity is a property of grammar not languages. Ambiguous grammar is
unlikely to be useful for a programming language because two parse tree structures(or more) for
the same string(program) imply two different meanings (executable programs) for the program.
An inherently ambiguous language would be absolutely unsuitable as a programming language
because we would not have any way of fixing a unique structure for all its programs. For example,
L = {aⁿbⁿcᵐ} ∪ {aⁿbᵐcᵐ}
Note: Ambiguity of a grammar is undecidable, i.e. there is no general algorithm for removing the
ambiguity of a grammar, but we can often remove ambiguity by disambiguating the grammar, i.e.,
rewriting the grammar such that there is only one derivation or parse tree possible for any string of

the language which the grammar represents.

1) How to find out whether grammar is ambiguous or not?


Ans:- If we can directly or indirectly observe both left and right recursion in the grammar, then the
grammar is ambiguous.
Example - S -> SaS | Є
In this grammar we can see both left and right recursion, so the grammar is ambiguous:
we can make more than one parse tree/derivation tree for an input string (say, {aa}).


2) If both left and right recursion are not present in grammar, then is the grammar
unambiguous? Explain with an example.
Ans– No, the grammar can still be ambiguous. If both left and right recursion are present in
grammar, then the grammar is ambiguous, but the reverse is not always true.
Example -
S -> aB | ab
A -> AB | a
B -> Abb | b
In the above example, although left and right recursion are not both present, if we consider the
string { ab }, we can make more than one parse tree to generate it.


From the above example, we can see that even if both left and right recursion are not present in
grammar, the grammar can be ambiguous.
3) State whether the grammar is ambiguous or not.
S -> SAB | Є
A -> AaB | a
B -> AS | b
Ans – The grammar is Ambiguous.
If we put
B -> AS in S -> SAB
Then we get S -> SAAS and the grammar clearly contains both left and right recursion. Hence
the grammar is ambiguous.

Left Recursion

A grammar becomes left-recursive if it has any non-terminal ‘A’ whose derivation contains ‘A’
itself as the left-most symbol. Left-recursive grammar is considered to be a problematic situation for
top-down parsers. Top-down parsers start parsing from the Start symbol, which in itself is non-
terminal. So, when the parser encounters the same non-terminal in its derivation, it becomes hard
for it to judge when to stop parsing the left non-terminal and it goes into an infinite loop.

A Grammar G (V, T, P, S) is left recursive if it has a production in the form.


A → Aα | β
The above grammar is left recursive because the non-terminal on the left of the production occurs at
the first position on the right side of the production. The left recursion can be eliminated by replacing
the pair of productions with
A → βA′
A′ → αA′ | ϵ

Elimination of left Recursion


We eliminate left-recursion in three steps.

 eliminate ɛ-productions (after this step, the grammar cannot generate ɛ)


 eliminate cycles (A ⇒+ A)
 eliminate left-recursion

Elimination of Left Recursion


Left recursion of the form A → Aα is also called Immediate Left Recursion, and it can be
eliminated by introducing a new non-terminal A′, as shown in the general form below.

In a left-recursive grammar, expansion of A generates Aα, Aαα, Aααα, … at each step, causing the
parser to enter an infinite loop.

The general form for left recursion is


A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
can be replaced by
A → β1A′ | β2A′ | … | βnA′
A′ → α1A′ | α2A′ | … | αmA′ | ε
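The general rule above can be sketched as a small routine; the representation of productions (lists of symbols, with "eps" standing for ε) and the primed-name convention are assumptions made for illustration:

```python
# Sketch of immediate left-recursion elimination:
#   A -> A a1 | ... | A am | b1 | ... | bn
# becomes
#   A  -> b1 A' | ... | bn A'
#   A' -> a1 A' | ... | am A' | eps
def eliminate_left_recursion(A, productions):
    """productions: list of right-hand sides, each a list of symbols."""
    alphas = [rhs[1:] for rhs in productions if rhs and rhs[0] == A]
    betas = [rhs for rhs in productions if not rhs or rhs[0] != A]
    if not alphas:
        return {A: productions}            # no immediate left recursion
    A2 = A + "'"                           # the new non-terminal A'
    return {
        A: [beta + [A2] for beta in betas],
        A2: [alpha + [A2] for alpha in alphas] + [["eps"]],
    }

# E -> E + T | T  becomes  E -> T E'  and  E' -> + T E' | eps
print(eliminate_left_recursion("E", [["E", "+", "T"], ["T"]]))
```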
Example1 − Consider the Left Recursion from the Grammar.
E → E + T|T
T → T * F|F
F → (E)|id
Eliminate immediate left recursion from the Grammar.
Solution
Comparing E → E + T|T with A → A α |β

E → E +T | T

A → Aα | β
∴ A = E, α = +T, β = T
∴ A → Aα | β is changed to A → βA′ and A′ → αA′ | ε
∴ A → βA′ means E → TE′

A′ → α A′|ε means E′ → +TE′|ε
Comparing T → T ∗ F|F with A → Aα|β

T → T *F | F

A → Aα | β
∴ A = T, α = *F, β = F
∴ A → βA′ means T → FT′
A′ → αA′ | ε means T′ → *FT′ | ε
Production F → (E)|id does not have any left recursion
∴ Combining productions 1, 2, 3, 4, 5, we get
E → TE′
E′ → +TE′| ε
T → FT′
T′ → *FT′ | ε
F → (E)| id
Example2 − Eliminate the left recursion for the following Grammar.
S → a|^|(T)
T → T, S|S
Solution
We have immediate left recursion in T-productions.
Comparing T → T, S|S With A → A α | β where A = T, α =, S and β = S

∴ Complete Grammar will be

S → a | ^ | (T)
T→ ST′
T′ →,ST′| ε

Example3 − Eliminate the left recursion from the grammar


E → E + T|T
T → T * F|F
F → (E)|id

Solution
The production after removing the left recursion will be
E → TE′

E′ → +TE′ | ε

T → FT′

T′ → *FT′ | ε

F → (E)|id

Example4 − Remove the left recursion from the grammar


E → E(T)|T
T → T(F)|F
F → id

Solution
Eliminating immediate left-recursion among all Aα productions, we obtain
E → TE′

E′ → (T)E′ | ε

T → FT′


T′ → (F)T′|ε

F → id

Direct Recursion
For each rule which contains a left-recursive option,
A --> A α | β
introduce a new nonterminal A' and rewrite the rule as
A --> β A'
A' --> ε | α A'
 Thus the production:
E --> E + T | T
is left-recursive with "E" playing the role of "A", "+ T" playing the role of α, and "T" playing
the role of β.
 Introducing the new nonterminal E', the production can be replaced by:
E --> T E'
E' --> ε | + T E'
 Of course, there may be more than one left-recursive part on the right-hand side. The general
rule is to replace
A --> A α1 | A α2 | ... | A αm | β1 | β2 | ... | βn
by
A --> β1 A' | β2 A' | ... | βn A'
A' --> ε | α1 A' | α2 A' | ... | αm A'
 Note that this may change the "structure". For the expression above, the original grammar is
left-associative, while the non-left-recursive one is now right-associative.

Indirect Recursion

 Step one describes a rule to eliminate direct left recursion from a production. To eliminate
left-recursion from an entire grammar may be more difficult because of indirect left-
recursion.
 For example
A --> B x y | x
B --> C D

C --> A | c
D --> d
is indirectly recursive because
A ==> B x y ==> C D x y ==> A D x y.
That is, A ==> ... ==> A α, where α is D x y.
The algorithm eliminates left-recursion entirely. It contains a "call" to a procedure which eliminates
direct left-recursion (as described in step one).

Left Factoring

It is a process of factoring out the common prefixes of alternatives.


It is used when it is not clear which of the two alternatives should be used to expand a non-
terminal. To avoid backtracking, rewrite the production rules without changing the language:
A → αβ1 / αβ2 ——— (1)

is rewritten as

A → αA’
A’ → β1 / β2

Example:
Consider the grammar and eliminate the left recursion
E ⟶ E + T / T ——- (1)
T ⟶ T * F / F ——- (2)

First, let’s take production (1) and remove its left recursion using A → βA’, A’ → αA’ / Є:
E ⟶ E + T / T

E ⟶ T E’ —— (3)
E’ ⟶ +TE’ / Є —— (4)

Productions 3 and 4 are the transformed productions for (1).
T ⟶ T * F / F

T ⟶ FT’ —–(5)
T’ ⟶ *FT’ / Є —— (6)

Productions 5 and 6 are the transformed productions for (2).


The final productions after the transformation are:

E ⟶ TE’
E’ ⟶ +TE’ / Є
T ⟶ FT’
T’ ⟶ *FT’ / Є


Difference between Left Factoring and Left Recursion


1. Left recursion: when one or more productions can be reached from themselves with no
tokens consumed in-between.
2. Left factoring: a grammar transformation that factors out the common prefix of two or more
alternatives of a non-terminal, so that the choice between them can be deferred.
Left factoring is removing the common left factor that appears in two productions of the same non-
terminal. It is done to avoid backtracking by the parser. Suppose the parser has a one-symbol look-
ahead; consider this example:
A -> qB | qC
where A, B, C are non-terminals and q is a common prefix. In this case, the parser cannot tell
which of the two productions to choose and it might have to backtrack. After left factoring, the
grammar is converted to:
A -> qD
D -> B | C
In this case, a parser with a one-symbol look-ahead will always choose the right production.
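One left-factoring step can be sketched as follows; the symbol-list representation and the primed new non-terminal (used here in place of D) are illustrative assumptions:

```python
# Sketch of one left-factoring step: A -> qB | qC becomes A -> qA',
# A' -> B | C (a primed name is used for the new non-terminal here).
def common_prefix(seqs):
    """Longest prefix of symbols shared by all right-hand sides."""
    prefix = []
    for symbols in zip(*seqs):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix

def left_factor(A, alternatives):
    prefix = common_prefix(alternatives)
    if not prefix:
        return {A: alternatives}           # nothing to factor out
    A2 = A + "'"                           # new non-terminal
    suffixes = [alt[len(prefix):] or ["eps"] for alt in alternatives]
    return {A: [prefix + [A2]], A2: suffixes}

print(left_factor("A", [["q", "B"], ["q", "C"]]))
```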
Left recursion is a case when the left-most non-terminal in a production of a non-terminal is the
non-terminal itself (direct left recursion) or through some other non-terminal definitions, rewrites to
the non-terminal again (indirect left recursion).


Syntax analysis or parsing is the process of analyzing a string of symbols using grammar rules.
Parsing is divided into two types: bottom-up parsing and top-down parsing. Parsing means breaking
something down into its constituent parts, for example, breaking a sentence down to explain what its
verb does.
In a compiler, the parser performs syntax analysis, top-down or bottom-up, on the sequence of tokens
produced by the tokenizer, lexer, or scanner.

Top Down Parsing

Top-down parsing means parsing the input and constructing the parse tree starting from the root and
going down to the leaves. It uses leftmost derivation to build the parse tree. On the contrary, bottom-
up parsing is based on reverse rightmost derivation. A top-down parser must be constructed from a
grammar that is free from left recursion and ambiguity, and the grammar should also be left
factored.

Top-down parsing is a method of parsing the input string provided by the lexical analyzer. The top-
down parser parses the input string and then generates the parse tree for it.

Construction of the parse tree starts from the root node i.e. the start symbol of the grammar. Then
using leftmost derivation it derives a string that matches the input string.

In the top-down approach construction of the parse tree starts from the root node and end up
creating the leaf nodes. Here the leaf node presents the terminals that match the terminals of the
input string.

1. To derive the input string, first, a production in grammar with a start symbol is applied.
2. Now at each step, the parser has to identify which production rule of a non-terminal must be
applied in order to derive the input string.
3. The next step is to match the terminals in the production with the terminals of the input string.


Pre Processing Steps of Top Down Parsing

Preprocessing steps for predictive parsing are,


 Removing the left recursion from the grammar
 Performing left factoring on the resultant grammar
 Removing ambiguity from the grammar
Removing the left recursion from the Grammar
Let the given grammar be,
B → B and B | B or B | (B) | true | false
The B-productions are immediately left recursive. Comparing with
A → Aα | β
where α is “and B” or “or B”, and β is (B), true or false, removing the left recursion
(A → βA', A' → αA' | ε) gives:
B → (B) B’ | true B’ | false B’
B’ → and B B’ | or B B’ | ε
Performing Left Factoring on the resultant grammar
In the resulting grammar, no two alternatives of the same non-terminal share a common prefix, so
no further left factoring is required.

Removing Ambiguity from the Grammar

The original grammar is also ambiguous, since it gives “and” and “or” the same precedence: a
string such as true and false or true has more than one parse tree. The ambiguity is removed by
rewriting the grammar so that only one parse tree is possible for each string, for example by
introducing a separate non-terminal for each precedence level, as was done for the expression
grammar E → E + T | T earlier in this unit.

Backtracking

In top-down parsing with backtracking, the parser attempts multiple rules or productions to identify
a match for the input string, backtracking at every step of the derivation. So, if an applied production
does not yield the needed string, or does not match the needed string, the parser can undo that
step.
Example1 − Consider the Grammar
S→aAd
A→bc|b
Make the parse tree for the string abd. Also, show the parse tree when backtracking is required
because the wrong alternative is chosen.
Solution
The derivation for the string abd will be −
S ⇒ a A d ⇒ abd (Required String)
If bc is substituted in place of non-terminal A then the string obtained will be abcd.
S ⇒ a A d ⇒ abcd (Wrong Alternative)
Figure (a) represents S → aAd

Figure (b) represents an expansion of tree with production A → bc which gives string abcd which
does not match with string abd. So, it backtracks & chooses another alternative, i.e., A → b in figure
(c) which matches with abd.

Algorithm for Backtracking


Example2 − Write an Algorithm for Top-Down Parsing with Backtracking for the following
Grammar.
S→aAd

A → bc| b
Solution
In the following Algorithm, there is a procedure for each Non-terminal S & A. Terminal symbols a,
b, c, d in Grammar are input symbols or Look ahead symbols to be read by Parser.
advance( ) − It is a function used to move the input pointer to the next input symbol.
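A minimal Python sketch of these procedures, with A's alternatives tried in order and the input position rolled back on failure (the generator-based encoding is an assumption for illustration, not the algorithm figure from the notes):

```python
# Backtracking sketch for S -> a A d, A -> bc | b: each alternative of A
# is tried in order, and the input position is rolled back on failure.
def parse_A(tokens, pos):
    """Yield each position A can reach, in the order alternatives are tried."""
    if tokens[pos:pos + 2] == ["b", "c"]:    # first alternative: A -> bc
        yield pos + 2
    if tokens[pos:pos + 1] == ["b"]:         # on backtracking: A -> b
        yield pos + 1

def parse_S(tokens):
    if tokens[:1] != ["a"]:                  # S -> a A d
        return False
    for pos in parse_A(tokens, 1):           # retry from here on failure
        if tokens[pos:] == ["d"]:
            return True
    return False

print(parse_S(list("abd")))    # → True  (A -> bc fails, A -> b succeeds)
print(parse_S(list("abcd")))   # → True  (A -> bc succeeds)
print(parse_S(list("abc")))    # → False (both alternatives fail)
```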

Limitations of Top-Down Parsing with Backtracking
Following are some problems, which occur in Top-Down Parsing with Backtracking.
 Backtracking − Backtracking looks very simple and easy to implement, but choosing a
different alternative causes a lot of problems, which are given below −
o Undoing semantic actions requires a lot of overhead.
o Entries made in the symbol table during parsing have to be removed while backtracking.
Due to these reasons, backtracking is not used for practical compilers.
Left Recursion − A grammar is left recursive if it has a production of the form
A → Aα | β
which causes the parser to enter into an infinite loop.
 Choosing Right Production − The order in which alternatives are tested can change the
language accepted.
 Difficult to locate error − When failure occurs, it is difficult to know where the error
occurred.

Recursive Descent Parsing

A parser that uses a set of recursive procedures to recognize its input with no backtracking is called a
recursive descent parser
For implementing a recursive descent parser for a grammar.
 The grammar must not be left recursive
 The grammar must be left factored that means it should not have common prefixes for
alternates
 We need a language that has recursion facility
Left factoring Problems:
Eliminate Left recursion and then left factor the following grammar
rexpr → rexpr + rterm | rterm
rterm → rterm rfactor | rfactor
rfactor → rfactor* | rprimary
rprimary → a | b
Design:

 It can be written on the same lines as that of brute force approach that is one procedure for one
non terminal
 The procedures need not return anything because they need not try another alternate as in brute
force
 Only on failure we call some error routine to take care of the error
 One can make use of different types of transition diagrams for designing a recursive descent parser,
as used in the case of scanners.

The differences are

 There is one transition diagram for one non terminal
 The labels for edges may be terminals or non terminals
 Label is a terminal which means we must take that transition if the next input symbol is terminal
After elimination of left recursion and left factoring, one can construct the transition diagram for
the grammar as follows.
For each non-terminal A,

 Create initial and final states


 For each production A → X1, X2,....Xn, create a path from initial to the final state with edges
labeled X1, X2, .... Xn
Note:
If the transition diagram so constructed has non determinism, then a recursive descent parser can't be
constructed for the grammar



Example:
Consider the Grammar
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
The transition diagrams for this grammar (not reproduced in this copy) contain no
non-determinism, so the code for the parser can be written directly from them.


We can further improve on the transition diagrams to better code the parser: the transition diagram
for E′ can be simplified by replacing the recursive tail call with a loop, and E can then be modified
accordingly (the transformed diagrams are not reproduced in this copy).


After applying the same technique on T′ and T, the final reduced diagrams are obtained (not
reproduced in this copy). The parser code derived from these reduced transition diagrams runs
faster than the code written for the original transition diagrams.

Advantage:
 Easy to hand construct a recursive descent parser from an appropriate grammar
Limitations
 Recursive descent parser encodes state information in its run-time stack, or call stack, and this may
not be particularly efficient
 Uses recursive procedure calls to implement a stack abstraction
 It is very difficult to automate

A recursive descent parser uses the technique of top-down parsing without backtracking. It can be
defined as a parser that uses various recursive procedures to process the input string with no
backtracking, and it can be implemented simply in any language that supports recursion. The first
symbol of the R.H.S. of a production uniquely determines the correct alternative to choose.
The major approach of recursive-descent parsing is to relate each non-terminal with a procedure. The
objective of each procedure is to read a sequence of input characters that can be produced by the
corresponding non-terminal, and return a pointer to the root of the parse tree for that non-terminal.
The structure of each procedure is prescribed by the productions for the corresponding non-terminal.
The recursive procedures are simple to write and adequately effective if written in a language that
implements procedure calls efficiently. There is one procedure for each non-terminal in the grammar.
The parser maintains a global variable lookahead, holding the current input token. A procedure
match(ExpectedToken) recognizes the next token in the parsing process and advances the input
stream pointer, so that lookahead points to the next token to be parsed. match() is effectively a call
to the lexical analyzer to get the next token.
For example, input stream is a + b$.
lookahead == a
match()
lookahead == +
match ()
lookahead == b
……………………….
……………………….
In this manner, parsing can be done.
Example − In the following Grammar, first symbol, i.e., if, while & begin uniquely determine, which
of the alternative to choose.
As Statement starting with if will be a conditional statement & statement starting with while will be
an Iterative Statement.
Stmt → If condition then Stmt else Stmt
| While condition do Stmt
| begin Stmt end.
Example − Write down the algorithm using Recursive procedures to implement the following
Grammar.
E → TE′
E′ → +TE′|ε
T → FT′
T′ →∗ FT′|ε
F → (E)|id
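A hedged sketch of such recursive procedures for this grammar, with one method per non-terminal and match() advancing the lookahead (the class layout and "id" token convention are illustrative assumptions):

```python
# One procedure per non-terminal for E -> TE', E' -> +TE'|eps,
# T -> FT', T' -> *FT'|eps, F -> (E)|id; match() advances the input.
class ParseError(Exception):
    pass

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens + ["$"]       # end-of-input marker
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.lookahead != expected:
            raise ParseError(f"expected {expected}, got {self.lookahead}")
        self.pos += 1                      # advance the input pointer

    def E(self):                           # E  -> T E'
        self.T(); self.Eprime()

    def Eprime(self):                      # E' -> + T E' | eps
        if self.lookahead == "+":
            self.match("+"); self.T(); self.Eprime()

    def T(self):                           # T  -> F T'
        self.F(); self.Tprime()

    def Tprime(self):                      # T' -> * F T' | eps
        if self.lookahead == "*":
            self.match("*"); self.F(); self.Tprime()

    def F(self):                           # F  -> ( E ) | id
        if self.lookahead == "(":
            self.match("("); self.E(); self.match(")")
        else:
            self.match("id")

def accepts(tokens):
    p = Parser(tokens)
    try:
        p.E()
        return p.lookahead == "$"          # all input must be consumed
    except ParseError:
        return False

print(accepts(["id", "+", "id", "*", "id"]))  # → True
print(accepts(["id", "+", "*"]))              # → False
```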


One of the major drawbacks of recursive-descent parsing is that it can be implemented only in
languages which support recursive procedure calls, and it suffers from the problem of left recursion.

LL (1) Grammars

 When we construct the predictive parsing table for some grammars, the table may have multiply
defined entries. This happens when the given grammar is left-recursive or ambiguous.
 A grammar whose predictive parsing table has no multiply defined entries is known as an LL(1) grammar.
 LL(1) means a left-to-right scan of the input producing the leftmost derivation, using 1 symbol of
lookahead (this can be extended to k symbols of lookahead).
Rather than constructing a table to check whether a grammar is LL(1), we can check it using the
following definition of an LL(1) grammar.

Definition:
A grammar G is LL(1) if and only if, whenever A → α | β are two distinct productions of G, the
following three conditions hold:
1. For no terminal a do both α and β derive strings beginning with a.
2. At most one of α and β can derive ε.
3. If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A).
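As an illustration (the helper names below are our own, not from the notes), the first two conditions can be checked mechanically by computing FIRST sets and testing that the alternatives of each nonterminal have pairwise-disjoint FIRST sets, with ε treated as a member; the FOLLOW-based third condition is omitted for brevity.

```python
EPS = 'ε'
# The expression grammar used in the parsing-table example below.
GRAMMAR = {
    'E': [['T', 'R']],
    'R': [['+', 'E'], [EPS]],
    'T': [['F', 'S']],
    'S': [['*', 'T'], [EPS]],
    'F': [['n'], ['(', 'E', ')']],
}

def first_of_string(symbols, first):
    """FIRST of a sentential form, given FIRST sets of the nonterminals."""
    out = set()
    for X in symbols:
        fx = first.get(X, {X})         # for a terminal a, FIRST(a) = {a}
        out |= fx - {EPS}
        if EPS not in fx:
            return out
    out.add(EPS)                       # every symbol can derive epsilon
    return out

def first_of_alt(alt, first):
    return {EPS} if alt == [EPS] else first_of_string(alt, first)

def compute_first(grammar):
    first = {A: set() for A in grammar}
    changed = True
    while changed:                     # iterate to a fixed point
        changed = False
        for A, alts in grammar.items():
            for alt in alts:
                f = first_of_alt(alt, first)
                if not f <= first[A]:
                    first[A] |= f
                    changed = True
    return first

def is_ll1_first_condition(grammar):
    """Check conditions 1 and 2: alternatives have disjoint FIRST sets.
    Two epsilon-deriving alternatives also collide here (both contain EPS)."""
    first = compute_first(grammar)
    for alts in grammar.values():
        sets = [first_of_alt(alt, first) for alt in alts]
        for i in range(len(sets)):
            for j in range(i + 1, len(sets)):
                if sets[i] & sets[j]:
                    return False
    return True

assert is_ll1_first_condition(GRAMMAR)
```

A grammar such as A → ab | ac fails the check, since both alternatives begin with the terminal a.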

The LL(1) parsing table

The parser needs to find a production to use for nonterminal N when it sees lookahead token t. To
select which production to use, it suffices to have a table that, given the pair (N, t) as a key,
yields the number of the production to use.

Let's illustrate with an LL(1) parsing table for the expression grammar that we used earlier, which
looks like this.

1. E → T R
2. R → ε
3. R → + E
4. T → F S
5. S → ε
6. S → * T
7. F → n
8. F → ( E )

Parsing table D below does the job. Each row is labeled by a nonterminal and each column by a
lookahead token, or the special symbol $ that indicates the end of the input.

D(N, t) is the production to use to expand N when the lookahead is t. Blank entries mean syntax
error.

Table D

     n    +    *    (    )    $

E    1              1

R         3              2    2

T    4              4

S         5    6         5    5

F    7              8
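Table D can be encoded directly as a dictionary keyed by (nonterminal, lookahead) pairs, with the blank (error) entries simply absent. The function name below is our own.

```python
# Parsing table D for the expression grammar; values are production numbers.
D = {
    ('E', 'n'): 1, ('E', '('): 1,
    ('R', '+'): 3, ('R', ')'): 2, ('R', '$'): 2,
    ('T', 'n'): 4, ('T', '('): 4,
    ('S', '+'): 5, ('S', '*'): 6, ('S', ')'): 5, ('S', '$'): 5,
    ('F', 'n'): 7, ('F', '('): 8,
}

def choose(nonterminal, lookahead):
    """Return the production number D(N, t); a missing entry is a syntax error."""
    prod = D.get((nonterminal, lookahead))
    if prod is None:
        raise SyntaxError(f"no production for ({nonterminal}, {lookahead})")
    return prod

assert choose('E', 'n') == 1
assert choose('S', '*') == 6
```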

Now it is easy to use the table to control a top-down parse. Parsing n * n goes as follows. Start
with E.

D(E, n) = 1, so expand E using production 1.

E
/\
T R

Since D(T, n) = 4, we continue by expanding T to F S.

E
/\
T R
/\
F S

Now D(F, n) = 7, and production 7 is F → n.

E
/\
T R
/\
F S
|
n

The lookahead changes to * and D(S, *) = 6. Since production 6 is S → * T, the table tells us to
replace S by * T.

E
/\
T R
/\
F S
| /\
n * T

Now the lookahead is n, and D(T, n) = 4. After using production 4 (T → F S), the table will tell us
to use production 7 (F → n), giving

E
/\
T R
/\
F S
| /\
n * T
/\
F S
|
n

The parse is almost finished. Since there are no more tokens, the lookahead is $. D(S, $) = 5 and
D(R, $) = 2, which says to replace S and R by ε.

E
/\
/ \
T R
/\ |
F S ε
| /\
n * T
/\
F S
| |
n ε

A stack-based approach

Instead of building a parse tree, it may be preferable to construct a derivation.

We use two stacks, called Matched and Todo. The Matched stack holds only tokens.

At any given point, the string that has been derived is m t, where m is the contents of the Matched
stack (from bottom to top) and t is the contents of the Todo stack (from top to bottom).

The Action column shows the production that is used, or, when a token is moved to the Matched stack
and removed from the input, a match action.

Here is a parse of n + n * n using the same expression grammar.

Matched    Todo      Input       Action
           E$        n+n*n$
           TR$       n+n*n$      E → T R
           FSR$      n+n*n$      T → F S
           nSR$      n+n*n$      F → n
n          SR$       +n*n$       match n
n          R$        +n*n$       S → ε
n          +E$       +n*n$       R → + E
n+         E$        n*n$        match +
n+         TR$       n*n$        E → T R
n+         FSR$      n*n$        T → F S
n+         nSR$      n*n$        F → n
n+n        SR$       *n$         match n
n+n        *TR$      *n$         S → * T
n+n*       TR$       n$          match *
n+n*       FSR$      n$          T → F S
n+n*       nSR$      n$          F → n
n+n*n      SR$       $           match n
n+n*n      R$        $           S → ε
n+n*n      $         $           R → ε

Non-recursive Predictive Parsing

Predictive parsing can be performed using a pushdown stack, avoiding recursive calls.

 Initially the stack holds just the start symbol of the grammar.
 At each step a symbol X is popped from the stack:
o if X is a terminal symbol then it is matched with lookahead and lookahead is
advanced,
o if X is a nonterminal, then using lookahead and a parsing table (implementing the
FIRST sets) a production is chosen and its right hand side is pushed onto the stack.
 This process goes on until the stack and the input string become empty. It is useful to have
an end-of-stack symbol and an end-of-input symbol; we denote both by $.

Figure 8 shows the structure of non-recursive predictive parsers. For each nonterminal A and each
token a the entry M[A, a] of the parsing table contains either an A-production generating sentences
starting with a or an error-entry.

Figure 8: The structure of non-recursive predictive parsers.

S → aAa | BAa | ε
A → cA | bA | ε
B → b

The parsing table is:


     a          b          c          $

S    S → aAa    S → BAa               S → ε

A    A → ε      A → bA     A → cA

B               B → b

where the empty slots correspond to error-entries. Consider the parsing of the word w = bcba.

Stack    Remaining input    Action
$S       bcba$              choose S → BAa
$aAB     bcba$              choose B → b
$aAb     bcba$              match b
$aA      cba$               choose A → cA
$aAc     cba$               match c
$aA      ba$                choose A → bA
$aAb     ba$                match b
$aA      a$                 choose A → ε
$a       a$                 match a
$        $                  accept

THE ALGORITHM.
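The algorithm can be sketched as a short table-driven loop. The Python names below are our own, and the table is the one for the S/A/B grammar above; the driver itself is generic.

```python
EPS = 'ε'
# M[nonterminal, lookahead] -> right-hand side of the chosen production.
TABLE = {
    ('S', 'a'): ['a', 'A', 'a'],
    ('S', 'b'): ['B', 'A', 'a'],
    ('S', '$'): [EPS],
    ('A', 'a'): [EPS],
    ('A', 'b'): ['b', 'A'],
    ('A', 'c'): ['c', 'A'],
    ('B', 'b'): ['b'],
}
NONTERMINALS = {'S', 'A', 'B'}

def predictive_parse(tokens):
    tokens = list(tokens) + ['$']      # append end-of-input marker
    stack = ['$', 'S']                 # end-of-stack marker, then start symbol
    i = 0
    while True:
        X = stack.pop()
        a = tokens[i]
        if X == '$':                   # end-of-stack marker reached
            return a == '$'            # accept only if the input is exhausted
        if X not in NONTERMINALS:      # X is a terminal: match it
            if X != a:
                raise SyntaxError(f"expected {X!r}, found {a!r}")
            i += 1                     # advance the lookahead
        else:                          # X is a nonterminal: consult the table
            rhs = TABLE.get((X, a))
            if rhs is None:
                raise SyntaxError(f"empty table entry M[{X}, {a}]")
            for sym in reversed(rhs):  # push RHS so its first symbol is on top
                if sym != EPS:         # epsilon pushes nothing
                    stack.append(sym)

assert predictive_parse('bcba')
```

Running this on the word bcba reproduces the stack trace shown above, step for step.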


Difference between recursive descent and non-recursive predictive parsing

Recursive predictive descent parsing:
- It may or may not require backtracking.
- It uses a procedure for every nonterminal of the grammar to parse strings.
- It is a top-down parser built from a set of mutually recursive procedures, each of which
implements one nonterminal of the grammar.
- It contains several small functions, one for each nonterminal in the grammar.
- With backtracking it can handle a wider class of grammars.

Non-recursive predictive parsing:
- It does not require any kind of backtracking.
- It finds the production to use by consulting a parsing table, driven by the input string.
- It is a table-driven top-down parser that uses an explicit stack instead of recursion and never
backtracks.
- It uses a lookahead pointer, which points to the next input symbol, to stay backtracking-free;
this places some constraints on the grammar.
- It accepts only the class of grammars known as LL(k) grammars.

Error Recovery in Predictive Parsing

 We know that the predictive parser performs a leftmost derivation while parsing the given
sentence.
 The given sentence may be valid or invalid with respect to the specified grammar.
 An error is detected during predictive parsing when the terminal on top of the stack does
not match the next input symbol, or when nonterminal X is on top of the stack, a is the next
input symbol, and the parsing-table entry PT[X, a] is empty (i.e., there is no production for X on a).
Our predictive parser works as follows (X is the stack top, a the current input symbol):

while (stack is not empty) {
    if (X is a terminal) {
        if (X == a) {
            pop X
            move the input pointer forward to the next symbol
        } else {            // X != a
            error found
        }
    } else {                // X is a nonterminal
        if (PT[X, a] holds a production X → Y1 Y2 … Yn) {
            pop X
            push Yn … Y2 Y1     // Y1 ends up on top
        } else {            // PT[X, a] is empty
            error found
        }
    }
}
 The code above reports only the first error, but we want to report all errors, so we need to
do error handling.
 An error-recovering parser should:
 report syntax errors,
 proceed after detecting the first error (error recovery), and
 be simple and fast.

Error Recovery Methods and Techniques:


1. Panic Mode Recovery
2. Phrase level Recovery
3. Erroneous Productions

Panic Mode Recovery:


 Suppose the parser has successfully scanned and built a parse tree up to ‘a’, and just after
it has found an error: W = x1x2……a……xn
 Panic-mode error recovery is based on the idea of skipping symbols of the input until a
token in a selected set of synchronizing tokens (a specific set of symbols) appears, and
continuing to parse the rest of the string from there.
 After detecting the error, the parser should be restored to a state from which it can restart.
 Its effectiveness depends on the choice of the synchronizing set. The sets should be chosen so
that the parser recovers quickly from errors that are likely to occur in practice.
 A good example of such a synchronizing symbol in C is the semicolon ‘;’.
 As a starting point, place all symbols in FOLLOW(X) into the synchronizing set for
nonterminal X; if we skip tokens until an element of FOLLOW(X) is seen and pop X from
the stack, it is likely that parsing can continue.
 We may need to add more symbols to the synchronizing set. How does this work in the
parser? By placing a special marker ‘S’ in the parsing table in the entries corresponding to
the elements of the synchronizing set, we can achieve it:

PT     SyncSymbol1    SyncSymbol2    …
X      S              S              …
Phrase level Recovery:

 Simple cases are exchanging ; with , and = with ==, deleting an extraneous semicolon, or
inserting a missing semicolon. Difficulties occur when the real error occurred long before an
error was detected.

 The choice of the local correction is left to the compiler designer.
Erroneous Productions
 Include productions for common errors. We can augment the grammar for the language at
hand with productions that generate the erroneous constructs.
 A parser constructed from a grammar augmented by these error productions detects the
anticipated errors when an error production is used during parsing.
 The parser can then generate appropriate error diagnostics about the erroneous construct that
has been recognized in the input.
