CHAPTER
3: Mr. Yirga K.
Parser /
Syntax
Analysis
OUTLIN
1. Introduction to Parsing
E
2. Types of Parsing
1. Top-down Parsing
Top-down Parsing Implementation:
- Recursive Decent parsing
- Predictive (Non-Recursive)
Parsing
- LL (1) Grammar
OUTLIN
2. Bottom-Up Parsing
E
- Handles a n d Handles pruning
- Stack Implementation of Shift Re d u ce
Parsing
- LR Parser Implementation
- SLR, CLR a n d LALR parser
3. Error Recovery in Parsing
4. Ya c c Automatic Parser Generator
Introduction to Parsing
Chapter Content
In this chapter, the process of parsing, the role of parser,
various types of parses, a n d their implementation
strategies will b e addressed.
Objective: Students should:
Design parsers using the c o n c e p t of CFG.
C o m p a re different parsing strategies.
Implement LL(1) a n d LR(1) parsers.
Students b e able to use YA C C as a syntax generator.
Introduction to Parsing
What is
parsing? or parsing is the second phase of a
• Synta x compiler
• a
It na lysistransformation of a sequence of tokens to a n
is the
architecture.
abstract syntax tree.
• The parser obtains a set of tokens from the lexical analyzer.
• Syntax Analyzer creates the syntactic structure of the
given source program.
• It uses BNF (Backus-Naur Form) notation in the description of
CFGs.
Introduction to Parsing
• Tokens arrangements are c h e c ke d against source c o d e grammar.
The synta x o fa p ro g ra mming la ng ua g e is
we ll d e sc rib e d by using
c ontext-free grammar.
It checks whether a given source program satisfies the rules
implied by a context-free grammar or not.
• If it satisfies, the parser creates the parse tree/syntax
tree/, which further used for Generating Intermediate
code(ICG).
• Otherwise the parser gives the error messages.
Context Free C onti
Grammar
Pushdown …
Automata
Regular
Expressions Finite
State Automata
Introduction to Parsing
Syntax is the rule governing the formation of
statements in a programming language.
It is the way tokens are put together to form expressions,
statements, or blocks of statements.
Syntax analysis is the task con ce rn e d with fitting a
sequence of tokens into a specifi ed syntax.
Due to the limitation of RE, a scanner cannot c h e c k the
syntax of a se nte nc e . E.g . RE c a nno t c he c k { a nd }.
Syntax of a programming language is the grammar rules of
context- free grammar (CFG).
Introduction to Parsing
A grammar describes the hierarchical structure
of most programming languages.
Exa mp le : if ( e xp re ssio n ) sta te me nt else
sta te me nt
Using expr for expression a n d stmt for statement, it c a n b e
expressed a s, stmt: if ( expr ) stmt else stmt
This is c a lle d produc tion.
In production, lexical elements like the keyword if, else a n d
parentheses are called Terminals.
Variables like expr a n d stmt represent sequences of
terminals a n d are called non-terminals.
Introduction to Parsing
sp e c ific a ti
C FG s G ive s a p re c ise synta c tic of a
p ro g ra mming language. on
The d e sig no f the g ra mma r is a n initia l p ha se o f
the d e sig no f a compiler.
We categorize the parsers into two groups:
Top-Down Parser
The parse tree is created top to bottom starting from the
root.
Bottom-Up Parser
The parse tree is created bottom to top starting from the
leaves.
Introduction to Parsing
In general,
The parser accomplishes the following tasks, i.e.,
Parsing the c o d e ,
Looking for errors (syntactical) a n d
G e ne ra ting a p a rse tre e /synta x tre e / as the
o utp ut fo r the ne xt phase.
1) Top-down
Parsing
It Parses the input, a n d starts constructing a parse tree from
the root n o d e gradually moving down to the leaf nodes.
Top-down
1)Recursive Descent Parsing: Parsing
It is a c o mmo n fo rm o f to p -d o wn p a rsing .
It use s re c ursive p ro c e d ure s to p ro c e ss the inp ut.
Recursive descent parsing suffers from backtracking.
It constructs the parse tree from the top a n d the input
is read from left to right.
Top-down
Backtracking: Parsing
It is a te c hniq ue in to p -d o wn p a rsing tha t a llo ws
p a rse rs to explore alternative choices a n d handle
ambiguity.
In backtracking, the parser restores the input to its
original state when it fails.
It goes b a c k to earlier state to find another route after
knowing that current state is a d e a d end.
Top-down
Backtracking: Parsing
If one derivation of a production fails, the syntax analyzer
restarts the process using different rules of that
production.
This technique may process the input string more than
o n c e to determine the right production.
Top-down
Parsing
Top-down
Consider a Parsing
CFG:
S → rXd | rZd
X → oa | ea
Z → ai
Parse a n
input string:
“read”,
Try: P A B, A
xy | x, By,
a nd p a rse
inp ut “ xy”?
Top-down
2) Predic tive Parser Parsing
A form of recursive-descent parsing that does not require
any back- tracking is known as predictive parsing.
Predictive parser has the capability to predict which
production is to b e used to replace the input string.
It d o e s no t suffe r fro m b a c ktra c king .
Predictive parser uses a look-ahead pointer, which points to
the next input symbols.
To ma ke the p a rse r b a c k-tra c king fre e ,
the p re d ic tive p a rse r p uts some constraints on
the grammar.
Top-down
Parsing
It a c c e p t s only a class of grammar known as LL (1) grammar.
The first ‘L’ stands for scanning the input from left to right,
and
the second “L” stands for left most derivations, a n d
‘1’ refers to using only one input symbol (lookahead) at
e a c h step to make parsing action decisions.
Note: LL(1) is a top-down parsing algorithm that reads input
from left to right a n d builds leftmost derivations.
Top-down
Parsing
Two c ommon problems in top down parsing.
1) Left Rec ursion
A g ra mma r bec omes le ft-re c ursive
if it ha s a ny no n-te rmina l ‘ A’ whose derivation
contains ‘ A ’ itself as the left-most symbol.
Example: S Sα | β, where α (V+T)* a n d β (V+T)*
(β is a sequenc e of terminals a n d non terminals that d o not
start with S).
Top-down
Parsing
Left-recursive grammars are considered as problematic as
the parser goes into a n infinite loop.
Left recursion c a n b e immediate or indirect recursion.
Example:
A Aα | β
S A α | β, A Sd
(1) is a n example of immediate left recursion, where A is
any non- terminal symbol a n d ‘ α ’ represents a string of non-
terminals.
(2) is a n e xa mp le o f indirec t-left re c ursio n.
Top-down
A top-down parser
Parsing
will first parse the ‘A’, which in-turn will
yield a string consisting of ‘ A ’ itself a n d the parser may
goes into a loop infinitely.
Top-down
Removal of Left Rec ursion Parsing
O n e way to remove left recursion is to use the following
technique:
The production, A A α | β , which is a type of immediate
recursion
is converted or rewriting into following productions.
A βA'
A ’ αA' | ε,
This d o e s not imp a c t the string s d e rive d fro m
the g ra mma r, b ut it removes immediate left
recursion.
Se c o nd me tho d is to use the fo llo wing a lg o rithm,
whic h sho uld
eliminate all direct and indirect left recursions.
Top-down
More Generally
Parsing
A Aα 1 | Aα 2 | Aα 3 | ….. | A α m | β1 | β2 | …. | β n Where
no βi begins with a n A .
Then w e replace the “A” productions by
A β1 A ’ | β2 A ’ | ….. | β n A ’
A ’ α 1 A ’ | α 2 A ’ | α 3 A ’ | ….. | α m A ’ | ε.
Exercise: G1: SSab | Scd | Sef | g | h, G2: S (L)|a a n d L LS|S?
Top-down
Parsing
Top-down
Parsing
Exercise: Eliminate left recursion from the following
grammar?
EE+T| Solution:
T
TT*F|F ETE’
F(E)| id E’ε|+TE’
TFT’
T’*FT’/ ε
F(E)|id
Top-down
2) Left Fac toring
Parsing
If more than one grammar production rules has a
c o m m o n prefix string, then the top-down parser cannot
make a choice.
Le ft fa c to ring is a g ra mma r tra nsfo rma tio n tha t
is use ful fo r producing a grammar suitable for
predictive or top down parsing.
A grammar is said to b e Left Factored when it has the form:
A αβ 1 | αβ 2 | αβ 3 | … … | αβ n | γ, i.e. the productions start
with the same terminal (or set of terminals).
Top-down
Fro m the a b o ve e xa mp le , b y Parsing
se e ing the inp ut
‘α’ we c a nno t immediately tell which production to
choose to expand A.
Left factoring transforms the grammar to make it useful for
top-down parsers.
For the grammar A αβ 1 | αβ 2 | αβ 3 | … … | αβ n | γ
The equivalent Left Factored grammar will b e by rewriting:
A αA’ | γ
A ’ β 1 | β 2 | β 3 | … … | βn.
Top-down
3) FIRST and FOLLOW Parsing
First a nd Fo llo w Se ts use d in LL (1) o r p re d ic tive p a rse r.
Parser table construction is applying to create first a n d
follow sets.
The se se ts c a n p ro vid e the a c tua l p o sitio n of
a ny te rmina lin the derivation.
They allow the parser c a n properly apply the n e e d e d rule
at the correct position.
This is d o ne to c re a te the p a rsing ta b le whe re
the d e c isio n o f replacing T[A, t] = α, with some
production rule.
Top-down
Parsing
First Set: If there is a variable, a n d from that variable, if w e
try to drive all the strings then the beginning Terminal
Symbol is called the First.
FIRST(X) for a grammar symbol X is the set of terminals that
begin the strings derivable from X.
Rules to c ompute FIRST set:
If x is a te rmina l, the n FIRST(x) = { ‘ x’ }.
If Xɛ, is a production rule, then FIRST(X) = {ɛ}.
If XY1Y2Y3… .Yn is a p ro d uc tio n, the n
FIRST(X) = FIRST(Y1).
If FIRST(Y1) contains ɛ, then FIRST(X) = { FIRST(Y1) – ɛ } U
{ FIRST(Y2) }.
If FIRST (Yi) contains ɛ for all i = 1 to n, then a d d Є to FIRST(X).
Top-down
Example 1: Pro d uc tio n Rule s o f Parsing
G ra mma r:
E TE’ ; E’ +T E’ | Є
T F T’ ; T’ *F T’ | Є
F (E) | id
FIRST sets:
FIRST(E) = FIRST(T) = { ( , id }
FIRST(E’ ) = { +, Є }
FIRST(T) = FIRST(F) = { ( , id }
FIRST(T’ ) = { *, Є }
FIRST(F) = { ( , id }
Top-down
Parsing
Follow Set: What is the Terminal Symbol which follows a
variable in the process of derivation.
Rules to c ompute FOLLOW set
Follow (S) = { $}, where S is the starting Non-Terminal.
If A pBq is a production, where p, B a n d q are any
grammar symbols, then everything in FIRST(q) except Є is
in FOLLOW(B).
If A p B is a p ro d uc tio n, the n e ve rything
in FO LLO W(A ) is in
FOLLOW(B).
If A p Bq is a p ro d uc tio n a nd FIRST(q ) c o nta ins
Є, the n FO LLO W(B) c o nta ins { FIRST(q ) – Є } U
FO LLO W(A ).
Top-down
Example: Consider a production
rules: E TE’
Parsing
E’ +T E’ | Є
T F T’
T’ *F T’ | Є
F (E) | id
FOLLOW Set
will be:
FO LLO
W(E) = {
$ , ) } //
No te ')'
is the re
b e c a us
e o f 5th
rule
FO LLO
W(E’ ) =
Top-down
Example 2: Consider production rules of Parsing
grammar:
S AC B | C bb |
Ba A d a | BC
Bg | Є
C h | Є
FIRST
Sets:
FIRST(S) =
FIRST(A) U
FIRST(B) U
FIRST(C) =
{ d, g, h,
Є}
FIRST(A ) = { d } U FIRST(B) = { d , g , Є } FIRST(B)
= { g , Є } FIRST(C ) = { h , Є }
Top-down
Production FIRST set:
FIRST(S) = {
Parsing
a }
Rules:
FIRST(B) = { c }
S ->
FIRST(C) = { b , Є }
aBDh B FIRST(D) = FIRST(E) U FIRST(F) = { g, f,
-> cC Є } FIRST(E) = { g , Є } FIRST(F) = { f
C -> bC , Є }
| D ->
F -> f |
EF
Є FOLLOW
FOLLOW(S = { $ }
Set:
E -> g | )
Є FOLLOW(B = { FIRST(D) – Є } U FIRST(h) = { g , f ,
) h }
FOLLOW(C = FOLLOW(B) = { g , f , h }
)
FOLLOW(D = FIRST(h) = { h }
)
FOLLOW(E = { FIRST(F) – Є } U FOLLOW(D) = { f ,
) h }
FOLLOW(F = FOLLOW(D) = { h }
)
Top-down
Limitations of Syntax Analyzers Parsing
It c a nno t d e te rmine if a to ke n is valid.
It c a nno t d e te rmine if a to ke n is dec lared b e fo re it is b e ing
use d .
It cannot determine if a token is initialized before it is being
used,
It c a nno t d e te rmine if a n o p e ra tio n p e rfo rme d o n
a to ke n type is valid or not.
LL(1)
Implementation of LL(1) Implementation
Step1: First c he c k fo r le ft re c ursio n in the g ra mma r,
if the re is le ft recursion in the grammar remove that
a n d g o to step 2.
Step
Left2: Calculate First() a n d Follow() for all non-terminals.
factor ? ?
Top-down
Parsing
Step 3: For e a c h production A α. (A tends to alpha)
1. Find First(α) a n d for e a c h terminal in First(α), make entry A
α in the table.
2. If First(α) contains ε (epsilon) as terminal then, find the
Follow(A) a n d for e a c h terminal in Follow(A), make entry
A α in the table.
3. If the First(α) contains ε a n d Follow(A) contains $ as
terminal, then make entry A α in the table for the $.
Top-down
In the table, rows will contain Parsing
the Non-Terminals a n d the
column will contain the Terminal Symbols.
All the Null Productions of the Grammars will g o under the
Follow elements a n d the remaining productions will lie
under the elements of the First set.
Example-1: Consider the Grammar:
E TE’
E' +TE' | ε
T FT’
T' *FT' | ε
F id | (E) ε denotes epsilon
Top-down
Parsing
LL(1) or predictive Parsing
Table.
2) Bottom-Up
Parsing
Build the parse tree from leaves to root.
Bottom-up parsing reduces the input string ‘ w ’ to the start
symbol of a grammar by tracing out the rightmost
derivations of ‘ w ’ in reverse.
Example 1: Consider the following rules, a n d apply bottom-up
appro a ch.
2) Bottom-Up
Parsing
Bottom-up parsing starts with the input symbols a n d tries to
construct the parse tree up to the start symbol.
Example 2: Pro d uc tio n rule s:
S→E
E→E+T
E→E*T
E→T
T → id
Le t us sta rt b o tto m-up p a rsing fo r inp ut, a + b * c
Read the input a n d c h e c k if any production matches with
the input.
2) Bottom-Up
a +b * Parsing
c
T+b *
c
E+b *
c
E+T*
c
E*c
E*T
E
S
2) Bottom-Up
Classifications of Bottom-Up Parsing
Parsing.
2) Bottom-Up
Benefits of LR parsing: Parsing
Man y programming languages using some variations of a n
LR parser.
It should b e noted that C++ a n d Perl are exceptions to it.
LR Parser c a n b e implemented very effi ciently.
Of all the Parsers that scan their symbols from left to right, LR
Parsers detect syntactic errors, as soon as possible.
2) Bottom-Up
Parsing
Shift Reduce Parsing Implementation
Shift Re du ce parser attempts for the construction of parse
tree in a similar manner as d o n e in bottom up parsing(i.e.
the parse tree is constructed from leaves to the root).
A mo re g e ne ra l fo rm o f shift re d uc e p a rse r is LR(0) p a rse r.
The L stands for scanning the input from left to right a n d R
stands for constructing a rightmost derivation in reverse.
This p a rse r re q uire s so me d a ta struc ture s, i.e .
A input buff er for storing the input string.
A stack for storing a n d accessing the production rules.
2) Bottom-Up
Parsing
Shift-reduce parsing uses two operations(steps).
a) Shift step: The shift step refers to the a d v a n c e me n t of the
input pointer to the next input symbol, which is called the
shifted symbol. This symbol is pushed onto the stack. The
shifted symbol is treated as a single n o d e of the parse tree.
b) Reduce step : When the parser finds a complete grammar
rule (RHS) a n d then replaces it to (LHS), it is known as
reduce step. This occurs when the top of the stack
contains a handle. To reduce, a POP function is performed
on the stack which pops off the handle a n d replaces it
with LHS non-terminal symbol.
2) Bottom-Up
Accept: If only start symbol isParsing
present in the stack a n d
the
input buffer is empty then, the parsing action is called
a c c e p t . When a c c e p t action is obtained, it means
successful parsing is done.
Error: This is the situation in which the parser c a n neither
perform shift action nor reduce action a n d not even a c c e p t
action.
Example 1 – C o nsid e r the
g ra mma r: S S + S
S S *
S S id
Apply Shift
Reduce parsing
for input string, “id
2) Bottom-Up
Parsing
2) Bottom-Up
Parsing
Example 2 – Consider the grammar. E 2E2, E 3E3, a n d E
4, Apply Shift Reduce Parsing for input string “32423”?
Solution:
2) Bottom-Up
Parsing
Example 3 – C o nsid e r the g ra mma r, S (L)| a ,
L L , S | S, a nd apply
Shift Re du ce parsing for input string “( a , ( a , a ) ) “?
Project Work:
Implement Simple LR Parser that recognizes inputs from certain grammar?
2) Bottom-Up
Handles and Handles pruning Parsing
In compiler design, handle pruning is a technique that
improves the grammar's parsing process.
It involves removing the children of the left-hand side non-
terminal from the parse tree.
A handle is a substring that matches the body of a
production.
Reducing a handle represents one step along with the
reverse of a Rightmost derivation.
https://www.geeksforgeeks.org/what-is-handle-pruning/
SLR, CLR a n d LALR parser
https://www.geeksforgeeks.org/slr-clr-and-lalr-parsers-set-3/
https://www.geeksforgeeks.org/lalr-parser-with-examples/
2) Bottom-Up
3. Error Recovery in Parsing Parsing
4. Ya c c Automatic Parser
Generator
Thank
you!!