Introduction to the Parser
Benefits of Grammars for Programming Languages
• Give a precise syntactic specification of a programming language.
• Automatically construct a parser that determines the syntactic structure of a source program.
– The parser-construction process can reveal syntactic ambiguities and trouble spots.
• Make source-program translation and error detection easier.
• Make it easier to add new constructs to a language.
The Role of the Parser
• The parser obtains a string of tokens from the lexical analyzer and verifies that the string can be generated by the grammar for the language. It may also:
– Collect information about various tokens into the symbol table.
– Perform type checking and other semantic analysis.
– Generate intermediate code.
• In practice, parsers are expected to
– Report syntax errors, and
– Recover from commonly occurring errors.
(No error-recovery strategy is proven universally acceptable; the simplest approach is for the parser to quit with an error message when it detects the first error.)
[Figure: source program → Lexical Analyzer → (token / get next token) → Parser → parse tree → Rest of Front End → intermediate representation; the lexical analyzer and the parser both interact with the Symbol Table.]
Types of Parsers
• There are three general types of parsers:
– Universal parsing
- These general methods are too inefficient to use in production
compilers.
- E.g., Cocke-Younger-Kasami algorithm and Earley’s algorithm
– Top-down parsing
- Build parse trees from the root to the leaves.
– Bottom-up parsing
- Build parse trees from the leaves to the root.
Context-Free Grammar
• A context-free grammar (often just called a grammar) consists of
– Terminals
- The basic symbols from which strings are formed.
- The token name is a synonym for “terminal”.
– Nonterminals
- Nonterminals are syntactic variables that denote sets of strings; these sets of strings help define the language generated by the grammar.
- Nonterminals impose a hierarchical structure that is key to syntax analysis and translation.
– Start symbol
- The start symbol is a distinguished nonterminal (by convention, the first one listed); the strings it generates form the language defined by the grammar.
– Productions
- Productions of a grammar specify the manner in which the terminals and nonterminals are combined to
form strings. Each production consists of:
· A nonterminal called the head or left side.
· The symbol →
· A body or right side that consists of zero or more terminals and nonterminals.
A Grammar to Define Arithmetic Expressions
• A grammar to define arithmetic expressions
– 7 terminals or terminal symbols: id + - * / ( )
– 3 nonterminals: expression, term, factor
expression → expression + term
expression → expression - term
expression → term
term → term * factor
term → term / factor
term → factor
factor → ( expression )
factor → id
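
To make the structure concrete, the grammar above can be written down directly as a data structure. The following is a small illustrative sketch in Python (not part of the original slides); all names are arbitrary strings chosen for this example.

# A minimal sketch: the arithmetic-expression grammar as a Python dictionary.
# Keys are nonterminals; each production body is a tuple of grammar symbols.
GRAMMAR = {
    "expression": [
        ("expression", "+", "term"),
        ("expression", "-", "term"),
        ("term",),
    ],
    "term": [
        ("term", "*", "factor"),
        ("term", "/", "factor"),
        ("factor",),
    ],
    "factor": [
        ("(", "expression", ")"),
        ("id",),
    ],
}
START_SYMBOL = "expression"
TERMINALS = {"id", "+", "-", "*", "/", "(", ")"}

print(len(TERMINALS), "terminals,", len(GRAMMAR), "nonterminals")   # 7 terminals, 3 nonterminals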
Notational Conventions
• These symbols are terminals:
– Lowercase letters early in the alphabet, e.g., a, b, c
– Operator symbols, e.g., +, -, *, /, and so on
– Punctuation symbols, e.g., parentheses, comma, and so on
– The digits 0, 1, …, 9.
– Boldface strings each of which represents a single terminal symbol,
e.g., id.
• These symbols are nonterminals:
– Uppercase letters early in the alphabet, e.g., A, B, C
– The letter S represents the start symbol
– Lowercase, italic names, e.g., expr or stmt.
– When discussing programming constructs, uppercase letters may be
used to represent nonterminals for the constructs, e.g., E, T, and F.
(E: expression, T: term, F: factor)
Notational Conventions (Cont.)
• Grammar symbols (either nonterminals or terminals)
– Uppercase letters late in the alphabet, e.g., X, Y, Z
• Strings of terminals
– Lowercase letters late in the alphabet, e.g., u, v, …, z
• Strings of grammar symbols
– Lowercase Greek letters, e.g., α, β, γ
– E.g., A → α
• A-productions
– A set of productions A → α1, A → α2, …, A → αk with a common head A.
– It can be written A → α1 | α2 | … | αk.
– α1, α2, …, αk are the alternatives for A.
• The head of the first production is the start symbol.
Grammar with Notational Convention
expression → expression + term
expression → expression – term
expression → term
term → term * factor
term → term / factor
term → factor
factor → ( expression )
factor → id
E → E + T | E - T | T
T → T * F | T / F | F
F → ( E ) | id
Derivations
• The construction of a parse tree can be made
precise by taking a derivational view.
– Productions are treated as rewriting rules.
– Beginning with the start symbol, each rewriting step
replaces a nonterminal by the body of one of its
productions.
- Leftmost derivations correspond to top-down parsing.
- Rightmost derivations correspond to bottom-up parsing.
– E.g., E → E + E | E * E | -E | (E) | id      (4.7)
The replacement of a single E by -E is described by E ⇒ -E, read as "E derives -E".
E.g., E ⇒ -E ⇒ -(E) ⇒ -(id) is a derivation of -(id) from E.
Derivations (Cont.)
• Consider a nonterminal A in the middle of a sequence of grammar symbols, as in αAβ.
– α and β are arbitrary strings of grammar symbols (either nonterminals or terminals).
– If A → γ, then αAβ ⇒ αγβ (αAβ derives αγβ in one step).
• α1 ⇒ α2 ⇒ … ⇒ αn means α1 derives αn.
• ⇒* means "derives in zero or more steps":
– 1. α ⇒* α, for any string α.
– 2. If α ⇒* β and β ⇒ γ, then α ⇒* γ.
• ⇒+ means "derives in one or more steps".
Derivations (Cont.)
• A language that can be generated by a (context-free)
grammar is a context-free language.
• If two grammars generate the same language, the
grammars are equivalent.
• The language generated by a grammar is the set of sentences of the grammar.
– If S ⇒* α, where S is the start symbol of grammar G, then α is a sentential form of G.
– A sentential form may contain both terminals and nonterminals, and may be empty.
– A sentence of grammar G is a sentential form with no nonterminals.
- A string of terminals w is in L(G) if and only if w is a sentence of G.
Leftmost Derivation and Rightmost Derivation
• The string –(id + id) is a sentence of grammar (4.7)
• Leftmost derivation:
– The leftmost nonterminal in each sentential form is always chosen. We write α ⇒lm β.
– E.g., with grammar (4.7) E → E + E | E * E | -E | (E) | id:
E ⇒lm -E ⇒lm -(E) ⇒lm -(E + E) ⇒lm -(id + E) ⇒lm -(id + id)      (4.8)
• Rightmost (or canonical) derivation:
– The rightmost nonterminal in each sentential form is always chosen. We write α ⇒rm β.
– E.g., E ⇒rm -E ⇒rm -(E) ⇒rm -(E + E) ⇒rm -(E + id) ⇒rm -(id + id)      (4.9)
Left-Sentential and Right-Sentential Form
• Every leftmost step can be written as wAγ ⇒lm wδγ, where
– w consists of terminals only,
– A → δ is the production applied, and
– γ is a string of grammar symbols.
• If S ⇒lm* α, then α is a left-sentential form of the grammar.
• If S ⇒rm* α, then α is a right-sentential form of the grammar.
Parse Trees and Derivations
• A parse tree is a graphical representation of a derivation,
and filters out the order in which productions are applied to
replace nonterminals.
– Each interior node is labeled with the nonterminal at the head of a production; the node represents the application of that production.
– The children of an interior node are labeled (from left to right) by the
symbols in the body of the corresponding production.
– During derivation, the head of the production is replaced by the body
of the corresponding production.
• Yield or frontier of the tree:
– Read the leaves of a parse tree from left to right to constitute a
sentential form.
Parse Trees and Derivations (Cont.)
Parse string -(id+id) with the grammar: E → E + E | E * E | -E | (E) | id (4.7)
Leftmost derivation (4.8):
E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + E) ⇒ -(id + id)
Rightmost derivation (4.9):
E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(E + id) ⇒ -(id + id)

[Figure: the sequence of parse trees constructed for leftmost derivation (4.8); the yield is read from the leaves, left to right.]
Relationship between Derivations and Parse Trees
• Consider any derivation α1 ⇒ α2 ⇒ … ⇒ αn, where α1 is a single nonterminal A.
– For each sentential form αi, we can construct a parse tree whose yield is αi.
• Induction on i:
– BASIS: the tree for α1 = A is a single node labeled A.
– INDUCTION:
- Suppose we have constructed a parse tree with yield αi-1 = X1X2…Xk (where each Xi is either a nonterminal or a terminal).
- Suppose αi is derived from αi-1 by replacing Xj with β, where Xj → β and β = Y1Y2…Ym; then αi = X1X2…Xj-1 β Xj+1…Xk.
- To model this step:
· Find the j-th leaf from the left in the current parse tree; this leaf is labeled Xj.
· Give this leaf m children labeled Y1, Y2, …, Ym.
Ambiguity
• A grammar that produces more than one parse tree for
some sentence is said to be ambiguous.
– There is a one-to-one relationship between parse trees and leftmost (or rightmost) derivations.
– In other words, every parse tree is associated with a unique leftmost and a unique rightmost derivation.
• E.g., produce the sentence id+id*id with the grammar:
E → E + E | E * E | (E) | id      (4.3)

One leftmost derivation:                 Another leftmost derivation:
E ⇒ E + E                                E ⇒ E * E
  ⇒ id + E                                 ⇒ E + E * E
  ⇒ id + E * E                             ⇒ id + E * E
  ⇒ id + id * E                            ⇒ id + id * E
  ⇒ id + id * id                           ⇒ id + id * id

Both derivations produce the same sentence, so the grammar is ambiguous.
Ambiguity (Cont.)
One leftmost derivation:                 Another leftmost derivation:
E ⇒ E + E                                E ⇒ E * E
  ⇒ id + E                                 ⇒ E + E * E
  ⇒ id + E * E                             ⇒ id + E * E
  ⇒ id + id * E                            ⇒ id + id * E
  ⇒ id + id * id                           ⇒ id + id * id

[Figure: the two parse trees for id+id*id: one groups the operands as id + (id * id), the other as (id + id) * id.]
Two parse trees for id+id*id
Context-Free Grammars vs. Regular Expression
• Grammars are more powerful than regular expressions.
– Every construct that can be described by a regular expression can be
described by a grammar, but not vice-versa.
– Every regular language is a context-free language, but not vice-versa.
• E.g., the language L = { aⁿbⁿ | n ≥ 1 }, with an equal number of a's and b's.
– Grammar: A0 → aA1b, A1 → aA1b | ε
– No regular expression (finite automaton) can describe L:
- Suppose a DFA D with a finite number of states, say k, accepts L.
- For an input beginning with more than k a's, D must enter some state twice, say si; so there is a path labeled aⁱ from s0 to si and a path labeled aʲ⁻ⁱ from si back to itself.
- Since aⁱbⁱ is in the language, there is a path labeled bⁱ from si to an accepting state f; but then the path labeled aʲbⁱ (with j ≠ i) also reaches f, and aʲbⁱ is not in the language. Contradiction.
Construct a Grammar from NFA
• Construct a grammar that recognizes the same language as an NFA as follows:
– 1. For each state i of the NFA, create a nonterminal Ai.
– 2. If state i has a transition to state j on input a, add the production Ai → aAj. If state i goes to state j on input ε, add the production Ai → Aj.
– 3. If i is an accepting state, add Ai → ε.
– 4. If i is the start state, make Ai the start symbol of the grammar.

• E.g., the NFA for (a|b)*abb has states 0, 1, 2, 3 (start state 0, accepting state 3), with state 0 looping to itself on a and b, and transitions 0 → 1 on a, 1 → 2 on b, and 2 → 3 on b. The construction yields:
A0 → aA0 | bA0 | aA1
A1 → bA2
A2 → bA3
A3 → ε
The grammar for (a|b)*abb
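
As an aside (this code is not from the slides), the four construction rules translate almost directly into a few lines of Python; the transition-table encoding and function name below are assumptions made for this sketch.

# A minimal sketch: build grammar productions from an NFA transition table.
EPSILON = "ε"

def nfa_to_grammar(transitions, accepting, start):
    """transitions: dict mapping (state, symbol) -> set of successor states."""
    productions = {}                                  # nonterminal Ai -> production bodies
    for (i, sym), targets in transitions.items():
        for j in targets:                             # rule 2: Ai -> a Aj (or Ai -> Aj on ε)
            body = f"A{j}" if sym == "eps" else f"{sym}A{j}"
            productions.setdefault(f"A{i}", []).append(body)
    for i in accepting:                               # rule 3: Ai -> ε for accepting states
        productions.setdefault(f"A{i}", []).append(EPSILON)
    return f"A{start}", productions                   # rule 4: start state gives the start symbol

# The NFA for (a|b)*abb: state 0 loops on a and b, 0 -a-> 1, 1 -b-> 2, 2 -b-> 3 (accepting).
nfa = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
start, grammar = nfa_to_grammar(nfa, accepting={3}, start=0)
for head in sorted(grammar):
    print(head, "→", " | ".join(grammar[head]))       # A0 → aA0 | aA1 | bA0, ..., A3 → ε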
Top-Down Parsing
• Top-down parsing can be viewed as
– Constructing a parse tree for the input string from the
root
– Creating the nodes of the parse tree in preorder
– Finding a leftmost derivation for an input string.
• The key problem is to determine the production to
be applied for a nonterminal.
– Once a production is chosen, the rest of the parsing
process is to match the terminal symbols in the
production body with the input string.
E → TE’
E’ → + TE’ | ε
T → FT’          (4.2)
T’ → * FT’ | ε
F → (E) | id

Top-Down Parsing (Cont.)
• Top-down parse for id+id*id
[Figure: the sequence of parse trees built by the leftmost derivation of id+id*id with grammar (4.2): E ⇒lm TE’ ⇒lm FT’E’ ⇒lm id T’E’ ⇒lm id E’ ⇒lm id + TE’ ⇒lm … ⇒lm id + id * id.]
Recursive-Descent Parsing
• A recursive-descent parser consists of a set of procedures, one for each nonterminal.
• Backtracking, i.e., making repeated scans over the input, might be needed.
– NOTE: backtracking parsers are not very efficient; tabular methods such as dynamic-programming algorithms are preferred.
• A left-recursive grammar can cause a recursive-descent parser to go into an infinite loop (i.e., a production might be expanded repeatedly without consuming any input).
void A() {
  1)  Choose an A-production, A → X1X2…Xk;
  2)  for ( i = 1 to k ) {
  3)      if ( Xi is a nonterminal )
  4)          call procedure Xi();
  5)      else if ( Xi equals the current input symbol a )
  6)          advance the input to the next symbol;
  7)      else /* an error has occurred */ ;
      }
}
A typical procedure for a nonterminal in a top-down parser
– To allow backtracking, line (1) should try each A-production in some order, and line (7) should return to line (1) and try another A-production until no more A-productions remain.
Recursive-Descent Parsing (Cont.)
S → cAd
A → ab | a      (the grammar)
• Input string w = cad.

[Figure: the parser expands S → cAd, first tries A → ab and matches c and a, fails to match b against the remaining input d, backtracks, then tries A → a and successfully matches w = cad.]
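
The same example can be programmed directly. Below is a rough sketch of a backtracking recursive-descent parser for this two-production grammar (my own illustrative code, not from the slides); the procedure for A yields every way one of its alternatives can match, and the procedure for S backtracks over them.

# A minimal sketch: backtracking recursive descent for  S -> cAd,  A -> ab | a.
def parse_A(s, pos):
    """Yield each position just past a successful match of an A-alternative."""
    for body in ("ab", "a"):              # try A -> ab first, then A -> a
        if s.startswith(body, pos):
            yield pos + len(body)

def parse_S(s):
    """Return True if the whole string s matches S -> c A d."""
    if not s.startswith("c"):
        return False
    for pos in parse_A(s, 1):             # backtrack over A's alternatives
        if s[pos:] == "d":                # the rest of the input must be exactly 'd'
            return True
    return False

print(parse_S("cad"))    # True: A -> ab fails against "ad", then A -> a succeeds
print(parse_S("cabd"))   # True: A -> ab succeeds immediately
print(parse_S("cd"))     # False: no A-alternative matches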
FIRST and FOLLOW
• FIRST and FOLLOW allow us to choose which production to apply, based on the next input symbol.
– FIRST(α) is the set of terminals that begin strings derived from α, where α is any string of grammar symbols. If α ⇒* ε, then ε is also in FIRST(α).
– FOLLOW(A), for a nonterminal A, is the set of terminals a that can appear immediately to the right of A in some sentential form.
- If A can be the rightmost symbol in some sentential form, then $ is in FOLLOW(A), where $ is a special “endmarker” symbol.
– E.g., if S ⇒* αAaβ, then a is in FOLLOW(A); if A ⇒* cγ, then c is in FIRST(A).
FIRST
• Compute FIRST(X) for all grammar symbols X:
– If X is a terminal, then FIRST(X) = {X}.
– If X is a nonterminal and X → Y1Y2…Yk is a production for some k ≥ 1:
- Everything in FIRST(Y1) is surely in FIRST(X).
- If Y1 does not derive ε, then nothing more is added to FIRST(X).
- If Y1 ⇒* ε, then FIRST(Y2) is added to FIRST(X), and so on.
– If X → ε is a production, then add ε to FIRST(X).

E → TE’
E’ → + TE’ | ε
T → FT’          (4.2)
T’ → * FT’ | ε
F → (E) | id

• FIRST(F) = { (, id }
• FIRST(T’) = { *, ε }
– The two productions for T’ begin with * and ε.
• FIRST(T) = FIRST(F) = { (, id }
– T has one production, beginning with F.
• FIRST(E’) = { +, ε }
– The two productions for E’ begin with + and ε.
• FIRST(E) = FIRST(T) = { (, id }
– E has one production, beginning with T.
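
The rules above amount to a fixed-point computation. The sketch below (mine, with illustrative names) computes the FIRST sets of grammar (4.2) by repeatedly applying the rules until nothing changes; it should reproduce the sets listed above.

# A minimal sketch: compute FIRST sets for grammar (4.2) by iteration to a fixed point.
EPS = "ε"
GRAMMAR = {                      # nonterminal -> list of production bodies
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}
TERMINALS = {"+", "*", "(", ")", "id"}

def compute_first(grammar, terminals):
    first = {t: {t} for t in terminals}          # rule 1: FIRST(terminal) = {terminal}
    first.update({nt: set() for nt in grammar})
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                if body == [EPS]:                # rule 3: X -> ε adds ε
                    added = {EPS}
                else:                            # rule 2: scan Y1 Y2 ... Yk
                    added = set()
                    for sym in body:
                        added |= first[sym] - {EPS}
                        if EPS not in first[sym]:
                            break
                    else:                        # every Yi can derive ε
                        added.add(EPS)
                if not added <= first[head]:
                    first[head] |= added
                    changed = True
    return first

FIRST = compute_first(GRAMMAR, TERMINALS)
print(FIRST["E"], FIRST["E'"], FIRST["T'"])      # e.g. {'(', 'id'} {'+', 'ε'} {'*', 'ε'}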
FOLLOW
• Compute FOLLOW(A) for all nonterminals A:
– Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
– If there is a production A → αBβ, then everything in FIRST(β) except ε is in FOLLOW(B).
– If there is a production A → αB (or A → αBβ where FIRST(β) contains ε), then everything in FOLLOW(A) is in FOLLOW(B).

E → TE’                  FIRST(E) = { (, id }
E’ → + TE’ | ε           FIRST(E’) = { +, ε }
T → FT’        (4.2)     FIRST(T) = { (, id }
T’ → * FT’ | ε           FIRST(T’) = { *, ε }
F → (E) | id             FIRST(F) = { (, id }

• FOLLOW(E) = { ), $ }
– E is the start symbol, and E appears in the production body (E).
• FOLLOW(E’) = FOLLOW(E) = { ), $ }
– E’ appears only at the ends of the bodies of E-productions.
• FOLLOW(T) = { +, ), $ }
– T only appears in bodies followed by E’; everything in FIRST(E’) except ε is in FOLLOW(T) → +.
– In E → TE’, E’ ⇒* ε, so everything in FOLLOW(E) is also in FOLLOW(T) → ), $.
• FOLLOW(T’) = FOLLOW(T) = { +, ), $ }
– In T → FT’, everything in FOLLOW(T) is in FOLLOW(T’).
• FOLLOW(F) = { +, *, ), $ }
– In T → FT’, everything in FIRST(T’) except ε is in FOLLOW(F) → *.
– In T → FT’, T’ ⇒* ε, so everything in FOLLOW(T) is in FOLLOW(F) → +, ), $.
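
A companion sketch (again mine, not from the slides) computes FOLLOW by iterating the rules above; the FIRST sets are taken as given from the previous slide rather than recomputed here.

# A minimal sketch: compute FOLLOW sets for grammar (4.2), given its FIRST sets.
EPS = "ε"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"}, "T'": {"*", EPS},
         "F": {"(", "id"}, "+": {"+"}, "*": {"*"}, "(": {"("}, ")": {")"}, "id": {"id"}}

def first_of_string(symbols):
    """FIRST of a string of grammar symbols X1 X2 ... Xn ({ε} for the empty string)."""
    result = set()
    for sym in symbols:
        result |= FIRST[sym] - {EPS}
        if EPS not in FIRST[sym]:
            return result
    result.add(EPS)
    return result

def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                               # rule 1: $ in FOLLOW(start symbol)
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for pos, sym in enumerate(body):
                    if sym not in grammar:               # FOLLOW is defined for nonterminals only
                        continue
                    beta_first = first_of_string(body[pos + 1:])
                    added = beta_first - {EPS}           # rule 2: FIRST(β) \ {ε} goes into FOLLOW(sym)
                    if EPS in beta_first:
                        added |= follow[head]            # rule 3: FOLLOW(head) goes into FOLLOW(sym)
                    if not added <= follow[sym]:
                        follow[sym] |= added
                        changed = True
    return follow

print(compute_follow(GRAMMAR, "E"))   # e.g. FOLLOW(F) = {'+', '*', ')', '$'}, matching the slide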
LL(1) Grammars
• LL(1) grammar:
– First L: scan the input from left to right.
– Second L: produce a leftmost derivation.
– The “1”: use one input symbol of lookahead at each step to make parsing-action decisions.
• No left-recursive or ambiguous grammar can be LL(1).
• A grammar G is LL(1) if, whenever A → α | β are two distinct productions of G, the following conditions hold (they prevent multiply defined entries in the parsing table):
– 1. For no terminal a do both α and β derive strings beginning with a, i.e., FIRST(α) and FIRST(β) are disjoint sets.
– 2. At most one of α and β can derive the empty string.
– 3. If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A), and likewise with α and β interchanged.
In other words, if ε is in FIRST(β), then FIRST(α) and FOLLOW(A) must be disjoint sets.
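
The conditions are mechanical to check once FIRST and FOLLOW are known. A short sketch (mine; the sets are passed in rather than computed) for one pair of alternatives A → α | β:

# A minimal sketch: check the LL(1) conditions for two alternatives A -> alpha | beta.
EPS = "ε"

def ll1_pair_ok(first_alpha, first_beta, follow_A):
    # Conditions 1 and 2: FIRST(alpha) and FIRST(beta) must be disjoint
    # (this also forbids both alternatives deriving ε, since ε would be in both sets).
    if first_alpha & first_beta:
        return False
    # Condition 3: if one alternative can derive ε, the other's FIRST set
    # must be disjoint from FOLLOW(A).
    if EPS in first_beta and (first_alpha & follow_A):
        return False
    if EPS in first_alpha and (first_beta & follow_A):
        return False
    return True

# E' -> +TE' | ε in grammar (4.2): FIRST(+TE') = {+}, FIRST(ε) = {ε}, FOLLOW(E') = {), $}.
print(ll1_pair_ok({"+"}, {EPS}, {")", "$"}))    # True: no conflict
# S' -> eS | ε in the dangling-else grammar (used later in these slides): FOLLOW(S') contains e.
print(ll1_pair_ok({"e"}, {EPS}, {"e", "$"}))    # False: conflict on lookahead e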
Predictive Parsers for LL(1) Grammars
• Predictive parsers
– Are recursive-descent parsers that need no backtracking.
– Look only at the current input symbol when choosing the production to apply for a nonterminal.
– Can be constructed for a class of grammars called LL(1).
• E.g., we have the following productions:
stmt → if (expr) stmt else stmt
| while (expr) stmt
| { stmt_list }
The keywords if, while and the symbol
{ tell us which alternative is the only
one that could possibly succeed if we
are to find a statement.
Transition Diagrams for Predictive Parsers
• To construct the transition diagram from a grammar:
– First eliminate left recursion, and left factor the grammar.
– Then, for each nonterminal A,
- 1. Create an initial and final (return) state.
- 2. For each production A→X1X2…Xk, create a path from the initial to the final state,
with edges labeled X1, X2, …, Xk.
• Parsers have one diagram for each nonterminal.
– The labels of edges can be tokens (terminals) or nonterminals.
- A transition on a token means that the token is the next input symbol.
- A transition on a nonterminal A is a call of the procedure for A.
E → TE’
E’ → + TE’ | ε

[Figure: transition diagrams. Diagram E: state 0 goes to state 1 on T, then to the final state 2 on E’. Diagram E’: state 3 goes to state 4 on +, to state 5 on T, to the final state 6 on E’, and also has an ε-edge from 3 directly to 6. Substituting the diagram for E’ into the diagram for E and removing the tail recursion yields a single diagram for E in which a +-edge loops back to read another T.]
Use the diagram for E’ to substitute for E’ in the diagram for E, with tail-recursion removal.
Predictive Parsing Table
• A predictive parsing table M[A, a] is a two-dimensional array, where A is a nonterminal and a is a terminal or the symbol $ (the input endmarker).
– The production A → α is chosen if the next input symbol a is in FIRST(α).
– When α = ε or α ⇒* ε, we should choose A → α if
- the current input symbol is in FOLLOW(A), or
- the $ on the input has been reached and $ is in FOLLOW(A).
Predictive Parsing Table (Cont.)
• Algorithm: Construction of a predictive parsing table
• INPUT: Grammar G.
• OUTPUT: Parsing table M.
• METHOD: For each production A → α of the grammar, do the following:
– For each terminal a in FIRST(α), add A → α to M[A, a].
– If ε is in FIRST(α), then for each terminal b in FOLLOW(A), add A → α to M[A, b].
– If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $] as well.
– If, after performing the above, there is no production at all in M[A, a], then set M[A, a] to error (an empty entry in the table).
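
The following sketch (mine) applies this method to grammar (4.2); the FIRST and FOLLOW sets are hard-coded from the earlier slides, and a multiply defined entry would show up as a list holding more than one production.

# A minimal sketch: build the predictive parsing table M for grammar (4.2).
EPS = "ε"
PRODUCTIONS = [
    ("E",  ["T", "E'"]),
    ("E'", ["+", "T", "E'"]), ("E'", [EPS]),
    ("T",  ["F", "T'"]),
    ("T'", ["*", "F", "T'"]), ("T'", [EPS]),
    ("F",  ["(", "E", ")"]),  ("F",  ["id"]),
]
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"}, "T'": {"*", EPS},
         "F": {"(", "id"}, "+": {"+"}, "*": {"*"}, "(": {"("}, ")": {")"},
         "id": {"id"}, EPS: {EPS}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}

def first_of(body):
    result = set()
    for sym in body:
        result |= FIRST[sym] - {EPS}
        if EPS not in FIRST[sym]:
            return result
    result.add(EPS)
    return result

def build_table(productions):
    table = {}                                       # (A, a) -> list of applicable productions
    for head, body in productions:
        fs = first_of(body)
        for a in fs - {EPS}:                         # rule 1: terminals in FIRST(α)
            table.setdefault((head, a), []).append((head, body))
        if EPS in fs:                                # rules 2 and 3: FOLLOW(A), including $
            for b in FOLLOW[head]:
                table.setdefault((head, b), []).append((head, body))
    return table

M = build_table(PRODUCTIONS)
print(M[("E'", "+")])   # the single production E' -> + T E'
print(M[("T'", ")")])   # the single production T' -> ε
# For an LL(1) grammar every list has length 1; length > 1 would mean a multiply defined entry.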
Predictive Parsing Table (Cont.)

E → TE’
E’ → + TE’ | ε
T → FT’          (4.2)
T’ → * FT’ | ε
F → (E) | id

• E → TE’: FIRST(TE’) = FIRST(T) = { (, id }
• E’ → +TE’: FIRST(+TE’) = { + }
• E’ → ε: FOLLOW(E’) = { ), $ }
• T → FT’: FIRST(FT’) = FIRST(F) = { (, id }
• T’ → *FT’: FIRST(*FT’) = { * }
• T’ → ε: FOLLOW(T’) = { +, ), $ }
• F → (E): FIRST((E)) = { ( }
• F → id: FIRST(id) = { id }

• FIRST(E) = { (, id }     • FOLLOW(E) = { ), $ }
• FIRST(E’) = { +, ε }     • FOLLOW(E’) = { ), $ }
• FIRST(T) = { (, id }     • FOLLOW(T) = { +, ), $ }
• FIRST(T’) = { *, ε }     • FOLLOW(T’) = { +, ), $ }
• FIRST(F) = { (, id }     • FOLLOW(F) = { +, *, ), $ }

Parsing table M:

NONTERMINAL | id      | +         | *         | (       | )      | $
E           | E→TE’   |           |           | E→TE’   |        |
E’          |         | E’→+TE’   |           |         | E’→ε   | E’→ε
T           | T→FT’   |           |           | T→FT’   |        |
T’          |         | T’→ε      | T’→*FT’   |         | T’→ε   | T’→ε
F           | F→id    |           |           | F→(E)   |        |
Predictive Parsing Table (Cont.)
• For every LL(1) grammar, each parsing-table entry uniquely identifies a
production or signals an error.
– If G is left-recursive or ambiguous, then M will have at least one multiply
defined entry.
– Although left-recursion elimination and left factoring are easy to do, some
grammars have no corresponding LL(1) grammar.
• E.g., consider the grammar (an abstraction of the dangling-else construct, where i, t, and e stand for if, then, and else):
S → i E t S S’ | a
S’ → e S | ε
E → b

• S → iEtSS’: FIRST(iEtSS’) = { i }
• S → a: FIRST(a) = { a }
• S’ → eS: FIRST(eS) = { e }
• E → b: FIRST(b) = { b }
• FOLLOW(S) = { e, $ }: S is the start symbol (→ $), and S is followed by S’ in S → iEtSS’ (→ FIRST(S’) except ε = { e }).
• S’ → ε: FOLLOW(S’) = FOLLOW(S) = { e, $ }

NONTERMINAL | a    | b    | e             | i          | t | $
S           | S→a  |      |               | S→iEtSS’   |   |
S’          |      |      | S’→eS, S’→ε   |            |   | S’→ε
E           |      | E→b  |               |            |   |

The entry M[S’, e] is multiply defined; this reflects the ambiguity of the dangling-else grammar.
Nonrecursive Predictive Parsing
• A nonrecursive predictive parser is a table-driven parser
that maintains a stack explicitly instead of recursive calls.
• If w is the input matched so far, then the stack holds a sequence of grammar symbols α such that S ⇒lm* wα.

[Figure: model of a table-driven predictive parser. An input buffer holds a + b $ with a pointer to the current input symbol; a stack holds grammar symbols X Y Z with $ at the bottom (X is the symbol on top of the stack); the predictive parsing program consults parsing table M and produces output.]
Table-Driven Predictive Parsing
• Algorithm: Table-driven predictive parsing
• INPUT: A string w and a parsing table M for grammar G.
• OUTPUT: If w is in L(G), a leftmost derivation of w; otherwise, an error indication.
• METHOD: Initially, the parser is in a configuration with w$ in the input buffer, and
the start symbol S of G on top of the stack, above $.
set ip to point to the first symbol of w;   /* a denotes the current input symbol, pointed to by ip */
set X to the top stack symbol;
while ( X ≠ $ ) {                           /* stack is not empty */
    if ( X = a ) { pop the stack and advance ip; }          /* match: pop X */
    else if ( X is a terminal ) error();
    else if ( M[X, a] is an error entry ) error();
    else if ( M[X, a] = X → Y1Y2…Yk ) {
        output the production X → Y1Y2…Yk;
        pop the stack;                                       /* pop X */
        push YkYk-1…Y1 onto the stack, with Y1 on top;
    }
    set X to the top stack symbol;
}
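
The loop above can be run directly. A small sketch (mine) for grammar (4.2), with the parsing table copied from the earlier slide; ε-productions are represented by an empty body, and the output is the sequence of productions of the leftmost derivation, as in the trace that follows.

# A minimal sketch: the table-driven predictive parsing loop for grammar (4.2).
EPS = "ε"
NONTERMINALS = {"E", "E'", "T", "T'", "F"}
M = {   # (nonterminal, input symbol) -> production body; [] represents an ε-production
    ("E", "id"): ["T", "E'"],      ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],      ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"], ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"],           ("F", "("): ["(", "E", ")"],
}

def parse(tokens, start="E"):
    tokens = tokens + ["$"]
    stack = ["$", start]              # start symbol on top of the endmarker
    i = 0                             # ip: index of the current input symbol a
    while stack[-1] != "$":
        X, a = stack[-1], tokens[i]
        if X == a:                    # terminal on top matches the input: pop and advance
            stack.pop(); i += 1
        elif X not in NONTERMINALS or (X, a) not in M:
            raise SyntaxError(f"syntax error at input symbol {a!r}")
        else:                         # expand X using M[X, a]
            body = M[(X, a)]
            print(X, "→", " ".join(body) if body else EPS)
            stack.pop()
            stack.extend(reversed(body))     # push the body with its first symbol on top
    if tokens[i] != "$":
        raise SyntaxError("input not fully consumed")

parse(["id", "+", "id", "*", "id"])   # prints the productions used, as in the trace below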
Table-Driven Predictive Parsing (Cont.)

E → TE’
E’ → + TE’ | ε
T → FT’          (4.2)
T’ → * FT’ | ε
F → (E) | id

• Input: id+id*id
MATCHED STACK INPUT ACTION
E$ id+id*id$
TE’$ id+id*id$ output E→TE’
FT’E’$ id+id*id$ output T→FT’
idT’E’$ id+id*id$ output F→id
id T’E’$ +id*id$ match id
id E’$ +id*id$ output T’→ε
id +TE’$ +id*id$ output E’→+TE’
id+ TE’$ id*id$ match +
id+ FT’E’$ id*id$ output T→FT’
id+ idT’E’$ id*id$ output F→id
id+id T’E’$ *id$ match id
id+id *FT’E’$ *id$ output T’→*FT’
id+id* FT’E’$ id$ match *
id+id* idT’E’$ id$ output F→id
id+id*id T’E’$ $ match id
id+id*id E’$ $ output T’→ε
id+id*id $ $ output E’→ε
match $
Error Recovery in Predictive Parsing
• An error is detected during predictive parsing
– When the terminal on top of the stack does not match the next input symbol.
Or
– When nonterminal A is on top of the stack, a is the next input symbol, and
M[A, a] is error.
• Error recovery methods:
– Panic mode
- Skip symbols on the input until a token in a selected set of synchronizing tokens
appears.
- The effectiveness depends on the choice of synchronizing set.
– Phrase-level recovery
- Fill in the blank entries in the predictive parsing table with pointers to error routines.
- Error routines may change, insert, or delete symbols on the input and issue
appropriate error messages.
- An infinite loop must be prevented by checking that any recovery action eventually consumes some input symbol.
Panic-Mode Error Recovery
• Some heuristics for selecting the synchronizing set:
– All symbols in FOLLOW(A) as the synchronizing set for nonterminal A:
- Skip tokens until an element of FOLLOW(A) is seen, then pop A.
– The symbols that begin higher-level constructs as the synchronizing set of a lower-level construct:
- E.g., add keywords that begin statements to the synchronizing sets for the nonterminals generating expressions.
– The symbols in FIRST(A) as the synchronizing set for nonterminal A.
– If a nonterminal can generate the empty string, the production deriving ε can be used as a default.
- This may postpone some error detection, but cannot cause an error to be missed.
– If a terminal on top of the stack cannot be matched, pop the terminal, issue a message saying that the terminal was inserted, and continue parsing.
- This approach takes the synchronizing set of a token to consist of all other tokens.
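
To make the heuristics concrete, here is a rough sketch (mine, not the slides' code) of the predictive parsing loop extended with FOLLOW-based panic-mode recovery; the synch markers and the table fragment correspond to grammar (4.2) as shown on the next slide.

# A minimal sketch: predictive parsing with panic-mode recovery for grammar (4.2).
EPS, SYNCH = "ε", "synch"
NONTERMINALS = {"E", "E'", "T", "T'", "F"}
M = {   # normal entries, plus synch entries taken from the FOLLOW sets
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E", ")"): SYNCH, ("E", "$"): SYNCH,
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T", "+"): SYNCH, ("T", ")"): SYNCH, ("T", "$"): SYNCH,
    ("T'", "*"): ["*", "F", "T'"], ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
    ("F", "+"): SYNCH, ("F", "*"): SYNCH, ("F", ")"): SYNCH, ("F", "$"): SYNCH,
}

def parse_with_recovery(tokens, start="E"):
    tokens = tokens + ["$"]
    stack, i = ["$", start], 0
    while stack[-1] != "$":
        X, a = stack[-1], tokens[i]
        if X not in NONTERMINALS:
            if X == a:
                stack.pop(); i += 1                        # terminal matches the input
            else:
                print(f"error: missing {X!r} inserted"); stack.pop()
        else:
            entry = M.get((X, a))
            if entry is None:
                print(f"error: skipping input symbol {a!r}"); i += 1   # blank entry: skip input
            elif entry == SYNCH:
                print(f"error: popping {X!r} (synch)"); stack.pop()    # synch: pop nonterminal
            else:
                stack.pop(); stack.extend(reversed(entry))             # normal expansion

parse_with_recovery(["+", "id", "*", "+", "id"])   # the erroneous input +id*+id traced below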
Panic-Mode Error Recovery (Cont.)

E → TE’
E’ → + TE’ | ε
T → FT’          (4.2)
T’ → * FT’ | ε
F → (E) | id

• Obtain synchronizing tokens from the FOLLOW set of each nonterminal:
– If M[A, a] is blank, skip the input symbol a.
– If the entry is synch, pop the nonterminal on top of the stack.
– If a token on top of the stack does not match the input symbol, pop the token from the stack.

• FOLLOW(E) = { ), $ }
• FOLLOW(E’) = { ), $ }
• FOLLOW(T) = { +, ), $ }
• FOLLOW(T’) = { +, ), $ }
• FOLLOW(F) = { +, *, ), $ }

NONTERMINAL | id      | +         | *         | (       | )      | $
E           | E→TE’   |           |           | E→TE’   | synch  | synch
E’          |         | E’→+TE’   |           |         | E’→ε   | E’→ε
T           | T→FT’   | synch     |           | T→FT’   | synch  | synch
T’          |         | T’→ε      | T’→*FT’   |         | T’→ε   | T’→ε
F           | F→id    | synch     | synch     | F→(E)   | synch  | synch
Panic-Mode Error Recovery (Cont.)

E → TE’
E’ → + TE’ | ε
T → FT’          (4.2)
T’ → * FT’ | ε
F → (E) | id

• Input: +id*+id

MATCHED   STACK   INPUT   REMARK
E$ +id*+id$ error, skip +
E$ id*+id$ id is in FIRST(E)
TE’$ id*+id$
FT’E’$ id*+id$
idT’E’$ id*+id$
id T’E’$ *+id$ match id
id *FT’E’$ *+id$
id* FT’E’$ +id$ match *
id* FT’E’$ +id$ Error, M[F, +]=synch
id* T’E’$ +id$ F has been popped
id* E’$ +id$
id* +TE’$ +id$
id*+ TE’$ id$ match +
id*+ FT’E’$ id$
id*+ idT’E’$ id$
id*+id T’E’$ $ match id
id*+id E’$ $
id*+id $ $
Bottom-Up Parse
• A bottom-up parse constructs a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top).
– It describes parsing as the process of building parse trees.

[Figure: a bottom-up parse for id*id, shown as the sequence of partial parse trees id*id, F*id, T*id, T*F, T, E.]

The derivation corresponding to the parse (a rightmost derivation):
E ⇒ T ⇒ T*F ⇒ T*id ⇒ F*id ⇒ id*id

E → E + T | T
T → T * F | F     (4.1)
F → (E) | id
Reductions
• Bottom-up parsing is the process of “reducing” a string w to
the start symbol of the grammar.
– The goal is to construct a derivation in reverse.
– At each reduction step, a specific substring matching the body of a
production is replaced by the nonterminal at the head of the production.
– Key decisions: When to reduce and what production to apply
E → E + T | T
T → T * F | F     (4.1)
F → (E) | id

Reduction sequence: id*id, F*id, T*id, T*F, T, E
(A reduction is the reverse of a step in a derivation.)
Handle Pruning
• Bottom-up parsing during a left-to-right scan of the input
constructs a right-most derivation in reverse.
– Handle: a handle is a substring that matches the body of a production, and
– Reduction: the reduction of a handle represents one step along the reverse of a rightmost derivation.

E → E + T | T
T → T * F | F     (4.1)
F → (E) | id

Right-sentential form | Handle  | Reducing production
id1 * id2             | id1     | F → id
F * id2               | F       | T → F
T * id2               | id2     | F → id
T * F                 | T * F   | T → T * F
T                     | T       | E → T
Handle Pruning (Cont.)

Notational conventions: a, b, c are terminals; w, x, y, z are strings of terminals; A, B, C are nonterminals; X, Y, Z are grammar symbols (terminals or nonterminals); α, β, γ are strings of grammar symbols.

• If S ⇒rm* αAw ⇒rm αβw, then the production A → β in the position following α is a handle of αβw.
• Given a right-sentential form γ, a handle of γ is a production A → β together with a position of γ where β may be found; replacing β at that position by A produces the previous right-sentential form in a rightmost derivation of γ.
• Every right-sentential form of the grammar has exactly one handle, except for ambiguous grammars.
– A rightmost derivation in reverse can be obtained by “handle pruning”.

[Figure: parse tree for the rightmost derivation S ⇒rm* αAw ⇒rm αβw; the handle A → β sits at the frontier just to the left of the terminal string w.]
Shift-Reduce Parsing
• Shift-reduce parsing is a form of bottom-up parsing in which
– a stack holds grammar symbols and
– an input buffer holds the rest of the string to be parsed.
• The handle always appears at the top of the stack just before it is
identified as the handle.
Initial configuration:            Final configuration:
STACK    INPUT                    STACK    INPUT
$        w$                       $S       $

($ marks the bottom of the stack and the right end of the input; w is the input string and S is the start symbol.)
Shift-Reduce Parsing (Cont.)

• Operations of shift-reduce parsing:
– Shift: shift the next input symbol onto the top of the stack.
– Reduce: locate the left end of the string (the handle) within the stack and decide with what nonterminal to replace it.
– Accept: announce successful completion of parsing.
– Error: discover a syntax error and call an error-recovery routine.

E → E + T | T
T → T * F | F     (4.1)
F → (E) | id

E.g., parse id1 * id2:

STACK       INPUT         ACTION
$           id1 * id2 $   shift
$ id1       * id2 $       reduce by F → id
$ F         * id2 $       reduce by T → F
$ T         * id2 $       shift
$ T *       id2 $         shift
$ T * id2   $             reduce by F → id
$ T * F     $             reduce by T → T * F
$ T         $             reduce by E → T
$ E         $             accept
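
A crude but runnable sketch of these actions (my own code, not the slides'): it parses with grammar (4.1) and decides when to reduce using an SLR-like heuristic: reduce by A → β when β is on top of the stack and the lookahead is in FOLLOW(A), preferring the longest matching body. This heuristic happens to work for grammar (4.1); real shift-reduce parsers drive these decisions from LR parsing tables instead.

# A minimal sketch: a shift-reduce parser for grammar (4.1) with a FOLLOW-based
# reduction heuristic (longest matching body first).
PRODUCTIONS = [                       # listed with the longest bodies first
    ("E", ["E", "+", "T"]),
    ("T", ["T", "*", "F"]),
    ("F", ["(", "E", ")"]),
    ("E", ["T"]),
    ("T", ["F"]),
    ("F", ["id"]),
]
FOLLOW = {"E": {"+", ")", "$"}, "T": {"+", "*", ")", "$"}, "F": {"+", "*", ")", "$"}}

def try_reduce(stack, lookahead):
    for head, body in PRODUCTIONS:
        if stack[-len(body):] == body and lookahead in FOLLOW[head]:
            del stack[-len(body):]                   # replace the handle by the head
            stack.append(head)
            print("reduce by", head, "→", " ".join(body))
            return True
    return False

def parse(tokens):
    tokens = tokens + ["$"]
    stack, i = ["$"], 0
    while True:
        while try_reduce(stack, tokens[i]):          # reduce while a handle is on top
            pass
        if stack == ["$", "E"] and tokens[i] == "$":
            print("accept"); return
        if tokens[i] == "$":
            raise SyntaxError("syntax error")
        print("shift", tokens[i])
        stack.append(tokens[i]); i += 1              # shift the next input symbol

parse(["id", "*", "id"])   # reproduces the sequence of actions in the trace above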
Shift-Reduce Parsing (Cont.)
The handle will always eventually appear on top of the stack. Two cases of a rightmost derivation illustrate this:

Case 1:  S ⇒rm* αAz ⇒rm αβByz ⇒rm αβγyz
[Figure: in the parse tree, B's subtree (with yield γ) lies immediately to the left of y, both beneath A.]

STACK     INPUT    ACTION
$ αβγ     yz$      reduce by B → γ
$ αβB     yz$      shift y
$ αβBy    z$       reduce by A → βBy
$ αA      z$       shift z

Case 2:  S ⇒rm* αBxAz ⇒rm αBxyz ⇒rm αγxyz
[Figure: in the parse tree, B's subtree (with yield γ) lies to the left of x, and A's subtree (with yield y) lies to its right.]

STACK     INPUT    ACTION
$ αγ      xyz$     reduce by B → γ
$ αB      xyz$     shift x, then shift y
$ αBxy    z$       reduce by A → y
$ αBxA    z$       shift z
Conflicts During Shift-Reduce Parsing
• For some context-free grammars, shift-reduce parsing encounters conflicts in deciding the next action.
– Shift/reduce conflict
- The parser cannot decide whether to shift or to reduce.
- E.g., the dangling-else grammar causes a shift/reduce conflict:
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

STACK                        INPUT
… if expr then stmt          else … $

With else as the next input symbol, the parser cannot determine whether to shift else or to reduce by stmt → if expr then stmt.
– Reduce/reduce conflict
- The parser cannot decide which of several possible productions should be used for the reduction.
Conflicts During Shift-Reduce Parsing (Cont.)
• E.g., a grammar that covers both function calls and array references, for the input p(i,j):
– A function is called with parameters surrounded by parentheses.
– The indices of an array are also surrounded by parentheses.

(1) stmt → id ( parameter_list )
(2) stmt → expr := expr
(3) parameter_list → parameter_list , parameter
(4) parameter_list → parameter
(5) parameter → id
(6) expr → id ( expr_list )
(7) expr → id
(8) expr_list → expr_list , expr
(9) expr_list → expr

• The input p(i,j) is converted into the token string id ( id , id ). With the configuration

STACK             INPUT
… id ( id         , id ) … $

the parser faces a reduce/reduce conflict: the correct choice is production (5) if p is a procedure call, but production (7) if p is an array.
• One solution is to change production (1) into stmt → procid ( parameter_list ), using a distinct token name procid for procedure names. With

STACK               INPUT
… procid ( id       , id ) … $

the parser knows that a procedure call is encountered and reduces id by production (5).
Thanks