Introduction to the Parser
Benefits of Grammars for Programming Languages
• Give a precise syntactic specification of a programming language.
• Automatically construct a parser that determines the syntactic structure of a source program.
– The parser-construction process can reveal syntactic ambiguities and trouble spots.
• Make source-program translation and error detection easier.
• Make it easier to add new constructs to a language.
The Role of the Parser
• The parser obtains a string of tokens from the lexical analyzer and verifies that the string can be generated by the grammar for the language. It may also:
– Collect information about various tokens into the symbol table.
– Perform type checking and other semantic analysis.
– Generate intermediate code.
• In practice, parsers are expected to
– Report syntax errors, and
– Recover from commonly occurring errors.
(No error-recovery strategy is proven universally acceptable; the simplest approach is for the parser to quit with an error message when it detects the first error.)
[Figure: source program → Lexical Analyzer → (token / get next token) → Parser → parse tree → Rest of Front End → intermediate representation; the lexical analyzer and the parser both interact with the Symbol Table.]
Types of Parsers
• There are three general types of parsers:
– Universal parsing
- These general methods are too inefficient to use in production
compilers.
- E.g., Cocke-Younger-Kasami algorithm and Earley’s algorithm
– Top-down parsing
- Build parse trees from the root to the leaves.
– Bottom-up parsing
- Build parse trees from the leaves to the root.
Context-Free Grammar
• A context-free grammar (often just called a grammar) consists of
– Terminals
- The basic symbols from which strings are formed.
- The token name is a synonym for “terminal”.
– Nonterminals
- Nonterminals are syntactic variables that denote sets of strings; these sets of strings help define the language generated by the grammar.
- Nonterminals impose a hierarchical structure that is key to syntax analysis and translation.
– Start symbol
- The start symbol is a distinguished nonterminal (by convention, the first one listed); the strings it generates form the language defined by the grammar.
– Productions
- Productions of a grammar specify the manner in which the terminals and nonterminals are combined to
form strings. Each production consists of:
· A nonterminal called the head or left side.
· The symbol →
· A body or right side that consists of zero or more terminals and nonterminals.
A Grammar to Define Arithmetic Expressions
• A grammar to define arithmetic expressions
– 7 terminals or terminal symbols: id + - * / ( )
– 3 nonterminals: expression, term, factor
expression → expression + term
expression → expression - term
expression → term
term → term * factor
term → term / factor
term → factor
factor → ( expression )
factor → id
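
To make the structure concrete, the grammar above can be written down directly as a data structure. The following is a small illustrative sketch in Python (not part of the original slides); all names are arbitrary strings chosen for this example.

# A minimal sketch: the arithmetic-expression grammar as a Python dictionary.
# Keys are nonterminals; each production body is a tuple of grammar symbols.
GRAMMAR = {
    "expression": [
        ("expression", "+", "term"),
        ("expression", "-", "term"),
        ("term",),
    ],
    "term": [
        ("term", "*", "factor"),
        ("term", "/", "factor"),
        ("factor",),
    ],
    "factor": [
        ("(", "expression", ")"),
        ("id",),
    ],
}
START_SYMBOL = "expression"
TERMINALS = {"id", "+", "-", "*", "/", "(", ")"}

print(len(TERMINALS), "terminals,", len(GRAMMAR), "nonterminals")   # 7 terminals, 3 nonterminals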
Notational Conventions
• These symbols are terminals:
– Lowercase letters early in the alphabet, e.g., a, b, c
– Operator symbols, e.g., +, -, *, /, and so on
– Punctuation symbols, e.g., parentheses, comma, and so on
– The digits 0, 1, …, 9.
– Boldface strings each of which represents a single terminal symbol,
e.g., id.
• These symbols are nonterminals:
– Uppercase letters early in the alphabet, e.g., A, B, C
– The letter S represents the start symbol
– Lowercase, italic names, e.g., expr or stmt.
– When discussing programming constructs, uppercase letters may be
used to represent nonterminals for the constructs, e.g., E, T, and F.
(E: expression, T: term, F: factor)
Notational Conventions (Cont.)
• Grammar symbols (either nonterminals or terminals)
– Uppercase letters late in the alphabet, e.g., X, Y, Z
• Strings of terminals
– Lowercase letters late in the alphabet, e.g., u, v, …, z
• Strings of grammar symbols
– Lowercase Greek letters, e.g., α, β, γ
– E.g., A → α
• A-productions
– A set of productions A → α1, A → α2, …, A → αk with a common head A.
– It can be written A → α1 | α2 | … | αk.
– α1, α2, …, αk are the alternatives for A.
• The head of the first production is the start symbol.
Grammar with Notational Convention
expression → expression + term
expression → expression – term
expression → term
term → term * factor
term → term / factor
term → factor
factor → ( expression )
factor → id
E → E + T | E - T | T
T → T * F | T / F | F
F → ( E ) | id
Derivations
• The construction of a parse tree can be made
precise by taking a derivational view.
– Productions are treated as rewriting rules.
– Beginning with the start symbol, each rewriting step
replaces a nonterminal by the body of one of its
productions.
- Leftmost derivations correspond to top-down parsing.
- Rightmost derivations correspond to bottom-up parsing.
– E.g., E → E + E | E * E | -E | (E) | id      (4.7)
The replacement of a single E by -E is described by E ⇒ -E, read as "E derives -E".
E.g., E ⇒ -E ⇒ -(E) ⇒ -(id) is a derivation of -(id) from E.
Derivations (Cont.)
• Consider a nonterminal A in the middle of a sequence of grammar symbols, as in αAβ.
– α and β are arbitrary strings of grammar symbols (either nonterminals or terminals).
– If A → γ, then αAβ ⇒ αγβ (αAβ derives αγβ in one step).
• α1 ⇒ α2 ⇒ … ⇒ αn means α1 derives αn.
• ⇒* means "derives in zero or more steps":
– 1. α ⇒* α, for any string α.
– 2. If α ⇒* β and β ⇒ γ, then α ⇒* γ.
• ⇒+ means "derives in one or more steps".
Derivations (Cont.)
• A language that can be generated by a (context-free)
grammar is a context-free language.
• If two grammars generate the same language, the
grammars are equivalent.
• The language generated by a grammar is the set of sentences of the grammar.
– If S ⇒* α, where S is the start symbol of grammar G, then α is a sentential form of G.
– A sentential form may contain both terminals and nonterminals, and may be empty.
– A sentence of grammar G is a sentential form with no nonterminals.
- A string of terminals w is in L(G) if and only if w is a sentence of G.
Leftmost Derivation and Rightmost Derivation
• The string –(id + id) is a sentence of grammar (4.7)
• Leftmost derivation:
– The leftmost nonterminal in each sentential form is always chosen. We write α ⇒lm β.
– E.g., with grammar (4.7) E → E + E | E * E | -E | (E) | id:
E ⇒lm -E ⇒lm -(E) ⇒lm -(E + E) ⇒lm -(id + E) ⇒lm -(id + id)      (4.8)
• Rightmost (or canonical) derivation:
– The rightmost nonterminal in each sentential form is always chosen. We write α ⇒rm β.
– E.g., E ⇒rm -E ⇒rm -(E) ⇒rm -(E + E) ⇒rm -(E + id) ⇒rm -(id + id)      (4.9)
Left-Sentential and Right-Sentential Form
• Every leftmost step can be written as wAγ ⇒lm wδγ, where
– w consists of terminals only,
– A → δ is the production applied, and
– γ is a string of grammar symbols.
• If S ⇒lm* α, then α is a left-sentential form of the grammar.
• If S ⇒rm* α, then α is a right-sentential form of the grammar.
Parse Trees and Derivations
• A parse tree is a graphical representation of a derivation,
and filters out the order in which productions are applied to
replace nonterminals.
– Each interior node is labeled with the nonterminal at the head of a production; the node represents the application of that production.
– The children of an interior node are labeled (from left to right) by the
symbols in the body of the corresponding production.
– During derivation, the head of the production is replaced by the body
of the corresponding production.
• Yield or frontier of the tree:
– Read the leaves of a parse tree from left to right to constitute a
sentential form.
Parse Trees and Derivations (Cont.)
Parse string -(id+id) with the grammar: E → E + E | E * E | -E | (E) | id (4.7)
Leftmost derivation (4.8):
E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + E) ⇒ -(id + id)
Rightmost derivation (4.9):
E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(E + id) ⇒ -(id + id)

[Figure: the sequence of parse trees constructed for leftmost derivation (4.8); the yield is read from the leaves, left to right.]
Relationship between Derivations and Parse Trees
• Consider any derivation α1 ⇒ α2 ⇒ … ⇒ αn, where α1 is a single nonterminal A.
– For each sentential form αi, we can construct a parse tree whose yield is αi.
• Induction on i:
– BASIS: the tree for α1 = A is a single node labeled A.
– INDUCTION:
- Suppose we have constructed a parse tree with yield αi-1 = X1X2…Xk (where each Xi is either a nonterminal or a terminal).
- Suppose αi is derived from αi-1 by replacing Xj with β, where Xj → β and β = Y1Y2…Ym; then αi = X1X2…Xj-1 β Xj+1…Xk.
- To model this step:
· Find the j-th leaf from the left in the current parse tree; this leaf is labeled Xj.
· Give this leaf m children labeled Y1, Y2, …, Ym.
Ambiguity
• A grammar that produces more than one parse tree for
some sentence is said to be ambiguous.
– There is a one-to-one relationship between parse trees and leftmost (or rightmost) derivations.
– In other words, every parse tree is associated with a unique leftmost and a unique rightmost derivation.
• E.g., produce the sentence id+id*id with the grammar:
E → E + E | E * E | (E) | id      (4.3)

One leftmost derivation:                 Another leftmost derivation:
E ⇒ E + E                                E ⇒ E * E
  ⇒ id + E                                 ⇒ E + E * E
  ⇒ id + E * E                             ⇒ id + E * E
  ⇒ id + id * E                            ⇒ id + id * E
  ⇒ id + id * id                           ⇒ id + id * id

Both derivations produce the same sentence, so the grammar is ambiguous.
Ambiguity (Cont.)
One leftmost derivation:                 Another leftmost derivation:
E ⇒ E + E                                E ⇒ E * E
  ⇒ id + E                                 ⇒ E + E * E
  ⇒ id + E * E                             ⇒ id + E * E
  ⇒ id + id * E                            ⇒ id + id * E
  ⇒ id + id * id                           ⇒ id + id * id

[Figure: the two parse trees for id+id*id: one groups the operands as id + (id * id), the other as (id + id) * id.]
Two parse trees for id+id*id
Context-Free Grammars vs. Regular Expression
• Grammars are more powerful than regular expressions.
– Every construct that can be described by a regular expression can be
described by a grammar, but not vice-versa.
– Every regular language is a context-free language, but not vice-versa.
• E.g., the language L = { aⁿbⁿ | n ≥ 1 }, with an equal number of a's and b's.
– Grammar: A0 → aA1b, A1 → aA1b | ε
– No regular expression (finite automaton) can describe L:
- Suppose a DFA D with a finite number of states, say k, accepts L.
- For an input beginning with more than k a's, D must enter some state twice, say si; so there is a path labeled aⁱ from s0 to si and a path labeled aʲ⁻ⁱ from si back to itself.
- Since aⁱbⁱ is in the language, there is a path labeled bⁱ from si to an accepting state f; but then the path labeled aʲbⁱ (with j ≠ i) also reaches f, and aʲbⁱ is not in the language. Contradiction.
Construct a Grammar from NFA
• Construct a grammar that recognizes the same language as an NFA as follows:
– 1. For each state i of the NFA, create a nonterminal Ai.
– 2. If state i has a transition to state j on input a, add the production Ai → aAj. If state i goes to state j on input ε, add the production Ai → Aj.
– 3. If i is an accepting state, add Ai → ε.
– 4. If i is the start state, make Ai the start symbol of the grammar.

• E.g., the NFA for (a|b)*abb has states 0, 1, 2, 3 (start state 0, accepting state 3), with state 0 looping to itself on a and b, and transitions 0 → 1 on a, 1 → 2 on b, and 2 → 3 on b. The construction yields:
A0 → aA0 | bA0 | aA1
A1 → bA2
A2 → bA3
A3 → ε
The grammar for (a|b)*abb
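
As an aside (this code is not from the slides), the four construction rules translate almost directly into a few lines of Python; the transition-table encoding and function name below are assumptions made for this sketch.

# A minimal sketch: build grammar productions from an NFA transition table.
EPSILON = "ε"

def nfa_to_grammar(transitions, accepting, start):
    """transitions: dict mapping (state, symbol) -> set of successor states."""
    productions = {}                                  # nonterminal Ai -> production bodies
    for (i, sym), targets in transitions.items():
        for j in targets:                             # rule 2: Ai -> a Aj (or Ai -> Aj on ε)
            body = f"A{j}" if sym == "eps" else f"{sym}A{j}"
            productions.setdefault(f"A{i}", []).append(body)
    for i in accepting:                               # rule 3: Ai -> ε for accepting states
        productions.setdefault(f"A{i}", []).append(EPSILON)
    return f"A{start}", productions                   # rule 4: start state gives the start symbol

# The NFA for (a|b)*abb: state 0 loops on a and b, 0 -a-> 1, 1 -b-> 2, 2 -b-> 3 (accepting).
nfa = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
start, grammar = nfa_to_grammar(nfa, accepting={3}, start=0)
for head in sorted(grammar):
    print(head, "→", " | ".join(grammar[head]))       # A0 → aA0 | aA1 | bA0, ..., A3 → ε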
Top-Down Parsing
• Top-down parsing can be viewed as
– Constructing a parse tree for the input string from the
root
– Creating the nodes of the parse tree in preorder
– Finding a leftmost derivation for an input string.
• The key problem is to determine the production to
be applied for a nonterminal.
– Once a production is chosen, the rest of the parsing
process is to match the terminal symbols in the
production body with the input string.
E → TE’
E’ → + TE’ | ε
T → FT’          (4.2)
T’ → * FT’ | ε
F → (E) | id

Top-Down Parsing (Cont.)
• Top-down parse for id+id*id
[Figure: the sequence of parse trees built by the leftmost derivation of id+id*id with grammar (4.2): E ⇒lm TE’ ⇒lm FT’E’ ⇒lm id T’E’ ⇒lm id E’ ⇒lm id + TE’ ⇒lm … ⇒lm id + id * id.]
Recursive-Descent Parsing
• A recursive-descent parser consists of a set of procedures, one for each nonterminal.
• Backtracking, i.e., making repeated scans over the input, might be needed.
– NOTE: backtracking parsers are not very efficient; tabular methods such as dynamic-programming algorithms are preferred.
• A left-recursive grammar can cause a recursive-descent parser to go into an infinite loop (i.e., a production might be expanded repeatedly without consuming any input).
void A() {
  1)  Choose an A-production, A → X1X2…Xk;
  2)  for ( i = 1 to k ) {
  3)      if ( Xi is a nonterminal )
  4)          call procedure Xi();
  5)      else if ( Xi equals the current input symbol a )
  6)          advance the input to the next symbol;
  7)      else /* an error has occurred */ ;
      }
}
A typical procedure for a nonterminal in a top-down parser
– To allow backtracking, line (1) should try each A-production in some order, and line (7) should return to line (1) and try another A-production until no more A-productions remain.
Recursive-Descent Parsing (Cont.)
S → cAd
A → ab | a      (the grammar)
• Input string w = cad.

[Figure: the parser expands S → cAd, first tries A → ab and matches c and a, fails to match b against the remaining input d, backtracks, then tries A → a and successfully matches w = cad.]
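
The same example can be programmed directly. Below is a rough sketch of a backtracking recursive-descent parser for this two-production grammar (my own illustrative code, not from the slides); the procedure for A yields every way one of its alternatives can match, and the procedure for S backtracks over them.

# A minimal sketch: backtracking recursive descent for  S -> cAd,  A -> ab | a.
def parse_A(s, pos):
    """Yield each position just past a successful match of an A-alternative."""
    for body in ("ab", "a"):              # try A -> ab first, then A -> a
        if s.startswith(body, pos):
            yield pos + len(body)

def parse_S(s):
    """Return True if the whole string s matches S -> c A d."""
    if not s.startswith("c"):
        return False
    for pos in parse_A(s, 1):             # backtrack over A's alternatives
        if s[pos:] == "d":                # the rest of the input must be exactly 'd'
            return True
    return False

print(parse_S("cad"))    # True: A -> ab fails against "ad", then A -> a succeeds
print(parse_S("cabd"))   # True: A -> ab succeeds immediately
print(parse_S("cd"))     # False: no A-alternative matches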
FIRST and FOLLOW
• FIRST and FOLLOW allow us to choose which production to apply, based on the next input symbol.
– FIRST(α) is the set of terminals that begin strings derived from α, where α is any string of grammar symbols. If α ⇒* ε, then ε is also in FIRST(α).
– FOLLOW(A), for a nonterminal A, is the set of terminals a that can appear immediately to the right of A in some sentential form.
- If A can be the rightmost symbol in some sentential form, then $ is in FOLLOW(A), where $ is a special “endmarker” symbol.
– E.g., if S ⇒* αAaβ, then a is in FOLLOW(A); if A ⇒* cγ, then c is in FIRST(A).
FIRST
• Compute FIRST(X) for all grammar symbols X:
– If X is a terminal, then FIRST(X) = {X}.
– If X is a nonterminal and X → Y1Y2…Yk is a production for some k ≥ 1:
- Everything in FIRST(Y1) is surely in FIRST(X).
- If Y1 does not derive ε, then nothing more is added to FIRST(X).
- If Y1 ⇒* ε, then FIRST(Y2) is added to FIRST(X), and so on.
– If X → ε is a production, then add ε to FIRST(X).

E → TE’
E’ → + TE’ | ε
T → FT’          (4.2)
T’ → * FT’ | ε
F → (E) | id

• FIRST(F) = { (, id }
• FIRST(T’) = { *, ε }
– The two productions for T’ begin with * and ε.
• FIRST(T) = FIRST(F) = { (, id }
– T has one production, beginning with F.
• FIRST(E’) = { +, ε }
– The two productions for E’ begin with + and ε.
• FIRST(E) = FIRST(T) = { (, id }
– E has one production, beginning with T.
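
The rules above amount to a fixed-point computation. The sketch below (mine, with illustrative names) computes the FIRST sets of grammar (4.2) by repeatedly applying the rules until nothing changes; it should reproduce the sets listed above.

# A minimal sketch: compute FIRST sets for grammar (4.2) by iteration to a fixed point.
EPS = "ε"
GRAMMAR = {                      # nonterminal -> list of production bodies
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}
TERMINALS = {"+", "*", "(", ")", "id"}

def compute_first(grammar, terminals):
    first = {t: {t} for t in terminals}          # rule 1: FIRST(terminal) = {terminal}
    first.update({nt: set() for nt in grammar})
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                if body == [EPS]:                # rule 3: X -> ε adds ε
                    added = {EPS}
                else:                            # rule 2: scan Y1 Y2 ... Yk
                    added = set()
                    for sym in body:
                        added |= first[sym] - {EPS}
                        if EPS not in first[sym]:
                            break
                    else:                        # every Yi can derive ε
                        added.add(EPS)
                if not added <= first[head]:
                    first[head] |= added
                    changed = True
    return first

FIRST = compute_first(GRAMMAR, TERMINALS)
print(FIRST["E"], FIRST["E'"], FIRST["T'"])      # e.g. {'(', 'id'} {'+', 'ε'} {'*', 'ε'}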
FOLLOW
• Compute FOLLOW(A) for all nonterminals A:
– Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
– If there is a production A → αBβ, then everything in FIRST(β) except ε is in FOLLOW(B).
– If there is a production A → αB (or A → αBβ where FIRST(β) contains ε), then everything in FOLLOW(A) is in FOLLOW(B).

E → TE’                  FIRST(E) = { (, id }
E’ → + TE’ | ε           FIRST(E’) = { +, ε }
T → FT’        (4.2)     FIRST(T) = { (, id }
T’ → * FT’ | ε           FIRST(T’) = { *, ε }
F → (E) | id             FIRST(F) = { (, id }

• FOLLOW(E) = { ), $ }
– E is the start symbol, and E appears in the production body (E).
• FOLLOW(E’) = FOLLOW(E) = { ), $ }
– E’ appears only at the ends of the bodies of E-productions.
• FOLLOW(T) = { +, ), $ }
– T only appears in bodies followed by E’; everything in FIRST(E’) except ε is in FOLLOW(T) → +.
– In E → TE’, E’ ⇒* ε, so everything in FOLLOW(E) is also in FOLLOW(T) → ), $.
• FOLLOW(T’) = FOLLOW(T) = { +, ), $ }
– In T → FT’, everything in FOLLOW(T) is in FOLLOW(T’).
• FOLLOW(F) = { +, *, ), $ }
– In T → FT’, everything in FIRST(T’) except ε is in FOLLOW(F) → *.
– In T → FT’, T’ ⇒* ε, so everything in FOLLOW(T) is in FOLLOW(F) → +, ), $.
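
A companion sketch (again mine, not from the slides) computes FOLLOW by iterating the rules above; the FIRST sets are taken as given from the previous slide rather than recomputed here.

# A minimal sketch: compute FOLLOW sets for grammar (4.2), given its FIRST sets.
EPS = "ε"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"}, "T'": {"*", EPS},
         "F": {"(", "id"}, "+": {"+"}, "*": {"*"}, "(": {"("}, ")": {")"}, "id": {"id"}}

def first_of_string(symbols):
    """FIRST of a string of grammar symbols X1 X2 ... Xn ({ε} for the empty string)."""
    result = set()
    for sym in symbols:
        result |= FIRST[sym] - {EPS}
        if EPS not in FIRST[sym]:
            return result
    result.add(EPS)
    return result

def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                               # rule 1: $ in FOLLOW(start symbol)
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for pos, sym in enumerate(body):
                    if sym not in grammar:               # FOLLOW is defined for nonterminals only
                        continue
                    beta_first = first_of_string(body[pos + 1:])
                    added = beta_first - {EPS}           # rule 2: FIRST(β) \ {ε} goes into FOLLOW(sym)
                    if EPS in beta_first:
                        added |= follow[head]            # rule 3: FOLLOW(head) goes into FOLLOW(sym)
                    if not added <= follow[sym]:
                        follow[sym] |= added
                        changed = True
    return follow

print(compute_follow(GRAMMAR, "E"))   # e.g. FOLLOW(F) = {'+', '*', ')', '$'}, matching the slide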
LL(1) Grammars
• LL(1) grammar:
– First L: scan the input from left to right.
– Second L: produce a leftmost derivation.
– The “1”: use one input symbol of lookahead at each step to make parsing-action decisions.
• No left-recursive or ambiguous grammar can be LL(1).
• A grammar G is LL(1) if, whenever A → α | β are two distinct productions of G, the following conditions hold (they prevent multiply defined entries in the parsing table):
– 1. For no terminal a do both α and β derive strings beginning with a, i.e., FIRST(α) and FIRST(β) are disjoint sets.
– 2. At most one of α and β can derive the empty string.
– 3. If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A), and likewise with α and β interchanged.
In other words, if ε is in FIRST(β), then FIRST(α) and FOLLOW(A) must be disjoint sets.
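
The conditions are mechanical to check once FIRST and FOLLOW are known. A short sketch (mine; the sets are passed in rather than computed) for one pair of alternatives A → α | β:

# A minimal sketch: check the LL(1) conditions for two alternatives A -> alpha | beta.
EPS = "ε"

def ll1_pair_ok(first_alpha, first_beta, follow_A):
    # Conditions 1 and 2: FIRST(alpha) and FIRST(beta) must be disjoint
    # (this also forbids both alternatives deriving ε, since ε would be in both sets).
    if first_alpha & first_beta:
        return False
    # Condition 3: if one alternative can derive ε, the other's FIRST set
    # must be disjoint from FOLLOW(A).
    if EPS in first_beta and (first_alpha & follow_A):
        return False
    if EPS in first_alpha and (first_beta & follow_A):
        return False
    return True

# E' -> +TE' | ε in grammar (4.2): FIRST(+TE') = {+}, FIRST(ε) = {ε}, FOLLOW(E') = {), $}.
print(ll1_pair_ok({"+"}, {EPS}, {")", "$"}))    # True: no conflict
# S' -> eS | ε in the dangling-else grammar (used later in these slides): FOLLOW(S') contains e.
print(ll1_pair_ok({"e"}, {EPS}, {"e", "$"}))    # False: conflict on lookahead e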
Predictive Parsers for LL(1) Grammars
• Predictive parsers
– Are recursive-descent parsers that need no backtracking.
– Look only at the current input symbol when choosing the production to apply for a nonterminal.
– Can be constructed for a class of grammars called LL(1).
• E.g., we have the following productions:
stmt → if (expr) stmt else stmt
| while (expr) stmt
| { stmt_list }
The keywords if, while and the symbol
{ tell us which alternative is the only
one that could possibly succeed if we
are to find a statement.
Transition Diagrams for Predictive Parsers
• To construct the transition diagram from a grammar:
– First eliminate left recursion, and left factor the grammar.
– Then, for each nonterminal A,
- 1. Create an initial and final (return) state.
- 2. For each production A→X1X2…Xk, create a path from the initial to the final state,
with edges labeled X1, X2, …, Xk.
• Parsers have one diagram for each nonterminal.
– The labels of edges can be tokens (terminals) or nonterminals.
- A transition on a token means that the token is the next input symbol.
- A transition on a nonterminal A is a call of the procedure for A.
E → TE’
E’ → + TE’ | ε

[Figure: transition diagrams. Diagram E: state 0 goes to state 1 on T, then to the final state 2 on E’. Diagram E’: state 3 goes to state 4 on +, to state 5 on T, to the final state 6 on E’, and also has an ε-edge from 3 directly to 6. Substituting the diagram for E’ into the diagram for E and removing the tail recursion yields a single diagram for E in which a +-edge loops back to read another T.]
Use the diagram for E’ to substitute for E’ in the diagram for E, with tail-recursion removal.
Predictive Parsing Table
• A predictive parsing table M[A, a] is a two-dimensional array, where A is a nonterminal and a is a terminal or the symbol $ (the input endmarker).
– The production A → α is chosen if the next input symbol a is in FIRST(α).
– When α = ε or α ⇒* ε, we should choose A → α if
- the current input symbol is in FOLLOW(A), or
- the $ on the input has been reached and $ is in FOLLOW(A).
Predictive Parsing Table (Cont.)
• Algorithm: Construction of a predictive parsing table
• INPUT: Grammar G.
• OUTPUT: Parsing table M.
• METHOD: For each production A → α of the grammar, do the following:
– For each terminal a in FIRST(α), add A → α to M[A, a].
– If ε is in FIRST(α), then for each terminal b in FOLLOW(A), add A → α to M[A, b].
– If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $] as well.
– If, after performing the above, there is no production at all in M[A, a], then set M[A, a] to error (an empty entry in the table).
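
The following sketch (mine) applies this method to grammar (4.2); the FIRST and FOLLOW sets are hard-coded from the earlier slides, and a multiply defined entry would show up as a list holding more than one production.

# A minimal sketch: build the predictive parsing table M for grammar (4.2).
EPS = "ε"
PRODUCTIONS = [
    ("E",  ["T", "E'"]),
    ("E'", ["+", "T", "E'"]), ("E'", [EPS]),
    ("T",  ["F", "T'"]),
    ("T'", ["*", "F", "T'"]), ("T'", [EPS]),
    ("F",  ["(", "E", ")"]),  ("F",  ["id"]),
]
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"}, "T'": {"*", EPS},
         "F": {"(", "id"}, "+": {"+"}, "*": {"*"}, "(": {"("}, ")": {")"},
         "id": {"id"}, EPS: {EPS}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}

def first_of(body):
    result = set()
    for sym in body:
        result |= FIRST[sym] - {EPS}
        if EPS not in FIRST[sym]:
            return result
    result.add(EPS)
    return result

def build_table(productions):
    table = {}                                       # (A, a) -> list of applicable productions
    for head, body in productions:
        fs = first_of(body)
        for a in fs - {EPS}:                         # rule 1: terminals in FIRST(α)
            table.setdefault((head, a), []).append((head, body))
        if EPS in fs:                                # rules 2 and 3: FOLLOW(A), including $
            for b in FOLLOW[head]:
                table.setdefault((head, b), []).append((head, body))
    return table

M = build_table(PRODUCTIONS)
print(M[("E'", "+")])   # the single production E' -> + T E'
print(M[("T'", ")")])   # the single production T' -> ε
# For an LL(1) grammar every list has length 1; length > 1 would mean a multiply defined entry.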
Predictive Parsing Table (Cont.)

E → TE’
E’ → + TE’ | ε
T → FT’          (4.2)
T’ → * FT’ | ε
F → (E) | id

• E → TE’: FIRST(TE’) = FIRST(T) = { (, id }
• E’ → +TE’: FIRST(+TE’) = { + }
• E’ → ε: FOLLOW(E’) = { ), $ }
• T → FT’: FIRST(FT’) = FIRST(F) = { (, id }
• T’ → *FT’: FIRST(*FT’) = { * }
• T’ → ε: FOLLOW(T’) = { +, ), $ }
• F → (E): FIRST((E)) = { ( }
• F → id: FIRST(id) = { id }

• FIRST(E) = { (, id }     • FOLLOW(E) = { ), $ }
• FIRST(E’) = { +, ε }     • FOLLOW(E’) = { ), $ }
• FIRST(T) = { (, id }     • FOLLOW(T) = { +, ), $ }
• FIRST(T’) = { *, ε }     • FOLLOW(T’) = { +, ), $ }
• FIRST(F) = { (, id }     • FOLLOW(F) = { +, *, ), $ }

Parsing table M:

NONTERMINAL | id      | +         | *         | (       | )      | $
E           | E→TE’   |           |           | E→TE’   |        |
E’          |         | E’→+TE’   |           |         | E’→ε   | E’→ε
T           | T→FT’   |           |           | T→FT’   |        |
T’          |         | T’→ε      | T’→*FT’   |         | T’→ε   | T’→ε
F           | F→id    |           |           | F→(E)   |        |
Predictive Parsing Table (Cont.)
• For every LL(1) grammar, each parsing-table entry uniquely identifies a
production or signals an error.
– If G is left-recursive or ambiguous, then M will have at least one multiply
defined entry.
– Although left-recursion elimination and left factoring are easy to do, some
grammars have no corresponding LL(1) grammar.
• E.g., consider the grammar (an abstraction of the dangling-else construct, where i, t, and e stand for if, then, and else):
S → i E t S S’ | a
S’ → e S | ε
E → b

• S → iEtSS’: FIRST(iEtSS’) = { i }
• S → a: FIRST(a) = { a }
• S’ → eS: FIRST(eS) = { e }
• E → b: FIRST(b) = { b }
• FOLLOW(S) = { e, $ }: S is the start symbol (→ $), and S is followed by S’ in S → iEtSS’ (→ FIRST(S’) except ε = { e }).
• S’ → ε: FOLLOW(S’) = FOLLOW(S) = { e, $ }

NONTERMINAL | a    | b    | e             | i          | t | $
S           | S→a  |      |               | S→iEtSS’   |   |
S’          |      |      | S’→eS, S’→ε   |            |   | S’→ε
E           |      | E→b  |               |            |   |

The entry M[S’, e] is multiply defined; this reflects the ambiguity of the dangling-else grammar.
Nonrecursive Predictive Parsing
• A nonrecursive predictive parser is a table-driven parser
that maintains a stack explicitly instead of recursive calls.
• If w is the input matched so far, then the stack holds a sequence of grammar symbols α such that S ⇒lm* wα.

[Figure: model of a table-driven predictive parser. An input buffer holds a + b $ with a pointer to the current input symbol; a stack holds grammar symbols X Y Z with $ at the bottom (X is the symbol on top of the stack); the predictive parsing program consults parsing table M and produces output.]
Table-Driven Predictive Parsing
• Algorithm: Table-driven predictive parsing
• INPUT: A string w and a parsing table M for grammar G.
• OUTPUT: If w is in L(G), a leftmost derivation of w; otherwise, an error indication.
• METHOD: Initially, the parser is in a configuration with w$ in the input buffer, and
the start symbol S of G on top of the stack, above $.
set ip to point to the first symbol of w;   /* a denotes the current input symbol, pointed to by ip */
set X to the top stack symbol;
while ( X ≠ $ ) {                           /* stack is not empty */
    if ( X = a ) { pop the stack and advance ip; }          /* match: pop X */
    else if ( X is a terminal ) error();
    else if ( M[X, a] is an error entry ) error();
    else if ( M[X, a] = X → Y1Y2…Yk ) {
        output the production X → Y1Y2…Yk;
        pop the stack;                                       /* pop X */
        push YkYk-1…Y1 onto the stack, with Y1 on top;
    }
    set X to the top stack symbol;
}
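
The loop above can be run directly. A small sketch (mine) for grammar (4.2), with the parsing table copied from the earlier slide; ε-productions are represented by an empty body, and the output is the sequence of productions of the leftmost derivation, as in the trace that follows.

# A minimal sketch: the table-driven predictive parsing loop for grammar (4.2).
EPS = "ε"
NONTERMINALS = {"E", "E'", "T", "T'", "F"}
M = {   # (nonterminal, input symbol) -> production body; [] represents an ε-production
    ("E", "id"): ["T", "E'"],      ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],      ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"], ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"],           ("F", "("): ["(", "E", ")"],
}

def parse(tokens, start="E"):
    tokens = tokens + ["$"]
    stack = ["$", start]              # start symbol on top of the endmarker
    i = 0                             # ip: index of the current input symbol a
    while stack[-1] != "$":
        X, a = stack[-1], tokens[i]
        if X == a:                    # terminal on top matches the input: pop and advance
            stack.pop(); i += 1
        elif X not in NONTERMINALS or (X, a) not in M:
            raise SyntaxError(f"syntax error at input symbol {a!r}")
        else:                         # expand X using M[X, a]
            body = M[(X, a)]
            print(X, "→", " ".join(body) if body else EPS)
            stack.pop()
            stack.extend(reversed(body))     # push the body with its first symbol on top
    if tokens[i] != "$":
        raise SyntaxError("input not fully consumed")

parse(["id", "+", "id", "*", "id"])   # prints the productions used, as in the trace below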
Table-Driven Predictive Parsing (Cont.)

E → TE’
E’ → + TE’ | ε
T → FT’          (4.2)
T’ → * FT’ | ε
F → (E) | id

• Input: id+id*id
MATCHED STACK INPUT ACTION
E$ id+id*id$
TE’$ id+id*id$ output E→TE’
FT’E’$ id+id*id$ output T→FT’
idT’E’$ id+id*id$ output F→id
id T’E’$ +id*id$ match id
id E’$ +id*id$ output T’→ε
id +TE’$ +id*id$ output E’→+TE’
id+ TE’$ id*id$ match +
id+ FT’E’$ id*id$ output T→FT’
id+ idT’E’$ id*id$ output F→id
id+id T’E’$ *id$ match id
id+id *FT’E’$ *id$ output T’→*FT’
id+id* FT’E’$ id$ match *
id+id* idT’E’$ id$ output F→id
id+id*id T’E’$ $ match id
id+id*id E’$ $ output T’→ε
id+id*id $ $ output E’→ε
match $
Error Recovery in Predictive Parsing
• An error is detected during predictive parsing
– When the terminal on top of the stack does not match the next input symbol.
Or
– When nonterminal A is on top of the stack, a is the next input symbol, and
M[A, a] is error.
• Error recovery methods:
– Panic mode
- Skip symbols on the input until a token in a selected set of synchronizing tokens
appears.
- The effectiveness depends on the choice of synchronizing set.
– Phrase-level recovery
- Fill in the blank entries in the predictive parsing table with pointers to error routines.
- Error routines may change, insert, or delete symbols on the input and issue
appropriate error messages.
- An infinite loop must be prevented by checking that any recovery action eventually consumes some input symbol.
Panic-Mode Error Recovery
• Some heuristics for selecting the synchronizing set:
– All symbols in FOLLOW(A) as the synchronizing set for nonterminal A:
- Skip tokens until an element of FOLLOW(A) is seen, then pop A.
– The symbols that begin higher-level constructs as the synchronizing set of a lower-level construct:
- E.g., add keywords that begin statements to the synchronizing sets for the nonterminals generating expressions.
– The symbols in FIRST(A) as the synchronizing set for nonterminal A.
– If a nonterminal can generate the empty string, the production deriving ε can be used as a default.
- This may postpone some error detection, but cannot cause an error to be missed.
– If a terminal on top of the stack cannot be matched, pop the terminal, issue a message saying that the terminal was inserted, and continue parsing.
- This approach takes the synchronizing set of a token to consist of all other tokens.
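
To make the heuristics concrete, here is a rough sketch (mine, not the slides' code) of the predictive parsing loop extended with FOLLOW-based panic-mode recovery; the synch markers and the table fragment correspond to grammar (4.2) as shown on the next slide.

# A minimal sketch: predictive parsing with panic-mode recovery for grammar (4.2).
EPS, SYNCH = "ε", "synch"
NONTERMINALS = {"E", "E'", "T", "T'", "F"}
M = {   # normal entries, plus synch entries taken from the FOLLOW sets
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E", ")"): SYNCH, ("E", "$"): SYNCH,
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T", "+"): SYNCH, ("T", ")"): SYNCH, ("T", "$"): SYNCH,
    ("T'", "*"): ["*", "F", "T'"], ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
    ("F", "+"): SYNCH, ("F", "*"): SYNCH, ("F", ")"): SYNCH, ("F", "$"): SYNCH,
}

def parse_with_recovery(tokens, start="E"):
    tokens = tokens + ["$"]
    stack, i = ["$", start], 0
    while stack[-1] != "$":
        X, a = stack[-1], tokens[i]
        if X not in NONTERMINALS:
            if X == a:
                stack.pop(); i += 1                        # terminal matches the input
            else:
                print(f"error: missing {X!r} inserted"); stack.pop()
        else:
            entry = M.get((X, a))
            if entry is None:
                print(f"error: skipping input symbol {a!r}"); i += 1   # blank entry: skip input
            elif entry == SYNCH:
                print(f"error: popping {X!r} (synch)"); stack.pop()    # synch: pop nonterminal
            else:
                stack.pop(); stack.extend(reversed(entry))             # normal expansion

parse_with_recovery(["+", "id", "*", "+", "id"])   # the erroneous input +id*+id traced below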
Panic-Mode Error Recovery (Cont.)

E → TE’
E’ → + TE’ | ε
T → FT’          (4.2)
T’ → * FT’ | ε
F → (E) | id

• Obtain synchronizing tokens from the FOLLOW set of each nonterminal:
– If M[A, a] is blank, skip the input symbol a.
– If the entry is synch, pop the nonterminal on top of the stack.
– If a token on top of the stack does not match the input symbol, pop the token from the stack.

• FOLLOW(E) = { ), $ }
• FOLLOW(E’) = { ), $ }
• FOLLOW(T) = { +, ), $ }
• FOLLOW(T’) = { +, ), $ }
• FOLLOW(F) = { +, *, ), $ }

NONTERMINAL | id      | +         | *         | (       | )      | $
E           | E→TE’   |           |           | E→TE’   | synch  | synch
E’          |         | E’→+TE’   |           |         | E’→ε   | E’→ε
T           | T→FT’   | synch     |           | T→FT’   | synch  | synch
T’          |         | T’→ε      | T’→*FT’   |         | T’→ε   | T’→ε
F           | F→id    | synch     | synch     | F→(E)   | synch  | synch
Panic-Mode Error Recovery (Cont.)

E → TE’
E’ → + TE’ | ε
T → FT’          (4.2)
T’ → * FT’ | ε
F → (E) | id

• Input: +id*+id

MATCHED   STACK   INPUT   REMARK
E$ +id*+id$ error, skip +
E$ id*+id$ id is in FIRST(E)
TE’$ id*+id$
FT’E’$ id*+id$
idT’E’$ id*+id$
id T’E’$ *+id$ match id
id *FT’E’$ *+id$
id* FT’E’$ +id$ match *
id* FT’E’$ +id$ Error, M[F, +]=synch
id* T’E’$ +id$ F has been popped
id* E’$ +id$
id* +TE’$ +id$
id*+ TE’$ id$ match +
id*+ FT’E’$ id$
id*+ idT’E’$ id$
id*+id T’E’$ $ match id
id*+id E’$ $
id*+id $ $
Bottom-Up Parse
• A bottom-up parse constructs a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top).
– It describes parsing as the process of building parse trees.

[Figure: a bottom-up parse for id*id, shown as the sequence of partial parse trees id*id, F*id, T*id, T*F, T, E.]

The derivation corresponding to the parse (a rightmost derivation):
E ⇒ T ⇒ T*F ⇒ T*id ⇒ F*id ⇒ id*id

E → E + T | T
T → T * F | F     (4.1)
F → (E) | id
Reductions
• Bottom-up parsing is the process of “reducing” a string w to
the start symbol of the grammar.
– The goal is to construct a derivation in reverse.
– At each reduction step, a specific substring matching the body of a
production is replaced by the nonterminal at the head of the production.
– Key decisions: When to reduce and what production to apply
E → E + T | T
T → T * F | F     (4.1)
F → (E) | id

Reduction sequence: id*id, F*id, T*id, T*F, T, E
(A reduction is the reverse of a step in a derivation.)
Handle Pruning
• Bottom-up parsing during a left-to-right scan of the input
constructs a right-most derivation in reverse.
– Handle: a handle is a substring that matches the body of a production, and
– Reduction: the reduction of a handle represents one step along the reverse of a rightmost derivation.

E → E + T | T
T → T * F | F     (4.1)
F → (E) | id

Right-sentential form | Handle  | Reducing production
id1 * id2             | id1     | F → id
F * id2               | F       | T → F
T * id2               | id2     | F → id
T * F                 | T * F   | T → T * F
T                     | T       | E → T
Handle Pruning (Cont.)

Notational conventions: a, b, c are terminals; w, x, y, z are strings of terminals; A, B, C are nonterminals; X, Y, Z are grammar symbols (terminals or nonterminals); α, β, γ are strings of grammar symbols.

• If S ⇒rm* αAw ⇒rm αβw, then the production A → β in the position following α is a handle of αβw.
• Given a right-sentential form γ, a handle of γ is a production A → β together with a position of γ where β may be found; replacing β at that position by A produces the previous right-sentential form in a rightmost derivation of γ.
• Every right-sentential form of the grammar has exactly one handle, except for ambiguous grammars.
– A rightmost derivation in reverse can be obtained by “handle pruning”.

[Figure: parse tree for the rightmost derivation S ⇒rm* αAw ⇒rm αβw; the handle A → β sits at the frontier just to the left of the terminal string w.]
Shift-Reduce Parsing
• Shift-reduce parsing is a form of bottom-up parsing in which
– a stack holds grammar symbols and
– an input buffer holds the rest of the string to be parsed.
• The handle always appears at the top of the stack just before it is
identified as the handle.
Initial configuration:            Final configuration:
STACK    INPUT                    STACK    INPUT
$        w$                       $S       $

($ marks the bottom of the stack and the right end of the input; w is the input string and S is the start symbol.)
Shift-Reduce Parsing (Cont.)

• Operations of shift-reduce parsing:
– Shift: shift the next input symbol onto the top of the stack.
– Reduce: locate the left end of the string (the handle) within the stack and decide with what nonterminal to replace it.
– Accept: announce successful completion of parsing.
– Error: discover a syntax error and call an error-recovery routine.

E → E + T | T
T → T * F | F     (4.1)
F → (E) | id

E.g., parse id1 * id2:

STACK       INPUT         ACTION
$           id1 * id2 $   shift
$ id1       * id2 $       reduce by F → id
$ F         * id2 $       reduce by T → F
$ T         * id2 $       shift
$ T *       id2 $         shift
$ T * id2   $             reduce by F → id
$ T * F     $             reduce by T → T * F
$ T         $             reduce by E → T
$ E         $             accept
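
A crude but runnable sketch of these actions (my own code, not the slides'): it parses with grammar (4.1) and decides when to reduce using an SLR-like heuristic: reduce by A → β when β is on top of the stack and the lookahead is in FOLLOW(A), preferring the longest matching body. This heuristic happens to work for grammar (4.1); real shift-reduce parsers drive these decisions from LR parsing tables instead.

# A minimal sketch: a shift-reduce parser for grammar (4.1) with a FOLLOW-based
# reduction heuristic (longest matching body first).
PRODUCTIONS = [                       # listed with the longest bodies first
    ("E", ["E", "+", "T"]),
    ("T", ["T", "*", "F"]),
    ("F", ["(", "E", ")"]),
    ("E", ["T"]),
    ("T", ["F"]),
    ("F", ["id"]),
]
FOLLOW = {"E": {"+", ")", "$"}, "T": {"+", "*", ")", "$"}, "F": {"+", "*", ")", "$"}}

def try_reduce(stack, lookahead):
    for head, body in PRODUCTIONS:
        if stack[-len(body):] == body and lookahead in FOLLOW[head]:
            del stack[-len(body):]                   # replace the handle by the head
            stack.append(head)
            print("reduce by", head, "→", " ".join(body))
            return True
    return False

def parse(tokens):
    tokens = tokens + ["$"]
    stack, i = ["$"], 0
    while True:
        while try_reduce(stack, tokens[i]):          # reduce while a handle is on top
            pass
        if stack == ["$", "E"] and tokens[i] == "$":
            print("accept"); return
        if tokens[i] == "$":
            raise SyntaxError("syntax error")
        print("shift", tokens[i])
        stack.append(tokens[i]); i += 1              # shift the next input symbol

parse(["id", "*", "id"])   # reproduces the sequence of actions in the trace above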
Shift-Reduce Parsing (Cont.)
The handle will always eventually appear on top of the stack. Two cases of a rightmost derivation illustrate this:

Case 1:  S ⇒rm* αAz ⇒rm αβByz ⇒rm αβγyz
[Figure: in the parse tree, B's subtree (with yield γ) lies immediately to the left of y, both beneath A.]

STACK     INPUT    ACTION
$ αβγ     yz$      reduce by B → γ
$ αβB     yz$      shift y
$ αβBy    z$       reduce by A → βBy
$ αA      z$       shift z

Case 2:  S ⇒rm* αBxAz ⇒rm αBxyz ⇒rm αγxyz
[Figure: in the parse tree, B's subtree (with yield γ) lies to the left of x, and A's subtree (with yield y) lies to its right.]

STACK     INPUT    ACTION
$ αγ      xyz$     reduce by B → γ
$ αB      xyz$     shift x, then shift y
$ αBxy    z$       reduce by A → y
$ αBxA    z$       shift z
Conflicts During Shift-Reduce Parsing
• For some context-free grammars, shift-reduce parsing encounters conflicts in deciding the next action.
– Shift/reduce conflict
- The parser cannot decide whether to shift or to reduce.
- E.g., the dangling-else grammar causes a shift/reduce conflict:
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

STACK                        INPUT
… if expr then stmt          else … $

With else as the next input symbol, the parser cannot determine whether to shift else or to reduce by stmt → if expr then stmt.
– Reduce/reduce conflict
- The parser cannot decide which of several possible productions should be used for the reduction.
Conflicts During Shift-Reduce Parsing (Cont.)
• E.g., a grammar that covers both function calls and array references, for the input p(i,j):
– A function is called with parameters surrounded by parentheses.
– The indices of an array are also surrounded by parentheses.

(1) stmt → id ( parameter_list )
(2) stmt → expr := expr
(3) parameter_list → parameter_list , parameter
(4) parameter_list → parameter
(5) parameter → id
(6) expr → id ( expr_list )
(7) expr → id
(8) expr_list → expr_list , expr
(9) expr_list → expr

• The input p(i,j) is converted into the token string id ( id , id ). With the configuration

STACK             INPUT
… id ( id         , id ) … $

the parser faces a reduce/reduce conflict: the correct choice is production (5) if p is a procedure call, but production (7) if p is an array.
• One solution is to change production (1) into stmt → procid ( parameter_list ), using a distinct token name procid for procedure names. With

STACK               INPUT
… procid ( id       , id ) … $

the parser knows that a procedure call is encountered and reduces id by production (5).
Thanks