Parsing — Part II
(Ambiguity, Top-down parsing, Left-recursion Removal)
Ambiguous Grammars
Definitions
• If a grammar has more than one leftmost derivation for a single sentential
form, the grammar is ambiguous
• If a grammar has more than one rightmost derivation for a single sentential
form, the grammar is ambiguous
• The leftmost and rightmost derivations for a sentential form may differ, even
in an unambiguous grammar
Classic example — the if-then-else problem
Stmt if Expr then Stmt
| if Expr then Stmt else Stmt
| … other stmts …
This ambiguity is entirely grammatical in nature
from Cooper & Torczon 2
Ambiguity
This sentential form has two derivations (If a derivation has more than 1 parse
tree, the grammar is ambiguous
if Expr1 then if Expr2 then Stmt1 else Stmt2
if if
E1 then else E1 then
if S2 if
E2 then E2 then else
S1 S1 S2
production 2, then production 1, then
production 1 production 2
from Cooper & Torczon 3
Ambiguity
Removing the ambiguity
• Must rewrite the grammar to avoid generating the problem
• Match each else to innermost unmatched if (common sense rule)
With this grammar, the example has only one derivation
from Cooper & Torczon 4
Ambiguity
if Expr1 then if Expr2 then Stmt1 else Stmt2
R ule S e nte n tial Form
— St m t
2 NoE ls e
5 if Exp r th e n St m t
? if E 1 th e n St m t
1 if E 1 th e n W ith Els e
3 if E 1 th e n if Exp r th e n W ith Els e e ls e W ith Els e
? if E 1 th e n if E 2 th e n W ith Els e e ls e W ith Els e
4 if E 1 th e n if E 2 th e n S 1 e ls e W ith Els e
4 if E 1 th e n if E 2 th e n S 1 e ls e S 2
This binds the else controlling S2 to the inner if
from Cooper & Torczon 5
Deeper Ambiguity
Ambiguity usually refers to confusion in the CFG
Overloading can create deeper ambiguity
a = f(17)
In some languages, f could be either a function or a subscripted variable
Disambiguating this one requires context
• Need values of declarations
• Really an issue of type, not context-free syntax
• Requires an extra-grammatical solution (not in CFG)
• Must to handle these with a different mechanism
Step outside grammar rather than use a more complex grammar
from Cooper & Torczon 6
Ambiguity - the Final Word
Ambiguity arises from two distinct sources
• Confusion in the context-free syntax (if-then-else)
• Confusion that requires context to resolve (overloading)
Resolving ambiguity
• To remove context-free ambiguity, rewrite the grammar
• To handle context-sensitive ambiguity takes cooperation
Knowledge of declarations, types, …
Accept a superset of L(G) & check it with other means†
This is a language design problem
Sometimes, the compiler writer accepts an ambiguous grammar
Parsing techniques that “do the right thing”
† See Chapter 4
from Cooper & Torczon 7
Parsing Techniques
Top-down parsers (LL(1), recursive descent)
• Start at the root of the parse tree and grow toward leaves
• Pick a production & try to match the input
• Bad “pick” may need to backtrack
• Some grammars are backtrack-free (predictive parsing)
Bottom-up parsers (LR(1), operator precedence)
• Start at the leaves and grow toward root
• As input is consumed, encode possibilities in an internal state
• Start in a state valid for legal first tokens
• Bottom-up parsers handle a large class of grammars
from Cooper & Torczon 8
Top-down Parsing
A top-down parser starts with the root of the parse tree
The root node is labeled with the goal symbol of the grammar
Top-down parsing algorithm:
Construct the root node of the parse tree
Repeat until the fringe of the parse tree matches the input string
At a node labeled A, select a production with A on its lhs and, for each
symbol on its rhs, construct the appropriate child
When a terminal symbol is added to the fringe and it doesn’t match the fringe,
backtrack
Find the next node to be expanded (label NT)
The key is picking the right production in step 1
That choice should be guided by the input string
from Cooper & Torczon 9
Remember the expression grammar?
Version with precedence derived last lecture
And the input x - 2 * y
from Cooper & Torczon 10
Example
Let’s try x - 2 * y :
Goal
Expr
Expr + Term
Term
Fact.
<id,x>
from Cooper & Torczon 11
Example
Let’s try x - 2 * y :
Goal
Expr
Expr + Term
Term
Fact.
<id,x>
This worked well, except that “-” doesn’t match “+”
The parser must backtrack to here
from Cooper & Torczon 12
Example
Continuing with x - 2 * y :
Goal
Expr
Expr - Term
Term
Fact.
<id,x>
from Cooper & Torczon 13
Example
Continuing with x - 2 * y :
Goal
Expr
Expr - Term
Term
Fact.
<id,x>
This time, “-” and We can advance past “-”
“-” matched to look at “2”
Now, we need to expand Term - the last NT on the fringe
from Cooper & Torczon 14
Example
Trying to match the “2” in x - 2 * y :
Goal
Expr
Expr - Term
Term Fact.
Fact. <num,2>
<id,x>
from Cooper & Torczon 15
Example
Trying to match the “2” in x - 2 * y :
Goal
Expr
Expr - Term
Term Fact.
Fact. <num,2>
Where are we?
<id,x>
• “2” matches “2”
• We have more input, but no NTs left to expand
• The expansion terminated too soon
Need to backtrack
from Cooper & Torczon 16
Example
Trying again with “2” in x - 2 * y :
Goal
Expr
Expr - Term
Term Term * Fact.
Fact. Fact. <id,y>
<id,x> <num,2>
This time, we matched & consumed all the input
Success!
from Cooper & Torczon 17
Another possible parse
Other choices for expansion are possible
R u le S en te n tial For m In pu t
— Goa l x - 2 * y
1 Expr x - 2 * y consuming no input !
2 Expr + Ter m x - 2 * y
2 Expr + Ter m +Ter m x - 2 * y
2 Expr + Ter m + Ter m +Ter m x - 2 * y
2 Expr +Ter m + Ter m + …+Ter m x - 2 * y
This doesn’t terminate (obviously)
• Wrong choice of expansion leads to non-termination
• Non-termination is a bad property for a parser to have
• Parser must make the right choice
from Cooper & Torczon 18
Left Recursion
Top-down parsers cannot handle left-recursive grammars
Formally,
A grammar is left recursive if A NT such that
a derivation A + A, for some string (NT T )+
Our expression grammar is left recursive
• This can lead to non-termination in a top-down parser
• For a top-down parser, any recursion must be right recursion
• We would like to convert the left recursion to right recursion
Non-termination is a bad property in any part of a compiler
from Cooper & Torczon 19
Eliminating Left Recursion
To remove left recursion, we can transform the grammar
Consider a grammar fragment of the form
Fee Fee
|
where neither nor start with Fee
We can rewrite this as
Fee Fie
Fie Fie
|
where Fie is a new non-terminal
This accepts the same language, but uses only right recursion
from Cooper & Torczon 20
Eliminating Left Recursion
The expression grammar contains two cases of left recursion
Applying the transformation yields
These fragments use only right recursion
They retains the original left associativity
from Cooper & Torczon 21
Eliminating Left Recursion
Substituting back into the grammar yields
• This grammar is correct,
if somewhat non-intuitive.
• It is left associative, as was
the original
• A top-down parser will
terminate using it.
• A top-down parser may
need to backtrack with it.
from Cooper & Torczon 22
Eliminating Left Recursion
The transformation eliminates immediate left recursion
What about more general, indirect left recursion
The general algorithm:
arrange the NTs into some order A1, A2, …, An
for i 1 to n
replace each production Ai As with
Ai 1 2 k , where As 1 2k
are all the current productions for As
eliminate any immediate left recursion on Ai
using the direct transformation
This assumes that the initial grammar has no cycles (Ai + Ai),
and no epsilon productions
from Cooper & Torczon 23
Eliminating Left Recursion
How does this algorithm work?
1. Impose arbitrary order on the non-terminals
2. Outer loop cycles through NT in order
3. Inner loop ensures that a production expanding Ai has no non-terminal As in its
rhs, for s < I
4. Last step in outer loop converts any direct recursion on Ai to right recursion
using the transformation showed earlier
5. New non-terminals are added at the end of the order & have no left recursion
At the start of the ith outer loop iteration
For all k < I, no production that expands Ak contains a non-terminal
As in its rhs, for s < k
from Cooper & Torczon 24