CSE 401 – Compilers
LR Parsing
Hal Perkins
Autumn 2010
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-1
Agenda
LR Parsing
Table-driven Parsers
Parser States
Shift-Reduce and Reduce-Reduce
conflicts
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-2
LR(1) Parsing
We’ll look at LR(1) parsers
Left to right scan, Rightmost derivation, 1
symbol lookahead
Almost all practical programming
languages have an LR(1) grammar
LALR(1), SLR(1), etc. – subsets of LR(1)
LALR(1) can parse most real languages, tables
are more compact, and is used by YACC/Bison/
CUP/etc.
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-3
Bottom-Up Parsing
Idea: Read the input left to right
Whenever we’ve matched the right
hand side of a production, reduce it to
the appropriate non-terminal and add
that non-terminal to the parse tree
The upper edge of this partial parse
tree is known as the frontier
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-4
Example
Grammar Bottom-up Parse
S ::= aAB e
A ::= Abc | b
B ::= d
a b b c d e
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-5
Details
The bottom-up parser reconstructs a reverse
rightmost derivation
Given the rightmost derivation
S =>β1=>β2=>…=>βn-2=>βn-1=>βn = w
the parser will first discover βn-1=>βn , then
βn-2=>βn-1 , etc.
Parsing terminates when
β1 reduced to S (start symbol, success), or
No match can be found (syntax error)
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-6
How Do We Parse with This?
Key: given what we’ve already seen and the
next input symbol, decide what to do.
Choices:
Perform a reduction
Look ahead further
Can reduce A=>β if both of these hold:
A=>β is a valid production
A=>β is a step in this rightmost derivation
This is known as a shift-reduce parser
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-7
Sentential Forms
If S =>* α, the string α is called a sentential
form of the of the grammar
In the derivation
S =>β1=>β2=>…=>βn-2=>βn-1=>βn = w
each of the βi are sentential forms
A sentential form in a rightmost derivation is
called a right-sentential form (similarly for
leftmost and left-sentential)
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-8
Handles
Informally, a substring of the tree
frontier that matches the right side of a
production
Even if A::=β is a production, β is a handle
only if it matches the frontier at a point
where A::=β was used in that derivation
β may appear in many other places in the
frontier without being a handle for that
particular production
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-9
Handles (cont.)
Formally, a handle of a right-sentential
form γ is a production A ::= β and a
position in γ where β may be replaced
by A to produce the previous right-
sentential form in the rightmost
derivation of γ
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-10
Handle Examples
In the derivation
S => aABe => aAde => aAbcde => abbcde
abbcde is a right sentential form whose
handle is A::=b at position 2
aAbcde is a right sentential form whose
handle is A::=Abc at position 4
Note: some books take the left of the match as
the position
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-11
Implementing Shift-Reduce
Parsers
Key Data structures
A stack holding the frontier of the tree
A string with the remaining input
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-12
Shift-Reduce Parser
Operations
Reduce – if the top of the stack is the
right side of a handle A::=β, pop the
right side β and push the left side A
Shift – push the next input symbol onto
the stack
Accept – announce success
Error – syntax error discovered
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-13
S ::= aABe
A ::= Abc | b
Shift-Reduce Example B ::= d
Stack Input Action
$ abbcde$ shift
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-14
How Do We Automate This?
Def. Viable prefix – a prefix of a right-
sentential form that can appear on the stack
of the shift-reduce parser
Equivalent: a prefix of a right-sentential form that
does not continue past the rightmost handle of
that sentential form
Idea: Construct a DFA to recognize viable
prefixes given the stack and remaining input
Perform reductions when we recognize them
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-15
S ::= aABe
A ::= Abc | b
DFA for prefixes of B ::= d
e
accept
8 9 S ::= aABe
$ B
start a A b c
1 2 3 6 7 A ::= Abc
b d
4 5
A ::= b B ::= d
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-16
S ::= aABe
A ::= Abc | b
Trace B ::= d
accept e
8 9 S ::= aABe
Stack Input B
$ abbcde$ start
$
a A b c
1 2 3 6 7 A ::= Abc
b d
4 5
A ::= b B ::= d
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-17
Observations
Way too much backtracking
We want the parser to run in time
proportional to the length of the input
Where the heck did this DFA come from
anyway?
From the underlying grammar
We’ll defer construction details for now
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-18
Avoiding DFA Rescanning
Observation: after a reduction, the contents
of the stack are the same as before except
for the new non-terminal on top
∴ Scanning the stack will take us through the
same transitions as before until the last one
∴ If we record state numbers on the stack, we
can go directly to the appropriate state when we
pop the right hand side of a production from the
stack
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-19
Stack
Change the stack to contain pairs of
states and symbols from the grammar
$s0 X1 s1 X2 s2 … Xn sn
State s0 represents the accept state
(Not always added – depends on particular presentation)
Observation: in an actual parser, only the state numbers need
to be pushed, since they implicitly contain the symbol
information, but for explanations it’s clearer to use both.
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-20
Encoding the DFA in a Table
A shift-reduce parser’s DFA can be
encoded in two tables
One row for each state
action table encodes what to do given the
current state and the next input symbol
goto table encodes the transitions to take
after a reduction
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-21
Actions (1)
Given the current state and input
symbol, the main possible actions are
si – shift the input symbol and state i onto
the stack (i.e., shift and move to state i )
rj – reduce using grammar production j
The production number tells us how many
<symbol, state> pairs to pop off the stack
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-22
Actions (2)
Other possible action table entries
accept
blank – no transition – syntax error
A LR parser will detect an error as soon as
possible on a left-to-right scan
A real compiler needs to produce an error
message, recover, and continue parsing when
this happens
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-23
Goto
When a reduction is performed,
<symbol, state> pairs are popped from
the stack revealing a state uncovered_s
on the top of the stack
goto[uncovered_s , A] is the new state
to push on the stack when reducing
production A ::= β (after popping β and
revealing state uncovered_s on top)
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-24
S ::= aABe
A ::= Abc | b
Reminder: DFA for B ::= d
accept e
8 9 S ::= aABe
$ B
start a A b c
1 2 3 6 7 A ::= Abc
b d
4 5
A ::= b B ::= d
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-25
1. S ::= aABe
2. A ::= Abc
3. A ::= b
LR Parse Table for 4. B ::= d
action goto
State
a b c d e $ A B S
1 s2 acc g1
2 s4 g3
3 s6 s5 g8
4 r3 r3 r3 r3 r3 r3
5 r4 r4 r4 r4 r4 r4
6 s7
7 r2 r2 r2 r2 r2 r2
8 s9
9 r1 r1 r1 r1 r1 r1
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-26
LR Parsing Algorithm (1)
word = scanner.getToken(); } else if (action[s, word] = accept ) {
while (true) { return;
s = top of stack; } else {
if (action[s, word] = si ) { // no entry in action table
push word; push i (state); report syntax error;
word = scanner.getToken(); halt or attempt recovery;
} else if (action[s, word] = rj ) { }
pop 2 * length of right side of
production j (2*|β|);
uncovered_s = top of stack;
push left side A of production j ;
push state goto[uncovered_s, A];
}
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-27
1. S ::= aABe
2. A ::= Abc
3. A ::= b
Example 4. B ::= d
action goto
Stack Input S
a b c d e $ A B S
$ abbcde$
1 s2 ac g1
2 s4 g3
3 s6 s5 g8
4 r3 r3 r3 r3 r3 r3
5 r4 r4 r4 r4 r4 r4
6 s7
7 r2 r2 r2 r2 r2 r2
8 s9
9 r1 r1 r1 r1 r1 r1
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-28
LR States
Idea is that each state encodes
The set of all possible productions that we
could be looking at, given the current state
of the parse, and
Where we are in the right hand side of
each of those productions
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-29
Items
An item is a production with a dot in
the right hand side
Example: Items for production A ::= XY
A ::= .XY
A ::= X.Y
A ::= XY.
Idea: The dot represents a position in
the production
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-30
S ::= aABe
A ::= Abc | b
DFA for B ::= d
1 8 9
$ e S ::= aABe.
S ::= .aABe accept S ::= aAB.e
B
a
2 3
S ::= a.ABe S ::= aA.Be 6
A b
A ::= .Abc A ::= A.bc A ::= Ab.c
A ::= .b B ::= .d c
7
b d
4 5 A ::= Abc.
A ::= b. B ::= d.
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-31
Problems with Grammars
Grammars can cause problems when
constructing a LR parser
Shift-reduce conflicts
Reduce-reduce conflicts
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-32
Shift-Reduce Conflicts
Situation: both a shift and a reduce are
possible at a given point in the parse
(equivalently: in a particular state of the
DFA)
Classic example: if-else statement
S ::= ifthen S | ifthen S else S
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-33
1. S ::= ifthen S
Parser States for 2. S ::= ifthen S else S
S ::= . ifthen S
1
S ::= . ifthen S else S State 3 has a shift-
ifthen reduce conflict
2 S ::= ifthen . S Can shift past else
S ::= ifthen . S else S into state 4 (s4)
S
Can reduce (r1)
3 S ::= ifthen S . S ::= ifthen S
S ::= ifthen S . else S
else
4 S ::= ifthen S else . S (Note: other S ::= . ifthen
items not included in states
2-4 to save space)
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-34
Solving Shift-Reduce Conflicts
Fix the grammar
Done in Java reference grammar, others
Use a parse tool with a “longest match”
rule – i.e., if there is a conflict, choose
to shift instead of reduce
Does exactly what we want for if-else case
Guideline: a few shift-reduce conflicts are
fine, but be sure they do what you want
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-35
Reduce-Reduce Conflicts
Situation: two different reductions are
possible in a given state
Contrived example
S ::= A
S ::= B
A ::= x
B ::= x
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-36
1. S ::= A
2. S ::= B
Parser States for 3.
4.
A ::= x
B ::= x
S ::= .A
1
S ::= .B State 2 has a
A ::= .x reduce-reduce
B ::= .x
x
conflict (r3, r4)
2
A ::= x.
B ::= x.
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-37
Handling Reduce-Reduce
Conflicts
These normally indicate a serious
problem with the grammar.
Fixes
Use a different kind of parser generator
that takes lookahead information into
account when constructing the states
Most practical tools use this information
Fix the grammar
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-38
Another Reduce-Reduce
Conflict
Suppose the grammar separates
arithmetic and boolean expressions
expr ::= aexp | bexp
aexp ::= aexp * aident | aident
bexp ::= bexp && bident | bident
aident ::= id
bident ::= id
This will create a reduce-reduce conflict
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-39
Covering Grammars
A solution is to merge aident and bident into
a single non-terminal (or use id in place of
aident and bident everywhere they appear)
This is a covering grammar
Includes some programs that are not generated
by the original grammar
Use the type checker or other static semantic
analysis to weed out illegal programs later
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-40
Coming Attractions
Constructing LR tables
We’ll present a simple version (SLR(0)) in
lecture, then talk about extending it to
LR(1)
LL parsers and recursive descent
Continue reading ch. 3
10/10/2010 © 2002-10 Hal Perkins & UW CSE D-41