Chapter 3

computer science compiler design course chapter 3

ባሕር ዳር ቴክኖሎጂ ኢንስቲትዩት

Bahir Dar Institute of Technology


ባሕር ዳር ዩኒቨርሲቲ
Bahir Dar University

Compiler Design
Course Code: CoSc4022
Target Group: 3rd Year CSD Students
Instructor: Haileyesus D.
Chapter Three
Syntax Analyzer / Parser
Chapter Outline
▪ Introduction
▪ Context-free grammar
▪ Derivation
▪ Parse Tree
▪ Ambiguity
▪ Resolving Ambiguity
▪ Immediate & Indirect Left Recursion
▪ Eliminating Immediate & Indirect Left Recursion
▪ Left Factoring
▪ Non-Context Free Language Constructs

3
Introduction
▪ A syntax analyzer or parser takes the input from a lexical
analyzer in the form of token streams.
▪ The parser analyzes the source code (token stream) against
the production rules to detect any errors in the code. The
output of this phase is a parse tree.
Parse
Tree

4
Introduction

5
Introduction
▪ Abstract representations of the input program could be:
• Abstract-syntax tree/parse tree + symbol table
• Intermediate code
• Object code
▪ Syntax analysis is done by the parser, which:
• Produces a parse tree, from which intermediate code can be
generated, by checking whether the program is written
following the grammar rules
• Reports syntax errors, attempts error correction and recovery
• Collects information into symbol tables
6
Context Free Grammar (CFG)
▪ CFG is used to specify the structure of legal programs.
▪ The design of the grammar is an initial phase of the design
of a programming language.
▪ Formally a CFG G = (Vt,Vn,S,P), where:
▪ Vt is the set of terminal symbols in the grammar
(i.e., the set of tokens returned by the scanner)
▪ Vn, the non-terminals, are variables that denote sets of (sub)strings
occurring in the language. These impose a structure on the
grammar.

7
Context Free Grammar (CFG)
▪ S is the start/goal symbol, a distinguished non-terminal in Vn
denoting the entire set of strings in L(G).
▪ P is a finite set of productions specifying how terminals and non-
terminals can be combined to form strings in the language.
Each production must have a single non-terminal on its left hand
side.
▪ The set V = Vt ∪ Vn is called the vocabulary of G

8
Context Free Grammar (CFG)
▪ Example (G1):
▪ E → E+E | E-E | E*E | E/E | -E
▪ E → (E)
▪ E → id
▪ Where
▪ Vt = {+, -, *, /, (, ), id}, Vn = {E}
▪ S = E
▪ Productions are shown above
▪ Sometimes → can be replaced by ::=
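The formal quadruple above can be written down directly in code. A minimal sketch (the dict encoding is an assumption of this sketch, not notation from the slides) that recovers Vt, Vn and the vocabulary V from the productions of G1:

```python
# Grammar G1 encoded as a dict: each non-terminal maps to a list of
# right-hand sides, each a list of symbols (an assumed representation).
G1 = {
    "E": [["E", "+", "E"], ["E", "-", "E"], ["E", "*", "E"],
          ["E", "/", "E"], ["-", "E"], ["(", "E", ")"], ["id"]],
}

Vn = set(G1)                                    # non-terminals: {'E'}
Vt = {s for rhss in G1.values() for rhs in rhss
      for s in rhs if s not in Vn}              # terminals
V = Vt | Vn                                     # vocabulary V = Vt U Vn

print(sorted(Vt))   # ['(', ')', '*', '+', '-', '/', 'id']
```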

9
Context Free Grammar (CFG)
▪ CFG is more expressive than RE - Every language that can
be described by regular expressions can also be described by
a CFG
▪ L = {aⁿbⁿ | n ≥ 1} is an example of a language that can be expressed by a
CFG but not by an RE
▪ Context-free grammar is sufficient to describe most
programming languages.

10
Derivation
▪ A sequence of replacements of non-terminal symbols to
obtain strings/sentences is called a derivation
▪ If we have a grammar E → E+E then we can replace E by
E+E
▪ In general a derivation step is αAβ ⇒ αγβ if there is a
production rule A → γ in the grammar
▪ where α and β are arbitrary strings of terminal and non-
terminal symbols
▪ Derivation of a string should start from a production with
start symbol in the left.

11
Derivation

▪ If S ⇒* α, then α is a sentential form (terminals & non-terminals mixed)
▪ α is a sentence if it contains only terminal symbols

12
Derivation
▪ Derive the string -(id+id) from G1
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id) (LMD)
OR
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id) (RMD)
▪ At each derivation step, we can choose any of the non-
terminal in the sentential form of G for the replacement.
▪ If we always choose the left-most non-terminal in each
derivation step, this derivation is called as left-most
derivation(LMD).
▪ If we always choose the right-most non-terminal in each
derivation step, this derivation is called as right-most
derivation(RMD).
13
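The leftmost-derivation discipline above can be made mechanical: at each step, replace only the leftmost non-terminal. A sketch that replays the LMD of -(id+id) from G1 (the list-of-symbols representation and helper name are assumptions of this sketch):

```python
# Replaying the leftmost derivation of -(id+id); a sentential form is
# a list of symbols (an assumed representation).
G1_NONTERMINALS = {"E"}

def apply_leftmost(form, rhs):
    """Replace the leftmost non-terminal in `form` by `rhs`."""
    i = next(i for i, s in enumerate(form) if s in G1_NONTERMINALS)
    return form[:i] + rhs + form[i + 1:]

form = ["E"]
for rhs in [["-", "E"], ["(", "E", ")"], ["E", "+", "E"], ["id"], ["id"]]:
    form = apply_leftmost(form, rhs)
    print("=>", " ".join(form))
# => - E
# => - ( E )
# => - ( E + E )
# => - ( id + E )
# => - ( id + id )
```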
Derivation
• A parse tree can be seen as a graphical representation of a derivation
• Inner nodes of a parse tree are non-terminal symbols.
• The leaves of a parse tree are terminal symbols.

E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)

[Figure: the parse tree grown step by step alongside each derivation step,
from the root E down to the leaves -, (, id, +, id, )]
14
Derivation
• Example: Let any set of production rules in a CFG be
X → X+X | X*X |X| a
over an alphabet {a}.
• The leftmost derivation for the string "a+a*a" may be
X → X+X
→ a+X
→ a + X*X
→ a+a*X
→ a+a*a
• The stepwise derivation of the above string is shown as below −

15
Derivation
• Example: Let any set of production rules in a CFG be
X → X+X | X*X |X| a
over an alphabet {a}.
• The rightmost derivation for the above string "a+a*a" may be:
• X → X*X
→ X*a
→ X+X*a
→ X+a*a
→ a+a*a
• The stepwise derivation of the above string is shown as:

16
Derivation
• Exercise 1: Derive the string "aabbabba" for leftmost derivation and rightmost derivation
using a CFG given by,
S → aB | bA
A → a | aS | bAA
B → b | aS | aBB

▪ Exercise 2: Derive the string "00101" for leftmost derivation and rightmost derivation
using a CFG given by,
S → A1B
A → 0A | ε
B → 0B | 1B | ε

17
Parse Tree
▪ Derivation tree is a graphical representation for the derivation of the given
production rules for a given CFG.
▪ It is the simple way to show how the derivation can be done to obtain some
string from a given set of production rules.
▪ The derivation tree is also called a parse tree.
▪ A parse tree contains the following properties:
▪ The root node is always a node indicating start symbols.
▪ The derivation is read from left to right.
▪ The leaf node is always terminal nodes.
▪ The interior nodes are always the non-terminal nodes.

18
Parse Tree
▪ Example: Draw a derivation tree for the string "a*b+c" from the CFG
given by
E → E+E
E → E*E
E → a | b | c

[Figure: Steps 1 to 5 of building the derivation tree, omitted]

19
Parse Tree
▪ Exercise: Construct a derivation tree for the string aabbabba for the CFG
given by,
S → aB | bA
A → a | aS | bAA
B → b | bS | aBB

20
Parse Tree
▪ An ambiguous grammar is one that produces more than one
LMD or more than one RMD for the same sentence.
E ⇒ E+E              E ⇒ E*E
 ⇒ id+E               ⇒ E+E*E
 ⇒ id+E*E             ⇒ id+E*E
 ⇒ id+id*E            ⇒ id+id*E
 ⇒ id+id*id           ⇒ id+id*id

[Figure: the two distinct parse trees for id+id*id, one grouping it as
id+(id*id), the other as (id+id)*id]
21
Parse Tree
▪ For most parsers, the grammar must be unambiguous.
▪ If a grammar is unambiguous, there is a unique parse tree
for each sentence.
▪ We should eliminate ambiguity in the grammar
during the design phase of the compiler.
▪ An equivalent unambiguous grammar should be written to
eliminate the ambiguity.
▪ To disambiguate an ambiguous grammar, we choose one of
the parse trees of a sentence and rewrite the grammar to
restrict it to this choice.

22
Left Recursion
▪ A grammar is left recursive if it has a non-terminal A such
that there is a derivation
▪ A ⇒+ Aα for some string α
▪ Top-down parsing techniques cannot handle left-recursive
grammars.
▪ So, we have to convert our left-recursive grammar into an
equivalent grammar which is not left-recursive.
▪ Two types of left-recursion
▪ Immediate left-recursion - appear in a single step of the derivation,
▪ Indirect left-recursion - appear in more than one step of the
derivation
23
Left Recursion
A → Aα | β    where β does not start with A
⇓ eliminate immediate left recursion
A → β A’
A’ → α A’ | ε    an equivalent grammar

In general,
A → Aα1 | ... | Aαm | β1 | ... | βn    where β1 ... βn do not start with A
⇓ eliminate immediate left recursion
A → β1 A’ | ... | βn A’
A’ → α1 A’ | ... | αm A’ | ε    an equivalent grammar
24
Eliminating Left Recursion
▪ Remove left recursion from the grammar below
E → E+T | T
T → T*F | F
F → id | (E)
▪ Answer
E → T E’
E’ → +T E’ | ε
T → F T’
T’ → *F T’ | ε
F → id | (E)
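The A → βA’, A’ → αA’ | ε transformation used above can be sketched as a small routine (the list representation, the trailing "'" naming convention, and [] standing for ε are all assumptions of this sketch):

```python
# Eliminate immediate left recursion for one non-terminal.
def eliminate_immediate_left_recursion(nt, prods):
    alphas = [p[1:] for p in prods if p[:1] == [nt]]   # A → Aα alternatives
    betas  = [p for p in prods if p[:1] != [nt]]       # A → β alternatives
    if not alphas:
        return {nt: prods}                             # nothing to do
    new = nt + "'"
    return {
        nt:  [b + [new] for b in betas],               # A  → β A'
        new: [a + [new] for a in alphas] + [[]],       # A' → α A' | ε
    }

print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
# {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], []]}
```

Running it on E → E+T | T reproduces the E → TE’, E’ → +TE’ | ε pair from the answer above.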

25
Indirect Left Recursion
▪ A grammar may not be immediately left-recursive, but it can
still be left-recursive.
▪ By just eliminating the immediate left-recursion, we may not
get a grammar which is free of left-recursion.
S → Aa | b
A → Sc | d This grammar is not immediately left-recursive,
but it is still left-recursive.
S ⇒ Aa ⇒ Sca or
A ⇒ Sc ⇒ Aac causes a left-recursion

So, we have to eliminate all left-recursions from our grammar

26
Indirect Left Recursion
Arrange non-terminals in some order: A1 ... An
▪ we will remove indirect left recursion by constructing an
equivalent grammar G’ such that: if Ai → Aj γ is any
production of G’, then i < j

▪ For each non-terminal Ai in turn, do:

▪ For each production rule of the form Ai → Aj γ with j < i, where
the Aj productions are Aj → β1 | … | βn, do:
▪ Replace the production rule Ai → Aj γ with the rules
Ai → β1 γ | … | βn γ
▪ Eliminate any immediate left recursion among the Ai productions

27
Eliminating Indirect Left Recursion
• Example 1
S → Aa | b
A → Ac | Sd | f
- Order of non-terminals: S = A1, A = A2
A1 → A2 a | b
A2 → A2 c | A1 d | f
The only production with j<i is A2 → A1 d
for A:
- Replace it with A2 → A2 ad | bd
A2 → A2 c | A2 ad | bd | f
- Eliminate the immediate left-recursion in A
A2 → bdA2’ | fA2’
A2’ → cA2’ | adA2’ | ε
So, the resulting equivalent grammar which is not left-recursive is:
S → Aa | b
A → bdA’ | fA’
A’ → cA’ | adA’ | ε

28
Eliminating Indirect Left Recursion
• Example 2
A1 → A2 A3
A2 → A3 A1 | b
A3 → A1 A1 | a
Replace A3 → A1 A1 by A3 → A2 A3 A1
and then replace this by
A3 → A3 A1 A3 A1 and A3 → b A3 A1
Eliminating direct left recursion in the above
gives: A3 → aK | b A3 A1 K
K → A1 A3 A1 K | ε
The resulting grammar is then:
A1 → A2 A3
A2 → A3 A1 | b
A3 → aK | b A3 A1 K
K → A1 A3 A1 K | ε

29
Session Two
▪ Top Down Parsing
▪ Recursive-Descent Parsing
▪ Predictive Parser
▪ Recursive Predictive Parsing
▪ Non-Recursive Predictive Parsing
▪ LL(1) Parser – Parser Actions
▪ Constructing LL(1) - Parsing Tables
▪ Computing FIRST and FOLLOW functions
▪ LL(1) Grammars
▪ Properties of LL(1) Grammars
30
Top Down Parsing
 Top-down parsing involves constructing a parse tree for the input string, starting
from the root
 Basically, top-down parsing can be viewed as finding a leftmost derivation for an
input string.
 How does it work? Start with a tree of one node labeled with the start symbol and repeat the
following steps until the fringe of the parse tree matches the input string
1. At a node labeled A, select a production with A on its LHS and for each symbol on its
RHS, construct the appropriate child
2. When a terminal is added to the fringe that doesn't match the input string, backtrack
3. Find the next node to be expanded
 Minimize the number of backtracks as much as possible
31
Top Down Parsing
 Two types of top-down parsing
 Recursive-Descent Parsing
 Backtracking is needed (If a choice of a production rule does not work, we backtrack
to try other alternatives.)
 It is a general parsing technique, but not widely used because it is not efficient
 Predictive Parsing
 No backtracking and hence efficient
 Needs a special form of grammars (LL(1) grammars).
 Two types
 Recursive Predictive Parsing is a special form of Recursive Descent Parsing without
backtracking.
 Non-Recursive (Table Driven) Predictive Parser is also known as LL(1) parser.
32
Recursive-Descent Parsing
 It tries to find the left-most derivation. Backtracking is needed
 Example
S → aBc
B → bc | b
 input: abc

 A left-recursive grammar can cause a recursive-descent parser, even
one with backtracking, to go into an infinite loop.
 That is, when we try to expand a non-terminal B, we may eventually find
ourselves again trying to expand B without having consumed any input.
33
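The backtracking behaviour for S → aBc, B → bc | b can be sketched with generators: each parse function yields every input position it can reach, so a failed alternative simply falls through to the next (the function names are assumptions of this sketch):

```python
# Recursive-descent parsing with backtracking for S -> aBc, B -> bc | b.
def parse_S(s, i=0):
    # S -> aBc
    if i < len(s) and s[i] == "a":
        for j in parse_B(s, i + 1):
            if j < len(s) and s[j] == "c":
                yield j + 1

def parse_B(s, i):
    # try B -> bc first; if the caller fails afterwards, iteration
    # resumes here and backtracks into B -> b automatically
    if s[i:i + 2] == "bc":
        yield i + 2
    if i < len(s) and s[i] == "b":
        yield i + 1

def accepts(s):
    return any(j == len(s) for j in parse_S(s))

print(accepts("abc"))   # True: B -> bc dead-ends, so we backtrack to B -> b
```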
Non-Recursive Predictive Parsing
 A non-recursive predictive parser can be built by maintaining a stack
explicitly, rather than implicitly via recursive calls
 Non-Recursive predictive parsing is a table-driven top-down parser.

Model of a table-driven
predictive parser
34
Non-Recursive Predictive Parsing
 Input buffer
 our string to be parsed. We will assume that its end is marked with a special symbol $.
 Output
 a production rule representing a step of the derivation sequence (left-most derivation) of the string in the
input buffer.
 Stack
 contains the grammar symbols
 at the bottom of the stack, there is a special end marker symbol $.
 initially the stack contains only the symbol $ and the starting symbol S.
 when the stack is emptied (i.e. only $ left in the stack), the parsing is completed.
 Parsing table
 a two-dimensional array M[A,a]
 each row is a non-terminal symbol
 each column is a terminal symbol or the special symbol $
 each entry holds a production rule.

35
LL(1) Parser – Parser Actions
 The symbol at the top of the stack (say X) and the current symbol in the input string (say a)
determine the parser action.
 There are four possible parser actions.
1. If X and a are $ ➔ parser halts (successful completion)
2. If X and a are the same terminal symbol (different from $)
➔ parser pops X from the stack, and advances to the next symbol in the input buffer.
3. If X is a non-terminal
➔ parser looks at the parsing table entry M[X,a]. If M[X,a] holds a production rule
X→Y1Y2...Yk, it pops X from the stack and pushes Yk,Yk-1,...,Y1 into the stack. The parser also
outputs the production rule X→Y1Y2...Yk to represent a step of the derivation.
4. none of the above ➔ error
 all empty entries in the parsing table are errors.
 If X is a terminal symbol different from a, this is also an error case.

36
LL(1) Parser – Example 1
Grammar:
S → aBa
B → bB | ε

LL(1) Parsing Table:
       a          b         $
S    S → aBa
B    B → ε     B → bB

stack     input     output
$S        abba$     S → aBa
$aBa      abba$
$aB       bba$      B → bB
$aBb      bba$
$aB       ba$       B → bB
$aBb      ba$
$aB       a$        B → ε
$a        a$
$         $         accept, successful completion

(We will see how to construct the parsing table very soon.)
37
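The trace above can be reproduced by a few lines of code implementing the four parser actions, with the Example 1 table hard-coded (the dict-based table and token handling are assumptions of this sketch):

```python
# Table-driven LL(1) parsing for S -> aBa, B -> bB | epsilon.
TABLE = {
    ("S", "a"): ["a", "B", "a"],
    ("B", "b"): ["b", "B"],
    ("B", "a"): [],                 # B -> epsilon (empty right-hand side)
}
NONTERMINALS = {"S", "B"}

def ll1_parse(tokens, start="S"):
    stack = ["$", start]
    tokens = list(tokens) + ["$"]
    i, output = 0, []
    while stack:
        x, a = stack.pop(), tokens[i]
        if x == "$" and a == "$":
            return output                        # action 1: accept
        if x == a:
            i += 1                               # action 2: match terminal
        elif x in NONTERMINALS and (x, a) in TABLE:
            rhs = TABLE[(x, a)]
            output.append((x, rhs))              # action 3: expand, output rule
            stack.extend(reversed(rhs))          # push rhs right-to-left
        else:
            raise SyntaxError(f"unexpected {a!r} with {x!r} on stack")  # action 4
    raise SyntaxError("unexpected end of parse")

print(ll1_parse("abba"))
# [('S', ['a', 'B', 'a']), ('B', ['b', 'B']), ('B', ['b', 'B']), ('B', [])]
```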
LL(1) Parser – Example 2
Grammar (after left-recursion removal):
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id

Input: id+id

stack      input     output
$E         id+id$    E → TE’
$E’T       id+id$    T → FT’
$E’T’F     id+id$    F → id
$E’T’id    id+id$
$E’T’      +id$      T’ → ε
$E’        +id$      E’ → +TE’
$E’T+      +id$
$E’T       id$       T → FT’
$E’T’F     id$       F → id
$E’T’id    id$
$E’T’      $         T’ → ε
$E’        $         E’ → ε
$          $         accept
38
LL(1) Parser – Example 3
Taking input id+id*id, which is formed from the grammar of Example 2.

[Figure: the LL(1) parsing trace, omitted]

39
Constructing LL(1) Parsing Table
 Two functions are used in the construction of LL(1) parsing tables:
 FIRST
 FOLLOW

 FIRST() is a set of the terminal symbols which occur as first symbols in strings
derived from  where  is any string of grammar symbols.
 if  derives to , then  is also in FIRST() .

 FOLLOW(A) is the set of the terminals which occur immediately after the non-
terminal A in the strings derived from the starting symbol.
 a terminal a is in FOLLOW(A) if S  Aa

40
Constructing FIRST for a String X
1. If X is a terminal symbol, then FIRST(X) = {X}

2. If X is ε, then FIRST(X) = {ε}

3. If X is a non-terminal symbol and X → ε is a production rule, then add ε to
FIRST(X).

4. If X is a non-terminal symbol and X → Y1Y2..Yn is a production rule, then
 if a terminal a is in FIRST(Yi) and ε is in all FIRST(Yj) for j=1,...,i-1, then a
is in FIRST(X).
 if ε is in all FIRST(Yj) for j=1,...,n, then ε is in FIRST(X).

41
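Rules 1 to 4 above can be sketched as a fixpoint computation: keep applying the rules until no FIRST set grows. The dict grammar encoding and the empty string "" standing for ε are assumptions of this sketch:

```python
# Fixpoint computation of FIRST sets.
EPS = ""

def first_sets(grammar):
    nts = set(grammar)
    first = {A: set() for A in nts}
    changed = True
    while changed:
        changed = False
        for A, prods in grammar.items():
            for rhs in prods:
                nullable_prefix = True      # eps derivable from every Yj so far
                for Y in rhs:
                    f = first[Y] if Y in nts else {Y}   # rule 1 for terminals
                    add = (f - {EPS}) - first[A]        # rule 4, first bullet
                    if add:
                        first[A] |= add
                        changed = True
                    if EPS not in f:
                        nullable_prefix = False
                        break
                if nullable_prefix and EPS not in first[A]:  # rules 2-4, eps case
                    first[A].add(EPS)
                    changed = True
    return first

G = {"E":  [["T", "E'"]],
     "E'": [["+", "T", "E'"], [EPS]],
     "T":  [["F", "T'"]],
     "T'": [["*", "F", "T'"], [EPS]],
     "F":  [["(", "E", ")"], ["id"]]}

print(sorted(first_sets(G)["E"]))   # ['(', 'id']
```

Running it on the example grammar reproduces the FIRST sets listed on the next slide.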
Constructing FIRST for a String X
Example
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id

FIRST(E) = FIRST(T) = FIRST(F) = {(, id}

FIRST(E’) = {+, ε}
FIRST(T) = {(, id}
FIRST(T’) = {*, ε}
FIRST(F) = {(, id}

42
Constructing FIRST for a String X
Exercise 1: Find the FIRST for the grammar

S→ABCDE
A→a | ε
B→b | ε
C→c
D→d | ε
E→e | ε

43
Constructing FIRST for a String X
Exercise 2: Find the FIRST for the grammar

S→aBDh
B→cC
C→bC | ε
D→EF
E→g | ε
F→f | ε

44
Constructing FOLLOW (For Non Terminals)
1. $ is in FOLLOW(S), if S is the start symbol

2. Look at the occurrences of a non-terminal B on the RHS of a production which is
followed by something
 if A → αBβ is a production rule, then everything in FIRST(β) except ε is in FOLLOW(B)

3. Look at B on the RHS that is not followed by anything
 If ( A → αB is a production rule ) or ( A → αBβ is a production rule and ε is in
FIRST(β) ), then everything in FOLLOW(A) is in FOLLOW(B).

We apply these rules until nothing more can be added to any follow set.

45
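The three FOLLOW rules can likewise be sketched as a fixpoint loop. The FIRST sets are supplied precomputed here (from the earlier FIRST slide); the dict encoding and "" standing for ε are assumptions of this sketch:

```python
# Fixpoint computation of FOLLOW sets for the expression grammar.
EPS = ""

grammar = {"E":  [["T", "E'"]],
           "E'": [["+", "T", "E'"], [EPS]],
           "T":  [["F", "T'"]],
           "T'": [["*", "F", "T'"], [EPS]],
           "F":  [["(", "E", ")"], ["id"]]}

first = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}

def follow_sets(start):
    nts = set(grammar)
    follow = {A: set() for A in nts}
    follow[start].add("$")                       # rule 1
    changed = True
    while changed:
        changed = False
        for A, prods in grammar.items():
            for rhs in prods:
                for i, B in enumerate(rhs):
                    if B not in nts:
                        continue
                    # FIRST of the tail beta following B (rule 2)
                    tail_first, nullable = set(), True
                    for Y in rhs[i + 1:]:
                        f = first[Y] if Y in nts else {Y}
                        tail_first |= f - {EPS}
                        if EPS not in f:
                            nullable = False
                            break
                    add = tail_first - follow[B]
                    if nullable:                 # rule 3: tail absent/nullable
                        add |= follow[A] - follow[B]
                    if add:
                        follow[B] |= add
                        changed = True
    return follow

follow = follow_sets("E")
print(sorted(follow["T"]))   # ['$', ')', '+']
```

Running it reproduces the FOLLOW sets worked out on the next slide.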
Constructing FOLLOW (For Non Terminals)
Example
i. E → TE’
ii. E’ → +TE’ | ε
iii. T → FT’
iv. T’ → *FT’ | ε
v. F → (E) | id
FOLLOW(E) = { $, ) }, because
 From rule 1, FOLLOW(E) contains $
 From rule 2, FOLLOW(E) contains FIRST( ) ) = { ) }, from the production F → (E)
FOLLOW(E’) = { $, ) } …. rule 3
FOLLOW(T) = { +, ), $ }
 From rule 2, + is in FOLLOW(T)
 From rule 3, everything in FOLLOW(E) is in FOLLOW(T), since FIRST(E’) contains ε
FOLLOW(F) = { +, *, ), $ } … same reasoning as above
FOLLOW(T’) = { +, ), $ } …. rule 3
46
Constructing LL(1) Parsing Table -- Algorithm
 For each production rule A → α of a grammar G
1. for each terminal a in FIRST(α)
➔ add A → α to M[A,a]

2. If ε is in FIRST(α)
➔ for each terminal a in FOLLOW(A), add A → α to M[A,a]

3. If ε is in FIRST(α) and $ is in FOLLOW(A)
➔ add A → α to M[A,$]

 All other undefined entries of the parsing table are error entries
47
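A sketch of the table-building rules, run on the expression grammar with the FIRST and FOLLOW sets from the earlier slides supplied precomputed (the encodings are assumptions of this sketch; rules 2 and 3 collapse into one loop here because FOLLOW(A) already contains $ where relevant):

```python
# Building the LL(1) parsing table M[A, a] from FIRST and FOLLOW.
EPS = ""

grammar = {"E":  [["T", "E'"]],
           "E'": [["+", "T", "E'"], [EPS]],
           "T":  [["F", "T'"]],
           "T'": [["*", "F", "T'"], [EPS]],
           "F":  [["(", "E", ")"], ["id"]]}
first  = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
          "T'": {"*", EPS}, "F": {"(", "id"}}
follow = {"E": {"$", ")"}, "E'": {"$", ")"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}

def first_of(seq):
    """FIRST of a string of grammar symbols (needed for FIRST(alpha))."""
    out, nullable = set(), True
    for Y in seq:
        f = first.get(Y, {Y})
        out |= f - {EPS}
        if EPS not in f:
            nullable = False
            break
    if nullable:
        out.add(EPS)
    return out

def build_ll1_table():
    M = {}
    for A, prods in grammar.items():
        for rhs in prods:
            fa = first_of(rhs)
            for a in fa - {EPS}:               # rule 1
                assert (A, a) not in M, "grammar is not LL(1)"
                M[(A, a)] = rhs
            if EPS in fa:                      # rules 2 and 3 (FOLLOW has $)
                for a in follow[A]:
                    assert (A, a) not in M, "grammar is not LL(1)"
                    M[(A, a)] = rhs
    return M

M = build_ll1_table()
print(M[("E'", ")")])   # [''] i.e. E' -> epsilon
```

The 13 filled entries match the table on the next slide; any duplicate entry would signal that the grammar is not LL(1).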
Constructing LL(1) Parsing Table -- Algorithm
E → TE’    FIRST(TE’)={(, id} ➔ E → TE’ into M[E,(] and M[E,id]
E’ → +TE’  FIRST(+TE’)={+} ➔ E’ → +TE’ into M[E’,+]
E’ → ε     FIRST(ε)={ε} ➔ none
           but since ε is in FIRST(ε)
           and FOLLOW(E’)={$, )} ➔ E’ → ε into M[E’,$] and M[E’,)]
T → FT’    FIRST(FT’)={(, id} ➔ T → FT’ into M[T,(] and M[T,id]
T’ → *FT’  FIRST(*FT’)={*} ➔ T’ → *FT’ into M[T’,*]
T’ → ε     FIRST(ε)={ε} ➔ none
           but since ε is in FIRST(ε)
           and FOLLOW(T’)={$, ), +} ➔ T’ → ε into M[T’,$], M[T’,)] & M[T’,+]
F → (E)    FIRST((E))={(} ➔ F → (E) into M[F,(]
F → id     FIRST(id)={id} ➔ F → id into M[F,id]

48
LL(1) Parsing Table
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id

        id          +            *           (          )          $
E    E → TE’                             E → TE’
E’               E’ → +TE’                          E’ → ε     E’ → ε
T    T → FT’                             T → FT’
T’               T’ → ε      T’ → *FT’              T’ → ε     T’ → ε
F    F → id                              F → (E)
49
Session Three
▪ Bottom Up Parsing
▪ Handle Pruning
▪ Implementation of A Shift-Reduce Parser
▪ LR Parsers
▪ LR Parsing Algorithm
▪ Actions of A LR-Parser
▪ Constructing SLR Parsing Tables
▪ SLR(1) Grammar
▪ Error Recovery in LR Parsing
50
Bottom-Up Parsing
 A bottom-up parser creates the parse tree of the given input starting from leaves
towards the root.
 A bottom-up parser tries to find the RMD of the given input in the reverse order.
 Bottom-up parsing is also known as shift-reduce parsing because its two main
actions are shift and reduce.
 At each shift action, the current symbol in the input string is pushed to a stack.
 At each reduction step, the symbols at the top of the stack will be replaced by the
non-terminal at the left side of that production.
 Accept: Successful completion of parsing.
 Error: Parser discovers a syntax error, and calls an error recovery routine.

51
Bottom-Up Parsing
 A shift-reduce parser tries to reduce the given input string into the starting
symbol.
 At each reduction step, a substring of the input matching to the right side of a
production rule is replaced by the non-terminal at the left side of that
production rule.
 If the substring is chosen correctly, the right most derivation of that string is
created in the reverse order.
Rightmost Derivation: S ⇒*rm ω

Shift-Reduce Parser finds: ω ⇐ ... ⇐ S (the rightmost derivation in reverse)

52
Shift-Reduce Parsing: Example
S → aABb input string: aaabb
A → aA | a aaAbb
B → bB | b aAbb  reduction
aABb
S
S  aABb  aAbb  aaAbb  aaabb

Right Sentential Forms

 How do we know which substring to be replaced at each reduction step?

53
Handle
 Informally, a handle of a string is a substring that matches the right side of a
production rule.
 But not every substring that matches the right side of a production rule is a handle

 If the grammar is unambiguous, then every right-sentential form of the
grammar has exactly one handle.

54
Shift-Reduce Parsing: Example
E → E+T | T     Right-Most Derivation of id+id*id
T → T*F | F     E ⇒ E+T ⇒ E+T*F ⇒ E+T*id ⇒ E+F*id
F → (E) | id      ⇒ E+id*id ⇒ T+id*id ⇒ F+id*id ⇒ id+id*id

Right-Most Sentential Form Reducing Production


id+id*id F → id
F+id*id T→F
T+id*id E →T
E+id*id F → id
E+F*id T→F
E+T*id F → id
E+T*F T → T*F
E+T E → E+T
E
The handle in each right-sentential form above is the substring reduced by the production shown next to it (shown in red and underlined in the original slides).
55
Shift-Reduce Parsing: Example
Stack Input Action
$ id+id*id$ shift Initial stack just contains only
$id +id*id$ reduce by F → id the end-marker $ & the
$F +id*id$ reduce by T → F end of the input string is
$T +id*id$ reduce by E → T marked by the end-
$E +id*id$ shift marker $.
$E+ id*id$ shift
$E+id *id$ reduce by F → id
$E+F *id$ reduce by T → F
$E+T *id$ shift
$E+T* id$ shift
$E+T*id $ reduce by F → id
$E+T*F $ reduce by T → T*F
$E+T $ reduce by E → E+T
$E $ accept

56
Shift-Reduce Parser
 The most prevalent type of bottom-up parser today is based on a concept called
LR(k) parsing;

L: left-to-right scan of the input; R: rightmost derivation (in reverse);
k: number of lookahead symbols (k is omitted ➔ it is 1)

 LR parsers cover a wide range of grammars: SLR ⊂ LALR ⊂ LR ⊂ CFG
 Simple LR parser (SLR)
 Look-Ahead LR parser (LALR)
 most general LR parser (LR)

 SLR, LR and LALR work the same way; only their parsing tables are different.

57
LR Parser
 LR parsing is attractive because:
 LR parsers can be constructed to recognize virtually all programming-language constructs for
which context-free grammars can be written.
 LR parsing is most general non-backtracking shift-reduce parsing, yet it is still efficient.
 The class of grammars that can be parsed using LR methods is a proper superset of the class of
grammars that can be parsed with predictive parsers.
 LL(1)-Grammars ⊂ LR(1)-Grammars
 An LR-parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan
of the input.
 Drawback of the LR method is that it is too much work to construct an LR parser by hand.
 Use tools e.g. yacc

58
LR Parsing Algorithm
input: a1 ... ai ... an $
stack: S0 X1 S1 ... Xm-1 Sm-1 Xm Sm   (Sm on top)

[Figure: model of an LR parser. The driver consults the Action table (rows:
states; columns: terminals and $; each entry: one of four actions) and the
Goto table (rows: states; columns: non-terminals; each entry: a state
number), and emits the output.]
59
A Configuration of LR Parsing Algorithm
 A configuration of a LR parsing is:

( So S1 ... Sm, ai ai+1 ... an $ )

Stack Rest of Input

 Sm and ai decide the parser action by consulting the parsing action table. (Initially the stack
contains just S0)

 A configuration of a LR parsing represents the right sentential form:


X1 ... Xm ai ai+1 ... an $
 Xi is the grammar symbol represented by state si

60
Actions of a LR Parser
1. If ACTION[Sm, ai ] = shift s, the parser executes a shift move ; it shifts the next state s
onto the stack, entering the configuration
( So S1 ... Sm, ai ai+1 ... an $ ) ➔ ( So S1 ... Sm s, ai+1 ... an $ )

2. If ACTION[Sm, ai ] = reduce A→, then the parser executes a reduce move changing
configuration from
( So S1 ... Sm, ai ai+1 ... an $ ) to ( So S1 ... Sm-r s, ai ... an $ ) where r is the length of , and s =
GOTO[sm-r, A]. Output is the reducing production A→
 Here the parser first popped r state symbols off the stack, exposing state sm-r then the parser
pushed s.
3. If ACTION[Sm, ai ] = Accept, parsing successfully completed
4. If ACTION[Sm, ai ] = Error, parser detected an error (an empty entry in the action
table)

61
(SLR) Parsing Tables for Expression Grammar
Action Table Goto Table
Expression state id + * ( ) $ E T F
0 s5 s4 1 2 3
Grammar
1 s6 acc
1) E → E+T 2 r2 s7 r2 r2
2) E →T 3 r4 r4 r4 r4
3) T → T*F 4 s5 s4 8 2 3
5 r6 r6 r6 r6
4) T→F 6 s5 s4 9 3
5) F → (E) 7 s5 s4 10
6) F → id 8 s6 s11
9 r1 s7 r1 r1
10 r3 r3 r3 r3
11 r5 r5 r5 r5 62
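The action/goto tables above can drive the LR algorithm of slide 60 directly. A sketch with the tables encoded as dicts ('s5' = shift to state 5, 'r6' = reduce by production 6, 'acc' = accept; the encoding is an assumption of this sketch):

```python
# LR driver using the SLR tables for the expression grammar.
ACTION = {
    (0, "id"): "s5", (0, "("): "s4",
    (1, "+"): "s6", (1, "$"): "acc",
    (2, "+"): "r2", (2, "*"): "s7", (2, ")"): "r2", (2, "$"): "r2",
    (3, "+"): "r4", (3, "*"): "r4", (3, ")"): "r4", (3, "$"): "r4",
    (4, "id"): "s5", (4, "("): "s4",
    (5, "+"): "r6", (5, "*"): "r6", (5, ")"): "r6", (5, "$"): "r6",
    (6, "id"): "s5", (6, "("): "s4",
    (7, "id"): "s5", (7, "("): "s4",
    (8, "+"): "s6", (8, ")"): "s11",
    (9, "+"): "r1", (9, "*"): "s7", (9, ")"): "r1", (9, "$"): "r1",
    (10, "+"): "r3", (10, "*"): "r3", (10, ")"): "r3", (10, "$"): "r3",
    (11, "+"): "r5", (11, "*"): "r5", (11, ")"): "r5", (11, "$"): "r5",
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3, (4, "E"): 8, (4, "T"): 2,
        (4, "F"): 3, (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}
PROD = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),    # production LHS and
        4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}    # length of its RHS

def lr_parse(tokens):
    stack, i, out = [0], 0, []          # stack of states only
    tokens = list(tokens) + ["$"]
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:
            raise SyntaxError(f"error at {tokens[i]!r}")
        if act == "acc":
            return out
        if act[0] == "s":
            stack.append(int(act[1:])); i += 1    # shift
        else:                                     # reduce by A -> beta
            A, r = PROD[int(act[1:])]
            del stack[-r:]                        # pop r states
            out.append(act)
            stack.append(GOTO[(stack[-1], A)])

print(lr_parse(["id", "+", "id", "*", "id"]))
# ['r6', 'r4', 'r2', 'r6', 'r4', 'r6', 'r3', 'r1']
```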
Actions of A (S)LR-Parser -- Example
For id*id+id

63
Constructing SLR Parsing Tables – LR(0) Item
 An LR parser makes shift-reduce decisions by maintaining states to keep track of
where we are in a parse.
 An LR(0) item of a grammar G is a production of G with a dot at some position of
the right side.
 Ex: for A → aBb, the possible LR(0) items are (four different possibilities):
A → .aBb
A → a.Bb
A → aB.b
A → aBb.
 Sets of LR(0) items will be the states of action and goto table of the SLR parser.
 i.e. States represent sets of "items.“
 A collection of sets of LR(0) items (the canonical LR(0) collection) is the basis
for constructing SLR parsers.
64
Constructing SLR Parsing Tables
 To construct the canonical LR(0) collection for a grammar, we define an augmented
grammar and two functions, CLOSURE and GOTO.
 Augmented Grammar:
 G’ is G with a new production rule S’→S where S’ is the new starting symbol.
 Purpose: to provide a single production that, when reduced, signals the end of parsing
Closure operation
 If I is a set of LR(0) items for a grammar G, then closure(I) is the set of LR(0)
items constructed from I by the two rules:
1. Initially, every LR(0) item in I is added to closure(I).
2. If A → .B is in closure(I), for all production rules B→ in G, add B→. in the
closure(I).
We will apply this rule until no more new LR(0) items can be added to closure(I).
65
Constructing SLR Parsing Tables – LR(0) Item
 Give a grammar
E → E +T |T
T →T * F | F
F →( E ) | id
Then,
 Closure({T → T .* F}) = {T → T . * F}
 Closure ({T → T .* F, T → T * .F}) = {T →T .* F, T → T * .F, F → .(E ), F → .id}
 Closure ({F →( .E ) } ) = {F →( .E ), E → . E + T, E → . T, T → . T * F, T → . F, F →
.( E ), F → . id }
 closure({E’ → .E}) ={E’ → .E, E → .E+T, E → .T, T → .T*F, T → .F, F → .(E),
F → .id}
66
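The closure computation above can be sketched directly; an LR(0) item is a (lhs, rhs, dot-position) triple (an assumed representation, not notation from the slides):

```python
# Closure of a set of LR(0) items for the augmented expression grammar.
grammar = {"E'": [("E",)],
           "E":  [("E", "+", "T"), ("T",)],
           "T":  [("T", "*", "F"), ("F",)],
           "F":  [("(", "E", ")"), ("id",)]}

def closure(items):
    items = set(items)
    queue = list(items)
    while queue:                      # rule 2, until nothing new appears
        lhs, rhs, dot = queue.pop()
        if dot < len(rhs) and rhs[dot] in grammar:
            for prod in grammar[rhs[dot]]:     # dot is before non-terminal B:
                item = (rhs[dot], prod, 0)     # add every B -> .gamma
                if item not in items:
                    items.add(item)
                    queue.append(item)
    return items

I0 = closure({("E'", ("E",), 0)})
print(len(I0))   # 7: the items of closure({E' -> .E}) listed above
```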
Go-to Operation
 If I is a set of LR(0) items and X is a grammar symbol (terminal or non-terminal),
then goto(I,X) is defined as follows:
 If A → α.Xβ is in I, then every item in closure({A → αX.β}) will be in goto(I,X).

Example:
I = {E’ → .E, E → .E+T, E → .T, T → .T*F, T → .F,
F → .(E), F → .id}

 goto(I,E) = closure({E’ → E., E → E.+T}) = {E’ → E., E → E.+T}
 goto(I,T) = closure({E → T., T → T.*F}) = {E → T., T → T.*F}
 goto(I,F) = closure({T → F.}) = {T → F.}
 goto(I,() = closure({F → (.E)}) = {F → (.E), E → .E+T, E → .T, T → .T*F,
T → .F, F → .(E), F → .id}
 goto(I,id) = closure({F → id.}) = {F → id.}
 goto({E’ → E., E → E.+T}, +) = closure({E → E+.T}) = {E → E+.T, T → .T*F,
T → .F, F → .(E), F → .id}
67
Construction of The Canonical LR(0) Collection

 To create the SLR parsing tables for a grammar G, we will create the canonical
LR(0) collection of the grammar G’.
 Algorithm:
void items(G’) {
  C = { CLOSURE({S’ → .S}) }
  repeat
    for (each set of items I in C)
      for (each grammar symbol X)
        if (goto(I,X) is not empty and not in C)
          add goto(I,X) to C
  until no new sets of items are added to C on a round
}
68
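The items(G’) algorithm can be sketched end-to-end for the augmented expression grammar; running it reproduces the twelve states I0 ... I11 built on the following slides (the (lhs, rhs, dot) item representation is an assumption of this sketch):

```python
# Canonical LR(0) collection for the augmented expression grammar.
grammar = {"E'": [("E",)],
           "E":  [("E", "+", "T"), ("T",)],
           "T":  [("T", "*", "F"), ("F",)],
           "F":  [("(", "E", ")"), ("id",)]}

def closure(items):
    items = set(items)
    queue = list(items)
    while queue:
        lhs, rhs, dot = queue.pop()
        if dot < len(rhs) and rhs[dot] in grammar:
            for prod in grammar[rhs[dot]]:
                item = (rhs[dot], prod, 0)
                if item not in items:
                    items.add(item)
                    queue.append(item)
    return frozenset(items)          # hashable, so sets of items can be compared

def goto(I, X):
    moved = {(l, r, d + 1) for (l, r, d) in I if d < len(r) and r[d] == X}
    return closure(moved) if moved else frozenset()

def items():
    symbols = {s for prods in grammar.values() for rhs in prods for s in rhs}
    C = {closure({("E'", ("E",), 0)})}       # start from CLOSURE({S' -> .S})
    changed = True
    while changed:
        changed = False
        for I in list(C):
            for X in symbols:
                J = goto(I, X)
                if J and J not in C:
                    C.add(J)
                    changed = True
    return C

print(len(items()))   # 12 states: I0 ... I11
```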
Construction of The Canonical LR(0) Collection

 C = {closure({E’ → .E}) } = {E’ → . E, E →. E + T, E → . T, T → . T * F, T →. F, F →


.( E ), F → . id}.
 This gives us the items for the first state (state0 – I0) of our DFA
 Now we need to compute Goto functions for all of the relevant symbols in the set.
 In this case, we care about the symbols E, T, F, (, and id, since those are the symbols
that appear right after a dot in some item of the set C.
 For symbol E, Goto(I0, E) = closure({E’ → E ., E → E . + T})
= {E’ → E ., E → E . + T} = call it I1
 For symbol T, Goto(I0, T) = closure({E → T ., T → T . * F})
= {E → T ., T → T . * F}) = call it I2

69
Construction of The Canonical LR(0) Collection
 For symbol F, Goto(I0, F) = closure({T → F.}) = {T → F.} = I3
 For symbol (, Goto(I0, () = closure({F → (.E)}) = {F → (.E), E → .E + T, E → .T, T → .T *
F, T → .F, F → .(E), F → .id} = I4
 For symbol id, Goto(I0, id) = closure({F → id.}) = {F → id.} = I5
 Repeat this step for the newly created states (I1, I2, I3, I4, I5) until the dot occurs at the end of the kernel of each state.
 For symbol +, Goto(I1, +) = closure({E → E +.T}) = {E → E +.T, T → .T * F, T → .F, F → .(E),
F → .id} = I6
 For symbol *, Goto(I2, *) = closure({T → T *.F}) = {T → T *.F, F → .(E), F → .id} = I7
 For symbol E, Goto(I4, E) = closure({F → (E.), E → E.+ T})
= {F → (E.), E → E.+ T} = I8
 For symbol T, Goto(I6, T) = closure({E → E + T., T → T.* F}) = {E → E + T., T → T.* F} =
I9
 For symbol F, Goto(I7, F) = closure({T → T * F.}) = {T → T * F.} = I10
 For symbol ), Goto(I8, )) = closure({F → (E).}) = {F → (E).} = I11

70
Construction of The Canonical LR(0) Collection

71
Transition Diagram (DFA) of Goto Function

72
Constructing SLR Parsing Tables
1. Construct the canonical collection of sets of LR(0) items for G’.
C{I0,...,In}
2. Create the parsing action table as follows
2.1. If a is a terminal, A → α.aβ is in Ii and goto(Ii,a) = Ij, then action[i, a] is shift j.
2.2. If A → α. is in Ii, then action[i, a] is reduce A → α for all a in FOLLOW(A), where
A ≠ S’.
2.3. If S’ → S. is in Ii, then action[i, $] is accept.
2.4. If any conflicting actions generated by these rules, the grammar is not SLR(1).
3. Create the parsing goto table
• for all non-terminals A, if goto(Ii,A)=Ij then goto[i,A]=j
4. All entries not defined by (2) and (3) are errors.
5. Initial state of the parser contains S’→.S

73
Constructing SLR Parsing Tables

 From Rule 2.1.


 Take F → .( E ) from I0, Goto(I0, ( ) = I4, then action[0, (] = shift 4
 Take E → E . + T from I1, Goto(I1, +) = I6, then action[1, +] = shift 6
 Take T → T . * F from I2, Goto(I2, *) = I7, then action[2, *] = shift 7
…other shifts can be populated in the same way…
 From Rule 2.2.
 Take E → T. from I2; since this reduces by E → T, use FOLLOW(E) = {$, ), +}
 action[2, $] = reduce 2 (2: E → T)
 action[2, )] = reduce 2
 action[2, +] = reduce 2
…other reduces can be done in the same way…
 From Rule 2.3.
 E’ → E . is I1, action[1,$] = accept

74
Constructing SLR Parsing Tables

 From 3 - Creating the parsing goto table


o Take E in I0, goto(I0,E)=I1 then goto[0,E]=1
o Take T in I0 , goto(I0,T)= I2 then goto[0, T] = 2
o Take F in I0, goto(I0,F)= I3 then goto[0, F] = 3
o Take E in I4, goto(I4,E)=I8 then goto[4,E]=8
o Take T in I4 , goto(I4,T)= I2 then goto[4, T] = 2
o Take F in I4, goto(I4,F)= I3 then goto[4, F] = 3
o Take T in I6 , goto(I6,T)= I9 then goto[6, T] = 9
o Take F in I6, goto(I6,F)= I3 then goto[6, F] = 3
o Take F in I7, goto(I7,F)= I10 then goto[7, F] = 10

75
Parsing Tables of Expression Grammar
Action Table Goto Table
state id + * ( ) $ E T F
0 s5 s4 1 2 3
1 s6 acc
2 r2 s7 r2 r2
3 r4 r4 r4 r4
4 s5 s4 8 2 3
5 r6 r6 r6 r6
6 s5 s4 9 3
7 s5 s4 10
8 s6 s11
9 r1 s7 r1 r1
10 r3 r3 r3 r3
11 r5 r5 r5 r5 76
