Lecture# 05
Compiler Construction
A Simple Syntax-Directed Translator,
(Part-I)
by Safdar Hussain
Topics
• Syntax Definition (Structure of Compiler, Tokens vs Terminals, Tree Terminology, Grammars,
Parse Trees, Derivation, Ambiguity, Associativity & Precedence, left-associative grammars,
right-associative grammars, Syntax Statements, Multiple CFG Tasks
Overview
• This chapter contains introductory material
• Building a simple compiler
– Syntax Definition
– Syntax-Directed Translation
– Parsing
– A Translator for Simple Expressions
– The Lexical Analyzer
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 2
Overview
Programming Language can be defined by describing
1. The syntax of the language
a. What its program looks like
b. We use CFG or BNF (Backus Naur Form)
2. The semantics of the language
a. What its program mean
b. Difficult to describe
c. Use informal descriptions and suggestive examples
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 3
The Entire Compilation Process
• Grammars for Syntax Definition
• Syntax-Directed Translation
• Parsing - Top Down & Predictive
• Pulling Together the Pieces
• The Lexical Analysis Process
• Symbol Table Considerations
• A Brief Look at Code Generation
• Concluding Remarks/Looking Ahead
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 4
The Structure of our Compiler
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 5
A model of a Compiler’s Front End
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 6
Grammars for Syntax Definition
• A Context-free Grammar (CFG) Is Utilized to Describe
the Syntactic Structure of a Language
• A CFG Is Characterized By:
Non-Terminals
1. A Set of Tokens or Terminal Symbols
list list + digit
2. A Set of Non-terminals
list list - digit
3. A Set of Production Rules list digit
Each Rule Has the Form digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
NT {T, NT}*
4. A Non-terminal Designated As the Start Symbol Terminals
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 7
Tokens Versus Terminals
• The lexical analyzer reads the characters of the
source program, groups them into lexically
meaningful units called lexemes, and produces as
output tokens representing these lexemes.
• The token names are abstract symbols that are used
by the parser for syntax analysis. Often, we shall call
these token names terminals, since they appear as
terminal symbols in the grammar for a programming
language.
• In syntax analysis, often the tokens and terminals are
used interchangeably (synonymously).
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 8
Parse Trees
• The root of the tree is labeled by the start symbol
• Each leaf of the tree is labeled by a terminal (=token)
or
• Each interior node is labeled by a non-terminal
• If A X1 X2 … Xn is a production, then node A has
children X1, X2, …, Xn where Xi is a (non)terminal or
( denotes the empty string)
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 9
Tree Terminology
Thanks to Dragon Book
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 10
Grammars for Syntax Definition
Example CFG
list list + digit
list list - digit
list digit
digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
(the “|” means OR)
(So we could have written
list list + digit | list - digit |digit )
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 11
Example CFG (Function call)
call id ( optparams )
optparams params |
params params, param |param
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 12
Key Terms
A string of tokens is a sequence of zero or more tokens.
The string containing with zero tokens, written as , is
called empty string.
A grammar derives strings by beginning with the start
symbol and repeatedly replacing the non terminal by
the right side of a production for that non terminal.
The token strings that can be derived from the start
symbol form the language defined by the grammar.
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 13
Grammars are Used to Derive Strings:
Using the CFG defined on the earlier slide, we can
derive the string: 9 - 5 + 2 as follows:
list list + digit P1 : list list + digit
list - digit + digit P2 : list list - digit
digit - digit + digit P3 : list digit
9 - digit + digit P4 : digit 9
9 - 5 + digit P4 : digit 5
9-5+2 P4 : digit 2
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 14
Grammars are Used to Derive Strings:
This derivation could also be represented via a Parse Tree
(parents on left, children on right)
list list + digit list
list - digit + digit
list + digit
digit - digit + digit
- 2
9 - digit + digit list digit
9 - 5 + digit 5
digit
9-5+2
9
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 15
Ambiguity
Consider the following Context Free Grammar:
G = <{string}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, string>
With productions p =
string string + string | string - string | 0 | 1 | … | 9
This grammar is ambiguous, because more than one parse
tree generates the string 9-5+2
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 16
Ambiguity Continued
string string
string string string
string string string string string
9 - 5 + 2 9 - 5 + 2
First Parse Tree Second Parse Tree
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 17
Associativity of Operators
Left-associative operators have left-recursive productions:
left left + term | term
String a+b+c has the same meaning as (a+b)+c
Right-associative operators have right-recursive
productions:
right term = right | term
String a=b=c has the same meaning as a=(b=c)
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 18
Parse trees for left & right-associative
grammars
Parse tree for Left-associative grammar Parse tree for right-associative grammar
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 19
Precedence of Operators
Operators with higher precedence “bind more tightly”:
expr expr + term | term
term term * factor | factor BoDMAS Rule
factor number | ( expr )
String 2+3*5 has the same meaning as 2+(3*5)
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 20
Syntax of Statements
stmt id := expr
| if expr then stmt
| if expr then stmt else stmt
| while expr do stmt
| begin opt_stmts end
opt_stmts stmt ; opt_stmts
|
Note: Opt stands for optional
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 21
More Complex Grammars
block begin opt_stmts end
opt_stmts stmt_list |
stmt_list stmt_list ; stmt | stmt
What is this grammar for ?
What does “” represent ?
What kind of production rule is this ?
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 22
Defining a Parse Tree
• A parse tree pictorially shows how the start symbol of a
grammar derives a string in the language.
• More formally, a parse tree for a CFG has the following
properties:
Root is labeled with the Start Symbol
Leaf Node is a Token or
Interior Node is a Non-Terminal
If A X1X2…Xn, Then A Is an Interior; X1X2…Xn Are Children
of A and May Be Non-Terminals or Tokens
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 23
Problem & Solution (CFG Tasks)
Consider the context-free grammar
SSS+|SS*|a
a) Show how the string aa+a* can be generated by this grammar.
b) Construct a parse tree for this string
c) What language does this grammar generate? Justify your
answer.
?
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 24
Tasks Solution
Consider the context-free grammar
SSS+|SS*|a
a) Show how the string aa+a* can be generated by this grammar.
b) Construct a parse tree for this string
c) What language does this grammar generate? Justify your
answer.
Answers
(a)- string generation (b)- parse tree (c)- Grammar’s language
S SS*
S SS+S* L = {Postfix expression
S aS+S* consisting of digits, plus and
S aa+S* multiple signs}
S aa+a*
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 25
Problem & Solution (CFG Tasks)
What language is generated by the following
grammars? In each case justify your answer.
a) S 0S1 | 01
b) S +SS | -SS | a
c) S S(S)S | Ԑ
?
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 26
Tasks Solution
What language is generated by the following grammars? In each
case justify your answer.
a) S 0S1 | 01
b) S +SS | -SS | a
c) S S(S)S | Ԑ
Q# Languages
a L = {0n 1n | n>=1}
b L = {Prefix expression consisting of plus and minus signs}
c L = {Matched brackets of arbitrary arrangement and nesting, includes Ԑ}
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 27
Problem & Solution (CFG Tasks)
Consider the following grammars:
a) S 0S1 | 01
b) S +SS | -SS | a
c) S S(S)S | Ԑ
Which of the grammars are ambiguous?
?
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 28
Tasks Solution
Consider the following grammars: Which of the grammars
are ambiguous?
Q# Grammar Ambiguous? c
a S 0S1 |01 No
b S +SS | -SS | a No
For String = ()(), it can generate 2
trees
c S S(S)S | Ԑ Yes
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 29
Problem & Solution (CFG Tasks)
Construct unambiguous context-free grammars for each of the
following languages. In each case show that your grammar is
correct.
a) Arithmetic expressions in postfix notation.
b) Left-associative lists of identifiers separated by commas.
c) Right-associative lists of identifiers separated by commas.
?
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 30
Tasks Solution
Construct unambiguous context-free grammars for each of the
following languages. In each case show that your grammar is
correct.
a) Arithmetic expressions in postfix notation.
b) Left-associative lists of identifiers separated by commas.
c) Right-associative lists of identifiers separated by commas.
Q# Unambiguous CFGs
a E E E op | num
b list list , id |id
c list id , list |id
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 31
Problem & Solution (CFG Task)
Construct a context-free grammar for roman numerals.
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 32
Task Hint
Decimal Roman Decimal Roman
1 I 11 XI
2 II 12 XII
3 III 13 XIII
4 IV 14 XIV
5 V 15 XV
6 VI 16 XVI
7 VII 17 XVII
8 VIII 18 XVIII
9 IX 19 XIX
10 X 20 XX
Safdar Hussain, Lecturer CS, Khwaja Fareed University, RYK 33
The End
34