Compiler Construction
1
Two Pass Compiler
2
Two Pass Compiler
• The figure above shows the structure of a two-pass
compiler.
• The front end maps legal source code into an
intermediate representation (IR).
• The back end maps IR into target machine code.
• An immediate advantage of this scheme is that it allows
multiple front ends and multiple passes.
3
Front End
4
Front End
• The front end recognizes legal and illegal programs
presented to it.
• When it encounters errors, it attempts to report errors
in a useful way.
• For legal programs, front end produces IR and
preliminary storage map for the various data structures
declared in the program.
5
Front End
• The front end consists of two modules:
1. Scanner
2. Parser
6
Scanner
• The scanner takes a program as input and maps the
character stream into “words” that are the basic unit of
syntax.
• It produces pairs – a word and its part of speech.
7
Scanner
• For example,
the input
x = x + y
becomes
<id,x>
<assign,=>
<id,x>
<op,+>
<id,y> 8
Scanner
• We call the pair “<token type, word>” a token.
• Typical tokens are: number, identifier, +,
- , new, while, if.
9
Parser
• The parser takes in the stream of tokens, recognizes
context- free syntax and reports errors.
• It guides context-sensitive (“semantic”) analysis for
tasks like type checking.
• The parser builds IR for source program.
10
Context Free Grammars
• The syntax of most programming languages is
specified using Context-Free Grammars (CFG).
• Context- free syntax is specified with a grammar
G=(S,N,T,P) where
• S is the start symbol
• N is a set of non-terminal symbols
• T is set of terminal symbols or words
• P is a set of productions or rewrite rules
11
Context Free Grammars
• For example, the Context-Free Grammar for arithmetic
expressions is
1. goal → expr
2. expr → expr op term
3. | term
4. term → number
5. | id
6. op → +
7. | –
12
CFG
For This CFG
• S = goal
• T = { number, id, +, -}
• N = { goal, expr, term, op}
• P = { 1, 2, 3, 4, 5, 6, 7}
13
Context Free Grammar
• Given a CFG, we can derive sentences by repeated
substitution.
• Consider the sentence
x+2–y
14
Context Free Grammars
• For example, the Context-Free Grammar for arithmetic
expressions is
1. goal → expr
2. expr → expr op term
3. | term
4. term → number
5. | id
6. op → +
7. | –
15
Derivation
Production Result
goal
1. goal → expr expr
2. expr → expr op term expr op term
5. term → id expr op y
7. op → - expr - y
2. expr → expr op term expr op term - y
4. term → number expr op 2 - y
6. op → + expr + 2 - y
3. expr → term term + 2 - y
5. term → id x + 2 - y
16
Front End
• To recognize a valid sentence in some CFG, we reverse
this process and build up a parse, thus the name
“parser”.
• A parse can be represented by a tree: parse tree or
syntax tree.
17
Parse
18
Parse Tree
• A parse can be represented by a tree: parse tree or syntax
tree. For example, here is the parse tree for the expression
x+2–y
19
Parse Tree
• The parse tree captures all rewrite during the
derivation.
• The derivation can be extracted by starting at the root
of the tree and working towards the leaf nodes.
20
Abstract Syntax Tree
• The parse tree contains a lot of unneeded information.
Compilers often use an abstract syntax tree (AST).
• For example, the AST for the above parse tree is
21
Abstract Syntax Tree
• This is much more concise.
• AST summarizes grammatical structure without the
details of derivation.
• ASTs are one kind of intermediate representation
(IR).
22