Compiler Design
• Course Code: CS105101CS
• Unit 1: Introduction and Lexical
Analysis
• Lecture 2: Phases of the Compiler,
Cousins of the Compiler
Outcome
§ Overview of various phases of the Compiler
§ Tools to implement different phases
§ Cousins of the Compiler
§ Grouping of Phases
Recap
Source code / HLL code
Language Translator
Machine code
Recap
Source code / HLL code
Preprocessor Compiler Assembler Linker/Loader
Machine code
Recap
Compiler
Pure HLL
Assembly language
Source Program / High-level Language
Pure HLL
Phases of Compiler
X=a+b*c
Compiler
Phases of Compiler Lexical Analysis
X=a+b*c Syntax Analysis
Semantic Analysis
Compiler Intermediate Code
Generation
Code Optimization
Target Code
Generation
Lexical Analyzer
X=a+b*c
Recognizes Tokens using
Lexical Analysis Regexs.
Example: Regex for identifier
Lexemes Tokens
l(l+d)*|_(l+d)*
x Identifier
l: letter
= Operator
d: digit
a Identifier
_: underscore
+ Operator
b Identifier
* Operator
c Identifier
Example: Regex for identifier
Lexical Analyzer
l(l+d)*|_(l+d)*
l: letter
d: digit
Lexemes Tokens _: underscore
x Identifier
= Operator
a Identifier
+ Operator letter q1 (Letter OR digit)*
b Identifier
* Operator q0 q3
c Identifier
Underscore q2 (Letter OR digit)*
Syntax Analyzer
Id = id + id * id ;
S
Lexemes Tokens
id = E ;
x Identifier
= Operator
E + T
a Identifier Syntax Analysis
+ Operator T T * F
b Identifier
* Operator F F id
c Identifier S id = E ;
E E+T|T id id
T T*F|F
X=a+b*c F id
Parse Tree
Semantic Analyzer
S
id = E ;
E + T Semantically
Semantic Analyzer Verified
T T * F Parse Tree
F F id
id id
Parse Tree
Intermediate Code Generator
Semantically
Verified Intermediate Code Generator Intermediate Code
Parse Tree
Three address code
(TAC)
id = id + id * id ;
1. t0 = b * c;
2. t1 = a + t0;
X=a+b*c; 3. x = t1;
Code Optimizer
Intermediate Code Code Optimization Optimized Code
Three address code Three address code
(TAC) (TAC)
1. t0 = b * c; 1. t0 = b * c;
2. t1 = a + t0; 2. x = a + t0;
3. x = t1;
Target Code Generator
Target Code
Optimized Code
Generation
Three address code
(TAC)
1. t0 = b * c;
2. x = a + t0;
Phases of Compiler Lexical Analysis
X=a+b*c Syntax Analysis
Semantic Analysis
Compiler Intermediate Code
Generation
Code Optimization
Target Code
Generation
Tools for Practical Implementation
LEX
Lexical Analysis
YACC
Syntax Analysis
Semantic Analysis
Intermediate Code
Generation
Code Optimization
Target Code
Generation
Tools for Practical Implementation
Lexical Analysis
Syntax Analysis Front-end
Semantic Analysis
Intermediate Code
Generation
Back-end
Code Optimization
Target Code
Generation
Practice Question
§ Price = amount + rate * 50
§ Write the answer of each phases of the compiler (till code optimization)
Compiler: Internal Architecture
Lexical Analysis
Syntax Analysis
Semantic Analysis
Symbol Table Error
Intermediate Code
Manager Handler
Generation
Code Optimization
Target Code
Generation
Symbol Table
§ Data Structure
Lexical Analysis
§ Variable and Function names
§ Objects Syntax Analysis
§ Classes
§ Interfaces Semantic Analysis
Symbol Table Intermediate Code
Manager Generation
Code Optimization
Target Code
Generation
Parts of the Compiler Phases
§ Analysis: Analysis part breaks up the program into
constituent pieces and creates an intermediate Lexical Analysis
representation of the source program . It consists the
Syntax Analysis
phases that are
§ Lexical Analysis Semantic Analysis
§ Syntax Analysis
§ Semantic Analysis Intermediate Code
§ Intermediate Generation
Code Optimization
§ Synthesis: Synthesis parts construct the desired
target from intermediate representation. It consists Target Code
the phases that are Generation
§ Code Optimization
§ Code Generation
Symbol Table
Usage:
Lexical Analysis § Create entries for identifiers.
Syntax Analysis § Adds information regarding attributes
Semantic Analysis § Using the available information checks
Symbol Table Intermediate Code semantics and update the symbol table.
Manager Generation § Available information helps in adding
temporary variable information.
Code Optimization § Available information used in machine
depended optimization.
Target Code
§ Generates the Target code using address
Generation
information of identifiers.
Symbol Table: Entries
Name Type Size Dimension Line of Declaration Line of Usage Address
Symbol Table: Entries
int count;
char x[] = “NIT Raipur”;
Name Type Size Dimension Line of Declaration Line of Usage Address
count int 2 0 -- -- --
x char 10 1 -- -- --
Symbol Table: Operations
§ Non-Block Structured Language:
o Contains single instance of the variable declaration.
o Operations:
i. Insert()
ii. Lookup()
§ Block Structured Language:
o Variable declaration may happen multiple times.
o Operations:
i. Insert()
ii. Lookup()
iii. Set()
iv. Reset()
Error Handler
§ Error Handler detects each phase error.
§ An error is a blank entry in the symbol table.
Basic Classification of Errors:
1. Lexical: Misspellings of identifiers, keywords, operators, etc.
2. Syntactical: Missing semi-colon, unbalanced parenthesis, etc.
3. Semantical: Incompatible value assignment, operator- operand type mismatch.
4. Logical: Code not reachable, infinite loop, etc.
Cousins of the Compiler
§ Preprocessor: It converts high-level language to pure high-level
language. The functions performed by preprocessor are like macro-
processing. File inclusion, Rational preprocessor and language
extension.
§ Compiler: Take high level language as input and convert it into
assembly language.
§ Assembler: Takes assembly code as input and converts it into machine
code.
§ Linker: The system program which is capable of combining program
modules or library functions into a single machine language program is
known as linker.
§ Loader: Process of loader consists of taking relocatable machine code
altering the relocatable address placing the attend in strings and data in
memory at proper locations.
Grouping of Phases
Two compiler: Single phase & Multi phase (two pass)
Single Pass 1st pass
Lexical Analysis
Lexical Analysis
Syntax Analysis
Syntax Analysis Front end
Semantic Analysis
Semantic Analysis
Intermediate Code
Intermediate Code Generation
Generation
Code Optimization Code Optimization
Target Code Back end Target Code 2nd pass
Generation Generation
Front end & Back end
§ 1st pass § 2nd pass
§ Known as analysis part § Known as synthesis part
§ Platform independent § Platform dependent
§ Depends on High-level language § Dependent on machine
§ Different front end for different § Different back end for different
machine machine
§ Takes less time to execute § Takes more time
§ Takes more memory § Takes less memory
§ Portability § Non portable
Compiler Construction Tools
Some commonly used construction tools are:
§ Parser generator
§ Scanner generator
§ Syntax directed translation engine
§ Code generator generators
§ Data flow analysis engine
§ Compiler construction toolkit
Few Questions
1. In the context of Compilers, which of the following is/are NOT an intermediate
representation of the source program?
a) Three Address Code
b) Abstract Syntax tree (AST)
c) Symbol Table
d) Control Flow Graph (CFG)