Compiler Design
Compilers – Principles, Techniques and Tools
2nd Edition by Alfred V. Aho, Ravi Sethi, Jeffery D.
Dr. C.Ullman
Dhaya
Professor
Department of Computer Science & Engineering
Adhiparasakthi Engineering College
Melmaruvathur - 603319
What do we expect to achieve
by the end of the course?
Students are confident that they can use
language processing technology for
various software developments
Students are confident that they can
design, develop, understand,
modify/enhance and maintain
compilers
History and Milestones
Until 1952
Most programming in assembly language
1952
Grace Hopper writes first compiler for
the A-0 programming language
1957-58
John Backus and team writes first Fortran
compiler
Optimization was an integral component
of the compiler
About the Nature of Compiler
Algorithms
Makes practical application of
Greedy algorithms - register allocation
Heuristic search - list scheduling
Graph algorithms - dead code elimination, register
allocation
Dynamic programming - instruction selection
Optimization techniques - instruction scheduling
Finite automata - lexical analysis
Pushdown automata - parsing
Fixed point algorithms - data-flow analysis
Complex data structures - symbol tables, parse trees,
data
dependence graphs
Computer architecture - machine code generation
Unit – I
Introduction to Compilers
1 Translators – Compilers and Interpreters
2 Language Processors
3 Phases of a compiler
4 Errors Encountered in different Phases
5 Grouping of Phases
6 Compiler Construction tools
Programming Language
7
Basics
Translators
A translator is a computer program that performs the
translation of a program written in a one programming
language into a different programming language,
without losing the functional or logical structure of the original
code (ie. the "essence" of original program).
One Programming Other
Language Programming
(High Level language
Languages) (Low Level
TRANSLAT
Languages)
OR
Computer Languages
• Computer languages are the languages through
which user can communicate with the computer by
writing program instructions.
Types of Translators
Types of Translators –
contd…
Introduction to Compiling
2 mark
Compiler is a program that reads a program
written in one language (the source
language) and translates it into another
language (the target language).
First compiler : started 1950’s (took 18 yrs)
Source Target
COMPILER
Program Program
Error
Messag
es
Contd…
Source Program : Traditional Programming Languages (Fortran and
Pascal)
Target Program : Any another programming language (or) the machine
language
Contd…
Input
Source Target
COMPILER
Program Program
Error
Messag Output
es
Compiler Vs. Interpreter
Compiler Vs. Interpreter –
Contd…
Diff between Compiler & Interpreter(2
mark)
Language Processing system.
How actually the compilation process? (6 or 7 mark)
Skeletal source
program
Preprocessor
Source Program
Compiler
Target assembly
program
Assembl
er
Relocatable machine
code
Loader/
Linker Run time Libraries
Absolute machine
code
Language Processing system-
contd…
A computer program is divided into modules. Pre-
processor takes a module, and processes the skeletal
source program before its passed to the compiler. It adds
pre-processor directives and produces source code.
Compiler is a program that takes source program as
input and produces assembly language program as
output.
Assembler is a program that converts assembly
language program into machine language program. It
produces re-locatable machine code as its output.
Language Processing system-
contd…
The re-locatable machine code has to be linked
together with other re-locatable object files and
library files.
The loader puts together the entire executable
object files into memory for execution.
Loader
Memory
Language Processing system.-
contd…
Analysis – synthesis model of
Compilation
2 parts of compilation
COMPILER
Source Target
Program Analysi Synthe Program
s sis
Analysis Synthesis
(i)BreaksPhase
the source Phase
program into (i) Constructs the desired
constituent pieces and target program from
creates intermediate the intermediate
representation representation
(ii)Front end of the compiler (ii)Back end of the compiler
Analysis – synthesis model of a
compiler
Phases of a compiler (13 mark)
Analysis of the source
program
Analysis consists of 3 phases
Linear analysis
Hierarchical analysis
Semantic analysis
Linear analysis
Called as lexical analysis (or) SCANNING
Read left to right, stream of characters and
grouped into TOKENS
Token Sequence of characters having a collective
meaning, such as keywords, operators, identifiers
etc.
Lexeme Instance of the Token
Pattern Describes the rule that the lexeme takes
Example:
Position :=initial + rate * 60
Contd…
Stream of Tokens
characters Lexical Analysis
(O/P)
(I/P)
Token Template : <token-name, attribute-value>
Hence,
<id,1><=><id,2><+><id,3><*><60>
Syntax analysis
Hierarchical analysis (or) PARSING
Converts tokens produced by Lexical analyzer
into tree like representation called PARSE TREE.
Tokens Syntax Tree
Syntax Analysis
(I/P) (O/P)
Parse tree for position := initial + rate
* 60
Contd…
Characters grouped are recorded in a table –
Symbol table
Parse tree – describe the syntactic structure of
the input.
Syntax tree – compressed representation of parse
tree.
Semantic analysis
Semantic analysis checks for semantic
errors and gathers information for code
generation phase.
Type checking – checks each operator has 2
operands
Intermediate code
generation
Produces intermediate representation of the
source program
Properties of Intermediate code generation
Easy to produce
Easy to translate into target program
Various formats for representing Intermediate
code generation are
Postfixnotation
Three address code – most popular and most
widely used
Quadruple & Triple
Contd - Three address code
Sequence of instructions, each of which has at most 3
operands.
1. temp:= inttoreal(60)
2. temp2 := id3 * temp1
3. temp3 := id2 + temp2
4. id1 := temp3
Properties of three address code
Has at most 1 operator in addition to assignment
Generate temporary name to hold the intermediate
value
Some instructions, have operands < 3 (example 1 & 4)
Code optimization
Improve the intermediate code so that object
programs runs faster and takes lesser
space.
Optimization may be
Detection and removal of dead(unreachable)
code)
Calculation of constant expressions and terms
Loop unrolling
Moving codes outside loops
Removal of unnecessary temporary variables
Code generation
Final phase of the compiler
Once optimizations are completed, the
intermediate code is mapped into sequence of
machine instructions.
This involves
Allocation of registers and memory
Generation of correct references
Generation of correct types
Generation of machine code
Symbol table management
Symbol table : Used to store identifiers(variables)
used in the program
Information about attributes of each identifier.
Scope
Information about storage allocated
Example:
int a
char b
float c;
Why to use symbol table?
To track the record of each identifier and retrieve
quickly.
Error detecting and
handling
The errors identified, is reported to error handler
Handles error
Errors normally in the form of message
During,
Lexical analysis when they do not form
token
Syntactical analysis syntax errors
Semantic analysis type mismatch
Statement translation
Intermediate Code Generator
Non-optimized
Intermediate Code
Lexical Analyzer
Tokens
Code Optimizer
Syntax Analyzer Optimized Intermedia
Code
Syntax
tree
Code Generator
Semantic analyzer Target machine
code
Semantic Tree
F Floating point
number
1st operand source
nd
Assignment
Translate the statement for
p = i + r * 60
a = b + c + 50
si = p * n * r /100
Grouping of phases
Activities from more than one phase can be
grouped together.
Back end
Input Front end Intermediate
Lexical Parse Semant Code Code
Progra
Analyz r icAnaly code Optimize Generato
m er zer r r
output
Progra
Front and Back Ends m
Passes
Reducing the number of passes
Grouping of phases –
contd…
Front and Back Ends:
The front end phases depends more on source
language and is independent of the target
machine.
These normally include lexical and syntactic analysis,
semantic analysis and intermediate code generation.
In addition it includes symbol table management &
error handling.
The back end phases depends more on target
machine and is independent of the source
language
This includes code optimization and code generation.
In addition it includes symbol table management &
error handling.
Why grouping of phases is
advantageous?
Keeping same front end and attaching
different back ends (ie. Same source
program for different machines)
Keeping different front ends and same
back end.
Grouping of phases –
contd…
Passes:
One complete scan of source program - PASS
It includes reading an input file and writing an
output file.
Grouping of phases –
contd…
Reducing the number of passes:
Takes time to read and write intermediate files.
Grouping of several phases into one pass, may force the
entire program in memory, because one phase may
need information in a different order than previous
phase produces it.
Compiler requires
1st pass – scanning and parsing
2nd pass – semantic analysis
3rd pass – code generation and target code optimization
C, PASCAL 1 pass
MODULA -2 2 passes
Intermediate code and code generation are often
merged into one pass using a technique called
Differences between phases and pass
of the compiler
Phase
Output as another
Input as one
PHAS representation
representation
of
E without changing
source program
its meaning
Pass
Portionsof 1 or more phases combined
into a module called pass
Source program Intermediate file
(or) PASS which is used for next
output of previous pass
Compiler construction tools
Tools that help in the design of compilers
are
Parser generator
Scanner generator
Syntax directed translation engines
Automatic code generators
Data flow engines
Parser Generator
Input
(Syntax of
Parser Syntax
Programming Language
Generator analyzer
– based on Context free
grammar)
Scanner Generator
Input
(Specification of tokens Scanner Lexical
– based on Regular Generator analyzer
Expressions)
UNIX has LEX. LEX has various regular expressions for
representing tokens
Syntax directed translation
engine
Input Scan Using Intermedia
(Parse collection of te code
tree) routines
Translation is done for each node.
Automatic Code Generator
Each rule of intermediate
Each rule of intermediate
Input Automatic language into equivalent
(Intermediate code machine language
code) generator
Template matching technique is done. Intermediate code
statements are replaced by “templates”
Data flow engines
It is used to perform good code
optimization
It’s just gathering information about how
values are transmitted from one part of
program to another.
Types of compiler
Incremental compiler
Performsrecompilation of only modified
source rather than compiling the whole
program.
Cross compiler
Compiler
that runs on one machine and
produces target code for another
machine
Tools that perform Analysis
Structure Editors
Pretty Printers
Static Checkers
Interpreters
Text Formatters
Silicon Compilers
Query Interpreters
Structure Editors
Ordinary text editor - text creation and
modification functions
Structure Editor - Analyze the program text by
putting an appropriate hierarchical structure on
the source program
Structure Structure Build a
of Editor source
commands Program
Example:
While – check for matching do
Begin – end
Left parenthesis – right parenthesis
Pretty printer
A Pretty printer analyses a program and
prints it in such a way that the
structure of the program becomes
clearly visible.
Example
Comments in special font
Indentation proportional to the depth –
nesting of the program
Static checker
A static checker reads a program, analyses it and
attempts to discover potential bugs with out
running the program.
Static
Identify
Program Checker potential
(Read, analyze
without running) bugs
Example
Detect parts that never be executed
Variable defined but not used
Variable used but not defined
Interpreters
Instead of producing a target program as a
translation, an interpreter performs the
operations implied by the source program.
For example, for an assignment statement an
interpreter might build a tree and then carry
out the operations at the nodes as it “walks”
the tree.
:=
position + position := initial + rate * 60
initial *
rate 60
Text Formatters
Stream of Text
characters formatters
- Text
- Commands to include
paragraphs
- Figures
- Subscripts and superscripts
Silicon compilers
Source Circuit
language Silicon design in
Logical compilers appropriate
signals 0’s language
and 1’s
Query interpreters
Predicate Commands
containing Query
(helps in searching
relational and Interprete the database)
Boolean rs
expressions
Cousins of compiler
Source Preprocess COMPIL Post Target
Progra ing ER processi Progra
m ng m
To help the compilation process, various
translators used are,
Preprocessors
Assemblers Cousins of
Loaders compiler
Link editors
Preprocessor
Skelet Preprocess Source COMPIL Machine
al or progra ER code
Source m,
Progra
Functions
m performed by preprocessor are
Macro processing
Allow user to define macros
File inclusion
Header files can be included example : #include <global.h>
Rational preprocessor
Augment older languages with modern flow and data
structuring facilities
Language extensions
Add more capabilities to the language
Example: Database query language EQUAL is embedded in
C
Loaders
Taking relocatable machine code,
altering relocatable addresses with
proper locations.
Source COMPIL Relocatable Proper
Progra machine code Loaders Machine
ER
m code
Link editors
Several files of
Link Single
Relocatable
Editors Program
machine code
If files are used together, there may be external
references (one file refers to another file data)
The relocatable machine code retains all the
information in the symbol table.
Programming Language
Basics
The terminologies related with the programming
languages are discussed here. The concepts
related with the language like C, C++, C#, Java
are discussed here.
Static/Dynamic Decision Policy
Variable scope
Mapping of Names to Values
Scope of the Data Members
Block Structure
Access Specifiers
Dynamic Scope
Static/Dynamic Decision
Policy
During designing a compiler, a policy
should be followed to decide on issues
ie. following static policy or dynamic
policy.
If languages uses Static policy then
issues are decided at compile time
If languages uses Dynamic policy then
issues are decided at run time
Variable scope
The scope of the variable declaration ”d”
is the region of the program in which
the use of “d” is referred.
Static scope scope is determined at
compile time(C & Java)
Dynamic scope scope is determined at
run time
Mapping of Names to
Values
How a change in the program affect the
values of data elements in the program.
Environment: mapping from (variable)
names to store addresses
State: mapping from addresses to their
values
Both mappings can be static or dynamic
Mapping of Names to Values –
Contd…
Block Structure
Block is the sequence of variable declarations and
member function definitions.
In C, C++ language block is represented by “{“ and
“}”
In Algol block is represented by the keywords ‘begin’
and ‘end’.
#include<iostream.h>
#include<conio.h>
void main()
{
private int a,b; Block
void display() B1
{
cout<<“No Pain; No Gain”;
} Block
} B2
Access Specifiers
The access specifiers are used to include the
scope of the variables, in and out of the
class.
In C++ and Java, access specifiers are public,
private & protected.
Class Apple
{
private:
int a,b;
public:
int c;
protected:
int d;
}
Dynamic Scope
If the scope of the variables are decided at run
time, then its called as Dynamic Scope.
The Dynamic scope of the variable is decided
by parameter passing mechanisms
Parameters are of two types
ActualParameters
Formal Parameters
UNIT – I Completed