[go: up one dir, main page]

0% found this document useful (0 votes)
13 views71 pages

Principles of Compiler Design - Unit I

Uploaded by

priyaranjana0911
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as ODP, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
13 views71 pages

Principles of Compiler Design - Unit I

Uploaded by

priyaranjana0911
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as ODP, PDF, TXT or read online on Scribd
You are on page 1/ 71

Compiler Design

Compilers – Principles, Techniques and Tools


2nd Edition by Alfred V. Aho, Ravi Sethi, Jeffery D.
Dr. C.Ullman
Dhaya
Professor
Department of Computer Science & Engineering
Adhiparasakthi Engineering College
Melmaruvathur - 603319
What do we expect to achieve
by the end of the course?
 Students are confident that they can use
language processing technology for
various software developments

 Students are confident that they can


design, develop, understand,
modify/enhance and maintain
compilers

History and Milestones
 Until 1952
 Most programming in assembly language
 1952
 Grace Hopper writes first compiler for
the A-0 programming language
 1957-58
 John Backus and team writes first Fortran
compiler
 Optimization was an integral component
of the compiler
About the Nature of Compiler
Algorithms
 Makes practical application of
 Greedy algorithms - register allocation
 Heuristic search - list scheduling
 Graph algorithms - dead code elimination, register
 allocation
 Dynamic programming - instruction selection
 Optimization techniques - instruction scheduling
 Finite automata - lexical analysis
 Pushdown automata - parsing
 Fixed point algorithms - data-flow analysis
 Complex data structures - symbol tables, parse trees,
data
 dependence graphs
 Computer architecture - machine code generation
Unit – I
Introduction to Compilers
1 Translators – Compilers and Interpreters

2 Language Processors

3 Phases of a compiler

4 Errors Encountered in different Phases

5 Grouping of Phases

6 Compiler Construction tools

Programming Language
7
Basics
Translators
 A translator is a computer program that performs the
translation of a program written in a one programming
language into a different programming language,
without losing the functional or logical structure of the original
code (ie. the "essence" of original program).

One Programming Other


Language Programming
(High Level language
Languages) (Low Level
TRANSLAT
Languages)
OR
Computer Languages
• Computer languages are the languages through
which user can communicate with the computer by
writing program instructions.
Types of Translators
Types of Translators –
contd…
Introduction to Compiling
2 mark

 Compiler is a program that reads a program


written in one language (the source
language) and translates it into another
language (the target language).
 First compiler : started 1950’s (took 18 yrs)


Source Target
COMPILER
Program Program

Error
Messag
es
Contd…

Source Program : Traditional Programming Languages (Fortran and


Pascal)
Target Program : Any another programming language (or) the machine
language
Contd…
Input

Source Target
COMPILER
Program Program

Error
Messag Output
es
Compiler Vs. Interpreter
Compiler Vs. Interpreter –
Contd…
Diff between Compiler & Interpreter(2
mark)
Language Processing system.
How actually the compilation process? (6 or 7 mark)

Skeletal source
program
Preprocessor

Source Program

Compiler
Target assembly
program
Assembl
er
Relocatable machine
code
Loader/
Linker Run time Libraries

Absolute machine
code
Language Processing system-
contd…
 A computer program is divided into modules. Pre-
processor takes a module, and processes the skeletal
source program before its passed to the compiler. It adds
pre-processor directives and produces source code.

 Compiler is a program that takes source program as


input and produces assembly language program as
output.
 Assembler is a program that converts assembly
language program into machine language program. It
produces re-locatable machine code as its output.

Language Processing system-
contd…
 The re-locatable machine code has to be linked
together with other re-locatable object files and
library files.
 The loader puts together the entire executable
object files into memory for execution.

Loader
Memory
Language Processing system.-
contd…
Analysis – synthesis model of
Compilation
 2 parts of compilation
COMPILER
Source Target
Program Analysi Synthe Program
s sis

Analysis Synthesis
(i)BreaksPhase
the source Phase
program into (i) Constructs the desired
constituent pieces and target program from
creates intermediate the intermediate
representation representation
(ii)Front end of the compiler (ii)Back end of the compiler
Analysis – synthesis model of a
compiler
Phases of a compiler (13 mark)
Analysis of the source
program
 Analysis consists of 3 phases
 Linear analysis
 Hierarchical analysis
 Semantic analysis


Linear analysis
 Called as lexical analysis (or) SCANNING
 Read left to right, stream of characters and
grouped into TOKENS

Token  Sequence of characters having a collective


meaning, such as keywords, operators, identifiers
etc.

Lexeme  Instance of the Token

Pattern  Describes the rule that the lexeme takes


Example:
Position :=initial + rate * 60


Contd…

Stream of Tokens
characters Lexical Analysis
(O/P)
(I/P)

 Token Template : <token-name, attribute-value>


 Hence,
<id,1><=><id,2><+><id,3><*><60>
Syntax analysis
 Hierarchical analysis (or) PARSING
 Converts tokens produced by Lexical analyzer
into tree like representation called PARSE TREE.

Tokens Syntax Tree


Syntax Analysis
(I/P) (O/P)
Parse tree for position := initial + rate
* 60
Contd…
 Characters grouped are recorded in a table –
Symbol table

 Parse tree – describe the syntactic structure of


the input.
 Syntax tree – compressed representation of parse
tree.

Semantic analysis
 Semantic analysis checks for semantic
errors and gathers information for code
generation phase.
 Type checking – checks each operator has 2
operands


Intermediate code
generation
 Produces intermediate representation of the
source program
 Properties of Intermediate code generation
 Easy to produce
 Easy to translate into target program

 Various formats for representing Intermediate


code generation are
 Postfixnotation
 Three address code – most popular and most
widely used
 Quadruple & Triple


Contd - Three address code
 Sequence of instructions, each of which has at most 3
operands.
1. temp:= inttoreal(60)
2. temp2 := id3 * temp1
3. temp3 := id2 + temp2
4. id1 := temp3

 Properties of three address code


 Has at most 1 operator in addition to assignment

 Generate temporary name to hold the intermediate

value
 Some instructions, have operands < 3 (example 1 & 4)


Code optimization
 Improve the intermediate code so that object
programs runs faster and takes lesser
space.

 Optimization may be
 Detection and removal of dead(unreachable)
code)
 Calculation of constant expressions and terms
 Loop unrolling
 Moving codes outside loops
 Removal of unnecessary temporary variables


Code generation
 Final phase of the compiler
 Once optimizations are completed, the
intermediate code is mapped into sequence of
machine instructions.

 This involves
 Allocation of registers and memory
 Generation of correct references
 Generation of correct types
 Generation of machine code


Symbol table management
 Symbol table : Used to store identifiers(variables)
used in the program
 Information about attributes of each identifier.
 Scope
 Information about storage allocated

 Example:
 int a
 char b
 float c;

Why to use symbol table?


 To track the record of each identifier and retrieve

quickly.
Error detecting and
handling
 The errors identified, is reported to error handler

 Handles error

 Errors  normally in the form of message


 During,
 Lexical analysis  when they do not form
token
 Syntactical analysis  syntax errors
 Semantic analysis  type mismatch
Statement translation
Intermediate Code Generator
Non-optimized
Intermediate Code
Lexical Analyzer

Tokens

Code Optimizer
Syntax Analyzer Optimized Intermedia
Code
Syntax
tree

Code Generator

Semantic analyzer Target machine


code
Semantic Tree

F  Floating point
number
1st operand  source
nd
Assignment
 Translate the statement for
 p = i + r * 60
 a = b + c + 50
 si = p * n * r /100
Grouping of phases
 Activities from more than one phase can be
grouped together.

 Back end
Input Front end Intermediate
Lexical Parse Semant Code Code
Progra
 Analyz r icAnaly code Optimize Generato
m er zer r r

 output
Progra
 Front and Back Ends m
 Passes
 Reducing the number of passes

Grouping of phases –
contd…
Front and Back Ends:
 The front end phases depends more on source

language and is independent of the target


machine.
 These normally include lexical and syntactic analysis,
semantic analysis and intermediate code generation.
 In addition it includes symbol table management &
error handling.

 The back end phases depends more on target


machine and is independent of the source
language
 This includes code optimization and code generation.
 In addition it includes symbol table management &
error handling.
Why grouping of phases is
advantageous?
 Keeping same front end and attaching
different back ends (ie. Same source
program for different machines)
 Keeping different front ends and same
back end.
Grouping of phases –
contd…
Passes:
 One complete scan of source program - PASS
 It includes reading an input file and writing an
output file.

Grouping of phases –
contd…
Reducing the number of passes:
 Takes time to read and write intermediate files.
 Grouping of several phases into one pass, may force the
entire program in memory, because one phase may
need information in a different order than previous
phase produces it.
 Compiler requires
 1st pass – scanning and parsing
 2nd pass – semantic analysis
 3rd pass – code generation and target code optimization
 C, PASCAL  1 pass
 MODULA -2  2 passes
 Intermediate code and code generation are often
merged into one pass using a technique called
Differences between phases and pass
of the compiler

 Phase
 Output as another
Input as one
PHAS representation
representation
 of
E without changing
source program
 its meaning

 Pass
 Portionsof 1 or more phases combined
into a module called pass

Source program Intermediate file
(or) PASS which is used for next

output of previous pass
Compiler construction tools
 Tools that help in the design of compilers
are
 Parser generator
 Scanner generator
 Syntax directed translation engines
 Automatic code generators
 Data flow engines


Parser Generator
Input
(Syntax of
Parser Syntax
Programming Language
Generator analyzer
– based on Context free
grammar)

Scanner Generator
Input
(Specification of tokens Scanner Lexical
– based on Regular Generator analyzer
Expressions)

 UNIX has LEX. LEX has various regular expressions for


representing tokens
Syntax directed translation
engine
Input Scan Using Intermedia
(Parse collection of te code
tree) routines
 Translation is done for each node.

Automatic Code Generator


Each rule of intermediate
Each rule of intermediate
Input Automatic language into equivalent
(Intermediate code machine language
code) generator
 Template matching technique is done. Intermediate code
statements are replaced by “templates”

Data flow engines
 It is used to perform good code
optimization
 It’s just gathering information about how
values are transmitted from one part of
program to another.
Types of compiler
 Incremental compiler
 Performsrecompilation of only modified
source rather than compiling the whole
program.
 Cross compiler
 Compiler
that runs on one machine and
produces target code for another
machine

Tools that perform Analysis
 Structure Editors
 Pretty Printers
 Static Checkers
 Interpreters
 Text Formatters
 Silicon Compilers
 Query Interpreters
Structure Editors

 Ordinary text editor - text creation and


modification functions
 Structure Editor - Analyze the program text by
putting an appropriate hierarchical structure on
the source program
 Structure Structure Build a

of Editor source
commands Program

 Example:
 While – check for matching do
 Begin – end
 Left parenthesis – right parenthesis


Pretty printer
 A Pretty printer analyses a program and
prints it in such a way that the
structure of the program becomes
clearly visible.

 Example
 Comments in special font
 Indentation proportional to the depth –
nesting of the program

Static checker
 A static checker reads a program, analyses it and
attempts to discover potential bugs with out
running the program.

Static
Identify
 Program Checker potential
(Read, analyze
 without running) bugs

 Example
 Detect parts that never be executed
 Variable defined but not used
 Variable used but not defined
Interpreters
 Instead of producing a target program as a
translation, an interpreter performs the
operations implied by the source program.
 For example, for an assignment statement an
interpreter might build a tree and then carry
out the operations at the nodes as it “walks”
the tree.
:=

position + position := initial + rate * 60

initial *

rate 60
Text Formatters

Stream of Text
characters formatters

- Text
- Commands to include
paragraphs
- Figures
- Subscripts and superscripts
Silicon compilers

Source Circuit
language Silicon design in
Logical compilers appropriate
signals 0’s language
and 1’s
Query interpreters

Predicate Commands
containing Query
(helps in searching
relational and Interprete the database)
Boolean rs
expressions
Cousins of compiler

Source Preprocess COMPIL Post Target


Progra ing ER processi Progra
m ng m

 To help the compilation process, various


translators used are,
 Preprocessors
 Assemblers Cousins of
 Loaders compiler

 Link editors
Preprocessor
Skelet Preprocess Source COMPIL Machine
al or progra ER code
Source m,
Progra
Functions
m performed by preprocessor are
 Macro processing
 Allow user to define macros
 File inclusion
 Header files can be included example : #include <global.h>
 Rational preprocessor
 Augment older languages with modern flow and data
structuring facilities
 Language extensions
 Add more capabilities to the language
 Example: Database query language EQUAL is embedded in
C

Loaders
 Taking relocatable machine code,
altering relocatable addresses with
proper locations.
Source COMPIL Relocatable Proper
Progra machine code Loaders Machine
ER
m code
Link editors

Several files of
Link Single
Relocatable
Editors Program
machine code

 If files are used together, there may be external


references (one file refers to another file data)
 The relocatable machine code retains all the
information in the symbol table.
Programming Language
Basics
 The terminologies related with the programming
languages are discussed here. The concepts
related with the language like C, C++, C#, Java
are discussed here.
 Static/Dynamic Decision Policy
 Variable scope
 Mapping of Names to Values
 Scope of the Data Members
 Block Structure
 Access Specifiers
 Dynamic Scope

Static/Dynamic Decision
Policy
 During designing a compiler, a policy
should be followed to decide on issues
ie. following static policy or dynamic
policy.

 If languages uses Static policy  then


issues are decided at compile time
 If languages uses Dynamic policy  then
issues are decided at run time
Variable scope
 The scope of the variable declaration ”d”
is the region of the program in which
the use of “d” is referred.
 Static scope  scope is determined at
compile time(C & Java)
 Dynamic scope  scope is determined at
run time


Mapping of Names to
Values
 How a change in the program affect the
values of data elements in the program.
 Environment: mapping from (variable)
names to store addresses
 State: mapping from addresses to their
values
 Both mappings can be static or dynamic
Mapping of Names to Values –
Contd…
Block Structure
 Block is the sequence of variable declarations and
member function definitions.
 In C, C++ language  block is represented by “{“ and
“}”
 In Algol  block is represented by the keywords ‘begin’
and ‘end’.
 #include<iostream.h>
#include<conio.h>
void main()
{
 private int a,b; Block
void display() B1
 {
cout<<“No Pain; No Gain”;
} Block
} B2
Access Specifiers
 The access specifiers are used to include the
scope of the variables, in and out of the
class.
 In C++ and Java, access specifiers are public,
private & protected.
Class Apple
{
private:
int a,b;
public:
int c;
protected:
int d;
}
Dynamic Scope
 If the scope of the variables are decided at run
time, then its called as Dynamic Scope.
 The Dynamic scope of the variable is decided
by parameter passing mechanisms

 Parameters are of two types


 ActualParameters
 Formal Parameters
UNIT – I Completed

You might also like