Learn Compiler Design: From B. K.
Sharma
UNIT I
Structure of Compiler
Learn Compiler Design: From B. K. Sharma
Unit I: Syllabus
• Introduction to Compiler
• Structure of a compiler
• Lexical Analysis
• Role of Lexical Analyzer
• Input Buffering
• Specification of Tokens
• Recognition of Tokens
Learn Compiler Design: From B. K. Sharma
Unit I: Syllabus
• Lex
• Finite Automata
• Regular
• Expressions to Automata
• Minimizing DFA.
Learn Compiler Design: From B. K. Sharma
Summary of previous Lesson 1:
1:A compiler reads a program written in the source
language and translate it into an equivalent program in
target language.
2: Examples of source languages are High-level
programming languages (e.g., C, C++, Java), Query
Languages (SQL)
3: Examples of target languages are machine language,
assembly language, Intermediate language (Byte Code,
MSIL)
4: Three Types of Target Machine Code are Pure Machine
Code, Augmented Machine Code, Virtual Machine Code.
5: Different translators are Assemblers, compilers,
Interpreter pre-processor, Editor, profiler
Learn Compiler Design: From B. K. Sharma
Mapping of Lesson with Course Outcome
(CO)
Lesson CO
Lesson 2: Structure of Apply the knowledge of
Compiler theory of computation in
specifying and
recognizing tokens
Learn Compiler Design: From B. K. Sharma
Lesson 2: Phases of Compiler: Learning
Outcomes
At the end of this lesson, students will be able to
1: Define structure and phase of a compiler.
2: List different phases of Compiler.
3: List outputs of different phases of complier.
4 : Explain briefly the working of different phases
of compiler with examples.
5 : Differentiate between Syntax Tree and Parse
Tree
Learn Compiler Design: From B. K. Sharma
The Structure of Compiler
The compiler's structure is modular
consisting of following 8 components:
Learn Compiler Design: From B. K. Sharma
The Translation Process: Phases of Compiler
What is Phase of a compiler?
A phase of a compiler is a
distinguishable stage, which takes input
from the previous stage, processes and
produces output that can be used as
input for the next stage.
Learn Compiler Design: From B. K. Sharma
The Translation Process: Phases of Compiler
Source Program
1. Lexical Analysis
2. Syntax Analysis
3. Semantic Analysis
4. IR Code Gen
5. Code Optimization
6. Target Code Generation
Target Program
Learn Compiler Design: From B. K. Sharma
The Translation Process: Phases of Compiler
Source Program
1. Lexical Analysis(Scanner)
2. Syntax Analysis(Parser)
3. Semantic Analysis(Semantic Analyzer)
4. IR Code Gen
5. Code Optimization
6. Target Code Generation
Target Program
Learn Compiler Design: From B. K. Sharma
The Translation Process: Phases of Compiler
Source Program
1. Lexical Analysis(Scanner)
Tokens
2. Syntax Analysis(Parser)
Syntax Tree
3. Semantic Analysis(Semantic Analyzer)
Annotated Tree
4. IR Code Gen
Intermediate Code
5. Code Optimization
Optimized Code
6. Target Code Generation
Target Code
Target Program
Learn Compiler Design: From B. K. Sharma
The Translation Process: Phases of Compiler
Source Program pay := base + rate * 60
1. Lexical Analysis id1 := id2 + id3 * 60
:=
2. Syntax Analysis +
id1 *
id2
id3 60
3. Semantic Analysis :=
+
id1 *
4. IR Code Gen temp1 = inttoreal(60)
id2
temp2 = id3* temp1 id3 inttoreal
5. Code Optimization temp3 = id2+ temp2
id1 = temp3 60
temp1 = id3* 60.0
6. Target Code Generation id1 = id2+temp1
Target Program movf id3, fr2; movf id2, fr1
mulf #60.0, fr2; addf fr2,fr1
movf fr1, id1
Learn Compiler Design: From B. K. Sharma
The Analysis-synthesis model of compilation
A Compiler performs two major tasks:
Analysis of the source program
Synthesis of the target instruction code
Analysis:
This part breaks up the source program into pieces
for Lexical, syntax and semantic analysis, e.g.
parsing.
Synthesis:
This part constructs the desired target program, e.g.
code generation
Learn Compiler Design: From B. K. Sharma
The Analysis-synthesis model of compilation
Source Program
1. Lexical Analysis
Analysis of
Source program 2. Syntax Analysis
3. Semantic Analysis
4. IR Code Gen
Synthesis of
Target instruction 5. Code Optimization
code
6. Target Code Generation
Target Program
Learn Compiler Design: From B. K. Sharma
Front-End and Back-End Model of Compiler
Source Front IR Back Machine
code End End code
Errors
Learn Compiler Design: From B. K. Sharma
Front-End and Back-End Model of Compiler
Source Program
1. Lexical Analysis
Front-End
Machine 2. Syntax Analysis
Independent
3. Semantic Analysis
4. IR Code Gen
Back-End 5. Code Optimization
Machine
Dependent 6. Target Code Generation
Target Program
Learn Compiler Design: From B. K. Sharma
Front-End and Back-End Model of Compiler
Front-End: Analysis (Machine Independent)
Consist of those phases that depend on the source
language but largely independent of the target
machine.
Back –End: Synthesis(Machine Dependent)
Consist of those phases that are usually target
machine dependent such as optimization and code
generation.
Learn Compiler Design: From B. K. Sharma
Front-End and Back-End Model of Compiler
Front-End:
Recognize legal/illegal programs
Report / handle errors
Generate IR
Back –End:
Translate IR into target code
instruction selection (Better Code selection)
register allocation(Limited registers)
instruction scheduling (Instruction-level parallelism)
Learn Compiler Design: From B. K. Sharma
The Translation Process: Phases of Compiler
Lexical Meaning
Relating to vocabulary (Words, phrases) of a
language.
Learn Compiler Design: From B. K. Sharma
What are Tokens?
Smallest individual units that are recognized.
Tokens represent basic program entities such as:
Identifiers, Literals, Reserved Words,
Operators, Delimiters, etc.
Example: a := x + y * 2.5 ; is scanned as
a Identifier y Identifier
:= Assignment Operator Multiplication Operator
*
x Identifier 2.5 Real Literal
+ Plus Operator ; Delimiter Semicolon
Learn Compiler Design: From B. K. Sharma
What is Literal?
A literal is a sequence of characters that stands
for itself, for example, 12.
Generally both terms constants and literals are
used interchangeably.
Learn Compiler Design: From B. K. Sharma
Lexical Analysis: Scanner
The scanner begins the analysis of the source
program by:
Reading file character by character.
Grouping characters into tokens.
Eliminating unneeded information (comments and
white space).
Entering preliminary information into literal or
symbol tables.
Processing compiler directives by setting flags.
Learn Compiler Design: From B. K. Sharma
Syntax Analysis: Parser
Receives tokens from the scanner.
Recognizes the structure of the program as
a parse tree.
Parse tree is recognized according to a
context-free grammar.
Syntax errors are reported if the program
is syntactically incorrect
Learn Compiler Design: From B. K. Sharma
Syntax Analysis: Parser
A parse tree is inefficient to represent the
structure of a program.
A syntax tree is a more condensed version
of the parse tree.
A syntax tree is usually generated as output
by the parser.
Learn Compiler Design: From B. K. Sharma
Syntax Tree and Parse Tree
a := x + y * 2.5 ;
assign-stmt→ <id> := <expr>
<expr> → <expr> + <expr>
CFG | <expr> * <expr>
| <id>
<id>→a|b|c|x|y|z| 2.5
Parse Tree Syntax Tree
Learn Compiler Design: From B. K. Sharma
Difference between Parse Tree and Syntax Tree
Parse Tree Syntax Tree
Interior Nodes are Non- Interior nodes are
Terminals, leaves are operators, leaves are
terminals operands
Represents the concrete Represents the abstract
syntax of the Program syntax of the program.
Learn Compiler Design: From B. K. Sharma
Semantic Analysis:
The semantic analyzer does the following:
Checks the static semantics of the language
Annotates the syntax tree with type
information
Learn Compiler Design: From B. K. Sharma
Semantic Analysis:
The semantics of a program are its meaning
as opposed to syntax or structure.
The semantics consist of:
Runtime semantics – behavior of program at
runtime
Static semantics – checked by the compiler
Learn Compiler Design: From B. K. Sharma
Semantic Analysis:
Static semantics include:
Declarations of variables and constants
before use
Calling functions that exist (predefined in a
library or defined by the user)
Passing parameters properly
Type checking.
Learn Compiler Design: From B. K. Sharma
Intermediate Code Generation
Intermediate code is something that is both
close to the final machine code and easy to
manipulate (for optimization).
Source Front IR Back Machine
code End End code
Errors
Intermediate Code Separates the compiler
front end from its backend.
Learn Compiler Design: From B. K. Sharma
Intermediate Code Generation
Intermediate representation should have 2
important properties:
Should be easy to produce
Should be easy to translate into the target
program
Learn Compiler Design: From B. K. Sharma
Intermediate Code Generation
Intermediate representation can have a
variety of forms:
Three-address code, P-code for an
abstract machine, or DAG representation
One example is the three-address code:
op1 = op2 op op3
Learn Compiler Design: From B. K. Sharma
Intermediate Code Generation
Three-address code for the assignment statement
temp1 := int2real(y)
temp2 := temp1 real * 2.5
temp3 := x real+ temp2
a := temp3
Learn Compiler Design: From B. K. Sharma
Code Optimization
produces better/semantically equivalent code.
temp1 := int2real(y)
Before
temp2 := temp1 real * 2.5 Optimization
temp3 := x real+ temp2
a := temp3
temp1 := y real * 2.5 After
a := x real +temp1 Optimization
Learn Compiler Design: From B. K. Sharma
The Translation Process: Phases of Compiler
Source Program pay := base + rate * 60
1. Lexical Analysis id1 := id2 + id3 * 60
:=
2. Syntax Analysis +
id1 *
id2
id3 60
3. Semantic Analysis :=
+
id1 *
4. IR Code Gen temp1 = inttoreal(60)
id2
temp2 = id3* temp1 id3 inttoreal
5. Code Optimization temp3 = id2+ temp2
id1 = temp3 60
temp1 = id3* 60.0
6. Target Code Generation id1 = id2+temp1
Target Program movf id3, fr2; movf id2, fr1
mulf #60.0, fr2; addf fr2,fr1
movf fr1, id1
Learn Compiler Design: From B. K. Sharma
Summary of Lesson 2: Structure of a Compiler:
The compiler's structure is modular consisting of following
8 components:
1: Lexical Analyzer takes as input a stream of characters and
produces as output a stream of words or phrases or tokens along
with their associated syntactic categories
2: Syntax Analyzer takes as input a stream of tokens and
recognizes the structure of tokens according to grammar of the
language producing parse tree or syntax tree as output
3: Semantic analyzer checks the static semantics of the
language and annotates the syntax tree with type information
4: Intermediate Code Generator produces code that is machine
independent which is portable code.
Learn Compiler Design: From B. K. Sharma
Summary Lesson 2: Structure of a Compiler:
The compiler's structure is modular consisting of following
8 components:
5: Code Optimizer produces better /semantically equivalent
code
6: Target Code Generator produces final target low-level code
which in assembly language or machine language.
7: Symbol Table is a data structure used to store information
about identifiers, their types, and other attributes.
8: Error Handler ensures the compiler can detect and
report errors to the programmers, aiding in code
correction and improvement.