Additional Note CSC 409

The compilation process involves multiple phases that take the source code through lexical analysis, syntax analysis, semantic analysis, code optimization, and code generation. An intermediate representation is generated after semantic analysis to make the code easier to optimize and generate for different target machines. Common intermediate representations include triple code and quadruples that break expressions into sub-expressions.

Uploaded by

Sadiq

Compilation Process

The compilation process is a sequence of various phases. Each phase takes input
from its previous stage, has its own representation of source program, and feeds its
output to the next phase of the compiler. Let us understand the phases of a compiler.
Lexical Analysis

The first phase of the compiler, the lexical analyzer (or scanner), works as a text
scanner. It reads the source code as a stream of characters and groups them into
meaningful lexemes. The lexical analyzer represents each lexeme as a token of the form:
<token-name, attribute-value>
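A minimal sketch of such a scanner in Python. The token names (NUM, ID, OP) and the operator set are illustrative assumptions, not part of any particular compiler:

```python
import re

# Each entry pairs a token name with the pattern for its lexemes.
# Whitespace is matched but discarded rather than emitted as a token.
TOKEN_SPEC = [
    ("NUM", r"\d+"),
    ("ID",  r"[A-Za-z_]\w*"),
    ("OP",  r"[+\-*/=;]"),
    ("WS",  r"\s+"),
]

def tokenize(source):
    tokens = []
    pos = 0
    while pos < len(source):
        for name, pattern in TOKEN_SPEC:
            m = re.match(pattern, source[pos:])
            if m:
                if name != "WS":          # discard whitespace
                    tokens.append((name, m.group()))
                pos += m.end()
                break
        else:
            raise SyntaxError(f"illegal character {source[pos]!r}")
    return tokens
```

For example, `tokenize("a = b + c * d;")` yields the pairs `("ID", "a")`, `("OP", "=")`, `("ID", "b")`, and so on: each `<token-name, attribute-value>` pair of the form described above.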

Syntax Analysis

The next phase is called syntax analysis or parsing. It takes the tokens produced by
lexical analysis as input and generates a parse tree (or syntax tree). In this phase,
token arrangements are checked against the grammar of the source language, i.e., the
parser checks whether the expression made by the tokens is syntactically correct.
Parse Tree
A parse tree is a graphical depiction of a derivation. It is a convenient way to see how
strings are derived from the start symbol. The start symbol of the derivation becomes
the root of the parse tree. Let us see this by an example.
We take the left-most derivation of id + id * id (the tokenized form of a + b * c).
The left-most derivation is:
E → E * E
E → E + E * E
E → id + E * E
E → id + id * E
E → id + id * id

The parse tree is built step by step from this derivation: Step 1 applies E → E * E at
the root, and Steps 2–5 expand the leftmost non-terminal in turn until the sentential
form id + id * id is reached. (The figures showing the tree after each step are omitted.)

In a parse tree:

 All leaf nodes are terminals.
 All interior nodes are non-terminals.
 In-order traversal gives the original input string.
A parse tree also depicts the associativity and precedence of operators. The deepest
sub-tree is traversed first, so the operator in that sub-tree takes precedence over
operators that appear higher up in the tree.
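These properties can be illustrated with a tiny tree for id + id * id. The `Node` class is a hypothetical helper, not from the note; leaves are terminals, interior nodes are non-terminals, and walking the children left to right recovers the input string:

```python
# A parse tree for id + id * id, with the * sub-tree deeper than +,
# so the multiplication is evaluated first.
class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []   # no children => leaf (terminal)

def frontier(node):
    # Left-to-right traversal of the leaves reproduces the input string.
    if not node.children:
        return [node.label]
    out = []
    for child in node.children:
        out.extend(frontier(child))
    return out

# E -> E + E, where the right-hand E derives id * id
tree = Node("E", [
    Node("E", [Node("id")]),
    Node("+"),
    Node("E", [Node("E", [Node("id")]),
               Node("*"),
               Node("E", [Node("id")])]),
])
```

Here `frontier(tree)` returns the terminals in order, `["id", "+", "id", "*", "id"]`, matching the derived string.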

Ambiguity
A grammar G is said to be ambiguous if it has more than one parse tree (left or right
derivation) for at least one string.
Example
E → E + E
E → E – E
E → id

For the string id + id – id, the above grammar generates two parse trees: one grouping
the string as (id + id) – id and the other as id + (id – id).
A language is said to be inherently ambiguous if every grammar that generates it is
ambiguous. Ambiguity in a grammar is not good for compiler construction. No method
can detect and remove ambiguity automatically, but it can be removed either by re-
writing the whole grammar without ambiguity, or by setting and following associativity
and precedence constraints.

Associativity
If an operand has operators on both sides, the associativity of those operators decides
which side takes the operand. If the operation is left-associative, the operand is taken
by the operator on its left; if it is right-associative, the operator on its right takes it.
Example
Operations such as addition, subtraction, multiplication, and division are left-
associative. If the expression contains:
id op id op id

it will be evaluated as:


(id op id) op id

For example, (id + id) + id


Operations like Exponentiation are right associative, i.e., the order of evaluation in the
same expression will be:
id op (id op id)

For example, id ^ (id ^ id)
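The two groupings can be sketched as two folds over a list of operands. The helper names `fold_left` and `fold_right` are illustrative; the operands are chosen so the groupings give different results:

```python
# Left-associative grouping folds from the left: ((a op b) op c).
def fold_left(op, operands):
    result = operands[0]
    for x in operands[1:]:
        result = op(result, x)
    return result

# Right-associative grouping folds from the right: (a op (b op c)),
# as with exponentiation.
def fold_right(op, operands):
    result = operands[-1]
    for x in reversed(operands[:-1]):
        result = op(x, result)
    return result
```

For subtraction, `fold_left` gives (10 - 4) - 3 = 3 while `fold_right` gives 10 - (4 - 3) = 9; for exponentiation, the right-associative reading 2 ^ (3 ^ 2) = 512 differs from (2 ^ 3) ^ 2 = 64.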


Precedence
If two different operators share a common operand, the precedence of operators
decides which will take the operand. That is, 2+3*4 can have two different parse trees,
one corresponding to (2+3)*4 and another corresponding to 2+(3*4). By setting
precedence among operators, this problem can be easily removed. As in the previous
example,
mathematically * (multiplication) has precedence over + (addition), so the expression
2+3*4 will always be interpreted as:
2 + (3 * 4)

These methods decrease the chances of ambiguity in a language or its grammar.
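One common way to encode precedence in a parser is precedence climbing: each operator has a binding power, and an operand is claimed by the operator with the higher power. This is a minimal sketch (the table and the in-place token list are assumptions for illustration):

```python
# Binding powers: * and / bind tighter than + and -.
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}

def evaluate(tokens, min_prec=1):
    # tokens is a flat list of numbers and operator strings, consumed
    # in place; higher-precedence operators recurse first, so
    # 2 + 3 * 4 groups as 2 + (3 * 4).
    lhs = tokens.pop(0)
    while tokens and tokens[0] in PREC and PREC[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        rhs = evaluate(tokens, PREC[op] + 1)
        if op == "+":
            lhs += rhs
        elif op == "-":
            lhs -= rhs
        elif op == "*":
            lhs *= rhs
        else:
            lhs /= rhs
    return lhs
```

With this scheme, `evaluate([2, "+", 3, "*", 4])` computes 2 + (3 * 4) = 14, and `evaluate([10, "-", 4, "-", 3])` groups left-associatively as (10 - 4) - 3 = 3.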

Semantic Analysis

Semantic analysis checks whether the parse tree follows the rules of the language. For
example, it verifies that values are assigned between compatible data types and flags
errors such as adding a string to an integer. The semantic analyzer also keeps track of
identifiers, their types, and expressions, and checks whether identifiers are declared
before use. It produces an annotated syntax tree as output.

Intermediate Code Generation

After semantic analysis, the compiler generates an intermediate representation of the
source code for the target machine. It represents a program for some abstract machine
and sits between the high-level language and the machine language. This intermediate
code should be generated in such a way that it is easy to translate into the target
machine code.

Code Optimization

The next phase performs code optimization on the intermediate code. Optimization can
be viewed as removing unnecessary code lines and arranging the sequence of
statements so as to speed up program execution without wasting resources (CPU,
memory).

Code Generation

In this phase, the code generator takes the optimized representation of the
intermediate code and maps it to the target machine language. The code generator
translates the intermediate code into a sequence of (generally) re-locatable machine
code. This sequence of machine instructions performs the same task as the
intermediate code.
Symbol Table

It is a data structure maintained throughout all the phases of a compiler. All identifier
names, along with their types, are stored here. The symbol table makes it easier for
the compiler to quickly search for an identifier record and retrieve it. The symbol
table is also used for scope management.

Intermediate Code


If source code can be translated directly into its target machine code, why do we need
to translate it into an intermediate code first? Let us see the reasons why an
intermediate code is needed.

 If a compiler translates the source language directly to the target machine language,
without the option of generating intermediate code, then a full native compiler is
required for each new machine.
 Intermediate code eliminates the need for a new full compiler for every unique machine
by keeping the analysis portion the same for all compilers.
 The second part of the compiler, synthesis, is changed according to the target machine.
 It becomes easier to improve code performance by applying code optimization
techniques to the intermediate code.

Intermediate Representation
Intermediate code can be represented in a variety of ways, each with its own benefits.
 High-Level IR - A high-level intermediate representation is very close to the source
language itself. It can be generated easily from the source code, and code modifications
to enhance performance are easy to apply. It is less preferred for target-machine
optimization, however.
 Low-Level IR - This one is close to the target machine, which makes it suitable for
register and memory allocation, instruction-set selection, etc. It is good for machine-
dependent optimizations.
Intermediate code can be either language specific (e.g., Byte Code for Java) or
language independent (three-address code).

Three-Address Code
The intermediate code generator receives input from its predecessor phase, the
semantic analyzer, in the form of an annotated syntax tree. That syntax tree can then
be converted into a linear representation, e.g., postfix notation. Intermediate code
tends to be machine-independent, so the generator assumes an unlimited number of
storage locations (registers) when generating code.
For example:
a = b + c * d;

The intermediate code generator will try to divide this expression into sub-expressions
and then generate the corresponding code.
r1 = c * d;
r2 = b + r1;
a = r2

where r1 and r2 are temporary registers used in the target program.
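The division into sub-expressions can be sketched as a recursive walk over a syntax tree. Here the tree is a nested tuple and `gen_tac` is a hypothetical helper; each operator node gets a fresh temporary, mirroring the r1, r2 sequence above:

```python
import itertools

def gen_tac(node, code, fresh):
    # A leaf is an identifier; an interior node is (op, left, right).
    if isinstance(node, str):
        return node
    op, left, right = node
    l = gen_tac(left, code, fresh)
    r = gen_tac(right, code, fresh)
    result = f"r{next(fresh)}"            # fresh temporary for this node
    code.append(f"{result} = {l} {op} {r}")
    return result

# a = b + c * d  as the tree  ("+", "b", ("*", "c", "d"))
code = []
result = gen_tac(("+", "b", ("*", "c", "d")), code, itertools.count(1))
code.append(f"a = {result}")
```

The deepest sub-expression c * d is emitted first, so `code` ends up as the three statements r1 = c * d, r2 = b + r1, a = r2.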


A three-address instruction has at most three operands (addresses) to calculate the
expression. Three-address code can be represented in three forms: quadruples, triples,
and indirect triples.

Quadruples

Each instruction in quadruple presentation is divided into four fields: op, arg1,
arg2, and result. The above example is represented below in quadruple format:

Op arg1 arg2 result

* c d r1

+ b r1 r2

= r2 a
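Quadruples for a = b + c * d can be sketched as a list of four-field records; `quad_to_str`, a hypothetical helper, turns each record back into the three-address statement it encodes:

```python
# (op, arg1, arg2, result); plain assignment has no arg2.
quads = [
    ("*", "c",  "d",  "r1"),
    ("+", "b",  "r1", "r2"),
    ("=", "r2", None, "a"),
]

def quad_to_str(op, arg1, arg2, result):
    # Render a quadruple as its three-address statement.
    if op == "=":
        return f"{result} = {arg1}"
    return f"{result} = {arg1} {op} {arg2}"
```

Because each result has an explicit name (r1, r2), the records can be reordered or moved by an optimizer without breaking references, unlike triples.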

Triples

Each instruction in triple presentation has three fields: op, arg1, and arg2. The
results of the respective sub-expressions are denoted by the positions of those
sub-expressions in the list. Triples are similar to a DAG and a syntax tree; they are
equivalent to a DAG when representing expressions.

Op arg1 arg2

* c d

+ b (0)

= a (1)

Triples face the problem of code immovability during optimization: because results are
referred to by position, changing the order or position of an expression invalidates
those references.
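Triples for the same statement a = b + c * d can be sketched as rows without a result field, where a sub-expression is referenced by its position, written "(0)", "(1)", and so on. The `resolve` helper is a hypothetical illustration that expands a triple back into an infix string by following those references:

```python
triples = [
    ("*", "c", "d"),      # (0)
    ("+", "b", "(0)"),    # (1) arg2 refers to triple 0
    ("=", "a", "(1)"),    # (2) assign the value of (1) to a
]

def resolve(triples, i):
    # Expand triple i into an infix string, following positional refs.
    op, a1, a2 = triples[i]
    def val(x):
        if x.startswith("("):
            return "(" + resolve(triples, int(x[1:-1])) + ")"
        return x
    if op == "=":
        return f"{val(a1)} = {val(a2)}"
    return f"{val(a1)} {op} {val(a2)}"
```

Resolving the last triple reconstructs the whole statement; but note that if the rows were reordered, the references "(0)" and "(1)" would point at the wrong rows, which is exactly the immovability problem described above.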

Indirect Triples

This representation is an enhancement over the triple representation. It uses pointers
to triples, instead of positions, to refer to results. This enables the optimizer to freely
re-position sub-expressions to produce optimized code.
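A minimal sketch of the idea, with list indices standing in for pointers: the triples themselves stay put, and a separate listing fixes the execution order, so the optimizer can rearrange the listing without touching the positional references inside the triples:

```python
triples = [("*", "c", "d"), ("+", "b", "(0)"), ("=", "a", "(1)")]

# The instruction listing holds pointers (indices) into the triple
# table; reordering the listing does not disturb the triples.
listing = [0, 1, 2]
program = [triples[i] for i in listing]
```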

Declarations
A variable or procedure has to be declared before it can be used. Declaration involves
allocation of space in memory and entry of type and name in the symbol table. A
program may be coded and designed keeping the target machine structure in mind, but
it may not always be possible to accurately convert a source code to its target
language.
Taking the whole program as a collection of procedures and sub-procedures, it
becomes possible to declare all names local to a procedure. Memory is allocated
consecutively, and names are assigned memory in the sequence they are declared in
the program. We use an offset variable, initialized to zero {offset = 0}, to denote the
relative address from the base address.
The source programming language and the target machine architecture may vary in
the way names are stored, so relative addressing is used. The first name is allocated
memory starting from location 0 {offset = 0}; each name declared after it is allocated
memory next to the previous one.
Example:
We take the example of the C programming language, assuming an implementation
where an integer variable occupies 2 bytes of memory and a float variable occupies 4
bytes.
int a;
float b;

Allocation process:
{offset = 0}

int a;
id.type = int
id.width = 2

offset = offset + id.width


{offset = 2}

float b;
id.type = float
id.width = 4

offset = offset + id.width


{offset = 6}

To enter this detail in a symbol table, a procedure enter can be used. This method may
have the following structure:
enter(name, type, offset)

This procedure should create an entry in the symbol table, for variable name, having its
type set to type and relative address offset in its data area.
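A minimal sketch of this enter procedure, assuming the widths from the example above (int = 2 bytes, float = 4 bytes) and a plain dictionary as the symbol table:

```python
WIDTH = {"int": 2, "float": 4}
symbol_table = {}

def enter(name, typ, offset):
    # Record the name with its type and relative address, then return
    # the offset where the next declaration will be placed.
    symbol_table[name] = {"type": typ, "offset": offset}
    return offset + WIDTH[typ]

offset = 0
offset = enter("a", "int", offset)    # a stored at offset 0
offset = enter("b", "float", offset)  # b stored at offset 2
```

After both declarations, a sits at relative address 0, b at 2, and offset is 6, matching the allocation trace above.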

ERROR Recovery

A parser should be able to detect and report any error in the program. When an error
is encountered, the parser should be able to handle it and carry on parsing the rest of
the input. Most error checking is expected of the parser, but errors may be
encountered at various stages of the compilation process. A program may have the
following kinds of errors at the various stages:
 Lexical : name of some identifier typed incorrectly
 Syntactical : missing semicolon or unbalanced parenthesis
 Semantical : incompatible value assignment
 Logical : code not reachable, infinite loop
There are four common error-recovery strategies that can be implemented in the
parser to deal with errors in the code.

Panic mode

When a parser encounters an error anywhere in a statement, it ignores the rest of the
statement, discarding input from the point of error up to a delimiter such as a semi-
colon. This is the easiest method of error recovery, and it also prevents the parser from
entering an infinite loop.
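The skip-to-delimiter step can be sketched as follows; the function name and the token list are illustrative assumptions:

```python
def skip_to_delimiter(tokens, pos, delimiter=";"):
    # Discard tokens from the error position up to and including the
    # synchronizing delimiter; return the index where parsing resumes.
    while pos < len(tokens) and tokens[pos] != delimiter:
        pos += 1
    return pos + 1
```

For example, with tokens `["x", "=", "@", "!", ";", "y"]` and the error detected at index 2, recovery skips to index 5, i.e., the token "y" that starts the next statement.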

Statement mode

When a parser encounters an error, it tries to take corrective measures so that the rest
of the statement can still be parsed: for example, inserting a missing semicolon or
replacing a comma with a semicolon. Parser designers must be careful here, because
one wrong correction may lead to an infinite loop.

Error productions

Some common errors that may occur in code are known to compiler designers. The
designers can create an augmented grammar with extra productions that generate
these erroneous constructs, so the errors are recognized when encountered.
