Ambo University IOT Department of Computer Science

Chapter 1: Overview of Compilation


1.1 Introduction
Compilers are essentially translators. Designing a compiler for a language is a complex and
time-consuming process, although the tools now available for writing compilers have made it a
much more systematic activity. To study compilers it is necessary to understand what a compiler
is and how the process of compilation is carried out. In this chapter:
▪ we introduce the basic concepts of a compiler,
▪ we see how a source program is compiled with the help of the various phases of the
compiler, and
▪ lastly, we introduce various compiler construction tools.
1.2 Why Write a Compiler
A compiler is a program which takes one language (the source program) as input and translates
it into an equivalent program in another language (the target program). If errors are encountered
during the process of translation, the compiler reports them as error messages. The basic model
of a compiler can be represented as follows.

Figure 1.1 Compiler (the source program is given as input to the compiler, which produces the target program as output)


The compiler takes a source program written in a high-level language such as C, C++, Pascal,
FORTRAN or Java and converts it into a low-level language or a machine-level language such
as assembly language.
1.2.1 Compiler: Analysis – Synthesis Model
Compilation is carried out in two parts: analysis and synthesis. In the analysis part the source
program is read and broken down into its constituent pieces. The syntax and the meaning of the
source string are determined, and an intermediate code is then created from the input source
program. In the synthesis part this intermediate form of the source program is taken and converted
into an equivalent target program. During this process, code may also be optimized where required
for efficient execution. The analysis and synthesis model is as shown in Figure 1.2.

The analysis part is carried out in three sub-parts:

Figure 1.3 Analysis Part

1. Lexical Analysis: In this step the source program is read and broken into a stream of
strings called tokens. A token is a collection of characters that has some collective meaning.
2. Syntax Analysis: In this step the tokens are arranged in a hierarchical structure that
ultimately helps in determining the syntax of the source string.
3. Semantic Analysis: In this step the meaning of the source string is determined.
In all these analysis steps the meaning of every source string should be unique. Hence the actions
in lexical, syntax and semantic analysis are uniquely defined for a given language. After the
synthesis phase is carried out, the program can be executed.
1.2.2 Execution of Program
A compiler alone is not sufficient to create an executable form of your source program; several
other programs are required to create an executable target program. After the synthesis phase,
the compiler generates target code. This target code is processed further before it can be run,
as shown in Figure 1.4.

The compiler takes a source program written in a high-level language as input and converts it
into target assembly language. The assembler then takes this target assembly code as input and
produces relocatable machine code as output. A program loader is then called to perform the
tasks of loading and linking. The task of the loader is to perform the relocation of the object code.
Relocation of object code means assigning load-time memory addresses and placing the code
and data at the proper locations in memory. The link editor links the object modules and prepares
a single module from several files of relocatable object modules, resolving the mutual references
among them. These files may be library files, and such library files may be referred to by any
program.
Properties of Compiler
When a compiler is built it should possess the following properties:
1. The compiler itself must be bug-free.
2. It must generate correct machine code.

3. The generated machine code must run fast.
4. The compiler itself must run fast (compilation time should be proportional to program size).
5. The compiler must be portable (i.e. modular, supporting separate compilation).
6. It must give good diagnostics and error messages.
7. The generated code must work well with existing debuggers.
8. It must have consistent optimization.
1.3 Phases of Compiler
A compiler operates in phases. A phase is a logically interrelated operation that takes the source
program in one representation and produces output in another representation. Compilation has
two broad parts:
a. Analysis (machine independent / language dependent)
b. Synthesis (machine dependent / language independent)
The compilation process is partitioned into a number of sub-processes called phases. The phases
of a compiler are depicted in Figure 1.5.

Figure 1.5 Phases of Compiler

1. Lexical Analysis
Lexical analysis is also called scanning. It is the first phase of compilation, in which the
complete source code is scanned and the source program is broken up into a stream of tokens.
A token is a sequence of characters having a collective meaning. For example, if an assignment
statement in your source code is
total = count + rate * 10
then in the lexical analysis phase this statement is broken up into the following series of tokens:
1. the identifier total
2. the assignment symbol
3. the identifier count
4. the plus sign
5. the identifier rate
6. the multiplication sign
7. the constant number 10
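As an illustration, the following is a minimal sketch of how such a scanner could be written (in Python, with hypothetical token names and patterns; real lexical analyzers are usually generated from similar regular-expression specifications):

import re

# Hypothetical token specification: (token name, regular expression) pairs.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT",  r"[A-Za-z_][A-Za-z_0-9]*"),
    ("ASSIGN", r"="),
    ("PLUS",   r"\+"),
    ("STAR",   r"\*"),
    ("SKIP",   r"[ \t]+"),
]
MASTER_RE = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))

def tokenize(source):
    """Return the list of (token name, lexeme) pairs for a source string."""
    tokens = []
    for m in MASTER_RE.finditer(source):
        if m.lastgroup != "SKIP":          # ignore whitespace
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize("total = count + rate * 10"))
# [('IDENT', 'total'), ('ASSIGN', '='), ('IDENT', 'count'),
#  ('PLUS', '+'), ('IDENT', 'rate'), ('STAR', '*'), ('NUMBER', '10')]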
2. Syntax Analysis
Syntax analysis is also called parsing. In this phase the tokens generated by the lexical
analyzer are grouped together to form a hierarchical structure. The syntax analysis determines the
structure of the source string by grouping the tokens together. The hierarchical structure generated
in this phase is called a parse tree or syntax tree. For example, for total = count + rate * 10 the
parse tree can be generated as follows:

Figure 1.6 Parse tree for total = count + rate * 10


In the statement ‘total = count + rate * 10’, rate * 10 is considered first, because in an
arithmetic expression the multiplication operation must be performed before the addition; the
addition operation is then considered. For building such a syntax tree, production
rules have to be designed. The rules are usually expressed by a context-free grammar. For the
above statement the production rules are:
(1) E → identifier
(2) E → number
(3) E → E1 + E2
(4) E → E1 * E2
(5) E → ( E )
where E stands for an expression.
▪ By rule (1), count and rate are expressions.
▪ By rule (2), 10 is also an expression.
▪ By rule (4), rate * 10 is an expression.
▪ Finally, by rule (3), count + rate * 10 is an expression.
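To make the hierarchy concrete, here is a minimal sketch of a hand-written recursive-descent parser in Python (hypothetical code, not part of the original text). It uses the usual layered form of the expression grammar, so that * binds more tightly than +, and builds the syntax tree as nested tuples:

def parse_assignment(tokens):
    """Parse  identifier = expression  from a list of lexemes and
    return a nested tuple playing the role of the syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():                      # E -> T { + T }
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    def term():                      # T -> F { * F }
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def factor():                    # F -> identifier | number
        return advance()

    target = advance()               # left-hand-side identifier
    assert advance() == "="          # assignment symbol
    return ("=", target, expr())

print(parse_assignment(["total", "=", "count", "+", "rate", "*", "10"]))
# ('=', 'total', ('+', 'count', ('*', 'rate', '10')))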
3. Semantic Analysis
Once the syntax has been checked in the syntax analysis phase, the next phase, semantic
analysis, determines the meaning of the source string. Determining the meaning involves, for
example, matching the parentheses in an expression, matching if…else statements, checking that
arithmetic operations are performed on type-compatible expressions, or checking the scope of
operations. For example,

Figure 1.7 Semantic Analysis
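As a small illustration, the following sketch (Python; the symbol table of declared types is assumed for this example) walks the syntax tree built above, checks that the operands of each operator are type compatible, and inserts an inttofloat conversion where an integer constant is combined with floating-point operands, much as happens for the constant 10:

# Assumed declared types for the running example.
SYMBOL_TYPES = {"total": "float", "count": "float", "rate": "float"}

def check(node):
    """Return (type, tree) for a (sub)tree, inserting conversions and
    reporting errors.  A node is either a lexeme or a tuple
    (operator, left, right) as built by the parser sketch."""
    if isinstance(node, str):
        if node.isdigit():
            return "int", node
        if node not in SYMBOL_TYPES:
            raise NameError("undeclared identifier: " + node)
        return SYMBOL_TYPES[node], node
    op, left, right = node
    ltype, ltree = check(left)
    rtype, rtree = check(right)
    if ltype == "float" and rtype == "int":
        rtype, rtree = "float", ("inttofloat", rtree)    # implicit coercion
    elif ltype == "int" and rtype == "float":
        ltype, ltree = "float", ("inttofloat", ltree)
    if ltype != rtype:
        raise TypeError("type mismatch at operator " + op)
    return ltype, (op, ltree, rtree)

tree = ("=", "total", ("+", "count", ("*", "rate", "10")))
print(check(tree)[1])
# ('=', 'total', ('+', 'count', ('*', 'rate', ('inttofloat', '10'))))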


Thus, these three phases perform the task of analysis. After these phases an intermediate
code is generated.
4. Intermediate Code Generation
The intermediate code is a kind of code which is easy to generate and which can be easily
converted to target code. It comes in a variety of forms such as three-address code, quadruples,
triples and postfix notation. Here we will consider intermediate code in the three-address code
form, which is similar to an assembly language. Three-address code consists of instructions, each
of which has at most three operands. For example,
t1 := inttofloat(10)
t2 := rate * t1
t3 := count + t2
total := t3
There are certain properties which should be possessed by the three-address code, and those are:
1. Each three-address instruction has at most one operator in addition to the assignment.
Thus, the compiler has to decide the order in which the operations are to be carried out when
generating the three-address code.
2. The compiler must generate a temporary name to hold the value computed by each
instruction.
3. Some three-address instructions may have fewer than three operands, for example the first
and last instructions of the three-address code given above, i.e.
t1 := inttofloat(10)
total := t3
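The following sketch (Python, hypothetical helper names, continuing from the tree with the inttofloat node inserted) shows one simple way such instructions can be produced: a post-order walk of the syntax tree that creates a fresh temporary name for each operator:

temp_count = 0

def new_temp():
    """Generate a fresh temporary name t1, t2, ..."""
    global temp_count
    temp_count += 1
    return "t%d" % temp_count

def gen_tac(node, code):
    """Emit three-address instructions for a tree node and return the name
    (temporary, identifier or constant) that holds the node's value."""
    if isinstance(node, str):
        return node
    if node[0] == "inttofloat":
        t = new_temp()
        code.append("%s := inttofloat(%s)" % (t, gen_tac(node[1], code)))
        return t
    op, left, right = node
    l = gen_tac(left, code)
    r = gen_tac(right, code)
    if op == "=":                              # assignment needs no temporary
        code.append("%s := %s" % (l, r))
        return l
    t = new_temp()
    code.append("%s := %s %s %s" % (t, l, op, r))
    return t

code = []
tree = ("=", "total", ("+", "count", ("*", "rate", ("inttofloat", "10"))))
gen_tac(tree, code)
print("\n".join(code))
# t1 := inttofloat(10)
# t2 := rate * t1
# t3 := count + t2
# total := t3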
5. Code Optimization
The code optimization phase attempts to improve the intermediate code. This is done to obtain
code that executes faster or consumes less memory. Thus, by optimizing the code, the overall
running time of the target program can be improved.
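For the running example, the optimizer can, for instance, convert the constant 10 to 10.0 at compile time and eliminate temporaries that are used only once, so that the intermediate code might become (one possible result, shown here as a sketch):
t1 := rate * 10.0
total := count + t1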
6. Code Generation
In the code generation phase the target code is generated. The intermediate code instructions
are translated into a sequence of machine instructions. For the optimized code above, the target
code may look as follows:
MOV rate , R1
MUL #10.0, R1
MOV count, R2
ADD R2, R1
MOV R1, total
The example below shows how the input a = b + c * 60 is processed by the compiler, giving the
output at each stage of the compiler together with the contents of the symbol table.
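(The original figures for this example are not reproduced here; the following is a sketch of the expected outputs, assuming a, b and c are declared as floating-point identifiers.)

Lexical analysis produces the token stream:
a (identifier), = (assignment symbol), b (identifier), + (plus sign), c (identifier), * (multiplication sign), 60 (constant number)

Syntax analysis builds the syntax tree with = at the root: a on the left, and on the right the subtree for b + c * 60, in which c * 60 is grouped first because of operator precedence.

Semantic analysis checks the types; since c is a floating-point identifier and 60 is an integer constant, the constant is converted, giving c * inttofloat(60).

Intermediate code generation produces the three-address code:
t1 := inttofloat(60)
t2 := c * t1
t3 := b + t2
a := t3

Code optimization might reduce this to:
t1 := c * 60.0
a := b + t1

Code generation then produces target code such as:
MOV c, R1
MUL #60.0, R1
MOV b, R2
ADD R2, R1
MOV R1, a

The symbol table contains an entry for each identifier, for example:
1  a  float
2  b  float
3  c  float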


Symbol Table Management


▪ To support these phases of the compiler a symbol table is maintained. The task of the symbol
table is to store the identifiers (variables) used in the program.
▪ The symbol table also stores information about the attributes of each identifier. The attributes
of an identifier are usually its type, its scope, and information about the storage allocated for it.
▪ The symbol table also stores information about the subroutines used in the program. In the
case of a subroutine, the symbol table stores the name of the subroutine, the number of arguments
passed to it, the types of these arguments, the method of passing these arguments (call by value
or call by reference), and the return type, if any.
▪ Basically, the symbol table is a data structure used to store information about identifiers.
▪ The symbol table allows us to find the record for each identifier quickly and to store or
retrieve data from that record efficiently.
▪ During compilation the lexical analyzer detects each identifier and makes an entry for it in
the symbol table. However, the lexical analyzer cannot determine all the attributes of an identifier,
so the remaining attributes are entered by the later phases of the compiler.
▪ The various phases use the symbol table in various ways. For example, while doing semantic
analysis and intermediate code generation we need to know the types of the identifiers, and during
code generation the information about how much storage is allocated to each identifier is typically
used, as illustrated in the sketch below.
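A minimal sketch of such a data structure in Python (hypothetical attribute names; a production compiler would typically use a hash table with scope handling):

class SymbolTable:
    """Maps each identifier to a record of its attributes."""
    def __init__(self):
        self.entries = {}

    def insert(self, name):
        # Called by the lexical analyzer the first time a name is seen.
        if name not in self.entries:
            self.entries[name] = {"type": None, "scope": None, "storage": None}
        return self.entries[name]

    def set_attribute(self, name, attribute, value):
        # Later phases (semantic analysis, code generation) fill in attributes.
        self.entries[name][attribute] = value

    def lookup(self, name):
        return self.entries.get(name)

table = SymbolTable()
for ident in ("total", "count", "rate"):
    table.insert(ident)                        # entered during lexical analysis
table.set_attribute("rate", "type", "float")   # entered during semantic analysis
print(table.lookup("rate"))
# {'type': 'float', 'scope': None, 'storage': None}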
Error Detection and Handling
To err is human. As programs are written by human beings, they cannot be free from errors.
Each phase of compilation can detect errors. These errors must be reported to the error handler,
whose task is to handle them so that compilation can proceed. Normally, errors are reported in
the form of messages. When the characters from the input do not form a valid token, the lexical
analyzer reports an error. A large number of errors can be detected in the syntax analysis phase;
such errors are popularly called syntax errors. During semantic analysis, type-mismatch errors
are usually detected.

1.4 Grouping of Phases of Compiler


Front end and back end of compiler

Figure 1.8 Front end and back end model of compiler


The different phases of the compiler can be grouped together to form a front end and a back end.
The front end consists of those phases that are primarily dependent on the source language and
independent of the target language. The front end performs the analysis part; typically it includes
lexical analysis, syntax analysis, and semantic analysis. Some amount of code optimization can
also be done in the front end. The back end consists of those phases that are dependent upon the
target language and independent of the source language; it includes code generation and code
optimization. The front end and back end model of the compiler is very advantageous for the
following reasons (a small sketch is given after Figure 1.9):
1. By keeping the same front end and attaching different back ends, one can produce a compiler
for the same source language on different machines.
2. By keeping different front ends and the same back end, one can compile several different
languages on the same machine.

Figure 1.9 Role of front end and back end of compiler
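As a small sketch of this idea (Python, hypothetical function names, not the original author's code): the same front end produces intermediate code, and two interchangeable back ends turn that intermediate code into different target instruction sets.

def front_end(source):
    # Analysis only (sketch): assume it yields three-address code for
    # total = count + rate * 10 after constant conversion and optimization.
    return [("MUL", "t1", "rate", "10.0"),
            ("ADD", "t2", "count", "t1"),
            ("COPY", "total", "t2", None)]

def back_end_register(ir):
    """Hypothetical back end for a register machine."""
    out = []
    for op, dest, a, b in ir:
        if op == "COPY":
            out.append("MOV %s, %s" % (a, dest))
        else:
            out += ["MOV %s, R1" % a, "%s %s, R1" % (op, b), "MOV R1, %s" % dest]
    return out

def back_end_stack(ir):
    """Hypothetical back end for a stack machine."""
    out = []
    for op, dest, a, b in ir:
        if op == "COPY":
            out += ["PUSH %s" % a, "POP %s" % dest]
        else:
            out += ["PUSH %s" % a, "PUSH %s" % b, op, "POP %s" % dest]
    return out

ir = front_end("total = count + rate * 10")
print("\n".join(back_end_register(ir)))   # target code for the register machine
print("\n".join(back_end_stack(ir)))      # target code for the stack machine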


1.5 Cousins of the Compiler
Cousins of the compiler are the programs that form the context in which the compiler typically
operates. Such programs are the preprocessor, assembler, loader, and link editor. For example,
the output of a preprocessor may be given as input to the compiler.

1. Preprocessor: The output of a preprocessor may be given as input to the compiler. The tasks
performed by preprocessors are given below.
Preprocessors allow the user to use macros in the program. A macro is a set of instructions
which can be used repeatedly in the program. This macro-processing task is done by the
preprocessor.
The preprocessor also allows the user to include header files which may be required by the
program. For example:
#include<iostream.h>
With this statement the header file iostream.h is included and the user can make use of the
functions defined in this header file. This task of the preprocessor is called file inclusion.

Figure 1.10 Role of Preprocessor


Macro Preprocessor: A macro is a small set of instructions. Whenever a macro name is
encountered in a program, that name is replaced by the macro definition (i.e. the set of
instructions defining the corresponding macro).
For a macro there are two kinds of statements: macro definition and macro use. The macro
definition is introduced by a keyword such as "define" or "macro", followed by the name of the
macro. For example:
#define PI 3.14
Whenever "PI" is encountered in the program it is replaced by the value 3.14.
2. Assemblers: Some compilers produce assembly code as output, which is then given to an
assembler as input. The assembler is a kind of translator which takes an assembly program
as input and produces machine code as output.

Figure 1.11 Role of Assembler


Assembly code is a mnemonic version of machine code. Typical assembly
instructions are as given below:

MOV a, R1
MUL #5, R1
ADD #7, R1
MOV R1, b
The assembler converts these instructions into a binary form which can be understood by
the machine. Such binary code is often called machine code. The machine code produced is
relocatable machine code that can be passed directly to the loader/linker for execution.
The assembler converts the assembly program to low-level machine language using two passes.
A pass means one complete scan of the input program. The output of the second pass is the
relocatable machine code.
3. Loaders and Link Editors: The loader is a program which performs two functions: loading and
link editing. Loading is a process in which the relocatable machine code is read and the relocatable
addresses are altered; the code, with its altered instructions and data, is then placed in memory at
the proper locations. The job of the link editor is to make a single program from several files of
relocatable machine code. If code in one file refers to a location in another file, such a reference
is called an external reference. The link editor also resolves such external references.

1.6 Concept of Pass


One complete scan of the source program is called a pass. It includes reading an input file and
writing to an output file. Several phases can be grouped into one pass. It is difficult to compile the
source program in a single pass, because the program may contain forward references. It is
desirable to have relatively few passes, because it takes time to read and write intermediate files.
On the other hand, if we group several phases into one pass, we may be forced to keep the entire
program in memory, so the memory requirement may be large.
In the first pass the source program is scanned completely, and the generated output is an
easy-to-use form which is equivalent to the source program along with additional information
about storage allocation. It is possible to leave a blank slot for missing information and fill the
slot when the information becomes available. Hence there may be a requirement of more than one
pass in the compilation process. A typical arrangement for an optimizing compiler is one pass for
scanning and parsing, one pass for semantic analysis, and a third pass for code generation and
target code optimization. C and Pascal permit one-pass compilation; Modula-2 requires two passes.
1.7 Types of Compiler
Incremental Compiler: An incremental compiler performs recompilation of only the modified
source rather than compiling the whole source program.
The basic features of an incremental compiler are:
1. It tracks the dependencies between the output and the source program.
2. It produces the same result as a full recompilation.
3. It performs less work than a full recompilation.
4. The process of incremental compilation is effective for maintenance.
Cross Compiler: Basically, there are three kinds of languages involved in building a compiler:
1. The source language – the language of the application program.
2. The target language – the language in which the machine code is produced.
3. The implementation language – the language in which the compiler itself is written.
All three of these languages may be different. In other words, there may be a compiler which
runs on one machine and produces target code for another machine. Such a compiler is called a
cross compiler. Thus, by using the cross-compilation technique, platform independence can be
achieved.
To represent a cross compiler, a T diagram is drawn as follows:

Figure 1.12 (a) T diagram with S as source, T as target and I as implementation language

Figure 1.12 (b) cross compiler


For source language L, target language N is generated, and the compiler runs on machine M.

1.8 Interpreter
An interpreter is a kind of translator which produces the result directly when the source program
and its data are given to it as input. It does not produce object code; instead, the source program
is processed each time the program needs to be executed. The model of an interpreter is as shown
in the following figure.

Languages such as BASIC, SNOBOL and LISP can be translated using interpreters. JAVA also
uses an interpreter. The process of interpretation is carried out in the following phases:
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Direct execution
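As a small illustration of direct execution, the following sketch (Python; it reuses Python's own ast module for the lexical and syntax analysis, which is an assumption of this sketch rather than how a real interpreter for BASIC or LISP is built) analyzes an assignment statement and evaluates it immediately against an environment, without producing any object code:

import ast

def interpret(statement, env):
    """Directly execute a single assignment such as 'a = b + c * 60'."""
    tree = ast.parse(statement).body[0]            # lexical + syntax analysis
    assert isinstance(tree, ast.Assign)            # a crude semantic check

    def evaluate(node):                            # direct execution of the tree
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return env[node.id]
        if isinstance(node, ast.BinOp):
            ops = {ast.Add: lambda x, y: x + y, ast.Mult: lambda x, y: x * y}
            return ops[type(node.op)](evaluate(node.left), evaluate(node.right))
        raise ValueError("unsupported construct")

    env[tree.targets[0].id] = evaluate(tree.value)

env = {"b": 1.0, "c": 2.0}
interpret("a = b + c * 60", env)
print(env["a"])      # 121.0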
Advantages:
• Modifications to the user program can easily be made and take effect as execution proceeds.
• The type of object that a variable denotes may change dynamically.
• Debugging a program and finding errors is a simpler task for an interpreted program.
• The interpreter makes the language machine independent.
Disadvantages:
• The execution of the program is slower.
• Memory consumption is higher.
1.9 Comparison of Compilers and Interpreters
In an interpreter the analysis phase is the same as in a compiler, i.e. lexical, syntax and semantic
analysis are also performed by the interpreter. In a compiler, object code needs to be generated,
whereas in an interpreter no object code is generated. During the process of interpretation the
interpreter exists along with the source program. Binding and type checking are dynamic in the
case of an interpreter.
Interpretation is less efficient than compilation, because the source program has to be
interpreted every time it is to be executed, and every execution requires analysis of the source
program. Interpreters can be made portable, as they do not produce object code; this is not
possible in the case of a compiler. Interpreters save time in assembling and linking. A
self-modifying program can be interpreted but cannot be compiled. The design of an interpreter
is simpler than that of a compiler. Interpreters give us an improved debugging environment, since
errors such as out-of-bounds array indexing can be checked at run time. Interpreters are most
useful in environments where dynamic program modification is required.
1.10 Compiler Construction Tools
Writing a compiler is a tedious and time-consuming task. There are specialized tools that help
in implementing the various phases of a compiler. These tools are called compiler construction
tools; they are also called compiler-compilers, compiler generators, or translator writing
systems. Various compiler construction tools are given below:

1. Scanner generator: These generate lexical analyzers. The specification given to these
generators is in the form of regular expressions.
UNIX has a scanner-generator utility called LEX. The specification given to LEX
consists of regular expressions representing the various tokens.
2. Parser generator: These produce syntax analyzers. The specification given to these
generators is in the form of a context-free grammar. UNIX has a typical parser-generator
tool called YACC.
3. Syntax-directed translation engines: With this tool the parse tree is scanned
completely to generate intermediate code. The translation is done for each node of the
tree.
4. Automatic code generator: These generators take intermediate code as input
and convert each rule of the intermediate language into equivalent machine language.
A template-matching technique is used: the intermediate code statements are replaced by
templates that represent the corresponding sequences of machine instructions.
5. Data flow engines: Data-flow analysis is required to perform good code
optimization. Data flow engines are basically useful in code optimization.

1.11 Summary of Chapter One


• A compiler is a program which takes one language (the source program) as input and
translates it into an equivalent program in another language (the target program).
• Compilation can be done in two parts: analysis and synthesis.
• Various phases of compiler are:
o Lexical analysis
o Syntax analysis
o Semantic analysis
o Intermediate code generation
o Code optimization
o Code generation
• Along with these phases, symbol table management and error detection and handling are
carried out.
• The phases of a compiler are grouped into a front end and a back end.
• Cousins of compiler are:
o Preprocessor
o Assembler
o Loader
o Link Editor
• One complete scan of the source program is called a pass. It includes reading an input file
and writing to an output file.
• An interpreter is a kind of translator which produces the result directly when the source
program and its data are given to it as input.
• Various compiler construction tools include scanner generators (e.g. LEX), parser
generators (e.g. YACC), syntax-directed translation engines, automatic code generators, and
data flow engines.
• In this chapter we have discussed the fundamental concepts of compilers. We will discuss
each phase of the compiler in detail in the following chapters.

Compiler Design By Fikru T.(MSc.) and Dr. Velmurugan 2020
