Chapter 1 Overview of Compilation
Chapter 1 Overview of Compilation
1. Lexical Analysis: In this step the source program is read and then it is broken into stream
of strings. Such strings are called tokens. Hence tokens are nothing but the collection of
characters having some meaning.
2. Syntax Analysis: In this step the tokens are arranged in hierarchical structure that
ultimately helps in finding the syntax of the source string.
3. Semantic Analysis: In this step the meaning of the source string is determined.
In all these analysis steps the meaning of every source string should be unique. Hence actions
in lexical, syntax and semantic analysis are uniquely defined for a given language. After carrying
out the synthesis phase the program gets executed.
1.2.2 Execution of Program
To create an executable form of your source program only a compiler program is not sufficient.
You may require several other programs to create an executable target program. After a synthesis
phase a target code gets generated by the compiler. This target program generated by the compiler
is processed further before it can be run which is shown in Figure 1.4.
The compiler takes a source program written in high level language as an input and converts it
into a target assembly language. The assembler then takes this target assembly code as input and
produces a relocatable machine code as an output. Then a program loader is called for performing
the task of loading and linking. The task of loader is to perform the relocation of an object code.
Relocation of an object code means allocation of load time addresses which exist in the memory
and placement of load time addresses and data in memory at proper locations. The link editor links
the object modules and prepares a single module from several files of relocatable object modules
to resolve the mutual references. These files may be library files and these library files may be
referred by any program.
Properties of Compiler
When a compiler is built it should possess the following properties:
1. The compiler itself must be bug-free.
2. It must generate correct machine code.
1. Lexical Analysis
The lexical analysis is also called scanning. It is the first phase of compilation in which the
complete source code is scanned and your source program is broken up into group of strings
called token. A token is a sequence of characters having a collective meaning. For example, if
some assignment statement in your source code is as follows,
total = count + rate * 10
Then in lexical analysis phase this statement is broken up into series of tokens as follows:
1. the identifier total
2. the assignment symbol
3. the identifier count
4. the plus sign
5. the identifier rate
6. the multiplication sign
7. the constant number 10
2. Syntax Analysis
The syntax analysis is also called parsing. In this phase the tokens generated by the lexical
analyzer are grouped together to form a hierarchical structure. The syntax analysis determines the
structure of the source string by grouping the tokens together. The hierarchical structure generated
in this phase is called parse tree or syntax tree. For example, total = count + rate * 10 the parse
tree can be generated as follows:
rules are to be designed. The rules are usually expressed by context free grammar. For the above
statement the production rules are:
(1) E identifier
(2) E number
(3) E E1 + E2
(4) E E1 * E2
(5) E (E)
Where E stands for an expression
▪ By rule (1) count and rate are expression and
▪ By rule (2) 10 is also an expression
▪ By rule (4) we get rate*10 as expression
▪ And finally count + rate*10 is an expression
3. Semantic Analysis
Once the syntax is checked in the syntax analyzer phase the next phase i.e. the semantic analysis
determines the meaning of the source string. For example, meaning of source strings means
matching of parenthesis in the expression, or matching of if …else statements of performing
arithmetic operations of the expressions that are type compatible, or checking the scope of
operation. For example,
triple, posix. Here we will consider an intermediate code in three address code form. This is like
an assembly language. The three-address code consists of instructions each of which has at the
most three operands. For example,
t1 := int to float (10)
t2 := rate * t1
t3 := count + t2
total := t3
There are certain properties which should be possessed by the three-address code and those are,
1. Each three-address instruction has at the most one operator in addition to the assignment.
Thus, the compiler has to decide the order of the operations devised by the three-address
code.
2. The compiler must generate a temporary name to hold the value computed by each
instruction.
3. Some three address instructions may have fewer than three operands for examples first and
last instruction of above given three address code. i.e.
t1 := int to float (10)
total := t3
5. Code Optimization
The code optimization phase attempts to improve the intermediate code. This is necessary to
have a faster executing code or less consumption of memory. Thus, by optimizing the code the
overall running time of the target program can be improved.
6. Code Generation
In code generation phase the target code gets generated. The intermediate code instructions are
translated int sequence of machine instructions.
MOV rate , R1
MUL #10.0, R1
MOV count, R2
ADD R2, R1
MOV R1, total
Example below shows how an input a = b + c * 60 gets processed in compiler. Show the
output at each stage of compiler. Also show the contents of symbol table.
1. Preprocessor: the output preprocessors may be given as the input to compilers. The tasks
performed by the preprocessors are given as below.
Preprocessors allow user to use macros in the program. Macro means some set of instructions
which can be used repeatedly in the program. Thus, macro preprocessing task is be done by
preprocessors.
Preprocessor also allows user to include the header files which may be required by the program.
For example:
#include<iostream.h>
By this statement the header file iostream.h can be included and user can make use of the
functions defined in this header file. This task of preprocessor is called file inclusion.
MOV a, R1
MUL #5, R1
ADD #7, R1
MOV R1, b
The assembler converts these instructions in the binary language which can be understood by
the machine. Such a binary code is often called as machine code. This machine code is a
relocatable machine code that can be passed directly to the loader/linker for execution purpose.
The assembler converts the assembly program to low level machine language using two passes.
A pass means one complete scan of the input program. The end of second pass is the relocatable
machine code.
3. Loaders and Link Editors: Loader is a program which performs two functions: loading and
link editing. Loading is a process in which the relocatable machine code is read and the relocatable
address are altered. Then that code with altered instructions and data is placed in the memory at
proper locations. The job of link editor is to make a single program from several files of
relocatable machine code. If code in one file refers the location in another file then such a
reference is called external reference. The link editor resolves such external references also.
pass in the compilation process. A typical arrangement for optimizing compiler is one pass for
scanning and parsing, one pass for semantic analysis and third pass for code generation and target
code optimization. C and Pascal permit one pass compilation, Modula-2 requires two passes.
1.7 Types of Compiler
Incremental Compiler: is a compiler which preforms the recompilation of only modified
source rather then compiling the whole source program.
The basic features of incremental compiler are:
1. It tracks the dependencies between output and the source program.
2. It produces the same result as full recompile.
3. It performs less task than the recompilation.
4. The process of incremental compilation is effective for maintenance.
Cross Compiler: basically, there exists three types of languages:
1. Source language – the application program.
2. Target language – in which machine code is written.
3. The Implementation language – in which a compiler is written.
There may be a case that all these three languages are different. In other words, there may be a
compiler which run on one machine and produces the target code for another machine. Such a
compiler is called cross compiler. Thus, by using cross compilation technique platform
independency can be achieved.
To represent cross T diagram is draw as follows:
Figure 1.12 (a) T diagram with S as source, T as target and I as implementation language
1.8 Interpreter
An interpreter is a kind of translator which produces the result directly when the source language
and data is given to it as input. It does not produce the object code rather each time the program
needs execution. The model for interpreter is as shown in the following figure.
Languages such as BASIC, SNOBOL, LISP can be translated using interpreters. JAVA also
uses interpreter. The process of interpretation can be carried out in following phases.
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Direct execution
Advantages:
• Modification of user program can be easily made and implemented as execution proceeds.
• Type of object that denotes a variable may change dynamically.
• Debugging a program and finding errors is simplified task for a program used for
interpretation.
• The interpreter for the language makes it machine independent.
Disadvantages:
• The execution of the program is slower.
• Memory consumption is more.
1.9 Comparison of Compilers and Interpreters
In interpreter the analysis phase is same as compiler i.e. lexical, syntactic and semantic analysis
is also performed in interpreter. In compiler the object code needs to be generated whereas in
interpreter the object code need not be generated. During the process of interpretation the
interpreter exists along with the source program. Binding and type checking are dynamic in case
of interpreter.
The interpretation is less efficient than compilation. This is because the source program has to
be interpreted every time it is to be executed and every time it requires analysis of source program.
The interpreter can be made portal as they do not produce the object code which is not possible
in case of compiler. Interpreters save time in assembling and linking. Self modifying program
can be interpreted but cannot be compiled. Design of interpreter is simpler than the compiler.
Interpreters give us improved debugging environment as it can check the errors like out of bound
array indexing at run time. Interpreters are most useful in the environment where dynamic program
modification is required.
1.10 Compiler Construction Tools
Writing a compiler is a tedious and time-consuming task. There are some specialized tools for
helping in implementation of various phases of compilers. These tools are called compiler
construction tools. These tools are also called a compiler-compiler, compiler-generators, or
translator writing system. Various compiler construction tools are given as below: