Unit 1 Introduction
Introduction to translators: Assembler, Compiler, Interpreter; Difference
between Compiler and Interpreter; Linker; Loader; one-pass compiler;
multi-pass compiler; cross compiler; the components of a compiler; stages of a
compiler: front end, back end; qualities of a good compiler
Translator
• A translator is a program that takes one form of program as input and
converts it into another form.
• Types of translators are:
1. Compiler
2. Interpreter
3. Assembler
[Figure: Source Program → Translator → Target Program, with Error Messages (if any) as a second output]
Compiler
• A compiler is a program that reads a program written in a source
language and translates it into an equivalent program in a target
language.
Compiler | Interpreter
Scans the entire program and translates it as a whole into machine code. | Translates the program one statement at a time.
It generates intermediate code. | It does not generate intermediate code.
An error is displayed after the entire program is checked. | An error, if any, is displayed for every instruction interpreted.
Memory requirement is more. | Memory requirement is less.
Example: C compiler | Example: Basic, Python, Ruby
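Language Processing System
• In addition to the compiler, many other system programs are required to
generate absolute machine code.
[Figure: Raw Source Program → Preprocessor → Modified Source Program → Compiler → Target Assembly Program → Assembler → Relocatable Object Code → Linker / Loader (with Libraries & Object Files) → Target Machine Code]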
Preprocessor
• Some of the tasks performed by the preprocessor:
1. Macro processing: Allows the user to define macros.
Ex: #define PI 3.14159265358979323846
2. File inclusion: A preprocessor may include a header file
into the program. Ex: #include<stdio.h>
3. Rational preprocessor: It provides built-in macros for
constructs like the while statement or the if statement.
4. Language extensions: Add capabilities to the language
by using built-in macros.
• Ex: the language Equel is a database query language
embedded in C. Statements beginning with ## are taken
by the preprocessor to be database access statements,
unrelated to C, and are translated into procedure calls on
routines that perform the database access.
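The macro processing step above can be sketched as a toy text substitution. This is only an illustration under simplifying assumptions: the function name expand_macros is hypothetical, only object-like macros (#define NAME value) are handled, and unlike a real C preprocessor it does not skip string literals or comments.

```python
import re

def expand_macros(source: str) -> str:
    """Toy model of macro processing for object-like #define macros."""
    macros = {}
    output = []
    for line in source.splitlines():
        match = re.match(r"\s*#define\s+(\w+)\s+(.*)", line)
        if match:
            # Record the macro and drop the directive from the output.
            macros[match.group(1)] = match.group(2).strip()
            continue
        for name, body in macros.items():
            # \b keeps PI from matching inside longer names such as PIXEL.
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, body=body: body, line)
        output.append(line)
    return "\n".join(output)

program = "#define PI 3.14159\narea = PI * r * r;\nPIXEL = 1;"
print(expand_macros(program))
```

The directive line disappears and only whole-word uses of PI are replaced, which is why PIXEL is left alone.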
Assembler
• An assembler is a translator which takes an assembly program (mnemonics)
as input and generates machine code as output.
Linker
• The linker makes a single program from several files of relocatable
machine code.
• These files may have been the result of several different compilations,
and one or more library files.
Loader
• The process of loading consists of:
• Taking relocatable machine code
• The linker and loader are essential components in the compilation
process.
• The linker is responsible for combining multiple object files generated
by the compiler into a single executable file or library. It resolves
symbols and references between different object files, ensuring that
gcc -c functions.c
• This will generate two object files: main.o and functions.o. The linker
comes into play when you want to create an executable from these object
files:
gcc main.o functions.o -o program
• Here, the linker (ld) combines both object files (main.o and
functions.o) to create an
• Loader: After linking, the resulting executable file needs to be loaded
into memory for execution. This is where the loader comes in. The loader
is responsible for allocating memory space for the
int y = 10;
perform multiple passes over the source code. In
earlier passes, they identify and record all identifiers
and symbols encountered. In subsequent passes,
they resolve these references by looking up their
definitions.
• Overall, handling forward references correctly is
crucial for ensuring proper linking and execution of
programs during compilation.
Pass structure
• Forward reference: A forward reference of a program entity is a
reference to the entity which precedes its definition in the program.
• This problem can be solved by postponing the generation of target code
until more information concerning the entity becomes available.
• This leads to the multi-pass model of compilation.
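A minimal sketch of how two passes handle a forward reference, assuming a made-up instruction list in which a label is written as name: and JMP refers to a label. The mnemonics and the function name resolve_forward_references are hypothetical, chosen only for illustration.

```python
def resolve_forward_references(lines):
    """Two-pass sketch: pass 1 records label addresses, pass 2 patches uses."""
    symbols = {}
    instructions = []
    # Pass 1: assign an address to every label definition ("name:").
    for line in lines:
        line = line.strip()
        if line.endswith(":"):
            symbols[line[:-1]] = len(instructions)
        else:
            instructions.append(line)
    # Pass 2: every label is known now, so forward jumps can be resolved.
    resolved = []
    for instruction in instructions:
        op, _, operand = instruction.partition(" ")
        if op == "JMP":
            operand = str(symbols[operand])
        resolved.append(f"{op} {operand}".strip())
    return symbols, resolved

program = ["JMP end", "ADD R1", "end:", "HALT"]
symbols, code = resolve_forward_references(program)
print(symbols)  # {'end': 2}
print(code)     # ['JMP 2', 'ADD R1', 'HALT']
```

The jump to end appears before end is defined, so a single pass could not emit its address; postponing code generation to the second pass removes the problem.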
Pass I:
[Figure: Phases of a compiler, from the lexical analyzer through target code generation (synthesis phase) to the target program, with the symbol table and error detection and recovery shared across the phases]
Lexical analysis
• Lexical analysis is also called linear analysis or scanning.
• The lexical analyzer divides the given source statement into tokens.
• Ex: Position = initial + rate*60 would be grouped into the following tokens:
• Position (identifier)
• = (assignment symbol)
• initial (identifier)
• + (plus symbol)
• rate (identifier)
• * (multiplication symbol)
• 60 (number)
• Output of lexical analysis: id1 = id2 + id3 * 60
Symbol table
Variable Name | Type | Address
Position | Float | 0001
Initial | Float | 0005
Rate | Float | 0009
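The scanning step for this statement can be sketched with regular expressions. This is a minimal illustration, not a full scanner: the token names and the tokenize function are hypothetical, and the symbol table here only maps each identifier to its index (types and addresses are left out).

```python
import re

# Token classes for the small language used in the example statement.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID", r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("TIMES", r"\*"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return (tokens, symbol_table); identifiers are renamed id1, id2, ..."""
    tokens = []
    symbol_table = {}
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        pos = match.end()
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "ID":
            # Enter the identifier in the symbol table the first time it is seen.
            index = symbol_table.setdefault(lexeme, len(symbol_table) + 1)
            lexeme = f"id{index}"
        tokens.append((kind, lexeme))
    return tokens, symbol_table

tokens, table = tokenize("Position = initial + rate*60")
print(" ".join(lexeme for _, lexeme in tokens))  # id1 = id2 + id3 * 60
print(table)  # {'Position': 1, 'initial': 2, 'rate': 3}
```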
Syntax analysis
• Syntax analysis is also called parsing or hierarchical analysis.
• The syntax analyzer checks whether the token stream follows the grammar
of the source language and reports any syntax errors.
• If the code is error free, the syntax analyzer generates the tree.
• Ex: for id1 = id2 + id3 * 60 the tree is:
        =
       / \
    id1   +
         / \
      id2   *
           / \
        id3   60
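One way to build such a tree is a small recursive-descent parser. The sketch below assumes a grammar with only assignment, + and *, takes the tokens as a plain list of strings, and represents tree nodes as nested tuples; the function name parse_assignment and the tuple layout are hypothetical choices for illustration.

```python
def parse_assignment(tokens):
    """Parse `id = expr` (expr uses + and *) into a nested-tuple syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return token

    def factor():
        token = advance()
        if token in ("=", "+", "*"):
            raise SyntaxError(f"operand expected, found {token!r}")
        return token

    def term():
        # '*' binds tighter than '+', so it is handled one level lower.
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = factor()
    if advance() != "=":
        raise SyntaxError("'=' expected after the target identifier")
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse_assignment("id1 = id2 + id3 * 60".split()))
# ('=', 'id1', ('+', 'id2', ('*', 'id3', '60')))
```

A statement such as id1 = + makes the parser raise SyntaxError, which corresponds to the syntax analyzer reporting an error instead of producing a tree.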
Semantic analysis
[Figure: syntax tree for Position = initial + rate*60 after semantic analysis]
Intermediate code generator
• Two important properties of intermediate code:
1. It should be easy to produce.
2. It should be easy to translate into the target program.
• Intermediate form can be represented using “three address code”.
• Three address code consists of a sequence of instructions, each of which
has at most three operands.
[Figure: syntax tree for id1 = id2 + id3 * inttoreal(60), with the temporaries t1, t2, t3 attached to its interior nodes]
Intermediate code:
t1 = inttoreal(60)
t2 = id3 * t1
t3 = t2 + id2
id1 = t3
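A minimal sketch of generating three address code from a syntax tree by a post-order walk. The nested-tuple tree format and the function name generate_tac are assumptions made for this illustration. The sketch emits id2 + t2 where the slide writes t2 + id2; the two are equivalent because addition is commutative.

```python
from itertools import count

def generate_tac(tree):
    """Emit three address code for a nested-tuple tree, children first."""
    code = []
    temps = count(1)

    def visit(node):
        if isinstance(node, str):
            return node  # a leaf: identifier or constant
        op, *args = node
        operands = [visit(arg) for arg in args]
        if op == "=":
            code.append(f"{operands[0]} = {operands[1]}")
            return operands[0]
        # Each interior node gets a fresh temporary to hold its result.
        temp = f"t{next(temps)}"
        if len(operands) == 1:
            code.append(f"{temp} = {op}({operands[0]})")
        else:
            code.append(f"{temp} = {operands[0]} {op} {operands[1]}")
        return temp

    visit(tree)
    return code

tree = ("=", "id1", ("+", "id2", ("*", "id3", ("inttoreal", "60"))))
for line in generate_tac(tree):
    print(line)
# t1 = inttoreal(60)
# t2 = id3 * t1
# t3 = id2 + t2
# id1 = t3
```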
Code optimization
• It improves the intermediate code.
• This is necessary to have faster execution of the code or lower memory
consumption.
Intermediate code before optimization:
t1 = inttoreal(60)
t2 = id3 * t1
t3 = t2 + id2
id1 = t3
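As an illustration of what an optimizer might do with this intermediate code, the sketch below applies two common textbook transformations: folding inttoreal of an integer constant at compile time, and removing the final copy of a temporary. The optimized form is not shown on the slide; the function name optimize and the assumption that the copied temporary is used nowhere else are made only for this example.

```python
import re

def optimize(code):
    """Fold inttoreal(<int constant>) and remove a trailing copy of a temporary."""
    constants = {}
    result = []
    for line in code:
        target, expr = [part.strip() for part in line.split("=", 1)]
        fold = re.fullmatch(r"inttoreal\((\d+)\)", expr)
        if fold:
            # The conversion is done once at compile time, so no code is emitted.
            constants[target] = f"{fold.group(1)}.0"
            continue
        # Replace temporaries that are known constants by their value.
        expr = " ".join(constants.get(token, token) for token in expr.split())
        result.append((target, expr))
    # Copy elimination: "x = t" right after "t = expr" becomes "x = expr".
    # Assumes the temporary t is not used anywhere else.
    if len(result) >= 2 and result[-1][1] == result[-2][0]:
        final_target = result.pop()[0]
        _, expr = result.pop()
        result.append((final_target, expr))
    return [f"{target} = {expr}" for target, expr in result]

before = ["t1 = inttoreal(60)", "t2 = id3 * t1", "t3 = t2 + id2", "id1 = t3"]
print(optimize(before))  # ['t2 = id3 * 60.0', 'id1 = t2 + id2']
```

Four instructions shrink to two, which is the kind of saving in time and memory the bullets above describe.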
Code generation
MOV id3, R2
MUL #60.0, R2
MOV id2, R1
ADD R2, R1
MOV R1, id1
Register assignment: id3 → R2, id2 → R1
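To check that the five instructions really compute id1 = id2 + id3 * 60, they can be run on a toy interpreter. The sketch assumes the two-operand form OP source, destination, treats #value as an immediate constant, and keeps registers and memory in one dictionary; the function name run and the sample values are hypothetical.

```python
def run(program, memory):
    """Execute two-operand instructions of the form `OP source, destination`."""
    store = dict(memory)  # registers and memory share one namespace in this toy model
    for line in program:
        op, operands = line.split(None, 1)
        source, dest = [part.strip() for part in operands.split(",")]
        # '#' marks an immediate constant; anything else is a register or variable.
        value = float(source[1:]) if source.startswith("#") else store[source]
        if op == "MOV":
            store[dest] = value
        elif op == "ADD":
            store[dest] += value
        elif op == "MUL":
            store[dest] *= value
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return store

program = [
    "MOV id3, R2",
    "MUL #60.0, R2",
    "MOV id2, R1",
    "ADD R2, R1",
    "MOV R1, id1",
]
state = run(program, {"id2": 10.0, "id3": 2.0})
print(state["id1"])  # 130.0, i.e. 10.0 + 2.0 * 60
```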
Front end & back end (Grouping of phases)
Front end
• Depends primarily on the source language and is largely independent of the target machine.
• It includes the following phases:
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Intermediate code generation
5. Creation of symbol table
Back end
• Depends on the target machine and does not depend on the source language.
• It includes the following phases:
1. Code optimization
2. Code generation
3. Error handling and symbol table operations
Characteristics of Good Compiler
• Correctness
• Efficiency
• Portability
• Error Diagnostics
• Optimization Capabilities
• Modularity
• Scalability
• Support for Language Features
GATE CS 2008
• Some code optimizations are carried out on the intermediate code
because
• (A) they enhance the portability of the compiler to other target
processors
• (B) program analysis is more accurate on intermediate code than on
machine code
• (C) the information from dataflow analysis cannot otherwise be used
for optimization
• (D) the information from the front end cannot otherwise be used for
optimization
• Answer: A
GATE CS 1997
• A language L allows declaration of arrays whose sizes are not known
during compilation. It is required to make efficient use of memory.
Which of the following is true?
• (A) A compiler using static memory allocation can be written for L
• (B) A compiler cannot be written for L, an interpreter must be used
• (C) A compiler using dynamic memory allocation can be written for L
• (D) None of the above
• Answer: C
GATE-CS-2014-(Set-3)
• One of the purposes of using intermediate code in compilers is to
• (A) make parsing and semantic analysis simpler.
• (B) improve error recovery and error reporting.
• (C) increase the chances of reusing the machine-independent code
optimizer in other compilers.
• (D) improve the register allocation.
• Answer: C
• In a two-pass assembler, symbol table is
• (A) Generated in first pass
• (B) Generated in second pass
• (C) Not generated at all
• (D) Generated and used only in second pass
• Answer: A
• How many tokens will be generated by the scanner for the following
statement?
• x = x * (a + b) - 5;
• (A) 12
• (B) 11
• (C) 10
• (D) 07
• Answer: A
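The count can be checked with a short regular-expression scanner. This is a sketch that assumes only identifiers, integer constants and the single-character symbols appearing in the statement; the name count_tokens is hypothetical.

```python
import re

# Identifiers, integer constants, and the one-character symbols used here.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|[=*+\-();]")

def count_tokens(statement):
    """Return the list of tokens; whitespace is skipped by findall."""
    return TOKEN.findall(statement)

tokens = count_tokens("x = x * (a + b) - 5;")
print(tokens)       # ['x', '=', 'x', '*', '(', 'a', '+', 'b', ')', '-', '5', ';']
print(len(tokens))  # 12
```

Note that the two occurrences of x count as two tokens, and the parentheses and the semicolon are tokens as well.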
• Symbol table can be used for:
• A) Checking type compatibility
• B) Suppressing duplication of error message
• C) Storage allocation
• D) All of these
• Answer: D
• The access time of the symbol table will be logarithmic if it is
implemented by
• A) Linear list
• B) Search tree
• C) Hash table
• D) Self-organizing list
• Answer: B
GATE CSE 2009
• Match all items in Group 1 with the correct options from those
given in Group 2.
Group 1 | Group 2
P. Regular expression | 1. Syntax Analysis
Q. Pushdown automata | 2. Code Generation
R. Dataflow analysis | 3. Lexical Analysis
S. Register allocation | 4. Code Optimization
• (A) P-4, Q-1, R-2, S-3
• (B) P-3, Q-1, R-4, S-2
• (C) P-3, Q-4, R-1, S-2
ISRO 2020
• Number of tokens in the following C code segment:
switch(inputvalue)
{
case 1 : b =c*d; break;
default : b =b++; break;
}
• Answer: 26