[go: up one dir, main page]

0% found this document useful (0 votes)
17 views61 pages

Unit 1 Introduction

The document provides an overview of compiler design, including the roles of translators such as compilers, interpreters, and assemblers, as well as the differences between them. It discusses the stages of compilation, types of compilers, and the importance of components like linkers and loaders in the language processing system. Additionally, it covers the phases of compilation, characteristics of good compilers, and the significance of code optimization.

Uploaded by

jegesi5568
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
17 views61 pages

Unit 1 Introduction

The document provides an overview of compiler design, including the roles of translators such as compilers, interpreters, and assemblers, as well as the differences between them. It discusses the stages of compilation, types of compilers, and the importance of components like linkers and loaders in the language processing system. Additionally, it covers the phases of compilation, characteristics of good compilers, and the significance of code optimization.

Uploaded by

jegesi5568
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 61

Compiler Design

Unit-1 Introduction
Introduction to translators- Assembler, Compiler, Interpreter,
Difference between Compiler and Interpreter, Linker, Loader , one pass compiler,
multi pass compiler, cross compiler , The components of Compiler, Stages of
Compiler: Front end, Back end, Qualities of Good Compiler
Translator
• A translator is a program that takes one form of program as input and
converts it into another form.
• Types of translators are:
1. Compiler
2. Interpreter
Source Target
Translator
3. Assembler Program Program

Error
Messages (If any)
Compiler
• A compiler is a program that reads a program written in source
language and translates it into an equivalent program in target
language

void main() 0000 1100 0010


Source { 0100
int a=1,b=2,c=3; 00001
Progra Compiler
c=a+b; 111110 1100
m
printf(“%d”,c); 0000 1000
} 1011

Source Error Target


Program Messages (If any) Program
Interpreter
• Interpreter is also program that reads a program written in source
language and translates it into an equivalent program in target language
line by line.

Void main() 0000 1100 0010


{ 0000
int a=1,b=2,c; Interpreter 1111 1100 0010
c=a+b;
1010 1100 0010
printf(“%d”,c);
0011 1100 0010
} 1111
Error
Source
Messages (If any)
Program
Assembler
• Assembler is a translator which takes the assembly code as an
input and generates the machine code as an output.

MOV id3, R1 0000 1100 0010


MUL #2.0, R1 0100
MOV id2, R2 0111 1000 0001
MUL R2, R1 Assembler 1111 0101 1110
MOV id1, R2 1100 0000 1000
ADD R2, R1 1011
MOV R1, id1 1100 0000 1000
Error
Assembly Code Messages (If any) Machine Code
Difference between compiler & interpreter

Compiler Interpreter
Scans the entire program and translates it It translates program’s one statement at a
as a whole into machine code. time.
It generates intermediate code. It does not generate intermediate code.
An error is displayed after entire program is An error is displayed for every instruction
checked. interpreted if any.
Memory requirement is more. Memory requirement is less.
Example: C compiler Example: Basic, Python, Ruby
Language Processing System Raw Source Program

Preprocessor
• In addition to compiler, many other system
Modified Source
programs are required to generate Program
absolute machine code. Compiler

• These system programs are: Target Assembly


Program
• Preprocessor Assembler

• Assembler Relocatable Object


Code
• Linker Libraries &
Linker / Loader
Object Files
• Loader
Target Machine
Code
Language Processing System Raw Source Program

Preprocessor
• Some of the task performed by preprocessor:
1. Macro processing: Allows user to define macros. Modified Source
Program
Ex: #define PI 3.14159265358979323846
Compiler
2. File inclusion: A preprocessor may include the header file
into the program. Ex: #include<stdio.h> Target Assembly
3. Rational preprocessor: It provides built in macro for Program
construct like while statement or if statement. Assembler
4. Language extensions: Add capabilities to the language Relocatable Object
by using built-in macros. Code
• Ex: the language equal is a database query language Libraries &
embedded in C. Statement beginning with ## are taken Object Files
Linker / Loader
by preprocessor to be database access, statement
unrelated to C and translated into procedure call on
routines that perform the database access. Target Machine
Code
Language Processing System Raw Source Program

Preprocessor
Compiler
Modified Source
• A compiler is a program that reads a Program

program written in source language and Compiler

translates it into an equivalent program in Target Assembly


Program
target language.
Assembler

Relocatable Object
Code
Libraries &
Linker / Loader
Object Files

Target Machine
Code
Language Processing System Raw Source Program

Preprocessor
Assembler
Modified Source
• Assembler is a translator which takes the Program
Compiler
assembly program (mnemonic) as an input
and generates the machine code as an Target Assembly
Program
output.
Assembler

Relocatable Object
Code
Libraries &
Linker / Loader
Object Files

Target Machine
Code
Language Processing System Raw Source Program

Linker Preprocessor

Modified Source
• Linker makes a single program from a several Program
files of relocatable machine code. Compiler

• These files may have been the result of several Target Assembly
different compilation, and one or more library Program
files.
Loader Assembler

Relocatable Object
Code
• The process of loading consists of: Libraries & Linker / Loader
• Taking relocatable machine code Object Files

• Altering the relocatable address


Target Machine
• Placing the altered instructions and data in Code
memory at the proper location.
Language Processing System Raw Source Program

Preprocessor
• The linker and loader are essential
components in the compilation process. Modified Source
Program
• The linker is responsible for combining Compiler
multiple object files generated by the
compiler into a single executable file or Target Assembly
Program
library. It resolves symbols and references
between different object files, ensuring that Assembler

functions and variables are correctly linked Relocatable Object


together. Code
Libraries & Linker / Loader
• For example, suppose you have two C source Object Files
files, main.c and functions.c, which need to
be compiled separately: Target Machine
gcc -c main.c Code

gcc -c functions.c
Language Processing System Raw Source Program

Preprocessor
• This will generate two object files: main.o
Modified Source
and functions.o. The linker comes into Program
play when you want to create an Compiler
executable from these object files: Target Assembly
Program
gcc main.o functions.o -o program
Assembler
• Here, the linker (ld) combines both object Relocatable Object
files (main.o and functions.o) to create an Code

executable called program. It resolves any Libraries &


Object Files
Linker / Loader
unresolved symbols or references between
them. Target Machine
Code
Language Processing System Raw Source Program

Preprocessor
• Loader: After linking, the resulting
Modified Source
executable file needs to be loaded into Program
memory for execution. This is where the Compiler
loader comes in. The loader is responsible Target Assembly
for allocating memory space for the Program

program, resolving dynamic dependencies Assembler

(if any), and preparing it for execution. Relocatable Object


Code
• Continuing with our previous example, once Libraries & Linker / Loader
we have our executable file program, we Object Files
can run it: • At this point, before executing the program instructions, the Target Machine
operating system's loader loads it into memory by mapping
its sections/data structures appropriately. This allows Code
./program direct access to resources such as code segments and data
segments during runtime.
Pass structure
• One complete scan of a source program is called pass.
• Pass includes reading an input file and writing to the output file.
• In a single pass compiler analysis of source statement is immediately
followed by synthesis of equivalent target statement.
• While in a two pass compiler intermediate code is generated between
analysis and synthesis phase.
• It is difficult to compile the source program into single pass due to:
forward reference
Pass structure
• Forward reference: A forward reference of a program entity is a
reference to the entity which precedes its definition in the program.
• This problem can be solved by postponing the generation of target code
until more information concerning the entity becomes available.
• In this case, y is being referenced before it has
• It leads to multi pass model of compilation. been declared. This results in a forward reference
error because the compiler cannot determine what
int x = y + 5; value y holds at that point.
• To handle forward references, compilers typically

int y = 10;
perform multiple passes over the source code. In
earlier passes, they identify and record all identifiers
and symbols encountered. In subsequent passes,
they resolve these references by looking up their
definitions.
• Overall, handling forward references correctly is
crucial for ensuring proper linking and execution of
programs during compilation.
Pass structure
• Forward reference: A forward reference of a program entity is a
reference to the entity which precedes its definition in the program.
• This problem can be solved by postponing the generation of target code
until more information concerning the entity becomes available.
• It leads to multi pass model of compilation.
Pass I:

• Perform analysis of the source program and note relevant information.


Pass II:

• In Pass II: Generate target code using information noted in pass I.


Pass structure
• Effect of reducing the number of passes:
• It is desirable to have a few passes, because it takes time to read and write intermediate file.
• If we group several phases into one pass then memory requirement may be large.
Types of compiler
1. One pass compiler
• It is a type of compiler that compiles whole process in one-pass.
2. Two pass compiler
• It is a type of compiler that compiles whole process in two-pass.
• It generates intermediate code.
3. Incremental compiler
• The compiler which compiles only the changed line from the source code and update
the object code.
4. Native code compiler
• The compiler used to compile a source code for a same type of platform only.
5. Cross compiler
• The compiler used to compile a source code for a different kinds platform.
Phases of a compiler Source program

Lexical analyzer

Syntax analyzer Analysis Phase


Raw Source Program

Semantic analyzer
Preprocessor
Variable Type Address
Name Intermediate code
Modified Source
Position Float 0001 Generator
Program
Initial Float 0005
Compiler Rate Float 0009
Target Machine
Independent Code Error detection
Target Assembly Symbol table
optimization and recovery
Program

Assembler
Target Code
Relocatable Object generation
Code
Libraries & Target Machine Synthesis Phase
Linker / Loader
Object Files Dependent Code
Optimizer
Target Machine
Code
Target Program
Lexical analysis
• Lexical Analysis is also called linear analysis or scanning.
• Lexical Analyzer divides the given source statement into the Position = initial + rate*60
• tokens.
• Ex: would be grouped into the
Lexical analysis
following tokens:
• Position (identifier)
id1 = id2 + id3 * 60
• = (Assignment symbol) initial
(identifier)
• + (Plus symbol) rate
(identifier)
• * (Multiplication symbol) 60
(Number)
Variable Type Address
Name
Position Float 0001
Symbol table
Initial Float 0005
Rate Float 0009
Syntax analysis
Position = initial + rate*60
• Syntax Analysis is also called Parsing or Hierarchical
Analysis. Lexical analysis
• The syntax analyzer checks each line of the code and
spots every tiny mistake. id1 = id2 + id3 * 60
• If code is error free then syntax analyzer generates the
tree. Syntax analysis

id1 +

id2 *
id3 60
Semantic analysis
=

• Semantic analyzer determines the meaning of a id1 +


source string. id2 * int to
real
• It performs following operations: id3 60

1. Matching of parenthesis in the expression.


Semantic analysis
2. Matching of if..else statement.
=
3. Performing arithmetic operation that are
type compatible. id1 +
4.Checking scope of operation. id2 *
*Note: Consider id1, id2 and id3 are real id3 inttoreal

60
Intermediate code generator
=
• Two important properties of intermediate
id1 +
code:
id2 *
1. It should be easy to produce.
t3 id3 inttoreal
2. Easy to translate into target program. t2 t1
60
• Intermediate form can be represented using Intermediate code
“three address code”.
t1= int to real(60)
• Three address code consist of a sequence of t2= id3 * t1
instruction, each of which has at most three t3= t2 + id2
id1= t3
operands.
Code optimization
• It improves the intermediate code. Intermediate code
• This is necessary to have a faster execution
t1= int to real(60)
of code or less consumption of memory. t2= id3 * t1
t3= t2 + id2
id1= t3

Code optimization

t1= id3 * 60.0


id1 = id2 + t1
Code generation
• The intermediate code instructions are translated Code optimization
into sequence of machine instruction (assembly t1= id3 * 60.0
language). id1 = id2 + t1

Code generation

MOV id3, R2
MUL #60.0, R2
MOV id2, R1
ADD R2,R1
MOV R1, id1

Id3R2
Id2R1
Front end & back end (Grouping of phases)
Front end
• Depends primarily on source language and largely independent of the target machine.
• It includes followingphases:
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Intermediate code generation
5. Creation of symbol table
Back end
• Depends on target machine and do not depends on source program.
• It includes followingphases:
1. Code optimization
2. Code generation phase
3. Error handling and symbol table operation
Characteristics of Good Compiler
• Correctness
• Efficiency
• Portability
• Error Diagnostics
• Optimization Capabilities
• Modularity
• Scalability
• Support for Language Features
GATE CS 2008
• Some code optimizations are carried out on the intermediate code
because
• (A) they enhance the portability of the compiler to other target
processors
• (B) program analysis is more accurate on intermediate code than on
machine code
• (C) the information from dataflow analysis cannot otherwise be used
for optimization
• (D) the information from the front end cannot otherwise be used for
optimization
GATE CS 2008
• Some code optimizations are carried out on the intermediate code
because
• (A) they enhance the portability of the compiler to other target processors
• (B) program analysis is more accurate on intermediate code than on
machine code
• (C) the information from dataflow analysis cannot otherwise be used for
optimization
• (D) the information from the front end cannot otherwise be used for
optimization
• Answer: A
GATE CS 1997
• A language L allows declaration of arrays whose sizes are not known
during compilation. It is required to make efficient use of memory.
Which of the following is true?
• (A) A compiler using static memory allocation can be written for
• (B) A compiler cannot be written for L, an interpreter must be used
• (C) A compiler using dynamic memory allocation can be written for L
• (D) None of the above
GATE CS 1997
• A language L allows declaration of arrays whose sizes are not known
during compilation. It is required to make efficient use of memory.
Which of the following is true?
• (A) A compiler using static memory allocation can be written for
• (B) A compiler cannot be written for L, an interpreter must be used
• (C) A compiler using dynamic memory allocation can be written for L
• (D) None of the above
• Answer: C
GATE-CS-2014-(Set-3)
• One of the purposes of using intermediate code in compilers is to
• (A) make parsing and semantic analysis simpler.
• (B) improve error recovery and error reporting.
• (C) increase the chances of reusing the machine-independent code
optimizer in other compilers.
• (D) improve the register allocation.
GATE-CS-2014-(Set-3)
• One of the purposes of using intermediate code in compilers is to
• (A) make parsing and semantic analysis simpler.
• (B) improve error recovery and error reporting.
• (C) increase the chances of reusing the machine-independent code
optimizer in other compilers.
• (D) improve the register allocation.
• Answer: C
• In a two-pass assembler, symbol table is
• (A) Generated in first pass
• (B) Generated in second pass
• (C) Not generated at all
• (D) Generated and used only in second pass
• In a two-pass assembler, symbol table is
• (A) Generated in first pass
• (B) Generated in second pass
• (C) Not generated at all
• (D) Generated and used only in second pass
• Answer: A
• How many tokens will be generated by the scanner for the following
statement ?
• x = x ∗ (a + b) – 5;
• (A) 12
• (B) 11
• (C) 10
• (D) 07
• How many tokens will be generated by the scanner for the following
statement ?
• x = x ∗ (a + b) – 5;
• (A) 12
• (B) 11
• (C) 10
• (D) 07
• Answer: A
• Symbol table can be used for:
• A) Checking type compatibility
• B) Suppressing duplication of error message
• C) Storage allocation
• D) All of these
• Symbol table can be used for:
• A) Checking type compatibility
• B) Suppressing duplication of error message
• C) Storage allocation
• D) All of these
• Answer: D
• The access time of the symbol table will be logarithmic if it is
implemented by
• A)Linear list
• B) Search tree
• C) Hash table
• D) Self organization list
• The access time of the symbol table will be logarithmic if it is
implemented by
• A)Linear list
• B) Search tree
• C) Hash table
• D) Self organization list
• Answer: B
GATE CSE 2009
• Match all items in Group 1 with the correct options from those
given in Group 2. Group 1 Group 2
• (A) P-4, Q-1, R-2, S-3 P. Regular expression 1. Syntax Analysis
Q. Pushdown automata 2. Code Generation
• (B) P-3, Q-1, R-4, S-2 R. Dataflow analysis 3. Lexical Analysis
S. Register allocation 4. Code Optimization
• (C) P-3, Q-4, R-1, S-2

• (D) P-2, Q-1, R-4, S-3


GATE CSE 2009
• Match all items in Group 1 with the correct options from those given in
Group 2. Group 1 Group 2
• (A) P-4, Q-1, R-2, S-3 A. Regular expression 1. Syntax Analysis
B. Pushdown automata 2. Code Generation
• (B) P-3, Q-1, R-4, S-2
C. Dataflow analysis 3. Lexical Analysis
D. Register allocation 4. Code Optimization
• (C) P-3, Q-4, R-1, S-2

• (D) P-2, Q-1, R-4, S-3


• Answer: B
GATE CSE 2010
• Which data structure in a compiler is used for managing information
about variables and their attributes?
• (A) Abstract Syntax Tree

• (B) Symbol Table

• (C) Semantic Stack

• (D) Parse Table


GATE CSE 2010
• Which data structure in a compiler is used for managing information about
variables and their attributes?
• (A) Abstract Syntax Tree

• (B) Symbol Table

• (C) Semantic Stack

• (D) Parse Table


• Answer: B
ISRO 2020
• The number of tokens in the following C code segment is
• A. 27
1.switch(inputvalue)
• B. 29 2.{
3.case 1 : b =c*d; break;
• C. 26 4.default : b =b++; break;
5.}
• D. 24
ISRO 2020
• The number of tokens in the following C code segment is
• A. 27
1.switch(inputvalue)
• B. 29 2.{
3.case 1 : b =c*d; break;
• C. 26 4.default : b =b++; break;
5.}
• D. 24

• Answer: 26
1.switch(inputvalue)
2.{
ISRO 2020 3.case 1 : b =c*d; break;
4.default : b =b++; break;
5.}

• The tokens in the code segment are


as follows:
• switch - Keyword
• ( - Punctuation
• inputvalue - Identifier
• ) - Punctuation
• { - Punctuation
• case - Keyword
• 1 - Integer Literal
• : - Punctuation
• b - Identifier
• = - Operator
• …..
GATE CSE 2020
• Consider the following statements.
• I. Symbol table is accessed only during lexical analysis and syntax analysis.
• II. Compilers for programming languages that support recursion necessarily
need heap storage for memory allocation in the run-time environment.
• III. Errors violating the condition ‘any variable must be declared before its
use’ are detected during syntax analysis.
• Which of the above statements is/are TRUE?
A. I only
B. I and III only
C. Ⅱ only
D. None of Ⅰ, Ⅱ and Ⅲ
GATE CSE 2020
• Consider the following statements.
• I. Symbol table is accessed only during lexical analysis and syntax analysis.
• II. Compilers for programming languages that support recursion necessarily need
heap storage for memory allocation in the run-time environment.
• III. Errors violating the condition ‘any variable must be declared before its use’ are
detected during syntax analysis.
• Which of the above statements is/are TRUE?
A. I only
B. I and III only
C. Ⅱ only
D. None of Ⅰ, Ⅱ and Ⅲ
• Answer: D
GATE CSE 2020-Explanation
• Symbol table is a data structure used by compilers to store information about the
variables, functions, and other entities in a program. It is accessed during both
lexical analysis and syntax analysis, but may also be used by later stages of the
compilation process, such as code generation.
• Compilers for programming languages that support recursion may or may not
need heap storage for memory allocation in the run-time environment. The
decision to use heap storage or not depends on the specific implementation of
the compiler and the features of the programming language.
• Errors violating the condition 'any variable must be declared before its use' are
typically detected during the semantic analysis phase of compilation, not during
syntax analysis. Syntax analysis checks the structure of the program to ensure it
follows the rules of the programming language's grammar, while semantic
analysis checks the meaning of the program to ensure it is semantically correct.
Incremental-Compiler is a compiler?
• (A) which is written in a language that is different from the source
language
• (B) compiles the whole source code to generate object code afresh
• (C)compiles only those portion of source code that have been
modified.
• (D)that runs on one machine but produces object code for another
machine
Incremental-Compiler is a compiler?
• (A) which is written in a language that is different from the source
language
• (B) compiles the whole source code to generate object code afresh
• (C)compiles only those portion of source code that have been
modified.
• (D)that runs on one machine but produces object code for another
machine
• Answer: C
Which phase of compiler generates stream of
atoms?
• (A) Syntax Analysis
• (B) Lexical Analysis
• (C) Code Generation
• (D)Code Optimization
Which phase of compiler generates stream of
atoms?
• (A) Syntax Analysis
• (B) Lexical Analysis
• (C) Code Generation
• (D)Code Optimization
• Answer: B
Which one of the following is NOT performed
during compilation?
• A)Dynamic memory allocation
• B)Type checking
• C)Symbol table management
• D)Inline expansion
Which one of the following is NOT performed
during compilation?
• A)Dynamic memory allocation
• B)Type checking
• C)Symbol table management
• D)Inline expansion
• Answer: A

You might also like