What is Type Checking?
Type checking is the process of verifying that the types of variables, expressions, and functions are used correctly
according to the rules of the programming language.
It ensures that operations are performed on compatible data types, which helps in preventing errors and maintaining
program correctness.
Purpose of Type Checking
• Ensures a program is type-safe.
• Prevents operations between incompatible types.
• Helps the compiler detect semantic errors early.
Types of Type Checking
-> Static Type Checking
-> Dynamic Type Checking
1. Static Type Checking
• Done at compile time.
• Types are checked before program execution.
• Common in languages like C, C++, Java, Pascal.
Examples:
• Type mismatch (e.g., adding int and string).
• Wrong function return type.
Benefits:
• Detects errors early.
• Helps in memory allocation.
2. Dynamic Type Checking
• Done at runtime.
• Types are associated with values, not variables.
• Common in languages like Python, JavaScript, Ruby.
Advantages:
• More flexible.
• Allows compact and expressive code.
• No need to declare types explicitly.
What is Type Conversion?
Sometimes in a program, we use different types of numbers together, like:
• int (whole numbers like 2, 5, 10)
• float (decimal numbers like 3.14, 2.5)
When we mix them, the computer needs to convert one type to match the other.
This happens automatically. We call this type conversion.
Example:
If you write:
x = 2 + 3.14;
• 2 is an int
• 3.14 is a float
Before adding, the computer changes 2 to a float like this:
(float) 2 = 2.0
Then it adds:
2.0 + 3.14 = 5.14
So, the result is a float.
How It Works:
The computer follows simple rules:
• int + int → int
• int + float → float
• float + int → float
• float + float → float
If there's a float anywhere, the result is a float.
Storage Allocation Strategies
A compiler is a program that converts HLL (High-Level Language) to LLL (Low-Level Language), such as machine language.
It is also responsible for organizing the memory layout of a program by allocating space for global and local variables,
constants, and dynamic data structures.
Static Storage Allocation
• Static storage allocation is the simplest form of memory allocation.
• In this strategy, the memory for variables is allocated at compile time.
• Used for global variables and static variables in languages like C/C++.
Ex :
int number = 1; // Global
static int digit = 1; // Static
Advantages:
• Fast access and no runtime overhead.
• Allocated once, no repeated allocation/deallocation.
Disadvantages:
• Not suitable for dynamic data.
• Wastes memory if unused.
Dynamic Storage Allocation
Dynamic storage allocation strategies in compiler design refer to techniques for allocating memory to data structures
at runtime, rather than compile time.
Two types :
1. Stack Storage Allocation
• Memory is allocated when functions are called and deallocated when they return.
• Follows LIFO (Last In First Out) order.
• Used for local variables and function parameters.
Ex :
void sum(int a, int b) {
int ans = a + b;
cout << ans;
}
Advantages:
• Fast allocation and deallocation.
• Automatic memory management.
Disadvantages:
• Limited size (can lead to stack overflow).
• Not suitable for dynamic data or large arrays.
2. Heap Storage Allocation
• Memory is allocated at runtime using functions like malloc() or new.
• Suitable for dynamic data structures (e.g., arrays, linked lists).
Ex :
int* ans = new int[5]; // Allocated in heap
Advantages:
• Allows dynamic memory allocation.
• Flexible: memory can grow/shrink at runtime.
Disadvantages:
• Slower than stack allocation.
• Requires manual deallocation (free() or delete).
• Risk of memory leaks or dangling pointers.
Directed Acyclic Graph
A Directed Acyclic Graph (DAG) is a graph structure used to represent expressions in a way that eliminates
redundancy.
To apply optimization techniques to a basic block, a DAG is constructed from the three-address code generated during
intermediate code generation.
• Leaf nodes represent identifiers, names or constants
• Interior nodes represent operators
Applications of Directed Acyclic Graph (DAG)
A DAG is used in compilers to make programs faster and remove repeated or unnecessary work.
1. Remove Repeated Calculations
• DAG helps find the same expression used more than once.
• It calculates it only one time and uses it wherever needed.
2. Find Used and Unused Values
• DAG shows which variable names and values are:
o Used inside the code block.
o Not used at all (can be removed).
3. Show How Values Are Connected
• DAG helps represent code as a graph with arrows showing which values are used to calculate others.
• This makes it easy for the compiler to improve the code.
4. Update Only Changed Values
• In some programs, when one value changes, only related values need to be updated.
• DAG helps find these connected values quickly.
5. Optimize Code in Small Code Blocks
• A DAG works best in small sections of code (called basic blocks).
• It helps remove dead code (code that doesn’t affect the result).
Directed Acyclic Graph Characteristics
• A Directed Acyclic Graph (DAG) is used to represent expressions in compilers.
• The leaf nodes (ends of the graph) have unique names, like variable names (a, b) or constants (5, 10).
• The inner nodes are labeled with operator symbols like +, -, *, or /.
• The graph is acyclic, meaning it has no loops.
• Transitive closure and transitive reduction are well defined on DAGs.
• Every DAG has a topological order: its nodes can be listed so that every edge points from an earlier node to a later one.
Types of Errors in Compiler Phases
1. Lexical Errors (Compile-time errors during lexical analysis)
o Spelling mistakes
o Identifier too long
o Illegal characters
o Example: fi instead of if – not caught as an error but treated as an identifier
2. Syntax Errors (Detected in syntax analysis)
o Incorrect code structure
o Missing operators or semicolons
o Unbalanced/missing parentheses
o Example: printf("Hello") (missing semicolon)
3. Semantic Errors (Found in semantic analysis)
o Using undeclared variables
o Type mismatches
o Incorrect function arguments
o Example: id1 = id2 + id3 * 60 with all IDs declared as real
Error Recovery Strategies
These are methods used by the compiler to handle errors and continue processing.
1. Panic Mode
o Skips input tokens until it finds a synchronizing token (e.g., semicolon)
o Simple and effective if few errors are present
o Best for fast recovery
2. Phrase Level Recovery
o Makes small local corrections (e.g., replace , with ;, insert/delete ;)
o Tries to continue parsing with minimal change
o Must avoid infinite loops
3. Error Production
o Include common error patterns in the grammar
o Helps the parser recognize and recover from known mistakes
o Example grammar provided for known errors
4. Global Correction
o Finds the closest valid input with the least number of changes
o Expensive but ideal for producing valid parse trees from incorrect input
What is Type Casting?
Type Casting means changing the type of data from one kind to another — like changing an integer into a float, or a
float into an integer.
It's like turning a square block into a round one to fit into a different hole.
There are two types of type casting:
Implicit Type Casting (Automatic)
This happens automatically. The computer decides to convert a data type without you writing any code for it.
Example:
int a = 10;
float b = a; // int 'a' is automatically changed to float
Here, a is an integer, but it is stored in b which is a float. The computer converts it to 10.0 automatically.
Advantages:
• Easy – You don’t have to do anything.
• Less error – No need to remember to convert types.
Disadvantages:
• Might lose data – For example, converting float to int removes decimals.
• Confusing results – If you don’t realize it's happening, it can cause bugs.
Explicit Type Casting (Manual)
You tell the computer to convert the data type. This is manual casting.
Example:
float a = 10.75;
int b = (int) a; // float 'a' is manually changed to int
Here, a is a float. Using (int) tells the computer to cut off the decimal and make b = 10.
Advantages:
• You control it – You choose exactly how and when to convert.
• Clear code – Other people can understand what you're doing.
• Safe – You can avoid mistakes if done properly.
Disadvantages:
• More complex – You must know the types and how they convert.
• Risk of mistakes – Wrong casting can crash your program or give wrong results.
What is a Symbol Table?
A symbol table is like a dictionary the compiler uses to store and look up information about everything in your code
— like variables, constants, functions, and parameters.
It stores:
• Name (e.g., distance, pi)
• Type (e.g., float, int)
• Scope (where it's valid — global or local)
• Memory address
• Value (if known)
• Extra info (like data type, if it’s a constant, etc.)
Why is it Important?
The compiler uses the symbol table to:
• Know what things are (variable or function?)
• Where and how they're used
• Check for errors
• Optimize the code for better performance
How It Helps in Compiler Phases
Compiler Phase – What the Symbol Table Does
• Lexical Analysis – Adds new entries (like variable or function names)
• Syntax Analysis – Adds more details (like type, scope, dimension)
• Semantic Analysis – Checks meaning (e.g., is a variable used correctly?)
• Intermediate Code – Helps allocate temporary variables
• Code Optimization – Helps optimize based on type/scope info
• Target Code Generation – Helps generate final code using memory addresses
Example Symbol Table
Let’s say you write this code:
float distance;
const float pi = 3.14159;
float calculateArea(float radius);
Here's what the symbol table might look like:
Name          | Type      | Scope  | Address | Value   | Extra Info
distance      | variable  | Global | 0x1000  | —       | Data type: float
pi            | constant  | Global | 0x1004  | 3.14159 | Data type: float, read-only
calculateArea | function  | Global | 0x1008  | —       | Returns float
radius        | parameter | Local  | 0x2000  | —       | Data type: float
Basic Operations on a Symbol Table
1. Insert(name, info) – Add a new name (like a new variable).
2. Lookup(name) – Search for an existing name and get its info.
3. Update(name, newInfo) – Change or add more info to a name.
4. Delete(name) – Remove an entry if it’s out of scope.
5. Traverse() – Go through all entries, often used for debugging or printing.
What is a Polymorphic Function?
A polymorphic function is a function that can work with different types of data — like using the same function for
both integers and floats.
Polymorphism means “many forms.”
In programming, it means a function can behave differently based on the type of data it receives.
Two Types of Polymorphism
1. Parametric Polymorphism (Also called Generics)
Write one function that works for any data type.
Example in C++:
template <typename T>
T add(T a, T b) {
return a + b;
}
This function add will work with int, float, double, etc.
Advantages:
• One function for all types
• Safe (checked at compile-time)
• Clean and easy to update
Disadvantages:
• Slower to compile
• More memory (multiple versions may be created)
• Harder to optimize
2. Ad-hoc Polymorphism (Also called Function Overloading)
Write different versions of the function with same name for different types.
Example in C++:
int add(int a, int b) {
return a + b;
}
double add(double a, double b) {
return a + b;
}
Here, add works for both int and double, but each type has its own version.
Advantages:
• Clear and readable code
• Fast (compiler chooses function at compile time)
• Fewer errors
Disadvantages:
• More code to write
• Can be confusing with many versions