Compiler Design
Unit – 1
Programming Language Basics:
1. The Static/Dynamic Distinction: This refers to how variables are treated in a
programming language. In a static language, variables have fixed types determined at
compile time. In a dynamic language, variables can change types during runtime.
2. Environments and States: Environments refer to the context in which variables and
functions exist, including their names and values. States refer to the values stored in
variables at a particular point in time during program execution.
3. Static Scope and Block Structure: Static scope means that the visibility of variables is
determined by their location in the source code, typically defined by blocks like
functions or loops. Block structure refers to the nesting of these blocks within each
other.
4. Explicit Access Control: This involves mechanisms for controlling access to variables
and functions, such as public and private keywords or access modifiers like in object-
oriented programming.
5. Dynamic Scope: Unlike static scope, dynamic scope determines variable visibility
based on the call stack at runtime rather than the lexical structure of the code.
6. Parameter Passing Mechanisms: These determine how arguments are passed to
functions or methods. Common mechanisms include pass-by-value (where a copy of
the argument is passed) and pass-by-reference (where the function receives a
reference to the original argument).
7. Aliasing: Aliasing occurs when two or more variables refer to the same memory
location. It can lead to unexpected behaviour, especially in languages with mutable
data structures. Parameter passing and aliasing are illustrated in the sketch after this list.
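A minimal C++ sketch of points 6 and 7: pass-by-value copies the argument, pass-by-reference gives
the callee access to the caller's variable, and a reference introduces an alias of the same memory
location. The variable and function names are illustrative only.
C++
#include <iostream>

void byValue(int n) { n = 100; }   // modifies only a local copy
void byRef(int& n)  { n = 100; }   // modifies the caller's variable

int main() {
    int a = 1, b = 1;
    byValue(a);                    // pass-by-value: a stays 1
    byRef(b);                      // pass-by-reference: b becomes 100
    std::cout << a << ' ' << b << '\n';

    int x = 5;
    int& alias = x;                // alias and x name the same memory location
    alias = 42;                    // writing through the alias also changes x
    std::cout << x << '\n';        // prints 42
    return 0;
}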
UNIT – 3
Evaluation Order:
https://www.geeksforgeeks.org/three-address-code-compiler/
UNIT – 5
https://www.geeksforgeeks.org/machine-independent-code-optimization-in-compiler-design/
https://www.codingninjas.com/studio/library/data-flow-analysis
Constant Propagation:
https://www.geeksforgeeks.org/constant-propagation-in-complier-design/
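A minimal before/after sketch of constant propagation (with the resulting constants also folded);
the function and variable names are illustrative and not taken from the linked article.
C++
// Before optimization: x is known to hold the constant 10.
int before() {
    int x = 10;
    int y = x + 5;    // x can be replaced by 10
    return y * 2;     // y can then be replaced by 15
}

// After constant propagation plus constant folding.
int after() {
    return 30;        // 10 + 5 -> 15, 15 * 2 -> 30
}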
PRE (Partial Redundancy Elimination) is a compiler optimization technique that removes redundant
computations from a program. A computation is redundant when the same expression is evaluated at
more than one point yet produces the same result each time. By removing these redundancies, PRE can
enhance the overall performance of the program by reducing the number of instructions carried out.
Concept:
PRE aims to eliminate the redundant computations by performing the expression only
once and reusing its result at other points where it needs to be evaluated.
Benefits:
Improved performance: Redundant evaluations are removed, so fewer instructions are
executed at run time.
Reduced memory usage: Depending on the specific implementation, PRE may also lead to
reduced memory usage as the same value is not stored multiple times.
Example:
C++
if (condition1) {
    x = a + b;
} else {
    // other code
}
z = a + b;
In this example, the expression a + b is partially redundant: it is computed in the if branch and
again after the if/else. On the path through the if branch the second evaluation repeats work that
has already been done, but on the path through the else branch it does not, so whether the
computation is redundant depends on the value of condition1.
PRE Optimization:
C++
t = a + b;             // computed once
if (condition1) {
    x = t;             // reuse the stored result
} else {
    // other code
}
z = t;                 // reuse t instead of recomputing a + b
By introducing a temporary variable t, PRE calculates the expression a + b only once and stores the
result in t. It then reuses the value of t wherever it needs the expression's value, eliminating the
redundant computation.
Not all redundant expressions are suitable for PRE. It depends on the complexity of the
expression, the number of times it's calculated, and the ease of reusing the value.
PRE algorithms can become complex when dealing with complex control flow structures.
However, when applicable, PRE can significantly improve the performance of programs by reducing
redundant computations and optimizing memory usage.
Partial Redundancy:
https://www.codingninjas.com/studio/library/partial-redundancy-elimination-in-compiler-design
Loop Control Flow:
https://youtu.be/LoTIsHmHsL0?si=84NBC1NUXr9k9tLo
https://ebooks.inflibnet.ac.in/csp10/chapter/loops-in-flow-graphs/
UNIT – 4
+------+
| Data | (Top of stack)
+------+
| Data |
+------+
| Data |
+------+
Stack Base (Lower memory addresses)
Process:
1. When a function is called, a new activation record (stack frame) for that call is pushed
onto the top of the stack; it holds the call's parameters, return address, saved machine
state, and local variables.
2. When the function returns, its activation record is popped, exposing the caller's frame
again.
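A rough sketch of what a single activation record on this stack might contain; the exact fields and
their order are machine- and compiler-dependent, and the names below are illustrative assumptions.
C++
// Illustrative activation-record layout (field order varies by ABI/compiler).
struct ActivationRecord {
    int   params[2];          // actual parameters passed by the caller
    void* returnAddress;      // where to resume in the caller after return
    void* savedFramePointer;  // saved machine state (caller's frame pointer)
    int   locals[4];          // the callee's local variables and temporaries
};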
Static Link:
Definition: It's like a breadcrumb trail that tells us where to find variables defined in outer
scopes when we're in a nested function or scope.
Purpose: Helps us access variables from outer scopes (scopes that contain the current
scope).
Utilization: When we want to use a variable from an outer scope in a nested function or
scope, we follow this breadcrumb trail until we find the scope where the variable is defined.
Example: Imagine we have a function within another function. The static link helps us find
variables from the outer function when we're in the inner one.
Dynamic Link:
Definition: Similar to the static link, but it helps us find variables from scopes that might
change as our program runs.
Purpose: Allows us to access variables from scopes that could be different each time we call
a function.
Utilization: When we call a function from different places in our program, the dynamic link
helps us find variables from the scope where the function was called.
Example: If a function is called from different parts of the program, the dynamic link helps it
find the right variables each time it's called.
Static Link: It helps us find variables from outer scopes by following a trail of static links until
we reach the right scope.
Dynamic Link: It helps us find variables from scopes that might change each time a function
is called, ensuring we get the right variables no matter where the function is called from.
In summary, static link and dynamic link help us access variables from outer scopes in nested
functions or scopes. They act like guiding trails, ensuring we find the right variables no matter how
our program is structured or how it runs.
__________________________________________________________________________________
The concepts of static link and dynamic link are used in accessing nonlocal data on the stack
in the context of nested scopes, particularly in languages that support nested functions or
lexical scoping.
Static Link:
1. Definition: A static link is a pointer stored in each stack frame that points to the stack
frame of the enclosing scope or the nearest lexically enclosing function.
2. Purpose: The static link is used to access nonlocal variables or data structures defined
in outer scopes (i.e., scopes enclosing the current scope).
3. Utilization: When accessing a nonlocal variable within a nested scope, the compiler
generates code that follows the static link chain to locate the appropriate stack frame
where the desired variables reside. This involves traversing the static link chain until
reaching the stack frame that corresponds to the appropriate lexical scope.
4. Example: Consider a nested function inner defined within another function outer.
When accessing a variable x from inner, the compiler generates code that follows the
static link chain to locate the stack frame of outer, where x is defined.
Dynamic Link:
1. Definition: A dynamic link is a pointer or reference stored in each stack frame that
points to the stack frame of the caller or the dynamic scope at the time of the
function call.
2. Purpose: The dynamic link is used to access nonlocal variables or data structures
defined in the dynamic scope, which may change dynamically during program
execution.
3. Utilization: When accessing a nonlocal variable within a nested scope, the compiler
generates code that follows the dynamic link chain to locate the appropriate stack
frame where the desired variables reside. This involves traversing the dynamic link
chain until reaching the stack frame that corresponds to the appropriate dynamic
scope.
4. Example: Consider a situation where a function inner is called from multiple locations
within the program. The dynamic link allows inner to access variables defined in the
scope of its caller dynamically, regardless of the lexical nesting structure.
Utilization during Variable Access:
1. Static Link: When accessing nonlocal variables within nested scopes, the static link is
used to locate the appropriate stack frame corresponding to the lexical scope in
which the variable is defined. This involves traversing the static link chain until
reaching the correct stack frame.
2. Dynamic Link: When accessing nonlocal variables within nested scopes, the dynamic
link is used to locate the appropriate stack frame corresponding to the dynamic
scope at the time of the function call. This involves traversing the dynamic link chain
until reaching the correct stack frame.
In summary, static link and dynamic link are used to access nonlocal data on the stack within
nested scopes by providing a mechanism for locating the appropriate stack frame
corresponding to the lexical or dynamic scope in which the data is defined. These links
facilitate efficient variable access in languages that support nested functions or lexical
scoping.
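A minimal sketch of how these two links could be represented and followed; the Frame layout and the
"hops" count (nesting depth of the use minus nesting depth of the declaring scope) are illustrative
assumptions rather than any specific compiler's layout.
C++
// Each activation record carries both links.
struct Frame {
    Frame* staticLink;   // frame of the lexically enclosing function
    Frame* dynamicLink;  // frame of the caller (control link)
    // ... return address, saved registers, locals ...
};

// Under static scoping, a nonlocal variable is reached by following the
// static-link chain a fixed number of times, then indexing into that frame.
Frame* frameOfDeclaringScope(Frame* current, int hops) {
    Frame* f = current;
    for (int i = 0; i < hops; ++i)
        f = f->staticLink;
    return f;
}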
Heap:
https://www.codingninjas.com/studio/library/heap-management-in-compiler-design
1. Memory Leaks: Heap memory that is allocated but never deallocated remains
unavailable for reuse, leading to memory leaks. Over time, memory leaks can
significantly impact program performance and eventually lead to crashes.
2. Manual Memory Management Complexity: Manually managing memory
allocation and deallocation can be complex and error-prone, especially in
large and complex programs. Garbage collection simplifies memory
management by automating deallocation, reducing the risk of human error
and improving program reliability.
3. Memory Fragmentation: As objects are allocated and deallocated from the
heap, the free space becomes fragmented. This can lead to inefficiencies
when allocating new objects, as finding contiguous blocks of free space
becomes more challenging. Garbage collection may use compaction
techniques to consolidate free space, mitigating the impact of fragmentation.
Benefits of Garbage Collection:
Prevents memory leaks: Ensures unused memory gets reclaimed, improving
program stability and performance.
Simplifies memory management: Reduces the burden on programmers and
improves code readability.
Mitigates fragmentation: Can help maintain efficient memory usage by
potentially compacting free space.
However, it's important to note:
Overhead: Garbage collection adds some runtime overhead to the
program's execution due to the identification and management of unused
memory.
Non-deterministic behavior: The timing of garbage collection cycles might
not be predictable, which can impact program performance in real-time
systems requiring strict timing constraints.
Conclusion:
Garbage collection plays a vital role in run-time environments with dynamic memory
allocation by automating memory management, preventing memory leaks, and
simplifying development. While it introduces some overhead, the benefits of
improved program reliability, reduced risk of errors, and efficient memory utilization
often outweigh these limitations for many applications.
Challenges in Code Generation:
1. Input and Output Formats:
Input: Intermediate code and symbol table information.
Output: Machine language (absolute or relocatable) or assembly language.
2. Memory Management:
Mapping names to data object addresses by both the front end and code
generator.
3. Instruction Selection:
Choosing optimal instructions to improve program efficiency.
Considering the completeness and uniformity of the instruction set, instruction
speeds, and machine idioms.
4. Register Allocation:
Efficiently utilizing registers for faster computations.
Involves both allocation and subsequent assignment of registers.
5. Evaluation Order:
Deciding the order of instruction execution to optimize code efficiency.
Affects the number of registers needed for holding intermediate results.
Approaches to Code Generation:
Design goals include correctness, maintainability, testability, and efficiency.
Code generators should handle special cases effectively and produce correct code
consistently.
Disadvantages:
1. Limited Flexibility:
Code generators may be tailored for specific tasks or platforms, limiting their
adaptability to diverse inputs or target platforms.
2. Maintenance Overhead:
Code generators require ongoing maintenance and updates alongside the
code they generate, potentially increasing complexity and introducing errors.
3. Debugging Difficulties:
Debugging generated code may be challenging due to its complexity and lack
of readability compared to hand-written code.
https://www.geeksforgeeks.org/issues-in-the-design-of-a-code-generator/
Target Language:
The target language refers to the programming language or machine language into
which the compiler translates the source code.
It could be absolute machine language, relocatable machine language, or assembly
language.
Absolute machine language is executable directly and can be placed in fixed
memory locations.
Relocatable machine language allows separate compilation of subprograms and
linking by a linking loader.
Assembly language makes code generation easier, allowing for symbolic
instructions and leveraging assembler macro facilities.
Addresses in the Target Code:
Addresses in the target code refer to the memory locations or offsets used to
access data objects or instructions.
The compiler maps names in the source program to addresses of data objects in
the target machine's memory.
The symbol table provides information about the run-time addresses of data
objects.
Relative addresses or offsets may be determined for names based on their symbol
table entries.
Efficient memory management involves mapping names to their respective
memory addresses in the target code.
Proper addressing ensures correct data access and execution of instructions in the
generated code.
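A small sketch of the kind of symbol-table record a code generator might consult when mapping names
to relative addresses; the field names, the chosen sizes, and the layout function are illustrative
assumptions.
C++
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Illustrative symbol-table entry consulted during code generation.
struct SymbolInfo {
    int size;    // size of the data object in bytes
    int offset;  // relative address: offset from the frame (or data area) base
};

// Assign consecutive offsets to declared names, so each name can be addressed
// in the target code as "base address + offset".
std::unordered_map<std::string, SymbolInfo> layoutNames(
        const std::vector<std::pair<std::string, int>>& declared) {
    std::unordered_map<std::string, SymbolInfo> table;
    int nextOffset = 0;
    for (const auto& [name, size] : declared) {
        table[name] = {size, nextOffset};
        nextOffset += size;
    }
    return table;
}
For example, layoutNames({{"i", 4}, {"sum", 4}, {"buf", 16}}) would place i at offset 0, sum at
offset 4, and buf at offset 8.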
https://chat.openai.com/share/1a47e80a-a969-4bcc-8d05-c185c3ef96e8
https://www.geeksforgeeks.org/peephole-optimization-in-compiler-design/
https://chat.openai.com/c/1d9e2cff-f465-496f-a5fd-f4caaa2ccf8e
UNIT – 2
https://www.tutorialspoint.com/compiler_design/compiler_design_top_down_parser.htm
LL(1): https://youtu.be/oSCqHq5zslw?si=sGNSyGMaQQiYO7XE
LR(0): https://youtu.be/zaomF0mznQo?si=xNkA8YFdjFtPO3y5
Most Powerful Grammar:
LR parsers are more powerful than LL parsers due to their ability to handle a broader class of
grammars, specifically the class of LR(k) grammars, where "k" denotes the number of lookahead
symbols. There are several types of LR parsers that differ in their power and efficiency. Let's discuss
some of them:
SLR (Simple LR) parsers:
They have a parsing table constructed from the LR(0) items of the grammar.
SLR parsers are less powerful than other LR parsers but are easier to construct and
use.
CLR (Canonical LR) parsers:
They are more powerful than SLR parsers and can handle a larger class of
grammars.
CLR parsers can handle any LR(1) grammar, but their construction can be more
complex.
LALR (Look-Ahead LR) parsers:
They are more powerful than SLR parsers but less powerful than CLR parsers.
LALR parsers can handle a larger class of grammars than SLR parsers while being
more efficient in terms of parsing table size and construction.
LALR parsers are widely used in practice due to their balance between power and
efficiency.
UNIT – 1
1. Tokenization: Lexical analysis breaks down the source code into a sequence of tokens, which
are the basic building blocks of the programming language. Tokens represent meaningful
units such as keywords, identifiers, literals, and punctuation symbols. This tokenization
simplifies subsequent phases of compilation by providing a structured representation of the
code; a small tokenizer sketch appears at the end of this section.
2. Syntax Parsing: The output of the lexical analysis phase is typically fed into the syntax parser.
The parser relies on the tokens produced by the lexical analyzer to recognize the grammatical
structure of the code according to the rules of the programming language. By providing well-
defined tokens, lexical analysis facilitates accurate parsing, enabling the compiler to
understand the syntax of the source code.
3. Error Detection: Lexical analysis plays a crucial role in detecting lexical errors in the source
code, such as misspelled identifiers, invalid characters, or misplaced symbols. By identifying
these errors early in the compilation process, the lexical analyzer helps improve the overall
reliability of the compiler and aids developers in debugging their code.
4. Whitespace and Comment Handling: During lexical analysis, whitespace characters (e.g.,
spaces, tabs, newlines) and comments are typically ignored. This simplifies subsequent
processing stages by removing irrelevant elements from the source code. Additionally,
comments may be preserved as metadata for documentation or debugging purposes.
5. Efficiency: Efficient lexical analysis is essential for overall compiler performance. The lexical
analyzer should be designed to process the input source code quickly, as it serves as the first
stage in the compilation pipeline. By efficiently tokenizing the source code, the lexical
analyzer contributes to the overall speed and responsiveness of the compiler.
6. Source Code Normalization: In some cases, lexical analysis may perform source code
normalization tasks, such as converting all characters to a common case or handling
character encoding issues. This ensures consistency and simplifies subsequent processing
stages, such as semantic analysis and code generation.
In summary, lexical analysis is a critical phase in the compilation process that transforms raw source
code into a structured sequence of tokens, facilitating accurate parsing, error detection, and efficient
processing by subsequent compiler stages. Its significance lies in its ability to lay the foundation for
the interpretation and transformation of source code into executable programs.
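To make tokenization concrete, here is a deliberately tiny hand-written scanner; real lexical
analyzers are usually generated from regular-expression specifications (for example with Lex,
linked below), keywords are recognized separately, and the token categories here are simplified
for illustration.
C++
#include <cctype>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Simplified token categories for illustration only.
enum class TokenKind { Identifier, Number, Symbol };

struct Token {
    TokenKind   kind;
    std::string lexeme;
};

// Skip whitespace, group letters/digits into identifiers, digits into numbers,
// and treat everything else as single-character symbols.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (std::isalpha(c)) {
            std::size_t start = i;
            while (i < src.size() && std::isalnum(static_cast<unsigned char>(src[i]))) ++i;
            tokens.push_back({TokenKind::Identifier, src.substr(start, i - start)});
        } else if (std::isdigit(c)) {
            std::size_t start = i;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            tokens.push_back({TokenKind::Number, src.substr(start, i - start)});
        } else {
            tokens.push_back({TokenKind::Symbol, std::string(1, src[i])});
            ++i;
        }
    }
    return tokens;
}

int main() {
    // "int count = 10;" -> int, count, =, 10, ;
    for (const Token& t : tokenize("int count = 10;"))
        std::cout << t.lexeme << '\n';
    return 0;
}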
Lex:
https://www.javatpoint.com/lex
Input Buffer:
https://www.geeksforgeeks.org/input-buffering-in-compiler-design/