
Compiler Design

Unit – 1
Programming Language Basics:
1. The Static/Dynamic Distinction: This refers to whether a language issue is decided at
compile time (static) or only at run time (dynamic). The most familiar example is typing: in a
statically typed language, variable types are fixed at compile time, while in a dynamically
typed language a variable's type can change during runtime.
2. Environments and States: The environment is a mapping from names to storage
locations, while the state is a mapping from those locations to the values they hold at a
particular point in time during program execution.
3. Static Scope and Block Structure: Static scope means that the visibility of variables is
determined by their location in the source code, typically defined by blocks like
functions or loops. Block structure refers to the nesting of these blocks within each
other.
4. Explicit Access Control: This involves mechanisms for controlling access to variables
and functions, such as the public, private, and protected access modifiers used in object-
oriented programming.
5. Dynamic Scope: Unlike static scope, dynamic scope determines variable visibility
based on the call stack at runtime rather than the lexical structure of the code.
6. Parameter Passing Mechanisms: These determine how arguments are passed to
functions or methods. Common mechanisms include pass-by-value (where a copy of
the argument is passed) and pass-by-reference (where the function receives a
reference to the original argument).
7. Aliasing: Aliasing occurs when two or more variables refer to the same memory
location. It can lead to unexpected behaviour, especially in languages with mutable
data structures; the short sketch after this list illustrates both parameter passing and aliasing.
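A minimal C++ sketch of points 6 and 7 above (the variable and function names are assumptions made for the illustration), contrasting pass-by-value with pass-by-reference and showing how a reference creates an alias:

C++

#include <iostream>

void byValue(int n) { n = n + 1; }  // operates on a copy; the caller's variable is unchanged
void byRef(int &n)  { n = n + 1; }  // operates on the caller's variable through a reference

int main() {
    int x = 10;
    byValue(x);              // x is still 10: only the copy was incremented
    byRef(x);                // x becomes 11: the function modified the original
    int &alias = x;          // 'alias' and 'x' now name the same memory location (aliasing)
    alias = 42;              // changing one changes the other
    std::cout << x << "\n";  // prints 42
    return 0;
}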
UNIT – 3

Evaluation Order:

https://www.geeksforgeeks.org/three-address-code-compiler/
UNIT – 5

In the context of computer science, particularly compiler design and code optimization,
there are several principal sources of optimization aimed at
improving the efficiency (speed and memory usage) of programs. Here are some key
aspects:
1. Common Sub-Expression Elimination (CSE):
 Concept: Identifies and eliminates redundant calculations of the same
expression that appear multiple times within a program.
 Benefits: Reduces the number of instructions executed, leading to faster
execution speed.
2. Copy Propagation:
 Concept: After a copy assignment such as x = y, later uses of x can be replaced by y as
long as neither variable is modified in between; the copy assignment itself often becomes
dead code that can then be removed.
 Benefits: Saves time by avoiding unnecessary assignments and potentially
reduces memory usage by optimizing register allocation.
3. Dead Code Elimination:
 Concept: Identifies code that is unreachable or unnecessary for the
program's execution, such as unreachable code blocks due to conditional
statements or unused variables.
 Benefits: Improves code size and potentially execution speed by removing
unnecessary instructions that waste processing time and memory space.
4. Constant Folding:
 Concept: Optimizes code by performing calculations on constant expressions
(expressions that always evaluate to the same value) at compile time. Instead
of evaluating them repeatedly during runtime, the result is directly substituted
into the code.
 Benefits: Enhances execution speed by eliminating redundant calculations
and potentially reduces memory usage by avoiding unnecessary
computations.
5. Loop Optimizations:
 Concept: Loops are often performance bottlenecks in programs. Several
optimization techniques target loops, such as:
o Loop Unrolling: Duplicates the loop body a certain number of times to
reduce the overhead of loop control structures.
o Induction Variable Elimination: Identifies loop variables that always
change by a constant amount within each iteration. This information
can be used to optimize memory access and calculations.
o Loop Invariant Code Motion: If certain calculations within the loop are
independent of the loop variable's value (loop invariant), they can be
moved outside the loop to be executed only once.
 Benefits: Can significantly improve the speed of programs by reducing loop
control overhead, optimizing memory access, and minimizing redundant
calculations; a combined before/after sketch follows this list.
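As a combined illustration of techniques 1–5, here is a small, hypothetical C++ fragment with comments indicating how an optimizer might rewrite each marked line; the names and the exact rewrites are assumptions for the sketch, not the output of any particular compiler:

C++

int demo(int a, int b, int n) {
    int unused = a * b;      // dead code elimination: 'unused' is never read, so this line can be removed
    int c = a + b;           // "a + b" is computed here ...
    int d = a + b;           // ... and again here: common sub-expression elimination rewrites this as d = c
    int e = d;               // copy propagation: later uses of e can be replaced by d
    int f = 2 * 8;           // constant folding: replaced by f = 16 at compile time
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        int g = c * f;       // loop-invariant code motion: independent of i, so it can be hoisted out of the loop
        sum += g + e + i;
    }
    return sum + d;
}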
Additional Optimization Techniques:
 Strength Reduction: Replaces expensive operations with cheaper,
equivalent operations that are more efficient on the target machine (a small
sketch follows the closing paragraph below).
 Code Motion: Moves code to a more suitable location in the program to
improve efficiency, such as moving frequently used code outside of loops.
 Register Allocation: Strategically assigns variables to registers for faster
access compared to main memory.
By applying these optimization techniques, compilers can improve the performance
of programs by reducing the number of instructions executed, optimizing memory
usage, and improving the overall efficiency. The specific techniques and their
effectiveness depend on the specific program, compiler, and target architecture.
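As a concrete illustration of strength reduction, here is a minimal sketch assuming a hypothetical helper function process; the multiplication inside the loop is replaced by a running addition that produces the same sequence of values:

C++

#include <cstdio>

static void process(int v) { std::printf("%d\n", v); }  // stands in for any use of the value

void before(int n) {
    for (int i = 0; i < n; ++i)
        process(i * 4);      // a multiplication is performed on every iteration
}

void after(int n) {
    int t = 0;               // runs in step with i * 4
    for (int i = 0; i < n; ++i) {
        process(t);          // same values as in before()
        t += 4;              // strength reduction: a cheap addition replaces the multiplication
    }
}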

https://www.geeksforgeeks.org/machine-independent-code-optimization-in-compiler-design/

Data Flow Analysis:

https://www.codingninjas.com/studio/library/data-flow-analysis

constant propagation

https://www.geeksforgeeks.org/constant-propagation-in-complier-design/

Partial Redundancy Elimination (PRE)

PRE (Partial Redundancy Elimination) is a compiler optimization method that aims to remove
redundant computations from the program. A computation is redundant when the same expression
is evaluated more than once yet produces the same result every time. By removing these
redundancies, PRE can enhance the overall performance of the program by reducing the number of
instructions carried out.

Concept:

 An expression is considered partially redundant at a program point if:

o It has already been computed on some of the execution paths that reach that point.

o It has not been computed on all of those paths, so it is redundant only part of the time.

 PRE eliminates the redundancy by inserting the computation on the paths where it is missing,
so the expression is evaluated only once on every path and its result is reused at the later point.

Benefits:

 Improved execution speed: By eliminating redundant computations, the program requires
fewer instructions to execute, leading to faster execution.

 Reduced memory usage: Depending on the specific implementation, PRE may also lead to
reduced memory usage as the same value is not stored multiple times.

Example:

Consider the following code snippet:

C++

if (condition1) {
    x = a + b;
} else {
    // other code
}

y = a + b; // Partially redundant expression

z = x + y;

In this example, the expression a + b is partially redundant. On the path through the if block it is
computed twice: once inside the block and again at y = a + b. On the else path it is computed only
once. The second computation is therefore redundant on some execution paths but not on all of
them, which is exactly what makes it partially redundant.

PRE Optimization:

Here's how PRE would optimize the code:

C++

if (condition1) {
    t = a + b; // Calculate and store the value once on this path
    x = t;
} else {
    // other code
    t = a + b; // Inserted so the value is also available on this path
}

y = t; // Reuse the previously computed value

z = x + y;

By introducing a temporary variable t and inserting the computation into the else branch, PRE
ensures that a + b is evaluated exactly once on every execution path. The later statement y = t simply
reuses the stored value, eliminating the redundant computation.

It's important to note that:

 Not all redundant expressions are suitable for PRE. It depends on the complexity of the
expression, the number of times it's calculated, and the ease of reusing the value.

 PRE algorithms can become complex when dealing with complex control flow structures.

However, when applicable, PRE can significantly improve the performance of programs by reducing
redundant computations and optimizing memory usage.

Partial Redundancy

https://www.codingninjas.com/studio/library/partial-redundancy-elimination-in-compiler-design
Loop Control Flow:

https://youtu.be/LoTIsHmHsL0?si=84NBC1NUXr9k9tLo

https://ebooks.inflibnet.ac.in/csp10/chapter/loops-in-flow-graphs/

UNIT – 4

Stack Allocation of Space

Stack allocation is a memory allocation technique commonly used in programming languages to
store local variables, parameters, and return addresses for functions and methods.
Key points about stack allocation:
 LIFO (Last-In-First-Out) principle: Data is allocated to the stack in a last-in-
first-out manner. This means the most recently allocated data is at the top
of the stack and is accessed first.
 Automatic allocation and deallocation: The compiler automatically
allocates memory for local variables and function arguments when a function
is called, and deallocates memory when the function returns. This simplifies
memory management for the programmer.
 Limited size: The stack has a fixed size defined by the operating system,
and exceeding this limit can lead to a stack overflow error.
Diagram:
Imagine a stack of plates in a cafeteria. Each plate represents a memory location on
the stack.

+------+
| Data | (Top of stack)
+------+
| Data |
+------+
| Data |
+------+
Stack Base (Lower memory addresses)

Process:
1. When a function is called:

o Space is allocated on the stack for local variables and parameter values used by the
function.
o The return address (address of the instruction to return to after the
function call) is also pushed onto the stack.
2. Function execution: The function uses the allocated memory for its local
variables and performs its tasks.
3. When the function returns:
o The function's local variables and parameter values are removed from
the stack (memory deallocation).
o The return address is retrieved from the top of the stack.
o Control jumps back to the instruction after the function call using the
retrieved return address.
Benefits of stack allocation:
 Simplicity: Automatic allocation and deallocation simplify memory
management for the programmer.
 Efficiency: Accessing data on the stack is generally faster than accessing
data in the heap due to its locality and LIFO nature.
 Safety: Fixed size helps prevent memory leaks and excessive memory
usage.
Limitations of stack allocation:
 Limited size: Can lead to stack overflows if large amounts of data are
allocated on the stack.
 Not suitable for long-lived data: Data on the stack is deallocated when the
function returns, making it unsuitable for data that needs to persist beyond the
function's lifetime.
In summary, stack allocation is a convenient and efficient way to manage
temporary data for functions but has limitations in terms of size and data
persistence.
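A small C++ sketch of the behaviour described above, using a hypothetical recursive function; each call pushes a new frame holding its own copies of n and local, and the frame is popped automatically when the call returns:

C++

#include <iostream>

void countdown(int n) {
    int local = n * 10;                 // allocated in this call's stack frame
    std::cout << "frame for n=" << n << ", local=" << local << "\n";
    if (n > 0)
        countdown(n - 1);               // pushes another frame on top of this one
    // returning pops this frame; 'n' and 'local' are deallocated automatically
}

int main() {
    countdown(3);                       // frames are created and destroyed in LIFO order
    // Very deep recursion (or a huge local array) would exhaust the fixed-size stack
    // and trigger the stack overflow error mentioned in the limitations above.
    return 0;
}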

ACCESSING NONLOCAL DATA ON THE STACK:


In simple terms, when we talk about accessing variables in nested functions or scopes, there are two
concepts we need to understand: static link and dynamic link.

Static Link:

 Definition: It's like a breadcrumb trail that tells us where to find variables defined in outer
scopes when we're in a nested function or scope.

 Purpose: Helps us access variables from outer scopes (scopes that contain the current
scope).

 Utilization: When we want to use a variable from an outer scope in a nested function or
scope, we follow this breadcrumb trail until we find the scope where the variable is defined.

 Example: Imagine we have a function within another function. The static link helps us find
variables from the outer function when we're in the inner one.

Dynamic Link:

 Definition: Similar to the static link, but it helps us find variables from scopes that might
change as our program runs.

 Purpose: Allows us to access variables from scopes that could be different each time we call
a function.

 Utilization: When we call a function from different places in our program, the dynamic link
helps us find variables from the scope where the function was called.

 Example: If a function is called from different parts of the program, the dynamic link helps it
find the right variables each time it's called.

Utilization during Variable Access:

 Static Link: It helps us find variables from outer scopes by following a trail of static links until
we reach the right scope.

 Dynamic Link: It helps us find variables from scopes that might change each time a function
is called, ensuring we get the right variables no matter where the function is called from.

In summary, static link and dynamic link help us access variables from outer scopes in nested
functions or scopes. They act like guiding trails, ensuring we find the right variables no matter how
our program is structured or how it runs.
__________________________________________________________________________________

The concepts of static link and dynamic link are used in accessing nonlocal data on the stack
in the context of nested scopes, particularly in languages that support nested functions or
lexical scoping.
Static Link:
1. Definition: A static link is a pointer stored in each stack frame that points to the stack
frame of the enclosing scope or the nearest lexically enclosing function.
2. Purpose: The static link is used to access nonlocal variables or data structures defined
in outer scopes (i.e., scopes enclosing the current scope).
3. Utilization: When accessing a nonlocal variable within a nested scope, the compiler
generates code that follows the static link chain to locate the appropriate stack frame
where the desired variables reside. This involves traversing the static link chain until
reaching the stack frame that corresponds to the appropriate lexical scope.
4. Example: Consider a nested function inner defined within another function outer.
When accessing a variable x from inner, the compiler generates code that follows the
static link chain to locate the stack frame of outer, where x is defined.
Dynamic Link:
1. Definition: A dynamic link is a pointer or reference stored in each stack frame that
points to the stack frame of the caller or the dynamic scope at the time of the
function call.
2. Purpose: The dynamic link is used to access nonlocal variables or data structures
defined in the dynamic scope, which may change dynamically during program
execution.
3. Utilization: When accessing a nonlocal variable within a nested scope, the compiler
generates code that follows the dynamic link chain to locate the appropriate stack
frame where the desired variables reside. This involves traversing the dynamic link
chain until reaching the stack frame that corresponds to the appropriate dynamic
scope.
4. Example: Consider a situation where a function inner is called from multiple locations
within the program. The dynamic link allows inner to access variables defined in the
scope of its caller dynamically, regardless of the lexical nesting structure.
Utilization during Variable Access:
1. Static Link: When accessing nonlocal variables within nested scopes, the static link is
used to locate the appropriate stack frame corresponding to the lexical scope in
which the variable is defined. This involves traversing the static link chain until
reaching the correct stack frame.
2. Dynamic Link: When accessing nonlocal variables within nested scopes, the dynamic
link is used to locate the appropriate stack frame corresponding to the dynamic
scope at the time of the function call. This involves traversing the dynamic link chain
until reaching the correct stack frame.
In summary, static link and dynamic link are used to access nonlocal data on the stack within
nested scopes by providing a mechanism for locating the appropriate stack frame
corresponding to the lexical or dynamic scope in which the data is defined. These links
facilitate efficient variable access in languages that support nested functions or lexical
scoping.
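A minimal sketch, assuming a hypothetical activation-record layout, of how a run-time can follow static links to reach a nonlocal variable; the Frame structure and load_nonlocal routine are illustrative only, not the layout used by any particular compiler:

C++

#include <vector>

// Hypothetical activation record: each frame keeps a pointer to the frame of its
// lexically enclosing function (the static link) plus its own local variables.
struct Frame {
    Frame* static_link;        // frame of the enclosing scope (nullptr at the outermost level)
    std::vector<int> locals;   // this activation's local variables
};

// To read a variable defined 'depth' lexical levels out, follow the static link
// chain 'depth' times and then index into that frame's locals.
int load_nonlocal(const Frame* current, int depth, int index) {
    const Frame* f = current;
    for (int i = 0; i < depth; ++i)
        f = f->static_link;    // one hop per level of lexical nesting
    return f->locals[index];
}

int main() {
    Frame outer{nullptr, {10, 20}};  // outer function's frame with two locals
    Frame inner{&outer, {99}};       // nested function's frame; its static link points to outer
    return load_nonlocal(&inner, 1, 0) == 10 ? 0 : 1;  // reads outer's first local
}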
Heap:
https://www.codingninjas.com/studio/library/heap-management-in-compiler-design

Garbage Collection and Dynamic Memory Allocation

Garbage collection (GC) is a critical mechanism employed in run-time environments that utilize
dynamic memory allocation. It automatically identifies
and reclaims unused memory blocks, preventing memory leaks and ensuring
efficient memory utilization.
Why is Garbage Collection Important?
1. Memory Leaks: In languages with dynamic memory allocation, programmers
are responsible for explicitly releasing memory using functions like free or
delete. If not handled correctly, unused memory blocks can remain allocated,

leading to memory leaks. Over time, memory leaks can significantly impact
program performance and eventually lead to crashes.
2. Manual Memory Management Complexity: Manually managing memory
allocation and deallocation can be complex and error-prone, especially in
large and complex programs. Garbage collection simplifies memory
management by automating deallocation, reducing the risk of human error
and improving program reliability.
3. Memory Fragmentation: As objects are allocated and deallocated from the
heap, the free space becomes fragmented. This can lead to inefficiencies
when allocating new objects, as finding contiguous blocks of free space
becomes more challenging. Garbage collection may use compaction
techniques to consolidate free space, mitigating the impact of fragmentation.
Benefits of Garbage Collection:
 Prevents memory leaks: Ensures unused memory gets reclaimed, improving
program stability and performance.
 Simplifies memory management: Reduces the burden on programmers and
improves code readability.
 Mitigates fragmentation: Can help maintain efficient memory usage by
potentially compacting free space.
However, it's important to note:
 Overhead: Garbage collection adds some runtime overhead to the
program's execution due to the identification and management of unused
memory.
 Non-deterministic behavior: The timing of garbage collection cycles might
not be predictable, which can impact program performance in real-time
systems requiring strict timing constraints.
Conclusion:
Garbage collection plays a vital role in run-time environments with dynamic memory
allocation by automating memory management, preventing memory leaks, and
simplifying development. While it introduces some overhead, the benefits of
improved program reliability, reduced risk of errors, and efficient memory utilization
often outweigh these limitations for many applications.
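A tiny C++ illustration of the manual-management problem described above; the function allocates with new[] but never calls delete[], so every call leaks a block that a garbage-collected run-time would eventually have reclaimed (the names here are hypothetical):

C++

#include <cstddef>

void leaky(std::size_t n) {
    int* buffer = new int[n];   // heap allocation
    buffer[0] = 42;             // ... use the buffer ...
    // Missing: delete[] buffer; -- the pointer is lost when the function returns,
    // so the block stays allocated forever (a memory leak).
}

int main() {
    for (int i = 0; i < 1000; ++i)
        leaky(1024);            // repeated calls steadily grow the process's memory use
    return 0;
}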

Introduction to Trace-Based Garbage Collection


Trace-based garbage collection is a technique used in memory management for reclaiming
unused objects in a program. Unlike reference counting, which tracks object references
directly, trace-based collection explicitly identifies reachable objects by tracing the
program's execution and analyzing the connections between them.
Key Concepts:
 Reachable objects: Objects that are still accessible by the program and haven't gone
out of scope.
 Garbage: Objects that are no longer reachable and can be reclaimed.
 Trace: A path through the program's execution that follows references from known-
live objects to potentially reachable objects.
 Marking phase: Identifies reachable objects by recursively traversing references from
a set of roots (e.g., global variables, stack frames) and marking them as reachable.
 Sweeping phase: Scans memory for unmarked objects (garbage) and reclaims their
allocated space.
Algorithm:
1. Start: Begin with a set of roots (known live objects) like global variables or the stack.
2. Marking phase:
o Iterate over the roots and their references.
o If a referenced object hasn't been marked yet, mark it as reachable.
o Add the marked object to a queue for further exploration.
o Repeat this process for all objects in the queue until the queue is empty.
3. Sweeping phase:
o Scan all objects in memory.
o If an object is not marked, it's considered garbage and can be reclaimed.
o Deallocate the memory associated with the garbage object.
Benefits:
 Works well with cyclic data structures: Can handle objects involved in circular
references, which can be challenging for reference counting.
 Safe reclamation: Only objects that are unreachable from the roots are collected, so live
objects are never freed by mistake.
 Potentially more efficient for large objects: Can be more efficient for large objects
with relatively few references, as tracing only follows the actual connections.
Limitations:
 Can be slower for frequent object creation and deletion: The tracing process can
become expensive if there's frequent object creation and deletion, leading to more
tracing overhead.
 Memory overhead for marking bits: Requires additional memory space to store
marking information for each object.
Applications:
 Trace-based collection is widely used in managed run-time environments because, unlike
reference counting, it can reclaim objects involved in reference cycles.
 It often finds applications in languages like Java and JavaScript, where automatic
memory management is crucial.
Further Exploration:
Learning about specific implementations of trace-based collection algorithms like mark-and-
sweep or mark-compact will provide a deeper understanding of their functionalities and
trade-offs in different scenarios.
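A compact mark-and-sweep sketch in C++, assuming a deliberately simplified object model (a fixed array of objects with explicit reference lists); real collectors discover roots and manage the heap very differently, so treat this only as an illustration of the marking and sweeping phases described above:

C++

#include <vector>

struct Object {
    bool marked = false;
    bool in_use = true;                 // becomes false once the collector reclaims the object
    std::vector<Object*> refs;          // outgoing references to other objects
};

// Marking phase: recursively mark everything reachable from a root.
void mark(Object* obj) {
    if (obj == nullptr || obj->marked) return;
    obj->marked = true;
    for (Object* child : obj->refs)
        mark(child);
}

// Sweeping phase: anything left unmarked is garbage and is reclaimed.
void sweep(std::vector<Object>& heap) {
    for (Object& obj : heap) {
        if (!obj.marked)
            obj.in_use = false;         // reclaim (flagged here instead of freeing real memory)
        obj.marked = false;             // reset the mark for the next collection cycle
    }
}

void collect(const std::vector<Object*>& roots, std::vector<Object>& heap) {
    for (Object* r : roots) mark(r);    // phase 1: mark
    sweep(heap);                        // phase 2: sweep
}

int main() {
    std::vector<Object> heap(3);
    heap[0].refs.push_back(&heap[1]);   // 0 -> 1; heap[2] is unreachable
    std::vector<Object*> roots = { &heap[0] };
    collect(roots, heap);               // heap[2] is identified as garbage
    return heap[2].in_use ? 1 : 0;
}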

Here are the main points on code generation, simplified for clarity:
Challenges in Code Generation:
1. Input and Output Formats:
 Input: Intermediate code and symbol table information.
 Output: Machine language (absolute or relocatable) or assembly language.
2. Memory Management:
 Mapping names to data object addresses by both the front end and code
generator.
3. Instruction Selection:
 Choosing optimal instructions to improve program efficiency.
 Considering instruction completeness, uniformity, speeds, and machine
idioms.
4. Register Allocation:
 Efficiently utilizing registers for faster computations.
 Involves both allocation and subsequent assignment of registers.
5. Evaluation Order:
 Deciding the order of instruction execution to optimize code efficiency.
 Affects the number of registers needed for holding intermediate results; a small
worked example follows this list.
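As a worked illustration of points 3–5, here is one way a single assignment might pass through the compiler, using three-address code and a hypothetical two-register target machine in the style of standard compiler texts (the LD/ST/ADD/MUL mnemonics are assumed for the example, not a real instruction set):

Source statement:     a = b + c * d;

Three-address code:   t1 = c * d
                      t2 = b + t1
                      a  = t2

Possible target code: LD  R1, c         // instruction selection: load c into register R1
                      MUL R1, R1, d     // R1 = c * d
                      LD  R2, b
                      ADD R2, R2, R1    // R2 = b + c * d (t1 and t2 live only in registers)
                      ST  a, R2         // store the result back to memory

Here the whole statement fits in two registers; for larger expressions, the evaluation order chosen can change how many registers are needed to hold intermediate results.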
Approaches to Code Generation:
 Design goals include correctness, maintainability, testability, and efficiency.
 Code generators should handle special cases effectively and produce correct code
consistently.
Disadvantages:
1. Limited Flexibility:
 Code generators may be tailored for specific tasks or platforms, limiting their
adaptability to diverse inputs or target platforms.
2. Maintenance Overhead:
 Code generators require ongoing maintenance and updates alongside the
code they generate, potentially increasing complexity and introducing errors.
3. Debugging Difficulties:
 Debugging generated code may be challenging due to its complexity and lack
of readability compared to hand-written code.

https://www.geeksforgeeks.org/issues-in-the-design-of-a-code-generator/

Target Language:
 The target language refers to the programming language or machine language into
which the compiler translates the source code.
 It could be absolute machine language, relocatable machine language, or assembly
language.
 Absolute machine language is executable directly and can be placed in fixed
memory locations.
 Relocatable machine language allows separate compilation of subprograms and
linking by a linking loader.
 Assembly language makes code generation easier, allowing for symbolic
instructions and leveraging assembler macro facilities.
Addresses in the Target Code:
 Addresses in the target code refer to the memory locations or offsets used to
access data objects or instructions.
 The compiler maps names in the source program to addresses of data objects in
the target machine's memory.
 The symbol table provides information about the run-time addresses of data
objects.
 Relative addresses or offsets may be determined for names based on their symbol
table entries.
 Efficient memory management involves mapping names to their respective
memory addresses in the target code.
 Proper addressing ensures correct data access and execution of instructions in the
generated code.

https://chat.openai.com/share/1a47e80a-a969-4bcc-8d05-c185c3ef96e8
https://www.geeksforgeeks.org/peephole-optimization-in-compiler-design/
https://chat.openai.com/c/1d9e2cff-f465-496f-a5fd-f4caaa2ccf8e

UNIT – 2

https://www.tutorialspoint.com/compiler_design/compiler_design_top_down_parser.htm

LL(1): https://youtu.be/oSCqHq5zslw?si=sGNSyGMaQQiYO7XE

First and FolloW: https://youtu.be/DqdK8SKpajM?si=GGSL7CZ5iPf2uumf

LR(0): https://youtu.be/zaomF0mznQo?si=xNkA8YFdjFtPO3y5
Most Powerful Grammar:

LR parsers are more powerful than LL parsers due to their ability to handle a broader class of
grammars, specifically the class of LR(k) grammars, where "k" denotes the number of lookahead
symbols. There are several types of LR parsers that differ in their power and efficiency. Let's discuss
some of them:

1. Simple LR (SLR) Parser:

 SLR parsers are the simplest form of LR parsers.

 They have a parsing table constructed from the LR(0) items of the grammar.

 SLR parsers are less powerful than other LR parsers but are easier to construct and
use.

 They can handle the class of SLR(1) grammars, which includes every LR(0) grammar
but is a proper subset of the LALR(1) and LR(1) grammars.

2. Canonical LR (CLR) Parser:

 CLR parsers are based on the canonical collection of LR(1) items.

 They are more powerful than SLR parsers and can handle a larger class of
grammars.

 CLR parsers can handle any LR(1) grammar, but their construction can be more
complex.

3. LALR Parser (Look-Ahead LR Parser):


 LALR parsers are built by merging the LR(1) item sets that share the same LR(0) core
(hence "look-ahead LR").

 They are more powerful than SLR parsers but less powerful than CLR parsers.

 LALR parsers can handle a larger class of grammars than SLR parsers while being
more efficient in terms of parsing table size and construction.

 LALR parsers are widely used in practice due to their balance between power and
efficiency.
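A standard textbook illustration of the power difference is the following small grammar for assignments through pointers, which is LR(1) and LALR(1) but not SLR(1):

S -> L = R | R
L -> * R | id
R -> L

The SLR(1) construction produces a state containing the items S -> L . = R and R -> L . ; because '=' is in FOLLOW(R), the parser cannot decide between shifting '=' and reducing by R -> L, a shift/reduce conflict. The more precise lookahead information carried in LR(1) and LALR(1) items resolves the conflict, so CLR and LALR parsers accept this grammar while an SLR parser does not.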
UNIT - 1

The Role of the Lexical Analyzer:


Lexical analysis, also known as scanning or tokenization, is a fundamental phase in the compilation
process with several significant contributions to the overall functioning of a compiler:

1. Tokenization: Lexical analysis breaks down the source code into a sequence of tokens, which
are the basic building blocks of the programming language. Tokens represent meaningful
units such as keywords, identifiers, literals, and punctuation symbols. This tokenization
simplifies subsequent phases of compilation by providing a structured representation of the
code.

2. Syntax Parsing: The output of the lexical analysis phase is typically fed into the syntax parser.
The parser relies on the tokens produced by the lexical analyzer to recognize the grammatical
structure of the code according to the rules of the programming language. By providing well-
defined tokens, lexical analysis facilitates accurate parsing, enabling the compiler to
understand the syntax of the source code.

3. Error Detection: Lexical analysis plays a crucial role in detecting lexical errors in the source
code, such as misspelled identifiers, invalid characters, or misplaced symbols. By identifying
these errors early in the compilation process, the lexical analyzer helps improve the overall
reliability of the compiler and aids developers in debugging their code.

4. Whitespace and Comment Handling: During lexical analysis, whitespace characters (e.g.,
spaces, tabs, newlines) and comments are typically ignored. This simplifies subsequent
processing stages by removing irrelevant elements from the source code. Additionally,
comments may be preserved as metadata for documentation or debugging purposes.

5. Efficiency: Efficient lexical analysis is essential for overall compiler performance. The lexical
analyzer should be designed to process the input source code quickly, as it serves as the first
stage in the compilation pipeline. By efficiently tokenizing the source code, the lexical
analyzer contributes to the overall speed and responsiveness of the compiler.

6. Source Code Normalization: In some cases, lexical analysis may perform source code
normalization tasks, such as converting all characters to a common case or handling
character encoding issues. This ensures consistency and simplifies subsequent processing
stages, such as semantic analysis and code generation.

In summary, lexical analysis is a critical phase in the compilation process that transforms raw source
code into a structured sequence of tokens, facilitating accurate parsing, error detection, and efficient
processing by subsequent compiler stages. Its significance lies in its ability to lay the foundation for
the interpretation and transformation of source code into executable programs.
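A minimal hand-written tokenizer sketch in C++ to make point 1 concrete; the token categories and the tiny expression language it scans (identifiers, integer literals, a few operators) are assumptions made for the illustration, not the behaviour of Lex or any other generator:

C++

#include <cctype>
#include <iostream>
#include <string>
#include <vector>

enum class TokenKind { Identifier, Number, Operator, Unknown };

struct Token {
    TokenKind kind;
    std::string text;
};

// Scan the input left to right, skipping whitespace and grouping characters into tokens.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        char c = src[i];
        if (std::isspace(static_cast<unsigned char>(c))) { ++i; continue; }   // ignore whitespace
        if (std::isalpha(static_cast<unsigned char>(c))) {                    // identifier (or keyword)
            std::size_t start = i;
            while (i < src.size() && std::isalnum(static_cast<unsigned char>(src[i]))) ++i;
            tokens.push_back({TokenKind::Identifier, src.substr(start, i - start)});
        } else if (std::isdigit(static_cast<unsigned char>(c))) {             // integer literal
            std::size_t start = i;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            tokens.push_back({TokenKind::Number, src.substr(start, i - start)});
        } else if (c == '+' || c == '-' || c == '*' || c == '/' || c == '=') { // operators
            tokens.push_back({TokenKind::Operator, std::string(1, c)});
            ++i;
        } else {                                                               // candidate lexical error
            tokens.push_back({TokenKind::Unknown, std::string(1, c)});
            ++i;
        }
    }
    return tokens;
}

int main() {
    for (const Token& t : tokenize("count = count + 42"))
        std::cout << t.text << "\n";   // prints: count, =, count, +, 42
    return 0;
}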
Lex:
https://www.javatpoint.com/lex
Input Buffer:
https://www.geeksforgeeks.org/input-buffering-in-compiler-design/
