
UNIT 4

Syllabus UNIT - IV: Run-Time Environments: Storage Organization, Stack Allocation of Space, Access to Nonlocal Data on the Stack, Heap Management, Introduction to Garbage Collection, Introduction to Trace-Based Collection. Code Generation: Issues in the Design of a Code Generator, The Target Language, Addresses in the Target Code, Basic Blocks and Flow Graphs, Optimization of Basic Blocks, A Simple Code Generator, Peephole Optimization, Register Allocation and Assignment, Dynamic Programming Code-Generation.
========================================================================

Run Time Environment


A program written in C, C++ or Java is normally saved on the hard disk. To execute it, the program must be brought into main memory. At this juncture, the compiler demands from the operating system a block of memory for running the compiled program. This block of memory is called run-time storage, and it has four parts.
Runtime storage can be subdivided to hold:
 Code area – the program code; it is static, as its size can be determined at compile time
 Static data objects
 Dynamic data objects – heap
 Automatic data objects – stack

 Code Area: The size of the generated code is fixed, so the code occupies a statically determined area of memory. The compiler stores the code at the lower addresses of the block.
 Data objects: The space required by static data objects is known at compile time, and these objects are placed in a statically determined area of memory.
 Stack: The stack is used to manage active procedures. When a call occurs, the normal execution of the current activation is interrupted, and information about the status of the machine is saved on the stack. When control returns from the call, the suspended activation is resumed. The program counter points to the next instruction to be executed. This information is stored in the stack area of run-time storage, which conventionally grows towards the lower addresses of the free space.
 Heap: The heap is the area of run-time storage in which dynamically allocated data is stored. It conventionally grows from the opposite end of the free space, towards the higher addresses, so the stack and the heap grow towards each other.

Storage Allocation
1. Static Storage Allocation
 If memory is created at compile time, it is created in the static area.
 Memory created at compile time is created only once.
 It does not support dynamic data structures: memory is created at compile time and deallocated only after program completion.
 One drawback of static storage allocation is that recursion is not supported.
 Another drawback is that the size of the data must be known at compile time.
Example: i) int arr[200]; the size cannot be reduced at run time.
ii) int arr[10]; the size cannot be increased at run time.
 FORTRAN was designed to permit static storage allocation.
2. Stack Storage Allocation
 Storage is organised as a stack, and activation records are pushed and popped as activations begin and end respectively. Locals are contained in activation records, so they are bound to fresh storage in each activation.
 Recursion is supported in stack allocation.
3. Heap Storage Allocation
 Memory allocation and deallocation can be done at any time and at any place, depending on the requirements of the user.
 Heap allocation dynamically allocates memory to variables and deallocates it when the variables are no longer required.
 Recursion is supported. (A short C sketch contrasting the three strategies follows.)
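The following minimal C sketch illustrates all three strategies side by side; the variable and function names are illustrative only.

#include <stdio.h>
#include <stdlib.h>

int counter = 0;              /* static allocation: fixed address, exists for
                                 the whole run, size known at compile time */

int factorial(int n) {
    int local = n;            /* stack allocation: a fresh 'local' is bound in
                                 each activation, which is why recursion works */
    if (n <= 1) return 1;
    return local * factorial(n - 1);
}

int main(void) {
    int *arr = malloc(10 * sizeof *arr);   /* heap allocation: size and lifetime
                                              are decided at run time */
    if (arr == NULL) return 1;
    arr[0] = factorial(5);
    printf("%d\n", arr[0]);
    free(arr);                /* heap space is explicitly deallocated */
    return 0;
}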
Comparison among the Static, Stack and Heap Allocations
1. Purpose
   Static: allocation is done for all data objects at compile time.
   Stack: used to manage run-time storage for procedure activations.
   Heap: used to manage dynamic memory allocation.
2. Dynamic data
   Static: being static in behaviour, it cannot create dynamic data structures.
   Stack: data structures and data objects can be created dynamically.
   Heap: data structures and data objects can be created dynamically.
3. Mechanism
   Static: the names of data objects are bound to storage at compile time.
   Stack: follows the LIFO approach; activation records and data objects are pushed onto the stack.
   Heap: a contiguous block of memory from the heap is allocated for an activation record or data object; a linked list of free blocks is maintained.
4. Trade-offs
   Static: i) simple to implement, but supports only static allocation; ii) being static in nature, it does not support recursion.
   Stack: i) supports dynamic allocation, but execution is slower than with static allocation; ii) supports recursion, but references to non-local variables cannot be retained after an activation record is popped.
   Heap: i) since a free list is maintained, deallocated memory can be reused; ii) recursion is supported; iii) but since memory blocks are allocated using best fit, holes (fragmentation) may occur.

Activation Record
An activation record is another name for a stack frame. It is the data structure that composes a call stack. The contents of the activation record are listed below.

The activation record is a block of memory used for managing the information needed by a single execution of a procedure. The various fields of an activation record are:

 Temporary variables
 Local variables
 Saved machine registers
 Control link
 Access link
 Actual parameters
 Return values

Return Value: It is used by the called procedure to return a value to the calling procedure.

Actual Parameters: They are used by the calling procedure to supply parameters to the called procedure.
Control Link: It points to the activation record of the caller.
Access Link: It is used to refer to non-local data held in other activation records.
Saved Machine Status: It holds information about the status of the machine just before the procedure is called.
Local Data: It holds the data that is local to the execution of the procedure.
Temporaries: They store values that arise in the evaluation of expressions.
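As a rough illustration, an activation record can be pictured as a C struct; this layout is a simplification for exposition (field sizes are assumptions), not a fixed calling convention.

/* A simplified activation-record layout; real frames are laid out
   by the compiler and vary with the calling convention. */
struct activation_record {
    int           return_value;     /* value handed back to the caller          */
    int           actual_params[4]; /* arguments supplied by the caller         */
    void         *control_link;     /* points to the caller's activation record */
    void         *access_link;      /* points to the record holding non-local data */
    unsigned long saved_regs[8];    /* machine status saved before the call     */
    int           locals[8];        /* data local to this activation            */
    int           temporaries[4];   /* intermediate values of expressions       */
};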

Access to Non-local Names

 When a procedure refers to variables that are not local to it, such variables are called non-local variables.
 There are two types of scope rules for non-local names:
1. Static Scope Rule
2. Dynamic Scope Rule

Static Scope or Lexical Scope
 Lexical scope is also called static scope. In this type of scope, the scope of a name is determined by examining the text of the program.
Examples: PASCAL, C and ADA are languages that use the static scope rule.
 These languages are also called block-structured languages.
Block
 A block defines a new scope: a sequence of statements containing local data declarations, enclosed within delimiters.
Example
{
statements
……….
}
 The beginning and end of the block are specified by the delimiters. Blocks can nest: block B2 can be inside block B1.
 In a block-structured language, scope is given by the static rule, also called the most-closely-nested rule.
 At a program point, the visible declarations are:
the declarations made inside the procedure,
the names of all enclosing procedures, and
the declarations of names made immediately within such procedures.
 Storage for the names of each block can be allocated per block; thus, block-structured storage allocation can be done with a stack.

Lexical Scope for Nested Procedures
 If a procedure is declared inside another procedure, it is known as a nested procedure.
 A procedure p can call any procedure that is its direct ancestor, or an older sibling of one of its direct ancestors.
Procedure main
Procedure P1
Procedure P2
Procedure P3
Procedure P4

Nesting Depth
Lexical scope can be implemented using the nesting depth of a procedure. Nesting depth is calculated as follows:
The main program's nesting depth is 1.
When a new procedure begins, add 1 to the nesting depth.
When a nested procedure is exited, subtract 1 from the nesting depth.
A variable declared in a specific procedure is associated with that procedure's nesting depth.
Implementing Static Scope
 Lexical scope can be implemented using access links or displays.
Access Link
 Access links are pointers used in the implementation of lexical scope, obtained by adding a pointer to each activation record.
 If procedure p is nested immediately within procedure q, then the access link of p points to the most recent activation record of procedure q.
Example: Consider the following piece of code and the runtime stack during execution of the program
program test;
var a: int;
procedure A;
var d: int;
{
    a := 1;
}
procedure B(i: int);
var b: int;
    procedure C;
    var k: int;
    {
        A;
    }
{
    if (i <> 0) then B(i - 1)
    else C;
}
{
    B(1);
}
Displays
 If access links are used, a chain of links may have to be followed to reach a non-local name, so access can be slow.
 A display is an optimization that gives direct access to an activation record without any search: it is an array d of pointers indexed by nesting depth, where d[i] points to the most recent activation record of a procedure at depth i.

How to maintain display information?
 When a procedure p at nesting depth i is called:
Save the value of d[i] in the activation record for p.
Set d[i] to point to the new activation record.
 When p returns:
Reset d[i] to the display value saved in p's record.
The display can be maintained in:
 Registers
 Statically allocated memory (the data segment)
 The control stack, creating a new copy of the display on each entry
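A minimal C sketch of this save/restore protocol; the depth constant and record type are illustrative assumptions.

#define MAX_DEPTH 16                     /* assumed maximum nesting depth */

struct ar {                              /* minimal activation record      */
    struct ar *saved_display_entry;      /* old d[i], restored on return   */
};

static struct ar *d[MAX_DEPTH];          /* the display: d[i] points to the
                                            most recent record at depth i  */

void on_call(struct ar *new_record, int i) {
    new_record->saved_display_entry = d[i];  /* save old d[i] in the record   */
    d[i] = new_record;                       /* d[i] now names this activation */
}

void on_return(struct ar *record, int i) {
    d[i] = record->saved_display_entry;      /* reset d[i] on exit */
}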

Symbol Table is an important data structure created and maintained by the compiler in order to keep track of the semantics of variables: it stores scope and binding information about names, and information about instances of various entities such as variable and function names, classes, objects, etc.
 It is built in the lexical and syntax analysis phases.
 The information is collected by the analysis phases of the compiler and is used by the synthesis phases to generate code.
 It is used by the compiler to achieve compile-time efficiency.
 It is used by the various phases of the compiler as follows:
1. Lexical Analysis: Creates new entries in the table, for example entries about tokens.
2. Syntax Analysis: Adds information regarding attribute type, scope, dimension, line of reference, use, etc. to the table.
3. Semantic Analysis: Uses the available information in the table to check semantics, i.e. to verify that expressions and assignments are semantically correct (type checking), and updates it accordingly.
4. Intermediate Code Generation: Refers to the symbol table to know how much run-time storage is allocated and of what type; the table also helps in adding temporary-variable information.
5. Code Optimization: Uses information present in the symbol table for machine-dependent optimization.
6. Target Code Generation: Generates code by using the address information of identifiers present in the table.
 Symbol Table entries – Each entry in the symbol table is associated with attributes that support the compiler in different phases.
Items stored in Symbol table:
 Variable names and constants
 Procedure and function names
 Literal constants and strings
 Compiler generated temporaries
 Labels in source languages
Information used by the compiler from the Symbol table:
 Data type and name
 Declaring procedure
 Offset in storage
 For structures or records, a pointer to the structure table
 For parameters, whether passing is by value or by reference
 Number and type of arguments passed to a function
 Base address
Operations of Symbol table – The basic operations defined on a symbol table include

Operation      Function
allocate       to allocate a new empty symbol table
free           to remove all entries and free the storage of a symbol table
lookup         to search for a name and return a pointer to its entry
insert         to insert a name in a symbol table and return a pointer to its entry
set_attribute  to associate an attribute with a given entry
get_attribute  to get an attribute associated with a given entry

Implementation of Symbol table – The following are commonly used data structures for implementing a symbol table.
1. List
 In this method, an array is used to store names and associated information.
 A pointer "available" is maintained at the end of all stored records, and new names are added in the order in which they arrive.
 To search for a name, we scan from the beginning of the list up to the available pointer; if the name is not found we get the error "use of undeclared name".
 While inserting a new name, we must ensure that it is not already present; otherwise the error "multiple defined name" occurs.
 Insertion is fast, O(1), but lookup is slow for large tables – O(n) on average.
 The advantage is that it takes the minimum amount of space.
2. Linked List
 This implementation uses a linked list: a link field is added to each record.
 Searching of names is done in the order given by the link fields.
 A pointer "First" is maintained to point to the first record of the symbol table.
 Insertion is fast, O(1), but lookup is slow for large tables – O(n) on average.
3. Hash Table
 In the hashing scheme, two tables are maintained – a hash table and a symbol table; this is the most commonly used method to implement symbol tables. (A C sketch follows this list.)
 A hash table is an array with index range 0 to table size – 1. Its entries are pointers to entries of the symbol table.
 To search for a name, we use a hash function that maps the name to an integer between 0 and table size – 1.
 Insertion and lookup can be made very fast – O(1).
 The advantage is that quick search is possible; the disadvantage is that hashing is complicated to implement.
4. Binary Search Tree
 Another approach to implementing a symbol table is to use a binary search tree: we add two link fields, left and right child.
 Names are inserted as nodes obeying the binary-search-tree property.
 Insertion and lookup are O(log2 n) on average.
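A minimal C sketch of a chained hash-table symbol table supporting the lookup and insert operations above; the bucket count, hash function and attribute field are illustrative assumptions.

#include <stdlib.h>
#include <string.h>

#define NBUCKETS 211                  /* assumed table size (a prime) */

struct entry {
    char         *name;
    int           type;               /* an attribute, e.g. a type code */
    struct entry *next;               /* chaining resolves collisions   */
};

static struct entry *bucket[NBUCKETS];

static unsigned hash(const char *s) { /* simple multiplicative hash */
    unsigned h = 0;
    while (*s) h = h * 31 + (unsigned char)*s++;
    return h % NBUCKETS;
}

struct entry *lookup(const char *name) {      /* O(1) expected */
    for (struct entry *e = bucket[hash(name)]; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0) return e;
    return NULL;                              /* "use of undeclared name" */
}

struct entry *insert(const char *name, int type) {
    if (lookup(name) != NULL) return NULL;    /* "multiple defined name"  */
    unsigned h = hash(name);
    struct entry *e = malloc(sizeof *e);
    e->name = malloc(strlen(name) + 1);
    strcpy(e->name, name);
    e->type = type;
    e->next = bucket[h];                      /* chain onto the bucket */
    bucket[h] = e;
    return e;
}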

Heap Management
The heap is the portion of the store that is used for data that lives indefinitely, or until the program explicitly deletes it. While local variables become inaccessible when their procedures end, many languages enable the programmer to create objects or other data whose existence is not tied to the procedure activation that creates them. For example, both C++ and Java give the programmer the new operator to create objects that may be passed — or pointers to them may be passed — from procedure to procedure, so they continue to exist long after the procedure that created them is gone. Such objects are stored on a heap.
1. The Memory Manager
The memory manager keeps track of all the free space in heap storage at all times. It performs two basic functions:
• Allocation: When a program requests memory for a variable or object, the memory manager produces a chunk of contiguous heap memory of the requested size. If possible, it satisfies the allocation request using free space in the heap; if no chunk of the needed size is available, it seeks to increase the heap storage space by getting consecutive bytes of virtual memory from the operating system. If space is exhausted, the memory manager passes that information back to the application program.
• Deallocation: The memory manager returns deallocated space to the pool of free space, so it can reuse the space to satisfy other allocation requests. Memory managers typically do not return memory to the operating system, even if the program's heap usage drops.
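In C, these two functions are exposed directly as malloc and free; a small illustrative use:

#include <stdlib.h>

int main(void) {
    /* Allocation: request a contiguous chunk from the memory manager. */
    double *samples = malloc(1000 * sizeof *samples);
    if (samples == NULL)          /* space exhausted: the request failed */
        return 1;

    samples[0] = 3.14;            /* ... use the chunk ... */

    /* Deallocation: the chunk returns to the free pool for reuse. */
    free(samples);
    return 0;
}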

Here are the properties we desire of memory managers:
• Space Efficiency: A memory manager should minimize the total heap space needed by a program. Doing so allows larger programs to run in a fixed virtual address space. Space efficiency is achieved by minimizing "fragmentation."
• Program Efficiency: A memory manager should make good use of the memory subsystem to allow programs to run faster. The time taken to execute an instruction can vary widely depending on where objects are placed in memory. Fortunately, programs tend to exhibit "locality" — the non-random, clustered way in which typical programs access memory. By paying attention to the placement of objects in memory, the memory manager can make better use of space and, hopefully, make the program run faster.
• Low Overhead: Because memory allocations and deallocations are frequent operations in many programs, it is important that these operations be as efficient as possible. That is, we wish to minimize the overhead — the fraction of execution time spent performing allocation and deallocation. Notice that the cost of allocations is dominated by small requests; the overhead of managing large objects is less important, because it usually can be amortized over a larger amount of computation.
Garbage Collection
 Garbage collection is the process of finding spaces within the heap that are no longer
used by the program and can be reallocated to other data items.
 The garbage collector finds these unreachable objects and reclaims their space by
handing them to the memory manager, which keeps track of the free space.
 For languages like Java, it is the garbage collector that deallocates memory.

A Basic Requirement: Type Safety

 Not all languages are suitable for automatic garbage collection. For a garbage collector to work, it must be able to tell whether any given data element is, or contains, a pointer to allocated memory. A language in which the type of any data component can be determined is said to be type safe. There are type-safe languages, like ML, for which types can be determined at compile time. There are other type-safe languages, like Java, whose types cannot be determined at compile time but can be determined at run time; the latter are called dynamically typed languages. If a language is neither statically nor dynamically type safe, it is said to be unsafe.

Performance Metrics
 Garbage collection is often so expensive that, although it was invented decades ago
and absolutely prevents memory leaks, it has yet to be adopted by many mainstream
programming languages. Many different approaches have been proposed over the
years, and there is not one clearly best garbage-collection algorithm. Before
exploring the options, let us first enumerate the performance metrics that must be
considered when designing a garbage collector.

 Overall Execution Time: Garbage collection can be very slow, so it is important that it not significantly increase the total run time of an application. Since the garbage collector necessarily must touch a lot of data, its performance is determined greatly by how well it leverages the memory subsystem.

 Space Usage: It is important that garbage collection avoids fragmentation and makes the best use of the available memory.

 Pause Time: Simple garbage collectors are notorious for causing programs to pause suddenly for an extremely long time, as garbage collection kicks in without warning. Thus, besides minimizing the overall execution time, it is desirable that the maximum pause time be minimized.

 Program Locality: We cannot evaluate the speed of a garbage collector solely by its running time. The garbage collector controls the placement of data and thereby influences the data locality of the mutator program. It can improve a mutator's temporal locality by freeing up space and reusing it; it can improve the mutator's spatial locality by relocating data that are used together into the same cache lines or pages.

2. Reachability
 We refer to all the data that can be accessed directly by a program, without having to dereference any pointer, as the root set. A program can reach any member of its root set at any time. Recursively, any object whose reference is stored in a field or array element of a reachable object is itself reachable.

 Reachability becomes a bit more complex when the program has been optimized by the compiler. First, a compiler may keep reference variables in registers. Second, even though in a type-safe language programmers do not get to manipulate memory addresses directly, a compiler often does so for the sake of speeding up the code. An optimizing compiler must therefore cooperate with the garbage collector so that the collector can find the correct root set. It can do so in two ways:

 The compiler can restrict the invocation of garbage collection to certain code points in the program, where no hidden references exist.

 The compiler can write out information that the garbage collector can use to recover all the references, such as specifying which registers contain references, or how to compute the base address of an object given an internal address.

3. Reference Counting Garbage Collectors

Consider a simple, although imperfect, garbage collector based on reference counting: each object carries a count of the references to it, so the collector identifies garbage as an object changes from being reachable to unreachable, and the object can be deleted when its count drops to zero. Reference counts can be maintained as follows (a C sketch follows the list):

 Object Allocation. The reference count of the new object is set to 1.


 Parameter Passing. The reference count of each object passed into a procedure is
incremented.

 Reference Assignments. For statement u = v, where u and v are references, the
reference count of the object referred to by v goes up by one, and the count for the
old object referred to by u goes down by one.
 Procedure Returns. As a procedure exits, all the references held by the local variables
of that procedure activation record must also be decremented. If several local
variables hold references to the same object, that object's count must be
decremented once for each such reference.
 Transitive Loss of Reachability. Whenever the reference count of an object becomes
zero, decrement the count of each object pointed to by a reference within the object.
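A minimal C sketch of these rules; the object layout and helper names are illustrative assumptions. (Note the known imperfection: plain reference counting never reclaims cyclic garbage.)

#include <stdlib.h>

#define MAX_FIELDS 4

struct obj {
    int         refcount;
    struct obj *field[MAX_FIELDS];   /* outgoing references */
};

struct obj *new_obj(void) {          /* Object Allocation: count starts at 1 */
    struct obj *o = calloc(1, sizeof *o);
    o->refcount = 1;
    return o;
}

void inc_ref(struct obj *o) { if (o) o->refcount++; }

void dec_ref(struct obj *o) {
    if (o && --o->refcount == 0) {
        /* Transitive loss of reachability: decrement everything o points to. */
        for (int i = 0; i < MAX_FIELDS; i++)
            dec_ref(o->field[i]);
        free(o);                     /* cycles are never freed this way */
    }
}

/* Reference Assignment u = v: v's referent gains a reference,
   u's old referent loses one. */
void assign(struct obj **u, struct obj *v) {
    inc_ref(v);
    dec_ref(*u);
    *u = v;
}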

Trace-Based Collection
Instead of collecting garbage as it is created, trace-based collectors run periodically to find unreachable objects and reclaim their space. The trace-based collector typically runs when free space is exhausted or its amount drops below some threshold.
Beyond the basic scheme there are a number of improvements, including collectors in which object relocation is part of the garbage-collection function.
1. A Basic Mark-and-Sweep Collector
Mark-and-sweep garbage-collection algorithms are straightforward, stop-the-world algorithms that find all the unreachable objects and put them on the list of free space. The algorithm below visits and "marks" all the reachable objects in a first tracing step and then "sweeps" the entire heap to free up unreachable objects.

INPUT: A root set of objects, a heap, and a free list, called Free, with all the unallocated
chunks of the heap.
OUTPUT: A modified Free list after all the garbage has been removed.
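METHOD (a reconstruction of the classic mark-and-sweep procedure in C-style pseudocode, using a reached-bit per chunk; the data-structure details are assumptions):

/* Marking phase */
Unscanned = the objects referenced by the root set,
            with each one's reached-bit set to 1;
while (Unscanned is not empty) {
    remove some object o from Unscanned;
    for (each object o' referenced within o) {
        if (reached-bit of o' == 0) {
            set reached-bit of o' to 1;
            add o' to Unscanned;
        }
    }
}

/* Sweeping phase */
for (each chunk of memory o in the heap) {
    if (reached-bit of o == 0) add chunk o to Free;
    else set reached-bit of o to 0;   /* reset for the next collection */
}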

2. Basic Abstraction
All trace-based algorithms compute the set of reachable objects and then take the
complement of this set. Memory is recycled as follows:
The program runs and makes allocation requests.
The garbage collector discovers reachability by tracing.
The garbage collector reclaims the storage for unreachable objects.

The state of a chunk — Free, Unreached, Unscanned, or Scanned — is stored in the chunk itself, or it may be implicit in the data structures used by the garbage-collection algorithm. While trace-based algorithms may differ in their implementation, they can all be described in terms of the following four states:
1. Free: A chunk is in the Free state if it is ready to be allocated. Thus, a Free chunk must not hold a reachable object.
2. Unreached: Chunks are presumed unreachable unless proven reachable by tracing. A chunk is in the Unreached state at any point during garbage collection if its reachability has not yet been established.
3. Unscanned: Chunks that are known to be reachable are either in state Unscanned or state Scanned. A chunk is in the Unscanned state if it is known to be reachable but its pointers have not yet been scanned. The transition from Unreached to Unscanned occurs when we discover that a chunk is reachable.
4. Scanned: Every Unscanned object is eventually scanned and transitions to the Scanned state. To scan an object is to examine each of the pointers within it and follow those pointers to the objects to which they refer. If a reference is to an Unreached object, that object is put in the Unscanned state. When the scan of an object is complete, the object is placed in the Scanned state.

3. Optimizing Mark-and-Sweep
The final step in the basic mark-and-sweep algorithm is expensive because there is no easy way to find only the unreachable objects without examining the entire heap. An improved algorithm, due to Baker, keeps a list of all allocated objects; to find the set of unreachable objects, it subtracts the reachable objects from that list and returns the remainder to free space.
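The following is a reconstruction of Baker's mark-and-sweep in C-style pseudocode; the line numbers match the walkthrough below.

 1) Scanned = empty list;
 2) Unscanned = objects referenced by the root set
                (these are removed from the Unreached list);
 3) while (Unscanned is not empty) {
 4)     move object o from Unscanned to Scanned;
 5)     for (each object o' referenced in o) {
 6)         if (o' is Unreached)
 7)             move o' from Unreached to Unscanned;
        }
    }
 8) Free = Free ∪ Unreached;     /* unreachable chunks are reclaimed   */
 9) Unreached = Scanned;         /* all surviving objects start the next
                                    collection presumed unreachable    */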

Lines (1) and (2) initialize Scanned to be the empty list and Unscanned to have only the
objects reached from the root set. These objects were presumably on the
list Unreached and must be removed from there.
Lines (3) through (7) are a straightforward implementation of the basic marking algorithm,
using these lists.
The for-loop of lines (5) through (7) examines the references in one unscanned object o; if any referenced object o' has not yet been reached, line (7) changes o' to the Unscanned state.
At the end, line (8) takes those objects that are still on the Unreached list and deallocates
their chunks, by moving them to the Free list.
Line (9) takes all the objects in state Scanned, which are the reachable objects, and
reinitializes the Unreached list to be exactly those objects. Presumably, as the memory
manager creates new objects, those too will be added to the Unreached list and removed
from the Free list.

4. Mark-and-Compact Garbage Collectors


Relocating collectors move reachable objects around in the heap to eliminate memory
fragmentation. It is common that the space occupied by reachable objects is smaller than
the freed space. After identifying all the holes, instead of freeing them individually, one
attractive alternative is to relocate all the reachable objects into one end of the heap,
leaving the entire rest of the heap as one free chunk. After all, the garbage collector has
already analysed every reference within the reachable objects, so updating them to point

to the new locations does not require much more work. These, plus the references in the
root set, are all the references we need to change.
The mark-and-compact collector has three phases:
 First is a marking phase, similar to that of the mark-and-sweep algorithms.

 Second, the algorithm scans the allocated section of the heap and computes a new address for each of the reachable objects. New addresses are assigned from the low end of the heap, so there are no holes between reachable objects. The new address for each object is recorded in a structure called NewLocation.

 Finally, the algorithm copies objects to their new locations, updating all references in the objects to point to the corresponding new locations. The needed addresses are found in NewLocation.
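A compact reconstruction of the three phases in C-style pseudocode; details such as the implementation of the NewLocation table are assumptions.

/* Phase 1: mark, as in mark-and-sweep */
Unscanned = objects referenced by the root set; mark them;
while (Unscanned is not empty) {
    remove o from Unscanned;
    for (each object o' referenced in o)
        if (o' is unmarked) { mark o'; add o' to Unscanned; }
}

/* Phase 2: compute new addresses from the low end of the heap */
free = starting location of the heap;
for (each chunk o in the heap, lowest address first)
    if (o is marked) { NewLocation(o) = free; free = free + sizeof(o); }

/* Phase 3: copy objects and retarget references */
for (each chunk o in the heap, lowest address first)
    if (o is marked) {
        for (each reference o.r in o) o.r = NewLocation(o.r);
        copy o to NewLocation(o);
    }
for (each reference r in the root set) r = NewLocation(r);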

5. Copying collectors
A copying collector reserves, ahead of time, space to which the objects can move, thus
breaking the dependency between tracing and finding free space.
The memory space is partitioned into two semispaces, A and B. The mutator allocates
memory in one semispace, A, until it fills up, at which point the mutator is stopped and
the garbage collector copies the reachable objects to the other space, B. When garbage
collection completes, the roles of the semispaces are reversed. The mutator is allowed to
resume and allocate objects in space B and the next round of garbage collection moves
reachable objects to space A.
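A reconstruction of the copying (Cheney-style) algorithm in C-style pseudocode; the line numbers match the walkthrough below, and LookupNewLocation is the helper it relies on.

 1) CopyingCollector() {
 2)     for (all objects o in the From space) NewLocation(o) = NULL;
 3)     unscanned = free = starting address of the To space;
 4)     for (each reference r in the root set)
 5)         replace r with LookupNewLocation(r);
 6)     while (unscanned != free) {
 7)         o = object at location unscanned;
 8)         for (each reference o.r within o)
 9)             o.r = LookupNewLocation(o.r);
10)         unscanned = unscanned + sizeof(o);
        }
    }

    /* LookupNewLocation(o): if o has no new location yet, copy o to the
       address 'free' in the To space, record NewLocation(o) = free, and
       advance free by sizeof(o); in all cases return NewLocation(o). */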

Line (2) establishes that none of the objects in the From space have new addresses yet.
At line (3), two pointers, unscanned and free, are initialized to the beginning of the To semispace. Pointer free always indicates the beginning of free space within the To space; objects are added to the To space at free.
Lines (4) and (5) handle the objects reachable from the root set.
Note that, as a side effect, some of the calls to LookupNewLocation at line (5) will increase free, as chunks for these objects are allocated within To.
The loop of lines (6) through (10) will be entered the first time it is reached, unless there are no objects referenced by the root set, in which case the entire heap is garbage. This loop then scans each of the objects that has been added to To and is in the Unscanned state.
Line (7) takes the next unscanned object, o.
At lines (8) and (9), each reference within o is translated from its value in the From semispace to its value in the To semispace. Notice that, as a side effect, if a reference within o is to an object that has not been reached previously, the call to LookupNewLocation at line (9) creates space for that object in the To space and moves the object there.
Finally, line (10) increments unscanned to point to the next object, just beyond o in the To space.

Code Generation - Algorithm

Code generation is the final phase of compilation. The code generator accepts optimized intermediate code and produces machine code as output.

1. The code generator generates target code for a sequence of three-address instructions.

2. It uses a function getReg() to assign registers to the variables of the program.
3. It uses two data structures:
i) Register Descriptor: keeps track of which variable is stored in which register. Initially, all registers are empty.
ii) Address Descriptor: keeps track of the location where the current value of a variable can be found; the location may be a register, a memory address or a stack location.
The following actions are performed by the code generator for the instruction x = y op z; let L be the location where the result is to be placed.
Step 1: Call the function getReg() to determine the location L.
Step 2: Determine the present location of y by consulting the address descriptor of y. If y is not already in L, generate the instruction MOV y', L to copy the value of y into L. (Here y' denotes the current location of y, given by its address descriptor.)
Step 3: Determine the present location of z as in Step 2, and generate the instruction OP z', L.
Step 4: Now L contains the value of y op z, which is assigned to x. If L is a register, update its register descriptor to indicate that it contains x, and update the address descriptor of x to indicate that x is stored in L.
Step 5: If y and z have no further use, update the descriptors to remove them.
Example: d = (a – b) + (a – c) + (a – c)
Three Address Code
t1 = a – b
t2 = a – c
t3 = t1 + t2
d = t3 + t2

Statement       Code Generated   Register Descriptor   Address Descriptor

t1 = a - b      MOV a, R0        R0 contains t1        t1 in R0
                SUB b, R0
t2 = a - c      MOV a, R1        R0 contains t1        t1 in R0
                SUB c, R1        R1 contains t2        t2 in R1
t3 = t1 + t2    ADD R1, R0       R0 contains t3        t3 in R0
                                 R1 contains t2        t2 in R1
d = t3 + t2     ADD R1, R0       R0 contains d         d in R0

Issues in the Design of a Code Generator

1. Input to the code generator
 The input to the code generator is the intermediate code produced by the front end, along with information in the symbol table that determines the run-time addresses of the data objects denoted by the names in the intermediate representation.

 Intermediate code may be represented as quadruples, triples, indirect triples, postfix notation, syntax trees, DAGs, etc. The code generation phase proceeds on the assumption that the input is free of syntactic and semantic errors, that the necessary type checking has taken place, and that type-conversion operators have been inserted wherever necessary.

2. Target program: The target program is the output of the code generator. The output may be absolute machine language, relocatable machine language, or assembly language.
 Absolute machine language as output has the advantage that it can be placed in a fixed memory location and immediately executed.
 Relocatable machine language as output allows subprograms and subroutines to be compiled separately. Relocatable object modules can be linked together and loaded by a linking loader, but there is the added expense of linking and loading.
 Assembly language as output makes code generation easier: we can generate symbolic instructions and use the macro facilities of the assembler. However, an additional assembly step is needed after code generation.

3. Memory Management
 Mapping the names in the source program to the addresses of data objects is done cooperatively by the front end and the code generator.
 A name in a three-address statement refers to the symbol-table entry for the name. From the symbol-table entry, a relative address can be determined for the name.
4. Instruction selection
 Selecting the best instructions improves the efficiency of the program. The instruction set should be complete and uniform.
 Instruction speeds and machine idioms also play a major role when efficiency is considered. If we do not care about the efficiency of the target program, however, instruction selection is straightforward.
Eg: The three-address statements below are translated into the code sequence shown:

P = Q + R
S = P + T

MOV Q, R0
ADD R, R0
MOV R0, P
MOV P, R0
ADD T, R0
MOV R0, S
The fourth statement is redundant: it reloads P, whose value was stored by the instruction just before it. This leads to an inefficient code sequence. A given intermediate representation can be translated into many code sequences, with significant cost differences between the different implementations. Prior knowledge of instruction costs is needed in order to design good sequences, but accurate cost information is difficult to predict.
5. Register allocation: Use of registers makes computations faster than use of memory, so efficient utilization of registers is important. The use of registers is subdivided into two subproblems:
i) During register allocation, we select the set of variables that will reside in registers at each point in the program.
ii) During a subsequent register assignment phase, the specific register in which each variable resides is picked.
As the number of variables increases, the optimal assignment of registers to variables becomes difficult; mathematically, the problem is NP-complete. Certain machines also require register pairs, consisting of an even register and the next odd-numbered register.
For example, M a, b
Such multiply instructions involve register pairs, where the multiplicand a occupies the even register and the multiplier b the odd register of the even/odd register pair.
6. Evaluation order
The code generator decides the order in which the instructions are to be executed. The order of computations affects the efficiency of the target code: among the many possible computational orders, some require fewer registers to hold intermediate results. However, picking the best order in the general case is a difficult, NP-complete problem.

Target Language
Familiarity with the target machine and its instruction set is needed for designing a code
generator.
Two aspects are discussed:
1. A Simple Target Machine Model
2. Program and Instruction Costs
A Simple Target Machine Model
The target computer is modelled as a three-address machine with load and store operations, computation operations, jump operations, and conditional jumps.

 Load operations: The instruction LD dst, addr loads the value in location addr into location dst. This instruction denotes the assignment dst = addr. The most common form is LD r, x, which loads the value in location x into register r. An instruction of the form LD r1, r2 is a register-to-register copy in which the contents of register r2 are copied into register r1.

 Store operations: The instruction ST x, r stores the value in register r into the
location x. This instruction denotes the assignment x = r.

 Computation operations of the form OP dst, src1, src2, where OP is an operator

like ADD or SUB and dst, src1, and src2 are locations. The effect of this machine instruction is to apply the operation represented by OP to the values in locations src1 and src2 and place the result in location dst. Example: SUB r1, r2, r3 computes r1 = r2 - r3. Any value formerly stored in r1 is lost, but if r1 is r2 or r3, the old value is read first. Unary operators, which take only one operand, have no src2.

 Unconditional jumps: The instruction BR L causes control to branch to the


machine instruction with label L. (BR stands for branch.)

 Conditional jumps of the form Bcond r, L, where r is a register, L is a label and


cond stands for any of the common tests on values in the register r. For example,
BLTZ r, L causes a jump to label L if the value in register r is less than zero and
allows control to pass to the next machine instruction if not.
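For instance, the three-address statement x = y - z could be translated into this target code (assuming x, y and z are statically allocated memory locations):

LD  R1, y        // R1 = y
LD  R2, z        // R2 = z
SUB R1, R1, R2   // R1 = R1 - R2
ST  x, R1        // x = R1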
Program and Instruction Costs
Generally, a cost is associated with compiling and running a program. Depending
on what aspect of a program we are interested in optimizing, some common
cost measures are the length of compilation time and the size, running time and power
consumption of the target program.
For simplicity, we take the cost of an instruction to be one plus the costs associated with
the addressing modes of the operands. This cost corresponds to the length in words of the
instruction. Addressing modes involving registers have zero additional cost, while those
involving a memory location or constant in them have an additional cost of one, because
such operands have to be stored in the words following the instruction.
Some examples:
 The instruction LD R0, R1 copies the contents of register R1 into register R0. This instruction has a cost of one because no additional memory words are required.
 The instruction LD R0, M loads the contents of memory location M into register R0. The cost is two, since the address of memory location M is in the word following the instruction.

 The instruction LD R1, *100(R2) loads into register R1 the value given by contents(contents(100 + contents(R2))). The cost is three, because the constant 100 is stored in the word following the instruction.

Machine-dependent Optimization
Machine-dependent optimization is done after the target code has been generated, when the code is transformed according to the target machine architecture. It involves CPU registers and may use absolute memory references rather than relative references. Machine-dependent optimizers strive to take maximum advantage of the memory hierarchy.

Basic Blocks
 Source code generally contains runs of instructions that are always executed in sequence; such runs form the basic blocks of the code.
 Basic blocks contain no jump statements within them: when the first instruction of a block is executed, all the instructions in the same basic block are executed in their sequence of appearance, without losing flow control of the program.
Control constructs such as IF-THEN-ELSE and SWITCH-CASE conditional statements, and loops such as DO-WHILE, FOR, and REPEAT-UNTIL, partition a program into basic blocks.
Basic block identification
The following algorithm finds the basic blocks in a program (a worked example follows):
 Find the header statements of all the basic blocks, from which each basic block starts:
o The first statement of the program.
o Statements that are the target of any branch (conditional or unconditional).
o Statements that follow any branch statement.

 A header statement and the statements following it, up to the next header, form a basic block.
 A basic block does not include the header statement of any other basic block.
Basic blocks are important concepts from both the code generation and the optimization points of view.
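As a worked illustration, consider this (assumed) three-address fragment that sums the numbers 1 to 10:

(1) i = 1                 // header: first statement of the program
(2) s = 0
(3) if i > 10 goto (7)    // header: target of the branch in (6)
(4) s = s + i             // header: follows the branch statement (3)
(5) i = i + 1
(6) goto (3)
(7) halt                  // header: target of the branch in (3)

The headers are (1), (3), (4) and (7), so the basic blocks are B1 = {(1), (2)}, B2 = {(3)}, B3 = {(4), (5), (6)} and B4 = {(7)}.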

Basic blocks play an important role in identifying variables that are used more than once in a single basic block. If a variable is used more than once, the register allocated to it need not be emptied until the block finishes execution.
Control Flow Graph
 The basic blocks of a program can be represented by means of a control flow graph.
 A control flow graph depicts how program control is passed among the blocks.
 It is a useful tool that helps in optimization by helping to locate unwanted loops in the program.

Optimization of Basic Blocks

A basic block is a set of statements that are executed one by one, sequentially. There are two types of basic-block optimizations:
1. Structure-Preserving Transformations
2. Algebraic Transformations

Structure-Preserving Transformations
i). Common sub-expression elimination
ii). Dead code elimination
iii). Renaming of temporary variables
iv). Interchange of two independent adjacent statements.

i). Common sub-expression elimination

A common subexpression need not be recomputed as long as the values of its operands have not changed; it can be computed once and its value reused.

Example
a = b + c
b = a - d
c = b + c
d = a - d

The 2nd and 4th statements compute the same expression, a - d. (The occurrences of b + c in the 1st and 3rd statements are not common subexpressions, because b is redefined in between.)

The basic block can be optimized to
a = b + c
b = a - d
c = b + c
d = b // a - d was already computed into b
ii). Dead code elimination
It is possible that a large amount of dead / useless code may exist in the program. This
may be caused when introducing variables and procedures as part of construction or
error-correction of a program - once declared and defined, one forgets to remove them in
case they serve no purpose. Eliminating these will definitely optimize the code.
Example
a = 0
if ( a == 1)
a = a + 5;
In the above example, a is assigned the value 0, so the condition a == 1 can never be true. The if statement, together with its body, can therefore be eliminated.
The optimized code is: a = 0;

iii). Renaming of temporary variables


A statement t = b + c, where t is a temporary name, can be changed to u = b + c, where u is another temporary name, with all uses of t changed to u. In this way a basic block is transformed into an equivalent block called a normal-form block.

Example
t1 = b + c
t2 = a - t1
t1 = t1 * d
d = t2 + t1
Since t1 is overwritten in the third statement while its earlier value is still needed, a new temporary t3 is used instead of t1. The renamed block is
t1 = b + c
t2 = a - t1
t3 = t1 * d
d = t2 + t3

iv). Interchange of statements


Two instructions
t1 = b + c
t2 = x + y
can be interchanged or reordered in the basic block when the computation of t1 does not affect the value of t2 (and vice versa).

v). Algebraic Transformations:
Algebraic transformations represent another important class of optimizations on basic blocks. They include simplifying expressions and replacing expensive operations by cheaper ones, i.e. reduction in strength. Another class of related optimizations is constant folding: constant expressions are evaluated at compile time and replaced by their values. Thus, the expression 2 * 3.14 would be replaced by 6.28.
The relational operators <=, >=, <, >, + and = sometimes generate unexpected common subexpressions. Associative laws may also be applied to expose common subexpressions. For example, suppose the source code has the assignments
Example
t1 = a - a
t2 = b – t1
t3 = t2 * 2
In the first statement, a – a is 0, so t1 = 0 can be assigned directly.
In the second statement, since t1 = 0, no subtraction is needed: t2 = b can be assigned directly.
In the third statement, t3 can then be assigned b * 2.
So the optimized code
t3 = b * 2 is sufficient.
Peephole Optimization Techniques
Peephole optimization is a type of Code Optimization performed on a small part of the
code. It is performed on the very small set of instructions in a segment of code. The small
set of instructions or small part of code on which peephole optimization is performed is
known as peephole or window.

Goals of the technique


1. To improve performance
2. To reduce memory footprints
3. To reduce the size of the code.
Characteristics of peephole optimizations
 Redundant store and load instruction elimination
 Unreachable code elimination
 Algebraic simplifications
 Flow-of-control optimizations
 Use of machine idioms
i). Redundant store and load instruction elimination
Consider the instruction sequence
(1) MOV R0, a
(2) MOV a, R0
The second instruction is redundant: instruction (1) stores the value of R0 into a, so the value of a is already in R0, and instruction (2) merely reloads the same value. Instruction (2) can therefore be deleted (provided it has no label).

ii). Unreachable Code
An unlabelled instruction immediately following an unconditional jump may be removed. This operation can be repeated to eliminate a sequence of instructions. More generally, code guarded by a condition that can never hold is unreachable and can be removed, as in the examples below.
Example 1
Unoptimized code:
i = 0;
if (i == 1)
{ i = 0;
sum = 0;
}
Optimized code:
i = 0;
Since i is 0, the condition i == 1 can never be true, so the body of the if is unreachable and is eliminated.
Example 2
int add(int a, int b)
{
int c = a + b;
return c; // After the return statement, the printf below can never execute.
printf("%d", c);
}
iii). Flow-of-control optimization: unnecessary jumps can be eliminated.
Example: Inefficient code
.
.
goto L1
.
.
L1: goto L2
.
.
L2: goto L3
.
.
L3: MOV a, R0 // instead of this chain of jumps, goto L3 at the beginning is sufficient.
Algebraic Simplification
There is no end to the amount of algebraic simplification that can be attempted through peephole optimization, but only a few algebraic identities occur frequently enough to be worth implementing. For example, statements such as
x = x + 0 or
x=x*1
are often produced by straightforward intermediate code-generation algorithms and they
can be eliminated easily through peephole optimization.
Use of Idioms
The target machine may have hardware instructions to implement certain specific
operations efficiently. For example, some machines have auto-increment and auto-decrement addressing modes. These add or subtract one from an operand before or after
using its value. The use of these modes greatly improves the quality of code when pushing
or popping a stack, as in parameter passing. These modes can also be used in code for
statements like i = i + 1.

i=i+1 → i ++
i=i-1 → i--

Register Allocation and Assignment


Instructions involving only register operands are faster than those involving memory
operands. Efficient utilization of registers is important in generating good code.
One approach to register allocation and assignment is to assign specific values in the
target program to certain registers.
Example: To assign base addresses to one group of registers, arithmetic computations to
another, the top of the stack to a fixed register, and so on.
1. Global Register Allocation
 Global register allocation tries to keep frequently used variables in registers consistently across block boundaries. Since programs spend most of their time in inner loops, a natural approach is to keep a frequently used value in a fixed register throughout a loop.
 One strategy for global register allocation is to assign some fixed number of registers
to hold the most active values in each inner loop. The selected values may be
different in different loops.
2. Usage Counts
Assume that the savings realized by keeping a variable x in a register for the duration of a loop L is one unit of cost for each reference to x, provided x is already in a register. Thus, count a savings of one for each use of x in loop L that is not preceded by an assignment to x in the same block. Also, if x is allocated a register, count a savings of two for each block in loop L for which x is live on exit and in which x is assigned a value. The approximate benefit of keeping x in a register throughout loop L is therefore

    savings(x) = Σ (B in L) [ use(x, B) + 2 * live(x, B) ]

where use(x, B) counts the uses of x in block B not preceded by an assignment to x in B, and live(x, B) is 1 if x is live on exit from B and assigned a value in B, and 0 otherwise.
Example: Consider the basic blocks of an inner loop L (the flow graph is omitted here), where jump and conditional-jump statements have been left out. Assume registers R0, R1 and R2 are allocated to hold values throughout the loop, and that the variables live on entry to and on exit from each block are known.
For example, suppose both e and f are live at the end of B1, but only e is live on entry to B2 and only f on entry to B3. In general, the variables live at the end of a block are the union of those live at the beginning of each of its successor blocks.

To evaluate the formula for x = a: a is live on exit from B1 and is assigned a value there, but is not live on exit from B2, B3, or B4, so Σ(B in L) 2 * live(a, B) = 2; furthermore Σ(B in L) use(a, B) = 2. Hence the savings for x = a is 2 + 2 = 4; that is, four units of cost can be saved by selecting a for one of the global registers. The corresponding values for b, c, d, e and f are 5, 3, 6, 4, and 4, respectively. Thus, we may select a, b, and d for registers R0, R1, and R2, respectively. Using R0 for e or f instead of a would be another choice with the same benefit.

Assembly code is then generated for each block using the chosen global registers. The conditional and unconditional jumps that end each block are omitted, so the generated code is not shown as a single stream.

Dynamic Programming Code-Generation
 The dynamic programming algorithm can be used to generate code for any machine with r interchangeable registers R0, R1, ..., Rr-1 and load, store and add instructions. For simplicity, assume every instruction costs one unit, although the algorithm can easily be modified to work even if each instruction has its own cost.
The dynamic programming algorithm proceeds in three phases:
1. Compute bottom-up, for each node n of the expression tree T, an array C of costs, in which the ith component C[i] is the optimal cost of computing the subtree rooted at n into a register, assuming i registers are available for the computation, for 1 ≤ i ≤ r.
2. Traverse T, using the cost vectors to determine which subtrees of T must be
computed into memory.
3. Traverse each tree using the cost vectors and associated instructions to generate
the final target code. The code for the subtrees computed into memory locations is
generated first.
Example: Consider a machine having two registers R0 and R1, and the following instructions, each of unit cost:
LD Ri, Mj // Ri = Mj
op Ri, Ri, Rj // Ri = Ri op Rj

op Ri, Ri, Mj // Ri = Ri op Mj
LD Ri, Rj // Ri = Rj
ST Mi, Rj // Mi = Rj
In these instructions, Ri is either R0 or R1, and Mj is a memory location. The operator op corresponds to an arithmetic operator.
We apply the dynamic programming algorithm to generate optimal code for the syntax tree of the expression (a - b) + c * (d / e) (the tree diagram is omitted here).
 In the first phase, we compute the cost vector at each node. To illustrate this cost computation, consider the cost vector at the leaf a. C[0], the cost of computing a into memory, is 0, since it is already there.
 C[1], the cost of computing a into a register, is 1, since we can load it with the instruction LD R0, a. C[2], the cost of loading a into a register with two registers available, is the same as with one register available.
 The cost vector at leaf a is therefore (0, 1, 1).

 Consider the cost vector at the root. First we determine the minimum cost of computing the root with one and with two registers available.
 The machine instruction ADD R0, R0, M matches the root, because the root is labelled with the operator +.
 Using this instruction, the minimum cost of evaluating the root with one register available is the minimum cost of computing its right subtree into memory, plus the minimum cost of computing its left subtree into the register, plus 1 for the instruction.
 No other way exists. The cost vectors at the right and left children of the root show that the minimum cost of computing the root with one register available is 5 + 2 + 1 = 8.
Now consider the minimum cost of evaluating the root with two registers available. Three cases arise, depending on which instruction is used to compute the root and the order in which the left and right subtrees of the root are evaluated.
1. Compute the left subtree with two registers available into register R0, compute the right subtree with one register available into register R1, and use the instruction ADD R0, R0, R1 to compute the root. This sequence has cost 2 + 5 + 1 = 8.
2. Compute the right subtree with two registers available into R1, compute the left subtree with one register available into R0, and use the instruction ADD R0, R0, R1. This sequence has cost 4 + 2 + 1 = 7.
3. Compute the right subtree into memory location M, compute the left subtree with two registers available into register R0, and use the instruction ADD R0, R0, M. This sequence has cost 5 + 2 + 1 = 8.
 The second choice gives the minimum cost, 7.
From the cost vectors we can easily construct the code sequence by making a traversal of the tree. Assuming two registers are available, an optimal code sequence is

LD R0, c // R0 = c
LD R1, d // R1 = d
DIV R1, R1, e // R1 = R1 / e
MUL R0, R0, R1 // R0 = R0 * R1
LD R1, a // R1 = a
SUB R1, R1, b // R1 = R1 - b
ADD R1, R1, R0 // R1 = R1 + R0
Dynamic programming techniques have been used in a number of compilers, including the second version of the portable C compiler, PCC2. The technique facilitates retargeting because it applies to a broad class of machines.

