CD Unit 4
Code Area: The size of the generated code is fixed, so the code occupies a statically
determined area of memory. The compiler stores the code in the lower addresses of
this block.
Data objects: The space required by the data objects is known at compile time, so the
data objects are also placed in a statically determined area of memory.
Stack: The stack is used to manage the active procedures. When a call occurs, normal
execution of the current activation is interrupted, and information about the status of
that activation is saved on the stack. When control returns from the call, the suspended
activation is resumed. The program counter points to the next instruction to be executed.
This information is kept in the stack area of run-time storage, which grows towards the
higher addresses of the free space.
Heap: The heap is the area of run-time storage in which the remaining information, such
as dynamically allocated data, is stored. It grows towards the lower addresses of the free
space.
Storage Allocation
1. Static Storage Allocation
If memory for a program is created at compile time, it is created in the static area, and
it is created only once.
Static allocation does not support dynamic data structures: memory is created at compile
time and deallocated only after the program completes.
One drawback of static storage allocation is that recursion is not supported.
Another drawback is that the size of every data object must be known at compile time.
Example: i) int arr[200]; the size cannot be reduced at run time.
ii) int arr[10]; the size cannot be increased at run time.
FORTRAN was designed to permit static storage allocation.
2. Stack Storage Allocation
Storage is organised as a stack: activation records are pushed as activations begin and
popped as they end. Locals are contained in activation records, so they are bound to
fresh storage in each activation.
Recursion is supported in stack allocation.
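As a small illustration in C (a sketch, not part of the original notes): each recursive call to fact below pushes a fresh activation record, so every activation gets its own copies of n and result.

#include <stdio.h>

/* Each call pushes a new activation record holding n and result,
   so every recursive activation has its own fresh locals. */
int fact(int n) {
    int result;              /* bound to fresh storage in each activation */
    if (n <= 1)
        result = 1;
    else
        result = n * fact(n - 1);
    return result;           /* the activation record is popped on return */
}

int main(void) {
    printf("%d\n", fact(5)); /* prints 120 */
    return 0;
}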
3. Heap Storage Allocation
Memory allocation and deallocation can be done at any time and at any place depending
on the requirement of the user.
Heap allocation is used to allocate memory to variables dynamically, and to deallocate
it when the variables are no longer required.
Recursion is supported.
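In C, for example, heap allocation and deallocation are explicit (a minimal sketch):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int n = 10;                          /* size decided at run time */
    int *arr = malloc(n * sizeof *arr);  /* memory allocated from the heap */
    if (arr == NULL)
        return 1;                        /* allocation request failed */
    for (int i = 0; i < n; i++)
        arr[i] = i * i;
    printf("%d\n", arr[n - 1]);
    free(arr);                           /* deallocated when no longer required */
    return 0;
}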
Comparison among the Static, Stack and Heap Allocations
1. Static: storage for all data objects is allocated at compile time.
Stack: the stack is used to manage run-time storage.
Heap: the heap is used to manage dynamic memory allocation.
2. Static: because allocation is fixed at compile time, dynamic data structures cannot be
created.
Stack: data structures and data objects can be created dynamically.
Heap: data structures and data objects can be created dynamically.
3. Static: the names of data objects are bound to storage at compile time.
Stack: the stack follows a LIFO approach; activation records and data objects are pushed
onto the stack.
Heap: a contiguous block of memory from the heap is allocated for each activation record
or data object, and a linked list of free blocks is maintained.
4. Static: this strategy is simple to implement but supports only static allocation; being
static in nature, it does not support recursion.
Stack: supports dynamic allocation, though execution is slower than with static
allocation; supports recursion, but references to non-local variables after an activation
record is popped cannot be retained.
Heap: since a linked list of free blocks is maintained, deallocated memory can be reused;
recursion is supported; but since memory blocks are allocated using best fit, holes
(fragmentation) may occur.
Activation Record
An activation record is another name for Stack Frame. It's the data structure that
composes a call stack. The content of the activation record is shown below:
The activation record is a block of memory used for managing the information needed by
a single execution of a procedure. The various fields of an activation record are:
Temporary variables
Local variables
Saved machine registers
Control link
Access link
Actual parameters
Return values
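The layout can be pictured as a C struct. This is only an illustrative sketch: the field names, types, and sizes below are invented, since the actual layout is fixed by the compiler and the calling convention.

/* Illustrative sketch of an activation record (hypothetical layout). */
struct activation_record {
    int   return_value;       /* space for the value returned to the caller  */
    int   actual_params[4];   /* actual parameters supplied by the caller    */
    void *access_link;        /* record of the lexically enclosing procedure */
    void *control_link;       /* record of the calling procedure             */
    long  saved_registers[8]; /* machine registers saved across the call     */
    int   locals[8];          /* local variables of this activation          */
    int   temporaries[4];     /* compiler-generated temporaries              */
};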
Static Scope or Lexical Scope
Lexical scope is also called static scope. With this kind of scoping, the scope of a name
is determined by examining the text of the program.
Examples: PASCAL, C and ADA are the languages that use the static scope rule.
These languages are also called block structured languages
Block
A block defines a new scope with a sequence of statements that contains the local
data declarations. It is enclosed within the delimiters.
Example
{
statements
……….
}
The beginning and end of the block are specified by the delimiter. The blocks can be
in nesting fashion that means block B2 can be inside the block B1.
In a block-structured language, the scope of a declaration is given by the static rule,
also called the most closely nested rule.
At a program point, the following declarations are visible:
the declarations made inside the current procedure;
the names of all enclosing procedures;
the declarations of names made immediately within such procedures.
A figure (omitted here) shows the storage for the names corresponding to each block.
Thus, block-structured storage allocation can be done with a stack.
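For example, in C (a small sketch), names follow the blocks in which they are declared:

#include <stdio.h>

int main(void) {
    int a = 1;              /* declared in block B1: visible in B1 and B2 */
    {                       /* block B2, nested inside B1 */
        int b = 2;          /* declared in B2: visible only inside B2 */
        printf("%d %d\n", a, b);
    }                       /* b's storage can be released here */
    printf("%d\n", a);      /* only a is still in scope */
    return 0;
}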
Lexical Scope for Nested Procedure
If a procedure is declared inside another procedure, it is known as a nested procedure.
A procedure p can call any procedure visible to it, i.e., its direct ancestors and the
older siblings of its direct ancestors.
Procedure main
Procedure P1
Procedure P2
Procedure P3
Procedure P4
Nesting Depth
Lexical scope can be implemented by using nesting depth of a procedure. The procedure
of calculating nesting depth is as follows:
The nesting depth of the main program is 1.
Each time a new procedure begins, add 1 to the nesting depth.
Each time you exit from a nested procedure, subtract 1 from the nesting depth.
A variable declared in a given procedure is associated with that procedure's nesting depth.
Static Scope or Lexical Scope
Lexical scope can be implemented using access links or displays.
Access Link
Access links are pointers used in the implementation of lexical scope, obtained by adding
a pointer field to each activation record.
If procedure p is nested immediately within procedure q, then the access link in an
activation record of p points to the most recent activation record of procedure q.
Example: Consider the following piece of code and the run-time stack during its
execution:
program test;
var a: int;
procedure A;
var d: int;
{
    a := 1;
}
procedure B(i: int);
var b: int;
    procedure C;
    var k: int;
    {
        A;
    }
{
    if (i <> 0) then B(i - 1)
    else C;
}
{
    B(1);
}
During the execution of B(1), the run-time stack holds activations of test, B(1), B(0),
C, and A. Since A and B are nested in test, the access links of A, B(1), and B(0) point
to the record of test; since C is nested in B, the access link of C points to the most
recent activation record of B, namely B(0).
Displays
If access links are used, the search for a non-local name can be slow.
A display is used as an optimization, to reach an activation record directly, without any
search: it is an array d of pointers, where d[i] points to the most recent activation
record of a procedure at nesting depth i.
When a procedure p at nesting depth i is called, its record is set up as follows:
Save the current value of d[i] in the new activation record for p.
Set d[i] to point to the new activation record.
When p returns:
Reset d[i] to the display value stored in its record.
The display can be maintained:
in registers;
in statically allocated memory (the data segment);
or on the control stack, creating a new copy on each entry.
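A minimal C sketch of display maintenance (the frame type, the depth bound, and the function names are assumptions made for illustration):

#define MAX_DEPTH 8

struct frame {
    struct frame *saved_display;   /* the d[i] value saved on entry */
    /* ... locals, temporaries ... */
};

/* d[i] points to the most recent activation record at nesting depth i. */
static struct frame *d[MAX_DEPTH];

/* On entry to a procedure p at nesting depth i. */
void enter(struct frame *rec, int i) {
    rec->saved_display = d[i];     /* save the old value of d[i] */
    d[i] = rec;                    /* d[i] now points to the new record */
}

/* On return from p: restore the display entry saved on entry. */
void leave(struct frame *rec, int i) {
    d[i] = rec->saved_display;
}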
Symbol Table
The symbol table is an important data structure created and maintained by the compiler
in order to keep track of the semantics of variables: it stores scope and binding
information about names, and information about instances of various entities such as
variables, function names, classes, and objects.
It is built in the lexical and syntax analysis phases.
The information is collected by the analysis phases of the compiler and is used by the
synthesis phases to generate code.
It is used by the compiler to achieve compile-time efficiency.
It is used by the various phases of the compiler as follows:
1. Lexical Analysis: creates new entries in the table, for example entries for tokens.
2. Syntax Analysis: adds information regarding attribute type, scope, dimension, line of
reference, use, etc. to the table.
3. Semantic Analysis: uses the information available in the table to verify that
expressions and assignments are semantically correct (type checking), and updates the
table accordingly.
4. Intermediate Code Generation: refers to the symbol table to know how much run-time
storage is allocated and of what type; the table also helps in adding temporary variable
information.
5. Code Optimization: uses information present in the symbol table for machine-dependent
optimization.
6. Target Code Generation: generates code using the address information of the
identifiers present in the table.
Symbol Table entries – Each entry in the symbol table is associated with attributes that
support the compiler in its different phases.
Items stored in Symbol table:
Variable names and constants
Procedure and function names
Literal constants and strings
Compiler generated temporaries
Labels in source languages
Information used by compiler from Symbol table:
Data type and name
Declaring procedures
Offset in storage
If the name is a structure or record, a pointer to the structure table.
For parameters, whether the parameter is passed by value or by reference.
Number and type of arguments passed to function
Base Address
Operations of Symbol table – The basic operations defined on a symbol table include
Operation: Function
allocate: allocate a new empty symbol table
free: remove all entries and free the storage of a symbol table
lookup: search for a name and return a pointer to its entry
insert: insert a name into the symbol table and return a pointer to its entry
set_attribute: associate an attribute with a given entry
get_attribute: get the attribute associated with a given entry
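A minimal sketch of these operations in C, using a simple linked list (the entry layout and the single integer attribute are illustrative assumptions; real tables usually hash):

#include <stdlib.h>
#include <string.h>

struct entry {
    char         *name;
    int           attribute;   /* e.g., a type code or an offset */
    struct entry *next;
};

struct symtab { struct entry *head; };

struct symtab *allocate(void) {            /* a new empty symbol table */
    return calloc(1, sizeof(struct symtab));
}

void free_table(struct symtab *t) {        /* remove all entries, free storage */
    for (struct entry *e = t->head; e != NULL; ) {
        struct entry *next = e->next;
        free(e->name);
        free(e);
        e = next;
    }
    free(t);
}

struct entry *lookup(struct symtab *t, const char *name) {
    for (struct entry *e = t->head; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;                      /* pointer to its entry */
    return NULL;                           /* name not present */
}

struct entry *insert(struct symtab *t, const char *name) {
    struct entry *e = calloc(1, sizeof *e);
    e->name = strdup(name);
    e->next = t->head;                     /* prepend to the list */
    t->head = e;
    return e;
}

void set_attribute(struct entry *e, int a) { e->attribute = a; }
int  get_attribute(struct entry *e)        { return e->attribute; }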
Heap Management
The heap is the portion of the store that is used for data that lives indefinitely, or
until the program explicitly deletes it. While local variables become inaccessible when
their procedures end, many languages enable the programmer to create objects or other
data whose existence is not tied to the procedure activation that creates them. For
example, both C++ and Java provide the operator new to create objects that may be
passed, or pointers to them may be passed, from procedure to procedure, so they continue
to exist long after the procedure that created them is gone. Such objects are stored on
a heap.
1. The Memory Manager
The memory manager keeps track of all the free space in heap storage at all times. It
performs two basic functions:
• Allocation: When a program requests memory for a variable or object, the memory
manager produces a chunk of contiguous heap memory of the requested size. If possible,
it satisfies the request using free space in the heap; if no chunk of the needed size is
available, it seeks to increase the heap storage space by getting consecutive bytes of
virtual memory from the operating system. If space is exhausted, the memory manager
passes that information back to the application program.
• Deallocation: The memory manager returns deallocated space to the pool of free space,
so that it can be reused to satisfy other allocation requests. Memory managers typically
do not return memory to the operating system, even if the program's heap usage drops.
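The sketch below shows, in C, one common way a memory manager can track free space: a linked free list searched first-fit. It is only an illustration; chunk splitting, coalescing of neighbours, and requests to the operating system are all omitted, and requests are assumed to be at least sizeof(struct chunk) bytes.

#include <stddef.h>

/* Each free chunk records its size and links to the next free chunk. */
struct chunk {
    size_t        size;
    struct chunk *next;
};

static struct chunk *free_list;    /* the pool of free heap space */

/* Allocation: first-fit search for a chunk of at least the needed size.
   Returns NULL when no chunk fits (a real manager would then request
   more virtual memory from the operating system). */
void *allocate(size_t size) {
    for (struct chunk **p = &free_list; *p != NULL; p = &(*p)->next) {
        if ((*p)->size >= size) {
            struct chunk *c = *p;
            *p = c->next;          /* unlink the chunk from the free list */
            return c;
        }
    }
    return NULL;
}

/* Deallocation: return the chunk to the pool of free space so that it
   can be reused to satisfy later allocation requests. */
void deallocate(void *ptr, size_t size) {
    struct chunk *c = ptr;
    c->size = size;
    c->next = free_list;
    free_list = c;
}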
• Program Efficiency: A memory manager should make good use of the memory subsystem
to allow programs to run faster. As we shall see in Section 7.4.2, the time taken to execute an
instruction can vary widely depending on where objects are placed in memory. Fortunately,
programs tend to exhibit "locality," a phenomenon discussed in Section 7.4.3, which refers
to the non-random, clustered way in which typical programs access memory. By paying
attention to the placement of objects in memory, the memory manager can make better use
of space and, hopefully, make the program run faster.
• Low Overhead. Because memory allocations and deallocations are frequent operations in
many programs, it is important that these operations be as efficient as possible. That is, we
wish to minimize the overhead — the fraction of execution time spent performing allocation
and deallocation. Notice that the cost of allocations is dominated by small requests; the
overhead of managing large objects is less important, because it usually can be amortized over
a larger amount of computation.
Garbage Collection
Garbage collection is the process of finding spaces within the heap that are no longer
used by the program and can be reallocated to other data items.
The garbage collector finds these unreachable objects and reclaims their space by
handing them to the memory manager, which keeps track of the free space.
For languages like Java, it is the garbage collector that deallocates memory.
Performance Metrics
Garbage collection is often so expensive that, although it was invented decades ago
and absolutely prevents memory leaks, it has yet to be adopted by many mainstream
programming languages. Many different approaches have been proposed over the
years, and there is not one clearly best garbage-collection algorithm. Before
exploring the options, let us first enumerate the performance metrics that must be
considered when designing a garbage collector.
Overall Execution Time: Garbage collection can be very slow, so it is important that it
does not significantly increase the total run time of an application. Since the garbage
collector necessarily must touch a lot of data, its performance is determined greatly by
how it leverages the memory subsystem.
Space Usage: It is important that garbage collection avoid fragmentation and make the
best use of the available memory.
Pause Time: Simple garbage collectors are notorious for causing programs to pause
suddenly for an extremely long time, as garbage collection kicks in without warning.
Thus, besides minimizing the overall execution time, it is desirable that the maximum
pause time also be minimized.
Program Locality. We cannot evaluate the speed of a garbage collector solely by its
running time. The garbage collector controls the placement of data and influences
the data locality of the mutator program. It can improve a mutator's temporal
locality by freeing up space and reusing it; it can improve the mutator's spatial
locality by relocating data used together in the same cache or pages.
2. Reachability
All the data that can be accessed directly by a program, without having to dereference
any pointer, is known as the root set. A program can reach any member of its root set at
any time. Recursively, any object whose reference is stored in a field or array element
of a reachable object is itself reachable.
Reachability becomes a bit more complex when the program has been optimized by the
compiler. First, a compiler may keep reference variables in registers. Second, even
though in a type-safe language programmers do not get to manipulate memory addresses
directly, a compiler often does so for the sake of speeding up the code. An optimizing
compiler must therefore enable the garbage collector to find the correct root set:
The compiler can restrict the invocation of garbage collection to certain code points in
the program, where no hidden references exist.
The compiler can write information that the garbage collector can use to recover all
the references, such as specifying which registers contain references, or how to
compute the base address of an object that is given an internal address.
Reference-Counting Garbage Collection
In a reference-counting collector, every object carries a count of the references to it;
the counts are maintained as follows.
Reference Assignments. For a statement u = v, where u and v are references, the
reference count of the object referred to by v goes up by one, and the count for the old
object referred to by u goes down by one.
Procedure Returns. As a procedure exits, all the references held by the local variables
of that procedure activation record must also be decremented. If several local
variables hold references to the same object, that object's count must be
decremented once for each such reference.
Transitive Loss of Reachability. Whenever the reference count of an object becomes
zero, decrement the count of each object pointed to by a reference within the object.
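These rules can be sketched in C as follows (the object layout and the names inc_ref, dec_ref, and assign are invented for illustration):

#include <stdlib.h>

struct obj {
    int         refcount;
    struct obj *field[2];        /* references held inside the object */
};

void inc_ref(struct obj *o) { if (o != NULL) o->refcount++; }

/* When a count reaches zero, transitively decrement the counts of the
   objects referenced from inside the dead object, then reclaim it. */
void dec_ref(struct obj *o) {
    if (o != NULL && --o->refcount == 0) {
        for (int i = 0; i < 2; i++)
            dec_ref(o->field[i]);
        free(o);
    }
}

/* Reference assignment u = v: v's referent gains one reference,
   u's old referent loses one. */
void assign(struct obj **u, struct obj *v) {
    inc_ref(v);
    dec_ref(*u);
    *u = v;
}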
Trace-Based Collection
Instead of collecting garbage as it is created, trace-based collectors run periodically
to find unreachable objects and reclaim their space. A trace-based collector typically
runs when the free space is exhausted or its amount drops below some threshold.
A number of improvements on the basic scheme exist, including collectors in which object
relocation is part of the garbage-collection function.
1. A Basic Mark-and-Sweep Collector
Mark-and-sweep garbage-collection algorithms are straightforward, stop-the-world
algorithms that find all the unreachable objects and put them on the list of free space.
The algorithm below visits and "marks" all the reachable objects in a first, tracing
step, and then "sweeps" the entire heap to free up the unreachable objects.
INPUT: A root set of objects, a heap, and a free list, called Free, with all the unallocated
chunks of the heap.
OUTPUT: A modified Free list after all the garbage has been removed.
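The algorithm itself is not reproduced in these notes, but the two steps can be sketched in C as follows, assuming fixed-size objects that are all linked on one heap list (an illustration, not the book's pseudocode):

#include <stdbool.h>
#include <stddef.h>

#define NFIELDS 2

struct object {
    bool           marked;
    struct object *field[NFIELDS];  /* outgoing references               */
    struct object *next;            /* links every allocated heap object */
};

/* Tracing step: mark every object reachable from a root. */
void mark(struct object *o) {
    if (o == NULL || o->marked)
        return;
    o->marked = true;
    for (int i = 0; i < NFIELDS; i++)
        mark(o->field[i]);
}

/* Sweeping step: walk the entire heap list; unmarked objects are
   unreachable and are moved to the Free list, while the marks of the
   surviving objects are cleared for the next collection. */
struct object *sweep(struct object **heap, struct object *free_list) {
    struct object **p = heap;
    while (*p != NULL) {
        struct object *o = *p;
        if (o->marked) {
            o->marked = false;      /* reachable: keep it */
            p = &o->next;
        } else {
            *p = o->next;           /* unreachable: unlink from the heap */
            o->next = free_list;    /* and hand its chunk to Free */
            free_list = o;
        }
    }
    return free_list;
}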
2. Basic Abstraction
All trace-based algorithms compute the set of reachable objects and then take the
complement of this set. Memory is recycled as follows:
The program runs and makes allocation requests.
The garbage collector discovers reachability by tracing.
The garbage collector reclaims the storage for unreachable objects.
3. Optimizing Mark-and-Sweep
The final step in the basic mark-and-sweep algorithm is expensive, because there is no
easy way to find the unreachable objects without examining the entire heap. An improved
algorithm, due to Baker, keeps a list of all allocated objects; the set of unreachable
objects, which is returned to free space, is then obtained as the difference between the
allocated objects and the reached objects.
Lines (1) and (2) initialize Scanned to be the empty list and Unscanned to have only the
objects reached from the root set. These objects were presumably on the
list Unreached and must be removed from there.
Lines (3) through (7) are a straightforward implementation of the basic marking algorithm,
using these lists.
The for-loop of lines (5) through (7) examines the references in one unscanned object o;
for each such reference o' that has not yet been reached, line (7) changes o' to the
Unscanned state.
At the end, line (8) takes those objects that are still on the Unreached list and deallocates
their chunks, by moving them to the Free list.
Line (9) takes all the objects in state Scanned, which are the reachable objects, and
reinitializes the Unreached list to be exactly those objects. Presumably, as the memory
manager creates new objects, those too will be added to the Unreached list and removed
from the Free list.
4. Mark-and-Compact
A mark-and-compact collector relocates the reachable objects toward one end of the heap,
eliminating the holes between them. Since the collector computes a new location for every
reachable object anyway, updating the references held within those objects to the new
locations does not require much more work; these, plus the references in the root set,
are all the references we need to change.
The mark-and-compact collector in Algorithm 7.15 has three phases:
First is a marking phase, similar to that of the mark-and-sweep algorithms.
Second, the algorithm scans the allocated section of the heap and computes a new
address for each of the reachable objects. New addresses are assigned from the low
end of the heap, so there are no holes between reachable objects. The new address
for each object is recorded in a structure called NewLocation.
Finally, the algorithm copies objects to their new locations, updating all references
in the objects to point to the corresponding new locations. The needed addresses
are found in NewLocation.
5. Copying collectors
A copying collector reserves, ahead of time, space to which the objects can move, thus
breaking the dependency between tracing and finding free space.
The memory space is partitioned into two semispaces, A and B. The mutator allocates
memory in one semispace, A, until it fills up, at which point the mutator is stopped and
the garbage collector copies the reachable objects to the other space, B. When garbage
collection completes, the roles of the semispaces are reversed. The mutator is allowed to
resume and allocate objects in space B and the next round of garbage collection moves
reachable objects to space A.
Line (2) establishes that none of the objects in the From space have new addresses yet.
At line (3), two pointers, unscanned and free, are initialized to the beginning of the
To semispace. Pointer free always indicates the beginning of the free space within the
To space, where objects are placed as they are copied.
Lines (4) and (5) handle the objects reachable from the root set.
Note that as a side effect, some of the calls to LookupNewLocation at line (5) will increase
free, as chunks for these objects are allocated within To.
The loop of lines (6) through (10) will be entered the first time it is reached, unless there
are no objects referenced by the root set, in which case the entire heap is garbage. This
loop then scans each of the objects that has been added to To and is in the Unscanned
state.
Line (7) takes the next unscanned object, o.
At lines (8) and (9), each reference within o is translated from its value in the From
semispace to its value in the To semispace. Notice that, as a side effect, if a reference
within o is to an object that has not been reached previously, the call to
LookupNewLocation at line (9) creates space for that object in the To space and moves
the object there.
Finally, line (10) increments unscanned to point to the next object, just beyond o in the
To space.
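A compact C sketch of the copying collector described above; it assumes fixed-size objects and uses a forwarding pointer stored in each object in place of a separate NewLocation table (an illustration, not the book's pseudocode):

#include <string.h>

#define NFIELDS 2

struct object {
    struct object *forward;         /* new address in To once copied; NULL before */
    struct object *field[NFIELDS];
};

static char *to_space;              /* start of the To semispace (assumed aligned) */
static char *free_ptr, *unscanned;  /* the free and unscanned pointers into To */

/* Simplified LookupNewLocation: on first sight, copy o into To at free_ptr
   and leave a forwarding pointer; afterwards just return the new address. */
static struct object *lookup_new_location(struct object *o) {
    if (o == NULL)
        return NULL;
    if (o->forward == NULL) {                 /* not yet copied */
        memcpy(free_ptr, o, sizeof *o);       /* allocate its chunk in To */
        o->forward = (struct object *)free_ptr;
        free_ptr += sizeof *o;
    }
    return o->forward;
}

/* Copy everything reachable from the root set into the To semispace. */
void copy_collect(struct object **roots, int nroots) {
    free_ptr = unscanned = to_space;          /* line (3) */
    for (int i = 0; i < nroots; i++)          /* lines (4)-(5) */
        roots[i] = lookup_new_location(roots[i]);
    while (unscanned < free_ptr) {            /* loop of lines (6)-(10) */
        struct object *o = (struct object *)unscanned;
        for (int i = 0; i < NFIELDS; i++)     /* lines (8)-(9) */
            o->field[i] = lookup_new_location(o->field[i]);
        unscanned += sizeof *o;               /* line (10): advance past o */
    }
    /* The caller now swaps the roles of the From and To semispaces. */
}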
Issues in the Design of a Code Generator
1. Input to the code generator
The input is the intermediate representation produced by the front end, together with
information in the symbol table; for example, three-address code such as:
t3 = t1 + t2
d = t3 + t2
2. Target program
The target program is the output of the code generator. The output may be absolute
machine language, relocatable machine language, or assembly language.
Absolute machine language as output has advantages that it can be placed in a fixed
memory location and can be immediately executed.
Relocatable machine language as an output allows subprograms and subroutines to
be compiled separately. Relocatable object modules can be linked together and loaded
by linking loader. But there is added expense of linking and loading.
Assembly language as output makes code generation easier: we can generate symbolic
instructions and use the macro facilities of the assembler. However, an additional
assembly step is needed after code generation.
3. Memory Management
Mapping the names in the source program to the addresses of data objects is done
cooperatively by the front end and the code generator.
A name in a three-address statement refers to the symbol-table entry for that name;
from the symbol-table entry, a relative address can be determined for the name.
4. Instruction selection
Selecting the best instructions improves the efficiency of the program. The instruction
set of the target machine should be complete and uniform.
Instruction speeds and machine idioms also play a major role when efficiency is
considered. If we do not care about the efficiency of the target program, however,
instruction selection is straightforward.
Eg: The three-address statements below are translated into the following code sequence:
P = Q + R
S = P + T
MOV Q, R0
ADD R, R0
MOV R0, P
MOV P, R0
ADD T, R0
MOV R0, S
The fourth statement is redundant, since it reloads P, whose value was stored by the
statement just before it; this leads to an inefficient code sequence. A given
intermediate representation can be translated into many code sequences, with significant
cost differences between the different implementations. Prior knowledge of instruction
costs is needed in order to design good sequences, but accurate cost information is
difficult to predict.
5. Register allocation issues
Computations are faster in registers than in memory, so efficient utilization of
registers is important. The use of registers is subdivided into two subproblems:
i) During register allocation, we select the set of variables that will reside in
registers at each point in the program.
ii) During a subsequent register assignment phase, we pick the specific register in
which each variable will reside.
As the number of variables increases, the optimal assignment of registers to variables
becomes difficult; mathematically, this problem is NP-complete. Certain machines also
require register pairs, consisting of an even-numbered register and the next
odd-numbered register, for some operands and results.
For example, M a, b
This type of multiplication instruction involves a register pair: the multiplicand a is
in the even register, and the multiplier b is in the odd register of the even/odd
register pair.
6. Evaluation order
The code generator decides the order in which instructions are executed. The order of
computations affects the efficiency of the target code: among the many possible
computational orders, some require fewer registers to hold intermediate results. However,
picking the best order in the general case is a difficult, NP-complete problem.
Target Language
Familiarity with the target machine and its instruction set is needed for designing a code
generator.
Code generator techniques:
1. A Simple Target Machine Model
2. Program and Instruction Costs
A Simple Target Machine Model
The target computer is modelled as a three-address machine with load and store
operations, computation operations, jump operations, and conditional jumps.
Load operations: The instruction LD dst, addr loads the value in location addr into
location dst. This instruction denotes the assignment dst = addr. The most common form
is LD r, x, which loads the value in location x into register r. An instruction of the
form LD r1, r2 is a register-to-register copy, in which the contents of register r2 are
copied into register r1.
Store operations: The instruction ST x, r stores the value in register r into location
x. This instruction denotes the assignment x = r.
The instruction LD R1, *100(R2) loads into register R1 the value given by
contents(contents(100 + contents(R2))). The cost is three, because the constant 100 is
stored in the word following the instruction.
Machine-dependent Optimization
Machine-dependent optimization is done after the target code has been generated and
when the code is transformed according to the target machine architecture. It involves
CPU registers and may have absolute memory references rather than relative references.
Machine-dependent optimizers strive to take maximum advantage of the memory hierarchy.
Basic Blocks
A source program generally contains sequences of instructions that are always executed
in order; such sequences form the basic blocks of the code.
Basic blocks have no jump statements into their middle: when the first instruction of a
basic block is executed, all the instructions in that block are executed in their
sequence of appearance, without the program's flow of control leaving the block.
A program can have various constructs as basic blocks, like IF-THEN-ELSE, SWITCH-
CASE conditional statements and loops such as DO-WHILE, FOR, and REPEAT-UNTIL,
etc.
Basic block identification
The following algorithm finds the basic blocks in a program:
Find the header statements, from which the basic blocks start:
o First statement of a program.
o Statements that are target of any branch (conditional/unconditional).
o Statements that follow any branch statement.
Header statements and the statements following them form a basic block.
A basic block does not include any header statement of any other basic block.
Basic blocks are important concepts from both code generation and optimization point of
view.
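For example, in the following C fragment (an assumed example), the leader statements identified by these rules are marked in comments; they split the code into blocks B1 to B4.

int sum_to_ten(void) {
    int i, s;
    i = 1;                   /* leader: first statement        -> block B1 */
    s = 0;
L1: if (i > 10) goto L2;     /* leader: target of goto L1      -> block B2 */
    s = s + i;               /* leader: follows a branch       -> block B3 */
    i = i + 1;
    goto L1;
L2: return s;                /* leader: target of goto L2      -> block B4 */
}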
Basic blocks play an important role in identifying variables that are used more than
once within a single basic block. If a variable is used more than once, the register
allocated to it need not be freed until the block finishes execution.
Control Flow Graph
Basic blocks in a program can be represented by means of control flow graphs.
A control flow graph depicts how the program control is being passed among the
blocks.
It is a useful tool that helps in optimization by locating any unwanted loops in the
program.
Structure-Preserving Transformations
i). Common sub-expression elimination
ii). Dead code elimination
iii). Renaming of temporary variables
iv). Interchange of two independent adjacent statements.
Example (common sub-expression elimination)
a =b + c
b =a - d
c =b + c
d =a - d
The 2nd and 4th statements compute the same expression, a - d. (The 1st and 3rd
statements both contain b + c, but b is redefined by the 2nd statement, so that is not a
common sub-expression.) The basic block can therefore be optimized to
a = b + c
b = a - d
c = b + c
d = b // a - d has already been computed into b
ii). Dead code elimination
It is possible that a large amount of dead / useless code may exist in the program. This
may be caused when introducing variables and procedures as part of construction or
error-correction of a program - once declared and defined, one forgets to remove them in
case they serve no purpose. Eliminating these will definitely optimize the code.
Example
a=0
if ( a == 1)
a = a + 5;
In the above example, a is assigned the value 0, so the condition if (a == 1) can never
be true; the if statement and its body can therefore be eliminated.
The optimized code is simply: a = 0;
iii). Renaming of temporary variables
Example
t1 = b + c
t2 = a - t1
t1 = t1 * d
d = t2 + t1
Since t1 is redefined by the third statement, a new temporary t3 is used instead of t1
to hold that value. The transformed block is
t1 = b + c
t2 = a - t1
t3 = t1 * d
d = t2 + t3
v). Algebraic Transformations:
Algebraic expressions represent another important class of optimizations on basic blocks.
This includes simplifying expressions or replacing expensive operations by cheaper ones,
i.e., reduction in strength. Another class of related optimizations is constant folding:
here we evaluate constant expressions at compile time and replace the constant
expressions by their values. Thus, the expression 2 * 3.14 would be replaced by 6.28.
The relational operators <=, >=, <, >, and the like sometimes generate unexpected common
sub-expressions. Associative laws may also be applied to expose common sub-expressions.
For example, consider the following assignments:
Example
t1 = a - a
t2 = b - t1
t3 = t2 * 2
In the first statement, a - a is 0, so t1 can be set to 0 directly.
In the second statement, since t1 = 0, no subtraction is needed: t2 = b can be assigned
directly.
In the third statement, t3 can therefore be assigned b * 2.
So the optimized code
t3 = b * 2
is sufficient.
Peephole Optimization Techniques
Peephole optimization is a type of code optimization performed on a small part of the
code: a very small set of instructions in a segment of code. The small set of
instructions, or small part of the code, on which peephole optimization is performed is
known as the peephole or window.
i). Redundant Loads and Stores
In a code sequence such as
MOV R0, a
MOV a, R0
the second instruction can be deleted (provided it carries no label), since the value of
a is already in register R0.
ii). Unreachable Code
An unlabelled instruction immediately following an unconditional jump may be removed.
This operation can be repeated to eliminate a sequence of instructions. For example, for
debugging purposes, a large program may have within it certain segments that are
executed only if a variable debug is 1; when the guarding condition can never be true,
such code is unreachable, as in the examples below.
Example 1
Unoptimized code:
i = 0;
if (i == 1)
{
    i = 0;
    Sum = 0;
}
Optimized code:
i = 0; // the condition i == 1 can never be true, so the if is removed
Example 2
int add(int a, int b)
{
int c = a + b;
return c; // after the return statement, the printf below never executes
printf("%d", c);
}
iii). Flow-of-Control Optimization
Using flow-of-control optimization, unnecessary jumps (such as jumps to jumps) can be
eliminated.
Example Inefficient Code
.
.
goto L1
.
.
L1: goto L2
.
.
L2: goto L3
.
.
L3: MOV a, R0 // instead of this chain of jumps, a single goto L3 at the beginning is sufficient.
Algebraic Simplification
There is no end to the amount of algebraic simplification that can be attempted through
peephole optimization, but only a few algebraic identities occur frequently enough to be
worth implementing. For example, statements such as
x = x + 0 or
x=x*1
are often produced by straightforward intermediate code-generation algorithms and they
can be eliminated easily through peephole optimization.
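A sketch of such a pass in C over a made-up quadruple representation (the struct and its fields are assumptions), deleting instructions of the form x = x + 0 and x = x * 1:

#include <string.h>

/* A made-up quadruple: result = arg1 op arg2_const. */
struct quad {
    char op;                /* '+', '*', ... */
    char result[8];
    char arg1[8];
    int  arg2_const;        /* constant second operand, for this sketch */
};

/* Slide a one-instruction peephole over the code, dropping
   x = x + 0 and x = x * 1; returns the new instruction count. */
int simplify(struct quad *code, int n) {
    int out = 0;
    for (int i = 0; i < n; i++) {
        int useless =
            strcmp(code[i].result, code[i].arg1) == 0 &&
            ((code[i].op == '+' && code[i].arg2_const == 0) ||
             (code[i].op == '*' && code[i].arg2_const == 1));
        if (!useless)
            code[out++] = code[i];   /* keep this instruction */
    }
    return out;
}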
Use of Idioms
The target machine may have hardware instructions to implement certain specific
operations efficiently. For example, some machines have auto-increment and
auto-decrement addressing modes. These add or subtract one from an operand before or after
using its value. The use of these modes greatly improves the quality of code when pushing
or popping a stack, as in parameter passing. These modes can also be used in code for
statements like i = i + 1.
i=i+1 → i ++
i=i-1 → i--
Register Allocation by Usage Counts
The approximate benefit of keeping a variable x in a register throughout a loop L is
    sum over blocks B in L of [ use(x, B) + 2 * live(x, B) ]
where use(x, B) counts the uses of x in B prior to any definition of x, and live(x, B)
is 1 if x is live on exit from B and is assigned a value in B, and 0 otherwise.
To evaluate this formula for x = a: a is live on exit from B1 and is assigned a value
there, but it is not live on exit from B2, B3, or B4, so the live terms contribute 2.
The use terms contribute a total of 2 as well. Hence the value of the formula for x = a
is 4; that is, four units of cost can be saved by selecting a for one of the global
registers. The values for b, c, d, e, and f are 5, 3, 6, 4, and 4, respectively. Thus we
may select a, b, and d for registers R0, R1, and R2, respectively. Using R0 for e or f
instead of a would be another choice with the same benefit.
Assembly code is then generated for each block. We do not show the code for the
conditional or unconditional jumps that end each block, and therefore do not show the
generated code as a single stream.
Dynamic Programming Code-Generation
The dynamic programming algorithm can be used to generate code for any machine with r
interchangeable registers R0, R1, ..., Rr-1 and load, store, and add instructions. For
simplicity, every instruction is assumed to cost one unit, although the dynamic
programming algorithm can easily be modified to work even if each instruction has its
own cost.
The dynamic programming algorithm proceeds in three phases:
1. Compute bottom-up, for each node n of the expression tree T, an array C of costs, in
which the ith component C[i] is the optimal cost of computing the subtree S rooted at n
into a register, assuming i registers are available for the computation, for 1 <= i <= r.
2. Traverse T, using the cost vectors to determine which subtrees of T must be
computed into memory.
3. Traverse each tree using the cost vectors and associated instructions to generate
the final target code. The code for the subtrees computed into memory locations is
generated first.
Example: Consider a machine having two registers R0 and R1, and the following
instructions, each of unit cost:
LD Ri, Mj     // Ri = Mj
op Ri, Ri, Rj // Ri = Ri op Rj
op Ri, Ri, Mj // Ri = Ri op Mj
LD Ri, Rj     // Ri = Rj
ST Mi, Rj     // Mi = Rj
In these instructions, Ri is either R0 or R1, and Mj is a memory location. The operator
op corresponds to an arithmetic operator.
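Phase 1 of the algorithm, for this two-register machine, can be sketched in C as follows; the node type is an assumption, and the recurrence mirrors the cases analysed below for the example tree.

#define R 2                       /* registers available: R0 and R1 */

struct node {
    struct node *left, *right;    /* both NULL for a leaf (a variable) */
    int cost[R + 1];              /* cost[i]: optimal cost with i registers;
                                     cost[0]: optimal cost of the value in memory */
};

static int min_int(int a, int b) { return a < b ? a : b; }

/* Compute the cost vector of every node of the expression tree, bottom-up. */
void compute_costs(struct node *n) {
    if (n->left == NULL) {                 /* leaf: its value is in memory */
        n->cost[0] = 0;
        for (int i = 1; i <= R; i++)
            n->cost[i] = 1;                /* one LD Ri, Mj */
        return;
    }
    compute_costs(n->left);
    compute_costs(n->right);
    for (int i = 1; i <= R; i++) {
        /* right operand taken from memory: op Ri, Ri, Mj */
        int best = n->left->cost[i] + n->right->cost[0] + 1;
        if (i >= 2) {
            /* left first with i registers, then right with i-1: op Ri, Ri, Rj */
            best = min_int(best, n->left->cost[i] + n->right->cost[i - 1] + 1);
            /* right first with i registers, then left with i-1 */
            best = min_int(best, n->right->cost[i] + n->left->cost[i - 1] + 1);
        }
        n->cost[i] = best;
    }
    n->cost[0] = n->cost[R] + 1;           /* into memory: compute, then ST */
}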
We apply the dynamic programming algorithm to generate optimal code for the syntax tree
of the expression (a - b) + c * (d / e) (the diagram is omitted here).
In the first phase, the cost vector shown at each node is computed. To illustrate this
cost computation, consider the cost vector at the leaf a. C[0], the cost of computing a
into memory, is 0, since it is already there.
C[1], the cost of computing a into a register, is 1, since we can load it into a
register with the instruction LD R0, a. C[2], the cost of loading a into a register with
two registers available, is the same as that with one register available.
The cost vector at leaf a is therefore (0, 1, 1).
Consider the cost vector at the root. First we determine the minimum cost of computing
the root with one register and with two registers available.
The machine instruction ADD R0, R0, M matches the root, because the root is labelled
with the operator +.
Using this instruction, the minimum cost of evaluating the root with one register
available is the minimum cost of computing its right subtree into memory, plus the
minimum cost of computing its left subtree into the register, plus 1 for the
instruction.
No other way exists. The cost vectors at the right and left children of the root show
that the minimum cost of computing the root with one register available is 5 + 2 +
1 = 8.
Now consider the minimum cost of evaluating the root with two registers available. Three
cases arise, depending on which instruction is used to compute the root and in what
order the left and right subtrees of the root are evaluated.
1. Compute the left subtree with two registers available into register R0,
compute the right subtree with one register available into register R1, and use
the instruction ADD R0, R0, R1 to compute the root. This sequence has cost
2 + 5 + 1 = 8.
2. Compute the right subtree with two registers available into R1, compute the
left subtree with one register available into R0, and use the instruction ADD
R0, R0, R1. This sequence has cost 4 + 2 + 1 = 7.
3. Compute the right subtree into memory location M, compute the left subtree
with two registers available into register R0, and use the instruction ADD R0,
R0, M. This sequence has cost 5 + 2 + 1 = 8.
The second choice gives the minimum cost, 7.
From the cost vectors we can easily construct the code sequence by making a traversal of
the tree. Assuming two registers are available, an optimal code sequence is
LD R0, c        // R0 = c
LD R1, d        // R1 = d
DIV R1, R1, e   // R1 = R1 / e
MUL R0, R0, R1  // R0 = R0 * R1
LD R1, a        // R1 = a
SUB R1, R1, b   // R1 = R1 - b
ADD R1, R1, R0  // R1 = R1 + R0
Dynamic programming techniques have been used in a number of compilers, including the
second version of the portable C compiler, PCC2. The technique facilitates retargeting
because dynamic programming applies to a broad class of machines.