SPCC - 6
SPCC - 6
● Quadruples:
○ The quadruples have four fields to implement the three address codes.
○ The field of quadruples contains the name of the operator, the first source
operand, the second source operand and the result respectively.
● Triples:
○ The triples have three fields to implement the three address code. The field of
triples contains the name of the operator, the first source operand and the second
source operand.
○ In triples, the results of respective sub-expressions are denoted by the position of
expression. Triple is equivalent to DAG while representing expressions.
● Directed Acyclic Graph:
○ Directed Acyclic Graph (DAG) is a tool that depicts the structure of basic blocks,
helps to see the flow of values flowing among the basic blocks, and offers
optimization too.
○ DAG provides easy transformation on basic blocks.
○ DAG can be understood here: Leaf nodes represent identifiers, names or
constants.Interior nodes represent operators. Interior nodes also represent the
results of expressions or the identifiers/name where the values are to be stored or
assigned.
Code Optimization:
● Optimization is a program transformation technique, which tries to improve the code by
making it consume less resources (i.e. CPU, Memory) and deliver high speed.
● In optimization, high-level general programming constructs are replaced by very efficient
low-level programming codes.
● A code optimizing process must follow the three rules given below:
○ The output code must not, in any way, change the meaning of the program.
○ Optimization should increase the speed of the program and if possible, the
program should demand less resources.
○ Optimization should itself be fast and should not delay the overall compiling
process.
● Efforts for an optimized code can be made at various levels of compiling the process -
○ At the beginning, users can change/rearrange the code or use better algorithms to
write the code.
○ After generating intermediate code, the compiler can modify the intermediate
code by address calculations and improving loops.
○ While producing the target machine code, the compiler can make use of memory
hierarchy and CPU registers.
● The optimized code has the following advantages -
○ Optimized code has faster execution speed.
○ Optimized code utilizes the memory efficiently.
○ Optimized code gives better performance.
● Machine independent optimization - In this optimization, the compiler takes in the
intermediate code and transforms a part of the code that does not involve any CPU
registers and/or absolute memory locations. Types of machine independent
optimization-
○ Loop optimization- Code Movement, frequency reduction, loop unrolling, loop
jamming
○ Constant folding
○ Constant propagation
○ Common subexpression elimination
● Machine dependent Optimization - Machine-dependent optimization is done after the
target code has been generated and when the code is transformed according to the
target machine architecture. It involves CPU registers and may have absolute memory
references rather than relative references. Machine-dependent optimizers put efforts to
take maximum advantage of memory hierarchy. Types of machine dependent
optimization -
○ Peephole Optimization
○ Instruction level parallelism
○ Data level parallelism
○ Cache optimization
○ Redundant resources
Machine Independent:
● Compile Time Evaluation -
○ Constant Folding-
■ It involves folding the constants. The expressions that contain the operands
having constant values at compile time are evaluated. Those expressions
are then replaced with their respective results.
■ For Example - Circumference= (22/7) x Diameter
■ This technique evaluates the expression 22/7 at compile time. The
expression is then replaced with its result 3.14. This saves time during run
time.
○ Constant propagation -
■ If some variable has been assigned some constant value, then it replaces
that variable with its constant value in the further program during
compilation.
■ The condition is that the value of the variable must not get altered.
■ For Example - pi = 3.14, radius = 10; Area of circle = pi x radius x radius
■ This technique substitutes the value of variables ‘pi’ and ‘radius’ at compile
time. It then evaluates the expression 3.14 x 10 x 10. The expression is then
replaced with its result 314. This saves time at run time.
● Common Subexpression Elimination -
○ It involves eliminating the common sub expressions.
○ The redundant expressions are eliminated to avoid their re-computation.
○ The already computed result is used in the further program when required.
● Code Movement -
○ It involves movement of the code.
○ The code present inside the loop is moved out if it does not matter whether it is
present inside or outside.
○ Such a code unnecessarily gets executed again and again with each iteration of the
loop.
○ This leads to the wastage of time at run time.
● Dead Code Elimination -
○ It involves eliminating the dead code.
○ The statements of the code which either never executes or are unreachable or
their output is never used are eliminated.
● Strength Reduction -
○ It involves reducing the strength of expressions.
○ This technique replaces the expensive and costly operators with the simple and
cheaper ones.
Machine Dependent:
● Peephole Optimization -
○ This optimization technique works locally on the source code to transform it into
an optimized code.
○ These methods can be applied on intermediate codes as well as on target codes.
○ A bunch of statements are analyzed and are checked for possible optimization.
○ Redundant instruction elimination -
■ At compilation level, the compiler searches for instructions redundant in
nature.
■ Multiple loading and storing of instructions may carry the same meaning
even if some of them are removed.
○ Unreachable Code -
■ Unreachable code is a part of the program code that is never accessed
because of programming constructs.
■ Programmers may have accidentally written a piece of code that can never
be reached.
■ void add_ten(int x)
{
return x + 10;
printf(“value of x is %d”, x);
}
■ In this code segment, the printf statement will never be executed as the
program control returns back before it can execute, hence printf can be
removed.
○ Flow of control optimization -
■ There are instances in a code where the program control jumps back and
forth without performing any significant task.
■ These jumps can be removed.
○ Algebraic expression simplification -
■ There are occasions where algebraic expressions can be made simple.
■ For example, the expression a = a + 1 can simply be replaced by INC a.
○ Strength reduction -
■ There are operations that consume more time and space.
■ Their ‘strength’ can be reduced by replacing them with other operations
that consume less time and space, but produce the same result.
■ For example, x * 2 can be replaced by x << 1, which involves only one left
shift.
○ Accessing machine instructions -
■ The target machine can deploy more sophisticated instructions, which can
have the capability to perform specific operations much more efficiently.
■ If the target code can accommodate those instructions directly, that will not
only improve the quality of code, but also yield more efficient results.
Code Generation:
● The code generated by the compiler is an object code of some lower-level programming
language, for example, assembly language.
● We have seen that the source code written in a higher-level language is transformed into
a lower level language that results in a lower-level object code, which should have the
following minimum properties -
○ It should carry the exact meaning of the source code.
○ It should be efficient in terms of CPU usage and memory management
● The code generator should take the following things into consideration to generate the
code -
○ Target language - The code generator has to be aware of the nature of the target
language for which the code is to be transformed. That language may facilitate
some machine-specific instructions to help the compiler generate the code in a
more convenient way.
○ IR Type - Intermediate representation has various forms.
○ Selection of instruction - The code generator takes Intermediate Representation as
input and converts (maps) it into the target machine’s instruction set. One
representation can have many ways (instructions) to convert it, so it becomes the
responsibility of the code generator to choose the appropriate instructions wisely.
○ Register allocation - A program has a number of values to be maintained during
the execution. The target machine’s architecture may not allow all of the values to
be kept in the CPU memory or registers. Code generator decides what values to
keep in the registers.
○ Ordering of instructions - At last, the code generator decides the order in which
the instruction will be executed. It creates schedules for instructions to execute
them.
Descriptors:
● The code generator has to track both, the registers (for availability) and addresses
(location of values) while generating the code.
● For both of them, the following two descriptors are used -
○ Register descriptor - Register descriptor is used to inform the code generator
about the availability of registers. Register descriptor keeps track of values stored
in each register. Whenever a new register is required during code generation, this
descriptor is consulted for register availability.
○ Address descriptor - Values of the names (identifiers) used in the program might
be stored at different locations while in execution. Address descriptors are used to
keep track of memory locations where the values of identifiers are stored. These
locations may include CPU registers, heaps, stacks, memory or a combination of
the mentioned locations
● Code generator keeps both the descriptor updated in real-time.
● Basic blocks consist of a sequence of three-address instructions. Code generator takes
these sequences of instructions as input.
● Code generator uses getReg function to determine the status of available registers and
the location of name values.
Steps of code generation:
● Assuming L designates the location, preferably a register, where the result of y OP z will
be stored:
○ Utilize the function getReg to determine the location for L.
○ Ascertain the current location (either register or memory) of y by referencing its
Address Descriptor. If y is not currently in register L, generate the instruction:
MOV y', L where y' represents the value of y.
○ Determine the current location of z using the same method as step 2 for y, and
generate the instruction: OP z', L where z' represents the value of z.
● Now L holds the result of y OP z, intended for assignment to x. If L is a register, update its
descriptor to reflect that it contains the value of x. Update the descriptor of x to indicate
that it resides at location L.
● If y and z are no longer needed, they can be released back to the system.
Basic Blocks:
● Basic block contains a sequence of statements.
● The flow of control enters at the beginning of the statement and leaves at the end
without any halt (except may be the last instruction of the block).
● The characteristics of basic blocks are-
○ They do not contain any kind of jump statements in them.
○ There is no possibility of branching or getting halted in the middle.
○ All the statements execute in the same order they appear.
○ They do not lose the flow control of the program.
● Partitioning code into basic blocks -
○ Rule 1 - Determining leader
■ First statement of the code
■ Statement that is target of GOTO statement
■ Statement directly after GOTO
○ Rule 2 - Determining Basic Block
■ All statements following leader till next leader forms one basic block
Flow Graphs:
● A flow graph is a directed graph with flow control information added to the basic blocks.
● The basic blocks serve as nodes of the flow graph.
● There is a directed edge from block B1 to block B2 if B2 appears immediately after B1 in
the code.