Unit-4
Storage Organization
o When the target program executes then it runs in its own logical address space
in which the value of each program has a location.
o The logical address space is shared among the compiler, operating system and
target machine for management and organization. The operating system is
used to map the logical address into physical address which is usually spread
throughout the memory.
Subdivision of Run-time Memory:
o Runtime storage comes into blocks, where a byte is used to show the smallest
unit of addressable memory. Using the four bytes a machine word can form.
Object of multibyte is stored in consecutive bytes and gives the first byte
address.
o Run-time storage can be subdivide to hold the different components of an
executing program:
1. Generated executable code
2. Static data objects
3. Dynamic data-object- heap
4. Automatic data objects- stack
Activation Record
o Control stack is a run time stack which is used to keep track of the live procedure
activations i.e. it is used to find out the procedures whose execution have not
been completed.
o When it is called (activation begins) then the procedure name will push on to
the stack and when it returns (activation ends) then it will popped.
o Activation record is used to manage the information needed by a single
execution of a procedure.
o An activation record is pushed into the stack when a procedure is called and it
is popped when the control returns to the caller function.
The diagram below shows the contents of activation records:
Return Value: It is used by calling procedure to return a value to calling procedure.
Actual Parameter: It is used by calling procedures to supply parameters to the called
procedures.
Control Link: It points to activation record of the caller.
Access Link: It is used to refer to non-local data held in other activation records.
Saved Machine Status: It holds the information about status of machine before the
procedure is called.
Local Data: It holds the data that is local to the execution of the procedure.
Temporaries: It stores the value that arises in the evaluation of an expression.
Storage Allocation
The different ways to allocate memory are:
1. Static storage allocation
2. Stack storage allocation
3. Heap storage allocation
Static storage allocation
o In static allocation, names are bound to storage locations.
o If memory is created at compile time then the memory will be created in static area
and only once.
o Static allocation supports the dynamic data structure that means memory is created
only at compile time and deallocated after program completion.
o The drawback with static storage allocation is that the size and position of data objects
should be known at compile time.
o Another drawback is restriction of the recursion procedure.
Stack Storage Allocation
o In static storage allocation, storage is organized as a stack.
o An activation record is pushed into the stack when activation begins and it is popped
when the activation end.
o Activation record contains the locals so that they are bound to fresh storage in each
activation record. The value of locals is deleted when the activation ends.
o It works on the basis of last-in-first-out (LIFO) and this allocation supports the recursion
process.
Heap Storage Allocation
o Heap allocation is the most flexible allocation scheme.
o Allocation and deallocation of memory can be done at any time and at any place
depending upon the user's requirement.
o Heap allocation is used to allocate memory to the variables dynamically and when the
variables are no more used then claim it back.
o Heap storage allocation supports the recursion process.
Example:
1. fact (int n)
2. {
3. if (n<=1)
4. return 1;
5. else
6. return (n * fact(n-1));
7. }
8. fact (6)
The dynamic allocation is as follows:
Implementation of Block Structured
Language in compiler design
A block is a statement containing its own local data declaration. The concept of a block is
originated with ALGOL. The block-structured language permits an array with adjustable
length. The main feature of blocks is their bracketing structure (begin and end used in
ALGOL) in which they can define their data.
Activation Record for Block Structured Languages
Block structured language like ALGOL, and PL/I permit adjustable arrays, i.e., of varying
length. Therefore, we cannot store irregular size arrays in between activation records. It can
allocate the flexible or variable arrays at one corner of the activation record or above the
fixed-size data. A pointer to these adjustable arrays will be stored in a fixed position of the
activation record.
The following figure shows an activation record of a procedure having two Adjustable length
arrays.
Consider a block-structured program
procedure P;
real A ;
real Array X [10] ;
In this procedure, we have taken 3 Blocks. Execution of these blocks will lead to storage of
their local data items A, B, C, D, and adjustable array X, Y, Z into the stack. Activation Record
for a particular block will be generated when that Block is currently being executed.
If Block 2 is active or undercurrent execution, then Activation Record of Procedure P will save
Local data and pointer to adjustable arrays of Block 2 at the top.
Display & Static Links
There are two methods to access Non-Local data for a procedure are as follows −
• Static Link− In this method, a pointer called static link is attached with each
procedure which points to the topmost activation record of that procedure that
physically surrounds it in the program. So, Non-local data references for any
procedure can be found out by descending chain of pointers to find all statically
enclosing procedures.
• Display− The display is an array of pointer which are maintained to speed up the
access to non-local data.
Parameter Passing
After the procedure is called, the parameters are passed to the procedure. There are two
parameters
• Actual Parameter
• Formal Parameter
Depending on these parameters, there are various parameter passing methods −
• Call by Value− It is the simplest method for parameter passing. The actual
parameters are computed, and their r-values are passed to the called procedure.
• Call by Reference− When the parameter is passed by reference also known as call by
address or call by location, the caller passes to the caller procedure, a pointer to the
storage address of each actual parameter.
• Call by Name− It is a less popular method of parameter passing. The procedure is
treated like macro. The procedure body is substituted for the call-in caller with actual
parameters substituted for formals.
Returns
When procedure P1calls procedure P2, procedure P2 will return some value to P1.
Following changes will be made.
• The value returned by P2 will be stored in space kept free for it in the activation
record.
• The top pointer will be restored, i.e., again point to the location on which it was earlier
before P1 called P2.
• Restore SP, i.e., set SP pointer to old SP in an activation record of P 1.
• Get return address from P1 activation record to complete return done by P2.
Parameter Passing
The communication medium among procedures is known as
parameter passing. The values of the variables from a calling
procedure are transferred to the called procedure by some
mechanism. Before moving ahead, first go through some basic
terminologies pertaining to the values in a program.
r-value
The value of an expression is called its r-value. The value
contained in a single variable also becomes an r-value if it
appears on the right-hand side of the assignment operator. r-
values can always be assigned to some other variable.
l-value
The location of memory (address) where an expression is stored
is known as the l-value of that expression. It always appears at
the left hand side of an assignment operator.
For example:
day = 1;
week = day * 7;
month = 1;
year = month * 12;
From this example, we understand that constant values like 1, 7,
12, and variables like day, week, month and year, all have r-
values. Only variables have l-values as they also represent the
memory location assigned to them.
For example:
7 = x + y;
is an l-value error, as the constant 7 does not represent any
memory location.
Formal Parameters
Variables that take the information passed by the caller
procedure are called formal parameters. These variables are
declared in the definition of the called function.
Actual Parameters
Variables whose values or addresses are being passed to the
called procedure are called actual parameters. These variables
are specified in the function call as arguments.
Example:
fun_one()
{
int actual_parameter = 10;
call fun_two(int actual_parameter);
}
fun_two(int formal_parameter)
{
print formal_parameter;
}
Formal parameters hold the information of the actual parameter,
depending upon the parameter passing technique used. It may be
a value or an address.
Symbol Table in Compiler
The symbol table is defined as the set of Name and Value pairs.
Symbol Table is an important data structure created and maintained by
the compiler in order to keep track of semantics of variables i.e. it stores
information about the scope and binding information about names,
information about instances of various entities such as variable and
function names, classes, objects, etc.
• It is built-in lexical and syntax analysis phases.
• The information is collected by the analysis phases of the
compiler and is used by the synthesis phases of the compiler to
generate code.
• It is used by the compiler to achieve compile-time efficiency.
• It is used by various phases of the compiler as follows:-
1. Lexical Analysis: Creates new table entries in the table,
for example like entries about tokens.
2. Syntax Analysis: Adds information regarding attribute
type, scope, dimension, line of reference, use, etc in the
table.
3. Semantic Analysis: Uses available information in the
table to check for semantics i.e. to verify that
expressions and assignments are semantically
correct(type checking) and update it accordingly.
4. Intermediate Code generation: Refers symbol table for
knowing how much and what type of run-time is
allocated and table helps in adding temporary variable
information.
5. Code Optimization: Uses information present in the
symbol table for machine-dependent optimization.
6. Target Code generation: Generates code by using
address information of identifier present in the table.
Symbol Table entries – Each entry in the symbol table is associated with
attributes that support the compiler in different phases.
Use of Symbol Table-
The symbol tables are typically used in compilers. Basically compiler is a
program which scans the application program (for instance: your C
program) and produces machine code.
During this scan compiler stores the identifiers of that application
program in the symbol table. These identifiers are stored in the form of
name, value address, type.
Here the name represents the name of identifier, value represents the
value stored in an identifier, the address represents memory location of
that identifier and type represents the data type of identifier.
Thus compiler can keep track of all the identifiers with all the necessary
information.
Items stored in Symbol table:
• Variable names and constants
• Procedure and function names
• Literal constants and strings
• Compiler generated temporaries
• Labels in source languages
Information used by the compiler from Symbol table:
• Data type and name
• Declaring procedures
• Offset in storage
• If structure or record then, a pointer to structure table.
• For parameters, whether parameter passing by value or by
reference
• Number and type of arguments passed to function
• Base Address
Operations of Symbol table – The basic operations defined on a symbol
table include:
Operations on Symbol Table :
Following operations can be performed on symbol table-
1. Insertion of an item in the symbol table.
2. Deletion of any item from the symbol table.
3. Searching of desired item from symbol table.
Implementation of Symbol table –
Following are commonly used data structures for implementing symbol
table:-
1. List –
we use a single array or equivalently several arrays, to store names
and their associated information ,New names are added to the list in the
order in which they are encountered . The position of the end of the array
is marked by the pointer available, pointing to where the next symbol-
table entry will go. The search for a name proceeds backwards from the
end of the array to the beginning. when the name is located the
associated information can be found in the words following next.
id1 info1 id2 info2 …….. id_n info_n
• In this method, an array is used to store names and associated
information.
• A pointer “available” is maintained at end of all stored records
and new names are added in the order as they arrive
• To search for a name we start from the beginning of the list till
available pointer and if not found we get an error “use of the
undeclared name”
• While inserting a new name we must ensure that it is not
already present otherwise an error occurs i.e. “Multiple defined
names”
• Insertion is fast O(1), but lookup is slow for large tables – O(n)
on average
• The advantage is that it takes a minimum amount of space.
1. Linked List –
• This implementation is using a linked list. A link field is
added to each record.
• Searching of names is done in order pointed by the link
of the link field.
• A pointer “First” is maintained to point to the first
record of the symbol table.
• Insertion is fast O(1), but lookup is slow for large tables
– O(n) on average
2. Hash Table –
• In hashing scheme, two tables are maintained – a hash
table and symbol table and are the most commonly
used method to implement symbol tables.
• A hash table is an array with an index range: 0 to table
size – 1. These entries are pointers pointing to the
names of the symbol table.
• To search for a name we use a hash function that will
result in an integer between 0 to table size – 1.
• Insertion and lookup can be made very fast – O(1).
• The advantage is quick to search is possible and the
disadvantage is that hashing is complicated to
implement.
3. Binary Search Tree –
• Another approach to implementing a symbol table is to
use a binary search tree i.e. we add two link fields i.e.
left and right child.
• All names are created as child of the root node that
always follows the property of the binary search tree.
• Insertion and lookup are O(log2 n) on average.
Advantages of Symbol Table
1. The efficiency of a program can be increased by using symbol
tables, which give quick and simple access to crucial data such
as variable and function names, data kinds, and memory
locations.
2. better coding structure Symbol tables can be used to organize
and simplify code, making it simpler to comprehend, discover,
and correct problems.
3. Faster code execution: By offering quick access to information
like memory addresses, symbol tables can be utilized to
optimize code execution by lowering the number of memory
accesses required during execution.
4. Symbol tables can be used to increase the portability of code by
offering a standardized method of storing and retrieving data,
which can make it simpler to migrate code between other
systems or programming languages.
5. Improved code reuse: By offering a standardized method of
storing and accessing information, symbol tables can be utilized
to increase the reuse of code across multiple projects.
6. Symbol tables can be used to facilitate easy access to and
examination of a program’s state during execution, enhancing
debugging by making it simpler to identify and correct mistakes.
Disadvantages of Symbol Table
1. Increased memory consumption: Systems with low memory
resources may suffer from symbol tables’ high memory
requirements.
2. Increased processing time: The creation and processing of
symbol tables can take a long time, which can be problematic in
systems with constrained processing power.
3. Complexity: Developers who are not familiar with compiler
design may find symbol tables difficult to construct and
maintain.
4. Limited scalability: Symbol tables may not be appropriate for
large-scale projects or applications that require o the
management of enormous amounts of data due to their limited
scalability.
5. Upkeep: Maintaining and updating symbol tables on a regular
basis can be time- and resource-consuming.
6. Limited functionality: It’s possible that symbol tables don’t offer
all the features a developer needs, and therefore more tools or
libraries will be needed to round out their capabilities.
Applications of Symbol Table
1. Resolution of variable and function names: Symbol tables are
used to identify the data types and memory locations of
variables and functions as well as to resolve their names.
2. Resolution of scope issues: To resolve naming conflicts and
ascertain the range of variables and functions, symbol tables are
utilized.
3. Symbol tables, which offer quick access to information such as
memory locations, are used to optimize code execution.
4. Code generation: By giving details like memory locations and
data kinds, symbol tables are utilized to create machine code
from source code.
5. Error checking and code debugging: By supplying details about
the status of a program during execution, symbol tables are
used to check for faults and debug code.
6. Code organization and documentation: By supplying details
about a program’s structure, symbol tables can be used to
organize code and make it simpler to understand.