Data Structure: Edusat Learning Resource Material
Data Structure: Edusat Learning Resource Material
ON
DATA STRUCTURE
(For 3rd Semester CSE & IT)
Contributors :
1.
2.
3.
4.
5.
6.
7.
Page 1
Objective :
The effectiveness of implementation of any application in computer mainly
depends on the that how effectively its information can be stored in the computer.
For this purpose various -structures are used. This paper will expose the
students to various fundamentals structures arrays, stacks, queues, trees etc. It
will also expose the students to some fundamental, I/0 manipulation techniques
like sorting, searching etc
1.0
1.1
1.2
1.3
1.4
1.5
INTRODUCTION:
Explain Data, Information, data types
Define data structure & Explain different operations
Explain Abstract data types
Discuss Algorithm & its complexity
Explain Time, space tradeoff
2.0
2.1
2.2
2.3
3.0
3.1
3.2
3.3
3.4
3.5
STRING PROCESSING
03
Explain Basic Terminology, Storing Strings
State Character Data Type,
Discuss String Operations
ARRAYS
07
Give Introduction about array,
Discuss Linear arrays, representation of linear array In memory
Explain traversing linear arrays, inserting & deleting elements
Discuss multidimensional arrays, representation of two dimensional arrays
in memory (row major order & column major order), and pointers
Explain sparse matrices.
4.0
4.1
4.2
4.3
4.4
4.5
08
5.0
5.1
5.2
LINKED LIST
Give Introduction about linked list
Explain representation of linked list in memory
08
04
Page 2
5.3
5.4
5.5
6.0
6.1
6.2
6.3
TREE
08
Explain Basic terminology of Tree
Discuss Binary tree, its representation and traversal, binary search tree,
searching,
Explain insertion & deletion in a binary search trees
7.0
7.1
7.2
GRAPHS
Explain graph terminology & its representation,
Explain Adjacency Matrix, Path Matrix
8.0
SORTING SEARCHING & MERGING
8.1 Discuss Algorithms for Bubble sort, Quick sort,
8.2
Merging
8.3
Linear searching, Binary searching.
9.0
9.1
9.2
06
08
FILE ORGANIZATION
08
Discuss Different types of files organization and their access method,
Introduction to Hashing, Hash function, collision resolution, open
addressing..
Books
1. Data Structure by S. Lipschutz - (Schaum Series)
2. Introduction to Data Structure in C by :A.N.Kamthane; Pearson Education
3. Data Strcture using C by Reema Thereja, Oxford University Press
Page 3
INTRODUCTION
Data
Data is a set of values of qualitative or quantitative variables. Data in computing
(or data processing) is represented in a structure that is often tabular
(represented by rows and columns), a tree (a set of nodes with parent-children
relationship), or a graph (a set of connected nodes). Data is typically the result of
measurements and can be visualized using graphs or images.
Data as an abstract concept can be viewed as the lowest level of abstraction,
from which information and then knowledge are derived.
Unprocessed data which is also known as raw data refers to a collection of
numbers, characters and is a relative term; data processing commonly occurs by
stages, and the "processed data" from one stage may be considered the "raw
data" of the next. Field data refers to raw data that is collected in an uncontrolled
environment. Experimental data refers to data that is generated within the context
of a scientific investigation by observation and recording.
Information
Information is that which informs us with some valid meaning, i.e. that from which
data can be derived. Information is conveyed either as the content of a message
or through direct or indirect observation of some thing. Information can be
encoded into various forms for transmission and interpretation. For example,
information may be encoded into signs, and transmitted via signals.
Information resolves uncertainty. The uncertainty of an event is measured by its
probability of occurrence and is inversely proportional to that. The more uncertain
an event, the more information is required to resolve uncertainty of that event. In
other words, information is the message having different meanings in different
contexts. Thus the concept of information becomes closely related to notions of
constraint, communication, control, data, instruction, knowledge, meaning,
Copy Right DTE&T,Odisha
Page 4
Data Type
Data types are used within type systems, which offer various ways of defining,
implementing and using the data. Different type systems ensure varying degrees
of type safety.
Almost all programming languages explicitly include the notion of data type.
Though different languages may use different terminology. Common data types
may include:
Integers,
Booleans,
Characters,
Floating-point numbers,
Alphanumeric strings.
For example, in the Java programming language, the "int" type represents the set
of 32-bit integers ranging in value from -2,147,483,648 to 2,147,483,647, as well
as the operations that can be performed on integers, such as addition,
subtraction, and multiplication. Colors, on the other hand, are represented by
three bytes denoting the amounts each of red, green, and blue, and one string
representing that color's name; allowable operations include addition and
subtraction, but not multiplication.
Most programming languages also allow the programmer to define additional
data types, usually by combining multiple elements of other types and defining
the valid operations of the new data type. For example, a programmer might
create a new data type named "complex number" that would include real and
imaginary parts. A data type also represents a constraint placed upon the
interpretation of data in a type system, describing representation, interpretation
and structure of values or objects stored in computer memory. The type system
uses data type information to check correctness of computer programs that
Copy Right DTE&T,Odisha
Page 5
to
or signed
language and machine doesn't need to distinguish between these unsigned and
signed data types for the most part.
There is a specific set of arithmetic instructions that use a different interpretation
of the bits in word as a floating-point number. Machine data types need to be
exposed or made available in systems or low-level programming languages,
allowing fine-grained control over hardware. The C programming language, for
instance, supplies integer types of various widths, such as short and long. If a
corresponding native type does not exist on the target platform, the compiler will
break them down into code using types that do exist. For instance, if a 32-bit
integer is requested on a 16 bit platform, the compiler will tacitly treat it as an
array of two 16 bit integers. Several languages allow binary and hexadecimal
literals, for convenient manipulation of machine data.
Copy Right DTE&T,Odisha
Page 6
In higher level programming, machine data types are often hidden or abstracted
as an implementation detail that would render code less portable if exposed. For
instance, a generic numeric type might be supplied instead of integers of some
specific bit-width.
Boolean type
The Boolean type represents the values true and false. Although only two values
are possible, they are rarely implemented as a single binary digit for efficiency
reasons. Many programming languages do not have an explicit boolean type,
instead interpreting (for instance) 0 as false and other values as true.
Numeric types
Such as:
Fixed point data types are convenient for representing monetary values.
They are often implemented internally as integers, leading to predefined
limits.
Page 7
Record (also called tuple or struct) Records are among the simplest data
structures. A record is a value that contains other values, typically in fixed
number and sequence and typically indexed by names. The elements of
records are usually called fields or members.
A set is an abstract data structure that can store certain values, without
any particular order, and no repeated values. Values themselves are not
retrieved from sets, rather one tests a value for membership to obtain a
boolean "in" or "not in".
Many others are possible, but they tend to be further variations and compounds
of the above.
Page 8
Enumerated Type
This has values which are different from each other, and which can be compared
and assigned, but which do not necessarily have any particular concrete
representation in the computer's memory; compilers and interpreters can
represent them arbitrarily. For example, the four suits in a deck of playing cards
may be four enumerators named CLUB, DIAMOND, HEART, SPADE, belonging
to an enumerated type named suit. If a variable V is declared having suit as its
data type, one can assign any of those four values to it. Some implementations
allow programmers to assign integer values to the enumeration values, or even
treat them as type-equivalent to integers.
Character and string types can store sequences of characters from a character
set such as ASCII. Since most character sets include the digits, it is possible to
have a numeric string, such as "1234". However, many languages would still
treat these as belonging to a different type to the numeric value 1234.
Character and string types can have different subtypes according to the required
character "width". The original 7-bit wide ASCII was found to be limited and
superseded by 8 and 16-bit sets.
Page 9
Examples include:
A queue is a first-in first-out list. Variations are Deque and Priority queue.
A set can store certain values, without any particular order, and with no
repeated values.
A graph.
Page 10
Data structure
In Computer Science, a data structure is a way of organizing information, so that
it is easier to use. Data structures determine the way in which information can be
stored in computer and used. If the focus of use is on the things that can be
done, people often talk about an abstract data type (ADT). Data structures are
often optimized for certain operations. Finding the best data structure when
solving a problem is an important part of programming. Programs that use the
right data structure are easier to write, and work better.
Page 11
more
correctly
the
mathematical
equivalent. Arrays are often used to implement tables, especially look up tables;
the word table is sometimes used as a synonym of array.
Arrays are among the oldest and most important data structures, and are used by
almost every program. They are also used to implement many other data
structures, such as lists and strings. They effectively exploit the addressing logic
of computers. In most modern computers and many external storage devices, the
memory is a one-dimensional array of words, whose indices are their
addresses. Processors, especially vector processors, are often optimized for
array operations.
Arrays are useful mostly because the element indices can be computed at run
time. Among other things, this feature allows a single iterative statement to
process arbitrarily many elements of an array. For that reason, the elements of
an array data structure are required to have the same size and should use the
same data representation. The set of valid index tuples and the addresses of the
elements (and hence the element addressing formula) are usually, but not
always, fixed while the array is in use.[3][4]
The term array is often used to mean array data type, a kind of data
type provided by most high-level programming languages that consists of a
collection of values or variables that can be selected by one or more indices
computed at run-time. Array types are often implemented by array structures;
however, in some languages they may be implemented by hash tables, linked
lists, search trees, or other data structures.
Linked List
A linked list data structure is a set of records linked together by references. The
records are often called nodes. The references are often called links or pointers.
From here on, the words node and pointer will be used for these concepts.
Page 12
In linked data structures, pointers are only dereference or compared for equality.
Thus, linked data structures are different than arrays, which require adding and
subtracting pointers.
Linked lists, search trees, and expression trees are all linked data structures.
They are also important in algorithms such as topological sort and set union-find.
Stack
A stack is a basic data structure that can be logically thought as linear structure
represented by a real physical stack or pile, a structure where insertion and
deletion of items takes place at one end called top of the stack. The basic
concept can be illustrated by thinking of your data set as a stack of plates or
books where you can only take the top item off the stack in order to remove
things from it. This structure is used all throughout programming.
The basic implementation of a stack is also called a Last In First Out structure;
however there are different variations of stack implementations.
There are basically three operations that can be performed on stacks. They are:
Queue
A queue is an abstract data type or a linear data structure, in which the first
element is inserted from one end (the tail), and the deletion of existing element
takes place from the other end (the head). A queue is a First In First Out
structure. The process of adding an element to a queue is called enqueuing and
the process of removing an element from a queue is called dequeuing.
Graph
A graph is
an abstract
data
type that
is
meant
to
implement
the graph and hypergraph concepts from mathematics. A graph data structure
consists
of
finte
of
(and
possibly
certain
entities
mutable) set of
called
ordered
nodes or vertices.
pairs,
As
in
Page 13
Page 14
removal.
When analyzing the efficiency of algorithms that use stacks, one may also
specify that all operations take the same time no matter how many items have
been pushed into the stack, and that the stack uses a constant amount of storage
for each element.
Abstract data types are purely theoretical entities, used (among other things) to
simplify the description of abstract algorithms, to classify and evaluate data
structures, and to formally describe the type systems of programming languages.
However, an ADT may be implemented by specific data types or data structures,
in many ways and in many programming languages; or described in a formal
specification language. ADTs are often implemented as modules: the module's
interface declares procedures that correspond to the ADT operations, sometimes
with comments that describe the constraints. This information hiding strategy
allows the implementation of the module to be changed without disturbing the
client programs.
The term abstract data type can also be regarded as a generalised approach of a
number of algebraic structures, such as lattices, groups, and rings. [2] This can be
treated as part of the subject area of artificial intelligence. The notion of abstract
data types is related to the concept of data abstraction, important in objectoriented programming and design by contract methodologies for software
development.
Definition of abstract data type (ADT)
An abstract data type is defined as a mathematical model of the data objects that
make up a data type as well as the functions that operate on these objects. There
are no standard conventions for defining them. A broad division may be drawn
between "imperative" and "functional" definition styles.
Page 15
An ADT has a generalized name e.g. Stack, Queue, Binary Tree etc. Each ADT
accepts data objects that are represented as members of the underlying list e.g.
an integer, a Person Object. Each ADT has a set of pre-defined methods
(behaviors in OOPs terminology) that can be used to manipulate the members in
the list - irrespective of what they actually in reality represent.
Some common ADTs, which have proved useful in a great variety of
programming applications, are
Container
Deque
List
Map
Multimap
Multiset
Page 16
Priority queue
Queue
Set
Stack
Tree
Graph
Each of these ADTs may be defined in many ways and variants, not necessarily
equivalent. For example, a stack ADT may or may not have a count operation
that tells how many items have been pushed and not yet popped. This choice
makes a difference not only for its clients but also for the implementation.
Implementation
Implementing an ADT means providing one procedure or function for each
abstract operation. The ADT instances are represented by some concrete data
structure that is manipulated by those procedures, according to the ADT's
specifications.
Usually there are many ways to implement the same ADT, using several different
concrete data structures. Thus, for example, an abstract stack can be
implemented by a linked list or by an array.
An ADT implementation is often packaged as one or more modules, whose
interface contains only the signature (number and types of the parameters and
results) of the operations. The implementation of the module namely, the
bodies of the procedures and the concrete data structure used can then be
hidden from most clients of the module. This makes it possible to change the
implementation without affecting the clients.
Algorithm
In mathematics and computer science, an algorithm is a step-by-step procedure
for calculations. Algorithms are used for calculation, data processing, and
automated reasoning.
An algorithm is an effective method expressed as a finite list of well-defined
Copy Right DTE&T,Odisha
Page 17
instructions for calculating a function. Starting from an initial state and initial input
(perhaps empty), the instructions describe a computation that, when executed,
proceeds through a finite number of well-defined successive states, eventually
producing "output" and terminating at a final ending state. The transition from one
state to the next is not necessarily deterministic; some algorithms, known as
randomized algorithms, incorporate random input.
usually
require
certain
assumptions
concerning
the
particular
Page 18
machine, and/or by postulating that certain operations are executed in unit time.
For example, if the sorted list to which we apply binary search has n elements,
and we can guarantee that each lookup of an element in the list can be done in
unit time, then at most log2 n + 1 time units are needed to return an answer.
Best, worst and average case complexity
The best, worst and average case complexity refer to three different ways of
measuring the time complexity (or any other complexity measure) of different
inputs of the same size. Since some inputs of size n may be faster to solve than
others, we define the following complexities:
Best-case complexity: This is the complexity of solving the problem for the
best input of size n.
Time complexity
In computer science, the time complexity of an algorithm quantifies the amount
of time taken by an algorithm to run as a function of the length of the string
representing the input. The time complexity of an algorithm is commonly
expressed using big O notation, which excludes coefficients and lower order
terms. When expressed this way, the time complexity is said to be described
asymptotically, i.e., as the input size goes to infinity. For example, if the time
required by an algorithm on all inputs of size n is at most 5n3 + 3n, the asymptotic
time complexity is O(n3).
Time complexity is commonly estimated by counting the number of elementary
operations performed by the algorithm, where an elementary operation takes a
Copy Right DTE&T,Odisha
Page 19
fixed amount of time to perform. Thus the amount of time taken and the number
of elementary operations performed by the algorithm differ by at most a constant
factor.
Since an algorithm's performance time may vary with different inputs of the same
size, one commonly uses the worst-case time complexity of an algorithm,
denoted as T(n), which is defined as the maximum amount of time taken on any
input of size n. Time complexities are classified by the nature of the function T(n).
For instance, an algorithm with T(n) = O(n) is called a linear time algorithm, and
an algorithm with T(n) = O(2n) is said to be an exponential time algorithm.
space complexity
The way in which the amount of storage space required by an algorithm varies
with the size of the problem it is solving. Space complexity is normally expressed
as an order of magnitude, e.g. O(N^2) means that if the size of the problem (N)
doubles then four times as much working storage will be needed.
STRING PROCESSING
String
A finite sequence S of zero or more characters is called a String. The string with
zero character is called the empty string or null string.
The number of characters in a string is called its length.
Specific string will be denoted by in closing their character in single
quotation mark.
For e.g. ; THE END quotation mark.
123
THE || || END THE END
Let
S ;S
1
followed by the
Page 20
characters of
It will be denoted by
For e.g.,
= XY1,
S || S
1
&
= PQR
S || S
1
= XY1
S
S
S ,S
= XY1 PQR
= (space),
= PQR
The length of
S || S || S
1
&
&
.A
string Y is called a substring of a string S & if their exits string S & if their exits
string X & Z. Such that S=X || Y || Z. If X is an empty string, then y is called &
initial substring of S & Z is an empty string then Y is called a terminal substring
of S.
CHARACTER DATA TYPE: The character data type is of two data type. (1) Constant (2) Variable
Constant String:
-> The constant string is fixed & is written in either single quote & double
quotation.
Ex:- SONA
Sona
Variable String:
String variable falls into 3 categories.
1. Static
2. Semi-Static
3. Dynamic
Static character variable:
Whose variable is defined before the program can be executed & cannot change
Copy Right DTE&T,Odisha
Page 21
K L
END.
Page 22
Indexing also called pattern matching which refers to finding the position where a
string pattern P. First appears in a given string text T, we called this operation
index and write as INDEX (text, pattern)
If the pattern P does not appear in text T then index is assign the
value 0; the argument & text and pattern can either string constant or string
variable.
For e.g.; T contains the text.
HIS FATHER IS THE PROFESSOR
Then INDEX (T, THE)
7
INDEX (T, THEN)
0
INDEX (T, THE)
10
Concatenation:Let
by S1
&
S , S || S
2
&
is denoted
followed by the
characters of S 2 .
Ex:-
= Sonalisa
S || S || S
1
= Behera
= Sonalisa Behera
Length operation:The number of character in a string is called its length. We will write LENGTH
(string).For the length of a given string LENGTH (Computer). The length is 8.
Basic language LEN (STRING)
Strlen (string)
Copy Right DTE&T,Odisha
Page 23
Strupper(string)
String upper
Strupr(computer)
COMPUTER
String lower
Strlwr (COMPUTER)
COMPUTER
String concatenating
String Reverse
Strcnt
Strrev
ARRAY
Linear Array:
A Linear Array is a list of finite number of n homogeneous data elements i.e. the
elements of same data types Such that:
a) The elements of the array are referenced respectively by an index set
consisting of n consecutive numbers.
b) The elements of the array are stored respectively in the successive memory
locations.
The number n of elements is called length or size of array. If not explicitly stated,
we will assume the index set consists of integers 1, 2, 3 n. In general the
length or the number of data elements of the array can be obtained from the
index set by the formula
Length= UB LB + 1
Where UB is the largest index, called the upper bound, and LB is the smallest
index, called the lower bound. Note that length = UB when LB = 1.
The elements of an array A are denoted by subscript notation a 1, a2, a3an
Or by the parenthesis notation -> A (1), A (2),., A(n)
Or by the bracket notation -> A[1], A[2],.,A[n].
We will usually use the subscript notation or the bracket notation.
Page 24
Let LA is a linear array in the memory of the computer. Recall that the memory
of computer is simply a sequence of addressed locations.
LOC (LA[k]) = address of element LA[k] of the array LA.
As previously noted, the elements of LA are stored in the successive memory
cells. Accordingly, the computer does not need to keep track of the address of
every element of LA, but needs to keep track only of the address of the first
element of LA, denoted by
Base (LA)
And called the base address of LA. Using base address the computer calculates
the address of any element of LA by the following formula:
LOC (LA[k]) = Base (LA) + w (k-lower bound)
Where w is the number of words per memory cell for the array LA.
OPERATIONS ON ARRAYS
Various operations that can be performed on an array
Traversing
Insertion
Deletion
Sorting
Searching
Merging
Page 25
Page 26
the middle of the array. Then, on the average, half of the elements must be
moved downward to new location to accommodate the new elements and keep
the order of the other elements.
Similarly, deleting an element at the end of the array presents no difficulties, but
deleting the element somewhere in the middle of the array requires that each
subsequent element be moved one location upward in order to fill up the array.
The following algorithm inserts a data element ITEM in to the Kth position in the
linear array LA with N elements.
The following algorithm deletes the Kth element from a linear array LA and
assigns it to a variable ITEM.
Page 27
Multidimensional Array
1. Array having more than one subscript variable is called MultiDimensional array.
2. Multi Dimensional Array is also called as Matrix.
Consider the Two dimensional array 1. Two Dimensional Array requires Two Subscript Variables
2. Two Dimensional Array stores the values in the form of matrix.
3. One Subscript Variable denotes the Row of a matrix.
4. Another Subscript Variable denotes the Column of a matrix.
Page 28
number.
5. Row number and Column Number Starts from 0.
a[0][1]
a[0][2]
a[0][3]
a[1][0]
a[1][1]
a[1][2]
a[1][3]
a[2][0]
a[2][1]
a[2][2]
a[2][3]
Explanation
a[3][4]
No of Rows
No of Columns
No of Cells
12
2
Memory Representation
Page 29
Array Representation:
Column-major
Row-major
Arrays may be represented in Row-major form or Column-major form. In Rowmajor form, all the elements of the first row are printed, then the elements of the
second row and so on up to the last row. In Column-major form, all the elements
of the first column are printed, then the elements of the second column and so on
up to the last column. The C program to input an array of order m x n and print
the array contents in row major and column major is given below. The following
array elements may be entered during run time to test this program:
Output:
Row Major:
1
Column Major:
1
Page 30
4000
a[0][1]
4002
a[0][2]
4004
a[1][0]
4006
a[1][1]
4008
a[1][2]
4010
a[2][0]
4012
a[2][1]
4014
a[2][2]
4016
Page 31
using
the
formula
Address(Arr[k])=Base(Arr)+w(k-LowerBound)
2 d Array :- Suppose Arr is a 2 d array. The first dimension of Arr contains the
index set 0,1,2, row-1 ( the lower bound is 0 and the upper bound is row-1)
and the second dimension contains the index set 0,1,2, col-1( with lower bound
0 and upper bound col-1.)
The length of each dimension is to be calculated .The multiplied result of both the
lengths
will
give
you
the
number
of
elements
in
the
array.
Lets assume Arr is an two dimensional 2 X 2 array .The array may be stored in
memory
1.
one
Column
of
by
the
column
following
i,e
way
column
major
:-
order
2. Row by row , i,e in row major order. The following figure shows both
representation
of
the
above
array.
By row-major order, we mean that the elements in the array are so arranged
that the subscript at the extreme right varies fast than the subscript at its left.,
while in column-major order , the subscript at the extreme left changes rapidly ,
then
the
subscript
at
its
right
and
so
on.
1,1
2,1
1,2
2,2
ColumnMajorOrder
1,1
1,2
Copy Right DTE&T,Odisha
Page 32
2,1
2,2
Arr[m,n]
can
(Column
major
(Row
major
be
calculated
order
by
using
the
Address(Arr[j,k])=
order)
following
formula
:-
base(Arr)+w[m(k-1)+(j-1)]
Address(Arr[j,k])=base(Arr)+w[n(j-1)+(k-1)]
For example Arr(25,4) is an array with base value 200.w=4 for this array. The
address
of
Arr(12,3)
can
be
calculated
using
row-major
order
as
Address(Arr(12,3))=200+4[4(12-1)+(3-1)]
=200+4[4*11+2]
=200+4[44+2]
=200+4[46]
=200+184
=384
Again
using
column-major
order
Address(Arr(12,3))=200+4[25(3-1)+(12-1)]
=200+4[25*2+11]
=200+4[50+11]
=200+4[61]
=200+244
=444
Sparse matrix
Matrix with relatively a high proportion of zero entries are called sparse matrix.
Two general types of n-square sparse matrices are there which occur in various
Copy Right DTE&T,Odisha
Page 33
5 -3
3 -5
9 -3 6
-7 8 -1 3
5 -2
-8
4 -7
3
Triangular matrix
Tridiagonal matrix
Triangular matrix
This is the matrix where all the entries above the main diagonal are zero or
equivalently where non-zero entries can only occur on or below the main
diagonal is called a (lower)Triangular matrix.
Tridiagonal matrix
This is the matrix where non-zero entries can only occur on the diagonal or on
elements immediately above or below the diagonal is called a Tridiagonal matrix.
The natural method of representing matrices in memory as two-dimensional
arrays may not be suitable for sparse matrices i.e. one may save space by
storing only those entries which may be non-zero.
Page 34
Page 35
Array
The array implementation aims to create an array where the first element
(usually at the zero-offset) is the bottom. That is, array[0] is the first element
pushed onto the stack and the last element popped off. The program must keep
track of the size, or the length of the stack. The stack itself can therefore be
effectively implemented as a two-element structure in C:
typedef struct {
size_t size;
int items[STACKSIZE];
} STACK;
The push() operation is used both to initialize the stack, and to store values to it.
It is responsible for inserting (copying) the value into the ps->items[] array and for
incrementing the element counter (ps->size). In a responsible C implementation,
it is also necessary to check whether the array is already full to prevent an
overrun.
Page 36
If we use a dynamic array, then we can implement a stack that can grow or
shrink as much as needed. The size of the stack is simply the size of the dynamic
array. A dynamic array is a very efficient implementation of a stack, since adding
items to or removing items from the end of a dynamic array is dynamically with
respect to time.
Arithmetic
The expression for adding the numbers 1 and 2 is, in prefix notation, written "+ 1
2" rather than "1 + 2". In more complex expressions, the operators still precede
their operands, but the operands may themselves be nontrivial expressions
including operators of their own. For instance, the expression that would be
written in conventional infix notation as
(5 6) 7
can be written in prefix as
( 5 6) 7
Since the simple arithmetic operators are all binary (at least, in arithmetic
contexts), any prefix representation thereof is unambiguous, and bracketing the
prefix expression is unnecessary. As such, the previous expression can be
further simplified to
567
The processing of the product is deferred until its two operands are available
(i.e., 5 minus 6, and 7). As with any notation, the innermost expressions are
Copy Right DTE&T,Odisha
Page 37
Similarly
5 (6 7)
can be written in Polish notation as
567
Polish notation
Polish notation, also known as Polish prefix notation or simply prefix notation is a
form of notation for logic, arithmetic, and algebra. Its distinguishing feature is that
it places operators to the left of their operands. If the arity of the operators is
fixed, the result is a syntax lacking parentheses or other brackets that can still be
parsed without ambiguity. The Polish logician Jan ukasiewicz invented this
notation in 1924 in order to simplify sentential logic.
The term Polish notation is sometimes taken (as the opposite of infix notation) to
also include Polish postfix notation, or Reverse Polish notation, in which the
operator is placed after the operands.
Page 38
Prefix
Postfix
a+b
+ab
ab+
a+b*c
+a*bc
abc*+
(a + b) * (c - d)
*+ab-cd
ab+cd-*
b*b-4*a*c
40 - 3 * 5 + 1
Postfix expressions are easily evaluated with the aid of a stack.
Page 39
Page 40
Page 41
A classic example of recursion is the definition of the factorial function, given here
in C code:
Queues:
Queue is a linear list of elements in which deletions can take place only at one
end, called the front and insertions can take place only at the other end, called
the rear. The termsfront and rear are used in describing a linear list only
when it is implemented as a queue.
Queue are also called first-in first-out (FIFO) lists, since the first elements enter
a queue will be the first element out of the queue. In other words, the order in
which elements enter a queue is the order in which they leave. This contrasts
with stacks, which are last-in first-out (LIFO) lists.
Queues abound in everyday life. The automobiles waiting to pass through an
intersection form a queue. In which the first car in line is the first car through; the
people waiting in line at a bank form a queue, where the first person in line is the
Copy Right DTE&T,Odisha
Page 42
Representation of queues:
Queues may be represented in the computer in various ways, usually by means
at one-way lists or linear arrays. Unless otherwise stated or implied, each of our
queues will be maintained by a linear array QUEUE and two pointer variables:
FRONT, containing the location of the front element of the queue; and REAR,
containing the location of the rear element of the queue. The condition FRONT =
NULL will indicate that the queue is empty. Following figure shows the way the
array in Figure will be stared in memory using an array QUEUE with N elements.
Figure also indicates the way elements will be deleted from the queue and the
way new elements will be added to the queue. Observe that whenever an
element is deleted from the queue, the value of FRONT is increased by 1; this
can be implemented by the assignment
FRONT: = Rear + 1
Similarly, whenever an element is added to the queue, the value of REAR is
increased by 1; this can be implemented by the assignment
REAR: = Rear +1
This means that after N insertion, the rear element of the queue will occupy
QUEUE [N] or, in other words; eventually the queue will occupy the last part of
the array. This occurs even though the queue itself may not contain many
elements.
Suppose we want to insert an element ITEM into a queue will occupy the last part
of the array, i.e., when REAR=N. One way to do this is to simply move the entire
queue to thee beginning of the array, changing FRONT and REAR accordingly,
and then inserting ITEM as above. This procedure may be very expensive. The
procedure we adopt is to assume that the array QUEUE is circular, that is, that
QUEUE [1] comes after QUEUE [N] in the array. With this assumption, we insert
ITEM into the queue by assigning ITEM to QUEUE [1]. Specifically, instead of
Copy Right DTE&T,Odisha
Page 43
Priority Queues:
A priority queue is a collection of elements such that each element has been
assigned a priority and such that the order in which elements are deleted and
processed comes from the which following rule:
(1) An element of higher priority is processes before any element of lower priority.
(2) Two elements with the same priority are processes according to the order in
which they were added to the queue.
A prototype of a priority queue is a timesharing system: programs of high priority
are processed first, and programs with the same priority form a standard queue.
There are various ways of maintaining a priority queue in memory. We discuss
two of them here: one uses a one way list, and the other uses multiple queues.
The ease or difficulty in adding elements to or deleting them from a priority queue
clearly depends on the representation that one chooses.
Page 44
means that the order in the One-way list corresponds to the order of the priority
queue.
Priority numbers will operate in the usual way: the priority number, the higher the
priority.
Page 45
What is Pointer
A pointer is a variable whose value is the address of another variable, i.e., direct
address of the memory location. Like any variable or constant, you must declare
a pointer before you can use it to store any variable address. The general form of
a pointer variable declaration is:
type *var-name;
Here, type is the pointer's base type; it must be a valid C data type and varname is the name of the pointer variable. The asterisk * you used to declare a
pointer is the same asterisk that you use for multiplication. However, in this
statement the asterisk is being used to designate a variable as a pointer.
Following are the valid pointer declaration:
int
*ip;
/* pointer to an integer */
double *dp;
float *fp;
char *ch
/* pointer to a double */
/* pointer to a float */
/* pointer to a character */
The actual data type of the value of all pointers, whether integer, float, character,
or otherwise, is the same, a long hexadecimal number that represents a memory
address. The only difference between pointers of different data types is the data
type of the variable or constant that the pointer points to.
How to use Pointers?
There are few important operations, which we will do with the help of pointers
very frequently. (a) we define a pointer variable (b) assign the address of a
variable to a pointer and (c) finally access the value at the address available in
the pointer variable. This is done by using unary operator * that returns the value
Copy Right DTE&T,Odisha
Page 46
of the variable located at the address specified by its operand. Following example
makes use of these operations:
#include <stdio.h>
int main ()
{
int var = 20; /* actual variable declaration */
int *ip;
return 0;
}
When the above code is compiled and executed, it produces result something as
follows:
Address of var variable: bffd8b3c
Address stored in ip variable: bffd8b3c
Value of *ip variable: 20
NULL Pointers in C
It is always a good practice to assign a NULL value to a pointer variable in case
you do not have exact address to be assigned. This is done at the time of
variable declaration. A pointer that is assigned NULL is called a null pointer.
Copy Right DTE&T,Odisha
Page 47
The NULL pointer is a constant with a value of zero defined in several standard
libraries. Consider the following program:
#include <stdio.h>
int main ()
{
int *ptr = NULL;
return 0;
}
When the above code is compiled and executed, it produces the following result:
The value of ptr is 0
On most of the operating systems, programs are not permitted to access memory
at address 0 because that memory is reserved by the operating system.
However, the memory address 0 has special significance; it signals that the
pointer is not intended to point to an accessible memory location. But by
convention, if a pointer contains the null (zero) value, it is assumed to point to
nothing.
To check for a null pointer you can use an if statement as follows:
if(ptr)
if(!ptr)
/* succeeds if p is null */
C Pointers in Detail:
Pointers have many but easy concepts and they are very important to C
programming. There are following few important pointer concepts which should
be clear to a C programmer:
LINKED LIST
Page 48
It is the linear collection of data elements called nodes that are stored in different
memory location connected by pointers.
Advantages of linked list:Dynamic data structure that can grow an string. Efficient memory utilization
(exact amount of data storage). Insertion deletion & pupation are easy & efficient.
Data stored in RAM but not sequential .
Disadvantages of linked list:More memory space is needed if no. of filed are more. Logical & physical
ordering of node are different. Searching is solve . Difficult to program because
pointer manipulation is required.
Types of linked list:Linear linked list or one way linked list or single list. Double linked list or two way
linked list are two way linked list. Circular linked list is two types i.e. (1) Single
circular list (2) Double circular list.
Linear linked list:- It is a one way collection of nodes where the linear order
is maintained by pointers. Nodes are not in sequence, each node
implemented in comp. by a self referential structure . Each node is divided
in two parts. First part contain the information of the element (INFO).
Second part is linked field contains the add. Of next node in the list (LINK)
field or next pointer filed .In c linked list is created using structured pointer
and Mallow (Allocation of memory). The structure of a node is strict node.
{
Into info;
Strict node*link;
};
The new node is created and addresses of the new node is assigned to stack
has start.
{START =(Strict node*) mallow(size of (strict node))}
Representation of the linked list in memory:Let list be a linked list. Then list required to linear arrays called INFO and LINK.
Copy Right DTE&T,Odisha
Page 49
Such that INFO[k] and LINK[k] contain the information part & next pointer
denoted by a NULL which indicates the end of the LIST.
E
START 3
O
H
2
INFO
link
Memory allocation of the linear linked list:The computer maintains a special list which consist of a list of all free memory
calls & also has its own pointer is called the list of available space or the free
storage list or the free pool .
Suppose insertion are to be performed on linked list then unused memory calls I
the array will also be linked to gather to form a linked list using AVAIL. As its list
pointer
variable
such
data
structure
will
be
denoted
by
writing
LIST(INFO,LINK,START,AVAIL)
START 5
Copy Right DTE&T,Odisha
K
Page 50
3
INFO
LINK
AVAIL
Avail Space
Garbage collection definition:The operating system of a computer may periodically collect all the deleted space
on to the free storage list. Any technique which does these collections is called
garbage collection.
When we delete a particular note from an existing linked list or delete the linked
list the space occupied by it must be given back to the free pool. So that the
memory can be the used by some other program that needs memory space.
To the free pool is done.
The operating system will perform this operation whenever it finds the CPU idle
or whenever the programs are falling short of memory space. The OS scans
through the entire memory cell & marks those cells. That are being by some
program then it collects the entire cell which are not being used & add to the free
pool. So that this cells can be used by other programs. This process is called
garbage collection. The garbage collection is invisible to the programmer.
Traversing of linked list:Copy Right DTE&T,Odisha
Page 51
Algorithm:Let list be a linked list in memory, this algorithm traverse LIST applying &
operation PROCESS to each element of LIST. The variable PTR to point to the
nodes currently being processed.
Step 1:-set PTR=START [initialize pointer PTR]
Step 2:-repeat step 3 & step 4 while PTR! = NULL
Step 3:-apply PROCESS to INFO[PTR]
STEP 4:-SET PTR=LINK [PTR]
[PTR now points to the next node]
End of step 2 loop
Step 5:-exit
Algorithm for searching linked list:SEARCH (INFO,LINK,START,ITEM,LOC)
LIST is a linked list in memory , this algorithm finds the location LOC of the node
where ITEM first appears in LIST or sets , LOC=NULL.
Step 1:-set PTR=START[initialize pointer PTR]
Step 2:-repeat step 3 while PTR ! = NULL
Step 3:-if ITEM = INFO[PTR]
Then set LOC = PTR & exit
Else
Set PTR = LINK[PTR]
[PTR now points to next node]
[End of if structure]
End of step 2 loop
Step 4:-[Search is unsuccessful]
Set LOC = NULL
Step 5:-Exit
Inserting the node at the beginning of the list:INSERT (INFO, LINK, START, AVAIL, ITEM)
Step 1:-[over flow?]
If AVAIL = NULL, then write over flow & exit
Copy Right DTE&T,Odisha
Page 52
Page 53
Else
Set = LINK [LOCP] = LINK [LOC]
[Deletes node N]
[End of if]
Step 2:-[Return deleted node to the AVAIL LIST]
Set = LINK [LOC] = AVAIL & AVAIL = LOC
Step 3:-Exit
Header linked list:A header linked list is a special type of linked list. Which always contains a
special node called header node at the beginning of the list so in a header linked
list will not point to first node of the list. But start will contain the address of the
header node.
There are two types of header linked list i.e. Grounded header linked list &
Circular header linked list.
Grounded header linked list:It is a header linked list where last node contains the NULL pointer. LINK [start] =
NULL indicates that a grounded header linked list is empty.
STAR
T
Header node
Circular header linked list:It is a header linked list where the last node points back to the header node.
STAR
T
Header node
Page 54
TREE
A tree is a non-linear data structure that consists of a root node and potentially
many levels of additional nodes that form a hierarchy. A tree can be empty with
no nodes called the null or empty tree or a tree is a structure consisting of one
node called the root and one or more subtrees.
Thus tree is a finite set of one or more nodes such that :
i> There is a specially designated node called the root
ii> The remaining nodes are partitioned into n >= 0 disjoint sets T1,T2,..,Tn
where each of these sets is a tree . T1,T2,..,Tn are called the
subtrees of the root .
Terminologies used in Trees
Page 55
Height - The height of a node is the length of the longest downward path
between the node and a leaf.
The node labeled 7 has two children, labeled 2 and 6, and one parent, labeled 2.
The root node, at the top, has no parent.
Binary Tree
Definition: A binary tree is a finite set of nodes which is either empty or consists
of a root and two disjoint binary trees called the left subtree and the right subtree.
We can define the data structure binary tree as follows:
structure BTREE
declare CREATE( ) --> btree
ISMTBT(btree,item,btree) --> boolean
MAKEBT(btree,item,btree) --> btree
LCHILD(btree) --> btree
Copy Right DTE&T,Odisha
Page 56
This set of axioms defines only a minimal set of operations on binary trees. Other
operations can usually be built in terms of these. The distinctions between a
binary tree and a tree should be analyzed. First of all there is no tree having zero
nodes, but there is an empty binary tree. The two binary trees below are different.
The first one has an empty right subtree while the second has an empty left
subtree. If these are regarded as trees, then they are the same despite the fact
that they are drawn slightly differently.
Page 57
gives us the definition of a complete binary tree. A binary tree with n nodes and a
depth k is complete iff its nodes correspond to the nodes which are numbered
one to n in the full binary tree of depth k. The nodes may now be stored in a one
dimensional array tree, with the node numbered i being stored in tree(i).
If a complete binary tree with n nodes (i.e., depth=[LOG2N]+1) is represented
sequentially as above then for any node with index i, 1<=i<=n we have
(i) parent(i) is at [i/2] if i is not equal to 1. When i=1, i is the root and has no
parent.
(ii) lchild(i) is at 2i if 2i <= n . If 2i > n, then i has no left child.
(iii) rchild(i) is at 2i+1 if 2i+1 <= n .If 2i+1 > n, then i has no right child.
Proof: We prove (ii). (iii) is an immediate consequence of (ii) and the numbering
of nodes on the same level from left to right. (i) follows from (ii) and (iii). We prove
(ii) by induction on i. For i=1, clearly the left child is at 2 unless 2>n in which case
1 has no left child. Now assume that for all j, 1<=j<=i lchild(j) is at 2j .Then the
two nodes immediately preceeding lchild(i+1) in the representation are the right
child of I and the left child of I .The left child of i is at 2i .Hence,the left child of i+1
is at 2i+2=2(i+1) unless 2(i+1)>n inwhich case i+1 has no left child. This
representation can clearly be used for all binary trees though in most cases there
will be a lot of unutilized space. For complete binary trees the representation is
ideal as no space is wasted.
Page 58
Inorder Traversal: informally this calls for moving down the tree towards the left
untilyou can go no farther. Then you "visit" the node, move one node to the right
and continue again. If you cannot move to the right, go back one more node. A
precise way of describing this traversal is to write it as a recursive procedure.
Procedure INORDER(T)
// T is a binary tree where each node has three fields LCHILD,DATA,RCHILD //
If T<> 0 then [call INORDER(LCHILD(T))
Print (DATA(T))
Call (INORDER(RCHILD(T))]
end INORDER
call POSTORDER(LCHILD(T))
Call POSTORDER(RCHILD(T))]
[print (DATA(T))]
end POSTORDER
Page 59
The left subtree of a node contains only nodes with keys less than the
node's key.
The right subtree of a node contains only nodes with keys greater than the
node's key.
The left and right subtree each must also be a binary search tree.
Page 60
Searching
1. Start at the root node as current node
2. If the search keys value matches the current nodes key then found a match
3. If search keys value is greater than current nodes
a. If the current node has a right child, search right
b. Else, no matching node in the tree
4. If search key is less than the current nodes
a. If the current node has a left child, search left
b. Else, no matching node in the tree
Insertion
1.Always insert new node as leaf node
2. Start at root node as current node
3. If new nodes key < currents key
a. If current node has a left child, search left
b. Else add new node as currents left child
4. If new nodes key > currents key
a. If current node has a right child, search right
b. Else add new node as currents right child
Deletion
Basically, in can be divided into two stages:
Page 61
Page 62
Page 63
Method :
o
replace 5 by 19;
Notice, that the node with minimum value has no left child and, therefore, it's
removal may result in first or second cases only.
Example. Remove 12 from a BST.
.
Copy Right DTE&T,Odisha
Page 64
3. Replace 12 with 19. Notice, that only values are replaced, not nodes. Now we
have two nodes with the same value.
Page 65
GRAPH
Definition
A graph, G, consists of two sets V and E. V is a finite non-empty set of vertices. E
is a set of pairs of vertices, these pairs are called edges. V(G) and E(G) will
represent the sets of vertices and edges of graph G. We will also write G = (V,E)
to represent a graph.
In an undirected graph the pair of vertices representing any edge is unordered .
Thus, the pairs (v1, v2) and (v2, v1) represent the same edge.
In a directed graph each edge is represented by a directed pair (v1, v2). v1 is the
tail and v2 the head of the edge. Therefore <v2, v1> and <v1, v2> represent two
different edges.
Page 66
Edges:
o
Vertices:
o
Representation:
o
Example:
V = {A,B,C,D}
E = {(A,B),(A,C),(A,D),(B,D),(C,D)}
Page 67
connected by an edge
Representation
Adjacency list
Vertices are stored as records or objects, and every vertex stores a list of
adjacent vertices. This data structure allows the storage of additional data on the
vertices. Additional data can be stored if edges are also stored as objects, in
which case each vertex stores its incident edges and each edge stores its
incident vertices.
Adjacency matrix
A two-dimensional matrix, in which the rows represent source vertices and
columns represent destination vertices. Data on edges and vertices must be
stored externally. Only the cost for one edge can be stored between each pair of
vertices.
Incidence matrix
A two-dimensional Boolean matrix, in which the rows represent the vertices and
columns represent the edges. The entries indicate whether the vertex at a row is
incident to the edge at a column.
Directed graph
Page 68
Undirected Graph
A graph where there is no implied direction on edge between nodes
undirected graph
Page 69
directed graph
Page 70
Page 71
Page 72
Page 73
Else L=L+1
TEMP[L]=K[J]
J=J+1
Step 3: [Copy remaining unprocessed element in output area]
If I>=SECOND
Then repeat while J<=THIRD
L=L+1
TEMP[L]=K[J]
J=J+1
Else
Repeat while I< SECOND
L=L+1
TEMP[L]=K[I]
I=I+1
Step 4: [Copy the element into temporary vector into original area]
Repeat for I=1,2,..L
K[FIRST-I+1]=TEMP[I]
Step 5: Exit
Complexity of the Merging Algorithm
The input consists of the total number n=r+s of elements in A and B. Each
comparison assigns an elements to the array C, which eventually has n
elements. Accordingly, the number f(n) of comparisons cannot exceed n:
F(n) <= n = 0(n)
In other words, the merging algorithm can be run in linear time.
WORST CASE : n log= 0[n log n]
AVERAGE CASE : n log n=0[n log n]
SEARCHING:
Searching refers to finding the location i.e LOC of ITEM in an array. The search
is said to be successful if ITEM appears the array & unsuccessful otherwise we
have two types of searching techniques.
Copy Right DTE&T,Odisha
Page 74
1. Linear Search
2. Binary Search
LINEAR SEARCH:
Suppose DATA is a linear array with n elements. No other information about
DATA, the most intuitive way to search for a given ITEM in DATA is to compare
ITEM with each element of DATA one by one. First we have to test whether
DATA [1]=ITEM, nad then we test whether DATA[2] =ITEM , and so on. This
method which traverses DATA sequentially to locate ITEM, is called linear search
or sequential search.
Algorithm
LINEAR (DATA, N, ITEM, LOC)
Step 1: [Insert ITEM at the end of data]
Set DATA [N+1] = ITEM
Step 2: [Initialize counter]
Set LOC=1
Step 3: [Search for ITEM]
Repeat while DATA [LOC]!= ITEM
Step 4: [Successful]
If LOC=N+1
Then Set LOC = 0
Step 5: Exit
Complexity of the Linear Search Algorithm
The complexity of search algorithm is measured by the number f(n) of
comparisons required to find ITEM in DATA where DATA contains n elements.
Two important cases to consider are the average case and the worst case.
The worst case occurs when one must search through the entire array DATA. In
this case, the algorithm requires
F(n)= n+1
Thus, in the worst case, running time is proportional to n.
The running time of the average case uses the probabilistic notion of expectation.
The probability that ITEM appears in DATA[K], and q is the probability that ITEM
Copy Right DTE&T,Odisha
Page 75
does not appear in DATA . Since the algorithm uses k comparison when ITEM
appears in DATA[K], the average number of comparison is given by
F(n) = 1 . p1 + 2 . p2 +..+ n . pn + (n+1).q
BINARY SEARCH
Suppose DATA is an array which is sorted in increasing numerical order or
equivalently, alphabetically. Then there is an extremely efficient searching
algorithm, called binary search.
Algorithm
Binary search (DATA, LB, UB, ITEM, LOC)
Step 1: [Initialize the segment variables]
Set BEG := LB, END := UB and MID := INT ((BEG + END)/2)
Step 2: [Loop]
Repeat Step 3 and Step 4 while BEG <= END and DATA [MID] != ITEM
Step 3: [Compare]
If ITEM < DATA [MID]
then set END := MID - 1
Else
Set BEG = MID + 1
Step 4: [Calculate MID]
Set MID := INT ((BEG + END)/2)
Step 5: [Successful search]
If DATA [MID] = ITEM
then set LOC := MID
Else set LOC := NULL
Step 6: Exit
Complexity of the Binary Search Algorithm
The complexity is measured by the number f(n) of comparison to locate ITEM in
DATA where DATA contains n elements. Observe that each comparison reduces
the sample size in half. Hence we require at most f(n) comparison to locate ITEM
where
Copy Right DTE&T,Odisha
Page 76
2f(n) > n
Or equivalently
F(n) = [log2 n] + 1
The running time for the worst case is approximately equal to log2 n and the
average case is approximately equal to the running time for the worst case.
Page 77
Although one may find that these requirements are in contradiction with each
other, it is the designers job to find a good compromise among them to get an
adequate solution for the problem at hand. For example, the ease of addition of
records can be compromised to get fast access to data.
record in the file. Sequential file organization is the most basic way
to organize a large collection of records in a file. The figure below
shows n records numbered from 0 to n-1 stored in a sequential file.
Once we store the records in a file, we cannot make any changes
to the records. We cannot even delete the records from a
sequential file. In case we need to delete or update one or more
Record i
Record
In sequential file organization, all the records have the same size
i+1
and the same field format, and every field has a fixed size. The
.................
..
of two or more fields. This field is known as the key. Each key
.................
..
different value for the key field. Records can be sorted in either
Record n-
Record n-
Page 78
and tapes.
Features
Advantages
Records can be
No extra overheads
sequentially. If
involved.
to be read, then
be
all
magnetic disks as
records
well
be read.
tapes.
Records
are
read
and
stored
as
on
magnetic
the
i-1
must
to be created
oriented
applications.
file has to be
read
written sequentially.
entered.
Disadvantages
format.
replaced
key value.
contains
desired
generation
changes.
or
sequential
reading.
with
Cannot be used
for
interactive
application
the
Page 79
means the record number represents the location of the record relative to the
beginning of the file. The record numbers range from 0 to n-1, where n is the
number of records in the file. For example, the record with number 0 is the first
record in the file. The records in a relative file are of fixed length.
Therefore, in relative files, records are organized in ascending relative record
number. A relative file can be thought of as a single dimension table stored on a
disk, in which the relative record number is the index into the table. Relative files
can be used for both random as well as sequential access. For sequential
access, records are simply read one after another.
Relative files provide support for only one key, that is, the relative record number.
This key must be numeric and must take a value between 0 and the current
highest relative record number -1. This means that enough space must be
allocated for the file to contain the records with relative record numbers between
0 and the highest record number -1. For example, if the highest relative record
number is 1,000 then space must be allocated to store 1,000 records in the file.
The figure below shows a schematic representation of a relative file which has
been allocated enough space to store 100 records. Although it has space to
accommodate 100 records, not all the locations are occupied. The locations
marked as FREE are yet to store records in them. Therefore, every location in
the table either stores a record or is marked as FREE.
Relative
record
number
0
1
2
3
4
98
99
Records
stored
in
memory
Record 0
Record 1
FREE
FREE
Record 4
.
FREE
Record 99
Page 80
record which has to be accessed. If the records are of fixed length and we know
the base address of the file and the length of the record, then any record i can be
accessed using the following formula:
Address of ith record=base_address+(i-1)* record_length
Note that the base address of the file refers to the starting address of the file. We
took i-1 in the formula because record numbers start from 0 rather than 1.
Consider the base address of a file is 1000 and each record occupies 20 bytes,
then the address of the 5th record can be given as:
1000+ (5-1)*20
=1000+80
=1080
The Table below summarizes the features, advantages, and disadvantages of
Features
Advantages
Disadvantages
Provides an effective
Ease of processing.
way
If
to
access
The
number
has to be accessed is
the
record
of
relative
to disk devices.
Records can be of
fixed length only.
be
relative
instantaneously.
access of records,
Records in a relative
number must be
files fast.
known
to
the
Allow
accessed
Use
files is restricted
record
relative
individual records.
represents
the
deletions
and
For
random
advance.
in
record or is marked as
FREE.
based
on
record
number
the
relative
of
the
record to be inserted.
Indexed sequential file organization stores data for fast retrieval. The records in
an indexed sequential file are of fixed length and every record is uniquely
identified by a key field. We maintain a table known as the index table which
Record
Address
number
Record
of
the
765
Record
27
Record
876
Record
742
Record
NULL
because
NULL
records
NULL
anywhere,
NULL
NULL
called
as
indexed
be
the
stored
the index
those records.
The ith entry in the index
table points to the ith record of the file. Initially, when the file is created, each
entry in the index table contains NULL. When the ith record of the file is written,
free space is obtained from the free space manager and its address is stored in
Copy Right DTE&T,Odisha
Page 82
Features
Advantages
only on disks.
searched
and
store indices.
file.
records it needs.
Supports applications
is more complicated
that
than
Disadvantages
require
quickly,
both
Indexed sequential
to
handling
Page 83
processing.
Records
accessed
to access it randomly.
sequentially as well
as randomly.
well
in
situations
where
sequential files.
can
be
HASHING
The search time of each algorithm discussed so far depends on the
number n of elements in the collection S of data. This section discusses a
searching technique, called hashing or hash addressing, which is essentially
independent of the number n.
The terminology which we use in our presentation of hashing will be
oriented toward file management. First of all, we assume that there is a file F of n
records with a set K of keys which uniquely determine the records in F. Secondly,
we assume that F is maintained in memory by a table T of m memory locations
and that L is set of memory addresses of the locations in T. For notational
convenience, we assume that the keys in K and the addresses in L are (decimal)
integers. (Analogous methods will work with binary integers or with keys which
are character strings, such as names, since there are standard ways of
representing strings by integers.)
The subject of hashing will be introduced by the following example.
Page 84
Example:
The general idea of using the key to determine the address of a record is an
excellent idea, but it must be modified so that a great deal of space is not wasted.
This modification takes the form of a function H from the set K of keys into the set
L of memory addresses. Such a function,
H: K
1. Hash Functions
The two principal criteria used in selecting a hash function H: K
are
as
follows. First of all, the function H should be very easy and quick to compute.
Secondly the function H should, as far as possible, uniformly distribute the hash
addresses throughout the set L so that there are a minimum number of collisions.
Naturally, there is no guarantee that the second condition can be completely
fulfilled without actually knowing beforehand the keys and addresses. However,
certain general techniques do help. One technique is to chop a key k into
pieces and combine the pieces in some way to form the hash address H(k). (The
Copy Right DTE&T,Odisha
Page 85
term hashing comes from this technique of chopping a key into pieces.)
We next illustrate some popular hash functions. We emphasize that each
of these hash functions can be easily and quickly evaluated by the computer.
a)
Division method
H(k)=k(mod m)+1
b) Midsquare method
The key k is squared. Then the hash function H is defined by
H(k)=l
Where l is obtained by deleting digits from both ends of k2. We emphasize that
the same positions of k2 must be used for all of the keys.
c) Folding method
The key k is partitioned into a number of parts, k1....., kr, where each part, except
possibly the last, has the same number of digits as the required address. Then
the parts are added together, ignoring the last carry. That is,
H(k)=k1+k2+.......+kr
Where the leading-digit carries, if any, are ignored. Sometimes, for extra milling,
the even-numbered parts, k2,k4,....., are each reversed before the addition.
Example
Consider the company in the above Example, each of whose 68 employees is assigned a
unique 4-digit employee number. Suppose L consists of 100 two-digit addresses: 00, 01,
02, ......, 99. We apply the above hash functions to each of the following employee
Copy Right DTE&T,Odisha
Page 86
numbers:
3205, 7148, 2345
1. Division method
Choose a prime number m close to 99, such as m=97. Then H(3205)=4, H(7148)=67,
H(2345)=17
That is, dividing 3205 by 97 gives a remainder of 4, dividing 7148 by 97 gives a remainder
of 67, and dividing 2345 by 97 gives a remainder of 17. In the case that the memory
addresses begin with 01 rather than 00, we choose that the function H(k)=k(mod m)+1 to
obtain:
H(3205)=4+1=5,
H(7148)=67+1=68,
H(2345)=17+1=18
2. Midsquare method
The following calculations are performed:
k:
3205
7148
2345
k2:
10 272 025
51 093 904
499 025
H(k):
72
93
99
Observe that the fourth and fifth digits, counting from the right, are chosen for the hash
address.
3. Folding method
Chopping the key k into two parts and adding yields the following hash addresses:
H(3205)=32+05=37,
H(7148)=71+48=19,
H(2345)=23+45=68
Observe that the leading digit 1 in H(7148) is ignored. Alternatively, one may want to
reverse the second part before adding, thus producing the following hash addresses:
H(3205)=32+50=82,
H(7148)=71+84+55,
H(2345)=23+54=77
2. Collision Resolution
Suppose we want to add a new record R with key k to our file F, but
suppose the memory location address H(k) is already occupied. This situation is
Copy Right DTE&T,Odisha
Page 87
Page 88
numbers of probes for a successful search and for an unsuccessful search are
known to be the following respective quantities:
and
U()=
(Here
Example
Suppose the table T has 11 memory locations, T[1], T[2],......., T[11], and
suppose the file F consists of 8 records, A, B, C, D, E, X, Y, and Z, with the
following hash addresses:
Record: A, B, C, D, E, X, Y, Z
H(k):
4, 8, 2, 11, 4, 11, 5, 1
Suppose the 8 records are entered into the table T in the above order. Then the
file F will appear in memory as follows:
Table T:
X, C, Z, A, E, Y, _, B, _, _, D
Address:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
Although Y is the only record with hash address H(k)=5, the record is not
assigned to T[5], since T[5] has already been filled by E because of a previous
collision at T[4]. Similarly, Z does not appear in T[1].
The average number S of probes for a successful search follows:
S=
=1.6
=3.6
The first sum adds the number of probes to find each of the 8 records, and the
second sum adds the number of probes to find an empty location for each of the
11 locations.
One main disadvantage of linear probing is that records tend to cluster, that is,
Copy Right DTE&T,Odisha
Page 89
appear next to one another, when the load factor is greater than 50 percent.
Such a clustering substantially increases the average search time for a record.
Two techniques to minimize clustering are as follows:
Quadratic probing
Suppose a record R with key k has the hash address H(k)=h. Then, instead of
searching the locations with addresses h, h+1, h+2,....., we linearly search the
locations with addresses
h, h+1, h+4, h+9, h+16,........h+i 2,.....
If the number m of locations in the table T is a prime number, then the above
sequence will access half of the locations in T.
Double hashing
Here a second hash function H is used for resolving a collision, as follows.
Suppose a record R with key k has the hash addresses H(k)=h and H(k)=hm.
Then we linearly search the locations with addresses
h, h+h, h+2h, h+3h,....
If m is a prime number, then the above sequence will access all the locations in
the table T.
Remark: One major disadvantage in any type of open addressing procedure is in
the implementation of deletion. Specifically, suppose a record R is deleted from
the location T[r]. Afterwards, suppose we meet T[r] while searching for another
record R. This does not necessarily mean that the search is unsuccessful. Thus,
when deleting the record R, we must label the location T[r] to indicate that it
previously did contain a record. Accordingly, open addressing may seldom be
used when a file F is constantly changing.
REVIEW QUESTIONS
1. What is the principle of bubble sort?
2. Differentiate between linear search and binary search?
3. What is searching and what are the methods of searching?
4. Why do we need files?
5. What do you understand by the term file organization?
Copy Right DTE&T,Odisha
Page 90
Page 91