Unit-1 Data Structures
1.1 INTRODUCTION
The study of data structures helps to understand the basic concepts involved in organizing
and storing data as well as the relationship among the data sets. This in turn helps to
determine the way information is stored, retrieved and modified in a computer’s memory.
Data structure is a branch of computer science. The study of data structure helps you to
understand how data is organized and how data flow is managed to increase efficiency of
any process or program. Data structure is the structural representation of logical
relationship between data elements. This means that a data structure organizes data items
based on the relationship between the data elements.
Example:
A house can be identified by the house name, location, number of floors and so on.
This structured set of variables depends on each other to identify the exact house.
Similarly, a data structure is a structured set of variables that are linked to each other,
forming the basic components of a system.
Data structures are the building blocks of any program or the software. Choosing the
appropriate data structure for a program is the most difficult task for a programmer.
The following terminology is used in connection with data structures:
Data: Data can be defined as an elementary value or a collection of values; for
example, a student's name and ID are data about the student.
Group Items: Data items which have subordinate data items are called Group item, for
example, name of a student can have first name and the last name.
Record: Record can be defined as the collection of various data items, for example, if we
talk about the student entity, then its name, address, course and marks can be grouped
together to form the record for the student.
File: A file is a collection of various records of one type of entity; for example, if there
are 60 employees in an organization, then the related file will contain 60 records, where
each record contains the data about one employee.
Attribute and Entity: An entity represents a class of certain objects, and it contains
various attributes. Each attribute represents a particular property of that entity.
Correctness: Data structure is designed such that it operates correctly for all kinds of
input, which is based on the domain of interest. In other words, correctness forms the
primary goal of data structure, which always depends on the specific problems that the
data structure is intended to solve.
Efficiency: A data structure also needs to be efficient. It should process the data at high
speed without utilizing much of the computer's resources, such as memory space. In a
real-time system, the efficiency of a data structure is an important factor that determines
the success or failure of the process.
Adaptability: Developing software projects such as word processors, Web browsers and
Internet search engines involves large software systems that must work correctly and
efficiently for many years. Moreover, software evolves due to ever-changing market
conditions and emerging technologies.
A data structure provides a structured set of variables that are associated with each other
in different ways. It forms a basis of programming tool that represents the relationship
between data elements and helps programmers to process the data easily.
Primitive data structures consist of the numbers and characters that are built
into programs. These can be manipulated or operated on directly by
machine-level instructions. Basic data types such as integer, real, character,
and Boolean come under primitive data structures. These data types are also
known as simple data types because they consist of characters that cannot be
divided.
I) Array
When we declare an array, we can assign initial values to each of its elements by
enclosing the values in braces { }.
Num 26 7 67 50 66
Figure 1.2 Array
The number of values inside braces { } should be equal to the number of elements that
we declare for the array inside the square brackets [ ]. In the example of array Num, we
have declared 5 elements, and in the list of initial values within braces { } we have
specified 5 values, one for each element. After this declaration, array Num will have five
integers, as we have provided 5 initialization values.
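A minimal sketch of this declaration in C, using the array Num and the values from Figure 1.2 (the surrounding program is illustrative):

#include <stdio.h>

int main(void)
{
    /* Declare an array of 5 integers and initialise it with the values
       shown in Figure 1.2. */
    int Num[5] = {26, 7, 67, 50, 66};

    /* Traverse the array and print each element. */
    for (int i = 0; i < 5; i++)
        printf("Num[%d] = %d\n", i, Num[i]);

    return 0;
}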
Limitations:
Arrays are of fixed size.
Data elements are stored in contiguous memory locations which may not be
always available.
Insertion and deletion of elements can be problematic because of shifting of
elements from their positions.
However, these limitations can be overcome by using linked lists.
Applications:
Storing list of data elements belonging to same data type
Auxiliary storage for other data structures
Storage of binary tree elements of fixed count
Storage of matrices
II) Linked Lists
A linked list is a data structure in which each data element contains a pointer
or link to the next element in the list. Through linked list, insertion and
deletion of the data element is possible at all places of a linear list. Also in
linked list, it is not necessary to have the data elements stored in consecutive
locations. It allocates space for each data item in its own block of memory.
Thus, a linked list is considered as a chain of data elements or records called
nodes. Each node in the list contains information field and a pointer field. The
information field contains the actual data and the pointer field contains address
of the subsequent nodes in the list.
Figure 1.3 represents a linked list with 4 nodes. Each node has two parts. The left part in
the node represents the information part which contains an entire record of data items and
the right part represents the pointer to the next node. The pointer field of the last node
contains a null value.
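A node of such a list can be sketched in C as follows (the field names info and next are illustrative, not prescribed by the text):

#include <stdio.h>

struct Node {
    int info;             /* information field: the actual data       */
    struct Node *next;    /* pointer field: address of the next node  */
};

/* Traverse the list from the first node; the last node's next is NULL. */
void traverse(struct Node *head)
{
    for (struct Node *p = head; p != NULL; p = p->next)
        printf("%d ", p->info);
}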
Applications:
Implementing stacks, queues, binary trees and graphs whose size can change at run time.
Implement dynamic memory management functions of operating system.
Polynomial implementation for mathematical operations
Circular linked list is used to implement OS or application functions that require
round robin execution of tasks.
Circular linked list is used in a slide show where a user wants to go back to the
first slide after last slide is displayed.
Doubly linked list is used in the implementation of forward and backward buttons
in a browser to move backwards and forward in the opened pages of a website.
Circular queue is used to maintain the playing sequence of multiple players in a
game.
III) Stacks
A stack is a linear data structure in which insertion and deletion of elements are done at
only one end, which is known as the top of the stack. Stack is called a last-in, first-out
(LIFO) structure because the last element which is added to the stack is the first
element which is deleted from the stack.
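An array-based sketch of these push and pop operations in C (the capacity MAX and the function names are illustrative):

#include <stdio.h>

#define MAX 100

int stack[MAX];
int top = -1;                   /* -1 indicates an empty stack */

/* Insert (push) an element at the top of the stack. */
void push(int value)
{
    if (top == MAX - 1) {
        printf("Stack overflow\n");
        return;
    }
    stack[++top] = value;
}

/* Delete (pop) and return the element at the top of the stack. */
int pop(void)
{
    if (top == -1) {
        printf("Stack underflow\n");
        return -1;
    }
    return stack[top--];
}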
Applications:
Temporary storage structure for recursive operations
Auxiliary storage structure for nested operations, function calls,
deferred/postponed functions
Manage function calls
Evaluation of arithmetic expressions in various programming languages
Conversion of infix expressions into postfix expressions
Checking syntax of expressions in a programming environment
Matching of parenthesis
String reversal
In all problems whose solutions are based on backtracking.
Used in depth first search in graph and tree traversal.
Operating System functions
UNDO and REDO functions in an editor.
IV) Queues
A queue is a first-in, first-out (FIFO) data structure in which the element that is
inserted first is the first one to be taken out. The elements in a queue are added at one
end called the rear and removed from the other end called the front. Like stacks,
queues can be implemented by using either arrays or linked lists.
Figure 1.5 shows a queue with 4 elements, where 55 is the front element and 65 is the
rear element. Elements can be added from the rear and deleted from the front.
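A simple array-based sketch of these operations in C (the capacity and the names enqueue/dequeue are illustrative):

#include <stdio.h>

#define MAX 100

int queue[MAX];
int front = 0, rear = -1;

/* Add an element at the rear of the queue. */
void enqueue(int value)
{
    if (rear == MAX - 1) {
        printf("Queue overflow\n");
        return;
    }
    queue[++rear] = value;
}

/* Remove and return the element at the front of the queue. */
int dequeue(void)
{
    if (front > rear) {
        printf("Queue underflow\n");
        return -1;
    }
    return queue[front++];
}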
Applications:
It is used in the breadth-first search operation in graphs.
Job scheduler operations of OS like a print buffer queue, keyboard buffer queue to
store the keys pressed by users
Job scheduling, CPU scheduling, Disk Scheduling
Priority queues are used in file downloading operations in a browser
Data transfer between peripheral devices and CPU.
Interrupts generated by the user applications for CPU
Handling calls from customers in a BPO (call centre)
V) Trees
A tree is a non-linear data structure in which data is organized in branches. It imposes a
hierarchical structure on the data elements; in search trees, the data elements are
additionally arranged in a sorted order.
Figure 1.6 represents a tree which consists of 8 nodes. The root of the tree is the node
60 at the top. Nodes 29 and 44 are the successors of the node 60. The nodes 6, 4, 12
and 67 are terminal nodes, as they do not have any successors.
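Assuming a binary tree for simplicity, a node and a traversal can be sketched in C as follows (field and function names are illustrative):

#include <stdio.h>

struct TreeNode {
    int data;                  /* data stored in the node      */
    struct TreeNode *left;     /* pointer to the left subtree  */
    struct TreeNode *right;    /* pointer to the right subtree */
};

/* Visit every node: left subtree, the node itself, then right subtree. */
void inorder(struct TreeNode *root)
{
    if (root == NULL)
        return;
    inorder(root->left);
    printf("%d ", root->data);
    inorder(root->right);
}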
Applications:
Implementing the hierarchical structures in computer systems like directory and
file system.
Implementing the navigation structure of a website.
Code generation like Huffman’s code.
Decision making in gaming applications.
Implementation of priority queues for priority-based OS scheduling functions
Parsing of expressions and statements in programming language compilers
For storing data keys for DBMS for indexing
Spanning trees for routing decisions in computer and communications networks
Hash trees
Path-finding algorithms used in AI, robotics and video game applications
VI) Graphs
A graph is also a non-linear data structure. In a tree data structure, all data
elements are stored in a definite hierarchical structure; in other words, each
node has only one parent node. In a graph, by contrast, each data element is
called a vertex and can be connected to many other vertices through
connections called edges.
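One common way to represent such a graph in C is an adjacency matrix (the vertex count of 5 is illustrative):

#define V 5                     /* number of vertices (illustrative)          */

int adj[V][V];                  /* adj[u][v] = 1 if an edge connects u and v  */

/* Add an undirected edge between vertices u and v. */
void add_edge(int u, int v)
{
    adj[u][v] = 1;
    adj[v][u] = 1;
}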
Applications:
Representing networks and routes in communication, transportation and travel
applications
Routes in GPS
Interconnections in social networks and other network-based applications
Mapping applications
Ecommerce applications to present user preferences
Utility networks to identify the problems posed to municipal or local corporations
Resource utilization and availability in an organization
Document link map of a website to display connectivity between pages through
hyperlinks
Robotic motion and neural networks
A data structure is a way of storing and organising data such that the required
operations on it can be performed efficiently with respect to time as well as
memory. Simply put, data structures are used to reduce the complexity (mostly the time
complexity) of the code.
Data structures can be of two types:
1. Static Data Structure
2. Dynamic Data Structure
What is a Static Data Structure?
In a static data structure, the size of the structure is fixed. The content of the data
structure can be modified, but without changing the memory space allocated to it. In a
dynamic data structure, by contrast, the size is not fixed and the memory space allocated
to it can grow or shrink at run time.
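The difference can be sketched in C: an ordinary array is static, while memory obtained with malloc and realloc is dynamic (the variable names are illustrative):

#include <stdlib.h>

int fixed[100];          /* static: size decided at compile time and cannot change */

void example(void)
{
    /* dynamic: memory obtained at run time, which can later be resized */
    int *dynamic = malloc(100 * sizeof(int));
    if (dynamic == NULL)
        return;
    int *bigger = realloc(dynamic, 200 * sizeof(int));   /* grow to 200 elements */
    if (bigger != NULL)
        dynamic = bigger;
    free(dynamic);
}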
This section discusses the different operations that can be performed on the various data
structures previously mentioned.
Traversing It means to access each data item exactly once so that it can be processed.
For example, to print the names of all the students in a class.
Searching It is used to find the location of one or more data items that satisfy the given
constraint. Such a data item may or may not be present in the given collection of data
items. For example, to find the names of all the students who secured 100 marks in
mathematics.
Inserting It is used to add new data items to the given list of data items. For example, to
add the details of a new student who has recently joined the course.
Deleting It means to remove (delete) a particular data item from the given collection of
data items. For example, to delete the name of a student who has left the course.
Sorting Data items can be arranged in some order like ascending order or descending
order depending on the type of application. For example, arranging the names of students
in a class in an alphabetical order, or calculating the top three winners by arranging the
participants’ scores in descending order and then extracting the top three.
Merging Lists of two sorted data items can be combined to form a single list of sorted
data items.
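The merging operation can be sketched in C as follows (the array names and the function name merge are illustrative):

/* Merge two sorted arrays a[0..m-1] and b[0..n-1] into c[0..m+n-1]. */
void merge(const int a[], int m, const int b[], int n, int c[])
{
    int i = 0, j = 0, k = 0;
    while (i < m && j < n)                 /* copy the smaller front element    */
        c[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < m)                          /* copy any remaining elements of a  */
        c[k++] = a[i++];
    while (j < n)                          /* copy any remaining elements of b  */
        c[k++] = b[j++];
}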
From the above definition, it is clear that the operations on data structures
involve higher-level abstractions such as adding or deleting an item from a
list, accessing the highest-priority item in a list, or searching and sorting
items in a list. When a data structure is specified in terms of such operations,
it is called an abstract data type.
An abstract data type (ADT) can be defined as a collection of data items together with
the operations on the data. The word "abstract" refers to the fact that the data and the
basic operations defined on it are studied independently of how they are implemented. An
ADT is concerned with what can be done with the data, not with how it has to be done.
For example, in the figure below the user is concerned with what can be done with the
collected data, not with how it has to be done.
An implementation of an ADT consists of storage structures to store the data items and
algorithms for the basic operations. All the data structures, i.e. array, linked list, stack,
queue, etc., are examples of ADTs.
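For example, a stack ADT in C can be described purely by the operations it supports, while the storage structure stays hidden from the user (the type and function names below are illustrative, not a prescribed API):

/* stack.h - interface of a stack ADT: what can be done, not how it is done. */
typedef struct Stack Stack;            /* storage structure is hidden          */

Stack *stack_create(void);             /* create an empty stack                */
void   stack_push(Stack *s, int x);    /* insert an element at the top         */
int    stack_pop(Stack *s);            /* delete and return the top element    */
int    stack_is_empty(const Stack *s); /* check whether the stack is empty     */
void   stack_destroy(Stack *s);        /* release the stack                    */

Whether the implementation behind this interface uses an array or a linked structure can later be changed without rewriting the procedures that use the stack.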
Advantage of using ADTs
In the real world, programs evolve as a result of new requirements or constraints, so a
modification to a program commonly requires a change in one or more of its data
structures. For example, if you want to add a new field to a student’s record to keep track
of more information about each student, then it will be better to replace an array with a
linked structure to improve the program’s efficiency. In such a scenario, rewriting every
procedure that uses the changed structure is not desirable. Therefore, a better alternative
is to separate the use of a data structure from the details of its implementation. This is the
principle underlying the use of abstract data types.
1.7 ALGORITHM
Algorithm is a step-by-step procedure, which defines a set of instructions to be executed
in a certain order to get the desired output. Algorithms are generally created independent
of underlying languages, i.e. an algorithm can be implemented in more than one
programming language.
From the data structure point of view, following are some important categories of
algorithms −
Search − Algorithm to search an item in a data structure.
Sort − Algorithm to sort items in a certain order.
Insert − Algorithm to insert item in a data structure.
Update − Algorithm to update an existing item in a data structure.
Delete − Algorithm to delete an existing item from a data structure.
Advantages of Algorithms:
It is easy to understand.
Algorithm is a step-wise representation of a solution to a given problem.
In an algorithm, the problem is broken down into smaller pieces or steps; hence, it
is easier for the programmer to convert it into an actual program.
Disadvantages of Algorithms:
A given problem can be solved in more than one way, so many solution algorithms can be
derived for it. The next step is to analyse the proposed solution algorithms and implement
the most suitable one.
1.8 ALGORITHM COMPLEXITY
Suppose X is an algorithm and n is the size of input data, the time and space used by the
algorithm X are the two main factors, which decide the efficiency of X.
Time Factor − Time is measured by counting the number of key operations such
as comparisons in the sorting algorithm.
Space Factor − Space is measured by counting the maximum memory space
required by the algorithm.
The complexity of an algorithm f(n) gives the running time and/or the storage space
required by the algorithm in terms of n as the size of input data.
1.8.1 Space Complexity
Space complexity of an algorithm represents the amount of memory space required by
the algorithm in its life cycle. The space required by an algorithm is equal to the sum of
the following two components −
A fixed part that is a space required to store certain data and variables, that are
independent of the size of the problem. For example, simple variables and
constants used, program size, etc.
A variable part is a space required by variables, whose size depends on the size of
the problem. For example, dynamic memory allocation, recursion stack space,
etc.
Space complexity S(P) of any algorithm P is S(P) = C + SP(I), where C is the fixed part
and SP(I) is the variable part of the algorithm, which depends on instance characteristic I.
Following is a simple example that tries to explain the concept −
Algorithm: SUM(A, B)
Step 1 - START
Step 2 - C ← A + B + 10
Step 3 - Stop
Here we have three variables (A, B and C) and one constant (the value 10). Hence
S(P) = 1 + 3. The actual space further depends on the data types of the given variables
and constants, and the total will be multiplied accordingly.
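The distinction between the fixed and the variable part can be illustrated with a small C sketch (the function names are illustrative): the iterative sum below uses a constant amount of extra space, while the recursive version needs recursion stack space that grows with the problem size.

/* Fixed space: only sum and i are needed, independent of n. */
int sum_iterative(const int a[], int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Variable space: one stack frame per element, so the recursion
   stack grows with the size n of the problem. */
int sum_recursive(const int a[], int n)
{
    if (n == 0)
        return 0;
    return a[n - 1] + sum_recursive(a, n - 1);
}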
The algorithm analysis framework involves finding out the time taken and the memory
space required by a program to execute. It also determines how the input size of a
program influences the running time of the program.
The efficiency of some algorithms may vary for inputs of the same size. For
such algorithms, we need to differentiate between the worst case, average case
and best case efficiencies.
If an algorithm takes the maximum amount of time to execute for a specific set of inputs,
then this is called the worst case time complexity. The worst case efficiency of an
algorithm is its efficiency for the worst case input of size n: among all possible inputs of
that size, this input makes the algorithm run the longest. Similarly, the best case
efficiency is the efficiency for the input of size n on which the algorithm runs fastest, and
the average case efficiency is the expected efficiency over all inputs of size n.
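Linear search is a simple illustration of this difference: the best case occurs when the key is the first element, and the worst case when the key is absent or at the last position. A sketch in C (the names are illustrative):

/* Return the index of key in a[0..n-1], or -1 if it is not present. */
int linear_search(const int a[], int n, int key)
{
    for (int i = 0; i < n; i++) {
        if (a[i] == key)
            return i;       /* best case: found at i = 0 after one comparison */
    }
    return -1;              /* worst case: all n comparisons are made         */
}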
Algorithms are widely used in various areas of study, and different problems can be
solved using the same algorithm. Therefore, all algorithms must be described and
compared using a standard notation. Mathematical notations use symbols or symbolic
expressions, which have a precise semantic meaning.
A problem may have various algorithmic solutions. In order to choose the best algorithm
for a particular process, you must be able to judge the time taken to run a particular
solution. More accurately, you must be able to judge the time taken to run two solutions,
and choose the better among the two.
To select the best algorithm, it is necessary to check the efficiency of each algorithm. The
efficiency of each algorithm can be checked by computing its time complexity. The
asymptotic notations help to represent the time complexity in a shorthand way. It can
generally be represented as the fastest possible, slowest possible or average possible.
The notations such as O (Big-O), Ω (Omega), and θ (Theta) are called asymptotic
notations. These are the mathematical notations that are used in three different cases of
time complexity.
‘O’ is the representation for Big-O notation. Big-O is the method used to express the
upper bound of the running time of an algorithm. It is used to describe the performance or
time complexity of the algorithm. Big-O specifically describes the worst-case scenario
and can be used to describe the execution time required or the space used by the
algorithm.
Table 2.1 gives some names and examples of the common orders used to
describe functions. These orders are ranked from top to bottom.
Table 2.1 Common orders of growth
O(1)        Constant        e.g., accessing one array element
O(log n)    Logarithmic     e.g., binary search
O(n)        Linear          e.g., linear search
O(n log n)  Linearithmic    e.g., merge sort
O(n²)       Quadratic       e.g., bubble sort
O(2ⁿ)       Exponential     e.g., examining all subsets of a set
Big-O is defined as:
f(n) ≤ c ∗ g(n)
where n can be any number of inputs or outputs and f(n) as well as g(n) are
two non-negative functions. f(n) = O(g(n)) holds only if there exist a constant
c and a non-negative integer n0 such that the above inequality is true for all
n ≥ n0; in other words, f(n) is bounded above by g(n) multiplied by some
constant c for all sufficiently large n.
The graphical representation of f(n) = O(g(n)) is shown in figure 2.1, where
the running time increases considerably when n increases.
Example: Consider f(n) = 15n³ + 40n² + 2n log n + 2n. As the value of n
increases, n³ becomes much larger than n², n log n, and n. Hence, it
dominates the function f(n) and we can consider the running time to grow by
the order of n³. Therefore, it can be written as f(n) = O(n³).
The inequality f(n) ≤ c ∗ g(n) is required to hold only for n ≥ n0; therefore,
values of n less than n0 are not considered relevant.
Example:
Consider the functions f(n) = 2(n) + 2 and g(n) = n².
Let n = 1, then
f(n) = 2(n) + 2 = 2(1) + 2 = 4
g(n) = n² = 1² = 1
Here, f(n) > g(n)
Let n = 2, then
f(n) = 2(n) + 2 = 2(2) + 2 = 6
g(n) = n² = 2² = 4
Here, f(n) > g(n)
Let n = 3, then
f(n) = 2(n) + 2 = 2(3) + 2 = 8
g(n) = n² = 3² = 9
Here, f(n) < g(n)
Thus, when n is greater than 2, we get f(n) < g(n); that is, f(n) ≤ c ∗ g(n) holds for all
n ≥ 3 with c = 1. This concludes that Big-O helps to determine the ‘upper bound’ of the
algorithm’s run-time.
‘Ω’ is the representation for Omega notation. Omega describes the manner in which an
algorithm performs in the best case time complexity. This notation provides the minimum
amount of time taken by an algorithm to compute a problem. Thus, it is considered that
omega gives the "lower bound" of the algorithm's run-time. Omega is defined as:
f(n) ≥ c ∗ g(n)
where n is any number of inputs or outputs and f(n) and g(n) are two non-negative
functions. f(n) = Ω(g(n)) holds only if there exist a constant c and a non-negative integer
n0 such that the above inequality is true for all n ≥ n0.
Example:
Consider function f(n) = 2n2+5 and g(n) = 7n.
We need to find the constant c such that f(n) ≥ c ∗ g(n).
Let n = 0, then
f(n) = 2n² + 5 = 2(0)² + 5 = 5
g(n) = 7(n) = 7(0) = 0
Here, f(n) > g(n)
Let n = 1, then
f(n) = 2n² + 5 = 2(1)² + 5 = 7
g(n) = 7(n) = 7(1) = 7
Here, f(n) ≥ g(n)
Thus, for n = 1, we get f(n) ≥ c ∗ g(n) with c = 1. This concludes that Omega helps to
determine the "lower bound" of the algorithm's run-time.
'θ' is the representation for Theta notation. Theta notation is used when the
upper bound and lower bound of an algorithm are in the same order of
magnitude. Theta can be defined as:
c1 ∗ g(n) ≤ f(n) ≤ c2 ∗ g(n) for all n ≥ n0
where c1 and c2 are positive constants and n0 is a non-negative integer. Here, with
c1 = 4, c2 = 5 and n0 = 3, we get c1 ∗ g(n) ≤ f(n) ≤ c2 ∗ g(n). This concludes that Theta
notation depicts the running time between the upper bound and the lower bound.
Time Space Tradeoff
The best algorithm is the one which solves a problem using less space in memory as well
as less time to generate the output. But in general, it is not always possible to achieve
both objectives at once, so a trade-off has to be made between time and space.
If our problem is taking a long time but not much memory, a space-time trade-off would
let us use more memory and solve the problem more quickly. Or, if it could be solved
very quickly but requires more memory than we have, we can try to spend more time
solving the problem within the limited memory. A typical example is an algorithm that
uses a lookup table: storing the entire table reduces computing time but increases the
amount of memory required. Alternatively, the algorithm can recalculate, i.e., compute
table entries as needed, increasing computing time but reducing memory requirements.
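A small illustration of this trade-off in C is computing Fibonacci numbers with a lookup table: previously computed values are stored so that they are not recalculated, spending memory to save time (a hypothetical sketch, not taken from the text):

#define N 50

long long table[N];              /* extra memory spent in order to save time */

/* Compute the nth Fibonacci number (n < N), reusing stored table
   entries instead of recalculating them. */
long long fib(int n)
{
    if (n <= 1)
        return n;
    if (table[n] != 0)           /* already computed: no extra time needed */
        return table[n];
    table[n] = fib(n - 1) + fib(n - 2);
    return table[n];
}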
Compressed or uncompressed data: if data is stored uncompressed, it takes more space
but less time. But if the data is stored compressed, it takes less space but more time to
run the decompression algorithm. There are many instances where it is possible to work
directly with compressed data, as in the case of compressed bitmap indices, where it is
faster to work with compression than without compression.
Re-rendering or stored images: storing an image in the cache takes more space but less
time, since reading the cached image is faster than re-rendering it.
Smaller code or loop unrolling: compact loop code takes less space, but a jump back to
the beginning of the loop is required at the end of each iteration. Loop unrolling can
optimize execution speed at the cost of increased binary size, because it removes some of
these jumps.