DS Notes
COURSE CONTENT
UNIT I
Introduction and Overview: Definition, Elementary data organization, Data Structures, Data Structures
operations, Abstract data types, algorithms complexity, time-space trade-off. Preliminaries: Mathematical
notations and functions, Algorithmic notations, control structures, Complexity of algorithms, asymptotic
notations for complexity of algorithms. Introduction to Strings, Storing String, Character Data Types,
String Operations, word processing, Introduction to pattern matching algorithms.
UNIT II
Arrays: Definition, Linear arrays, arrays as ADT, Representation of Linear Arrays in Memory, Traversing
Linear arrays, Inserting and deleting, multi-dimensional arrays, Matrices and Sparse matrices, searching
and sorting techniques using array. Linked list: Definition, Representation of Singly Linked List in memory,
Traversing a Singly linked list, Searching in a Singly linked list, Memory allocation, Garbage collection,
Insertion into a singly linked list, Deletion from a singly linked list; Doubly linked list, Header linked list,
Circular linked list.
UNIT III
Stacks: Definition, Array representation of stacks, Linked representation of stacks, Stack as ADT, Arithmetic
Expressions: Polish Notation, Conversion of infix expression to postfix expression, Evaluation of Postfix
expression, Application of Stacks, Recursion, Towers of Hanoi, Implementation of recursive procedures by
stack. Queues: Definition, Array representation of queue, Linked list representation of queues. Types of
queue: Simple queue, Circular queue, Double-ended queue, Priority queue, Operations on Queues,
Applications of queues.
UNIT IV
Binary Trees: Definitions, Tree Search, Traversal of Binary Tree, Tree Sort, Building a Binary Search Tree,
Height Balance: AVL Trees, Contiguous Representation of Binary Trees: Heaps, Red Black Tree: Insertion
and Deletion, External Searching: B-Trees, Applications of Trees. Graphs: Mathematical Background,
Computer Representation, Graph Traversal. Hashing: Hash Table ADT, understanding Hashing,
Components of Hashing, Hash Table, Hash Function, Hashing Techniques, collisions, collision resolution
techniques.
Text Book
1. Seymour Lipschutz, “Data Structures with C”, Schaum’s Outlines, Tata McGraw Hill, 2011.
2. Robert Kruse, C. L. Tondo, Bruce Leung, Shashi Mogalla, “Data Structures and Program Design using C”, Pearson Education, 2009.
Reference Books
1. Mark Allen Weiss, “Data Structures and Algorithm Analysis in C”, Second Edition, Pearson Education, 2013.
2. Forouzan, “A Structured Programming Approach using C”, 2nd Edition, Cengage Learning India, 2008.
*****
TABLE OF CONTENT
Module 1 – Introduction and Overview to Data Structure
Module 2 – Array & Linked List
Module 3 – Stack & Queue
UNIT 1
A data structure is a data organization and storage format that enables efficient access and modification.
It is a way of collecting and organising data in memory so that operations can be performed on the data
effectively. A data structure should be designed and implemented in such a way that it reduces complexity
and increases efficiency.
Data structures can be used to organize the storage and retrieval of information stored in both main
memory and secondary memory.
Data structures provide a technique to manage large amounts of data efficiently, such as large databases
and internet indexing services. Usually, efficient data structures are the key to designing efficient
algorithms.
A data structure is a technique of organizing the data so that the data can be utilized efficiently. There are
two ways of viewing the data structure:
o Mathematical/ Logical/ Abstract models/ Views: The data structure is the way of
organizing the data that requires some protocols or rules. These rules need to be modeled that come under
the logical/abstract model.
o Implementation: The second part is the implementation part. The rules must be implemented
using some programming language.
Implementation
Implementation provides the internal representation of a data structure. The implementation of a data
structure usually requires writing a set of procedures that create and manipulate instances of that structure.
The efficiency of a data structure can be analyzed from those operations.
An abstract data type (ADT) is a special kind of data type whose behavior is defined by a set of values and a
set of operations. The keyword "abstract" is used because we can use these data types and perform different
operations on them, but how those operations work internally is hidden from the user. An ADT is built from
primitive data types, but its operation logic is hidden.
Some examples of ADT are Stack, Queue, List etc.
An abstract data type is an abstraction of a data structure that provides only the interface to which the data
structure must adhere. The interface does not give any specific details about how something should be
implemented or in what programming language.
In other words, we can say that abstract data types are the entities that are definitions of data and
operations but do not have implementation details. In this case, we know the data that we are storing and
the operations that can be performed on the data, but we don't know about the implementation details. The
reason for not having implementation details is that every programming language has a different
implementation strategy for example; a C data structure is implemented using structures while a C++ data
structure is implemented using objects and classes.
For example, a List is an abstract data type that is implemented using a dynamic array and linked list. A
queue is implemented using linked list-based queue, array-based queue, and stack-based queue. A Map is
implemented using Tree map, hash map, or hash table.
Abstract data type model
Anything that can store data can be called a data structure. Hence Integer, Float, Boolean, Char, etc., are
data structures. They are known as Primitive/built-in Data Structures.
We also have some complex Data Structures, which are used to store large and related/connected data.
These data structures are normally built by the combination of primary or built-in data types and
associated operations on them. Some examples of Abstract/Derived Data Structures are :
Array
Linked List
Stack, Queue
Tree, Graph etc.
All these data structures allow us to perform different operations on data. We select a data structure
based on which type of operation is required. An overview of some popular data structures is given
below:
1. Array:- Array is a linear data structure used to store homogeneous elements (all of the same type).
Elements are accessed using an integer index to specify which element is required. Contiguous memory
locations are allocated for the elements of arrays. Size of an array must be provided before storing data.
For example, to store the marks of all students in a class, we can use an array. This reduces the number
of variables needed, as we do not have to create a separate variable for each student's marks, and all
marks can be accessed by simply traversing the array.
2. Linked List:- A linked list is a linear data structure where each element is a separate object. Each
element (that is, a node) of a list comprises two items – the data and a reference to the next node in the
linked list. Types of Linked List are:
Singly Linked List
Doubly Linked List
Circular Linked List
3. Stack:- A stack or LIFO (last in, first out) is an abstract data type that serves as a collection of
elements, with two principal operations: push, which adds an element to the collection, and pop, which
removes the last element that was added. In stack both the operations of push and pop takes place at the
same end, that is, top of the stack. It can be implemented by using both array and linked list.
4. Queue:- A queue or FIFO (first in, first out) is an abstract data type that serves as a collection of
elements, with two principal operations: enqueue, the process of adding an element to the collection and
dequeue, the process of removing the first element that was added. It can be implemented by using both
array and linked list.
5. Binary Tree:- Unlike Arrays, Linked Lists, Stack and queues, which are linear data structures, trees are
hierarchical (non-linear) data structures. A binary tree is a tree data structure in which each node has at
most two children, which are referred to as the left child and the right child. It is implemented mainly
using Linked lists.
A record (also called tuple or struct) is a collective data structure. A record is a value that contains
other values, in fixed number and sequence and indexed by names. The elements of records are called
fields or members.
A union is a data structure that specifies which of a number of permitted primitive types may be
stored in its instances. A record could be defined to contain a float and an integer; whereas in a union,
there is only one value at a time. Enough space is allocated to contain the widest member datatype.
A class is a data structure that contains data fields, like a record, as well as various methods which
operate on the contents of the record.
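To make the difference concrete, here is a small illustrative C sketch (the member names are made up for this example): the struct stores both fields at once, while the union holds only one value at a time and occupies only as much space as its widest member.
#include <stdio.h>

struct record {            /* record: both fields exist at the same time */
    float price;
    int quantity;
};

union value {              /* union: only one member is valid at a time  */
    float f;
    int i;
};

int main()
{
    struct record r = { 9.99f, 3 };
    union value v;
    v.i = 42;              /* writing i overwrites whatever f contained  */
    printf("record: %.2f, %d\n", r.price, r.quantity);
    printf("union : %d\n", v.i);
    printf("sizeof(struct record)=%lu, sizeof(union value)=%lu\n",
           (unsigned long)sizeof(struct record), (unsigned long)sizeof(union value));
    return 0;
}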
The data structures can also be classified on the basis of the following characteristics:
Characteristic    Description
Linear            In linear data structures, the data items are arranged in a linear sequence. Example: Array
Non-Linear        In non-linear data structures, the data items are not in sequence. Example: Tree, Graph
Homogeneous       In homogeneous data structures, all the elements are of the same type. Example: Array
Non-Homogeneous   In non-homogeneous data structures, the elements may or may not be of the same type. Example: Structures
Static            Static data structures are those whose sizes and associated memory locations are fixed at compile time. Example: Array
Dynamic           Dynamic data structures are those which expand or shrink depending upon the program's need during execution; their associated memory locations also change. Example: Linked list created using pointers
The data in the data structures are processed by certain operations. The particular data structure selected
largely depends on the operation that needs to be performed on the data structure.
Insertion, Deletion, Traversing, Sorting, Searching, Merging
An algorithm is a finite set of well-defined steps to solve a problem. Every algorithm must satisfy the following criteria:
Input: Zero or more quantities which are externally supplied to the algorithm.
Output: At least one quantity is produced as output.
Definiteness: Each step must be clear and unambiguous.
Finiteness: For all cases, the algorithm must terminate after a finite number of steps.
Effectiveness: Each step must be basic enough to be carried out exactly and in a finite amount of time.
The efficiency of an algorithm is measured in terms of the time and space it uses. The
complexity of an algorithm is expressed as a function of the input size which gives the running time
and/or space.
Complexity
Suppose M is an algorithm and suppose n is the size of the input data. The efficiency of M is
measured in terms of time and space used by the algorithm. Time is measured by counting the
number of operations and space is measured by counting the maximum amount of memory
consumed by M.
The complexity of M is the function f(n) which gives running time and or space in terms of n. In
complexity theory, we find complexity f(n) for three major cases as follows,
Worst case: f(n) has the maximum value for any possible input.
Average case: f(n) has the expected (average) value.
Best case: f(n) has the minimum possible value.
The time and space it uses are two major measures of the efficiency of an algorithm. Sometimes
the choice of a data structure involves a space-time tradeoff. That is by increasing the amount of
space for storing the data, we may be able to reduce the time needed for processing data or vice-
versa.
The best algorithm to solve a given problem is one that requires less space in memory and takes
less time to complete its execution. But in practice it is not always possible to achieve both these
objectives. As we know, there may be more than one approach to solve a particular problem. One
approach may take more space but takes less time to complete its execution while the other
approach may take less space but takes more time to complete its execution. We may have to
sacrifice one at the cost of the other. If space is our constraint, then we have to choose a program
that requires less space at the cost of more execution time. On the other hand if time is our
constraint then we have to choose a program that takes less time to complete its execution at the
cost of more space.
The best algorithm is the one which helps to solve a problem that requires less space in memory
as well as takes less time to generate the output. But in general, it is not always possible to
achieve both of these conditions at the same time.
If our problem is taking a long time but not much memory, a space-time trade-off would let us
use more memory and solve the problem more quickly. Or, if it could be solved very quickly but
requires more memory than we have, we can try to spend more time solving the problem in the
limited memory.
In a lookup table, an implementation can include the entire table which reduces computing time
but increases the amount of memory required. It can recalculate i.e., compute table entries as
needed, increasing computing time but reducing memory requirements.
A space-time trade-off can be applied to the problem of data storage. If data stored is
uncompressed, it takes more space but less time. But if the data is stored compressed, it takes less
space but more time to run the decompression algorithm. There are many instances where it is
possible to work directly with compressed data; for example, with compressed bitmap indices it can be
faster to work with compression than without it.
Similarly, for rendered output such as images, storing only the source and re-rendering it each time
takes less space but more time; storing the rendered image in a cache is faster than re-rendering but
requires more space in memory.
Smaller, looped code occupies less space in memory but requires extra computation time for jumping
back to the beginning of the loop at the end of each iteration. Loop unrolling can optimize execution
speed at the cost of increased binary size: the unrolled code occupies more space in memory but
requires less computation time.
Example:
There are many algorithms that make use of time-space tradeoffs. Some of the algorithms are:
In the field of cryptography, a space-time trade-off lets an attacker decrease the exponential
time required for a brute-force attack. Rainbow tables use partially precomputed values in the
hash space of a cryptographic hash function to crack passwords in minutes instead of weeks;
decreasing the size of the rainbow table increases the time required to iterate over the hash
space. The meet-in-the-middle attack uses a space-time trade-off to find the cryptographic key in
only 2^{n+1} encryptions (and O(2^{n}) space), compared to the expected 2^{2n} encryptions
(but only O(1) space) of the naive attack.
Dynamic programming is another example where the time of solving problems can be
decreased by using more memory. Fibonacci problem can be solved faster with DP.
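As an illustrative sketch of this trade-off (the table size and names are assumptions for the example, not from the notes): a memoized Fibonacci spends O(n) extra memory on a table so that each value is computed only once, bringing the running time down from exponential to O(n).
#include <stdio.h>

#define MAXN 90                      /* fits in unsigned long long        */

unsigned long long memo[MAXN + 1];   /* 0 means "not computed yet"        */

unsigned long long fib(int n)        /* top-down dynamic programming      */
{
    if (n <= 1)
        return n;
    if (memo[n] != 0)                /* reuse a previously computed value */
        return memo[n];
    memo[n] = fib(n - 1) + fib(n - 2);
    return memo[n];
}

int main()
{
    printf("fib(50) = %llu\n", fib(50));   /* prints 12586269025 */
    return 0;
}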
Asymptotic Notations
The efficiency of an algorithm depends on the amount of time, storage and other resources
required to execute the algorithm. The efficiency is measured with the help of asymptotic
notations.
An algorithm may not have the same performance for different types of inputs. With the increase
in the input size, the performance will change.
The study of change in performance of the algorithm with the change in the order of the input
size is defined as asymptotic analysis.
Asymptotic notations are the mathematical notations used to describe the running time of an
algorithm when the input tends towards a particular value or a limiting value.
For example: In bubble sort, when the input array is already sorted, the time taken by the
algorithm is linear i.e. the best case.
But, when the input array is in reverse condition, the algorithm takes the maximum time
(quadratic) to sort the elements i.e. the worst case.
When the input array is neither sorted nor in reverse order, then it takes average time. These
durations are denoted using asymptotic notations. There are mainly three asymptotic notations:
Big-O notation
Omega notation
Theta notation
Big-O notation represents the upper bound of the running time of an algorithm. Thus, it gives the
worst-case complexity of an algorithm.
Formally, a function f(n) belongs to the set O(g(n)) if there exist positive constants c and n0 such that
0 ≤ f(n) ≤ c·g(n) for all n ≥ n0, i.e., for sufficiently large n.
For any such n, the running time of the algorithm does not exceed the bound given by O(g(n)).
Since it gives the worst-case running time of an algorithm, it is widely used to analyze an
algorithm as we are always interested in the worst-case scenario.
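For example, f(n) = 3n + 2 is O(n): choosing c = 4 and n0 = 2, we have 3n + 2 ≤ 4n for every n ≥ 2, so f(n) never exceeds c·g(n) for sufficiently large n.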
Omega notation represents the lower bound of the running time of an algorithm. Thus, it provides
the best case complexity of an algorithm.
Formally, a function f(n) belongs to the set Ω(g(n)) if there exist positive constants c and n0 such that
0 ≤ c·g(n) ≤ f(n) for all n ≥ n0, i.e., for sufficiently large n.
For any value of n, the minimum time required by the algorithm is given by Omega Ω(g(n)).
Theta notation encloses the function from above and below. Since it represents the upper and the
lower bound of the running time of an algorithm, it is used for analyzing the average-case
complexity of an algorithm.
Formally, a function f(n) belongs to the set Θ(g(n)) if there exist positive constants c1, c2 and n0 such that
0 ≤ c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0, i.e., for sufficiently large n.
If a function f(n) lies anywhere in between c1·g(n) and c2·g(n) for all n ≥ n0, then g(n) is said to
be an asymptotically tight bound for f(n).
UNIT 2
Arrays
An array is a linear data structure that collects elements of the same data type and stores
them in contiguous memory locations. Arrays use an index system running
from 0 to (n-1), where n is the size of the array.
Types of Arrays
There are majorly two types of arrays, they are:
One-Dimensional Arrays:
You can imagine a 1d array as a row, where elements are stored one after another.
Multi-Dimensional Arrays:
These multi-dimensional arrays are again of two types. They are:
Two-Dimensional Arrays:
You can imagine it like a table where each cell contains elements.
Three-Dimensional Arrays:
You can imagine it like a cuboid made up of smaller cuboids where each cuboid can
contain an element.
Initializing an Array
You can initialize an array in four different ways:
Method 1:
int a[6] = {2, 3, 5, 7, 11, 13};
Method 2:
int arr[]= {2, 3, 5, 7, 11};
Method 3:
int n;
scanf("%d", &n);
int arr[n];              /* variable-length array (C99) */
for(int i = 0; i < n; i++)
{
scanf("%d", &arr[i]);
}
Method 4:
int arr[5];
arr[0]=1;
arr[1]=2;
…..
Operations Performed on an Array
Traversal
Insertion
Deletion
Searching
Sorting
Insertion:
At the Beginning:
Code:
#include<stdio.h>
int main()
{
int array[10], n, i, item;
printf("Enter the size of array: ");
scanf("%d", &n);
printf("\nEnter Elements in array: ");
for(i=0;i<n;i++)
{
scanf("%d", &array[i]);
}
printf("\nEnter Element to Insert: ");
scanf("%d", &item);
for(i=n-1;i>=0;i--)        /* shift every element one position right */
array[i+1] = array[i];
array[0] = item;           /* place the new element at the beginning */
n++;
printf("\nThe New Array is:\n");
for(i=0;i<n;i++)
printf("%d ", array[i]);
return 0;
}
At the End:
Code:
#include<stdio.h>
#include<conio.h>
int main()
{
int array[10], i, values;
printf("Enter 5 Array Elements: ");
for(i=0; i<5; i++)
scanf("%d", &array[i]);
printf("\nEnter Element to Insert: ");
scanf("%d", &values);
array[i] = values;
printf("\nThe New Array is:\n");
for(i=0; i<6; i++)
printf("%d ", array[i]);
getch();
return 0;
}
At a Specific Position:
Code:
#include <stdio.h>
int main()
{
int array[100], pos, size, val;
printf("Enter size of the array:");
scanf("%d", &size);
printf("\nEnter %d elements\n", size);
for (int i = 0; i < size; i++)
scanf("%d", &array[i]);
printf("Enter the insertion location\n");
scanf("%d", &pos);
printf("Enter the value to insert\n");
scanf("%d", &val);
for (int i = size - 1; i >= pos - 1; i--)
array[i+1] = array[i];
array[pos-1] = val;
printf("Resultant array is\n");
for (int i = 0; i <= size; i++)
printf("%d\n", array[i]);
return 0;
}
Deletion:
Deletion of an element is the process of removing the desired element and re-organizing
it.
You can also do deletion in different ways:
At the beginning
At the end
At the Beginning:
#include<stdio.h>
int main()
{
int n,array[10];
printf("enter the size of an array");
scanf("%d" ,&n);
printf("enter elements in an array");
for(int i=0;i<n;i++)
scanf("%d", &array[i]);
n--;
for(int i=0;i<n;i++)
array[i]=array[i+1];
printf("\nafter deletion ");
for(int i=0;i<n;i++)
printf("\n%d" , array[i]);
}
At the End:
#include<stdio.h>
int main()
{
int n,array[10];
printf("enter the size of an array");
scanf("%d" ,&n);
printf("enter elements in an array");
for(int i=0;i<n;i++)
scanf("%d", &array[i]);
printf("\nafter deletion array elements are");
for(int i=0;i<n-1;i++)
printf("\n%d" , array[i]);
}
Linear Search:
Code:
#include <stdio.h>
int linear(int a[], int n, int x)
{
for (int i = 0; i < n; i++)
if (a[i] == x)
return i;
return -1;
}
int main(void)
{
int a[] = { 2, 3, 4, 10, 40 };
int x = 10;
int n = sizeof(a) / sizeof(a[0]);
// Function call
int result = linear(a, n, x);
if(result == -1)
{
printf("Element is not present in the array");
}
else
{
printf("Element is present at index %d", result);
}
return 0;
}
Binary Search:
#include <stdio.h>
int binary(int a[], int lt, int rt, int k)
{
if (rt >= lt) {
int mid = lt + (rt - lt) / 2;
// check If element is at the middle
if (a[mid] == k)
return mid;
//check if element is at the left side of mid
if (a[mid] > k)
return binary(a, lt, mid - 1, k);
// Else element is at the right side of mid
return binary(a, mid + 1, rt, k);
}
// if element not found
return -1;
}
int main(void)
{
int a[] = { 2, 3, 5, 7, 11 };
int n = sizeof(a) / sizeof(a[0]);
int k = 11;
int res = binary(a, 0, n - 1, k);
if(res == -1)
{
printf("Element is not found")
}
else
{
printf("Element found at index %d",res);
}
return 0;
}
Sorting:
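The notes list sorting as an array operation without giving code here; the following is a minimal bubble sort sketch (function and variable names are illustrative), one of the simplest sorting techniques on an array.
#include <stdio.h>

void bubble_sort(int a[], int n)      /* repeatedly swap adjacent out-of-order pairs */
{
    for (int i = 0; i < n - 1; i++)
        for (int j = 0; j < n - 1 - i; j++)
            if (a[j] > a[j + 1])
            {
                int temp = a[j];
                a[j] = a[j + 1];
                a[j + 1] = temp;
            }
}

int main()
{
    int a[] = { 5, 1, 4, 2, 8 };
    int n = sizeof(a) / sizeof(a[0]);
    bubble_sort(a, n);
    for (int i = 0; i < n; i++)
        printf("%d ", a[i]);          /* prints: 1 2 4 5 8 */
    return 0;
}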
Linked Lists
Array is used to store data of similar type, in almost all the modern programming languages. But there are
many cases where we don't know the quantity of data to be stored.
A linked list is a linear data structure, in which the elements are not stored at contiguous memory
locations. But they are linked using pointers as shown in the below image:
In simple words, a linked list consists of nodes where each node contains a data field and a reference
(link) to the next node in the list. A linked list is a sequence of elements, which are connected together via
links. Linked list is the second most-used data structure after array.
Array vs Linked List
Array: supports Random Access, which means elements can be accessed directly using their index, like arr[0] for the 1st element, arr[6] for the 7th element, etc.
Linked List: supports Sequential Access, which means that to access any element/node we have to traverse the linked list sequentially up to that element.
Array: elements are stored in contiguous (consecutive) memory locations.
Linked List: new elements can be stored anywhere in memory. The address of the memory location allocated to the new element is stored in the previous node of the linked list, hence forming a link between the two nodes/elements.
Array: insertion and deletion operations take more time, as the memory locations are consecutive and fixed.
Linked List: a new element is stored at the first free and available memory location, with only a single extra step of storing the address of that memory location in the previous node. Insertion and deletion operations are therefore fast in a linked list.
Array: memory is allocated as soon as the array is declared, at compile time. This is also known as Static Memory Allocation.
Linked List: memory is allocated at runtime, as and when a new node is added. This is also known as Dynamic Memory Allocation.
Both Linked List and Array are used to store linear data of similar type, but an array consumes contiguous
memory locations allocated at compile time, i.e. at the time of declaration of array, while for a linked list,
memory is assigned as and when data is added to it, which means at runtime. This is the basic and the
most important difference between a linked list and an array.
In case of array, memory is allocated in contiguous manner, hence array elements get stored in
consecutive (successive) memory locations. So when you have to access any array element, all we have to
do is use the array index, for example arr[4] will directly access the 5th memory location, returning the
data stored there.
But in case of linked list, data elements are allocated memory at runtime, hence the memory location can
be anywhere. Therefore to be able to access every node of the linked list, address of every node is stored
in the previous node, hence forming a link between every node.
We need this additional pointer because without it, the data stored at random memory locations will be
lost. We need to store somewhere all the memory locations where elements are getting stored. This
requires an additional memory space with each node.
Limitations of Arrays:
1) The size of the arrays is fixed: So we must know the upper limit on the number of elements in advance.
Also, generally, the allocated memory is equal to the upper limit irrespective of the usage, and in practical
uses, upper limit is rarely reached.
2) Inserting a new element in an array takes more time because space has to be created for the new
element by shifting the existing elements.
Deletion also takes more time. For example, to delete 1010 in id[ ], everything after 1010 has to be moved
backwards (to the previous locations).
Advantages of Linked Lists over Arrays:
1) Dynamic size
2) Ease of insertion/deletion
The principal benefit of a linked list over an array is that the list elements can be easily inserted or
removed without reallocation or reorganization of the entire structure because the data items need not be
stored contiguously in memory or on disk, while restructuring an array at run-time is a much more
expensive operation. Linked lists allow insertion and removal of nodes at any point in the list.
Since simple linked lists do not allow random access to the data, many basic operations—such as
obtaining the last node of the list, searching a specific node, or finding the place where a new node should
be inserted—may require iterating through most of the list elements.
Difficulties arise in linked lists when it comes to reverse traversing. For instance, singly linked lists are
cumbersome to navigate backward and while doubly linked lists are somewhat easier to read, memory is
consumed in allocating space for a back-pointer.
Linked list can be visualized as a chain of nodes, where every node points to the next node.
Example
A linked list is represented by a pointer to the first node of the linked list. The first node is called head. If
the linked list is empty, then value of head is NULL.
Basic Operations
Following are the basic operations supported by a list.
Insertion − Adds an element to the list.
Deletion − Deletes an element from the list.
Display − Displays the complete list.
Search − Searches an element using the given key.
Singly linked lists contain nodes which have a data field as well as 'next' field, which points to the next
node in the list.
In c, we can represent a node using structures. Below is an example of a linked list node with an integer
data:
struct node
{
int data;
struct node *next;
};
Insertion operation
A node can be added in three ways:
At the beginning/front of the linked list
After a given/specific node.
At the end of the linked list.
Adding a new node in linked list is a more than one step activity. First, create a node using the same
structure and find the location where it has to be inserted.
i) Add a node at the beginning/front
Here, the new node is always added before the first node of the given Linked List, and the newly added node
becomes the new head of the Linked List. For example, if the given Linked List is A->B->C->D and we
add an item E at the front, then the Linked List becomes E->A->B->C->D.
ii) Add a node after a given node
We are given an element, and the new node is inserted after the node containing that element. For example,
if the given Linked List is A->B->C->D and we add an item E after the element B, then the Linked List
becomes A->B->E->C->D. This puts the new node in between the two existing nodes.
iii) Add a node at the end
Here, the new node is always added after the last node of the given Linked List. For example, if the given
Linked List is A->B->C->D and we add an item E at the end, then the Linked List becomes
A->B->C->D->E.
While inserting a new node at the end, the last node of the list should point to the new node and the new
node will point to NULL. Since a linked list is typically represented by its head, we have to
traverse the list till the end and then change the ‘next’ member of the last node to point to the new node.
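A minimal sketch of insertion at the front and at the end in C, using the struct node defined above (the function names here are illustrative, not from the notes):
#include <stdlib.h>

/* insert at the front: the new node becomes the new head */
struct node* insert_front(struct node *head, int value)
{
    struct node *newnode = (struct node *)malloc(sizeof(struct node));
    newnode->data = value;
    newnode->next = head;            /* new node points to the old first node */
    return newnode;                  /* new node is the new head              */
}

/* insert at the end: traverse to the last node, then link it to the new node */
struct node* insert_end(struct node *head, int value)
{
    struct node *newnode = (struct node *)malloc(sizeof(struct node));
    newnode->data = value;
    newnode->next = NULL;            /* new last node points to NULL          */
    if (head == NULL)                /* empty list: new node becomes the head */
        return newnode;
    struct node *p = head;
    while (p->next != NULL)
        p = p->next;
    p->next = newnode;
    return head;
}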
Deletion Operation
Deletion process involves more than one step. First, locate the target node to be removed, by using
searching algorithm.
First, locate the target node to be removed, by using searching algorithm. The left (previous) node of the
target node now should point to the next node of the target node –
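A sketch of deletion by key for the same node structure (again with illustrative names): the node before the target is located first, then made to bypass the target node.
#include <stdlib.h>

/* delete the first node whose data equals key; returns the (possibly new) head */
struct node* delete_node(struct node *head, int key)
{
    if (head == NULL)
        return NULL;
    if (head->data == key)               /* target is the first node              */
    {
        struct node *temp = head->next;
        free(head);
        return temp;
    }
    struct node *prev = head;
    while (prev->next != NULL && prev->next->data != key)
        prev = prev->next;               /* search for the node before the target */
    if (prev->next != NULL)
    {
        struct node *target = prev->next;
        prev->next = target->next;       /* previous node points past the target  */
        free(target);
    }
    return head;
}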
In a singly linked list, every node has link to its next node in the sequence. So, we can traverse from one
node to other node only in one direction and we cannot traverse back. We can solve this kind of problem
by using doubly linked list.
Doubly Linked List is a variation of Linked list in which navigation (traversing) is possible in both ways,
either forward or backward easily.
In doubly linked list, every node has link to its previous node and next node. So, we can traverse forward
by using next field and can traverse backward by using previous field. Every node in a doubly linked list
contains three fields and they are shown in the following figure.
Here, 'link1' field is used to store the address of the previous node in the sequence, 'link2' field is used to
store the address of the next node in the sequence and 'data' field is used to store the actual value of that
node.
In a 'doubly linked list', each node contains, apart from the next-node link, a second link field pointing to
the 'previous' node in the sequence. The two links may be called 'next' and 'prev' ('previous').
☀ In doubly linked list, the first node must be always pointed by head.
☀ Always the previous field of the first node must be NULL.
☀ Always the next field of the last node must be NULL
In c, we can represent a node using structures. Below is an example of a doubly linked list node with an
integer data:
struct node
{
struct node *prev;
int data;
struct node *next;
};
As per the above illustration, following are the important points to be considered.
Each element carries a data field(s) and two link fields called ‘next’ and ‘prev’.
Each link is linked with its next element using its ‘next’ link.
Each link is linked with its previous element using its ‘prev’ link.
The last element has its ‘next’ link as null to mark the end of the list.
The first element has its ‘prev’ link as NULL since it does not have a previous element.
Basic Operations
Insertion Operation
A node can be added in three ways:
At the beginning/front of the linked list
After a given/specific node.
At the end of the linked list.
Adding a new node in linked list is a more than one step activity. First, create a node using the same
structure and find the location where it has to be inserted.
i) Add a node at the beginning/front
The new node is always added before the first node of the given Linked List. And newly added node
becomes the new head of the Linked List. For example if the given Linked List is A->B->C->D and we
add an item E at the front, then the Linked List becomes E->A->B->C->D.
ii) Add a node at the end
The new node is always added after the last node of the given Linked List. For example, if the given
Linked List is A->B->C->D and we add an item E at the end, then the Linked List becomes
A->B->C->D->E. While inserting a new node at the end, the last node of the list should point to the new
node and the new node's ‘next’ will point to NULL.
Since the linked List is represented by the head of it, we have to traverse the list till end and then change
the ‘next’ member of last node to the new node. And also change the ‘prev’ member of the new node to
the last node of the list.
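A minimal sketch of inserting at the front of a doubly linked list, using the struct node with 'prev' and 'next' shown above (the function name is illustrative):
#include <stdlib.h>

struct node* dll_insert_front(struct node *head, int value)
{
    struct node *newnode = (struct node *)malloc(sizeof(struct node));
    newnode->data = value;
    newnode->prev = NULL;          /* the first node's prev is always NULL */
    newnode->next = head;          /* new node points to the old head      */
    if (head != NULL)
        head->prev = newnode;      /* old head points back to the new node */
    return newnode;                /* new node becomes the head            */
}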
Deletion Operation
Deletion process involves more than one step. First, locate the target node to be removed, by using
searching algorithm.
i) Deleting first node
When removing the node at the beginning of the list, there is no relinking of nodes to be performed, since
the first node has no preceding node. Update Head pointer to point to the node, next to the Head. Dispose
removed node.
Advantages of a doubly linked list over a singly linked list:
1) A doubly linked list can be traversed in both forward and backward directions.
2) The delete operation is more efficient if a pointer to the node to be deleted is given, because the previous node can be reached using the ‘prev’ pointer.
Disadvantages of a doubly linked list:
1) Every node requires extra space for the previous pointer.
2) All operations require the extra ‘prev’ pointer to be maintained. For example, in insertion we need to modify the ‘prev’ pointers together with the ‘next’ pointers.
In a singly linked list, for accessing any node of linked list, we start traversing from the first node. If we
are at any node in the middle of the list, then it is not possible to access nodes that precede the given node.
This problem can be solved by slightly altering the structure of singly linked list.
In singly linked list, every node points to its next node in the sequence and the last node points NULL.
But in circular linked list, every node points to its next node in the sequence but the last node points to the
first node in the list.
Circular linked list is a sequence of elements in which every element has link to its next element in the
sequence and the last element has a link to the first element in the sequence. That means circular linked
list is similar to the singly linked list except that the last node points to the first node in the list.
In a circular linked list, all nodes are linked in a continuous circle, without using null. The next node after
the last node is the first node.
Circularly linked lists can be either singly or doubly linked. Both types of circular linked lists benefit from
the ability to traverse the full list beginning at any given node.
In c, we can represent a node using structures. Below is an example of a linked list node with an integer
data:
struct node
{
int data;
struct node *next;
};
Implementing a circular linked list is very easy and almost similar to linear linked list implementation,
with the only difference being that, in circular linked list the last node will have it's ‘next’ point to
the Head of the List. In Linear linked list the last Node simply holds NULL in it's ‘next’ pointer.
Insertion Operation
A node can be added in three ways:
At the beginning/front of the linked list
After a given/specific node.
At the end of the linked list.
Adding a new node in linked list is a more than one step activity. First, create a node using the same
structure and find the location where it has to be inserted.
i) Add a node at the beginning/front
To add a node at the beginning of the circular linked list, create a new node and store the data. Search
for the last node and change its ‘next’ pointer to point to the new node. Change the ‘next’ pointer of new
node to point to the first/head node. Finally set the Head to point to the new node.
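A sketch of these steps in C, using the singly linked node structure above (the function name is illustrative):
#include <stdlib.h>

struct node* cll_insert_front(struct node *head, int value)
{
    struct node *newnode = (struct node *)malloc(sizeof(struct node));
    newnode->data = value;
    if (head == NULL)               /* empty list: node points to itself     */
    {
        newnode->next = newnode;
        return newnode;
    }
    struct node *last = head;
    while (last->next != head)      /* search for the last node              */
        last = last->next;
    newnode->next = head;           /* new node points to the old first node */
    last->next = newnode;           /* last node points to the new node      */
    return newnode;                 /* new node becomes the head             */
}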
ii) Add a node after a given node
Create a new node and store the data. Search for the specified node after which the new node has to be
inserted, then change the pointers so that the new node follows it.
Deletion Operation
Deletion process involves more than one step. First, locate the target node to be removed, by using
searching algorithm.
i) Deleting first node
To delete the first node of circular linked list, the last node should be searched and its ‘next pointer is
changed to point to the second node of the list. Head pointer is set to point to the second node of the list.
ii) Deleting a specific node
To delete a specific node, the target node is searched using the key. Then the ‘next’ pointer of
its previous (left) node is changed to point to the next (right) node of the target node.
------------------------
UNIT 3
A stack is an Abstract Data Type (ADT), commonly used in most programming languages. It is named
stack as it behaves like a real-world stack, for example – a deck of cards or a pile of plates, etc.
A real-world stack allows operations at one end only. For example, we can place or remove a card or plate
from the top of the stack only. Likewise, Stack ADT allows all data operations at one end only. At any
given time, we can only access the top element of a stack.
Stack is a simple data structure that allows adding and removing elements in a particular order. Every
time an element is added, it goes on the top of the stack and the only element that can be removed is the
element that is at the top of the stack.
This feature makes it LIFO data structure. LIFO stands for Last-in-first-out. Here, the element which is
inserted or added last, is accessed first. In stack terminology, insertion operation is called PUSH operation
and removal operation is called POP operation.
1. Stack is a LIFO (Last in First out) structure or we can say FILO (First in Last out).
2. push() function is used to insert new elements into the Stack and pop() function is used to remove an
element from the stack. Both insertion and removal are allowed at only one end of Stack called Top.
3. Stack is said to be in Overflow state when it is completely full and is said to be in Underflow state if it is
completely empty.
Stack can be easily implemented using an Array or a Linked List. Arrays are quick, but are limited in size
and Linked List requires overhead to allocate, link, unlink, and de-allocate, but is not limited in size.
To use a stack efficiently, we need to check the status of stack. For this purpose, the following
functionality is added to stacks −
isFull() − check if stack is full.
isEmpty() − check if stack is empty.
At all times, we maintain a pointer to the last PUSHed data on the stack. The top pointer provides
index/location of top element/value of the stack.
Push Operation
The process of putting a new data element onto the stack is known as a Push operation. A push operation
involves the following steps –
Step 1 − Check if the stack is full.
Step 2 − If the stack is full, report an overflow error and exit.
Step 3 − If the stack is not full, increment ‘top’ to point to the next empty space.
Step 4 − Add the data element to the stack location pointed to by ‘top’.
Step 5 − Return success.
If a linked list is used to implement the stack, then in Step 3 we need to allocate the space dynamically.
Pop Operation
Accessing the content and removing it from the stack, is known as a Pop Operation. In an array
implementation of pop() operation, the data element is not actually removed, instead top is decremented
to a lower position in the stack to point to the next value. But in linked-list implementation, pop() actually
removes data element and de-allocates memory space.
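A minimal array-based sketch of push and pop (MAXSIZE, stack[] and top are illustrative globals; top = -1 denotes an empty stack):
#include <stdio.h>

#define MAXSIZE 100

int stack[MAXSIZE];
int top = -1;                        /* -1 means the stack is empty          */

void push(int data)
{
    if (top == MAXSIZE - 1)          /* overflow check                       */
        printf("Stack Overflow");
    else
        stack[++top] = data;         /* increment top, then store the value  */
}

int pop()
{
    if (top == -1)                   /* underflow check                      */
    {
        printf("Stack Underflow");
        return -1;
    }
    return stack[top--];             /* return top value, then decrement top */
}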
Applications of Stack
1. Expression Evaluation
2. Expression conversion
i. Infix to Postfix
ii. Infix to Prefix
The simplest application of a stack is to reverse a word/string. We push the given word/string to stack -
letter by letter - and then pop letters from the stack.
The way to write arithmetic expression is known as a notation. An arithmetic expression can be written in
three different but equivalent notations, i.e., without changing the essence or output of an expression.
These notations are −
Infix Notation
Prefix (Polish) Notation
Postfix (Reverse-Polish) Notation
These notations are named based on how they use operator in expression.
Infix Notation: We write expression in infix notation, e.g. a - b + c, where operators are used in-between
operands. It is easy for us humans to read, write, and speak in infix notation but the same does not go well
with computing devices. An algorithm to process infix notation could be difficult and costly in terms of
time and space consumption.
Prefix Notation: In this notation, operator is prefixed to operands, i.e. operator is written ahead of
operands. For example, +ab. This is equivalent to its infix notation a + b. Prefix notation is also known as
Polish Notation.
Postfix Notation: This notation style is known as Reversed Polish Notation. In this notation style, the
operator is postfixed to the operands i.e., the operator is written after the operands. For example, ab+.
This is equivalent to its infix notation a + b.
The following table briefly tries to show the difference in all three notations −
Sr.No.   Infix Notation        Prefix Notation   Postfix Notation
2        (a + b) ∗ c           ∗+abc             ab+c∗
3        a ∗ (b + c)           ∗a+bc             abc+∗
5        (a + b) ∗ (c + d)     ∗+ab+cd           ab+cd+∗
6        ((a + b) ∗ c) - d     -∗+abcd           ab+c∗d-
Parsing Expressions
It is not a very efficient way to design an algorithm or program to parse infix notations. Instead, these
infix notations are first converted into either postfix or prefix notations and then computed.
To parse any arithmetic expression, we need to take care of operator precedence and associativity also.
Precedence:- When an operand is in between two different operators, which operator will take the
operand first, is decided by the precedence of an operator over others. For example −
Associativity:- Associativity describes the rule where operators with the same precedence appear in an
expression. For example, in expression a + b − c, both + and – have the same precedence, then which part
of the expression will be evaluated first, is determined by associativity of those operators. Here, both +
and − are left associative, so the expression will be evaluated as (a + b) − c.
Precedence and associativity determines the order of evaluation of an expression. Following is an operator
precedence and associativity table (highest to lowest) –
Sr.No.   Operator                                Precedence        Associativity
1        Exponentiation ^                        Highest           Right Associative
2        Multiplication ( ∗ ) & Division ( / )   Second Highest    Left Associative
3        Addition ( + ) & Subtraction ( − )      Lowest            Left Associative
The above table shows the default behavior of the operators. At any point during expression evaluation,
the order can be altered by using parentheses. For example, in a + b * c, the part b * c is evaluated first,
since multiplication has precedence over addition. If we want a + b to be evaluated first, we use
parentheses, as in (a + b) * c.
Evaluation of a Postfix Expression
A postfix expression can be evaluated using a stack as follows:
1. While reading the expression from left to right, push the element onto the stack if it is an operand.
2. If the element is an operator, pop the top two operands from the stack and apply the operator to them.
3. Push the result of the evaluation back onto the stack. Repeat until the end of the expression; the value remaining on the stack is the result.
Example:
Expression: 4 5 6 * +
Result: 34
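A sketch of this evaluation in C for single-digit operands, using an array as the stack (names are illustrative):
#include <stdio.h>
#include <ctype.h>

int eval_postfix(const char *exp)
{
    int stack[100], top = -1;
    for (int i = 0; exp[i] != '\0'; i++)
    {
        char ch = exp[i];
        if (isdigit((unsigned char)ch))
            stack[++top] = ch - '0';        /* operand: push its value      */
        else
        {
            int b = stack[top--];           /* pop the two topmost operands */
            int a = stack[top--];
            switch (ch)                     /* apply the operator           */
            {
                case '+': stack[++top] = a + b; break;
                case '-': stack[++top] = a - b; break;
                case '*': stack[++top] = a * b; break;
                case '/': stack[++top] = a / b; break;
            }
        }
    }
    return stack[top];                      /* the remaining value is the result */
}

int main()
{
    printf("%d\n", eval_postfix("456*+"));  /* prints 34 */
    return 0;
}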
Advantage of Postfix Expression over Infix Expression:- For an infix expression, the machine has to
know and keep track of the precedence of operators. A postfix expression, on the other hand, itself
determines the order of evaluation (the placement of operators in a postfix expression depends upon
their precedence). Therefore, it is easier for the machine to evaluate a postfix expression than an infix
expression.
Algorithm to convert Infix to Postfix
Let A be an arithmetic expression written in infix notation. This algorithm finds the equivalent postfix expression B.
Step 1: Push “(” onto STACK, and add “)” to the end of A.
Step 2: Scan A from left to right and repeat Steps 3 to 6 for each element of A until the STACK is empty.
Step 3: If an operand is encountered, add it to B.
Step 4: If a left parenthesis is encountered, push it onto STACK.
Step 5: If an operator is encountered, then:
i. Repeatedly pop from STACK and add to B each operator (on the top of STACK) which has the same
or higher precedence than the operator.
ii. Push the operator onto STACK.
Step 6: If a right parenthesis is encountered, then:
i. Repeatedly pop from STACK and add to B each operator (on the top of STACK) until a left parenthesis
is encountered.
ii. Remove the left parenthesis (do not add it to B).
Step 7: Exit.
Recursion
Recursion is a programming technique that allows the programmer to express operations in terms of
themselves. In other words recursion is thus the process of defining something in terms of itself.
Some computer programming languages allow a module or function to call itself. This technique is known
as recursion. In recursion, a function α either calls itself directly or calls a function β that in turn calls the
original function α. The function α is called recursive function.
A recursive function can go infinite like a loop. To avoid infinite running of recursive function, there are
two properties that a recursive function must have −
Base criteria − There must be at least one base criteria or condition, such that, when this condition is met
the function stops calling itself recursively.
Progressive approach − The recursive calls should progress in such a way that each time a recursive call
is made it comes closer to the base criteria.
Example:
void recurse()
{
recurse(); /* Function calls itself */
}
void main()
{
recurse(); /* Sets off the recursion */
getch();
}
Implementation
Many programming languages implement recursion by means of stacks. Generally, whenever a function
(caller) calls another function (callee) or itself as callee, the caller function transfers execution control to
the callee. This transfer process may also involve some data to be passed from the caller to the callee.
This implies, the caller function has to suspend its execution temporarily and resume later when the
execution control returns from the callee function. Here, the caller function needs to start exactly from the
point of execution where it puts itself on hold. It also needs the exact same data values it was working on.
For this purpose, an activation record (or stack frame) is created for the caller function.
This activation record keeps the information about local variables, formal parameters, return address and
all information passed to the caller function.
For example, a recursive function to compute the factorial of n:
int fact( int n )
{
if ( n == 0 ) /* Base criteria */
return ( 1 ) ;
else
return ( n * fact ( n - 1 ) ) ;
}
Analysis of Recursion
One may ask why we use recursion when the same task can be done with iteration. The main reason is
that recursion often makes a program shorter and more readable; with modern CPUs and compilers the
performance penalty of recursion is usually acceptable, though iteration is generally no less efficient.
Time Complexity:- In case of iterations, we take number of iterations to count the time complexity.
Likewise, in case of recursion, assuming everything is constant, we try to figure out the number of times a
recursive call is being made. A call made to a function is Ο(1), hence the (n) number of times a recursive
call is made makes the recursive function Ο(n).
Space Complexity:- Space complexity is counted as what amount of extra space is required for a module
to execute. In case of iterations, the compiler hardly requires any extra space. The compiler keeps
updating the values of variables used in the iterations. But in case of recursion, the system needs to store
activation record each time a recursive call is made. Hence, it is considered that space complexity of
recursive function may go higher than that of a function with iteration.
Queue
A real-world example of queue can be a single-lane one-way road, where the vehicle enters first, exits
first. More examples can be seen as queues at the ticket windows and bus-stops.
Queue is an abstract data type or a linear data structure, in which the element is inserted from one end
called the REAR (also called tail), and the removal of existing element takes place from the other end
called as FRONT(also called head).
This makes the queue a FIFO (First In First Out) data structure, which means that the element inserted
first will be removed first. This is exactly how a queue works in the real world: if you go to a ticket
counter to buy movie tickets and are first in the queue, then you will be the first one to get the tickets.
Queue can be implemented using an Array or Linked List. The easiest way of implementing a queue is by
using an Array.
Initially the FRONT (head) and the REAR (tail) pointers of the queue are initialized to -1. As we add
elements to the queue, the REAR pointer keeps on moving ahead, always pointing to the location where
the last element is situated, while FRONT remains at the first index.
When we remove an element from FRONT position and then move FRONT pointer to the next position,
the size of Queue is reduced by one space each time.
Queue operations may involve initializing or defining the queue, utilizing it, and then completely erasing
it from memory. The basic operations associated with a queue are −
enqueue() − add (store) an item to the queue.
dequeue() − remove (access) an item from the queue.
A few more functions are required to make the above queue operations efficient. These are −
peek() − get the element at the front of the queue without removing it.
isfull() − check if the queue is full.
isempty() − check if the queue is empty.
In a queue, we always dequeue (or access) the data pointed to by the front pointer, and we enqueue (or store) data with the help of the rear pointer.
As we are using single dimension array to implement queue, we can just check for the rear pointer to
determine whether the queue is full.
bool isfull()
{
if(rear == MAXSIZE - 1)
return true;
else
return false;
}
If the value of front is less than 0, it tells that the queue is not yet initialized, hence empty.
bool isempty()
{
if(front ==-1)
return true;
else
return false;
}
Insertion/Enqueue Operation
The following steps are taken to enqueue (insert) data into a queue −
Step 1: If rear = MAXSIZE - 1, then the queue is full (overflow); exit.
Step 2: If front = -1 and rear = -1, set front = 0 and rear = 0; otherwise set rear = rear + 1.
Step 3: Set queue[rear] = data.
Step 4: Exit
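A sketch of enqueue following these steps (queue[], MAXSIZE, front and rear are the same kind of illustrative globals used in the other fragments):
#include <stdio.h>

#define MAXSIZE 100

int queue[MAXSIZE];
int front = -1, rear = -1;

void enqueue(int data)
{
    if (rear == MAXSIZE - 1)             /* Step 1: overflow check             */
        printf("Queue is full");
    else
    {
        if (front == -1 && rear == -1)   /* Step 2: first element in the queue */
        {  front = 0; rear = 0;  }
        else
            rear = rear + 1;
        queue[rear] = data;              /* Step 3: store the element at rear  */
    }                                    /* Step 4: exit                       */
}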
Deletion/Dequeue Operation
It is a process of two tasks − access the data where front is pointing and remove the data after accessing.
The dequeue operation can be implemented as follows −
dequeue()
{
   if(isempty())                 /* equivalently: if(front == -1) */
      printf("Queue is empty");  /* underflow */
   else
   {
      data = queue[front];
      if(front == rear)
      {  front = -1; rear = -1;  }
      else
         front = front + 1;
   }
}
Applications of Queue
Any situation where resources are shared among multiple users and served on first come first served
basis. Examples include CPU scheduling, Disk Scheduling, sharing a printer. Another application of
queue is when data is transferred between two processes. Examples include IO Buffers, pipes, file
InputOutput, etc.
Handling of interrupts in real-time systems: the interrupts are handled in the same order as they arrive,
i.e., first come, first served.
In real life scenario, Call Center phone systems uses Queues to hold people calling them in an order, until
a service representative is free.
Just like a stack, in a queue we know exactly at which position a new element will be added and from
where an element will be removed, hence both of these operations require only a single step.
Enqueue: O(1)
Dequeue: O(1)
Circular Queue
Before we start to learn about Circular queue, we should first understand why we need a circular queue.
In a linear queue data structure, we can insert elements until queue becomes full. But once queue becomes
full, we cannot insert the next element until all the elements are deleted from the queue. For example,
consider the queue below:
When we dequeue any element to remove it from the queue, we are actually moving the front of the
queue forward, thereby reducing the overall size of the queue. And we cannot insert new elements,
because the rear pointer is still at the end of the queue.
This situation also says that queue is full and we cannot insert the new element because, ' rear' is still at
last position. In above situation, even though we have empty positions in the queue we cannot make use
of them to insert new element. This is the major problem in normal queue data structure. To overcome this
problem we use circular queue data structure.
Circular Queue is a linear data structure, which follows the principle of FIFO (First In First Out), but
instead of ending the queue at the last position, it again starts from the first position after the last, hence
making the queue behave like a circular data structure.
A circular queue is an abstract data type in which the operations are performed based on FIFO principle
and the last position is connected back to the first position to make a circle.
The advantage of this data structure is that it reduces the wastage of space in an array implementation.
Two pointers called FRONT and REAR are used to keep track of the first and last elements in the queue.
When initializing the queue, we set the value of FRONT and REAR to -1.
On enqueing an element, we circularly increase the value of REAR index and place the new element in
the position pointed to by REAR.
On dequeueing an element, we return the value pointed to by FRONT and circularly increase the FRONT
index.
Before enqueing, we check if queue is already full.
Before dequeuing, we check if queue is already empty.
When enqueing the first element, we set the value of FRONT to 0.
When dequeing the last element, we reset the values of FRONT and REAR to -1.
However, the check for a full queue has an additional case:
Case 1: FRONT = 0 and REAR = MAXSIZE - 1.
Case 2: REAR has wrapped around to the start due to circular increment and FRONT = REAR + 1.
In both cases the queue is full.
enQueue() is a function which is used to insert an element into the circular queue. In a circular queue, the
new element is always inserted at rear position. We can use the following steps to insert an element into
the circular queue...
enQueue()
{ int data;
if ((( front==0) && (rear==MAXSIZE-1)) || (front==rear+1))
{ printf("Queue is full"); return;
}
if ((front==-1) && (rear==-1))
{ front=0; rear=0;
}
else if ((rear==MAXSIZE-1) && (front!=0))
rear=0;
else rear=rear+1;
printf("new element:");
scanf("%d",&data);
cqueue[rear]=data;
return;
}
In a circular queue, deQueue() is a function used to delete an element from the circular queue. In a
circular queue, the element is always deleted from front position. We can use the following steps to delete
an element from the circular queue...
deQueue()
{ int data;
if (front==-1)
{ printf("Queue is empty");
return;
}
data=cqueue[front];
if (front==rear)
{ front=-1; rear=-1;
}
else if (front==MAXSIZE-1)
front=0;
else front=front+1;
printf("deleted element: %d", data);
return;
}
Double Ended Queue (Deque)
A double ended queue (deque) is a queue in which insertion and deletion can be performed at both the front and the rear ends. Two restricted variants are common:
Input Restricted Double Ended Queue:- In an input restricted deque, the insertion operation is performed
at only one end, i.e., the rear end, while the deletion operation is performed at both the front and rear ends.
Output Restricted Double Ended Queue:- In an output restricted deque, the deletion operation is performed
at only one end, i.e., the front end, while the insertion operation is performed at both ends, front and rear.
Insertion at the rear end:
insertdeq_rear()
{ int data;
if (rear==MAXSIZE-1)
{ printf("Queue is full"); return;
}
if (front==-1 && rear==-1)
{ front=0; rear=0;
}
else rear=rear+1;
printf("new element:");
scanf("%d",&data);
queue[rear]=data;
return;
}
Insertion at the front end:
insertdeq_front()
{ int data;
if (front==0)
{ printf("No space at the front"); return;
}
if (front==-1 && rear==-1)
{ front=0; rear=0;
}
else front=front-1;
printf("new element:");
scanf("%d",&data);
queue[front]=data;
return;
}
Deletion at the front end:
deletedeq_front()
{ int data;
if(front==-1)
{ printf("Queue is empty");
return;
}
data=queue[front];
if(front==rear)
{ front=-1; rear=-1; }
else
front = front + 1;
printf("deleted element: %d", data);
return;
}
Deletion at the rear end:
deletedeq_rear()
{ int data;
if(rear==-1)
{ printf("Queue is empty");
return;
}
data=queue[rear];
if(front==rear)
{ front=-1; rear=-1; }
else
rear=rear-1;
printf("deleted element: %d", data);
return;
}
---------------------
UNIT 4
Trees
A tree consists of a collection of nodes which are connected to other nodes by branches. A node which
points to other nodes is said to be the parent of the nodes to which it is pointing, and those pointed-to
nodes are called its children. For example, if node A is the root of a tree and points to nodes B and C,
then A is the parent of B and C; in other words, B and C are children of node A.
The node at the top of the tree is called root of the tree. There is only one root in a tree. Root is the only
node in the tree that does not have a parent/ancestor.
The nodes that have no children are called leaves (leaf/terminal nodes), and the rest of the nodes in the
tree (those that have children) are called interior/non-terminal nodes. In the example tree, nodes B, C, E
are internal nodes and nodes D, F, G, H are leaf nodes.
The descendants of a node are all the nodes along the paths from that node down to the terminal nodes.
The ancestors of a node are all the nodes along the path from that node up to the root. For example, the
ancestors of node D are B and A, and the descendants of node B are D, E, H and F.
The children nodes of the same parent node are called siblings. B and C are siblings.
The length of the longest path from the root to a leaf node is called the depth or height of the tree. The
height of the example tree is 4.
A subtree is a subset of a tree that is itself a tree; each child of a node, together with all of its descendants, forms a subtree.
Degree
In the tree data structure, the total number of children of a node is called the degree of the node.
The highest degree of the node among all the nodes in a tree is called the Degree of Tree.
Level
In tree data structures, the root node is said to be at level 0, and the root node's children are at
level 1, and the children of that node at level 1 will be level 2, and so on.
Height
In a tree data structure, the number of edges on the longest path from a particular node down to a leaf node is known as the height of that node.
In the tree, the height of the root node is called "Height of Tree".
Depth
In a tree, the number of edges from the root node to a particular node is called the depth of that node.
In the tree, the total number of edges from the root node to the leaf node in the longest path is
known as "Depth of Tree".
Subtree
In the tree data structure, each child of a node recursively forms a subtree; that is, every node together with its descendants forms a subtree rooted at that node.
Binary Tree
A binary tree has the following properties:
Properties:
Follows all properties of the tree data structure.
Each node has at most two children; these two children are called the left child and the right child.
The binary search tree has a unique property known as the binary search property. This states that
the value of a left child node of the tree should be less than or equal to the parent node value of
the tree. And the value of the right child node should be greater than or equal to the parent value.
AVL Tree
An AVL tree is a type of tree that is a self-balancing binary search tree.
Properties
Follows all properties of the tree data structure.
Self-balancing.
Each node stores a value called a balance factor, which is the difference in the height of the left sub-tree and right sub-tree.
All the nodes in the AVL tree must have a balance factor of -1, 0, or 1.
Binary tree
A binary tree is a special type of tree representation of data items. If every node in a tree can have
at the most two children, the tree is called a binary tree. A node in a binary tree can have 0,1, or 2
children.
The child appearing on the left side of a node is called its left child and the child appearing on the
right side of a node is called its right child. For example, E is the left child and F is the right child
of node C.
The subtree to the left of a node is called the left subtree and the subtree to the right of a node is
the right subtree.
A strictly (full) binary tree is a binary tree whose non-leaf nodes all have non-empty left and right subtrees. A complete binary tree is a binary tree in which every level, except possibly the last, is completely filled, and all nodes in the last level are as far left as possible.
In above figure, a normal binary tree is converted into full binary tree by adding dummy nodes.
The balance factor of a node is the difference between the heights of left and right subtrees. A
node will have a balance factor of 1 if the height of its left subtree is greater than the height of its
right subtree. A node will have a balance factor of -1 if the height of its left subtree is less than
the height of its right subtree. A node will have a balance factor of 0 if the heights of left and
right subtrees of a node are equal.
In the above diagram, second tree is not a height balanced tree because the balance factor of node
C is 2. And third tree is also not height balanced because the balance factor of node A is -2.
First tree is a height balanced tree because every node of the tree satisfies the condition specified
for a height balanced tree. (i.e., balance factor has to be either -1, 0 or 1).
An external pointer variable ‘root’ will be used to point to the root node of the tree. If any subtree
is empty, then the corresponding pointer will contain a NULL value. If the tree itself is empty, the
‘root’ will contain a NULL value.
A node can be defined in ‘c’ language as:
struct treenode
{ struct treenode *left;
int data;
struct treenode *right;
};
The drawback of this representation is that we cannot access a node directly. For example, to
access node G, we have to start at root node, then reach node B, then node E and finally G.
The tree can be represented using linked list as:
The above figure shows the representation of the tree data structure in the memory. In the
above structure, the node contains three fields. The first field stores the address of the left
child, second field stores the data and the third field stores the address of the right child.
In programming, the structure of a node can be defined as:
struct node
{
struct node *left;
int data;
struct node *right;
};
The above structure can only be defined for binary trees, because a binary tree can have at most two children, whereas generic trees can have more than two children.
The above is the node structure with three fields
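As a quick illustration of this node structure, the sketch below allocates three nodes with malloc() and links them into a small binary tree. The helper newNode() is hypothetical, and the structure definition is repeated only so the snippet is self-contained:

#include <stdio.h>
#include <stdlib.h>

struct node
{
    struct node *left;
    int data;
    struct node *right;
};

/* hypothetical helper: allocate a node with empty subtrees */
struct node *newNode(int value)
{
    struct node *n = (struct node *)malloc(sizeof(struct node));
    n->data = value;
    n->left = n->right = NULL;
    return n;
}

int main(void)
{
    struct node *root = newNode(10);   /* root of the tree        */
    root->left  = newNode(5);          /* left child of the root  */
    root->right = newNode(20);         /* right child of the root */
    printf("root=%d left=%d right=%d\n",
           root->data, root->left->data, root->right->data);
    return 0;
}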
Generally, we traverse a tree to search or locate a given item or key in the tree or to print all the values it
contains. For example, consider the Binary Search tree given below:
1. Inorder Traversal
In this method, first we process the left subtree of the root, in Inorder. Then we process the root and at the
end we process the right subtree of the root. We should always remember that every node may represent
a subtree itself.
If a binary tree is traversed in-order, the output will produce sorted key values in an ascending order.
i) Traverse the left subtree in Inorder
ii) Process/visit the root node
iii) Traverse the right subtree in Inorder
We start with F, and following in-order traversal, we move to its left subtree with B as its root. B is also
traversed in-order. Then we process the root F. Finally we move to its right subtree with G as its root. G
is traversed in Inorder. The process goes on until all the nodes are visited. The output of inorder traversal
of this tree will be −
A→ B → C → D → E → F → G → H → I
2. Preorder Traversal
In this technique we process the root of the tree first. Then we traverse the left subtree of the root in
preorder. Then finally we traverse the right subtree of the root in Preorder. That is,
1. Visit/process the root node
2. Traverse the left subtree in Preorder
3. Traverse the right subtree in Preorder
We start from F, and following pre-order traversal, we first visit F itself and then move to its left
subtree with B as its root. B is also traversed pre-order. Finally it moves to right subtree with G as its
root. G is traversed in pre-order. The process goes on until all the nodes are visited. The output of pre-
order traversal of this tree will be − F → B → A → D → C → E → G → I → H
3. Postorder Traversal
In this method we first process the left subtree of the tree, in Postorder. Then we process the right subtree
of the given tree in Postorder. Finally we process the root node.
1. Traverse the left subtree in Postorder
2. Traverse the right subtree in Postorder
3. Process the root node
We start from F, and following Post-order traversal, we first visit the left subtree with B as its root. B is
also traversed post-order. Then we move to right subtree with G as its root. G is traversed in postorder.
Finally we process the root of the given tree F. The output of post-order traversal of this tree will be : A
→C→E→D→B→H→I→G→F
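All three traversals can be written as short recursive functions over the struct node defined earlier; the sketch below simply prints each node's data when it is visited:

/* recursive traversals of a binary tree built from struct node */
void inorder(struct node *root)
{
    if (root == NULL) return;
    inorder(root->left);            /* left subtree  */
    printf("%d ", root->data);      /* visit root    */
    inorder(root->right);           /* right subtree */
}

void preorder(struct node *root)
{
    if (root == NULL) return;
    printf("%d ", root->data);      /* visit root first */
    preorder(root->left);
    preorder(root->right);
}

void postorder(struct node *root)
{
    if (root == NULL) return;
    postorder(root->left);
    postorder(root->right);
    printf("%d ", root->data);      /* visit root last */
}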
1. If the data of the root node is greater, and a left subtree exists, then repeat the comparison with root = root of the left subtree. Else, insert the element as the left child of the current root.
2. If the data of the root node is smaller, and a right subtree exists, then repeat the comparison with root = root of the right subtree. Else, insert the element as the right child of the current root.
To insert a node into a Binary Search Tree, initially the item to be inserted is compared with data of the
root node. If the item is found to be smaller than the data of root node then the new node is inserted in the
left subtree of the root node. Otherwise the new node is inserted in the right subtree of the root node.
Now the root node of the left or right subtree is taken and compared with the new item and the procedure
is repeated. This is done till the left or right subtree where the new node to be inserted is found to be
empty. Finally the new node is made the appropriate child of this current node.
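A recursive sketch of this insertion procedure, reusing struct node and the hypothetical newNode() helper from the earlier sketch (equal keys are sent to the right subtree, matching the description above):

/* insert a value into a binary search tree; returns the (possibly new) root */
struct node *bstInsert(struct node *root, int value)
{
    if (root == NULL)
        return newNode(value);                        /* empty subtree: place the new node here */
    if (value < root->data)
        root->left = bstInsert(root->left, value);    /* smaller keys go left   */
    else
        root->right = bstInsert(root->right, value);  /* otherwise go right     */
    return root;
}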
IV Deletion operations
The deletion operation has to consider three cases:
i) Deleting a leaf node (node with no children): A leaf node can be deleted by setting the appropriate link
of its parent to NULL and then disposing the deleted node.
ii) Deleting a node with only one child: To delete a node with one child, we can make the pointer from
parent to point to the child of the deleted node.
iii) Deleting a node with two children: To delete a node with two children, we search the tree for the
immediate successor of the node to be deleted. The node to be deleted will be replaced by the node of
closest value from right subtree.
To find the immediate successor, we first move to the right child of the node to be deleted and then to its
leftmost child. If the right child does not have a left child, then the right child itself is used as the
replacement node. Then we replace the value to be deleted with the value from replacement node. Then
delete the replacement node by changing its parent’s pointer to NULL.
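For case (iii), the replacement node described above can be located with a small helper that walks to the leftmost node of the right subtree. A sketch (the function name minNode is an assumption):

/* leftmost (minimum) node of a non-empty subtree: used as the
   inorder successor when deleting a node that has two children */
struct node *minNode(struct node *root)
{
    while (root->left != NULL)
        root = root->left;
    return root;
}

/* usage: struct node *succ = minNode(nodeToDelete->right); */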
B-Tree
A B-Tree is a specialized m-way tree that is widely used for disk access. A B-Tree of order m can have at most m-1 keys and m children. One of the main reasons for using a B-Tree is its capability to store a large number of keys in a single node while keeping the height of the tree relatively small.
A B-Tree of order m has all the properties of an m-way tree. In addition, it has the following properties.
1. Every node in a B-Tree contains at most m children.
2. Every node in a B-Tree, except the root node and the leaf nodes, contains at least m/2 (rounded up) children.
3. The root node must have at least 2 children, unless it is a leaf.
4. All leaf nodes must be at the same level.
5. All keys of a node are sorted in increasing order. The child between two keys k1 and k2 contains all keys in the range from k1 to k2.
A B-Tree grows and shrinks from the root, unlike a Binary Search Tree, which grows and shrinks downward at the leaves.
It is not necessary that all the nodes contain the same number of children, but each node must have at least m/2 (rounded up) children.
A B tree of order 4 is shown in the following image.
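One possible C declaration for a B-Tree node of order m is sketched below; ORDER is an assumed compile-time constant, and real implementations usually carry extra bookkeeping fields:

#define ORDER 4                      /* order m of the B-Tree (example value)   */

struct btreenode
{
    int nkeys;                       /* number of keys currently stored         */
    int keys[ORDER - 1];             /* at most m-1 keys, kept in sorted order  */
    struct btreenode *child[ORDER];  /* at most m children                      */
};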
Operations
While performing some operations on B Tree, any property of B Tree may be violated
such as number of minimum children a node can have. To maintain the properties of B
Tree, the tree may split or join.
Searching
Searching in B-Trees is similar to searching in a binary search tree. For example, suppose we search for the item 49 in the following B-Tree. The process will be as follows:
1. Compare item 49 with the root node 78. Since 49 < 78, move to its left sub-tree.
2. Since 40 < 49 < 56, traverse the right sub-tree of 40.
3. Since 49 > 45, move to the right and compare again.
4. A match is found, so return.
Insertion
Insertions are done at the leaf node level. The following algorithm needs to be followed in
order to insert an item into B Tree.
1. Traverse the B Tree in order to find the appropriate leaf node at which the node
can be inserted.
2. If the leaf node contains fewer than m-1 keys, then insert the element in increasing order.
3. Else, if the leaf node contains m-1 keys, then follow the following steps.
o Insert the new element in increasing order of elements.
o Split the node into two nodes at the median.
o Push the median element up to its parent node.
o If the parent node also contains m-1 keys, then split it too by following the same steps.
Example: Insert the node 8 into the B Tree of order 5 shown in the following image.
The node now contains 5 keys, which is greater than the maximum of (5 - 1 = 4) keys. Therefore, split the node at the median, i.e. 8, and push it up to its parent node as shown below.
Deletion
Deletion is also performed at the leaf nodes. The node which is to be deleted can either
be a leaf node or an internal node. Following algorithm needs to be followed in order to
delete a node from a B tree.
1. Locate the leaf node.
2. If there are more than m/2 keys in the leaf node, then delete the desired key from the node.
3. If the leaf node does not contain at least m/2 keys, then restore the required number of keys by borrowing an element from the right or left sibling.
o If the left sibling contains more than m/2 elements, then push its largest element up to its parent and move the intervening element down to the node where the key is deleted.
o If the right sibling contains more than m/2 elements, then push its smallest element up to the parent and move the intervening element down to the node where the key is deleted.
4. If neither of the siblings contains more than m/2 elements, then create a new leaf node by joining the two leaf nodes and the intervening element of the parent node.
5. If the parent is left with fewer than m/2 keys, then apply the above process to the parent too.
If the node which is to be deleted is an internal node, then replace the node with its in-order successor or predecessor. Since the successor or predecessor will always be in a leaf node, the process is then the same as deleting a key from a leaf node.
Example 1 Delete the node 53 from the B Tree of order 5 shown in the following figure.
Now 57 is the only element left in the node, while the minimum number of elements that must be present in a B-tree of order 5 is 2. Since the node has fewer elements than that, and its left and right siblings do not have spare elements either, merge it with the left sibling and the intervening element of the parent, i.e. 49.
The final B tree is shown as follows.
Applications of Trees
1. Expression tree - One of the applications of binary tree is representing an expression containing
operands and binary operators by a strictly binary tree. The root of the tree contains the operator and the
leaf contains operands.
2. Searching operations – Binary Search Trees, height balanced trees, multi-way search trees are used for
faster search.
3. Game trees – Trees can be used in playing games like tic-tac-toe.
4. Decision trees – Trees can be used to represent a set of decisions to help in decision making process.
Each node represents a condition/situation and its children represent alternative decisions/solutions.
5. Sorting – Another application of binary tree is in sorting – Heap sorting.
The AVL tree was introduced in 1962 by G.M. Adelson-Velsky and E.M. Landis.
AVL tree is a self balanced binary search tree. That means, an AVL tree is also a binary search tree but it is
a balanced tree. A binary tree is said to be balanced, if the balance factor of every node in the tree is either
-1, 0 or +1.
In other words, a binary tree is said to be balanced if for every node, height of its children differ by at
most one. The balance factor of a node is calculated as height of left subtree - height of right subtree. In
an AVL tree, every node maintains an extra information known as balance factor.
The above tree is a binary search tree and every node is satisfying balance factor condition. So this tree is
said to be an AVL tree.
Every AVL Tree is a binary search tree, but not every Binary Search Tree is an AVL tree.
In AVL tree, after performing every operation like insertion and deletion we need to check the balance
factor of every node in the tree. If every node satisfies the balance factor condition then we conclude the
operation otherwise we must make it balanced. We use rotation operations to make the tree balanced
whenever the tree becomes imbalanced due to any operation.
Rotation operations are used to make a tree balanced. Rotation is the process of moving the nodes to
either left or right to make tree balanced. There are four rotations and they are classified into two types.
In LL Rotation every node moves one position to left from the current position. To understand LL
Rotation, let us consider following insertion operations into an AVL Tree...
In RR Rotation every node moves one position to right from the current position. To understand RR
Rotation, let us consider following insertion operations into an AVL Tree...
The LR Rotation is a combination of a single left rotation followed by a single right rotation. In LR Rotation, first every node moves one position to the left and then one position to the right from the current position. To understand LR Rotation, let us consider the following insertion operations into an AVL Tree...
The RL Rotation is a combination of a single right rotation followed by a single left rotation. In RL Rotation, first every node moves one position to the right and then one position to the left from the current position. To understand RL Rotation, let us consider the following insertion operations into an AVL Tree...
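The four rotation types are built from two single rotations. The sketch below shows them for an AVL node that stores its own height; the names avlnode, rotateRight() and rotateLeft() are assumptions made for this example:

struct avlnode
{
    struct avlnode *left, *right;
    int data;
    int height;                      /* height of the subtree rooted here */
};

static int height(struct avlnode *n) { return n ? n->height : 0; }
static int max(int a, int b)         { return a > b ? a : b; }

/* single right rotation about y: y's left child x becomes the new subtree root */
struct avlnode *rotateRight(struct avlnode *y)
{
    struct avlnode *x = y->left;
    y->left  = x->right;             /* x's right subtree becomes y's left */
    x->right = y;                    /* y becomes the right child of x     */
    y->height = 1 + max(height(y->left), height(y->right));
    x->height = 1 + max(height(x->left), height(x->right));
    return x;
}

/* single left rotation about x: x's right child y becomes the new subtree root */
struct avlnode *rotateLeft(struct avlnode *x)
{
    struct avlnode *y = x->right;
    x->right = y->left;
    y->left  = x;
    x->height = 1 + max(height(x->left), height(x->right));
    y->height = 1 + max(height(y->left), height(y->right));
    return y;
}

The double rotations (LR and RL) can then be expressed as one single rotation on the child followed by the opposite single rotation on the node itself.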
-------------------
Expression Tree
A binary expression tree is a specific kind of a binary tree used to represent expressions.
An expression tree is a representation of expressions arranged in a tree-like data structure. In other
words, it is a tree with leaves as operands of the expression and nodes contain the operators. Expression
trees are mainly used for analyzing, evaluating and modifying expressions, especially
complex expressions.
The leaves of a binary expression tree are operands, such as constants or variable names, and the other
nodes contain operators. These trees happen to be binary, because all of the operations are binary. It is also
possible for a node to have only one child, as is the case with the unary minus operator. An expression
tree, T, can be evaluated by applying the operator at the root to the values obtained by recursively
evaluating the left and right subtrees.
An infix expression is produced by the inorder traversal, a postfix expression is produced by the post-
order traversal, and a prefix expression is produced by the pre-order traversal of the expression tree.
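A sketch of this recursive evaluation for an expression tree whose leaves hold integer constants and whose internal nodes hold binary operators; the exprnode layout is an assumption made for this example:

struct exprnode
{
    char op;                    /* '+', '-', '*', '/' for internal nodes, 0 for leaves */
    int value;                  /* operand value, used only when op == 0               */
    struct exprnode *left, *right;
};

/* evaluate the expression tree rooted at t */
int eval(struct exprnode *t)
{
    if (t->op == 0)             /* leaf: an operand */
        return t->value;
    int l = eval(t->left);      /* recursively evaluate both subtrees */
    int r = eval(t->right);
    switch (t->op)
    {
        case '+': return l + r;
        case '-': return l - r;
        case '*': return l * r;
        default : return l / r; /* '/' (integer division) */
    }
}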
A heap is a complete binary tree. A complete binary tree is a binary tree in which all the levels except possibly the last are completely filled, and the nodes in the last level are as far left as possible (left-justified). Let's understand this through an example.
In the above figure, we can observe that all the internal nodes are completely filled except
the leaf node; therefore, we can say that the above tree is a complete binary tree.
The above figure shows that all the internal nodes are completely filled except the leaf node,
but the leaf nodes are added at the right part; therefore, the above tree is not a complete
binary tree.
There are two types of the heap:
o Min Heap
o Max heap
For Input → 35 33 42 10 14 19 27 44 26 31
Min Heap: The value of the parent node should be less than or equal to either of its
children. Mathematically, it can be defined as:
A[Parent(i)] <= A[i]
Min-Heap − Where the value of the root node is less than or equal to either of its children.
Max Heap: The value of the parent node is greater than or equal to its children.
Mathematically, it can be defined as:
A[Parent(i)] >= A[i]
Max-Heap − Where the value of the root node is greater than or equal to either of its
children.
Note − In Min Heap construction algorithm, we expect the value of the parent node to be
less than that of the child node.
To remove/delete a root node in a Max Heap, you:
Delete the root node.
Move the last child node of the last level to root.
Compare the parent node with its children.
If the value of the parent is less than child nodes, swap them, and repeat until the heap
property is satisfied.
Example: Method 1
Create the max-heap from input elements: 7 2 9 4 5 3
We start by adding the first node, 7.
We move top-to-bottom, left-to-right and we add the 2nd node, 2.
Since 7 is larger than 2, the nodes remain in their current positions. Next, 9 is added to the heap. Since 9 is larger than its parent 7, the two values are swapped. The next value added is 4; because 4 is larger than its parent 2, they are also swapped.
We observe that 4 moved up, so 4 is also compared to 9. Considering that 4 is smaller than 9, they’re kept in place. The next value to be added is 5.
Since 5 moved up, 5 is compared to 9. Seeing that 5 is smaller than 9, we’ll keep the two
values in place. Finally, 3 is added.
Considering that 3 is smaller than 7, the two values remain in their positions. This completes
the construction of the Max-Heap from an array.
Method 2
For the given array Arr[] = { 2, 5, 4, 8, 9, 10, 11 }:
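Method 2 builds the max-heap bottom-up by heapifying every internal node of the array. A sketch of that procedure for the array above (0-based indexing; the function names are assumptions):

#include <stdio.h>

/* sift the element at index i down until the max-heap property holds
   for the subtree rooted at i (array of length n, 0-based indexing)   */
void maxHeapify(int a[], int n, int i)
{
    int largest = i;
    int left  = 2 * i + 1;
    int right = 2 * i + 2;
    if (left  < n && a[left]  > a[largest]) largest = left;
    if (right < n && a[right] > a[largest]) largest = right;
    if (largest != i)
    {
        int tmp = a[i]; a[i] = a[largest]; a[largest] = tmp;  /* swap               */
        maxHeapify(a, n, largest);                            /* continue sifting   */
    }
}

/* build a max-heap bottom-up, starting from the last internal node */
void buildMaxHeap(int a[], int n)
{
    for (int i = n / 2 - 1; i >= 0; i--)
        maxHeapify(a, n, i);
}

int main(void)
{
    int arr[] = { 2, 5, 4, 8, 9, 10, 11 };
    int n = 7;
    buildMaxHeap(arr, n);
    for (int i = 0; i < n; i++)
        printf("%d ", arr[i]);       /* prints 11 9 10 8 5 2 4 */
    return 0;
}

Because heapify is only called on the internal nodes (indices 0 to n/2 - 1), this bottom-up construction runs in O(n) time, faster than inserting the elements one by one.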
Applications of Trees
Binary trees have many applications in computer science, including data storage and
retrieval, expression evaluation, network routing, and game AI. They can also be used to
implement various algorithms such as searching, sorting, and graph algorithms.
Most popular databases use B-Trees and T-Trees, which are variants of the tree structures discussed above, to store their data.
Compilers use a syntax tree to validate the syntax of every program you write.
Graphs
Graphs in data structures are non-linear data structures made up of a finite number of
nodes or vertices and the edges that connect them. Graphs in data structures are used to
address real-world problems in which it represents the problem area as a network like
telephone networks, circuit networks, and social networks. For example, it can represent a
single user as nodes or vertices in a telephone network, while the link between them via
telephone represents edges.
A graph is a non-linear kind of data structure made up of nodes or vertices and edges. The
edges connect any two nodes in the graph, and the nodes are also known as vertices.
This graph has a set of vertices V = { 1,2,3,4,5 } and a set of edges E = { (1,2), (1,3), (2,3), (2,4), (2,5), (3,5), (4,5) }.
Now that you’ve learned about the definition of graphs in data structures, you will learn
about their various types.
There are different types of graphs in data structures, each of which is detailed below.
1. Finite Graph
The graph G=(V, E) is called a finite graph if the number of vertices and edges in the graph is limited in number.
2. Infinite Graph
The graph G=(V, E) is called an infinite graph if the number of vertices and edges in the graph is interminable.
3. Trivial Graph
A graph G=(V, E) is called a trivial graph if it contains only a single vertex and no edges.
4. Simple Graph
If each pair of nodes or vertices in a graph G=(V, E) has only one edge, it is a simple
graph. As a result, there is just one edge linking two vertices, depicting one-to-one
interactions between two elements.
5. Multi Graph
If there are numerous edges between a pair of vertices in a graph G= (V, E), the graph is
referred to as a multigraph. There are no self-loops in a Multigraph.
6. Null Graph
It's a reworked version of a trivial graph. If a graph G= (V, E) has several vertices but no edges connecting them, it is a null graph.
7. Complete Graph
A simple graph G= (V, E) with n vertices is called a complete graph if every pair of distinct vertices is connected by an edge. It is also known as a full graph because each vertex's degree must be n-1.
8. Pseudo Graph
A graph G= (V, E) is called a pseudo graph if it contains a self-loop, i.e., an edge that connects a vertex to itself.
9. Regular Graph
If a graph G= (V, E) is a simple graph with the same degree at each vertex, it is a regular graph. As a result, every complete graph is a regular graph.
10. Weighted Graph
A graph G= (V, E) is called a labeled or weighted graph if each edge has a value or weight representing the cost of traversing that edge.
11. Directed Graph
A directed graph, also referred to as a digraph, is a set of nodes connected by edges, each of which has a direction.
12. Undirected Graph
An undirected graph comprises a set of nodes and the links connecting them. The order of the two connected vertices is irrelevant and the edges have no direction. You can form an undirected graph with a finite number of vertices and edges.
13. Connected Graph
A graph is connected if there is a path between every vertex of the graph and any other vertex.
14. Disconnected Graph
When some vertices cannot be reached from the others because no edges link them, the graph is called a disconnected graph.
Graphs in data structures are used to represent the relationships between objects. Every
graph consists of a set of points known as vertices or nodes connected by lines known as
edges. The vertices in a network represent entities.
The most frequent graph representations are the two that follow:
Adjacency matrix
Adjacency list
You’ll look at these two representations of graphs in data structures in more detail:
Adjacency Matrix
It's used to show which nodes are next to one another. I.e., is there any
connection between nodes in a graph?
If the graph is weighted, you can record each edge's weight in the matrix instead of 1s and 0s.
That is, the weight or cost indicated on an edge of the graph is stored at the corresponding position of the matrix.
Adjacency List
You keep a list of neighbors for each vertex in the graph in this representation.
It means that each vertex in the graph has a list of its neighboring vertices.
You have an array of vertices indexed by the vertex number, and the
corresponding array member for each vertex x points to a singly linked list of
x's neighbors.
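As a sketch, the adjacency matrix of the sample graph with V = {1,2,3,4,5} and the edge set listed earlier can be filled in as follows (vertices are shifted to 0-based indices inside the matrix; the names adj and addEdge are assumptions):

#include <stdio.h>
#define V 5                                  /* number of vertices */

/* adj[i][j] = 1 when there is an edge between vertex i+1 and vertex j+1 */
int adj[V][V];

void addEdge(int u, int v)                   /* undirected edge, 1-based vertices */
{
    adj[u - 1][v - 1] = 1;
    adj[v - 1][u - 1] = 1;
}

int main(void)
{
    addEdge(1, 2); addEdge(1, 3); addEdge(2, 3);
    addEdge(2, 4); addEdge(2, 5); addEdge(3, 5); addEdge(4, 5);
    for (int i = 0; i < V; i++)              /* print the matrix row by row */
    {
        for (int j = 0; j < V; j++) printf("%d ", adj[i][j]);
        printf("\n");
    }
    return 0;
}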
Now that you have seen how graphs are represented in the data structure, you will look at the operations that are performed on graphs.
The process of visiting or updating each vertex in a graph is known as graph traversal. Such traversals are classified by the sequence in which they visit the vertices. Tree traversal is a special case of graph traversal.
Breadth-first search
Depth-first search
BFS is a search technique for finding a node in a graph data structure that meets a set of
criteria.
It begins at the root of the graph and investigates all nodes at the current depth
level before moving on to nodes at the next depth level.
To keep track of the child nodes that have been encountered but not yet inspected, additional memory, generally a queue, is required.
Step 2: Select any vertex in your graph, say v1, from which you want to traverse the
graph.
Step 3: Take two data structures for traversing the graph: a visited array (of size equal to the number of vertices) and a queue.
Step 4: Starting from the vertex v1, add it to the visited array, and afterward add v1's adjacent vertices to the queue data structure.
Step 5: Now, using the FIFO concept, you must remove the element from the queue, put it
into the visited array, and then return to the queue to add the adjacent vertices of the
removed element.
Step 6: Repeat step 5 until the queue is empty and no vertex is left to be visited.
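A sketch of this BFS procedure over the adjacency matrix from the earlier sketch, using a simple array-based queue and a visited array (all names are assumptions for this example):

/* breadth-first search from 0-based start vertex s over the V x V matrix adj */
void bfs(int s)
{
    int visited[V] = {0};
    int q[V], front = 0, rear = 0;          /* simple array-based queue      */

    visited[s] = 1;
    q[rear++] = s;
    while (front < rear)                    /* until the queue is empty      */
    {
        int u = q[front++];                 /* remove the next vertex (FIFO) */
        printf("%d ", u + 1);               /* visit it (print 1-based name) */
        for (int w = 0; w < V; w++)         /* enqueue unvisited neighbours  */
            if (adj[u][w] && !visited[w])
            {
                visited[w] = 1;
                q[rear++] = w;
            }
    }
}

Because the queue is FIFO, vertices are visited level by level: all vertices at the current depth are processed before any vertex at the next depth.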
DFS is a search technique for finding a node in a graph data structure that meets a set of
criteria.
To keep track of the child nodes that have been encountered but not yet inspected, additional memory, generally a stack, is required.
Step 2: Select any vertex in our graph, say v1, from which you want to begin traversing
the graph.
Step 3: Take two data structures for traversing the graph: a visited array (of size equal to the number of vertices) and a stack.
Step 4: Insert v1 into the first block of the visited array and push all the adjacent nodes or vertices of vertex v1 onto the stack.
Step 5: Now, using the LIFO principle, pop the topmost element, put it into the visited array, and push all of the popped element's adjacent nodes onto the stack.
Step 6: If the topmost element of the stack is already present in the visited array, discard it instead of inserting it again.
Step 7: Repeat steps 5 and 6 until the stack is empty.
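A sketch of DFS over the same adjacency matrix; the recursive form below lets the call stack play the role of the explicit stack described in the steps above:

/* depth-first search from 0-based vertex u; visited[] must start all zero */
void dfs(int u, int visited[])
{
    visited[u] = 1;
    printf("%d ", u + 1);                   /* visit the vertex (1-based name)   */
    for (int w = 0; w < V; w++)
        if (adj[u][w] && !visited[w])       /* recurse into unvisited neighbours */
            dfs(w, visited);
}

/* usage: int visited[V] = {0}; dfs(0, visited); */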
You will now look at applications of graph data structures after understanding the graph
traversal algorithm in this tutorial.
Users on Facebook are referred to as vertices, and if they are friends, there is
an edge connecting them. The Friend Suggestion system on Facebook is based
on graph theory.
You come across the Resource Allocation Graph in the Operating System, where each process and resource is regarded as a vertex. Edges are drawn from a resource to the process it is assigned to, or from a requesting process to the desired resource. A deadlock will develop if this results in the formation of a cycle.
Web pages are referred to as vertices on the World Wide Web. Suppose there is
a link from page A to page B that can represent an edge. This application is an
illustration of a directed graph.
Hashing
Hashing is a technique to convert a range of key values into a range of indexes of an
array. Hashing is designed to use a special function called the Hash function which is
used to map a given value with a particular key for faster access of elements. The
efficiency of mapping depends on the efficiency of the hash function used.
Hashing is the process of mapping a large amount of data items to a smaller table with the help of a hashing function. This method generally uses hash functions to map the keys into a table, which is called a hash table.
Hashing is used with a database to enable items to be retrieved more quickly. Hashing
allows to update and retrieve any data entry in a constant time O(1). Constant time O(1)
means the operation does not depend on the size of the data.
Hash table
A hash table is a type of data structure which is used for storing and accessing data very quickly. Insertion of data into the table is based on a key value; hence every entry in the hash table is defined with some key. By using this key, data can be searched in the hash table with few key comparisons, and the searching time is then dependent upon the size of the hash table.
It uses a hash function to compute an index into an array of buckets or slots from
which the desired value can be found.
The above figure shows the hash table with the size of n = 10. Each position of the hash
table is called as Slot. In the above hash table, there are n slots in the table, names =
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. Slot 0, slot 1, slot 2 and so on. Hash table contains no items,
so every slot is empty.
As we know the mapping between an item and the slot where item belongs in the hash
table is called the hash function. The hash function takes any item in the collection and
returns an integer in the range of slot names between 0 to n-1.
Suppose we have integer items { 26, 70, 18, 31, 54, 93 }. One common method of determining a hash key is the division method of hashing, and the formula is:
h(item) = item % (size of table)
The division method (or remainder method) takes an item, divides it by the table size and returns the remainder as its hash value.
After computing the hash values, we can insert each item into the hash table at the
designated position as shown in the above figure. In the hash table, 6 of the 10 slots are occupied; the fraction of occupied slots is referred to as the load factor and is denoted by λ = number of items / table size. For example, λ = 6/10.
It is easy to search for an item using hash function where it computes the slot number
for the item and then checks the hash table to see if it is present.
Hash function
1. Division method. In this method, the hash function depends upon the remainder of a division. For example, if the records 52, 68, 99, 84 are to be placed in a hash table and the table size is 10:
2 = 52 % 10
8 = 68 % 10
9 = 99 % 10
4 = 84 % 10
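The division method can be written as a one-line C function; TABLE_SIZE here matches the table size 10 used in the example above:

#define TABLE_SIZE 10                 /* table size used in the example */

/* division (remainder) method: map a key to a slot number 0..TABLE_SIZE-1 */
int hash(int key)
{
    return key % TABLE_SIZE;
}

/* e.g. hash(52) = 2, hash(68) = 8, hash(99) = 9, hash(84) = 4 */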
2. Mid square method. In this method, the key is first squared and then the middle part of the result is taken as the index. For example, consider placing a record with key 3101 when the size of the table is 1000: 3101 * 3101 = 9616201, so h(3101) = 162 (the middle 3 digits).
3. Digit folding method. In this method the key is divided into separate parts and by
using some simple operations these parts are combined to produce a hash key. For
example: consider a record of 12465512 then it will be divided into parts i.e. 124, 655,
12. After dividing the parts combine these parts by adding it.
H(key)=124+655+12
=791
Collision
It is a situation in which the hash function returns the same hash key for more than one
record. Sometimes when we are going to resolve the collision it may lead to overflow
condition.
1) Chaining. In this method, when a collision occurs, a linked list is maintained at that slot for the extra (colliding) data.
Example: Let us consider a hash table of size 10 and we apply a hash function of
H(key)=key % size of table. Let us take the keys to be inserted are 31,33,77,61. In the
above diagram we can see at same bucket 1 there are two records which are maintained
by linked list or we can say by chaining method.
2) Linear probing. It is a very easy and simple method to resolve or handle a collision. Here, a collision is resolved by placing the colliding record in the next empty place found by moving linearly down the table.
Example: Let us consider a hash table of size 10 and hash function is defined as
H(key)=key % table size. Consider that following keys are to be inserted that are
56,64,36,71.
In this diagram we can see that 56 and 36 need to be placed in the same bucket, but with the linear probing technique the record is placed linearly downward at the next empty place, i.e. 36 is placed at index 7.
P = (P + 1) % table_size
For example,
If we insert the next item 40 into our collection, it would have a hash value of 0 (40 % 10 = 0).
But 70 also had a hash value of 0, so this becomes a problem.
Linear probing solves this problem:
P = H(40)
40 % 10 = 0
Position 0 is occupied by 70, so we look elsewhere for a position to store 40.
Using linear probing:
P = (P + 1) % table_size
(0 + 1) % 10 = 1
But position 1 is occupied by 31, so we look elsewhere for a position to store 40.
Using linear probing, we try the next position: (1 + 1) % 10 = 2
Position 2 is empty, so 40 is inserted there.
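A sketch of insertion with linear probing, continuing the previous sketch and assuming that an empty slot is marked with -1 and that the table is never completely full:

int table[TABLE_SIZE];                /* hash table; every slot initialised to -1 */

/* insert key using linear probing; assumes at least one slot is free */
void insertLinear(int key)
{
    int p = key % TABLE_SIZE;         /* home position from the hash function */
    while (table[p] != -1)            /* occupied: probe the next slot        */
        p = (p + 1) % TABLE_SIZE;
    table[p] = key;
}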
*****