[go: up one dir, main page]

0% found this document useful (0 votes)
34 views54 pages

DSA Manual

The document outlines the course structure and objectives for the Data Structure and Algorithms Laboratory at Marathwada Mitra Mandal's College of Engineering, Pune, for the academic year 2024-25. It details the program outcomes, course objectives, and specific outcomes that students are expected to achieve, including practical implementations of various data structures and algorithms. Additionally, it includes a mapping of course outcomes to program outcomes, a list of experiments, and assignments related to data structures.

Uploaded by

Suraj Surve
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
34 views54 pages

DSA Manual

The document outlines the course structure and objectives for the Data Structure and Algorithms Laboratory at Marathwada Mitra Mandal's College of Engineering, Pune, for the academic year 2024-25. It details the program outcomes, course objectives, and specific outcomes that students are expected to achieve, including practical implementations of various data structures and algorithms. Additionally, it includes a mapping of course outcomes to program outcomes, a list of experiments, and assignments related to data structures.

Uploaded by

Suraj Surve
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 54

Marathwada Mitra Mandal’s

College of Engineering, Pune


Karve Nagar,Pune-411 052

Department of Artificial
Intelligence and Data Science
Engineering

Data Structure and Algorithms Laboratory


(217532)
Prepared by:
Mrs. Snehal Khajurgi

S.E –AI & DS


Semester III Academic Year 2024-25
PROGRAMS OUTCOMES (POs)

Engineering Graduates will be able to:

1. Engineering Knowledge: Apply the knowledge of mathematics, science, engineering


fundamentals and an engineering specialization to the solution of complex engineering
problems.

2. Problem Analysis: Identity, formulate, review research literature, and analyze complex
engineering problems reaching substantiated conclusions using first principles of
mathematics, natural sciences, and engineering sciences.

3. Design / development of solutions: Design solutions for complex engineering problems


and design system components or processes that meet the specified needs with
appropriate consideration for the public health and safety, and the cultural, societal, and
environmental considerations.

4. Conduct investigations of complex problems: Use research – based knowledge and


research methods including design of experiments, analysis and interpretation of data, and
synthesis of the information to provide valid conclusions.

5. Modern tool usage: Create, select and apply appropriate techniques, resources, and
modern engineering and IT tools including prediction and modeling to complex
engineering activities with an understanding of the limitations.

6. The engineer and society: Apply reasoning informed by the contextual knowledge to
assess societal, health, safety, legal and cultural issues and the consequent responsibilities
relevant to the professional engineering practice.

7. Environment and sustainability: Understand the impact of the professional engineering


solutions in societal and environmental contexts, and demonstrate the knowledge of, and need
for sustainable development.

8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and
norms of the engineering practice.
9. Individual and team work: Function effectively as an individual, and as a member or
leader in diverse teams, and in multidisciplinary settings.

10. Communication: Communicate effectively on complex engineering activities with the


engineering community and with society at large, such as, being able to comprehend and
write effective reports and design documentation, make effective presentations, and give
and receive clear instructions.

11. Project management and finance: Demonstrate knowledge and understanding of the
engineering and management principles and apply these to one’s own work, as a member
and leader in a team, to manage projects and in multidisciplinary environments.

12. Life-long learning: Recognize the need for, and have the preparation and ability to
engage in independent and life-long learning in the broadest context of technological
change.

Course Objective & Course Outcomes

Course Objectives:

● To understand practical implementation and usage of non linear data structures for solving
problems of different domains.
● To strengthen the ability to identify and apply the suitable data structure for the given real
world problems.
● To analyze advanced data structures including hash table, dictionary, trees, graphs, sorting
algorithms and file organization.
Course Outcomes:
After successfully completing the course students will be able to
1. Understand the ADT/libraries, hash tables and dictionary to design algorithms for a specific
problem.
2. Choose the most appropriate data structures and apply algorithms for graphical solutions of the
problems.
3. Apply and analyze non linear data structures to solve real world complex problems. 4. Apply
and analyze algorithm design techniques for indexing, sorting, multi-way searching, file
organization and compression.
5. Analyze the efficiency of the most appropriate data structure for creating efficient solutions for
engineering design situations.

CO-PO Mapping Matrix

SUBJECT PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 P
O
12

Data CO1 1 2 - ---


2 -

Structure and CO2 - 2 -


2-
Algorithms
CO3 - 2 21
Laboratory
CO4 1 2 2
1

CO5 1 1 2
2

C. CO-PSO mapping
Course Program Specific Outcomes
Outcome
12

210252.1 1 ---

210252.2 1 ---

210252.3 1 ---

210252.4 1 ---

210252.5 1 ---

210252.6 1 ---
INDEX
Sr. Date of CAS Signatur
No. Experiment PageDate of completio eof
conduction n Faculty

1 Hash Table Implementation

2 To create ADT that implements the "set"


concept
3 Implementation of Binary Search Tree

4 Implementation of Expression Tree from


the given Prefix Expression.

5 Implementation of Binary Search Tree to implement dictionary


concept

6 Implementation of a graph Data Structure


Implementation of a Minimum Spanning
7
Tree using Prim’s Algorithm
Implementation of an Optimal Binary
8
search tree Data Structure
Implementation of a Height Balance Tree
9 Data Structure

Implementation of a Priority Queue as


10
ADT
Implementation of sequential file
11
organization concept using cpp

12 Implementation of a direct access file

13 Mini project

14 VLab: To implement shortest path


algorithm using Dijkstra's Algorithm
CERTIFICATE
Certified that Mr./Ms._______________________________________ of Class SE AI & DS Roll
No.__________ has completed the Laboratory work in the subject Data Structures and Algorithms
Laboratory (217532) during the academic year 2024-25 Sem II.
Signature of the Faculty Signature of Head of the Department Date:

List Of assignments
Group Sr. Title of Assignment
No

A 1. Consider the telephone book database of N clients. Make use of a hash table
implementation to quickly look up a client's telephone number. Make use of
two collision handling techniques and compare them using number of
comparisons required to find a set of telephone numbers

2. To create ADT that implements the "set" concept

3. Beginning with an empty binary search tree, Construct a binary search tree by
inserting the values in the order given. After constructing a binary tree - i. Insert
new node, ii. Find number of nodes in longest path from root, iii. Minimum data
value found in the tree, iv. Change a tree so that the roles of the left and right
pointers are swapped at every node, v. Search a value

4 Construct an expression tree from the given prefix expression eg. +--a*bc/def
B and traverse it using post order traversal (non recursive) and then delete the
entire tree.

5. A Dictionary stores keywords and its meanings. Provide facility for adding new
keyword, deleting keywords, updating values of entry. Provide facility to display
whole data sorted in ascending /Descending order. Also find how many
maximum comparisons may require for finding any keyword. Use Binary Search
Tree for implementation.

6 Represent a given graph using adjacency matrix/list to perform DFS and using
adjacency list to perform BFS. Use the map of the area around the college as
the graph. Identify the prominent landmarks as nodes and perform DFS and BFS
on that.
C
7. You have a business with several offices; you want to lease phone lines to
connect them up with each other; and the phone company charges different
amounts of money to connect different pairs of cities. You want a set of lines
that connects all your offices with a minimum total cost. Solve the problem by
suggesting appropriate data structures.

8. Given sequence k = k1 < K2…<Kn of n sorted keys, with a search probability pi


for each key ki. Build the Binary search tree that has the least search cost given
the access probability for each key?
D
9. A Dictionary stores keywords and its meanings. Provide facility for adding new
keywords, deleting keywords, updating values of any entry. Provide a facility to
display whole data sorted in ascending/ Descending order. Also find how many
maximum comparisons may require for finding any keyword. Use Height
balance tree and find the complexity for finding a keyword

10. Consider a scenario for hospital to cater services to different kinds of patients
E as Serious (top priority), b) non-serious (medium priority), c) General Check-up
(Least priority). Implement the priority queue to cater services to the patients.

11. Department maintains student information. The file contains roll number,
name, division and address. Allow users to add, delete information about
students. Display information of a particular employee. If the record of the
F student does not exist an appropriate message is displayed. If it is, then the
system displays the student details. Use a sequential file to maintain the data.

12. Implementation of a direct access file -Insertion and deletion of a record from a
direct access file

13. Mini Project

14. VLab: To implement shortest path algorithm using Dijkstra's Algorithm

Experiment No. 1
⮚ Title: Hash Table Implementation
Problem Statement: Consider telephone book database of N clients. Make use of a hash table
implementation to quickly look up client‘s telephone number. Make use of two collision handling
techniques and compare them using number of comparisons required to find a set of telephone
numbers.

Aim: To make the use of collision handling technique (linear probing with replacement)
Outcome: At end of this experiment, student will be able to illustrate the use of collision handling
technique in hash table implementation.

⮚ Hardware Requirement:
PC/Laptop

⮚ Software Requirement:
Open Source Python, Programming tool like Google Colab, Jupyter Notebook, Pycharm, Spyder,
G++/GCC.
⮚ Theory:
Hashing:

Hashing is the process of transforming data and mapping it to a range of values which can be
efficiently looked up.

Collision Resolution Technique:


● Open Hashing (Separate chaining)
● Closed Hashing (Open Addressing)
o Liner Probing
o Quadratic probing
o Double hashing

Open Addressing:
● Open addressing is a method for handling collisions. In Open Addressing, all elements are stored in
the hash table itself. So at any point, the size of the table must be greater than or equal to the total
number of keys.
Operations on open addressing:
1. Insert(k): Keep probing until an empty slot is found. Once an empty slot is found, insert k. 2.
Search(k): Keep probing until slot’s key doesn’t become equal to k or an empty slot is reached.

3. Delete(k): So slots of deleted keys are marked specially as “deleted”.


The insert can insert an item in a deleted slot, but the search doesn’t stop at a deleted slot.

● Open Addressing is done in the following ways:

a) Linear Probing: In linear probing, we linearly probe for next slot. For example, the typical gap
between two probes is 1 as seen in the example below.
Let hash(x) be the slot index computed using a hash function and S be the table size
If slot hash(x) % S is full, then we try (hash(x) + 1) % S
If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S
If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S
..................................................
..................................................

Example: Let us consider a simple hash function as “key mod 7” and a sequence of keys as 50, 700, 76, 85,
92, 73, 101.

Conclusion:-
Thus all set operations have been studied and implemented client book record using hashing

Frequently Asked Questions:


1. What is the use of a hash table?
2. What is a collision? What are different collision resolution techniques?
3. What is linear probing?
4. Explain Linear probing with replacement using examples.

Experiment No. 2
⮚ Title: To create an ADT that implements the "set" concept.

⮚ Aim: To make the use of Set concept in Data Structure


⮚ Outcome: At the end of this experiment, students will be able to illustrate the use of Set operations for applying on
various applications.

⮚ Hardware Requirement:
PC/Laptop

⮚ Software Requirement:
Open Source Python, Programming tool like Google Colab, Jupyter Notebook, Pycharm, Spyder.

⮚ Theory:
Set in Python:
The set() function takes in an iterable and yields a list of objects which will be inserted into the set.
The {} syntax places the objects themselves into the set.
Set Size and Membership:
The len() function returns the number of elements in a set, and the ins and not in operators can be used to test
for membership:
>>> x = {'foo', 'bar', 'baz'}

>>> len(x)
3
Update() functions:
add_set = set((1, 2, 3, 4))
>>> add_set
{1, 2, 3, 4}
>>> add_set.update((1,))
>>> add_set
{1, 2, 3, 4}

>>> add_set.update(("cello", "violin"))


>>> add_set
{1, 2, 3, 4, 'violin', 'cello'}

Union Operations:
In Python, set union can be performed with the | operator:
>>> x1 = {'foo', 'bar', 'baz'}
>>> x2 = {'baz', 'qux', 'quux'}
>>> x1 | x2
{'baz', 'quux', 'qux', 'bar', 'foo'}
Intersection Operations:
x1.intersection(x2) and x1 & x2 return the set of elements common to both x1 and x2:
>>> x1 = {'foo', 'bar', 'baz'}
>>> x2 = {'baz', 'qux', 'quux'}

>>> x1.intersection(x2)
{'baz'}

>>> x1 & x2
{'baz'}

Difference Operations:
The difference between the two sets in Python is equal to the difference between the number of elements in two
sets. The function difference() returns a set that is the difference between two sets.

>>> a = {1, 2, 3, 30, 300}


>>> b = {10, 20, 30, 40}

>>> c = {100, 200, 300, 400}

>>> a.difference(b, c)
{1, 2, 3}

>>> a - b - c
{1, 2, 3}
Is subset operations:
In set theory, a set x1 is considered a subset of another set x2 if every element of x1 is in
x2. x1.issubset(x2) and x1 <= x2 return True if x1 is a subset of x2:
>>> x1 = {'foo', 'bar', 'baz'}
>>> x1.issubset({'foo', 'bar', 'baz', 'qux', 'quux'})
True

>>> x2 = {'baz', 'qux', 'quux'}


>>> x1 <= x2
False

Input:-
Set elements

Conclusion:- The types of functions in python have been studied and the operations on students records
have been implemented using user defined functions.

Frequently Asked Questions:


1. What are functions in Python explain with example?
2. What are the 4 types of functions in Python?
3. What is the need of function in Python?
4. Which key is used for function in Python?

Experiment No. 3

⮚ Title: Implementation of Binary Search Tree


Problem Statement: Beginning with an empty binary search tree, Construct a binary search tree by
inserting the values in the order given. After constructing a binary tree -i. Insert new node, ii. Find
number of nodes in longest path from root, iii. Minimum data value found in the tree, iv. Change a
tree so that the roles of the left and right pointers are swapped at every node, v. Search a value

Aim: To make the use of Binary Search Tree and perform the different operations on it.
⮚ Outcome: At the end of this experiment, students will be able to illustrate the use of Binary Search
Tree implementation.

⮚ Hardware Requirement:
PC/Laptop

⮚ Software Requirement:
G++/GCC.

⮚ Theory:
Binary Search Tree:
A binary search tree follows some order to arrange the elements. In a Binary search tree, the value
of left node must be smaller than the parent node, and the value of right node must be greater
than the parent node. This rule is applied recursively to the left and right sub trees of the root.
Binary Search Tree Operations:
1) Insert Operation: Inserting a new node into the existing tree.

Algorithm:
If node == NULL
return createNode(data)
if (data < node->data)
node->left = insert(node->left, data);
else if (data > node->data)
node->right = insert(node->right, data);
return node;

2) Searching a Node in BST: If the value is below the root, we can say for sure that the value is not in the
right sub tree; we need to only search in the left sub tree and if the value is above the root, we can say for
sure that the value is not in the left sub tree; we need to only search in the right sub tree.

Algorithm:
If root == NULL
return NULL;
If number == root->data
return root->data;
If number < root->data
return search(root->left)
If number > root->data
return search(root->right)

3) Finding the minimum node from tree:


Approach for finding minimum element:
● Traverse the node from root to left recursively until left is NULL.
● The node whose left is NULL is the node with minimum value.

int minValue(struct node* node)


{

struct node* current = node;

/* loop down to find the leftmost leaf */

while (current->left != NULL)


{

current = current->left;

return(current->key);

}
4) Mirror() Operation:
The Mirror() operation finds the mirror of the tree that will interchange all left and right sub trees in a
linked binary tree.
Algorithm:
(1) Call Mirror for left->subtree i.e., Mirror(left->subtree) (2)
Call Mirror for right->subtree i.e., Mirror(right-subtree) (3)
Swap left and right subtrees.
temp = left->subtree
left->subtree = right->subtree
right->subtree = temp

Input:-

Name and telephone number of client.

Output:-

1) Insert new node 2)number of nodes in longest path 3) minimum 4) mirror 5) search 6)
inorder 7) preorder 8) postorder
1
Enter the data : 34
Do you want to insert more value :y
Enter the data : 12
Do you want to insert more value :y
Enter the data : 67
Do you want to insert more value :y
Enter the data : 2
Do you want to insert more value :n
The Total no.of nodes are:4

Do you want to continue : y


1) Insert new node 2)number of nodes in longest path 3) minimum 4) mirror 5) search 6)
inorder 7) preorder 8) postorder
2
Number of nodes in longest path:
3
Do you want to continue : y
1) Insert new node 2)number of nodes in longest path 3) minimum 4) mirror 5) search 6)
inorder 7) preorder 8) postorder
3
The min element is: 2
Do you want to continue : y
1) Insert new node 2)number of nodes in longest path 3) minimum 4) mirror 5) search 6)
inorder 7) preorder 8) postorder
4
The mirror of tree is: 67 34 12 2
Do you want to continue : n

Conclusion: - Thus binary search tree operations are implemented

Frequently Asked Questions:


1. What are the operations on Binary search tree?
2. Difference between Binary tree and binary Search tree?
3. Write an ADT for Binary Search Tree?

Experiment No. 4
⮚ Title: Implementation of Expression Tree from the given Prefix Expression.
Problem Statement: Construct an expression tree from the given prefix expression eg. +--a*bc/def and
traverse it using post order traversal (non recursive) and then delete the entire tree.

Aim: To make the use of Binary Tree and stack Data structure to construct the Expression Tree from
the given prefix expression.

⮚ Outcome: At the end of this experiment, students will be able to illustrate the use of Expression tree
implementation.

⮚ Hardware Requirement:
PC/Laptop
⮚ Software Requirement:
G++/GCC.

⮚ Theory:
Expression Binary Tree:
The expression tree is a binary tree in which each internal node corresponds to the operator and each
leaf node corresponds to the operand so for example expression tree for 3 + ((5+9)*2) would be:

Construction of Expression Tree:


Now For constructing an expression tree we use a stack. We loop through input expressions and
do the following for every character.
1. If a character is an operand push that into the stack
2. If a character is an operator, pop two values from the stack, make them its child and push the
current node again.

Algorithm to construct Expression Tree:


Begin
class ExpressionTree which has following functions:
function push() to push nodes into the tree:
If stack is null
then push the node as first element
Else
push the node and make it top

function pop() to pop out nodes from the tree:


If stack is null
then print underflow
Else
Pop out the node and update top

function insert() to insert characters:


If it is digit
then push it.
Else if it is operator
Then pop it.
Else
Print “invalid Expression”
Expression Tree Traversal:
1. In order Traversal
void inorder(node* x)
{
cout<<"Tree in InOrder Traversal is: "<<endl;
if (x == NULL)
return;
else {
inorder(x->left);
cout << x->value << " ";
inorder(x->right);
}
}
Use of Expression tree:

1. The main objective of using the expression trees is to make complex expressions and can be
easily evaluated using these expression trees.
2. It is also used to find out the associativity of each operator in the expression.
3. It is also used to solve the postfix, prefix, and infix expression evaluation.

Delete operation on Binary Tree:


Algorithm:
● Starting at the root, find the deepest and rightmost node in the binary tree and node
which we want to delete.
● Replace the deepest rightmost node’s data with the node to be deleted.
● Then delete the deepest rightmost node.

1) Node to be deleted is the leaf: Simply remove from the tree.


50 50
/ \ delete(20) / \
30 70 ---------> 30 70
/ \ / \ \ / \
20 40 60 80 40 60 80
2) Node to be deleted has only one child: Copy the child to the node and delete the child
50 50
/ \ delete(30) / \
30 70 ---------> 40 70
\ / \ / \
40 60 80 60 80
3) Node to be deleted has two children: Find inorder successor of the node. Copy contents of the inorder
successor to the node and delete the inorder successor. Note that inorder predecessor can also be
used.
50 60
/ \ delete(50) / \
40 70 ---------> 40 70
/ \ \
60 80 80

Input:-

Prefix Expression

Conclusion:-
Frequently Asked Questions:
1. What is binary tree traversal?
2. Explain the expression tree with suitable examples.
3. Construct the expression tree from the given expression: *+ab-cd”
4. Explain the delete operation on Binary Tree.

Experiment No. 5
⮚ Title: Implementation of General Tree
Problem Statement: A book consists of chapters, chapters consist of sections and sections consist
of subsections. Construct a tree and print the nodes. Find the time and space requirements of
your method.

Aim: 1. To make the use of Tree Data Structure.


2. To Learn the General Tree Data Structure Implementation.

Outcome: At end of this experiment, student will be able to illustrate the use of General Tree
Data Structure implementation.

⮚ Hardware Requirement:
PC/Laptop

⮚ Software Requirement:
G++/GCC.

⮚ Theory:
General Tree:
A tree T is a finite set of one or more nodes such that there is one designated node RR, called the
root of T. If the set (T−{R})(T−{R}) is not empty, these nodes are partitioned into n>0n>0
disjoint sets T0T0, T1T1, …, Tn−1Tn−1, each of which is a tree, and whose roots
R1,R2,...,RnR1,R2,...,Rn, respectively, are children of RR. The subsets Ti(0≤i<n)Ti(0≤i<n) are
said to be subtrees of T.

Representing General Tree:


May the number of children per node vary over a large range (possibly with no logical
upper limit)?

● Setting an upper limit renders some trees un-representable.


● Even with an upper limit, allocating a fixed number of pointers within each node may
waste a huge amount of space.
● Supporting a variable number of pointers also raises performance issues. ● How can the
pointers be organized for easy traversal? Does this require a secondary data structure
within the node?
● Does the scheme provide for efficient search? node insertion? node deletion?

Linked Node Implementation:


The binary node type presented earlier can be extended for a general tree representation. The
pointers can be managed by:

● allocating a uniform-sized array of node pointers and limiting the number of children a
node is allowed to have; -
● allocating a dynamically-sized array of node pointers, custom-fit to the number of
children the node currently has (and re-sizing manually as needed);
● Using a linked list of node pointers;
● Using a vector of node pointers, growing as needed.

Input:-
Book name

No of chapters in Book

Chapter name

No of sections in each chapter

Section names

Program code:
# include <iostream>
# include <cstdlib>
# include <string.h>
using namespace std;
struct node
{char label[10];
int ch_count;

struct node *child[10];


}*root;

/*
* Class Declaration
*/
class GT
{
public:
void create_tree();
void display(node * r1);
GT()
{
root = NULL;
}
};
void GT::create_tree()
{
int tbooks,tchapters,i,j,k;
root = new node;
cout<<"Enter name of book";
cin>>root->label;
cout<<"Enter no. of chapters in book";
cin>>tchapters;
root->ch_count = tchapters;
for(i=0;i<tchapters;i++)
{
root->child[i] = new node;
cout<<"Enter Chapter name\n";
cin>>root->child[i]->label;
cout<<"Enter no. of sections in Chapter: "<<root->child[i]->label;
cin>>root->child[i]->ch_count;
for(j=0;j<root->child[i]->ch_count;j++)
{
root->child[i]->child[j] = new node;
cout<<"Enter Section "<<j+1<<"name\n";
cin>>root->child[i]->child[j]->label;
//cout<<"Enter no. of subsections in "<<r1->child[i]->child[j]->label;
//cin>>r1->child[i]->ch_count;
}

}void GT::display(node * r1)


{
int i,j,k,tchapters;
if(r1 != NULL)
{

cout<<"\n-----Book Hierarchy---";
cout<<"\n Book title : "<<r1->label;
tchapters = r1->ch_count;
for(i=0;i<tchapters;i++)
{
cout<<"\n Chapter "<<i+1;
cout<<" "<<r1->child[i]->label;
cout<<"\n Sections";
for(j=0;j<r1->child[i]->ch_count;j++)
{
//cin>>r1->child[i]->child[j]->label;
cout<<"\n "<<r1->child[i]->child[j]->label;
}
}
}
}
int main()
{
int choice;
GT gt;
while (1)
{
cout<<"-----------------"<<endl;
cout<<"Book Tree Creation\n1.Create\n2.Display\n3.Quit"<<endl;
cout<<"Enter your choice : ";
cin>>choice;
switch(choice)
{
case 1:gt.create_tree();
case 2: gt.display(root);
break;
case 3:exit(1);
default:
cout<<"Wrong choice"<<endl;
}
}
}
Conclusion:- Thus implemented general tree and its operations.

Frequently Asked Questions:


1. Write the C++ function to print given binary tree?

2. Write the concept of Skewed binary tree.


3. What is BST? Draw BST for the following data:10, 08, 15, 12, 13, 07, 09, 17, 20, 18, 04,
05.
4. Explain the concept of representation of a binary tree using an array.
Experiment No. 6

⮚ Title: Implementation of a graph Data Structure


Problem Statement: Represent a given graph using adjacency matrix/list to perform DFS and
using adjacency list to perform BFS. Use the map of the area around the college as the graph.
Identify the prominent landmarks as nodes and perform DFS and BFS on that.
⮚ Aim: To make the use of Graph data Structure to represent a map of the area around the college
as the graph.
⮚ Outcome: At the end of this experiment, students will be able to illustrate the use of graph
implementation.

⮚ Hardware Requirement:
PC/Laptop

⮚ Software Requirement:
G++/GCC.

⮚ Theory:
Graph: A Graph is a non-linear data structure consisting of nodes and edges. The nodes are
sometimes also referred to as vertices and the edges are lines or arcs that connect any two
nodes in the graph. More formally a Graph can be defined as,
“A Graph consists of a finite set of vertices (or nodes) and set of Edges which connect a pair of
nodes.”

In the above Graph, the set of vertices V = {0, 1, 2, 3, 4} and

the set of edges E = {01, 12, 23, 34, 04, 14, 13}.

Representations of a graph:
1. Adjacency Matrix
2. Adjacency List

Adjacency Matrix:
Adjacency Matrix is a 2D array of size V x V where V is the number of vertices in a graph. Let
the 2D array be adj[][], a slot adj[i][j] = 1 indicates that there is an edge from vertex i to vertex
j. Adjacency matrix for undirected graph is always symmetric. Adjacency Matrix is also used
to represent weighted graphs. If adj[i][j] = w, then there is an edge from vertex i to vertex j
with weight w.
The adjacency matrix for the above example graph is:

Adjacency List:
An array of lists is used. The size of the array is equal to the number of vertices. Let the array
i
be an array[]. An entry array[i] represents the list of vertices adjacent to the th vertex. This
representation can also be used to represent a weighted graph. The weights of edges can be
represented as lists of pairs. Following is the adjacency list representation of the above
graph.

Graph Traversal:
DFS Algorithm:
● Step 1: Insert the root node or starting node of a tree or a graph in the stack.
● Step 2: Pop the top item from the stack and add it to the visited list.
● Step 3: Find all the adjacent nodes of the node marked visited and add the ones that are not
yet visited, to the stack.
● Step 4: Repeat steps 2 and 3 until the stack is empty.

Implementation of DFS:
Algorithm DFS ( ) // This algorithm is used to traverse graphs in DFS order.

1. {
2. for(i=1 to n) do
3. visit[i]=0;
4. Read(Starting vertex in v1);
5. push(v1);
6. while(top!=-1) // until stack I s not empty
7. {
8. v1=pop();
9. if(visit[v1]==0) // v1 is unvisited
10. {
11. Write(v1);
12. visit[v1]=1;
13. for(v2=1 to n) // find neighbors
14. if(g[v1][v2]==1) // neighbor exist
15. push(v2);
16. }
17. }

18. }

BFS Algorithm:
1. The algorithm for breadth-first search traversal is:
2. Select the root of the graph.
3. Insert the root vertex into the queue.
4. Pop a vertex from the queue, mark it as visited, and output its value. 5. Visit
its unvisited neighbor vertex, push them to the queue, and mark visited. 6.
Repeat step 3 until the queue is empty.
7. When the queue is empty, end the program.

Input:-

Vertices and edges


Conclusion:-

Thus DFS and BFS algorithms have been studied and implemented for the graph Frequently
Asked Questions:
1. What is the difference between graph and tree data structure?
2. Write a short note on different representations of a graph..
3. Explain the BFS traversal of a graph.
4. Show BFS and DFS for the following graph with the starting vertex as 0.
Experiment No. 7
⮚ Title: Implementation of a Minimum Spanning Tree using Prim’s Algorithm
⮚ Problem Statement: You have a business with several offices; you want to lease phone lines to connect them up with each
other; and the phone company charges different amounts of money to connect different pairs of cities. You want a set of
lines that connects all your offices with a minimum total cost. Solve the problem by suggesting appropriate data
structures.
⮚ Aim: To make the use of weighted Graph data Structure to represent several offices
⮚ Outcome: At the end of this experiment, students will be able to illustrate the use of Prim's
algorithm to find the minimum spanning tree implementation.

⮚ Hardware Requirement:
PC/Laptop

⮚ Software Requirement:
G++/GCC.

⮚ Theory:
Spanning Tree:
A spanning tree is a sub-graph of an undirected connected graph, which includes all the
vertices of the graph with a minimum possible number of edges. If a vertex is missed, then it
is not a spanning tree.

Minimum Spanning Tree(MST)


A minimum spanning tree (MST) or minimum weight spanning tree for a weighted, connected,
undirected graph is a spanning tree with a weight less than or equal to the weight of every other
spanning tree. The weight of a spanning tree is the sum of weights given to each edge of the
spanning tree.

A minimum spanning tree is a spanning tree in which the sum of the weight of the edges is as
minimum as possible.

The possible spanning trees from the above graph are:

The
minimum spanning tree from the above spanning trees is:

The minimum spanning tree from a graph is found using the following algorithms: 1. Prim's Algorithm
Prim’s Algorithm also use Greedy approach to find the minimum spanning tree. In Prim’s
Algorithm we grow the spanning tree from a starting position. Unlike an edge in Kruskal's, we
add vertex to the growing spanning tree in Prim's.

Algorithm Steps:

● Maintain two disjoint sets of vertices. One containing vertices that are in the growing
spanning tree and other that are not in the growing spanning tree.

● Select the cheapest vertex that is connected to the growing spanning tree and is not in the
growing spanning tree and add it into the growing spanning tree. This can be done using
Priority Queues. Insert the vertices, that are connected to growing spanning trees, into the
Priority Queue.
● Check for cycles. To do that, mark the nodes which have been already selected and insert
only those nodes in the Priority Queue that are not marked.

Example: apply the Prim’s algorithm on given graph

Solution:

Cost of minimum spanning tree:


=sum of all the edges
=10+25+22+12+16+14
=99

2. Kruskal's Algorithm
Steps for finding MST using Kruskal’s algorithm
1. Sort all the edges in non-decreasing order of their weight.
2. Pick the smallest edge. Check if it forms a cycle with the spanning tree formed so far.
If the cycle is not formed, include this edge. Else, discard it.
3. Repeat step#2 until there are (V-1) edges in the spanning tree
Example on Kruskal’s Algorithm:
Cost of minimum spanning tree:
=sum of all the edges
=1+5+3+2
=11

Input:-

Vertices and edges


Output:
Enter no. of vertices: 4
Do you want an edge between 1 and 2: y
Enter weight: 5
Do you want an edge between 1 and 3: y
Enter weight: 7
Do you want an edge between 1 and 4: n
Do you want an edge between 2 and 3: y
Enter weight: 10
Do you want an edge between 2 and 4: n
Do you want an edge between 3 and 4: y
Enter weight: 12

1->2
1->3
3->4
Min. Cost: 24

Conclusion:- Thus Minimum spanning tree algorithm have been implemented


Frequently Asked Questions:

1. What is the minimum spanning tree?


2. Explain the Prim’s algorithm with an example.
3. List the applications of minimum spanning trees.
4. Time complexity of Prim’s and Kruskal’s Algorithm

Experiment No. 8

⮚ Title: Implementation of an Optimal Binary search tree Data Structure


⮚ Problem Statement: Given sequence k = k1 <k2 < ... <kn of n sorted keys, with a search probability
pi for each key ki . Build the Binary search tree that has the least search cost given the access
probability for each key?

⮚ Aim: To make the use of optimal binary search tree data structure to calculate search
probability pi for n keys.

⮚ Outcome: At end of this experiment, student will be able to illustrate the use of optimal binary
search tree implementation.

⮚ Hardware Requirement:
PC/Laptop

⮚ Software Requirement:
G++/GCC.

⮚ Theory:

Optimal Binary Search Tree:


The frequency and key-value determine the overall cost of searching a node. The cost of
searching is a very important factor in various applications. The overall cost of searching a node
should be less. The time required to search a node in BST is more than the balanced binary
search tree as a balanced binary search tree contains a lesser number of levels than the BST.
There is one way that can reduce the cost of a binary search tree is known as an optimal binary
search tree.
Example:
Keys ; 10, 20, 30, 40, 50, 60, 70

In the above tree, all the nodes on the left subtree are smaller than the value of the root node, and
all the nodes on the right subtree are larger than the value of the root node. The maximum time
required to search a node is equal to the minimum height of the tree, equal to log(n).
Now we will see how many binary search trees can be made from the given number of keys.

The Formula for calculating the number of trees:


��2����
��+1
Dynamic Approach

First, we will calculate the values where j-i is equal to zero.


When i=0, j=0, then j-i = 0

When i = 1, j=1, then j-i = 0

When i = 2, j=2, then j-i = 0

When i = 3, j=3, then j-i = 0

When i = 4, j=4, then j-i = 0

Therefore, c[0, 0] = 0, c[1 , 1] = 0, c[2,2] = 0, c[3,3] = 0, c[4,4] = 0

Now we will calculate the values where j-i equal to 1.

When j=1, i=0 then j-i = 1

When j=2, i=1 then j-i = 1

When j=3, i=2 then j-i = 1

When j=4, i=3 then j-i = 1

Now to calculate the cost, we will consider only the jth value.

The cost of c[0,1] is 4 (The key is 10, and the cost corresponding to key 10 is 4).
The cost of c[1,2] is 2 (The key is 20, and the cost corresponding to key 20 is 2).

The cost of c[2,3] is 6 (The key is 30, and the cost corresponding to key 30 is 6)

The cost of c[3,4] is 3 (The key is 40, and the cost corresponding to key 40 is 3)

Now we will calculate the values where j-i = 2

When j=2, i=0 then j-i = 2

When j=3, i=1 then j-i = 2

When j=4, i=2 then j-i = 2

In this case, we will consider two keys.

o When i=0 and j=2, then keys 10 and 20. There are two possible trees that can be made out
from these two keys shown below:

In the first binary tree, cost would be: 4*1 + 2*2 = 8

In
the second binary tree, cost would be: 4*2 + 2*1 = 10

The minimum cost is 8; therefore, c[0,2] = 8


o When i=1 and j=3, then keys 20 and 30. There are two possible trees that can be made out
from these two keys shown below:

In the first binary tree, cost would be: 1*2 + 2*6 = 14

In the second binary tree, cost would be: 1*6 + 2*2 = 10

The minimum cost is 10; therefore, c[1,3] = 10

o When i=2 and j=4, we will consider the keys at 3 and 4, i.e., 30 and 40. There are two
possible trees that can be made out from these two keys shown as below:

In the first binary tree, cost would be: 1*6 + 2*3 = 12

In the second binary tree, cost would be: 1*3 + 2*6 = 15

The minimum cost is 12, therefore, c[2,4] = 12

Now we will calculate the values when j-i = 3

When j=3, i=0 then j-i = 3

When j=4, i=1 then j-i = 3

o When i=0, j=3 then we will consider three keys, i.e., 10, 20, and 30.

The following are the trees that can be made if 10 is considered as a root
node.

In the above tree, 10 is the root node, 20 is the right child of node 10, and 30 is the right child of
node 20.

Cost would be: 1*4 + 2*2 + 3*6 = 26

In the above tree, 10 is the root node, 30 is the right child of node 10, and 20 is the left child of
node 20.

Cost would be: 1*4 + 2*6 + 3*2 = 22

The following tree can be created if 20 is considered as the root node.

In the above tree, 20 is the root node, 30 is the right child of node 20, and 10 is the left child of
node 20.

Cost would be: 1*2 + 4*2 + 6*2 = 22

The following are the trees that can be created if 30 is considered as the root node.

In the above tree, 30 is the root node, 20 is the left child of node 30, and 10 is the left child of
node 20.

Cost would be: 1*6 + 2*2 + 3*4 = 22


In the above tree, 30 is the root node, 10 is the left child of node 30 and 20 is the right child of
node 10.

Cost would be: 1*6 + 2*4 + 3*2 = 20

Therefore, the minimum cost is 20 which is the 3rd root. So, c[0,3] is equal to

20. o When i=1 and j=4 then we will consider the keys 20, 30, 40

c[1,4] = min{ c[1,1] + c[2,4], c[1,2] + c[3,4], c[1,3] + c[4,4] } + 11

= min{0+12, 2+3, 10+0}+ 11

= min{12, 5, 10} + 11

The minimum value is 5; therefore, c[1,4] = 5+11 = 16

o Now we will calculate the values when j-i = 4

When j=4 and i=0 then j-i = 4

In this case, we will consider four keys, i.e., 10, 20, 30 and 40. The frequencies of 10, 20, 30 and
40 are 4, 2, 6 and 3 respectively.

w[0, 4] = 4 + 2 + 6 + 3 = 15

If we consider 10 as the root node then

C[0, 4] = min {c[0,0] + c[1,4]}+ w[0,4] = min {0 + 16} + 15= 31

If we consider 20 as the root node then

C[0,4] = min{c[0,1] + c[2,4]} + w[0,4] = min{4 + 12} + 15 = 16 + 15 = 31


If we consider 30 as the root node then,

C[0,4] = min{c[0,2] + c[3,4]} +w[0,4] = min {8 + 3} + 15 = 26

If we consider 40 as the root node then,

C[0,4] = min{c[0,3] + c[4,4]} + w[0,4] = min{20 + 0} + 15 = 35

In the above cases, we have observed that 26 is the minimum cost; therefore, c[0,4] is equal to
26.

The optimal binary tree can be created as:

General formula for calculating the minimum cost is:

ALGORITHMS IN PSEUDOCODE
The following function builds an optimal
binary search tree
FUNCTION CONSTRUCT(R, i, j) begin
*build a new internal node N labeled (i, j)
k ← R (i, j) if i = k then

*build a new leaf node N’ labeled (i, i) else


*N’ ← CONSTRUCT(R, i, k)
*N’ is the left child of node N
if k = (j – 1) then
*build a new leaf node N’’ labeled (j, j)
else
*N’’ ← CONSTRUCT(R, k + 1, j)
*N’’ is the right child of node N
return N
end

Input:-

k = k1 <k2 < ... <kn of n sorted keys

Output:-

Optimal Binary search Tree


Optimal Binary Search Tree
Enter the number of nodes 4
Enter the data as…
a[1] 1
a[2] 2
a[3] 3
a[4] 4
p[1] 3
p[2]3
p[3]1
p[4]1
q[0]2
q[1]3
q[2]1
q[3]1
q[4]1
The Optimal Binary Search Tree For the Given Node Is…
The Root of this OBST is ::2
The Cost of this OBST is::32
NODE LEFT CHILD RIGHT CHILD
213
1
34
4

Conclusion:- Successfully implemented Optimal Binary search operations


Frequently Asked Questions:
1. What is an optimal binary search tree? What is its use?
2. Enlist the names of static tree tables with suitable examples.
3. What is a symbol table? What are operations on the symbol table? Give complete
specification of symbol table ADT.
4. Explain static and dynamic tree tables.
5. Write the algorithm to build the OBST.

Experiment No. 9

⮚ Title: Implementation of a Height Balance Tree Data Structure


⮚ Problem Statement: A Dictionary stores keywords and its meanings. Provide facility for adding
new keywords, deleting keywords, updating values of any entry. Provide a facility to display
whole data sorted in ascending/ Descending order. Also find how many maximum
comparisons may require for finding any keyword. Use Height balance tree and find the
complexity for finding a keyword.

⮚ Aim: To make the use of Height Balanced tree data structure to build the dictionary which
stores keywords and its meaning.

⮚ Outcome: At end of this experiment, student will be able to illustrate the use of height balance
tree implementation.

⮚ Hardware Requirement:
PC/Laptop

⮚ Software Requirement:
G++/GCC.

⮚ Theory:
Height Balance Tree:
A balanced binary tree is also known as height balanced tree. It is defined as binary tree in when
the difference between the height of the left subtree and right subtree is not more than m, where m
is usually equal to 1. The height of a tree is the number of edges on the longest path between the
root node and the leaf node.
Height-balancing requirement:

A node in a tree is height-balanced if the heights of its subtrees differ by no more than 1.
(That is, if the subtrees have heights h1 and h2, then |h1 − h2| ≤ 1.)

A tree is height-balanced if all of its nodes are height-balanced. (An empty tree is height-balanced by
definition.)
Operations on an AVL tree:]
Insertion operation on AVL Tree:
Insertion in AVL tree is performed in the same way as it is performed in a binary search tree. The new
node is added into AVL tree as the leaf node. However, it may lead to violation in the AVL tree
property and therefore the tree may need balancing.
S
Rotation Description
N

1 LL The new node is inserted to the left sub-tree of left sub-tree of critical node.
Rotation

2 RR The new node is inserted to the right sub-tree of the right sub-tree of the
Rotation critical node.

3 LR The new node is inserted to the right sub-tree of the left sub-tree of the
Rotation critical node.

4 RL The new node is inserted to the left sub-tree of the right sub-tree of the
Rotation critical node.

Left Rotation
If a tree becomes unbalanced, when a node is inserted into the right subtree of the right
subtree, then we perform a single left rotation −
Right Rotation
AVL tree may become unbalanced, if a node is inserted in the left subtree of the left subtree. The tree
then needs a right rotation.

Left-
right Rotation

Right-left Rotation:

Input:-
Keywords and Its meaning.

Output:-

Height Balanced Tree

Conclusion:- Thus learned and implemented Hight Balanced Tree.


Frequently Asked Questions:
1. What is an AVL Tree?
2. Explain all rotations on AVL tree insertion.
3. explain the search operation on AVL Tree.
4. Calculate the time complexity of AVL tree.

5. Explain the deletion operation on AVL Tree with suitable examples.

Experiment No. 10

Title: Implementation of a Priority Queue as ADT.


⮚ Problem Statement: Consider a scenario for Hospital to cater services to different kinds of
patients as Serious (top priority), b) non-serious (medium priority), c) General Check-up
(Least priority). Implement the priority queue to cater services to the patients.

⮚ Aim: Implement priority queue as ADT using single linked list for servicing patients in a
hospital with priorities as i) Serious (top priority) ii) medium illness (medium priority) iii)
General (Least priority).

Objective:
● To understand the concept of priority Queue.
● How data structures Queue is represented as an ADT.

Outcome: At the end of this experiment, students will be able to illustrate the use of Priority Queue as ADT.
⮚ Hardware Requirement:
PC/Laptop

⮚ Software Requirement:
G++/GCC.

⮚ Theory:
Queue Data Structure:

A Queue is a linear structure which follows a particular order in which the operations are performed.
The order is First In First Out (FIFO). A good example of a queue is any queue of consumers for a
resource where the consumer that came first is served first. The difference between stacks and queues
is in removing. In a stack we remove the item the most recently added; in a queue, we remove the item
the least recently added.

Priority Queue
Priority Queue is an abstract data type, which is similar to a queue, however, in the priority queue,
every element has some priority. The priority of the elements in a priority queue determines the order
in which elements are removed from the priority queue. Therefore all the elements are either arranged
in an ascending or descending order.

ALGORITHM: Define structure for Queue (Priority, Patient Info, Next Pointer).

Empty Queue:

Return true if Queue is Empty else False.

isEmpty (Front) //Front is pointer of structure, which is first element of Queue.

Step 1: If Front = = NULL

Step 2: Return 1

Step 3: Return 0

Insert Function:
Insert Patient in Queue with respect to the Priority.

Front is pointer variable of type Queue, which is 1st node of Queue.

Patient is a pointer variable of type Queue, which hold the information about new patient.

Insert (Front, Queue)

Step 1: If Front = = NULL //Queue Empty Then Front = Patient;

Step 2: Else if Patient->Priority > Front->Priority Then

i) Patient->Next = Front;
ii) Front=Patient;

Step 3: Else

A) Temp=Front;
B) Do Steps a while Temp! = NULL and Patient->Priority <= Temp->Next->Priority

a) Temp=Temp->Next;

c) Patient->Next = Temp->Next; Temp->Next = Patient;

Step 4: Stop

Delete Patient details from Queue after patient get treatment:

Front is pointer variable of type Queue, which is 1st node of Queue.

Delete Node from Front. Delete (Front)

Step 1: Temp = Front;

Step 2: Front = Front->Next;

Step 3: return Temp

Display Queue Front:

Front is pointer variable of type Queue, which is 1st node of Queue.

Display (Front)

Step 1: Temp = Front;


Step 2: Do Steps while Temp! = NULL

a) Display Temp Data

b) If Priority 1 Then ¯General Checkupǁ;

Else

If Priority 2 Then Display ¯ Non-serious";

Else

If Priority 3 Then Display "Serious" Else Display "Unknown";

c) Temp = Temp->Next;

Step 3: Stop

Input:-

Name of patients & category of patient like

a) Serious (top priority), b) non-serious (medium priority), c) General Checkup (Least priority).

Output:-
Enter Your Choice:-

1 for Insert the Data in Queue

2 for show the Data in Queue

3 for Delete the data from the Queue

0 for Exit

Enter the number of patinent

Enter your name of the patient : a

Enter your Prioritys (0: serious, 1: non-serious, 2: genral checkup) : 1

Enter your name of the patient : b


Enter your Prioritys (0: serious, 1: non-serious, 2: genral checkup) : 2

Enter your name of the patient : c

Enter your Prioritys (0: serious, 1: non-serious, 2: genral checkup) : 0

1 for Insert the Data in Queue

2 for show the Data in Queue

3 for Delete the data from the Queue

0 for Exit

Patient's Name - c Priority - 'Serious'

Patient's Name - a Priority - 'Non-serious'

Patient's Name - b Priority - 'Checkup'

1 for Insert the Data in Queue

2 for show the Data in Queue

3 for Delete the data from the Queue

0 for Exit

3
deleted Element =c

Its Priority = 10

1 for Insert the Data in Queue

2 for show the Data in Queue

3 for Delete the data from the Queue

0 for Exit

Patient's Name - a Priority - 'Non-serious'


Patient's Name - b Priority - 'Checkup'

1 for Insert the Data in Queue

2 for show the Data in Queue

3 for Delete the data from the Queue

0 for Exit

Conclusion: - Successfully implemented of ADT priority queue.

Frequently Asked Questions:


1. What is the Priority Queue?
2. Explain the different queue operations.
3. What is queue data structure?
4. Explain the priority queue implementation using a linked list.

Experiment No. 11
Title: Implementation of sequential file organization concept using cpp .
Problem Statement: Department maintains student information. The file contains roll number,
name, division and address. Allow users to add, delete information about students. Display
information of a particular employee. If the record of the student does not exist an
appropriate message is displayed. If it is, then the system displays the student details. Use a
sequential file to maintain the data.

⮚ Aim: Implement file organization concept to maintain the student information record.
Objective:
● To understand the concept of sequential file organization.
● to understand the file handling concepts.

Outcome: At the end of this experiment, students will be able to illustrate the use of sequential files to maintain
the data.
⮚ Hardware Requirement:
PC/Laptop

⮚ Software Requirement:
G++/GCC.

⮚ Theory:
File – A file is a collection of related information that is recorded on secondary storage such as
magnetic disks, magnetic tables and optical disks.
File Organization:
File Organization refers to the logical relationships among various records that constitute the file,
particularly with respect to the means of identification and access to any specific record. In
simple terms, Storing the files in certain order is called file Organization. File Structure refers to
the format of the label and data blocks and of any logical control record.
Types of file organization:
● Sequential File Organization
● Heap File Organization
● Hash File Organization
● B+ Tree File Organization
● Clustered File Organization

Sequential File Organization:

● The easiest method for file organization is sequential method.


● In this method the files are stored one after another in a sequential manner. ● The records are
arranged in the ascending or descending order of a key field. ● Sequential file search starts from
the beginning of the file and the records can be added at the end of file.
● In a sequential file it is not possible to add a record in the middle of the file without rewriting the
file.

There are two ways to implement this method:


● Pile File Method
– This method is quite simple, in which we store the records in a
sequence i.e one after another in the order in which they are inserted into the
tables.
1) Insertion of new record – Let the R1, R3 and so on upto R5 and R4 be four records in the
sequence. Here, records are nothing but a row in any table. Suppose a new record R2 has
to be inserted in the sequence, then it is simply placed at the end of the file.

2) Sorted File Method –In this method, As the name itself suggests, whenever a new record has
to be inserted, it is always inserted in a sorted (ascending or descending) manner.
Sorting of records may be based on any primary key or any other key.

Conclusion: - Successfully implemented of queue and its operation.

Frequently Asked Questions:


1. 1. Explain the various modes of opening the file in C or C++?
2. Explain any three operations on sequential file organization with example. 3.
What are the advantages and disadvantages of sequential access file organization. 4.
write the applications of sequential file organization.
Experiment No. 12

Title: External Sorting


Problem Statement: Assume we have two input and two output tapes to perform the sorting.
The internal memory can hold and sort m records at a time. Write a program in java for
external sorting. Find out time complexity.
⮚ Aim: to learn the external sorting concept.
Objective:
● To understand the concept and basics of Sorting, internal and external memory handling.

● To analyze the time complexity of sorting algorithms.

Outcome: To implement the concept and basics of Sorting, internal and external memory
handling.

⮚ Hardware Requirement:
PC/Laptop

⮚ Software Requirement:
G++/GCC.

⮚ Theory:
External Sorting:
External Sorting is sorting the lists that are so large that the whole list cannot be contained in the
internal memory of a computer.

Assume that the list(or file) to be sorted resides on a disk. The term block refers to the unit of
data that is read form or written to a disk at one time. A block generally consists of several
records. For a disk, there are three factors contributing to read/write time:

(i) Seek time:


time taken to position the read/write heads to the correct cylinder. This will depend
on the number of cylinders across which the heads have to move.

(ii) Latency time: time until the right sector of the track is under the read/write head.
(iii) Transmission time: time to transmit the block of data to/from the disk.

Example of External Sorting: 2-way Merge Sort, k-way Merge Sort, Polyphase Merge sort etc.
External Sorting with Disks:
The most popular method for sorting on external storage devices is merge

sort. This method consists of essentially two distinct phases.

First Phase(Run Generation Phase): Segments of the input file are sorted using a good internal sort
method. These sorted segments, known as runs, are written out onto external storage as they are
generated.

Second Phase(Merge Phase):The runs generated in phase one are merged together following the
merge tree pattern , until only one run is left.

2-way Merging/ Basic External Sorting Algorithm


Assume unsorted data is on disk at start.

Let M=maximum number of records that can be sorted & sorted in internal memory at one

time. Algorithm:
Repeat

1. Read M records into main memory & sort internally.

2. Write this sorted sub-list into disk. (This is one “run”).

Until data is processed into runs.

Repeat

1. Merge two runs into one sorted run twice as long.

2. Write this single run back onto disk

Until all runs processed into runs twice as long

Merge runs again as often as needed until only one large run: The sorted list.

Pseudo Code for merging of M records:


open N input files and one output file

(* initialize buffers *)

loop from i = 1 to N

if
end_of_file (file i)
then buffer[i] <- invalid_key else buffer[i] <- first record of

file i;

(* merge *)

stop <- false

repeat

s <- index of smallest buffer element


if buffer[s] = invalid_key

then stop <- true else write buffer[s]

if end_of_file (file s)

then buffer[s] <- invalid_key

else buffer[s] <- next record from file s

until stop = true

close files

Measuring Performance: Time Complexity:

The size of a run and number of initial runs (nr) is dictated by the number of file blocks (b) and
the available buffer space (nb)

• Nr = b/nb

• E.g b = 1024 blocks and nb = 5 blocks then 205 initial runs will be needed The degree of

merging (dm) is the number of runs that can be merged together in each pass. • One buffer

block is needed to hold one block from each run being merged • One buffer block is needed

for containing one block of merged result

• Dm is the smaller of (nb – 1) and nr


Number of passes = logdm(nr)

Analysis :This algorithm requires log(N/M) passes with initial run pass. Therefore, at each pass the
N records are processed and at last we will get a time complexity as O(N log(N/M).

Conclusion:

Hence we have learned about theExternal Sorting.

Frequently Asked Questions:


1. What is the external sorting.
2. Time complexity of external sorting
3. Differentiate between internal sorting and external sorting.
1.

You might also like