Data Structures (MCA)
An Abstract Data Type (ADT) is a type (or class) for objects whose behavior is defined by a set of values
and a set of operations. The definition of an ADT only mentions what operations are to be performed, not
how these operations will be implemented. It does not specify how data will be organized in memory or
what algorithms will be used for implementing the operations. It is called "abstract" because it gives an
implementation-independent view.
The process of providing only the essentials and hiding the details is known as abstraction.
The user of a data type does not need to know how that data type is implemented. For example, we have
been using primitive data types like int, float, and char with only the knowledge of the operations they
support, and without any idea of how they are implemented.
1. List ADT
View of a list
• The data is generally stored in key sequence in a list, which has a head structure consisting of a
count, pointers, and the address of the compare function needed to compare the data in the list.
• Each data node contains a pointer to a data structure and a self-referential pointer which points to the
next node in the list.
• The List ADT functions are given below:
• get() – Return an element from the list at any given position.
• insert() – Insert an element at any position of the list.
• remove() – Remove the first occurrence of any element from a non-empty list.
• removeAt() – Remove the element at a specified location from a non-empty list.
• replace() – Replace an element at any position by another element.
• size() – Return the number of elements in the list.
• isEmpty() – Return true if the list is empty, otherwise return false.
• isFull() – Return true if the list is full, otherwise return false.
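As a concrete illustration, here is a minimal Python sketch of such a List ADT backed by a built-in list; the class name and the fixed capacity used by isFull() are assumptions for illustration:

```python
class ListADT:
    """Minimal array-backed List ADT (illustrative sketch)."""

    def __init__(self, capacity=10):
        self._items = []
        self._capacity = capacity  # assumed fixed capacity so isFull() is meaningful

    def get(self, position):
        return self._items[position]

    def insert(self, position, element):
        if self.isFull():
            raise OverflowError("list is full")
        self._items.insert(position, element)

    def remove(self, element):
        self._items.remove(element)   # removes the first occurrence

    def removeAt(self, position):
        return self._items.pop(position)

    def replace(self, position, element):
        self._items[position] = element

    def size(self):
        return len(self._items)

    def isEmpty(self):
        return len(self._items) == 0

    def isFull(self):
        return len(self._items) >= self._capacity
```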
2. Stack ADT
View of a stack
• In a stack ADT implementation, instead of storing data in each node, a pointer to the data is stored.
• The program allocates memory for the data, and its address is passed to the stack ADT.
• The head node and the data nodes are encapsulated in the ADT. The calling function can only see the
pointer to the stack.
• The stack head structure also contains a pointer to the top and a count of the number of entries
currently in the stack.
• push() – Insert an element at one end of the stack called top.
• pop() – Remove and return the element at the top of the stack, if it is not empty.
• peek() – Return the element at the top of the stack without removing it, if the stack is not empty.
• size() – Return the number of elements in the stack.
• isEmpty() – Return true if the stack is empty, otherwise return false.
• isFull() – Return true if the stack is full, otherwise return false.
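A matching Python sketch of the Stack ADT (again, the class name and capacity are illustrative assumptions):

```python
class StackADT:
    """Minimal array-backed stack; the top is the end of the list."""

    def __init__(self, capacity=10):
        self._items = []
        self._capacity = capacity

    def push(self, element):
        if self.isFull():
            raise OverflowError("stack is full")
        self._items.append(element)

    def pop(self):
        if self.isEmpty():
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self):
        if self.isEmpty():
            raise IndexError("peek on empty stack")
        return self._items[-1]

    def size(self):
        return len(self._items)

    def isEmpty(self):
        return len(self._items) == 0

    def isFull(self):
        return len(self._items) >= self._capacity
```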
3. Queue ADT
View of a queue
• The queue abstract data type (ADT) follows the basic design of the stack abstract data type.
• Each node contains a void pointer to the data and the link pointer to the next element in the queue. The
program’s responsibility is to allocate memory for storing the data.
• enqueue() – Insert an element at the end of the queue.
• dequeue() – Remove and return the first element of the queue, if the queue is not empty.
• peek() – Return the element of the queue without removing it, if the queue is not empty.
• size() – Return the number of elements in the queue.
• isEmpty() – Return true if the queue is empty, otherwise return false.
• isFull() – Return true if the queue is full, otherwise return false.
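And a corresponding Python sketch of the Queue ADT (names and capacity are again assumptions for illustration):

```python
class QueueADT:
    """Minimal array-backed FIFO queue; the front is index 0."""

    def __init__(self, capacity=10):
        self._items = []
        self._capacity = capacity

    def enqueue(self, element):
        if self.isFull():
            raise OverflowError("queue is full")
        self._items.append(element)   # insert at the rear

    def dequeue(self):
        if self.isEmpty():
            raise IndexError("dequeue from empty queue")
        return self._items.pop(0)     # remove from the front (O(n) shift)

    def peek(self):
        if self.isEmpty():
            raise IndexError("peek on empty queue")
        return self._items[0]

    def size(self):
        return len(self._items)

    def isEmpty(self):
        return len(self._items) == 0

    def isFull(self):
        return len(self._items) >= self._capacity
```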
Advantages:
• Encapsulation: ADTs provide a way to encapsulate data and operations into a single unit, making it
easier to manage and modify the data structure.
• Abstraction: ADTs allow users to work with data structures without having to know the
implementation details, which can simplify programming and reduce errors.
• Data Structure Independence: ADTs can be implemented using different data structures, which can
make it easier to adapt to changing needs and requirements.
• Information Hiding: ADTs can protect the integrity of data by controlling access and preventing
unauthorized modifications.
• Modularity: ADTs can be combined with other ADTs to form more complex data structures, which
can increase flexibility and modularity in programming.
Disadvantages:
• Overhead: Implementing ADTs can add overhead in terms of memory and processing, which can
affect performance.
• Complexity: ADTs can be complex to implement, especially for large and complex data structures.
• Learning Curve: Using ADTs requires knowledge of their implementation and usage, which can take
time and effort to learn.
• Limited Flexibility: Some ADTs may be limited in their functionality or may not be suitable for all
types of data structures.
• Cost: Implementing ADTs may require additional resources and investment, which can increase the
cost of development.
Iterators in Python
An iterator in Python is an object that is used to iterate over iterable objects like lists, tuples, dicts, and
sets. The Python iterators object is initialized using the iter() method. It uses the next() method for
iteration.
1. iter (): The iter() method is called for the initialization of an iterator. This returns an iterator
object
2. next (): The next method returns the next value for the iterable. When we use a for loop to traverse
any iterable object, internally it uses the iter() method to get an iterator object, which further uses the
next() method to iterate over. This method raises a StopIteration to signal the end of the iteration
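A short example showing both functions in action:

```python
numbers = [10, 20, 30]

it = iter(numbers)   # get an iterator object from the iterable
print(next(it))      # 10
print(next(it))      # 20
print(next(it))      # 30
# one more next(it) would raise StopIteration
```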
Array Representation
Arrays are represented as a collection of multiple containers where each container stores one element. These
containers are indexed from 0 to n-1, where n is the size of the array.
The following points are important to consider:
To create an array in Python, import the array module and use its array() function. We can create arrays
of three basic kinds, namely integer, float, and Unicode characters, using this function.
The array() function accepts a typecode and an initializer as parameters and returns an object of the array
class.
Syntax
# importing
import array as array_name
# creating array
obj = array_name.array(typecode[, initializer])
Where,
• typecode − The typecode character used to specify the type of elements in the array.
• initializer − An optional value from which the array is initialized. It must be a list, a bytes-like object,
or an iterable of elements of the appropriate type.
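For example, creating a small integer array with the 'i' typecode:

```python
import array

# 'i' is the typecode for signed int; the list is the initializer
arr = array.array('i', [10, 20, 30])
print(arr[0])   # 10
arr.append(40)
print(arr)      # array('i', [10, 20, 30, 40])
```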
• Python provides powerful data structures called lists, which can store and manipulate collections of
elements. It also provides many ways to create 2-dimensional lists/arrays.
• However, one must know the differences between these ways, because they can create complications
in code that can be very difficult to trace. This section explores the right way to use 2D
arrays/lists in Python.
• Using 2D arrays/lists the right way involves understanding the structure, accessing elements, and
efficiently manipulating data in a two-dimensional grid. When working with structured data or grids,
2D arrays or lists can be useful. A 2D array is essentially a list of lists, which represents a table-like
structure with rows and columns.
Creating a 1D list
• In Python, initializing a collection of elements in a linear sequence requires creating a 1D array,
which is a fundamental operation.
• Although Python does not have a built-in data structure called a '1D array', we can use a list to
achieve the same functionality.
• Python lists are dynamic and versatile, making them an excellent choice for representing 1D arrays.
Let's start by looking at common ways of creating a 1D array of size N initialized with 0s.
Creating 1D List using Naive Methods
Manually initializing and populating a list without using any advanced features or constructs in Python is
known as creating a 1D list using "naive methods", as sketched below.
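A minimal sketch of the naive approach, alongside the usual repetition idiom:

```python
N = 5

# Naive method: build the list element by element
arr = []
for _ in range(N):
    arr.append(0)
print(arr)    # [0, 0, 0, 0, 0]

# Equivalent one-liner using list repetition
arr2 = [0] * N
print(arr2)   # [0, 0, 0, 0, 0]
```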
Characteristics of a 2D array:
• Structure: Defined by dimensions m × n, where m is the number of rows and n is the number of columns.
• Indexing: Elements are accessed using two indices, typically starting from 0 (e.g., array[row][column]).
• Homogeneous Data: Typically stores elements of the same data type.
Example in Python:
# Creating a 3x3 matrix
matrix = [
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
]
# Accessing elements
print(matrix[0][1]) # Output: 2
print(matrix[2][0]) # Output: 7
# Modifying elements
matrix[1][1] = 10
print(matrix)
The Matrix Abstract Data Type (Matrix ADT) defines a set of operations that can be performed on a
matrix, abstracting away the underlying representation. This allows for manipulation of matrices
without needing to manage their internal storage details.
The following sections cover sets, maps, and multi-dimensional arrays, including their definitions,
characteristics, and examples.
Sets
A set is an abstract data type that represents a collection of unique elements. Sets are often used
to perform operations such as union, intersection, and difference.
# Characteristics:
Unique Elements: A set stores each element at most once; duplicates are ignored.
Dynamic Size: The size of a set can change as elements are added or removed.
# Common Operations:
Union: Combine the elements of two sets.
Intersection: Get elements common to both sets.
Difference: Get elements that are in one set but not in another.
# Example in Python:
```python
my_set = {1, 2, 3}
# Set operations
set_a = {1, 2, 3}
set_b = {3, 4, 5}
print(set_a | set_b)  # union: {1, 2, 3, 4, 5}
print(set_a & set_b)  # intersection: {3}
print(set_a - set_b)  # difference: {1, 2}
```
Maps
A map (also known as a dictionary or associative array) is an abstract data type that consists of
key-value pairs. It allows for fast retrieval of values based on their keys.
# Characteristics:
Unique Keys: Keys must be unique; duplicate keys are not allowed.
# Common Operations:
Insert/Update: Associate a value with a key.
Lookup: Retrieve the value stored for a given key.
Delete: Remove a key and its associated value.
# Example in Python:
```python
my_map = {'a': 1, 'b': 2, 'c': 3}
print(my_map['b'])  # Output: 2
for key, value in my_map.items():
    print(f"{key}: {value}")
```
Multi-Dimensional Arrays
A multi-dimensional array is an array that contains arrays as its elements, allowing for the
representation of data in more than one dimension. The most common example is a
two-dimensional array (matrix).
# Characteristics:
Access via Multiple Indices: Elements are accessed using multiple indices (e.g., `array[i][j]` for
a 2D array).
Example in Python:
```python
matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
# Accessing elements
print(matrix[0][1])  # Output: 2
# Modifying elements
matrix[0][1] = 10
print(matrix)
```
Unit 2
Algorithm Analysis
The efficiency of an algorithm can be analyzed at two different stages: before implementation (a priori,
i.e., theoretical analysis) and after implementation (a posteriori, i.e., empirical analysis).
Algorithm Complexity
Suppose X is an algorithm and n is the size of input data, the time and space used by the algorithm
X are the two main factors, which decide the efficiency of X.
• Time Factor − Time is measured by counting the number of key operations such as
comparisons in the sorting algorithm.
• Space Factor − Space is measured by counting the maximum memory space required by the
algorithm.
The complexity of an algorithm f(n) gives the running time and/or the storage space required by the
algorithm in terms of n as the size of input data.
Space Complexity
Space complexity of an algorithm represents the amount of memory space required by the
algorithm in its life cycle. The space required by an algorithm is equal to the sum of the
following two components −
• A fixed part that is a space required to store certain data and variables, that are independent of
the size of the problem. For example, simple variables and constants used, program size, etc.
• A variable part is a space required by variables, whose size depends on the size of the problem.
For example, dynamic memory allocation, recursion stack space, etc.
Space complexity S(P) of any algorithm P is S(P) = C + S_P(I), where C is the fixed part and S_P(I) is
the variable part of the algorithm, which depends on instance characteristic I. Following is a
simple example that illustrates the concept −
Algorithm: SUM(A, B)
Step 1 − START
Step 2 − C ← A + B + 10
Step 3 − Stop
Here we have three variables (A, B, and C) and one constant (the value 10). Hence S(P) = 3 + 1 = 4.
Now, the actual space depends on the data types of the given variables and constants, and it will be
multiplied accordingly.
Time Complexity
Time complexity of an algorithm represents the amount of time required by the algorithm to run to
completion. Time requirements can be defined as a numerical function T(n), where T(n) can be
measured as the number of steps, provided each step consumes constant time.
For example, addition of two nbit integers takes n steps. Consequently, the total computational time
is T(n) = c ∗ n, where c is the time taken for the addition of two bits. Here, we observe that T(n)
grows linearly as the input size increases.
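As a quick added illustration, summing a list performs one addition per element, so the step count grows linearly with n:

```python
def total(values):
    s = 0              # 1 step
    for v in values:   # loop body runs n times
        s += v         # 1 addition per element -> n steps
    return s           # 1 step

# T(n) = n + 2, so T(n) grows linearly with the input size n
print(total([1, 2, 3, 4]))  # 10
```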
Asymptotic Analysis
The efficiency and accuracy of algorithms have to be analyzed to compare them and choose a
specific algorithm for certain scenarios. The process of making this analysis is called
Asymptotic analysis. It refers to computing the running time of any operation in mathematical
units of computation.
The commonly used asymptotic notations for describing the running-time complexity of an algorithm are:
• Ο Notation
• Ω Notation
• θ Notation
Big Oh Notation, Ο
The notation Ο(n) is the formal way to express the upper bound of an algorithm's running time. It
measures the worst case time complexity or the longest amount of time an algorithm can
possibly take to complete.
Ο(f(n)) = { g(n) : there exist positive constants c and n0 such that g(n) ≤ c·f(n) for all n > n0 }
Omega Notation, Ω
The notation Ω(n) is the formal way to express the lower bound of an algorithm's running time. It
measures the best case time complexity or the best amount of time an algorithm can possibly
take to complete.
Ω(f(n)) = { g(n) : there exist positive constants c and n0 such that g(n) ≥ c·f(n) for all n > n0 }
Theta Notation, θ
The notation θ(n) is the formal way to express both the lower bound and the upper bound of an
algorithm's running time. It is represented as follows −
θ(f(n)) = { g(n) : g(n) = Ο(f(n)) and g(n) = Ω(f(n)) }
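As a worked example (added for illustration), take g(n) = 3n + 2. Since 3n + 2 ≤ 4n for all n ≥ 2, we have g(n) = Ο(n) with c = 4 and n0 = 2; since 3n + 2 ≥ 3n for all n ≥ 1, we also have g(n) = Ω(n) with c = 3 and n0 = 1. Because both bounds hold, g(n) = θ(n).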
Common growth rates, from slowest to fastest:
• constant − Ο(1)
• logarithmic − Ο(log n)
• linear − Ο(n)
• n log n − Ο(n log n)
• quadratic − Ο(n²)
• cubic − Ο(n³)
• polynomial − n^Ο(1)
• exponential − 2^Ο(n)
Algorithm Classes
Algorithms are unambiguous steps which should give us a well-defined output by processing zero or
more inputs. This leads to many approaches in designing and writing algorithms. It has been
observed that most algorithms can be classified into the following categories.
Greedy Algorithms
Greedy algorithms try to find a localized optimum solution, which may eventually lead to a globally
optimized solution.
Greedy algorithms look for an easy solution at that point in time without considering how it impacts
future steps. This is similar to how humans solve problems without going through the complete
details of the inputs provided.
Many networking algorithms use the greedy approach; a small illustrative sketch of a greedy choice follows.
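As a hedged added example, the classic change-making heuristic makes a greedy choice at each step (take the largest coin that fits), which happens to be optimal for canonical coin systems such as {25, 10, 5, 1}, though not for arbitrary ones:

```python
def greedy_coin_change(amount, coins=(25, 10, 5, 1)):
    """Greedy change-making: repeatedly take the largest coin that fits."""
    result = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:   # locally optimal choice at each step
            amount -= coin
            result.append(coin)
    return result

print(greedy_coin_change(63))  # [25, 25, 10, 1, 1, 1]
```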
Divide and Conquer
This class of algorithms involves dividing the given problem into smaller subproblems and then
solving each of the subproblems independently. When the problem cannot be further subdivided,
we start merging the solutions to each of the subproblems to arrive at the solution for the
bigger problem.
In the divide and conquer approach, the problem at hand is divided into smaller subproblems, and then
each subproblem is solved independently.
When we keep on dividing the subproblems into even smaller subproblems, we may eventually
reach a stage where no more division is possible.
Those "atomic" smallest possible subproblems (fractions) are solved. The solutions of all
subproblems are finally merged in order to obtain the solution of the original problem.
Broadly, we can understand the divide-and-conquer approach as a three-step process.
Divide/Break
This step involves breaking the problem into smaller subproblems. Subproblems should represent a
part of the original problem. This step generally takes a recursive approach to divide the
problem until no subproblem is further divisible. At this stage, subproblems become atomic in
nature but still represent some part of the actual problem.
Conquer/Solve
This step receives a lot of smaller subproblems to be solved. Generally, at this level, the problems
are considered 'solved' on their own.
Merge/Combine
When the smaller subproblems are solved, this stage recursively combines them until they
formulate a solution of the original problem. This algorithmic approach works recursively, and
the conquer and merge steps work so closely that they appear as one.
Examples
• Merge Sort
• Quick Sort
• Kruskal's Minimal Spanning Tree Algorithm
• Binary Search
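To make the three steps concrete, here is a hedged sketch of recursive binary search, one of the divide-and-conquer examples listed above (it assumes a sorted input list):

```python
def binary_search(arr, target, lo=0, hi=None):
    """Recursive binary search over a sorted list; returns an index or -1."""
    if hi is None:
        hi = len(arr) - 1
    if lo > hi:                # atomic case: empty range, target not found
        return -1
    mid = (lo + hi) // 2       # divide: split the range in half
    if arr[mid] == target:
        return mid
    elif target < arr[mid]:    # conquer: search only the left half
        return binary_search(arr, target, lo, mid - 1)
    else:                      # conquer: search only the right half
        return binary_search(arr, target, mid + 1, hi)

print(binary_search([1, 3, 5, 7, 9, 11], 7))  # 3
```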
Dynamic Programming
Dynamic programming involves dividing the bigger problem into smaller ones but, unlike divide
and conquer, it does not solve each subproblem independently: overlapping subproblems are
solved once and their results are reused.
These algorithms are used for optimization. Dynamic algorithms are motivated by an overall
optimization of the problem rather than local optimization.
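A brief added sketch: memoizing Fibonacci stores each subproblem's result so that it is computed only once, which is the key difference from plain divide and conquer:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    """nth Fibonacci number; memoization solves each subproblem once."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(30))  # 832040
```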
Recursion
Recursion allows a function to call itself. Fixed steps of code get executed again and again for new
values.
What is a Recursive Algorithm?
A recursive algorithm calls itself with smaller input values and returns the result for the current
input by carrying out basic operations on the returned value for the smaller input. Generally, if a
problem can be solved by applying solutions to smaller versions of the same problem, and the
smaller versions shrink to readily solvable instances, then the problem can be solved using a
recursive algorithm.
To build a recursive algorithm, you will break the given problem statement into two parts. The first
one is the base case, and the second one is the recursive step.
• Base Case: It is nothing more than the simplest instance of a problem, consisting of a condition
that terminates the recursive function. This base case evaluates the result when a given
condition is met.
• Recursive Step: It computes the result by making recursive calls to the same function, but with
the inputs decreased in size or complexity.
For example, consider this problem statement: print the sum of the first n natural numbers using
recursion. This statement clarifies that we need to formulate a function that calculates the summation
of all natural numbers in the range 1 to n. Hence, mathematically, you can represent the function as:
f(n) = 1 if n = 1, and f(n) = n + f(n − 1) if n > 1.
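A direct Python translation of this definition (an illustrative sketch):

```python
def sum_n(n):
    if n == 1:                 # base case: simplest instance
        return 1
    return n + sum_n(n - 1)    # recursive step: shrink the input

print(sum_n(5))  # 15
```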
There are four different types of recursive algorithms; we will look at them one by one.
• Direct Recursion
A function is called direct recursive if it calls itself in its function body repeatedly. To better
understand this definition, look at the structure of the direct recursive sketch below.
In such a program, a method named fun calls itself again in its own body; thus, it is direct recursive.
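A minimal hedged sketch of direct recursion (the name fun follows the description above):

```python
def fun(n):
    if n <= 0:      # base case stops the recursion
        return
    print(n)
    fun(n - 1)      # fun calls itself directly

fun(3)  # prints 3, 2, 1
```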
• Indirect Recursion
The recursion in which a function calls itself via another function is called indirect recursion.
Consider the indirect recursive structure sketched below.
Here the function fun1 explicitly calls fun2, which invokes fun1 again. Hence, this is an
example of indirect recursion.
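A minimal hedged sketch using the fun1/fun2 names from the description:

```python
def fun1(n):
    if n <= 0:
        return
    print("fun1:", n)
    fun2(n - 1)       # fun1 calls fun2 ...

def fun2(n):
    if n <= 0:
        return
    print("fun2:", n)
    fun1(n - 1)       # ... and fun2 invokes fun1 again

fun1(3)
```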
• Tail Recursion
A recursive function is said to be tail-recursive if the recursive call is the last execution done by the
function. Let's try to understand this definition with the help of the example below.
In the sketch, the last statement executed by the method fun is the recursive call. Because of that,
there is no need to remember any previous state of the program.
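A hedged sketch of a tail-recursive countdown:

```python
def fun(n):
    if n <= 0:
        return
    print(n)
    return fun(n - 1)   # the recursive call is the very last action

fun(3)  # prints 3, 2, 1
```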
• Non-Tail Recursion
A recursive function is said to be non-tail-recursive if the recursive call is not the last thing done by
the function; after returning, there is something left to evaluate. Consider the example below.
In such a function, there is another operation after the recursive call, so the previous state has to be
remembered inside the method block. That is why this program is considered non-tail recursive.
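A hedged sketch of non-tail recursion, where work remains after the recursive call returns:

```python
def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n - 1)   # the multiplication happens AFTER the call returns

print(factorial(5))  # 120
```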
Now consider what happens in memory when such recursive programs run.
Each recursive call creates a new copy of the function's frame in stack memory. Once the call
returns its result, that frame is deleted from storage. Each recursive call maintains a separate
stack frame, because all parameters and other variables defined inside the function are kept on
the stack. The frame is deleted after the value from the relevant call is returned.
Recursion is quite complicated in terms of resolving and monitoring the values at each recursive
call. As a result, you have to maintain the stack and track the values of the variables specified in
it. To better understand the memory allocation of recursive functions, examine the following
example.
Now, consider the recursive Fibonacci computation for n = 5 (a sketch is given below). All stack
frames are preserved, each holding its own value of n, until n reaches a base case. When the
termination condition is reached, the frames are destroyed one at a time as each call returns its
result to its caller, unwinding the call-stack hierarchy.
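A hedged Python sketch of this recursive Fibonacci computation:

```python
def fib(n):
    if n <= 1:                         # termination condition: base cases 0 and 1
        return n
    return fib(n - 1) + fib(n - 2)     # each call spawns two further stack frames

print(fib(5))  # 5
```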
Analysis of Recursive Algorithms
A recursive function is a function that is defined in terms of itself. Similarly, an algorithm is
said to be recursive if the same algorithm is invoked in the body. An algorithm that calls
itself is direct recursive.
These recursive algorithms are extremely powerful, but they generally need more time and storage
than iterative algorithms (those built from iterative instructions such as for, while, and repeat),
because they use the call stack; in exchange, recursive algorithms are usually clearer than their
iterative counterparts.
When analyzing a recursive function for its step count (analyzing its running time), we often
obtain a recursive formula. Such recursive formulas are referred to as recurrence relations,
which are solved by the repeated (iterative) substitution method.
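As a worked illustration (an added example), consider a function that does constant work c and then recurses on an input one smaller:
T(n) = T(n − 1) + c, with T(0) = c.
Repeated substitution gives T(n) = T(n − 2) + 2c = T(n − 3) + 3c = … = T(0) + n·c = (n + 1)·c, so T(n) = O(n).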
Example: the total space complexity of a recursive function = (number of memory cells needed per
call) × (number of calls).
Each call to the Rsum function (a recursive array-summation routine) needs two cells: one for the
variable S and the other for the return address.
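A hedged Python sketch of such an Rsum routine (the name follows the text; the original code is not reproduced in these notes):

```python
def rsum(a, n):
    """Recursively sum the first n elements of list a."""
    if n <= 0:
        return 0
    return rsum(a, n - 1) + a[n - 1]

# For input size n there are n + 1 calls, so the recursion stack
# uses O(n) extra space in addition to the list itself.
print(rsum([1, 2, 3, 4], 4))  # 10
```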
We can extricate ourselves from the difficulties resulting from situations when the chosen parameters
are not adequate to determine the step count uniquely by defining three kinds of step counts: best
case, worst case, and average case.
1) The best-case step count is the minimum number of steps that can be executed for the given
parameters.
2) The worst-case step count is the maximum number of steps that can be executed for the given
parameters.
3) The average step count is the average number of steps executed on instances with the given
parameters.
In some instances, the above three cases are equivalent (there is only one formula) for the same value
of the instance characteristics, as in the array-summation example (where the instance characteristic
is n).
But in other instances, the instance characteristics alone are not enough to determine the time complexity,
as in the problem of searching for an element in an array (where the time complexity is affected by the
position of the required element in the array in addition to the instance characteristics).
UNIT – 3
STACK
A stack is a fundamental data structure in computer science that operates on a Last In, First Out
(LIFO) principle. This means that the most recently added item is the first one to be removed.
Here's a detailed overview of stacks.
Key Operations
1. Push: Adds an item to the top of the stack.
2. Pop: Removes and returns the item at the top of the stack.
3. Peek (or Top): Returns the item at the top of the stack without removing it.
4. isEmpty: Returns true if the stack contains no items, otherwise false.
5. Size: Returns the number of items in the stack (optional in some implementations).
Characteristics
• LIFO Principle: The last element added is the first to be removed.
• Dynamic Size: Stacks can grow and shrink dynamically as elements are added or
removed (in implementations like linked lists or resizable arrays).
• Fixed Size: In some cases, stacks have a fixed size, and attempts to push an item onto a
full stack or pop from an empty stack might result in errors.
Implementations
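The simplest implementation uses a Python list whose end serves as the top of the stack; a brief sketch:

```python
stack = []

stack.append('a')    # push
stack.append('b')    # push
print(stack[-1])     # peek -> 'b'
print(stack.pop())   # pop  -> 'b'
print(len(stack))    # size -> 1
```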
Applications of Stacks:
• Function calls: Stacks are used to keep track of the return addresses of function calls,
allowing the program to return to the correct location after a function has finished
executing.
• Recursion: Stacks are used to store the local variables and return addresses of recursive
function calls, allowing the program to keep track of the current state of the recursion.
• Expression evaluation: Stacks are used to evaluate expressions in postfix notation
(Reverse Polish Notation).
• Syntax parsing: Stacks are used to check the validity of syntax in programming languages
and other formal languages.
• Memory management: Stacks are used to allocate and manage memory in some operating
systems and programming languages.
Advantages of Stacks:
• Simplicity: Stacks are a simple and easy-to-understand data structure, making them suitable
for a wide range of applications.
• Efficiency: Push and pop operations on a stack can be performed in constant time (O(1)),
providing efficient access to data.
• Last-in, First-out (LIFO): Stacks follow the LIFO principle, ensuring that the last element
added to the stack is the first one removed. This behavior is useful in many scenarios, such
as function calls and expression evaluation.
• Limited memory usage: Stacks only need to store the elements that have been pushed
onto them, making them memory-efficient compared to other data structures.
Disadvantages of Stacks:
• Limited access: Elements in a stack can only be accessed from the top, making it difficult
to retrieve or modify elements in the middle of the stack.
• Potential for overflow: If more elements are pushed onto a stack than it can hold, an
overflow error will occur, resulting in a loss of data.
• Not suitable for random access: Stacks do not allow for random access to elements,
making them unsuitable for applications where elements need to be accessed in a specific
order.
• Limited capacity: Stacks have a fixed capacity, which can be a limitation if the number of
elements that need to be stored is unknown or highly variable.
QUEUE
A queue is a fundamental data structure that operates on a First In, First Out (FIFO) principle.
This means that the first item added to the queue is the first one to be removed. Here’s an
overview of queues:
• FIFO Principle: The first element added is the first to be removed, making it suitable for
scenarios where order of processing is important.
• Dynamic Size: Queues can grow and shrink dynamically as elements are added or
removed (in implementations like linked lists or resizable arrays).
• Fixed Size: Some implementations use a fixed-size array, which can lead to overflow
issues if the queue grows beyond its capacity.
Implementations
A queue can be implemented using arrays or linked lists. A double-ended variant, the deque
(described later in this unit), additionally allows insertion and deletion of elements from both
ends (front and rear) and provides more flexibility than a standard queue.
Applications
1. Task Scheduling:
Operating systems and applications often use queues to manage tasks and processes,
scheduling them in the order they arrive.
2. Buffering:
Queues are used in buffering data streams, such as in IO buffering, where data is collected and
processed in the order it arrives.
3. Breadth-First Search (BFS):
The BFS algorithm uses a queue to explore nodes in a graph or tree level by level.
4. Print Spooling:
Print jobs are managed using a queue, ensuring that print requests are processed in the order
they are received.
queue = []
# Enqueue
queue.append('a')
queue.append('b')
# Dequeue
print(queue.pop(0))     # 'a'
# Peek
print(queue[0])         # 'b'
# isEmpty
print(len(queue) == 0)  # False
# Size
print(len(queue))       # 1
DEQUE
A deque (short for Double-Ended Queue) is a data structure that allows insertion and deletion of
elements from both ends, both the front and the rear. This makes it a versatile and powerful
structure for various applications. Here's an overview of deques:
Key Operations
1. Add to Front: Inserts an element at the front of the deque.
2. Add to Rear: Inserts an element at the rear of the deque.
3. Remove from Front: Removes and returns the element at the front of the deque.
4. Remove from Rear: Removes and returns the element at the rear of the deque.
5. Peek Front: Returns the element at the front without removing it.
6. Peek Rear: Returns the element at the rear without removing it.
7. isEmpty: Returns true if the deque contains no elements, otherwise false.
8. Size: Returns the number of elements in the deque (optional in some implementations).
Characteristics
• Double Ended: Elements can be added or removed from both ends, making it more
flexible than a standard queue.
• Dynamic Size: Deques can grow and shrink dynamically, depending on the
implementation (e.g., linked lists or resizable arrays).
• Fixed Size: Some implementations use a fixed-size array, which can lead to overflow
issues if the deque grows beyond its capacity.
Implementations
1. Array-Based Deque:
Typically implemented with a circular array to efficiently manage the front and rear pointers.
Allows constanttime operations for adding and removing elements, but resizing may be
necessary if the array is full.
2. Linked-List-Based Deque:
Uses a doubly linked list where each node points to both its previous and next node.
Allows efficient addition and removal of elements from both ends but requires extra space for
pointers and involves more complex memory management.
Applications
1. Algorithm Optimization:
Deques are used in algorithms like the sliding window maximum or minimum, where you
need to keep track of a subset of elements and efficiently manage them.
2. Scheduling:
Useful in task scheduling systems where tasks can be dynamically added or removed from
both ends of a queue.
3. Palindrome Checking:
Deques can efficiently check for palindromes by comparing characters from both ends towards
the center.
4. Undo Mechanisms:
In applications where you need to add and remove operations from both ends, such as
undo/redo functionalities.
Example Usage
Python’s `collections` module provides a `deque` class, but here’s a basic implementation for
educational purposes:
```python
class Deque:
    def __init__(self):
        self.items = []

    def is_empty(self):
        return len(self.items) == 0

    def add_front(self, item):
        self.items.insert(0, item)

    def add_rear(self, item):
        self.items.append(item)

    def remove_front(self):
        if not self.is_empty():
            return self.items.pop(0)

    def remove_rear(self):
        if not self.is_empty():
            return self.items.pop()

    def peek_front(self):
        if not self.is_empty():
            return self.items[0]

    def peek_rear(self):
        if not self.is_empty():
            return self.items[-1]  # last item, i.e. the rear

    def size(self):
        return len(self.items)
```
LINKED LISTS
Linked List
o A linked list can be defined as a collection of objects called nodes that are randomly
stored in memory.
o A node contains two fields, i.e. the data stored at that particular address and a
pointer which contains the address of the next node in memory.
o The last node of the list contains a pointer to null.
Key Characteristics:
1. Directional: Each node has a single pointer to the next node, creating a one-way traversal from
the head to the end of the list.
2. Dynamic Size: Nodes can be added or removed without reallocating or reorganizing the entire
structure.
3. Efficient Operations: Insertion and deletion of nodes can be performed efficiently, but only if
you have a reference to the node immediately before the insertion or deletion point.
Basic Operations:
1. Traversal: To access elements, you start from the head and follow the `next` pointers until you
reach the desired node or the end of the list.
2. Insertion: To insert a node, you adjust the pointers of the surrounding nodes.
3. Deletion: To delete a node, you need to adjust the pointer of the preceding node to bypass the
node being removed.
A singly linked list is the simplest kind of linked list. It takes up less space in memory because
each node holds only one address, that of the next node.
Example in Python:
```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class SinglyLinkedList:
    def __init__(self):
        self.head = None

    def append(self, data):
        new_node = Node(data)
        if not self.head:
            self.head = new_node
            return
        last = self.head
        while last.next:
            last = last.next
        last.next = new_node

    def print_list(self):
        current = self.head
        while current:
            print(current.data, end=" -> ")
            current = current.next
        print("None")
```
Circularly Linked List is a variant where the last node points back to the first node, forming a
circular structure. This circular link allows for continuous traversal through the list without
encountering an end condition.
# Key Characteristics:
1. Circular Reference: The last node’s `next` pointer refers back to the head, creating a circular
chain of nodes.
2. Traversal: Can start from any node and continue indefinitely since there is no end node.
3. Insertion and Deletion: Insertion and deletion operations can be slightly more complex
compared to singly linked lists due to the circular nature.
# Types:
1. Circular Singly Linked List: Each node points to the next node, and the last node points to the
first node.
2. Circular Doubly Linked List: Each node points to both the next and the previous node, with
the last node pointing to the first and vice versa.
Example in Python:
```python
class CircularNode:
    def __init__(self, data):
        self.data = data
        self.next = None

class CircularLinkedList:
    def __init__(self):
        self.head = None

    def append(self, data):
        new_node = CircularNode(data)
        if not self.head:
            self.head = new_node
            new_node.next = new_node
        else:
            last = self.head
            while last.next != self.head:
                last = last.next
            last.next = new_node
            new_node.next = self.head

    def print_list(self):
        if not self.head:
            print("List is empty")
            return
        current = self.head
        while True:
            print(current.data, end=" -> ")
            current = current.next
            if current == self.head:
                break
        print("(circular)")
```
Doubly Linked List is a more complex variation where each node contains two pointers: one to
the next node and one to the previous node. This allows traversal in both directions.
# Key Characteristics:
1. Bidirectional Traversal: Nodes can be traversed in both forward and backward directions due
to the presence of two pointers.
2. Insertion and Deletion: More flexible compared to singly linked lists as you can easily access
both preceding and succeeding nodes.
3. Memory Usage: Requires more memory for storing two pointers per node.
# Basic Operations:
1. Traversal: Nodes can be traversed in both directions, which can be advantageous for certain
algorithms.
2. Insertion: Easier insertion as you have references to both the next and previous nodes.
3. Deletion: Simplified deletion since you can directly adjust pointers in both directions.
Example in Python:
```python
class DoublyNode:
    def __init__(self, data):
        self.data = data
        self.next = None
        self.prev = None

class DoublyLinkedList:
    def __init__(self):
        self.head = None

    def append(self, data):
        new_node = DoublyNode(data)
        if not self.head:
            self.head = new_node
            return
        last = self.head
        while last.next:
            last = last.next
        last.next = new_node
        new_node.prev = last

    def print_list(self):
        current = self.head
        while current:
            print(current.data, end=" <-> ")
            current = current.next
        print("None")
```
TREE
Trees Overview
A tree is a hierarchical data structure that consists of nodes connected by edges. It is used to
represent relationships with a hierarchical structure, where each node can have zero or more
children, but only one parent. Trees are fundamental in computer science for various
applications, including representing hierarchical data, organizing information, and supporting
efficient data retrieval and manipulation.
Key Terminology
1. Node: The basic unit of a tree, containing data and references to its children.
2. Root: The topmost node in a tree, which does not have a parent. Every tree has exactly one
root.
3. Depth: The length of the path from the root to a node. The root has a depth of 0.
4. Height: The length of the longest path from a node to a leaf. The height of a leaf is 0.
5. Level: The level of a node is the number of edges from the root to the node. The root is at
level 0.
Types of Trees
1. Binary Trees:
• Binary Tree: Each node has at most two children, commonly referred to as the left child
and the right child.
• Full Binary Tree: Every node other than the leaves has exactly two children.
• Complete Binary Tree: All levels are fully filled except possibly the last level, which is
filled from left to right.
• Perfect Binary Tree: All internal nodes have exactly two children, and all leaf nodes are
at the same level.
• Balanced Binary Tree: The height of the left and right subtrees of any node differs by at
most one.
2. Binary Search Trees:
• Binary Search Tree: A binary tree where each node's left subtree contains only nodes with
values less than the node's value, and the right subtree contains only nodes with values greater
than the node's value.
3. AVL Trees:
AVL Tree: A self-balancing binary search tree where the difference in height between the left
and right subtrees of any node is at most one. Balancing is maintained through rotations during
insertions and deletions.
4. B-Trees:
B-Tree: A self-balancing tree data structure that maintains sorted data and allows searches,
sequential access, insertions, and deletions in logarithmic time. B-Trees are often used in
databases and file systems.
5. Tries:
Trie: A tree used for efficient retrieval of a key in a dataset of strings, where nodes represent
prefixes of the strings. It is particularly useful for tasks like autocomplete and spell-checking.
6. N-ary Trees:
N-ary Tree: A tree where each node can have up to `N` children. It is a generalization of binary
trees.
7. Heaps:
Binary Heap: A complete binary tree used to implement priority queues. It satisfies the heap
property, where the key at the root is either the maximum (max-heap) or minimum (min-heap)
among all keys in the tree.
Basic Operations
1. Traversal:
Inorder Traversal: Visit the left subtree, then the node, and then the right subtree. Commonly
used with binary search trees to retrieve values in sorted order.
Preorder Traversal: Visit the node first, then the left subtree, and finally the right subtree.
Useful for creating a copy of the tree.
Postorder Traversal: Visit the left subtree, then the right subtree, and finally the node. Useful
for deleting the tree or evaluating expressions.
Level Order Traversal: Visit nodes level by level from top to bottom and left to right.
2. Insertion:
Binary Trees: Insert a new node at the first available position in level order traversal.
Binary Search Trees: Insert nodes while maintaining the BST property.
3. Deletion:
Binary Search Trees: Remove a node while preserving the BST property, which may involve
reorganizing the tree.
4. Searching:
Binary Search Trees: Search operations are efficient with O(log n) complexity on average due
to the tree’s sorted nature.
Applications
1. Hierarchical Data Representation: Trees are used to represent hierarchical relationships, such
as organizational structures and file systems.
2. Database Indexing: B-trees and B+ trees are used in database systems to maintain indices and
facilitate fast data retrieval.
3. Search Algorithms: Binary search trees and their variants are used in search operations to
maintain sorted data.
4. Memory Management: Trees are used in memory allocation and management, such as in the
buddy system for memory allocation.
BINARY TREE
Binary Trees
A Binary Tree is a foundational data structure in computer science where each node has at most
two children, typically referred to as the left child and the right child. This structure provides a
simple yet powerful way to organize data and supports various algorithms and operations. Here's
an in-depth look at binary trees, including their properties, types, and common operations.
Key Properties
1. Node: The basic unit of the tree, holding data and references to at most two children.
2. Root: The top node of the tree. There is only one root in a binary tree.
3. Leaf: A node with no children (both left and right pointers are `None`).
4. Height: The length of the longest path from a node to a leaf. The height of a leaf node is 0.
5. Depth: The length of the path from the root to a node. The root node has a depth of 0.
6. Level: The level of a node is defined as the number of edges from the root to the node. The
root node is at level 0.
Types of Binary Trees
1. Full Binary Tree: Every node has either 0 or 2 children. No node has exactly one child.
2. Complete Binary Tree: All levels are fully filled except possibly the last level, which is
filled from left to right.
3. Perfect Binary Tree: All internal nodes have exactly two children, and all leaf nodes are at
the same level. It is a type of complete binary tree.
4. Balanced Binary Tree: The height of the left and right subtrees of any node differs by at
most one. AVL trees are a common example of balanced binary trees.
5. Degenerate (Skewed) Binary Tree: Each node has only one child, making the tree
essentially a linked list. This occurs in cases where the tree is not balanced.
6. Binary Search Tree: A binary tree where, for each node, the left subtree contains only nodes
with values less than the node's value, and the right subtree contains only nodes with values
greater than the node's value.
7. Heap: A complete binary tree used to implement priority queues. It follows the heap property,
where the key at the root is either the maximum (in a max-heap) or minimum (in a min-heap)
among all keys.
Basic Operations
1. Traversal:
• Inorder Traversal: Visit the left subtree, then the node, then the right subtree. This
results in nodes being visited in ascending order for a BST.
• Preorder Traversal: Visit the node first, then the left subtree, then the right subtree.
Useful for creating a copy of the tree.
• Postorder Traversal: Visit the left subtree, then the right subtree, then the node. Useful
for deleting or evaluating expressions.
• Level Order Traversal: Visit nodes level by level from top to bottom and left to right.
Typically implemented using a queue.
2. Insertion:
• Binary Trees: Insert a new node at the first available position in level-order traversal.
• Binary Search Trees: Insert nodes while maintaining the BST property, placing each node
in its correct position relative to its parent nodes.
3. Deletion:
• Binary Trees: Remove a node and adjust the tree structure accordingly.
• Binary Search Trees: Remove a node while maintaining the BST property, which may
involve reorganizing the tree. There are three cases to handle:
• Node to be deleted is a leaf node.
• Node to be deleted has one child.
• Node to be deleted has two children (requires finding the successor or predecessor).
4. Searching:
• Binary Search Trees: Search operations are efficient, with average time complexity of
O(log n) for balanced trees, due to the sorted nature of the tree.
Example in Python
class Node:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None

class BinaryTree:
    def __init__(self):
        self.root = None

    def insert(self, data):
        if not self.root:
            self.root = Node(data)
        else:
            self._insert(self.root, data)

    def _insert(self, current_node, data):
        if data < current_node.data:  # reconstructed condition: smaller values go left
            if current_node.left is None:
                current_node.left = Node(data)
            else:
                self._insert(current_node.left, data)
        else:
            if current_node.right is None:
                current_node.right = Node(data)
            else:
                self._insert(current_node.right, data)

    def inorder_traversal(self):
        elements = []
        self._inorder(self.root, elements)
        return elements

    def _inorder(self, node, elements):
        if node:
            self._inorder(node.left, elements)
            elements.append(node.data)
            self._inorder(node.right, elements)

    def preorder_traversal(self):
        elements = []
        self._preorder(self.root, elements)
        return elements

    def _preorder(self, node, elements):
        if node:
            elements.append(node.data)
            self._preorder(node.left, elements)
            self._preorder(node.right, elements)

    def postorder_traversal(self):
        elements = []
        self._postorder(self.root, elements)
        return elements

    def _postorder(self, node, elements):
        if node:
            self._postorder(node.left, elements)
            self._postorder(node.right, elements)
            elements.append(node.data)

    def level_order_traversal(self):
        elements = []
        if self.root is None:
            return elements
        queue = [self.root]
        while queue:
            node = queue.pop(0)
            elements.append(node.data)
            if node.left:
                queue.append(node.left)
            if node.right:
                queue.append(node.right)
        return elements
Applications
1. Expression Parsing: Binary trees, particularly expression trees, are used to parse and evaluate
mathematical expressions.
2. Binary Search Trees: Used for efficient searching, insertion, and deletion operations in
dynamic datasets.
3. Heaps: Implemented for priority queues, heapsort, and efficient access to the maximum or
minimum element.
4. Decision Trees: Used in machine learning for classification and regression tasks.
IMPLEMENTING TREES
Implementing trees involves creating a data structure where each node can have a varying
number of children, depending on the type of tree. Below, we'll cover the implementation of
several common types of trees: a basic binary tree, a binary search tree (BST), and an AVL tree
(a self-balancing BST). We'll use Python for these implementations.
1. Binary Tree
A Binary Tree is a tree where each node has at most two children. Here’s how to implement a
basic binary tree:
# Basic Binary Tree Implementation
```python
class TreeNode:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None

class BinaryTree:
    def __init__(self):
        self.root = None

    def insert(self, data):
        if not self.root:
            self.root = TreeNode(data)
        else:
            self._insert(self.root, data)

    def _insert(self, node, data):
        if data < node.data:  # reconstructed condition: smaller values go left
            if node.left is None:
                node.left = TreeNode(data)
            else:
                self._insert(node.left, data)
        else:
            if node.right is None:
                node.right = TreeNode(data)
            else:
                self._insert(node.right, data)

    def inorder_traversal(self):
        elements = []
        self._inorder(self.root, elements)
        return elements

    def _inorder(self, node, elements):
        if node:
            self._inorder(node.left, elements)
            elements.append(node.data)
            self._inorder(node.right, elements)

    def preorder_traversal(self):
        elements = []
        self._preorder(self.root, elements)
        return elements

    def _preorder(self, node, elements):
        if node:
            elements.append(node.data)
            self._preorder(node.left, elements)
            self._preorder(node.right, elements)

    def postorder_traversal(self):
        elements = []
        self._postorder(self.root, elements)
        return elements

    def _postorder(self, node, elements):
        if node:
            self._postorder(node.left, elements)
            self._postorder(node.right, elements)
            elements.append(node.data)

    def level_order_traversal(self):
        elements = []
        if self.root is None:
            return elements
        queue = [self.root]
        while queue:
            node = queue.pop(0)
            elements.append(node.data)
            if node.left:
                queue.append(node.left)
            if node.right:
                queue.append(node.right)
        return elements
```
2. Binary Search Tree (BST)
A Binary Search Tree (BST) is a binary tree where each node's left child contains values less
than the node's value, and the right child contains values greater than the node's value.
```python
class BSTNode:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None

class BinarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, data):
        if not self.root:
            self.root = BSTNode(data)
        else:
            self._insert(self.root, data)

    def _insert(self, node, data):
        if data < node.data:
            if node.left is None:
                node.left = BSTNode(data)
            else:
                self._insert(node.left, data)
        else:
            if node.right is None:
                node.right = BSTNode(data)
            else:
                self._insert(node.right, data)

    def search(self, data):
        return self._search(self.root, data)

    def _search(self, node, data):
        if node is None:
            return False
        if data == node.data:
            return True
        elif data < node.data:
            return self._search(node.left, data)
        else:
            return self._search(node.right, data)

    def inorder_traversal(self):
        elements = []
        self._inorder(self.root, elements)
        return elements

    def _inorder(self, node, elements):
        if node:
            self._inorder(node.left, elements)
            elements.append(node.data)
            self._inorder(node.right, elements)
```
3. AVL Tree
An AVL Tree is a self-balancing binary search tree where the difference in heights of the left and
right subtrees of any node is at most one. This ensures O(log n) time complexity for insertion,
deletion, and search operations.
```python
class AVLNode:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None
        self.height = 1

class AVLTree:
    def __init__(self):
        self.root = None

    def insert(self, data):
        self.root = self._insert(self.root, data)

    def _insert(self, node, data):
        if not node:
            return AVLNode(data)
        if data < node.data:
            node.left = self._insert(node.left, data)
        else:
            node.right = self._insert(node.right, data)

        node.height = 1 + max(self._height(node.left), self._height(node.right))
        balance = self._balance(node)

        # Left heavy
        if balance > 1:
            if data < node.left.data:
                return self._right_rotate(node)
            else:
                node.left = self._left_rotate(node.left)
                return self._right_rotate(node)
        # Right heavy
        if balance < -1:
            if data > node.right.data:
                return self._left_rotate(node)
            else:
                node.right = self._right_rotate(node.right)
                return self._left_rotate(node)
        return node

    def _left_rotate(self, z):
        y = z.right
        T2 = y.left
        y.left = z
        z.right = T2
        z.height = 1 + max(self._height(z.left), self._height(z.right))
        y.height = 1 + max(self._height(y.left), self._height(y.right))
        return y

    def _right_rotate(self, z):
        y = z.left
        T3 = y.right
        y.right = z
        z.left = T3
        z.height = 1 + max(self._height(z.left), self._height(z.right))
        y.height = 1 + max(self._height(y.left), self._height(y.right))
        return y

    def _height(self, node):
        if not node:
            return 0
        return node.height

    def _balance(self, node):
        if not node:
            return 0
        return self._height(node.left) - self._height(node.right)

    def inorder_traversal(self):
        elements = []
        self._inorder(self.root, elements)
        return elements

    def _inorder(self, node, elements):
        if node:
            self._inorder(node.left, elements)
            elements.append(node.data)
            self._inorder(node.right, elements)
```
Tree traversal refers to the process of visiting all the nodes in a tree in a specific order. Tree
traversal algorithms are fundamental for performing various operations on trees, such as
searching, sorting, and manipulating data. There are several methods for traversing trees, and the
choice of algorithm often depends on the specific requirements of the problem. The main
traversal algorithms for trees are:
Inorder Traversal
Preorder Traversal
Postorder Traversal
Level Order Traversal
Each traversal method has different characteristics and use cases. Below is a detailed explanation
of each traversal algorithm.
Depth-First Traversal explores as far as possible along each branch before backtracking. It uses a
stack-based approach, either explicitly or through recursion.
Inorder Traversal
Inorder traversal visits nodes in the following order: left subtree, node itself, then right subtree.
For a binary search tree (BST), this traversal yields nodes in ascending order.
Algorithm:
1. Recursively traverse the left subtree.
2. Visit the node.
3. Recursively traverse the right subtree.
Implementation in Python:
```python
def inorder_traversal(root):
    elements = []
    _inorder(root, elements)
    return elements

def _inorder(node, elements):
    if node:
        _inorder(node.left, elements)
        elements.append(node.data)
        _inorder(node.right, elements)
```
Preorder Traversal
Preorder traversal visits nodes in the following order: node itself, left subtree, then right subtree.
It is useful for creating a copy of the tree or for prefix notation.
Algorithm:
1. Visit the node.
2. Recursively traverse the left subtree.
3. Recursively traverse the right subtree.
Implementation in Python:
```python
def preorder_traversal(root):
    elements = []
    _preorder(root, elements)
    return elements

def _preorder(node, elements):
    if node:
        elements.append(node.data)
        _preorder(node.left, elements)
        _preorder(node.right, elements)
```
Postorder Traversal
Postorder traversal visits nodes in the following order: left subtree, right subtree, then the node
itself. It is useful for deleting the tree or for postfix notation.
Algorithm:
1. Recursively traverse the left subtree.
2. Recursively traverse the right subtree.
3. Visit the node.
Implementation in Python:
```python
def postorder_traversal(root):
    elements = []
    _postorder(root, elements)
    return elements

def _postorder(node, elements):
    if node:
        _postorder(node.left, elements)
        _postorder(node.right, elements)
        elements.append(node.data)
```
Breadth-First Traversal explores all nodes at the present depth level before moving on to nodes at
the next depth level. It uses a queue to keep track of nodes.
Level order traversal visits nodes level by level from top to bottom and left to right. It is useful
for operations that require processing nodes in their level order, such as finding the shortest path
in an unweighted tree.
Algorithm:
1. Initialize a queue with the root node.
2. Dequeue a node and visit it.
3. Enqueue the node's left child and then its right child, if they exist.
4. Repeat steps 2 and 3 until the queue is empty.
Implementation in Python:
```python
from collections import deque

def level_order_traversal(root):
    if not root:
        return []
    elements = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        elements.append(node.data)
        if node.left:
            queue.append(node.left)
        if node.right:
            queue.append(node.right)
    return elements
```
Inorder Traversal: Produces nodes in non-decreasing order for a BST. Useful for sorted output.
Preorder Traversal: Useful for copying the tree, prefix expression evaluation, or serialization.
Postorder Traversal: Useful for deletion of the tree or postfix expression evaluation.
Level Order Traversal: Useful for processing nodes level by level, such as finding the shortest
path in an unweighted tree.
Applications
1. Inorder Traversal: Used to retrieve values from a BST in sorted order.
2. Preorder Traversal: Useful for generating a copy of the tree or prefix expressions in expression
trees.
3. Postorder Traversal: Useful for safely deleting a tree or postfix expression in expression trees.
4. Level Order Traversal: Used in algorithms that need to process nodes level by level, such as in
finding the shortest path or managing hierarchical data.
UNIT – 4
PRIORITY QUEUE
Priority Queues
A Priority Queue is a type of abstract data structure that supports the efficient retrieval of the
element with the highest (or lowest) priority. Unlike standard queues, where elements are
processed in a first-in, first-out (FIFO) manner, priority queues allow elements to be retrieved
based on their priority levels rather than their order of insertion.
Key Concepts
1. Priority: Each element in a priority queue has a priority level. The element with the highest
priority is the one that gets dequeued first. In a max-priority queue, the element with the highest
value has the highest priority, while in a min-priority queue, the element with the lowest value has
the highest priority.
2. Insertion: Elements are added to the priority queue with an associated priority level. The
priority queue maintains the internal order based on these priorities.
3. Deletion/Removal: The element with the highest priority is removed from the priority queue.
If the priority queue is implemented as a max-priority queue, the element with the maximum
priority is removed, and if it is a min-priority queue, the element with the minimum priority is
removed.
4. Peek: This operation allows you to view the element with the highest priority without
removing it from the queue.
Implementations
1. Binary Heap
2. Balanced Binary Search Tree
3. Fibonacci Heap
4. Pairing Heap
5. Ordered List
The most common implementation is the binary heap, which provides efficient operations for
priority queue functionalities.
1. Binary Heap
A binary heap is a complete binary tree that satisfies the heap property. There are two types
of binary heaps:
• Max-Heap: The key at the root node is the maximum among all keys in the heap. Every
parent node has a key greater than or equal to the keys of its children.
• Min-Heap: The key at the root node is the minimum among all keys in the heap. Every
parent node has a key less than or equal to the keys of its children.
• Insert: Add a new element to the heap while maintaining the heap property.
• Extract Max/Min: Remove the root element (maximum or minimum) and restore the heap
property.
• Peek: View the root element without removing it.
2. Balanced Binary Search Tree
A balanced binary search tree, like a Red-Black Tree, maintains the sorted order of elements and
provides efficient insertion and deletion. It can also be used to implement a priority queue,
though it is less common compared to binary heaps.
Operations:
Insertion: Insert an element into its sorted position while keeping the tree balanced.
Deletion: Remove the highest-priority element and maintain the balanced tree property.
3. Fibonacci Heap
A Fibonacci Heap is a more advanced heap data structure that supports a collection of trees and
offers better amortized time complexities for operations compared to binary heaps.
4. Pairing Heap
A Pairing Heap is another type of heap with simpler implementation and good amortized time
complexities for insertions and deletions.
5. Ordered List
A priority queue can also be implemented using an ordered list, where elements are kept in a
sorted order based on priority.
Operations:
• Insert: Insert an element into the correct position in the list with O(n) time complexity.
• ExtractMax/Min: Remove the first element in the list with O(1) time complexity.
• Peek: View the first element in the list with O(1) time complexity.
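A hedged sketch of this ordered-list approach using Python's bisect module (here larger values mean higher priority, and the class name is illustrative):

```python
import bisect

class OrderedListPQ:
    """Max-priority queue backed by an ascending sorted list."""

    def __init__(self):
        self._items = []

    def insert(self, item):
        bisect.insort(self._items, item)   # O(n): keeps the list sorted

    def extract_max(self):
        return self._items.pop()           # O(1): largest item is at the end

    def peek(self):
        return self._items[-1]             # O(1)

pq = OrderedListPQ()
for x in [3, 1, 4, 1, 5]:
    pq.insert(x)
print(pq.extract_max())  # 5
print(pq.peek())         # 4
```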
The Priority Queue is an abstract data type (ADT) that extends the functionality of a standard
queue by associating each element with a priority. In a priority queue, elements are dequeued
based on their priority rather than their insertion order. The basic idea is that elements with
higher priority are served before elements with lower priority.
Key Operations
1. Insert (or Enqueue): Add an element with a given priority to the priority queue.
2. Remove (or Dequeue): Remove and return the element with the highest priority.
3. Peek (or Front): View the element with the highest priority without removing it from the
queue.
The priority queue ADT can be formally defined by the following operations:
1. Insert(item, priority):
• Description: Adds an element `item` with a specified `priority` to the priority queue.
• Complexity: The time complexity depends on the underlying implementation (e.g., O(log
n) for binary heaps).
2. Remove():
• Description: Removes and returns the element with the highest priority. In a max-priority
queue, this would be the element with the largest priority value, while in a min-priority
queue, it would be the element with the smallest priority value.
• Complexity: The time complexity is typically O(log n) for binary heaps but can vary
based on the data structure used.
3. Peek ():
• Description: Returns the element with the highest priority without removing it from the
queue.
• Complexity: The time complexity is O(1) for binary heaps and other similar data
structures.
4. isEmpty():
• Description: Returns true if the priority queue contains no elements, otherwise false.
• Complexity: O(1).
1. Binary Heap
2. Balanced Binary Search Tree
3. Fibonacci Heap
4. Pairing Heap
5. Ordered List
1. Binary Heap
A binary heap is a complete binary tree that maintains the heap property, where:
Max-Heap: Each parent node is greater than or equal to its child nodes.
Min-Heap: Each parent node is less than or equal to its child nodes.
• Time Complexities:
• Insert: O(log n)
• Remove: O(log n)
• Peek: O(1)
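Python's heapq module provides exactly this min-heap behavior on top of a plain list; a brief sketch:

```python
import heapq

heap = []
heapq.heappush(heap, 5)      # insert: O(log n)
heapq.heappush(heap, 1)
heapq.heappush(heap, 3)
print(heap[0])               # peek: O(1) -> 1 (smallest key at the root)
print(heapq.heappop(heap))   # remove: O(log n) -> 1
```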
2. Balanced Binary Search Tree
A balanced BST like a Red-Black Tree or AVL Tree maintains sorted order and can be used for
implementing priority queues.
• Time Complexities:
• Insert: O(log n)
• Remove: O(log n)
• Peek: O(1)
3. Fibonacci Heap
A Fibonacci Heap is a more advanced data structure that provides better amortized time
complexities for several operations.
• Time Complexities:
• Insert: O(1) amortized
• Remove: O(log n) amortized
• Decrease Key: O(1) amortized
4. Pairing Heap
A Pairing Heap is a simpler alternative with good amortized time complexities for priority queue
operations.
• Time Complexities:
• Insert: O(1) amortized
• Remove: O(log n) amortized
5. Ordered List
An ordered list maintains elements in a sorted order, allowing for efficient access to the highest
(or lowest) priority element.
• Time Complexities:
• Insert: O(n)
• Remove: O(1)
• Peek: O(1)
Applications
Priority queues are widely used in various algorithms and systems, including:
• Scheduling: Operating systems use priority queues to manage tasks based on priority
levels.
• Dijkstra’s Algorithm: Finds the shortest path in a graph by repeatedly extracting the node
with the smallest distance.
• A* Search Algorithm: Used in path finding and graph traversal, where nodes are
prioritized based on a cost function.
• Huffman Coding: Data compression algorithms use priority queues to build optimal
prefix codes.
SORTING
Sorting with a Priority Queue
Sorting with a priority queue involves using the properties of a priority queue to sort elements
efficiently. This approach is particularly useful when dealing with large datasets or when a direct
sorting algorithm like quicksort or merge sort might be less practical. The most common sorting
algorithm that uses a priority queue is heapsort, but priority queues can also be used in other
sorting contexts.
Heapsort
Heapsort is a comparison-based sorting algorithm that uses a binary heap data structure to sort
elements. It works by leveraging the heap property to repeatedly extract the maximum (or
minimum) element from the heap and build a sorted list. Heapsort is particularly efficient
because it sorts in place and has a time complexity of O(n log n).
1. Build a Max-Heap (or Min-Heap): Convert the input list into a max-heap (for ascending order)
or a min-heap (for descending order). This step ensures that the largest (or smallest) element is at
the root of the heap.
2. Extract Elements: Repeatedly extract the root element of the heap, which is the maximum (or
minimum) element, and place it at the end of the sorted section of the list. After each extraction,
restore the heap property by reheapifying.
3. Repeat: Continue extracting the root element and reheapifying until the heap is empty,
resulting in a sorted list.
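The steps above translate into a short in-place heapsort for ascending order. A minimal sketch; the `sift_down` helper is an illustrative name for the re-heapify step:
```python
def heapsort(arr):
    n = len(arr)

    def sift_down(root, end):
        # Restore the max-heap property for the subtree rooted at `root`,
        # considering only indices up to `end`
        while 2 * root + 1 <= end:
            child = 2 * root + 1
            if child + 1 <= end and arr[child] < arr[child + 1]:
                child += 1                      # pick the larger child
            if arr[root] < arr[child]:
                arr[root], arr[child] = arr[child], arr[root]
                root = child
            else:
                return

    # Step 1: build a max-heap in place
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i, n - 1)
    # Steps 2-3: move the max to the end, shrink the heap, re-heapify
    for end in range(n - 1, 0, -1):
        arr[0], arr[end] = arr[end], arr[0]
        sift_down(0, end - 1)

# Example usage
data = [5, 1, 4, 2, 8]
heapsort(data)
print(data)  # [1, 2, 4, 5, 8]
```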
More generally, any priority queue can drive a sorting routine:
1. Build a Priority Queue: Insert all elements into a priority queue. If using a min-priority queue,
the smallest element will be at the front, and if using a max-priority queue, the largest element
will be at the front.
2. Extract Elements: Continuously extract elements from the priority queue and insert them into
a list. This list will be sorted in ascending order if a min-priority queue is used, or in descending
order if a max-priority queue is used.
The overall cost breaks down as follows:
1. Insert All Elements: Insert each element of the input list into the priority queue. This phase
has a total time complexity of O(n log n), where n is the number of elements (O(log n) per insert).
2. Extract Elements: Remove elements from the priority queue one by one and place them into a
result list. Each extraction operation also has a time complexity of O(log n), but since there are n
extractions, the overall complexity is O(n log n).
```python
import heapq
def priority_queue_sort(iterable):
    # Insert all elements into the priority queue (a min-heap)
    priority_queue = []
    for item in iterable:
        heapq.heappush(priority_queue, item)
    # Extract elements from the priority queue and build the sorted list
    sorted_list = []
    while priority_queue:
        sorted_list.append(heapq.heappop(priority_queue))
    return sorted_list

# Example usage
data = [5, 1, 4, 2, 8]
sorted_data = priority_queue_sort(data)
print(sorted_data)  # [1, 2, 4, 5, 8]
```
MAPS
Maps (Dictionaries)
A map, also known as a dictionary, associative array, or hash table, is an abstract data type that
associates keys with values. It allows for efficient retrieval, insertion, and deletion of key-value
pairs. Maps are widely used in programming due to their versatility and efficiency in handling
associative data.
Key Concepts
1. Key-Value Pair: Each entry in a map is a pair consisting of a unique key and an associated
value. The key is used to identify and access the value.
2. Uniqueness: Keys in a map are unique; no two entries can have the same key. Values,
however, can be duplicated.
3. Operations:
• Insert (put): Add a key-value pair to the map, replacing the value if the key already exists.
• Get: Retrieve the value associated with a given key.
• Remove: Delete the entry for a given key.
• Contains Key: Check whether a key exists in the map.
Implementations
Maps can be implemented using various underlying data structures, each with its own
performance characteristics:
1. Hash Table
2. Binary Search Tree
3. Balanced Search Tree
4. Trie
1. Hash Table
A hash table uses a hash function to compute an index into an array of buckets or slots, from
which the desired value can be found. This provides average-case constant time complexity O(1)
for insertion, deletion, and retrieval operations.
Example in Python:
```python
# Creating a map
my_map = {}
my_map['apple'] = 3
my_map['banana'] = 5
# Retrieving values
print(my_map['apple']) # Output: 3
# Removing a key
del my_map['apple']
# Iterating over key-value pairs
for key, value in my_map.items():
    print(f'{key}: {value}')
```
2. Binary Search Tree
A BST is a binary tree where each node has at most two children, and for any node, the left
child’s key is less than the node’s key, and the right child’s key is greater.
BST Characteristics:
3. Balanced Search Tree
Balanced trees like Red-Black Trees are BSTs with additional properties to ensure that the tree
remains balanced. This guarantees logarithmic time complexity for all basic operations.
Red-Black Tree Characteristics:
4. Trie
A Trie is a special type of tree used for storing associative data structures. It is often used for
dictionaries, where keys are strings.
Trie Characteristics:
HASHTABLE
Hash Table
A hash table is a data structure that provides efficient methods for storing and retrieving data
using a hash function. It maps keys to values, enabling average-case constant time complexity
O(1) for common operations such as insertion, deletion, and lookup. Hash tables are widely used
in various applications, including databases, caches, and associative arrays.
Key Concepts
1. Hash Function:
• A hash function is a function that takes an input (or key) and computes an index into an
array of buckets or slots. The goal is to distribute keys uniformly across the buckets to
minimize collisions.
• A good hash function should minimize collisions and distribute keys uniformly.
2. Buckets:
Buckets are storage slots in the hash table where data is stored. Each bucket corresponds to an
index computed by the hash function.
3. Collisions:
Collisions occur when two keys hash to the same index. Effective collision resolution
strategies are crucial for maintaining the efficiency of the hash table.
4. Collision Resolution:
• Chaining: Each bucket maintains a linked list of entries. When a collision occurs, the new
entry is added to the list at the corresponding bucket.
• Open Addressing: When a collision occurs, the hash table searches for the next available
slot according to a probing sequence (e.g., linear probing, quadratic probing).
5. Load Factor:
The load factor is defined as the ratio of the number of elements to the number of buckets. It
measures the density of the hash table. A high load factor may lead to increased collisions and
degraded performance, prompting the need for resizing.
6. Resizing:
To maintain efficiency, hash tables are resized (usually doubled) when the load factor exceeds
a certain threshold. Resizing involves rehashing all existing keys to the new array of buckets.
KEY OPERATIONS:
1. Insert: Add a key-value pair, updating the value if the key already exists.
2. Remove: Delete the entry for a given key.
3. Get: Retrieve the value associated with a key, or None if the key is absent.
4. Contains Key: Check whether a key is present in the table.
```python
class HashTable:
    def __init__(self, size=10):
        self.size = size
        # Each bucket is a list of [key, value] pairs (chaining)
        self.table = [[] for _ in range(size)]

    def _hash(self, key):
        return hash(key) % self.size

    def insert(self, key, value):
        index = self._hash(key)
        for kv in self.table[index]:
            if kv[0] == key:
                kv[1] = value          # key exists: update the value
                return
        self.table[index].append([key, value])

    def remove(self, key):
        index = self._hash(key)
        for i, kv in enumerate(self.table[index]):
            if kv[0] == key:
                del self.table[index][i]
                return

    def get(self, key):
        index = self._hash(key)
        for kv in self.table[index]:
            if kv[0] == key:
                return kv[1]
        return None

    def contains_key(self, key):
        index = self._hash(key)
        for kv in self.table[index]:
            if kv[0] == key:
                return True
        return False

    def __str__(self):
        return str(self.table)
# Example usage
ht = HashTable()
ht.insert("apple", 3)
ht.insert("banana", 5)
print(ht.get("apple")) # Output: 3
ht.remove("apple")
print(ht)
```
Advanced Techniques
1. Dynamic Resizing:
• Description: Hash tables often grow dynamically to accommodate more elements and
maintain efficiency. When the load factor exceeds a threshold, the hash table is resized
(usually doubled) and all existing keys are rehashed.
• Implementation: This involves creating a new larger array and rehashing all entries from
the old array to the new one.
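A minimal resizing sketch, assuming the `HashTable` class above; the `ResizingHashTable` subclass, `_resize` method, `count` attribute, and 0.75 threshold are illustrative additions, not part of the original class:
```python
class ResizingHashTable(HashTable):
    LOAD_FACTOR_THRESHOLD = 0.75  # illustrative threshold

    def __init__(self, size=10):
        super().__init__(size)
        self.count = 0  # number of stored entries

    def insert(self, key, value):
        if not self.contains_key(key):
            self.count += 1
        super().insert(key, value)
        if self.count / self.size > self.LOAD_FACTOR_THRESHOLD:
            self._resize()

    def _resize(self):
        # Double the bucket array and rehash every existing entry
        old_table = self.table
        self.size *= 2
        self.table = [[] for _ in range(self.size)]
        for bucket in old_table:
            for key, value in bucket:
                self.table[self._hash(key)].append([key, value])
```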
2. Perfect Hashing:
• Description: A technique where the hash function is chosen such that there are no
collisions. This is often used in situations where the set of keys is known in advance
and remains static.
3. Cuckoo Hashing:
• Description: A hash table scheme where each key may be hashed to multiple possible
locations. When a collision occurs, the existing key is displaced (kicked out) and
reinserted into its alternative location.
SKIP LISTS
Skip Lists
A skip list is a probabilistic data structure that is used to implement a sorted dictionary or
associative array. It provides an efficient way to perform dynamic set operations such as
insertion, deletion, and search. Skip lists are an alternative to balanced binary search trees,
offering similar performance with simpler implementation.
Key Concepts
1. Layered Structure:
• A skip list consists of multiple levels of linked lists. Each level is a sorted list that
contains a subset of the elements in the level below it.
• The bottom level (level 0) contains all the elements, while higher levels contain fewer
elements. Each element in a higher level has a probability of being promoted from the
level below.
2. Nodes:
• Each node in a skip list contains a key, a value, and multiple forward pointers. The
number of forward pointers depends on the level of the node.
• Nodes are connected in a sorted order at each level. The level of a node determines the
number of levels it appears in.
3. Probabilistic Balancing:
• Skip lists use randomness to maintain balance. When inserting a new element, it is
randomly promoted to higher levels with a certain probability. This ensures that the
skip list remains balanced with high probability.
4. Operations Overview:
• Search: Traverse the skip list from the highest level down to the lowest level, moving
right and down to find the desired element.
• Insert: Insert a new element at the bottom level and probabilistically promote it to
higher levels.
• Delete: Remove the element from all levels where it appears.
Operations
1. Search:
• Description: Locate a key in the skip list.
• Time Complexity: O(log n) on average.
• Procedure: Start at the top-left corner and move right until you pass the target key, then
drop down to the next level and continue the process.
2. Insert:
• Description: Insert a new key at the bottom level and probabilistically promote it to higher levels.
• Time Complexity: O(log n) on average.
3. Delete:
• Description: Remove the key from every level in which it appears.
• Time Complexity: O(log n) on average.
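A minimal skip list sketch covering search and insert; the fixed maximum level of 16 and promotion probability of 0.5 are illustrative choices, not requirements of the structure:
```python
import random

class SkipNode:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * (level + 1)  # one pointer per level

class SkipList:
    MAX_LEVEL = 16
    P = 0.5  # promotion probability

    def __init__(self):
        self.level = 0
        self.head = SkipNode(None, self.MAX_LEVEL)  # sentinel head

    def _random_level(self):
        lvl = 0
        while random.random() < self.P and lvl < self.MAX_LEVEL:
            lvl += 1
        return lvl

    def search(self, key):
        node = self.head
        # Move right and down, level by level
        for i in range(self.level, -1, -1):
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key

    def insert(self, key):
        update = [None] * (self.MAX_LEVEL + 1)
        node = self.head
        for i in range(self.level, -1, -1):
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node  # rightmost node before `key` at level i
        lvl = self._random_level()
        if lvl > self.level:
            for i in range(self.level + 1, lvl + 1):
                update[i] = self.head
            self.level = lvl
        new_node = SkipNode(key, lvl)
        for i in range(lvl + 1):
            new_node.forward[i] = update[i].forward[i]
            update[i].forward[i] = new_node

# Example usage
sl = SkipList()
for k in [3, 1, 7, 5]:
    sl.insert(k)
print(sl.search(5), sl.search(4))  # True False
```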
Advantages
1. Simplicity:
Easier to implement compared to balanced trees like AVL or Red Black Trees.
2. Efficient:
Provides O(log n) average case time complexity for search, insertion, and deletion operations.
3. Dynamic:
Supports dynamic changes (inserts and deletes) efficiently without the need for complex
rebalancing.
Disadvantages
1. Probabilistic Guarantees:
Performance is probabilistic, which means that while it has good average-case performance, the
worst-case behavior is not strictly guaranteed.
2. Memory Overhead:
Requires additional space for multiple levels and forward pointers, which can increase
memory usage compared to simpler data structures.
SETS MULTISETS – MULTI MAPS
In computer science, sets, multisets, and multimaps are different types of abstract data types that
manage collections of elements with varying properties. Each has specific characteristics and use
cases that make them suitable for different scenarios.
Sets
A set is a collection of distinct elements. The primary operations associated with sets are
insertion, deletion, and membership testing. Sets do not allow duplicate elements and are
commonly used to handle collections of unique items.
Characteristics
1. Uniqueness:
Sets do not allow duplicate elements. Each element can appear only once in the set.
2. Operations:
• Insert: Add an element to the set (no effect if it is already present).
• Remove: Delete an element from the set.
• Contains: Test whether an element is a member of the set.
• Union / Intersection / Difference: Combine or compare two sets (see the example below).
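A short illustration with Python's built-in `set` type; the specific values are arbitrary:
```python
s = {1, 2, 3}
s.add(2)              # duplicate: the set is unchanged
s.add(4)
print(3 in s)         # membership test -> True
s.discard(1)          # remove an element
print(s | {9})        # union -> {2, 3, 4, 9}
print(s & {2, 9})     # intersection -> {2}
```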
Multisets
A multiset (also known as a bag) is a generalization of a set that allows duplicate elements.
Unlike sets, multisets can contain multiple occurrences of the same element.
Characteristics:
1. Duplicates Allowed:
Multisets can contain multiple instances of the same element. The number of occurrences of
each element is tracked.
2. Operations:
• Insert: Add an element, increasing its occurrence count.
• Remove: Delete one occurrence (or all occurrences) of an element.
• Count: Return the number of occurrences of an element.
• Contains: Test whether at least one occurrence is present (see the example below).
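A minimal multiset illustration with Python's `collections.Counter`; the fruit names are arbitrary:
```python
from collections import Counter

bag = Counter()
bag.update(['apple', 'apple', 'banana'])  # insert, duplicates allowed
print(bag['apple'])    # count of occurrences -> 2
bag['apple'] -= 1      # remove one occurrence
bag += Counter()       # adding an empty Counter drops non-positive counts
print(bag)             # e.g. Counter({'apple': 1, 'banana': 1})
```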
Multimaps
A multimap is a generalization of a map that allows multiple values for a single
key. Unlike standard maps where each key is associated with a single value, multimaps allow
each key to be associated with a collection of values.
Characteristics
1. Multiple Values per Key:
Each key in a multimap can be associated with multiple values, usually stored in a collection
such as a list or set.
2. Operations:
• Insert: Add a key-value pair, allowing multiple values for the same key.
• Remove: Remove a specific value for a key or all values for a key.
• Get: Retrieve all values associated with a key.
• Contains Key: Check if a key exists in the multimap.
• Contains Value: Check if a value exists for a given key.
# Example in Python
Python's `collections.defaultdict` combined with a list or set can be used to create a multimap.
```python
from collections import defaultdict

mm = defaultdict(list)
mm['a'].append(1)
mm['a'].append(2)
mm['b'].append(3)
mm['b'].append(4)
# Retrieving values
print(mm['a'])     # [1, 2]
# Removing values
mm['a'].remove(1)  # remove one value for key 'a'
del mm['b']        # remove all values for key 'b'
```
UNIT 5
Search Trees
1. Binary Search Tree
A Binary Search Tree is a data structure used in computer science for organizing and storing
data in a sorted manner. Each node in a Binary Search Tree has at most two children,
a left child and a right child, with the left child containing values less than the parent node and
the right child containing values greater than the parent node. This hierarchical structure
allows for efficient searching, insertion, and deletion operations on the data stored in the tree.
Searching in a BST means locating a specific node in the data structure. In a binary search tree,
searching for a node is easy because of its specific ordering. The steps for searching a node in a
binary search tree are listed as follows:
1. First, compare the element to be searched with the root element of the tree.
• If root is matched with the target element, then return the node’s location.
• If it is not matched, then check whether the item is less than the root element, if it is
smaller than the root element, then move to the left subtree.
• If it is larger than the root element, then move to the right subtree.
2. Repeat the above procedure recursively until the match is found.
3. If the element is not found or not present in the tree, then return NULL.
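These steps translate directly into a short recursive routine. A minimal sketch; the `Node` class is an illustrative representation of a tree node:
```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def bst_search(root, target):
    # Returns the node holding `target`, or None if it is absent
    if root is None or root.key == target:
        return root
    if target < root.key:
        return bst_search(root.left, target)   # smaller: go left
    return bst_search(root.right, target)      # larger: go right
```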
2. AVL Tree
An AVL Tree is a self-balancing binary search tree where the difference in height between the
left and right subtrees of any node (called the balance factor) is at most one. This balance ensures
that operations like insertion, deletion, and lookup have O(log n) time complexity.
Operations
Balancing: AVL trees use rotations to maintain balance. The four types of rotations are:
1. Right Rotation
2. Left Rotation
3. Right-Left Rotation
4. Left-Right Rotation
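A minimal sketch of a single right rotation, assuming each node stores a `height` attribute; the `height` helper and node fields are illustrative, not a complete AVL implementation:
```python
def height(node):
    return node.height if node else 0

def right_rotate(y):
    # y's left child x becomes the new subtree root
    x = y.left
    y.left = x.right
    x.right = y
    # Recompute heights bottom-up
    y.height = 1 + max(height(y.left), height(y.right))
    x.height = 1 + max(height(x.left), height(x.right))
    return x
```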
3. Red-Black Tree
A Red-Black Tree is a balanced binary search tree with additional properties to maintain balance:
1. Every node is colored either red or black.
2. The root is always black.
3. Red nodes cannot have red children (no two red nodes can be adjacent).
4. Every path from a node to its descendant leaves has the same number of black nodes.
Operations
4. B-Tree
A B-Tree is a self-balancing search tree that maintains sorted data and allows searches, sequential
access, insertions, and deletions in logarithmic time. It is commonly used in databases and file
systems.
Characteristics
1. Multilevel Index: Nodes in a B-tree can have multiple children and keys.
2. Balanced: All leaf nodes appear at the same level.
3. Order: A B-tree of order `m` can have at most `m-1` keys and `m` children per node.
Operations
5. B+ Tree
A B+ Tree is a variation of a B-tree where all values are stored at the leaf level, and internal
nodes only store keys to guide the search. It is widely used in databases and file systems for
indexing.
Characteristics
1. Leaf Nodes: All values are stored in leaf nodes, which are linked in a doubly linked list.
2. Internal Nodes: Only keys are stored in internal nodes to guide the search.
Operations
Search trees are crucial for managing dynamic sets of data where efficient querying and updating
are required. Different types of search trees provide various advantages:
Binary Search Trees (BST)
A Binary Search Tree (BST) is a type of binary tree where each node has at most two children,
commonly referred to as the left and right child. It is used to store data in a way that allows for
efficient retrieval, insertion, and deletion operations. The BST property ensures that the left
subtree of a node contains only nodes with keys less than the node’s key, and the right subtree
contains only nodes with keys greater than the node’s key.
Characteristics
1. Binary Tree Structure: Each node has up to two children: a left child and a right child.
2. Ordering Property: All nodes in the left subtree have keys less than the node’s key.
All nodes in the right subtree have keys greater than the node’s key.
3. Dynamic:
BSTs can grow and shrink dynamically as elements are added or removed.
Basic Operations
1. Insertion:
To insert a new node into a BST, start at the root and traverse the tree according to the BST
property until you find an appropriate null position where the new node should be inserted.
2. Search:
To search for a value, start at the root and traverse the tree based on the BST property. If the
key is less than the current node’s key, move to the left child; if greater, move to the right child.
If the key matches, return the node; otherwise, return null if the end of the tree is reached.
3. Deletion:
Deleting a node has three cases: a leaf node is simply removed; a node with one child is
replaced by that child; a node with two children is replaced by its inorder successor (the
smallest key in its right subtree), and the successor is then deleted from its original position.
4. Traversal:
• Inorder: Visit the left subtree, then the node, and then the right subtree. This traversal
prints the keys in ascending order.
• Preorder: Visit the node, then the left subtree, and then the right subtree. This traversal is
useful for creating a copy of the tree.
• Postorder: Visit the left subtree, then the right subtree, and then the node. This traversal is
useful for deleting the tree or evaluating expression trees.
• Level-order: Visit nodes level by level from top to bottom. This traversal is typically
implemented using a queue.
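The insertion and inorder-traversal operations above can be sketched as follows; the `Node` class is an illustrative representation:
```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    # Descend according to the BST property until a null position is found
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # duplicate keys are ignored

def inorder(root):
    # Inorder traversal yields the keys in ascending order
    if root:
        yield from inorder(root.left)
        yield root.key
        yield from inorder(root.right)

# Example usage
root = None
for k in [50, 30, 70, 20, 40]:
    root = insert(root, k)
print(list(inorder(root)))  # [20, 30, 40, 50, 70]
```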
Performance Considerations
1. Height:
The height of a BST is the length of the longest path from the root to a leaf. In the worst case,
a BST can become skewed (like a linked list), making operations O(n). Balanced BSTs ensure
O(log n) height.
2. Balanced BSTs:
To ensure that the height remains O(log n), self-balancing BSTs like AVL trees and Red-Black
trees are used. They automatically adjust the tree structure during insertions and deletions to
maintain balance.
3. Applications:
Dynamic Set Operations: Insertion, deletion, and lookup operations in logarithmic time on
average.
4. Advantages:
5. Disadvantages:
Can become unbalanced, leading to degraded performance (O(n)) in the worst case.
Balanced Search Trees are a subset of binary search trees (BSTs) designed to maintain a
balanced structure, which ensures that the height of the tree remains logarithmic in relation to the
number of nodes. This balance guarantees that operations such as insertion, deletion, and search
can be performed efficiently in O(log n) time complexity, where n is the number of nodes in the
tree.
Balanced search trees address the problem of a BST becoming unbalanced, which can occur
when elements are inserted in a sorted or nearly sorted order, leading to a degenerate (skewed)
tree structure similar to a linked list.
1. AVL Trees
AVL Trees are a type of self-balancing binary search tree. They maintain balance by ensuring
that the heights of the two child subtrees of any node differ by no more than one. This balance is
maintained through rotations during insertion and deletion operations.
Characteristics
• Balance Factor: For any node in the AVL tree, the difference between the height of the
left and right subtrees (balance factor) is -1, 0, or 1.
• Rotations: To maintain balance, AVL trees use four types of rotations:
• Right Rotation (Single Rotation)
• Left Rotation (Single Rotation)
• Left-Right Rotation (Double Rotation)
• Right-Left Rotation (Double Rotation)
Operations
• Insertion: After inserting a node, check the balance factors of the nodes and perform
necessary rotations to restore balance.
• Deletion: After deleting a node, adjust the tree to restore balance by performing rotations
as needed.
• Search: O(log n) time complexity due to balanced structure.
2. Red-Black Trees
Red-Black Trees are another type of self-balancing binary search tree where each node has an
additional color attribute (red or black) to ensure the tree remains approximately balanced. The
balancing criteria ensure that no path from the root to a leaf is more than twice as long as any
other such path.
Characteristics
Color Rules:
• Every node is red or black, and the root is black.
• No red node has a red child.
• Every root-to-leaf path contains the same number of black nodes.
Operations
Insertion: Insert the node and then fix any violations of red-black properties by performing
rotations and color changes.
Deletion: Remove the node and adjust the tree to maintain red-black properties through
rotations and color changes.
3. B-Trees
B-Trees are self-balancing tree data structures that maintain sorted data and allow searches,
sequential access, insertions, and deletions in logarithmic time. They are often used in databases
and file systems to manage large datasets efficiently.
Characteristics
Order: A B-tree of order `m` can have at most `m-1` keys and `m` children per node.
Multilevel Index: Allows efficient multilevel indexing and management of large datasets.
Operations
• Insertion: Insert a key into the appropriate leaf node and split nodes if necessary to
maintain balance.
• Deletion: Remove a key and ensure the tree remains balanced by merging or
redistributing nodes if needed.
• Search: O(log n) time complexity due to balanced structure.
Splay Trees
A Splay Tree is a type of self-adjusting binary search tree with the property that recently
accessed elements are moved to the root of the tree. This self-adjusting property is achieved
through a process known as "splaying," which involves a series of tree rotations.
Characteristics
1. Self-Splaying:
o After accessing (inserting, deleting, or searching) a node, the tree is rearranged so
that the accessed node is moved to the root of the tree through a series of
rotations.
2. No Strict Balancing:
o Unlike AVL or Red-Black trees, splay trees do not maintain strict balance criteria.
Instead, they use the splaying operation to ensure that frequently accessed nodes
are quickly accessible.
3. Amortized Analysis:
o Although individual operations can be inefficient in the worst case, splay trees
provide good performance on average. The amortized time complexity for splay
tree operations is O(log n), where n is the number of nodes in the tree.
Splaying Operations
The splaying operation involves three types of tree rotations: zig, zig-zig, and zig-zag.
1. Zig Operation:
o Applied when the node to be splayed is the child of the root. This involves a
single rotation.
2. Zig-Zig Operation:
o Applied when the node to be splayed and its parent are both either left or right
children. This involves two rotations.
3. Zig-Zag Operation:
o Applied when the node to be splayed is on the opposite side from its parent (one
is a left child and the other a right child). This involves two rotations in opposite
directions.
Operations
1. Search:
o Search for a node involves traversing the tree as in a standard binary search tree.
After the node is found (or if it’s not found), the splay operation is performed to
bring the accessed node to the root.
o Time Complexity: O(log n) amortized.
2. Insertion:
o Insert the node as you would in a regular binary search tree. After the insertion,
perform the splay operation to bring the newly inserted node to the root.
o Time Complexity: O(log n) amortized.
3. Deletion:
o To delete a node, first splay the node to the root (if it exists). Then, remove the
root and restructure the tree by combining the left and right subtrees.
o Time Complexity: O(log n) amortized.
Advantages:
• Amortized Time Complexity: Operations like search, insertion, and deletion have
amortized time complexity of O(log n).
• Self-Adjusting: The tree adapts based on access patterns, potentially speeding up access
to frequently used nodes.
Disadvantages:
• Lack of Strict Balancing: Unlike AVL and Red-Black trees, splay trees do not guarantee
strict balance, so individual operations can be inefficient in the worst case.
• Implementation Complexity: The splaying operation and rotations can be more
complex to implement correctly compared to other balanced trees.
Applications
Splay trees are particularly useful in scenarios where the access patterns are not uniform. They
are used in applications where:
• Locality of Reference: Recently accessed elements are likely to be accessed again soon.
• Cache Optimization: Splay trees can optimize cache performance due to their
self-adjusting nature.
In summary, splay trees provide a unique approach to maintaining balanced binary search trees
through self-adjusting mechanisms. While they may not always provide the same worst-case
guarantees as other balanced trees, their amortized performance and adaptability can be
beneficial in many real-world applications.
SORTING AND SELECTION
Sorting and selection are fundamental operations in computer science, used to arrange elements
in a specific order and to find particular elements within a dataset, respectively. These operations
are essential for data management, search optimization, and efficient algorithm design. Here’s an
overview of sorting and selection techniques:
Sorting
Sorting refers to the process of arranging elements in a specific order, usually in ascending or
descending order. There are numerous sorting algorithms, each with its own performance
characteristics and use cases.
Selection
Selection refers to finding specific elements from a dataset, such as the smallest, largest, or kth
smallest/largest element. This is useful for tasks where only a subset of data needs to be
analyzed.
1. Linear Search:
o Description: Scans through the list to find the desired element.
o Time Complexity: O(n) in the worst case.
o Use Case: Simple and effective for small or unsorted datasets.
2. Binary Search:
o Description: Efficiently finds an element in a sorted list by repeatedly dividing
the search interval in half.
o Time Complexity: O(log n) in the worst case.
o Use Case: Efficient for searching in sorted data.
3. Selection Algorithms:
o Description: Algorithms to find the kth smallest or largest element in an unsorted
array. Examples include:
▪ Quickselect: An efficient algorithm based on Quick Sort that finds the kth
smallest element.
• Time Complexity: O(n) average case, O(n^2) worst case.
▪ Median of Medians: An algorithm that finds the kth smallest element
with guaranteed O(n) worstcase time complexity.
• Time Complexity: O(n).
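A minimal Quickselect sketch using a randomly chosen pivot; the list-comprehension partitioning trades the in-place, O(1)-extra-space variant for clarity:
```python
import random

def quickselect(arr, k):
    # Returns the k-th smallest element of arr (k is 1-based)
    pivot = random.choice(arr)
    lows = [x for x in arr if x < pivot]
    pivots = [x for x in arr if x == pivot]
    highs = [x for x in arr if x > pivot]
    if k <= len(lows):
        return quickselect(lows, k)
    if k <= len(lows) + len(pivots):
        return pivot
    return quickselect(highs, k - len(lows) - len(pivots))

# Example usage
print(quickselect([7, 2, 9, 4, 1], 3))  # 3rd smallest -> 4
```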
Merge Sort and Quick Sort
Merge Sort and Quick Sort are two popular sorting algorithms with distinct characteristics and
use cases. Both algorithms are efficient but differ in their approaches, performance
characteristics, and suitability for various types of data.
Merge Sort
Merge Sort is a divide-and-conquer sorting algorithm that divides the array into smaller subarrays,
sorts those subarrays, and then merges them to produce the sorted array.
1. Divide:
Recursively split the array into two halves until each subarray contains a single element.
2. Conquer:
Sort the subarrays; a subarray containing a single element is already sorted.
3. Combine:
Merge the sorted subarrays to produce the sorted array. The merging process involves
combining two sorted arrays into a single sorted array, continuing until the entire array is
merged into a single sorted array.
Time Complexity
O(n log n) in the best, average, and worst cases.
Space Complexity
O(n) auxiliary space for the temporary subarrays used during merging.
Stability
Merge sort is stable: equal elements retain their relative order.
Implementation
```python
def merge_sort(arr):
    if len(arr) > 1:
        mid = len(arr) // 2
        left_half = arr[:mid]
        right_half = arr[mid:]
        # Recursively sort both halves
        merge_sort(left_half)
        merge_sort(right_half)
        # Merge the two sorted halves back into arr
        i = j = k = 0
        while i < len(left_half) and j < len(right_half):
            if left_half[i] < right_half[j]:
                arr[k] = left_half[i]
                i += 1
            else:
                arr[k] = right_half[j]
                j += 1
            k += 1
        # Copy any remaining elements
        while i < len(left_half):
            arr[k] = left_half[i]
            i += 1
            k += 1
        while j < len(right_half):
            arr[k] = right_half[j]
            j += 1
            k += 1

# Example usage
arr = [38, 27, 43, 3, 9, 82, 10]
merge_sort(arr)
print(arr)  # [3, 9, 10, 27, 38, 43, 82]
```
Advantages:
Consistent Performance: Merge sort always performs at O(n log n) time complexity.
Disadvantages:
Space Complexity: Requires additional space proportional to the size of the array.
Overhead: Extra time and space are needed for merging operations.
Quick Sort
Quick Sort is another divide-and-conquer sorting algorithm that selects a 'pivot' element, partitions
the array into elements less than and greater than the pivot, and recursively sorts the partitions.
1. Partition:
Choose a pivot element from the array and partition the other elements into two subarrays
according to whether they are less than or greater than the pivot.
2. Recursively Sort:
Recursively apply the same process to the subarrays formed by the partitioning.
3. Combine:
Because the partitions are sorted recursively, no explicit merge step is needed; concatenating
the sorted left partition, the pivot(s), and the sorted right partition yields the sorted array.
Time Complexity
a) Worst Case: O(n^2) (occurs when the pivot is the smallest or largest element each time)
b) Average Case: O(n log n)
c) Best Case: O(n log n) (occurs when the pivot divides the array into two equal halves)
Space Complexity
O(log n) on average for the recursion stack (O(n) in the worst case); simple list-based
implementations like the one below use O(n) auxiliary space.
Stability
Quick sort is not stable in its usual forms: equal elements may be reordered.
```python
def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    # Partition around a middle pivot
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quick_sort(left) + middle + quick_sort(right)

# Example usage
print(quick_sort([3, 6, 8, 10, 1, 2, 1]))  # [1, 1, 2, 3, 6, 8, 10]
```
Advantages:
Average Case Performance: Generally faster in practice than merge sort for large datasets.
Disadvantages:
Unstable: Does not preserve the relative order of equal elements.