UNIT I Data Structures
The main use of data structures is to build efficient algorithms, such as sorting algorithms. The stack, queue, array, etc., have a linear relationship between their data elements, while the graph, tree, hashmap, etc., have more complex relationships between their data elements.
There are two categories of data structures: linear data structures and non-linear data structures. In real life, linear data structures are widely used in general software development, while non-linear data structures are used in areas such as image processing and artificial intelligence.
A linear data structure is a type of data structure that stores the data linearly or sequentially. In a linear data structure, the data is arranged so that each element is adjacent to its previous and next elements. All of the data sits at a single level, so we can traverse every element in a single run.
The implementation of the linear data structure is always easy as it stores the data linearly.
The common examples of linear data structures are the Stack, Queue, Array, LinkedList, Hashmap, etc.
1. Stack
Users can push/pop data from a single end of the stack. Users can insert the data into
the stack via push operation and remove data from the stack via pop operation. The stack
follows the rule of LIFO (last in first out). Users can access all the stack data from the top of
the stack in a linear manner.
In real-life problems, the stack data structure is used in many applications. For example, web browsers use a stack to store visited pages for the backward/forward operations.
2. Queue
Queue data structure stores the data in a linear sequence. Queue data structure
follows the FIFO rule, which means first-in-first-out. It is similar to the stack data structure,
but it has two ends. In the queue, we can perform insertion operations at the rear using the Enqueue method and deletion operations at the front using the Dequeue method.
3. Array
The array is the most widely used linear data structure. An array stores objects of the same data type in a linear fashion. Users can use an array to construct all linear or non-linear data structures. For example, inside car-management software, an array of strings is useful for storing the car names.
We can access an element of the array by its index. In an array, the index always starts at 0. To prevent memory wastage, users should create a dynamically sized array.
4. LinkedList
LinkedList data structure stores the data in the form of a node. Every linked list node
contains the element value and address pointer. The address pointer of the LinkedList
consists of the address of the next node. It can store unique or duplicate elements.
In a non-linear data structure, an element may be connected to its previous element, its next element, and several other elements, forming a complex structure. In simple terms, data is not organized sequentially in such a data structure.
1. Tree
A tree is a hierarchical data structure in which nodes are connected by edges, starting from a single root node, and each node may have child nodes.
Types of trees:
1. Binary tree
2. AVL tree
3. Red-black tree
The Heap data structure is also a non-linear tree-based data structure, and it uses the
complete binary tree to store the data.
2. Graph
The graph contains vertices and edges. Every vertex of the graph stores a data element, and edges are used to represent relationships between vertices. Social networks are real-life examples of the graph data structure.
Here, we can consider each user as a vertex, and the relationship between that user and other users is called an edge. Unlike a tree, there is no limitation on building a relationship between any two vertices of a graph. We can implement the graph data structure using an array or a LinkedList.
# Difference between linear data structure and non-linear data structure
Here are the key differences between linear and non-linear data structures:

| Linear data structure | Non-linear data structure |
| --- | --- |
| Data elements are arranged sequentially, one after another. | Data elements are connected hierarchically or in a complex web. |
| All elements sit at a single level and can be traversed in a single run. | Elements sit at multiple levels and generally cannot be traversed in a single run. |
| Implementation is relatively easy. | Implementation is comparatively complex. |
| Examples: array, stack, queue, linked list. | Examples: tree, graph. |
| Mainly used in general software development. | Used in areas such as image processing and artificial intelligence. |
# Implementing an Abstract Data Type (ADT)
Interface Specification: The first step in implementing an ADT is defining the interface, which is the list of operations that can be performed on the data type. These operations define the behaviour of the ADT.
For example, for a stack ADT, operations might include push, pop, and peek.
Design of Data Structure: Once the interface is defined, the next step is to choose an
appropriate data structure to implement the ADT. The choice of data structure depends on the
operations supported by the ADT and their efficiency requirements.
Common data structures used to implement ADTs include arrays, linked lists, trees,
hash tables, and more.
Implementation of Operations: After selecting the data structure, each operation defined in
the interface needs to be implemented. These implementations ensure that the operations
behave according to the specified semantics.
For instance, implementing a push operation on a stack might involve adding an
element to the top of an array or linked list.
Encapsulation: This ensures that users interact with the ADT only through the defined operations, maintaining the integrity of the data structure.
Testing: This involves testing each operation individually as well as testing the ADT as a whole to verify its correctness and efficiency.
Documentation: This includes documenting the interface, the behaviour of each operation, any preconditions or postconditions, and examples of usage.
By following these steps, developers can create reusable and modular ADTs that
provide a clear and consistent interface for working with abstract data types, while hiding the
complexities of their implementation.
Now we’ll define three ADTs namely List ADT, Stack ADT, Queue ADT.
1. List ADT
In a list, the data is generally stored in key sequence. The list has a head structure consisting of a count, pointers, and the address of the compare function needed to compare the data in the list.
Each data node contains a pointer to a data structure and a self-referential pointer which points to the next node in the list.
2. Stack ADT
In Stack ADT Implementation instead of data being stored in each node, the
pointer to data is stored.
The program allocates memory for the data and address is passed to the stack
ADT.
The head node and the data nodes are encapsulated in the ADT. The calling
function can only see the pointer to the stack.
The stack head structure also contains a pointer to top and count of number of
entries currently in stack.
push() – Insert an element at one end of the stack called top.
pop() – Remove and return the element at the top of the stack, if it is not empty.
peek() – Return the element at the top of the stack without removing it, if the stack is
not empty.
size() – Return the number of elements in the stack.
isEmpty() – Return true if the stack is empty, otherwise return false.
isFull() – Return true if the stack is full, otherwise return false.
3. Queue ADT
The queue abstract data type (ADT) follows the basic design of the stack abstract data type.
Each node contains a void pointer to the data and the link pointer to the next element
in the queue. The program’s responsibility is to allocate memory for storing the data.
enqueue() – Insert an element at the rear of the queue, if the queue is not full.
dequeue() – Remove and return the first element of the queue, if the queue is not empty.
peek() – Return the first element of the queue without removing it, if the queue is not empty.
Features of ADT:
Abstract data types (ADTs) are a way of encapsulating data and operations on that
data into a single unit. Some of the key features of ADTs include:
Abstraction: The user does not need to know the implementation of the data structure; only the essentials are provided.
Robustness: The program is robust and has the ability to catch and handle errors.
Encapsulation: ADTs hide the internal details of the data and provide a public interface for
users to interact with the data. This allows for easier maintenance and modification of the
data structure.
Data Abstraction: ADTs provide a level of abstraction from the implementation details of
the data. Users only need to know the operations that can be performed on the data, not how
those operations are implemented.
Data Structure Independence: ADTs can be implemented using different data structures,
such as arrays or linked lists, without affecting the functionality of the ADT.
Information Hiding: ADTs can protect the integrity of the data by allowing access only to
authorized users and operations. This helps prevent errors and misuse of the data.
Modularity: ADTs can be combined with other ADTs to form larger, more complex data
structures. This allows for greater flexibility and modularity in programming.
Advantages:
Encapsulation: ADTs provide a way to encapsulate data and operations into a single unit,
making it easier to manage and modify the data structure.
Abstraction: ADTs allow users to work with data structures without having to know the
implementation details, which can simplify programming and reduce errors.
Data Structure Independence: ADTs can be implemented using different data structures,
which can make it easier to adapt to changing needs and requirements.
Information Hiding: ADTs can protect the integrity of data by controlling access and
preventing unauthorized modifications.
Modularity: ADTs can be combined with other ADTs to form more complex data structures,
which can increase flexibility and modularity in programming.
Disadvantages:
Overhead: Implementing ADTs can add overhead in terms of memory and processing,
which can affect performance.
Complexity: ADTs can be complex to implement, especially for large and complex data
structures.
Learning Curve: Using ADTs requires knowledge of their implementation and usage, which
can take time and effort to learn.
Limited Flexibility: Some ADTs may be limited in their functionality or may not be suitable
for all types of data structures.
Cost: Implementing ADTs may require additional resources and investment, which can
increase the cost of development.
# Time Complexity
Time complexity is a measure of the amount of time an algorithm takes to run as a
function of the length of its input. It quantifies the number of elementary operations
performed by the algorithm with respect to the input size. Here's a breakdown of common
time complexities and examples:
O(1) - Constant Time Complexity:
Describes algorithms whose runtime does not depend on the size of the input.
Examples include accessing an array element by index or pushing/popping a stack.
O(log n) - Logarithmic Time Complexity:
Describes algorithms that have a runtime proportional to the logarithm of the input size.
A classic example is binary search on a sorted array.
O(n) - Linear Time Complexity:
Describes algorithms where the runtime grows linearly with the size of the input.
Examples include iterating through an array once, linear search, or scanning a linked list.
O(n log n) - Linearithmic Time Complexity:
Commonly found in efficient sorting algorithms like Merge Sort, Quick Sort, and Heap Sort.
O(n^2) - Quadratic Time Complexity:
Describes algorithms where the runtime grows quadratically with the size of the input.
Examples include nested loops that iterate over all pairs of elements in a collection, such as Bubble Sort or Selection Sort.
O(n^k), where k > 1 - Polynomial Time Complexity:
Examples include algorithms with nested loops or recursive algorithms with multiple recursive calls.
O(2^n) - Exponential Time Complexity:
Describes algorithms where the runtime doubles with each additional element in the input.
Examples include brute-force algorithms that try every subset of the input.
# Space Complexity
Space complexity in data structures refers to the amount of memory or space required
by a data structure to store and manage its elements. It quantifies the additional memory
overhead beyond the actual data being stored. Here's an overview of space complexity for
common data structures:
Arrays:
Space complexity for storing n elements: O(n) - Arrays require contiguous memory
allocation to store elements. The space complexity is proportional to the number of elements.
Linked Lists:
Space complexity for storing n elements: O(n) - In addition to the space required for
storing data, linked lists also require space for maintaining references (pointers) between
nodes. Each node typically consists of the data and a reference to the next node.
Stacks and Queues:
Space complexity for storing n elements: O(n) - Similar to linked lists, stacks and
queues also require space for maintaining references between elements. Additionally, they
may require auxiliary data structures like arrays or linked lists for implementation.
Trees:
Space complexity for storing n elements: O(n) - The space complexity of trees
depends on the type of tree and its structure. In binary trees, each node has references to left
and right child nodes, adding extra space overhead. Balanced trees like AVL trees or Red-Black trees may require additional metadata for balancing, increasing space complexity.
Graphs:
Space complexity for storing n vertices and m edges: O(n + m) - Graphs can be
represented using various data structures such as adjacency matrix or adjacency list. The
space complexity depends on the chosen representation. Adjacency matrices require O(n^2)
space, while adjacency lists require O(n + m) space, where m is the number of edges.
Hash Tables:
Space complexity for storing n key-value pairs: O(n) - Hash tables require space
proportional to the number of key-value pairs stored. However, hash tables typically offer
efficient space usage due to collision resolution techniques such as chaining or open
addressing.
ASYMPTOTIC NOTATION:
Asymptotic notation is a family of mathematical notations used to describe the limiting behaviour of a function as its input size approaches infinity. It provides a convenient way to analyze the time or space complexity of algorithms without worrying about constant factors or lower-order terms. Here are the commonly used asymptotic notations:
Big O notation (O): describes an asymptotic upper bound on the growth of a function, commonly used for worst-case analysis.
Big Omega notation (Ω): describes an asymptotic lower bound, commonly used for best-case analysis.
Big Theta notation (Θ): describes a tight bound, used when the upper and lower bounds coincide.
These notations are used to succinctly express the behaviour of algorithms and data
structures in terms of their efficiency and scalability. They help in comparing algorithms,
predicting their performance, and making informed decisions about algorithm selection and
optimization.
# SEARCHING:
1. Linear Search
Linear search is a technique that traverses the array sequentially to locate a given item or search element.
In linear search, we access each element of the array one by one, sequentially, and check whether it is the desired element or not. We traverse the entire list and match each element of the list with the item whose location is to be found. If a match is found, the location of the item is returned; otherwise, the algorithm returns NULL.
A search is successful if it finds and returns the location of the desired element.
A search is unsuccessful if all the elements are accessed and the desired element is not found.
Linear search is mostly used to search an unordered list in which the items are not sorted.
Step 1 - Read the search element from the user.
Step 2 - Compare the search element with the first element in the list.
Step 3 - If both are matched, then display "Given element is found!!!" and terminate the
function
Step 4 - If both are not matched, then compare search element with the next element in the
list.
Step 5 - Repeat steps 3 and 4 until search element is compared with last element in the list.
Step 6 - If last element in the list also doesn't match, then display "Element is not found!!!"
and terminate the function.
Example:
Consider the list {65, 20, 10, 55, 32, 12, 50} and the search element 12. The algorithm compares 65, 20, 10, 55, and 32 with 12 in turn, finds a match at index 5, and reports that the element is found.
2. Binary Search
Binary search is generally used to solve a wide range of searching problems in computer science and real-world scenarios.
Now suppose, we are provided with a sorted array and a target element (let's say k) and we
want to search if k exists in the array or not, and if k does exist in the array, we return the
index/position of k in the array.
So, let's consider a sorted array, say arr[] = {1, 4, 6, 9, 12, 15, 20}, and try to search for the number 6 and return its index. Note that the array has a total of 7 elements.
In the first iteration of binary search, we check if the middle element is equal to 6. If
it is equal, we return the mid index. Here, arr[mid] = arr[3] = 9, i.e., not equal to 6.
So, we check if the middle element is greater than or less than 6. Now, 9 is greater
than 6, so we assign mid - 1 (= 2) index to the end variable that reduces the array size
by half (we know that 6 will be present on the left side of the array, as we are working
in a sorted array).
In the second iteration, again, we check if the middle element is equal to, greater than, or smaller than 6. Now, the middle element 4 is smaller than 6, so we assign the mid + 1 index to the start variable.
In the third iteration, the middle element equals 6, so we return the mid index value,
i.e., 2.
The time complexity of the binary search algorithm is O(log N).
It is efficient because it continually divides the search space in half until it finds the element or only one element remains in the list to be tested.
Each comparison tells us whether the element being searched for lies before or after the current position in the list.
It is among the fastest searching algorithms, with a worst-case time complexity of O(log N), and works best for large sorted lists.
# Sorting Techniques
Sorting refers to rearrangement of a given array or list of elements according to a comparison
operator on the elements. The comparison operator is used to decide the new order of elements in
the respective data structure.
When we have a large amount of data, it can be difficult to deal with it, especially when it is
arranged randomly. When this happens, sorting that data becomes crucial. It is necessary to sort
data in order to make searching easier.
The sorting algorithm is important in Computer Science because it reduces the complexity of a
problem. There is a wide range of applications for these algorithms, including searching
algorithms, database algorithms, divide and conquer methods, and data structure algorithms.
1. Bubble Sort Algorithm
Bubble Sort is the simplest sorting algorithm that works by repeatedly swapping the
adjacent elements if they are in the wrong order. This algorithm is not suitable for
large data sets as its average and worst-case time complexity is quite high.
In Bubble Sort algorithm,
traverse from left and compare adjacent elements and the higher one is placed at right
side.
In this way, the largest element is moved to the rightmost end at first.
This process is then continued to find the second largest and place it and so on until
the data is sorted.
For example, consider arr[] = {5, 1, 4, 2, 8}.
First Pass:
(5 1 4 2 8) → (1 5 4 2 8) → (1 4 5 2 8) → (1 4 2 5 8): the largest element, 8, is placed in its correct position, i.e., the end of the array.
Second Pass:
(1 4 2 5 8) → (1 2 4 5 8): the next largest element, 5, is in place after this pass.
Third Pass:
(1 2 4 5 8): no swaps occur, so the array is sorted and the algorithm can stop.
Best Case: The best case occurs when the array is already sorted. So the number of
comparisons required is N-1 and the number of swaps required = 0. Hence the best case
complexity is O(N).
Worst Case: The worst-case condition for bubble sort occurs when the elements of the array are arranged in decreasing order.
In the worst case, the total number of passes required to sort a given array is (N-1), each performing up to (N-1) comparisons, so the worst-case time complexity is O(N^2).
Average Case Time Complexity: The number of comparisons in Bubble Sort is the same regardless of the input, so in the average case there are O(N^2) comparisons. This is because, irrespective of the arrangement of elements, the number of comparisons C(N) is the same.
2. Selection Sort
Selection sort is another sorting technique in which we find the minimum element in
every iteration and place it in the array beginning from the first index. Thus, a selection sort
also gets divided into a sorted and unsorted sub array.
Let’s consider the following array as an example: arr[] = {64, 25, 12, 22, 11}
First pass:
For the first position in the sorted array, the whole array is traversed from index 0 to 4 sequentially. The first position currently stores 64; after traversing the whole array, it is clear that 11 is the lowest value.
64 25 12 22 11
Thus, replace 64 with 11. After one iteration 11, which happens to be the least value in the
array, tends to appear in the first position of the sorted list.
11 25 12 22 64
Second Pass:
For the second position, where 25 is present, again traverse the rest of the array in a
sequential manner.
11 25 12 22 64
After traversing, we found that 12 is the second lowest value in the array and it should appear
at the second place in the array, thus swap these values.
11 12 25 22 64
Third Pass:
Now, for third place, where 25 is present again traverse the rest of the array and find the third
least value present in the array.
11 12 25 22 64
While traversing, 22 came out to be the third least value and it should appear at the third
place in the array, thus swap 22 with element present at third position.
11 12 22 25 64
Fourth pass:
Similarly, for fourth position traverse the rest of the array and find the fourth least element in
the array
As 25 is the 4th lowest value hence, it will place at the fourth position.
11 12 22 25 64
Fifth Pass:
At last the largest value present in the array automatically get placed at the last position in the
array
The resulted array is the sorted array.
11 12 22 25 64
Time Complexity: The time complexity of Selection Sort is O(N^2) as there are two nested loops:
One loop to select an element of the array one by one = O(N)
Another loop to compare that element with every other array element = O(N)
Therefore, the overall complexity = O(N) × O(N) = O(N^2).
Auxiliary Space: O(1), as the only extra memory used is for temporary variables while swapping two values in the array. Selection sort never makes more than O(N) swaps and can be useful when memory writes are costly.
3. Insertion Sort
Insertion sort is a simple sorting algorithm that works similarly to the way you sort
playing cards in your hands. The array is virtually split into a sorted and an unsorted part.
Values from the unsorted part are picked and placed at the correct position in the sorted part.
Consider the following example array: arr[] = {12, 11, 13, 5, 6}
First Pass:
Initially, the first two elements of the array are compared in insertion sort.
12 11 13 5 6
Here, 12 is greater than 11 hence they are not in the ascending order and 12 is not at
its correct position. Thus, swap 11 and 12.
11 12 13 5 6
Second Pass:
11 12 13 5 6
Here, 13 is greater than 12, thus both elements seem to be in ascending order; hence, no swapping will occur. 12 is also stored in the sorted sub-array along with 11.
Third Pass:
Now, two elements are present in the sorted sub-array which are 11 and 12
11 12 13 5 6
Both 5 and 13 are not at their correct places, so swap them
11 12 5 13 6
After swapping, elements 12 and 5 are not sorted, thus swap again
11 5 12 13 6
Again, 11 and 5 are not in order, so swap once more; 5 now reaches its correct position
5 11 12 13 6
Fourth Pass:
Now, the elements which are present in the sorted sub-array are 5, 11 and 12
5 11 12 13 6
Here, 13 and 6 are not in the correct order, thus swap them
5 11 12 6 13
After swapping, 12 and 6 are not sorted, so swap again
5 11 6 12 13
Finally, 11 and 6 are swapped, and 6 reaches its correct position; the array is now completely sorted
5 6 11 12 13