DAA Notes
Binary Search is an efficient algorithm for finding an item from a sorted list of items. It works by
repeatedly dividing the search interval in half. If the value of the search key is less than the item
in the middle of the interval, narrow the interval to the lower half. Otherwise, narrow it to the
upper half. Repeatedly check until the value is found or the interval is empty.
Pseudo-code
Iterative Approach
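The iterative pseudo-code is not reproduced in these notes; as a stand-in, here is a minimal runnable Python sketch of the iterative approach (function and variable names are illustrative):

def binary_search(arr, target):
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid                 # target found
        elif arr[mid] < target:
            low = mid + 1              # narrow to the upper half
        else:
            high = mid - 1             # narrow to the lower half
    return -1                          # interval is empty: target not found

print(binary_search([2, 3, 4, 10, 40], 10))   # 3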
Recursive Approach
function binarySearch(arr, target, low, high):
    if low > high:
        return -1 // Base case: target is not found
    mid = (low + high) // 2
    if arr[mid] == target:
        return mid // Target found at index mid
    else if arr[mid] > target:
        return binarySearch(arr, target, low, mid - 1) // Search the lower half
    else:
        return binarySearch(arr, target, mid + 1, high) // Search the upper half
Time Complexity
● Best Case: O(1) — The best-case scenario occurs when the target element is the
middle element of the array.
● Average and Worst Case: O(log n) — In the average and worst cases, the algorithm
repeatedly divides the search interval in half, leading to a logarithmic time complexity.
Space Complexity
● Iterative Approach: O(1) — The iterative approach uses a constant amount of extra
space.
● Recursive Approach: O(log n) — The recursive approach requires additional space for
the call stack. In the worst case, the depth of the recursion is log n (where n is the
number of elements in the array).
Merge Sort
Merge Sort is a classic divide-and-conquer algorithm for sorting an array or list. It works by
dividing the array into smaller sub-arrays, sorting those sub-arrays, and then merging them back
together.
Detailed Explanation:
1. Divide:
○ If the array has more than one element, split it into two halves.
○ Continue splitting each half until you have sub-arrays of size one. An array of
size one is trivially sorted.
2. Conquer:
○ Recursively apply merge sort to each half.
3. Combine:
○ Merge the sorted halves to produce a sorted array.
○ The merging process involves comparing the elements of each half and
combining them in order.
Pseudo-code:
function mergeSort(arr):
    // Base case: an array with 0 or 1 elements is already sorted
    if length(arr) <= 1:
        return arr
    mid = length(arr) // 2
    left = mergeSort(arr[0:mid])
    right = mergeSort(arr[mid:])
    // Combine: merge the two sorted halves
    return merge(left, right)
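The merge helper called above is where the ordering work happens; a minimal runnable Python sketch of the combine step (and a mergeSort that uses it) could look like this:

def merge(left, right):
    # Merge two already-sorted lists into one sorted list
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:        # <= keeps equal elements in order (stable)
            result.append(left[i]); i += 1
        else:
            result.append(right[j]); j += 1
    result.extend(left[i:])            # append whatever remains of either half
    result.extend(right[j:])
    return result

def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    return merge(merge_sort(arr[:mid]), merge_sort(arr[mid:]))

print(merge_sort([38, 27, 43, 3, 9, 82, 10]))   # [3, 9, 10, 27, 38, 43, 82]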
Complexity Analysis:
1. Time Complexity:
○ Divide Step: Each divide step takes O(1) time.
○ Conquer Step: Recursively sort two halves, each of size n/2. The recurrence
relation for the time complexity is T(n) = 2T(n/2) + O(n), which solves to O(n log
n).
○ Combine Step: Merging two sorted halves takes O(n) time.
2. Space Complexity:
○ Merge Sort requires additional space for the temporary arrays used during the
merging process. At each level of recursion, temporary arrays are created to hold
the merged results, leading to a space complexity of O(n).
Example Walkthrough:
1. Divide:
○ [38, 27, 43, 3, 9, 82, 10]
○ Split into [38, 27, 43, 3] and [9, 82, 10]
2. Conquer (Recursive calls):
○ Split [38, 27, 43, 3] into [38, 27] and [43, 3]
○ Split [38, 27] into [38] and [27]
○ Split [43, 3] into [43] and [3]
○ Split [9, 82, 10] into [9, 82] and [10]
○ Split [9, 82] into [9] and [82]
3. Combine:
○ Merge [38] and [27] to get [27, 38]
○ Merge [43] and [3] to get [3, 43]
○ Merge [27, 38] and [3, 43] to get [3, 27, 38, 43]
○ Merge [9] and [82] to get [9, 82]
○ Merge [9, 82] and [10] to get [9, 10, 82]
○ Merge [3, 27, 38, 43] and [9, 10, 82] to get [3, 9, 10, 27, 38, 43, 82]
The final sorted array is [3, 9, 10, 27, 38, 43, 82].
Merge Sort is a stable sorting algorithm, meaning that it maintains the relative order of equal
elements in the sorted output. It is particularly useful for large datasets due to its consistent O(n
log n) time complexity.
Quick Sort
Quick Sort is a highly efficient sorting algorithm that follows the divide-and-conquer principle. It
works by selecting a 'pivot' element from the array and partitioning the other elements into two
sub-arrays according to whether they are less than or greater than the pivot. The sub-arrays are
then sorted recursively.
1. Choose a Pivot: Select an element from the array as the pivot. This can be done in
various ways, such as picking the first element, the last element, a random element, or
the median.
2. Partitioning: Rearrange the array so that all elements less than the pivot come before it,
and all elements greater than the pivot come after it. The pivot is now in its final position.
3. Recursively Sort the Sub-arrays: Apply the same process to the sub-arrays of
elements with smaller and larger values.
Pseudo-code:
function quickSort(arr, low, high):
    if low < high:
        // Partition the array and get the pivot index
        pi = partition(arr, low, high)
        // Recursively sort the elements before and after the pivot
        quickSort(arr, low, pi - 1)
        quickSort(arr, pi + 1, high)
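The partition helper used above is not defined in the pseudo-code; a minimal Lomuto-style sketch in Python (assuming the last element is chosen as the pivot) is:

def partition(arr, low, high):
    pivot = arr[high]                  # last element as the pivot
    i = low - 1                        # boundary of the "<= pivot" region
    for j in range(low, high):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[high] = arr[high], arr[i + 1]   # move the pivot to its final position
    return i + 1

def quick_sort(arr, low, high):
    if low < high:
        pi = partition(arr, low, high)
        quick_sort(arr, low, pi - 1)
        quick_sort(arr, pi + 1, high)

data = [10, 7, 8, 9, 1, 5]
quick_sort(data, 0, len(data) - 1)
print(data)                            # [1, 5, 7, 8, 9, 10]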
Time Complexity:
● Best Case: O(n log n) - This occurs when the pivot element divides the array into two
nearly equal halves.
● Average Case: O(n log n) - This occurs when the pivot element divides the array into
sub-arrays that are not highly unbalanced.
● Worst Case: O(n^2) - This occurs when the pivot element is either the smallest or the
largest element every time, leading to the most unbalanced partitions.
Space Complexity:
● In-place Sorting: O(log n) - This occurs due to the recursive stack space. Quick Sort is
typically implemented in place, so it doesn't require additional storage for elements.
● Not In-place Sorting: O(n) - If a non-in-place partitioning method is used, the space
complexity can be higher.
Detailed Explanation:
1. Choose a Pivot:
○ The pivot can be any element, but common choices are the first element, the last
element, or a random element. In the pseudo-code, the last element is chosen as
the pivot.
2. Partitioning:
○ The partition function rearranges the array so that all elements less than or equal
to the pivot are on the left, and all elements greater than the pivot are on the
right.
○ A pointer i is used to track the position of the last smaller element.
○ The loop iterates over each element, and if the element is smaller than the pivot,
it is swapped with the element at i+1.
3. Recursive Sorting:
○ After partitioning, the pivot is in its final position.
○ The array is then recursively sorted by applying the same procedure to the left
and right sub-arrays formed by the pivot.
Example:
Karatsuba Multiplication
Detailed Explanation
Traditional Multiplication
To multiply two n-digit numbers, the traditional approach involves performing n^2 single-digit
multiplications, which results in a time complexity of O(n^2).
Karatsuba's Insight
The Karatsuba algorithm improves on this by reducing the number of single-digit multiplications
to O(n^log₂(3)), approximately O(n^1.585). The key idea is to split each number into two halves
and use three multiplications instead of four.
Karatsuba's Approach
1. Split the numbers: For two numbers x and y of length n, split them into two halves:
○ x = x1 * 10^m + x0
○ y = y1 * 10^m + y0 Here, m is n/2, x1 and y1 are the higher-order halves,
and x0 and y0 are the lower-order halves.
2. Recursive multiplications:
○ z2 = karatsuba(x1, y1) (product of higher-order halves)
○ z0 = karatsuba(x0, y0) (product of lower-order halves)
○ z1 = karatsuba(x1 + x0, y1 + y0) - z2 - z0 (cross terms)
3. Combine the results:
○ The final result is z2 * 10^(2*m) + (z1 * 10^m) + z0.
Pseudo-code
function karatsuba(x, y):
    // Base case: single-digit numbers are multiplied directly
    if x < 10 or y < 10:
        return x * y
    n = max(length(x), length(y))
    m = n // 2
    // Split each number into a high half and a low half
    x1 = x // 10^m
    x0 = x % 10^m
    y1 = y // 10^m
    y0 = y % 10^m
    // Recursive multiplications
    z2 = karatsuba(x1, y1)
    z0 = karatsuba(x0, y0)
    z1 = karatsuba(x1 + x0, y1 + y0) - z2 - z0
    // Combine the results
    return z2 * 10^(2*m) + z1 * 10^m + z0
Time Complexity
The recurrence T(n) = 3T(n/2) + O(n) solves to O(n^log₂(3)), approximately O(n^1.585).
Space Complexity
The space complexity of the Karatsuba algorithm is O(n), due to the storage required for
intermediate results and the recursive call stack depth.
Breakdown of Steps
Closest-Pair Problem
The closest-pair problem involves finding the pair of points that are closest to each other in a set
of points in a plane. The most efficient known solution uses a divide-and-conquer approach.
function closestPair(points):
    // Sort points by x-coordinate and by y-coordinate once, up front
    px = sort(points, by_x_coordinate)
    py = sort(points, by_y_coordinate)
    return closestPairRec(px, py)

function closestPairRec(px, py):
    n = length(px)
    // Base case: solve tiny instances by brute force
    if n <= 3:
        return bruteForceClosestPair(px)
    // Divide
    mid = n // 2
    midPoint = px[mid]
    pxLeft = px[0:mid]
    pxRight = px[mid:n]
    pyLeft = [p for p in py if p.x <= midPoint.x]
    pyRight = [p for p in py if p.x > midPoint.x]
    // Conquer
    (d1, pair1) = closestPairRec(pxLeft, pyLeft)
    (d2, pair2) = closestPairRec(pxRight, pyRight)
    (d, bestPair) = (d1, pair1) if d1 <= d2 else (d2, pair2)
    // Combine: examine the vertical strip of width 2d around the dividing line
    strip = [p for p in py if abs(p.x - midPoint.x) < d]
    (dStrip, pairStrip) = (d, bestPair)
    for i from 0 to length(strip) - 1:
        // Only the next few points in y-order (at most 7) can be closer than d
        for j from i + 1 while j < length(strip) and strip[j].y - strip[i].y < dStrip:
            if distance(strip[i], strip[j]) < dStrip:
                (dStrip, pairStrip) = (distance(strip[i], strip[j]), (strip[i], strip[j]))
    if dStrip < d:
        return (dStrip, pairStrip)
    else:
        return (d, bestPair)

function bruteForceClosestPair(points):
    minDist = infinity
    bestPair = None
    for each pair (p, q) of distinct points:
        if distance(p, q) < minDist:
            minDist = distance(p, q)
            bestPair = (p, q)
    return (minDist, bestPair)
Time Complexity:
● O(n log n): sorting the points takes O(n log n), and the recurrence T(n) = 2T(n/2) + O(n) for the divide, conquer, and strip steps also solves to O(n log n).
Space Complexity:
● O(n) for the sorted copies of the points and the strip at each level of recursion.
Convex-Hull Problem
The convex hull of a set of points is the smallest convex polygon that contains all the points.
One of the efficient algorithms to find the convex hull is Graham's scan.
Time Complexity:
● Convex-Hull (Graham's Scan): O(n log n) due to sorting the points by angle.
Space Complexity:
● Convex-Hull (Graham's Scan): O(n) for storing the points on the hull.
Explanation
Closest-Pair Problem:
Key Insight: By combining the results from the two halves and the "strip" region near the
dividing line, we ensure that all potential closest pairs are considered efficiently.
Convex-Hull Problem (Graham's Scan):
1. Preprocessing: Find the point with the lowest y-coordinate (p0) and sort the points based on the angle they make with p0.
2. Initialize Hull: Start with the first three sorted points.
3. Iterate: For each point, ensure it makes a counterclockwise turn with the last two points
in the hull. If not, remove the last point from the hull until a counterclockwise turn is
made.
Key Insight: By sorting the points and ensuring only counterclockwise turns, we efficiently
construct the convex hull.
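No pseudo-code for Graham's scan is preserved here; the following runnable Python sketch follows the steps above (the cross-product test decides whether three points make a counterclockwise turn):

from math import atan2

def cross(o, a, b):
    # Cross product of OA x OB; positive means a counterclockwise turn at o
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def graham_scan(points):
    # Step 1: the point with the lowest y-coordinate (ties broken by x) is the anchor p0
    p0 = min(points, key=lambda p: (p[1], p[0]))
    # Step 2: sort the remaining points by polar angle around p0 (ties broken by distance)
    rest = sorted([p for p in points if p != p0],
                  key=lambda p: (atan2(p[1] - p0[1], p[0] - p0[0]),
                                 (p[0] - p0[0]) ** 2 + (p[1] - p0[1]) ** 2))
    hull = [p0]
    for p in rest:
        # Step 3: pop points that would make a clockwise (or collinear) turn
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull

print(graham_scan([(0, 0), (2, 0), (1, 1), (2, 2), (0, 2)]))
# [(0, 0), (2, 0), (2, 2), (0, 2)]; the interior point (1, 1) is discarded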
Control Abstraction in Greedy Algorithms
Control abstraction in the context of greedy algorithms refers to the high-level approach or
structure used to implement and control the execution of such algorithms. Greedy algorithms
are problem-solving techniques that make locally optimal choices at each step with the hope of
finding a global optimum solution.
1. Problem Definition: Clearly define the problem and the objective function to be
optimized.
2. Greedy Choice Property: Identify the greedy choice property, which states that at each
step, the locally optimal choice should be made without considering the consequences
of the choice on future steps. This choice is often based on a heuristic that guides the
selection process.
3. Feasibility: Ensure that the chosen solution at each step satisfies the problem
constraints and is feasible.
4. Optimality: Establish that the locally optimal choices made at each step collectively lead
to a globally optimal solution.
5. Iterative Approach: Implement a loop or iterative approach to repeatedly make greedy
choices until a termination condition is met. This typically involves iterating over the
elements of the problem instance and selecting the greedy choice at each step.
6. Termination Condition: Define a termination condition that stops the iterative process
when a satisfactory solution is obtained or when certain criteria are met.
7. Solution Construction: Construct the final solution based on the greedy choices made
at each step.
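The steps above can be condensed into a generic control abstraction; the Python sketch below is only illustrative, and select, feasible, union, and is_complete are placeholder functions that a concrete problem would have to supply:

def greedy(candidates, select, feasible, union, is_complete):
    # Generic greedy control abstraction: repeatedly make the locally optimal choice
    # and keep it only if the partial solution remains feasible.
    solution = []
    candidates = list(candidates)
    while candidates and not is_complete(solution):
        x = select(candidates)             # greedy choice (locally optimal)
        candidates.remove(x)
        if feasible(solution, x):          # does adding x keep the solution valid?
            solution = union(solution, x)
    return solution

The Fractional Knapsack problem below is one concrete instance of this template.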
Time Complexity: The time complexity of a greedy algorithm depends on the specific problem
being solved and the efficiency of the implementation. In general, the time complexity can range
from linear to polynomial time, depending on the problem size and the computational complexity
of the greedy choice evaluation.
Space Complexity: The space complexity of a greedy algorithm typically depends on the data
structures used for problem representation and solution construction. In many cases, the space
complexity is linear or logarithmic in the size of the input data.
Fractional Knapsack
In the Fractional Knapsack problem, we are given a set of items, each with a weight and a
value, and a knapsack with a maximum weight capacity. The goal is to select a subset of items
to maximize the total value while keeping the total weight within the capacity of the knapsack.
Unlike the 0/1 Knapsack problem, in the Fractional Knapsack problem, we can take fractions of
items, i.e., we can take a part of an item if it maximizes the value.
Pseudo-code:
function fractionalKnapsack(items, capacity):
    // Sort items by value per unit weight in non-increasing order
    sort items by (value / weight) in descending order
    totalValue = 0
    remainingCapacity = capacity
    for each item in items:
        if remainingCapacity == 0:
            break
        if item.weight <= remainingCapacity:
            // Take the whole item
            totalValue += item.value
            remainingCapacity -= item.weight
        else:
            // Take the fraction of the item that still fits
            totalValue += item.value * (remainingCapacity / item.weight)
            remainingCapacity = 0
    return totalValue
Time Complexity:
● Sorting the items based on value per unit weight takes O(n log n) time, where n is the
number of items.
● The loop to select items takes O(n) time.
● Therefore, the overall time complexity is O(n log n) + O(n) = O(n log n), dominated by
the sorting.
Space Complexity:
● The space complexity is O(1) because we only need a constant amount of extra space
for variables like totalValue and remainingCapacity.
Explanation:
1. Sorting: First, we sort the items based on their value per unit weight in non-ascending
order. This allows us to consider items with higher value-to-weight ratios first, maximizing
the value we can obtain.
2. Greedy Selection: We iterate through the sorted items and select items as long as there
is remaining capacity in the knapsack. If the weight of the current item is less than or
equal to the remaining capacity, we take the entire item. Otherwise, we take a fraction of
the item that fits into the remaining capacity, maximizing the value.
3. Return: Finally, we return the total value obtained by selecting items.
The Fractional Knapsack problem can be efficiently solved using a greedy strategy, where we
always choose the item with the highest value per unit weight at each step. The pseudo-code
provided outlines this approach and achieves an optimal solution.
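A runnable Python version of the same greedy strategy, with a small made-up item set, might look like this:

def fractional_knapsack(items, capacity):
    # items: list of (value, weight) pairs; capacity: maximum total weight
    items = sorted(items, key=lambda it: it[0] / it[1], reverse=True)
    total_value = 0.0
    remaining = capacity
    for value, weight in items:
        if remaining == 0:
            break
        take = min(weight, remaining)          # whole item, or the fraction that still fits
        total_value += value * (take / weight)
        remaining -= take
    return total_value

# Items given as (value, weight); the expected answer for this classic instance is 240.0
print(fractional_knapsack([(60, 10), (100, 20), (120, 30)], 50))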
Prim's Algorithm:
1. Greedy Strategy: Prim's algorithm follows a greedy strategy, meaning it selects the
locally optimal choice at each step, which eventually leads to the globally optimal
solution.
2. Starting Point: Prim's algorithm requires a starting vertex to begin the construction of
the MST. The choice of the starting vertex does not affect the total weight of the resulting
MST; when all edge weights are distinct, the MST itself is unique regardless of the starting point.
3. Priority Queue: Prim's algorithm uses a priority queue to efficiently select the next edge
to include in the MST. At each step, it chooses the edge with the smallest weight that
connects a vertex in the MST to a vertex outside the MST. This ensures that the
algorithm always selects the edge that contributes the least to the total weight of the
MST.
4. Lazy Implementation: Prim's algorithm can be implemented using a lazy approach,
where edges are added to the priority queue without removing duplicates. This approach
simplifies the implementation but may lead to redundant edges in the priority queue and
slightly higher time complexity.
Pseudo code:
function prim(graph):
    MST = empty list            // edges of the minimum spanning tree
    inMST = empty set           // vertices already in the MST
    pq = empty priority queue   // candidate edges, keyed by weight
    pick an arbitrary start vertex s; add s to inMST
    add every edge incident to s to pq
    while pq is not empty and inMST does not contain every vertex:
        select the edge (u, v) with the smallest weight from the priority queue
        if the edge connects a vertex in inMST to a vertex v not in inMST:
            add v to inMST
            add the edge to the MST list
            add every edge incident to v leading outside inMST to pq
    return MST
Time Complexity:
● O(E log V) with a binary-heap priority queue, where V is the number of vertices and E is the number of edges.
Space Complexity:
● O(V) for the MST edge list and the visited set; the lazy priority queue may additionally hold up to O(E) edges.
Kruskal's Algorithm:
1. Sorting Edges: Kruskal's algorithm begins by sorting all the edges of the graph in
non-decreasing order of their weights. This sorting step is essential to ensure that the
algorithm considers edges in increasing order of weight during the MST construction
process.
2. Disjoint-Set Data Structure: Kruskal's algorithm utilizes a disjoint-set data structure
(also known as union-find) to efficiently determine whether adding an edge to the MST
will create a cycle. This data structure allows the algorithm to quickly check whether two
vertices belong to the same connected component and avoid adding edges that would
create cycles.
3. Edge Selection: Kruskal's algorithm iterates over the sorted edges and greedily adds
edges to the MST as long as they do not form cycles. This approach ensures that the
algorithm always selects the smallest available edge that connects two disjoint
components of the graph, leading to the construction of the MST.
4. Optimizations: Kruskal's algorithm can be further optimized by using techniques such
as path compression and union by rank in the disjoint-set data structure. These
optimizations help improve the overall efficiency of the algorithm, particularly for graphs
with a large number of vertices and edges.
Pseudo-code:
function kruskal(graph):
    sort edges of graph in ascending order of weight
    MST = empty list of edges
    initialize a disjoint-set data structure with each vertex in its own component
    for each edge (u, v, w) in sorted order:
        if find(u) != find(v):      // the edge joins two different components (no cycle)
            union(u, v)
            add (u, v, w) to MST
    return MST
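The disjoint-set (union-find) structure referenced above, with the path-compression and union-by-rank optimizations mentioned earlier, can be sketched in Python as follows (class and method names are illustrative):

class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # Path compression: point x directly at the root of its set
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False            # already in the same component (adding the edge would form a cycle)
        # Union by rank: attach the shallower tree under the deeper one
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True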
Time Complexity:
● O(E log E), which is equivalent to O(E log V): sorting the edges dominates, and the union-find operations are nearly constant time with path compression and union by rank.
Space Complexity:
● O(V) for the disjoint-set structure, in addition to the storage for the sorted edge list.
Comparison:
● Time Complexity: Prim's algorithm typically has a time complexity of O(E log V), while
Kruskal's algorithm has a time complexity of O(E log E) or O(E log V), depending on the
implementation. In practice, Kruskal's algorithm may be faster for sparse graphs with
fewer edges, while Prim's algorithm may be faster for dense graphs with more edges.
● Space Complexity: Both algorithms have a space complexity of O(V) to store data
structures such as priority queues and disjoint-set forests.
● Implementation Complexity: Prim's algorithm is generally easier to implement
compared to Kruskal's algorithm, especially when using a priority queue data structure.
Kruskal's algorithm requires additional bookkeeping for edge sorting and cycle detection
using disjoint-set data structures.
Dijkstra's Algorithm
Explanation:
1. Initialization:
○ Create an array dist[] to store the shortest distance from the source vertex to
each vertex in the graph. Initialize dist[] with infinity except for the source
vertex, which is initialized to 0.
○ Create a set visited[] to keep track of vertices whose shortest distance from
the source vertex is already calculated. Initially, all vertices are unvisited.
2. Main Loop:
○ Repeat until all vertices are visited:
■ Choose the unvisited vertex u with the minimum dist[u].
■ Mark u as visited.
■ For every unvisited neighbor v of u, if dist[u] + weight(u, v) < dist[v], set
dist[v] = dist[u] + weight(u, v). This relaxation keeps dist[v] equal to the
shortest distance found so far from the source to v.
3. Result:
○ After the algorithm terminates, dist[] contains the shortest distances from the
source vertex to all other vertices in the graph.
Pseudo-code:
function Dijkstra(graph, source):
    for each vertex v in graph:
        dist[v] = infinity
    dist[source] = 0
    visited = empty set
    while there is an unvisited vertex:
        u = unvisited vertex with the smallest dist[u]
        add u to visited
        for each neighbor v of u not in visited:
            if dist[u] + weight(u, v) < dist[v]:
                dist[v] = dist[u] + weight(u, v)
    return dist
Time Complexity:
● The time complexity of Dijkstra's algorithm depends on the data structure used to
implement the priority queue.
● With a simple implementation using arrays to store distances, the time complexity is
O(V^2) for dense graphs, where V is the number of vertices.
● With a binary heap or Fibonacci heap as the priority queue, the time complexity is O((V +
E) log V), where E is the number of edges.
Space Complexity:
● The space complexity of Dijkstra's algorithm is O(V), where V is the number of vertices.
This includes the space for the dist[] array and the visited[] set.
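For the heap-based variant mentioned above, a minimal runnable Python sketch using heapq might look like this (the graph format, a dict of adjacency lists, is an assumption made for the example):

import heapq

def dijkstra(graph, source):
    # graph: dict mapping each vertex to a list of (neighbor, weight) pairs
    dist = {v: float('inf') for v in graph}
    dist[source] = 0
    pq = [(0, source)]                      # (distance, vertex) min-heap
    visited = set()
    while pq:
        d, u = heapq.heappop(pq)
        if u in visited:
            continue                        # stale entry: a shorter path was already found
        visited.add(u)
        for v, w in graph[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

g = {'A': [('B', 1), ('C', 4)], 'B': [('C', 2), ('D', 5)], 'C': [('D', 1)], 'D': []}
print(dijkstra(g, 'A'))   # {'A': 0, 'B': 1, 'C': 3, 'D': 4}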
Control Abstraction in Dynamic Programming
Control abstraction in dynamic programming refers to the design pattern or methodology used
to encapsulate the repetitive control flow and decision-making logic involved in solving
optimization problems using dynamic programming techniques. By abstracting away the details
of how the problem is solved, control abstraction allows programmers to focus on the problem's
essence and promotes code reuse and maintainability.
1. Define Subproblems: Identify the subproblems that need to be solved to find the
optimal solution to the original problem. Break down the problem into smaller,
overlapping subproblems.
2. Memoization or Tabulation: Decide whether to use memoization (top-down) or
tabulation (bottom-up) to store the solutions to subproblems and avoid redundant
computations. Memoization involves caching the results of already solved subproblems,
while tabulation involves filling up a table iteratively.
3. Recurrence Relation: Formulate a recurrence relation that expresses the optimal
solution to the original problem in terms of solutions to its subproblems. This relation
defines the dependency between subproblems and helps in solving them recursively or
iteratively.
4. Abstract Control Flow: Implement control flow constructs such as loops, conditionals,
and function calls to orchestrate the solution process. Use control abstraction to
encapsulate the repetitive decision-making logic and simplify the code structure.
5. Optimization: Leverage optimization techniques such as pruning, early termination, or
problem-specific optimizations to improve the efficiency of the dynamic programming
solution. These optimizations can be integrated into the control abstraction to enhance
performance.
6. Error Handling and Edge Cases: Handle error conditions and edge cases gracefully
within the control abstraction. Ensure that the solution is robust and handles corner
cases correctly.
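As a tiny illustration of the memoization-versus-tabulation choice in step 2, the sketch below computes Fibonacci numbers both ways (Fibonacci stands in for an arbitrary problem with overlapping subproblems):

from functools import lru_cache

@lru_cache(maxsize=None)
def fib_memo(n):
    # Top-down (memoization): results of solved subproblems are cached automatically
    if n <= 1:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

def fib_tab(n):
    # Bottom-up (tabulation): fill a table from the smallest subproblems upward
    table = [0, 1] + [0] * max(0, n - 1)
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]

print(fib_memo(10), fib_tab(10))   # 55 55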
Knapsack (0/1)
The 0/1 knapsack problem is a classic optimization problem where given a set of items, each
with a weight and a value, the goal is to determine the maximum value that can be obtained by
selecting a subset of the items while ensuring that the total weight does not exceed a given
capacity.
Here's how you can approach the 0/1 knapsack problem using dynamic programming:
function knapsack01(values, weights, n, W):
    // dp[i][w] = best value using the first i items with remaining capacity w
    dp = new array[n+1][W+1] initialized to 0
    // Bottom-up iteration
    for i from 1 to n:
        for w from 1 to W:
            if weights[i-1] <= w:
                dp[i][w] = max(values[i-1] + dp[i-1][w-weights[i-1]], dp[i-1][w])
            else:
                dp[i][w] = dp[i-1][w]
    return dp[n][W]
Time Complexity: O(nW), where n is the number of items and W is the capacity of the
knapsack.
Space Complexity: O(nW), the space required to store the dynamic programming table.
This dynamic programming approach efficiently solves the 0/1 knapsack problem by considering
all possible combinations of items and capacities in a bottom-up manner, leading to an optimal
solution.
Matrix chain multiplication
Matrix chain multiplication is another classic optimization problem where given a sequence of
matrices, the goal is to determine the most efficient way to multiply these matrices together. The
order of multiplication significantly affects the total number of scalar multiplications needed to
compute the final product.
Here's how you can approach the matrix chain multiplication problem using dynamic
programming:
function matrixChainOrder(p, n):
    // p[0..n] holds the dimensions; matrix i has dimensions p[i-1] x p[i]
    dp = new array[n+1][n+1]
    // Base case: a single matrix requires no multiplication
    for i from 1 to n:
        dp[i][i] = 0
    // Bottom-up iteration over increasing chain lengths
    for l from 2 to n:
        for i from 1 to n-l+1:
            j = i + l - 1
            dp[i][j] = infinity
            for k from i to j-1:
                cost = dp[i][k] + dp[k+1][j] + p[i-1] * p[k] * p[j]
                dp[i][j] = min(dp[i][j], cost)
    return dp[1][n]
Time Complexity: O(n^3), since there are O(n^2) subchains and each is split at O(n) possible positions.
Space Complexity: O(n^2), the space required to store the dynamic programming table.
This dynamic programming approach efficiently solves the matrix chain multiplication problem
by considering all possible partition points within the matrix chain and computing the minimum
cost of multiplication for each subchain length. The final result represents the minimum cost of
multiplying the entire matrix chain.
Longest Common Subsequence (LCS)
The longest common subsequence (LCS) problem is a classic dynamic programming problem
where given two sequences (e.g., strings), the goal is to find the longest subsequence that is
present in both of them.
Here's how you can approach the LCS problem using dynamic programming:
1. Initialization: Create a 2D array dp of size (m+1) x (n+1), where m and n are the
lengths of the two sequences. Initialize the first row and first column of dp with zeros.
2. Bottom-up Iteration:
○ Iterate over each character i of the first sequence and each character j of the
second sequence.
○ If the characters match (sequence1[i-1] == sequence2[j-1]), set
dp[i][j] to dp[i-1][j-1] + 1.
○ If the characters don't match, set dp[i][j] to the maximum of dp[i-1][j]
and dp[i][j-1], representing the longest common subsequence found so far.
3. Result: The length of the longest common subsequence is stored in dp[m][n].
function LCS(sequence1, sequence2):
    m = length(sequence1)
    n = length(sequence2)
    dp = new array[m+1][n+1] initialized to 0
    // Bottom-up iteration
    for i from 1 to m:
        for j from 1 to n:
            if sequence1[i-1] == sequence2[j-1]:
                dp[i][j] = dp[i-1][j-1] + 1
            else:
                dp[i][j] = max(dp[i-1][j], dp[i][j-1])
    return dp[m][n]
Time Complexity: O(m * n), where m and n are the lengths of the two sequences.
Space Complexity: O(m * n), the space required to store the dynamic programming table.
This dynamic programming approach efficiently solves the LCS problem by considering all
possible subproblems and building up the solution from smaller subproblems. The final result
represents the length of the longest common subsequence between the two input sequences.
Floyd-Warshall Algorithm
The Floyd-Warshall algorithm finds the shortest paths between all pairs of vertices in a weighted graph by allowing each vertex, in turn, to act as an intermediate vertex on the paths between every pair.
Pseudo-code:
function floydWarshall(graph):
V = number of vertices in the graph
dist[][] = new array[V][V]
// Initialize distances
for i from 0 to V-1:
for j from 0 to V-1:
if i == j:
dist[i][j] = 0
else if there is an edge from i to j:
dist[i][j] = weight of the edge
else:
dist[i][j] = infinity
// Main loop
for k from 0 to V-1:
for i from 0 to V-1:
for j from 0 to V-1:
dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])
return dist
Time Complexity:
● The time complexity of the Floyd-Warshall algorithm is O(V^3), where V is the number of
vertices in the graph.
Space Complexity:
● The space complexity is O(V^2), as we need to store the distances between all pairs of
vertices in a 2D array.
n-Queen problem
The N-Queens problem is a classic combinatorial problem in the field of computer science and
mathematics. The goal is to place N queens on an N×N chessboard in such a way that no two
queens threaten each other. In other words, no two queens can share the same row, column, or
diagonal.
Problem Statement:
Given an N×N chessboard, find a placement of N queens such that no two queens attack each
other.
Backtracking Algorithm:
One of the most common ways to solve the N-Queens problem is by using a backtracking
algorithm. The backtracking algorithm works by recursively exploring all possible placements of
queens on the chessboard and backtracking when a solution cannot be found.
Algorithm Steps:
1. Place queens column by column, trying each row of the current column in turn.
2. Before placing a queen, check that it is not attacked by any previously placed queen (same row, column, or diagonal).
3. If a safe row is found, place the queen and recurse on the next column; if no row works, backtrack to the previous column and move that queen.
Pseudo-code:
function solveNQueens(N):
    solutions = empty list
    placeQueens(emptyBoard(N), column = 0, solutions)   // recursive backtracking helper
    return solutions
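The recursive helper is only named in the pseudo-code above; a runnable Python sketch of the whole backtracking search, which tracks attacked columns and diagonals in sets, might look like this:

def solve_n_queens(n):
    solutions = []
    cols, diag1, diag2 = set(), set(), set()   # attacked columns and diagonals
    placement = []                             # placement[row] = column of that row's queen

    def place(row):
        if row == n:
            solutions.append(placement[:])     # a complete, valid placement was found
            return
        for col in range(n):
            if col in cols or (row - col) in diag1 or (row + col) in diag2:
                continue                       # this square is attacked, try the next column
            cols.add(col); diag1.add(row - col); diag2.add(row + col)
            placement.append(col)
            place(row + 1)
            placement.pop()                    # backtrack
            cols.discard(col); diag1.discard(row - col); diag2.discard(row + col)

    place(0)
    return solutions

print(solve_n_queens(4))   # [[1, 3, 0, 2], [2, 0, 3, 1]], the two solutions for N = 4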
Time Complexity:
● The time complexity of the backtracking algorithm for the N-Queens problem is
exponential, O(N!), where N is the size of the chessboard. This is because the number of
possible configurations to explore grows rapidly with the size of the board.
Space Complexity:
● The space complexity is O(N) for the recursion stack and the record of queen positions (O(N^2) if the full board is stored explicitly).
Example:
. . Q . // Solution 1
Q . . .
. . . Q
. Q . .
. Q . . // Solution 2
. . . Q
Q . . .
. . Q .
The solutions above show valid placements of queens on a 4×4 chessboard where no two
queens threaten each other.
Graph Coloring Problem
The Graph Coloring Problem is a classic problem in graph theory and computer science. The
goal is to assign colors to the vertices of a graph in such a way that no two adjacent vertices
have the same color, while using the fewest number of colors possible. This problem has many
real-world applications, such as scheduling, register allocation in compilers, and map coloring.
Problem Statement:
Given an undirected graph G, the Graph Coloring Problem asks whether it is possible to assign
colors to the vertices of G such that no two adjacent vertices share the same color.
Naive Approach:
One simple approach to solve the Graph Coloring Problem is to try all possible color
assignments recursively and backtrack when a valid coloring cannot be achieved. This
brute-force approach is exponential in time complexity and is not feasible for large graphs.
A more efficient approach to solve the Graph Coloring Problem is to use a greedy algorithm.
The basic idea is to iteratively assign colors to the vertices of the graph, starting from an
arbitrary vertex and using the smallest available color that does not conflict with the colors of
adjacent vertices.
1. Initialization: Start with an empty list of colors and an empty set of colored vertices.
2. Vertex Selection: Select a vertex from the graph.
3. Color Assignment: Assign the smallest available color to the selected vertex such that
it does not conflict with the colors of adjacent vertices.
4. Repeat: Repeat steps 2 and 3 for all remaining uncolored vertices until all vertices are
colored.
5. Result: After the algorithm terminates, the list of colors assigned to the vertices
represents a valid coloring of the graph.
Pseudo-code:
function greedyColoring(graph):
    colors = empty map from vertex to color
    for each vertex v in graph:
        // Colors already used by v's neighbors
        used = { colors[u] for each neighbor u of v that is already colored }
        colors[v] = smallest color not in used
    return colors
Time Complexity:
● The time complexity of the greedy coloring algorithm depends on the method used to
check for conflicts and the ordering of vertices. In the worst case, the algorithm has a
time complexity of O(V^2), where V is the number of vertices in the graph.
Space Complexity:
● The space complexity is O(V), where V is the number of vertices in the graph, as the
algorithm only needs to store the colors assigned to the vertices.
Hamiltonian Cycle Problem
Problem Statement:
Given an undirected graph G, the Hamiltonian Cycle problem asks whether there exists a
cycle that visits each vertex exactly once and returns to the starting vertex.
Approach:
A common approach to solve the Hamiltonian Cycle problem is to use a backtracking algorithm.
This approach involves trying to build the cycle step by step and backtracking when a partial
solution cannot be extended to a complete solution.
1. Initialization: Start with an initial vertex and mark it as visited. Create an empty path and
add the initial vertex to it.
2. Recursive Function:
○ Recursively try to add vertices to the path one by one.
○ Check if the current vertex can be added to the path without violating the
Hamiltonian Cycle constraints (i.e., no vertex is visited more than once and the
added vertex is adjacent to the last vertex in the path).
○ If adding the current vertex results in a complete Hamiltonian cycle (i.e., the path
includes all vertices and the last vertex is adjacent to the starting vertex), return
true.
○ If no valid vertex can be added, backtrack by removing the last added vertex and
try the next possibility.
3. Termination: If a Hamiltonian cycle is found, return true; otherwise, return false.
Pseudo-code:
function hamiltonianCycle(graph, path):
    if length(path) == number of vertices in graph:
        return path[last] is adjacent to path[0]   // the cycle can close back to the start
    for each vertex v adjacent to path[last] and not already in path:
        append v to path
        if hamiltonianCycle(graph, path):
            return true
        remove v from path                         // backtrack
    return false
Time Complexity:
● The time complexity of the backtracking algorithm for the Hamiltonian Cycle problem is
O(V!), where V is the number of vertices in the graph. This is because, in the worst
case, the algorithm explores all possible permutations of the vertices.
Space Complexity:
● The space complexity is O(V), where V is the number of vertices in the graph. This is
due to the space needed to store the path and the recursion stack.
Example:
Consider the following undirected graph:
A - B
| \ |
D - C
Adjacency list:
A: [B, C, D]
B: [A, C]
C: [A, B, D]
D: [A, C]
This graph contains a Hamiltonian cycle, for example A → B → C → D → A.
Depth-First Search (DFS) is a search algorithm that explores a graph by going as deep as
possible along each branch before backtracking. It uses a Last-In-First-Out (LIFO) data
structure, such as a stack, to manage the nodes to be explored next.
Pseudo-code:
function DFS(graph, start):
    visited = empty set
    stack = [start]
    while stack is not empty:
        node = stack.pop()                 // LIFO: the most recently added node first
        if node not in visited:
            add node to visited
            push every unvisited neighbor of node onto stack
    return visited
Time Complexity:
● Time Complexity: O(V + E), where V is the number of vertices and E is the number of
edges.
● Space Complexity: O(V) for the stack and visited set.
Breadth-First Search (BFS) is a search algorithm that explores a graph layer by layer, visiting
all the neighbors of a node before moving to the next level. It uses a First-In-First-Out (FIFO)
data structure, such as a queue, to manage the nodes to be explored next.
Pseudo-code:
function BFS(graph, start):
    visited = { start }
    queue = [start]
    while queue is not empty:
        node = queue.dequeue()             // FIFO: the earliest added node first
        for each neighbor of node:
            if neighbor not in visited:
                add neighbor to visited
                queue.enqueue(neighbor)
    return visited
Time Complexity:
● Time Complexity: O(V + E), where V is the number of vertices and E is the number of
edges.
● Space Complexity: O(V) for the queue and visited set.
Dijkstra's Algorithm is a search algorithm used to find the shortest paths from a single source
vertex to all other vertices in a weighted graph with non-negative edge weights. It uses a priority
queue to always expand the least-cost node next.
The pseudo-code is the same as the Dijkstra pseudo-code given earlier: repeatedly remove the vertex with the smallest tentative distance from the priority queue, mark it visited, and relax its outgoing edges.
Time Complexity:
● O((V + E) log V) with a binary-heap priority queue; O(V^2) with a simple array implementation.
Summary:
● LIFO Search (DFS): Uses a stack, explores as deep as possible, suitable for exploring
all nodes and paths.
● FIFO Search (BFS): Uses a queue, explores all neighbors level by level, suitable for
finding the shortest path in an unweighted graph.
● Least Cost Search (Dijkstra's Algorithm): Uses a priority queue, finds the shortest
path in a weighted graph with non-negative edge weights.
15 Puzzle Problem
The 15 Puzzle (also called the 16 Puzzle or the Sliding Puzzle) is a classic problem in
combinatorial mathematics and computer science. It consists of a 4x4 grid with 15 numbered
tiles and one empty space. The goal is to arrange the tiles in numerical order by making sliding
moves that use the empty space.
Problem Statement
Given a 4x4 grid with tiles numbered from 1 to 15 and one empty space, the objective is to
move the tiles until the numbers are in order from 1 to 15 with the empty space in the last
position.
BFS can be used to find the shortest sequence of moves to solve the puzzle. It explores all
possible moves level by level starting from the initial configuration.
function getNeighbors(state):
    neighbors = []
    for each legal move of the blank tile (up, down, left, right):
        newState = copy of state with the blank swapped into the new position
        newState.parent = state            // remember where we came from
        neighbors.append(newState)
    return neighbors
function reconstructPath(state):
path = []
while state.parent is not None:
path.append(state)
state = state.parent
path.reverse()
return path
Time Complexity:
● Exponential in the worst case: BFS may have to explore a large fraction of the 16!/2 reachable board configurations, and it must also keep every visited state in memory.
Travelling Salesman Problem (TSP)
Problem Statement
Given a set of n cities and a distance function d(i, j) representing the distance between city i
and city j, find a tour (a cycle that visits each city exactly once) of minimum total distance.
Brute-Force Approach
The brute-force approach to solve the TSP is to generate all possible permutations of the cities,
calculate the total distance for each permutation, and select the permutation with the minimum
total distance.
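As a concrete illustration of this brute-force approach, the following Python sketch enumerates tours with itertools.permutations (the 4-city distance matrix is made up for the example):

from itertools import permutations

def tsp_brute_force(dist):
    # dist[i][j] is the distance from city i to city j
    n = len(dist)
    cities = range(1, n)                   # fix city 0 as the start to avoid counting rotations
    best_cost, best_tour = float('inf'), None
    for perm in permutations(cities):
        tour = (0,) + perm + (0,)
        cost = sum(dist[tour[k]][tour[k + 1]] for k in range(n))
        if cost < best_cost:
            best_cost, best_tour = cost, tour
    return best_cost, best_tour

d = [[0, 10, 15, 20],
     [10, 0, 35, 25],
     [15, 35, 0, 30],
     [20, 25, 30, 0]]
print(tsp_brute_force(d))   # (80, (0, 1, 3, 2, 0))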
Time Complexity:
● Time Complexity: O(n!), where n is the number of cities, because there are n! permutations to evaluate.
● Space Complexity: O(n), primarily for storing the current permutation and distances.
The classes of NP-Hard and NP-Complete problems are central to the study of computational
complexity theory. Understanding these classes involves knowing about the broader class of
problems known as NP (nondeterministic polynomial time).
Definitions
NP
A problem is in NP if a proposed solution (a certificate) can be verified in polynomial time by a deterministic algorithm.
NP-Hard
A problem is NP-Hard if every problem in NP can be reduced to it in polynomial time. An NP-Hard problem need not itself belong to NP.
NP-Complete
A problem is NP-Complete if it is both in NP and NP-Hard.
Relationships
● P ⊆ NP: All problems that can be solved in polynomial time can also be verified in
polynomial time.
● NP-Complete ⊆ NP-Hard: NP-Complete problems are a subset of NP-Hard problems.
● P vs NP Question: It is an open question whether P = NP, i.e., whether every problem
whose solution can be verified in polynomial time can also be solved in polynomial time.
Examples
NP-Complete Problems
1. Traveling Salesman Problem (Decision Version): Given a set of cities and distances
between them, is there a tour of length at most k?
2. Boolean Satisfiability Problem (SAT): Given a Boolean formula, is there an
assignment of truth values to variables that makes the formula true?
3. Clique Problem: Given a graph and an integer k, is there a clique (a subset of
vertices, all of which are adjacent to each other) of size k?
4. Subset Sum Problem: Given a set of integers, is there a non-empty subset whose sum
is zero?
NP-Hard Problems
1. Halting Problem: Given a program and an input, does the program halt or run forever?
(This problem is undecidable and not in NP, but it's NP-Hard.)
2. General Travelling Salesman Problem (Optimization Version): Finding the shortest
possible route that visits each city and returns to the origin city. (Optimization problems
like these are generally NP-Hard but not in NP because the solution is not just a yes/no
answer.)
Reductions
Summary
● NP-Hard Problems: At least as hard as the hardest problems in NP. Solving any
NP-Hard problem in polynomial time would solve all NP problems in polynomial time.
● NP-Complete Problems: Problems that are both in NP and NP-Hard. These are the
hardest problems in NP, and finding a polynomial-time solution to any NP-Complete
problem would imply P = NP.
Understanding NP-Hard and NP-Complete problems helps in recognizing the inherent difficulty
of many computational problems and guides the development of algorithms and heuristics for
practical solutions.
Brute Force String Matching
Problem Statement
Given a text T of length n and a pattern P of length m, find all occurrences of P in T.
Approach
The brute force approach involves checking for the pattern P starting from each possible
position in the text T. For each position, it checks whether the substring of T starting at that
position matches P.
Pseudo-code
function BruteForceStringMatching(T, P):
n = length(T)
m = length(P)
for i from 0 to n - m:
j = 0
while j < m and T[i + j] == P[j]:
j += 1
if j == m:
print("Pattern found at index", i)
Explanation
1. Initialization:
○ n: Length of the text T.
○ m: Length of the pattern P.
2. Outer Loop: Iterate over each possible starting position i in the text T where P could
potentially match. This loop runs from 0 to n - m because if the pattern starts at n - m + 1
or later, there aren't enough characters left in the text for the pattern to fit.
3. Inner Loop: For each starting position i, check each character of P against the
corresponding character of T. The inner loop increments j as long as the characters match.
4. Check Match: If the inner loop completes and j equals m, then P matches T starting at index i.
5. Output: Print the starting index i whenever a match is found.
Time Complexity
● Worst case: O(n·m), since up to m character comparisons may be made at each of the n - m + 1 starting positions.
Space Complexity
● O(1) extra space; only a few index variables are needed.
Example
For example, with text T = "ABABABAB" and pattern P = "ABAB":
1. i = 0: Compare "ABAB" with "ABAB" (match found at index 0).
2. i = 1: Compare "BABA" with "ABAB" (no match).
3. i = 2: Compare "ABAB" with "ABAB" (match found at index 2).
4. i = 3: Compare "BABA" with "ABAB" (no match).
5. i = 4: Compare "ABAB" with "ABAB" (match found at index 4).
Summary
Brute Force String Matching is easy to understand and implement but can be inefficient for large
texts and patterns due to its O(n·m) time complexity. More efficient algorithms like the
Knuth-Morris-Pratt (KMP) or Boyer-Moore algorithms are often used in practice for large-scale
string matching problems. However, the brute force approach is useful in scenarios where the
simplicity of implementation outweighs the need for efficiency.
Knuth-Morris-Pratt (KMP) Algorithm
Problem Statement
Given a text T of length n and a pattern P of length m, find all occurrences of P in T.
Approach
1. Preprocessing the Pattern: Construct the longest prefix suffix (lps) array.
2. Searching the Text: Use the lps array to perform efficient matching.
Pseudo-code
The lps array is constructed to record the longest proper prefix of the pattern that is also a suffix
for each position in the pattern. This helps in determining the next positions to be compared
when a mismatch occurs.
function computeLPSArray(P):
m = length(P)
lps = array of size m initialized to 0
length = 0 // length of previous longest prefix suffix
i = 1
while i < m:
if P[i] == P[length]:
length += 1
lps[i] = length
i += 1
else:
if length != 0:
length = lps[length - 1]
else:
lps[i] = 0
i += 1
return lps
Using the lps array, perform the pattern search on the text.
function KMPSearch(T, P):
    n = length(T)
    m = length(P)
    lps = computeLPSArray(P)
    i = 0 // index for T
    j = 0 // index for P
    while i < n:
        if P[j] == T[i]:
            i += 1
            j += 1
        if j == m:
            print("Pattern found at index", i - j)
            j = lps[j - 1]
        elif i < n and P[j] != T[i]:
            if j != 0:
                j = lps[j - 1]
            else:
                i += 1
Explanation
1. computeLPSArray(P):
○ Initialize the lps array and set the length of the previous longest prefix suffix to 0.
○ Iterate over the pattern. If characters match, increment the length and update the
lps array.
○ If characters do not match and the length is not zero, update the length to the
value at the previous index in the lps array.
○ If characters do not match and the length is zero, set the current lps value to 0
and move to the next index.
2. KMPSearch(T, P):
○ Initialize indices for the text and pattern.
○ Iterate over the text. If characters match, move both indices forward.
○ If the entire pattern matches, record the match position and use the lps array to
update the pattern index.
○ If characters do not match and the pattern index is not zero, use the lps array to
update the pattern index.
○ If characters do not match and the pattern index is zero, move the text index
forward.
Time Complexity
● Time Complexity:
○ Preprocessing the pattern (computing the lps array): O(m)
○ Searching the text: O(n)
○ Total time complexity: O(n + m)
● Space Complexity:
○ The lps array requires O(m) space.
○ Overall space complexity: O(m)
Example
Summary
The Knuth-Morris-Pratt (KMP) algorithm efficiently solves the string matching problem by
preprocessing the pattern to create an lps array. This allows the algorithm to skip unnecessary
comparisons during the search phase, resulting in a linear time complexity relative to the length
of the text and pattern. This makes KMP a significant improvement over the brute force string
matching approach.
Rabin-Karp Algorithm
The Rabin-Karp algorithm is a string searching algorithm that uses hashing to find occurrences
of a pattern within a text efficiently. It computes the hash value of the pattern and then slides a
window over the text, computing the hash value of each window and comparing it with the hash
value of the pattern.
Problem Statement
Given a text T of length n and a pattern P of length m, find all occurrences of P in T.
Approach
The Rabin-Karp algorithm uses hashing to efficiently find matches between the pattern and
substrings of the text. It works as follows:
1. Hashing: Compute the hash value of the pattern P and of each window of length m in
the text T.
2. Comparison: Compare the hash value of each window with the hash value of the
pattern. If the hash values match, perform a character-by-character comparison to
confirm the match.
3. Slide Window: Move the window one position to the right and recompute the hash value
of the new window.
Pseudo-code
function RabinKarp(T, P):
    n = length(T)
    m = length(P)
    hash_P = hash(P)
    hash_T = hash(T[0:m])
    for i from 0 to n - m:
        if hash_T == hash_P and T[i:i+m] == P:
            print("Pattern found at index", i)
        if i < n - m:
            hash_T = rehash(hash_T, T[i], T[i+m])

function hash(s):
    h = 0
    for c in s:
        h = (h * BASE + ord(c)) % PRIME
    return h
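The rehash step above is referenced but not defined; the runnable Python sketch below shows the whole rolling-hash search, with BASE = 256 and PRIME = 101 chosen arbitrarily for the illustration:

def rabin_karp(T, P, base=256, prime=101):
    n, m = len(T), len(P)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, prime)             # weight of the window's leading character
    hash_p = hash_t = 0
    for k in range(m):                         # initial hashes of P and of T[0:m]
        hash_p = (hash_p * base + ord(P[k])) % prime
        hash_t = (hash_t * base + ord(T[k])) % prime
    matches = []
    for i in range(n - m + 1):
        if hash_t == hash_p and T[i:i + m] == P:   # verify to rule out hash collisions
            matches.append(i)
        if i < n - m:                          # roll the window: drop T[i], add T[i+m]
            hash_t = ((hash_t - ord(T[i]) * high) * base + ord(T[i + m])) % prime
    return matches

print(rabin_karp("ABABABAB", "ABAB"))          # [0, 2, 4]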
Explanation
1. Hashing:
○ The hash function computes the hash value of a string using a rolling hash
technique, where each character is multiplied by a base value and summed.
○ The hash value is calculated modulo a prime number to prevent overflow and
ensure uniform distribution of hash values.
2. Comparison:
○ Compare the hash value of each window of length m with the hash value of
the pattern.
○ If the hash values match, perform a character-by-character comparison to
confirm the match.
3. Slide Window:
○ Move the window one position to the right by removing the leftmost character and
adding the next character from the text.
○ Recompute the hash value of the new window using a rolling hash technique.
Time Complexity
● Average case: O(n + m), when hash collisions are rare.
● Worst case: O(n·m), when many windows share the pattern's hash value and each must be verified character by character.
Example
Consider the following example:
Summary
The Rabin-Karp algorithm efficiently finds occurrences of a pattern within a text by using
hashing to compare substrings. It has an average-case time complexity of O(n + m) and a
worst-case time complexity of O(n·m), making it suitable for practical use in many scenarios.
However, it may not perform well with patterns and texts of very large sizes due to potential
hash collisions.
Naive String Matching
The Naive String Matching algorithm is the simplest approach to find all occurrences of a
pattern within a text. It involves checking for a match starting from each position in the text by
comparing the pattern with the substring of the text starting at that position.
Problem Statement
Given a text T of length n and a pattern P of length m, find all occurrences of P in T.
Approach
1. Iterate over Text: For each position i in the text T from 0 to n - m:
○ Check if the substring of length m starting at position i matches the pattern P.
○ If a match is found, record the index i.
Pseudo-code
function NaiveStringMatching(T, P):
n = length(T)
m = length(P)
for i from 0 to n - m:
j = 0
while j < m and T[i + j] == P[j]:
j += 1
if j == m:
print("Pattern found at index", i)
Explanation
1. Initialization:
○ n: Length of the text T.
○ m: Length of the pattern P.
2. Outer Loop: Iterate over each possible starting position i in the text T where P could
potentially match. This loop runs from 0 to n - m because if the pattern starts at n - m + 1
or later, there aren't enough characters left in the text for the pattern to fit.
3. Inner Loop: For each starting position i, check each character of P against the
corresponding character of T. The inner loop increments j as long as the characters match.
4. Check Match: If the inner loop completes and j equals m, then P matches T starting at index i.
5. Output: Print the starting index i whenever a match is found.
Time Complexity
● Worst case: O(n·m), since up to m character comparisons may be made at each of the n - m + 1 starting positions.
● Space Complexity: O(1) extra space.
Example
1. i = 0: Compare "ABAB" with "ABAB" (match found at index 0).
2. i = 1: Compare "BABA" with "ABAB" (no match).
3. i = 2: Compare "ABAB" with "ABAB" (match found at index 2).
4. i = 3: Compare "BABA" with "ABAB" (no match).
5. Continue this process until the end of the text.
Summary
The Naive String Matching algorithm is straightforward to implement but may not be the most
efficient for large texts and patterns due to its O(n·m) time complexity.
However, it serves as a baseline for more sophisticated string matching algorithms and can be
useful for small-scale problems where simplicity is prioritized over efficiency.
Finite Automata, also known as Finite State Machines (FSMs), are computational models that
can be in exactly one of a finite number of states at any given time. They are widely used in
various areas, including computer science, linguistics, and circuit design, to model systems with
discrete inputs and outputs.
Definition
Working Principle
Example
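The original example is not preserved in these notes; as a stand-in illustration, the following Python sketch simulates a small DFA that accepts binary strings ending in "01" (the states and transition table are invented for this illustration):

# DFA that accepts binary strings ending in "01"
# States: 'q0' (start), 'q1' (last symbol was 0), 'q2' (last two symbols were "01", accepting)
transitions = {
    ('q0', '0'): 'q1', ('q0', '1'): 'q0',
    ('q1', '0'): 'q1', ('q1', '1'): 'q2',
    ('q2', '0'): 'q1', ('q2', '1'): 'q0',
}
accepting = {'q2'}

def accepts(s):
    state = 'q0'
    for ch in s:
        state = transitions[(state, ch)]   # exactly one transition per (state, symbol) pair
    return state in accepting

print(accepts("1101"))   # True
print(accepts("110"))    # False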
Summary
Finite Automata provide a formal framework for modeling systems with discrete states and
inputs. They are used in various fields for tasks ranging from language processing to hardware
design. Understanding Finite Automata is fundamental in computer science, especially in the
study of formal languages, automata theory, and compiler construction.
Boyer-Moore Algorithm
Overview
The Boyer-Moore algorithm works by comparing the pattern with the text from right to left,
skipping over portions of the text based on a set of heuristic rules, thereby avoiding
unnecessary character comparisons.
Key Components
1. Bad Character Rule: If a mismatch occurs, the algorithm shifts the pattern to align the
last occurrence of the mismatched character in the pattern with the mismatched
character in the text. This rule aims to skip as many characters as possible.
2. Good Suffix Rule: If a mismatch occurs after a partial match, the algorithm shifts the
pattern based on the occurrence of a good suffix in the pattern. This rule aims to skip
over the maximum possible characters while maintaining alignment with the text.
3. Preprocessing: The Boyer-Moore algorithm preprocesses the pattern to compute two
lookup tables:
○ Bad Character Table: Records the rightmost occurrence of each character in the
pattern.
○ Good Suffix Table: Helps in determining the shift amount based on the
occurrence of good suffixes in the pattern.
Pseudo-code
function BoyerMoore(T, P):
n = length(T)
m = length(P)
bad_character_table = computeBadCharacterTable(P)
good_suffix_table = computeGoodSuffixTable(P)
i = 0 // index for T
while i <= n - m:
j = m - 1 // index for P
while j >= 0 and P[j] == T[i + j]:
j -= 1
if j < 0:
print("Pattern found at index", i)
i += good_suffix_table[0] if i + m < n else 1
else:
char_shift = j - bad_character_table[T[i + j]]
suffix_shift = good_suffix_table[j]
i += max(char_shift, suffix_shift, 1)
function computeBadCharacterTable(P):
table = dictionary mapping each character to its rightmost
occurrence in P
return table
function computeGoodSuffixTable(P):
m = length(P)
suffixes = computeSuffixes(P)
border = computeBorder(suffixes)
table = array of size m initialized to m (default shift)
for i from 0 to m - 1:
j = m - 1 - suffixes[i] // length of the matched suffix
table[j] = min(table[j], i - suffixes[i] + 1)
for i from 0 to m - 2:
j = m - 1 - border[i] // length of the border
table[j] = min(table[j], m - 1 - i)
return table
function computeSuffixes(P):
// Implementation of the suffix generation algorithm (not shown
here)
// Returns an array of length m containing the lengths of the
longest suffixes for each position in P
function computeBorder(A):
// Implementation of the border generation algorithm (not shown
here)
// Returns an array of length m containing the lengths of the
longest borders for each position in A
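The suffix and border helpers above are left unspecified; as a runnable illustration of the bad character rule on its own, here is the simplified Boyer-Moore-Horspool variant in Python (it deliberately omits the good suffix rule):

def horspool_search(T, P):
    n, m = len(T), len(P)
    if m == 0 or m > n:
        return []
    # Shift table: distance from each character's last occurrence in P[0:m-1]
    # to the end of the pattern; characters not in the pattern shift by m
    shift = {P[k]: m - 1 - k for k in range(m - 1)}
    matches = []
    i = 0
    while i <= n - m:
        j = m - 1
        while j >= 0 and P[j] == T[i + j]:     # compare right to left
            j -= 1
        if j < 0:
            matches.append(i)
        # The shift is keyed on the text character aligned with the pattern's last position
        i += shift.get(T[i + m - 1], m)
    return matches

print(horspool_search("ABAAABCDABC", "ABC"))   # [4, 8]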
Time Complexity
● Preprocessing the pattern takes O(m + σ) time, where σ is the alphabet size.
● Searching is sublinear on average (often close to n/m character comparisons), but O(n·m) in the worst case for the basic variant.
Example
Let's trace the Boyer-Moore algorithm on this input to find occurrences of the pattern within the
text.
Summary
The Boyer-Moore algorithm is a powerful and widely used string searching algorithm known for
its efficiency, particularly in practical scenarios with large texts and patterns. It achieves this
efficiency through clever heuristic rules and preprocessing of the pattern. While it may have a
higher worst-case time complexity compared to some other algorithms, its practical performance
often outperforms them in real-world applications.