DAA Notes

Binary Search

Binary Search is an efficient algorithm for finding an item from a sorted list of items. It works by
repeatedly dividing the search interval in half. If the value of the search key is less than the item
in the middle of the interval, narrow the interval to the lower half. Otherwise, narrow it to the
upper half. Repeatedly check until the value is found or the interval is empty.

Pseudo-code

Iterative Approach

function binarySearch(arr, target):
    low = 0
    high = length(arr) - 1

    while low <= high:
        mid = (low + high) // 2      // Find the middle index
        if arr[mid] == target:       // Check if the middle element is the target
            return mid               // Target found
        else if arr[mid] < target:   // If target is greater, ignore the left half
            low = mid + 1
        else:                        // If target is smaller, ignore the right half
            high = mid - 1

    return -1                        // Target not found

Recursive Approach
function binarySearch(arr, target, low, high):
    if low > high:
        return -1                    // Base case: target is not found

    mid = (low + high) // 2          // Find the middle index

    if arr[mid] == target:           // Check if the middle element is the target
        return mid                   // Target found
    else if arr[mid] < target:       // If target is greater, search the right half
        return binarySearch(arr, target, mid + 1, high)
    else:                            // If target is smaller, search the left half
        return binarySearch(arr, target, low, mid - 1)

Time Complexity

● Best Case: O(1) — The best-case scenario occurs when the target element is the
middle element of the array.
● Average and Worst Case: O(log n) — In the average and worst cases, the algorithm
repeatedly divides the search interval in half, leading to a logarithmic time complexity.

Space Complexity

● Iterative Approach: O(1) — The iterative approach uses a constant amount of extra
space.
● Recursive Approach: O(log n) — The recursive approach requires additional space for
the call stack. In the worst case, the depth of the recursion is log n (where n is the
number of elements in the array).
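
The pseudo-code above translates almost directly into a runnable implementation. The following is a minimal Python sketch of the iterative version; the function name and the sample list are illustrative, not part of the original notes.

Python sketch (illustrative):

def binary_search(arr, target):
    """Return the index of target in the sorted list arr, or -1 if absent."""
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = (low + high) // 2      # middle index
        if arr[mid] == target:
            return mid               # target found
        elif arr[mid] < target:
            low = mid + 1            # ignore the left half
        else:
            high = mid - 1           # ignore the right half
    return -1                        # target not found

# Example usage
print(binary_search([2, 5, 8, 12, 16, 23, 38], 16))   # prints 4
print(binary_search([2, 5, 8, 12, 16, 23, 38], 7))    # prints -1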

Merge Sort

Merge Sort is a classic divide-and-conquer algorithm for sorting an array or list. It works by
dividing the array into smaller sub-arrays, sorting those sub-arrays, and then merging them back
together.

Detailed Explanation:

1. Divide: Split the array into two halves.


2. Conquer: Recursively sort each half.
3. Combine: Merge the two sorted halves into a single sorted array.

Steps of the Algorithm:

1. Divide:
○ If the array has more than one element, split it into two halves.
○ Continue splitting each half until you have sub-arrays of size one. An array of
size one is trivially sorted.
2. Conquer:
○ Recursively apply merge sort to each half.
3. Combine:
○ Merge the sorted halves to produce a sorted array.
○ The merging process involves comparing the elements of each half and
combining them in order.

Pseudo-code:
function mergeSort(arr):
    if length(arr) <= 1:
        return arr

    mid = length(arr) // 2
    left = mergeSort(arr[0:mid])
    right = mergeSort(arr[mid:])

    return merge(left, right)

function merge(left, right):
    result = []
    i = 0
    j = 0

    while i < length(left) and j < length(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1

    // Add remaining elements of left (if any)
    while i < length(left):
        result.append(left[i])
        i += 1

    // Add remaining elements of right (if any)
    while j < length(right):
        result.append(right[j])
        j += 1

    return result

Recursive Approach:

function mergeSort(arr):
    // Base case: if the array has 1 or no elements, it is already sorted
    if length(arr) <= 1:
        return arr

    // Divide the array into two halves
    mid = length(arr) // 2
    left = arr[0:mid]
    right = arr[mid:]

    // Recursively sort both halves
    left_sorted = mergeSort(left)
    right_sorted = mergeSort(right)

    // Merge the two sorted halves
    return merge(left_sorted, right_sorted)

function merge(left, right):
    result = []
    i = 0
    j = 0

    // Merge the two arrays by comparing their elements
    while i < length(left) and j < length(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1

    // Append any remaining elements from the left array
    while i < length(left):
        result.append(left[i])
        i += 1

    // Append any remaining elements from the right array
    while j < length(right):
        result.append(right[j])
        j += 1

    return result

Time Complexity:

● Best Case: O(n log n)


● Average Case: O(n log n)
● Worst Case: O(n log n)

Space Complexity:

● Space Complexity: O(n)

Explanation of Time and Space Complexity:

1. Time Complexity:
○ Divide Step: Each divide step takes O(1) time.
○ Conquer Step: Recursively sort two halves, each of size n/2. The recurrence
relation for the time complexity is T(n) = 2T(n/2) + O(n), which solves to O(n log
n).
○ Combine Step: Merging two sorted halves takes O(n) time.
2. Space Complexity:
○ Merge Sort requires additional space for the temporary arrays used during the
merging process. At each level of recursion, temporary arrays are created to hold
the merged results, leading to a space complexity of O(n).

Example Walkthrough:

Consider sorting the array [38, 27, 43, 3, 9, 82, 10]:

1. Divide:
○ [38, 27, 43, 3, 9, 82, 10]
○ Split into [38, 27, 43, 3] and [9, 82, 10]
2. Conquer (Recursive calls):
○ Split [38, 27, 43, 3] into [38, 27] and [43, 3]
○ Split [38, 27] into [38] and [27]
○ Split [43, 3] into [43] and [3]
○ Split [9, 82, 10] into [9, 82] and [10]
○ Split [9, 82] into [9] and [82]
3. Combine:
○ Merge [38] and [27] to get [27, 38]
○ Merge [43] and [3] to get [3, 43]
○ Merge [27, 38] and [3, 43] to get [3, 27, 38, 43]
○ Merge [9] and [82] to get [9, 82]
○ Merge [9, 82] and [10] to get [9, 10, 82]
○ Merge [3, 27, 38, 43] and [9, 10, 82] to get [3, 9, 10, 27, 38, 43, 82]

The final sorted array is [3, 9, 10, 27, 38, 43, 82].

Merge Sort is a stable sorting algorithm, meaning that it maintains the relative order of equal
elements in the sorted output. It is particularly useful for large datasets due to its consistent O(n
log n) time complexity.
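
As a complement to the pseudo-code, here is a minimal Python sketch of merge sort following the same divide, conquer, and combine structure; the function names and the sample array (taken from the walkthrough above) are illustrative.

Python sketch (illustrative):

def merge_sort(arr):
    """Return a new sorted list containing the elements of arr."""
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])       # sort the left half
    right = merge_sort(arr[mid:])      # sort the right half
    return merge(left, right)          # combine the sorted halves

def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])            # remaining elements of left, if any
    result.extend(right[j:])           # remaining elements of right, if any
    return result

# Example: the walkthrough array from above
print(merge_sort([38, 27, 43, 3, 9, 82, 10]))   # [3, 9, 10, 27, 38, 43, 82]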

Quick Sort Algorithm

Quick Sort is a highly efficient sorting algorithm that follows the divide-and-conquer principle. It
works by selecting a 'pivot' element from the array and partitioning the other elements into two
sub-arrays according to whether they are less than or greater than the pivot. The sub-arrays are
then sorted recursively.

Steps of the Quick Sort Algorithm:

1. Choose a Pivot: Select an element from the array as the pivot. This can be done in
various ways, such as picking the first element, the last element, a random element, or
the median.
2. Partitioning: Rearrange the array so that all elements less than the pivot come before it,
and all elements greater than the pivot come after it. The pivot is now in its final position.
3. Recursively Sort the Sub-arrays: Apply the same process to the sub-arrays of
elements with smaller and larger values.

Pseudo-code:
function quickSort(arr, low, high):
    if low < high:
        // Partition the array and get the pivot index
        pi = partition(arr, low, high)

        // Recursively sort elements before and after partition
        quickSort(arr, low, pi - 1)
        quickSort(arr, pi + 1, high)

function partition(arr, low, high):
    // Choose the rightmost element as pivot
    pivot = arr[high]

    // Pointer for the greater element
    i = low - 1

    // Traverse through all elements, compare each with the pivot
    for j = low to high - 1:
        if arr[j] <= pivot:
            // If an element smaller than the pivot is found,
            // swap it with the greater element pointed to by i
            i = i + 1
            swap arr[i] with arr[j]

    // Swap the pivot element with the greater element at i + 1
    swap arr[i + 1] with arr[high]

    // Return the position from where the partition is done
    return i + 1

Time Complexity:

● Best Case: O(n log n) - This occurs when the pivot element divides the array into two
nearly equal halves.
● Average Case: O(n log n) - This occurs when the pivot element divides the array into
sub-arrays that are not highly unbalanced.
● Worst Case: O(n^2) - This occurs when the pivot element is either the smallest or the
largest element every time, leading to the most unbalanced partitions.

Space Complexity:

● In-place Sorting: O(log n) - This occurs due to the recursive stack space. Quick Sort is
typically implemented in place, so it doesn't require additional storage for elements.
● Not In-place Sorting: O(n) - If a non-in-place partitioning method is used, the space
complexity can be higher.

Detailed Explanation:
1. Choose a Pivot:
○ The pivot can be any element, but common choices are the first element, the last
element, or a random element. In the pseudo-code, the last element is chosen as
the pivot.
2. Partitioning:
○ The partition function rearranges the array so that all elements less than or equal
to the pivot are on the left, and all elements greater than the pivot are on the
right.
○ A pointer i is used to track the boundary of the region of elements known to be less than or equal to the pivot.
○ The loop iterates over each element; whenever an element no greater than the pivot is found, i is incremented and that element is swapped into position i.
3. Recursive Sorting:
○ After partitioning, the pivot is in its final position.
○ The array is then recursively sorted by applying the same procedure to the left
and right sub-arrays formed by the pivot.

Example:

Given an array: [10, 7, 8, 9, 1, 5]

1. Choose the pivot: 5


2. Partition the array:
○ [1, 5, 8, 9, 10, 7]
3. The pivot (5) is now in the correct place.
4. Recursively apply Quick Sort to the sub-arrays:
○ Left sub-array: [1]
○ Right sub-array: [8, 9, 10, 7]

Repeat the process for the right sub-array, and so on.
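
A minimal Python sketch of the same in-place Quick Sort with the Lomuto partition scheme used in the pseudo-code; the function names and the sample array (from the example above) are illustrative.

Python sketch (illustrative):

def quick_sort(arr, low, high):
    """Sort arr[low..high] in place, using the last element as pivot."""
    if low < high:
        pi = partition(arr, low, high)     # pivot ends up at index pi
        quick_sort(arr, low, pi - 1)       # sort elements before the pivot
        quick_sort(arr, pi + 1, high)      # sort elements after the pivot

def partition(arr, low, high):
    pivot = arr[high]                      # rightmost element as pivot
    i = low - 1                            # boundary of the "<= pivot" region
    for j in range(low, high):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[high] = arr[high], arr[i + 1]   # place pivot in its final spot
    return i + 1

# Example: the array from the walkthrough above
data = [10, 7, 8, 9, 1, 5]
quick_sort(data, 0, len(data) - 1)
print(data)    # [1, 5, 7, 8, 9, 10]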

Multiplication of Large Integers (Karatsuba Algorithm)

The Karatsuba algorithm is an efficient multiplication algorithm that uses a divide-and-conquer approach to multiply two large integers. It significantly reduces the number of recursive multiplications compared to the traditional O(n^2) method.

Detailed Explanation

Traditional Multiplication

To multiply two n-digit numbers, the traditional approach involves performing n^2 single-digit
multiplications, which results in a time complexity of O(n^2).
Karatsuba's Insight

The Karatsuba algorithm improves on this by reducing the number of single-digit multiplications
to O(n^log₂(3)), approximately O(n^1.585). The key idea is to split each number into two halves
and use three multiplications instead of four.

Karatsuba's Approach

1. Split the numbers: For two numbers x and y of length n, split them into two halves:
○ x = x1 * 10^m + x0
○ y = y1 * 10^m + y0 Here, m is n/2, x1 and y1 are the higher-order halves,
and x0 and y0 are the lower-order halves.
2. Recursive multiplications:
○ z2 = karatsuba(x1, y1) (product of higher-order halves)
○ z0 = karatsuba(x0, y0) (product of lower-order halves)
○ z1 = karatsuba(x1 + x0, y1 + y0) - z2 - z0 (cross terms)
3. Combine the results:
○ The final result is z2 * 10^(2*m) + (z1 * 10^m) + z0.

Pseudo-code


function karatsuba(x, y):
    // Base case for small numbers
    if x < 10 or y < 10:
        return x * y

    // Calculate the size of the numbers (in digits)
    n = max(length(x), length(y))
    m = n // 2

    // Split x and y into high and low parts
    x1 = x // 10^m
    x0 = x % 10^m
    y1 = y // 10^m
    y0 = y % 10^m

    // Recursive multiplications
    z2 = karatsuba(x1, y1)
    z0 = karatsuba(x0, y0)
    z1 = karatsuba(x1 + x0, y1 + y0) - z2 - z0

    // Combine the results
    return (z2 * 10^(2*m)) + (z1 * 10^m) + z0

Time Complexity

The time complexity of the Karatsuba algorithm is O(n^log₂(3)), which is approximately O(n^1.585). This is derived from the fact that each recursive step splits the problem into three subproblems of half the size.

Space Complexity

The space complexity of the Karatsuba algorithm is O(n) due to the storage requirements for
intermediate results and recursive call stack depth.

Breakdown of Steps

1. Base Case: If either x or y is a single-digit number, the multiplication is straightforward and returns x * y.
2. Splitting the Numbers:
○ x1 and x0 are obtained by dividing x by 10^m and taking the quotient and remainder, respectively.
○ Similarly, y1 and y0 are obtained by dividing y by 10^m and taking the quotient and remainder, respectively.
3. Recursive Calls:
○ Compute z2 as the product of the high parts (x1 and y1).
○ Compute z0 as the product of the low parts (x0 and y0).
○ Compute z1 as the product of the sum of the parts minus the previously
computed z2 and z0.
4. Combining Results:
○ The final product is assembled by shifting z2 left by 2*m digits, z1 by m digits,
and adding z0.
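
A minimal Python sketch of the algorithm above; the function name, digit-count helper via str(), and the sample operands are illustrative choices, and the sketch assumes non-negative integers.

Python sketch (illustrative):

def karatsuba(x, y):
    """Multiply two non-negative integers using Karatsuba's method."""
    if x < 10 or y < 10:                 # base case: single-digit operand
        return x * y

    n = max(len(str(x)), len(str(y)))    # number of digits
    m = n // 2

    x1, x0 = divmod(x, 10 ** m)          # high and low halves of x
    y1, y0 = divmod(y, 10 ** m)          # high and low halves of y

    z2 = karatsuba(x1, y1)               # product of the high halves
    z0 = karatsuba(x0, y0)               # product of the low halves
    z1 = karatsuba(x1 + x0, y1 + y0) - z2 - z0   # cross terms

    return z2 * 10 ** (2 * m) + z1 * 10 ** m + z0

# Example usage
print(karatsuba(1234, 5678))     # 7006652
print(1234 * 5678)               # same result, for comparison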

Closest-Pair Problem

The closest-pair problem involves finding the pair of points that are closest to each other in a set
of points in a plane. The most efficient known solution uses a divide-and-conquer approach.

Pseudo-code for Closest-Pair Problem

function closestPair(points):
    // Sort points by x-coordinate
    px = sort(points, by_x_coordinate)
    // Sort points by y-coordinate
    py = sort(points, by_y_coordinate)

    return closestPairRec(px, py)

function closestPairRec(px, py):
    n = length(px)

    // Base case
    if n <= 3:
        return bruteForceClosestPair(px)

    // Divide
    mid = n // 2
    midPoint = px[mid]

    pxLeft = px[0:mid]
    pxRight = px[mid:n]

    pyLeft = [p for p in py if p.x <= midPoint.x]
    pyRight = [p for p in py if p.x > midPoint.x]

    // Conquer
    (d1, pair1) = closestPairRec(pxLeft, pyLeft)
    (d2, pair2) = closestPairRec(pxRight, pyRight)

    // Find the smaller of the two distances
    d = min(d1, d2)
    bestPair = pair1 if d1 < d2 else pair2

    // Combine
    strip = [p for p in py if abs(p.x - midPoint.x) < d]

    (dStrip, pairStrip) = closestStripPair(strip, d)

    if dStrip < d:
        return (dStrip, pairStrip)
    else:
        return (d, bestPair)

function bruteForceClosestPair(points):
    minDist = infinity
    bestPair = None

    for i in range(0, length(points)):
        for j in range(i + 1, length(points)):
            dist = distance(points[i], points[j])
            if dist < minDist:
                minDist = dist
                bestPair = (points[i], points[j])

    return (minDist, bestPair)

function closestStripPair(strip, d):
    minDist = d
    bestPair = None

    for i in range(0, length(strip)):
        for j in range(i + 1, min(i + 7, length(strip))):
            dist = distance(strip[i], strip[j])
            if dist < minDist:
                minDist = dist
                bestPair = (strip[i], strip[j])

    return (minDist, bestPair)

Time Complexity:

● Closest-Pair (Divide and Conquer): O(n log n)

Space Complexity:

● Closest-Pair (Divide and Conquer): O(n)

Convex-Hull Problem

The convex hull of a set of points is the smallest convex polygon that contains all the points.
One of the efficient algorithms to find the convex hull is Graham's scan.

Pseudo-code for Convex-Hull Problem (Graham's Scan)


function convexHull(points):
    // Step 1: Find the point with the lowest y-coordinate (p0)
    p0 = point with lowest y-coordinate (if tie, choose leftmost)
    points = sort(points, by_angle_with_p0)

    // Step 2: Initialize the convex hull with the first three points
    hull = [points[0], points[1], points[2]]

    // Step 3: Process the remaining points
    for i in range(3, length(points)):
        while (length(hull) > 1 and
               orientation(hull[-2], hull[-1], points[i]) != counterclockwise):
            hull.pop()
        hull.append(points[i])

    return hull

function orientation(p, q, r):
    // Returns:
    //   0 if p, q, r are collinear
    //   1 if clockwise
    //   2 if counterclockwise
    val = (q.y - p.y) * (r.x - q.x) - (q.x - p.x) * (r.y - q.y)
    if val == 0: return 0
    elif val > 0: return 1
    else: return 2

Time Complexity:

● Convex-Hull (Graham's Scan): O(n log n) due to sorting the points by angle.

Space Complexity:

● Convex-Hull (Graham's Scan): O(n) for storing the points on the hull.

Explanation

Closest-Pair Problem:

1. Preprocessing: Sort the points by their x and y coordinates.


2. Divide: Split the set of points into two halves.
3. Conquer: Recursively find the smallest distance in both halves.
4. Combine: Consider the points near the dividing line and find the smallest distance
across the line.

Key Insight: By combining the results from the two halves and the "strip" region near the
dividing line, we ensure that all potential closest pairs are considered efficiently.
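
The bruteForceClosestPair helper that handles the base case (and serves as a reference solution) is easy to express directly in Python. The sketch below is illustrative: it represents points as (x, y) tuples and uses math.dist (Python 3.8+), neither of which comes from the original notes.

Python sketch (illustrative):

import math
from itertools import combinations

def brute_force_closest_pair(points):
    """O(n^2) reference solution: check every pair of points."""
    best = (float('inf'), None)
    for p, q in combinations(points, 2):
        d = math.dist(p, q)              # Euclidean distance between p and q
        if d < best[0]:
            best = (d, (p, q))
    return best

# Example usage
pts = [(0, 0), (3, 4), (1, 1), (7, 7)]
print(brute_force_closest_pair(pts))     # (1.414..., ((0, 0), (1, 1)))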

Convex-Hull Problem (Graham's Scan):

1. Preprocessing: Find the point with the lowest y-coordinate (p0) and sort the points
based on the angle they make with p0.
2. Initialize Hull: Start with the first three sorted points.
3. Iterate: For each point, ensure it makes a counterclockwise turn with the last two points
in the hull. If not, remove the last point from the hull until a counterclockwise turn is
made.
Key Insight: By sorting the points and ensuring only counterclockwise turns, we efficiently
construct the convex hull.
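
A compact Python sketch of Graham's scan consistent with the pseudo-code above. Representing points as (x, y) tuples, sorting by atan2 angle with distance as a tie-breaker, and the sample point set are all illustrative choices; collinear boundary points are simply dropped in this sketch.

Python sketch (illustrative):

import math

def orientation(p, q, r):
    """Cross-product sign: 0 collinear, 1 clockwise, 2 counterclockwise."""
    val = (q[1] - p[1]) * (r[0] - q[0]) - (q[0] - p[0]) * (r[1] - q[1])
    if val == 0:
        return 0
    return 1 if val > 0 else 2

def convex_hull(points):
    """Graham scan: return hull vertices in counterclockwise order."""
    p0 = min(points, key=lambda p: (p[1], p[0]))        # lowest y, then leftmost
    rest = sorted((p for p in points if p != p0),
                  key=lambda p: (math.atan2(p[1] - p0[1], p[0] - p0[0]),
                                 (p[0] - p0[0]) ** 2 + (p[1] - p0[1]) ** 2))
    hull = [p0]
    for p in rest:
        # Pop while the turn hull[-2] -> hull[-1] -> p is not counterclockwise
        while len(hull) > 1 and orientation(hull[-2], hull[-1], p) != 2:
            hull.pop()
        hull.append(p)
    return hull

# Example usage
pts = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
print(convex_hull(pts))    # [(0, 0), (2, 0), (2, 2), (0, 2)]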

Control Abstraction for Greedy Method:

Control abstraction in the context of greedy algorithms refers to the high-level approach or
structure used to implement and control the execution of such algorithms. Greedy algorithms
are problem-solving techniques that make locally optimal choices at each step with the hope of
finding a global optimum solution.

1. Problem Definition: Clearly define the problem and the objective function to be
optimized.
2. Greedy Choice Property: Identify the greedy choice property, which states that at each
step, the locally optimal choice should be made without considering the consequences
of the choice on future steps. This choice is often based on a heuristic that guides the
selection process.
3. Feasibility: Ensure that the chosen solution at each step satisfies the problem
constraints and is feasible.
4. Optimality: Establish that the locally optimal choices made at each step collectively lead
to a globally optimal solution.
5. Iterative Approach: Implement a loop or iterative approach to repeatedly make greedy
choices until a termination condition is met. This typically involves iterating over the
elements of the problem instance and selecting the greedy choice at each step.
6. Termination Condition: Define a termination condition that stops the iterative process
when a satisfactory solution is obtained or when certain criteria are met.
7. Solution Construction: Construct the final solution based on the greedy choices made
at each step.

Time Complexity: The time complexity of a greedy algorithm depends on the specific problem
being solved and the efficiency of the implementation. In general, the time complexity can range
from linear to polynomial time, depending on the problem size and the computational complexity
of the greedy choice evaluation.

Space Complexity: The space complexity of a greedy algorithm typically depends on the data
structures used for problem representation and solution construction. In many cases, the space
complexity is linear or logarithmic in the size of the input data.
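
The steps above can be summarized as a generic code skeleton. The following Python sketch is purely illustrative; greedy, candidates, is_feasible, and select_best are hypothetical names standing in for the problem-specific pieces, and the budget example only demonstrates how the pieces plug together.

Python sketch (illustrative):

def greedy(candidates, is_feasible, select_best):
    """Generic greedy control abstraction (illustrative skeleton).

    candidates  -- iterable of candidate choices
    is_feasible -- function(solution, choice) -> bool, checks the constraints
    select_best -- function(candidates) -> candidates ordered by the greedy criterion
    """
    solution = []
    for choice in select_best(candidates):    # consider choices in greedy order
        if is_feasible(solution, choice):     # keep only feasible choices
            solution.append(choice)
    return solution                           # construct the final solution

# Example: pick the largest numbers whose running total stays within a budget
numbers, budget = [4, 9, 2, 7, 5], 15
result = greedy(
    numbers,
    is_feasible=lambda sol, x: sum(sol) + x <= budget,
    select_best=lambda cs: sorted(cs, reverse=True),
)
print(result)    # [9, 5]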

Fractional Knapsack Problem:


Explanation:

In the Fractional Knapsack problem, we are given a set of items, each with a weight and a
value, and a knapsack with a maximum weight capacity. The goal is to select a subset of items
to maximize the total value while keeping the total weight within the capacity of the knapsack.
Unlike the 0/1 Knapsack problem, in the Fractional Knapsack problem, we can take fractions of
items, i.e., we can take a part of an item if it maximizes the value.

Pseudo-code:

function fractionalKnapsack(items, capacity):
    // Sort items by value per unit weight (value/weight)
    sort items by (value / weight) in non-ascending order

    totalValue = 0
    remainingCapacity = capacity

    for each item in items:
        if remainingCapacity == 0:
            break

        if item.weight <= remainingCapacity:
            totalValue += item.value
            remainingCapacity -= item.weight
        else:
            fraction = remainingCapacity / item.weight
            totalValue += fraction * item.value
            remainingCapacity = 0

    return totalValue

Time Complexity:

● Sorting the items based on value per unit weight takes O(n log n) time, where n is the
number of items.
● The loop to select items takes O(n) time.
● Therefore, the overall time complexity is O(n log n) + O(n) = O(n log n), dominated by
the sorting.

Space Complexity:

● The space complexity is O(1) because we only need a constant amount of extra space
for variables like totalValue and remainingCapacity.

Explanation:

1. Sorting: First, we sort the items based on their value per unit weight in non-ascending
order. This allows us to consider items with higher value-to-weight ratios first, maximizing
the value we can obtain.
2. Greedy Selection: We iterate through the sorted items and select items as long as there
is remaining capacity in the knapsack. If the weight of the current item is less than or
equal to the remaining capacity, we take the entire item. Otherwise, we take a fraction of
the item that fits into the remaining capacity, maximizing the value.
3. Return: Finally, we return the total value obtained by selecting items.

The Fractional Knapsack problem can be efficiently solved using a greedy strategy, where we
always choose the item with the highest value per unit weight at each step. The pseudo-code
provided outlines this approach and achieves an optimal solution.
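
A minimal Python sketch of the greedy procedure above; representing items as (value, weight) tuples and the sample instance are illustrative choices, not part of the original notes.

Python sketch (illustrative):

def fractional_knapsack(items, capacity):
    """items: list of (value, weight) pairs. Returns the maximum total value."""
    # Sort by value per unit weight, highest first
    items = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)

    total_value = 0.0
    remaining = capacity
    for value, weight in items:
        if remaining == 0:
            break
        if weight <= remaining:          # take the whole item
            total_value += value
            remaining -= weight
        else:                            # take only the fraction that fits
            total_value += value * (remaining / weight)
            remaining = 0
    return total_value

# Example usage
items = [(60, 10), (100, 20), (120, 30)]     # (value, weight)
print(fractional_knapsack(items, 50))        # 240.0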

Prim's Algorithm:

1. Greedy Strategy: Prim's algorithm follows a greedy strategy, meaning it selects the
locally optimal choice at each step, which eventually leads to the globally optimal
solution.
2. Starting Point: Prim's algorithm requires a starting vertex to begin the construction of
the MST. The choice of the starting vertex does not affect the final MST, as the algorithm
will always find the same MST regardless of the starting point.
3. Priority Queue: Prim's algorithm uses a priority queue to efficiently select the next edge
to include in the MST. At each step, it chooses the edge with the smallest weight that
connects a vertex in the MST to a vertex outside the MST. This ensures that the
algorithm always selects the edge that contributes the least to the total weight of the
MST.
4. Lazy Implementation: Prim's algorithm can be implemented using a lazy approach,
where edges are added to the priority queue without removing duplicates. This approach
simplifies the implementation but may lead to redundant edges in the priority queue and
slightly higher time complexity.

Pseudo code:

function prim(graph):
    initialize an empty list to store the edges of the MST
    initialize a set to keep track of vertices already in the MST
    initialize a priority queue to store edges

    start from an arbitrary vertex s
    add s to the MST set

    while the MST set does not contain all vertices:
        for each edge (u, v) incident to a vertex in the MST:
            if v is not in the MST set:
                add (u, v) to the priority queue with its weight as priority

        select the edge with the smallest weight from the priority queue
        if the edge connects a vertex in the MST set to a vertex not in the MST set:
            add the new vertex to the MST set
            add the edge to the MST list

    return the MST list

Time Complexity:

● Each edge is considered once: O(E)


● Operations on priority queue (insertion, deletion): O(E log V)
● Total: O(E log V)

Space Complexity:

● O(V) for the MST set and priority queue
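
The lazy implementation described in point 4 is straightforward with Python's heapq module. The sketch below assumes an adjacency-list graph given as a dict mapping each vertex to a list of (weight, neighbor) pairs; the names and the sample graph are illustrative.

Python sketch (illustrative):

import heapq

def prim(graph, start):
    """Lazy Prim's algorithm.

    graph: dict vertex -> list of (weight, neighbor) pairs for an undirected graph.
    Returns (total_weight, list of MST edges as (weight, u, v)).
    """
    in_mst = {start}
    mst_edges = []
    total = 0
    pq = [(w, start, v) for w, v in graph[start]]   # edges leaving the start vertex
    heapq.heapify(pq)

    while pq and len(in_mst) < len(graph):
        w, u, v = heapq.heappop(pq)       # smallest-weight edge in the queue
        if v in in_mst:                   # lazy deletion: skip stale edges
            continue
        in_mst.add(v)
        mst_edges.append((w, u, v))
        total += w
        for w2, nxt in graph[v]:          # add edges leaving the newly added vertex
            if nxt not in in_mst:
                heapq.heappush(pq, (w2, v, nxt))

    return total, mst_edges

# Example usage: a small undirected weighted graph
graph = {
    'A': [(1, 'B'), (4, 'C')],
    'B': [(1, 'A'), (2, 'C'), (6, 'D')],
    'C': [(4, 'A'), (2, 'B'), (3, 'D')],
    'D': [(6, 'B'), (3, 'C')],
}
print(prim(graph, 'A'))    # (6, [(1, 'A', 'B'), (2, 'B', 'C'), (3, 'C', 'D')])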


Kruskal's Algorithm:

1. Sorting Edges: Kruskal's algorithm begins by sorting all the edges of the graph in
non-decreasing order of their weights. This sorting step is essential to ensure that the
algorithm considers edges in increasing order of weight during the MST construction
process.
2. Disjoint-Set Data Structure: Kruskal's algorithm utilizes a disjoint-set data structure
(also known as union-find) to efficiently determine whether adding an edge to the MST
will create a cycle. This data structure allows the algorithm to quickly check whether two
vertices belong to the same connected component and avoid adding edges that would
create cycles.
3. Edge Selection: Kruskal's algorithm iterates over the sorted edges and greedily adds
edges to the MST as long as they do not form cycles. This approach ensures that the
algorithm always selects the smallest available edge that connects two disjoint
components of the graph, leading to the construction of the MST.
4. Optimizations: Kruskal's algorithm can be further optimized by using techniques such
as path compression and union by rank in the disjoint-set data structure. These
optimizations help improve the overall efficiency of the algorithm, particularly for graphs
with a large number of vertices and edges.

Pseudo-code:

function kruskal(graph):
    sort edges of graph in ascending order of weight
    initialize an empty list to store the edges of the MST
    initialize a disjoint-set data structure to keep track of connected components

    for each vertex v in graph:
        make_set(v)

    for each edge (u, v) in sorted edges:
        if find(u) ≠ find(v):      // u and v belong to different connected components
            add (u, v) to MST
            union(u, v)            // merge the connected components

    return MST

Time Complexity:

● Sorting edges: O(E log E)


● Operations on disjoint-set (find and union): O(E log V)
● Total: O(E log E + E log V) ≈ O(E log V), where E is the number of edges and V is the
number of vertices.

Space Complexity:

● O(V) for the disjoint-set data structure
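
A minimal Python sketch with a union-find structure using the path-compression and union-by-rank optimizations mentioned in point 4. The edge-list format and the sample graph (the same illustrative graph as in the Prim sketch above) are assumptions for the example.

Python sketch (illustrative):

def kruskal(vertices, edges):
    """Kruskal's algorithm with union-find.

    vertices: iterable of vertex labels.
    edges: list of (weight, u, v) tuples.
    Returns (total_weight, list of MST edges).
    """
    parent = {v: v for v in vertices}
    rank = {v: 0 for v in vertices}

    def find(x):                      # find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):                  # union by rank
        rx, ry = find(x), find(y)
        if rank[rx] < rank[ry]:
            rx, ry = ry, rx
        parent[ry] = rx
        if rank[rx] == rank[ry]:
            rank[rx] += 1

    mst, total = [], 0
    for w, u, v in sorted(edges):     # consider edges in increasing weight
        if find(u) != find(v):        # adding this edge does not create a cycle
            union(u, v)
            mst.append((w, u, v))
            total += w
    return total, mst

# Example usage: same illustrative graph as the Prim sketch
edges = [(1, 'A', 'B'), (4, 'A', 'C'), (2, 'B', 'C'), (6, 'B', 'D'), (3, 'C', 'D')]
print(kruskal(['A', 'B', 'C', 'D'], edges))
# (6, [(1, 'A', 'B'), (2, 'B', 'C'), (3, 'C', 'D')])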

Comparison:

● Time Complexity: Prim's algorithm typically has a time complexity of O(E log V), while
Kruskal's algorithm has a time complexity of O(E log E) or O(E log V), depending on the
implementation. In practice, Kruskal's algorithm may be faster for sparse graphs with
fewer edges, while Prim's algorithm may be faster for dense graphs with more edges.
● Space Complexity: Both algorithms have a space complexity of O(V) to store data
structures such as priority queues and disjoint-set forests.
● Implementation Complexity: Prim's algorithm is generally easier to implement
compared to Kruskal's algorithm, especially when using a priority queue data structure.
Kruskal's algorithm requires additional bookkeeping for edge sorting and cycle detection
using disjoint-set data structures.

Dijkstra's Algorithm

Explanation:

1. Initialization:
○ Create an array dist[] to store the shortest distance from the source vertex to
each vertex in the graph. Initialize dist[] with infinity except for the source
vertex, which is initialized to 0.
○ Create a set visited[] to keep track of vertices whose shortest distance from
the source vertex is already calculated. Initially, all vertices are unvisited.
2. Main Loop:
○ Repeat until all vertices are visited:
■ Choose the unvisited vertex u with the minimum dist[u].
■ Mark u as visited.
■ Update dist[v] for all unvisited neighbors v of u such that dist[v] >
dist[u] + weight(u, v). This update guarantees the shortest path
from the source to v through u.
3. Result:
○ After the algorithm terminates, dist[] contains the shortest distances from the
source vertex to all other vertices in the graph.

Pseudo-code:

function Dijkstra(graph, source):
    dist[v] = infinity for every vertex v in graph
    dist[source] = 0
    visited = empty set

    while there are unvisited vertices:
        u = vertex with minimum dist[u] among unvisited vertices
        visited.add(u)

        for each neighbor v of u:
            if v is unvisited:
                dist[v] = min(dist[v], dist[u] + weight(u, v))

    return dist

Time Complexity:

● The time complexity of Dijkstra's algorithm depends on the data structure used to
implement the priority queue.
● With a simple implementation using arrays to store distances, the time complexity is
O(V^2) for dense graphs, where V is the number of vertices.
● With a binary heap or Fibonacci heap as the priority queue, the time complexity is O((V +
E) log V), where E is the number of edges.

Space Complexity:

● The space complexity of Dijkstra's algorithm is O(V), where V is the number of vertices.
This includes the space for the dist[] array and the visited[] set.
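
The heap-based variant mentioned in the time-complexity discussion can be sketched in Python with heapq. The adjacency format (dict mapping each vertex to a list of (neighbor, weight) pairs) and the sample graph are illustrative assumptions.

Python sketch (illustrative):

import heapq

def dijkstra(graph, source):
    """graph: dict vertex -> list of (neighbor, weight). Returns shortest distances."""
    dist = {v: float('inf') for v in graph}
    dist[source] = 0
    pq = [(0, source)]                    # (distance, vertex) min-heap

    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:                   # stale entry, a shorter path was already found
            continue
        for v, w in graph[u]:
            if d + w < dist[v]:           # relax edge (u, v)
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

# Example usage
graph = {
    'A': [('B', 1), ('C', 4)],
    'B': [('C', 2), ('D', 6)],
    'C': [('D', 3)],
    'D': [],
}
print(dijkstra(graph, 'A'))   # {'A': 0, 'B': 1, 'C': 3, 'D': 6}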

Control Abstraction for Dynamic Programming

Control abstraction in dynamic programming refers to the design pattern or methodology used
to encapsulate the repetitive control flow and decision-making logic involved in solving
optimization problems using dynamic programming techniques. By abstracting away the details
of how the problem is solved, control abstraction allows programmers to focus on the problem's
essence and promotes code reuse and maintainability.

Here's how control abstraction can be applied in dynamic programming:

1. Define Subproblems: Identify the subproblems that need to be solved to find the
optimal solution to the original problem. Break down the problem into smaller,
overlapping subproblems.
2. Memoization or Tabulation: Decide whether to use memoization (top-down) or
tabulation (bottom-up) to store the solutions to subproblems and avoid redundant
computations. Memoization involves caching the results of already solved subproblems,
while tabulation involves filling up a table iteratively.
3. Recurrence Relation: Formulate a recurrence relation that expresses the optimal
solution to the original problem in terms of solutions to its subproblems. This relation
defines the dependency between subproblems and helps in solving them recursively or
iteratively.
4. Abstract Control Flow: Implement control flow constructs such as loops, conditionals,
and function calls to orchestrate the solution process. Use control abstraction to
encapsulate the repetitive decision-making logic and simplify the code structure.
5. Optimization: Leverage optimization techniques such as pruning, early termination, or
problem-specific optimizations to improve the efficiency of the dynamic programming
solution. These optimizations can be integrated into the control abstraction to enhance
performance.
6. Error Handling and Edge Cases: Handle error conditions and edge cases gracefully
within the control abstraction. Ensure that the solution is robust and handles corner
cases correctly.
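
To make the memoization/tabulation distinction in step 2 above concrete, here is a small Python sketch using the Fibonacci recurrence as a stand-in for an overlapping-subproblem recurrence; the example is illustrative only and uses functools.lru_cache as the memo cache.

Python sketch (illustrative):

from functools import lru_cache

# Top-down (memoization): cache the results of already-solved subproblems
@lru_cache(maxsize=None)
def fib_memo(n):
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

# Bottom-up (tabulation): fill a table from the smallest subproblems upward
def fib_tab(n):
    table = [0, 1] + [0] * max(0, n - 1)
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]

print(fib_memo(10), fib_tab(10))    # 55 55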

Knapsack (0/1)

The 0/1 knapsack problem is a classic optimization problem where given a set of items, each
with a weight and a value, the goal is to determine the maximum value that can be obtained by
selecting a subset of the items while ensuring that the total weight does not exceed a given
capacity.

Here's how you can approach the 0/1 knapsack problem using dynamic programming:

Dynamic Programming Approach:

1. Initialization: Create a 2D array dp of size (n+1) x (W+1), where n is the number of items and W is the capacity of the knapsack. Initialize the first row and first column of dp with zeros.
2. Bottom-up Iteration:
○ Iterate over each item from 1 to n and each capacity from 1 to W.
○ For each item i and capacity w:
■ If the weight of the current item is less than or equal to the current
capacity w, calculate the maximum value that can be obtained by either
including or excluding the item, and store the maximum value in
dp[i][w].
■Otherwise, the value of dp[i][w] is the same as the value obtained by
excluding the current item.
3. Result: The maximum value that can be obtained is stored in dp[n][W].

Here's the pseudo-code for the dynamic programming approach:

function knapsack(items, values, weights, n, W):
    dp = new array[n+1][W+1]

    // Initialize first row and first column with zeros
    for w from 0 to W:
        dp[0][w] = 0
    for i from 0 to n:
        dp[i][0] = 0

    // Bottom-up iteration
    for i from 1 to n:
        for w from 1 to W:
            if weights[i-1] <= w:
                dp[i][w] = max(values[i-1] + dp[i-1][w-weights[i-1]], dp[i-1][w])
            else:
                dp[i][w] = dp[i-1][w]

    return dp[n][W]

Time Complexity: O(nW), where n is the number of items and W is the capacity of the
knapsack.

Space Complexity: O(nW), the space required to store the dynamic programming table.

This dynamic programming approach efficiently solves the 0/1 knapsack problem by considering
all possible combinations of items and capacities in a bottom-up manner, leading to an optimal
solution.
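
The pseudo-code above maps directly to Python. The sketch below is illustrative; the values, weights, and capacity in the example are a classic textbook instance, not from the notes.

Python sketch (illustrative):

def knapsack_01(values, weights, W):
    """Bottom-up 0/1 knapsack. Returns the maximum achievable value."""
    n = len(values)
    dp = [[0] * (W + 1) for _ in range(n + 1)]   # dp[i][w]: best value using first i items

    for i in range(1, n + 1):
        for w in range(1, W + 1):
            if weights[i - 1] <= w:
                # Either include item i-1 or leave it out, whichever is better
                dp[i][w] = max(values[i - 1] + dp[i - 1][w - weights[i - 1]],
                               dp[i - 1][w])
            else:
                dp[i][w] = dp[i - 1][w]          # item does not fit
    return dp[n][W]

# Example usage
print(knapsack_01([60, 100, 120], [10, 20, 30], 50))   # 220
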
Matrix chain multiplication
Matrix chain multiplication is another classic optimization problem where given a sequence of
matrices, the goal is to determine the most efficient way to multiply these matrices together. The
order of multiplication significantly affects the total number of scalar multiplications needed to
compute the final product.

Here's how you can approach the matrix chain multiplication problem using dynamic
programming:

Dynamic Programming Approach:

1. Initialization: Create a 2D array dp of size (n+1) x (n+1), where n is the number of matrices in the sequence. Initialize the main diagonal of dp with zeros.
2. Bottom-up Iteration:
○ Iterate over each subchain length l from 2 to n.
○ For each subchain length l, iterate over each possible start index i.
○ Calculate the minimum cost of multiplying the matrices in the subchain [i,
i+l-1] by considering all possible partition points within the subchain.
○ Update dp[i][i+l-1] with the minimum cost.
3. Result: The minimum cost of multiplying the entire matrix chain is stored in dp[1][n].

Here's the pseudo-code for the dynamic programming approach:

function matrixChainOrder(p, n):
    dp = new array[n+1][n+1]

    // Initialize main diagonal with zeros
    for i from 1 to n:
        dp[i][i] = 0

    // Bottom-up iteration
    for l from 2 to n:
        for i from 1 to n-l+1:
            j = i + l - 1
            dp[i][j] = infinity
            for k from i to j-1:
                cost = dp[i][k] + dp[k+1][j] + p[i-1] * p[k] * p[j]
                dp[i][j] = min(dp[i][j], cost)

    return dp[1][n]

Time Complexity: O(n^3), where n is the number of matrices in the sequence.

Space Complexity: O(n^2), the space required to store the dynamic programming table.

This dynamic programming approach efficiently solves the matrix chain multiplication problem
by considering all possible partition points within the matrix chain and computing the minimum
cost of multiplication for each subchain length. The final result represents the minimum cost of
multiplying the entire matrix chain.
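
A minimal Python sketch following the same convention as the pseudo-code: matrix i has dimensions p[i-1] x p[i]. The function name and the example dimensions (matrices of sizes 10x30, 30x5, 5x60) are illustrative.

Python sketch (illustrative):

def matrix_chain_order(p):
    """p: dimension array; matrix i has shape p[i-1] x p[i] (matrices numbered from 1).
    Returns the minimum number of scalar multiplications."""
    n = len(p) - 1                               # number of matrices
    INF = float('inf')
    dp = [[0] * (n + 1) for _ in range(n + 1)]   # dp[i][j]: cost of multiplying Ai..Aj

    for l in range(2, n + 1):                    # subchain length
        for i in range(1, n - l + 2):
            j = i + l - 1
            dp[i][j] = INF
            for k in range(i, j):                # split point
                cost = dp[i][k] + dp[k + 1][j] + p[i - 1] * p[k] * p[j]
                dp[i][j] = min(dp[i][j], cost)
    return dp[1][n]

# Example: matrices of sizes 10x30, 30x5, 5x60
print(matrix_chain_order([10, 30, 5, 60]))   # 4500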

Longest common subsequence:

The longest common subsequence (LCS) problem is a classic dynamic programming problem
where given two sequences (e.g., strings), the goal is to find the longest subsequence that is
present in both of them.

Here's how you can approach the LCS problem using dynamic programming:

Dynamic Programming Approach:

1. Initialization: Create a 2D array dp of size (m+1) x (n+1), where m and n are the
lengths of the two sequences. Initialize the first row and first column of dp with zeros.
2. Bottom-up Iteration:
○ Iterate over each character i of the first sequence and each character j of the
second sequence.
○ If the characters match (sequence1[i-1] == sequence2[j-1]), set dp[i][j] to dp[i-1][j-1] + 1.
○ If the characters don't match, set dp[i][j] to the maximum of dp[i-1][j]
and dp[i][j-1], representing the longest common subsequence found so far.
3. Result: The length of the longest common subsequence is stored in dp[m][n].

Here's the pseudo-code for the dynamic programming approach:

function longestCommonSubsequence(sequence1, sequence2):
    m = length(sequence1)
    n = length(sequence2)
    dp = new array[m+1][n+1]

    // Initialize first row and first column with zeros
    for i from 0 to m:
        dp[i][0] = 0
    for j from 0 to n:
        dp[0][j] = 0

    // Bottom-up iteration
    for i from 1 to m:
        for j from 1 to n:
            if sequence1[i-1] == sequence2[j-1]:
                dp[i][j] = dp[i-1][j-1] + 1
            else:
                dp[i][j] = max(dp[i-1][j], dp[i][j-1])

    return dp[m][n]

Time Complexity: O(m * n), where m and n are the lengths of the two sequences.

Space Complexity: O(m * n), the space required to store the dynamic programming table.

This dynamic programming approach efficiently solves the LCS problem by considering all
possible subproblems and building up the solution from smaller subproblems. The final result
represents the length of the longest common subsequence between the two input sequences.
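
A minimal Python sketch of the same table-filling approach; the function name and the sample strings ("ABCBDAB" and "BDCABA", a classic textbook pair) are illustrative.

Python sketch (illustrative):

def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:            # characters match: extend the LCS
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:                               # otherwise take the better of the two options
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

# Example usage
print(lcs_length("ABCBDAB", "BDCABA"))   # 4 (for example "BCAB")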

All pair shortest path (Floyd Warshall)


The Floyd-Warshall algorithm is a classic algorithm used to find the shortest paths between all
pairs of vertices in a weighted graph. It works for both directed and undirected graphs, and it
can handle graphs with negative edge weights, as long as there are no negative cycles.

Algorithm Steps:

1. Initialization: Create a 2D array dist[][] of size (V x V), where V is the number of vertices in the graph. Initialize the array such that dist[i][j] is the weight of the edge from vertex i to vertex j if there is an edge, and inf (infinity) otherwise. Also, initialize dist[i][i] to 0 for all vertices.
2. Main Loop:
○ Iterate over each vertex k from 1 to V.
○ For each pair of vertices i and j, update dist[i][j] to be the minimum of:
■ The current value of dist[i][j].
■ The sum of the distances from i to k and from k to j.
3. Result: After the algorithm terminates, the dist[][] array contains the shortest
distances between all pairs of vertices.

Pseudo-code:

function floydWarshall(graph):
V = number of vertices in the graph
dist[][] = new array[V][V]

// Initialize distances
for i from 0 to V-1:
for j from 0 to V-1:
if i == j:
dist[i][j] = 0
else if there is an edge from i to j:
dist[i][j] = weight of the edge
else:
dist[i][j] = infinity

// Main loop
for k from 0 to V-1:
for i from 0 to V-1:
for j from 0 to V-1:
dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])

return dist

Time Complexity:

● The time complexity of the Floyd-Warshall algorithm is O(V^3), where V is the number of
vertices in the graph.

Space Complexity:

● The space complexity is O(V^2), as we need to store the distances between all pairs of
vertices in a 2D array.
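
A minimal Python sketch of the triple loop above. Representing the graph as a dict of dicts of edge weights and the three-vertex example are illustrative assumptions.

Python sketch (illustrative):

def floyd_warshall(graph):
    """graph: dict u -> dict v -> weight for directed edges.
    Returns a dict-of-dicts of shortest distances between all pairs."""
    vertices = list(graph)
    INF = float('inf')
    # Initialize distances: 0 on the diagonal, edge weight if present, infinity otherwise
    dist = {u: {v: (0 if u == v else graph[u].get(v, INF)) for v in vertices}
            for u in vertices}

    for k in vertices:                 # allow k as an intermediate vertex
        for i in vertices:
            for j in vertices:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

# Example usage
graph = {
    'A': {'B': 3, 'C': 8},
    'B': {'C': 2},
    'C': {'A': 5},
}
d = floyd_warshall(graph)
print(d['A']['C'])   # 5 (A -> B -> C is shorter than the direct edge of weight 8)
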
n-Queen problem
The N-Queens problem is a classic combinatorial problem in the field of computer science and
mathematics. The goal is to place N queens on an N×N chessboard in such a way that no two
queens threaten each other. In other words, no two queens can share the same row, column, or
diagonal.

Problem Statement:

Given an N×N chessboard, find a placement of N queens such that no two queens attack each
other.

Backtracking Algorithm:

One of the most common ways to solve the N-Queens problem is by using a backtracking
algorithm. The backtracking algorithm works by recursively exploring all possible placements of
queens on the chessboard and backtracking when a solution cannot be found.

Algorithm Steps:

1. Initialization: Start with an empty chessboard.


2. Recursive Placement:
○ Start from the first row and try placing a queen in each column of the current row.
○ If a queen can be placed in a column without threatening other queens, mark that
cell as occupied and move to the next row.
○ Recursively repeat this process for the next row.
3. Backtracking:
○ If there are no more rows to explore, a solution has been found. Add the current
placement of queens to the list of solutions.
○ If a queen cannot be placed in any column of the current row without threatening
other queens, backtrack to the previous row and try placing the queen in a
different column.
○ Repeat this process until all possible placements have been explored.
4. Result: After exploring all possible placements, the list of solutions contains all valid
placements of queens on the chessboard.

Pseudo-code:
function solveNQueens(N):
    solutions = empty list

    // Helper function to recursively place queens
    function placeQueens(board, row):
        if row == N:
            add board to solutions
            return
        for each column in the current row:
            if a queen can be placed at (row, column):
                place queen at (row, column)
                placeQueens(board, row + 1)
                remove queen from (row, column)

    // Start with an empty chessboard
    empty_board = empty N×N chessboard
    placeQueens(empty_board, 0)

    return solutions

Time Complexity:

● The time complexity of the backtracking algorithm for the N-Queens problem is
exponential, O(N!), where N is the size of the chessboard. This is because the number of
possible configurations to explore grows rapidly with the size of the board.

Space Complexity:

● The space needed for the board and the recursion stack is O(N); if all solutions are stored, the space also grows with the number of valid placements of queens on the chessboard.

Example:

Consider the N-Queens problem for N = 4:

. . Q . // Solution 1
Q . . .
. . . Q
. Q . .

. Q . . // Solution 2
. . . Q
Q . . .
. . Q .
The solutions above show valid placements of queens on a 4×4 chessboard where no two
queens threaten each other.
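
A compact Python sketch of the backtracking algorithm. Instead of a full board, it records one column index per row (an illustrative representation choice); the safety check rules out shared columns and diagonals.

Python sketch (illustrative):

def solve_n_queens(n):
    """Return all solutions; each solution lists the queen's column for every row."""
    solutions = []

    def safe(cols, col):
        row = len(cols)
        for r, c in enumerate(cols):
            if c == col or abs(c - col) == abs(r - row):   # same column or same diagonal
                return False
        return True

    def place(cols):
        if len(cols) == n:                  # all rows filled: record a solution
            solutions.append(cols[:])
            return
        for col in range(n):
            if safe(cols, col):
                cols.append(col)            # place a queen in this row
                place(cols)
                cols.pop()                  # backtrack

    place([])
    return solutions

# Example usage
for sol in solve_n_queens(4):
    print(sol)        # [1, 3, 0, 2] and [2, 0, 3, 1]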

Graph coloring problem

The Graph Coloring Problem is a classic problem in graph theory and computer science. The
goal is to assign colors to the vertices of a graph in such a way that no two adjacent vertices
have the same color, while using the fewest number of colors possible. This problem has many
real-world applications, such as scheduling, register allocation in compilers, and map coloring.

Problem Statement:

Given an undirected graph G, the Graph Coloring Problem asks whether it is possible to assign
colors to the vertices of G such that no two adjacent vertices share the same color.

Naive Approach:

One simple approach to solve the Graph Coloring Problem is to try all possible color
assignments recursively and backtrack when a valid coloring cannot be achieved. This
brute-force approach is exponential in time complexity and is not feasible for large graphs.

Greedy Coloring Algorithm:

A more efficient approach to solve the Graph Coloring Problem is to use a greedy algorithm.
The basic idea is to iteratively assign colors to the vertices of the graph, starting from an
arbitrary vertex and using the smallest available color that does not conflict with the colors of
adjacent vertices.

Greedy Coloring Algorithm Steps:

1. Initialization: Start with an empty list of colors and an empty set of colored vertices.
2. Vertex Selection: Select a vertex from the graph.
3. Color Assignment: Assign the smallest available color to the selected vertex such that
it does not conflict with the colors of adjacent vertices.
4. Repeat: Repeat steps 2 and 3 for all remaining uncolored vertices until all vertices are
colored.
5. Result: After the algorithm terminates, the list of colors assigned to the vertices
represents a valid coloring of the graph.

Pseudo-code for Greedy Coloring Algorithm:


function greedyColoring(graph):
    colors = empty list
    colored_vertices = empty set

    // Helper function to get the smallest available color for a vertex
    function getSmallestColor(vertex):
        for color in colors:
            if color does not conflict with adjacent vertices of vertex:
                return color
        return new color

    // Iterate over each vertex in the graph
    for vertex in graph.vertices:
        // Get the smallest available color for the vertex
        color = getSmallestColor(vertex)
        // Assign the color to the vertex
        assign color to vertex
        // Add the color to the list of colors if it is new
        if color not in colors:
            add color to colors

    return colors

Time Complexity:

● The time complexity of the greedy coloring algorithm depends on the method used to
check for conflicts and the ordering of vertices. In the worst case, the algorithm has a
time complexity of O(V^2), where V is the number of vertices in the graph.

Space Complexity:

● The space complexity is O(V), where V is the number of vertices in the graph, as the
algorithm only needs to store the colors assigned to the vertices.
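
A minimal Python sketch of the greedy coloring above; colors are represented as integers 0, 1, 2, ..., the adjacency-dict format and the sample graph are illustrative assumptions, and vertices are processed in the order they appear.

Python sketch (illustrative):

def greedy_coloring(graph):
    """graph: dict vertex -> list of adjacent vertices.
    Returns dict vertex -> color index (0, 1, 2, ...)."""
    color = {}
    for vertex in graph:                          # process vertices in the given order
        used = {color[nbr] for nbr in graph[vertex] if nbr in color}
        c = 0
        while c in used:                          # smallest color not used by a neighbor
            c += 1
        color[vertex] = c
    return color

# Example usage: a 4-cycle with one diagonal (A-B, B-C, C-D, D-A, A-C)
graph = {
    'A': ['B', 'D', 'C'],
    'B': ['A', 'C'],
    'C': ['B', 'D', 'A'],
    'D': ['C', 'A'],
}
print(greedy_coloring(graph))   # {'A': 0, 'B': 1, 'C': 2, 'D': 1}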

Hamiltonian Cycle Problem


The Hamiltonian Cycle problem is a classic problem in graph theory. A Hamiltonian cycle in a
graph is a cycle that visits each vertex exactly once and returns to the starting vertex.
Determining whether such a cycle exists in a given graph is known as the Hamiltonian Cycle
problem.

Problem Statement:

Given an undirected graph G, the Hamiltonian Cycle problem asks whether there exists a
cycle that visits each vertex exactly once and returns to the starting vertex.

Approach:

A common approach to solve the Hamiltonian Cycle problem is to use a backtracking algorithm.
This approach involves trying to build the cycle step by step and backtracking when a partial
solution cannot be extended to a complete solution.

Backtracking Algorithm Steps:

1. Initialization: Start with an initial vertex and mark it as visited. Create an empty path and
add the initial vertex to it.
2. Recursive Function:
○ Recursively try to add vertices to the path one by one.
○ Check if the current vertex can be added to the path without violating the
Hamiltonian Cycle constraints (i.e., no vertex is visited more than once and the
added vertex is adjacent to the last vertex in the path).
○ If adding the current vertex results in a complete Hamiltonian cycle (i.e., the path
includes all vertices and the last vertex is adjacent to the starting vertex), return
true.
○ If no valid vertex can be added, backtrack by removing the last added vertex and
try the next possibility.
3. Termination: If a Hamiltonian cycle is found, return true; otherwise, return false.

Pseudo-code for Backtracking Algorithm:


function hamiltonianCycle(graph):
    path = []
    start_vertex = 0
    path.append(start_vertex)

    if findCycle(graph, path, start_vertex):
        return path
    else:
        return "No Hamiltonian Cycle found"

function findCycle(graph, path, current_vertex):
    if len(path) == len(graph.vertices):
        if isAdjacent(graph, current_vertex, path[0]):
            return true
        else:
            return false

    for vertex in graph.vertices:
        if isValidVertex(vertex, graph, path, current_vertex):
            path.append(vertex)
            if findCycle(graph, path, vertex):
                return true
            path.pop()

    return false

function isValidVertex(vertex, graph, path, current_vertex):
    if vertex in path:
        return false
    if not isAdjacent(graph, current_vertex, vertex):
        return false
    return true

function isAdjacent(graph, v1, v2):
    return v2 in graph.adjList[v1]

Time Complexity:

● The time complexity of the backtracking algorithm for the Hamiltonian Cycle problem is O(V!), where V is the number of vertices in the graph. This is because, in the worst case, the algorithm explores all possible permutations of the vertices.

Space Complexity:

● The space complexity is O(V), where V is the number of vertices in the graph. This is due to the space needed to store the path and the recursion stack.

Example:
Consider the following undirected graph:

A - B
| \ |
D - C

The adjacency list representation is:

A: [B, C, D]
B: [A, C]
C: [A, B, D]
D: [A, C]

Applying the backtracking algorithm to find a Hamiltonian cycle:

1. Start from vertex A: path = [A]
2. Try vertex B: path = [A, B]
3. Try vertex C: path = [A, B, C]
4. Try vertex D: path = [A, B, C, D]
5. Check if D is adjacent to A: Yes, path forms a Hamiltonian cycle: [A, B, C, D, A]

The algorithm successfully finds the Hamiltonian cycle: A → B → C → D → A.
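
A compact Python sketch of the backtracking search, run on the example graph above. The adjacency-dict representation and the function names are illustrative choices.

Python sketch (illustrative):

def hamiltonian_cycle(adj, start=None):
    """adj: dict vertex -> list of adjacent vertices (undirected graph).
    Returns one Hamiltonian cycle as a list of vertices, or None if none exists."""
    vertices = list(adj)
    start = start if start is not None else vertices[0]
    path = [start]

    def extend(current):
        if len(path) == len(vertices):               # all vertices used:
            return start in adj[current]             # cycle closes if last vertex touches start
        for v in adj[current]:
            if v not in path:                        # visit each vertex at most once
                path.append(v)
                if extend(v):
                    return True
                path.pop()                           # backtrack
        return False

    return path + [start] if extend(start) else None

# Example: the graph from the walkthrough above
adj = {
    'A': ['B', 'C', 'D'],
    'B': ['A', 'C'],
    'C': ['A', 'B', 'D'],
    'D': ['A', 'C'],
}
print(hamiltonian_cycle(adj))   # ['A', 'B', 'C', 'D', 'A']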

LIFO Search (Depth-First Search)

Depth-First Search (DFS) is a search algorithm that explores a graph by going as deep as
possible along each branch before backtracking. It uses a Last-In-First-Out (LIFO) data
structure, such as a stack, to manage the nodes to be explored next.

Pseudo-code for DFS:

function DFS(graph, start_vertex):
    stack = new Stack()
    stack.push(start_vertex)
    visited = set()

    while not stack.isEmpty():
        vertex = stack.pop()
        if vertex not in visited:
            visited.add(vertex)
            for neighbor in graph.getNeighbors(vertex):
                if neighbor not in visited:
                    stack.push(neighbor)

    return visited

Time Complexity:

● Time Complexity: O(V + E), where V is the number of vertices and E is the number of
edges.
● Space Complexity: O(V) for the stack and visited set.

FIFO Search (Breadth-First Search)

Breadth-First Search (BFS) is a search algorithm that explores a graph layer by layer, visiting
all the neighbors of a node before moving to the next level. It uses a First-In-First-Out (FIFO)
data structure, such as a queue, to manage the nodes to be explored next.

Pseudo-code for BFS:


function BFS(graph, start_vertex):
    queue = new Queue()
    queue.enqueue(start_vertex)
    visited = set()
    visited.add(start_vertex)

    while not queue.isEmpty():
        vertex = queue.dequeue()
        for neighbor in graph.getNeighbors(vertex):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.enqueue(neighbor)

    return visited

Time Complexity:

● Time Complexity: O(V + E), where V is the number of vertices and E is the number of
edges.
● Space Complexity: O(V) for the queue and visited set.

Least Cost Search (Dijkstra's Algorithm)

Dijkstra's Algorithm is a search algorithm used to find the shortest paths from a single source
vertex to all other vertices in a weighted graph with non-negative edge weights. It uses a priority
queue to always expand the least-cost node next.

Pseudo-code for Dijkstra's Algorithm:

function Dijkstra(graph, start_vertex):
    dist = {vertex: infinity for vertex in graph}
    dist[start_vertex] = 0
    priority_queue = new PriorityQueue()
    priority_queue.enqueue(start_vertex, 0)

    while not priority_queue.isEmpty():
        current_vertex = priority_queue.dequeue()

        for neighbor, weight in graph.getNeighbors(current_vertex):
            alternative_path = dist[current_vertex] + weight
            if alternative_path < dist[neighbor]:
                dist[neighbor] = alternative_path
                priority_queue.enqueue(neighbor, alternative_path)

    return dist

Time Complexity:

● Time Complexity: O((V + E) log V) with a binary heap priority queue.
● Space Complexity: O(V) for the distance table and priority queue.

Summary:

● LIFO Search (DFS): Uses a stack, explores as deep as possible, suitable for exploring
all nodes and paths.
● FIFO Search (BFS): Uses a queue, explores all neighbors level by level, suitable for
finding the shortest path in an unweighted graph.
● Least Cost Search (Dijkstra's Algorithm): Uses a priority queue, finds the shortest
path in a weighted graph with non-negative edge weights.
15 Puzzle Problem

The 15 Puzzle (also called the Sliding Puzzle or the Game of Fifteen) is a classic problem in combinatorial mathematics and computer science. It consists of a 4x4 grid with 15 numbered tiles and one empty space. The goal is to arrange the tiles in numerical order by making sliding moves that use the empty space.

Problem Statement

Given a 4x4 grid with tiles numbered from 1 to 15 and one empty space, the objective is to
move the tiles until the numbers are in order from 1 to 15 with the empty space in the last
position.

Search Algorithms for the 15 Puzzle

Breadth-First Search (BFS)

BFS can be used to find the shortest sequence of moves to solve the puzzle. It explores all
possible moves level by level starting from the initial configuration.

Pseudo-code for BFS:


function BFS(start_state):
    queue = new Queue()
    queue.enqueue(start_state)
    visited = set()
    visited.add(start_state)

    while not queue.isEmpty():
        current_state = queue.dequeue()
        if isGoal(current_state):
            return reconstructPath(current_state)

        for neighbor in getNeighbors(current_state):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.enqueue(neighbor)
                neighbor.parent = current_state

    return None // No solution found

function getNeighbors(state):
    neighbors = []
    empty_position = findEmptyPosition(state)
    possible_moves = getPossibleMoves(empty_position)

    for move in possible_moves:
        neighbor = makeMove(state, move)
        neighbors.append(neighbor)

    return neighbors

function reconstructPath(state):
    path = []
    while state.parent is not None:
        path.append(state)
        state = state.parent
    path.reverse()
    return path

Time Complexity:

● Time Complexity: O(b^d), where b is the branching factor (maximum number of possible moves per state) and d is the depth of the solution.
● Space Complexity: O(b^d), as BFS requires storing all nodes at the current level.
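
A compact Python sketch of the BFS search, with states represented as flat 16-tuples and 0 standing for the empty space. The helper names and the example start state (two moves from the goal) are illustrative, and plain BFS is only practical for start states close to the goal because the state space grows exponentially.

Python sketch (illustrative):

from collections import deque

GOAL = tuple(range(1, 16)) + (0,)          # 0 represents the empty space

def neighbors(state):
    """Yield the states reachable by sliding one tile into the empty space."""
    i = state.index(0)
    r, c = divmod(i, 4)
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 4 and 0 <= nc < 4:
            j = nr * 4 + nc
            s = list(state)
            s[i], s[j] = s[j], s[i]        # slide the tile at j into the blank
            yield tuple(s)

def bfs(start):
    """Return the shortest sequence of states from start to GOAL, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == GOAL:
            path = []
            while state is not None:       # reconstruct the path via parent links
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in neighbors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None

# Example: a start state two slides away from the goal
start = tuple(range(1, 14)) + (0, 14, 15)
print(len(bfs(start)) - 1)                 # 2 moves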

Traveling Salesman Problem (TSP)

The Travelling Salesman Problem (TSP) is a classic optimization problem in combinatorial optimization and computer science. The problem is defined as follows: given a list of cities and the distances between each pair of cities, find the shortest possible route that visits each city exactly once and returns to the starting city.

Problem Statement

Given a set of n cities and a distance function d(i, j) representing the distance between city i and city j, find a tour (a cycle that visits each city exactly once) of minimum total distance.
Brute-Force Approach

The brute-force approach to solve the TSP is to generate all possible permutations of the cities,
calculate the total distance for each permutation, and select the permutation with the minimum
total distance.

Pseudo-code for Brute-Force Approach:


function TSP_BruteForce(cities, distances):
    min_distance = infinity
    best_permutation = None

    for permutation in permutations(cities):
        current_distance = 0
        for i in range(len(permutation) - 1):
            current_distance += distances[permutation[i]][permutation[i + 1]]
        current_distance += distances[permutation[-1]][permutation[0]]

        if current_distance < min_distance:
            min_distance = current_distance
            best_permutation = permutation

    return best_permutation, min_distance

Time Complexity:

● Time Complexity: O(n!), where n is the number of cities. This is because there are n! permutations to evaluate.
● Space Complexity: O(n), primarily for storing the current permutation and distances.
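
A minimal Python sketch of the brute-force search using itertools.permutations. As a small tweak relative to the pseudo-code, it fixes city 0 as the start to avoid enumerating rotations of the same tour; the distance matrix in the example is illustrative.

Python sketch (illustrative):

from itertools import permutations

def tsp_brute_force(dist):
    """dist: square matrix (list of lists) of pairwise distances.
    Returns (best_tour, best_length); the tour starts and ends at city 0."""
    n = len(dist)
    best_tour, best_len = None, float('inf')

    for perm in permutations(range(1, n)):       # fix city 0 to avoid equivalent rotations
        tour = (0,) + perm
        length = sum(dist[tour[i]][tour[i + 1]] for i in range(n - 1))
        length += dist[tour[-1]][tour[0]]        # return to the starting city
        if length < best_len:
            best_tour, best_len = tour, length

    return best_tour, best_len

# Example usage: 4 cities
dist = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]
print(tsp_brute_force(dist))   # ((0, 1, 3, 2), 80)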

NP-Hard and NP-Complete Problems

The classes of NP-Hard and NP-Complete problems are central to the study of computational
complexity theory. Understanding these classes involves knowing about the broader class of
problems known as NP (nondeterministic polynomial time).
Definitions

NP (Nondeterministic Polynomial Time)

A problem is in NP if:

1. Verification: Given a candidate solution, it can be verified in polynomial time whether the solution is correct.
2. Non-deterministic Turing Machine: It can be solved in polynomial time by a non-deterministic Turing machine, which is a theoretical model that can "guess" solutions and verify their correctness in polynomial time.

NP-Hard

A problem is NP-Hard if:

1. As Hard as Any NP Problem: Every problem in NP can be reduced to it in polynomial time. This means that if you can solve an NP-Hard problem in polynomial time, you can solve all NP problems in polynomial time.
2. Not Necessarily in NP: NP-Hard problems are at least as hard as the hardest problems in NP, but they do not have to be in NP themselves. This means they do not necessarily have a solution verifiable in polynomial time.

NP-Complete

A problem is NP-Complete if:

1. In NP: The problem itself is in NP.
2. NP-Hard: It is as hard as any problem in NP (every problem in NP can be reduced to it in polynomial time).

Relationships

● P ⊆ NP: All problems that can be solved in polynomial time can also be verified in
polynomial time.
● NP-Complete ⊆ NP-Hard: NP-Complete problems are a subset of NP-Hard problems.
● P vs NP Question: It is an open question whether P = NP, i.e., whether every problem
whose solution can be verified in polynomial time can also be solved in polynomial time.

Examples

NP-Complete Problems

1. Traveling Salesman Problem (Decision Version): Given a set of cities and distances between them, is there a tour of length at most k?
2. Boolean Satisfiability Problem (SAT): Given a Boolean formula, is there an assignment of truth values to variables that makes the formula true?
3. Clique Problem: Given a graph and an integer k, is there a clique (a subset of vertices, all of which are adjacent to each other) of size k?
4. Subset Sum Problem: Given a set of integers, is there a non-empty subset whose sum is zero?

NP-Hard Problems (Not Necessarily in NP)

1. Halting Problem: Given a program and an input, does the program halt or run forever?
(This problem is undecidable and not in NP, but it's NP-Hard.)
2. General Travelling Salesman Problem (Optimization Version): Finding the shortest
possible route that visits each city and returns to the origin city. (Optimization problems
like these are generally NP-Hard but not in NP because the solution is not just a yes/no
answer.)

Reductions

To show that a problem is NP-Complete, you generally need to:

1. Show it is in NP: Demonstrate that a candidate solution can be verified in polynomial time.
2. Polynomial-time Reduction: Reduce a known NP-Complete problem to your problem in polynomial time, showing that solving your problem can solve the NP-Complete problem.

Summary

● NP-Hard Problems: At least as hard as the hardest problems in NP. Solving any
NP-Hard problem in polynomial time would solve all NP problems in polynomial time.
● NP-Complete Problems: Problems that are both in NP and NP-Hard. These are the
hardest problems in NP, and finding a polynomial-time solution to any NP-Complete
problem would imply P = NP.

Understanding NP-Hard and NP-Complete problems helps in recognizing the inherent difficulty
of many computational problems and guides the development of algorithms and heuristics for
practical solutions.

Brute Force String Matching


Brute Force String Matching is the simplest and most straightforward string matching algorithm.
It searches for occurrences of a "pattern" string within a "text" string by checking for a match
starting at each position in the text.

Problem Statement

Given a text T of length n and a pattern P of length m, find all occurrences of P in T.

Approach

The brute force approach involves checking for the pattern P starting from each possible position in the text T. For each position, it checks if the substring of T starting at that position matches P.

Pseudo-code
function BruteForceStringMatching(T, P):
n = length(T)
m = length(P)

for i from 0 to n - m:
j = 0
while j < m and T[i + j] == P[j]:
j += 1
if j == m:
print("Pattern found at index", i)

Explanation

1. Initialization:
○ n: Length of the text T.
○ m: Length of the pattern P.
2. Outer Loop: Iterate over each possible starting position i in the text T where P could potentially match. This loop runs from 0 to n − m because if the pattern starts at n − m + 1 or later, there aren't enough characters left in the text for the pattern to fit.
3. Inner Loop: For each starting position i, check each character in P against the corresponding character in T. The inner loop increments j as long as the characters match.
4. Check Match: If the inner loop completes and j equals m, then P matches T starting at index i.
5. Output: Print the starting index i whenever a match is found.

Time Complexity

● Worst-case Time Complexity: O(n·m)
○ The outer loop runs n − m + 1 times, which is O(n).
○ The inner loop, in the worst case, runs m times for each position in the outer loop.
○ Thus, the worst-case time complexity is O(n·m).

Space Complexity

● Space Complexity: O(1)
○ The space complexity is constant since only a few variables are used and no additional space proportional to the input size is required.

Example

Consider the following example:

● Text T: "ABABABAB"
● Pattern P: "ABAB"

Let's trace the brute force algorithm on this input:

1. i = 0: Compare "ABAB" with "ABAB" (match found at index 0).
2. i = 1: Compare "BABA" with "ABAB" (no match).
3. i = 2: Compare "ABAB" with "ABAB" (match found at index 2).
4. i = 3: Compare "BABA" with "ABAB" (no match).
5. i = 4: Compare "ABAB" with "ABAB" (match found at index 4).

The matches are found at indices 0, 2, and 4.
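
A runnable Python version of the pseudo-code above (a sketch that returns the match indices instead of printing them):

def brute_force_match(text, pattern):
    # Return the starting indices of every occurrence of pattern in text.
    n, m = len(text), len(pattern)
    matches = []
    for i in range(n - m + 1):            # every possible starting position
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:                        # the whole pattern matched
            matches.append(i)
    return matches


print(brute_force_match("ABABABAB", "ABAB"))  # [0, 2, 4]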

Summary

Brute Force String Matching is easy to understand and implement but can be inefficient for large
texts and patterns due to its O(n·m) time complexity. More efficient
algorithms like the Knuth-Morris-Pratt (KMP) or Boyer-Moore algorithms are often used in
practice for large-scale string matching problems. However, the brute force approach is useful in
scenarios where the simplicity of implementation outweighs the need for efficiency.

Knuth-Morris-Pratt (KMP) Algorithm


The Knuth-Morris-Pratt (KMP) algorithm is an efficient string matching algorithm that improves
upon the brute force approach by avoiding unnecessary comparisons. It does this through
preprocessing the pattern to create a partial match table (also known as the "lps" array) which
allows the algorithm to skip portions of the text.

Problem Statement

Given a text T of length n and a pattern P of length m, find all occurrences of P in T.

Approach

The KMP algorithm consists of two main parts:

1. Preprocessing the Pattern: Construct the longest prefix suffix (lps) array.
2. Searching the Text: Use the lps array to perform efficient matching.

Pseudo-code

Step 1: Constructing the LPS Array

The lps array is constructed to record the longest proper prefix of the pattern that is also a suffix
for each position in the pattern. This helps in determining the next positions to be compared
when a mismatch occurs.

function computeLPSArray(P):
m = length(P)
lps = array of size m initialized to 0
length = 0 // length of previous longest prefix suffix
i = 1

while i < m:
if P[i] == P[length]:
length += 1
lps[i] = length
i += 1
else:
if length != 0:
length = lps[length - 1]
else:
lps[i] = 0
i += 1
return lps

Step 2: KMP Search

Using the lps array, perform the pattern search on the text.

function KMPSearch(T, P):


n = length(T)
m = length(P)
lps = computeLPSArray(P)

i = 0 // index for T
j = 0 // index for P

while i < n:
if P[j] == T[i]:
i += 1
j += 1

if j == m:
print("Pattern found at index", i - j)
j = lps[j - 1]
elif i < n and P[j] != T[i]:
if j != 0:
j = lps[j - 1]
else:
i += 1

Explanation

1. computeLPSArray(P):
○ Initialize the lps array and set the length of the previous longest prefix suffix to 0.
○ Iterate over the pattern. If characters match, increment the length and update the
lps array.
○ If characters do not match and the length is not zero, update the length to the
value at the previous index in the lps array.
○ If characters do not match and the length is zero, set the current lps value to 0
and move to the next index.
2. KMPSearch(T, P):
○ Initialize indices for the text and pattern.
○ Iterate over the text. If characters match, move both indices forward.
○ If the entire pattern matches, record the match position and use the lps array to
update the pattern index.
○ If characters do not match and the pattern index is not zero, use the lps array to
update the pattern index.
○ If characters do not match and the pattern index is zero, move the text index
forward.

Time Complexity

● Time Complexity:
○ Preprocessing the pattern (computing the lps array): O(m)
○ Searching the text: O(n)
○ Total time complexity: O(n + m)
● Space Complexity:
○ The lps array requires O(m) space.
○ Overall space complexity: O(m)

Example

Consider the following example:

● Text T: "ABABDABACDABABCABAB"
● Pattern P: "ABABCABAB"
1. Compute the LPS Array:
○ For pattern "ABABCABAB", the lps array is [0, 0, 1, 2, 0, 1, 2, 3, 4].
2. Search the Text:
○ Start comparing the pattern with the text.
○ When a mismatch occurs, use the lps array to skip unnecessary comparisons.
○ A match is found at index 10 (see the sketch below).
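
A runnable Python sketch of the two functions above, returning match indices rather than printing them:

def compute_lps(pattern):
    # lps[i] = length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of it.
    m = len(pattern)
    lps = [0] * m
    length = 0
    i = 1
    while i < m:
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length != 0:
            length = lps[length - 1]      # fall back to the previous border
        else:
            lps[i] = 0
            i += 1
    return lps


def kmp_search(text, pattern):
    # Return the starting indices of every occurrence of pattern in text.
    n, m = len(text), len(pattern)
    lps = compute_lps(pattern)
    matches = []
    i = j = 0
    while i < n:
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == m:                    # full match found
                matches.append(i - j)
                j = lps[j - 1]
        elif j != 0:
            j = lps[j - 1]                # reuse the longest border, keep i in place
        else:
            i += 1
    return matches


print(compute_lps("ABABCABAB"))                        # [0, 0, 1, 2, 0, 1, 2, 3, 4]
print(kmp_search("ABABDABACDABABCABAB", "ABABCABAB"))  # [10]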

Summary

The Knuth-Morris-Pratt (KMP) algorithm efficiently solves the string matching problem by
preprocessing the pattern to create an lps array. This allows the algorithm to skip unnecessary
comparisons during the search phase, resulting in a linear time complexity relative to the length
of the text and pattern. This makes KMP a significant improvement over the brute force string
matching approach.
Rabin-Karp Algorithm

The Rabin-Karp algorithm is a string searching algorithm that uses hashing to find occurrences
of a pattern within a text efficiently. It computes the hash value of the pattern and then slides a
window over the text, computing the hash value of each window and comparing it with the hash
value of the pattern.

Problem Statement

Given a text T of length n and a pattern P of length m, find all occurrences of P in T.

Approach

The Rabin-Karp algorithm uses hashing to efficiently find matches between the pattern and
substrings of the text. It works as follows:

1. Hashing: Compute the hash value of the pattern P and of each window of length m in the text T.
2. Comparison: Compare the hash value of each window with the hash value of the
pattern. If the hash values match, perform a character-by-character comparison to
confirm the match.
3. Slide Window: Move the window one position to the right and recompute the hash value
of the new window.

Pseudo-code

function RabinKarp(T, P):
    n = length(T)
    m = length(P)
    hash_P = hash(P)
    hash_T = hash(T[0:m]) // Compute hash of first window

    for i from 0 to n - m:
        if hash_T == hash_P and T[i:i+m] == P:
            print("Pattern found at index", i)
        if i < n - m:
            hash_T = rehash(hash_T, T[i], T[i+m], m)

function hash(s):
    h = 0
    for c in s:
        h = (h * BASE + ord(c)) % PRIME // BASE and PRIME are fixed constants
    return h

function rehash(old_hash, old_char, new_char, m):
    new_hash = (old_hash - ord(old_char) * pow(BASE, m-1)) % PRIME
    new_hash = (new_hash * BASE + ord(new_char)) % PRIME
    return new_hash

Explanation

1. Hashing:
○ The hash function computes the hash value of a string using a rolling hash
technique, where each character is multiplied by a base value and summed.
○ The hash value is calculated modulo a prime number to prevent overflow and
ensure uniform distribution of hash values.
2. Comparison:
○ Compare the hash value of each window of length m with the hash value of
the pattern.
○ If the hash values match, perform a character-by-character comparison to
confirm the match.
3. Slide Window:
○ Move the window one position to the right by removing the leftmost character and
adding the next character from the text.
○ Recompute the hash value of the new window using a rolling hash technique.
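
A runnable Python sketch of this rolling-hash scheme (the base and modulus values below are arbitrary illustrative choices, not part of the algorithm's definition):

BASE = 256    # assumed alphabet size (extended ASCII)
PRIME = 101   # a small prime modulus; any prime works, larger primes reduce collisions

def rabin_karp(text, pattern):
    # Return the starting indices of every occurrence of pattern in text.
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(BASE, m - 1, PRIME)        # BASE^(m-1) mod PRIME, used when sliding
    hash_p = hash_t = 0
    for i in range(m):                    # hash of the pattern and the first window
        hash_p = (hash_p * BASE + ord(pattern[i])) % PRIME
        hash_t = (hash_t * BASE + ord(text[i])) % PRIME

    matches = []
    for i in range(n - m + 1):
        # Compare characters only when the hashes agree (guards against collisions).
        if hash_t == hash_p and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:                     # slide the window one position to the right
            hash_t = (hash_t - ord(text[i]) * high) * BASE + ord(text[i + m])
            hash_t %= PRIME               # Python's % always yields a non-negative result
    return matches


print(rabin_karp("ABABDABACDABABCABAB", "ABABCABAB"))  # [10]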

Time Complexity

● Average-case Time Complexity: O(n + m)
○ Computing the hash values of the pattern and the first window of the text takes O(m) time.
○ Comparing the hash values and performing character-by-character comparisons for potential matches takes O(n − m) time.
○ Total time complexity: O(n + m)
● Worst-case Time Complexity: O(n·m)
○ In the worst case, all hash values match, and character-by-character comparisons are needed for each potential match.
● Space Complexity: O(1)
○ The space complexity is constant since only a few variables are used and no additional space proportional to the input size is required.

Example
Consider the following example:

● Text T: "ABABDABACDABABCABAB"
● Pattern P: "ABABCABAB"

1. Hashing:
○ Compute the hash value of the pattern P: hash("ABABCABAB").
○ Compute the hash value of the first window of the text (its first m = 9 characters): hash("ABABDABAC").
2. Comparison:
○ Compare the hash values: hash("ABABCABAB") and hash("ABABDABAC") do not match.
○ Slide the window to the right.
3. Slide Window:
○ Remove the leftmost character ("A") and add the next character ("D") to form the new window "BABDABACD".
○ Recompute the hash value of the new window: hash("BABDABACD").
4. Repeat Comparison and Sliding:
○ Continue comparing hash values and sliding the window until the end of the text is reached. Here the hash values (and the characters) match at index 10, where the pattern occurs.

Summary

The Rabin-Karp algorithm efficiently finds occurrences of a pattern within a text by using hashing to compare substrings. It has an average-case time complexity of O(n + m) and a worst-case time complexity of O(n·m), making it suitable for practical use in many scenarios. However, it may not perform well with very large patterns and texts due to potential hash collisions.

Naive String Matching Algorithm

The Naive String Matching algorithm is the simplest approach to find all occurrences of a
pattern within a text. It involves checking for a match starting from each position in the text by
comparing the pattern with the substring of the text starting at that position.

Problem Statement

Given a text T of length n and a pattern P of length m, find all occurrences of P in T.

Approach

The Naive String Matching algorithm works as follows:

1. Iterate over Text: For each position i in the text T from 0 to n − m:
○ Check if the substring of length m starting at position i matches the pattern P.
○ If a match is found, record the index i.

Pseudo-code
function NaiveStringMatching(T, P):
n = length(T)
m = length(P)

for i from 0 to n - m:
j = 0
while j < m and T[i + j] == P[j]:
j += 1
if j == m:
print("Pattern found at index", i)

Explanation

1. Initialization:
○ n: Length of the text T.
○ m: Length of the pattern P.
2. Outer Loop: Iterate over each possible starting position i in the text T where P could potentially match. This loop runs from 0 to n − m because if the pattern starts at n − m + 1 or later, there aren't enough characters left in the text for the pattern to fit.
3. Inner Loop: For each starting position i, check each character in P against the corresponding character in T. The inner loop increments j as long as the characters match.
4. Check Match: If the inner loop completes and j equals m, then P matches T starting at index i.
5. Output: Print the starting index i whenever a match is found.

Time Complexity

● Time Complexity: O(n·m)
○ The outer loop runs n − m + 1 times.
○ The inner loop, in the worst case, runs m times for each position in the outer loop.
○ Thus, the time complexity is O(n·m).
● Space Complexity: O(1)
○ The space complexity is constant since only a few variables are used and no additional space proportional to the input size is required.

Example

Consider the following example:

● Text T: "ABABDABACDABABCABAB"
● Pattern P: "ABAB"

Let's trace the Naive String Matching algorithm on this input:

1. i = 0: Compare "ABAB" with "ABAB" (match found at index 0).
2. i = 1: Compare "BABD" with "ABAB" (no match).
3. i = 2: Compare "ABDA" with "ABAB" (no match).
4. i = 3: Compare "BDAB" with "ABAB" (no match).
5. Continue this process until the end of the text.

The matches are found at indices 0, 10, and 15.

Summary

The Naive String Matching algorithm is straightforward to implement but may not be the most
efficient for large texts and patterns due to its O(n·m) time complexity.
However, it serves as a baseline for more sophisticated string matching algorithms and can be
useful for small-scale problems where simplicity is prioritized over efficiency.

Finite Automata

Finite Automata, also known as Finite State Machines (FSMs), are computational models that can be in exactly one of a finite number of states at any given time. They are widely used in various areas, including computer science, linguistics, and circuit design, to model systems with discrete inputs and outputs.

Definition

A Finite Automaton consists of the following components:

1. States: A finite set of states, denoted Q.
2. Alphabet: A finite set of input symbols, denoted Σ.
3. Transition Function: A function that maps a state and an input symbol to a new state. It is denoted δ: Q × Σ → Q.
4. Start State: One of the states from Q designated as the start state, denoted q0.
5. Accepting States: A subset of states from Q designated as accepting states (or final states).

Types of Finite Automata

1. Deterministic Finite Automaton (DFA):
○ In a DFA, for each state and input symbol, there is exactly one transition to a new state.
○ Formally, for all q ∈ Q and a ∈ Σ, δ(q, a) is a single state.
2. Nondeterministic Finite Automaton (NFA):
○ In an NFA, for each state and input symbol, there can be multiple transitions to new states (or no transition at all).
○ Formally, for all q ∈ Q and a ∈ Σ, δ(q, a) is a set of states.

Working Principle

1. Initialization: Start in the start state q0.
2. Input Processing: For each input symbol, transition to a new state according to the transition function δ.
3. Acceptance: If the machine reaches an accepting state after processing the entire input, accept the input; otherwise, reject it.

Example

Consider the following DFA:

● States Q: { q0, q1, q2 }
● Alphabet Σ: { 0, 1 }
● Transition Function δ:
○ δ(q0, 0) = q0, δ(q0, 1) = q1
○ δ(q1, 0) = q2, δ(q1, 1) = q1
○ δ(q2, 0) = q0, δ(q2, 1) = q1
● Start State: q0
● Accepting States: { q2 }

This DFA accepts binary strings ending with '10'.
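
A small Python sketch (illustrative only) that simulates this DFA on an input string:

def dfa_accepts(s):
    # Simulate the DFA above: accepts binary strings ending with '10'.
    transitions = {
        ('q0', '0'): 'q0', ('q0', '1'): 'q1',
        ('q1', '0'): 'q2', ('q1', '1'): 'q1',
        ('q2', '0'): 'q0', ('q2', '1'): 'q1',
    }
    state = 'q0'                  # start state
    for symbol in s:
        state = transitions[(state, symbol)]
    return state == 'q2'          # q2 is the only accepting state


print(dfa_accepts("0110"))   # True  (ends with '10')
print(dfa_accepts("0100"))   # False
print(dfa_accepts("10"))     # True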


Applications

1. Lexical Analysis (Tokenization): Recognizing keywords, identifiers, and operators in programming languages.
2. Regular Expression Matching: Implementing pattern matching algorithms.
3. Protocol Specification: Modeling communication protocols in networking.
4. Hardware Design: Designing digital circuits and controllers.
5. Natural Language Processing: Analyzing syntax and semantics in linguistic analysis.

Summary

Finite Automata provide a formal framework for modeling systems with discrete states and
inputs. They are used in various fields for tasks ranging from language processing to hardware
design. Understanding Finite Automata is fundamental in computer science, especially in the
study of formal languages, automata theory, and compiler construction.

Boyer-Moore Algorithm


The Boyer-Moore algorithm is a highly efficient string searching algorithm known for its simplicity
and practical performance. It is particularly effective for searching for occurrences of a pattern
within a text, especially in cases where the pattern length is much smaller than the text length.

Overview

The Boyer-Moore algorithm works by comparing the pattern with the text from right to left,
skipping over portions of the text based on a set of heuristic rules, thereby avoiding
unnecessary character comparisons.

Key Components

1. Bad Character Rule: If a mismatch occurs, the algorithm shifts the pattern to align the
last occurrence of the mismatched character in the pattern with the mismatched
character in the text. This rule aims to skip as many characters as possible.
2. Good Suffix Rule: If a mismatch occurs after a partial match, the algorithm shifts the
pattern based on the occurrence of a good suffix in the pattern. This rule aims to skip
over the maximum possible characters while maintaining alignment with the text.
3. Preprocessing: The Boyer-Moore algorithm preprocesses the pattern to compute two
lookup tables:
○ Bad Character Table: Records the rightmost occurrence of each character in the
pattern.
○ Good Suffix Table: Helps in determining the shift amount based on the
occurrence of good suffixes in the pattern.
Pseudo-code
function BoyerMoore(T, P):
n = length(T)
m = length(P)
bad_character_table = computeBadCharacterTable(P)
good_suffix_table = computeGoodSuffixTable(P)

i = 0 // index for T
while i <= n - m:
j = m - 1 // index for P
while j >= 0 and P[j] == T[i + j]:
j -= 1
if j < 0:
print("Pattern found at index", i)
i += good_suffix_table[0] if i + m < n else 1
else:
char_shift = j - bad_character_table[T[i + j]]
suffix_shift = good_suffix_table[j]
i += max(char_shift, suffix_shift, 1)

function computeBadCharacterTable(P):
    table = dictionary mapping each character to its rightmost occurrence in P
    return table

function computeGoodSuffixTable(P):
m = length(P)
suffixes = computeSuffixes(P)
border = computeBorder(suffixes)
table = array of size m initialized to m (default shift)
for i from 0 to m - 1:
j = m - 1 - suffixes[i] // length of the matched suffix
table[j] = min(table[j], i - suffixes[i] + 1)
for i from 0 to m - 2:
j = m - 1 - border[i] // length of the border
table[j] = min(table[j], m - 1 - i)
return table
function computeSuffixes(P):
    // Implementation of the suffix generation algorithm (not shown here)
    // Returns an array of length m containing the lengths of the longest suffixes for each position in P

function computeBorder(A):
    // Implementation of the border generation algorithm (not shown here)
    // Returns an array of length m containing the lengths of the longest borders for each position in A

Time Complexity

● Average-case Time Complexity: O(n + m)
● Worst-case Time Complexity: O(n·m)
○ The worst case occurs when all characters in the text match those of the pattern except for the last one, resulting in comparisons at every position.
● Space Complexity: O(m + alphabet size)

Example

Consider the following example:

● Text T: "ABABDABACDABABCABAB"
● Pattern P: "ABABCABAB"

Tracing the Boyer-Moore algorithm on this input, the pattern is compared with the text from right to left at each alignment, and mismatches trigger shifts given by the bad character and good suffix rules; the single occurrence of the pattern is found at index 10.
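
A runnable Python sketch using only the bad character heuristic (this is essentially the Boyer-Moore-Horspool simplification; the full algorithm additionally applies the good suffix rule):

def boyer_moore_horspool(text, pattern):
    # Simplified Boyer-Moore search driven by the bad character shift alone.
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # For each character, how far the pattern may slide on a mismatch;
    # characters absent from the pattern allow a full shift of m.
    shift = {c: m for c in set(text)}
    for j in range(m - 1):
        shift[pattern[j]] = m - 1 - j

    matches = []
    i = 0
    while i <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[i + j]:   # compare right to left
            j -= 1
        if j < 0:
            matches.append(i)
            i += 1                                    # conservative shift after a match
        else:
            i += shift.get(text[i + m - 1], m)        # bad character shift
    return matches


print(boyer_moore_horspool("ABABDABACDABABCABAB", "ABABCABAB"))  # [10]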

Summary

The Boyer-Moore algorithm is a powerful and widely used string searching algorithm known for
its efficiency, particularly in practical scenarios with large texts and patterns. It achieves this
efficiency through clever heuristic rules and preprocessing of the pattern. While it may have a
higher worst-case time complexity compared to some other algorithms, its practical performance
often outperforms them in real-world applications.
