DAA Notes - All V Units
UNIT - I
Define the term: Algorithm
An algorithm is a set of rules defined in a specific order to carry out a computation or perform a predefined task. It is a step-by-step procedure to solve a problem.
State the properties of Algorithm
Following are the properties of algorithm:
Input
Output
Definiteness
Finiteness
Effectiveness
Define the term: Time Complexity
The amount of time an algorithm requires to solve a given problem, usually expressed as a function of the input size, is called the time complexity of that algorithm.
Define the term: Space Complexity
The amount of memory an algorithm requires to solve a given problem, usually expressed as a function of the input size, is called the space complexity of that algorithm.
Enlist few problem solving strategies
Following are some of the popular problem solving strategies:
String matching algorithms
Number theory and numerical algorithms
Divide and conquer
Linear programming
Dynamic programming
Greedy methods
Sorting
What is Asymptotic Analysis?
Asymptotic notations are mathematical tools that can be used to determine the time or space complexity of an algorithm without having to implement it in a programming language. The measure is unaffected by machine-specific constants; it describes the dominant part of the cost of the algorithm.
Asymptotic analysis therefore examines algorithms in a way that is not affected by such machine-dependent factors.
Finding the minimum element in an array of size n takes at most n comparisons, so this algorithm's asymptotic complexity is linear (of order n). The linear term is the primary term in the complexity formula.
The primitive operation is the most frequent operation appearing in the algorithm; which operation that is depends on the problem, and it is the major component of the cost.
For sorting and searching problems, the primitive operation is a comparison. For adding arrays or matrices, it is an addition. For the factorial problem, it is a multiplication.
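To make the counting of primitive operations concrete, here is a minimal Python sketch (illustrative, not from the notes) that finds the minimum of an array while counting the comparisons performed:

def find_min(a):
    # The comparison inside the loop is the primitive operation.
    comparisons = 0
    minimum = a[0]
    for x in a[1:]:
        comparisons += 1
        if x < minimum:
            minimum = x
    return minimum, comparisons

print(find_min([7, 3, 9, 1, 4]))   # (1, 4): n - 1 = 4 comparisons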
n      1   2   3   4   5   6   7
f(n)   7  13  23  37  55  77  103
g(n)  10  20  30  40  50  60  70
Algorithm A (with cost f(n)) may outperform algorithm B (with cost g(n)) for small input sizes; however, once the input size becomes sufficiently big (in this example, n = 5), algorithm A always runs slower (performs more steps) than algorithm B. As a result, understanding the growth rate of functions is critical.
Asymptotic notations describe a function's limiting behavior. For example, if f(n) = 8n² + 4n − 32, the term 4n − 32 becomes insignificant as n increases; the 8n² term dominates the growth of f(n).
When doing complexity analysis, the following assumptions are made.
Assumptions:
The actual cost of an operation is not considered.
Constant factors are ignored: O(c·n²) reduces to O(n²).
Only the leading term of a polynomial is considered: O(n³ + n) reduces to O(n³).
Multiplicative or divisive constants are dropped: O(2n²) and O(n²/2) both reduce to O(n²).
Types of Asymptotic Notations
Various notations like Big Oh (Ο), Big Omega (Ω), Big Theta (Θ), Little Oh (ο),
Little Omega (ω) are used to describe the asymptotic running time of the
algorithm
Big Oh Notation (Ο)
Big Oh notation defines an upper bound for the algorithm: f(n) = Ο(g(n)) if there exist positive constants c and n0 such that 0 ≤ f(n) ≤ c·g(n) for all n ≥ n0.
For small input sizes, there may be many crossovers between the growth of f(n) and c·g(n), but once n becomes significantly large, f(n) always grows slower than c·g(n). This value of n is called the crossover point and is denoted as n0.
Big Omega Notation (Ω)
This notation is denoted by ‘Ω’, and it is pronounced as “Big Omega”. Big
Omega notation defines lower bound for the algorithm. It means the running
4
time of algorithm cannot be less than its asymptotic lower bound for any
random sequence of data.
Let f(n) and g(n) are two nonnegative functions indicating the running time of
two algorithms. We say the function g(n) is lower bound of function f(n) if there
exist some positive constants c and n 0 such that 0 c.g(n) ≤ f(n) for all n ≥ n 0.
It is denoted as f(n) = Ω (g(n)).
Big Theta Notation (Θ)
This notation is denoted by ‘Θ’, and it is pronounced as “Big Theta”. Big
Theta notation defines tight bound for the algorithm. It means the running
time of algorithm cannot be less than or greater than it’s asymptotic tight
bound for any random sequence of data
Let f(n) and g(n) are two nonnegative functions indicating running time of two
algorithms. We say the function g(n) is tight bound of function f(n) if there exist
some positive constants c1, c2, and n0 such that 0 ≤ c1× g(n) ≤ f(n) ≤ c2× g(n)
for all n ≥ n0. It is denoted as f(n) = Θ (g(n)).
For big oh, big omega and big theta, g(n) is not a single function but it’s a set
of functions which always grow quickly, slowly or at the same rate of f(n)
respectively for n ≥ n0.
Properties of Asymptotic Notations
Asymptotic notations satisfy the following properties
Correlation:
f(n) = Ο(g(n)) → f ≤ g
Binary Search
The binary search method is more efficient than the linear search method. The
array must be sorted for binary search, which is not necessary for linear search.
Binary search is a search strategy based on divide and conquer. At each step the method halves the search range and checks whether the element to be searched lies in the upper or the lower half of the array. If the element is located at the middle, the algorithm returns.
Let us assign minimum and maximum index of the array to variables low and
high, respectively. The middle index is computed as (low + high)/2.
In every iteration, the algorithm compares the middle element of the array with the key to be searched. The initial search range is A[low] to A[high]. When the key is compared with A[mid], there are three possibilities:
1. The array is sorted, so if key < A[mid] then the key cannot be present in the upper half of the array. The search space is reduced by setting the high index to (mid – 1); the new search range is A[low] to A[mid – 1].
2. If key > A[mid] then the key cannot be present in the lower half of the array. The search space is reduced by moving the low index to (mid + 1); the new search range is A[mid + 1] to A[high].
3. If key = A[mid], the search is successful and the algorithm halts.
This process is repeated until the element is found or index low exceeds high.
low ← 1
high ← n
while low ≤ high do
    mid ← (low + high) / 2
    if A[mid] == key then
        return mid
    else if A[mid] < key then
        low ← mid + 1
    else
        high ← mid – 1
    end
end
return 0
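A runnable Python version of the same procedure, using 0-based indexing (a sketch, not the notes' original code):

def binary_search(a, key):
    # a must be sorted in ascending order.
    low, high = 0, len(a) - 1
    while low <= high:
        mid = (low + high) // 2
        if a[mid] == key:
            return mid              # index of the key
        elif a[mid] < key:
            low = mid + 1           # key lies in the upper half
        else:
            high = mid - 1          # key lies in the lower half
    return -1                       # key not present

print(binary_search([11, 22, 33, 44, 55, 66, 77, 88], 33))   # 2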
Binary search reduces search space by half in every iteration. In a linear
search, the search space was reduced by one only.
If there are n elements in an array, binary search and linear search have to search among (n / 2) and (n – 1) elements respectively in the second iteration.
In the third iteration, the binary search has to scan only (n / 4) elements,
whereas linear search has to scan (n – 2) elements. This shows that a binary
search would hit the bottom very quickly.
The complexity of linear search and binary search for all three cases is
compared in the following table.
Iteration 1:
low = 1, high = 8,
mid = (low + high) / 2 = (1 + 8) / 2 = 9 / 2 = 4
A[mid] = A[4] = 44
As key < A[4], update high:
high = mid – 1 = 4 – 1 = 3
Iteration 2: New search space
low = 1, high = 3,
mid = (low + high) / 2 = (1 + 3) / 2 = 4 / 2 = 2
A[mid] = A[2] = 22
As key > A[2], update low:
low = mid + 1 = 2 + 1 = 3
Iteration 3: New search space
low = 3, high = 3,
mid = (low + high) / 2 = (3 + 3) / 2 = 6 / 2 = 3
A[mid] = A[3] = 33
key = A[3]: the element is found, so return mid.
Quick Sort
Quick sort is a sorting technique that breaks a massive data array into smaller ones in order to save time. In the quick sort algorithm, an extensive array is divided into two sub-arrays: one holding values less than a specified value, called the pivot, and the other holding values greater than the pivot.
Working of Quick Sort Algorithm
Now, it's time to see the working of the quick sort algorithm, and for that we need to take an unsorted array. Let the elements of the array be –
Here,
L= Left
R = Right
P = Pivot
In the given series of arrays, let’s assume that the leftmost item is the pivot.
So, in this condition, a[L] = 23, a[R] = 26 and a[P] = 23.
Since the pivot item is at the left at this moment, the algorithm starts from the right and travels towards the left.
Now, a[P] < a[R], so the algorithm travels forward one position towards left, i.e.
–
Now, a[L] = 18, a[R] = 23, and a[P] = 23. Since the pivot is at the right, the algorithm begins from the left and travels to the right.
As a[P] > a[L], the algorithm travels one place to the right as –
Now, a[L] = 8, a[R] = 23, and a[P] = 23. As a[P] > a[L], so algorithm travels
one place to right as –
Now, a[L] = 28, a[R] = 23, and a[P] = 23. As a[P] < a[L], swap a[P] and a[L]; now the pivot is at the left, i.e. –
Since the pivot is placed at the leftmost side, the algorithm begins from right
and travels to left. Now, a[L] = 23, a[R] = 28, and a[P] = 23. As a[P] < a[R], so
algorithm travels one place to left, as –
Now, a[P] = 23, a[L] = 23, and a[R] = 13. As a[P] > a[R], exchange a[P] and a[R]; now the pivot is at the right, i.e. –
Now, a[P] = 23, a[L] = 13, and a[R] = 23. Pivot is at right, so the algorithm
begins from left and travels to right.
Now, a[P] = 23, a[L] = 23, and a[R] = 23. So pivot, left and right are all pointing to the same element; this represents the termination of the procedure.
Element 23, the pivot, now stands at its correct position: all items to its right are greater than it, and all items to its left are smaller than it.
Now, in a similar manner, the quick sort algorithm is separately applied to the left and right sub-arrays. After the sorting is done, the array will be –
In the quick sort algorithm below, ar is the given array, start is the index of the first element, and finish is the index of the last element of the array.
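The notes' listing itself is not reproduced here; the following minimal Python sketch implements the left/right-pointer partitioning described in the walkthrough, keeping the names ar, start and finish (the partition details are an assumption based on the walkthrough):

def quick_sort(ar, start, finish):
    # Sorts ar[start..finish] in place using the leftmost element as pivot.
    if start >= finish:
        return
    pivot = ar[start]
    left, right = start, finish
    while left < right:
        while left < right and ar[right] >= pivot:   # scan from the right
            right -= 1
        ar[left] = ar[right]
        while left < right and ar[left] <= pivot:    # scan from the left
            left += 1
        ar[right] = ar[left]
    ar[left] = pivot                                 # pivot reaches its final place
    quick_sort(ar, start, left - 1)                  # sort left sub-array
    quick_sort(ar, left + 1, finish)                 # sort right sub-array

data = [23, 18, 8, 28, 13, 26]
quick_sort(data, 0, len(data) - 1)
print(data)   # [8, 13, 18, 23, 26, 28]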
Merge Sort
In merge sort, the given array is initially divided into two parts. The list is repeatedly split into two equal pieces until it can no longer be split.
There are a total of eight elements in the array below; it has been separated into two arrays of size four each.
Now divide these two arrays once more into two sections. These arrays are of size 4, so we'll divide them into two equal pieces of size 2.
We split these arrays once more, to reach single values that can't be split any further.
In the subsequent iteration, the arrays holding single data values are compared pairwise and combined into sorted arrays of two values; this merging is repeated at each level.
The arrays are finally combined at this point. The array will now appear as
follows:
Average case complexity: O(n log n)
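A minimal Python sketch of the split-and-merge procedure just described (illustrative, not the notes' listing):

def merge_sort(a):
    # Repeatedly split the list in half, then merge the sorted halves.
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])
    right = merge_sort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):      # merge step
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([38, 27, 43, 3, 9, 82, 10, 5]))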
Matrix Multiplication (conventional approach)
for i ← 1 to n do
    for j ← 1 to n do
        C[i][j] ← 0
        for k ← 1 to n do
            C[i][j] ← C[i][j] + A[i][k] * B[k][j]
        end
    end
end
Complexity Analysis
The innermost statement is enclosed within three for loops. That statement runs in constant time, i.e. in O(1) time, so the running time of the algorithm is O(n · n · n) = O(n³).
C11 = S1 + S4 – S5 + S7
C12 = S3 + S5
C21 = S2 + S4
C22 = S1 + S3 – S2 + S6
Where,
S1 = (A11 + A22) * (B11 + B22)
S2 = (A21 + A22) * B11
S3 = A11 * (B12 – B22)
S4 = A22 * (B21 – B11)
S5 = (A11 + A12) * B22
S6 = (A21 – A11) * (B11 + B12)
S7 = (A12 – A22) * (B21 + B22)
Let us check if it is same as the conventional approach.
C12 = S3 + S5
= A11 * (B12 – B22) + (A11 + A12) * B22
= A11 * B12 – A11 * B22 + A11 * B22 + A12 * B22
= A11 * B12 + A12 * B22
This is the same as C12 derived using the conventional approach. Similarly, we can derive all Cij for Strassen's matrix multiplication.
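The derivation can also be sanity-checked numerically. This short Python sketch (illustrative) compares Strassen's seven-product formulas with the conventional product on a random 2 x 2 integer example:

import random

a = [[random.randint(0, 9) for _ in range(2)] for _ in range(2)]
b = [[random.randint(0, 9) for _ in range(2)] for _ in range(2)]

# Strassen's seven products for 2 x 2 matrices.
s1 = (a[0][0] + a[1][1]) * (b[0][0] + b[1][1])
s2 = (a[1][0] + a[1][1]) * b[0][0]
s3 = a[0][0] * (b[0][1] - b[1][1])
s4 = a[1][1] * (b[1][0] - b[0][0])
s5 = (a[0][0] + a[0][1]) * b[1][1]
s6 = (a[1][0] - a[0][0]) * (b[0][0] + b[0][1])
s7 = (a[0][1] - a[1][1]) * (b[1][0] + b[1][1])

strassen = [[s1 + s4 - s5 + s7, s3 + s5],
            [s2 + s4,           s1 + s3 - s2 + s6]]

conventional = [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]

print(strassen == conventional)   # True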
Complexity Analysis of Matrix Multiplication using Divide and Conquer Approach
The conventional approach performs eight multiplications to multiply two matrices of size 2 x 2, whereas Strassen's approach performs only seven multiplications on subproblems of size 1 x 1, combining them with additions to obtain the 2 x 2 product. In general, to solve a problem of size n, Strassen's approach creates seven subproblems of size n/2. The recurrence equation for Strassen's approach (ignoring the lower-order cost of additions) is given as,
T(n) = 7·T(n/2)
Two matrices of size 1 x 1 need only one multiplication, so the base case is T(1) = 1. Let us find the solution using the iterative approach. Substituting n = n/2 in the above equation gives T(n/2) = 7·T(n/4), so
T(n) = 7²·T(n/2²)
Continuing the substitution, after k steps,
T(n) = 7^k · T(n/2^k)
Assuming n = 2^k, so that T(n/2^k) = T(1) = 1,
T(n) = 7^k
     = 7^(log₂ n)
     = n^(log₂ 7)
     = n^2.81 < n³
Thus, the running time of Strassen's matrix multiplication algorithm is O(n^2.81), which is less than the cubic order of the traditional approach. The difference between the running times becomes significant when n is large.
Example of Matrix Multiplication using Divide and Conquer Approach
Problem: Multiply given two matrices A and B using Strassen’s
approach, where
UNIT - II
Disjoint set operations, union and find algorithms
A disjoint set data structure is a data structure that stores a collection of disjoint sets. In other words, it divides a set into multiple subsets such that no two subsets contain a common element. Disjoint sets are also called union-find data structures or merge-find sets.
For example: if the initial set is [1,2,3,4,5,6,7,8].
A Disjoint Set Data Structure might partition it as - [(1,2,4), (3,5), (6,8),
(7)].
This partition contains all of the elements of the original set, and no two subsets have any element in common.
The following partitions would be invalid:
[(1,2,3),(3,4,5),(6,8),(7)] - invalid because 3 occurs in 2 subsets.
[(1,2,3),(5,6),(7,8)] - invalid as 4 is missing.
Representative Member
Each subset in a Disjoint Set Data Structure has a representative
member. More commonly, we call this the parent. This is simply the
member responsible for identifying the subset.
Let's understand this with an example.
Suppose this is the partitioning of sets in our Disjoint Set Data structure.
Union/Merge
Suppose, in our example - a new express train started from Delhi to Agra.
This would mean that all the cities connected to Delhi are now connected
to all the cities connected to Agra.
This simply means that we can merge the subsets of Agra and Delhi into
a single subset. It will look like this.
One key issue here is which out of Agra and Delhi will be the new Parent?
For now, we will just choose any of them, as it will be sufficient for our
purposes. Let's take Delhi for now.
Further optimizations can be made to this by choosing the parent
based on which subset has a larger number of values.
This is the new state of our Disjoint Set Data Structure. Delhi is the parent
of Delhi, Gurgaon, Noida, Agra, Vrindavan, and Mathura.
A naive approach would change the parent of Agra, Vrindavan, and Mathura (basically, all the children of Agra) to Delhi. This is linear in time. If we perform the union operation m times, and each union takes O(n) time, we get an O(m·n) time complexity. This is not optimized enough.
Instead, we structure it like this:
Earlier, Agra was its own parent. Now, Delhi is the parent of Agra.
Now, you might be wondering - The parent of Vrindavan is still Agra. It
should have been Delhi now.
This is where the next operation helps us - the find operation.
Find
The find operation helps us find the parent of a node. As we saw above,
the direct parent of a node might not be its actual (logical) parent. E.g.,
the logical parent of Vrindavan should be Delhi in the above example. But
its direct parent is Agra.
So, how do we find the actual parent?
The find operation helps us to find the actual parent. In pseudocode, the
find operation looks like this:
find(node):
    if (parent(node) == node) return node;
    else return find(parent(node));
We don't return the direct parent of the node. We keep going up the tree
of parents until we find a node that is its own parent.
Let's understand this via our example.
Let's try to find the parent of K in this example. (A and D are their own
parents)
1. ParentOf(K) is I. Is I == K? False. So, we move upward, now using I.
2. ParentOf(I) is C. Is C == I? False. Move upward using C.
3. ParentOf(C) is A. Is A == C? False. Move upward using A.
4. ParentOf(A) is A. Is A == A? True! Our final answer is A.
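A minimal Python sketch of a disjoint set with the find and union operations described above (the basic version, without the size-based optimization the notes mention; city names are from the earlier example):

class DisjointSet:
    def __init__(self, elements):
        # Initially every element is its own parent.
        self.parent = {e: e for e in elements}

    def find(self, node):
        # Walk up until a node is its own parent.
        if self.parent[node] == node:
            return node
        return self.find(self.parent[node])

    def union(self, a, b):
        # Merge by attaching one representative under the other.
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

ds = DisjointSet(["Delhi", "Gurgaon", "Agra", "Vrindavan"])
ds.union("Delhi", "Gurgaon")
ds.union("Agra", "Vrindavan")
ds.union("Delhi", "Agra")
print(ds.find("Vrindavan"))   # Delhi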
Let's understand the use of a Disjoint Set Data Structure through a simple
application - Finding a cycle in an undirected graph.
Here, we will use a disjoint set data structure to find whether an undirected graph contains a cycle or not. We assume that the graph does not contain any self-loops.
Let this be our graph. Each node is initialized as its own parent. The list of edges is shown on the top right, and the parents are shown at the bottom. We have shown each edge only once for simplicity (i.e., 1-2 and 2-1 aren't shown separately).
Algorithm
We will go through each edge and do the following. Let the edge be u-v.
1. Find the parents of u and v. If both parents are the same, it means u and v are part of the same subset, and we have found a cycle.
2. If they are not the same, merge u and v. We don't merge u and v directly; instead, we merge the parents of u and v. This ensures that all the siblings of u and all the siblings of v have the same common parent.
We do this for all edges.
Walkthrough
1. 1-2 Edge.
Parent of 1 is 1, parent of 2 is 2. They are different, so we will merge them. Set the parent of 2 as 1 (this is chosen randomly; setting the parent of 1 as 2 would also have been correct).
2. 1-4 Edge.
3. 2-5 Edge.
4. 3-6 Edge.
5. 5-6 Edge.
Note here - the direct parent of 6 is still 3. But, the actual parent becomes
1, because the parent of 3 is 1. (The dotted line is not an edge in the
graph, it shows that the parent of 3 is 1)
6. 6-2 Edge.
The parent of 6 is 1 (through 3), and the parent of 2 is also 1. Both parents are the same, so this edge closes a cycle: the graph contains a cycle.
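A minimal Python sketch of this cycle check, using the edge list from the walkthrough:

def has_cycle(n, edges):
    # n nodes labelled 1..n; union-find cycle check on an undirected graph.
    parent = list(range(n + 1))

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:              # same subset: the edge closes a cycle
            return True
        parent[rv] = ru           # otherwise merge the two subsets

    return False

edges = [(1, 2), (1, 4), (2, 5), (3, 6), (5, 6), (6, 2)]
print(has_cycle(6, edges))        # True: edge 6-2 closes the cycle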
N Queen Problem
The N-Queen problem is the classical example of backtracking. It is defined as: "given an N x N chessboard, arrange N queens in such a way that no two queens attack each other by being in the same row, column or diagonal".
For N = 1 the problem is trivial. For N = 2 and N = 3, no solution is possible. So we start with N = 4, and we will generalize it for N queens.
4-Queen Problem
Problem : Given 4 x 4 chessboard, arrange four queens in a way, such
that no two queens attack each other. That is, no two queens are placed
in the same row, column, or diagonal.
4 x 4 Chessboard
We have to arrange four queens, Q1, Q2, Q3 and Q4, on a 4 x 4 chessboard. We will put the ith queen in the ith row. Let us start with position (1, 1). Q1 is the only queen so far, so there is no issue. The partial solution is <1>.
We cannot place Q2 at positions (2, 1) or (2, 2). Position (2, 3) is acceptable. The partial solution is <1, 3>.
Next, Q3 cannot be placed at position (3, 1), as Q1 attacks her. And it cannot be placed at (3, 2), (3, 3) or (3, 4), as Q2 attacks her. There is no way to put Q3 in the third row. Hence, the algorithm backtracks, goes back to the previous solution and readjusts the position of queen Q2: Q2 is moved from position (2, 3) to (2, 4). The partial solution is <1, 4>.
Now, Q3 can be placed at position (3, 2). The partial solution is <1, 4, 3>.
Queen Q4 cannot be placed anywhere in row four. So again, backtrack to the previous solution and readjust the position of Q3. Q3 cannot be placed at (3, 3) or (3, 4), so the algorithm backtracks even further.
All possible choices for Q2 are already explored, hence the algorithm goes back to the partial solution <1> and moves queen Q1 from (1, 1) to (1, 2). This process continues until a solution is found.
All possible solutions for 4-queen are shown in fig (a) & fig. (b)
The solution of the 4-queen problem can be seen as a 4-tuple (x1, x2, x3, x4), where xi represents the column number of queen Qi. Two possible solutions for the 4-queen problem are (2, 4, 1, 3) and (3, 1, 4, 2).
Algorithm
N_QUEEN (k, n)
// Description: find the solutions of the n x n queen problem using backtracking
// Input:
// n: number of queens
// k: number of the queen being processed currently, initially set to 1
for i ← 1 to n do
    if PLACE(k, i) then
        x[k] ← i
        if k == n then
            print x[1…n]
        else
            N_QUEEN(k + 1, n)
        end
    end
end
Function
PLACE(k, i)
// k is the number of the queen being processed
// i is the candidate column
for j ← 1 to k – 1 do
    if x[j] == i OR (abs(x[j] – i) == abs(j – k)) then
        return false
    end
end
return true
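A minimal runnable Python sketch of N_QUEEN and PLACE (0-based internally, reporting 1-based columns):

def n_queen(n):
    # x[k] = column of the queen in row k; mirrors N_QUEEN/PLACE above.
    x = [-1] * n
    solutions = []

    def place(k, i):
        # Queen k can sit in column i if no earlier queen shares
        # the column or a diagonal.
        return all(x[j] != i and abs(x[j] - i) != abs(j - k)
                   for j in range(k))

    def solve(k):
        for i in range(n):
            if place(k, i):
                x[k] = i
                if k == n - 1:
                    solutions.append([c + 1 for c in x])   # 1-based columns
                else:
                    solve(k + 1)
        x[k] = -1

    solve(0)
    return solutions

print(n_queen(4))   # [[2, 4, 1, 3], [3, 1, 4, 2]]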
Sum of Subsets Problem
Given a set of n positive integers w1, …, wn and a target sum M, find all subsets whose elements add up to exactly M.
Explicit constraints:
xi ∈ {0, 1}: xi = 1 if item i is selected, xi = 0 otherwise.
Implicit constraints:
The selected items must satisfy Σ wi·xi = M.
Example
Solution:
Set X = [0, 0, 0, 0]
Sum > M, so backtrack and remove the previously added item from the
solution set.
The algorithm for solving the sum of subsets problem using recursion is stated below:
Algorithm
SUB_SET_PROBLEM(i, sum, W, remSum)
// Description: solve the sum of subsets problem using backtracking
// Input:
// W: number for which subsets are to be computed
// i: item index
// sum: sum of integers selected so far
// remSum: sum of the remaining items, i.e. w[i + 1] + … + w[n]
if FEASIBLE_SUB_SET(i) == 1 then
    if sum == W then
        print X[1…i]
    else
        X[i + 1] ← 1    // include item i + 1
        SUB_SET_PROBLEM(i + 1, sum + w[i + 1], W, remSum – w[i + 1])
        X[i + 1] ← 0    // exclude item i + 1
        SUB_SET_PROBLEM(i + 1, sum, W, remSum – w[i + 1])
    end
end

function FEASIBLE_SUB_SET(i)
if (sum + remSum ≥ W) AND (sum == W OR sum + w[i + 1] ≤ W) then
    return 1
end
return 0

The first recursive call represents the case when the current item is selected, so the remaining problem size is reduced by w[i + 1]. The second recursive call represents the case when we do not select the current item.
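As a concrete illustration, here is a minimal runnable Python sketch of the same include/exclude backtracking with the feasibility pruning; the instance (w = [5, 10, 12, 13, 15, 18], target 30) is an assumed example, not from the notes:

def subset_sum(w, target):
    # Backtracking: at each index decide to include or exclude w[i],
    # pruning branches that cannot reach the target (cf. FEASIBLE_SUB_SET).
    n = len(w)
    solutions = []

    def solve(i, current, chosen, remaining):
        if current == target:
            solutions.append(list(chosen))
            return
        if i == n:
            return
        if current + remaining < target:      # even taking everything is too small
            return
        if current + w[i] <= target:          # include item i
            chosen.append(w[i])
            solve(i + 1, current + w[i], chosen, remaining - w[i])
            chosen.pop()
        solve(i + 1, current, chosen, remaining - w[i])   # exclude item i

    solve(0, 0, [], sum(w))
    return solutions

print(subset_sum([5, 10, 12, 13, 15, 18], 30))
# [[5, 10, 15], [5, 12, 13], [12, 18]]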
Complexity Analysis
In the worst case the algorithm explores both choices (include or exclude) for each of the n items, so the state space tree has O(2^n) nodes and the time complexity is O(2^n).
Graph Coloring
Applications of graph coloring include:
Designing a timetable
Sudoku
Register allocation in the compiler
Map coloring
Mobile radio frequency assignment
     C1  C2  C3  C4  C5
C1    0   1   0   1   0
C2    1   0   1   0   0
C3    0   1   0   1   1
C4    1   0   1   0   1
C5    0   0   1   0   0
Adjacency matrix for graph G
Thus, vertices A and C will be colored with color 1, and vertices B and D
will be colored with color 2.
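A minimal Python sketch of backtracking m-coloring applied to the adjacency matrix above; since the printed matrix is slightly asymmetric, the sketch treats an edge as present if either direction is marked:

def graph_coloring(adj, m):
    # Assign one of m colors (1..m) to every vertex so that no two
    # adjacent vertices share a color; backtrack on conflict.
    n = len(adj)
    color = [0] * n

    def safe(v, c):
        # Check both directions in case the matrix is not symmetric.
        return all(not ((adj[v][u] or adj[u][v]) and color[u] == c)
                   for u in range(n))

    def solve(v):
        if v == n:
            return True
        for c in range(1, m + 1):
            if safe(v, c):
                color[v] = c
                if solve(v + 1):
                    return True
                color[v] = 0          # backtrack
        return False

    return color if solve(0) else None

adj = [[0, 1, 0, 1, 0],
       [1, 0, 1, 0, 0],
       [0, 1, 0, 1, 1],
       [1, 0, 1, 0, 1],
       [0, 0, 1, 0, 0]]
print(graph_coloring(adj, 3))   # [1, 2, 1, 2, 3]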
Complexity Analysis
With M colors and n vertices, the total number of nodes in the state space tree would be
1 + M + M² + M³ + … + Mⁿ
Hence, T(n) = 1 + M + M² + M³ + … + Mⁿ = O(Mⁿ)
UNIT - III
Floyd-Warshall Algorithm
Floyd-Warshall is an all-pairs shortest path algorithm used to find the shortest path (or shortest distance) between every pair of vertices in a given graph. The algorithm is applicable to both directed and undirected graphs, but it fails on graphs containing negative cycles; a negative cycle is a cycle whose edge weights sum to a negative value.
Example:
Make the distance matrix or graph matrix. Insert the distance value and if the
distance is unknown fill that with ∞.
D0[B, C] = 8
D0[B, D] vs D0[B, A] + D0[A, D]: is 2 > 3 + ∞ = ∞? No, so no change: D0[B, D] = 2
D0[C, B] = 4
D0[C, D] vs D0[C, A] + D0[A, D]: is ∞ > ∞ + ∞ = ∞? No, so no change: D0[C, D] = ∞
D0[D, B] vs D0[D, A] + D0[A, B]: is ∞ > ∞ + 6 = ∞? No, so no change: D0[D, B] = ∞
D0[D, C] = 7
Hence, the shortest path matrix using the Floyd-Warshall algorithm is as follows.
Algorithm
1. Construct the initial distance matrix: dist[i][j] is the weight of edge (i, j), ∞ if there is no edge, and 0 when i = j.
2. Choose each vertex one by one and update all the shortest paths that include the selected vertex as an intermediate vertex.
3. If a vertex k is selected as an intermediate vertex, then vertices {0, 1, 2, …, k − 1} have already been considered as intermediate vertices.
4. For every pair of vertices (i, j), i.e. source and destination respectively, two cases are possible:
a. k is not an intermediate vertex between i and j. The value of dist[i][j] remains the same.
b. k is an intermediate vertex between i and j. The value of dist[i][j] becomes dist[i][k] + dist[k][j] if dist[i][j] > dist[i][k] + dist[k][j].
5. End.
Pseudo code
n = number of vertices
D = matrix of dimension n x n
for k = 1 to n
    for i = 1 to n
        for j = 1 to n
            Dk[i, j] = min(Dk-1[i, j], Dk-1[i, k] + Dk-1[k, j])
return D
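A minimal runnable Python sketch of this pseudocode (the 4-vertex example matrix is an assumption, since the notes' original matrix is not fully reproduced here):

INF = float('inf')

def floyd_warshall(graph):
    # graph: n x n matrix, graph[i][j] = edge weight or INF, 0 on the diagonal.
    n = len(graph)
    dist = [row[:] for row in graph]            # work on a copy
    for k in range(n):                          # intermediate vertex
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

graph = [[0, 3, INF, 7],
         [8, 0, 2, INF],
         [5, INF, 0, 1],
         [2, INF, INF, 0]]
for row in floyd_warshall(graph):
    print(row)
# [0, 3, 5, 6], [5, 0, 2, 3], [3, 6, 0, 1], [2, 5, 7, 0]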
Complexity
Following are the complexities in the algorithm:
Time Complexity: There are three nested for loops in the pseudo-code of the algorithm, so the time complexity is O(n³).
Space Complexity: The space complexity of the Floyd-Warshall algorithm is O(n²).
0/1 Knapsack Problem
Dynamic Programming approach
We are given N items where each item has some weight and profit
associated with it. We are also given a bag with capacity W, [i.e., the bag
can hold at most W weight in it]. The target is to put the items into the
bag such that the sum of profits associated with them is the maximum
possible.
The constraint here is we can either put an item completely into the bag
or cannot put it at all [It is not possible to put a part of an item into the
bag].
OBJECTS        OB1  OB2  OB3
PROFIT          10   12   28
WEIGHT           1    2    4
PROFIT/WEIGHT   10    6    7
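A minimal Python sketch of the bottom-up tabulation for this instance; the knapsack capacity M = 5 is an assumed value, as the capacity is not visible at this point in the notes:

def knapsack_01(weights, profits, capacity):
    # dp[j] = best profit achievable with capacity j using items seen so far.
    dp = [0] * (capacity + 1)
    for w, p in zip(weights, profits):
        for j in range(capacity, w - 1, -1):   # reverse: each item used at most once
            dp[j] = max(dp[j], dp[j - w] + p)
    return dp[capacity]

# Instance from the table above; capacity 5 is an assumed value for illustration.
print(knapsack_01([1, 2, 4], [10, 12, 28], 5))   # 38 (items OB1 and OB3)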
Recursive Tree
Optimal Binary Search Tree (OBST)
In a binary search tree, the nodes in the left subtree have lesser values than the root node and the nodes in the right subtree have greater values than the root node.
If we know the key values and the frequencies of each node, we can determine the overall cost of searching a node. The cost of searching is a very important factor in various applications, and the overall cost of searching a node should be low.
The time required to search a node in a BST is more than in a balanced binary search tree, as a balanced binary search tree contains fewer levels than the BST. One way to reduce the cost of a binary search tree is known as an optimal binary search tree.
For example: 10, 20, 30 are the keys, and the following are the
binary search trees that can be made out from these keys.
The formula for calculating the number of binary search trees with n keys is the Catalan number:
C(2n, n) / (n + 1)
For n = 3: C(6, 3) / (3 + 1) = 20 / 4 = 5
If we compute the cost of searching an element based on its level, the height-balanced binary search tree provides the lower cost.
Example:
KEY 1 2 3
0 0 0
FREQUENCY 3 2 5
Dynamic Approach
Consider the below table, which contains the keys and frequencies.
index 1 2 3 4
key value 10 20 30 40
freq 4 2 6 3
Obtain the cost value for each pair of keys and store the values in a table.
Step 1:
We will calculate the values where j-i is equal to zero.
Therefore, c[0, 0] = 0, c[1 , 1] = 0, c[2,2] = 0, c[3,3] = 0, c[4,4] = 0
Step 2:
Now we will calculate the values where j-i equal to 1.
Now to calculate the cost, we will consider only the jth value.
The cost of c[0,1] is 4 (The key is 10).
The cost of c[1,2] is 2 (The key is 20).
The cost of c[2,3] is 6 (The key is 30).
The cost of c[3,4] is 3 (The key is 40).
Step 3:
Now we will calculate the values where j-i = 2
In this case, we will consider two keys.
Step 4
Now we will calculate the values when j-i = 3
When j=3, i=0 then j-i = 3
When j=4, i=1 then j-i = 3
When i=0, j=3 then we consider 1,2,3 as roots and weight w[0,3] is
(4+2+6)
When i=1, j=4, then we consider 2,3,4 as roots and weight w[1,4] is
(2+6+3)
Step 5
When j=4 and i=0 then j-i = 4
In this case, we will consider four keys, i.e., 10, 20, 30 and 40. The
frequencies of 10, 20, 30 and 40 are 4, 2, 6 and 3 respectively.
w[0, 4] = 4 + 2 + 6 + 3 = 15
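A compact Python sketch of the tabulation just described, using the keys/frequencies table above (this computes only the optimal cost; the standard O(n³) formulation is assumed):

def optimal_bst_cost(freq):
    # c[i][j] = minimum search cost for the keys i+1..j; w = frequency sum.
    n = len(freq)
    c = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):              # j - i = length
        for i in range(n - length + 1):
            j = i + length
            w = sum(freq[i:j])                  # weight of keys i+1..j
            c[i][j] = w + min(c[i][r - 1] + c[r][j] for r in range(i + 1, j + 1))
    return c[0][n]

print(optimal_bst_cost([4, 2, 6, 3]))   # 26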
Optimal Substructure:
• Here the task can be divided into sub problems and the solutions to
the sub problems are combined to give the final solution
Overlapping sub problems
• Here if we observe the sub problems are repeated.
• Hence by using Tabulation method, we can avoid repetitive calls to
same sub problem solution. And the time complexity can be
reduced
Time Complexity:
• The DP table has O(n²) entries (we traverse only the upper triangular half of an n x n matrix), and computing each entry requires trying up to n candidate roots, so the overall time complexity of the OBST algorithm is O(n³).
Space Complexity:
• The space complexity is O(n²), because we have created a dp matrix of size n x n.
Algorithm: Step 1: Read n symbols with probability Pi.
Travelling Salesman Problem using Dynamic Programming
For the above graph, the optimal route follows the minimum cost path 1-2-4-3-1, and this shortest route costs 10 + 25 + 30 + 15 = 80.
Adjacency matrix for the above graph is:
1) |S| = Φ:
cost (2, Φ, 1) = dist(2, 1) = 10
cost (3, Φ, 1) = dist(3, 1) = 15
cost (4, Φ, 1) = dist(4, 1) = 20
2) |S| = 1:
cost (2, {3}, 1) = dist(2, 3) + cost (3, Φ, 1) = 35+15 = 50
cost (2, {4}, 1) = dist(2, 4) + cost (4, Φ, 1) = 25+20 = 45
cost (3, {2}, 1) = dist(3, 2) + cost (2, Φ, 1) = 35+10 = 45
cost (3, {4}, 1) = dist(3, 4) + cost (4, Φ, 1) = 30+20 = 50
cost (4, {2}, 1) = dist(4, 2) + cost (2, Φ, 1) = 25+10 = 35
cost (4, {3}, 1) = dist(4, 3) + cost (3, Φ, 1) = 30+15 = 45
3) |S| = 2:
cost (2, {3, 4}, 1) = min [ dist[2,3]+Cost(3,{4},1) = 35+50 = 85,
dist[2,4]+Cost(4,{3},1) = 25+45 = 70 ] = 70
cost (3, {2, 4}, 1) = min [ dist[3,2]+Cost(2,{4},1) = 35+45 = 80,
dist[3,4]+Cost(4,{2},1) = 30+35 = 65 ] = 65
cost (4, {2, 3}, 1) = min [ dist[4,2]+Cost(2,{3},1) = 25+50 = 75,
dist[4,3]+Cost(3,{2},1) = 30+45 = 75 ] = 75
4) |S| = 3:
cost (1, {2, 3, 4}, 1) = min [ dist[1,2]+Cost(2,{3,4},1) = 10+70 = 80
dist[1,3]+Cost(3,{2,4},1) = 15+65 = 80
dist[1,4]+Cost(4,{2,3},1) = 20+75 = 95 ] = 80
So the optimal tours are
1-2-4-3-1 and its reverse 1-3-4-2-1, both with cost 80.
we do not know the exact cost of reaching city 1 from city i through the other cities; it is computed recursively.
3. Now, we need to start at 1 and complete the tour, selecting the next city so that the total cost is minimum:
4. cost(i, S, 1) = min over j ∈ S of [ dist(i, j) + cost(j, S − {j}, 1) ], where j ≠ i
5. end
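A minimal Python sketch of this recurrence, memoized with lru_cache, using the 4-city distance matrix from the example above; city labels are 0-based here, with city 0 playing the role of city 1:

from functools import lru_cache

def tsp(dist):
    # cost(i, S): cheapest way to leave city i, visit every city in S
    # (in some order), and return to city 0.
    n = len(dist)

    @lru_cache(maxsize=None)
    def cost(i, S):
        if not S:
            return dist[i][0]          # no cities left: return to the start
        return min(dist[i][j] + cost(j, S - {j}) for j in S)

    return cost(0, frozenset(range(1, n)))

dist = [[0, 10, 15, 20],
        [10, 0, 35, 25],
        [15, 35, 0, 30],
        [20, 25, 30, 0]]
print(tsp(dist))   # 80, matching tour 1-2-4-3-1 in the notes' 1-based labels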
Complexity Analysis of TSP
Time Complexity: There are at most 2^N subproblems for each node, so the total number of subproblems is about N · 2^N. Each subproblem requires linear time to solve. If the origin node is not specified, we have to run the loop for N nodes. So the total time for an optimal solution is the number of nodes × the number of subproblems × the time to solve each subproblem, i.e. O(N² · 2^N).
Space Complexity: The dynamic programming approach uses memory to store C(S, i), where S is a subset of the vertex set. There are 2^N subsets for each node, so the space complexity is O(2^N).
Reliability Design
Let's say we have to set up a system consisting of devices D1, D2, D3, …, Dn, where each device has some cost C1, C2, C3, …, Cn. If each device has a reliability of 0.9, then the reliability of the entire system is the product of the reliabilities of all devices, i.e. ∏ri = (0.9)⁴ ≈ 0.66 (for 4 devices).
This means there is roughly a 35% chance that the system fails due to the failure of any one device. The problem is to construct a system whose reliability is maximum. To increase the reliability we can take more than one copy of each device, so that if one copy fails, another copy works; that is, we connect the copies of a device in parallel.
The probability that one copy of the device does not work = 1 − r1 = 1 − 0.9 = 0.1
The probability that all three copies fail = (1 − r1)³ = (0.1)³ = 0.001
The probability that at least one of the three copies works = 1 − (1 − r1)³ = 1 − 0.001 = 0.999
We can see that the system with multiple copies of the same device parallel
may increase the reliability of the system.
Multiple copies of the same device type are connected in parallel through the
use of a switching circuit
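A one-line Python check of the parallel-copies formula (illustrative):

def stage_reliability(r, m):
    # Reliability of one stage with m parallel copies of a device of reliability r.
    return 1 - (1 - r) ** m

print(stage_reliability(0.9, 1))             # 0.9
print(round(stage_reliability(0.9, 3), 3))   # 0.999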
We have to design a three-stage system with device types D1, D2, and D3.
The costs are $30, $15, and $20 respectively. The cost of the system is to be
no more than $105. The reliability of each device type is 0.9, 0.8, and 0.5
respectively.
Pi Ci ri
P1 30 0.9
P2 15 0.8
P3 20 0.5
Explanation:
Now, let us calculate how many copies of each device we can buy with $40 (the $105 budget minus one compulsory copy of each device, 30 + 15 + 20 = 65, leaves $40). If we consume all $40 on device 1, we can buy ⌊40/30⌋ = 1 extra copy; with the copy we already have, that is 2 copies of device 1 overall. In general, we can use a formula to calculate the upper bound on the number of copies of each device:
S0 = {(1,0)}
Device 1:
Each Si from Si-1 by trying out all possible values for ri and combining the
resulting tuples together.
let us consider P1 :
After combining both conditions of Stage 1, i.e., with one copy and two copies of device 1 respectively:
S1 = { (0.9, 30), (0.99, 60) }
Device 2:
S2 will contain all reliability-cost pairs obtained by taking all possible numbers of copies for stage 2 in conjunction with the possibilities calculated in S1. First of all we check the reliability at stage 2 when we have 1, 2, or 3 copies of the device. Let Ei be the reliability of a particular stage with n devices; for S2 we first calculate:
P1, 1 copy    P1, 2 copies
0.9   30      0.99   60
Device 3:
First of all we check the reliability at stage 3 when we have 1, 2, or 3 copies of the device. Ei is the reliability of the particular stage with n devices; for S3 we first calculate:
(0.648, 100)}
UNIT - IV
Fractional Knapsack
The knapsack problem states that, given a set of items holding weights and profit values, one must determine the subset of items to be added to a knapsack such that the total weight of the items does not exceed the limit of the knapsack and the total profit value is maximum.
It is one of the most popular problems solved using the greedy approach, where it is called the Fractional Knapsack Problem.
Knapsack Algorithm
The weights (Wi) and profit values (Pi) of the items to be added in the knapsack are
taken as an input for the fractional knapsack algorithm and the subset of the items
added in the knapsack without exceeding the limit and with maximum profit is
achieved as the output.
Algorithm
1. Consider all the items with their weights and profits.
2. Calculate Pi/Wi for all items and sort the items in descending order of their Pi/Wi values.
3. Without exceeding the limit, add the items into the knapsack.
4. If the knapsack can still hold some weight but the weight of the next item exceeds the limit, add the fractional part of that item; hence the name fractional knapsack problem.
Examples
For the given set of items and the knapsack capacity of 10 kg, find the subset
of the items to be added in the knapsack such that the profit is maximum.
Items     1   2   3   4   5
Weights   3   3   2   5   1
Profits  10  15  10  20   8
Solution
Step 1
Given, n = 5
Wi = {3, 3, 2, 5, 1}
Pi = {10, 15, 10, 12, 8}
Items     1   2   3   4   5
Weights   3   3   2   5   1
Profits  10  15  10  20   8
Pi/Wi   3.3   5   5   4   8
Step 2
Items     5   2   3   4   1
Weights   1   3   2   5   3
Profits   8  15  10  20  10
Pi/Wi     8   5   5   4  3.3
Step 3
Without exceeding the knapsack capacity, insert the items in the knapsack with
maximum profit.
Knapsack = {5, 2, 3}
However, the knapsack can still hold 4 kg, but the next item, weighing 5 kg, would exceed the capacity. Therefore, only 4 kg of the 5 kg item (a 4/5 fraction) is added to the knapsack.
Items      5   2   3    4   1
Profits    8  15  10   20  10
Knapsack   1   1   1  4/5   0

Total profit = 8 + 15 + 10 + (4/5) × 20 = 49
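A minimal Python sketch of the greedy procedure using the instance above:

def fractional_knapsack(weights, profits, capacity):
    # Greedy: take items in decreasing profit/weight ratio,
    # splitting the last item if it does not fit completely.
    items = sorted(zip(weights, profits), key=lambda wp: wp[1] / wp[0], reverse=True)
    total = 0.0
    for w, p in items:
        if capacity >= w:           # the whole item fits
            total += p
            capacity -= w
        else:                       # take the fitting fraction and stop
            total += p * capacity / w
            break
    return total

print(fractional_knapsack([3, 3, 2, 5, 1], [10, 15, 10, 20, 8], 10))   # 49.0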
Job Sequencing with Deadlines
Step-01: Sort all the jobs in decreasing order of their profits:

Jobs       J4  J1  J3  J2  J5  J6
Deadlines   2   5   3   3   4   2
Step-02:
Value of maximum deadline = 5.
So, draw a Gantt chart with maximum time on Gantt chart = 5 units as
shown-
Now,
We take each job one by one in the order they appear in Step-01.
We place the job on Gantt chart as far as possible from 0.
Step-03:
We take job J4.
Since its deadline is 2, we place it in the first empty cell before deadline 2, as-
Step-04:
We take job J1.
Since its deadline is 5, we place it in the first empty cell before deadline 5, as-
Step-05:
We take job J3.
Since its deadline is 3, we place it in the first empty cell before deadline 3, as-
Step-06:
We take job J2.
Since its deadline is 3, we place it in the first empty cell before deadline 3. As the second and third cells are already filled, we place job J2 in the first cell, as-
Step-07:
We take job J5. Since its deadline is 4, we place it in the first empty cell before deadline 4. Finally, job J6 cannot be placed: its deadline is 2 and all cells up to 2 are already filled, so J6 is rejected.
Now the completed schedule is
J2, J4, J3, J5, J1
This is the required order in which the jobs must be completed in order to obtain the maximum profit.
Maximum earned profit
= Sum of profit of all the jobs in optimal schedule
= Profit of job J2 + Profit of job J4 + Profit of job J3 + Profit of job J5 + Profit of job J1
= 180 + 300 + 190 + 120 + 200
= 990 units
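A minimal Python sketch of this greedy procedure; the profits of J1-J5 are taken from the example, while J6's profit (100) is an assumed value since it is not stated in the visible notes:

def job_sequencing(jobs):
    # jobs: list of (name, deadline, profit). Greedy: sort by profit,
    # place each job in the latest free slot on or before its deadline.
    jobs = sorted(jobs, key=lambda j: j[2], reverse=True)
    max_deadline = max(d for _, d, _ in jobs)
    slots = [None] * (max_deadline + 1)        # slots[1..max_deadline]
    for name, deadline, profit in jobs:
        for t in range(deadline, 0, -1):       # as late as possible
            if slots[t] is None:
                slots[t] = (name, profit)
                break
    scheduled = [s for s in slots[1:] if s is not None]
    return [n for n, _ in scheduled], sum(p for _, p in scheduled)

jobs = [("J1", 5, 200), ("J2", 3, 180), ("J3", 3, 190),
        ("J4", 2, 300), ("J5", 4, 120), ("J6", 2, 100)]
print(job_sequencing(jobs))   # (['J2', 'J4', 'J3', 'J5', 'J1'], 990)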
Time and space Complexity:
The time complexity for the above job scheduling algorithm is O(n 2) in the
worst case when we will look for all the slots in the Gantt chart for a given job
id and on average, n jobs search n/2 slots. So, this would also take O(n 2) time
approximately O(n2)time in the best and average cases. Space complexity is
O(n) as extra space is used in the above implementation.
Spanning Tree
Given an undirected and connected graph G = (V, E), a spanning tree of the graph G is a tree that spans G (that is, it includes every vertex of G) and is a subgraph of G (every edge in the tree belongs to G).
Minimum cost Spanning Tree
The cost of a spanning tree is the sum of the weights of all the edges in the tree. There can be many spanning trees. A minimum spanning tree is a spanning tree whose cost is minimum among all spanning trees. There can also be many minimum spanning trees.
There are two famous algorithms for finding the Minimum Spanning Tree:
Kruskal’s Algorithm
Kruskal’s Algorithm builds the spanning tree by adding edges one by one into a
growing spanning tree. Kruskal's algorithm follows greedy approach as in each
iteration it finds an edge which has least weight and add it to the growing spanning
tree.
Algorithm Steps:
Sort the graph edges with respect to their weights.
Start adding edges to the MST from the edge with the smallest weight until
the edge of the largest weight.
Only add edges which don't form a cycle, i.e., edges which connect only disconnected components.
Consider following example:
Pseudo code
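The pseudocode figure from the original notes is not reproduced here. As a stand-in, here is a minimal Python sketch of Kruskal's algorithm using union-find (the 4-vertex example graph is an assumption, not the notes' figure):

def kruskal(n, edges):
    # edges: list of (weight, u, v) for an undirected graph on vertices 0..n-1.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):              # smallest weight first
        ru, rv = find(u), find(v)
        if ru != rv:                           # skip edges that would form a cycle
            parent[rv] = ru
            mst.append((u, v, w))
            total += w
    return mst, total

edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
print(kruskal(4, edges))
# ([(0, 1, 1), (1, 3, 2), (1, 2, 3)], 6)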
Prim’s Algorithm
Prim's Algorithm also uses the greedy approach to find the minimum spanning tree. In Prim's algorithm we grow the spanning tree from a starting position. Unlike Kruskal's, which adds an edge at each step, in Prim's we add a vertex to the growing spanning tree.
Algorithm Steps:
Maintain two disjoint sets of vertices: one containing the vertices that are in the growing spanning tree and the other containing the vertices that are not.
Select the cheapest edge to a vertex that is connected to the growing spanning tree but is not yet in it, and add that vertex to the tree. This can be done using a priority queue: insert the vertices that are connected to the growing spanning tree into the priority queue.
Check for cycles: mark the nodes which have already been selected and insert only unmarked nodes into the priority queue.
Consider the example below:
Pseudo code
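The pseudocode figure is likewise omitted; here is a minimal Python sketch of Prim's algorithm with a priority queue (the example adjacency list is an assumption, chosen to match the Kruskal example above):

import heapq

def prim(adj, start=0):
    # adj: adjacency list {u: [(weight, v), ...]} of an undirected graph.
    # Grow the tree from `start`, always taking the cheapest crossing edge.
    visited = {start}
    heap = list(adj[start])
    heapq.heapify(heap)
    total, tree_edges = 0, []
    while heap and len(visited) < len(adj):
        w, v = heapq.heappop(heap)
        if v in visited:                       # already in the tree; skip
            continue
        visited.add(v)
        total += w
        tree_edges.append((w, v))
        for edge in adj[v]:
            if edge[1] not in visited:
                heapq.heappush(heap, edge)
    return total, tree_edges

adj = {0: [(1, 1), (4, 2)],
       1: [(1, 0), (3, 2), (2, 3)],
       2: [(4, 0), (3, 1), (5, 3)],
       3: [(2, 1), (5, 2)]}
print(prim(adj))   # (6, [(1, 1), (2, 3), (3, 2)])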
Dijkstra's Algorithm
Example:
For instance, consider the following graph. We will start with vertex A, So vertex A
has a distance 0, and the remaining vertices have an undefined (infinite) distance
from the source. Let S be the set of vertices whose shortest path distances from the
source are to be calculated.
We start from source vertex A and start relaxing A's neighbors. Since vertex B can be reached by a direct edge from vertex A, update its distance to 10 (the weight of edge A–B). Similarly, we can reach vertex E through a direct edge from A, so we update its distance from INFINITY to 3.
After processing all outgoing edges of A, we next consider the vertex having minimum distance. B has distance 10, E has distance 3, and all remaining vertices have distance INFINITY. So we choose E and push it into set S; now S = {A, E}. Next, we relax E's neighbors. E has 3 neighbors: B, C and D. We have already found one route to vertex B through vertex A, with cost 10. But if we visit vertex B through vertex E, we get an even cheaper route: (cost of edge A–E + cost of edge E–B) = 3 + 1 = 4 < 10 (the cost of edge A–B).
We repeat the process till we have processed all the vertices, i.e., Set S becomes
full.
dist[source] = 0 // Initialization
create vertex set Q
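The notes' pseudocode above is truncated; the following is a minimal Python sketch of the full algorithm using a priority queue. The weights A-B = 10, A-E = 3 and E-B = 1 come from the prose above; the remaining weights are assumed for illustration:

import heapq

def dijkstra(adj, source):
    # adj: {u: [(weight, v), ...]} with non-negative edge weights.
    dist = {u: float('inf') for u in adj}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:                 # stale entry; a shorter path was found
            continue
        for w, v in adj[u]:             # relax every outgoing edge of u
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

adj = {'A': [(10, 'B'), (3, 'E')],
       'B': [(2, 'C')],
       'C': [(9, 'D')],
       'D': [],
       'E': [(1, 'B'), (8, 'C'), (2, 'D')]}
print(dijkstra(adj, 'A'))   # {'A': 0, 'B': 4, 'C': 6, 'D': 5, 'E': 3}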
Dijkstra's algorithm runs in O(E · log V) time, like Prim's algorithm. Here, E is the total number of edges and V is the number of vertices in the graph.
Here, vertex C is not relaxed again, because its distance was already finalized as 3; relaxing it through the negative edge would give 2.
If the negative weight edge is −6, the path from D to B gets cost 0, which is better than the cost 1 already finalized at B.
Hence, it is clear that Dijkstra's algorithm will not find shortest paths when the graph has negative weight edges or negative weight cycles.
UNIT - V
Travelling Salesman Problem using Branch and Bound
Travelling Salesman Problem (TSP) is an interesting problem. It is defined as: "given n cities and the distance between each pair of cities, find the path which visits each city exactly once and comes back to the starting city, minimizing the travelling distance."
TSP has many practical applications; it is used in network design and transportation route design. The objective is to minimize the distance.
Consider a directed weighted graph G = (V, E, W), where nodes represent cities and weighted directed edges represent the direction of travel and the distance between two cities.
1. Initially, the graph is represented by a cost matrix C.
2. Convert the cost matrix to a reduced matrix by subtracting minimum values from the appropriate rows and columns, so that each row and column contains at least one zero entry.
3. Find the cost of the reduced matrix: the cost is the sum of the amounts subtracted from the cost matrix to convert it into the reduced matrix.
4. Prepare the state space tree for the reduced matrix.
5. Find the least-cost node A (i.e. the E-node) by computing the reduced cost matrix for every remaining node.
6. If edge <i, j> is to be included, then do the following:
a. Set all values in row i and all values in column j of A to ∞.
b. Set A[j, 1] = ∞.
c. Reduce A again, except rows and columns having all ∞ entries.
7. Compute the cost of the newly created reduced matrix as Cost = L + Cost(i, j) + r, where L is the cost of the original reduced cost matrix and r is the reduction cost of the new matrix.
8. If all nodes are not yet visited, go to step 4.
Row Reduction:
Matrix M is called a reduced matrix if each of its rows and columns has at least one zero entry, or the entire row or column consists of ∞ values. Let M represent the distance matrix of 5 cities. M can be reduced as follows:
Find the minimum element of each row and subtract it from each cell of that row.
The row reduction cost is the sum of all the values subtracted from the rows:
Column reduction:
Matrix MRowRed is row reduced but not the column reduced. Matrix is called column
reduced if each of its column has at least one zero entry or all ∞ entries.
To reduced above matrix, we will find the minimum element from each column and
subtract it from each cell of matrix.
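A small Python sketch of row reduction followed by column reduction (the 4-city matrix in the usage example is an assumption, not the notes' 5-city matrix):

INF = float('inf')

def reduce_matrix(m):
    # Row reduction then column reduction; returns the reduced matrix
    # and the total amount subtracted (the reduction cost).
    n = len(m)
    m = [row[:] for row in m]
    cost = 0
    for i in range(n):                          # row reduction
        r = min(m[i])
        if r != INF and r > 0:
            cost += r
            m[i] = [x - r if x != INF else INF for x in m[i]]
    for j in range(n):                          # column reduction
        c = min(m[i][j] for i in range(n))
        if c != INF and c > 0:
            cost += c
            for i in range(n):
                if m[i][j] != INF:
                    m[i][j] -= c
    return m, cost

m = [[INF, 10, 15, 20],
     [10, INF, 35, 25],
     [15, 35, INF, 30],
     [20, 25, 30, INF]]
reduced, cost = reduce_matrix(m)
print(cost)   # 70 for this assumed matrix: a lower bound on any tour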
Each row and column of M_ColRed has at least one zero entry, so this matrix is reduced; let us call it M1.
Reduction cost = (10 + 2 + 2 + 3 + 4) + (1 + 3) = 25
This means every tour in the graph has length at least 25; it is a lower bound on the cost of an optimal tour.
M2 is already reduced.
Cost of node 2 :
= 25 + 0 + 10 = 35
Set M1 [3][1] = ∞
Cost of node 3:
= 25 + 11 + 17 = 53
Set M1 [4][1] = ∞
Cost of node 4:
= 25 + 0 + 0 = 25
Cost of node 5:
= 25 + 5 + 1 = 31
Node 4 has minimum cost for path 1-4. We can go to vertex 2, 3 or 5. Let’s explore
all three nodes.
Cost of node 6:
= 25 + 0 + 3 = 28
Set M [3][1] = ∞
Cost of node 7:
= 25 + 2 + 11 + 12 = 50
Matrix M8 is reduced.
Cost of node 8:
= 25 + 11 + 0 = 36
Path 1-4-2 leads to minimum cost. Let’s find the cost for two possible paths.
Set all values in row 2 and column 3 of M6 to ∞.
Set M6[3][1] = ∞
Cost of node 9:
= 28 + 11 + 2 + 11 = 52
Set M6[5][1] = ∞
Cost of node 10:
= 28 + 0 + 0 = 28
LCBB for Knapsack (least cost branch and bound for knapsack)
LC branch and bound solution for knapsack problem is derived as follows:
1. Derive the state space tree.
2. Compute the lower bound c^ and the upper bound u for each node in the state space tree.
3. If the lower bound is greater than the upper bound, then kill that node.
4. Else select the node with minimum lower bound as the E-node.
5. Repeat steps 3 and 4 until all nodes are examined.
6. The node with the minimum lower bound value c^ is the answer node.
7. Trace the path from leaf to root in the backward direction to find the solution tuple.
Example:
Consider an instance of knapsack using LCBB for knapsack capacity
M = 15.
i 1 2 3 4
Pi 10 10 12 18
Wi 2 4 6 9
c^(1) = -32 - [(15 - (2 + 4 + 6)) / 9] * 18 = -32 - 6 = -38
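A small Python sketch of how u and c^ can be computed for a node, using the instance above (the function name and node encoding are illustrative assumptions):

def bounds(fixed, start, p, w, M):
    # Lower bound c^ (with fraction) and upper bound u (without fraction)
    # for a node where items before `start` are decided; `fixed` = (profit, weight).
    profit, weight = fixed
    u = -profit
    c_hat = -profit
    cap = M - weight
    for i in range(start, len(p)):
        if w[i] <= cap:                     # whole item fits
            cap -= w[i]
            u -= p[i]
            c_hat -= p[i]
        else:                               # fractional part only in c^
            c_hat -= p[i] * cap / w[i]
            break
    return c_hat, u

p, w, M = [10, 10, 12, 18], [2, 4, 6, 9], 15
print(bounds((0, 0), 0, p, w, M))   # (-38.0, -32) at the root node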
State-space tree:
When there is a tie, we can select any node. Let's make node 7 the E-node.
Node 8 : Inclusion of item 4 at node 7
u(8) = -(10 + 10 + 18) = -38
c^(8) = -38 - [(15 - (2 + 4 + 9)) / 6] * 12 = -38 - 0 = -38
Node 9 : Exclusion of item 4 at node 7
This node excludes both the items, 3 and 4. We can add only item 1 and 2.
u(9) = -(10 + 10) = -20
c^(9) = -20 - 0 = -20 (no fraction can be added)
At the last level, node 8 has minimum c^, so node 8 is the answer node. By tracing from node 8 to the root, we get items 4, 2 and 1 in the knapsack.
Solution vector X = {x1, x2, x4} and profit = p1 + p2 + p4 = 38
Time Complexity: O(N) in the best case, as only one path through the tree has to be traversed; the worst-case time complexity is still O(2^N).
Upper = u(1) = -32
c^(1) = -32 - [(15 - (2 + 4 + 6)) / 9] * 18 = -38
c^(2) = -32 - [(15 - (2 + 4 + 6)) / 9] * 18 = -38
c^(3) = -22 - [(15 - (4 + 6)) / 9] * 18 = -22 - 10 = -32
c^(4) = -32 - [(15 - (2 + 4 + 6)) / 9] * 18 = -38
c^(5) = -22 - [(15 - (2 + 6)) / 9] * 18 = -22 - 14 = -36
c^(6) = -22 - [(15 - (4 + 6)) / 9] * 18 = -22 - 10 = -32
c^(7) = -30 - 0 = -30
Out of nodes 4, 5, 6 and 7, c^(7) > upper, so kill node 7. The remaining 3 nodes are alive and added to the list. Out of 4, 5 and 6, node 4 has the minimum c^ value, so it becomes the next E-node.
c^(8) = -32 - [(15 - (2 + 4 + 6)) / 9] * 18 = -38
c^(9) = -38 - 0 = -38
c^(5) > upper and c^(6) > upper, so kill them. If we continue in this way, the final state space tree looks as follows:
profit = 10 + 10 + 18 = 38
Types of Problems:
Decidable problems
Undecidable problems.
Decidable Problems:
Tractable
Intractable
Undecidable Problems:
Example: We cannot decide whether a Turing machine halts or not for some given input.
Classification of Problems
Definitions
An optimization problem aims for the best solution from the set of all feasible solutions. An algorithm which solves an optimization problem is called an optimization algorithm.
The optimization algorithm seeks the best-profit or least-cost solution. Decision problems are simpler than optimization problems.
Problems which take a practically unacceptable (very long) time to be solved are called intractable problems.
Reducibility: Let P1 and P2 be two problems, and suppose there exists some deterministic algorithm A for P1 such that P1 can be solved in polynomial time using A. If the same algorithm can solve problem P2, then we say that P2 is reducible to P1.
P and NP Problems
P and NP Problems are the class of problems. Problems are grouped based
on their complexity.
P-class Problems
NP-Class of Problems
Examples of NP problems
Sr. No. 1
P Problems: P problems are the set of problems which can be solved in polynomial time by deterministic algorithms.
NP Problems: NP problems are problems which can be solved in nondeterministic polynomial time.

Sr. No. 2
P Problems: P problems can be solved and verified in polynomial time.
NP Problems: The solution to NP problems cannot be obtained in polynomial time, but if a solution is given, it can be verified in polynomial time.

Sr. No. 3
P Problems: P problems are a subset of NP problems.
NP Problems: NP problems are a superset of P problems.

Sr. No. 4
P Problems: All P problems are deterministic in nature.
NP Problems: All NP problems are non-deterministic in nature.

Sr. No. 5
P Problems: Example: selection sort, linear search.
NP Problems: Example: TSP, knapsack problem.
Deterministic Algorithms
Non-deterministic Algorithms
NP-Complete Problems
These two facts prove that NP-complete problems are the hardest problems in class NP. They are often referred to as NPC.
If any NP-complete problem belongs to class P, then P = NP. A solution to any NP-complete problem can be verified in polynomial time, but no polynomial-time algorithm is known to obtain it.
NP-complete problems are often solved using randomization
algorithms, heuristic approaches or approximation algorithms.
Some of the well-known NP-complete problems are listed here :
o Boolean satisfiability problem.
o Knapsack problem.
o Hamiltonian path problem.
o Travelling salesman problem.
o Subset sum problem.
o Vertex cover problem.
o Graph colouring problem.
o Clique problem.
NP-Hard Problem
Different mathematicians have given different relationships
considering the possibilities of P = NP and P ≠ NP.
The relationship between P, NP, NP-Complete and NP-hard is
described in below Fig. assuming that P ≠ NP. This is a widely
accepted relationship among these complexity classes.
Subset sum problem and travelling salesman problem are NPC and
also belong to NP-hard. There are certain problems which belong to
NP-hard but they are not NP-complete. A well-known example of an
NP-hard problem is the Halting problem.
The halting problem is stated as: "Given an algorithm and a set of inputs, will it run forever?" The answer to this question is Yes or No, so this is a decision problem.
There does not exist any known algorithm which can decide the
answer for any given input in polynomial time. So halting problem is
an NP-hard problem.
Also, the Boolean satisfiability problem, which is in NPC, can be
reduced to an instance of the halting problem in polynomial time by
transforming it to the description of a Turing machine that tries all
truth value assignments. The Turing machine halts when it finds
such an assignment, otherwise, it goes into an infinite loop.
Cook’s theorem
In computational complexity theory, the Cook-Levin theorem, also known as Cook's theorem, states that the Boolean satisfiability problem (SAT) is NP-complete. That is, it is in NP, and any problem in NP can be reduced in polynomial time by a deterministic Turing machine to the Boolean satisfiability problem.
Hence, SAT is a significant problem and can be stated as follows:
Given a Boolean expression F having n variables x1, x2, …, xn and Boolean operators, is it possible to assign each variable a value (true or false) such that the expression F evaluates to true?
Boolean variable: A variable, say x, that can have only two values, true or false, is called a Boolean variable.
Literal: A literal is a logical variable, say x, or its negation, x̄; x is called a positive literal, and x̄ is called a negative literal.
Clause: A disjunction of literals, i.e. literals separated by the logical OR operator, is called a clause. For example, (x1 V x2 V x3) is a clause of three literals.
Expressions: One can combine the preceding clauses using a Boolean operator to form an expression.
CNF form: An expression is in CNF (conjunctive normal form) if the clauses are connected by the AND (^) operator, while the literals within each clause are connected by the OR (v) operator. The following is an