DAA Handouts Feb12
Tapas Pandit
Contents
1 Introduction
  1.1 Efficiency of Algorithms
  1.2 Run-Time of Linear Search
  1.3 Run-Time of Binary Search
2 Asymptotic Notations
3 Algorithms Based on Induction
4 Some Variants of Binary Search
1 Introduction
The very first question is, “What is an algorithm?”. You can find several answers to this question, such as
“a set of rules that must be followed when solving a particular problem” or “a set of steps used to complete
a specific task” or “a step-by-step procedure used to solve a problem”. However, none of them is a correct
answer. An intuitive answer is that “an algorithm is a finite sequence of elementary operations with
the objective of performing some computational task”.
1. How elementary is 'elementary'? The elementary operations that we consider will be at a higher level and include arithmetic and logical operations.
2. Finiteness ensures that an algorithm must stop.
Algorithm 1: Compute (a + b) ∗ (c + d)
Input: Four integer values a, b, c, and d.
Output: Value of (a + b) ∗ (c + d).
1 t1 = a + b;
2 t2 = c + d;
3 t3 = t1 ∗ t2 ;
4 return t3
Sequential Vs. Parallel Execution. We emphasize the sequential nature of the procedure, where the
order of the steps is crucial. Any permutation of this sequence does not guarantee the desired output. For
example, if we perform step 3 first, then steps 1 and 2, the output will be erroneous. However, sometimes
different orderings of the steps give the desired output. For example, consider the following orderings.
1. t1 = a + b; 2. t2 = c + d; 3. t3 = t1 ∗ t2;
1. t2 = c + d; 2. t1 = a + b; 3. t3 = t1 ∗ t2;
Note that t1 = a + b and t2 = c + d can be executed independently of each other, and hence could even be carried out in parallel on different computing units.
Throughout this course, we will only consider sequential executions and algorithms.
Input and Output. An input may consist of several elements. For example, in Algorithm 1, the inputs are a, b, c and d. Similarly, the output of an algorithm may consist of several elements; Algorithm 1 outputs a single value. Note that the inputs and outputs of an algorithm B are related by some function f : Dom → Rang, where Dom and Rang are the input space and output space of the algorithm B. In other words, every algorithm is associated with some function. When the underlying Rang is {TRUE, FALSE} or {YES, NO} or {1, 0}, the corresponding algorithm is said to solve a decision problem. One such decision problem is the element search problem, stated as follows.
Element Search
Input: A list L of integer values and a search key x.
Question: Does x ∈ L?
Output: YES if x ∈ L, else NO.
Note that classifiers in the area of machine learning solve classification problems, which are also decision problems.
Remark 1.1. It looks like decision problems are easier to solve than non-decision problems. However, there are (uncountably) many problems for which no algorithm exists at all, a consequence of the Turing model of computation. We will also see some well-known decision problems (for which algorithms do exist) in the module on complexity theory.
1.1 Efficiency of Algorithms
When is an algorithm said to be efficient? At a high level, efficiency means requiring only nominal resources. Here, resources mean the time of execution and the space required by the algorithm.
1. Time: The number of steps required by the algorithm to produce its output.
2. Space: The number of temporary variables used by the algorithm. For example, in Algorithm 1 the temporary variables are t1, t2 and t3, and thus the space required by the algorithm is 3.
• In-place Algorithm: It is an algorithm that performs its task without requiring additional
memory space proportional to the input size.
• In other words, the algorithm modifies the input data structure itself, utilizing only a constant
amount of extra memory regardless of the input size.
• So, Algorithm 1 is an in-place algorithm.
3. Other resources: For example, power consumption is important for battery operated devices.
Input's Size. Typically, the time taken by an algorithm depends on the size of its input. For example, in the element search problem, as the size of the list L increases, the underlying algorithm takes more time. Thus, the input's size is an important parameter for deciding an algorithm's efficiency. Note that the set of all possible inputs of an algorithm is typically infinite. In general, the size of an input can be treated as a function from the set of all possible inputs to N. Note that fixing a positive integer n essentially fixes the set of all inputs of size n, and this is typically a finite set. Moreover, the size of an input depends on the algorithm, the underlying problem statement, or the domain of the underlying function. In fact,
• for the arithmetic problem (solved in Algorithm 1), the input size is the number of bits required to write down the values of a, b, c and d.
Exercise 1.2. What are the input size functions and elementary operations for the following problems?
1. Polynomial Evaluation: Given an integer a ∈ N and a polynomial P(x) of degree n defined over N, compute P(a).
2. Polynomial Multiplication: Given two polynomials P₁(x) and P₂(x) of degree n defined over N, compute P₁(x) · P₂(x).
Time complexity of an algorithm. Let t(n) denote the number of steps (i.e., the run-time) required by an algorithm B on an input of size n. As mentioned earlier, the run-time of an algorithm typically depends on the size of its input. Can we define a run-time function for the algorithm from the set of all possible input sizes to N? The answer is no, because the number of steps can be different for two different inputs of the same size n (can you give some examples?). However, we can tackle this issue in the following ways.
1. Worst-case time complexity. In this case, t(n) is defined to be the maximum, over all inputs of size n, of the number of steps that the algorithm B requires. More formally,
t(n) = max{time(B(x)) : x ∈ Domₙ},
where Domₙ is the set of all possible inputs of size n and time(B(x)) denotes the run-time of B on input x. Note that the worst-case time complexity may not present a proper picture of the performance: an algorithm may take a rather long time only on a few inputs of size n, quicksort being one example (see later). Thus, labeling such an algorithm as inefficient is inappropriate.
2. Average-case time complexity. In this case, t(n) captures the average time over all inputs of size n. More formally, it is defined as follows. For each n, the set of all inputs of size n is assumed to be finite, and a uniform distribution is defined on this set. Thus, the time function becomes a random variable T(n), and the average-case time complexity is defined to be E[T(n)] (so, a function of n).
For some algorithms, the average-case complexity can be more informative than the worst-case com-
plexity, especially if the worst-case scenario is rare or improbable in practical usage. It provides a
better understanding of how an algorithm is expected to perform under typical conditions. However,
determining the average-case complexity often requires assumptions about the input distribution,
making it a more theoretical measure compared to the worst-case complexity.
Analogously, one can also formulate the worst-case and average-case space complexity required by an
algorithm.
Example 1.3 (Arithmetic Problem). Here the elementary operations are addition and multiplication. So, Algorithm 1 needs to compute two additions and one multiplication. However, the run-time depends on the sizes of the integers a, b, c and d, which can vary. Let n = max{⌈log₂ a⌉, ⌈log₂ b⌉, ⌈log₂ c⌉, ⌈log₂ d⌉}. Adding two n-bit integers takes time proportional to n, and multiplying two n-bit integers takes time proportional to n^(log₂ 3) (by Karatsuba's algorithm). So, the time complexity is proportional to n^(log₂ 3), where n is the value of the size function.
1.2 Run-Time of Linear Search
Let us consider a slightly modified version of the element search problem as stated below.
Element Search
Input: A list L of integer values and a search key x.
Question: Does x ∈ L?
Output: An index of x if x ∈ L, else NO.
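For concreteness, Algorithm 2 below denotes the standard linear scan of L. Here is a minimal Python sketch of that procedure; the function name and the 1-based indexing convention are ours:

def linear_search(L, x):
    """Scan L from left to right; return a 1-based index i with L[i] == x, else 'NO'."""
    for i in range(1, len(L) + 1):
        if L[i - 1] == x:   # the elementary comparison 'L[i] = x'
            return i
    return "NO"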
Time complexity: It is defined to be the number of elementary operations, that is, the number of comparisons of the form 'L[i] = x'. The costs of the other operations, such as accessing an element of L, are proportional to the number of comparisons. Note that for an unsuccessful search the number of comparisons is n, whereas for a successful search it varies from 1 to n. Thus, the worst-case complexity of Algorithm 2 is n; the number of steps in the worst case is c₁ · n for some constant c₁.
For calculating the average-case time complexity, we fix n, meaning that inputs come from Domₙ. Exactly one of the following two cases occurs.
• Successful search. In this case, x ∈ L. Let us call this event succ. Assign probability 1/n (uniformly) to each possible number of comparisons (why?), that is, Pr[T(n) = i | succ] = 1/n for all i = 1, . . . , n.
• Unsuccessful search. In this case, x ∉ L. Let us call this event unsucc. Then we have Pr[T(n) = n | unsucc] = 1.
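Combining the two cases: conditioned on succ, E[T(n) | succ] = (1 + 2 + · · · + n)/n = (n + 1)/2, while E[T(n) | unsucc] = n. Hence, writing p = Pr[succ] (a quantity the input model must supply), the average-case complexity is
E[T(n)] = p · (n + 1)/2 + (1 − p) · n,
which is roughly n/2 for a guaranteed successful search (p = 1) and exactly n when p = 0.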
1.3 Run-Time of Binary Search
Does the linear search algorithm perform better if L is sorted in ascending order? No. Basically, linear search does not exploit the information that the list is sorted. Can we have a better element search algorithm for a sorted list L? Yes: binary search!
We now solve the element search problem, where the list L is sorted, using the binary search algorithm. Binary search is studied mainly in two ways: the recursive and the iterative approach. Here, we consider only the recursive approach; the iterative version is left as an exercise. Moreover, in either approach, several variants are available. We first discuss the commonly found version in Algorithm 3.
Algorithm 3: BinarySearch(L, n, x)
Input: A sorted list L of n integer values and a search key x
Output: An index of x if x ∈ L, else 0.
1 return Find(x, 1, n) ; // Function call to Algorithm 4.
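The function Find called here (Algorithm 4) is the recursive core of the search. A minimal Python sketch consistent with the discussion that follows, in which Find makes two key comparisons per call; the function names and 1-based indexing are ours:

def find(L, x, lo, hi):
    """Search the sorted slice L[lo..hi] (1-based) for x; return its index, else 0."""
    if lo > hi:
        return 0
    mid = (lo + hi) // 2
    if L[mid - 1] == x:                  # first comparison
        return mid
    if x < L[mid - 1]:                   # second comparison
        return find(L, x, lo, mid - 1)   # continue in the left half
    return find(L, x, mid + 1, hi)       # continue in the right half

def binary_search(L, x):
    return find(L, x, 1, len(L))         # mirrors Algorithm 3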
Time complexity. After the first comparison made through the function call (Algorithm 4), either the search key x is found, or the size of the search space is reduced by half, i.e., to n/2. Similarly, after each subsequent comparison the size of the search space becomes n/2², n/2³, . . . (refer to Figure 1). After about log₂ n halvings the search space shrinks to a single element, so the number of comparisons in the worst case is proportional to log₂ n.
Figure 1: An illustration of binary search (Courtesy: Wikipedia)
Therefore, when the list L is sorted, binary search yields far better performance than linear search. We now consider a tweaked version of the binary search algorithm, specifically of the function Find, which is provided in Algorithm 5.
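Since Find-5 makes only a single key comparison per halving (as noted below), a plausible Python sketch of such a single-comparison variant is the following; the exact formulation in Algorithm 5 may differ:

def find5(L, x, lo, hi):
    """Single-comparison variant: one comparison per level; equality is tested once at the end."""
    if lo == hi:
        return lo if L[lo - 1] == x else 0
    mid = (lo + hi) // 2
    if x <= L[mid - 1]:                  # the only comparison at this level
        return find5(L, x, lo, mid)      # x, if present, lies in L[lo..mid]
    return find5(L, x, mid + 1, hi)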
Can you find any difference between Find-4 and Find-5? The first one makes two comparisons per call, whereas the second one makes a single comparison. Thus, the latter version is faster than the former. One can easily verify that the worst-case complexity of the tweaked version of binary search (associated with Find-5) is 2⌈log₂ n⌉ comparisons.
Exercise 1.5. What is the average/best-case time complexity of the tweaked version of binary search?
Question. Can the element search be improved further? Yes! The interpolation search algorithm
improves search performance, but it assumes a certain distribution of inputs.
Comparing Algorithms. Let Π be a problem statement. How do we choose the best possible algorithm for Π? For the same problem Π, many algorithms may be possible. One of the key criteria for choosing the best algorithm out of several existing algorithms for Π is to compare their time complexities. More generally, one can ask for the best possible algorithm that solves Π, or try to show that Π cannot be solved efficiently. Answering such questions forms the motivation for the rich area of design and analysis of algorithms (DAA).
2 Asymptotic Notations
We now introduce a methodology, known as the asymptotic approach, for predicting the approximate
running time of algorithms. This approach helps in comparing the performance of different algorithms by
ignoring constant factors, allowing us to analyze algorithms’ behavior as the input size tends to infinity.
Let us consider two algorithms, A and B, that solve the same problem, requiring 100n and 2n² + 50 steps respectively, where n is the input size. If we ignore the constants 100, 2 and 50, we say the times taken by A and B are approximately n and n² respectively. Since n² grows faster than n, we say that algorithm B is slower than algorithm A, even though for n = 5, A takes 500 steps while B takes only 100. Of course, this approximation is valid, but it requires n to be large enough: one can easily verify that for n ≥ 50, algorithm B is indeed slower than A.
Suppose there is another algorithm C that takes 100n^1.8 steps for the same problem. Since n² grows faster than n^1.8, the algorithm C is faster than B. However, for 2n² + 50 to exceed 100n^1.8, n must be at least about 3 × 10⁸. The asymptotic approach can thus sometimes be misleading, as it absorbs the information about constants. Fortunately, most algorithms have small constants, so the asymptotic approach works well in practice. Essentially, we will study the five well-known asymptotic notations: big-O, little-o, big-Ω, little-ω and Θ. Informally speaking, the asymptotic notations O, Ω and Θ correspond to "≤", "≥" and "=" respectively. On the other hand, o and ω correspond to "<" and ">" respectively.
Definition 2.1 (Big-O notation). Let f, g : N → R⁺ be functions. We say that f(n) ∈ O(g(n)), if ∃ M > 0 and n₀ ∈ N such that
f(n) ≤ M · g(n) for all n ≥ n₀.
Here is an example of big-O notation.
1. 2n² + n = O(n²), as 2n² + n ≤ 3n² for all n ≥ 1.
The notation O(1) denotes a constant. Note that one can keep constant terms within the O-notation, but this does not make sense, as both expressions are essentially the same. For example, we would write O(n²) instead of O(2n² + 10). Indeed, one can easily verify that O(n²) = O(2n² + 10) (exercise).
Example 2.3. For both the worst case and the average case, the time complexities of linear search and binary search are O(n) and O(log₂ n) respectively.
The following theorem helps in distinguishing the growth rate of exponential functions and polynomial
functions.
Theorem 2.1. For all constants c > 0 and a > 1, and for all monotonically growing functions f(n),
(f(n))^c = O(a^f(n)).
In other words, an exponential function grows faster than a polynomial function.
Proof. Exercise!
The above theorem can be used to compare many functions; see, for example, the corollary below.
Corollary 2.2. For all constants c > 0 and a > 1, we have n^c = O(aⁿ) and (log_a n)^c = O(n).
Proof. Follows from Theorem 2.1 by substituting f(n) = n and f(n) = log_a n respectively.
We now consider the following lemma, which helps in simplifying arithmetic expressions over the O-notation.
Lemma 2.3. Let f(n) = O(s(n)) and g(n) = O(r(n)). Then, the following equalities hold.
1. f(n) + g(n) = O(s(n) + r(n)).
2. f(n) · g(n) = O(s(n) · r(n)).
Proof. Exercise!
running time      1000 steps/sec   2000 steps/sec   4000 steps/sec   8000 steps/sec
log₂ n            0.010            0.005            0.003            0.001
n                 1                0.5              0.25             0.125
n log₂ n          10               5                2.5              1.25
n^1.25            32               16               8                4
n²                1,000            500              250              125
n³                1,000,000        500,000          250,000          125,000
1.1ⁿ              10³⁹             10³⁹             10³⁸             10³⁸

Table 1: Running times (in seconds) under different assumptions (n = 1000).
Example 2.4. Suppose f(n) = 3n² + 5, g(n) = 4(log₂ n)² + 7, s(n) = n², and r(n) = (log₂ n)². Then f(n) = O(s(n)) and g(n) = O(r(n)). By the above lemma, we have f(n) + g(n) = 3n² + 4(log₂ n)² + 12 ∈ O(s(n) + r(n)) = O(n² + (log₂ n)²) = O(n²).
The above lemma covers addition and multiplication. However, it cannot be extended to include subtraction and division. More precisely, when f(n) = O(s(n)) and g(n) = O(r(n)), the following are not true in general.
1. f(n) − g(n) = O(s(n) − r(n)).
2. f(n)/g(n) = O(s(n)/r(n)).
Exercise 2.5. Give an example of f and g for which the above identities are not true.
For a better understanding of asymptotic behavior, Table 1 illustrates the time (in seconds) taken by various running-time functions (first column) on four machines of different speeds (columns 2 to 5).
Some commonly used running-time functions are listed below.
1. Constant: O(1).
4. Linear: O(n).
5. Quadratic: O(n²).
6. Exponential: O(2ⁿ).
2. Show that O(n^c₁) ⊆ O(n^c₂) for any positive constants c₁ and c₂ with c₁ ≤ c₂.
3. Show that O(n) ⊆ O(2ⁿ) and O(log₂ n) ⊆ O(n).
Definition 2.3 (Big-Ω notation). Let f, g : N → R⁺ be functions. We say that f(n) ∈ Ω(g(n)), if ∃ M > 0 and n₀ ∈ N such that
f(n) ≥ M · g(n) for all n ≥ n₀.
Remark 2.7. We read Ω(g) as "big-omega of g"; it is defined as a set, all of whose members are bounded below by g. Similar to the O-notation, we write f(n) = Ω(g(n)) for f(n) ∈ Ω(g(n)); here n₀ does not depend on M.
Example 2.8. Here are some examples of big-Ω notation.
3. n = Ω(n^(1−c)) for 0 ≤ c ≤ 1.
Exercise 2.9. Suppose f (n) = O(g(n)). What can be said about g(n) = Ω(f (n))?
Lemma 2.4. Let f(n) = Ω(s(n)) and g(n) = Ω(r(n)). Then, the following equalities hold.
1. f(n) + g(n) = Ω(s(n) + r(n)).
2. f(n) · g(n) = Ω(s(n) · r(n)).
Proof. Exercise!
Exercise 2.10. Suppose f(n) = Ω(s(n)) and g(n) = Ω(r(n)). Show that the following results do not hold in general.
1. f(n) − g(n) = Ω(s(n) − r(n)).
2. f(n)/g(n) = Ω(s(n)/r(n)).
Definition 2.4 (Θ-notation). Let f, g : N → R⁺ be functions. We say that f(n) = Θ(g(n)), if f(n) = O(g(n)) and f(n) = Ω(g(n)). Here is an example of Θ-notation.
1. 5n³ + 2n² + 5 = Θ(n³), 10⁶ log₂ n = Θ(log₂ n) and n log₂ n − 10⁶n = Θ(n log₂ n).
Definition 2.5 (Little-o notation). Let f, g : N → R⁺ be functions. We say that f(n) ∈ o(g(n)), if for all M > 0 there exists n₀ ∈ N such that
f(n) < M · g(n) for all n ≥ n₀.
Here are some examples of little-o notation.
1. n = o(n²), as lim_(n→∞) n/n² = 0.
2. n^(1/3) = o(n), as lim_(n→∞) n^(1/3)/n = 0.
The following theorem essentially strengthens Theorem 2.1 by replacing big-O with little-o.
Theorem 2.5. For all constants c > 0 and a > 1, and for all monotonically growing functions f(n),
(f(n))^c = o(a^f(n)).
In other words, an exponential function grows strictly faster than a polynomial function.
Proof. Exercise!
Corollary 2.6. For all constants c > 0 and a > 1, we have n^c = o(aⁿ) and (log_a n)^c = o(n).
Proof. Follows from Theorem 2.5 by substituting f(n) = n and f(n) = log_a n respectively.
Exercise 2.13. What can be said about arithmetic over little-o as done in Lemma 2.3?
Definition 2.6 (Little-ω notation). Let f, g : N → R⁺ be functions. We say that f(n) ∈ ω(g(n)), if for all M > 0 there exists n₀ ∈ N such that
f(n) > M · g(n) for all n ≥ n₀.
Here are some examples of little-ω notation.
1. n² = ω(n).
3. But 10⁶n² + 10⁶n + 12 ≠ ω(n²) and log₂ n ≠ ω(n).
Exercise 2.14. Work out the following.
1. What can be said about the worst-case and average-case time complexities of the linear search and binary search algorithms with respect to the Θ-notation?
2. Let f₁(n) = n^(1/3), f₂(n) = √n and f₃(n) = log₂ n. What can be said about the relationship between fᵢ and fⱼ (where i, j ∈ {1, 2, 3} and i ≠ j) with respect to the asymptotic notations O, o, Ω and ω?
3. Let f(n) = n and g(n) = log₂ n. Establish the relationship between f(n) and g(n) with respect to the asymptotic notations O, o, Ω and ω.
4. Let f(n) = n and g(n) = (log₂ n)^c for some constant c > 1. Establish the relationship between f(n) and g(n) with respect to the asymptotic notations O, o, Ω and ω.
5. Compare the functions n!, 2ⁿ and nⁿ with respect to the asymptotic notations O, o, Ω and ω.
3 Algorithms Based on Induction
We have seen that mathematical induction is a powerful tool for proving mathematical statements that
depend on integers. In this section, we adopt mathematical induction as an approach to algorithm design.
While some problems require solutions to be developed from scratch, many others do not. For problems in the latter category, it is sufficient to address the following two requirements.
1. It is possible to solve a small instance of the problem (known as the base case).
2. A solution to every problem can be constructed from the solutions of smaller problems (known as
the inductive step).
The above approach is simply an analogue of the mathematical induction that we have already studied. By employing this inductive algorithm design technique, we explore a few algorithms.
Here we are interested in efficiently evaluating the value of a given polynomial at a given point. The
problem statement is formally defined below.
Polynomial Evaluation
Input: A sequence of real numbers aₙ, aₙ₋₁, . . . , a₁, a₀ and a real number x.
Question: Compute Pₙ(x) = aₙxⁿ + aₙ₋₁xⁿ⁻¹ + · · · + a₁x + a₀.
Output: A real number.
We will go through several attempts using induction until we find an efficient solution, and the method
by which we achieve ultimate efficiency is called Horner’s rule.
1. First Attempt: Here, we reduce the given problem by removing aₙ, leaving us with a smaller problem: evaluating the polynomial
Pₙ₋₁(x) = aₙ₋₁xⁿ⁻¹ + · · · + a₁x + a₀.
Hence, we can solve it by induction with the induction hypothesis stated below.
Induction hypothesis: We know how to evaluate the polynomial represented by the coefficients aₙ₋₁, . . . , a₁, a₀ at the given point x, that is, we know how to compute Pₙ₋₁(x).
We can now use the induction hypothesis to solve the problem with the following inductive step: Pₖ(x) = aₖxᵏ + Pₖ₋₁(x) for any k ∈ [n]. The given problem is thus repeatedly reduced to smaller problems, and we can observe that ultimately we reach the (trivial) base case P₀(x) = a₀.
This method of solving the problem is known as the top-down approach, which involves recursion as a key technique. A bottom-up approach is also possible, where we start with a constant polynomial and each time compute a polynomial of one degree higher; this is known as the iterative approach.
Time complexity: It requires n + (n − 1) + (n − 2) + · · · + 1 = n(n + 1)/2 multiplications and n additions.
2. Second Attempt: The drawback of the above attempt is that we need to compute xⁿ, xⁿ⁻¹, . . . , x² over and over, which seems unnecessary. What if we assume both xⁿ⁻¹ and Pₙ₋₁(x) as part of the induction hypothesis? Of course, then we see a significant improvement. The induction we have used so far is basically weak induction; our assumption was minimal. So, we now consider a stronger induction hypothesis, given below.
Induction hypothesis (strong): We know how to compute both Pₙ₋₁(x) and xⁿ⁻¹.
Note that with information about xᵏ⁻¹ and Pₖ₋₁(x), we can compute xᵏ and Pₖ(x) using only two multiplications and one addition. In fact, the underlying inductive step is as follows: xᵏ = x · xᵏ⁻¹ and Pₖ(x) = aₖxᵏ + Pₖ₋₁(x) for any k ∈ [n].
Time complexity: It requires 2n multiplications and n additions.
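A minimal iterative Python sketch of this second attempt, maintaining the power xᵏ alongside the partial value; the list convention coeffs[k] = aₖ is ours:

def eval_poly_strong(coeffs, x):
    """coeffs = [a0, a1, ..., an]. Uses 2n multiplications and n additions."""
    value = coeffs[0]      # P_0(x) = a_0
    power = 1              # x^0
    for k in range(1, len(coeffs)):
        power = power * x                   # x^k = x * x^(k-1): one multiplication
        value = value + coeffs[k] * power   # P_k = a_k x^k + P_(k-1): one mult, one add
    return value

assert eval_poly_strong([5, 0, -3, 2], 2) == 9   # 2x^3 - 3x^2 + 5 at x = 2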
Remark 3.1. It is efficient, simple and easy to implement. However, we can find a better algorithm,
which is given next.
3. Final Attempt: We show that using Horner's rule, we can compute Pₙ(x) utilizing only n multiplications and n additions. In this approach, we remove a₀ instead of aₙ; then we have the polynomial
x · P′ₙ₋₁(x) = aₙxⁿ + aₙ₋₁xⁿ⁻¹ + · · · + a₁x,
where P′ₙ₋₁(x) = aₙxⁿ⁻¹ + aₙ₋₁xⁿ⁻² + · · · + a₂x + a₁ is the reduced polynomial (or problem) of degree n − 1, whose leading and constant coefficients are aₙ and a₁ respectively. Here we consider an induction hypothesis that proceeds in reverse order, stated as follows.
Induction hypothesis (reverse order): We know how to evaluate the polynomial represented by the coefficients aₙ, aₙ₋₁, . . . , a₁ at the given point x, that is, we know how to compute P′ₙ₋₁(x).
The induction step works as follows: Pₖ(x) = x · P′ₖ₋₁(x) + aₙ₋ₖ for all k ∈ [n]. The induction can be well understood from the following expression of Pₙ(x), due to Horner:
Pₙ(x) = (· · · ((aₙx + aₙ₋₁)x + aₙ₋₂)x + · · · + a₁)x + a₀.
The innermost 0-degree polynomial, namely aₙ, essentially represents the base case. A detailed reduction of the problem (using a top-down approach) is shown below.
aₙxⁿ + aₙ₋₁xⁿ⁻¹ + · · · + a₁x + a₀
⇓
x(aₙxⁿ⁻¹ + aₙ₋₁xⁿ⁻² + · · · + a₁) + a₀
⇓
x(aₙxⁿ⁻² + aₙ₋₁xⁿ⁻³ + · · · + a₂) + a₁
⇓
x(aₙxⁿ⁻³ + aₙ₋₁xⁿ⁻⁴ + · · · + a₃) + a₂
⇓
...
⇓
x(aₙx + aₙ₋₁) + aₙ₋₂
⇓
x(aₙ) + aₙ₋₁
(At each step, the bracketed polynomial is the reduced problem, which the next step rewrites in the same manner.)
A complete illustration of the algorithm, through the iterative (i.e., bottom-up) approach, is provided in Algorithm 6.
Algorithm 6: Compute Pₙ(x)
Input: Coefficients of a degree-n polynomial, that is, a sequence of real numbers aₙ, aₙ₋₁, . . . , a₁, a₀, and a real number x
Output: aₙxⁿ + aₙ₋₁xⁿ⁻¹ + · · · + a₁x + a₀
1 P = aₙ ; // P is a temporary variable
2 for i = 1 to n do
3   P = x · P + aₙ₋ᵢ ;
4 end
5 return P ;
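For concreteness, a direct Python transcription of Algorithm 6 (one multiplication and one addition per iteration):

def horner(a, x):
    """a = [a_n, a_(n-1), ..., a_1, a_0], in the same order as Algorithm 6."""
    p = a[0]               # P = a_n
    for coeff in a[1:]:    # i = 1, ..., n
        p = x * p + coeff  # P = x * P + a_(n-i)
    return p

assert horner([2, -3, 0, 5], 2) == 9   # 2x^3 - 3x^2 + 5 at x = 2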
Remark 3.2. As we have seen, algorithm designs based on induction are always associated with recursion, iteration, or both.
Exercise 3.3. Write a complete algorithm based on recursion utilizing Horner’s rule.
Exercise 3.4. Demonstrate how the following problems can be analyzed using induction.
Here, we are interested in finding a maximum size induced subgraph H of a given graph G such that
all the vertices of H have a degree at least k, where G and k > 0 are part of the given problem. First,
we provide the definition of an induced subgraph, and then we formally define the problem statement
regarding maximal induced subgraphs.
Definition 3.1 (Induced subgraph). Let G = (V, E) be a graph and U ⊆ V . Then, H = (U, F ) is a
subgraph of G induced by U , if F = {(u, v) ∈ E : u, v ∈ U }.
Example 3.5. Let us denote the graphs (a), (b), (c) and (d) in Figure 3 by G = (V, E), H1 = (V1 , F1 ),
H2 = (V2 , F2 ) and H3 = (V3 , F3 ) respectively. Then, H1 and H2 are subgraphs of G induced by V1 and
V2 , whereas the subgraph H3 is not an induced subgraph of G.
[Figure 3: the graphs (a)–(d) on the vertices v1, . . . , v5 (figure omitted).]
Figure 4: Illustration of the problem statement: The left hand side graph is the input along with k = 3
and the right hand side graph represents the solution.
We will be following an induction approach, where the given problem instance is reduced to a smaller
problem. In this case, the determination of ‘smaller’ or ‘bigger’ will be based on the number of vertices
in the underlying graph. The induction will be applied to the number of vertices. This essentially means
that we need to remove some vertices that do not contribute to any maximal induced subgraph. We have
to decide which vertices to eliminate to construct a smaller graph. Of course, any vertex with a degree less
than k can be removed. The induction hypothesis is stated below.
Induction hypothesis: We know how to find a maximum induced subgraph, all of whose vertices have degree ≥ k, when the number of vertices in the given graph is < n.
What is the base case here? Can any graph G = (V, E) with |V| ≤ k serve as a base case for the induction? The answer is no, because a graph with at most k vertices cannot have a vertex of degree k or more. Then, the question is: can the base case be n = |V| = k + 1? Yes, but then the graph needs to be the complete graph (why?).
The induction works as follows. Suppose G = (V, E) is an undirected graph with n > k + 1 vertices. If all the vertices have degree at least k, then we are done. Otherwise, there exists at least one vertex v with deg(v) < k. Certainly, v does not contribute to any solution. We then obtain the immediately smaller problem G′ = (V′, E′) by removing v, that is, V′ = V \ {v} and E′ = E \ {(u, w) ∈ E : u = v or w = v}. Since |V′| = n − 1, the problem can be solved by the induction hypothesis.
Question. What are the major operations in this induction approach?
Correctness: In this approach, we remove vertices whose degrees are less than k. The order of the removals does not matter. The final subgraph, after all these removals, remains a maximal induced subgraph, because each of these removals is necessary.
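The description above translates into a simple worklist procedure: repeatedly remove a vertex of degree less than k and update the degrees of its neighbours. A Python sketch under our own graph representation (a dict mapping each vertex to its set of neighbours):

from collections import deque

def max_induced_subgraph(adj, k):
    """Return the vertex set of the maximum induced subgraph with all degrees >= k."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    doomed = deque(v for v in adj if deg[v] < k)   # vertices that must be removed
    queued = set(doomed)
    removed = set()
    while doomed:
        v = doomed.popleft()
        removed.add(v)
        for u in adj[v]:
            if u in queued:                # already scheduled for removal
                continue
            deg[u] -= 1                    # removing v lowers u's degree
            if deg[u] < k:
                doomed.append(u)
                queued.add(u)
    return set(adj) - removed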
Exercise 3.6. Write a complete algorithm that captures the description above.
Let A be a finite set and f : A → A a mapping. For simplicity, let us take the set A to be [n] = {1, 2, . . . , n}. Thus, we can treat f as an array f[1, 2, . . . , n] with f(i) ∈ [n]; in this case, f(i) and f[i] are essentially the same. We are interested in finding a maximum size subset S of A such that when the function f : A → A is restricted to S, denoted by fS, it becomes a bijective map fS : S → S. We want to point out that when we restrict a map f : A → A to a subset S ⊂ A, the range of the restricted map fS need not be a subset of S (i.e., fS(S) ⊄ S in general). The problem statement is formally defined below.
Finding a Maximum Size Restricted Bijection
Input: A finite set A and a mapping f : A → A
Problem: Find a maximum size subset S ⊆ A such that (1) f maps every element of S to
some element of S (i.e., f maps S into itself), and (2) no two elements of S are
mapped to the same element (i.e., f is 1-1 when restricted to S).
Output: A maximum size subset S of A
Exercise 3.7. Show that the above problem always has a solution, that is, there is a subset S ⊆ A such that fS : S → S is bijective.
Before approaching this problem, let us make some observations that could help in solving it.
1. If the given function f : A → A is one-one, then the entire set A satisfies the conditions of the
problem, and thus, S = A will be the solution.
2. On the other hand, if f(i) = f(j) for some i ≠ j, then S cannot contain both i and j, as together they violate the bijectivity of the restricted map fS : S → S. For example, the set S that solves the problem given in Figure 5 cannot contain both 2 and 3, since f(2) = f(3) = 1.
3. As with the previous problems solved using induction, we have to pick the right candidates to eliminate before employing the induction hypothesis.
4. Essentially, we start with S = A and keep eliminating elements from S until the restricted map fS : S → S becomes bijective.
5. For example, if we want to eliminate 3, then we have to eliminate 1 as well, because 1 maps to 3.
6. The problem is to find some systematic way of deciding which elements to eliminate.
[Figure 5: a mapping f : [7] → [7], drawn as arrows between two columns listing the elements 1–7 (figure omitted).]
When the mapping f : [7] → [7] defined in Figure 5 is restricted to S = {1, 3, 5}, the restricted mapping becomes a bijective mapping fS : S → S. Moreover, S = {1, 3, 5} is the maximum size subset of [7] such that fS : S → S is bijective.
Eventually, as part of the induction approach, we can reduce the given problem to a smaller one by eliminating elements that cannot belong to S, followed by applying the induction hypothesis given below.
Induction hypothesis: We know how to solve the problem for sets of n − 1 elements.
Base case: Solve for a set of one element, which is trivial as the element maps to itself.
Now assume that A has n elements, that is, A = [n] = {1, 2, . . . , n}, and we are looking for a subset S of A that satisfies the conditions given in the problem statement. Now everything boils down to eliminating elements from A to obtain a reduced-size subset (hence, a reduced problem). An element can be eliminated if it cannot belong to S; in particular, any element j with no preimage (that is, f(i) ≠ j for every i ∈ A) can be eliminated, since fS : S → S must be onto, so every element of S needs a preimage inside S. Such elements are collected in a queue Q: whenever an element j is dequeued and eliminated, the preimage counter of f(j) is decremented, and f(j) is enqueued if its counter drops to zero.
Correctness: Exercise!!
Complexity: First, note that throughout the procedure, each element is enqueued into Q at most once. Also, note that the cost of each ENQUEUE and DEQUEUE operation is constant. Since initialization requires O(n) operations, the total time complexity is O(n). Is this an in-place algorithm? What is its space complexity?
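A Python sketch of the elimination procedure just analyzed; the representation is ours (f is a list with f[i − 1] = f(i)):

from collections import deque

def max_bijective_subset(f):
    """Return the maximum-size subset S of [n] such that f restricted to S is a bijection on S."""
    n = len(f)
    preimages = [0] * (n + 1)             # preimages[j] = |{i : f(i) = j}|
    for i in range(1, n + 1):
        preimages[f[i - 1]] += 1
    q = deque(j for j in range(1, n + 1) if preimages[j] == 0)
    removed = [False] * (n + 1)
    while q:
        j = q.popleft()                   # j has no preimage left: eliminate it
        removed[j] = True
        t = f[j - 1]
        preimages[t] -= 1                 # j no longer counts as a preimage of f(j)
        if preimages[t] == 0 and not removed[t]:
            q.append(t)
    return {i for i in range(1, n + 1) if not removed[i]}

# A mapping consistent with the facts stated above for Figure 5
# (the values of f(4), f(6), f(7) are made up for illustration):
f = [3, 1, 1, 2, 5, 5, 6]   # f(1)=3, f(2)=1, f(3)=1, f(4)=2, f(5)=5, f(6)=5, f(7)=6
assert max_bijective_subset(f) == {1, 3, 5}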
Exercise 3.9. Write a complete recursive algorithm for the above problem.
Exercise 3.10. Do we always need to reach out to the base case?
Exercise 3.11. Does the original problem always require integrating solutions from the sub-problems? Or does a solution to a sub-problem work as a solution of the original problem?
4 Some Variants of Binary Search
Definition 4.1 (Partially ordered set). Let S be a set and ≤ a binary relation on S. Then, (S, ≤) is said to be a partially ordered set (a.k.a. poset) if the following three properties are satisfied.
1. Reflexivity: a ≤ a for all a ∈ S.
2. Antisymmetry: if a ≤ b and b ≤ a, then a = b.
3. Transitivity: if a ≤ b and b ≤ c, then a ≤ c.
Here are some examples of posets.
1. (R, ≤), (Z, ≤) and (Q, ≤) are partially ordered sets, where ≤ denotes usual ordering.
2. Let S be any set and P(S) denote its power set. Then (P(S), ⊆) is a poset.
3. For two elements a, b ∈ N, the notation ‘a|b’ represents ‘a divides b’. Then, (N, |) is a poset.
4. Let Σ be a finite set of symbols and L be the collection of all strings over Σ. Let ≤ denote the
lexicographical order. Then, (L, ≤) is a poset.
Definition 4.2 (Totally ordered set). A poset (S, ≤) is said to be totally ordered if any two elements are comparable, that is, for all a, b ∈ S, either a ≤ b or b ≤ a. For example:
1. (R, ≤), (Z, ≤) and (Q, ≤) are totally ordered sets, where ≤ denotes the usual ordering.
3. (P(S), ⊆), (N, |) and (R, ≤5) are not totally ordered sets.
Let us consider the following question: What is the difference between a sequence and a set? In a
sequence, the order of the elements matters, and repetition is allowed. In contrast, in a set, the order does
not matter, and repetition is not allowed.
When we feed inputs to algorithms like binary search, sorting, and selection, the input typically consists
of many elements (but finite), and they appear in a specific order. Therefore, we can regard the input as
a finite sequence. We assume that the representation of the input is an array, and the size of the array is
known. Furthermore, we assume that the elements of the array are taken from a totally ordered set, so
that they can be compared.
Pure Binary Search. The essential idea of binary search is reducing the search space by (nearly) half by asking only one question. Let us recall the problem statement of pure binary search.
Pure Binary Search
Input: A sorted sequence x₁ ≤ x₂ ≤ · · · ≤ xₙ and a search key x.
Output: An index i such that xᵢ = x if x appears in the sequence, else NO.
For simplicity, we look for only one index i such that xᵢ = x. In general, we may be interested in finding all such indices, the smallest one, the largest one, and so on. Here is the basic idea of binary search.
1. Compare the search key x with the middle element of the sequence.
2. If they are equal, we are done.
3. Otherwise, since the sequence is sorted, one half of it can be discarded, and the search space reduces to (nearly) half its size.
4. Then, solve it by induction: for the base case (n = 1), compare the single element directly with the key x.
Exercise 4.3. Does binary search work if a sorted sequence of objects (not necessarily numbers), denoted by x₁, . . . , xₙ, and a search key x are given, where all the xᵢ's and x come from a totally ordered set? In particular, suppose we have a collection L of words over the English alphabet in lexicographical order and a search word, such as 'hello'. Can we apply the pure binary search algorithm?
Variant-I. We now consider our first variant of binary search, where no search key is given. We have to find an index i ∈ [n] such that xᵢ = i. More formally, it is defined as follows.
Input: A sorted sequence x₁ < x₂ < · · · < xₙ of distinct integers.
Output: An index i such that xᵢ = i if such an index exists, else NO.
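Binary search applies here because, for a sorted sequence of distinct integers, the quantity xᵢ − i is nondecreasing in i. A minimal Python sketch under that assumption (names are ours):

def fixed_index(x):
    """x: sorted list of distinct integers. Return 1-based i with x[i-1] == i, else 'NO'."""
    lo, hi = 1, len(x)
    while lo <= hi:
        mid = (lo + hi) // 2
        if x[mid - 1] == mid:
            return mid
        if x[mid - 1] < mid:     # then x[j] < j for all j <= mid: search to the right
            lo = mid + 1
        else:                    # then x[j] > j for all j >= mid: search to the left
            hi = mid - 1
    return "NO"

assert fixed_index([-3, 0, 2, 4, 10]) == 4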
Exercise 4.4. Write a complete algorithm along with its time complexity and correctness.
Variant-II. We can find sequences of real numbers that are not sorted but can be segmented into two parts such that each part is sorted. Furthermore, in some cases, if these parts are swapped, the new sequence becomes sorted. Such sequences are called cyclically sorted sequences. More formally, this is defined below.
Definition 4.3 (Cyclically sorted sequence). A sequence x₁, x₂, . . . , xₙ is said to be cyclically sorted if the smallest number in the sequence is xᵢ, for some unknown i, and the sequence xᵢ, xᵢ₊₁, . . . , xₙ, x₁, . . . , xᵢ₋₁ is sorted in increasing order.
index: 1  2  3  4  5  6  7  8  9
x:     10 11 12  1  2  3  4  5  6
Here, at index i = 4 the sequence attains its smallest value, and the rearranged sequence x₄ = 1, x₅ = 2, x₆ = 3, x₇ = 4, x₈ = 5, x₉ = 6, x₁ = 10, x₂ = 11, x₃ = 12 is sorted in increasing order:
1 ≤ 2 ≤ 3 ≤ 4 ≤ 5 ≤ 6 ≤ 10 ≤ 11 ≤ 12, i.e., x₄ ≤ x₅ ≤ x₆ ≤ x₇ ≤ x₈ ≤ x₉ ≤ x₁ ≤ x₂ ≤ x₃.
The idea is the following. For simplicity, assume that all the elements are distinct, so that there is a unique smallest element. Let min denote the smallest element and let i be its index, that is, min = xᵢ.
Exercise 4.6. Write a complete algorithm for the above problem along with time complexity and correct-
ness.
Note that the principle of binary search appears even in problems that do not seem to involve any search. In this section, we show how to solve the stuttering-subsequence problem (a non-search problem) using pure binary search. Before formally defining the stuttering-subsequence problem, we first address how to decide whether a sequence B is a subsequence of another sequence A.
2. Let A = “Design and Analysis of Algorithms” and B = “DAA”. Then, B is a subsequence of A.
Deciding Subsequence
Input: Two sequences A = a1 a2 · · · an and B = b1 b2 · · · bm with m ≤ n.
Question: Is B a subsequence of A?
Output: A decision (YES or NO).
3. If at any step, bᵢ (for some i ∈ [m]) is not found, then return NO.
Time complexity. The algorithm requires one linear scan of A and B, so the time complexity of the algorithm is O(m + n).
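A minimal Python sketch of this greedy, single-pass check (the name is ours):

def is_subsequence(B, A):
    """Decide whether sequence B is a subsequence of sequence A in O(len(A)) time."""
    j = 0
    for a in A:
        if j < len(B) and a == B[j]:
            j += 1                 # b_j found; now look for b_(j+1) in the rest of A
    return j == len(B)

assert is_subsequence("DAA", "Design and Analysis of Algorithms")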
Exercise 4.8. Write a complete algorithm to determine whether a sequence B is a subsequence of another
sequence A.
Stuttering-Subsequence Problem
Input: Two sequences A and B over some alphabet.
Question: Can we find a maximal integer i such that Bⁱ is a subsequence of A?
Output: An integer i.
The outline of the algorithm is given below. Let n and m denote the lengths of A and B respectively.
• Observation: If Bⁱ is a subsequence of A, then so is Bʲ for every j ≤ i. Moreover, Bⁱ consists of m · i symbols, so Bⁱ can be a subsequence of A only if m · i ≤ n.
• Idea: Use binary search on {1, 2, . . . , t}, where t = ⌈n/m⌉.
1. Set i = ⌈t/2⌉.
2. Check whether Bⁱ is a subsequence of A. (Required time: O(m · i + n).)
3. If yes, then eliminate the lower range, i.e., the search space reduces to {i + 1, . . . , t} (recording i as the best candidate so far).
4. Otherwise, eliminate the upper range, i.e., the search space reduces to {1, . . . , i − 1}.
• Time complexity: The binary search performs O(log t) subsequence checks, each of cost O(m · i + n) = O(n) since m · i ≤ n, so the total time is O(n log(n/m)).
• Space complexity: Is this an in-place algorithm? What is the space complexity of this algorithm?
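Putting the pieces together, here is a compact Python sketch of the outlined algorithm; the names are ours, and the inner helper uses an iterator idiom for the subsequence test:

def max_stuttering_power(A, B):
    """Largest i >= 0 such that B^i (each symbol of B repeated i times) is a subsequence of A.
    Assumes B is nonempty."""
    def is_subseq(b, a):
        it = iter(a)
        return all(ch in it for ch in b)   # each 'in' consumes 'it' up to the next match
    lo, hi, best = 1, -(-len(A) // len(B)), 0   # hi = t = ceil(n/m)
    while lo <= hi:
        i = (lo + hi) // 2
        stuttered = "".join(ch * i for ch in B)  # B^i
        if is_subseq(stuttered, A):
            best, lo = i, i + 1                  # B^i works; try a larger power
        else:
            hi = i - 1                           # B^i fails; so does every larger power
    return best

assert max_stuttering_power("xxyyzz", "xyz") == 2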
Exercise 4.9. Write a complete algorithm for the stuttering-subsequence problem along with correctness.