Lec10 dp2
In the previous lecture, we reviewed Dynamic Programming and saw how it can be applied to problems
about sequences and trees. Today, we will extend our understanding of DP by applying it to other classes
of problems, like graphs, and explore how to speed up dynamic programming implementations with clever
tricks like data structures and matrices!
Objectives of this lecture
The path variant of the problem is the same, but does not require returning to the start. Both problems are NP-hard.
Step 2: Define our subproblems Based on the above, let's make our subproblems
C(S, t) = the minimum cost of a path starting at x and ending at t that goes through all vertices in S
What is the solution to our original problem? Is it one of the subproblems? Actually, the answer is no this
time. But we can figure it out by combining a handful of the subproblems. Specifically, we want to form a
tour (a cycle) using a path that starts at x. So we can just try every other vertex t in the graph, make
a path from x to t, and then return to x to complete the cycle. So the answer, once we solve the DP, will be

answer = min_{t ∈ V − {x}} ( C(V, t) + w(t, x) )
Step 3: Deriving a recurrence Using the substructure we described above, the idea that will power the
recurrence is that to get a path that goes from x to t, we want a path that goes from x to some t′ plus an edge
from t′ to t. Which t′ will be the best? We can't know for sure, so we should just try all of them and take
the best one. We also need a base case. We can't use an empty path as our base case since we assume a
starting vertex x and an ending vertex t for every subproblem, so let's use a path of two vertices as our base
case. This gives us the recurrence

C(S, t) = w(x, t)                                              if S = {x, t},
C(S, t) = min_{t′ ∈ S − {x, t}} ( C(S − {t}, t′) + w(t′, t) )  otherwise.
Step 4: Analysis The parameter space for C(S, t) is 2^{n−1} (the number of subsets S considered) times
n (the number of choices for t). For each recursive call we do O(n) work inside the call to try all previous
vertices t′, for a total of O(n^2 2^n) time. This is assuming we can look up a set (e.g., S − {t}) in constant time.
Since this algorithm is too slow for large values of n, we usually assume that n is small. In that case, a
highly efficient way to store the set S is as a single integer, where you use the individual bits of the
integer to indicate which vertices are in or not in the set. This makes it easy to store and look up the
subproblem solutions, because an integer is a much better key for an array or hashtable than a set!
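As a sketch of how the bitmask encoding might look in practice (function and variable names here are my own, and I use a single-vertex base case, which is equivalent to the two-vertex base case above):

```python
def held_karp(w, x=0):
    """Minimum-cost tour (TSP) via subset DP.

    w[i][j] is the cost of edge (i, j); x is the start vertex.
    Each subset S is encoded as an integer bitmask: bit i is set
    iff vertex i is in S.
    """
    n = len(w)
    INF = float("inf")
    full = (1 << n) - 1
    # C[S][t] = min cost of a path from x to t visiting exactly the set S
    C = [[INF] * n for _ in range(1 << n)]
    C[1 << x][x] = 0  # base case: the path consisting of x alone
    for S in range(1 << n):
        if not (S >> x) & 1:
            continue  # every path must start at x
        for t in range(n):
            if C[S][t] == INF:
                continue
            for t2 in range(n):  # try extending the path by one edge
                if (S >> t2) & 1:
                    continue  # t2 is already visited
                S2 = S | (1 << t2)
                C[S2][t2] = min(C[S2][t2], C[S][t] + w[t][t2])
    # close the cycle: best path covering all vertices, plus the edge back to x
    return min(C[full][t] + w[t][x] for t in range(n) if t != x)
```

Because S2 is always a strict superset of S, iterating over masks in increasing order evaluates every subproblem before it is needed.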
Exercise
Given the results of C(S, t) for a TSP problem, explain how to find the actual sequence of vertices
that make up the tour.
This technique is sometimes called “Subset DP”. These ideas apply in many cases to reduce a factorial
running time to a regular exponential running time.
D[u][v][k] = the length of the shortest path from u to v using the intermediate vertices {1, 2, . . . , k}
Step 3: Deriving a recurrence We need to consider two cases. For the pair u, v, either the shortest
path using the intermediate vertices {1, 2, . . . , k} goes through k or it does not. If it does not, then the
answer is the same as it was before k became an option. If k now gets used, we can break the path at k and
use the optimal substructure to glue together the two shortest paths divided at k to get a new shortest path.
Writing the recurrence using this idea looks like this:

D[u][v][k] = min( D[u][v][k−1], D[u][k][k−1] + D[k][v][k−1] ),

with base case D[u][v][0] = 0 if u = v, w(u, v) if there is an edge from u to v, and ∞ otherwise.
Step 4: Analysis We have O(n^3) subproblems and each of them takes O(1) time to evaluate, so this
takes O(n^3) time.
2.1 Optimizing the space: eliminating redundancies
One downside of the algorithm is that it uses a lot of space, O(n^3), which is a factor of n larger than the input
graph. This is bad if the graph is large. Can we reduce this? There is one straightforward way to reduce
it, and then a more subtle way to reduce it even further. First, notice that the subproblems for parameter
k only depend on the subproblems with parameter k − 1. So we don't actually need to store all O(n^3)
subproblems; we can just keep the last set of subproblems and compute bottom-up in increasing order of k.
Here’s an even simpler but more subtle way to optimize the algorithm. I claim that we can just write:

D[u][v] = min( D[u][v], D[u][k] + D[k][v] )

So what happened here? It looks like we just forgot the k parameter of the DP, right? It turns out that this
algorithm is still correct, but now it only uses O(n^2) space because it just keeps a single 2D array of distance
estimates. Why does this work? Well, compared to the by-the-book implementation, all this does is allow
the possibility that D[u][k] or D[k][v] accounts for vertex k already, but a shortest path won’t use vertex k
twice, so this doesn’t affect the answer!
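A minimal sketch of this O(n^2)-space version (the function name and input convention are my own):

```python
def floyd_warshall(w):
    """All-pairs shortest paths using a single 2D distance array.

    w is an n x n matrix of edge weights, with float('inf') for
    missing edges and 0 on the diagonal. Overwriting D in place is
    safe because a shortest path never uses vertex k twice.
    """
    n = len(w)
    D = [row[:] for row in w]  # start from the direct-edge distances
    for k in range(n):         # allow vertex k as an intermediate
        for u in range(n):
            for v in range(n):
                if D[u][k] + D[k][v] < D[u][v]:
                    D[u][v] = D[u][k] + D[k][v]
    return D
```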
Sometimes not all of our subproblems are necessary to solve the problem, so if we can eliminate
many of them, we will either speed up the algorithm or reduce the amount of space it requires.
A longest increasing subsequence is an increasing subsequence such that no other increasing subse-
quence is longer.
Step 1: Find some optimal substructure Given a sequence a_1, . . . , a_n and its LIS a_{i_1}, . . . , a_{i_{k−1}}, a_{i_k},
what can we say about a_{i_1}, . . . , a_{i_{k−1}}? Since a_{i_1}, . . . , a_{i_k} is an LIS, it must be the case that a_{i_1}, . . . , a_{i_{k−1}} is
an LIS of a_1, . . . , a_{i_k} such that a_{i_{k−1}} < a_{i_k}. Alternatively, it is also an LIS that ends at (and contains) a_{i_{k−1}}.
This suggests a set of subproblems.
Step 2: Define our subproblems Let's define our subproblems to be
LIS[i] = the length of a longest increasing subsequence of a_1, . . . , a_i that contains a_i
Note that the answer to the original problem is not necessarily LIS[n] since the answer might not contain
a_n, so the actual answer is

answer = max_{1 ≤ i ≤ n} LIS[i]
Step 3: Deriving a recurrence Since LIS[i] ends a subsequence with element a_i, the previous element
can be any a_j with j < i such that a_j < a_i, so we can try all possibilities and take the best one:

LIS[i] = 0                                          if i = 0,
LIS[i] = 1 + max_{0 ≤ j < i, a_j < a_i} LIS[j]      otherwise

(with LIS[0] = 0 acting as a sentinel, so that a subsequence can also begin at a_i).
Step 4: Analysis We have O(n) subproblems and each one takes O(n) time to evaluate, so we can
evaluate this DP in O(n^2) time. Is this a good solution, or can we do better?
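The quadratic DP above might be implemented as follows (a sketch; here I fold the sentinel into initializing every entry to 1):

```python
def lis_quadratic(a):
    """Length of the longest increasing subsequence, O(n^2) DP.

    L[i] = length of the longest increasing subsequence of a[:i+1]
    that ends with (and contains) a[i].
    """
    n = len(a)
    L = [1] * n  # every element alone is an increasing subsequence
    for i in range(n):
        for j in range(i):
            if a[j] < a[i]:  # a[j] can precede a[i]
                L[i] = max(L[i], L[j] + 1)
    return max(L, default=0)  # the LIS can end at any index
```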
3.1 Optimizing the runtime: better data structures
The by-the-book implementation of the recurrence for LIS gives an O(n^2) algorithm, but sometimes we can
speed up DP algorithms by solving the recurrence more cleverly. Specifically, in this case the recurrence is
computing a maximum over a range, which sounds like something we know how to do faster than O(n)...
How about we try to apply a range query data structure (a SegTree) to this problem! Initially, it's not clear
why this would work: although we are doing a range query over 1 ≤ j < i, we have to account for
the constraint that a_j < a_i, so we cannot simply do a range query over the values of LIS[1 . . . (i − 1)] or this
might include larger elements. Here's an idea: let's store the values of LIS in a different order! Rather than
storing LIS[1], LIS[2], . . . , as usual, we can store them in order of the values of a_i, so that we can actually
do a range query over all values less than a_i. To do this, we first sort the values a_i, which takes O(n log n) with any
fast sorting algorithm, then use the rank of a_i as the position of LIS[i]. In pseudocode, the optimized version
might therefore look something like:
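The pseudocode is missing from this copy of the notes; here is a sketch of what it might look like in Python (the tree layout and helper names are my own):

```python
import bisect

def lis_nlogn(a):
    """O(n log n) LIS: a max segment tree indexed by value rank.

    Position r of the tree stores the best LIS length ending with the
    r-th smallest value in a; a prefix-max query over ranks < r finds
    the best subsequence we can extend with the current element.
    """
    n = len(a)
    if n == 0:
        return 0
    svals = sorted(set(a))        # sort once to assign ranks
    size = 1
    while size < len(svals):
        size *= 2
    tree = [0] * (2 * size)       # iterative max segment tree

    def update(pos, val):         # raise leaf pos to at least val
        pos += size
        while pos >= 1:
            tree[pos] = max(tree[pos], val)
            pos //= 2

    def query(lo, hi):            # max over leaves in [lo, hi)
        best = 0
        lo += size
        hi += size
        while lo < hi:
            if lo & 1:
                best = max(best, tree[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                best = max(best, tree[hi])
            lo //= 2
            hi //= 2
        return best

    best = 0
    for x in a:                              # process in original order
        r = bisect.bisect_left(svals, x)     # rank of x: one binary search
        length = query(0, r) + 1             # strictly smaller values only
        update(r, length)
        best = max(best, length)
    return best
```

Note that querying over ranks strictly below r also handles duplicates correctly, since equal elements do not extend an increasing subsequence.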
This optimized algorithm performs one binary search and two SegTree operations per iteration, and it has
to sort the input, so in total it takes O(n log n) time.
Key Idea: Speeding up DP with data structures
If your DP recurrence involves computing a minimum, a maximum, or a sum, or searching for something specific,
you can sometimes speed it up by storing the results in a data structure other than a plain array.
Exercise
Can you find a greedy algorithm that matches the O(n log n) performance of the LIS algorithm above?
4 Counting Paths in a Graph
Optional content — Will not appear on the homeworks or the exams
For this next problem, let's consider a directed unweighted graph on n vertices. Our goal is to compute, for
every pair of vertices u, v, the number of directed walks of length k that start at u and end at v. A walk is
like a path from u to v, except that it might repeat vertices or edges.
Step 1: Find some optimal substructure A walk of length k from u to v is just a walk of length
k − 1 from u to some in-neighbor of v, plus one additional edge.
Step 2: Define our subproblems Let's define our subproblems to be

W[u][v][k] = the number of directed walks of length k that start at u and end at v
Step 3: Deriving a recurrence To find the number of walks of length k between u and v, we have to
look at walks of length k − 1 and then add another edge. This means we are interested in vertices that are
one edge away from u or v. To avoid overcounting, we should just pick one of them, so let's consider vertices
adjacent to v. Our goal is then to sum over all such vertices:

W[u][v][k] = 1_{u=v}                              if k = 0,
W[u][v][k] = Σ_{x : v ∈ adj[x]} W[u][x][k − 1]    otherwise

Here, I have used the notation 1_{u=v} as an indicator function, i.e. it is equal to 1 if u = v, otherwise it is
equal to zero, and adj[x] to mean the set of vertices adjacent to x (so the sum ranges over all x with an edge to v).
Step 4: Analysis Computing this recurrence as written, we have O(n^2 k) subproblems and each takes
O(n) time to evaluate, so the total runtime is O(n^3 k). Can we do better?
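The recurrence above, evaluated directly, might look like this (a sketch with my own names; it keeps only the counts for the current length, one layer at a time):

```python
def count_walks(adj, n, k):
    """Number of length-k walks between every pair of vertices.

    adj[x] is the list of vertices v with a directed edge x -> v.
    W[u][v] holds the counts for the current walk length; we rebuild
    it k times, once per additional edge, for O(n^3 k) total time.
    """
    # base case: the only length-0 walk is staying put (1 if u == v)
    W = [[1 if u == v else 0 for v in range(n)] for u in range(n)]
    for _ in range(k):
        new = [[0] * n for _ in range(n)]
        for u in range(n):
            for x in range(n):
                if W[u][x]:
                    for v in adj[x]:  # extend each walk by the edge x -> v
                        new[u][v] += W[u][x]
        W = new
    return W
```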
4.1 Optimizing the runtime: matrices
Although it might not look it from the way it is written, the recurrence above actually has a useful form.
Let's rewrite it a bit to make it clearer. I'm going to move the k's to subscripts, and rewrite the sum using
the adjacency matrix (whose entries are edge indicators). The reason will hopefully become clear soon.

W_k[u][v] = Σ_{x ∈ V} W_{k−1}[u][x] · A[x][v],     W_0 = I
Here, A is the adjacency matrix of the graph (it contains 1 at [u][v] if there is an edge from u to v, otherwise
it contains zero), and I is the identity matrix (n × n in this case). Does this formula look familiar? It's
actually matrix multiplication!! Specifically, it says that

W_k = W_{k−1} A,

where A is the adjacency matrix of the graph, and W_0 = I. Which means, by extension, that

W_k = A^k.
Footnote: let's just assume that we can still do arithmetic in constant time. One way to achieve this is to compute everything mod p.
Key Idea: Speeding up DP with matrices
If your dynamic programming recurrence is a low-order linear recurrence relation, you can sometimes
speed it up by writing it as a matrix equation and using matrix exponentiation!
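The matrix-exponentiation approach might be sketched as follows (helper names are my own; following the footnote, counts are taken mod a prime so arithmetic stays constant-time):

```python
def mat_mult(A, B, mod):
    """Multiply two n x n matrices, entrywise mod `mod`."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            if A[i][k]:
                for j in range(n):
                    C[i][j] = (C[i][j] + A[i][k] * B[k][j]) % mod
    return C

def count_walks_fast(A, k, mod=10**9 + 7):
    """W_k = A^k by repeated squaring: O(n^3 log k) arithmetic.

    A is the adjacency matrix; the result counts length-k walks
    between every pair of vertices (mod a prime).
    """
    n = len(A)
    # W_0 = I: start from the identity matrix
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    while k:
        if k & 1:                          # fold in this power of two
            result = mat_mult(result, A, mod)
        A = mat_mult(A, A, mod)            # square: A, A^2, A^4, ...
        k >>= 1
    return result
```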
Option 2: diagonalization Another way to do better, if we remember our linear algebra class, is by
diagonalizing the matrix into the form A = P D P^{−1}, where D is a diagonal matrix. Then our equation
becomes

W_k = P D^k P^{−1},

which means we only have to compute powers of the diagonal elements of the matrix, rather than powers
of an entire n × n matrix, which is much cheaper! Since it takes just O(log k) time to compute each power,
and there are n elements on the diagonal of D, this takes O(n log k) time plus the time required to actually
compute the diagonalization, which is O(n^3), for a runtime of