DP
Dynamic programming
Noam Zeilberger
3 March 2025
1 / 44
What is dynamic programming?
The name of a strategy for efficiently solving certain kinds of problems which have
both recursive structure and overlapping subproblems.
Originally introduced in the context of mathematical control theory, and often applied
to solving optimization problems, but has wider applications.
2 / 44
Example #1: minimizing transactions
3 / 44
Problem statement
Given a list C of coin denominations and an amount of money n, find the minimum
number of coins needed to pay n. (We’ll assume 1 ∈ C to avoid impossible payments.)
For example, you can pay 0.79€ using only five coins:
79 = 50 + 20 + 5 + 2 + 2
4 / 44
Greedy algorithm?
Idea: just take the largest c ∈ C such that c ≤ n, and repeat with n − c.
Well, this approach works for eurocents (C = {1, 2, 5, 10, 20, 50, 100, 200}), but will
not always yield an optimal payment for arbitrary C.
For example, taking¹ C = {1, 2, 5, 10, 20, 50, 88, 100, 200}, an optimal payment of
n = 264 requires only three coins (88 + 88 + 88), but the greedy payment
(200 + 50 + 10 + 2 + 2) uses five.
¹ See https://en.wikipedia.org/wiki/Fijian_dollar.
5 / 44
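The greedy strategy can be sketched as follows (the slide's own code is not in the transcript, and the name greedy_pay is my choice):

```python
def greedy_pay(C, n):
    """Pay n by repeatedly taking the largest coin that fits (assumes 1 in C)."""
    coins = []
    while n > 0:
        c = max(c for c in C if c <= n)  # largest coin not exceeding the remainder
        coins.append(c)
        n -= c
    return coins
```

On euro denominations this is optimal (e.g. 79 gives [50, 20, 5, 2, 2]), but with an 88-cent coin in C it pays 264 with five coins instead of three.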
Brute force solution
1. Enumerate all ways of paying n with (any number of copies of) coins from C .
2. Return the minimum length of a payment.
6 / 44
Brute force solution²
² Good for making payments up to about 0.30€
7 / 44
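The code on this slide was not captured; one way to realize the two steps (the names payments and mincoins_brute are hypothetical) is:

```python
def payments(C, n, max_coin=None):
    """Yield every way of paying n as a non-increasing list of coins from C."""
    if max_coin is None:
        max_coin = max(C)
    if n == 0:
        yield []          # the empty payment pays 0
        return
    for c in C:
        if c <= min(n, max_coin):      # non-increasing order avoids permutations
            for rest in payments(C, n - c, c):
                yield [c] + rest

def mincoins_brute(C, n):
    # step 1: enumerate all payments; step 2: return the minimum length
    return min(len(p) for p in payments(C, n))
```

The number of payments grows very quickly with n, which is why this is only practical for small amounts.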
Step one of dynamic programming: finding a recurrence
Let MC(n) be the minimum number of coins needed to pay n units, drawing coins
from C. Observe that MC(n) obeys the following recurrence:

    MC(0) = 0
    MC(n) = 1 + min { MC(n − c) | c ∈ C, c ≤ n }    for n > 0

since it takes zero coins to pay nothing, and the best way to pay n > 0 units is to
choose a coin c ∈ C making MC(n − c) minimal, and use c together with an optimal
payment of n − c.
8 / 44
The recurrence translated to a recursive function
9 / 44
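The code for this slide is missing from the transcript; a direct translation of the recurrence, consistent with the name mincoins_rec used in the traces that follow, might look like:

```python
def mincoins_rec(C, n):
    # M_C(0) = 0; M_C(n) = 1 + min over coins c <= n of M_C(n - c)
    if n == 0:
        return 0
    return 1 + min(mincoins_rec(C, n - c) for c in C if c <= n)
```

As the traces show, this recomputes the same subproblems many times, so it is only usable for small n.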
>>> mincoins_rec_traced([1,2,4], 5, 5)
...computing MC(5) needed for MC(5)
...computing MC(4) needed for MC(5)
...computing MC(3) needed for MC(4)
...computing MC(2) needed for MC(3)
...computing MC(1) needed for MC(2)
...computing MC(0) needed for MC(1)
...computing MC(0) needed for MC(2)
...computing MC(1) needed for MC(3)
...computing MC(0) needed for MC(1)
...computing MC(2) needed for MC(4)
...computing MC(1) needed for MC(2)
...computing MC(0) needed for MC(1)
...computing MC(0) needed for MC(2)
...computing MC(0) needed for MC(4)
...computing MC(3) needed for MC(5)
...computing MC(2) needed for MC(3)
...computing MC(1) needed for MC(2)
...computing MC(0) needed for MC(1)
...computing MC(0) needed for MC(2)
...computing MC(1) needed for MC(3)
...computing MC(0) needed for MC(1)
...computing MC(1) needed for MC(5)
...computing MC(0) needed for MC(1)
2
10 / 44
>>> mincoins_rec_traced([1,2,4], 6, 6)
...computing MC(6) needed for MC(6)
...computing MC(5) needed for MC(6)
...computing MC(4) needed for MC(5)
...computing MC(3) needed for MC(4)
...computing MC(2) needed for MC(3)
...computing MC(1) needed for MC(2)
...computing MC(0) needed for MC(1)
...computing MC(0) needed for MC(2)
...computing MC(1) needed for MC(3)
...computing MC(0) needed for MC(1)
...computing MC(2) needed for MC(4)
...computing MC(1) needed for MC(2)
...computing MC(0) needed for MC(1)
...computing MC(0) needed for MC(2)
...computing MC(0) needed for MC(4)
...computing MC(3) needed for MC(5)
...computing MC(2) needed for MC(3)
...computing MC(1) needed for MC(2)
...computing MC(0) needed for MC(1)
...computing MC(0) needed for MC(2)
...computing MC(1) needed for MC(3)
...computing MC(0) needed for MC(1)
...computing MC(1) needed for MC(5)
...computing MC(0) needed for MC(1)
...computing MC(4) needed for MC(6)
...computing MC(3) needed for MC(4)
...computing MC(2) needed for MC(3)
...computing MC(1) needed for MC(2)
...computing MC(0) needed for MC(1)
...computing MC(0) needed for MC(2)
...computing MC(1) needed for MC(3)
...computing MC(0) needed for MC(1)
...computing MC(2) needed for MC(4)
...computing MC(1) needed for MC(2)
...computing MC(0) needed for MC(1)
...computing MC(0) needed for MC(2)
...computing MC(0) needed for MC(4)
...computing MC(2) needed for MC(6)
...computing MC(1) needed for MC(2)
...computing MC(0) needed for MC(1)
...computing MC(0) needed for MC(2)
2
11 / 44
Memoization
Of course, we know that the value of mincoins_rec(C, n) never changes for a given
C and n, but Python does not have a general mechanism for noticing that. Still,
we can program such memoization by hand.
Recipe:
1. Keep a “memo table” with all values of the function you’ve already computed.
2. Each time the function is called, check whether the output for the given input
value is already in the memo table, and if so return it.
3. Otherwise, compute the output, store it in the memo table, and return the output.
12 / 44
Mincoins with recursion and memoization, aka top-down dynamic programming
“Top-down” because computation starts up at the root of the recursion tree and works
down towards the leaves, while using the memo table to avoid recomputing subtrees.
(For some reason computer scientists like drawing trees upside-down.)
13 / 44
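Applying the recipe to mincoins_rec might look like this (a sketch; the name mincoins_td and the dictionary memo are my choices, not necessarily the slide's):

```python
def mincoins_td(C, n, memo=None):
    # top-down DP: the same recursion, plus a memo table keyed on n
    if memo is None:
        memo = {}
    if n in memo:                  # step 2: reuse a stored value
        return memo[n]
    if n == 0:
        result = 0
    else:
        result = 1 + min(mincoins_td(C, n - c, memo) for c in C if c <= n)
    memo[n] = result               # step 3: store before returning
    return result
```

Each amount between 0 and n is now computed at most once, so the running time is roughly proportional to n times the number of coins.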
Bottom-up dynamic programming
Alternatively, we can start with the smallest subproblems and progressively consider
larger and larger cases until we reach the goal.
Concretely, this is often carried out using iteration while progressively filling in a table.
14 / 44
Mincoins using bottom-up dynamic programming
15 / 44
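The slide's code is not in the transcript; a sketch that matches the variable names in the animation on the next slide (the table t, and the comprehension over coins c <= i) is:

```python
def mincoins_bu(C, n):
    t = [0]                        # t[i] will hold MC(i); MC(0) = 0
    for i in range(1, n + 1):
        # one plus the best value reachable by removing a single coin
        t.append(1 + min(t[i - c] for c in C if c <= i))
    return t[n]
```

With C = {1, 5, 8} this reproduces the row of t values shown in the animation.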
Animation of the bottom-up algorithm (added post-lecture)
i                             0   1   2   3   4   5      6      7      8        9        10       11
[t[i-c] for c in C if c<=i]   []  [0] [1] [2] [3] [4,0]  [1,1]  [2,2]  [3,3,0]  [1,4,1]  [2,1,2]  [2,2,3]
t[i]                          0   1   2   3   4   1      2      3      1        2        2        3
16 / 44
Obtaining a more useful solution
Knowing that we can pay (say) 17.29 FJD using 12 coins is interesting, but not so
useful if we don’t know how to make such an optimal payment.
17 / 44
Computing the best payment with top-down DP
18 / 44
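The code for this slide is missing; one way to memoize an optimal payment itself rather than just its length (payment_td is a hypothetical name) is:

```python
def payment_td(C, n, memo=None):
    """Return an optimal list of coins paying n (assumes 1 in C)."""
    # memo maps an amount to one optimal payment of that amount
    if memo is None:
        memo = {}
    if n == 0:
        return []
    if n not in memo:
        memo[n] = min(([c] + payment_td(C, n - c, memo) for c in C if c <= n),
                      key=len)     # shortest candidate payment wins
    return memo[n]
```

The length of the returned list is exactly MC(n), and its sum is n.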
Computing the best payment with bottom-up DP
19 / 44
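Again the slide's code is missing; a bottom-up sketch (payment_bu is my name) fills the table as before and then walks back from n, at each step picking any coin that witnesses the optimum:

```python
def payment_bu(C, n):
    t = [0]
    for i in range(1, n + 1):
        t.append(1 + min(t[i - c] for c in C if c <= i))
    coins = []
    i = n
    while i > 0:
        # some coin c must satisfy t[i - c] == t[i] - 1, by construction of t
        c = next(c for c in C if c <= i and t[i - c] == t[i] - 1)
        coins.append(c)
        i -= c
    return coins
```

This backtracking pass adds only O(n · |C|) extra work on top of filling the table.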
Advantages and disadvantages of top-down vs bottom-up DP, in general
Top-down DP:
+ relatively easy to convert a recursive function to use memoization
− can have large memory overhead
− computational overhead of testing whether a value is in the memo table
(to keep this overhead small, input values should be hashable, or at least ordered)
Bottom-up DP:
− requires more “programming” of the computation in advance
+ usually more efficient than top-down DP
− occasionally wasteful if values of subproblems are ultimately unneeded
(one example: parsing words by a grammar using the CYK algorithm...)
20 / 44
Another perspective on optimal payments
[figure: the amounts 0 1 2 3 4 5 6 7 8 9 10 11 12 13 drawn along a line; a payment of n corresponds to a path from 0 to n]
Key point: we do not need to try all paths to find a shortest path.
(We will return to the shortest path problem later in the course.)
21 / 44
Interlude
22 / 44
Origin of the name “dynamic programming”
I spent the Fall quarter (of 1950) at RAND. My first task was to find a name for multistage
decision processes. An interesting question is, “Where did the name, dynamic programming,
come from?” The 1950s were not good years for mathematical research. We had a very
interesting gentleman in Washington named Wilson. He was Secretary of Defense, and he
actually had a pathological fear and hatred of the word “research”. I’m not using the term
lightly; I’m using it precisely. His face would suffuse, he would turn red, and he would get
violent if people used the term research in his presence. You can imagine how he felt, then,
about the term mathematical. [...] In the first place I was interested in planning, in decision
making, in thinking. But planning, is not a good word for various reasons. I decided therefore
to use the word “programming”. I wanted to get across the idea that this was dynamic, this
was multistage, this was time-varying. I thought, let’s kill two birds with one stone. Let’s
take a word that has an absolutely precise meaning, namely dynamic, in the classical physical
sense. It also has a very interesting property as an adjective, and that is it’s impossible to use
the word dynamic in a pejorative sense. Try thinking of some combination that will possibly
give it a pejorative meaning. It’s impossible.
23 / 44
Example #2: Fibonacci numbers revisited
24 / 44
[figure: illustration of the Fibonacci numbers] (credit: user Romain on Wikipedia)
25 / 44
The Fibonacci recurrence naively translated into Python
def fib_rec(n):
if n == 0: return 0
elif n == 1: return 1
else: return fib_rec(n-1) + fib_rec(n-2)
26 / 44
Recursion tree of F4
F4
F2 F3
F0 F1 F1 F2
0 1 1 F0 F1
0 1
27 / 44
Recursion tree of F5
F5
F3 F4
F1 F2 F2 F3
1 F0 F1 F0 F1 F1 F2
0 1 0 1 1 F0 F1
0 1
28 / 44
Recursion tree of F6
F6
F4 F5
F2 F3 F3 F4
F0 F1 F1 F2 F1 F2 F2 F3
0 1 1 F0 F1 1 F0 F1 F0 F1 F1 F2
0 1 0 1 0 1 1 F0 F1
0 1
29 / 44
The tree grows exponentially!
...but it contains many redundancies...
30 / 44
Overlapping subproblems
The structure of the recursion leads to the same numbers being computed over and
over, and so we can compute them efficiently using either top-down or bottom-up DP.
31 / 44
Fibonacci with top-down DP
32 / 44
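The code on this slide is missing from the transcript; a sketch consistent with the name fib_td used in the timing comparison later:

```python
def fib_td(n, memo=None):
    # top-down DP: recursion plus a memo table, seeded with the base cases
    if memo is None:
        memo = {0: 0, 1: 1}
    if n not in memo:
        memo[n] = fib_td(n - 1, memo) + fib_td(n - 2, memo)
    return memo[n]
```

Each Fibonacci number is computed once, turning the exponential recursion into a linear number of additions.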
Fibonacci with top-down DP (illustration)
[figure: the recursion tree of F5 collapsed by memoization; each value F0..F5 is computed once, and repeated subtrees are replaced by lookups into the memo table]
33 / 44
Fibonacci with bottom-up DP
def fib_bu(n):
    table = [0, 1]
    for i in range(2, n+1):
        table.append(table[i-1] + table[i-2])
    return table[n]
Notice that we really only need to look up the last two values (table[i-1] and
table[i-2]) to compute the next value. This leads to a space optimization...
34 / 44
Optimized bottom-up Fibonacci
35 / 44
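The optimized version is missing from the transcript; a standard sketch (fib_opt is my name) keeps only the last two values instead of the whole table:

```python
def fib_opt(n):
    # two running values replace the table: a = F(i), b = F(i+1)
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```

This uses constant extra space (ignoring the size of the numbers themselves) while doing the same n additions as fib_bu.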
Comparison
In[4]: timed(fib_td, 35)
Out[6]: (336447648764317832666216120051075433103021484606800639065647...,
0.009203672409057617)
Out[7]: (336447648764317832666216120051075433103021484606800639065647...,
0.003816843032836914)
Out[8]: (336447648764317832666216120051075433103021484606800639065647...,
0.002521514892578125)
38 / 44
Example #3: minimum-cost edit sequence
39 / 44
Problem statement
Given two strings A and B, find an optimal sequence of edit operations (insertions,
deletions, and substitutions, each of cost 1, plus free copies) transforming one into
the other. The cost |δ| of an optimal edit sequence δ : A → B is called the
“edit distance”³ of A from B.
³ Edit distance under this cost model is called the Levenshtein distance between strings.
40 / 44
A recursive formula
41 / 44
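The formula itself did not survive extraction; assuming the unit-cost (Levenshtein) model used in the worked example on the following slides, the standard recurrence is:

```latex
d(i, 0) = i, \qquad d(0, j) = j,
\qquad
d(i, j) =
\begin{cases}
  d(i-1,\, j-1) & \text{if } A[i] = B[j],\\
  1 + \min\bigl\{\, d(i-1,\, j),\; d(i,\, j-1),\; d(i-1,\, j-1) \,\bigr\} & \text{otherwise,}
\end{cases}
```

where d(i, j) denotes the edit distance between the prefixes A[:i] and B[:j], so that the edit distance from A to B is d(|A|, |B|).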
Computing the edit distance from A to B, by hand

       K   I   T   T   E   N
   0   1   2   3   4   5   6
S  1
I  2
T  3
T  4
I  5
N  6
G  7

Each corner of the grid represents a pair of prefixes of A and B. Our goal is to label
each corner (i, j) by the edit distance between A[: i] and B[: j]. Yellow cells
represent where the words match. Label a white cell by one plus the minimum of the
three labels to its North, West, and North-West; and label a yellow cell by simply
copying the label to its NW.
42 / 44
d(kitten, sitting) = 3

       K   I   T   T   E   N
   0   1   2   3   4   5   6
S  1   1   2   3   4   5   6    (S/K)
I  2   2   1   2   3   4   5    (copy)
T  3   3   2   1   2   3   4    (copy)
T  4   4   3   2   1   2   3    (copy)
I  5   5   4   3   2   2   3    (I/E)
N  6   6   5   4   3   3   2    (copy)
G  7   7   6   5   4   4   3    (ins G)
43 / 44
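The grid computation above translates directly into bottom-up DP over a two-dimensional table (a sketch; the course's own implementation was not captured):

```python
def edit_distance(A, B):
    """Levenshtein distance: d[i][j] = edit distance between A[:i] and B[:j]."""
    m, n = len(A), len(B)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                    # delete all of A[:i]
    for j in range(n + 1):
        d[0][j] = j                    # insert all of B[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if A[i - 1] == B[j - 1]:
                d[i][j] = d[i - 1][j - 1]           # "copy": letters match
            else:
                d[i][j] = 1 + min(d[i - 1][j],      # delete A[i]
                                  d[i][j - 1],      # insert B[j]
                                  d[i - 1][j - 1])  # substitute
    return d[m][n]
```

Filling the table row by row mirrors labeling the grid corners by hand, in O(|A| · |B|) time.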
Summary of dynamic programming
Technique for solving problems with recursive structure and overlapping subproblems.
Comes in “top-down” and “bottom-up” versions.
Top-down DP:
▶ starts at root of recursion tree and works “down” towards the leaves
▶ implemented as a recursive function with memoization for reusing computations
▶ recipe: 1. figure out recursive structure, 2. write recursive fn, 3. add memoization
Bottom-up DP:
▶ starts at leaves of recursion tree and works “up” towards the root
▶ implemented using iteration rather than (function) recursion
▶ requires figuring out recursive structure and suitable plan of computation
44 / 44