Recursion: Breaking Down Problems Into Solvable Subproblems

This document discusses recursion and related algorithms like divide and conquer and dynamic programming. It provides examples of recursively defined problems like Fibonacci numbers, binary search trees, and the Smith-Waterman algorithm for sequence alignment. It analyzes the time complexity of recursive algorithms and describes techniques like memoization and dynamic programming that store solutions to subproblems to avoid recomputing them. General formulas are given for analyzing the time complexity of recursive algorithms based on how the problem is divided and combined.


Recursion
Breaking down problems into solvable subproblems
Chapter 7
The good and the bad
Pluses
code is easy to understand
may give a different view of the problem
sometimes efficient
Minuses
overhead of function calls (20-30%)
may be hard to analyze
cost not apparent
can be very inefficient: exponential or worse!
Finite Induction
Finite Induction principle
Let S_n be a proposition depending on n
Base Case: Show that S_1 is true
Inductive case: Assume S_k and prove S_{k+1}
Conclude: S_n is true for all n >= 1.
Variant (strong induction):
Base Case: Show that S_1 is true
Inductive case: Assume S_1, S_2, ..., S_k and prove S_{k+1}
Conclude: S_n is true for all n >= 1.

Example
Proofs involve two tricks:
recognize the subproblem (may take rearranging)
follow your nose
S_n: 1 + 2 + ... + n = n(n+1)/2
Base case: S_1 is true: 1 = 1(1+1)/2. Check.
Inductive case: given S_k, show S_{k+1}, i.e. 1 + 2 + ... + (k+1) = something.
1 + 2 + ... + k + (k+1) = k(k+1)/2 + (k+1) = (k^2 + k + 2k + 2)/2 = (k+1)(k+2)/2. QED.
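The identity the induction proves can also be spot-checked mechanically for small n; a quick Python sketch (function name is mine):

```python
# Spot-check S_n: 1 + 2 + ... + n == n(n+1)/2 for small n.
def triangular(n):
    return sum(range(1, n + 1))

for n in range(1, 50):
    assert triangular(n) == n * (n + 1) // 2
```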

Example: Complete m-ary tree
S_n: A full m-ary tree of depth n has (m^(n+1) - 1)/(m - 1) nodes
S_0: (m^1 - 1)/(m - 1) = 1. Check.
Assume S_k (visualize it):
1 node at depth 0, m at depth 1, m^2 at depth 2, ..., m^k at depth k
The Math
Visualization provides the recurrence.
Number of nodes in an S_{k+1} tree is
(# in S_k tree) + m * (number of leaves in S_k tree)
= (m^(k+1) - 1)/(m - 1) + m * m^k
= (m^(k+1) - 1)/(m - 1) + m^(k+2)/(m - 1) - m^(k+1)/(m - 1)
= (m^(k+2) - 1)/(m - 1)
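The closed form can be checked against a direct level-by-level count; a small sketch (function names are mine):

```python
# Count nodes level by level in a full m-ary tree of depth n,
# and compare with the closed form (m^(n+1) - 1)/(m - 1).
def nodes_by_levels(m, n):
    return sum(m ** level for level in range(n + 1))

def closed_form(m, n):
    return (m ** (n + 1) - 1) // (m - 1)

for m in (2, 3, 10):
    for n in range(6):
        assert nodes_by_levels(m, n) == closed_form(m, n)
```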

Fibonacci
fib(n)
  if (n <= 1) return 1
  else return fib(n-1) + fib(n-2)
What is the cost?
Let c(n) be the number of calls.
Note c(n) = c(n-1) + c(n-2) + 1 >= 2*c(n-2),
therefore c(n) >= 2^(n/2) * c(1), i.e. EXPONENTIAL.
Also c(n) <= 2*c(n-1), so c(n) <= 2^n * c(1).
This bounds the growth rate.
Is 3^n in O(2^n)?
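The two bounds can be checked directly by counting calls; a sketch, assuming the counting recurrence c(n) = c(n-1) + c(n-2) + 1 with c(0) = c(1) = 1:

```python
# Count the calls made by the naive recursion and check the bounds
# 2^(n/2) <= c(n) <= 2^n derived above.
def calls(n):
    if n <= 1:
        return 1                      # one call, no recursion
    return calls(n - 1) + calls(n - 2) + 1

for n in range(2, 25):
    c = calls(n)
    assert c >= 2 ** (n // 2)         # lower bound: already exponential
    assert c <= 2 ** n                # upper bound
```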
Dynamic Fibonacci
Recursive Fibonacci is bad.
Special fix: compute fib(1), fib(2), ..., up to fib(n).
Reordering the computation so subproblems are computed and stored: an example of dynamic programming.
Easiest approach: use an array
fib(n)
  array[1] = array[2] = 1
  for i = 3 to n
    array[i] = array[i-1] + array[i-2]
  return array[n]
Note: you don't need an array.
This works for functions dependent on lesser values.
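A runnable version of the array approach, using the note's observation that two variables suffice, and following the slide's convention array[1] = array[2] = 1:

```python
# Bottom-up Fibonacci: fill lesser values first; only the last two
# values are ever needed, so the array collapses to two variables.
def fib(n):
    a, b = 1, 1                       # fib(1), fib(2)
    for _ in range(n - 1):
        a, b = b, a + b
    return a

print([fib(n) for n in range(1, 8)])  # [1, 1, 2, 3, 5, 8, 13]
```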
Combinations (k of n)
How many different groups of size k can you form from n elements?
Ans: n!/[(n-k)! * k!]
Why? # of permutations of n people is n!
# of permutations of those invited is k!
# of permutations of those not invited is (n-k)!
Compute it naively by recursion: it DOES not compute!
Dynamic programming does.
Note: C(k,n) = C(k-1,n-1) + C(k,n-1)
C(0,n) = 1
Use a 2-dim array. That's dynamic programming.
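The recurrence fills a 2-dim table row by row; a sketch (function name is mine):

```python
# DP for C(k, n) using C(k,n) = C(k-1,n-1) + C(k,n-1), C(0,n) = 1,
# filled into a 2-D table; c[i][j] holds C(i, j).
def choose(k, n):
    c = [[0] * (n + 1) for _ in range(k + 1)]
    for j in range(n + 1):
        c[0][j] = 1                   # base case C(0, n) = 1
    for i in range(1, k + 1):
        for j in range(1, n + 1):
            c[i][j] = c[i - 1][j - 1] + c[i][j - 1]
    return c[k][n]
```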
General Approach (caching)
Recursion + hashing to store solutions of subproblems
Idea:
fib(n)
  if (fib(n) is in the hash table) return its stored value
  else compute fib(n) = fib(n-1) + fib(n-2) and store it.
How many calls will there be?
How many distinct subproblems are there?
Good:
don't need to compute unneeded subproblems
simpler to understand
Bad: overhead of hashing
Not needed here, since the subproblems are obvious.
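The hashing idea in Python, with a dict standing in for the hash table and following the recursive definition fib(0) = fib(1) = 1:

```python
# Recursion plus a hash table (dict) caching subproblem answers.
# Each distinct n is computed once, so there are O(n) calls in total.
def fib_memo(n, cache=None):
    if cache is None:
        cache = {}
    if n <= 1:
        return 1
    if n not in cache:                # hash-table lookup
        cache[n] = fib_memo(n - 1, cache) + fib_memo(n - 2, cache)
    return cache[n]
```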
Recursive Binary Search
Suppose T is a binary search tree, with values stored at nodes
Node
  Node less, more
  int key
  Object value
Object find(int k) // assume k is in the tree
  if (k == key) return value;
  else if (k < key) return less.find(k);
  else return more.find(k);
What is the running time of this?
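A minimal runnable version of the Node and find above, assuming (as the slide does) that k is present in the tree:

```python
# BST node with recursive find; field names (less, more) follow the slide.
class Node:
    def __init__(self, key, value, less=None, more=None):
        self.key, self.value = key, value
        self.less, self.more = less, more

    def find(self, k):                # assumes k is in the tree
        if k == self.key:
            return self.value
        return self.less.find(k) if k < self.key else self.more.find(k)

root = Node(5, "five", Node(2, "two"), Node(8, "eight"))
```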
Analysis
If the maximum depth of the tree is d, then O(d).
What is the maximum depth?
Worst case: n
What does the tree look like?
Best case: log(n)
What will the tree look like?
So the analysis tells us what we should achieve, i.e. a balanced tree. It doesn't tell us how.
Chapter 18 shows us how to create balanced trees.
Binary Search
Solving f(x) = 0 where a < b, f(a) < 0, f(b) > 0
Solve(f, a, b) (abstract algorithm)
  mid = (a+b)/2
  if -epsilon < f(mid) < epsilon return mid (*)
  else if f(mid) < 0 return Solve(f, mid, b)
  else return Solve(f, a, mid)
Note: epsilon is very small; real numbers are not dealt with exactly by computers.
Analysis:
Each call divides the domain in half.
Max calls is the number of bits in a real number.
Note: the algorithm may not work on a real computer -- this
depends on how fast f changes.
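A sketch of the abstract algorithm; the iteration cap is my addition, guarding against the failure mode the note mentions (f changing too fast for the epsilon test to ever succeed):

```python
# Bisection sketch: f continuous on [a, b] with f(a) < 0 < f(b).
# Stops when |f(mid)| < epsilon, or after max_steps halvings.
def solve(f, a, b, epsilon=1e-9, max_steps=200):
    for _ in range(max_steps):
        mid = (a + b) / 2
        if -epsilon < f(mid) < epsilon:
            return mid
        if f(mid) < 0:
            a = mid                   # root lies in (mid, b)
        else:
            b = mid                   # root lies in (a, mid)
    return (a + b) / 2

root = solve(lambda x: x * x - 2, 0, 2)   # approximates sqrt(2)
```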
Divide and Conquer: Merge Sort
Let A be an array of integers of length n.
Define Sort(A) recursively via auxSort(A, 0, n), where
int[] auxSort(A, low, high)  // sorts the half-open range [low, high)
  if (high - low <= 1) return that range
  else
    mid = (low+high)/2
    temp1 = auxSort(A, low, mid)
    temp2 = auxSort(A, mid, high)
    return merge(temp1, temp2)



Merge
int[] merge(int[] temp1, int[] temp2)
  int[] temp = new int[temp1.length + temp2.length]
  int i = 0, j = 0, k = 0
  while both arrays have elements left
    if (temp1[i] < temp2[j]) temp[k++] = temp1[i++]
    else temp[k++] = temp2[j++]
  copy any leftover elements of temp1 or temp2 into temp
  return temp
Analysis of Merge:
time: O(temp1.length + temp2.length)
memory: O(temp1.length + temp2.length)
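Putting the two slides together, a runnable sketch (half-open ranges; leftovers copied at the end of merge):

```python
# Merge two sorted lists, copying leftovers once one side is exhausted.
def merge(t1, t2):
    out, i, j = [], 0, 0
    while i < len(t1) and j < len(t2):
        if t1[i] < t2[j]:
            out.append(t1[i]); i += 1
        else:
            out.append(t2[j]); j += 1
    out.extend(t1[i:]); out.extend(t2[j:])   # leftovers
    return out

# Sort the half-open range [low, high) of a, returning a new list.
def merge_sort(a, low=0, high=None):
    if high is None:
        high = len(a)
    if high - low <= 1:
        return a[low:high]
    mid = (low + high) // 2
    return merge(merge_sort(a, low, mid), merge_sort(a, mid, high))

print(merge_sort([5, 1, 4, 2, 3]))    # [1, 2, 3, 4, 5]
```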
Analysis of Merge Sort
Time
Let N be the number of elements.
Number of levels is O(log N)
At each level, O(N) work
Total is O(N log N)
This is the best possible for comparison-based sorting.
Space
At each level, O(N) temporary space
Space can be freed, but calls to new are costly
Needs O(N) space
Bad - better to have an in-place sort
Quick Sort (chapter 8) is the sort of choice.
General Recurrences
Suppose T(N) satisfies
T(N) = 2T(N/2) + N
T(1) = 1
Assume N is a power of 2 (wlog).
Then T(N) = N log N + N, so T is O(N log N).
Proof
T(N)/N = T(N/2)/(N/2) + 1
T(N/2)/(N/2) = T(N/4)/(N/4) + 1, etc.
T(2)/2 = T(1)/1 + 1
Telescoping Sum
Sum up all the equations and subtract like terms:
T(N)/N = T(1)/1 + 1*(number of equations) = 1 + log N
Therefore T(N) = N + N log N, i.e. O(N log N).
Note: the same proof shows that the recurrence S(N) = 2*S(N/2) is O(N).
What happens if the recurrence is T(N) = 2*T(N/2) + k?
What happens if the recurrence is T(N) = 3*T(N/3) + ...?
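The closed form can be checked against the recurrence for powers of two:

```python
# Check T(N) = N log2 N + N against T(N) = 2 T(N/2) + N, T(1) = 1.
from math import log2

def T(n):
    return 1 if n == 1 else 2 * T(n // 2) + n

for p in range(11):
    n = 2 ** p
    assert T(n) == n * log2(n) + n
```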
General Recurrence Formulas
Suppose
the problem is divided into A subproblems,
each of relative size B (problem size / subproblem size),
and the overhead is const*N^k.
Then
if A > B^k, T(N) is O(N^(log_B A))
if A = B^k, T(N) is O(N^k log N)
if A < B^k, T(N) is O(N^k)
Proof: telescoping sum, see text.
Example: MergeSort; A = 2, B = 2, k = 1 (balanced), so
MergeSort is O(N log N) as before.

Dynamic Programming
The Smith-Waterman algorithm has many variants.
The global SW algorithm finds the similarity between two strings.
Goal: given costs for deleting and changing a character, determine the similarity of two strings.
Main application: given two proteins (sequences of amino acids), how far apart are they?
Evolutionary trees
similar sequences define proteins with similar functions
can compare across species
other versions deal with finding a prototype/consensus for a family of sequences
Smith-Waterman Example
See handout from Biological Sequence Analysis.
In this example, A, R, V are among the 20 amino acids.
Fig 2.2 defines how similar various amino acids are to one another.
Fig 2.5 shows the computation of the similarity of two sequences.
The deletion cost is 8.
The bottom right-hand corner gives the value of the similarity.
Tracing backwards computes the alignment.
SW subproblem merging

  c(i-1,j-1)   c(i-1,j)
  c(i,j-1)     c(i,j)

Down: extend the match by deleting char s_i
Right: extend the match by deleting char t_j
Diagonal: extend the match by substitution, if necessary
Smith-Waterman Algorithm
Recursive definition -- find the subproblems:
Let s and t be strings of lengths n and m.
Define f(i,j) as the max similarity score of matching the initial i characters of s with the initial j characters of t.
f(0,0) = 0
For i > 0 or j > 0:
let sim(s_i, t_j) be the similarity of char s_i and char t_j,
and let d be the cost of a deletion. Then
f(i,j) = max { f(i-1,j) - d, f(i,j-1) - d, f(i-1,j-1) + sim(s_i, t_j) }
(terms that would refer to a negative index are omitted from the max).
Analysis: O(n*m)
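A bottom-up sketch of the recursive definition. The match/mismatch scores here are stand-ins of my own choosing; real runs use a substitution table like the one in Fig 2.2:

```python
# Global alignment score f(i, j), filled bottom-up into an
# (n+1) x (m+1) table; f[i][0] and f[0][j] are pure-deletion rows.
def align(s, t, d=8, match=2, mismatch=-1):
    n, m = len(s), len(t)
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = f[i - 1][0] - d     # delete s_i
    for j in range(1, m + 1):
        f[0][j] = f[0][j - 1] - d     # delete t_j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sim = match if s[i - 1] == t[j - 1] else mismatch
            f[i][j] = max(f[i - 1][j] - d,
                          f[i][j - 1] - d,
                          f[i - 1][j - 1] + sim)
    return f[n][m]                    # bottom right-hand corner
```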
Matrix Review
A matrix is a 2-dim array of numbers, e.g. int[2][3]
Matrices represent linear transformations
Every linear transformation is a matrix product of scale changes and rotations
The determinant of a matrix is the change in volume of the unit cube
Matrices are used for data analysis and graphics
If M1 has r1 rows and c1 columns, and M2 has r2 rows and c2 columns, with r2 = c1,
then M3 = M1*M2 has r1 rows and c2 columns, where
M3[i][j] = (row i of M1) * (col j of M2) (dot product)
In general M1*M2 != M2*M1 (not commutative)
but (M1*M2)*M3 = M1*(M2*M3) (associative)
Multiplying Many Matrices
Let M_i, i = 1..n, be matrices with r_(i-1) rows and r_i columns.
How many ways to parenthesize M_1*M_2*...*M_n?
Exponential (involves Catalan numbers).
But there is an O(N^3) dynamic programming algorithm!
Let m_(i,j) = min cost of multiplying M_i*...*M_j.
Subproblem idea: (M_i*...*M_k) * (M_(k+1)*...*M_j)
if i = j then m_(i,j) = 0
if j > i then m_(i,j) = min{ i <= k < j : m_(i,k) + m_(k+1,j) + r_(i-1)*r_k*r_j }
Cost of this step is O(N).
Filling in all m_(i,j) (i < j) yields an O(N^3) algorithm.
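A sketch of the DP; dims[i-1] x dims[i] is the shape of M_i, and the function name is mine:

```python
# Matrix-chain DP: m[i][j] = min cost of multiplying M_i ... M_j.
def chain_cost(dims):
    n = len(dims) - 1                 # number of matrices
    m = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):    # chain length, shortest first
        for i in range(1, n - length + 2):
            j = i + length - 1
            m[i][j] = min(m[i][k] + m[k + 1][j]
                          + dims[i - 1] * dims[k] * dims[j]
                          for k in range(i, j))
    return m[1][n]

print(chain_cost([10, 30, 5, 60]))    # 4500: (M1*M2)*M3 beats M1*(M2*M3)
```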
Summary
Recursion begat divide and conquer and dynamic programming.
General flow:
Subdivide the problem
Store subsolutions
Merge subsolutions
Beware of the costs:
cost of subdivision
cost of storage
cost of combining subsolutions
Can be very effective
