CS2223: Algorithms
Greedy Algorithms
1 Greedy Algorithm Fundamentals
2 Activity-selection Problem
Consider the problem of scheduling several competing activities that require
exclusive use of a common resource, with a goal of selecting a maximum-
size set of mutually compatible activities. Suppose we have a set S =
{a1 , a2 , . . . , an } of n proposed activities that wish to use a resource, such
as a lecture hall, which can be used by only one activity at a time. Each
activity ai has a start time si and a finish time fi , where 0 ≤ si < fi < ∞.
If selected, activity ai takes place during the half-open time interval [si , fi ).
Activities ai and aj are compatible if the intervals [si , fi ) and [sj , fj ) do not
overlap (i.e., ai and aj are compatible if si ≥ fj or sj ≥ fi ). The activity-selection
problem is to select a maximum-size subset of mutually compatible
activities.
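As a sketch, the compatibility test can be written directly from the definition (the function name and the tuple representation of an activity are illustrative, not from the notes):

```python
def compatible(a, b):
    """Two activities, each a (start, finish) pair for the half-open
    interval [start, finish), are compatible iff they do not overlap."""
    (s_a, f_a), (s_b, f_b) = a, b
    return s_a >= f_b or s_b >= f_a

print(compatible((1, 4), (4, 7)))  # True: [1,4) and [4,7) share only the boundary point
print(compatible((1, 5), (4, 7)))  # False: they overlap on [4,5)
```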
This problem has the optimal substructure property. Suppose we define
Sij as the set of activities that start after activity ai 's finish time and finish
before activity aj 's start time. The overall optimal solution is given by the
optimal solution among the activities in Skl , where ak has the earliest start
time among all activities and al has the latest finish time among all activities.
(To make this precise, one can add fictitious sentinel activities a0 with f0 = 0
and an+1 with sn+1 = ∞, so that S0,n+1 contains all the real activities.)
Suppose activity ap ∈ Sij is in an optimal solution for Sij . Then that optimal
solution consists of an optimal solution for Sip , an optimal solution for Spj ,
and activity ap itself.
But the problem also exhibits the greedy-choice property. We can
make a choice as to what activity must be in the optimum solution and thus
solve only one subproblem.
Suppose activity am is the activity with the earliest finish time among the
activities in Sij . Then there exists an optimum solution which includes am .
Proof: Order the activities in the optimum solution for Sij in increasing
order of their finish times. Suppose the first activity is ah ≠ am . Replace ah
with am ; this is another valid solution that is also optimum. (Because ah 's
finish time must be at least the finish time of am , replacing ah with am will
not conflict with any other activity in the optimum solution.)
Therefore we can get the algorithm for the activity-selection problem as
follows.
1. Sort the activities in increasing order of their finish times.
2. Scan the sorted list once; the current activity is in the optimum
solution if it is compatible with the activities already chosen.
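The two steps above can be sketched as follows (the function name and the sample activity times are illustrative):

```python
def select_activities(activities):
    """Greedy activity selection: sort by finish time, then scan,
    keeping each activity whose start time is no earlier than the
    finish time of the last activity selected."""
    chosen = []
    last_finish = 0
    for s, f in sorted(activities, key=lambda a: a[1]):
        if s >= last_finish:          # compatible with everything chosen so far
            chosen.append((s, f))
            last_finish = f
    return chosen

acts = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9), (6, 10), (8, 11)]
print(select_activities(acts))        # [(1, 4), (5, 7), (8, 11)]
```

The single pass after sorting gives the O(n log n) running time one would expect from the sort-then-scan structure.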
3 Fractional Knapsack Problem
Given items with values and weights and a knapsack of limited weight
capacity, we may take any fraction of each item, and the goal is to maximize
the total value carried. Order the items in non-increasing order of their
value/weight ratio.
Greedy-Choice Property: There exists an optimum
solution that includes as much as possible of the first item in the above
ordered list.
Proof: Suppose item x has the largest value/weight ratio, and the opti-
mum solution does not include as much as possible of item x, taking only
a fraction fx of it. Suppose also that we have taken some fraction fy of
another item y. Consider w = min {(1 − fx ) ∗ wx , fy ∗ wy } (in other words,
w is the minimum of the weight of x not taken and the weight of y taken).
If we take fy − w/wy fraction of y and fx + w/wx fraction of x, we get a
solution at least as good as the optimum solution.
We can therefore do the above till we have taken as much as possible of
item x.
The above greedy-choice property yields the following algorithm. Sort
the items in the non-increasing order of value/weight. Traverse this sorted
list, taking as much as possible of each item.
The complexity is O(n log n) for sorting and O(n) for the linear scan, so
the overall complexity is O(n log n).
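The sort-then-take greedy rule can be sketched as follows (the function name and the sample items and capacity are made up for illustration):

```python
def fractional_knapsack(items, capacity):
    """items: list of (value, weight) pairs; fractions of items may be
    taken.  Sort by value/weight ratio in non-increasing order, then
    take as much as possible of each item in turn."""
    total = 0.0
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        take = min(weight, capacity)       # as much of this item as still fits
        total += value * take / weight     # value of the fraction taken
        capacity -= take
        if capacity == 0:
            break
    return total

# Ratios are 6, 5, 4: the greedy takes items of weight 10 and 20 fully,
# then 20 units (two thirds) of the last item.
print(fractional_knapsack([(60, 10), (100, 20), (120, 30)], 50))  # 240.0
```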
4 Huffman codes
Suppose you want to transmit a string of characters drawn from some
alphabet. The problem is what binary code to assign to each character
so that the message size is minimized.
For example, consider the alphabet to consist of four characters, say
a, b, c, d, and your string is composed of these characters – say a string could
be aaabccbbbddaabb
A straightforward solution: with 4 characters, use 2 bits to represent each
character, say a - 00, b - 01, c - 10, d - 11. So for a 100,000-character string,
we need 200,000 bits.
However, suppose the frequencies of these characters in the string vary –
say a appears 60% of the time, b 10%, c 25% and d 5%. Consider the code
a - 1, b - 001, c - 01, d - 000; the number of bits needed
= 60,000 + 30,000 + 50,000 + 15,000 = 155,000 bits.
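The arithmetic above can be checked directly; a small sketch using the example frequencies and codes:

```python
# Character counts for a 100,000-character string and the code lengths
# from the example above.
counts = {"a": 60_000, "b": 10_000, "c": 25_000, "d": 5_000}
code = {"a": "1", "b": "001", "c": "01", "d": "000"}

total_bits = sum(counts[ch] * len(code[ch]) for ch in counts)
print(total_bits)  # 155000, versus 200000 with the fixed 2-bit code
```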
Let us come up with an algorithm to find such codes, assuming all
frequencies are > 0. We will consider only prefix-free codes – no code is a
prefix of another code. This makes decoding easier.
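Why prefix-freeness makes decoding easy can be illustrated with the example code above (the decode helper is an illustrative sketch, not from the notes):

```python
def decode(bits, code):
    """Decode a bit string under a prefix-free code by accumulating
    bits until they match a codeword; the match is unambiguous
    because no codeword is a prefix of another."""
    inverse = {v: k for k, v in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

code = {"a": "1", "b": "001", "c": "01", "d": "000"}
print(decode("100101000", code))  # abcd
```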
We can depict the prefix codes as a binary tree.
Property: The binary tree must be full – in other words, every non-leaf
node will have exactly 2 children.
Proof: Suppose the binary tree is not full, so some non-leaf node has only
one child. We can decrease the code length for all the descendants of this
node by removing the edge between this node and its only child, splicing the
child into its parent's place.
Greedy-Choice Property: Consider the two characters (say x and y) with
the lowest frequencies. There exists an optimum solution where x and y are
at the maximum depth and are siblings, i.e., children of the same non-leaf
node.
Proof: Consider an optimal solution where a and b are siblings at the
maximum depth. Without loss of generality let fx ≤ fa
(fx denotes the frequency of appearance of x) and fy ≤ fb . Swap a with x and
b with y; since the less frequent characters move to a depth at least as great,
the total cost does not increase, and we get another optimum solution.
This gives the algorithm as follows – take the two characters with the
lowest frequencies and construct a tree of height 1 with these 2 characters as
leaves. Now replace the two characters with a "character block" whose fre-
quency is the sum of their two frequencies. Repeat the same step until only
one "character block" remains.
We can use a min-heap data structure for the above, in which case we
can obtain the Huffman codes in O(n log n) time.
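Using Python's heapq module as the min-heap, the repeated-merge algorithm can be sketched as follows (the tie-breaking counter and the tuple tree representation are implementation choices, not from the notes):

```python
import heapq

def huffman_codes(freq):
    """Build Huffman codes: repeatedly merge the two lowest-frequency
    'character blocks' (via a min-heap) until one tree remains, then
    read codes off the tree.  freq: dict of character -> frequency."""
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)                 # tie-breaker so tuples never compare trees
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (left, right)))
        counter += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):     # internal node: append 0 left, 1 right
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                           # leaf: record the accumulated code
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

# The example frequencies from above; code lengths come out 1, 3, 2, 3.
print(huffman_codes({"a": 60, "b": 10, "c": 25, "d": 5}))
```

The heap holds at most n entries and each of the n − 1 merges costs O(log n), giving the O(n log n) bound stated above.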