Greedy Algorithms
Dr. Tathagata Ray
Professor, BITS Pilani, Hyderabad Campus
BITS Pilani rayt@hyderabad.bits-pilani.ac.in
Hyderabad Campus
Huffman Codes
• Suppose there is a file with the following content.
• “there was a brown crow he was very intelligent”
• In this text we have
Char a b c e g h i l n o r s t v w y
Freq 3 1 1 6 1 2 2 2 3 2 4 2 3 1 4 1
• We have to encode this file such that each character is
represented by a unique binary string called codewords.
BITS Pilani, Hyderabad Campus
Codewords
• If we use fixed length code, we require 4 bits to
represent this 16 characters.
• Thus we will encode the text using 38x4 = 152 bits.
• Can we do better?
BITS Pilani, Hyderabad Campus
Variable-length code
• Variable-length code can do considerably better than a
fixed length code.
• Give frequent characters short codewords and infrequent
character long codewords.
• For example
Char a b c d e f Total
bits
Freq 45000 13000 12000 16000 9000 5000 100000
Fixed 000 001 010 011 100 101 300000
len.
Var. 0 101 100 111 1101 1100 224,000
code
BITS Pilani, Hyderabad Campus
Prefix codes
• We consider here only codes in which no codeword is
also a prefix of some other codeword.
• These are called prefix codes.
• Prefix codes are simple to decode.
• For example 001011101 parses uniquely to 0.0.101.1101
i.e. aabe.
BITS Pilani, Hyderabad Campus
Decoding process
• Decoding needs a convenient representation for the
prefix code so that we can easily pick off the initial
codeword.
• A binary tree whose leaves are the given characters
provide such a representation.
BITS Pilani, Hyderabad Campus
100
0 1
a:45 55
0 1
25 30
0 1 0 1
c:12 b:13 14 d:16
0 1
f:5 e:9
BITS Pilani, Hyderabad Campus
Optimal code
• An optimal code for a file is always represented by a full
binary tree, in which every non leaf node has two
children.
• If C is the alphabet from which the characters are drawn
and all character frequencies are positive then the tree
for an optimal prefix code has exactly |C| leaves and
exactly |C-1| internal nodes.
BITS Pilani, Hyderabad Campus
Cost of a tree
• Given a tree T corresponding to a prefix code, the
number of bits required to encode a file is given by
• .
Depth of c in T
Frequency of character c
BITS Pilani, Hyderabad Campus
Constructing a Huffman code
HUFFMAN(C)
1. For to -1
2. allocate a new node .
3. EXTRACT-MIN()
4. EXTRACT-MIN()
5. INSERT(, )
6. return EXTRACT-MIN() //return root of the tree
BITS Pilani, Hyderabad Campus
Example
f:5 e:9 c:12 b:13 d:16 a:45
c:12 b:13 14 d:16 a:45
0 1
f:5 e:9
14 d:16 25 a:45
0 1 0 1
f:5 e:9 c:12 b:13
BITS Pilani, Hyderabad Campus
Example
14 d:16 25 a:45
0 1 0 1
f:5 e:9 c:12 b:13
30 a:45
25
0 1
0 1
c:12 b:13 14 d:16
0 1
f:5 e:9
BITS Pilani, Hyderabad Campus
Example
30 a:45
25
0 1
0 1
c:12 b:13 14 d:16
0 1
f:5 e:9
a:45 55 1
0
25 30
0 1
0 1
c:12 b:13 14 d:16
0 1
f:5 e:9
BITS Pilani, Hyderabad Campus
Example
a:45 55 1
0
25 30
0 1
0 1
c:12 b:13 14 d:16
0 1
100 f:5 e:9
0 1
a:45 55 1
0
25 30
0 1
0 1
c:12 b:13 14 d:16
0 1
f:5 e:9 BITS Pilani, Hyderabad Campus
Time Complexity
• If we use heap data structure to implement the minimum
priority queue Q then constructing the queue will take
O(n) time.
• Each extract minimum operation takes O(lg n) time and
the For loop runs for n-1 time, hence the total time taken
is O(n lg n).
BITS Pilani, Hyderabad Campus