Huffman Code
Dr. Raghavendra Mishra
Huffman Codes
• Most character code systems (ASCII, Unicode) use fixed-length encoding
• If frequency data is available and the frequencies vary widely,
variable-length encoding can save 20% to 90% of the space
• Which characters should get the shorter codes, and which
the longer ones?
• At first it is not obvious how decoding would
work, but it becomes possible if we use prefix codes
Prefix Codes
• No encoding of a character can be the prefix of the
longer encoding of another character, for example,
we could not encode t as 01 and x as 01101 since 01
is a prefix of 01101
• By representing the code as a binary tree, we
generate prefix codes provided every letter appears only at a leaf
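As a minimal sketch (the helper name is illustrative, not from the slides), the prefix-free property can be checked by sorting the code words: after a lexicographic sort, any word that is a prefix of another is immediately followed by a word it prefixes.

```python
def is_prefix_free(codes):
    """Return True if no code word is a prefix of another code word."""
    words = sorted(codes.values())
    # After lexicographic sorting, a code word that prefixes another
    # sits immediately before one of the words it prefixes.
    return all(not nxt.startswith(cur) for cur, nxt in zip(words, words[1:]))

# The t/x example from the slide: 01 is a prefix of 01101, so not prefix-free.
print(is_prefix_free({"t": "01", "x": "01101"}))                  # False
print(is_prefix_free({"a": "0", "b": "101", "c": "100"}))         # True
```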
Some Properties
• Prefix codes allow easy decoding
– Given a: 0, b: 101, c: 100, d: 111, e: 1101, f: 1100
– Decode 001011101 going left to right: 0|01011101,
a|0|1011101, a|a|1011101, a|a|101|1101, a|a|b|1101, a|a|b|e
• An optimal code must be a full binary tree (a tree
where every internal node has two children)
• For |C| leaves there are |C| − 1 internal nodes
• The number of bits to encode a file is
B(T) = Σ_{c ∈ C} f(c) · d_T(c)
where f(c) is the frequency of c and d_T(c) is the depth of c
in the tree T, which equals the length of c's code
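The left-to-right decoding walk and the bit-count formula B(T) can be sketched in Python; the function names are illustrative, and the frequencies used in the cost assertion below are assumed for illustration only.

```python
def decode(bits, codes):
    """Decode a bit string left to right using a prefix code table."""
    inverse = {code: sym for sym, code in codes.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:       # prefix-freeness makes the first match unambiguous
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

def bits_to_encode(freqs, codes):
    """B(T) = sum over c of f(c) * len(code of c); code length equals tree depth."""
    return sum(freqs[c] * len(codes[c]) for c in freqs)

codes = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}
print(decode("001011101", codes))  # aabe
```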
Building the Encoding Tree
The Algorithm
• An appropriate data structure is a binary min-heap
• Each heap operation takes O(lg n) time and n − 1 merges
(each doing two extractions and one insertion) are made, so the complexity is O( n lg n )
• The encoding is NOT unique; other encodings may
work just as well, but none will work better
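A minimal sketch of the algorithm using Python's heapq as the binary min-heap; here each heap entry carries a partial code table rather than explicit tree nodes (the counter is only a tie-breaker so equal frequencies never compare the dicts).

```python
import heapq
from itertools import count

def huffman(freqs):
    """Build a Huffman code table from {symbol: frequency} with a min-heap."""
    tiebreak = count()
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:                    # n - 1 merges in total
        f1, _, left = heapq.heappop(heap)   # the two least-frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}    # left branch gets bit 0
        merged.update({s: "1" + c for s, c in right.items()})  # right gets bit 1
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

# Assumed illustrative frequencies; exact bit patterns depend on tie-breaking,
# but the code lengths (tree depths) are optimal.
codes = huffman({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5})
```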
Correctness of Huffman’s Algorithm
• The following results are presented without proof