
Lecture 6:

Huffman Code

Thinh Nguyen
Oregon State University
Review
• Coding: assigning binary codewords to (blocks of) source symbols.

• Variable-length codes (VLC).

• Tree codes (prefix codes) are instantaneous.


Example of VLC
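A minimal sketch (the symbols and codewords are the ones the Huffman example later in this lecture produces) of a variable-length prefix code, showing that decoding is instantaneous: a symbol is emitted as soon as the last bit of its codeword has been read.

```python
# Prefix code: no codeword is a prefix of another.
code = {"a": "0", "b": "10", "c": "110", "d": "111"}

def encode(symbols, code):
    """Concatenate the codewords of the input symbols."""
    return "".join(code[s] for s in symbols)

def decode(bits, code):
    """Decode bit by bit; a symbol is emitted as soon as its codeword is complete."""
    inverse = {w: s for s, w in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:          # prefix property makes the match unambiguous
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

bits = encode("abacd", code)        # '0100110111'
print(bits, decode(bits, code))     # round-trips to 'abacd'
```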
Creating a Code: The Data Compression Problem
• Assume a source with an alphabet A and known symbol probabilities {p_i}.

• Goal: choose the codeword lengths so as to minimize the bit rate, i.e., the average number of bits per symbol, ∑ l_i · p_i.

• Trivial solution: l_i = 0 for all i.

• Restriction: we want a uniquely decodable code, so the Kraft inequality ∑ 2^(−l_i) ≤ 1 must hold.

• Solution (at least in theory): l_i = −log₂ p_i (see the sketch below).
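A minimal sketch (assumed example probabilities) that computes the theoretical lengths l_i = −log₂ p_i, rounds them up to integers, and checks the Kraft inequality and the resulting bit rate:

```python
import math

# Assumed example probabilities (not from the slides).
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# Theoretical (possibly fractional) lengths l_i = -log2 p_i.
ideal = {s: -math.log2(p) for s, p in probs.items()}

# Integer lengths actually used must satisfy Kraft: sum of 2^(-l_i) <= 1.
lengths = {s: math.ceil(l) for s, l in ideal.items()}
kraft = sum(2 ** -l for l in lengths.values())

entropy = sum(-p * math.log2(p) for p in probs.values())
avg_len = sum(probs[s] * lengths[s] for s in probs)

print(ideal)                           # {'a': 1.0, 'b': 2.0, 'c': 3.0, 'd': 3.0}
print(kraft <= 1, avg_len, entropy)    # True 1.75 1.75
```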


In practice…
• Use a practical algorithm to find the codes:
  – Huffman coding
  – Tunstall coding
  – Golomb coding
Huffman Average Code Length
• Input: probabilities p_1, p_2, ..., p_m for symbols a_1, a_2, ..., a_m, respectively.

• Output: a tree that minimizes the average number of bits (bit rate) needed to code a symbol. That is, it minimizes

      l̄ = ∑_{i=1}^{m} p_i · l_i

  where l_i is the length of the codeword for a_i.
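For example, with the four-symbol source used in the next slide's example (probabilities 0.5, 0.25, 0.125, 0.125 and codeword lengths 1, 2, 3, 3):

      l̄ = 0.5·1 + 0.25·2 + 0.125·3 + 0.125·3 = 1.75 bits/symbol,

which here equals the source entropy, because every probability is a power of 2.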


Huffman Coding
• Two-step algorithm:
  1. Iterate:
     – Merge the two least probable symbols.
     – Sort.
  2. Assign bits.
  (A code sketch of this procedure follows the example below.)

  Example — probabilities and codewords at each pass (merge, sort, assign, get code):

  Symbol   Pass 1          Pass 2         Pass 3        Final codeword
  a        0.5   → 0       0.5  → 0       0.5 → 0       0
  b        0.25  → 10      0.25 → 10      0.5 → 1       10
  c        0.125 → 110     0.25 → 11                    110
  d        0.125 → 111                                  111
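A minimal sketch of this merge-and-assign procedure (not the lecture's reference implementation), using a binary heap to pull out the two least probable nodes at each iteration:

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code for {symbol: probability}; returns {symbol: bitstring}."""
    tiebreak = count()  # keeps heap entries comparable when probabilities are equal
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)   # least probable node
        p2, _, codes2 = heapq.heappop(heap)   # second least probable node
        # Prepend one bit: 0 for one subtree, 1 for the other.
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(probs)
avg = sum(probs[s] * len(code[s]) for s in probs)
print(code, avg)   # {'a': '0', 'b': '10', 'c': '110', 'd': '111'} 1.75
```

For this input it reproduces the code from the example above; with different tie-breaking the bit labels can differ, but the codeword lengths and the 1.75-bit average do not.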
More Examples of Huffman Code
Average Huffman Code Length
Optimality of A Prefix Code
• Necessary conditions for an optimal variable-length binary code:

  1. Given any two letters a_j and a_k, if P(a_j) ≥ P(a_k), then l_j ≤ l_k, where l_j is the length of the codeword for a_j.

  2. The two least probable letters have codewords with the same maximum length l_m.

  3. In the tree corresponding to the optimum code, there must be two branches stemming from each intermediate node.

  4. Suppose we change an intermediate node into a leaf node by combining all the leaves descending from it into a composite word of a reduced alphabet. Then, if the original tree was optimal for the original alphabet, the reduced tree is optimal for the reduced alphabet.
Condition 1: If P(a_j) ≥ P(a_k), then l_j ≤ l_k, where l_j is the length of the codeword for a_j.

• Easy to see why?

• Proof by contradiction:
  – Suppose a code X is optimal with P(a_j) ≥ P(a_k), but l_j > l_k.
  – By simply exchanging the codewords of a_j and a_k, we obtain a new code Y whose average length ∑ l_i p_i is smaller than that of code X: the exchange changes the average length by (P(a_j) − P(a_k))(l_k − l_j), which is negative whenever P(a_j) > P(a_k).
  – Hence a contradiction is reached, so the condition must hold.
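A quick numerical check of the exchange argument (assumed numbers, not from the slide): with P(a_j) = 0.4, P(a_k) = 0.1, l_j = 3 and l_k = 1, swapping the two codewords changes the average length by (0.4 − 0.1)(1 − 3) = −0.6 bits/symbol, so the supposedly optimal code X could not have been optimal.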
Condition 2: The two least probable letters have codewords with the same maximum length l_m.

• Easy to see why?

• Proof by contradiction:
  – Suppose we have an optimal code X in which the two codewords with the lowest probabilities are c_i and c_j, and c_i is longer than c_j by k bits.
  – Because this is a prefix code, c_j cannot be a prefix of c_i. So we can drop the last k bits of c_i.
  – We can also guarantee that, after dropping the last k bits of c_i, the code is still decodable: c_i and c_j have the longest lengths (they belong to the least probable symbols), hence they cannot be prefixes of any other codeword.
  – By dropping the k bits of c_i we create a new code Y with a shorter average length, hence a contradiction is reached.
Condition 3: In the tree corresponding to the optimum code, there must be two branches stemming from each intermediate node.

• Easy to see why?
  – If there were any intermediate node with only one branch coming from it, we could remove that node without affecting the decodability of the code while reducing its average length.

[Figure: a tree containing an intermediate node with only one branch gives the code a: 000, b: 001, c: 1; removing the redundant node gives the shorter code a: 00, b: 01, c: 1.]
Condition 4:
• Suppose we change an intermediate node into a leaf node by combining all the leaves descending from it into a composite word of a reduced alphabet. Then, if the original tree was optimal for the original alphabet, the reduced tree is optimal for the reduced alphabet.

[Figure: the original tree gives the code a: 000, b: 001, c: 01, d: 1; collapsing the intermediate node above a and b into a composite leaf e gives the reduced code e: 00, c: 01, d: 1.]
Huffman code satisfies all four conditions
• Less probable symbols sit at greater depths of the tree (condition 1).

• The two least probable symbols have codewords of equal length (condition 2).

• Every intermediate node of the tree has two branches (condition 3).

• By construction, the code for the reduced alphabet must be optimal for the code of the original alphabet to be optimal (condition 4).
Optimal Code Length (Huffman Code Length)

      H(S) ≤ l̄ < H(S) + 1

l̄ : average length of an optimal code
H(S) = −∑_{i=1}^{m} P(a_i) log₂ P(a_i) : entropy of the source

Proof:
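A standard argument (a sketch, not taken from the slides), using the Kraft inequality introduced earlier:

Upper bound: choose the lengths l_i = ⌈−log₂ P(a_i)⌉. They satisfy ∑ 2^(−l_i) ≤ ∑ 2^(log₂ P(a_i)) = ∑ P(a_i) = 1, so a prefix code with these lengths exists, and its average length is ∑ P(a_i)·⌈−log₂ P(a_i)⌉ < ∑ P(a_i)·(−log₂ P(a_i) + 1) = H(S) + 1. An optimal code can only be shorter, so l̄ < H(S) + 1.

Lower bound: for any uniquely decodable code, H(S) − l̄ = ∑ P(a_i) log₂( 2^(−l_i) / P(a_i) ) ≤ log₂ ∑ 2^(−l_i) ≤ 0, by Jensen's inequality and the Kraft inequality, so l̄ ≥ H(S).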
Extended Huffman Code
A = {a_1, a_2, ..., a_m},   Aⁿ = {a_1a_1...a_1, a_1a_1...a_2, ..., a_m a_m...a_m}   (blocks of n source symbols)

There are mⁿ symbols in the Aⁿ alphabet.

      H(S) ≤ l̄ < H(S) + 1/n

l̄ : average length of the Huffman code per original source symbol
H(S) : entropy of the source

Proof: page 53 of the book.
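A minimal sketch (assumed two-symbol source; it reuses the huffman_code function from the earlier sketch) showing the per-symbol rate of the extended code approaching the entropy as the block length n grows:

```python
from itertools import product
# Reuses huffman_code() from the sketch earlier in these notes.

probs = {"a": 0.9, "b": 0.1}            # assumed skewed source, entropy H(S) ~ 0.469 bits

for n in (1, 2, 3):                     # code blocks of n source symbols at a time
    blocks = {"".join(t): 1.0 for t in product(probs, repeat=n)}
    for block in blocks:
        for s in block:
            blocks[block] *= probs[s]   # i.i.d. source: block prob = product of symbol probs
    code = huffman_code(blocks)
    rate = sum(blocks[b] * len(code[b]) for b in blocks) / n   # bits per original symbol
    print(n, round(rate, 4))            # decreases toward the entropy as n grows
```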


Huffman Coding: Pros and Cons
+ Fast implementations.

+ Error resilient: resynchronizes in roughly l² steps.

− The code tree grows exponentially when the source is extended.

− The symbol probabilities are built into the code.

  Hard to use Huffman coding for extended sources / large alphabets, or when the symbol probabilities vary over time.
Huffman Coding of 16-bit CD-quality audio

Filename           Original file size (bytes)   Entropy (bits)   Compressed file size (bytes)   Compression ratio
Mozart symphony    939,862                      12.8             725,420                        1.30
Folk rock (Cohn)   402,442                      13.8             349,300                        1.15

Huffman coding of the differences

Filename           Original file size (bytes)   Entropy (bits)   Compressed file size (bytes)   Compression ratio
Mozart symphony    939,862                      9.7              569,792                        1.65
Folk rock (Cohn)   402,442                      10.4             261,590                        1.54
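A minimal sketch (toy sine signal, not the audio files in the table) illustrating why coding sample-to-sample differences lowers the entropy seen by the Huffman coder:

```python
import math
from collections import Counter

def empirical_entropy(values):
    """First-order entropy (bits/sample) of a sequence, from its empirical histogram."""
    counts = Counter(values)
    total = len(values)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Assumed toy signal: a slowly varying, quantized waveform.
signal = [round(1000 * math.sin(i / 50.0)) for i in range(50_000)]
diffs = [b - a for a, b in zip(signal, signal[1:])]

print(empirical_entropy(signal))  # entropy of the raw samples
print(empirical_entropy(diffs))   # entropy of the differences: smaller, so Huffman compresses better
```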
Complexity of Huffman Code
• O(n log n)
  – There are about n merge steps, and finding the two lowest probabilities at each step costs O(log n) with a priority queue (log n is also roughly the depth of the tree).
Notes on Huffman Code
• Frequencies computed for each input:
  – Must transmit the Huffman code or the frequencies along with the compressed input.
  – Requires two passes.

• Fixed Huffman tree designed from training data:
  – No need to transmit the Huffman tree because it is known to the decoder.
  – Example: the H.263 video coder.

• Adaptive Huffman code:
  – One pass.
  – The Huffman tree changes as the frequencies change.