Introduction to Information Theory
Channel capacity and models
A.J. Han Vinck
University of Duisburg-Essen
May 2012
This lecture
- some channel models
- channel capacity
- Shannon channel coding theorem
- converse
some channel models

Input X → channel with transition probabilities P(y|x) → output Y

memoryless:
- the output at time i depends only on the input at time i
- the input and output alphabets are finite
Example: binary symmetric channel (BSC)

Input 0 → output 0 with probability 1-p, output 1 with probability p
Input 1 → output 1 with probability 1-p, output 0 with probability p

Equivalent model: Y = X ⊕ E, where an error source generates E

E is the binary error sequence s.t. P(1) = 1 - P(0) = p
X is the binary information sequence
Y is the binary output sequence
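As a quick illustration of this model, a minimal simulation sketch in Python; the sequence length and the value p = 0.1 are chosen here only for illustration:

import numpy as np

rng = np.random.default_rng(0)

def bsc(x, p):
    # pass the binary sequence x through a BSC with crossover probability p
    e = rng.random(len(x)) < p          # error sequence E with P(E = 1) = p
    return x ^ e                        # Y = X xor E

x = rng.integers(0, 2, size=10_000)     # binary information sequence X
y = bsc(x, p=0.1)
print("empirical error rate:", np.mean(x != y))   # close to p = 0.1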
from AWGN to BSC: hard decisions on a binary input (±A) received in additive white Gaussian noise with variance σ² turn the AWGN channel into a BSC.

Homework: calculate the capacity as a function of A and σ².
Other models

Z-channel (optical):
input 0 (light on) → output 0 with probability 1;
input 1 (light off) → output 1 with probability 1-p, output 0 with probability p;
P(X=0) = P0

Erasure channel (MAC):
input 0 → output 0 with probability 1-e, erasure E with probability e;
input 1 → output 1 with probability 1-e, erasure E with probability e;
P(X=0) = P0

Erasure with errors:
input 0 → output 0 with probability 1-p-e, erasure E with probability e, output 1 with probability p;
input 1 → output 1 with probability 1-p-e, erasure E with probability e, output 0 with probability p
burst error model (Gilbert-Elliot)

Random error channel; outputs independent:
error source with P(0) = 1 - P(1)

Burst error channel; outputs dependent:
error source with state info (good or bad):
P(0 | state = good) = 1 - P(1 | state = good) = 0.999
P(0 | state = bad) = P(1 | state = bad) = 1/2
the state changes according to the transition probabilities Pgg, Pgb, Pbg, Pbb
(good→good, good→bad, bad→good, bad→bad)
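A minimal simulation sketch of this two-state model; the error rates 0.001 and 1/2 come from the slide, while the state transition probabilities Pgb and Pbg used below are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(1)

def gilbert_elliott(n, p_gb=0.01, p_bg=0.1, p_err_good=0.001, p_err_bad=0.5):
    # generate n error bits from a two-state (good/bad) burst error model
    errors = np.zeros(n, dtype=int)
    bad = False
    for i in range(n):
        errors[i] = rng.random() < (p_err_bad if bad else p_err_good)
        # state transition: good→bad with Pgb, bad→good with Pbg
        bad = (rng.random() >= p_bg) if bad else (rng.random() < p_gb)
    return errors

e = gilbert_elliott(100_000)
print("overall error rate:", e.mean())    # the errors arrive in bursts during bad states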
channel capacity:

I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)    (Shannon 1948)

H(X) is the information offered at the channel input; H(X|Y) is the uncertainty about X that remains after observing the output Y.

notes:
- capacity C = max over P(x) of I(X;Y)
- the capacity depends on the input probabilities P(x),
  because the transition probabilities are fixed by the channel
Practical communication system design

message (1 out of 2^k) → encoder (code book) → code word → channel with errors → received word → decoder (code book) → estimate of the message

There are 2^k code words of length n
k is the number of information bits transmitted in n channel uses
Channel capacity
Definition:
The rate R of a code is the ratio k/n, where
k is the number of information bits transmitted in n channel uses
Shannon showed that:
for R < C encoding methods exist
with decoding error probability → 0 as n → ∞
Encoding and decoding according to Shannon
Code: 2^k binary codewords, generated at random with P(0) = P(1) = ½
Channel errors: P(0→1) = P(1→0) = p
i.e. # (typical) error sequences ≈ 2^{nh(p)}
Decoder: search around the received sequence for a codeword
with ≈ np differences
within the space of 2^n binary sequences
decoding error probability

1. For t errors: P( |t/n - p| > ε ) → 0 for n → ∞
   (law of large numbers)

2. P( > 1 code word in the decoding region )   (codewords random):

   P(≥ 1) ≤ (2^k - 1) · 2^{nh(p)} / 2^n ≈ 2^{-n(1 - h(p) - R)} = 2^{-n(CBSC - R)} → 0

   for n → ∞ and R = k/n < 1 - h(p)
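A quick numeric check of this union bound; n, p and R below are illustrative values (R = 0.4 is below CBSC = 1 - h(0.1) ≈ 0.53):

import numpy as np

def h(p):
    # binary entropy in bits
    return -p*np.log2(p) - (1-p)*np.log2(1-p)

def union_bound(n, p, R):
    # bound P(>= 1 other codeword in the decoding sphere) <~ 2^(-n(1 - h(p) - R))
    return 2.0 ** (-n * (1 - h(p) - R))

for n in (100, 1000, 10000):
    print(n, union_bound(n, p=0.1, R=0.4))   # decays exponentially in n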
channel capacity: the BSC
I(X;Y) = H(Y) - H(Y|X)

the maximum of H(Y) = 1, since Y is binary
(attained for P(X=0) = P(X=1) = ½)

H(Y|X) = P(X=0)h(p) + P(X=1)h(p) = h(p)

Conclusion: the capacity for the BSC is CBSC = 1 - h(p)

Homework: draw CBSC; what happens for p > ½?
channel capacity: the BSC

[Plot: channel capacity CBSC = 1 - h(p) versus the bit error probability p, 0 ≤ p ≤ 1; the curve equals 1 at p = 0 and p = 1 and drops to 0 at p = 0.5.]

Explain the behaviour!
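The curve can be reproduced with a short sketch (matplotlib is assumed to be available):

import numpy as np
import matplotlib.pyplot as plt

def h(p):
    # binary entropy in bits, with h(0) = h(1) = 0
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p*np.log2(p) - (1-p)*np.log2(1-p)

p = np.linspace(0, 1, 201)
plt.plot(p, 1 - h(p))
plt.xlabel("bit error probability p")
plt.ylabel("CBSC = 1 - h(p)")
plt.show()
# h(p) = h(1-p): for p > 1/2 the receiver can simply invert the output bits,
# so the capacity climbs back towards 1 as p → 1.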
channel capacity: the Z-channel
Application in optical communications
Input 0 (light on) → output 0 with probability 1
Input 1 (light off) → output 1 with probability 1-p, output 0 with probability p
P(X=0) = P0

H(Y) = h( P0 + p(1 - P0) )
H(Y|X) = (1 - P0) h(p)

For capacity, maximize I(X;Y) = H(Y) - H(Y|X) over P0
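A minimal numerical sketch of this maximization, using a simple grid search over P0 (p = 0.1 is an illustrative value):

import numpy as np

def h(q):
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return -q*np.log2(q) - (1-q)*np.log2(1-q)

def I_z(P0, p):
    # I(X;Y) = H(Y) - H(Y|X) for the Z-channel
    return h(P0 + p*(1 - P0)) - (1 - P0)*h(p)

p = 0.1
P0 = np.linspace(0, 1, 10_001)
best = np.argmax(I_z(P0, p))
print("optimal P0 ≈", P0[best], " capacity ≈", I_z(P0[best], p), "bit per channel use")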
channel capacity: the erasure channel
Application: cdma detection
Input 0 → output 0 with probability 1-e, erasure E with probability e
Input 1 → output 1 with probability 1-e, erasure E with probability e
P(X=0) = P0

I(X;Y) = H(X) - H(X|Y)
H(X) = h(P0)
H(X|Y) = e h(P0)

so I(X;Y) = (1-e) h(P0), maximized by P0 = ½

Thus Cerasure = 1 - e
(check! draw and compare with BSC and Z)
Capacity and coding for the erasure channel
Code: 2^k binary codewords, generated at random with P(0) = P(1) = ½
Channel erasures: P(0→E) = P(1→E) = e
Decoder: search for a codeword that agrees with the received sequence
in the ≈ n(1-e) unerased positions
within the space of 2^n binary sequences
decoding error probability

1. For t erasures: P( |t/n - e| > ε ) → 0 for n → ∞
   (law of large numbers)

2. P( > 1 candidate codeword agrees in the n(1-e) unerased positions,
   after ≈ ne positions are erased )   (codewords random):

   P(≥ 1) ≤ (2^k - 1) · 2^{-n(1-e)} ≈ 2^{-n(1 - e - R)} = 2^{-n(Cerasure - R)} → 0

   for R = k/n < 1 - e and n → ∞
Erasure with errors: calculate the capacity!

Input 0 → output 0 with probability 1-p-e, erasure E with probability e, output 1 with probability p
Input 1 → output 1 with probability 1-p-e, erasure E with probability e, output 0 with probability p
example

Consider the following example: inputs 0 and 2 are received noiselessly,
while input 1 is received as 0, 1 or 2, each with probability 1/3.

For P(0) = P(2) = p, P(1) = 1-2p:
H(Y) = h(1/3 - 2p/3) + (2/3 + 2p/3);   H(Y|X) = (1-2p) log2 3

Q: maximize H(Y) - H(Y|X) as a function of p
Q: is this the capacity?

hint, use the following: log2 x = ln x / ln 2;  d ln x / dx = 1/x
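A quick numerical check of this maximization by grid search (the closed-form maximizer follows from the hint above):

import numpy as np

def h(q):
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return -q*np.log2(q) - (1-q)*np.log2(1-q)

def I(p):
    # H(Y) - H(Y|X) for the example channel, with P(0) = P(2) = p, P(1) = 1 - 2p
    HY  = h(1/3 - 2*p/3) + (2/3 + 2*p/3)
    HYX = (1 - 2*p) * np.log2(3)
    return HY - HYX

p = np.linspace(0, 0.5, 50_001)
best = np.argmax(I(p))
print("maximizing p ≈", p[best], " max I(X;Y) ≈", I(p)[best], "bit per channel use")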
channel models: general diagram

[Diagram: inputs x1, x2, ..., xn; outputs y1, y2, ..., ym; arrows labelled with the transition probabilities P1|1, P2|1, P1|2, P2|2, ..., Pm|n.]

Input alphabet X = {x1, x2, ..., xn}
Output alphabet Y = {y1, y2, ..., ym}
Pj|i = PY|X(yj|xi)
In general, calculating capacity needs more theory.
The statistical behavior of the channel is completely defined by
the channel transition probabilities Pj|i = PY|X(yj|xi)
* clue:
I(X;Y) is a concave (convex-∩) function of the input probabilities,
i.e. finding the maximum is simple: any local maximum is the global maximum
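As a sketch of what such a computation looks like, the function below evaluates I(X;Y) in bits for an arbitrary transition matrix; the capacity is then the maximum over the input distribution (here found by a crude 1-D search for a binary-input channel, purely as an illustration):

import numpy as np

def mutual_information(p_x, P):
    # I(X;Y) in bits; P[i, j] = P(y_j | x_i), p_x is the input distribution
    p_x = np.asarray(p_x, float)
    P = np.asarray(P, float)
    p_y = p_x @ P                        # output distribution
    joint = p_x[:, None] * P             # P(x_i, y_j)
    ref = p_x[:, None] * p_y[None, :]    # product of the marginals
    mask = joint > 0
    return np.sum(joint[mask] * np.log2(joint[mask] / ref[mask]))

# example: BSC with p = 0.1; the maximum over p_x is 1 - h(0.1) ≈ 0.531,
# reached at the uniform input distribution
P_bsc = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
best = max((mutual_information([q, 1-q], P_bsc), q) for q in np.linspace(0, 1, 1001))
print("capacity ≈ %.3f bit/use at P(X=0) = %.2f" % best)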
Channel capacity: converse

For R > C the decoding error probability > 0

[Plot: lower bound on Pe versus the rate k/n; the bound is 0 for k/n ≤ C and grows for k/n > C.]
Converse: for a discrete memoryless channel

Xi → channel → Yi

I(Xn;Yn) = H(Yn) - Σi=1..n H(Yi|Xi) ≤ Σi=1..n H(Yi) - Σi=1..n H(Yi|Xi) = Σi=1..n I(Xi;Yi) ≤ nC
Source generates one out of 2^k equiprobable messages m

source → encoder → Xn → channel → Yn → decoder → estimate m̂

Let Pe = probability that m̂ ≠ m
converse

R := k/n for any code

k = H(M) = I(M;Yn) + H(M|Yn)
  ≤ I(Xn;Yn) + H(M|Yn)                (Xn is a function of M)
  ≤ I(Xn;Yn) + H(M|M̂)                 (M̂ is a function of Yn)
  ≤ I(Xn;Yn) + h(Pe) + Pe log2(2^k)   (Fano inequality)
  ≤ nC + 1 + k Pe

Pe ≥ 1 - C/R - 1/(nR)
Hence:
for large n, and R > C,
the probability of error Pe > 0
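A quick numeric illustration of this lower bound; the values C ≈ 0.531 (a BSC with p = 0.1), R = 0.75 and the block lengths n are chosen only as an example:

def pe_lower_bound(C, R, n):
    # converse: Pe >= 1 - C/R - 1/(n*R), clipped at 0
    return max(0.0, 1 - C/R - 1/(n*R))

for n in (10, 100, 1000):
    print(n, pe_lower_bound(C=0.531, R=0.75, n=n))   # approaches 1 - C/R ≈ 0.29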
Appendix:
Assume:
binary sequence with P(0) = 1 - P(1) = 1 - p
t is the # of 1s in the sequence

Then, for n → ∞ and ε > 0,
weak law of large numbers:
Probability ( |t/n - p| > ε ) → 0
i.e. we expect, with high probability, pn 1s
Appendix:
Consequence:

1. n(p-ε) < t < n(p+ε) with high probability

2. Σ from t = n(p-ε) to n(p+ε) of (n choose t) ≈ 2εn (n choose pn) ≈ 2εn 2^{nh(p)}

3. lim n→∞ (1/n) log2 [ 2εn (n choose pn) ] = h(p)

   where h(p) = -p log2 p - (1-p) log2 (1-p)
Homework: prove the approximation using ln N! ~ N lnN for N large.
Or use the Stirling approximation:
N! ≈ √(2πN) · N^N · e^{-N}
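A small numeric check of the approximation (n and p are illustrative):

from math import comb, log2

def h(p):
    return -p*log2(p) - (1-p)*log2(1-p)

n, p = 1000, 0.1
print(log2(comb(n, int(p*n))) / n)   # (1/n) log2 C(n, pn), slightly below h(0.1)
print(h(p))                          # h(0.1) ≈ 0.469; the gap shrinks as n grows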
Binary Entropy:
h(p) = -p log2 p - (1-p) log2 (1-p)

Note: h(p) = h(1-p)

[Plot: h(p) versus p for 0 ≤ p ≤ 1; the curve is symmetric about p = 0.5, where it reaches its maximum value 1.]
Capacity for Additive White Gaussian Noise

Input X → (+ noise) → Output Y

Cap = sup over p(x) of [ H(Y) - H(Noise) ],  subject to σx² ≤ S/2W

W is the (single-sided) bandwidth
Input X is Gaussian with power spectral density (psd) S/2W;
Noise is Gaussian with psd σ²noise
Output Y is Gaussian with psd σy² = S/2W + σ²noise
For Gaussian channels: σy² = σx² + σ²noise

Cap = ½ log2( 2πe (σx² + σ²noise) ) - ½ log2( 2πe σ²noise )  bits/trans.
    = ½ log2( (σx² + σ²noise) / σ²noise )  bits/trans.

Cap = W log2( (σ²noise + S/2W) / σ²noise )  bits/sec.

using p(z) = (1/√(2πσz²)) exp( -z²/(2σz²) );   H(Z) = ½ log2( 2πe σz² ) bits
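A short sketch evaluating the bits/sec formula in the slide's own variables (the numbers passed in are purely illustrative):

from math import log2

def awgn_capacity(S, W, sigma2_noise):
    # Cap = W * log2( (sigma2_noise + S/(2W)) / sigma2_noise ) bits/sec,
    # with S the signal power, W the single-sided bandwidth and
    # sigma2_noise the noise power spectral density
    return W * log2((sigma2_noise + S / (2 * W)) / sigma2_noise)

# S/(2*W*sigma2_noise) = 1, so the result is W * log2(2) = 1e6 bits/sec
print(awgn_capacity(S=2.0, W=1.0e6, sigma2_noise=1.0e-6))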
Middleton type of burst channel model

Select channel k with probability Q(k);
channel k has transition probability p(k)

[Diagram: a binary channel whose 0↔1 transition probability is determined by the currently selected channel 1, 2, ..., k.]
Fritzman model:
multiple states G and only one state B
Closer to an actual real-world channel

[Diagram: good states G1, ..., Gn with error probability 0; bad state B with error probability h.]
Interleaving: from bursty to random

message → encoder → interleaver → bursty channel → interleaver^-1 → decoder → message
(after de-interleaving the burst errors look like random errors)

Note: interleaving brings encoding and decoding delay

Homework: compare block and convolutional interleaving w.r.t. delay
Interleaving: block

Channel models are difficult to derive:
- burst definition?
- random and burst errors?
For practical reasons: convert burst errors into random errors

Block interleaver: read in row-wise, transmit column-wise
De-Interleaving: block

read in column-wise, read out row-wise;
a burst on the channel is spread out so that each row contains at most 1 error
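A minimal sketch of block (de-)interleaving with an n_rows × n_cols matrix (the sizes and data are illustrative):

import numpy as np

def block_interleave(bits, n_rows, n_cols):
    # write row-wise into an n_rows x n_cols array, transmit column-wise
    return np.asarray(bits).reshape(n_rows, n_cols).T.ravel()

def block_deinterleave(bits, n_rows, n_cols):
    # write the received stream column-wise, read out row-wise
    return np.asarray(bits).reshape(n_cols, n_rows).T.ravel()

data = np.arange(12)                 # stand-in for 12 coded symbols (3 rows x 4 columns)
tx = block_interleave(data, 3, 4)
print(tx)                            # [0 4 8 1 5 9 ...]: a burst of 3 on the channel
                                     # hits symbols 0, 4, 8, i.e. one per row
rx = block_deinterleave(tx, 3, 4)
print(np.array_equal(rx, data))      # True: the original order is restored at the receiver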
Interleaving: convolutional

The stream is split over m branches: input sequence 0 gets no delay,
input sequence 1 a delay of b elements, ..., input sequence m-1 a delay of (m-1)b elements;
the branch outputs are multiplexed back into one stream (in → out).

Example: b = 5, m = 3
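A minimal sketch of such a convolutional interleaver, with branch i delaying its symbols by i·b positions (b = 5, m = 3 as in the example; the start-up filler value 0 is an assumption):

from collections import deque

def conv_interleave(symbols, m=3, b=5, fill=0):
    # branch i (i = 0..m-1) delays its symbols by i*b positions; branches are used in turn
    delays = [deque([fill] * (i * b)) for i in range(m)]   # one delay line per branch
    out = []
    for t, s in enumerate(symbols):
        line = delays[t % m]          # commutator selects branch 0, 1, ..., m-1, 0, ...
        line.append(s)
        out.append(line.popleft())    # the symbol leaving the delay line goes to the channel
    return out

print(conv_interleave(list(range(12))))   # early outputs are filler until the delay lines load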