Lecture 8: Channel Capacity, Continuous Random Variables
1 Channel Capacity
Given a channel with inputs X and outputs Y with transition probability P(Y|X):

Define: The channel capacity C is the maximal rate of reliable communication over the memoryless channel characterized by P(Y|X).

Further, recall the following definition:
\[
C^{(I)} = \max_{P_X} I(X;Y).
\]
Theorem. \( C = C^{(I)} \).
Proof: We will see this proof in the coming lectures.
• This theorem is important because C is defined operationally and is hard to compute directly, whereas C^{(I)} is a tractable optimization problem (a small numerical sketch follows).
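As a concrete illustration of why C^{(I)} is tractable, here is a minimal numerical sketch (not from the lecture; it assumes numpy, and the binary-input channel matrix is chosen only for illustration) that evaluates I(X;Y) for candidate input distributions and grid-searches over P_X = Ber(q):

```python
# Minimal sketch: compute C^(I) = max_{P_X} I(X;Y) for a binary-input channel
# specified by its transition matrix W[x, y] = P(Y = y | X = x).
# The channel below (a Z-channel-like example) is chosen only for illustration.
import numpy as np

def mutual_information(p_x, W):
    """I(X;Y) in bits for input pmf p_x and channel matrix W (rows = inputs)."""
    p_xy = p_x[:, None] * W               # joint pmf P(x, y)
    p_y = p_xy.sum(axis=0)                # output marginal P(y)
    prod = p_x[:, None] * p_y[None, :]    # product of marginals
    mask = p_xy > 0
    return np.sum(p_xy[mask] * np.log2(p_xy[mask] / prod[mask]))

W = np.array([[1.0, 0.0],                 # input 0 is received perfectly
              [0.2, 0.8]])                # input 1 is flipped with prob. 0.2

qs = np.linspace(1e-6, 1 - 1e-6, 2001)    # P_X = Ber(q), grid over q
vals = [mutual_information(np.array([1 - q, q]), W) for q in qs]
print("numerical C^(I) ~", max(vals), "bits")
```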
1.1 Examples
Example I. Channel capacity of a Binary Symmetric Channel (BSC).

Define alphabets X = {0, 1} and Y = {0, 1}. Each input bit is received correctly with probability 1 − p and flipped with probability p. This can also be expressed in the form of additive noise: Y = X ⊕ Z, where Z ∼ Ber(p) is independent of X.
To determine the channel capacity of a BSC, by the theorem we must maximize the mutual information:
\[
I(X;Y) = H(Y) - H(Y|X) = H(Y) - h_2(p) \le 1 - h_2(p),
\]
where h_2(p) is the binary entropy function. Taking X ∼ Ber(1/2) achieves equality: I(X;Y) = 1 − h_2(p) (note: by symmetry, X ∼ Ber(1/2) implies Y ∼ Ber(1/2)). Thus, C = 1 − h_2(p).
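A quick numerical sanity check of this formula (my own sketch, assuming numpy; the value p = 0.11 is arbitrary): grid-search I(X;Y) = H(Y) − H(Y|X) over input distributions Ber(q) and compare the maximum with 1 − h_2(p).

```python
# Sketch: verify numerically that the BSC capacity equals 1 - h2(p) and that
# the maximum is attained at X ~ Ber(1/2).
import numpy as np

def h2(p):
    """Binary entropy in bits."""
    return -p*np.log2(p) - (1 - p)*np.log2(1 - p)

def bsc_mi(q, p):
    """I(X;Y) = H(Y) - H(Y|X) for X ~ Ber(q) over a BSC(p)."""
    r = q*(1 - p) + (1 - q)*p        # P(Y = 1)
    return h2(r) - h2(p)

p = 0.11
qs = np.linspace(0.01, 0.99, 9801)
best_q = max(qs, key=lambda q: bsc_mi(q, p))
print("maximizing q ~", round(best_q, 3))      # ~ 0.5
print("max I(X;Y)  ~", bsc_mi(best_q, p))
print("1 - h2(p)   =", 1 - h2(p))
```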
Example II. Channel capacity of a Binary Erasure Channel (BEC).
Define alphabets X = {0, 1} and Y = {0, 1, e}, where e stands for erasure. Any input symbol X_i has probability 1 − α of being retained in the output sequence and probability α of being erased (mapped to e).
Examining the mutual information, we have that
\[
I(X;Y) = H(X) - H(X|Y) = H(X) - \alpha H(X) = (1 - \alpha) H(X) \le 1 - \alpha,
\]
since H(X|Y) = αH(X): when Y ≠ e the input is known exactly, and when Y = e the input retains its prior distribution. Equality is achieved when H(X) = 1, that is, X ∼ Ber(1/2). Thus, the capacity of the BEC is C = 1 − α.
Note that if we knew in advance exactly which positions were going to be erased, we could communicate at this rate by sending our information bits in exactly the unerased positions (since the expected fraction of unerased positions is 1 − α). The fact that C = 1 − α indicates that we can achieve this rate even when we do not know which positions are going to be erased.
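The same kind of numerical check works for the BEC (again my own sketch, assuming numpy; α = 0.3 is arbitrary): the grid-searched maximum of I(X;Y) over Ber(q) inputs should come out to 1 − α.

```python
# Sketch: verify numerically that the BEC capacity equals 1 - alpha.
import numpy as np

def mutual_information(p_x, W):
    """I(X;Y) in bits for input pmf p_x and channel matrix W[x, y] = P(y|x)."""
    p_xy = p_x[:, None] * W
    p_y = p_xy.sum(axis=0)
    prod = p_x[:, None] * p_y[None, :]
    mask = p_xy > 0
    return np.sum(p_xy[mask] * np.log2(p_xy[mask] / prod[mask]))

alpha = 0.3
W_bec = np.array([[1 - alpha, 0.0, alpha],     # output alphabet: 0, 1, e
                  [0.0, 1 - alpha, alpha]])

qs = np.linspace(1e-6, 1 - 1e-6, 2001)
vals = [mutual_information(np.array([1 - q, q]), W_bec) for q in qs]
print("numerical max I(X;Y) ~", max(vals))     # ~ 0.7
print("1 - alpha            =", 1 - alpha)
```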
2 Continuous Random Variables
Definition: The differential entropy of a continuous random variable X with probability density function f_X is
\[
h(X) = -\int f_X(x) \log f_X(x)\, dx = E\left[-\log f_X(X)\right].
\]
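To make the definition concrete, here is a small sketch (not from the notes; it assumes numpy/scipy and uses X ∼ Exp(1), whose differential entropy is known to be 1 nat = 1/ln 2 ≈ 1.443 bits) that evaluates h(X) = E[−log f_X(X)] by numerical integration:

```python
# Sketch: compute h(X) = -∫ f(x) log2 f(x) dx numerically for X ~ Exp(1).
import numpy as np
from scipy.integrate import quad

f = lambda x: np.exp(-x)                           # pdf of Exp(1) on [0, ∞)
integrand = lambda x: -f(x) * np.log2(f(x))
h_num, _ = quad(integrand, 0, 50)                  # tail beyond 50 is negligible
print("numerical h(X) =", h_num)                   # ≈ 1.4427 bits
print("1 / ln 2       =", 1 / np.log(2))
```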
2.1 Exercises
Exercise: Show that for continuous random variables X and Y,
\[
I(X;Y) = h(Y) - h(Y|X) = h(X) - h(X|Y) = h(X) + h(Y) - h(X,Y).
\]
Proof.
\begin{align*}
I(X;Y) &= \int f_{X,Y}(x,y) \log \frac{f_{X,Y}(x,y)}{f_X(x) f_Y(y)} \, dx\, dy \\
&= \int f_{X,Y}(x,y) \log \frac{f_{X,Y}(x,y)}{f_X(x)} \, dx\, dy - \int f_{X,Y}(x,y) \log f_Y(y)\, dx\, dy \\
&= \int f_X(x) \left( \int f_{Y|X}(y|x) \log f_{Y|X}(y|x)\, dy \right) dx - \int f_Y(y) \log f_Y(y)\, dy \\
&= -h(Y|X) + h(Y).
\end{align*}
Symmetrically, the same argument shows I(X;Y) = h(X) − h(X|Y). Also,
\begin{align*}
I(X;Y) &= \int f_{X,Y}(x,y) \log \frac{f_{X,Y}(x,y)}{f_X(x) f_Y(y)} \, dx\, dy \\
&= \int f_{X,Y}(x,y) \log f_{X,Y}(x,y)\, dx\, dy - \int f_{X,Y}(x,y) \log f_X(x)\, dx\, dy - \int f_{X,Y}(x,y) \log f_Y(y)\, dx\, dy \\
&= -h(X,Y) + h(X) + h(Y).
\end{align*}
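A numerical check of the identity I(X;Y) = h(X) + h(Y) − h(X,Y) (my own sketch, assuming numpy; the bivariate Gaussian with ρ = 0.6 and the grid bounds are arbitrary choices): evaluate the differential entropies and the mutual-information integral on a grid and compare.

```python
# Sketch: check I(X;Y) = h(X) + h(Y) - h(X,Y) on a standard bivariate Gaussian.
import numpy as np

rho = 0.6
xs = np.linspace(-8, 8, 801)
dx = xs[1] - xs[0]
X, Y = np.meshgrid(xs, xs, indexing="ij")

det = 1 - rho**2                                   # determinant of the covariance
f_xy = np.exp(-(X**2 - 2*rho*X*Y + Y**2) / (2*det)) / (2*np.pi*np.sqrt(det))
f_x = f_xy.sum(axis=1) * dx                        # marginal densities on the grid
f_y = f_xy.sum(axis=0) * dx

def h_grid(f, dvol):
    """-∫ f log2 f, evaluated by a Riemann sum."""
    return -np.sum(f * np.log2(f)) * dvol

lhs = np.sum(f_xy * np.log2(f_xy / (f_x[:, None] * f_y[None, :]))) * dx * dx
rhs = h_grid(f_x, dx) + h_grid(f_y, dx) - h_grid(f_xy, dx*dx)
print("direct I(X;Y)    =", lhs)
print("h(X)+h(Y)-h(X,Y) =", rhs)
print("closed form      =", -0.5*np.log2(1 - rho**2))
```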
Exercise: Show that h(X + c) = h(X) and that h(aX) = h(X) + \log|a| for a ≠ 0.
Proof. We will combine them and prove that h(aX + b) = h(X) + \log|a| whenever a ≠ 0. Let Y = aX + b. Then the pdf of Y can be written as
\[
f_Y(y) = \frac{1}{|a|}\, f_X\!\left(\frac{y - b}{a}\right).
\]
Thus,
\begin{align*}
h(aX + b) = h(Y) &= -\int f_Y(y) \log f_Y(y)\, dy \\
&= -\int \frac{1}{|a|}\, f_X\!\left(\frac{y-b}{a}\right) \log\left[\frac{1}{|a|}\, f_X\!\left(\frac{y-b}{a}\right)\right] dy \\
&= -\int f_X(x) \left[\log f_X(x) - \log|a|\right] dx \qquad \text{(substituting } x = (y-b)/a,\ dy = |a|\, dx\text{)} \\
&= h(X) + \log|a|.
\end{align*}
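A quick numerical check of this scaling property (my own sketch, assuming numpy/scipy; X ∼ N(0,1), a = 3, b = 5 are arbitrary choices), computing both entropies by integration:

```python
# Sketch: verify h(aX + b) - h(X) = log2|a| by numerical integration.
import numpy as np
from scipy.integrate import quad

a, b = 3.0, 5.0
f_X = lambda x: np.exp(-x**2 / 2) / np.sqrt(2*np.pi)    # pdf of N(0, 1)
f_Y = lambda y: f_X((y - b) / a) / abs(a)               # pdf of Y = aX + b

h = lambda f, lo, hi: quad(lambda t: -f(t) * np.log2(f(t)), lo, hi)[0]

h_X = h(f_X, -10, 10)
h_Y = h(f_Y, b - 10*a, b + 10*a)
print("h(Y) - h(X) =", h_Y - h_X)
print("log2|a|     =", np.log2(abs(a)))
```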
2.2 Examples
Example I: Differential entropy of a uniform random variable X ∼ Unif[a, b].
\[
h(X) = -\int_a^b \frac{1}{b-a} \log \frac{1}{b-a}\, dx = \log(b - a).
\]
• Notice that the differential entropy can be negative or positive depending on whether b − a is less than or greater than 1. In practice, because of this property, differential entropy is usually used as a means to determine mutual information and does not have much operational significance by itself.
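A tiny numerical illustration of this sign change (my own sketch, assuming numpy; the widths are arbitrary):

```python
# Sketch: h(Unif[a, b]) = log2(b - a) changes sign at width b - a = 1.
import numpy as np
for width in (0.25, 1.0, 4.0):
    print(f"b - a = {width}:  h = {np.log2(width):+.3f} bits")
```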
Example II: Differential entropy of a Gaussian random variable X ∼ N(0, σ²).

• The pdf of a Gaussian random variable is
\[
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-x^2/(2\sigma^2)}.
\]
The differential entropy is:
For simplicity, convert the base to e:
\begin{align*}
h(X) &= \frac{1}{\ln 2}\, E[-\ln f(X)] \\
&= \frac{1}{\ln 2}\, E\!\left[\frac{1}{2}\ln 2\pi\sigma^2 + \frac{1}{2\sigma^2} X^2\right] \\
&= \frac{1}{\ln 2}\left(\frac{1}{2}\ln 2\pi\sigma^2 + \frac{1}{2\sigma^2}\, E[X^2]\right) \\
&= \frac{1}{\ln 2}\left(\frac{1}{2}\ln 2\pi\sigma^2 + \frac{1}{2\sigma^2}\, \sigma^2\right) \\
&= \frac{1}{\ln 2}\cdot\frac{1}{2}\ln 2\pi e\sigma^2 = \frac{1}{2}\log 2\pi e\sigma^2.
\end{align*}
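A numerical cross-check of this formula (my own sketch, assuming numpy/scipy; σ = 2 is arbitrary): compare ½ log₂(2πeσ²) against direct integration of −∫ f log₂ f.

```python
# Sketch: compare the closed-form Gaussian differential entropy with numerical
# integration of -∫ f(x) log2 f(x) dx for f = N(0, sigma^2).
import numpy as np
from scipy.integrate import quad

sigma = 2.0
f = lambda x: np.exp(-x**2 / (2*sigma**2)) / np.sqrt(2*np.pi*sigma**2)
h_num, _ = quad(lambda x: -f(x) * np.log2(f(x)), -10*sigma, 10*sigma)
print("numerical h(X) =", h_num)
print("closed form    =", 0.5*np.log2(2*np.pi*np.e*sigma**2))
```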
• Per the shift-invariance exercise above, differential entropy is invariant to constant shifts. Therefore this expression gives the differential entropy of any Gaussian random variable with variance σ², regardless of its mean.
• Claim: The Gaussian distribution has maximal differential entropy, i.e., if X ∼ f_X with second moment E[X²] ≤ σ² and G ∼ N(0, σ²), then h(X) ≤ h(G). Equality holds if and only if X ∼ N(0, σ²).
Proof:
\begin{align*}
0 \le D(f_X \,\|\, f_G) &= E\!\left[\log \frac{f_X(X)}{f_G(X)}\right] \\
&= -h(X) + E\!\left[\log \frac{1}{f_G(X)}\right] \\
&= -h(X) + E\!\left[\log \sqrt{2\pi\sigma^2} + \frac{X^2}{2\sigma^2 \ln 2}\right] \\
&= -h(X) + \frac{1}{2}\log 2\pi\sigma^2 + \frac{E[X^2]}{2\sigma^2 \ln 2} \\
&\le -h(X) + \frac{1}{2}\log 2\pi\sigma^2 + \frac{1}{2\ln 2} \\
&= -h(X) + \frac{1}{2}\log 2\pi e\sigma^2 = -h(X) + h(G),
\end{align*}
where the last inequality uses E[X²] ≤ σ². Rearranging gives h(X) ≤ h(G), with equality if and only if D(f_X ‖ f_G) = 0, that is, X ∼ N(0, σ²).
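A numerical illustration of the claim and its proof (my own sketch, assuming numpy/scipy; the uniform competitor is an arbitrary choice): take X uniform with the same second moment σ² as G ∼ N(0, σ²), and check both that h(X) < h(G) and that D(f_X ‖ f_G) = h(G) − h(X) when E[X²] = σ².

```python
# Sketch: the Gaussian maximizes differential entropy under a second-moment
# constraint.  Compare a uniform X with E[X^2] = sigma^2 against G ~ N(0, sigma^2).
import numpy as np
from scipy.integrate import quad

sigma = 1.0
c = np.sqrt(3) * sigma                              # Unif[-c, c] has E[X^2] = sigma^2
f_X = lambda x: 1/(2*c) if abs(x) <= c else 0.0
f_G = lambda x: np.exp(-x**2/(2*sigma**2)) / np.sqrt(2*np.pi*sigma**2)

h_X = np.log2(2*c)                                  # closed form for the uniform
h_G = 0.5*np.log2(2*np.pi*np.e*sigma**2)
D, _ = quad(lambda x: f_X(x) * np.log2(f_X(x) / f_G(x)), -c, c)

print("h(X) =", h_X, " h(G) =", h_G)                # h(X) < h(G)
print("D(f_X||f_G)  =", D)
print("h(G) - h(X)  =", h_G - h_X)                  # equal, since E[X^2] = sigma^2
```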
Looking ahead, for a channel whose input is subject to a power constraint, define
\[
C(P) = \max_{P_X:\, E[X^2] \le P} I(X;Y),
\]
where C(P) represents the 'capacity': the maximal rate of reliable communication when constrained to power P. This maximization problem will be addressed in the next lecture.