Lec03 - Entropy

Information Theory

Entropy
TS. Lê Nguyên Khôi
Trường Đại học Công nghệ, ĐHQGHN
Content
 Entropy
 Joint entropy and conditional entropy
 Relative entropy and mutual information
 Chain rules for entropy, relative entropy, and
mutual information
 Jensen's inequality and consequences

Lê Nguyên Khôi Information Theory 1


Information Theory
 Information theory studies the quantification,
representation, and communication of
information
 What is information?
 Oxford English Dictionary says:
 facts provided or learned about something or
someone
 what is conveyed or represented by a particular
arrangement or sequence of things

Lê Nguyên Khôi Information Theory 2


What is Information
 Webster's says:
 the communication or reception of knowledge or intelligence
 knowledge obtained from investigation, study, or instruction.
 the attribute inherent in and communicated by one of two or
more alternative sequences or arrangements of something that
produce specific effects
 a signal or character representing data
 something (as a message, experimental data, or a picture)
which justifies change in a construct (as a plan or theory) that
represents physical or mental experience or another construct
 a quantitative measure of the content of information;
specifically: a numerical quantity that measures the
uncertainty in the outcome of an experiment to be performed

Lê Nguyên Khôi Information Theory 3


What is Information
 Wikipedia says:
 Information in its most restricted technical sense is a message
(utterance or expression) or collection of messages in an
ordered sequence that consists of symbols, or it is the meaning
that can be interpreted from such a message or collection of
messages. Information can be recorded or transmitted. It can be
recorded as signs, or conveyed as signals. Information is any
kind of event that affects the state of a dynamic system. The
concept has numerous other meanings in different contexts.
Moreover, the concept of information is closely related to
notions of constraint, communication, control, data, form,
instruction, knowledge, meaning, mental stimulus, pattern,
perception, representation, and especially entropy

Lê Nguyên Khôi Information Theory 4


Information Theory
 Information theory is a formal mathematical
theory; it uses probability and statistics to make
these notions mathematically precise
 The key idea is communication: relaying a message
from someone to someone else
 I'm communicating to you now, hopefully
 How much am I communicating to you?
 Can we quantify it?

Lê Nguyên Khôi Information Theory 5


Notation
 Vector & Matrix:
 𝑣 = vector, 𝑉 = matrix
 Random variable (r.v.) 𝑿
 𝑥 = specific value
 𝒳 = set of 𝑥 (alphabet)
 Range:
 𝑎: 𝑏 denotes the range 𝑎, 𝑎 + 1, … , 𝑏
 Cardinality (number of elementary outcomes):
 |𝒳| = the number of elements in the set 𝒳
 Probability vector of a r.v. 𝑿: 𝒑𝑿

Lê Nguyên Khôi Information Theory 6


Discrete Random Variable
 A random variable 𝑿 takes a value 𝑥 from the
alphabet 𝒳 with probability 𝑝𝑿(𝑥).
 These probabilities form the vector 𝒑𝑿.
 Example:
 Dice: 𝒳 = [1; 2; 3; 4; 5; 6], 𝒑𝑿 = [1/6; 1/6; 1/6; 1/6; 1/6; 1/6]
 English text: 𝒳 = [a; b; … ; x; y; z; <space>],
𝒑𝑿 = [0.058; 0.013; … ; 0.016; 0.007; 0.193]

Lê Nguyên Khôi Information Theory 7


Expected Value
 If 𝑔(𝑥) is a function defined on 𝒳, then
𝐸𝑿[𝑔(𝑥)] = Σ_{𝑥∈𝒳} 𝑝(𝑥) 𝑔(𝑥)
 Example:
 Dice: 𝒳 = [1; 2; 3; 4; 5; 6], 𝒑𝑿 = [1/6; 1/6; 1/6; 1/6; 1/6; 1/6]
 𝐸[𝑿] = 3.5 = 𝜇 ← the mean of 𝑿
 𝐸[𝑿²] = 15.17 = 𝜎² + 𝜇², where 𝜎² is the variance of 𝑿
 𝐸[sin(0.1𝑥)] = 0.338
 𝐸[− log₂ 𝑝(𝑿)] = 2.58 ← the entropy of 𝑿
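A quick numerical check of these values (a minimal Python sketch; the helper name expect and the dictionary encoding of 𝒑𝑿 are illustrative, not from the slides):

```python
import math

# Fair die: alphabet {1,...,6}, each value with probability 1/6
alphabet = [1, 2, 3, 4, 5, 6]
p = {x: 1/6 for x in alphabet}

def expect(g):
    """E[g(X)] = sum over x of p(x) * g(x)."""
    return sum(p[x] * g(x) for x in alphabet)

print(expect(lambda x: x))                  # ~3.5  (mean)
print(expect(lambda x: x**2))               # ~15.17
print(expect(lambda x: math.sin(0.1 * x)))  # ~0.338
print(expect(lambda x: -math.log2(p[x])))   # ~2.58 bits (entropy)
```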

Lê Nguyên Khôi Information Theory 8


Shannon Information Content
 The Shannon Information Content (SIC) of an
outcome with probability 𝑝 is − log₂ 𝑝
 Shannon's contribution – a statistical view
 Messages and noisy channels are random
 Pre-Shannon era: deterministic approach
 Example 1: coin tossing
 𝒳 = [Head; Tail], 𝒑𝑿 = [1/2; 1/2], SIC = [1; 1] bits
 Example 2: Is today your birthday?
 𝒳 = [No; Yes], 𝒑𝑿 = [364/365; 1/365], SIC = [0.004; 8.512] bits
Unlikely outcomes give more information
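The birthday numbers can be reproduced directly from the definition SIC(𝑝) = − log₂ 𝑝 (a Python sketch; the function name sic is illustrative):

```python
import math

def sic(p):
    """Shannon information content of an outcome with probability p, in bits."""
    return -math.log2(p)

# Example 2: "Is today your birthday?"
print(round(sic(364/365), 3))  # No  -> ~0.004 bits
print(round(sic(1/365), 3))    # Yes -> ~8.512 bits
```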

Lê Nguyên Khôi Information Theory 9


Entropy – Definition
 Entropy measures the uncertainty of a
random variable
 Entropy of r.v. 𝑿 is:
 the average SIC of r.v. 𝑿
 the average uncertainty of r.v. 𝑿
 Formula:
𝐻(𝑿) = 𝐸[− log₂ 𝑝(𝑿)] = − Σ_{𝑥∈𝒳} 𝑝(𝑥) log₂ 𝑝(𝑥)
where 𝑝(𝑥) = 𝑃(𝑿 = 𝑥), 𝑥 ∈ 𝒳

Lê Nguyên Khôi Information Theory 10


Entropy – Properties
 Formula:
𝐻(𝑿) = 𝐸[− log₂ 𝑝(𝑿)] = − Σ_{𝑥∈𝒳} 𝑝(𝑥) log₂ 𝑝(𝑥)
 Entropy is always non-negative
 Entropy 𝐻(𝑿) depends only on the probabilities 𝒑𝑿,
not on the values in 𝒳
 We use log(𝑥) = log₂(𝑥) and measure 𝐻(𝑿) in bits
 If log_𝑒() is used instead, 𝐻(𝑿) is measured in nats
 1 nat = log₂(𝑒) bits ≈ 1.44 bits

Lê Nguyên Khôi Information Theory 11


Entropy - Example
 Bernoulli random variable:
𝒳 = [0; 1], 𝒑𝑿 = [1 − 𝑝; 𝑝]
𝐻(𝑿) = −(1 − 𝑝) log(1 − 𝑝) − 𝑝 log 𝑝 ≝ 𝐻(𝑝)
 Given a r.v. 𝑿:
 𝒳 = [R; G; B; W], 𝒑𝑿 = [1/2; 1/4; 1/8; 1/8]
 Compute the SIC of each outcome: − log₂ 𝑝(𝑥)
 and 𝐻(𝑿), the entropy of 𝑿: 𝐸[− log₂ 𝑝(𝑿)]
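One way to check the answer numerically (a Python sketch with illustrative helper names): for 𝒑𝑿 = [1/2; 1/4; 1/8; 1/8] it gives SIC = [1; 2; 3; 3] bits and 𝐻(𝑿) = 1.75 bits.

```python
import math

p = {"R": 1/2, "G": 1/4, "B": 1/8, "W": 1/8}

def sic(prob):
    """Shannon information content, in bits."""
    return -math.log2(prob)

def entropy(dist):
    """H(X) = -sum p(x) log2 p(x), skipping zero-probability outcomes."""
    return sum(prob * sic(prob) for prob in dist.values() if prob > 0)

print({x: sic(prob) for x, prob in p.items()})  # {'R': 1.0, 'G': 2.0, 'B': 3.0, 'W': 3.0}
print(entropy(p))                               # 1.75 bits
```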

Lê Nguyên Khôi Information Theory 12


Interestingness
 Picking up the phone, we never know for sure who is calling
 A friend, an uncle, an aunt, …
 It is quite an interesting experiment to do
 Turning the key in my motorbike, it starts
 It is no surprise that my motorbike starts, since it is
well-maintained and has a great mechanic
 It is not a very interesting thing to do
 Most of our day consists of rather uninteresting
experiments
 Taking a step, greeting a friend, going to school, …
 If these events were suddenly to become interesting, we would
be in trouble

Lê Nguyên Khôi Information Theory 13


Entropy In A Nutshell

Low entropy: the sampled value (location of soup) comes entirely
from within the soup bowl
High entropy: the sampled value (location of soup) is unpredictable,
spread almost uniformly over the dining table

Lê Nguyên Khôi Information Theory 14


Entropy of Buckets

 Three buckets, with 8 letters each

Lê Nguyên Khôi Information Theory 15


Joint Entropy

 The entropy 𝐻(𝑿, 𝒀) of a pair of discrete
random variables (𝑿, 𝒀) with a joint distribution
𝑝(𝑿, 𝒀) is defined as:

𝐻(𝑿, 𝒀) = 𝐸[− log₂ 𝑝(𝑿, 𝒀)]
= − Σ_{𝑥∈𝒳} Σ_{𝑦∈𝒴} 𝑝(𝑥, 𝑦) log₂ 𝑝(𝑥, 𝑦)

Lê Nguyên Khôi Information Theory 16


Joint Entropy

 Compute 𝐻(𝑿, 𝒀) for the joint distribution

𝑝(𝑿, 𝒀)    𝒀 = 0    𝒀 = 1
𝑿 = 0       1/2      1/4
𝑿 = 1       0        1/4

𝐻(𝑿, 𝒀) = 𝐸[− log 𝑝(𝑿, 𝒀)]
= − (1/2) log(1/2) − (1/4) log(1/4) − 0 log 0 − (1/4) log(1/4)
= 1.5 bits   (using the convention 0 log 0 = 0)
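A short numerical check of this table (a Python sketch; the dictionary keyed by (𝑥, 𝑦) pairs is just one convenient encoding of the joint distribution):

```python
import math

# Joint distribution p(x, y) from the table above
p_xy = {(0, 0): 1/2, (0, 1): 1/4,
        (1, 0): 0.0, (1, 1): 1/4}

def joint_entropy(p):
    """H(X, Y) = -sum over (x, y) of p(x, y) log2 p(x, y), with 0 log 0 = 0."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

print(joint_entropy(p_xy))  # 1.5 bits
```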
Lê Nguyên Khôi Information Theory 17
Conditional Entropy

 The conditional entropy 𝐻(𝒀 | 𝑿) of a pair of discrete
random variables (𝑿, 𝒀) with a conditional
distribution 𝑝(𝒀 | 𝑿) is defined as:

𝐻(𝒀 | 𝑿) = −𝐸[log 𝑝(𝒀 | 𝑿)]
= Σ_{𝑥∈𝒳} 𝑝(𝑥) 𝐻(𝒀 | 𝑿 = 𝑥)
= − Σ_{𝑥∈𝒳} Σ_{𝑦∈𝒴} 𝑝(𝑥) 𝑝(𝑦 | 𝑥) log 𝑝(𝑦 | 𝑥)
= − Σ_{𝑥∈𝒳} Σ_{𝑦∈𝒴} 𝑝(𝑥, 𝑦) log 𝑝(𝑦 | 𝑥)

Lê Nguyên Khôi Information Theory 18


Conditional Entropy – Example
 Compute 𝐻(𝒀 | 𝑿) using
𝐻(𝒀 | 𝑿) = Σ_{𝑥∈𝒳} 𝑝(𝑥) 𝐻(𝒀 | 𝑿 = 𝑥)

𝑝(𝑿, 𝒀)    𝒀 = 0    𝒀 = 1    𝑝(𝑿)
𝑿 = 0       1/2      1/4      3/4
𝑿 = 1       0        1/4      1/4

𝑝(𝒀 | 𝑿)   𝒀 = 0    𝒀 = 1    𝐻(𝒀 | 𝑿 = 𝑥)    𝑝(𝑿)
𝑿 = 0       2/3      1/3      𝐻(2/3, 1/3)      3/4
𝑿 = 1       0        1        𝐻(0, 1)          1/4
Lê Nguyên Khôi Information Theory 19
Conditional Entropy – Example
 Take a weighted average of the entropy of
each row, using 𝑝(𝑥) as the weight

𝑝(𝒀 | 𝑿)   𝒀 = 0    𝒀 = 1    𝐻(𝒀 | 𝑿 = 𝑥)    𝑝(𝑿)
𝑿 = 0       2/3      1/3      𝐻(2/3, 1/3)      3/4
𝑿 = 1       0        1        𝐻(0, 1)          1/4

𝐻(𝒀 | 𝑿) = Σ_{𝑥∈𝒳} 𝑝(𝑥) 𝐻(𝒀 | 𝑿 = 𝑥)
= 3/4 × 𝐻(2/3, 1/3) + 1/4 × 𝐻(0, 1)
= 0.689 bits
Lê Nguyên Khôi Information Theory 20
Conditional Entropy – Example
 Alternatively, compute 𝐻(𝒀 | 𝑿) via the chain rule

𝑝(𝑿, 𝒀)    𝒀 = 0    𝒀 = 1    𝑝(𝑿)
𝑿 = 0       1/2      1/4      3/4
𝑿 = 1       0        1/4      1/4

𝐻(𝒀 | 𝑿) = 𝐸[− log 𝑝(𝒀 | 𝑿)] = 𝐸[− log (𝑝(𝑿, 𝒀) / 𝑝(𝑿))]
= 𝐸[− log 𝑝(𝑿, 𝒀)] − 𝐸[− log 𝑝(𝑿)]
= 𝐻(𝑿, 𝒀) − 𝐻(𝑿)
= 𝐻(1/2, 1/4, 0, 1/4) − 𝐻(3/4, 1/4)
= 0.689 bits
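Both routes can be checked numerically (a Python sketch reusing the same joint table; function names are illustrative):

```python
import math
from collections import defaultdict

p_xy = {(0, 0): 1/2, (0, 1): 1/4, (1, 0): 0.0, (1, 1): 1/4}

def entropy(probs):
    """H of a probability vector, in bits, with 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Marginal p(x)
p_x = defaultdict(float)
for (x, _y), p in p_xy.items():
    p_x[x] += p

# Route 1: weighted average of row entropies, H(Y|X) = sum_x p(x) H(Y|X=x)
h_weighted = sum(
    p_x[x] * entropy([p_xy[x, y] / p_x[x] for y in (0, 1)])
    for x in p_x
)

# Route 2: chain rule, H(Y|X) = H(X,Y) - H(X)
h_chain = entropy(p_xy.values()) - entropy(p_x.values())

print(round(h_weighted, 3), round(h_chain, 3))  # 0.689 0.689
```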
Lê Nguyên Khôi Information Theory 21
Conditional Entropy

 𝐻(𝒀 | 𝑿) is the average additional information in
𝒀 when you know 𝑿

(Diagram: the bar 𝐻(𝑿, 𝒀) splits into 𝐻(𝑿) followed by 𝐻(𝒀 | 𝑿))

Lê Nguyên Khôi Information Theory 22


Chain Rule

 Probability:
𝑝(𝑿, 𝒀) = 𝑝(𝒀 | 𝑿) 𝑝(𝑿)
 Joint entropy & conditional entropy:
𝐻(𝑿, 𝒀) = 𝐻(𝑿) + 𝐻(𝒀 | 𝑿)
 The log in the definition of entropy converts
products of probabilities into sums of entropies
 Corollary:
𝐻(𝑿, 𝒀 | 𝒁) = 𝐻(𝑿 | 𝒁) + 𝐻(𝒀 | 𝑿, 𝒁)

Lê Nguyên Khôi Information Theory 23


Exercise

 Given a joint probability distribution 𝑝(𝑿, 𝒀)

𝑝(𝑿, 𝒀)    𝒀 = 1    𝒀 = 2    𝒀 = 3    𝒀 = 4
𝑿 = 1       1/8      1/16     1/32     1/32
𝑿 = 2       1/16     1/8      1/32     1/32
𝑿 = 3       1/16     1/16     1/16     1/16
𝑿 = 4       1/4      0        0        0

 Compute: 𝐻(𝑿), 𝐻(𝒀), 𝐻(𝑿 | 𝒀)
 Compute: 𝐻(𝑿, 𝒀), 𝐻(𝒀 | 𝑿)
Lê Nguyên Khôi Information Theory 24
Example
 The marginal distribution of 𝑿 is [1/4; 1/4; 1/4; 1/4]
 The marginal distribution of 𝒀 is [1/2; 1/4; 1/8; 1/8]
 It is easy to verify that 𝐻(𝑿) = 2 and 𝐻(𝒀) = 7/4
𝐻(𝑿 | 𝒀) = 𝐻(𝑿, 𝒀) − 𝐻(𝒀)

𝐻(𝑿, 𝒀) = − Σ_{𝑥=1}^{4} Σ_{𝑦=1}^{4} 𝑝(𝑥, 𝑦) log 𝑝(𝑥, 𝑦)
= − (1/4) log(1/4) − 2 × (1/8) log(1/8) − 6 × (1/16) log(1/16) − 4 × (1/32) log(1/32)
= 1/2 + 3/4 + 3/2 + 5/8 = (4 + 6 + 12 + 5)/8 = 27/8

𝐻(𝑿 | 𝒀) = 27/8 − 7/4 = 13/8
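A sketch that recomputes these quantities directly from the table (Python; helper names are illustrative):

```python
import math
from fractions import Fraction as F

# Joint distribution p(x, y) from the exercise, rows x = 1..4, columns y = 1..4
table = [
    [F(1, 8),  F(1, 16), F(1, 32), F(1, 32)],
    [F(1, 16), F(1, 8),  F(1, 32), F(1, 32)],
    [F(1, 16), F(1, 16), F(1, 16), F(1, 16)],
    [F(1, 4),  0,        0,        0],
]

def entropy(probs):
    """H of a probability vector, in bits, with 0 log 0 = 0."""
    return -sum(float(p) * math.log2(float(p)) for p in probs if p > 0)

p_x = [sum(row) for row in table]                  # [1/4, 1/4, 1/4, 1/4]
p_y = [sum(col) for col in zip(*table)]            # [1/2, 1/4, 1/8, 1/8]
h_xy = entropy([p for row in table for p in row])  # 27/8 = 3.375

print(entropy(p_x), entropy(p_y))  # 2.0  1.75
print(h_xy, h_xy - entropy(p_y))   # 3.375  1.625 (= 13/8)
```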
Lê Nguyên Khôi Information Theory 25
Chain Rule – Proof
𝐻(𝑿, 𝒀) = 𝐻(𝑿) + 𝐻(𝒀 | 𝑿)

𝐻(𝑿, 𝒀) = − Σ_{𝑥∈𝒳} Σ_{𝑦∈𝒴} 𝑝(𝑥, 𝑦) log 𝑝(𝑥, 𝑦)
= − Σ_{𝑥∈𝒳} Σ_{𝑦∈𝒴} 𝑝(𝑥, 𝑦) log [𝑝(𝑥) 𝑝(𝑦 | 𝑥)]
= − Σ_{𝑥∈𝒳} Σ_{𝑦∈𝒴} 𝑝(𝑥, 𝑦) log 𝑝(𝑥) − Σ_{𝑥∈𝒳} Σ_{𝑦∈𝒴} 𝑝(𝑥, 𝑦) log 𝑝(𝑦 | 𝑥)
= − Σ_{𝑥∈𝒳} 𝑝(𝑥) log 𝑝(𝑥) − Σ_{𝑥∈𝒳} Σ_{𝑦∈𝒴} 𝑝(𝑥, 𝑦) log 𝑝(𝑦 | 𝑥)
= 𝐻(𝑿) + 𝐻(𝒀 | 𝑿)

Lê Nguyên Khôi Information Theory 26


Chain Rule

 Probability:
𝑝(𝑿, 𝒀, 𝒁) = 𝑝(𝒁 | 𝑿, 𝒀) 𝑝(𝒀 | 𝑿) 𝑝(𝑿)
 Entropy:
𝐻(𝑿, 𝒀, 𝒁) = 𝐻(𝒁 | 𝑿, 𝒀) + 𝐻(𝒀 | 𝑿) + 𝐻(𝑿)

𝐻(𝑿_{1:𝑛}) = Σ_{𝑖=1}^{𝑛} 𝐻(𝑿_𝑖 | 𝑿_{1:𝑖−1})
 The log in the definition of entropy converts
products of probabilities into sums of entropies

Lê Nguyên Khôi Information Theory 27


Relative Entropy
 Relative entropy measures how one probability
distribution is different from a second probability
distribution
 The relative entropy or Kullback–Leibler distance
between two probability vectors 𝒑(𝑿) and 𝒒(𝑿) is
defined as
𝐷(𝒑 ‖ 𝒒) = Σ_{𝑥∈𝒳} 𝑝(𝑥) log (𝑝(𝑥) / 𝑞(𝑥)) = 𝐸_𝒑 [log (𝑝(𝑿) / 𝑞(𝑿))]
 Remark:
 It is not a true distance between distributions since it is not
symmetric and does not satisfy the triangle inequality.
 Nonetheless, it is often useful to think of relative entropy as a
“distance” between distributions.
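A minimal sketch of 𝐷(𝒑 ‖ 𝒒) in code (Python; the name kl_divergence is illustrative), applied to the Bernoulli example on the next slide with 𝑟 = 1/2 and 𝑠 = 1/4:

```python
import math

def kl_divergence(p, q):
    """D(p || q) = sum_x p(x) log2(p(x)/q(x)), in bits; terms with p(x) = 0 contribute 0."""
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

# Bernoulli distributions with r = 1/2 and s = 1/4
p = [1/2, 1/2]   # p(0) = 1 - r, p(1) = r
q = [3/4, 1/4]   # q(0) = 1 - s, q(1) = s

print(round(kl_divergence(p, q), 4))  # 0.2075 bits
print(round(kl_divergence(q, p), 4))  # 0.1887 bits -- not symmetric
```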
Lê Nguyên Khôi Information Theory 28
Relative Entropy – Example

 Let 𝒳 = [0; 1] and consider two distributions 𝒑
and 𝒒 on 𝒳. Let 𝒑(0) = 1 − 𝑟, 𝒑(1) = 𝑟, and let
𝒒(0) = 1 − 𝑠, 𝒒(1) = 𝑠. Then:

𝐷(𝒑 ‖ 𝒒) = (1 − 𝑟) log ((1 − 𝑟)/(1 − 𝑠)) + 𝑟 log (𝑟/𝑠)
𝐷(𝒒 ‖ 𝒑) = (1 − 𝑠) log ((1 − 𝑠)/(1 − 𝑟)) + 𝑠 log (𝑠/𝑟)

 If 𝑟 = 𝑠 then 𝐷(𝒑 ‖ 𝒒) = 𝐷(𝒒 ‖ 𝒑) = 0. If 𝑟 = 1/2 and
𝑠 = 1/4 then 𝐷(𝒑 ‖ 𝒒) = 1 − (1/2) log 3 = 0.2075 bits,
whereas 𝐷(𝒒 ‖ 𝒑) = (3/4) log 3 − 1 = 0.1887 bits

Lê Nguyên Khôi Information Theory 29


Mutual Information
 Mutual information of two random variables is a
measure of the mutual dependence between the two
variables. It quantifies the amount of information
obtained about one random variable through observing
the other random variable
 Consider two random variables 𝑿 and 𝒀 with a joint
probability vector 𝒑(𝑿, 𝒀) and marginal probability
vectors 𝒑(𝑿) and 𝒑(𝒀). The mutual information 𝐼(𝑿; 𝒀)
is the relative entropy between the joint distribution and
the product distribution 𝒑(𝑿)𝒑(𝒀):

𝐼(𝑿; 𝒀) = Σ_𝑥 Σ_𝑦 𝑝(𝑥, 𝑦) log (𝑝(𝑥, 𝑦) / (𝑝(𝑥) 𝑝(𝑦))) = 𝐷(𝑝(𝑥, 𝑦) ‖ 𝑝(𝑥) 𝑝(𝑦))
= 𝐸_{𝑝(𝑥,𝑦)} [log (𝑝(𝑿, 𝒀) / (𝑝(𝑿) 𝑝(𝒀)))]
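A sketch of this definition in code (Python; illustrative names), applied to the 2×2 joint distribution used in the earlier joint-entropy example:

```python
import math
from collections import defaultdict

# Joint distribution p(x, y): the 2x2 table with entries 1/2, 1/4, 0, 1/4
p_xy = {(0, 0): 1/2, (0, 1): 1/4, (1, 0): 0.0, (1, 1): 1/4}

def mutual_information(joint):
    """I(X;Y) = sum p(x,y) log2( p(x,y) / (p(x) p(y)) ), i.e. D(p(x,y) || p(x)p(y))."""
    p_x, p_y = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        p_x[x] += p
        p_y[y] += p
    return sum(p * math.log2(p / (p_x[x] * p_y[y]))
               for (x, y), p in joint.items() if p > 0)

print(round(mutual_information(p_xy), 3))  # 0.311 bits
```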
Lê Nguyên Khôi Information Theory 30
Mutual Information & Entropy
(Diagram relating entropy, conditional entropy, joint entropy, and mutual information)
Lê Nguyên Khôi Information Theory 31


Mutual Information
 The average amount of information that you
get about 𝑿 from observing 𝒀
 Or the reduction in the uncertainty of 𝑿 due to
the knowledge of 𝒀
𝐼(𝑿; 𝒀) = 𝐻(𝑿) − 𝐻(𝑿 | 𝒀) = 𝐻(𝑿) + 𝐻(𝒀) − 𝐻(𝑿, 𝒀)
(𝐻(𝑿) is the information in 𝑿; 𝐻(𝑿 | 𝒀) is the information in 𝑿 when you already know 𝒀)
 Mutual information is symmetric: 𝐼(𝑿; 𝒀) = 𝐼(𝒀; 𝑿)
 Use ";" to avoid ambiguities
between 𝐼(𝑿; 𝒀, 𝒁) and 𝐼(𝑿, 𝒀; 𝒁)
(Diagram: the bar 𝐻(𝑿, 𝒀) splits into 𝐻(𝑿 | 𝒀), 𝐼(𝑿; 𝒀), and 𝐻(𝒀 | 𝑿); 𝐻(𝑿) and 𝐻(𝒀) overlap in 𝐼(𝑿; 𝒀))

Lê Nguyên Khôi Information Theory 32


Mutual Information - Example
𝑝(𝑿, 𝒀)    𝒀 = 0    𝒀 = 1
𝑿 = 0       1/2      1/4
𝑿 = 1       0        1/4

 Guess the value of 𝒚
 50% chance of being correct
 What if you know 𝑥?
 if 𝑥 = 0 (𝑝 = 3/4), choose 𝑦 = 0: correct with probability 2/3
 if 𝑥 = 1 (𝑝 = 1/4), choose 𝑦 = 1: correct with probability 1
 Overall: 3/4 × 2/3 + 1/4 × 1 = 3/4 = 75% probability of being correct
 Best guess: choose 𝑦 = 𝑥

Lê Nguyên Khôi Information Theory 33


Mutual Information - Example
 Compute 𝐼(𝑿; 𝒀) for

𝑝(𝑿, 𝒀)    𝒀 = 0    𝒀 = 1
𝑿 = 0       1/2      1/4
𝑿 = 1       0        1/4

𝐼(𝑿; 𝒀) = 𝐻(𝑿) − 𝐻(𝑿 | 𝒀) = 𝐻(𝑿) + 𝐻(𝒀) − 𝐻(𝑿, 𝒀)

𝐻(𝑿, 𝒀) = 1.5, 𝐻(𝑿) = 0.811, 𝐻(𝒀) = 1 (bits)
(Diagram: the bar 𝐻(𝑿, 𝒀) = 1.5 splits into 𝐻(𝑿 | 𝒀) = 0.5, 𝐼(𝑿; 𝒀) = 0.311, and 𝐻(𝒀 | 𝑿) = 0.689 bits)

Lê Nguyên Khôi Information Theory 34


Exercise
 Given a joint probability distribution 𝑝(𝑿, 𝒀)

𝑝(𝑿, 𝒀)    𝒀 = 1    𝒀 = 2    𝒀 = 3    𝒀 = 4
𝑿 = 1       1/8      1/16     1/32     1/32
𝑿 = 2       1/16     1/8      1/32     1/32
𝑿 = 3       1/16     1/16     1/16     1/16
𝑿 = 4       1/4      0        0        0

 Compute
 𝐻(𝑿), 𝐻(𝒀)
 𝐻(𝑿 | 𝒀), 𝐻(𝒀 | 𝑿)
 𝐻(𝑿, 𝒀), 𝐼(𝑿; 𝒀)
Lê Nguyên Khôi Information Theory 35
Exercise 2.4 p.44
 (a): chain rule (joint entropy & conditional entropy)
 (b): 𝐻(𝑔(𝑿) | 𝑿) = 0, since knowing 𝑿 determines 𝑔(𝑿)
 (c): chain rule (joint entropy & conditional entropy)
 (d): 𝐻(𝑿 | 𝑔(𝑿)) ≥ 0, by the property that entropy is non-negative
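These steps combine into the result of Exercise 2.4, 𝐻(𝑔(𝑿)) ≤ 𝐻(𝑿). A quick numerical illustration (a Python sketch with an arbitrary choice of 𝑔, here the parity of a fair die roll):

```python
import math

def entropy(probs):
    """H of a probability vector, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# X uniform on {1,...,6}; g(X) = X mod 2 (an arbitrary deterministic function of X)
p_x = {x: 1/6 for x in range(1, 7)}
p_gx = {}
for x, p in p_x.items():
    g = x % 2
    p_gx[g] = p_gx.get(g, 0) + p

print(entropy(p_x.values()), entropy(p_gx.values()))  # ~2.585 >= 1.0
```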

Lê Nguyên Khôi Information Theory 36


Exercise 2.5 p.44
 If 𝒀 is not a function of 𝑿, some 𝑥₀ occurs with
more than one value of 𝑦, each pair having positive probability
 Suppose there exist 𝑥₀ and two distinct values 𝑦₁, 𝑦₂ with
𝑝(𝑥₀, 𝑦₁) > 0 and 𝑝(𝑥₀, 𝑦₂) > 0
 Then:
𝑝(𝑥₀) ≥ 𝑝(𝑥₀, 𝑦₁) + 𝑝(𝑥₀, 𝑦₂) > 0
𝑝(𝑦₁ | 𝑥₀) and 𝑝(𝑦₂ | 𝑥₀) are neither 0 nor 1
𝐻(𝒀 | 𝑿) = − Σ_𝑥 Σ_𝑦 𝑝(𝑥) 𝑝(𝑦 | 𝑥) log 𝑝(𝑦 | 𝑥)
≥ 𝑝(𝑥₀) [−𝑝(𝑦₁ | 𝑥₀) log 𝑝(𝑦₁ | 𝑥₀) − 𝑝(𝑦₂ | 𝑥₀) log 𝑝(𝑦₂ | 𝑥₀)]
> 0

Lê Nguyên Khôi Information Theory 37


Exercise 2.14a p.46

𝒁=𝑿+𝒀
therefore 𝑝 𝒁 = 𝑧 𝑿 = 𝑥 = 𝑝 𝒀 = 𝑧 − 𝑥 𝑿 = 𝑥

𝐻 𝒁𝑿 = 𝑝(𝑥)𝐻 𝒁 𝑿 = 𝑥

=− 𝑝 𝑥 𝑝 𝒁 = 𝑧 𝑿 = 𝑥 log 𝑝 𝒁 = 𝑧 𝑿 = 𝑥
𝑥 𝑧

=− 𝑝 𝑥 𝑝 𝒀 = 𝑧 − 𝑥 𝑿 = 𝑥 log 𝑝 𝒀 = 𝑧 − 𝑥 𝑿 = 𝑥
𝑥 𝑦

= 𝑝 𝑥 𝐻 𝒀𝑿=𝑥 =𝐻 𝒀𝑿
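A numerical sanity check using the 2×2 joint distribution from the earlier examples (a Python sketch; for that table, with 𝒁 = 𝑿 + 𝒀, both conditional entropies come out to about 0.689 bits):

```python
import math
from collections import defaultdict

# Joint distribution p(x, y) from the earlier examples; define Z = X + Y
p_xy = {(0, 0): 1/2, (0, 1): 1/4, (1, 0): 0.0, (1, 1): 1/4}

def cond_entropy(joint):
    """H(second | first) for a joint distribution given as {(first, second): prob}."""
    marginal = defaultdict(float)
    for (a, _b), p in joint.items():
        marginal[a] += p
    return -sum(p * math.log2(p / marginal[a])
                for (a, _b), p in joint.items() if p > 0)

# Build the joint distribution of (X, Z)
p_xz = defaultdict(float)
for (x, y), p in p_xy.items():
    p_xz[x, x + y] += p

print(round(cond_entropy(p_xy), 3))        # H(Y|X) ~= 0.689
print(round(cond_entropy(dict(p_xz)), 3))  # H(Z|X) ~= 0.689
```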

Lê Nguyên Khôi Information Theory 38


Exercise 2.29 p.50

Prove the following inequalities:
1. 𝐻(𝑿, 𝒀 | 𝒁) ≥ 𝐻(𝑿 | 𝒁)
2. 𝐼(𝑿, 𝒀; 𝒁) ≥ 𝐼(𝑿; 𝒁)
3. 𝐻(𝑿, 𝒀, 𝒁) − 𝐻(𝑿, 𝒀) ≤ 𝐻(𝑿, 𝒁) − 𝐻(𝑿)
4. 𝐼(𝑿; 𝒁 | 𝒀) ≥ 𝐼(𝒁; 𝒀 | 𝑿) − 𝐼(𝒁; 𝒀) + 𝐼(𝑿; 𝒁)

Lê Nguyên Khôi Information Theory 39
