Information Theory
Entropy
TS. Lê Nguyên Khôi
Trường Đại học Công nghệ, ĐHQGHN
Content
Entropy
Joint entropy and conditional entropy
Relative entropy and mutual information
Chain rules for entropy, relative entropy, and
mutual information
Jensen's inequality and its consequences
Information Theory
Information theory studies the quantification,
representation, and communication of
information
What is information?
Oxford English Dictionary says:
facts provided or learned about something or
someone
what is conveyed or represented by a particular
arrangement or sequence of things
What is Information
Webster's says:
the communication or reception of knowledge or intelligence
knowledge obtained from investigation, study, or instruction.
the attribute inherent in and communicated by one of two or
more alternative sequences or arrangements of something that
produce specific effects
a signal or character representing data
something (as a message, experimental data, or a picture)
which justifies change in a construct (as a plan or theory) that
represents physical or mental experience or another construct
a quantitative measure of the content of information;
specifically: a numerical quantity that measures the
uncertainty in the outcome of an experiment to be performed
What is Information
Wikipedia says:
Information in its most restricted technical sense is a message
(utterance or expression) or collection of messages in an
ordered sequence that consists of symbols, or it is the meaning
that can be interpreted from such a message or collection of
messages. Information can be recorded or transmitted. It can be
recorded as signs, or conveyed as signals. Information is any
kind of event that affects the state of a dynamic system. The
concept has numerous other meanings in different contexts.
Moreover, the concept of information is closely related to
notions of constraint, communication, control, data, form,
instruction, knowledge, meaning, mental stimulus, pattern,
perception, representation, and especially entropy
Information Theory
Information theory is a formal mathematical
theory; it uses probability and statistics to make
these notions mathematically precise
The key is communication: relaying a message
from someone to someone else
I'm communicating to you now, hopefully
How much am I communicating to you?
Can we quantify it?
Notation
Vector & Matrix:
𝑣 = vector, 𝑉 = matrix
Random variable (r.v.) 𝑿
𝑥 = specific value
𝒳 = set of 𝑥 (alphabet)
Range:
𝑎:𝑏 denotes the range 𝑎, 𝑎 + 1, …, 𝑏
Cardinality (number of elementary outcomes):
|𝒳| = the number of elements in set 𝒳
Probability vector of a r.v. 𝑿: 𝒑𝑿
Discrete Random Variable
A random variable 𝑿 takes a value 𝑥 from the
alphabet 𝒳 with probability 𝑝_𝑿(𝑥)
The vector of probabilities is 𝒑_𝑿
Example:
Dice: 𝒳 = [1; 2; 3; 4; 5; 6]
𝒑_𝑿 = [1/6; 1/6; 1/6; 1/6; 1/6; 1/6]
English text: 𝒳 = [a; b; …; x; y; z; <space>]
𝒑_𝑿 = [0.058; 0.013; …; 0.016; 0.007; 0.193]
Expected Value
If 𝑔(𝑥) is a function defined on 𝒳, then
E_X[g(x)] = \sum_{x \in \mathcal{X}} p(x) g(x)
Example:
Dice: 𝒳 = [1; 2; 3; 4; 5; 6]
𝒑_𝑿 = [1/6; 1/6; 1/6; 1/6; 1/6; 1/6]
E[X] = 3.5 = μ ← This is the mean of 𝑿
E[X²] = 15.17 = σ² + μ², where σ² is the variance of 𝑿
E[sin(0.1X)] ≈ 0.338
E[−log₂ p(X)] = 2.58 ← This is the entropy of 𝑿
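A minimal Python sketch (an illustration added here, not part of the original slides; it assumes numpy is available) reproducing these expectations for a fair die:

import numpy as np

# Fair six-sided die: outcomes 1..6, each with probability 1/6
x = np.arange(1, 7)
p = np.full(6, 1 / 6)

mean = np.sum(p * x)                           # E[X] = 3.5
second_moment = np.sum(p * x ** 2)             # E[X^2] ≈ 15.17
sin_expectation = np.sum(p * np.sin(0.1 * x))  # E[sin(0.1 X)] ≈ 0.338
entropy = np.sum(-p * np.log2(p))              # E[-log2 p(X)] ≈ 2.585 bits

print(mean, second_moment, sin_expectation, entropy)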
Shannon Information Content
The Shannon Information Content (SIC) of an
outcome with probability 𝑝 is −log₂ 𝑝
Shannon’s contribution – a statistical view
Messages, noisy channels are random
Pre-Shannon era: deterministic approach
Example 1: coin tossing
𝒳 = [Head; Tail], 𝒑_𝑿 = [1/2; 1/2], SIC = [1; 1] bits
Example 2: Is today your birthday?
𝒳 = [No; Yes], 𝒑_𝑿 = [364/365; 1/365], SIC = [0.004; 8.512] bits
Unlikely outcomes give more information
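A short Python sketch (illustrative only) computing the SIC for both examples:

import math

def sic(p):
    """Shannon information content, in bits, of an outcome with probability p."""
    return -math.log2(p)

# Example 1: coin tossing
print([sic(1 / 2), sic(1 / 2)])        # [1.0, 1.0] bits

# Example 2: "Is today your birthday?"
print([sic(364 / 365), sic(1 / 365)])  # [≈0.004, ≈8.512] bits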
Entropy – Definition
Entropy measures the uncertainty of a
random variable
Entropy of r.v. 𝑿 is:
the average SIC of r.v. 𝑿
the average uncertainty of r.v. 𝑿
Formula:
H(X) = E[-\log_2 p(X)] = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x)
where p(x) = P(X = x), x ∈ 𝒳
Entropy – Properties
Formula:
H(X) = E[-\log_2 p(X)] = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x)
Entropy is always non-negative
Entropy H(X) depends only on the probability vector 𝒑_𝑿, not on the values in 𝒳
We use log(x) = log₂(x) and measure H(X) in bits
If we use logₑ instead, entropy is measured in nats
1 nat = log₂(e) bits ≈ 1.44 bits
Entropy - Example
Bernoulli random variable:
𝒳 = [0; 1], 𝒑_𝑿 = [1 − p; p]
H(X) = −(1 − p) log(1 − p) − p log p ≝ H(p)
Given an r.v. 𝑿:
𝒳 = [R; G; B; W]
𝒑_𝑿 = [1/2; 1/4; 1/8; 1/8]
Compute the SIC of 𝑿, −log₂ p(x),
and the entropy of 𝑿, H(X) = E[−log₂ p(X)]
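A minimal Python sketch for this example (illustrative); it prints the SIC of each outcome and the entropy H(X), which works out to 1.75 bits:

import math

p_X = {'R': 1 / 2, 'G': 1 / 4, 'B': 1 / 8, 'W': 1 / 8}

# SIC of each outcome: -log2 p(x)
sic = {x: -math.log2(p) for x, p in p_X.items()}
print(sic)   # {'R': 1.0, 'G': 2.0, 'B': 3.0, 'W': 3.0}

# Entropy: the average SIC, weighted by probability
H_X = sum(p_X[x] * sic[x] for x in p_X)
print(H_X)   # 1.75 bits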
Interestingness
Picking up the phone, we never know for sure who is calling
A friend, uncle, aunt, …
It is quite an interesting experiment to do
When I turn the key in my motorbike, it starts
It is no surprise that my motorbike starts, since it is
well-maintained and has a great mechanic
It is not a very interesting thing to do
Most of our day consists of rather uninteresting
experiments
Taking a step, greeting a friend, going to school, …
If these events were suddenly to become interesting, we would
be in trouble
Entropy In A Nutshell
Low Entropy: the value (location of soup) is sampled entirely from within the soup bowl
High Entropy: the value (location of soup) is unpredictable, sampled almost uniformly over the dining table
Entropy of Buckets
Three buckets, with 8 letters each
Joint Entropy
The entropy H(X,Y) of a pair of discrete
random variables (X,Y) with a joint distribution
p(X,Y) is defined as:
H(X,Y) = E[-\log_2 p(X,Y)] = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y) \log_2 p(x,y)
Equivalently, H(X,Y) = -E[\log p(X,Y)]
Joint Entropy
Compute 𝐻 𝑿, 𝒀
p(X,Y)    Y=0    Y=1
X=0       1/2    1/4
X=1       0      1/4

H(X,Y) = E[-\log p(X,Y)]
= -\frac{1}{2}\log\frac{1}{2} - \frac{1}{4}\log\frac{1}{4} - 0\log 0 - \frac{1}{4}\log\frac{1}{4}
= 1.5 bits
(by convention, 0 log 0 = 0)
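A minimal Python check of this joint entropy (the 0·log 0 term is simply skipped, matching the convention above):

import math

p_XY = [[1 / 2, 1 / 4],
        [0,     1 / 4]]   # rows: X = 0, 1; columns: Y = 0, 1

H_XY = -sum(p * math.log2(p) for row in p_XY for p in row if p > 0)
print(H_XY)   # 1.5 bits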
Conditional Entropy
The conditional entropy H(Y|X) of a pair of discrete
random variables (X,Y) with a conditional
distribution p(Y|X) is defined as:
H(Y|X) = -E[\log p(Y|X)]
= \sum_{x \in \mathcal{X}} p(x) H(Y | X = x)
= -\sum_{x \in \mathcal{X}} p(x) \sum_{y \in \mathcal{Y}} p(y|x) \log p(y|x)
= -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y) \log p(y|x)
Conditional Entropy – Example
Compute H(Y|X)
H(Y|X) = \sum_{x \in \mathcal{X}} p(x) H(Y | X = x)

p(X,Y)    Y=0    Y=1    p(x)
X=0       1/2    1/4    3/4
X=1       0      1/4    1/4

p(Y|X)    Y=0    Y=1    H(Y|X=x)       p(x)
X=0       2/3    1/3    H(2/3, 1/3)    3/4
X=1       0      1      H(0, 1)        1/4
Conditional Entropy – Example
Take a weighted average of the entropy of
each row using 𝑝 𝑥 as weight
p(Y|X)    Y=0    Y=1    H(Y|X=x)       p(x)
X=0       2/3    1/3    H(2/3, 1/3)    3/4
X=1       0      1      H(0, 1)        1/4

H(Y|X) = \sum_{x \in \mathcal{X}} p(x) H(Y | X = x)
= 3/4 × H(2/3, 1/3) + 1/4 × H(0, 1)
= 0.689 bits
Conditional Entropy – Example
Additional entropy: compute H(Y|X) as H(X,Y) − H(X)
p(X,Y)    Y=0    Y=1    p(x)
X=0       1/2    1/4    3/4
X=1       0      1/4    1/4

H(Y|X) = E[-\log p(Y|X)] = E\left[-\log \frac{p(X,Y)}{p(X)}\right]
= E[-\log p(X,Y)] - E[-\log p(X)]
= H(X,Y) - H(X)
= H(1/2, 1/4, 0, 1/4) - H(3/4, 1/4)
= 0.689 bits
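A Python sketch (illustrative) computing H(Y|X) both ways for this table, as the weighted average of row entropies and as H(X,Y) − H(X); both give about 0.689 bits:

import math

def H(probs):
    """Entropy in bits of a probability vector, skipping zero entries."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

p_XY = [[1 / 2, 1 / 4],
        [0,     1 / 4]]              # rows: X = 0, 1; columns: Y = 0, 1
p_X = [sum(row) for row in p_XY]     # [3/4, 1/4]

# Way 1: weighted average of the entropies of the conditional rows p(Y | X = x)
H_Y_given_X = sum(px * H([p / px for p in row]) for px, row in zip(p_X, p_XY))

# Way 2: chain rule, H(Y|X) = H(X,Y) - H(X)
H_XY = H([p for row in p_XY for p in row])
print(H_Y_given_X, H_XY - H(p_X))    # both ≈ 0.689 bits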
Conditional Entropy
𝐻 𝒀 𝑿 is the average additional information in
𝒀 when you know 𝑿
[Diagram: a bar for H(X,Y) split into H(X) and H(Y|X)]
Chain Rule
Probability
p(X,Y) = p(Y|X) p(X)
Joint entropy & conditional entropy:
H(X,Y) = H(X) + H(Y|X)
The log in the definition of entropy converts
products of probabilities into sums of entropies
Corollary:
H(X,Y|Z) = H(X|Z) + H(Y|X,Z)
Exercise
Given a joint probability distribution 𝑝 𝑿, 𝒀
p(X,Y)    Y=1     Y=2     Y=3     Y=4
X=1       1/8     1/16    1/32    1/32
X=2       1/16    1/8     1/32    1/32
X=3       1/16    1/16    1/16    1/16
X=4       1/4     0       0       0
Compute: 𝐻 𝑿 , 𝐻 𝒀 , 𝐻 𝑿 𝒀
Compute: 𝐻 𝑿, 𝒀 , 𝐻 𝒀 𝑿
Example
The marginal distribution of 𝑿 is [1/4; 1/4; 1/4; 1/4]
The marginal distribution of 𝒀 is [1/2; 1/4; 1/8; 1/8]
It is easy to obtain H(X) = 2 and H(Y) = 7/4
H(X|Y) = H(X,Y) - H(Y)
H(X,Y) = -\sum_{x=1}^{4} \sum_{y=1}^{4} p(x,y) \log p(x,y)
= -\frac{1}{4}\log\frac{1}{4} - 2 \cdot \frac{1}{8}\log\frac{1}{8} - 6 \cdot \frac{1}{16}\log\frac{1}{16} - 4 \cdot \frac{1}{32}\log\frac{1}{32}
= \frac{1}{2} + \frac{3}{4} + \frac{3}{2} + \frac{5}{8} = \frac{4 + 6 + 12 + 5}{8} = \frac{27}{8}
H(X|Y) = \frac{27}{8} - \frac{7}{4} = \frac{13}{8} bits
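A Python sketch (illustrative, assuming numpy) that computes all of the requested quantities for this table:

import numpy as np

# rows: X = 1..4, columns: Y = 1..4
p = np.array([[1/8,  1/16, 1/32, 1/32],
              [1/16, 1/8,  1/32, 1/32],
              [1/16, 1/16, 1/16, 1/16],
              [1/4,  0,    0,    0   ]])

def H(q):
    """Entropy in bits of an array of probabilities, ignoring zeros."""
    q = q[q > 0]
    return -np.sum(q * np.log2(q))

H_X, H_Y, H_XY = H(p.sum(axis=1)), H(p.sum(axis=0)), H(p)
print(H_X)          # 2.0
print(H_Y)          # 1.75   (= 7/4)
print(H_XY)         # 3.375  (= 27/8)
print(H_XY - H_Y)   # H(X|Y) = 1.625 (= 13/8)
print(H_XY - H_X)   # H(Y|X) = 1.375 (= 11/8)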
Chain Rule – Proof
H(X,Y) = H(X) + H(Y|X)
H(X,Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y) \log p(x,y)
= -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y) \log \left[ p(x) p(y|x) \right]
= -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y) \log p(x) - \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y) \log p(y|x)
= -\sum_{x \in \mathcal{X}} p(x) \log p(x) - \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y) \log p(y|x)
= H(X) + H(Y|X)
Chain Rule
Probability
p(X,Y,Z) = p(Z|X,Y) p(Y|X) p(X)
Entropy
H(X,Y,Z) = H(Z|X,Y) + H(Y|X) + H(X)
H(X_{1:n}) = \sum_{i=1}^{n} H(X_i | X_{1:i-1})
The log in the definition of entropy converts
products of probabilities into sums of entropies
Relative Entropy
Relative entropy measures how one probability
distribution is different from a second probability
distribution
The relative entropy or Kullback–Leibler distance
between two probability vectors 𝒑(𝑿) and 𝒒(𝑿) is
defined as
D(p \| q) = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)} = E_p\left[ \log \frac{p(X)}{q(X)} \right]
Remark:
It is not a true distance between distributions since it is not
symmetric and does not satisfy the triangle inequality.
Nonetheless, it is often useful to think of relative entropy as a
“distance” between distributions.
Relative Entropy – Example
Let 𝒳 = [0; 1] and consider two distributions 𝒑
and 𝒒 on 𝒳. Let p(0) = 1 − r, p(1) = r, and let
q(0) = 1 − s, q(1) = s. Then:
D(p \| q) = (1 - r) \log \frac{1 - r}{1 - s} + r \log \frac{r}{s}
D(q \| p) = (1 - s) \log \frac{1 - s}{1 - r} + s \log \frac{s}{r}
If r = s, then D(p‖q) = D(q‖p) = 0. If r = 1/2 and
s = 1/4, then D(p‖q) = 1 - \frac{1}{2}\log 3 ≈ 0.2075 bits,
whereas D(q‖p) = \frac{3}{4}\log 3 - 1 ≈ 0.1887 bits
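A small Python sketch (illustrative, base-2 logarithms) reproducing the numbers for r = 1/2 and s = 1/4:

import math

def D(p, q):
    """Relative entropy D(p || q) in bits for two probability vectors."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

r, s = 1 / 2, 1 / 4
p = [1 - r, r]
q = [1 - s, s]

print(D(p, q))   # ≈ 0.2075 bits
print(D(q, p))   # ≈ 0.1887 bits  (relative entropy is not symmetric)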
Mutual Information
Mutual information of two random variables is a
measure of the mutual dependence between the two
variables. It quantifies the amount of information
obtained about one random variable through observing
the other random variable
Consider two random variables 𝑿 and 𝒀 with a joint
probability vector 𝒑(𝑿, 𝒀) and marginal probability
vectors 𝒑(𝑿) and 𝒑(𝒀). The mutual information 𝐼 𝑿; 𝒀
is the relative entropy between the joint distribution and
the product distribution 𝒑(𝑿)𝒑(𝒀)
I(X;Y) = \sum_{x} \sum_{y} p(x,y) \log \frac{p(x,y)}{p(x) p(y)} = D(p(x,y) \| p(x) p(y))
= E_{p(x,y)}\left[ \log \frac{p(X,Y)}{p(X) p(Y)} \right]
Mutual Information & Entropy
Mutual Information
The average amount of information that you
get about 𝑿 from observing 𝒀
Or the reduction in the uncertainty of 𝑿 due to
the knowledge of 𝒀
I(X;Y) = H(X) − H(X|Y) = H(X) + H(Y) − H(X,Y)
(H(X) is the information in 𝑿; H(X|Y) is the information left in 𝑿 when you already know 𝒀)
Mutual information is symmetric
I(X;Y) = I(Y;X)
Use ";" to avoid ambiguity between I(X;Y,Z) and I(X,Y;Z)
[Diagram: the bar H(X,Y) splits into H(X|Y), I(X;Y), and H(Y|X); H(X) = H(X|Y) + I(X;Y) and H(Y) = I(X;Y) + H(Y|X)]
Mutual Information - Example
p(X,Y)    Y=0    Y=1
X=0       1/2    1/4
X=1       0      1/4

Guess the value of 𝒚 without knowing 𝑥: 50% chance of being correct
What if you know 𝑥?
If x = 0 (p = 3/4): choose y = 0, correct with probability 2/3
If x = 1 (p = 1/4): choose y = 1, correct with probability 1
Overall: 3/4 × 2/3 + 1/4 × 1 = 3/4 = 75% probability of being correct
Best guess: choose y = x
Mutual Information - Example
Compute I(X;Y)

p(X,Y)    Y=0    Y=1
X=0       1/2    1/4
X=1       0      1/4

I(X;Y) = H(X) − H(X|Y) = H(X) + H(Y) − H(X,Y)
H(X) = 0.811, H(Y) = 1, H(X,Y) = 1.5
I(X;Y) = 0.811 + 1 − 1.5 = 0.311 bits
H(X|Y) = 0.5, H(Y|X) = 0.689
[Diagram: H(X,Y) = 1.5 splits into H(X|Y) = 0.5, I(X;Y) = 0.311, and H(Y|X) = 0.689]
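A Python check of these numbers (illustrative):

import math

def H(probs):
    """Entropy in bits, skipping zero entries."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

p_XY = [[1 / 2, 1 / 4],
        [0,     1 / 4]]
p_X = [sum(row) for row in p_XY]        # [3/4, 1/4]
p_Y = [sum(col) for col in zip(*p_XY)]  # [1/2, 1/2]

H_X, H_Y = H(p_X), H(p_Y)
H_XY = H([p for row in p_XY for p in row])
I_XY = H_X + H_Y - H_XY
print(H_X, H_Y, H_XY, I_XY)   # ≈ 0.811, 1.0, 1.5, 0.311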
Exercise
Given a joint probability distribution 𝑝 𝑿, 𝒀
p(X,Y)    Y=1     Y=2     Y=3     Y=4
X=1       1/8     1/16    1/32    1/32
X=2       1/16    1/8     1/32    1/32
X=3       1/16    1/16    1/16    1/16
X=4       1/4     0       0       0
Compute
𝐻 𝑿 ,𝐻 𝒀
𝐻 𝑿𝒀 ,𝐻 𝒀𝑿
𝐻 𝑿, 𝒀 , 𝐼 𝑿; 𝒀
Exercise 2.4 p.44
(a): chain rule (joint entropy & conditional entropy)
(b): H(g(X) | X) = 0, since knowing 𝑿 determines g(𝑿)
(c): chain rule (joint entropy & conditional entropy)
(d): H(X | g(X)) ≥ 0, by the property that entropy is non-negative
Exercise 2.5 p.44
If the equation y = g(x) had more than one solution for some x,
each such pair (x₀, y₀) would have positive probability
Suppose there exist x₀ and two distinct values y₁, y₂ such that:
p(x₀, y₁) > 0 and p(x₀, y₂) > 0
Then:
p(x₀) ≥ p(x₀, y₁) + p(x₀, y₂) > 0
p(y₁|x₀) and p(y₂|x₀) are neither 0 nor 1
H(Y|X) = -\sum_{x} \sum_{y} p(x) p(y|x) \log p(y|x)
≥ p(x₀) \left[ -p(y₁|x₀) \log p(y₁|x₀) - p(y₂|x₀) \log p(y₂|x₀) \right]
> 0, since both terms in the bracket are strictly positive
Exercise 2.14a p.46
Z = X + Y
therefore p(Z = z | X = x) = p(Y = z − x | X = x)
H(Z|X) = \sum_{x} p(x) H(Z | X = x)
= -\sum_{x} p(x) \sum_{z} p(Z = z | X = x) \log p(Z = z | X = x)
= -\sum_{x} p(x) \sum_{y} p(Y = y | X = x) \log p(Y = y | X = x)   (substituting y = z − x)
= \sum_{x} p(x) H(Y | X = x) = H(Y|X)
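A quick numerical check of H(Z|X) = H(Y|X) when Z = X + Y, using a small joint distribution chosen only for illustration:

import math
from collections import defaultdict

# Illustrative joint distribution p(x, y); any valid one works
p_XY = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def cond_entropy(joint):
    """H(second | first) in bits for a dict {(a, b): probability}."""
    p_first = defaultdict(float)
    for (a, _), pr in joint.items():
        p_first[a] += pr
    return -sum(pr * math.log2(pr / p_first[a])
                for (a, _), pr in joint.items() if pr > 0)

# Joint distribution of (X, Z) with Z = X + Y
p_XZ = defaultdict(float)
for (x, y), pr in p_XY.items():
    p_XZ[(x, x + y)] += pr

print(cond_entropy(p_XY))   # H(Y|X)
print(cond_entropy(p_XZ))   # H(Z|X) — the same value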
Exercise 2.29 p.50
Proof
1. H(X,Y|Z) ≥ H(X|Z)
2. I(X,Y;Z) ≥ I(X;Z)
3. H(X,Y,Z) − H(X,Y) ≤ H(X,Z) − H(X)
4. I(X;Z|Y) ≥ I(Z;Y|X) − I(Z;Y) + I(X;Z)