Probability Pairs of random variables
Information Theory
Degree in Data Science and Engineering
Lesson 1: Discrete random variables
Jordi Quer, Josep Vidal
Mathematics Department, Signal Theory and Communications Department
{jordi.quer, josep.vidal}@upc.edu
2019/20 - Q1
Why probability?
Letters in an English text appear with different frequencies:
e is the most frequent: 12.702%
t is the second most frequent: 9.056%
z is the least frequent: 0.074%
The same happens with words: the 10 most frequent words in English are, in
the given order:
the, be, to, of, and, a, in, that, have, I.
Also for: instructions in a computer program, pixels in an image, samples in a
digitized wave sound, nucleotides in a DNA sequence, data content in a
database, weather forecast, etc.
Probability
Probability provides a way to model random phenomena: a random experiment yields a value from a set of possible outcomes, each occurring with some probability. For example:
Flip a coin. Two possible outcomes: heads and tails, each with the same
probability 1/2.
Roll a die. Six possible outcomes: 1 to 6 dots, each with the same probability 1/6.
Throw two dice and take the sum. Eleven possible outcomes: S = {2, . . . , 12}, with probabilities 1/36, 2/36, 3/36, 4/36, 5/36, 6/36, 5/36, 4/36, 3/36, 2/36, 1/36 for the sums 2 through 12.
Probabilities are numbers in [0, 1] and their sum must be = 1.
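These probabilities can be checked by enumerating the 36 equally likely outcomes of the two dice; a minimal Python sketch:

```python
from fractions import Fraction
from collections import Counter

# Count how many of the 36 equally likely (d1, d2) pairs give each sum.
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
probs = {s: Fraction(c, 36) for s, c in sorted(counts.items())}

print(probs[7])              # 1/6, the most likely sum
print(sum(probs.values()))   # 1, as required of a probability distribution
```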
Random variables
Mathematical formalization in terms of random variables: for us, a discrete
random variable X consists of
a discrete set X = {x1 , x2 , . . . , xq } of possible values xi ,
each occurring with a given probability
pi = p(xi ) = Pr(X = xi ) ∈ [0, 1],
and ∑_{i=1}^q p(xi) = 1.
If xi are numbers, the expected value (or expectation) of X is:
µX = E[X] = ∑_{i=1}^q xi p(xi)
and the variance of X is
Var(X) = σX² = E[|X − E[X]|²] = ∑_{i=1}^q |xi − E[X]|² p(xi)
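For a fair die these formulas give E[X] = 7/2 and Var(X) = 35/12; a quick check in Python with exact fractions:

```python
from fractions import Fraction

# Fair six-sided die: values 1..6, each with probability 1/6.
values = range(1, 7)
p = Fraction(1, 6)

mean = sum(x * p for x in values)               # E[X]
var = sum((x - mean) ** 2 * p for x in values)  # Var(X) = E[|X - E[X]|^2]

print(mean)  # 7/2
print(var)   # 35/12
```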
Random variables: examples
Bernoulli distribution. X has two outcomes: X = {1, 0}
with probabilities p(1) = p ∈ [0, 1] and p(0) = 1 − p.
Pr(X = x) = p(x) = p^x (1 − p)^{1−x}
When flipping a coin X = {head, tail}. For a fair coin p = 1/2.
Uniform distribution. Y has q outcomes: Y = {y1, . . . , yq}, each with the same probability p(yi) = 1/q. Rolling a die corresponds to q = 6.
Binomial distribution. Z has n + 1 possible outcomes: Z = {0, 1, . . . , n},
with probabilities:
Pr(Z = z) = p(z) = C(n, z) p^z (1 − p)^{n−z},
where C(n, z) = n!/(z!(n − z)!) is the binomial coefficient, and Z counts the number of 1's in n independent samples of a random variable X with Bernoulli distribution.
If p is the probability of heads in a coin flip, the average number of heads when flipping n coins is E[Z] = np.
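A small sketch of the binomial pmf using Python's standard `math.comb` for the binomial coefficient:

```python
from math import comb

def binom_pmf(z, n, p):
    """Pr(Z = z) for Z ~ Binomial(n, p)."""
    return comb(n, z) * p**z * (1 - p)**(n - z)

n, p = 10, 0.5
pmf = [binom_pmf(z, n, p) for z in range(n + 1)]

print(sum(pmf))                                       # 1 (up to rounding)
print(sum(z * q for z, q in enumerate(pmf)))          # E[Z] = n*p = 5.0
```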
A mathematical model for Information Theory
In information theory, a data source is a device that produces letters or symbols that can be considered as observations of a random variable X taking values in a finite alphabet X (|X| = q) with certain probabilities.
Concrete values xi are not important. The relevant information about the variable X is its probability distribution, given by q numbers
p1, p2, . . . , pq   with pi ∈ [0, 1] and ∑_{i=1}^q pi = 1.
When X is sampled several consecutive times, it produces a text: a string of
letters of the alphabet.
We shall see more sophisticated models that consider strings produced by a sequence X1 X2 X3 . . . of non-independent random variables.
Pairs of random variables
One may consider two or more random variables X and Y associated to the
same experiment. For example:
The experiment consists of rolling two different dice, and
X and Y are the outcomes of each die;
X is the outcome of the first die and Y is the sum;
X is the sum and Y says whether the two outcomes are equal or different;
X is the sum and Y is the parity of the second die; etc.
In each situation there are several relevant probability distributions.
Let X = {x1 , . . . , xq } and Y = {y1 , . . . , yr } be their values.
Joint and marginal probabilities
Everything is governed by the joint probability distribution:
p(xi, yj) = Pr(X = xi, Y = yj),
which satisfies:
∑_{i=1}^q ∑_{j=1}^r p(xi, yj) = 1.
The probabilities of each separate variable are called marginal probabilities.
They are computed from the joint distribution as:
p(xi) = Pr(X = xi) = ∑_{j=1}^r p(xi, yj),
p(yj) = Pr(Y = yj) = ∑_{i=1}^q p(xi, yj).
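Marginalization can be sketched in a few lines of Python, here on a hypothetical 2×2 joint distribution (the values below are illustrative, not from the slides):

```python
from fractions import Fraction as F

# Hypothetical joint distribution p(x, y) on X = {x1, x2}, Y = {y1, y2}.
joint = {
    ("x1", "y1"): F(1, 8), ("x1", "y2"): F(3, 8),
    ("x2", "y1"): F(1, 4), ("x2", "y2"): F(1, 4),
}
assert sum(joint.values()) == 1  # a valid joint distribution

# Marginals: sum the joint probabilities over the other variable.
p_x = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in ("x1", "x2")}
p_y = {y: sum(p for (_, yj), p in joint.items() if yj == y) for y in ("y1", "y2")}

print(p_x)  # {'x1': Fraction(1, 2), 'x2': Fraction(1, 2)}
print(p_y)  # {'y1': Fraction(3, 8), 'y2': Fraction(5, 8)}
```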
Conditional probabilities
Conditional probabilities can be defined as
p(xi|yj) = Pr(X = xi | Y = yj) := p(xi, yj) / p(yj),   if p(yj) ≠ 0,
to be read as the probability of xi given yj or under the condition yj .
It must be interpreted as the probability that, in a sample of the pair of variables, X takes the value xi given that Y takes the value yj.
Analogously,
p(yj|xi) = Pr(Y = yj | X = xi) := p(xi, yj) / p(xi),   if p(xi) ≠ 0.
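As a sketch, conditional probabilities computed from a hypothetical 2×2 joint distribution (illustrative values, not from the slides):

```python
from fractions import Fraction as F

# Hypothetical joint distribution on X = {x1, x2}, Y = {y1, y2}.
joint = {
    ("x1", "y1"): F(1, 8), ("x1", "y2"): F(3, 8),
    ("x2", "y1"): F(1, 4), ("x2", "y2"): F(1, 4),
}

def p_y(y):
    # Marginal of Y: sum the joint over x.
    return sum(p for (_, yj), p in joint.items() if yj == y)

def cond_x_given_y(x, y):
    """p(x | y) = p(x, y) / p(y), defined only when p(y) != 0."""
    return joint[(x, y)] / p_y(y)

print(cond_x_given_y("x1", "y1"))  # (1/8) / (3/8) = 1/3
# Each conditional distribution sums to 1:
print(cond_x_given_y("x1", "y1") + cond_x_given_y("x2", "y1"))  # 1
```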
Marginal and conditional probability distributions
Theorem
The marginal probabilities defined from a joint probability distribution satisfy:
∑_{i=1}^q p(xi) = ∑_{i=1}^q ∑_{j=1}^r p(xi, yj) = 1,   ∑_{j=1}^r p(yj) = 1.
Theorem
The conditional probabilities satisfy:
∀j: ∑_{i=1}^q p(xi|yj) = 1,   ∀i: ∑_{j=1}^r p(yj|xi) = 1.
Independent variables
Two variables X and Y are independent if the joint probability is always the
product of the marginal probabilities:
p(xi , yj ) = p(xi )p(yj ) ∀i, j.
Equivalently, the conditional probabilities do not depend on the condition:
p(xi|yj) = p(xi) and p(yj|xi) = p(yj) ∀i, j.
When the variables are not independent, the gap between the conditional and marginal distributions measures the degree of dependence between them.
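The definition can be tested directly: for two fair dice, every joint probability factors into the product of the marginals. A small Python sketch:

```python
from fractions import Fraction as F
from itertools import product

# Two fair dice: each of the 36 ordered pairs has probability 1/36.
joint = {(x, y): F(1, 36) for x, y in product(range(1, 7), repeat=2)}

# Marginals of X and Y.
p_x = {x: sum(joint[(x, y)] for y in range(1, 7)) for x in range(1, 7)}
p_y = {y: sum(joint[(x, y)] for x in range(1, 7)) for y in range(1, 7)}

# X and Y are independent iff p(x, y) = p(x) * p(y) for every pair.
independent = all(joint[(x, y)] == p_x[x] * p_y[y]
                  for x, y in product(range(1, 7), repeat=2))
print(independent)  # True
```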
Pairs of random variables: examples
Experiment: Let X and Y be the outcomes of rolling two different dice,
X = Y = {1, 2, 3, 4, 5, 6}.
p(xi, yj) = 1/36 = 1/6 · 1/6 = p(xi) p(yj) ∀i, j.
They are independent, and their conditional distributions are:
Both 6×6 conditional tables are uniform: p(yj|xi) = 1/6 and p(xi|yj) = 1/6 for all i, j, so every row equals the marginal distribution (1/6, 1/6, 1/6, 1/6, 1/6, 1/6).
Pairs of random variables: examples
Let S be the sum of the two dice, S = {2, 3, . . . , 12}.
The joint distribution of X and S is:
p(xi, sj) |   2    3    4    5    6    7    8    9   10   11   12 | p(xi)
    1     | 1/36 1/36 1/36 1/36 1/36 1/36    0    0    0    0    0 | 1/6
    2     |    0 1/36 1/36 1/36 1/36 1/36 1/36    0    0    0    0 | 1/6
    3     |    0    0 1/36 1/36 1/36 1/36 1/36 1/36    0    0    0 | 1/6
    4     |    0    0    0 1/36 1/36 1/36 1/36 1/36 1/36    0    0 | 1/6
    5     |    0    0    0    0 1/36 1/36 1/36 1/36 1/36 1/36    0 | 1/6
    6     |    0    0    0    0    0 1/36 1/36 1/36 1/36 1/36 1/36 | 1/6
  p(sj)   | 1/36 1/18 1/12  1/9 5/36  1/6 5/36  1/9 1/12 1/18 1/36 | 1

The sum of all table entries is 1 (joint probability distribution).
The marginal probabilities are the sums of rows and columns.
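The joint distribution of X and S can be built programmatically and its marginals verified; a small Python sketch:

```python
from fractions import Fraction as F

# Joint distribution of X (first die) and S = X + Y (sum of two fair dice):
# each of the 36 equally likely (x, y) pairs contributes 1/36 to p(x, x+y).
joint = {}
for x in range(1, 7):
    for y in range(1, 7):
        s = x + y
        joint[(x, s)] = joint.get((x, s), F(0)) + F(1, 36)

# Marginal of the sum: add the joint probabilities over x.
p_s = {s: sum(p for (_, sj), p in joint.items() if sj == s) for s in range(2, 13)}

print(sum(joint.values()))  # 1
print(p_s[7])               # 1/6
print(p_s[2])               # 1/36
```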
Pairs of random variables: examples
Conditional distribution of S given X: knowing the value of the first die, the probability of each value of the sum.
p(sj|xi) |   2    3    4    5    6    7    8    9   10   11   12
    1    | 1/6  1/6  1/6  1/6  1/6  1/6    0    0    0    0    0
    2    |   0  1/6  1/6  1/6  1/6  1/6  1/6    0    0    0    0
    3    |   0    0  1/6  1/6  1/6  1/6  1/6  1/6    0    0    0
    4    |   0    0    0  1/6  1/6  1/6  1/6  1/6  1/6    0    0
    5    |   0    0    0    0  1/6  1/6  1/6  1/6  1/6  1/6    0
    6    |   0    0    0    0    0  1/6  1/6  1/6  1/6  1/6  1/6
The sum of each row must be = 1 (conditional probability distributions).
Pairs of random variables: examples
Conditional distribution of X given S: knowing the sum, the probability of each value of the first die.
p(xi|sj) |   1    2    3    4    5    6
    2    |   1    0    0    0    0    0
    3    | 1/2  1/2    0    0    0    0
    4    | 1/3  1/3  1/3    0    0    0
    5    | 1/4  1/4  1/4  1/4    0    0
    6    | 1/5  1/5  1/5  1/5  1/5    0
    7    | 1/6  1/6  1/6  1/6  1/6  1/6
    8    |   0  1/5  1/5  1/5  1/5  1/5
    9    |   0    0  1/4  1/4  1/4  1/4
   10    |   0    0    0  1/3  1/3  1/3
   11    |   0    0    0    0  1/2  1/2
   12    |   0    0    0    0    0    1
Bayes’ Theorem
What is the relation between p(xi|yj) and p(yj|xi)? It is given by Bayes' theorem:
p(xi|yj) = p(yj|xi) p(xi) / p(yj),
where
p(yj) = ∑_{i=1}^q p(xi, yj) = ∑_{i=1}^q p(yj|xi) p(xi).
Example: When getting the result of a medical test on AIDS, what can be said
about our condition? Take as random variables:
Patient condition: X ∈ {healthy, sick}
Result of the test: Y ∈ {−, +}
Bayes’ Theorem
Use Bayes' theorem and examine how the result depends on the prevalence p(sick).
Get a result using AIDS prevalence data in Spain.
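A sketch of the computation with illustrative (assumed) numbers, not actual Spanish prevalence data: suppose prevalence 0.4%, sensitivity 99%, and a 1% false-positive rate.

```python
# Illustrative (assumed) parameters, NOT real epidemiological data:
p_sick = 0.004              # prevalence p(sick)
p_pos_given_sick = 0.99     # sensitivity p(+ | sick)
p_pos_given_healthy = 0.01  # false-positive rate p(+ | healthy)

# Total probability of a positive test (law of total probability).
p_pos = p_pos_given_sick * p_sick + p_pos_given_healthy * (1 - p_sick)

# Bayes' theorem: probability of being sick given a positive result.
p_sick_given_pos = p_pos_given_sick * p_sick / p_pos

print(round(p_sick_given_pos, 3))  # 0.284: most positives are false alarms
```

With a low prevalence, even an accurate test yields p(sick | +) well below 1/2, which is the point of the exercise.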