HUST Guest Lecture 2022
HUST Guest Lecture 2022
at HUST
Autumn 2022
Matthias Pätzold
University of Agder
Faculty of Engineering and Science
Mobile Communications Group
P.O. Box 509, NO-4898 Grimstad, Norway
Tel.: (+47) 3723 - 3283
E-mail: Matthias.Paetzold@uia.no
Web: http://mcg.uia.no/
Table of Contents
Introduction i
I Random Variables 1
3 Repeated Trials 18
3.1 Combined Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.2 Bernoulli Trials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Bibliography 136
Introduction
This lecture is divided in two parts. The first part presents an introductory treat-
ment of probability and random variables, and the second part deals with stochastic
processes.
ment with an outcome that is unpredictable. In case that the experiment is repeated,
the outcome can differ due to the influence of an underlying random phenomenon.
Such types of experiments are known as random experiments. A typical random ex-
periment is the observation of the result of tossing a fair coin, where the possible
outcomes of a trial are “heads” or “tails”. In this coin experiment, for example, the
percentage of heads approaches 1/2. Also in other random experiments, it has been
observed that certain averages approach a constant value as the number of observa-
tions increases. The purpose of the probability theory is to describe and predict such
If the experiment is performed n times and the event A occurs nA times, then
(with a high degree of certainty) the relative frequency nA /n of the occurrence
of A is close to P (A), i.e.,
nA
P (A) ≃
n
Approximately the first half of the lecture is devoted to the theory of probability and
ii CONTENTS
random variables.
Physical systems and models in which there is uncertainty or randomness play a very
important role in the area of electrical engineering (see Fig. 1). For example, the
signal, a random interference component, and channel noise. The interference compo-
systems operating in the vicinity of the receiver. A major source of channel noise is
thermal noise, which is caused by the random motion of the electrons in devices at the
front end of the receiver. Hence, the received signal is random in nature. Although
it is not possible to predict the exact value of a random signal (stochastic process) in
ties such as mean value, autocorrelation function, and power spectral density, as we
The intent of the course is to provide the foundation in statistics which would be
needed in other courses such as mobile radio communications, digital signal process-
M o b ile r a d io
c o m m u n ic a tio n s
I n fo r m a tio n A n a lo g /d ig ita l
& c o d in g th e o r y c o m m u n ic a tio n s
R a n d o m v a r ia b le s
&
s to c h a s tic p r o c e s s e s
M e a su r e m e n t D ig ita l
th e o r y s ig n a l p r o c e s s in g
C o n tr o l
th e o r y
Random Variables
Chapter 1
defined as follows.
Definition 1.1. The probability P (A) of an event A is the limit (Richard von Mises)
nA
P (A) = lim H(A, n) = lim (1.1)
n→∞ n→∞ n
where
H(A, n): relative frequency
nA : number of occurrences of A
n: number of trials.
NA
P (A) = (1.3)
N
where
NA : number of favorable outcomes
“In the absence of any prior knowledge, we must assume that the events Ai have
equal probabilities.”
Example 1.1. We roll two dice and we want to find the probability p that the sum
of the appearing numbers equals 7. To solve this problem, we must determine the
numbers NA and N .
(3, 4), (4, 3), (5, 2), (2, 5), (6, 1), (1, 6) → NA = 6.
N = 6 · 6 = 36.
6 1
P (sum equals 7) = = .
36 6
Given is a circle C of radius r. What is the probability p that the length l of a “ran-
√
domly selected” chord AB is greater than the length r 3 of the inscribed equilateral
triangle?
4 The Meaning of Probability
C
r 3
Solutions:
a) If the center M of the chord AB lies inside the circle C1 of radius r/2, then
√
l > r 3 (see Fig. 1.2). Hence, we conclude that
AC1 πr 2 /4 1
p= = 2
= .
AC πr 4
C
A
C1
M
r/2
r B
b) We now assume that the end A of the chord AB is fixed (see Fig. 1.3). Hence,
B is on the 120◦ arc DBE 2πr/3 1
p= = = .
B is on the circumference of C 2πr 3
B
D E
G
r/2 M
A B
r/2
Obviously three different solutions exist for the same problem. Note that these solu-
the ambiguities associated with the classical definition, and the need for a clear spec-
• B ⊂ A: “B is a subset of A”
• ζi ∈ A: ζi is an element of A
• ζi ∈
/ A: ζi is not an element of A
• S: space
Note that if a set consists of n elements, then the total number of its subsets equals
2n .
Example 2.1. Suppose that a coin is tossed twice. The resulting outcomes are the
Hence, the set S is given by S = {hh, ht, th, tt}. The set S has 24 = 16 subsets. For
Set Theory 7
example,
Set operations and set relations can be illustrated by the use of Venn diagrams.
Some basic set operations using Venn diagrams are shown in Fig. 2.1.
U
11111111111
00000000000
00000000000
11111111111 A+B
00000000000
11111111111 B
00000000000
11111111111
00000000000
11111111111
B
00000000000
11111111111 AB
00000000000
11111111111
A
00000000000
11111111111
A
Mutually exclusive sets: Two sets A and B are said to be mutually exclusive or
disjoint if
AB = ∅ .
8 The Axioms of Probability
A1 ∪ A2 ∪ · · · ∪ An = S, Ai Aj = ∅, i 6= j.
Thus,
U = [A1 , A2 , . . . , An ].
S
A1
A2
A3
A4
S1111111111111
0000000000000
0000000000000
1111111111111
0000000000000
1111111111111
0000000000000
1111111111111
0000000000000
1111111111111
0000000000000
1111111111111
A −
0000000000000
1111111111111
A
0000000000000
1111111111111
0000000000000
1111111111111
0000000000000
1111111111111
0000000000000
1111111111111
0000000000000
1111111111111
Fig. 2.3: The complement Ā of A.
Properties: A ∪ Ā = S
A Ā = ∅
 = A
S̄ = ∅
¯
∅=S
Probability Space 9
111111111111
000000000000
000000000000
111111111111
000000000000
111111111111
000000000000
111111111111
−−
000000000000
111111111111
AB
000000000000
111111111111
B
000000000000
111111111111
000000000000
111111111111
AB
000000000000
111111111111
A
000000000000
111111111111
000000000000
111111111111
000000000000
111111111111
000000000000
111111111111
Fig. 2.4: De Morgan’s law.
A mapping P , which assigns to each event A a real number P (A), is called the
I. 0 ≤ P (A) ≤ 1 (2.1)
These conditions are the axioms of the theory of probability. In the development of
the theory, all conclusions are based directly or indirectly on the axioms and only on
the axioms.
Properties:
P (∅) = 0 . (2.4)
⇒ P (∅) = 0 .
1 = P (S) = P (A + Ā)
= P (A) + P (Ā) .
Therefore,
I. If A ∈ F then Ā ∈ F (2.8)
Properties:
ii) A field contains the certain event and the impossible event, i.e.,
S ∈ F and ∅ ∈ F . (2.11)
From this it follows that all sets that can be written as unions or intersections of
finitely many sets in F are also in F. However, this is not necessarily the case for
P (AB)
P (A|B) = (2.12)
P (B)
Determine the conditional probability of the event {f2 } assuming that the event even
occurred.
12 The Axioms of Probability
With
we have
1 3
P (A) = and P (B) = .
6 6
Since AB = A, (2.12) yields
P (AB) P (A)
P (A|B) = =
P (B) P (B)
P {f2 } 1/6 1
P {f2 |even} = = = .
P {even} 3/6 3
This probability equals the relative frequency of the occurrence of the event {f2 } in
then
S
A2
A1 A3
B
A4
ally exclusive. Consequently BAi and BAj are mutually exclusive too, and thus
Consider the binary symmetric channel shown in Fig. 2.5. The transmission of binary
data (0, 1) over this channel results in an error if the receiver does not detect the
P(B0 |A 1 )
P(B 1 |A0 )
P(A 0 ) 0 0 P(B0 )
P(B0 |A 0 )
• 95% of all “1” are transmitted correctly, i.e., P (B1 |A1 ) = 0.95
Solution. In this example, the certain event S is given by S = {0, 1}2 , and the event
(A, B) ∈ S describes that the bit A was sent and the bit B was received. By using
14 The Axioms of Probability
P (B|Ai ) P (Ai )
P (Ai |B) = n . (2.15)
P
P (B|Ai ) · P (Ai )
i=1
The probabilities P (Ai ) and P (Ai |B) are often called a priori and a posteriori prob-
abilities, respectively.
When a MAP receiver is used, the detection of the received signal is based on the a
posteriori probability.
We continue with Example 2.4 and ask for the probability that the bit “1” was
transmitted on the condition that the bit “0” was received. With the events
Next, we ask for the probability that the bit “0” was transmitted on the condition
that the bit “0” was received. With the event A0 = {“0” was transmitted} and (2.15),
we find
Hence, P (A0 |B0 ) > P (A1 |B0 ) which implies that the MAP receiver decides that the
2.6 Independence
or when
The events A1 , A2 , and A3 are called (mutually) independent if they are independent
in pairs, i.e.,
and
It should be emphasized that three events might be independent in pairs but not
independent.
16 The Axioms of Probability
Example 2.6. If we toss a coin twice, then we generate the four outcomes hh, ht, th,
and tt.
A1 = {first coin shows head} = {hh, ht}
independent, because
A1 ∩ A2 ∩ A3 = ∅
and
1
P (A1 ∩ A2 ∩ A3 ) = 0 6= = P (A1 ) · P (A2 ) · P (A3 ) .
8
Example 2.7. Three switches connected in parallel operate independently (see Fig. 2.6).
Each switch remains closed with probability p. Find the probability of receiving an
S1
S2
Input Output
S3
Let R represent the event: “Input signal is received at the output.” Note that for the
Hence,
R = A1 ∪ A2 ∪ A3
and thus
= 1 − (1 − p)3
= 3p − 3p2 + p3 .
Chapter 3
Repeated Trials
If we perform both experiments, then the two experiments can be viewed as a single
f1 h, f2 h, . . . , f6 h, f1 t, f2 t, . . . , f6 t .
Let S1 and S2 be two spaces with elements ζ1 and ζ2 , respectively. The Cartesian
product
S = S1 × S2 (3.1)
of the spaces S1 and S2 defines the combined space whose elements are all pairs ζ1 ζ2 ,
S = S1 × S2 × · · · × Sn (3.2)
A1 × A2 × · · · × An
The outage probability p of a system describes the probability that the system fails.
Fig. 3.1 shows a system that is composed of four sub-systems each of which has an
System 1 System 2
System 4
System 3
A B
P {outage} = P {outage of A or B}
= [1 − (1 − p)2 ] p
= 2p2 − p3 . (3.5)
= p + 2p2 − 3p3 + p4 .
P (A) = p , P (Ā) = q , p + q = 1.
We repeat the experiment n times and we denote the resulting product space by
S n = S × S × . . . × S.
The probability pn (k) that the event A occurs exactly k times in any order is deter-
n
Note that the term k in (3.6) denotes the number of combinations that the event
Example 3.3. A coin with P {h} = p is tossed n times. The probability pn (k) that
with error probability p. The probability pn (k) that the sequence of n bits contains
Example 3.5. A pair of dice is tossed 5 times. Find the probability of obtaining at
Solution: The space S of a single roll of two dice consists of the 36 elements fi fj ,
Thus, p = P (A) = 1/36 and q = P (Ā) = 35/36. For the combined experiment with
X: S→R
s 7→ X(s) . (4.1)
S
X
s
X(s) x0 IR
1
All random variables will be denoted by uppercase letters, and fixed values of random variables
will be denoted by lowercase letters.
24 The Concept of a Random Variable
P {X = ∞} = P {X = −∞} = 0 .
Example 4.1. Consider the toss of one die. In our experiment, we assign to the
six outcomes si the numbers X(si ) = 10i. The mapping performed by the random
variable X is described by
X : S→R
si 7→ X(si ) = 10i .
Example 4.2. In the die experiment, we assign the number 1 to every even outcome
Thus,
The probability that the outcome equals 0 is P {X = 0} = 3/6 = 1/2, and the
Z = X + jY (4.2)
P {X = x} := P {s|X(s) = x}
P {X ≤ x} := P {s|X(s) ≤ x}
X is the function
FX (x) = P {X ≤ x} (4.3)
Example 4.3. In case of the die experiment of Example 4.1, the cumulative distri-
F X(x)
1/2
x
10 20 30 40 50 60
u = P {X ≤ xu } = FX (xu ) . (4.4)
26 The Concept of a Random Variable
Note that xu is the inverse of the function u = FX (xu ), i.e., xu = FX−1 (u). The
1. FX (x) is normalized:
if its distribution function FX (x) is constant expect for a finite number of jump
F X(x) F X(x)
1 1
pi
x xi x
(a) (b)
Note that if FX (x) is continuous, then lim FX (x − ε) = FX (x) for all x, and from
ε→0
(4.8) we get
P {X = x} = FX (x) − lim FX (x − ε) = 0 .
ε→0
called the probability density function (PDF) fX (x) of the random variable X. Thus,
dFX (x)
fX (x) = . (4.10)
dx
Example 4.4. With reference to the die experiment of Examples 4.1 and 4.3, the
Fig. 4.4.
28 The Concept of a Random Variable
fX (x)
1/6
10 20 30 40 50 60 x
because
Zx
FX (x) = fX (u) du . (4.12)
−∞
Z∞
fX (x) dx = 1 (4.13)
−∞
Zx2
P {x1 < X ≤ x2 } = FX (x2 ) − FX (x1 ) = fX (x) dx . (4.14)
x1
f X(x) F X(x)
1
b−a
x x
a b a b
(a) (b)
Fig. 4.5: (a) Uniform distribution and (b) corresponding distribution function.
To represent the uniform distribution in (4.15), we will use the notation X ∼ U [a, b].
variable with parameters µ and σ 2 if its density function is given by [see Fig. 4.6(a)]
1 2 2
fX (x) = √ e −(x−µ) /2σ . (4.17)
2πσ
30 The Concept of a Random Variable
Zx
1 2 2 △ x−µ
FX (x) = √ e −(y−µ) /2σ dy = G (4.18)
2πσ σ
−∞
where
Zx
△ 1 2
G(x) = √ e −y /2 dy (4.19)
2π
−∞
is known as the Gaussian error integral. To represent the Gaussian PDF in (4.17), we
often use the notation X ∼ N (µ, σ 2 ). The special case X ∼ N (0, 1) is referred to as
the standard normal random variable. Note that G(x) is the cumulative distribution
σ2 = 1 σ2 = 1
0.32 0.8
σ2 = 2
σ2 = 2 µµ==2
=2 σ2 = 3
0.24 σ2 = 3 0.6
0.16 0.4
µµ==2
=2
0.08 0.2
0 0
−4 −2 0 2 4 6 8 −4 −2 0 2 4 6 8
x x
(a) (b)
Fig. 4.6: (a) Normal density function and (b) the corresponding distribution
function.
λe −λx if x ≥ 0
fX (x) = (4.20)
0 otherwise .
The Probability Density Function 31
fX (x) 3 FX (x) 1
2.5 0.8
2 λ=1
λ=1 0.6 λ=2
1.5 λ=2 λ=3
λ=3 0.4
1
0.2
0.5
0 0
0 1 2 3 4 0 1 2 3 4
x x
(a) (b)
Fig. 4.7: (a) Exponential distribution and (b) the corresponding distribution
function.
(a) (b)
Fig. 4.8: (a) Rayleigh distribution and (b) the corresponding distribution
function.
where I0 (· ) is the 0th-order modified Bessel function of the first kind. Note that if
ρ → 0, then the Rice distribution tends to the Rayleigh distribution. The distribution
where
Z∞
x2 +α2
Q1 (α, β) = xe − 2 I0 (αx) dx (4.26)
β
0 0
0 1 2 3 4 5 6 0 1 2 3 4 5 6
x x
(a) (b)
Fig. 4.9: (a) Rice distribution and (b) the corresponding distribution
function.
to model the distribution of the signal amplitude of a randomly received signal in cases
2 m m 2m−1 −mx2 /Ω
Γ(m) Ω x e if x > 0
fX (x) = (4.27)
0 otherwise
able with parameters µ and σ 2 . Let us define a new random variable Y that is related
2
R∞
In (4.27), Γ(m) represents the gamma function defined as Γ(m) = xm−1 e −x dx. Some impor-
0
tant properties
√ and formulas for the gamma function are: Γ(m + 1) = mΓ(m) = m!, Γ(1) = 1, and
Γ(1/2) = π.
34 The Concept of a Random Variable
2 /2σ 2
√ 1 e −(ln y−µ) if y ≥ 0
fY (y) = 2πσy (4.28)
0 otherwise .
modelling the effect of shadowing of the transmitted signal due to large obstructions,
fX (1) = P {X = 1} = p (4.29a)
fX (0) = P {X = 0} = q = 1 − p . (4.29b)
n k n−k
fX (k) = P {X = k} = p q , p + q = 1, k = 0, 1, . . . , n . (4.30)
k
fX (k) FX (k) 1
0.15
n = 25 0.8
p = 1/2
0.1 0.6
n = 25
0.4 p = 1/2
0.05
0.2
0 0
0 5 10 15 20 25 0 5 10 15 20 25
k k
(a) (b)
Fig. 4.10: (a) Binomial density function and (b) the corresponding distribution
function.
λk
fX (k) = P {X = k} = e −λ , k = 0, 1, . . . , ∞. (4.31)
k!
0.2 0.8
λ==3
λ =3
0.15 0.6
0.1 λ==3
λ=3 0.4
0.05 0.2
0 0
0 5 10 15 20 25 0 5 10 15 20 25
k k
(a) (b)
Fig. 4.11: (a) Poisson density function and (b) the corresponding distribution
function.
Applications. The Poisson distribution is often used to describe the number of occur-
• The number of telephone calls at a telephone exchange over a fixed time interval
(0, t0 ).
The following theorem shows that the binomial distribution is closely connected to
Y = g(X) (5.1)
fY (y), which depends on the function g(x) as well as the PDF fX (x) of the random
variable X.
y g(x)
Input X Output Y
with fX (x) with f Y (y)
Fig. 5.1: A function g(x) maps the random variable X with PDF fX (x) into
another random variable Y with PDF fY (y).
The problem is to find the PDF fY (y) of Y for a given function g(x) and a known
theorem.
38 Functions of Random Variables
y = g(x), i.e.,
Proof. To avoid generalities, we assume that the equation y = g(x) has three roots
g(x)
g(x)
y+dy
dy
x1 x2 x3 x
f Y (y)
f X(x)
x
dx1 dx2 dx3
We know that
We have to find the set of values of x such that y < g(x) ≤ y + dy, then we can obtain
For the example shown in Fig. 5.2, this set contains the following three intervals:
where dx1 > 0, dx3 > 0 but dx2 < 0. From the foregoing it follows that
We can see from Fig. 5.2 that the terms on the right hand side are given by
Hence, when we have three roots for the equation y = g(x), we conclude that
Y = aX 2 + b
where a > 0 and b are constants, and X has a Gaussian distribution with parameters
Solution. If y ≤ b, then the function y = g(x) = ax2 + b has no real solution; hence
fY (y) = 0. If y > b, then the function y = g(x) = ax2 + b has two roots:
r r
y−b y−b
x1 = + x2 = − .
a a
p p
g′ (x1 ) = 2 a(y − b) g′ (x2 ) = −2 a(y − b)
fX (x1 ) fX (x2 )
fY (y) = ′
+ ′
|g (x1 )| |g (x2 )|
" r ! r !#
1 y−b y−b
= p fX + fX − .
2 a(y − b) a a
1 x2
fX (x) = √ e− 2
2π
we obtain y−b
1
√ e− 2a if y > b
fY (y) = 2πa(y−b)
0 otherwise .
Y = eX .
If y > 0, then the equation y = e x has a single solution x1 = ln y. Since the slope
fX (x1 ) fX (ln y)
fY (y) = ′
= , y > 0.
|g (x1 )| y
Expected Value 41
situations, however, we are interested in a few parameters that summarize the in-
formation provided by the PDF. Such parameters are the expected value and the
variance.
X is defined by
Z∞
E{X} = xfX (x) dx . (5.4)
−∞
Z∞
1 (x−µ)2
E{X} = √ xe− 2σ 2 dx .
2πσ
−∞
42 Functions of Random Variables
Suppose that the random variable X is discrete and takes the values xi with proba-
X
fX (x) = pi δ(x − xi ) . (5.5)
i
Inserting (5.5) into (5.4) and using the sifting property of the delta function
Z∞
x δ(x − xi ) dx = xi
−∞
results in
X
E{X} = xi p i , pi = P {X = xi } . (5.6)
i
p1 = p. Hence,
1
X
E{X} = xi p i , pi = P {X = xi }
i=0
= x0 p 0 + x1 p 1
= 0 · (1 − p) + 1 · p
= p.
Expected Value 43
Suppose that we are interested in finding the expected value of Y = g(X). The direct
approach involves first finding the PDF of Y , and then the evaluation of E{Y } using
(5.4). This, however, is not necessary. As the next theorem shows, E{Y } can be
expressed directly in terms of the function g(X) and the PDF fX (x) of X.
Theorem 5.2. Let X be a random variable and g(x) be a function. Then, the expected
Z∞
E{Y } = E{g(X)} = g(x)fX (x) dx . (5.7)
−∞
Proof. For simplicity, we suppose that g(x) is strictly increasing as shown in Fig. 5.3.
Let us divide the y-axis into intervals of length ∆y. We index the intervals with
the index k and we let yk be the value in the center of the kth interval. Then, the
ing equivalent event of width ∆x in the x-axis (see Fig. 5.3). Let xk be the value in
the kth interval such that g(xk ) = yk . Then, by using fY (yk )∆y = fX (xk )∆x, we
obtain
X
E{Y } ≈ yk fY (yk )∆y
k
X
= g(xk )fX (xk )∆x .
k
Hence, in the limit ∆y → 0 (∆x → 0), we obtain (5.7). It should be noted that this
equation is valid even if g(x) is not strictly increasing. In this case, for the derivation,
we have to take into account the fact that each y-interval has an equivalent event as
in Fig. 5.2.
44 Functions of Random Variables
g(x)
yk ∆y
∆x
x
xk
1
3 if 0 ≤ x ≤ 3
fX (x) =
0 otherwise .
Let Y be a random variable defined by Y = X 2 /2. Find the expected value E{Y } of
Y.
Z∞
E{Y } = g(x)fX (x) dx
−∞
Z3
x2 1
= · dx
2 3
0
3
1 1 1 3 3
= · · x = .
3 2 3 0 2
Variance 45
Let X and Y be two random variables, and a and b are two real constants. Then,
E{a} = a . (5.8a)
2. Linearity:
5.3 Variance
Apart from the mean, the variance is an additional parameter to describe the PDF of
is defined by
△
Var{X} = E{(X − µ)2 }
Z∞
= (x − µ)2 fX (x) dx > 0 . (5.9)
−∞
2
Var{X} = σX = E{(X − µ)2 }
= E{X 2 − 2Xµ + µ2 }
= E{X 2 } − 2µE{X} + µ2
= E{X 2 } − µ2
uniformly distributed in the interval (−q/2, q/2). Since the PDF of the quantization
1
noise Q is fQ (x) = q rect ( xq ), the variance of Q is given by1
2
Var{Q} = σQ = E{Q2 } − (E{Q})2
| {z }
=0
Z∞
= x2 fQ (x) dx
−∞
q
Zq/2 2
1 x3 1 q2
= x2 · dx = · = .
q 3 q 12
−q/2 − q2
1 (x−µ)2
fX (x) = √ e − 2σ2 .
2πσ
1
The rectangular function rect(x) is defined as
1 for |x| < 1/2
rect(x) = 1/2 for x = 1/2 (5.11)
0 for x > 1/2 .
Variance 47
From the fact that the area under fX (x) equals 1, i.e.,
Z∞ Z∞
1 (x−µ)2
fX (x) dx = √ e− 2σ 2 dx = 1
2πσ
−∞ −∞
it follows
Z∞
1 (x−µ)2
√ e− 2σ 2 dx = σ .
2π
−∞
variance of X is defined by
X
2
Var{X} = σX = pi (xi − µ)2 , pi = P {X = xi } . (5.12)
i
E{X} = 1 · p + 0 · q = p
E{X 2 } = 12 · p + 02 · q = p
we obtain
= p − p2 = p(1 − p) = pq .
Properties:
48 Functions of Random Variables
Var{a} = 0 . (5.13)
5.4 Moments
Besides the expected value and the variance, the following quantities are of interest
Z∞
Moments. mn = E{X n } = xn fX (x) dx (5.16)
−∞
Z∞
n
Central Moments. ηn = E{(X − µ) } = (x − µ)n fX (x) dx (5.17)
−∞
Absolute Moments. E{|X|n } E{|X − µ|n } (5.18)
Notice that the mean and variance can be seen to be defined in terms of the first two
Under certain conditions, the PDF fX (x) of a random variable is uniquely determined
if all its moments mn are known. The underlying theory is known in mathematics as
In general, the mean and variance do not provide enough information to determine
the PDF. However, the mean and variance of a random variable X do allow us to
σ2
P {|X − µ| ≥ ε} ≤ (5.19)
ε2
where ε > 0.
Z∞
2
σ = Var{X} = (x − µ)2 fX (x) dx
−∞
Z
≥ (x − µ)2 fX (x) dx
|x−µ|≥ε
Z
2
≥ ε fX (x) dx .
|x−µ|≥ε
Hence, (5.19) results because the last integral equals P {|X − µ| ≥ ε}.
µ
P {X ≥ ε} ≤ (5.20)
ε
where ε > 0.
50 Functions of Random Variables
The concept of the characteristic function will prove very useful when we consider
is defined by
The two expressions given above motivate the following two interpretations of the
characteristic function:
• In (5.21a), ΦX (ω) can be viewed as the expected value of the function e jωX .
• In (5.21b), ΦX (ω) is simply (the complex conjugate of) the Fourier transform
Properties:
3. The properties of characteristic functions are essentially the same as the proper-
ties of Fourier transforms. This follows from the fact that ΦX (ω) is the Fourier
transform of fX (x).
4. In particular, from the Fourier transform inversion formula, it follows that fX (x)
Example 5.9. Show that the characteristic function of an N (µ, σ 2 ) random variable
X equals
σ2 ω2
ΦX (ω) = e jµω− 2 . (5.25)
function of Y equals
Z∞
ΦY (ω) = fY (y)e jωy dy
−∞
Z∞
1 y2
= √ e− 2 e jωy dy .
2π
−∞
y2 1 ω2
− + jωy = − (y − jω)2 −
2 2 2
we obtain
Z∞
2 1 (y−jω)2 ω2
− ω2
ΦY (ω) = e √ e − 2 dy = e − 2 .
2π
−∞
| {z }
=1
function of X is given by
∞
X
ΦX (ω) = pi e jωxi pi = P {X = xi } . (5.26)
i=−∞
In this case, ΦX (ω) is the (complex conjugate of the) Fourier transform of the sequence
pi .
A further useful property of the characteristic function is its relation to the moments
of the random variable. As the following moment theorem states, the moments of X
1 dn
E{X n } = ΦX (ω) . (5.27)
j n dω n ω=0
(jωx)2
e jωx = 1 + jωx + + ··· .
2!
Substituting the power series in the definition of ΦX (ω) [see (5.21b)] gives
Z∞
(jωx)2
ΦX (ω) = fX (x) 1 + jωx + + ··· dx .
2!
−∞
(jω)2 (jω)n
ΦX (ω) = 1 + jωE{X} + E{X 2 } + · · · + E{X n } + · · · (5.28)
2! n!
finally obtain
dn
ΦX (ω)|ω=0 = j n E{X n }
dω n
d
E{X} = −j ΦX (ω) . (5.29)
dω ω=0
It should be observed that if the power series in (5.28) converges, then the charac-
teristic function ΦX (ω) and hence the PDF fX (x) are completely determined by the
The joint (cumulative) distribution function FXY (x, y) or, simply, F (x, y) of two
F (x, y) = P {X ≤ x, Y ≤ y} (6.1)
where x and y are two arbitrary real numbers, and D is the semi-infinite rectangle
Fig. 6.1: The joint cumulative distribution function is defined as the probability
that (X, Y ) is in the semi-infinite rectangle determined by the points x
and y.
Joint Distribution and Joint Density 55
Properties:
2. It is certain that X and Y will assume values less than infinity. Therefore,
F (∞, ∞) = 1 . (6.3)
y2
D
y1
x1 x2
Fig. 6.2: The joint cumulative distribution function can be used to determine
the probability that (X, Y ) is in the rectangle D.
56 Two Random Variables
The joint probability density function fXY (x, y) or, simply, f (x, y) of X and Y is by
∂ 2 F (x, y)
f (x, y) = . (6.7)
∂x∂y
Properties:
3. The probability that the point (X, Y ) is in a region D equals the integral of
f (x, y) in D, i.e.,
Z Z
P {(X, Y ) ∈ D} = f (x, y) dx dy (6.10)
D
where the event {(X, Y ) ∈ D} consists of all outcomes ζ such that the point
(X(ζ), Y (ζ)) is in D.
and FY (y) are obtained from the joint distribution function F (x, y) as follows
Marginal density. The marginal probability density function fX (x) and fY (y) are
obtained from the joint probability density function f (x, y) by integrating out the
Joint Distribution and Joint Density 57
Z∞ Z∞
fX (x) = f (x, y) dy fY (y) = f (x, y) dx . (6.12)
−∞ −∞
Two random variables X and Y are called (statistically) independent if the events
P {X ∈ A, Y ∈ B} = P {X ∈ A} P {Y ∈ B} (6.13)
where A and B are two arbitrary sets on the x and y axes, respectively. Let us apply
this to the events {X ≤ x} and {Y ≤ y}, then the independence of X and Y implies
that
and, thus,
Therefore, if X and Y are independent random variables, then the joint PDF is equal
to the product of the marginal PDFs. It can also be shown that, if the joint PDF of
X and Y equals the product of the marginal PDFs, then X and Y are independent.
Hence,
Example 6.1. Let X ∼ N (0, σ 2 ) and Y ∼ N (0, σ 2 ). Suppose that X and Y are
1 x2
fX (x) = √ e − 2σ2
2πσ
and
1 y2
fY (y) = √ e − 2σ2
2πσ
in (6.15). Hence,
2 2
1 − x +y
f (x, y) = fX (x) fY (y) = e 2σ 2 . (6.17)
2πσ 2
and Y , i.e.,
Z = g(X, Y ) (6.18)
where it is supposed that the joint PDF fXY (x, y) is given. The problem is then to
where DZ represents the region in which the inequality g(x, y) ≤ z is satisfied (see
Fig. 6.3). The PDF fZ (z) of Z is then found by taking the derivative of FZ (z), i.e.,
dFZ (z)
fZ (z) = . (6.20)
dz
One Function of Two Random Variables 59
Dz
g(x, y) <− z
In practice, we are quite often interested in functions g(X, Y ), which are of the type
max(X, Y ) X −Y
min(X, Y ) X/Y
g(X, Y )
√
X2 + Y 2 XY
Example 6.2. Let the random variables X ∼ N (0, σ 2 ) and Y ∼ N (0, σ 2 ) be inde-
√
pendent and Z = X 2 + Y 2 . Determine fZ (z).
Solution. We have
p ZZ
FZ (z) = P {Z ≤ z} = P { X 2 + Y 2 ≤ z} = √ fXY (x, y) dx dy
DZ = x2 +y 2 ≤z
p
where DZ = x2 + y 2 ≤ z represents the area of a circle with radius z as shown in
Fig. 6.5.
60 Two Random Variables
Dz z
p
Fig. 6.5: The region DZ = x2 + y 2 ≤ z.
The transform of the Cartesian coordinates (x, y) into polar coordinates (r, θ) by
means of
x = r cos θ y = r sin θ dx dy = r dr dθ
gives
Z2π Zz
FZ (z) = fXY (r cos θ, r sin θ)rdrdθ .
0 0
Since the joint PDF fXY (x, y) of two zero-mean independent Gaussian random vari-
Hence, it follows
Z2π Zz
1 r2
FZ (z) = e − 2σ2 r dr dθ
2πσ 2
0 0
Zz
1 r2
= re − 2σ2 dr .
σ2
0
dFZ (z) z z2
fZ (z) = = 2 e − 2σ2 , z ≥ 0. (6.21)
dz σ
Note that the PDF in (6.21) is the Rayleigh density (4.22). Thus, if W = X + jY ,
where X and Y are real independent normal random variables with zero mean and
One Function of Two Random Variables 61
√
equal variance, then the random variable Z = |W | = X 2 + Y 2 has a Rayleigh
density.
Z =X +Y . (6.22)
R∞
fZ (z) = fXY (z − y, y) dy . (6.23)
−∞
Proof. In the first step, we determine the cumulative distribution function FZ (z) of
Z. Therefore, we have to integrate the joint PDF fXY (x, y) of X and Y over the
region DZ of the xy-plane, where DZ = x+y ≤ z is the shaded area shown in Fig. 6.6.
FZ (z) = P {Z ≤ z} = P {X + Y ≤ z}
Z∞ Z z−y
x+
y=
z
x=z− y
x + y <− z
d
fZ (z) = FZ (z) .
dz
In this context, it is useful to apply the differentiation rule due to Leibnitz, which
results in
Zb(z)
d db(z) da(z) ∂f (x, z)
fZ (z) = FZ (z) = f (b(z), z) − f (a(z), z) + dx . (6.26)
dz dz dz ∂z
a(z)
Z∞
fZ (z) = fX (z − y)fY (y) dy (6.28)
−∞
which represents the convolution of fX (z) with fY (z). This result leads to the fol-
lowing conclusion: If two random variables are independent, then the density of their
X
Z= . (6.29)
Y
Z∞
fZ (z) = |y|fXY (yz, y) dy . (6.30)
−∞
Y < 0. Hence, the event {X/Y ≤ z} in (6.31) needs to be conditioned by the event
FZ (z) = P {X/Y ≤ z}
= P {X/Y ≤ z ∩ (A ∪ Ā)}
= P {X ≤ Y z, Y > 0} + P {X ≥ Y z, Y < 0} .
The areas corresponding to the first term and the second term are shown in Fig. 6.7(a)
y y
x
x/y = z
yz
yz
=
x
=
x
x/y = z
(a) (b)
Fig. 6.7: (a) The region x ≤ yz (y > 0) and (b) the region x ≥ yz (y < 0).
Note that if X and Y are independent random variables, then fXY (x, y) = fX (x)fY (y)
Z∞
fZ (z) = |y|fX (yz)fY (y) dy . (6.33)
−∞
FZ (z) = P {Z ≤ z} = P {max(X, Y ) ≤ z}
= P {(X ≤ z, X > Y ) ∪ (Y ≤ z, X ≤ Y )}
= P {X ≤ z, X > Y } + P {Y ≤ z, X ≤ Y } (6.36)
where we have used the fact that {X > Y } and {X ≤ Y } are mutually exclusive sets
that form a partition. Figure 6.8(a) and Fig. 6.8(b) show the regions satisfying the
corresponding inequalities in each term in (6.36). Figure 6.8(c) illustrates the total
y y
x=z
y
=
y
x≤y
=
x
x≤z (z, z) y=z
(z, z)
x x
x>y y≤z
(a) (b)
y
(z, z)
x ≤ z, y ≤ z
(c)
Fig. 6.8: The regions (a) x ≤ z (x > y) and (b) y ≤ z (x ≤ y), as well as the total
region (c) x ≤ z (y ≤ z).
Note that if X and Y are independent, then
and hence
dFZ (z)
fZ (z) = = FX (z) · fY (z) + fX (z) · FY (z) . (6.39)
dz
Let X and Y be two random variables with joint PDF fXY (x, y). Furthermore, let
g(x, y) and h(x, y) be two functions, and suppose that Z and W are two new random
Two Functions of Two Random Variables 67
variables defined as
Z = g(X, Y ) (6.40)
W = h(X, Y ) . (6.41)
We consider the problem of finding the joint PDF fZW (z, w) of Z and W . Obviously,
with fZW (z, w) in hand, one can easily determine the marginal PDFs fZ (z) and
fW (w).
The procedure for determining fZW (z, w) is essentially the same as that described
in the previous section. The joint cumulative distribution function FZW (z, w) of Z
and W at the point (z, w) equals the probability that (X, Y ) is in the region of the
= P {g(X, Y ) ≤ z, h(X, Y ) ≤ w}
ZZ
= P {(X, Y ) ∈ DZW } = fXY (x, y) dx dy (6.42)
(x,y)∈DZW
with DZW representing the region where the inequalities g(x, y) ≤ z and h(x, y) ≤ w
D ZW
D ZW
Theorem 6.4. Suppose that X and Y are random variables, which are described
68 Two Random Variables
by the joint PDF fXY (x, y). Furthermore, we suppose that g(x, y) and h(x, y) are
such that
is given by
n
X fXY (xi , yi )
fZW (z, w) = (6.44)
|J(xi , yi )|
i=1
Remark:
1
|J(xi , yi )| = (6.46)
|J(z, w)|
where
∂g1 ∂g1
J(z, w) = ∂z ∂w . (6.47)
∂h1 ∂h1
∂z ∂w
In (6.47), the functions g1 and h1 denote the inverse transformation in (6.43), so that
xi = g1 (z, w) and yi = h1 (z, w). Hence, due to (6.46), we can replace 1/|J(xi , yi )|
Example 6.3. Let X and Y be two zero-mean independent Gaussian random vari-
Solution. With reference to Example 6.1, the joint PDF fXY (x, y) of X and Y equals
2 2
1 − x +y
fXY (x, y) = e 2σ 2 . (6.49)
2πσ 2
Here, the functions g(x, y) and h(x, y) are given by
p y
r = g(x, y) = x2 + y 2 and θ = h(x, y) = tan−1 (6.50)
x
respectively, where 0 < r < ∞ and |θ| < π. In this case, we have only a single solution
= r (6.52)
so that
Thus,
Zπ
r − r22
fR (r) = fRΘ (r, θ) dθ = e 2σ 0<r<∞ (6.56)
σ2
−π
and
Z∞
1
fΘ (θ) = fRΘ (r, θ) dr = |θ| < π . (6.57)
2π
0
Note that fR (r) in (6.56) represents the density of a Rayleigh random variable with
The above results can be summarized in the following statement: If X ∼ N (0, σ 2 ) and
Y ∼ N (0, σ 2 ) are independent, then the magnitude R and phase Θ of the complex
X + jY = R e jΘ (6.59)
Z = g(X, Y ) . (6.60)
To determine the PDF fZ (z) of Z by using the relation in (6.44), we define an auxiliary
variable
W =X (or W = Y ) . (6.61)
Then, the PDF fZ (z) can be obtained from fZW (z, w) by integration over w.
Joint Moments 71
z = g(x, y) = xy w = h(x, y) = x
x1 = w y1 = z/w .
The Jacobian is
∂g ∂g
∂x ∂y y x
J(x1 , y1 ) = ∂h ∂h = = −x1 = −w .
1 0 x=x1
∂x ∂y x=x1 y=y1
y=y1
In the special case that the random variables X and Y are independent, it follows
Z∞ z
1
fZ (z) = fX (w)fY dw (6.63)
|w| w
−∞
In the following, we suppose that X and Y are random variables and g(x, y) is a
function.
Expected Value. The expected value of the random variable Z = g(X, Y ) is defined
by
Z∞
E{Z} = zfZ (z) dz (6.64)
−∞
72 Two Random Variables
or, alternatively,
Z∞ Z∞
E{Z} = E{g(X, Y )} = g(x, y)f (x, y) dx dy (6.65)
−∞ −∞
the number
= E{XY − XµY − Y µX + µX µY }
= E{XY } − µX µY
CXY
ρXY = . (6.68)
σX σY
covariance CXY is 0. This can be phrased in the following three equivalent forms:
E{X Y } = 0 . (6.71)
To indicate that the random variables X and Y are orthogonal, one often uses the
notation X⊥Y .
Theorem 6.5. If two random variables X and Y are independent, that is, if
It can be shown, however, that if two random variables X and Y are uncorrelated,
then they are not necessarily independent. The above results can be phrased in the
following statement:
to independence.
Fundamental results can also be obtained for the expected value and the variance of
µZ = E{Z} = E{X + Y }
= E{X} + E{Y }
= µX + µY . (6.75)
Thus, the expected value of the sum of two random variables equals the sum of their
σZ2 = σX
2
+ 2ρXY σX σY + σY2 . (6.76)
This leads to the conclusion that if two random variables are uncorrelated, i.e., if
ρXY = 0, then the variance of their sum equals the sum of their variances, that is,
σZ2 = σX
2
+ σY2 . (6.77)
Proof. The relation (6.76) can be proved by using (6.66) and (6.68) as follows:
= E{[(X − µX ) + (Y − µY )]2 }
Z∞ Z∞
k r
mkr = E{X Y } = xk y r fXY (x, y) dx dy (6.78)
−∞ −∞
Similar to (5.17) and (5.18), the joint central and absolute moments of X and Y can
be defined.
X and Y are called jointly normal (or jointly Gaussian) if their joint density is given
by
(x−µX )2 (y−µY )2
1 (x−µX )(y−µY )
− −2ρXY +
2(1−ρ2 ) σ2 σX σY σ2
fXY (x, y) = A e XY X Y (6.79)
where
1
A= q (6.80)
2πσX σY 1 − ρ2XY
and ρXY denotes the correlation coefficient. The function fXY (x, y) in (6.79) is also
2 , σ2 , ρ
denoted by N (µX , µY , σX Y XY ). Without proof, we mention that the marginal
(x−µX )2 (y−µY )2
1 −
2σ 2
1 −
2σ 2
fX (x) = √ e X , fY (y) = √ e Y (6.81)
2πσX 2πσY
From the results above, it follows that if two random variables are jointly normal, then
they are also marginally normal. Note that fXY (x, y) can be written as fXY (x, y) =
Definition 6.4. The joint characteristic function of the random variables X and Y
is defined as
By analogy to the one-dimensional case (see Section 5.6), the two expressions given
• In (6.82a), ΦXY (ω1 , ω2 ) can be viewed as the expected value of the two-dimensional
• In (6.82b), ΦXY (ω1 , ω2 ) is simply (the complex conjugate of) the two-dimensional
Fourier transform of the joint PDF fXY (x, y) of X and Y , i.e., ΦXY (ω1 , ω2 ) =
Properties:
1. The marginal characteristic functions can be obtained from the joint character-
istic function:
2. If Z = aX + bY , then
3. The inversion formula for the two-dimensional Fourier transform implies that
Independence. If the two random variables X and Y are independent, then the
i.e.,
Proof.
where the third equality follows from the fact that E{g1 (X) g2 (Y )} = E{g1 (X)} E{g2 (Y )}.
Proof.
Remember that the density fZ (z) of Z = X + Y equals the convolution of fX (x) and
fY (y) if X and Y are independent [see (6.28)]. From this and (6.87) it follows that
the characteristic function of the convolution of two densities equals the product of
their characteristic functions. This result is often used when dealing with sums of
random variables.
78 Two Random Variables
The definition of the conditional probability in Section 2.3 provides the formula for
computing the probability that the event {Y ≤ y} occurs given that we know the
P {Y ≤ y, X = x}
FY (y | X = x) = P {Y ≤ y | X = x} = . (6.88)
P {X = x}
Ry
fXY (x, y ′ ) dy ′
−∞
FY (y | X = x) = . (6.89)
fX (x)
P {Y ≤ y, x < X ≤ x + ∆x}
FY (y | x < X ≤ x + ∆x) =
P {x < X ≤ x + ∆x}
y
R x+∆x
R
fXY (x′ , y ′ ) dx′ dy ′
−∞ x
= x+∆x
R
fX (x′ ) dx′
x
Ry
fXY (x, y ′ ) dy ′ ∆x
−∞
≃ . (6.91)
fX (x)∆x
Conditional Probability 79
d fXY (x, y)
fY (y | X = x) = FY (y | X = x) = . (6.92)
dy fX (x)
Note that if the random variables X and Y are independent, then fXY (x, y) =
Total Probability. Inserting fXY (x, y) = fY (y | x)fX (x) into the marginal PDF
Z∞
fY (y) = fXY (x, y) dx
−∞
gives the PDF version of the total probability introduced in Section 2.4
Z∞
fY (y) = fY (y | x)fX (x) dx . (6.95)
−∞
As this result shows, to remove the condition X = x from the conditional density
that
fY (y | x)fX (x)
fX (x | y) = . (6.96)
fY (y)
Inserting (6.95) into (6.96), we obtain Bayes’ theorem (see Section 2.5) for densities
fY (y | x)fX (x)
fX (x | y) = R ∞ . (6.97)
−∞ fY (y | x)fX (x) dx
Chapter 7
Such problems can often be reduced to the problem of finding the distribution of
n
X
Y = Xi . (7.1)
i=1
In this section, we study the expected value and variance of Y , as well as the PDF of
µY = E{Y } = E{X1 + X2 + · · · + Xn }
= µ X1 + µ X2 + · · · + µ Xn . (7.2)
variables is equal to the sum of the expected values of the individual random variables.
2
Let Y = X1 + X2 + · · · + Xn be a sum of n random variables, each with variance σX i
(i = 1, 2, . . . , n), then
for i 6= j and
n
X
σY2 = 2
σX i
. (7.4)
i=1
Sums of Random Variables 83
Y = X1 + X2 + · · · + Xn (7.5)
that their densities fXi (x) are known for all i = 1, 2, . . . , n. To solve this problem,
where ΦXi (ω) = F ∗ {fXi (x)}. Equation (7.6) states that the characteristic function of
Y equals the product of the individual characteristic functions of the random variables
Note that by applying the convolution theorem for Fourier transforms, we obtain
◦—•
◦—•
◦—•
where ⋆ denotes the convolution operator. Hence, the density fY (y) of the sum
of Y = X1 + X2 + · · · + Xn .
84 Sums of Random Variables and Limit Theorems
jωµXi −ω 2 σX
2 /2
ΦXi (ω) = e i . (7.10)
n
Y
ΦY (ω) = ΦXi (ω) (7.11)
i=1
Yn
jωµXi −ω 2 σX
2 /2
= e i
i=1
jω n
P 2
Pn 2
i=1 µXi −ω ( i=1 σX )/2
= e i
2 σ2
= e jωµY −ω Y /2
(7.12)
Pn Pn
where µY = i=1 µXi and σY2 = 2
i=1 σXi . Note that (7.12) is the characteristic
function of a Gaussian random variable with expected value µY and variance σY2 .
2 = Var{X} of
In practice, the expected mean µX = E{X} and the variance σX
random variables are often unknown. A problem is then to estimate the mean and
which the mean and variance are unknown, then the sample mean µ̄X and the sample
The Sample Mean and the Sample Variance 85
2 are defined as
variance σ̄X
n
1X
µ̄X = Xi (7.13)
n
i=1
and
n
2 1 X
σ̄X = (Xi − µ̄X )2 (7.14)
n−1
i=1
respectively.
The sample mean µ̄X can be considered as an estimator for the mean µX = E{X}
2 represents an estimator for the variance σ 2 = Var {X}.
and the sample variance σ̄X X
and
2 2
(ii) E{σ̄X } = σX . (7.16)
Proof. Since it has been assumed that the random variables Xi are i.i.d., they must
2 .
have the same mean E{Xi } = µX and the same variance Var{Xi } = σX
(i) The proof of (7.15) results from the linearity of the expected value as follows
( n ) n n
1X 1X 1X
E{µ̄X } = E Xi = E{Xi } = µX = µX . (7.17)
n n n
i=1 i=1 i=1
Hence, the expected value of µ̄X is equal to the expected value of X.
(ii) To prove (7.16), we observe first that the variance of µ̄X is given by
( n ) n n
1X 1 X 1 X 2 σ2
Var{µ̄X } = Var Xi = 2 Var{Xi } = 2 σX = X . (7.18)
n n n n
i=1 i=1 i=1
We note that the variance of µ̄X decreases inversely with the number of samples. We
where the second equation holds because the random variables Xi and Xj (j 6= i) are
Remarks:
1. Note that the sample mean µ̄X is an estimator for the mean µX of X. Anal-
2 can be considered as an estimator for the
ogously, the sample variance σ̄X
2 of X.
variance σX
2. Equation (7.15) shows that the statistical average of the sample mean µ̄X is
equal to E{X} = µX . For this reason, we say that the sample mean µ̄X is
n
2 1X
σ̄X = (Xi − µX )2 . (7.19)
n
i=1
variables. Then,
for ε > 0, where µ̄X denotes the sample mean of the sequence X1 , X2 , . . . , Xn . We
say that µ̄X tends to µX in probability. This is also called stochastic convergence.
Var{µ̄X }
P {|µ̄X − E{µ̄X }| ≥ ε} ≤ . (7.21)
ε2
2
σX
P {|µ̄X − µX | ≥ ε} ≤ . (7.22)
nε2
The weak law of large numbers can be phrased as follows: The probability that the
sample mean µ̄X differs from the true mean µX by more than ε (ε > 0) tends to zero
as n approaches infinity.
It can be shown that µ̄X tends to µX not only in probability, but also with probability
variables. Then,
where µ̄X denotes the sample mean of the sequence X1 , X2 , . . . , Xn . We say that µ̄X
The strong law of large numbers can be phrased as follows: With probability 1, we
expect that each particular sequence of sample mean calculations tends to the true
mean µX = E{X} and stays there. This statement is illustrated in Fig. 7.1, where,
It should be noted that the strong low of large numbers predicts the convergence of
sample means µ̄X to the expected value µX . However, there are still gaps between the
mathematical theory and the real world, because we can never actually carry out an
µ̄X
µX
Fig. 7.1: Convergence of the sample mean µ̄X to the expected value µX .
The central limit theorem is extremely useful concerning the density of a sum of
random variables in the limit as the number of terms in the sum approaches infinity.
Let
Y = X1 + X2 + · · · + Xn (7.25)
be a sum of n i.i.d. random variables, each having a finite mean E{Xi } = µ and a
Y − nµ
Z= √ (7.26)
σ n
tends to a Gaussian random variable with zero mean and unit variance as n tends to
infinity. Hence,
Z ∼ N (0, 1) as n → ∞. (7.27)
90 Sums of Random Variables and Limit Theorems
ΦZ (ω) = E e jωZ
( Pn √ )
jω (Xi −µ)/(σ n)
= E e i=1
( n )
Y √
jω(Xi −µ)/(σ n)
= E e
i=1
n
Y n √ o
= E e jω(Xi −µ)/(σ n)
i=1
h n √ oin
= E e jω(Xi −µ)/(σ n) . (7.28)
It should be mentioned that the second last equality follows from the independence
of the random variables Xi , and the last equality follows from the fact that the Xi ’s
where Rn (ω)/n denotes the remainder. We note that the term Rn (ω) approaches zero
the form
n
ω 2 Rn (ω)
ΦZ (ω) = 1 − + .
2n n
1 1
ln(1 + x) = x − x2 + x3 − · · · .
2 3
ω2
lim ln ΦZ (ω) = −
n→∞ 2
or, equivalently,
ω2
lim ΦZ (ω) = e − 2 . (7.31)
n→∞
This is just the characteristic function of a Gaussian random variable with zero mean
The central limit theorem of Lindeberg and Lévy allows the following statement:
Although we have assumed that the random variables Xi in the sum Y are identically
distributed, the assumption can be relaxed provided that additional restrictions are
imposed on the properties of the Xi ’s. In the following variation of the central limit
favor of a condition on the third absolute moment of the random variables in the sum
of Y .
Let
92 Sums of Random Variables and Limit Theorems
Y = X1 + X2 + · · · + Xn
be a sum of n independent random variables, each with mean µi and variance σi2 > 0.
Furthermore, let
v v
u n u n
uX uX
3
Bn = t E{|Xi − µi |3 } Cn = t σi2 . (7.32)
i=1 i=1
If the condition
Bn
lim =0 (7.33)
n→∞ Cn
Z ∼ N (0, 1) as n → ∞. (7.35)
Note that Lyapunov’s central limit theorem is more general than the central limit
theorem of Lindeberg and Lévy, since the random variables Xi need not to be iden-
tically distributed.
The nature of the central limit theorem approximation depends on the form of the
densities fXi (x) of the random variables Xi . If the Xi ’s are i.i.d. random variables,
then values of n in the range from 10 to 20 are adequate for most applications. In
fact, if the densities fXi (x) are smooth, then values of n as low as 5 can be used.
Fig. 7.2 illustrates the central limit theorem. This figure shows the density fY (y) of
1 n=1
n=2
n=3
n=4
0.8
f Y (y)
0.6
0.4
0.2
X = X1 + X2 + · · · + Xn (7.36)
In Example 5.8, we have seen that the mean and the variance of a Bernoulli random
E{Xi } = p (7.38)
Var{Xi } = p (1 − p) . (7.39)
Since the Xi ’s in (7.36) are i.i.d. random variables, the mean and the variance of the
E{X} = np (7.40)
Var{X} = np (1 − p) . (7.41)
94 Sums of Random Variables and Limit Theorems
Hence, the evaluation of (7.42) is difficult for large n. A particular important applica-
tion of the central limit theorem is in the approximation of binomial distributions. The
central limit theorem states that a binomial random variable X [see (7.36)] tends to a
as n approaches infinity.
Let X be a binomial random variable with mean E{X} = np and variance Var{X} =
np (1 − p), then
n k
fX (k) = P {X = k} = p (1 − p)n−k
k
1 −
(k−np)2 (7.44)
= p e 2np (1−p) as n → ∞ .
2πnp (1 − p)
by [see (5.26)]
∞
X
ΦX (ω) = pk e jωxk , pk = P {X = xk } . (7.45)
k=−∞
random variable to be
n
X n
ΦX (ω) = pk q n−k e jωk
k
k=0
n
= p e jω + q . (7.47)
X − np
Z= √ . (7.48)
npq
Using the property (5.23), gives the characteristic function of Z in the form
jωZ
√
−jnpω/ npq ω
ΦZ (ω) = E{e }=e ΦX √
npq
√ √ n
= e −jnpω/ npq p e jω/ npq + q
√ √ n
= p e jqω/ npq + q e −jpω/ npq
( " ∞ #
jqω q2ω2 X 1 jqω k
= p 1+ √ − + √
npq 2npq k! npq
k=3
" ∞ #)n
jpω p2 ω 2 X 1 −jpω k
+ q 1− √ − + √
npq 2npq k! npq
k=3
n
ω2
= 1− [1 + Rn (ω)] (7.49)
2n
ω2
lim ΦZ (ω) = e − 2 (7.51)
n→∞
96 Sums of Random Variables and Limit Theorems
which is equivalent to
1 z2
lim fZ (z) = √ e − 2 (7.52)
n→∞ 2π
where fZ (z) denotes the density of Z. Hence, the distribution of Z tends to the
standard normal distribution, and from (7.48) we conclude that X ∼ N (np, npq).
From Theorem 7.5 it follows that if n is large, then the binomial distribution can be
Example 7.2. A fair coin is tossed 1000 times. Find the probability that the number
of heads is 500.
The next example indicates that the approximation (7.53) is satisfactory even for
moderate values of n.
Example 7.3. A fair coin is tossed 10 times. Find the probability that the number
of heads equals 5.
1 2 /2np(1−p)
P {X = 5} ≃ p e −(k−np)
2πnp(1 − p)
1
= √ = 0.252 .
5π
Form (7.53), we can find the following useful approximation for the probability that
k2 k2
X X n k
P {k1 ≤ X ≤ k2 } = fX (k) = p (1 − p)n−k
k
k=k1 k=k1
Zk2 2
1 (x−np)
− 2np
≃ p e (1−p) dx . (7.54)
2πnp(1 − p)
k1
that the integral in (7.54) does not have a closed-form solution. In electrical engi-
neering it is customary to work with the so-called error function1 [see Fig. 7.3]
Zx
1 2 /2
erf(x) = √ e −y dy , x>0 (7.55)
2π
0
which is listed in many books in form of looking up tables. Note that the symmetry
of the integrand in (7.55) implies that the error function is an odd function, i.e.,
Zx
2 2
1
In other references, the error function is often defined as erf(x) = √ e −y dy .
π
0
98 Sums of Random Variables and Limit Theorems
0.5
0.45
0.4
Rx 2 /2
← erf(x) = √1 e −y dy
0.35 2π
0
0.3
0.25
2 /2
0.2
← √1 e −x
2π
0.15
0.1
0.05
0
0 0.5 1 1.5 2 2.5 3 3.5 4
x
Using (7.55) in (7.54), we finally obtain the desired approximation in the form
! !
k − np k − np
P {k1 ≤ X ≤ k2 } ≃ erf p 2 − erf p 1 . (7.57)
np(1 − p) np(1 − p)
Example 7.4. A fair coin is tossed 10 000 times. What is the probability that the
Since,
k − np 100 k1 − np 100
p 2 = = 2 and p =− = −2
np (1 − p) 50 np (1 − p) 50
we conclude from (7.57) and (7.56) that the unknown probability is approximately
given by
= 2 erf(2)
= 0.9545.
Gaussian Approximation for Binomial Probabilities 99
An exchange is connected with 180 telephones. For each telephone, the probability
that a call will occur in a four-hour interval is equal to p = 1/3. What is the
probability that in a four-hour interval the number of calls is between 50 and 70?
trials with p = 1/3. Using (7.54) and (7.57), the probability that the number of calls
k2 70 k 180−k
X n k X 180 1 2
p (1 − p)n−k =
k k 3 3
k=k1 k=50
! !
k2 − np k1 − np
≃ erf p − erf p
np (1 − p) np (1 − p
1 1
70 − 180 · 3 50 − 180 · 3
= erf q − erf q
1 2 1 2
180 · 3 · 3 180 · 3 · 3
√ √
= erf( 2.5) − erf(− 2.5)
√
= 2 erf( 2.5)
≃ 0.8862 .
A binary data stream is transmitted over a binary symmetric channel with a bit error
probability of p = 0.1. The data stream is partitioned into packages of data blocks
of length n. A package error occurs if a block of n bits contains one or more than
one bit errors. To protect the data transmission from errors, block codes are used,
which can correct up to t errors in a block of n bits. For some important block codes,
the typical values for n and t are listed in Table 7.1. What is the probability for a
package error when the block codes listed in Table 7.1 are used?
100 Sums of Random Variables and Limit Theorems
Block code n t
Hamming 31 1
Reed-Muller 64 15
BCH 127 14
Solution. A package error occurs if the number of bit errors in a data block is larger
31
P 31 k
• Exact solution: P{Package error} = k p (1 − p)31−k = 0.8304
k=2
64
P 64 k
• Exact solution: P{Package error} = k p (1 − p)64−k = 4.467 · 10−4
k=16
127
P
127
• Exact solution: P{Package error} = k p k (1 − p)127−k = 0.2875
k=15
Y = X1 · X2 · · · Xn , Xi > 0 . (7.58)
where
n
X n
X
2
µ= E{ln Xi } σ = Var{ln Xi } . (7.60)
i=1 i=1
Z = ln Y = ln X1 + ln X2 + · · · + ln Xn
which is the sum of the random variables ln Xi . From the central limit theorem,
it follows that for large n, this random variable is nearly normal with mean µ and
variance σ 2 . Since
Y = eZ
we conclude from (4.28) that Y has a lognormal density. This theorem holds if
the random variables ln Xi satisfy the conditions for the validity of the central limit
theorem.
Part II
Stochastic Processes
Chapter 8
Stochastic Processes
S a number X(ζ).
where the domain of t is the set R of real numbers and the domain of ζ is the set S
Note that we use the notation X(t) to represent a stochastic process omitting its
processes X(t):
104 Stochastic Processes
1. If t and ζ are variables, then X(t) is a stochastic process, i.e., a family (or an
ensemble) of functions X(t, ζ).
Single number
Fig. 8.1. Relationship between stochastic processes, sample functions, random
variables, and numbers.
of real numbers R. If the domain of t is the set of integer numbers Z, then X(t) is a
discrete-time process.
We say that X(t) is a discrete-state process if its values are countable. Otherwise,
where X(t) and Y (t) are real stochastic processes. A vector process X (t) (or n-
the index i refers to the number of dots showing on the top face. For each outcome of
the experiment, let us arbitrarily assign the following functions of time t (see Fig. 8.2):
X(t), i.e.,
X(t, ζ4 )
X(t, ζ 3 )
Sample
4
space X(t, ζ 6 )
S 2 Stochastic
process
0 t X(t)
2 4 6 8 10
−2
X(t, ζ 5 )
−4
X(t, ζ 2 )
X(t, ζ 1 )
a) Let A be a random variable uniformly distributed in the interval [−1, 1]. Define
where f and θ are constants. The stochastic process X(t) = {X(t, ζ1 ), X(t, ζ2 ), . . . }
b) Let θ be a random variable uniformly distributed in the interval (−π, π]. Define
where A and f are constants. The stochastic process X(t) = {X(t, ζ1 ), X(t, ζ2 ), . . . }
c) Let f be a random variable uniformly distributed in the interval [−2, 2]. Define
where A and θ are constants. The stochastic process X(t) = {X(t, ζ1 ), X(t, ζ2 ), . . . }
A(ζ1 ) = 0.8
X(t, ζi ) ✂A(ζ ) = 0.6
✂✂ 2
✂
✂✌ ✂
A(ζ3 ) = −0.2
✂✌✂
X(t, ζi )
θ(ζ1 ) = 0
✂ θ(ζ ) = −π/4
✂✂ 2
✂✌✂ ✂✌
Fig. 8.3. Sinusoid with (a) random amplitude, (b) random phase, and (c) random
frequency.
108 Stochastic Processes
X(t, ζ 1)
t1 t2 t3 t4 t
X(t, ζ 2)
t1 t2 t3 t4 t
X(t, ζ 3)
t1 t2 t3 t4 t
terms of its nth-order joint cumulative distribution function of the random variables
X1 , X2 , . . . , Xn .
Statistics of Stochastic Processes 109
(8.5)
equals
∂ n F (x1 , x2 , . . . , xn ; t1 , t2 , . . . , tn )
f (x1 , x2 , . . . , xn ; t1 , t2 , . . . , tn ) = . (8.6)
∂x1 ∂x2 · · · ∂xn
where fX (x; t) is the PDF of X(t). In general, µX (t) is a function of time. Trends in
the behavior of X(t) are reflected in the variation of µX (t) with time.
tic process X(t) is defined as the joint moment of the random variables X(t1 ) and
X ∗ (t2 ):
Thus, the autocorrelation function is the mean of the product X(t1 )X ∗ (t2 ). Note
that the conjugate term is associated with the second variable in RXX (t1 , t2 ). From
and
The last result shows that the autocorrelation function RXX (t2 , t1 ) on the diagonal
process X(t) is defined as the covariance of the random variables X(t1 ) and X(t2 ):
From (6.67), it follows that the autocovariance function can be expressed in terms of
Many stochastic processes have the property that the nature of the randomness in the
process does not change with time. In such cases, an observation of the process in the
time interval (t1 , t2 ) exhibits the same type of random behavior as an observation in
some other time interval (t1 + c, t2 + c). This leads us to postulate that probabilities
involving samples taken at times t1 , t2 , . . . , tn will not differ from those taken at
t1 + c, t2 + c, . . . , tn + c.
Statistics of Stochastic Processes 111
f (x1 , x2 , . . . , xn ; t1 , t2 , . . . , tn ) = f (x1 , x2 , . . . , xn ; t1 + c, t2 + c, . . . , tn + c)
(8.15)
for any c.
If X(t) is a SSS process, then the processes X(t) and X(t + c) have the same statistics
2. The second-order PDF depends only on the time difference between the samples,
i.e.,
f (x1 , x2 ; t1 , t2 ) = f (x1 , x2 ; t1 + c, t2 + c)
= f (x1 , x2 ; t1 − t2 , 0)
= f (x1 , x2 ; t1 − t2 )
112 Stochastic Processes
X(t, ζ 1)
t1 t2 t
X(t, ζ 2)
t1 t2 t
X(t, ζ 3)
t1 t2 t
c
X(t 1) X(t 2)
Same statistics
for all t.
4. The second property implies that the autocorrelation function of X(t) depends
Statistics of Stochastic Processes 113
only on τ = t1 − t2 , i.e.,
since (8.15) is difficult to examine. But we can determine whether the mean is constant
E{X(t)} = µX (8.20)
Note that all SSS processes are WSS, since they satisfy (8.20) and (8.21), but a WSS
Let X(t) = a cos(2πf t + Θ), where a and f are constants and Θ is a random variable,
which is uniformly distributed in the interval (−π, π]. Is the stochastic process X(t)
WSS?
Solution. The mean µX (t) of X(t) is found by using (8.7) and (5.7):
1
cos(x) cos(y) = [cos(x + y) + cos(x − y)]
2
gives
a2
RXX (t1 , t2 ) = E{cos[2πf (t1 + t2 ) + 2Θ] + cos[2πf (t1 − t2 )]}
2
a2 a2
= E{cos[2πf (t1 + t2 ) + 2Θ]} + E{cos[2πf (t1 − t2 )]}
2 2
a 2
= 0+ cos[2πf (t1 − t2 )]
2
a2
= cos[2πf (t1 − t2 )] .
2
Note that µX (t) is a constant and RXX (t1 , t2 ) depends only on the time difference
Let X(t) = A cos(2πf t + θ), where f and θ are constants and A is a random variable
= E{A} cos(2πf t + θ)
= µA cos(2πf t + θ) .
Statistics of Stochastic Processes 115
Note that µX (t) varies with t and RXX (t1 , t2 ) depends not only on the time difference
to estimate the mean µX (t) of a stochastic process, we have to average over a large
a single sample function of X(t). In such cases, one refers to the ergodicity theorem
(see Theorem 8.1). The ergodicity theorem states conditions under which the time
average of a single sample function of X(t) converges to the ensemble average (mean)
of X(t).
Time average. The time average µ̄X of a single sample function X(t, ζi ) is defined
as
ZT
1
µ̄X =< X(t, ζi ) >= lim X(t, ζi ) dt . (8.23)
T →∞ 2T
−T
116 Stochastic Processes
ZT
∗ 1
R̄XX (τ ) =< X(t + τ, ζi )X (t, ζi ) >= lim X(t + τ, ζi )X ∗ (t, ζi ) dt . (8.24)
T →∞ 2T
−T
ZT
1
µX = E{X(t)} = lim X(t, ζi ) dt = µ̄X . (8.25)
T →∞ 2T
−T
lation function RXX (τ ) equals the time autocorrelation function R̄XX (τ ) of X(t, ζi ),
i.e.,
ZT
∗ 1
RXX (τ ) = E{X(t + τ )X (t)} = lim X(t + τ, ζi ) X ∗ (t, ζi ) dt = R̄XX (τ ) .
T →∞ 2T
−T
(8.26)
Figure 8.6 illustrates a mean-ergodic process X(t), and the following theorem states
X(t, ζ 1)
t1 t
X(t, ζ 2)
−
µX
t1 t
Ensemble average = time average
E{X(t1 )} = < X (t, ζ 2 ) >
X(t, ζ 3) −
µ X= µ X
t1 t
µ X=E{X (t 1)}
Z2T
1 α
CXX (α) 1 − dα −−
−−→ 0 . (8.27)
T 2T T →∞
0
Example 8.5. Suppose that Z is a random variable with mean µZ . Is the stochastic
Solution. In this case, X(t) is an ensemble of straight lines. The ensemble average of
X(t) is
µX = E{X(t)} = E{Z} = µZ .
ZT
1
µ̄X = lim X(t, ζi ) dt = Z(ζi )
T →∞ 2T
−T
not mean-ergodic.
The autocorrelation function RXX (τ ) is an appropriate measure for the average rate
of change of a random process. Indeed, if a random process changes slowly with time,
then it remains correlated with itself for a long period of time, and RXX (τ ) decreases
becomes quickly uncorrelated with itself, and RXX (τ ) decreases rapidly with τ . The
linear signal processing algorithms. For this reason, we will discuss here some prop-
Recall that the autocorrelation function RXX (τ ) of a WSS process X(t) depends only
Correlation Functions of Stochastic Processes 119
X(t, ζi ):
ZT
1
RXX (τ ) = lim X(t + τ, ζi )X ∗ (t, ζi ) dt . (8.29)
T →∞ 2T
−T
to digital processes.
The autocorrelation sequence RXX (m) of a WSS discrete-time process X(n) is defined
as
Finally, if a discrete-time process X(n) is WSS and autocorrelation ergodic, then the
autocorrelation sequence RXX (m) can be obtained from a single sample sequence
X(n, ζi ):
RXX(m) = lim_{N→∞} [1/(2N + 1)] Σ_{n=−N}^{N} X(n + m, ζi) X*(n, ζi) .    (8.31)
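In practice, the limit in (8.31) is replaced by an average over a finite data record. The following Python sketch is an illustration only (the AR(1)-type test sequence with coefficient 0.9 and the record length are arbitrary assumptions); it estimates RXX(m) from a single sample sequence and compares the estimate with the theoretical values:

import numpy as np

def acf_estimate(x, max_lag):
    # estimate R_XX(m) = E{X(n + m) X*(n)} for m = 0, ..., max_lag from one sequence
    N = len(x)
    return np.array([np.mean(x[m:] * np.conj(x[:N - m])) for m in range(max_lag + 1)])

rng = np.random.default_rng(2)
# test sequence: x(n) = 0.9 x(n-1) + w(n) with unit-variance white noise w(n)
w = rng.standard_normal(100_000)
x = np.empty_like(w)
x[0] = w[0]
for n in range(1, len(w)):
    x[n] = 0.9 * x[n - 1] + w[n]

print(acf_estimate(x, 5))
print((0.9 ** np.arange(6)) / (1 - 0.9**2))   # theory: R_XX(m) = 0.9**m / (1 - 0.81)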
The autocorrelation function RXX (τ ) of a WSS process has the following properties:
1. RXX(0) = E{|X(t)|²} ≥ 0, i.e., RXX(0) equals the average power of X(t).

2. The autocorrelation function has the Hermitian symmetry

   RXX(τ) = R*XX(−τ) .    (8.33)

   Note that if X(t) is real-valued, then RXX(τ) = RXX(−τ), i.e., the autocorrelation function is an even function of τ.

   Proof. RXX(−τ) = E{X(t − τ) X*(t)} = E{X(t) X*(t + τ)} = [E{X(t + τ) X*(t)}]* = R*XX(τ) .

3. The autocorrelation function is bounded by its value at the origin, |RXX(τ)| ≤ RXX(0). This follows from the inequality |E{X Y*}|² ≤ E{|X|²} E{|Y|²}, which holds for any two random variables X and Y. Applying this equation to X = X(t + τ) and Y = X(t) gives the bound.

4. The autocorrelation function is nonnegative definite, i.e., for any constants ai and time lags τi,

   Σ_i Σ_j ai a*j RXX(τi − τj) ≥ 0 .    (8.36)

5. If RXX(0) = RXX(Tp) for some Tp ≠ 0, then RXX(τ) is periodic with period Tp and X(t) is mean-square periodic, i.e., E{|X(t + Tp) − X(t)|²} = 0. Indeed, applying the inequality of Property 3 to the random variables X(t + τ + Tp) − X(t + τ) and X(t) yields

   |RXX(τ + Tp) − RXX(τ)|² ≤ 2 [RXX(0) − RXX(Tp)] RXX(0) .

   Thus, RXX(0) = RXX(Tp) implies that the right-hand side of the equation is zero, so that RXX(τ + Tp) = RXX(τ) for all τ. Hence, RXX(τ) is periodic with period Tp. The fact that X(t) is mean-square periodic follows from E{|X(t + Tp) − X(t)|²} = 2 [RXX(0) − RXX(Tp)] = 0.
If we deal with two or more stochastic processes, we are often interested in the rela-
tionship between the processes. In such situations, the following expected values are
useful to describe the relationship between any two WSS processes X(t) and Y (t):
Cross-Correlation Function.

RXY(τ) = E{X(t + τ) Y*(t)}    (8.39)

Cross-Covariance Function.

CXY(τ) = RXY(τ) − µX µ*Y    (8.40)

Correlation Coefficient.

ρXY(τ) = CXY(τ) / √(CXX(0) CYY(0))    (8.41)
Using these expected values, we can determine the degree of dependence between
two stochastic processes. The cross-correlation function RXY (τ ) has the following
properties:

1. RXY(τ) = R*YX(−τ), i.e., RXY and RYX are related by conjugation and time reversal.

2. |RXY(τ)|² ≤ RXX(0) RYY(0), which follows from the inequality |E{X Y*}|² ≤ E{|X|²} E{|Y|²}.
Example 8.6. Suppose that the stochastic processes X(t) and Y (t) are real-valued
and WSS. Find the autocorrelation function of the complex process Z(t) = X(t) +
jY (t).
Solution. The autocorrelation function of Z(t) is

RZZ(τ) = E{Z(t + τ) Z*(t)}
       = E{[X(t + τ) + jY(t + τ)][X(t) − jY(t)]}
       = RXX(τ) + RYY(τ) + j [RYX(τ) − RXY(τ)] .

Hence, the autocorrelation function of the complex process Z(t) is completely determined by the auto- and cross-correlation functions.
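As a simple numerical illustration of how the cross-correlation function measures the dependence between two processes (this sketch is not from the lecture notes; the delay of 25 samples and the noise level 0.5 are arbitrary assumptions), the estimated RYX(m) between a process X(n) and a delayed, noisy copy Y(n) peaks at the delay, which is the basis of time-delay estimation:

import numpy as np

rng = np.random.default_rng(3)
N, delay = 50_000, 25
x = rng.standard_normal(N)                           # white process X(n)
y = np.concatenate([np.zeros(delay), x[:-delay]])    # Y(n) = X(n - delay)
y = y + 0.5 * rng.standard_normal(N)                 # plus additive noise

def ccf_estimate(y, x, max_lag):
    # estimate R_YX(m) = E{Y(n + m) X(n)} for m = 0, ..., max_lag
    n = min(len(x), len(y))
    return np.array([np.mean(y[m:n] * x[:n - m]) for m in range(max_lag + 1)])

R_yx = ccf_estimate(y, x, 50)
print("estimated delay:", np.argmax(R_yx))           # approximately 25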
We now present the Wiener-Khinchin theorem, which states that the power spectral density of a WSS process is the Fourier transform of its autocorrelation function.
The power spectral density (or power spectrum) SXX(ω) of a WSS process X(t) is defined as the Fourier transform of its autocorrelation function RXX(τ), i.e.,

SXX(ω) = F{RXX(τ)} = ∫_{−∞}^{∞} RXX(τ) e^{−jωτ} dτ .    (8.47)
We also use the notation SXX (ω) •—◦ RXX (τ ). From the Fourier inversion formula,
it follows that
RXX(τ) = F^{−1}{SXX(ω)} = (1/(2π)) ∫_{−∞}^{∞} SXX(ω) e^{jωτ} dω .    (8.48)
2. If X(t) is real, then RXX(τ) is real and even, and hence SXX(ω) is also real and even, i.e., SXX(ω) = SXX(−ω).
4. If X(t) has periodic components, then SXX (ω) will have impulse components.
The cross-power spectral density (or cross-power spectrum) SXY (ω) of two processes
X(t) and Y (t) is the Fourier transform of their cross-correlation function RXY (τ ):
SXY(ω) = F{RXY(τ)} = ∫_{−∞}^{∞} RXY(τ) e^{−jωτ} dτ .    (8.51)
Correspondingly, the inverse Fourier transform gives

RXY(τ) = F^{−1}{SXY(ω)} = (1/(2π)) ∫_{−∞}^{∞} SXY(ω) e^{jωτ} dω .    (8.52)
2. The real part of SXY(ω) is an even function of ω, and the imaginary part of SXY(ω) is an odd function of ω.
Some important and frequently used autocorrelation functions RXX (τ ) and the cor-
responding power spectral densities SXX (ω) are listed in Table 8.1.
Table 8.1. Autocorrelation functions RXX(τ) and corresponding power spectral densities SXX(ω).

RXX(τ)                    SXX(ω)
δ(τ)              ◦—•     1
1                 ◦—•     2πδ(ω)
e^{−ω0|τ|}        ◦—•     2ω0 / (ω0² + ω²)
e^{−(ω0 τ)²}      ◦—•     (√π / ω0) e^{−(ω/(2ω0))²}
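The entries of Table 8.1 can be verified by evaluating the integral in (8.47) directly. The following Python sketch is a numerical check only (the value ω0 = 2, the test frequencies, and the truncated integration range are arbitrary assumptions); it compares the numerically computed transform of e^{−ω0|τ|} with the closed-form expression 2ω0/(ω0² + ω²):

import numpy as np

omega0 = 2.0
tau = np.linspace(-40.0, 40.0, 400_001)       # truncated integration range
dtau = tau[1] - tau[0]
R = np.exp(-omega0 * np.abs(tau))             # R_XX(tau) = exp(-omega0 |tau|)

for omega in (0.0, 1.0, 3.0):
    S_num = np.real(np.sum(R * np.exp(-1j * omega * tau)) * dtau)   # Eq. (8.47)
    S_closed = 2 * omega0 / (omega0**2 + omega**2)
    print(omega, round(S_num, 6), round(S_closed, 6))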
Example 8.7. Find the power spectral density of X(t) = a cos(ω0 t + Θ), where a > 0 and ω0 > 0 are constants and Θ is a random variable, which is uniformly distributed in the interval (−π, π].
Solution. From Example 8.3, we know that the autocorrelation function of X(t) is
RXX(τ) = (a²/2) cos(ω0 τ) .

Taking the Fourier transform of RXX(τ) gives the power spectral density

SXX(ω) = (a²π/2) [δ(ω − ω0) + δ(ω + ω0)] .

Note that the power spectral density is infinite at ±ω0, but the average power of X(t) is finite, since RXX(0) = a²/2.
In many cases, physical systems are modeled as linear time invariant (LTI) systems.
If the input signal of an LTI system is deterministic, then the output signal can be
computed in the time domain via the convolution integral or in the spectral domain
via the Fourier transform. In this section, we develop techniques for calculating the statistical properties (mean, autocorrelation function, and power spectral density) of the output process of a linear system whose input is a stochastic process.

Let us consider a system in which a stochastic input process X(t) is mapped into an output process Y(t) = L[X(t)], where L denotes the system operator. The system is called time-invariant if a time shift of the input results in the same time shift of the output, i.e.,
Y (t − t0 ) = L[X(t − t0 )] . (8.58)
We shall assume throughout that all linear systems under consideration are time-
invariant. A linear system is completely specified by its impulse response. The impulse response h(t) is defined as the response of the system to a unit impulse applied at its input, i.e., h(t) = L[δ(t)], where δ(t) is the delta function. The Fourier transform of the impulse response h(t) is called the transfer function H(ω):

H(ω) = F{h(t)} = ∫_{−∞}^{∞} h(t) e^{−jωt} dt .    (8.60)
If the input to a linear system is a stochastic process X(t), as shown in Fig. 8.7, then the output process Y(t) is given by the convolution integral

Y(t) = X(t) ⋆ h(t) = ∫_{−∞}^{∞} X(t − τ) h(τ) dτ .    (8.61)
Fig. 8.7. Linear system with impulse response h(t), input process X(t), and output process Y(t).
In the following, we compute the mean and the autocorrelation function of the output process Y(t) of a linear system with impulse response h(t), where we assume that the input process X(t) is WSS. Using (8.61), the mean of the output process is obtained as

µY = E{Y(t)} = E{ ∫_{−∞}^{∞} X(t − τ) h(τ) dτ }
   = ∫_{−∞}^{∞} E{X(t − τ)} h(τ) dτ
   = µX ∫_{−∞}^{∞} h(τ) dτ
   = µX H(0) ,    (8.62)

where H(0) is the transfer function of the linear system at ω = 0. In the second last identity, we have used the fact that µX = E{X(t − τ)}, since X(t) is WSS. Equation (8.62) shows that the mean of the output Y(t) does not depend on time.
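Equation (8.62) can be illustrated with a small discrete-time experiment (a sketch of my own, not from the notes; the 5-tap moving-average filter and the input mean of 3 are arbitrary assumptions), where H(0) corresponds to the sum of the filter coefficients:

import numpy as np

rng = np.random.default_rng(4)
h = np.full(5, 0.2)                        # moving-average impulse response, H(0) = sum(h) = 1
x = 3.0 + rng.standard_normal(200_000)     # WSS input with mean mu_X = 3
y = np.convolve(x, h, mode="valid")        # output process

print(y.mean())                            # close to mu_X * H(0) = 3
print(x.mean() * h.sum())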
The autocorrelation function of the output process Y(t) is given by

RYY(τ) = E{Y(t + τ) Y*(t)}
       = E{ ∫_{−∞}^{∞} X(t + τ − s) h(s) ds ∫_{−∞}^{∞} X*(t − r) h(r) dr }
       = ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(s) h(r) RXX(τ + r − s) ds dr ,    (8.63)
where we have used the fact that X(t) is WSS. Note that the autocorrelation function
of Y(t) depends only on τ, and since the mean E{Y(t)} is a constant, it follows that the output process Y(t) is WSS as well.
In particular, it can be shown that if the input process is a Gaussian WSS process,
then the output process Y (t) will also be a Gaussian WSS process.
The following theorem relates the input and output power spectral densities to the
transfer function.
Theorem 8.2. The power spectral density SYY(ω) of the output process Y(t) is given by

SYY(ω) = |H(ω)|² SXX(ω) ,    (8.64)

which is known as the Wiener-Lee relation.

Proof. Taking the Fourier transform of RYY(τ) in (8.63) gives

SYY(ω) = ∫_{−∞}^{∞} RYY(τ) e^{−jωτ} dτ
       = ∫_{−∞}^{∞} ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(s) h(r) RXX(τ + r − s) e^{−jωτ} ds dr dτ .
Substituting u = τ + r − s, i.e., τ = u − r + s, yields
SYY(ω) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(s) h(r) RXX(u) e^{−jω(u − r + s)} ds dr du
       = ∫_{−∞}^{∞} h(s) e^{−jωs} ds · ∫_{−∞}^{∞} h(r) e^{jωr} dr · ∫_{−∞}^{∞} RXX(u) e^{−jωu} du
       = H(ω) H*(ω) SXX(ω)
       = |H(ω)|² SXX(ω) ,

which proves (8.64).
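A discrete-time analogue of this result can be checked numerically. In the sketch below (an illustration with an arbitrarily chosen two-tap filter h = [0.5, 0.5] and unit-variance white input, not part of the notes), the output autocorrelation equals the input variance times the deterministic autocorrelation of the impulse response, which is the time-domain counterpart of SYY(ω) = |H(ω)|² SXX(ω):

import numpy as np

rng = np.random.default_rng(5)
h = np.array([0.5, 0.5])                       # impulse response (arbitrary choice)
x = rng.standard_normal(500_000)               # white input with S_XX = 1
y = np.convolve(x, h, mode="valid")            # output process

def acf(z, max_lag):
    n = len(z)
    return np.array([np.mean(z[m:] * z[:n - m]) for m in range(max_lag + 1)])

print(acf(y, 2))                                      # approx [0.5, 0.25, 0.0]
print(np.correlate(h, h, mode="full")[len(h) - 1:])   # deterministic ACF of h: [0.5, 0.25]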
The cross-correlation properties between the input and output processes are also of interest. They are summarized in the following theorem.

Theorem 8.3. The cross-correlation functions and the cross-power spectral densities of the input process X(t) of a linear system and its response Y(t) are given by

RXY(τ) = RXX(τ) ⋆ h*(−τ) ,    RYX(τ) = RXX(τ) ⋆ h(τ)    (8.65)

and

SXY(ω) = SXX(ω) · H*(ω) ,    SYX(ω) = SXX(ω) · H(ω) .    (8.66)
The cross-power spectral density SXY(ω) is obtained by taking the Fourier transform of (8.67). Thus, using F{h*(−τ)} = H*(ω), we obtain SXY(ω) = SXX(ω) · H*(ω). Note that this equation follows also from (8.69) by using (8.53) and taking into account that the impulse response h(t) is real-valued.
A white noise process N(t) is a stochastic process whose power spectral density equals a constant for all frequencies, i.e.,

SNN(ω) = N0/2  for all ω.    (8.71)
The term white noise stems from the analogy to white light, which contains all frequency components in equal amounts. Using Table 8.1, we find that the autocorrelation function of a white noise process N(t) is given by

RNN(τ) = (N0/2) δ(τ) .    (8.72)
Note that the average power of a white noise process is infinite, since RN N (0) = ∞.
Note also that (8.72) implies that N (t) and N (t + τ ) are uncorrelated for any value
of τ ≠ 0.
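A discrete-time counterpart of this behaviour can be observed directly from data. The following sketch (illustrative only; the variance σ² = 2 plays the role of the constant level and is an arbitrary assumption) shows that the estimated autocorrelation sequence of white noise is essentially zero for all lags m ≠ 0:

import numpy as np

rng = np.random.default_rng(6)
sigma2 = 2.0
n = rng.normal(scale=np.sqrt(sigma2), size=1_000_000)   # discrete-time white noise

R_hat = [np.mean(n[m:] * n[:len(n) - m]) for m in range(4)]
print([round(r, 4) for r in R_hat])                     # approx [2.0, 0.0, 0.0, 0.0]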
Example 8.8. Consider an ideal lowpass filter with the transfer function H(ω) = 1 for |ω| ≤ ω0 and H(ω) = 0 for |ω| > ω0. Suppose that the input process of this system is white noise N(t). Find the power spectral density SYY(ω) and the autocorrelation function RYY(τ) of the output process Y(t).
Solution. From the Wiener-Lee relation (8.64), it follows that SYY(ω) is given by

SYY(ω) = |H(ω)|² N0/2 = N0/2  for |ω| ≤ ω0  and  SYY(ω) = 0  otherwise.
Taking the inverse Fourier transform of SY Y (ω) gives the autocorrelation function
RYY(τ) = (N0/2) · sin(ω0 τ)/(πτ) .    (8.75)
Note that the average power of the filtered white noise process Y (t) is given by
RYY(0) = N0 ω0/(2π). The power spectral density SYY(ω) and the autocorrelation function RYY(τ) are shown in Fig. 8.8.
Fig. 8.8. (a) Power spectral density SY Y (ω) and (b) autocorrelation function
RY Y (τ ) of a filtered white noise process.
8.7 Applications
One application of the cross-correlation between the input and the output of a linear system is the identification of an unknown system, i.e., the measurement of its impulse response h(τ). The procedure is as follows:

1. The unknown system is excited by white noise X(t), whose power spectral density and autocorrelation function are given by

SXX(ω) = N0/2  •—◦  RXX(τ) = (N0/2) δ(τ) .    (8.76)
2. We measure the cross-correlation function RYX(τ) between the output process Y(t) and the input process (white noise) X(t) and notice that RYX(τ) is given by (see Theorem 8.3)

RYX(τ) = RXX(τ) ⋆ h(τ)
       = (N0/2) δ(τ) ⋆ h(τ)
       = (N0/2) h(τ) .    (8.77)
3. Finally, from (8.77), it follows that the impulse response of the unknown system can be identified by

h(τ) = (2/N0) RYX(τ) .    (8.78)

A numerical illustration of this procedure is given below.
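The identification procedure translates directly into a discrete-time experiment. The following Python sketch is my own illustration (the "unknown" filter h_true, the record length, and the unit input variance are assumptions, and the input variance takes the role of N0/2); it recovers the impulse response from the cross-correlation between the white-noise input and the measured output, in the spirit of (8.78):

import numpy as np

rng = np.random.default_rng(7)
h_true = np.array([1.0, 0.6, 0.3, -0.2])       # "unknown" impulse response (assumption)
x = rng.standard_normal(500_000)               # white-noise excitation, variance 1
y = np.convolve(x, h_true)[:len(x)]            # measured output of the unknown system

def cross_corr(y, x, max_lag):
    # estimate R_YX(m) = E{Y(n + m) X(n)} for m = 0, ..., max_lag
    n = len(x)
    return np.array([np.mean(y[m:n] * x[:n - m]) for m in range(max_lag + 1)])

h_est = cross_corr(y, x, 5) / 1.0              # divide by the input variance (here 1)
print(np.round(h_est, 3))                      # approx [1.0, 0.6, 0.3, -0.2, 0.0, 0.0]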
Consider the transmission system shown in Fig. 8.9. The transmitted signal X(t) is
disturbed by additive white noise N (t). To minimize the bit error probability, a large
signal-to-noise ratio (SNR) is required. A receive filter h(t) that is “matched” to the transmit filter g(t) is called a matched filter. Its impulse response is given by

h(t) = c · g(T0 − t) ,

where c is a constant and T0 denotes the sampling (decision) instant.
In the following, we show that the matched filter maximizes the signal-to-noise ratio.
The useful signal component at the output of the receive filter, sampled at the time instant T0, equals (g ⋆ h)(T0) = ∫_{−∞}^{∞} g(τ) h(T0 − τ) dτ, and the average noise power at the output of the receiver is obtained as

N = N0 ∫_{−∞}^{∞} h²(t) dt .
Applying the Cauchy-Schwarz inequality to the ratio of the squared signal sample and the noise power N shows that the signal-to-noise ratio at the sampling instant T0 is maximized if and only if

h(t) = c · g(T0 − t) ,

i.e., for the matched filter impulse response given above.
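The SNR advantage of the matched filter can be illustrated with a small discrete-time simulation (a sketch under my own assumptions: a rectangular transmit pulse g, unit-variance white noise, and a pulse occupying samples 100 to 149, none of which are specified in the notes). The receive filter is the time-reversed pulse, and its output peaks at the end of the pulse, where the pulse energy is concentrated:

import numpy as np

rng = np.random.default_rng(8)
g = np.ones(50)                                 # rectangular transmit pulse (assumption)
r = rng.standard_normal(500)                    # received signal: white noise ...
r[100:150] += g                                 # ... plus the pulse at samples 100-149

h = g[::-1]                                     # matched filter: time-reversed pulse (c = 1)
z = np.convolve(r, h, mode="full")              # matched filter output

print("peak position:", np.argmax(z))           # close to 149, the end of the pulse
print("peak value:", z.max())                   # about 50, the pulse energy
print("single received sample:", r[120])        # signal value 1 buried in noise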